Big Data and Stream Processing 101 – Part 4: How to Do a Simple Key-Value Enrichment in Apache NiFi

Demo Video

In my previous blog post, I published a video showing how to query a database table. As the next step, I want to show how to enrich that data with data from an external API.

Again: enjoy and don’t forget to ask questions or give me feedback!

Configuration

  • This is how the LookupRecord processor is configured:
  • This is how the RESTLookupService (Pokemon Lookup Service) is configured:
  • This is the expected input, coming from QueryDatabaseTableRecord, before the LookupRecord processor:
  • This is an example of the expected output after the LookupRecord processor:
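
Conceptually, the enrichment step can be pictured as follows. This is a minimal Python sketch, not NiFi code: `enrich_records`, the field names `name` and `enriched`, the second example row, and the dictionary standing in for the REST-backed lookup service are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def enrich_records(
    records: Iterable[Dict],
    lookup: Callable[[str], Optional[Dict]],
    key_field: str,
    result_field: str,
) -> Tuple[List[Dict], List[Dict]]:
    """Split records into (matched, unmatched) after a key-value lookup.

    Hypothetical helper that only mimics the idea of a lookup-based
    enrichment; it is not how NiFi implements LookupRecord.
    """
    matched, unmatched = [], []
    for record in records:
        # Use the value of key_field as the lookup key.
        value = lookup(str(record[key_field]))
        if value is None:
            unmatched.append(dict(record))
        else:
            # Keep the original fields and add the looked-up value.
            matched.append({**record, result_field: value})
    return matched, unmatched


# Stand-in for the REST lookup service: a plain dict keyed by name (assumption).
FAKE_SERVICE = {"stefan": {"source": "stand-in"}}
rows = [{"id": 1, "name": "stefan"}, {"id": 2, "name": "unknown"}]

matched, unmatched = enrich_records(rows, FAKE_SERVICE.get, "name", "enriched")
print(matched)
print(unmatched)
```

In the real flow, the lookup is an HTTP call made by the lookup service, so an actual implementation would also have to deal with timeouts and error responses, which this sketch leaves out.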

Version Information

This demo was created with Apache NiFi 1.10.

Links

Big Data and Stream Processing 101 – Part 1

Big Data and Stream Processing 101 – Part 2

Big Data and Stream Processing 101 – Part 3

Big Data and Stream Processing 101 – Part 3: How to Connect to a RDBMS Using Apache NiFi

Demo Video

Apache NiFi can be used to connect to (almost) any system, process data and put it into (almost) any (other) system.

In this video, I show specifically how to connect to a relational database, in this case MySQL, and walk you through the most basic options you have.

Enjoy and don’t forget to ask questions or give me feedback!

Configuration

Here are the screenshots of the NiFi components used for reference:

  • QueryDatabaseTableRecord
  • MySQL Connection Pool

This is the simple database schema I used for the MySQL example table:

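-- Create the database and switch to it so the table is created inside it
CREATE DATABASE company;
USE company;
-- Minimal example table with a single row
CREATE TABLE people(id int, name varchar(20));
INSERT INTO people (id, name) VALUES (1,'stefan');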
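
To try the example table without a MySQL server, the statements can be replayed against an in-memory SQLite database. This is only a sketch under that assumption: SQLite stands in for MySQL (so the `CREATE DATABASE` step is dropped), and `fetch_people` is a hypothetical helper that returns each row as a record, which is roughly the shape of data a record-oriented processor hands downstream.

```python
import sqlite3


def fetch_people(conn: sqlite3.Connection) -> list:
    """Return all rows of the people table as dict records."""
    # sqlite3.Row lets each row be converted to a dict by column name.
    conn.row_factory = sqlite3.Row
    cursor = conn.execute("SELECT id, name FROM people ORDER BY id")
    return [dict(row) for row in cursor]


# In-memory database as a stand-in for the MySQL instance (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people(id int, name varchar(20))")
conn.execute("INSERT INTO people (id, name) VALUES (1, 'stefan')")
conn.commit()

records = fetch_people(conn)
print(records)
```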

Version Information

This demo was recorded on November 9, 2019 using the recently released Apache NiFi 1.10.

Links

Big Data and Stream Processing 101 – Part 1

Big Data and Stream Processing 101 – Part 2

Setting Up Apache NiFi on a Raspberry Pi

Apache NiFi is part of the Hortonworks DataFlow (HDF) product and manages data flows. The Raspberry Pi is a small, open source, multi-purpose computer. If you are not familiar with one or more of these products, just follow the links for more information. 🙂

Hardware and Software Specifications

Setup

Impressions and Remarks

  • The docs say that, after installation, the command
    service nifi start

    should work out of the box, but for me only the following worked without further modifications:

    /etc/init.d/nifi start
  • After starting, I tried to access the web interface, but it didn’t work. I checked the logs, but everything seemed all right. I saw something like the following in nifi-bootstrap.log:
    2016-04-02 21:06:29,563 INFO [NiFi Bootstrap Command Listener] org.apache.nifi.bootstrap.RunNiFi Apache NiFi now running and listening for Bootstrap requests on port 47094

    After 6 minutes and 3 seconds, however, the web interface became available. As you can see in the screenshot below, HDF takes 100% of one core of the RasPi during the start-up process:

screenshot_top_nifi
The HDF start-up process occupies one full core of the RasPi

  • After the webserver is up and running, NiFi’s resource usage looks more moderate:

screenshot_top_nifi_running
NiFi needs about 16.7% (of 400%) of CPU and about 40.5% of the RasPi’s RAM

  • I followed the “Getting Started” guide, in which NiFi is configured with two processors: one reads files from disk, sends them to the other processor, and deletes them; the other simply receives the files and logs their information to nifi-app.log. Although the name of the “LogAttribute” processor is fairly self-explanatory, the official documentation does not describe what it actually does. I found a great blog post on www.nifi.rocks, where quite a lot of processors are described.

test_data
A file is written and then deleted by the NiFi GetFile processor, 100000 times over, then …

hdf_ui
… gets transferred to the LogAttribute processor, and …

nifi-app.log
… finally the LogAttribute processor logs the incoming FlowFile data in the nifi-app.log.
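
The two-processor flow described above can be imitated in a few lines of Python. This is a rough sketch of the behaviour, not NiFi itself: `get_files` and `log_attributes` are hypothetical functions, the attribute names `filename` and `size` are assumptions chosen for illustration, and a temporary directory stands in for the watched input folder.

```python
import logging
import tempfile
from pathlib import Path

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("flow-sketch")


def get_files(input_dir: Path) -> list:
    """Read every file in input_dir, delete it, and return its attributes."""
    flow_files = []
    for path in sorted(input_dir.iterdir()):
        if path.is_file():
            content = path.read_bytes()
            flow_files.append({"filename": path.name, "size": len(content)})
            path.unlink()  # the source file is removed once it has been picked up
    return flow_files


def log_attributes(flow_files: list) -> None:
    """Log the attributes of each picked-up file."""
    for attributes in flow_files:
        log.info("attributes: %s", attributes)


with tempfile.TemporaryDirectory() as tmp:
    input_dir = Path(tmp)
    (input_dir / "test.txt").write_text("hello")  # the test file to pick up
    picked_up = get_files(input_dir)
    remaining = list(input_dir.iterdir())  # empty after the pick-up
    log_attributes(picked_up)
```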

Conclusion

NiFi is as easy to install on a Raspberry Pi as anywhere else and stands out with all of its features, being complex but not complicated. I did not test many different processors on the RasPi, nor did I test this simple setup with large amounts of data, but even in its simplicity the possibilities are endless. Combining the power and ease of use of the RasPi’s GPIOs with NiFi’s power and simplicity in directing and redirecting data flows, practically every child can, e.g., send temperature sensor data into a Hadoop file system and even process and filter it on the way.