In one of my previous articles I wrote about Apache Metron as an Example for a Real-Time Streaming Pipeline. Since then, I’ve refined the figure I’ve used to explain the architecture. In this article, I just briefly explain the updated part of the figure and add a video of myself talking about Apache Metron at the Openslava conference in Bratislava using those updated figures in my slides.
I added a few more details into the figure on the enrichment part:
- The enrichment Storm topology is capable of using external database sources on-boarded into HBase or from the Model as a Service (MaaS) capability.
- The arrow from the enrichments Kafka topic is not entirely correct, but should depict that data sources coming in in real-time can be stored in HBase as an enrichment source. Correct would be to draw the arrow to HBase directly from the parser topology.
- Huge data sets can be fairly easily batch loaded into HBase as an enrichment source.
- The profiler is a Storm topology that saves data of certain (user-defined) entities in a time series to HBase. From there it can be used as an enrichment for any future events as aggregates over time.