How to Onboard a New Data Source in Apache Metron

Introduction

Apache Metron aims to be a tool for analysts in a cyber security team to help them defining intelligent alerts, detecting threats and work on them in real-time. This is the first blog post in a row to ease operations and share my experiences with Apache Metron. Thus, it serves as an introduction to Metron.

Technical Introduction

Apache Metron is a cyber security platform making heavy use of the Hadoop Ecosystem to create a scalable and available solution. It utilizes Apache Storm and Apache Kafka to parse, enrich, profile, and eventually index data from telemetry sources, such as network traffic, firewall logs, or application logs in real-time. Apache Solr or Elastic Search are used for random access searches, while Apache Hadoop HDFS is used for long term and analytical storage. It comes with its own scripting language “Stellar” to query, transform and enrich data. A security operator/analyst uses the Metron Management UI to configure and manage input sources as well the Metron Alerts UI to search, filter and group events.

Screen Shot 2018-07-18 at 08.07.45 — Metron Alerts UI, showing a few dummy events from a Squid log.

Scope of this Post

Since virtually every data source can be used to generate events, it is natural that the platform operator/analyst wants to add data from new sources over time. I use this post as a small check list, to document considerations for the “onboarding” process of new data sources. You might want to automate this process in a way that works for you. In future posts I will cover the steps in detail.

Onboard a New Data Source

I need to ingest data to Kafka

It’s very handy to use Apache NiFi for the ingest part. Just create a data flow consisting of two processors: a simple tcp listener to receive data and a Kafka producer to push the event further into Kafka.
I can also push data directly into Kafka if the architecture, firewall and the source system allow it.
If there are no active components on the source system pushing data, I might want to install an instance of MiNiFi on my source system.

Screen Shot 2018-07-19 at 10.32.29 — Simple example of a data ingest into Kafka via NiFi

Before I can ingest data into Kafka, I need a new Kafka topic

While the Kafka topics “enrichments” and “indexing” Kafka topics will be used by all data sources, the parser topics are specific to a data source.
I create a topic named “squid” with a number of partitions that corresponds to the amounts of data I receive.

To make the events searchable, i.e., to store the events into Apache Solr, I need to create a new Solr collection (or Elastic Search template)

For each parser Storm topology and parser Kafka topic, there is a parser Solr collection.
I add a few specific fields common to all Metron Solr collections and optionally define data source specific fields in the schema.xml.
I create a new collection named “squid” with a number of shards that corresponds to the amount of data I receive.

I define my parser in the Metron Management UI

I click the “+” button in the right bottom corner of the Metron Management UI.
I configure my parser by choosing a Java class and/or define a Grok pattern, insert a sample and check if the parsed output is what I expect.
I configure the parser: Kafka topic name, Solr collection name, parser config, enrichment defintions, threat intel logic, transformations, parallelism.
I save the parser configuration and press the “Play” button next to the new parser to start it.

Screen Shot 2018-07-18 at 08.37.04 1 — Metron Management UI with my configured parsers. Currently only the Squid parser is running that produces the events in the first screenshot.

Outlook

I hope this post was helpful and informative. For questions I refer to the documentation, future posts, the Metron mailing list or post a question below.

Datahovel

How to Onboard a New Data Source in Apache Metron

Introduction

Technical Introduction

Scope of this Post

Onboard a New Data Source

I need to ingest data to Kafka

Before I can ingest data into Kafka, I need a new Kafka topic

To make the events searchable, i.e., to store the events into Apache Solr, I need to create a new Solr collection (or Elastic Search template)

I define my parser in the Metron Management UI

Outlook

5 thoughts on “How to Onboard a New Data Source in Apache Metron”

Leave a comment Cancel reply

Introduction

Technical Introduction

Scope of this Post

Onboard a New Data Source

I need to ingest data to Kafka

Before I can ingest data into Kafka, I need a new Kafka topic

To make the events searchable, i.e., to store the events into Apache Solr, I need to create a new Solr collection (or Elastic Search template)

I define my parser in the Metron Management UI

Outlook

Share this:

5 thoughts on “How to Onboard a New Data Source in Apache Metron”

Leave a comment Cancel reply