A Cookiecutter for Metron Sensors

What is “Cookiecutter”? Cookiecutter is a tool that creates project structures and boilerplate from templates. It is well known and widely used in both the Python and data science communities, but you can use Cookiecutter for virtually anything, including Apache Metron sensors.
Apache Metron is… well, read some of the earlier blog posts, or the documentation. 🙂

What is the cookiecutter-metron-sensor Project?

The cookiecutter-metron-sensor project helps you create the sensor configuration files, and it generates deployment instructions plus a corresponding deployment script for the specific sensor. If you need all the details, check out the README.md of the project on GitHub:

https://github.com/Condla/cookiecutter-metron-sensor

Usage

To use the Metron sensor cookiecutter, the only thing you need to have installed is Cookiecutter:

pip install cookiecutter

Then you need to clone the project mentioned above and run the template. That’s it.

git clone https://github.com/Condla/cookiecutter-metron-sensor
cookiecutter cookiecutter-metron-sensor

Now simply fill in the prompts to configure the cookiecutter, and the lion’s share of the work you need to do to onboard a new data source is done. In the generated directory you will find a deployment script as well as another README.md file that you can use to document everything around your sensor as you go ahead and define your own transformations and enrichments. The README.md comes with the deployment instructions for its own specific parser.
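To give you an idea, here is what such a run could look like for a fictional squid sensor. All values and defaults below are made up for illustration; the actual prompts and defaults come from the template:

cookiecutter cookiecutter-metron-sensor
sensor_name [my_sensor]: squid
index_name [squid]: squid
kafka_topic_name [squid]: squid
kafka_number_partitions [2]: 2
kafka_number_replicas [2]: 3
storm_number_of_workers [2]: 2
storm_parser_parallelism [2]: 2
batch_indexing_size [5]: 5
ra_indexing_size [1]: 1
write_to_hdfs [true]: true
write_to_elastic_search [true]: true
write_to_solr [false]: false
write_to_hbase [false]: false

The meaning of each of these prompts is explained in the next section.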

Help with filling in the Cookiecutter prompts

While the cookiecutter-metron-sensor helps you create and complete all of the Metron sensor configuration files, it does not explain what those prompts mean; for that you still need to read the documentation. However, to assist you in your efforts, I’ll walk you through the configuration prompts and point you to the documentation, so you understand what you need to configure and why.

  • sensor_name: This will be the name of the sensor in the Metron Management UI and determines the name of the parser Storm topology and the name of the Kafka consumer group.
  • index_name: The index name Metron will use to store the results of the Metron processing pipeline in HDFS, Elasticsearch or Apache Solr.
  • kafka_topic_name: This is the name of the Kafka topic the sensor parser will subscribe to.
  • kafka_number_partitions: The number of partitions of the Kafka topic above. It also determines the number of Storm “spouts” and “ackers” of the sensor parser topology. If you’re not sure, it’s good to start with 2 and increase this number later on if you see that the parser topology builds up lag. Check the Metron performance tuning guide for more information. (A sample topic creation command follows this list.)
  • kafka_number_replicas: The number of replicas of the above Kafka topic. For data durability and service availability reasons this should be 2 or 3.
  • storm_number_of_workers: The number of Storm workers you want to launch for the sensor parser topology. Each worker is its own JVM process with memory assigned to it, and all of the topology’s processing units are distributed over these workers. For availability reasons use 2 or more workers.
  • storm_parser_parallelism: This affects how fast the sensor parser processes the incoming data stream. By default, cookiecutter sets it to your choice of kafka_number_partitions, which, as mentioned above, determines the number of processing units reading the stream from Apache Kafka.
  • batch_indexing_size: The batch size written to HDFS per writer. It should be chosen based on the parallelism and the number of events per second you are dealing with. Again, refer to the performance tuning guide.
  • ra_indexing_size: Similar to batch_indexing_size, but for random access indexing to Elasticsearch or Solr. (A sketch of the resulting indexing configuration follows this list.)
  • write_to_hdfs: Select true if you want to use the batch indexing capabilities to HDFS.
  • write_to_elastic_search: Select true if you want to use the random access indexing capabilities to Elasticsearch.
  • write_to_solr: Select true if you want to use the random access indexing capabilities to Apache Solr.
  • write_to_hbase: Choose false if you want a “common” Metron pipeline [Parsing/Transforming] -> [Enrichment] -> [Indexing] -> [HDFS/Elasticsearch/Solr]. Choose true if you want to onboard a stream ingest enrichment source [Parsing/Transforming] -> [HBase].
    shew_table: The HBase table name you want to write to in case you use write_to_hbase. You can ignore this and use the defaults in case you don’t. (The shew_* values configure Metron’s SimpleHbaseEnrichmentWriter.)
    shew_cf: The HBase column family name you want to write to in case you use write_to_hbase. You can ignore this and use the defaults in case you don’t.
    shew_key_columns: The name of the field that acts as the lookup key for your enrichment source in case you use write_to_hbase. You can ignore this and use the defaults in case you don’t.
    shew_enrichment_type: A name that uniquely identifies this enrichment when you want to use it; it becomes part of the lookup key. Only important in case you use write_to_hbase. You can ignore this and use the defaults in case you don’t.
  • parser_class_name: Select one of the possible parsers. Note: as with all of these values, you can change this later in the Metron Management UI, for instance if you are using a custom parser or can’t find your parser in this list.
  • grok_pattern_label: By default this is the sensor_name in upper case letters, but you might want to change this. (See the grok pattern sketch after this list.)
  • zookeeper_quorum: This is important for the deployment script so it can create the Kafka topic. If you deployed Metron using Ambari, you’ll find this information in the Ambari UI.
  • elastic_user: Important for the deployment. If your Elasticsearch server does not use X-Pack security, you can leave this field empty.
  • elastic_master: The URL of the Elasticsearch master server.
  • metron_user: An admin user that has access to the Metron REST server.
  • metron_rest: The URL of the Metron REST server. (The deployment sketch at the end of this list shows where these values are used.)
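To illustrate what the generated deployment script does with the Kafka related answers, here is a minimal sketch of the topic creation step. The kafka-topics.sh path is the usual HDP location, and the host name and values are assumptions matching the fictional squid sensor above; the script the template actually generates may differ:

# Create the sensor's Kafka topic (illustrative values)
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --create \
  --zookeeper node1:2181 \
  --topic squid \
  --partitions 2 \
  --replication-factor 3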
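The batch_indexing_size, ra_indexing_size and write_to_* answers end up in the sensor’s indexing configuration. Here is a minimal sketch of such a config, written the way a deployment script might write it; the JSON layout follows the Metron indexing configuration format, while the file name and values are illustrative:

# Write the sensor's indexing config (illustrative values)
cat > squid.json <<'EOF'
{
  "hdfs":          { "index": "squid", "batchSize": 5, "enabled": true },
  "elasticsearch": { "index": "squid", "batchSize": 1, "enabled": true },
  "solr":          { "index": "squid", "batchSize": 1, "enabled": false }
}
EOF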
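If you use a grok based parser, the grok_pattern_label is the key under which Metron looks up the pattern in the sensor’s grok pattern file. A tiny sketch; the pattern itself is deliberately incomplete and only meant to show the role of the label:

# The first word of the line is the label Metron uses to find the pattern;
# it must match grok_pattern_label (SQUID here, illustrative and incomplete)
cat > squid_pattern <<'EOF'
SQUID %{NUMBER:timestamp} %{INT:elapsed} %{IP:ip_src_addr} %{WORD:action}
EOF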
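Finally, zookeeper_quorum, metron_user and metron_rest are what the deployment step needs to push configurations and start the parser. The generated script may talk to the Metron REST server for this, but the standard Metron command line tooling for the same steps looks roughly like this (host names and $METRON_HOME are assumptions):

# Push the sensor configs to ZooKeeper and start the parser topology
$METRON_HOME/bin/zk_load_configs.sh -m PUSH -i $METRON_HOME/config/zookeeper -z node1:2181
$METRON_HOME/bin/start_parser_topology.sh -z node1:2181 -k node1:6667 -s squid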

Note: The cookiecutter-metron-sensor project is still young and a work in progress. New features will be added continuously over time, with the aim of making it even easier for a cyber security operator to master threat intelligence data flows.