How to Define Elasticsearch Templates for Apache Metron

When you onboard a new data source in Apache Metron and use Elasticsearch (ES) as your indexing and search engine, you need to define and submit an ES template before the indexing topology first writes to the ES cluster. Otherwise, the first index is created without your template, using Elasticsearch's default dynamic mappings.

The template should contain the following items:

  • Dynamic fields for possible geo enrichments of any IP address field.
  • Dynamic fields for other kinds of enrichments.
  • Well-defined static fields (“properties”) based on the fields that are unique to this parser.
  • A metron_alert field of type nested. As the official Metron docs point out, if you forget this, you will run into the following exception:
QueryParsingException[[nested] failed to find nested object under path [metron_alert]];
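If you want to check a template against this list before submitting it, a few lines of Python are enough. This is a minimal sketch: it assumes the template has been loaded into a dict, for example with json.load. The function name template_problems is hypothetical. It only checks that each required section is present, not whether its content is correct.

```python
def template_problems(template, doc_type):
    """Return human-readable problems found in a Metron ES template dict.

    Hypothetical helper; doc_type is the mapping type name, e.g. "squid_doc".
    Only checks that the required sections exist, not their content.
    """
    problems = []
    if "template" not in template:
        problems.append('missing top-level "template" key (index pattern)')
    mapping = template.get("mappings", {}).get(doc_type)
    if mapping is None:
        problems.append(f"no mapping for doc type {doc_type!r}")
        return problems
    if not mapping.get("dynamic_templates"):
        problems.append("no dynamic_templates (geo and other enrichments)")
    properties = mapping.get("properties", {})
    if not properties:
        problems.append("no static properties defined")
    # The Metron docs require metron_alert to be mapped as nested.
    if properties.get("metron_alert", {}).get("type") != "nested":
        problems.append('metron_alert must be mapped as {"type": "nested"}')
    return problems
```

An empty result means all four items from the list above are at least present.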

Use the Elasticsearch Reference Manual to get familiar with the data types Elasticsearch offers and how to use them!
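As an illustration, here is one way to turn the common fields and the squid fields listed further below into a "properties" section. All of the types are assumptions: date with epoch_millis for timestamp, ip for addresses, keyword for identifiers, and numeric types for counters. The helper name build_properties is hypothetical. Compare the result with the template that ships with your Metron version before relying on it.

```python
import json

# Assumed ES data types for the fields all Metron sources share;
# verify against the template shipped with your Metron version.
COMMON_FIELD_TYPES = {
    "timestamp": {"type": "date", "format": "epoch_millis"},
    "guid": {"type": "keyword"},
    "source:type": {"type": "keyword"},
    "ip_src_addr": {"type": "ip"},
    "ip_dst_addr": {"type": "ip"},
    "ip_src_port": {"type": "integer"},
    "ip_dst_port": {"type": "integer"},
    "metron_alert": {"type": "nested"},
}

def build_properties(parser_fields):
    """Merge the common fields with parser-specific {name: es_type} pairs."""
    properties = dict(COMMON_FIELD_TYPES)
    for name, es_type in parser_fields.items():
        properties[name] = {"type": es_type}
    return properties

# Hypothetical type choices for the squid-specific fields.
squid_fields = {"action": "keyword", "bytes": "long", "code": "integer",
                "elapsed": "integer", "method": "keyword", "url": "keyword"}

if __name__ == "__main__":
    print(json.dumps(build_properties(squid_fields), indent=2))
```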

How to Create an Elasticsearch Template for an Apache Metron Parser

An efficient way to create your own template is to take one of the templates that ship with Apache Metron and adapt it.

  • Step 1: Obtain an existing template, e.g., the yaf_index template:
export ELASTICSEARCH_MASTER=condla0.field.hortonworks.com:9200
# list all installed templates
curl -X GET $ELASTICSEARCH_MASTER/_template/
# save the yaf_index template, pretty-printed, to template.json
curl -X GET $ELASTICSEARCH_MASTER/_template/yaf_index | python -m json.tool > template.json
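The response of the last command wraps the template in an object keyed by its name, {"yaf_index": {...}}. That wrapper is the outer layer you remove in the next step. Here is a minimal Python sketch of the unwrapping; the function name unwrap_template is hypothetical.

```python
import json

def unwrap_template(response_text, name):
    """Return the template body from a GET _template/<name> response.

    Elasticsearch keys the response by template name; dropping that layer
    leaves "template", "mappings", etc. on the top level.
    """
    response = json.loads(response_text)
    if name not in response:
        raise KeyError(f"template {name!r} not found in response")
    return response[name]
```

For example, unwrap_template(open("template.json").read(), "yaf_index") returns the body, whose top-level keys include "template" and "mappings".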
  • Step 2: Modify it to your needs. Assume we are creating a squid template:
    • Remove the outermost JSON layer (the object keyed by the template name, "yaf_index"). The "template" key must be on the top level.
    • Rename any “yaf” fields to “squid” fields.
    • Refer to the list at the beginning of this post to see what else you need to modify.
    • A working squid template can be found here.
    • Note that there is a set of fields that all data sources should have in common:
      • timestamp
      • guid
      • source:type
      • ip_dst_addr
      • ip_src_addr
      • ip_dst_port
      • ip_src_port
    • as well as a set of fields unique to squid:
      • action
      • bytes
      • code
      • elapsed
      • method
      • url
vi template.json
{
  "template": "squid_index*",
  "mappings": {
    "squid_doc": {
      "dynamic_templates": [
        ...
      ],
      "properties": {
        ...
      }
    }
  }
}
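You can also script the renaming part of Step 2. The sketch below uses a hypothetical helper, rename_source, which replaces "yaf" with "squid" in every key and string value. That covers the index pattern and the mapping type name. It is deliberately blunt, so review the output and remove yaf-only properties by hand before submitting.

```python
def rename_source(node, old="yaf", new="squid"):
    """Recursively replace `old` with `new` in every dict key and string value.

    Returns a new structure; the input is left unchanged.
    """
    if isinstance(node, dict):
        return {k.replace(old, new): rename_source(v, old, new) for k, v in node.items()}
    if isinstance(node, list):
        return [rename_source(v, old, new) for v in node]
    if isinstance(node, str):
        return node.replace(old, new)
    return node  # numbers, booleans, None
```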
  • Step 3: Submit the new template:
curl -X POST $ELASTICSEARCH_MASTER/_template/squid_index -H 'Content-Type: application/json' -d @template.json
  • Step 4: Check that the template was created correctly:
curl -X GET $ELASTICSEARCH_MASTER/_template | python -m json.tool
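If you prefer to script Steps 3 and 4 instead of using curl, Python's standard urllib can build equivalent requests. This is a minimal sketch. It assumes the cluster speaks plain HTTP on the host:port stored in $ELASTICSEARCH_MASTER. The name template_request is hypothetical, and the function only builds the request so you can inspect it before sending. It sets Content-Type: application/json, which Elasticsearch 6.x and later require.

```python
import json
import urllib.request

def template_request(es_master, name, body=None):
    """Build, but do not send, a request against _template/<name>.

    With a body it mirrors Step 3 (POST the template); without one it is
    a GET that reads the template back, as in Step 4. Plain HTTP is
    assumed when es_master has no scheme.
    """
    base = es_master if es_master.startswith(("http://", "https://")) else "http://" + es_master
    url = f"{base}/_template/{name}"
    if body is None:
        return urllib.request.Request(url, method="GET")
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=data, method="POST",
                                  headers={"Content-Type": "application/json"})

# Sending requires a reachable cluster, e.g.:
#   with urllib.request.urlopen(template_request(host, "squid_index", body)) as resp:
#       print(resp.status, resp.read().decode())
```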

You can find a basic, fully working squid template here.

Troubleshooting

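If you query an index via Kibana or the Metron UI and see an error similar to the following exception in the Elasticsearch master log, your template is either incorrect or not applied to the index. In both cases, the field named in the message (here source:type) ended up mapped as text rather than keyword.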

Caused by: java.lang.IllegalArgumentException: Fielddata is disabled on text fields by default. Set fielddata=true on [source:type] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead.
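To rule out the first cause, you can list the fields your template does not map as keyword. In this minimal sketch, fields_without_keyword is a hypothetical helper and the field list defaults to source:type. A field missing from "properties" that no dynamic template matches falls back to Elasticsearch's default dynamic mapping, which maps strings to text.

```python
def fields_without_keyword(template, doc_type, fields=("source:type",)):
    """Return the given fields that the template does not map as keyword."""
    properties = template.get("mappings", {}).get(doc_type, {}).get("properties", {})
    return [f for f in fields if properties.get(f, {}).get("type") != "keyword"]
```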

After you have created the template and ingested your first events via the random access indexing topology, check whether your (rollover) index was created with the correct template:

# check if our squid index is there:
curl -X GET $ELASTICSEARCH_MASTER/_cat/indices
## example output:
## yellow open squid_index_2018.11.26.23 l7BO0FflRg6H0op3fM5wkw 5 1  5  0  48.3kb  48.3kb
## yellow open .kibana                   sEGp3YyZSXu40A1nRv1umQ 1 1 46 41 207.4kb 207.4kb

# search the log for the line stating which template(s) were applied when the index was created:
grep "creating index" /var/log/elasticsearch/metron.log
## example output:
## ...
## [2018-11-26T23:13:58,395][INFO][o.e.c.m.MetaDataCreateIndexService][condla0.field.hortonworks.com] [squid_index_2018.11.26.23] creating index, cause [auto(bulk api)], templates [squid_index], shards [5]/[1], mappings [squid_doc]
## ...
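Metron creates a new index per time period (hourly in the example above), so checking these outputs by eye gets tedious. The sketch below uses two hypothetical helpers. index_names pulls the index names out of the _cat/indices output. templates_by_index maps each newly created index to the templates applied to it, based on the log. The regular expression is written against the log line shown above and may need adjusting for other Elasticsearch versions.

```python
import re

# Matches e.g. "[squid_index_2018.11.26.23] creating index, cause [auto(bulk api)],
# templates [squid_index], ..." and captures the index name and template list.
CREATE_INDEX = re.compile(
    r"\[([^\]]+)\] creating index, cause \[[^\]]*\], templates \[([^\]]*)\]")

def index_names(cat_output):
    """Index names from _cat/indices output (columns: health status index ...)."""
    return [line.split()[2] for line in cat_output.splitlines() if len(line.split()) >= 3]

def templates_by_index(log_text):
    """Map each newly created index to the list of templates that were applied."""
    result = {}
    for match in CREATE_INDEX.finditer(log_text):
        index, templates = match.groups()
        result[index] = [t.strip() for t in templates.split(",") if t.strip()]
    return result
```

An index that maps to an empty list was created without any template, which is the second cause from the troubleshooting note above.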

Important Things to Note

  • /var/log/elasticsearch/metron.log is the most important log file for debugging template-related actions. Elasticsearch names its main log file after the cluster, here metron.
  • If you want to make your new data source available in Kibana, don’t forget to add the index pattern – in our case "squid_index_*":
    • Kibana: Management –> Create Index Pattern
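Both the template's "template" pattern and the Kibana index pattern have to match the rollover index names. To sanity-check a pattern offline, you can use Python's fnmatch.fnmatchcase as a stand-in for Elasticsearch's simple * wildcard. It additionally treats ? and [...] as special characters, which does not matter for names like these. The helper patterns_cover is hypothetical.

```python
from fnmatch import fnmatchcase

def patterns_cover(index_name, template_pattern="squid_index*", kibana_pattern="squid_index_*"):
    """Return (template pattern matches, Kibana pattern matches) for an index name."""
    return (fnmatchcase(index_name, template_pattern),
            fnmatchcase(index_name, kibana_pattern))
```

A pattern without the trailing *, such as "squid_index", matches only an index literally named squid_index. Elasticsearch would then never apply the template to squid_index_2018.11.26.23.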
