How to Troubleshoot an Apache Storm Topology

Apache Storm is a real-time, fault-tolerant, event-based streaming framework and platform that runs your code in a highly parallelized way on distributed nodes. Its core building blocks are Spouts (processing units that read from data sources) and Bolts (general processing units). Storm is often used to read data from Apache Kafka and write the results back to Kafka or to a data store. Apache Storm and Apache Kafka are the …

Basics of Hadoop User Management

Hadoop is old, everyone has their own Hadoop cluster, and everyone knows how to use it. It’s 2018, right? This article is just a collection of a few gotchas, dos and don’ts with respect to user management, covering mistakes that shouldn’t happen in 2018 anymore. Terminology: just a few terms and definitions so that everyone is on the same page for the rest of the article. Roll …

4 Things Factorio Taught Me about DevOps

What is Factorio? Factorio is a computer game. You are probably asking yourself how a computer game is related to this blog. Well, not at all – or is it? Let’s find out. Basically, in the game you take on the role of a character, seen from a third-person perspective, whose rocket ship crashed on a foreign planet. You have nothing but a pick …

How to Troubleshoot Apache Knox

Apache Knox is a gateway application and the door to access data in a Hadoop cluster hidden behind a firewall. While the usage is fairly simple, the setup, configuration and debugging process can be tedious due to the many different components that Apache Knox ties together. On Hortonworks Community Connection I wrote an article that shows you exactly what could be wrong with your Knox setup …

How to Write a Marker File in a Luigi “PigJobTask”

This is supposed to be a brief memory aid on how to write marker files when using “Luigi”, which I explained in an earlier blog post. What is a Marker File? A marker file is an empty file created with the sole purpose of signaling to another process or application that some process is currently ongoing or finished. In the context of scheduling using …
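As a rough illustration of the marker file idea described above, here is a minimal sketch in plain Python. It does not use Luigi's API and is not the approach from the post itself; the function names `write_marker` and `is_done` and the file name `_SUCCESS` are hypothetical choices made only for this example.

```python
import tempfile
from pathlib import Path


def write_marker(marker: Path) -> None:
    """Create an empty marker file, creating missing parent directories."""
    marker.parent.mkdir(parents=True, exist_ok=True)
    marker.touch()


def is_done(marker: Path) -> bool:
    """A downstream process treats the step as finished once the marker exists."""
    return marker.is_file()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical marker location, chosen only for this sketch.
        marker = Path(tmp) / "output" / "_SUCCESS"
        print("finished before the job ran:", is_done(marker))
        write_marker(marker)  # the job would call this as its very last step
        print("finished after the job ran:", is_done(marker))
```

The file carries no content; its presence alone is the signal, which is why writing it has to be the last step of the job.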

Book Review: Learning Responsive Data Visualization

This post describes my experience reading the book “Learning Responsive Data Visualization” by Christoph Körner. What is it all about? The book aims to explain the concepts and application of responsive data visualization technologies. It covers “Bootstrap”, the famous CSS framework from Twitter, as well as SVG graphics and the JavaScript visualization framework D3.js. The book has 9 chapters: starting from a short introduction of the …