How to Troubleshoot Apache Knox

Apache Knox is a gateway application that serves as the door to data in a Hadoop cluster hidden behind a firewall. While usage is fairly simple, setup, configuration, and debugging can be tedious because of the many different components that Apache Knox ties together. On Hortonworks Community Connection I wrote an article that shows you exactly what could be wrong with your Knox setup …

How to Write a Marker File in a Luigi “PigJobTask”

This is a brief memory aid on how to write marker files when using “Luigi”, which I explained in an earlier blog post. What is a Marker File? A marker file is an empty file created for the sole purpose of signaling to another process or application that some process is currently running or has finished. In the context of scheduling using …
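The marker-file idea described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the approach from the full post and not Luigi's API: the helper names `write_marker` and `is_done` and the marker name `_SUCCESS` are hypothetical and chosen for the example.

```python
from pathlib import Path
import tempfile


def write_marker(directory: Path, name: str = "_SUCCESS") -> Path:
    """Create an empty marker file in `directory` and return its path.

    The file name `_SUCCESS` is an assumption made for this sketch.
    """
    directory.mkdir(parents=True, exist_ok=True)
    marker = directory / name
    marker.touch()  # creates an empty file if it does not exist yet
    return marker


def is_done(directory: Path, name: str = "_SUCCESS") -> bool:
    """Return True if the marker file exists, i.e. the job has signaled completion."""
    return (directory / name).is_file()


if __name__ == "__main__":
    # Demonstrate the round trip in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        output_dir = Path(tmp) / "job_output"
        print(is_done(output_dir))   # no marker written yet
        write_marker(output_dir)
        print(is_done(output_dir))   # marker now present
```

A downstream process only needs to check for the file's existence. The file's content is irrelevant, which is why it stays empty.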

Book Review: Learning Responsive Data Visualization

This post describes my experience reading the book “Learning Responsive Data Visualization” by Christoph Körner. What is it all about? The book aims to explain the concepts and application of responsive data visualization technologies. It covers Twitter’s well-known CSS framework “Bootstrap”, SVG graphics, and the JavaScript visualization framework D3.js. The book has 9 chapters: starting from a short introduction of the …

How to Create a Data Pipeline Using Luigi

This is a simple walk-through of an example use of Luigi. Spotify’s own excellent documentation is available online, and it contains all the bits and pieces you need to create your own pipeline script. There are also already a few blog posts about what is possible with Luigi, but I believe they don’t describe very well how to implement it. So, …

How to Write a Command Line Tool in Python

Scope and Prerequisites This rather long blog entry consists of two parts. In the first part, “Motivation”, we will look at a few reasons to wrap a command line tool (in Python) around an existing REST interface. If you are not interested in that but want to know how to build a command line tool, skip to the second part: “Ingredients”, “Project Structure” and …