The Concepts of Tag-Based Authorization

What is classical authorization?

The answer is resource-based authorization, the model most people are already familiar with. It’s about managing a set of policies for all resources, e.g., databases, tables, views, columns, processes, and applications. That means whenever you create a new resource, you need to create a new policy that matches this resource with users or groups and assigns adequate permissions to them.

In resource-based authorization, security policies match resources with users/groups.

Thus, authorization services must be aware of the resources (from a specific resource-providing service) as well as the users and groups (usually from an authentication provider, such as an Active Directory).
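To make the idea concrete, here is a minimal sketch of a resource-based check in Python. It is not how any particular authorization service implements it; the class `ResourcePolicy`, the function `is_allowed`, and the resource and group names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourcePolicy:
    """Hypothetical policy: one resource, matched with users/groups and permissions."""
    resource: str
    principals: frozenset   # user or group names
    permissions: frozenset  # e.g. {"select"}


def is_allowed(policies, principals, resource, permission):
    """True if some policy grants `permission` on `resource` to one of `principals`."""
    return any(
        p.resource == resource
        and permission in p.permissions
        and bool(p.principals & set(principals))
        for p in policies
    )


policies = [
    ResourcePolicy("hive:sales.orders", frozenset({"analysts"}), frozenset({"select"})),
]
alice = {"alice", "analysts"}  # a user together with her groups

print(is_allowed(policies, alice, "hive:sales.orders", "select"))   # True
# A newly created table is not covered until someone writes a policy for it.
print(is_allowed(policies, alice, "hive:sales.refunds", "select"))  # False
```

The second call shows the maintenance cost described above: every new resource stays inaccessible until a matching policy is added.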

The Process

The authorization service connects to the resource-providing service to learn about the resources. It typically also knows which types of permission the specific resources allow for. The diagram below shows a simplified process of how resource-based authorization typically works and how the “stakeholders” interact.

Typical components and interactions involved in resource-based authorization.

In the Big Data landscape, the de facto standard authorization service is Apache Ranger.

Tag Based Authorization

Tag-based authorization is not all that different. Instead of having a set of policies that match resources with users/groups, you create a set of policies that match tags with users/groups. This also means that you need another instance or service to match resources with tags. Now, whenever you create a new resource, the only thing you need to do is tag it. All existing policies for that tag will automatically apply to the new resource. This gives you more flexibility if you have a complex authorization model in your company, because one tag might be connected with multiple security policies:

  • It saves you from duplicating the same policies for similar resources.
  • It’s more user-friendly: assigning tags to a resource comes more naturally than thinking about which permissions/policies might be required every time you add a new resource.

In tag-based authorization, security policies match tags with users/groups.
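The same idea as a minimal Python sketch: policies refer to tags, and a separate mapping (standing in for the tag-providing service) connects resources to tags. The names `TagPolicy`, `is_allowed`, and `resource_tags`, as well as the tables and groups, are hypothetical and only illustrate the principle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagPolicy:
    """Hypothetical policy: one tag, matched with users/groups and permissions."""
    tag: str
    principals: frozenset
    permissions: frozenset


def is_allowed(policies, resource_tags, principals, resource, permission):
    """True if a policy on any of the resource's tags grants `permission`."""
    tags = resource_tags.get(resource, set())
    return any(
        p.tag in tags
        and permission in p.permissions
        and bool(p.principals & set(principals))
        for p in policies
    )


policies = [
    TagPolicy("marketing", frozenset({"marketing_dept"}), frozenset({"select", "update"})),
]
# Stand-in for the service that matches resources with tags.
resource_tags = {"hive:crm.leads": {"marketing"}}

# A new table only needs a tag; the policy list stays untouched.
resource_tags["hive:crm.campaigns"] = {"marketing"}

print(is_allowed(policies, resource_tags, {"marketing_dept"}, "hive:crm.campaigns", "update"))  # True
```

Note that creating the second table required no change to `policies`, only one new entry in the resource-to-tag mapping.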

The Process

As mentioned before, an additional service is needed to manage the relationship between resources and tags. The authorization service knows the resources and syncs users and groups as well as the tags for the resources. The tag provider knows the resources and is the interface through which users assign tags to them.

Typical components and interactions involved in tag-based authorization.

You can manage tags and govern your data sources using Apache Atlas. Apache Atlas integrates well with Apache Ranger and other services in the Big Data landscape and can be integrated with any tool by leveraging its REST API.
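As a rough sketch of what tagging through the REST API might look like, the snippet below builds (but does not send) a request that attaches tags to an existing entity. It assumes the Atlas v2 endpoint for adding classifications to an entity by GUID; the host name, the GUID, and the helper `build_tag_request` are hypothetical, authentication is left out, and the tag is assumed to already exist as a classification type in Atlas. Please check the path and payload against the REST documentation of your Atlas version before relying on it.

```python
import json
import urllib.request


def build_tag_request(base_url, guid, tag_names):
    """Build a POST request that attaches the given tags to one Atlas entity.

    Assumption: the Atlas v2 path /api/atlas/v2/entity/guid/{guid}/classifications,
    which takes a JSON list of classification objects.
    """
    url = f"{base_url.rstrip('/')}/api/atlas/v2/entity/guid/{guid}/classifications"
    body = json.dumps([{"typeName": name} for name in tag_names]).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical host and GUID, for illustration only.
req = build_tag_request(
    "http://atlas.example.com:21000", "HYPOTHETICAL-GUID", ["customer_journey"]
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# Sending it would be urllib.request.urlopen(req), with credentials added first.
```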

Create Useful Tags

Tagging is powerful, since you can look at your resources from different angles, i.e., you can introduce multiple dimensions. Once you have decided to go with tag-based security, the first step is to think about which dimensions you want to introduce in the beginning. The second step is to consistently apply those dimensions across your resources.

You can think of dimensions as categories of tags:

  • One category of tags classifies a resource, e.g., a database based on the source system the data came from: MySQL, Server Log, HBase, …
  • Another category of tags introduces the dimension of use cases: cyber_security, customer_journey, marketing_campaign2, …
  • A third category might be the career level within a company: common, manager, executive
  • Another category of tags distinguishes departments: sales, engineering, marketing, …

As long as you tag your resources consistently, the advantages of tagging in the context of authorization are immediately apparent. When you create a new resource, for example a Hive table, you apply the tags MySQL, customer_journey, executive, and marketing, and based on the predefined tag-based policies you’ll know that

  • The technical user that does the hourly load from the MySQL database to Hive has write access to the table.
  • The team of all people that work on the customer journey project has read access to the table.
  • All employees on the executive level have read access to the table.
  • The marketing department has full access to the table.
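The example above can be written down as a small Python sketch that derives the effective access from the tags alone. The group names and the permission names are made up, and "full access" is modeled here as read, write, and admin, which is an assumption for the sake of the example.

```python
# Hypothetical predefined tag policies: tag -> list of (principal, permissions).
TAG_POLICIES = {
    "MySQL": [("svc_mysql_loader", {"write"})],
    "customer_journey": [("customer_journey_team", {"read"})],
    "executive": [("executives", {"read"})],
    "marketing": [("marketing_dept", {"read", "write", "admin"})],  # "full access"
}


def effective_permissions(tags):
    """Collect, per principal, everything the policies of the given tags grant."""
    grants = {}
    for tag in tags:
        for principal, permissions in TAG_POLICIES.get(tag, []):
            grants.setdefault(principal, set()).update(permissions)
    return grants


# Tags applied to the new Hive table.
grants = effective_permissions(["MySQL", "customer_journey", "executive", "marketing"])
for principal in sorted(grants):
    print(principal, sorted(grants[principal]))
```

Nothing in `TAG_POLICIES` mentions the table itself, so the same four tags on any other resource would yield the same access.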

Conclusions

I hope this article made it easy to understand the process and benefits of tag-based authorization. However, simplified security is only one of the benefits of tagging. Tagging is also useful to describe lineage and thus facilitate data governance.