Decoupling Policies From Your Software With Open Policy Agent, Part 2
Milos Svana
4 years ago
In a previous article, we introduced Open Policy Agent (OPA), a simple open source tool that lets you decouple your organizational policies, including service authorization rules, from the rest of your software stack. We showed several ways in which OPA can be integrated into your service architecture and presented a few policy definition examples.
In this article, we take a deep dive into the code of a simple microservice stack composed of applications written in different languages, which uses OPA to verify various policies related to these applications. This scenario is common in organizations that manage a large number of legacy systems built with various technologies.
Our goal is to show you how OPA makes the dependency between services and policies truly minimal. We also want to share the experience we gained during the development of this simple application, and hope it helps you integrate OPA into your own environment.
Running the demo
If you want to test our demo app yourself, you first need to have Docker and Docker Compose installed on your machine. Then clone our public Git repository:
```shell
git clone git@github.com:profiq/opa-demo-app.git
```
and locally deploy the app stack:
```shell
cd opa-demo-app/
docker-compose up --build
```
By default, the app frontend runs on port 80. You can easily access it through your web browser by going to http://local.profiq.com/.
OPA integration: university course management system
You can find the code of the whole application in our Git repository. We demonstrate OPA integration on a very simple university course management system. The system includes two types of entities: users, who are further divided into students and teachers, and courses.
Each course is taught by a set of teachers and has a maximum student capacity that depends on the number of teachers who teach it. Furthermore, we defined a set of time slots that tell us when the classes of a given course take place. Students can enroll in as many courses as they want, provided there are no time slot conflicts and the course capacity is not exceeded.
Each time slot is represented by an integer ranging from 1 to 40. We assumed that there are 8 time slots each day. Each time slot is 1 hour long, and the first time slot starts at 8 AM. The very first time slot of each week, Monday 8 AM – 9 AM, is represented by number 1, while the last slot, Friday 3 PM – 4 PM, is represented by number 40.
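Because this encoding is easy to get off by one, a small helper can make the arithmetic explicit. This is a minimal sketch; describeSlot and its return shape are hypothetical and are not part of the demo code.

```javascript
// Hypothetical helper: converts a time slot number (1-40) into a weekday and
// an hour range, following the encoding described above: 8 one-hour slots
// per weekday, with the first slot of each day starting at 8 AM.
const DAYS = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday'];
const SLOTS_PER_DAY = 8;
const FIRST_HOUR = 8; // 8 AM

function describeSlot(slot) {
  if (!Number.isInteger(slot) || slot < 1 || slot > DAYS.length * SLOTS_PER_DAY) {
    throw new RangeError(`Invalid time slot: ${slot}`);
  }
  const index = slot - 1; // zero-based index keeps the division simple
  const day = DAYS[Math.floor(index / SLOTS_PER_DAY)];
  const startHour = FIRST_HOUR + (index % SLOTS_PER_DAY); // 24-hour clock
  return { day, startHour, endHour: startHour + 1 };
}

console.log(describeSlot(1));  // { day: 'Monday', startHour: 8, endHour: 9 }
console.log(describeSlot(40)); // { day: 'Friday', startHour: 15, endHour: 16 }
```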
We made a few simplifying assumptions in our system. First, we assumed that students enrolled in a course have to attend all of its classes, as defined by the course time slots, every week. Next, we ignored many details that need to be taken into account in a real-life setting, such as room requirements, combining courses into study programs, and student evaluations and exams.
The interface of our demo app is rather simple. We focused on transparency and displaying raw data, so that you as a developer have a good idea of what is going on in the background. The UI lets you switch between multiple users who differ in their roles and in the courses they are involved with.
After you assume the role of a specific user, you can interact with a set of predefined courses. These interactions include loading and modifying the course details, or enrolling in a course. The role you select determines which of these actions you can perform for a given course.
The UI is written in TypeScript. It was created using React, with Redux as the app state container, and Material-UI, which gave us a set of themed, reusable UI elements, so we could focus on showcasing how the services and policy verification actually work.
Application architecture
For our demo app, we adopted a standard microservice architecture with an API gateway that manages access to the actual services. The UI part of the stack described above does not communicate with the services directly. Instead, it uses the API gateway as a middleman: each request is sent to the API gateway, which in turn redirects it to the appropriate service.
Before a request is redirected to a service, OPA must first allow it. The API gateway talks to a simple middleware that acts as an adapter between the gateway and OPA itself.
In the previous article, we talked about two of the ways in which OPA can be integrated into an app stack. Our app demonstrates both approaches. Most of the time, policy verification is handled by the API gateway, so the services are not aware of OPA's existence. There is one exception, though: as explained in detail later, the Enroll service talks to OPA directly to get information about the maximum student capacity of a course.
We store all course and user data in a single JSON file. Since multiple services need to access this information, we defined a Docker volume containing the JSON file, which is then shared among the containers of all services that need it. In a real-world scenario, we would of course replace this file with a real database. Some services, such as Enroll, can not only read but also modify the file.
OPA as a middleware
We implement our API gateway using Traefik, a relatively new open source alternative to nginx designed for cloud environments. We chose Traefik because it offers all the required features for free and because everything is easy to set up. Defining service routing rules in Traefik is very simple. Most of the work is done in the traefik/dynamic.yml file. At the bottom of the file, we define a set of services, for example:
```yaml
user-service:
  loadBalancer:
    servers:
      - url: "http://user-service:10000"
```
In our simple use case, we only needed to provide the URL of the web server responsible for handling a given service.
Using our service definitions, we can then configure a set of routes to redirect requests to the correct services, for example:
```yaml
user-service:
  rule: "Host(`user.localhost`)"
  service: user-service
  middlewares:
    - "middleware-auth-opa"
```
Each route is defined by a matching rule and a service to which a request should be redirected if it complies with this rule. In this case, we simply based the traffic redirection on the requested hostname.
In many applications, such a router definition would be sufficient. However, to verify that a given request is allowed, Traefik first needs to talk to OPA. This means that we also have to define a middleware and register it with the router. The definition of our simple authorization middleware looks like this:
```yaml
middleware-auth-opa:
  forwardAuth:
    address: "http://middleware-auth-opa"
    trustForwardHeader: true
```
We configured the middleware to perform forward authentication and defined the address of the server that actually carries it out. Setting trustForwardHeader to true tells Traefik to trust the X-Forwarded-* headers already present on incoming requests.
The middleware itself is a very simple Node.js application that handles HTTP requests. Its sole purpose is to act as an adapter that translates requests received from Traefik into a format suitable for OPA. As you can see for yourself in middlewares/auth-opa/app.js, this requires fewer than 50 lines of code.
When working with Traefik, we discovered that, as of now, only GET requests are supported for forward authorization. We originally planned to send some portion of the data, such as course ID, which is required by some services, in the body of a POST request. Since this is not possible, we had to use HTTP headers to transfer all the required information.
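On the client side, this means the course ID has to travel in a header. The helper below is a hypothetical sketch of how a client could build such a request: the "course" header name matches what the middleware reads, while the function name, the raw (unprefixed) token format in Authorization, and the example URL are assumptions.

```javascript
// Hypothetical client-side helper. Traefik's forward-auth request to the
// middleware has no body, so anything OPA needs (here the course ID) travels
// in headers: "course" is the header name the middleware reads, and the JWT
// goes in "Authorization" (passed as-is, an assumption about its format).
function buildGatewayRequest(token, courseId, body) {
  const headers = { 'Content-Type': 'application/json' };
  if (token) {
    headers['Authorization'] = token;
  }
  if (courseId !== undefined) {
    headers['course'] = String(courseId);
  }
  return {
    method: 'POST',
    headers,
    // The body still reaches the backing service, just not the middleware.
    body: JSON.stringify(body ?? {}),
  };
}

// Example usage (URL is hypothetical):
// fetch('http://course.localhost/course_update',
//       buildGatewayRequest(token, 'c1', { course: 'c1', name: 'New name' }));
```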
Finally, we get to talk about OPA. In our case, we run it in HTTP server mode. As shown in the code extract below, we talk to it by sending POST requests to a specific URL. Each request carries a JSON object containing the method used by the original request, the user's authorization token, the path to the resource contained in the original URL, and, optionally, the unique course ID, which is required by some services:
```javascript
...
axios
  .post(
    'http://opa:8181/v1/data/demo/opa',
    {
      input: {
        method: req.headers['x-forwarded-method'],
        token: req.headers['authorization'] || '<anonymous user token>',
        path: req.headers['x-forwarded-uri'],
        course: req.headers['course'] || '',
      },
    },
    {
      headers: {
        'Content-Type': 'application/json',
      },
    }
  )
  .then(function (resp) {
    if (resp.status === 200 && resp.data.result.allow === true) {
      // Allowed: an empty 200 response tells Traefik to forward the request
      res.status(200).send('');
    } else {
      res.status(403).send(
        JSON.stringify({
          info: 'Not authorized (blocked by OPA)',
        })
      );
    }
  })
...
```
OPA responds by providing a JSON object containing information about request authorization. We are interested in a single rule called allow. As explained in the previous article, we can provide multiple definitions for a single rule. This rule is then evaluated as true if at least one of its definitions is evaluated as true. These different definitions then implement access policies for different services in our stack.
Just to provide an example, let’s have a look at two definitions of the allow rule found in opa/policy.rego:
```rego
allow {
  input.method == "POST"
  input.path == "/auth"
}
```
and
```rego
allow {
  input.method == "POST"
  input.path == "/course_update"
  teacher_teaching_course
}
```
The first definition handles requests directed to the /auth endpoint. As explained later, this endpoint is responsible for generating JWT tokens. The rule checks that the request targets the correct path and uses the POST method.
The second definition of the allow rule allows a user to update a course only if the request path is /course_update, the request uses the POST method, and a "sub-rule" checking whether the current user is a teacher of this course also evaluates to true.
Only one of these definitions needs to be true to allow the user to perform the request. If the allow rule evaluates to true, our Node.js middleware sends an empty HTTP response with status code 200 back to Traefik. If access is not allowed under our current policies, we respond with a 403 status code, optionally providing an error message in the response body.
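To make the "at least one definition must hold" semantics concrete, here is a plain JavaScript model of the two allow definitions shown above and of the middleware's 200/403 decision. It is an illustration only, not how OPA evaluates Rego. The input.user field and the data shape are hypothetical simplifications: in the demo, the user identity comes from the JWT, and teacher_teaching_course is defined in opa/policy.rego.

```javascript
// Illustrative model (not OPA code). Assumed data shape:
//   { courses: { <id>: { teachers: [...] } } }
// input.user is a hypothetical stand-in for the identity taken from the JWT.
function teacherTeachingCourse(input, data) {
  const course = data.courses[input.course];
  return Boolean(course && course.teachers.includes(input.user));
}

// Each entry mirrors one `allow { ... }` definition: all of its
// conditions must hold for that definition to be true.
const allowDefinitions = [
  (input) => input.method === 'POST' && input.path === '/auth',
  (input, data) =>
    input.method === 'POST' &&
    input.path === '/course_update' &&
    teacherTeachingCourse(input, data),
];

// allow is true if ANY definition is true.
function allow(input, data) {
  return allowDefinitions.some((definition) => definition(input, data));
}

// Mirrors the middleware: 200 when allowed, 403 otherwise.
function forwardAuthStatus(input, data) {
  return allow(input, data) ? 200 : 403;
}
```

One detail this model hides: in OPA, if no allow definition matches and the policy declares no default value, allow is undefined and is simply missing from the result object. That is why the middleware checks resp.data.result.allow === true rather than testing for false.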
If you want to know more about how you can define different policies in OPA, we recommend reading our previous article.
Services
To show that OPA is indeed language-independent, we split our system into multiple services written in several different programming languages, each deployed in a standalone container. The source code of each service can be found in the services/ directory of the demo app Git repository.
Auth
This simple service is written in Python. We use the Flask framework to implement a trivial HTTP API, providing a single /auth endpoint. This endpoint is responsible for generating a JWT token for a given user. This token is then used by the middleware to authenticate the user when they want to access other services. This service requires no special authentication rules.
While working with authorization tokens in OPA, we discovered that our rules would have trouble handling requests without a token. We therefore provide a default token representing an anonymous user that is in use until a proper user token is generated.
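The anonymous-token fallback, and what a token actually carries, can be sketched as follows. ANONYMOUS_TOKEN is a placeholder, as in the middleware code above, and the function names are hypothetical. decodeJwtPayload only decodes the payload without verifying the signature, so it is suitable for inspection only; inside a Rego policy, OPA's built-in io.jwt.decode offers equivalent decoding. The sketch assumes a Node.js version whose Buffer supports the 'base64url' encoding.

```javascript
// Placeholder value, not the demo's real anonymous token.
const ANONYMOUS_TOKEN = '<anonymous user token>';

// Fall back to the anonymous token so policies always receive a token.
function tokenFromHeaders(headers) {
  return headers['authorization'] || ANONYMOUS_TOKEN;
}

// Decodes (does NOT verify) the payload segment of a JWT.
function decodeJwtPayload(token) {
  const parts = token.split('.');
  if (parts.length !== 3) {
    return null; // not a JWT, e.g. the placeholder above
  }
  // JWT segments are base64url-encoded JSON (RFC 7519).
  const json = Buffer.from(parts[1], 'base64url').toString('utf8');
  return JSON.parse(json);
}
```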
Courses
Also written in Python with Flask, this service is responsible for handling all information about university courses. It provides three endpoints.
First, the /all_courses endpoint lists all courses in our database. All users can access this list. The /course_details endpoint gives you details about a specific course. Finally, /course_update lets teachers teaching the course modify its details, such as the course name, or the number of credits students receive for finishing it.
Enroll
The Enroll service lets students enroll in courses. It’s implemented in Node.js with the help of the Express web framework.
Not just any student can enroll in any course. We need to define a few enrollment policies that will be handled by OPA. First, each course has a maximum capacity that cannot be exceeded. This capacity depends on the number of teachers who teach the course. Since we have very few users in our demo, we allow only two students per teacher:
```rego
course_capacity = capacity {
  no_teachers := count(data.courses[input.course].teachers)
  capacity := no_teachers * 2
}
```
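For readers less familiar with Rego, the same arithmetic can be mirrored in JavaScript. This is a sketch under assumptions: the students field in the data shape is hypothetical, and courseIsFull is only a plausible stand-in for the policy's course_is_full rule, whose definition is not shown here.

```javascript
// Assumed shape of the shared JSON file:
//   { courses: { <id>: { teachers: [...], students: [...] } } }
const STUDENTS_PER_TEACHER = 2;

function courseCapacity(data, courseId) {
  // Same arithmetic as the Rego rule: count(teachers) * 2
  return data.courses[courseId].teachers.length * STUDENTS_PER_TEACHER;
}

// Hypothetical stand-in for course_is_full.
function courseIsFull(data, courseId) {
  return data.courses[courseId].students.length >= courseCapacity(data, courseId);
}

const data = {
  courses: {
    c1: { teachers: ['t1', 't2'], students: ['s1', 's2', 's3'] },
  },
};
console.log(courseCapacity(data, 'c1')); // 4
console.log(courseIsFull(data, 'c1'));   // false
```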
Students are also denied enrollment if attending the course would lead to schedule conflicts. On a more technical level, we also want to prevent students from enrolling twice in the same course. This last requirement is covered by the schedule conflict rules, since enrolling in the same course twice inevitably leads to having two classes at the same time.
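The conflict check, and why it also covers double enrollment, can be sketched in JavaScript. The function name and the timeSlots and students fields are hypothetical; the demo expresses this logic in Rego.

```javascript
// Assumed data shape: { courses: { <id>: { timeSlots: [...], students: [...] } } }
// A conflict exists if any course the student already attends shares a
// time slot with the requested course.
function hasScheduleConflict(data, studentId, courseId) {
  const requested = new Set(data.courses[courseId].timeSlots);
  return Object.values(data.courses).some(
    (course) =>
      course.students.includes(studentId) &&
      course.timeSlots.some((slot) => requested.has(slot))
  );
}
```

Because the requested course is itself one of the entries in data.courses, a student already enrolled in it always overlaps with it (as long as it has at least one time slot), so no separate double-enrollment rule is needed.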
In most cases, policy verification is handled by the API gateway, which talks to OPA through a simple adapter written in JavaScript. But we also want to show you the alternative, so we made the Enroll service talk to OPA directly to get information about the maximum capacity of a given course. When running in server mode, OPA provides an HTTP API, which eliminates the need for a special library to talk to it. We simply make an appropriate HTTP request, and OPA provides us with all the data we need:
```javascript
...
axios
  .post(
    'http://opa:8181/v1/data/demo/opa',
    {
      input: {
        token: req.headers['authorization'] || '<anonymous user token>',
        path: '/enroll_course_endpoint',
        course: req.body.course || '',
      },
    },
    { headers: headers }
  )
  .then(function (resp) {
    if (resp.status === 200 && resp.data.result.course_is_full === true) {
      res.status(403).send(
        JSON.stringify({
          info: 'Course capacity is full (blocked by OPA)',
        })
      );
    } else if (resp.status === 200 && resp.data.result.allow === true) {
      // Enroll the student
    } else {
      res.status(403).send(
        JSON.stringify({
          info: 'Conflict in timetable (blocked by OPA)',
        })
      );
    }
  })
...
```
The code is very similar to the middleware implementation. The main difference is that, in addition to the allow rule, we now also check how the course_is_full rule was evaluated. If OPA returns true for this rule, we respond by notifying the user that the course they want to enroll in is currently full.
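Because OPA's server mode is plain HTTP, the axios call above could just as well use the fetch API built into Node.js 18 and later. The sketch below swaps axios for fetch and mirrors the branch order of the Enroll handler in a pure function. queryOpa, enrollDecision, and the injectable fetchImpl option are hypothetical names introduced here; only the /v1/data/demo/opa path comes from the demo.

```javascript
// POSTs {"input": ...} to OPA's Data API and returns the "result" field,
// which holds the values of the rules defined in the queried package.
async function queryOpa(input, { baseUrl = 'http://opa:8181', fetchImpl = globalThis.fetch } = {}) {
  const response = await fetchImpl(`${baseUrl}/v1/data/demo/opa`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ input }),
  });
  if (!response.ok) {
    throw new Error(`OPA request failed with status ${response.status}`);
  }
  const { result } = await response.json();
  // OPA omits "result" when the queried document is undefined.
  return result ?? {};
}

// Same branch order as the Enroll handler: capacity first, then allow.
function enrollDecision(result) {
  if (result.course_is_full === true) {
    return { status: 403, info: 'Course capacity is full (blocked by OPA)' };
  }
  if (result.allow === true) {
    return { status: 200 }; // proceed with enrollment
  }
  return { status: 403, info: 'Conflict in timetable (blocked by OPA)' };
}
```

Injecting fetchImpl keeps the HTTP call swappable, so the decision logic can be exercised without a running OPA server.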
Users
The last service is implemented in Go using the Gin framework. It provides two endpoints. The /all_users endpoint returns a list of all users registered in our course system. This list is rendered in the UI, which lets you select any user, generate an appropriate JWT token, and assume their role with respect to the other microservices. The second endpoint, /user_detail, returns detailed information about a specific user.
Development experience
When creating this demo application, we became fully aware of the advantages OPA provides to developers. After we had the basic structure in place, it was very easy to scale the demo by adding more services. This was supported by the fact that OPA is for the most part language independent. Even the implementation of direct communication between the Enroll service and OPA was easy, as we only needed to use standard HTTP requests. No OPA-specific library was required and we modified only a few lines of code.
Of course, not everything went smoothly, and we ran into a few obstacles. We already mentioned one of them: as of now, Traefik can send only GET requests to the authentication middleware. This issue is related mainly to Traefik, not OPA. We decided to use relatively bleeding-edge technology to implement the API gateway, and limitations like this are one of the costs of such an approach.
As for OPA itself, we found that, at the time of development, the documentation changed quite rapidly, and not everything we needed was in the current version. In some cases, we had to switch to older versions of the online docs in order to find the information we needed. This prolonged the development process.
We truly think OPA delivers on its promises. Although a bit of polishing is still needed, our overall experience was good, and the advantages of separating policies from actual services were visible during the development process. We recommend considering OPA for policy enforcement in your organization, especially if you are dealing with a set of diverse software services.