4. ID Broker components#

In this chapter the components that make up the ID Broker system are described.

Interaction of components.

Fig. 4.1 Interaction of components#

Service providers use the ID Broker for authentication and to retrieve information about logged-in users. School authorities perform the actual authentication and send their users’ data to the ID Broker. The ID Broker system provides interfaces for multi-tenant authentication, user data storage and retrieval.

4.1. Modules#

The base for an ID Broker system is the UCS@school platform, on top of which various components implement the required interfaces.

4.1.1. UCS / UCS@school core system#

UCS@school components, like the UCS@school Kelvin REST API, are built on top of UCS’ core components OpenLDAP, Univention Directory Manager (UDM) and the UDM HTTP REST API.

Relevant for the ID Broker system are:

4.1.1.1. LDAP structure#

Schools are represented as OU nodes with containers for users, groups, computers and so on below them.

All school objects belong to a single OU, except users. A user object is stored inside the OU of one of its schools, but has an additional attribute that lists all schools (OUs) the user is a member of.

Usernames and group names must be unique. Under the hood, names of school groups are prefixed with the OU’s name, so the same school group name can be used by multiple schools.

A regular UCS@school system represents the domain of one school authority with all its schools, users, groups etc. For the multi-tenant feature of the ID Broker, the names of objects that must have unique names in LDAP are internally prefixed with the identifier of the tenant or replaced with a UUID.

4.1.1.2. UDM#

Univention Directory Manager (UDM) is a Python library that adds business logic on top of LDAP objects. UDM’s features can be used through its Python UDM interface, the UDM command line or the UDM REST API.

The UDM extended attributes feature is used to register additional LDAP attributes required for the ID Broker system. For example, the new user attribute brokerID is used to map a UUID to the username of a tenant’s user. Another attribute will be used to map between service provider specific aliases and the real user account names. All LDAP attributes registered with UDM are accessible as UDM properties in the UDM REST API.

4.1.1.3. UDM REST API#

UCS provides the UDM REST API, which can be used to inspect, modify, create and delete UDM objects via HTTP requests. All UDM modules and their attributes are accessible through it. The UDM REST API converts the types of most attributes from their LDAP string representations to more useful JSON representations. It does not do that for extended attributes, though.

4.1.1.4. UCS@school Kelvin REST API#

The UCS@school Kelvin REST API provides HTTP endpoints to create and manage UCS@school domain objects like school users, school groups and schools (OUs). The Kelvin REST API internally uses the UCS@school library to add business logic on top of regular UDM user, group and computer objects. The result is, for example, complex user and server roles and finer-grained authorization. To handle UCS@school objects, use the Kelvin REST API and not the UDM REST API, as it will take care of data consistency. The Kelvin REST API uses the UDM REST API to communicate with the LDAP database and the Open Policy Agent for authorization.

Kelvin API details.

Fig. 4.2 Kelvin REST API components and connections#

4.1.2. Provisioning API#

Users and groups have to be created in the ID Broker system. Those users originate from the school authority systems. The Provisioning API is a REST API with methods and routes to read, create, update and delete users, school groups and schools.

The UCS@school ID Connector of each school authority uses the Provisioning API to send user and group data to the ID Broker system. Each school authority has an account in the ID Broker that allows it to modify only its own objects. All objects managed through such an identity share a common namespace, implemented as prefixes for usernames, group names and OU names.

The Provisioning API transparently adds prefixes when talking to internal systems and removes them when talking to external ones. It acts like an adapter between the UCS@school ID Connector and the Kelvin REST API.
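The adapter behavior described above can be sketched as follows. This is only an illustration: the separator and the helper names are assumptions, and the actual naming scheme used by the Provisioning API may differ.

```python
SEPARATOR = "-"  # assumption: the real separator is not specified in this chapter


def add_prefix(tenant_id: str, name: str) -> str:
    """Return the internal (namespaced) name for a name sent by a school authority."""
    return f"{tenant_id}{SEPARATOR}{name}"


def strip_prefix(tenant_id: str, internal_name: str) -> str:
    """Return the external name; refuse objects that belong to another tenant."""
    prefix = f"{tenant_id}{SEPARATOR}"
    if not internal_name.startswith(prefix):
        raise ValueError(f"{internal_name!r} does not belong to tenant {tenant_id!r}")
    return internal_name[len(prefix):]
```

With such helpers, a school authority only ever sees its own unprefixed names, while the internal systems only see prefixed ones.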

The Provisioning API is responsible for generating service provider specific pseudonyms. Separate pseudonyms are generated for each service provider and stored in separate attributes in the LDAP objects of the users, groups and OUs. A mapping from service provider ID to LDAP attribute name is retrieved from LDAP. Additionally, a mapping from service provider ID to a secret password (used as salt in the generation of the pseudonym) is retrieved from LDAP. Each pseudonym is generated as a hash from the following three values:

  • entryUUID of the object in the school authority’s LDAP (assumed to be a globally unique string)

  • service provider specific secret (the salt, known only to the ID Broker system)

  • school authority ID

The service provider specific secret prevents cooperating service providers from identifying common users.

Provisioning API details.

Fig. 4.3 Provisioning API communication#

4.1.3. Self-disclosure API#

The design goal of the Self-disclosure API is to receive and send only service provider specific pseudonyms instead of clear text user IDs and other personal information. To make the services usable, the clear text values of some fields in the Self-disclosure API are transmitted (cf. section Future evolutions of the pseudonymization). Fig. 4.4 shows the API communication of the Self-disclosure API as described in the following paragraphs.

The Self-disclosure API is one example of an HTTP API where a content provider can fetch user data customized to their needs. It is implemented as a plugin for the UCS@school APIs app. It runs in a Docker container on a UCS@school system.

The Self-disclosure API uses the Redis database populated by the Self-disclosure database builder to fetch user and group data.

The client of the API, e.g. Bettermarks, needs an auth code to get access to the API. This token is typically passed on from a student’s or teacher’s browser. The student or teacher in turn has received this auth code from the IDP of their school authority.

The ID in the token’s subject field is the pseudonym of the requesting user. The ID in the resource request parameter (in the URL) is the pseudonym of the user or group that information is requested about.

Separate pseudonyms are generated for each service provider and stored in separate attributes in the users/groups/OUs LDAP objects. A mapping from service provider ID to LDAP attribute name exists in LDAP. The Self-disclosure API retrieves that mapping for the connecting service provider using the Redis cache.

When the Self-disclosure API has to look up an object, it searches for the supplied pseudonym in the Redis cache.

When the Self-disclosure API has to return an object, instead of sending the user ID, it sends the service provider specific pseudonym of that object.
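The lookup and response rules above can be illustrated with a small sketch. Plain dictionaries stand in for the Redis cache here, and all field names, sample values and function names are hypothetical; the real data layout is not described in this chapter.

```python
from typing import Dict, Optional

# Stand-in for the cached mapping: service provider ID -> LDAP attribute name.
SP_MAPPING = {"sp1": "idBrokerPseudonym0001", "sp2": "idBrokerPseudonym0002"}

# Stand-in for cached user objects (hypothetical sample data).
USERS = [
    {
        "username": "authA-anna",
        "idBrokerPseudonym0001": "p1-anna",
        "idBrokerPseudonym0002": "p2-anna",
    },
]


def find_user(service_provider: str, pseudonym: str) -> Optional[Dict[str, str]]:
    """Search only in the attribute that belongs to the connecting service provider."""
    attribute = SP_MAPPING[service_provider]
    for user in USERS:
        if user.get(attribute) == pseudonym:
            return user
    return None


def to_response(service_provider: str, user: Dict[str, str]) -> Dict[str, str]:
    """Expose the provider specific pseudonym instead of the username."""
    return {"user_id": user[SP_MAPPING[service_provider]]}
```

Because each provider is bound to its own attribute, a pseudonym issued for one service provider does not resolve when presented by another.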

Self-disclosure API details.

Fig. 4.4 Self-disclosure API communication#

4.1.4. Self-disclosure database builder#

The design goal of the Self-disclosure database builder is to improve the performance of the Self-disclosure API. It uses a Redis database to build a cache of user, group, and service provider mappings stored in LDAP.

Fig. 4.5 shows a detailed view of the components involved.

Self-disclosure database builder details.

Fig. 4.5 Self-disclosure Database Builder#

Changes pushed to the ID Broker LDAP trigger a listener module (3), which enqueues the insert/update or delete event in a Redis database table that acts as a queue (4). The converter daemon consumes these events (I), reads from Kelvin and LDAP (II) and stores the complete object in another Redis table (III).

The Self-disclosure API uses the data stored in the SDDB database to get user and group data (D). If the data hasn’t yet been inserted by the converter daemon because the replication isn’t yet finished, the Self-disclosure API queries the SDDB API to add the object to the high priority queue, so it will be there when the object is requested again (F).

During this process, statistics are written by the internal components of the SDDB builder to a third table (IV). They can be requested through a Prometheus endpoint of the SDDB API. The endpoint does not require authentication since it does not offer any private information.
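The event flow described above can be sketched as follows. In-memory deques and dictionaries stand in for the Redis tables, and all names are hypothetical; the sketch only illustrates the ordering of the regular and the high priority queue, not the actual implementation.

```python
from collections import deque
from typing import Callable, Deque, Dict, Tuple

Event = Tuple[str, str]  # (action, object ID); action is "upsert" or "delete"

queue: Deque[Event] = deque()          # filled by the listener module
high_priority: Deque[Event] = deque()  # filled when the API misses an object
store: Dict[str, dict] = {}            # complete objects read by the API
stats = {"processed": 0}               # counters exposed as statistics


def enqueue(action: str, object_id: str, urgent: bool = False) -> None:
    """Add an event to the regular or the high priority queue."""
    (high_priority if urgent else queue).append((action, object_id))


def convert_next(load_object: Callable[[str], dict]) -> bool:
    """Process one event, preferring the high priority queue.

    Returns False when both queues are empty.
    """
    source = high_priority or queue  # an empty deque is falsy
    if not source:
        return False
    action, object_id = source.popleft()
    if action == "delete":
        store.pop(object_id, None)
    else:
        store[object_id] = load_object(object_id)
    stats["processed"] += 1
    return True
```

Here `load_object` represents the read from Kelvin and LDAP; a cache miss in the API would translate into `enqueue("upsert", object_id, urgent=True)`.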

4.1.5. SSO Broker#

The main job of the SSO Broker component is to handle multi-tenant authentication, using pseudonyms. This involves the student (or her browser) doing the login and passing authentication tokens/tickets back and forth.

SSO Broker communications

Fig. 4.6 SSO Broker communications#

The SSO Broker participates in the following communications:

  • The student gets sent to the SSO Broker upon first login (a redirect from the school portal). This first step is part of an OIDC flow. The SSO Broker then sends the student to the school authority’s IDP, to do SAML authentication there. This is done using a real user identifier. The student returns to the SSO Broker with her SAML ticket.

  • The SSO Broker then needs to get a service provider specific pseudonym. This information is provided by the ID Broker IDM system, which also contains other user data provided by the school authority. An auth code valid for the (service provider specific) pseudonym is then given to the student, who passes it on to the service provider.

  • The service provider then swaps the auth code for both an access token and an ID token at the SSO Broker. The ID token (containing the pseudonym) is consumed by the service provider, while the access token can be used to request more data about the student (referred to by the pseudonym) at the Self-disclosure API.

The SSO Broker is implemented using Keycloak.

The SSO Broker is available at:

  • for OIDC at https://FQDN/auth/realms/SERVICE PROVIDER ID/protocol/openid-connect

  • for SAML at https://FQDN/auth/realms/SERVICE PROVIDER ID/broker/saml
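As an illustration of the last step of the flow, the following sketch builds the request with which a service provider could swap an auth code for tokens. It assumes Keycloak's usual `token` endpoint below the OIDC URL shown above and the standard OAuth 2.0 authorization code parameters; the function name and all argument values are hypothetical, and the request is only constructed, not sent.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_token_request(
    fqdn: str,
    service_provider_id: str,
    code: str,
    redirect_uri: str,
    client_id: str,
    client_secret: str,
) -> Request:
    """Build (but do not send) the auth code exchange request."""
    realm = quote(service_provider_id, safe="")
    url = f"https://{fqdn}/auth/realms/{realm}/protocol/openid-connect/token"
    body = urlencode(
        {
            "grant_type": "authorization_code",
            "code": code,
            "redirect_uri": redirect_uri,
            "client_id": client_id,
            "client_secret": client_secret,
        }
    ).encode("ascii")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(url, data=body, headers=headers, method="POST")
```

The response to such a request would contain the access token and the ID token mentioned above; sending it with `urllib.request.urlopen` is left out so that the sketch runs without a network connection.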

Information about the configuration of Keycloak can be found at https://univention.gitpages.knut.univention.de/id-broker/operations-manual/

4.2. Pseudonymization#

A core concept of the ID Broker is the pseudonymization of user data towards the service providers. The goal is not only to hide the clear text values for names etc. from service providers, but also to prevent data analysis across multiple providers. To that effect, each user, group and school OU in the ID Broker system gets a separate pseudonym for each service provider, which is saved in its own LDAP attribute (idBrokerPseudonym0001 through idBrokerPseudonym0030).

4.2.1. Management of Service Providers#

To enable each component of the ID Broker to always have access to the correct pseudonyms for each service provider, the pseudonyms are saved as individual LDAP fields on users, groups and school OUs. Those fields are indexed to ensure quick searches. To know which field belongs to which service provider, a mapping from provider name to LDAP field name has to be created and maintained as well. Since this mapping has to be available on the host and its Docker containers alike, saving it in LDAP is the most obvious solution.

To manage service providers in the ID Broker, the script manage-service-providers can be used. It provides functionalities to add, delete and show the mappings as well as the secrets. When a new service provider is added, all existing users, groups and school OUs receive the corresponding pseudonym.
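How a new service provider might be assigned one of the attributes idBrokerPseudonym0001 through idBrokerPseudonym0030 can be sketched as follows. This is not the code of manage-service-providers: the function name, the in-memory mapping and the reuse of freed slots are assumptions made for illustration.

```python
from typing import Dict

ATTRIBUTE_TEMPLATE = "idBrokerPseudonym{:04d}"
MAX_SLOTS = 30  # attributes 0001 through 0030


def allocate_attribute(mapping: Dict[str, str], provider: str) -> str:
    """Return the LDAP attribute for a provider, assigning the first free one."""
    if provider in mapping:
        return mapping[provider]
    used = set(mapping.values())
    for slot in range(1, MAX_SLOTS + 1):
        candidate = ATTRIBUTE_TEMPLATE.format(slot)
        if candidate not in used:
            mapping[provider] = candidate
            return candidate
    raise RuntimeError("all pseudonym attributes are in use")
```

In the real system the mapping lives in LDAP, and adding a provider is followed by generating the pseudonyms for all existing objects.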

The steps which are needed to configure Keycloak are described in Backup - SSO Service - Keycloak.

4.2.2. Form of the Pseudonyms#

The primary identifier of any group, user or school OU object in the ID Broker system is its entryUUID from the school authority from which it originates. To ensure that an object’s pseudonyms are recoverable in the event of data loss or sync errors, they should be derived from said entryUUID. Thus the pseudonym is generated as

pseudonym_service_provider1 = blake2b(salt=service_provider1_salt, person=school_authority, data=entryUUID)

where blake2b is a hashing algorithm which returns a string in the ASCII space with no more than 128 symbols and service_provider1_salt is a previously generated secret string which is unique to each service provider.
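The formula above maps closely onto Python's `hashlib.blake2b`, which accepts `salt` and `person` parameters. The following is a minimal sketch under the assumption that the hex digest is used as the pseudonym and that inputs are UTF-8 encoded; the function name is hypothetical. Note that `hashlib` limits both `salt` and `person` to 16 bytes.

```python
import hashlib


def generate_pseudonym(entry_uuid: str, sp_salt: bytes, school_authority: str) -> str:
    """Derive a service provider specific pseudonym from the three inputs."""
    person = school_authority.encode("utf-8")
    # hashlib rejects longer values; check early for a clearer error message.
    if len(sp_salt) > hashlib.blake2b.SALT_SIZE:
        raise ValueError("salt must not exceed 16 bytes")
    if len(person) > hashlib.blake2b.PERSON_SIZE:
        raise ValueError("school authority ID must not exceed 16 bytes")
    digest = hashlib.blake2b(entry_uuid.encode("utf-8"), salt=sp_salt, person=person)
    # The default digest size is 64 bytes, i.e. 128 hexadecimal characters.
    return digest.hexdigest()
```

Since the function is deterministic, a lost pseudonym can be regenerated from the entryUUID, the school authority ID and the service provider's secret.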

4.2.3. Generation of pseudonyms#

The generation of pseudonyms happens primarily during user, group or school OU creation in the Provisioning API. The system will automatically create pseudonyms for all known service providers at that time. When a new service provider is added to the ID Broker, it is necessary to execute the script that generates pseudonyms for the new service provider for all users and groups that already exist on the system (manage-service-providers, see above).

4.2.4. Future evolutions of the pseudonymization#

In its current form, the pseudonymization gives every user of every connected school authority a pseudonym for every existing service provider, so every user can use every service provider. Future iterations could implement the following ideas and features:

  • Service providers can be activated for users and groups on a school authority level

  • Service providers can be activated for users and groups individually (filtered by school, school_class, etc) by the school authority

  • The script for generating pseudonyms for new service providers is transformed into a small service which can react to new service providers and generate pseudonyms in an intelligent and load-balanced way.

To make the services usable, the clear text values of some fields in the ID token as well as the Self-disclosure API are transmitted. This renders the current pseudonymization ineffective. This is known to all parties and will be removed in the next project phase as soon as the de-pseudonymization component is implemented.

4.3. Scaling#

The APIs that are accessible from outside the ID Broker system are the Provisioning API and the Self-disclosure API. The Provisioning API uses the Kelvin REST API to access user / group data, which in turn uses the UDM REST API to access the underlying database. The Self-disclosure API uses the Redis cache built by the Self-disclosure database builder.

As the Provisioning API and the Self-disclosure API have very different requirements regarding availability and response time, using separate systems is recommended.

In previous tests, with preliminary ID Broker system components, the UDM REST API was the bottleneck. Depending on the hardware its response times are limited by I/O or CPU time.

The current design is to keep the components of the Provisioning API chain (“Provisioning -> Kelvin -> UDM”) on the same host and to not do any load balancing between the internal components. The Self-disclosure API and the Self-disclosure database builder are installed on separate systems.

Vertical scaling can be done through a higher CPU core count and faster disk I/O and, to a degree, with more memory for caching. An optimal distribution of CPU cores to the worker processes of the three REST APIs has not yet been explored and may vary depending on the hardware.

Horizontal scaling can be done by load balancers in front of those systems. Load balancers can distribute the load depending on the response time of the front-end APIs. Care must be taken with regard to tokens handed out by front-end APIs: either sticky HTTP sessions are required, or shared keys on the servers for token verification. This is probably only relevant for the Provisioning API, as the Self-disclosure API will not hand out tokens.
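Why a shared key removes the need for sticky sessions can be shown with a simplified sketch. It uses an HMAC over an opaque payload and is not the token format of the ID Broker APIs; the function names and the token layout are assumptions made for illustration.

```python
import hashlib
import hmac


def sign(payload: bytes, shared_key: bytes) -> str:
    """Issue a token on any server that knows the shared key."""
    mac = hmac.new(shared_key, payload, hashlib.sha256).hexdigest()
    return f"{payload.hex()}.{mac}"


def verify(token: str, shared_key: bytes) -> bytes:
    """Verify a token on any server that knows the same key and return its payload."""
    payload_hex, _, mac = token.partition(".")
    payload = bytes.fromhex(payload_hex)
    expected = hmac.new(shared_key, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(mac, expected):
        raise ValueError("invalid token signature")
    return payload
```

A token issued by one server can be verified by another server behind the load balancer as long as both are configured with the same key; with per-server keys, verification would fail after a connection is moved.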

API clients must be implemented with fault tolerance regarding lost sessions, as load balancers may have to move their connection when a server is down / being updated.
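One possible shape of such fault tolerance is a retry wrapper with exponential backoff, sketched below. The function name, the retry count and the delays are assumptions; a real client would probably also need to authenticate again when its session was lost.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(request: Callable[[], T], attempts: int = 3, delay: float = 0.5) -> T:
    """Call request(), retrying on connection errors with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return request()
        except ConnectionError:
            if attempt == attempts:
                raise  # give up after the last attempt
            time.sleep(delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")
```

The wrapper is independent of the HTTP library in use; `request` is any callable that performs one API call and raises `ConnectionError` when the server is unreachable.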