6.9. Large environments#

This section describes how to configure Nubus for Kubernetes for use in large environments with a high number of concurrent users. It assumes that you are familiar with the standard Kubernetes metrics. A good option to visualize these metrics is the kube-prometheus-stack.

See also

Kubernetes metrics

for information about Kubernetes metrics.

kube-prometheus-stack at GitHub

for information about the kube-prometheus-stack, a collection of Kubernetes manifests, Grafana dashboards, and Prometheus rules.

6.9.1. Login performance#

As an identity and access management platform, Nubus places a strong focus on login performance. Nubus for Kubernetes sustains between 100 and 120 user logins per second, which equates to 6,000 to 7,200 user logins per minute. Univention internal load tests, with Nubus configured as described in Configuration for large environments, produced the performance results shown in Table 6.1. The columns in the table have the following meaning:

Concurrent logins:

Number of simultaneous login attempts during the test.

Logins per second:

Number of completed logins per second.

Median login duration:

Median time it took a login to complete.

P95 login duration:

Ninety-fifth percentile of the login duration, meaning 95% of logins finished within this time.

The performance tests measured the login duration from the redirect to the Keycloak identity provider until the Portal Frontend had loaded the personalized Portal in the user’s browser.

Table 6.1 Login Performance Results#

Concurrent logins | Logins per second | Median login duration | P95 login duration
------------------|-------------------|-----------------------|-------------------
400               | 110               | 3.25 seconds          | 3.9 seconds
500               | 117               | 4.0 seconds           | 4.6 seconds
600               | 120               | 4.6 seconds           | 5.4 seconds

A higher load of 1,000 to 2,000 concurrent login attempts doesn’t significantly affect the overall logins per second. However, the individual login duration increases significantly to 30 to 60 seconds.

See also

Scalability

for configuration settings to scale components in Nubus for Kubernetes.

6.9.2. Focus areas#

The following areas require attention to optimize Nubus for Kubernetes for high login volumes.

6.9.2.1. Authentication flow#

Nubus for Kubernetes uses SAML for authentication. The authentication flow involves the following components.

Keycloak

Keycloak provides the Identity Provider in Nubus for Kubernetes. Scale it to the supported maximum of 5 replicas. Expect a load of 0.5 to 1 virtual CPU per pod.

LDAP Secondary

The LDAP Secondary pods in the Identity Store and Directory Service serve as read replicas for the user and group directory database.

Scale the LDAP Secondary pods to 8 replicas. The authentication flow puts a high request load on the LDAP Secondary pods. Consequently, a usage of 5 to 10 virtual CPUs is normal and doesn’t indicate a bottleneck.

For more information on LDAP scalability, see Directory service high availability and scalability.

UMC Server

The UMC Server acts as the SAML service provider and manages the users’ browser sessions. Deploy 128 pod replicas to prevent bottlenecks and latency spikes. To mitigate the deployment, scaling, and update latency caused by the high replica count, set the nubusUmcServer.podManagementPolicy Helm chart value to Parallel so that Kubernetes starts the pods in parallel rather than sequentially, as shown in the following excerpt.
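
The relevant excerpt from Listing 6.28 sets both values on the nubusUmcServer component:

nubusUmcServer:
  # Start the 128 UMC Server pods in parallel instead of sequentially
  podManagementPolicy: "Parallel"
  replicaCount: 128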

See also

For architectural information about the involved components, see Univention Nubus for Kubernetes - Architecture Manual [2], in particular its sections about these components.

6.9.2.2. Pod distribution and scheduling constraints#

It’s recommended to configure Kubernetes pod scheduling constraints to control the pod distribution in the Kubernetes cluster. This is important for both performance and high availability.

topologySpreadConstraints

Running all 8 LDAP Secondary pods on the same cluster node would cause performance problems. Configure topologySpreadConstraints to distribute the pod replicas of a Kubernetes workload equally across the cluster nodes. You can additionally configure Kubernetes (anti)affinity rules to further optimize pod scheduling; see the sketch after this list.

resources.requests

Set resources.requests to match the expected CPU and memory consumption of your pods. The Kubernetes scheduler reserves these resources on the node, which prevents nodes from becoming overcommitted.

resources.limits

The resources.limits settings cap the runtime resource usage of a pod, but don’t influence pod scheduling. Adjust the virtual CPU requests to align with your specific usage patterns and cluster size.

For examples of how to use topologySpreadConstraints, resources.requests, and resources.limits to influence pod scheduling, see Configuration for large environments.
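
Listing 6.28 doesn’t contain (anti)affinity rules. As a minimal sketch only, a preferred podAntiAffinity rule that discourages scheduling several LDAP Secondary pods on the same node could look like the following. The affinity value path and the app.kubernetes.io/name: ldap-server label are assumptions; adjust them to the values and labels your Helm charts actually use.

nubusLdapServer:
  # Assumed value path; check whether your chart exposes an affinity setting
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            topologyKey: "kubernetes.io/hostname"
            labelSelector:
              matchLabels:
                # Assumed label of the LDAP Secondary pods
                app.kubernetes.io/name: ldap-server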

The scheduling constraints mentioned in this section only apply to newly created pods. When the cluster layout changes, for example when you add worker nodes to the cluster, it’s advisable to optimize the cluster by redeploying pods. You can redeploy pods manually, for example with the kubectl rollout restart command. Another option with automation in mind is the Descheduler project, which evicts pods that no longer meet the scheduling constraints.
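
As a rough illustration, a Descheduler policy that evicts pods violating their topology spread constraints could look like the following sketch. It assumes the v1alpha1 policy format; check the Descheduler documentation for the schema matching the version you deploy.

apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  # Evict pods that no longer satisfy their topologySpreadConstraints
  "RemovePodsViolatingTopologySpreadConstraint":
    enabled: true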

See also

Kubernetes affinity rules

in the Kubernetes Documentation [1], for information about Kubernetes affinity.

kubectl rollout restart

for information about manually redeploying pods.

Descheduler project at GitHub

for information about Descheduler for Kubernetes.

6.9.2.3. Portal#

In addition to optimizing the authentication flow, it’s important to ensure that the Portal Service loads quickly. The following components affect the performance of the Portal.

Ingress Controller

To distribute external network traffic equally across all worker nodes in the Kubernetes cluster, you can use a cloud load balancer. Deploy at least one ingress controller on every Kubernetes worker node, for example by deploying the ingress-nginx Helm chart as a DaemonSet; see the sketch after this list. The ingress controller isn’t part of Nubus for Kubernetes.

Portal Frontend

The portal-frontend serves most static HTML, JavaScript, and CSS files for the Portal. Scale it to 6 replicas that are equally spread across worker nodes.

Portal Server

The portal-server pod provides the information that the Portal Frontend needs to personalize the Portal for the signed-in user. This component is single-threaded, but can serve many requests concurrently. Ten replicas are sufficient to avoid bottlenecks.

UMC Gateway

The umc-gateway serves additional static HTML, JavaScript, and CSS files for the Portal. Scale it to 40 replicas to prevent request latency spikes of multiple seconds.

UMC Server

The umc-server pod supports the personalization of the portal with authorization decisions. It doesn’t need scaling beyond the 128 pods mentioned above.
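
For the Ingress Controller mentioned above, a minimal sketch of ingress-nginx Helm values that run the controller as a DaemonSet behind a cloud load balancer could look like this. These values apply to the upstream ingress-nginx chart, not to the Nubus umbrella chart, and your cloud provider may require additional service annotations.

controller:
  # Run one ingress controller pod on every worker node
  kind: DaemonSet
  service:
    # Expose the controllers through the cloud provider's load balancer
    type: LoadBalancer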

See also

LoadBalancer - Kubernetes Service type

for information about the type: LoadBalancer Kubernetes Service.

For architectural information about the involved components, see Univention Nubus for Kubernetes - Architecture Manual [2], in particular its sections about these components.

6.9.3. Configuration for large environments#

To optimize your Nubus for Kubernetes deployment, integrate the values in Listing 6.28 into your Nubus for Kubernetes configuration. You can download the file at large-environments.yaml. Merge the YAML file with your existing custom-values.yaml or deploy the Nubus umbrella Helm chart with multiple values files.

To apply the configuration, follow the steps in Apply configuration.

Listing 6.28 Configuration optimized for login performance#
# SPDX-License-Identifier: AGPL-3.0-only
# SPDX-FileCopyrightText: 2025 Univention GmbH
---
yamlAnchors:
  resources: &baselineResources
    requests:
      cpu: "0.1"
      memory: "500Mi"
    limits:
      cpu: "8"
      memory: "8Gi"

keycloak:
  replicaCount: 5
  resources:
    requests:
      cpu: 1
      memory: "4Gi"
    limits:
      cpu: 4
      memory: "8Gi"

nubusLdapServer:
  resourcesPrimary:
    requests:
      cpu: 3
      memory: "2Gi"
    limits:
      cpu: 16
      memory: "4Gi"
  resourcesSecondary:
    requests:
      cpu: 5
      memory: "2Gi"
    limits:
      cpu: 16
      memory: "4Gi"
  resourcesProxy:
    requests:
      cpu: 3
      memory: "2Gi"
    limits:
      cpu: 16
      memory: "4Gi"
  replicaCountPrimary: 2
  replicaCountSecondary: 8
  replicaCountProxy: 0

nubusUmcServer:
  # Deploy Pods in parallel instead of sequentially
  podManagementPolicy: "Parallel"
  resources:
    requests:
      cpu: "0.15"
      memory: "500M"
    limits:
      cpu: 2
      memory: "2Gi"
  replicaCount: 128
  proxy:
    replicaCount: 4
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: "kubernetes.io/hostname"
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchExpressions:
          - key: app.kubernetes.io/name
            operator: In
            values:
              - umc-server
              - nubus-umc-server-proxy

nubusUmcGateway:
  resources: *baselineResources
  replicaCount: 40
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: "kubernetes.io/hostname"
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app.kubernetes.io/name: umc-gateway

nubusPortalServer:
  # Disable the /me endpoint
  portalServer:
    featureToggles:
      api_me: false
  resources:
    requests:
      cpu: "0.4"
      memory: "500M"
    limits:
      cpu: 2
      memory: "2Gi"
  replicaCount: 10
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: "kubernetes.io/hostname"
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app.kubernetes.io/name: portal-server

nubusPortalFrontend:
  resources: *baselineResources
  replicaCount: 6
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: "kubernetes.io/hostname"
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app.kubernetes.io/name: portal-frontend

nubusNotificationsApi:
  requests:
    cpu: "300m"
    memory: "500Mi"
  limits:
    cpu: "8"
    memory: "8Gi"
  replicaCount: 2

postgresql:
  resources:
    limits:
      cpu: "8"
      memory: "8Gi"
    requests:
      cpu: "1000m"
      memory: "500Mi"
  primary:
    resources: *baselineResources
    extendedConfiguration: |
      max_connections = 1200

minio:
  resources:
    limits:
      cpu: "8"
      memory: "8Gi"
    requests:
      cpu: "1.5"
      memory: "500Mi"
  networkPolicy:
    resources: *baselineResources
  tls:
    resources: *baselineResources
  provisioning:
    resources: *baselineResources
    cleanupAfterFinished:
      resources: *baselineResources

nubusGuardian:
  enabled: false

nubusTwofaHelpdesk:
  enabled: false

nubusLdapNotifier:
  resources: *baselineResources

nubusPortalConsumer:
  resources: *baselineResources

nubusProvisioning:
  resources:
    dispatcher: *baselineResources
    prefill: *baselineResources
    api: *baselineResources
  nats:
    resources: *baselineResources
    reloader:
      resources: *baselineResources
    natsBox:
      resources: *baselineResources

nubusUdmListener:
  resources: *baselineResources

nubusSelfServiceConsumer:
  resources: *baselineResources

nubusKeycloakBootstrap:
  resources: *baselineResources

nubusKeycloakExtensions:
  resources: *baselineResources

nubusStackDataUms:
  resources: *baselineResources

nubusUdmRestApi:
  resources: *baselineResources
  replicaCount: 2