Service Foundry
Young Gyu Kim <credemol@gmail.com>

Deploying Observability Stack in Development Environment

observability in development

Introduction

This guide outlines how to set up observability in development environments using Observability Foundry—a lightweight, Kubernetes-native observability platform. Built as part of the Service Foundry ecosystem, it enables developers to deploy essential monitoring tools—including metrics, logs, and traces—on their own Kubernetes clusters with support for cloud-native storage.

Optimized for resource efficiency, the development deployment profile is ideal for local clusters and low-cost cloud setups, offering full-stack observability without the complexity of production systems.

What’s New in This Release

We’re excited to announce a new version of Observability Foundry, bringing several key updates and enhancements:

  • OpenTelemetry Operator: Upgraded to version 0.127.0 of opentelemetry-collector-contrib, offering improved compatibility and stability.

  • Grafana: Updated to version 12.0.1, which introduces enhanced visualization features and more powerful alerting capabilities.

  • LGTM Stack: Now includes native support for Loki, Grafana, Tempo, and Mimir, enabling a fully integrated observability pipeline.

  • Kubelet cAdvisor Collector: Improved metrics collection for Kubernetes nodes, offering deeper visibility into container resource usage.

Deployment Profiles of Observability Foundry

The Observability Foundry supports multiple deployment profiles tailored for specific environments:

sf deployment profiles
Figure 1. Deployment Profiles of Observability Foundry
  • Development: A resource-efficient configuration for local or cloud-based development environments. Provides metrics, logs, and traces with minimal overhead.

  • Staging: A pre-production configuration that mirrors production settings, allowing feature validation and integration testing.

  • Production: A highly available, scalable deployment profile designed to handle large volumes of telemetry with advanced features such as long-term retention, alerting, and custom dashboards.

Core Components in Development Profile

The development deployment of Observability Foundry includes the following components:

  • OpenTelemetry Collector (2 replicas): Collects and exports telemetry data.

  • Kubelet cAdvisor Collector (1 replica): Collects Kubernetes container metrics.

  • Grafana (1 replica): Visualization layer for logs, metrics, and traces.

  • Prometheus (1 replica): Time-series metrics store. (Storage: Filesystem)

  • Tempo (2 replicas): Distributed tracing backend. (Storage: S3)

  • Loki (2 replicas): Log aggregation system. (Storage: S3)

  • OpenTelementry Spring Application (2 replicas): Spring Boot application instrumented with OpenTelemetry for backend observability.

  • OpenTelemetry React Application (1 replica): React application instrumented with OpenTelemetry for frontend observability.

  • Keycloak (1 replica): Provides Single Sign-On (SSO) capabilities for secure access to the observability tools.

  • Keycloak Postgres (1 replica): Provides OIDC-based authentication and SSO.

  • Traefik (1 replica): Ingress controller routing traffic to observability services.

Namespaces

The deployment utilizes the following Kubernetes namespaces:

  • cert-manager: TLS certificate management

  • default: Core components like Prometheus Operator

  • keycloak: Keycloak authentication server

  • o11y: Observability components (Grafana, Loki, etc.)

  • opentelemetry-operator-system: OpenTelemetry Operator

  • traefik: Ingress controller components

All Pods in the Development Environment

You can view all running pods across relevant namespaces with:

$ kubectl get pods --all-namespaces | grep -E '^(cert-manager|default|keycloak|o11y|opentelemetry-operator-system|traefik)\s'

Sample Output:

Namespace                       Pod Name
cert-manager                    cert-manager-cainjector-686546c9f7-lshwf
cert-manager                    cert-manager-d6746cf45-rcdz9
cert-manager                    cert-manager-webhook-5f79cd6f4b-tnngh
default                         prometheus-operator-55b5c96cf8-fcg79
keycloak                        keycloak-0
keycloak                        keycloak-postgresql-0
o11y                            grafana-fdf6b548c-jrtvm
o11y                            kubelet-cadvisor-collector-0
o11y                            kubelet-cadvisor-targetallocator-786d574896-9wz4q
o11y                            loki-0
o11y                            loki-1
o11y                            loki-canary-7rbpg
o11y                            loki-canary-wq5kf
o11y                            loki-chunks-cache-0
o11y                            loki-gateway-d8c77f9b4-clxjj
o11y                            loki-results-cache-0
o11y                            oauth2-proxy-6c58576b75-bzmt4
o11y                            otel-collector-0
o11y                            otel-collector-1
o11y                            otel-spring-example-57d5cc6b88-2dtnl
o11y                            otel-spring-example-57d5cc6b88-d762n
o11y                            otel-spring-example-kaniko-executor
o11y                            otel-targetallocator-784c95db5-qgzth
o11y                            prometheus-prometheus-0
o11y                            react-o11y-app-54ffccdcbf-bv64g
o11y                            tempo-0
o11y                            tempo-1
opentelemetry-operator-system   opentelemetry-operator-controller-manager-6856674db7-4zg29
traefik                         traefik-6697dc88f8-9xzqh

Cluster Specifications

This guide assumes you are running on Amazon EKS with:

  • Instance type: r6i.xlarge(4 vCPUs, 32 GiB memory)

  • Node count: 2

  • Kubernetes version: 1.32.3

Configuration via .o11y-foundry-config.yaml

All components are managed through a centralized configuration file:

o11y-foundry-config.yaml
cassandra:
  enabled: false
  # configuration for Cassandra
jaeger:
  enabled: false
  # configuration for Jaeger
prometheus:
  enabled: true
  # configuration for Prometheus
grafana:
  enabled: true
  # configuration for Grafana
opensearch:
  enabled: false
  # configuration for OpenSearch, Data Prepper, and OpenSearch Dashboards
otel-spring-example:
  enabled: true
  # configuration for OpenTelemetry Spring Example application
otel-collector:
  enabled: true
  # configuration for OpenTelemetry Collector
oauth2-proxy:
  enabled: true
  # configuration for OAuth2 Proxy
react-o11y-app:
  enabled: true
  # configuration for React OpenTelemetry application
kubelet-cadvisor-collector:
  enabled: true
  # configuration for kubelet cAdvisor collector
tempo:
  enabled: true
  # configuration for Tempo
loki:
  enabled: true
  # configuration for Loki
mimir:
  enabled: false
  # configuration for Mimir

Grafana Integration

Provisioned Data Sources

Grafana includes pre-provisioned connectors for Loki, Tempo, and Prometheus.

grafana datasources
Figure 2. Grafana UI - Provisioned Data Sources

Provisioned Dashboards for Applications

Service Foundry comes with pre-configured Grafana dashboards for both the OpenTelemetry Spring application and the React application.

grafana dashboards
Figure 3. Grafana UI - Provisioned Dashboards

Provisioned Alert Rules

Observability Foundry comes with pre-configured alert rules in Grafana to monitor the health of the applications and infrastructure. These alerts can notify users of issues such as high error rates, latency spikes, or resource exhaustion.

grafana alert rules
Figure 4. Grafana UI - Provisioned Alert Rules

Provisioned Contact Points

Alert notifications are supported via pre-configured contact points, including email (SMTP configurable in .o11y-foundry-config.yaml).

grafana contact points
Figure 5. Grafana UI - Provisioned Contact Points

Infrastructure Metrics Dashboard in Grafana

All metrics collected from the Kubernetes cluster are visualized in Grafana. The infrastructure metrics dashboard provides insights into the health and performance of the Kubernetes nodes.

grafana infrastructure metrics
Figure 6. Grafana UI - Infrastructure Metrics Dashboard

Explore Traces in Grafana

The traces collected by Tempo can be explored in Grafana. This allows users to analyze the performance of their applications and identify bottlenecks.

grafana traces
Figure 7. Grafana UI - Explore Traces

Explore Logs in Grafana

The logs collected by Loki can be explored in Grafana. This provides a powerful way to search and analyze logs from applications and infrastructure.

grafana logs
Figure 8. Grafana UI - Explore Logs

Explore Traces and Logs in one Place

The Grafana Explore feature allows users to search and analyze both traces and logs in a unified interface. This makes it easier to correlate events and troubleshoot issues across different telemetry data types.

grafana traces logs
Figure 9. Grafana UI - Explore Traces and Logs

Authentication with Keycloak

Keycloak is integrated as the identity provider for secure access. It includes:

  • OIDC Server

  • Predefined OAuth2 Clients

  • Users, roles, and groups for access control

keycloak clients
Figure 10. Keycloak OIDC Server Provisioned

SSO is fully supported across the observability stack through Keycloak and OAuth2 Proxy.

Learn More

Interested in deploying Observability Foundry in your development cluster?

Conclusion

Observability Foundry offers a robust yet lightweight solution for development teams looking to implement full-stack observability. With minimal setup, it provides visibility into both application behavior and infrastructure performance using modern, cloud-native tools like OpenTelemetry, Grafana, Tempo, and Loki.

📘 View the web version: