Monitoring Logs with Loki, Promtail & Grafana

By: Aayush Pokharel

CNCF Kathmandu - Dec 28, 2024

About Me

  • Aayush Pokharel
  • DevOps Engineer
  • STARTsmall Pvt. Ltd.

Today's Topic of Discussion

  • Background
  • How does Loki work?
  • What is Promtail? What does it do?
  • Differences between the 3 Loki helm charts
  • Implement Loki-distributed helm chart in minikube cluster
  • Implement Promtail
  • How to configure Loki as a data source in Grafana
  • Visualize Loki Logs in Grafana

What are Logs?

  • A record of events happening in a system.

Types of Logs

  • Structured (e.g., JSON):
{
  "timestamp": "2024-10-27T10:30:00Z",
  "level": "INFO",
  "service": "order-processing",
  "component": "payment-gateway",
  "operation": "authorize-payment",
  "order_id": "12345",
  "customer_id": "67890",
  "amount": 25.99,
  "currency": "USD",
  "status": "SUCCESS",
  "message": "Payment authorization successful.",
  "details": {
    "transaction_id": "ABCDEFG123",
    "payment_method": "CreditCard",
    "auth_code": "XYZ123"
  }
}

Types of Logs

  • Unstructured (plain text):
2024-10-27 10:30:00 INFO: Payment authorization successful for order 12345, customer 67890, amount $25.99. Transaction ID: ABCDEFG123.

Logs in Monolith Systems

  • Logs were stored as simple text files on a single server.

  • Accessed via SSH with commands like tail -f /var/log/app.log

  • Basic tools used for log management:

    • syslog - Centralized logging system
    • logrotate - Automated log file rotation.
    • grep / awk / sed - Manual searching and filtering

Challenges in Modern Times

  • Microservices & Containers: Logs are spread across multiple services and ephemeral containers.
  • Cloud Environment: Logs exist across multiple regions and instances.

What is Grafana Loki?

  • Loki is a log aggregation system designed for scalable and efficient log management.
  • Built by Grafana Labs, it integrates seamlessly with Grafana for log visualization.
  • Inspired by Prometheus, but for logs:
    • Label-based log organization.
    • No full-text indexing, making it cost-efficient.

Key Features

  • Scalability: Handles large volumes of logs.
  • Cost-Efficiency: Avoids full-text indexing.
  • Multi-Tenancy: Supports isolated log streams for different users.
  • Query Language: Uses LogQL, a Prometheus-like language for querying logs.
  • Integration: Works seamlessly with Prometheus and Grafana.

Loki Architecture

Core Components

  • Distributor
  • Ingester
  • Query Frontend
  • Querier
  • Compactor
  • Ruler

Data Flow

Log Ingestion

  • Log ingestion is the entry point where logs are received by Loki.
  • HTTP API endpoint for receiving log streams

Role of Promtail:

  • Promtail is an agent that collects logs from local sources (e.g., log files, systemd journal) and ships them to a Grafana Loki instance for storage and querying.

Log Discovery:

  • Promtail needs to discover which logs to collect. This is done through service discovery.
  • It supports static discovery (manual configuration) and Kubernetes discovery (fetching labels from the Kubernetes API server).

Log File Discovery:

  • Promtail can tail logs from local log files and the systemd journal (on ARM and AMD64 machines).
  • It uses scrape_configs to configure which files to monitor, similar to how Prometheus scrapes metrics.
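
A minimal Promtail configuration sketch showing static file discovery; the paths and Loki URL here are placeholders, not values from the talk:

# Minimal Promtail config sketch -- paths and URLs are placeholders
server:
  http_listen_port: 9080

positions:
  filename: /var/log/positions.yaml   # where Promtail records read offsets

clients:
  - url: http://loki:3100/loki/api/v1/push   # Loki push endpoint

scrape_configs:
  - job_name: system
    static_configs:
      - targets: [localhost]
        labels:
          job: varlogs
          __path__: /var/log/*.log   # glob of files to tail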

Labeling and Metadata:

  • During discovery, Promtail attaches labels (metadata) to logs, such as the pod name, filename, or container name.
  • Labels help organize logs for easier querying in Loki.
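
For Kubernetes discovery, Promtail reuses Prometheus-style service discovery and relabeling. A sketch using the standard __meta_kubernetes_* metadata labels:

scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod                      # discover pods via the API server
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod              # attach pod name as a label
      - source_labels: [__meta_kubernetes_pod_container_name]
        target_label: container
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace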

Log Shipping:

  • Once Promtail has discovered the logs and attached labels, it tails the logs, continuously reading them.
  • When enough data is collected, it is flushed as a batch to Loki.
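
Batching is tunable in Promtail's client section; a sketch with the documented batch settings (values here are illustrative):

clients:
  - url: http://loki:3100/loki/api/v1/push
    batchwait: 1s        # flush a batch at least this often
    batchsize: 1048576   # ...or once it reaches ~1 MiB of log data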

Handling Log Offsets:

  • Promtail tracks the last read position using a positions file (e.g., /var/log/positions.yaml).
  • This allows Promtail to resume reading from where it left off if it crashes or restarts.
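
The positions file itself is plain YAML mapping each tailed file to the last byte offset read. An illustrative example (offsets are hypothetical; Promtail maintains this file itself):

# /var/log/positions.yaml
positions:
  /var/log/app.log: "53278"            # byte offset of the last line read
  /var/log/nginx/access.log: "104822"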

Log Processing:

  • Promtail can parse logs and modify their content using pipeline stages.
  • This allows for more advanced operations like correcting timestamps, adding labels, or even rewriting log lines.
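
A sketch of pipeline stages that parse the JSON log shown earlier, promote its level field to a label, and correct the timestamp; the field names assume that JSON shape:

scrape_configs:
  - job_name: app
    pipeline_stages:
      - json:
          expressions:
            level: level       # extract fields from the JSON body
            ts: timestamp
      - labels:
          level:               # promote "level" to an indexed label
      - timestamp:
          source: ts           # use the log's own timestamp
          format: RFC3339
    static_configs:
      - targets: [localhost]
        labels:
          job: app
          __path__: /var/log/app/*.log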

Receiving Logs from Syslog:

  • Promtail can also receive logs from Syslog by listening on a configured port.
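
A sketch of a syslog listener; the port is arbitrary, and Promtail expects RFC5424-formatted syslog by default:

scrape_configs:
  - job_name: syslog
    syslog:
      listen_address: 0.0.0.0:1514   # arbitrary listening port
      labels:
        job: syslog
    relabel_configs:
      - source_labels: [__syslog_message_hostname]
        target_label: host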

Loki Push API:

  • Promtail can be configured to receive logs from other Promtail instances or Loki clients via the Loki Push API.
  • This is useful in complex network setups or for serverless environments.
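
A sketch of the loki_push_api scrape config, which makes this Promtail instance itself accept pushes from other Promtails or Loki clients (port and label are illustrative):

scrape_configs:
  - job_name: push
    loki_push_api:
      server:
        http_listen_port: 3500   # other clients push to this port
      labels:
        source: push_receiver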

Loki Components

Distributor

  • Receives logs from clients.
  • Splits logs into chunks and assigns them to ingesters.
  • Uses consistent hashing for log stream distribution.

Example

Log Collector Agent (Promtail) sends a batch of logs to Loki’s Distributor via an HTTP POST request. Each log stream includes:

  • Tenant ID: dev-team
  • Labels: {app="backend", env="production"}
  • Log Entries: Timestamps and log messages.

Distributor Validates Incoming Logs

  • Check Labels: Ensures that labels conform to Prometheus standards (e.g., no invalid characters or duplicates).
  • Verify Timestamps: Confirms that log timestamps are neither too old nor too far in the future.
  • Rate Limiting: Checks the data ingest rate for the dev-team tenant.
  • Normalize Labels: Sorts labels alphabetically to enable deterministic caching and hashing.
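
These validation and rate-limit checks are driven by Loki's limits_config; a minimal sketch with illustrative values:

limits_config:
  ingestion_rate_mb: 10            # per-tenant ingest rate (MB/s)
  ingestion_burst_size_mb: 20
  reject_old_samples: true         # refuse logs with too-old timestamps
  reject_old_samples_max_age: 168h # one week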

Hashing and Ingester Selection

  • The Distributor uses consistent hashing to determine which Ingesters should handle the log stream:
  • Looks up the hash in the hash ring, which maps hash ranges to Ingesters.
  • Selects n Ingesters (where n is the replication factor, typically 3).
  • The Distributor forwards the log stream to Ingester A, Ingester B, and Ingester C in parallel.

Example:

  • Hash of the stream: 42
  • Replication Factor: 3
  • Hash Ring:
  Token 10 -> Ingester A  
  Token 50 -> Ingester B  
  Token 90 -> Ingester C  
  • Hash 42 falls between Token 10 and Token 50, so Ingester A is the primary recipient.
  • The Distributor also selects the next two clockwise tokens: Ingester B and Ingester C.

Write Quorum & Handling Failures

  • The Distributor waits for acknowledgments from the Ingesters.
  • If one of the Ingesters (e.g., Ingester B) is unreachable, the Distributor still succeeds if Ingester A and Ingester C acknowledge the write.
  • The Collector Agent then receives an acknowledgment that the logs were successfully ingested.
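
The replication factor comes from the ingester ring configuration; a sketch assuming a memberlist-backed ring:

ingester:
  lifecycler:
    ring:
      kvstore:
        store: memberlist    # ring state shared via memberlist gossip
      replication_factor: 3  # quorum = floor(3/2) + 1 = 2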

Ingester

  • The ingester is responsible for persisting logs and shipping them to long-term storage (e.g., Amazon S3, Google Cloud Storage) and for returning recent logs for queries.

Lifecycle Management

Each ingester has a state that governs its behavior:

  • PENDING: Waiting for a handoff of data from a leaving ingester.
  • JOINING: Inserting itself into the hash ring, preparing to receive data.
  • ACTIVE: Fully initialized, able to handle both reads and writes.
  • LEAVING: Shutting down, still serving read requests.
  • UNHEALTHY: Failed to heartbeat, marked as unhealthy by the distributor.

Data Handling

  • Logs are grouped into chunks in memory.
  • Once a chunk reaches its capacity or a certain time interval passes, it is compressed and marked as read-only.
  • A new writable chunk is created to handle incoming logs.
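
Chunk sealing is governed by a few ingester settings; a sketch (defaults vary by Loki version):

ingester:
  chunk_idle_period: 30m      # seal a chunk that has stopped receiving logs
  max_chunk_age: 1h           # seal chunks after this age regardless
  chunk_target_size: 1572864  # target compressed chunk size (~1.5 MB)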

Data Persistence

  • Periodically, ingesters flush data to the backing storage (e.g., S3 or Google Cloud Storage).
  • Chunks are hashed based on tenant, labels, and content to ensure no duplication in storage.
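
Long-term storage is set in storage_config; a sketch for S3, where the region and bucket name are placeholders (a matching schema_config is also required but omitted here):

storage_config:
  aws:
    region: us-east-1         # placeholder
    bucketnames: loki-chunks  # placeholder bucket for chunk objects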

Timestamp Ordering

  • Logs must be ingested in timestamp order by default.
  • If a log arrives out of order, it is rejected unless Loki is configured to accept out-of-order writes.
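
Out-of-order acceptance is a per-tenant limit; in recent Loki versions it is enabled by default:

limits_config:
  unordered_writes: true   # accept out-of-order logs within the active window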

Replication:

  • To prevent data loss, logs are replicated to multiple ingesters (typically 3).
  • This means if one ingester fails, the data can still be retrieved from the others.

Querier

  • Executes LogQL queries.
  • Fetches data from ingesters and long-term storage.

Query Frontend (Optional)

  • Accelerates query execution.
  • Splits large queries into smaller subqueries.
  • Caches query results for efficiency.
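
Splitting and result caching are configurable; a sketch matching the 30-minute splits used in the example below (exact key placement varies across Loki 2.x versions):

query_range:
  cache_results: true               # reuse results of repeated sub-queries
limits_config:
  split_queries_by_interval: 30m    # break long ranges into 30m sub-queries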

Example

{app="backend"} |= "error"

What Happens When the Query Frontend Receives the Query?

  • The query specifies a time range of 6 hours.
  • The Query Frontend splits the query into smaller time-range sub-queries (e.g., 30-minute intervals).
  • These sub-queries are distributed to multiple Queriers for parallel processing.

Sub-Queries Generated:

Fetch logs from 00:00 to 00:30
Fetch logs from 00:30 to 01:00
Fetch logs from 01:00 to 01:30
... (and so on, through the full 6-hour range)

Query Frontend Optimizes Performance

  • Caching:
    The Query Frontend checks its cache for any previously executed sub-queries. If Aayush had run a similar query earlier, some results might already be cached.

  • Batching and Parallelism:
    The sub-queries are sent to Queriers in parallel, reducing the overall query execution time.

Queriers Process Sub-Queries

  • Each Querier fetches logs from:

    • Ingesters: For recent logs still in memory.
    • Object Storage: For older logs stored as compressed chunks.
  • The Queriers process their assigned time ranges and filter logs containing the word "error".

Query Frontend Aggregates Results

  • Once all sub-queries are completed:
    • The Query Frontend aggregates the results from all Queriers.
    • It merges and deduplicates log entries across the time ranges.

Deployment Modes

1. Single Binary Mode

  • All Loki components run in a single process.
  • Suitable for small-scale deployments.

2. Simple Scalable Mode

  • Components run in three logical groups:
    • Read
    • Write
    • Backend
  • Ideal for medium-scale setups.

3. Microservices Mode

  • Fully distributed architecture:
    • Every component runs as an independent service.
  • Best for large-scale, production-grade deployments.

Now Let's Look into the Helm Charts!
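
The three Grafana Helm charts map roughly onto the deployment modes above: grafana/loki (single binary or simple scalable), grafana/loki-distributed (microservices), and the older grafana/loki-stack bundle. A hypothetical minimal values sketch for loki-distributed, with illustrative replica counts:

# values.yaml sketch for grafana/loki-distributed -- counts are illustrative
distributor:
  replicas: 2
ingester:
  replicas: 3    # matches a replication factor of 3
querier:
  replicas: 2
queryFrontend:
  replicas: 1
compactor:
  enabled: true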

Speaker Notes

Promtail (or other agents): Promtail is the most commonly used log collector for Loki. It reads logs from files (e.g., /var/log) or receives them from systemd, Kubernetes, or other sources. Promtail adds metadata to the logs, such as Kubernetes pod labels, hostnames, or custom labels defined in its configuration. Logs are sent to Loki via the HTTP push API, or shipped by other agents such as Fluentd or Logstash.

Ingestion API: Loki exposes an HTTP API endpoint for receiving log streams. Promtail and other log agents send log entries to this endpoint in batches.

Think of Promtail as a delivery driver who collects logs from various locations (servers, containers) and delivers them to a centralized log warehouse (Loki).

Example: Promtail is like a driver who needs to know where to pick up packages (logs) from. It can either be told exactly where to go (static) or automatically figure it out (Kubernetes).

Example: Promtail is like a worker who checks specific files for new log entries to ship to Loki.

Example: Think of Promtail as a labeling machine that tags each log with useful information, like the name of the department (pod name) or item type (log file).

Example: Promtail is like a postal service that picks up logs, holds them until a certain amount is collected, and then sends them to the central warehouse (Loki).

Example: Promtail is like a delivery driver who marks the last stop made so they can pick up where they left off the next time.

Example: Promtail acts like a log editor, adjusting the logs as needed before sending them to the warehouse (Loki).

Example: Promtail can be configured to receive logs from a centralized syslog server and then forward them to Loki.

Example: Promtail can act as a hub that receives logs from multiple remote sources before shipping them to Loki.

The Distributor is the first component in Loki's backend that processes incoming logs.

Responsibility:

  • It validates and deduplicates incoming log data.
  • It assigns the logs to specific tenants (if multi-tenancy is enabled).
  • It hashes the log stream's labels and uses consistent hashing to determine which ingester will process the logs.

Load Balancing: Distributors ensure even distribution of log streams across ingesters using the hash ring.

The Distributor performs the following validation steps:

  • Rate limiting: Aayush has configured a global rate limit of 10 MB/s for this tenant. The cluster has 5 Distributors, so each Distributor enforces a limit of 2 MB/s for dev-team.
  • Label normalization: sorting ensures {app="backend", env="production"} is treated as equivalent to {env="production", app="backend"}.
  • Stream hashing: the Tenant ID (dev-team) and the label set ({app="backend", env="production"}) are combined to create a unique hash.
  • Write quorum: with a replication factor of 3, the quorum is floor(3/2) + 1 = 2. At least two Ingesters must confirm the write for the Distributor to consider it successful.

Key Benefits in This Scenario

  • Scalability: The Distributor is stateless, so Aayush can scale it horizontally to handle increased log traffic.
  • Fault Tolerance: The replication factor ensures that logs are not lost even if one Ingester fails.
  • Rate Limiting: Per-tenant rate limits protect the system from being overwhelmed by a single tenant's log traffic.
  • Efficient Load Distribution: Consistent hashing ensures that logs are evenly distributed across Ingesters.

Role of Ingester:

Think of it like a warehouse that stores products (logs) temporarily before shipping them to a storage facility (cloud storage).

Example: If an ingester is ACTIVE, it can handle incoming logs (write requests) and serve recent logs for queries (read requests).

Example: Imagine a log entry from a web server is stored in a chunk. When the chunk is full, it's sealed (compressed), and a new chunk is created for further logs.

Example: If the same log is received by two ingesters, they will not write the same chunk to storage, avoiding data duplication.

Example: If a log from server1 arrives at 10:05:00 and another log arrives at 10:03:00, the latter will be rejected unless out-of-order writes are allowed.

Example: Think of backups for your logs — if one warehouse (ingester) loses its data, other warehouses (replicas) will have copies of it.

Failure Mitigation: If an ingester crashes, any unflushed data is lost. To mitigate this, Loki uses a Write-Ahead Log (WAL) and replication.

Example: If a warehouse suddenly catches fire (ingester crash), the backup warehouse (replica) can still provide the data.

Filesystem Support: The ingester can write to the filesystem using BoltDB in single-process mode. However, this approach is limited, since multiple processes can't access the same BoltDB instance concurrently.

Example: Think of single-tenant mode where only one process can manage the warehouse's inventory at a time.

This query requests all logs labeled with app="backend" that contain the word "error" in their content.

The Query Frontend intercepts Aayush's query before it reaches the Querier.

For example: If I queried logs for 00:00 to 03:00 an hour ago, those results are fetched directly from the cache, skipping redundant processing.