Monitoring and Observability - Part 1

As an engineer, I’ve always been fascinated by the inner workings of the systems I build. But it wasn’t until a recent discussion at work that I felt the need to share my deep dive into monitoring and observability. The topic piqued my interest, and I found myself diving headfirst into a world of logs, metrics, and traces. I wanted to understand how I could gain deeper insights into the behavior of my applications.

In this two-part series, I’ll take you on a journey through my exploration of monitoring and observability. In Part 1, we’ll focus on logging and how I leveraged Pino as a logging library together with the Grafana stack (Loki, Promtail, and Grafana) to shed light on the mysteries of my application. To make things concrete, I’ll use a fictional API service built with Hono to demonstrate the setup and configuration process.

But before we get into the nitty-gritty details, let me set the stage. Imagine this: you’ve just built an API service that’s ready to take on the world. It’s fast, it’s reliable, and it’s handling requests like a champ. But as the traffic grows and the complexity increases, you start to realize that you need a way to peek behind the curtain and understand what’s really going on under the hood. That’s where logging comes in.

Setting the stage

Let’s dive into our fictional API service, which we’ll call “HonoMart”. It’s an e-commerce platform that allows users to browse products, add items to their cart, and place orders. It’s built using Hono, a lightweight and efficient web framework for building API services. As HonoMart grows in popularity, we need to ensure that we have a robust logging system in place to monitor its health and performance.

Pino is a fast and simple logger that will allow us to capture valuable information about our application’s behavior. Loki, a horizontally scalable, highly available, and multi-tenant log aggregation system (or in short, a fancy way of saying that it is reliable), will store and index our logs for efficient querying. Promtail, a log collection agent, will collect and forward our logs to Loki. Finally, Grafana, a powerful open-source platform for monitoring and observability, will provide us with a visually appealing and intuitive way to explore and analyze our logs.

Installing Pino

The first step is to integrate Pino into our HonoMart API service. Pino is pretty lightweight and adaptable to Hono. Here’s how we can set it up:

  1. Install pino in the project:
bun add pino

I am using Bun in this project as I find it quite convenient to work with: it provides a test runner, a package manager, and a blazingly fast runtime.

  2. Create a logger instance in the entrypoint file of the application.
import pino from "pino";
import { SonicBoom } from "sonic-boom";

const logger = pino(
  {
    level: process.env.LOG_LEVEL || "info",
    formatters: {
      // Emit the level as its label ("info") instead of Pino's numeric value (30)
      level: (label) => ({ level: label }),
    },
  },
  // multistream lets us add more destinations (e.g. stdout) later
  pino.multistream([
    {
      stream: new SonicBoom({
        dest: "./logs/app.log",
        sync: false, // asynchronous writes for better throughput
        mkdir: true, // create the logs directory if it does not exist
        append: false, // truncate the file on every restart
      }),
      level: "debug",
    },
  ])
);

export default logger;
  3. Provide the correct type so our context can recognise the logger.
import { Hono } from "hono";
import type pino from "pino";

type Variables = {
  logger: pino.Logger;
};

const app = new Hono<{ Variables: Variables }>();
  4. Set up a logger instance in a middleware that enriches it with request-specific details.
import baseLogger from "@/logger";

app.use("*", async (c, next) => {
  // Create a child logger with request-specific context
  const requestLogger = baseLogger.child({
    requestId: crypto.randomUUID(),
    method: c.req.method,
    path: c.req.path,
    userAgent: c.req.header("user-agent"),
  });

  // Add logger to context
  c.set("logger", requestLogger);

  // Log the incoming request
  requestLogger.info({
    msg: "Incoming request",
    query: c.req.query(),
  });

  try {
    await next();
  } catch (err) {
    const error = err as Error;
    requestLogger.error({
      msg: "Unhandled error occurred",
      error: error.message,
      stack: error.stack,
    });
    return c.json({ error: error.message }, 500);
  }

  // Log the response
  requestLogger.info({
    msg: "Request completed",
    status: c.res.status,
  });
});
  5. Use the logger in our endpoints. For example, we’ll extend our POST /products endpoint for adding products:
.post('/products', async (c) => {
  const log = c.get("logger");

  try {
    const body = await c.req.json<Omit<Product, "id">>();
    log.debug({ msg: "Creating new product", body });

    const newProduct: Product = {
      id: Math.random().toString(36).substring(2, 9),
      ...body,
    };
    products.push(newProduct);

    log.info({
      msg: "Product created successfully",
      productId: newProduct.id,
    });
    return c.json(newProduct, 201);
  } catch (error) {
    log.error({ msg: "Failed to create product", error });
    return c.json({ message: "Invalid request body" }, 400);
  }
})

Running the Grafana stack

We use a Docker Compose file to define and run these services easily in our development environment. Here’s what our docker-compose.yml file looks like:

version: "3"

services:
  loki:
    image: grafana/loki:2.9.0
    ports:
      - "3100:3100"
    networks:
      - loki-net

  promtail:
    image: grafana/promtail:latest
    volumes:
      - ./promtail-config.yaml:/etc/promtail/config.yml
      - ./logs:/var/log/app
    command: -config.file=/etc/promtail/config.yml
    networks:
      - loki-net

  grafana:
    image: grafana/grafana:10.2.0
    ports:
      - "3030:3000"
    environment:
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
      - GF_AUTH_DISABLE_LOGIN_FORM=true
    volumes:
      - ./grafana-datasources.yml:/etc/grafana/provisioning/datasources/datasources.yml
    networks:
      - loki-net
    depends_on:
      - loki

networks:
  loki-net:
    driver: bridge

In this configuration, we define three services: Loki, Promtail, and Grafana. Loki listens on port 3100 with its default configuration. Promtail mounts its configuration file and our logs directory, allowing it to collect logs from the application. Grafana is exposed on port 3030 (mapped to its internal port 3000), provisions Loki as a datasource via the mounted file, and depends on Loki.
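It can also help to know what Promtail actually sends over the wire. Under the hood, it ships batches of labeled streams to Loki’s push endpoint (http://loki:3100/loki/api/v1/push). A minimal sketch of that payload shape — we never build this ourselves, Promtail does, and the labels and log line here are hypothetical:

```typescript
// Sketch: the shape of the payload Promtail POSTs to Loki's push API.
const payload = {
  streams: [
    {
      // Labels identifying this log stream; these become queryable in Grafana
      stream: { job: "varlogs" },
      // Each value pairs a nanosecond-precision epoch timestamp (as a string)
      // with the raw log line
      values: [
        [String(Date.now()) + "000000", '{"level":"info","msg":"Incoming request"}'],
      ],
    },
  ],
};
console.log(JSON.stringify(payload, null, 2));
```

Notice that the log line itself travels as an opaque string; it only becomes structured again when a query parses it, which is why emitting JSON from Pino pays off.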

Configuring Promtail

Promtail plays a crucial role in collecting and forwarding logs to Loki. Here’s how we configure Promtail using the promtail-config.yaml file:

server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          __path__: /var/log/app/*.log

In this configuration, Promtail listens on port 9080 for HTTP traffic and forwards the logs to Loki running at http://loki:3100/loki/api/v1/push. The scrape_configs section defines the log files to be collected: all files with the .log extension in /var/log/app, which is where our Compose file mounts the application’s ./logs directory.
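Once logs are flowing, we can explore them in Grafana’s Explore view using LogQL. Because Pino writes JSON, the json parser stage lets us filter on individual fields. A few example queries, assuming the varlogs job label from the config above:

```logql
# All logs from our app
{job="varlogs"}

# Only error-level entries, parsing each JSON line into labels first
{job="varlogs"} | json | level="error"

# Lines containing a specific message
{job="varlogs"} |= "Product created successfully"
```

The first query streams everything, the second demonstrates structured filtering on the fields Pino emitted, and the third is a plain substring match.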

Conclusion

By setting up Pino, Loki, Promtail, and Grafana, we’ve laid the foundation for a robust logging system in our HonoMart API service. We can now capture valuable information about our application’s behavior and store it efficiently for analysis. You can find the source code in this repository, so feel free to clone or fork it and try the documented steps in the README.

In Part 2 of this series, we’ll explore how to set up monitoring and distributed tracing to further enhance our observability capabilities using metrics and traces. We’ll dive into tools like Prometheus, Jaeger, and OpenTelemetry to gain even deeper insights into the performance and health of our application.

Stay tuned for the next part of this exciting journey into the world of monitoring and observability!