DevOps Sessions - Week 14 - Monitoring

devops monitoring metrics dashboards alerts 31-10-2024

Welcome to Week 14 of our “Becoming a DevOps Engineer” series! This week, we will focus on monitoring, an essential practice in DevOps for ensuring the health, performance, and reliability of your applications and infrastructure. Effective monitoring enables you to detect issues proactively, respond quickly, and maintain high availability. We will explore key concepts, popular monitoring tools like Prometheus, Grafana, ELK Stack, and Datadog, and best practices for implementing a robust monitoring strategy. Let’s get started!

Session Overview

1. Introduction to Monitoring

2. Key Metrics to Monitor

3. Prometheus

4. Grafana

5. ELK Stack

6. Datadog

7. Best Practices and Tools

1. Introduction to Monitoring

What is Monitoring?

Monitoring is the practice of continuously observing and analyzing the performance, health, and reliability of your applications and infrastructure. It involves collecting data from various sources and using this data to identify and resolve issues proactively.

Importance of Monitoring in DevOps

Monitoring closes the feedback loop in a DevOps workflow: it surfaces problems before users report them, shortens the time it takes to detect and resolve incidents, and provides the data needed to validate deployments, plan capacity, and meet availability targets.

2. Key Metrics to Monitor

Infrastructure Metrics

CPU utilization, memory usage, disk I/O and capacity, and network throughput indicate whether your hosts, containers, and other infrastructure components are healthy.

Application Metrics

Request rate, error rate, latency, and saturation describe how the application itself is behaving from the perspective of its users.

Business Metrics

Metrics such as sign-ups, orders, or conversion rates connect system health to the outcomes the business cares about.

3. Prometheus

Overview of Prometheus

Prometheus is an open-source monitoring and alerting toolkit designed for reliability and scalability. It collects metrics from configured targets at given intervals, evaluates rule expressions, displays results, and triggers alerts if a condition is met.
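
To make "rule expressions" concrete, below is a minimal alerting rule sketch; the file name rules.yml, the group name, and the one-minute duration are illustrative choices. It would be referenced from prometheus.yml under rule_files, and actually delivering notifications would additionally require an Alertmanager.

    # rules.yml (example): fire when a scrape target has been unreachable for 1 minute
    groups:
      - name: example-alerts
        rules:
          - alert: InstanceDown
            expr: up == 0          # "up" is 1 while Prometheus can scrape the target
            for: 1m                # condition must hold this long before the alert fires
            labels:
              severity: critical
            annotations:
              summary: "Instance {{ $labels.instance }} is down"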

Setting Up Prometheus

  1. Install Prometheus:

    • Download and install Prometheus from the official website.
    • Extract the tarball and navigate to the Prometheus directory.
  2. Configure Prometheus:

    • Edit the prometheus.yml file to define the scrape interval and monitoring targets, for example:
    global:
      scrape_interval: 15s          # how often Prometheus scrapes each target

    scrape_configs:
      - job_name: 'node_exporter'   # job label attached to metrics from these targets
        static_configs:
          - targets: ['localhost:9100']   # Node Exporter's default port
  3. Start Prometheus:

    ./prometheus --config.file=prometheus.yml
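
Once Prometheus is running, a quick sanity check (assuming the default port 9090) is to call its built-in health endpoint:

    curl http://localhost:9090/-/healthy
    # returns HTTP 200 with a short "healthy" message when the server is up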

Collecting Metrics with Prometheus

  1. Install Node Exporter:

    • Download and install Node Exporter from the official website.
    • Start Node Exporter to collect system metrics.
    ./node_exporter
  2. Access Prometheus:

    • Open a web browser and navigate to http://localhost:9090 to access the Prometheus dashboard.
    • Use the Prometheus query language (PromQL) to explore and visualize metrics; a few example queries are sketched below.
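
For example, with Node Exporter scraped as configured above, PromQL expressions along these lines can be entered in the expression browser (exact metric names can vary by exporter version and operating system, and the 5m window is an arbitrary choice):

    # Per-core, per-mode CPU usage rate over the last 5 minutes
    rate(node_cpu_seconds_total[5m])

    # Fraction of memory currently available
    node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes

    # Which scrape targets are currently reachable (1) or not (0)
    up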

4. Grafana

Overview of Grafana

Grafana is an open-source platform for monitoring and observability. It allows you to visualize and analyze metrics collected from various sources, including Prometheus.

Setting Up Grafana

  1. Install Grafana:

    • Download and install Grafana from the official website.
    • Start the Grafana server.
    sudo systemctl start grafana-server
  2. Access Grafana:

    • Open a web browser and navigate to http://localhost:3000 to access the Grafana dashboard.
    • Log in with the default credentials (username: admin, password: admin).

Visualizing Metrics with Grafana

  1. Add a Data Source:

    • Navigate to Configuration > Data Sources and add Prometheus as a data source.
    • Enter the URL of your Prometheus server (e.g., http://localhost:9090). Alternatively, the data source can be provisioned from a file, as sketched after this list.
  2. Create a Dashboard:

    • Navigate to Create > Dashboard and add a new panel.
    • Use PromQL queries to select metrics from Prometheus and visualize them in Grafana panels.
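
As an alternative to clicking through the UI, the Prometheus data source can also be provisioned from a file. The sketch below uses Grafana's data source provisioning format; the file path and data source name are example choices:

    # /etc/grafana/provisioning/datasources/prometheus.yml (example path)
    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        access: proxy              # Grafana proxies queries to the data source
        url: http://localhost:9090
        isDefault: true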

5. ELK Stack

Overview of ELK Stack

The ELK Stack (Elasticsearch, Logstash, Kibana) is a popular open-source stack for searching, analyzing, and visualizing log data in real time.

Setting Up ELK Stack

  1. Install Elasticsearch:

    • Download and install Elasticsearch from the official website.
    • Start the Elasticsearch service.
    sudo systemctl start elasticsearch
  2. Install Logstash:

    • Download and install Logstash from the official website.
    • Create a configuration file (logstash.conf) to define the input, filter, and output stages.
    input {
      # read new lines appended to the system log
      file {
        path => "/var/log/syslog"
        start_position => "beginning"
      }
    }
    filter {
      # parse each line with the built-in syslog grok pattern
      grok {
        match => { "message" => "%{SYSLOGLINE}" }
      }
    }
    output {
      # ship parsed events to the local Elasticsearch instance
      elasticsearch {
        hosts => ["localhost:9200"]
      }
    }
  3. Start Logstash:

    bin/logstash -f logstash.conf
  4. Install Kibana:

    • Download and install Kibana from the official website.
    • Start the Kibana service.
    sudo systemctl start kibana

Logging with ELK Stack

  1. Access Kibana:

    • Open a web browser and navigate to http://localhost:5601 to access the Kibana dashboard.
    • Configure an index pattern to visualize the logs collected by Logstash; a quick way to confirm that matching indices exist is shown after this list.
  2. Create Visualizations and Dashboards:

    • Use Kibana to create visualizations and dashboards based on the log data stored in Elasticsearch.
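
A quick way to confirm that Logstash is writing events into Elasticsearch, and that there are indices for your index pattern to match, is to list them directly (assuming the default port 9200 and no authentication configured):

    curl 'http://localhost:9200/_cat/indices?v'
    # in a classic setup, Logstash writes to date-stamped indices named logstash-*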

6. Datadog

Overview of Datadog

Datadog is a cloud-based monitoring and analytics platform that provides comprehensive visibility into the health and performance of your applications and infrastructure.

Setting Up Datadog

  1. Sign Up for Datadog:

    • Create a Datadog account and copy your API key; the Agent installation command below requires it.
  2. Install the Datadog Agent:

    DD_AGENT_MAJOR_VERSION=7 DD_API_KEY=<YOUR_API_KEY> DD_SITE="datadoghq.com" bash -c "$(curl -L https://s3.amazonaws.com/dd-agent/scripts/install_script.sh)"
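
After the installer finishes, you can confirm the Agent is running and reporting before moving on:

    sudo datadog-agent status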

Monitoring with Datadog

  1. Configure Integrations:

    • Navigate to Integrations > Integrations and enable integrations for the services you want to monitor (e.g., AWS, Docker, Kubernetes).
  2. Create Dashboards:

    • Navigate to Dashboards > New Dashboard and add widgets to visualize metrics collected by Datadog.
  3. Set Up Alerts:

    • Navigate to Monitors > New Monitor and create alerts based on specific conditions (e.g., high CPU usage, error rates).
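
As an illustration, a metric monitor for sustained high CPU could use a query along these lines (the threshold and time window are arbitrary examples, and system.cpu.user assumes the standard system check shipped with the Agent):

    avg(last_5m):avg:system.cpu.user{*} > 80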

7. Best Practices and Tools

Best Practices for Monitoring

  • Alert on symptoms that affect users (availability, latency, error rate) rather than on every underlying cause, and keep every alert actionable to avoid alert fatigue.
  • Monitor all three layers together: infrastructure, application, and business metrics give context that any single layer lacks.
  • Build dashboards around the questions you ask during incidents, and review thresholds and alerts regularly as the system evolves.
  • Treat monitoring configuration (scrape configs, dashboards, alert rules) as code so it is versioned and reviewable.

By mastering monitoring with tools like Prometheus, Grafana, ELK Stack, and Datadog, you can ensure the health, performance, and reliability of your applications and infrastructure. Stay tuned for next week’s session, where we will explore logging. Happy monitoring!

Nihit Jain

Architecting DevOps 🏗️ with Data, AI, Security, & IoT on Cloud ☁️



