Hello! This article is also available in German!

NGINX Performance Metrics with Prometheus

Prometheus is a combination of monitoring tool and time-series database that I have come to appreciate highly over the last few months. This article demonstrates how Prometheus can be used to visualize and monitor various web server metrics without changing the configuration of the web server itself.

Metrics are imported into Prometheus by pulling. This means that a monitored service needs to offer an HTTP endpoint which Prometheus queries at regular intervals (usually every 15 seconds). This endpoint (for example, http://<service-name>/metrics) needs to serve a response containing the respective time series data:

http_response_count_total{application="my-application"} 12423
http_response_time_seconds{application="my-application"} 1832745

This works really well in microservice architectures – here, every service can implement its own /metrics endpoint that produces whatever metrics it needs.
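To illustrate what such an endpoint looks like in practice (this is not the exporter discussed in this article), here is a minimal sketch in Go using the prometheus/client_golang library; the port, metric name, and label value are only placeholders:

package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// responseCount mirrors the example metric shown above; name and label are illustrative.
var responseCount = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "http_response_count_total",
		Help: "Total number of HTTP responses served.",
	},
	[]string{"application"},
)

func main() {
	prometheus.MustRegister(responseCount)

	// Regular application endpoint; increments the counter on every request.
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		responseCount.WithLabelValues("my-application").Inc()
		w.Write([]byte("Hello!"))
	})

	// The /metrics endpoint that Prometheus scrapes at regular intervals.
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":8080", nil)
}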

The Problem

This approach does not work that well when you want to use Prometheus to monitor performance metrics of (older) web applications served by a traditional LEMP stack (Linux, NGINX, MySQL, PHP). The reason for this is that a PHP application typically does not track metrics like the number of processed requests or the average request time (or would need to gather this data by itself and then persist it in a database).

There are multiple connectors available for integrating NGINX web servers with Prometheus – however, many of those just process the information available on the NGINX status page offered by the ngx_http_stub_status_module module (not enough detail for my use case), or require you to reconfigure the web server to send its data to another logging server (much too invasive for my cherished legacy application).
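For reference, that status page is enabled with a small location block like the following (the location path and access restrictions are just an example); it only exposes a handful of global connection counters, which is why it did not provide enough detail here:

location = /nginx_status {
    stub_status;        # provided by ngx_http_stub_status_module
    allow 127.0.0.1;    # restrict access to localhost
    deny all;
}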

My Solution

My solution to this challenge is an exporter of my own that generates performance metrics from existing NGINX access logs. This exporter is available as open source on GitHub and exports various metrics based on an existing access.log file (a sample of the exported output follows the list below):

  • Number of processed requests per HTTP request method and response status
  • The total sum of the time needed to process HTTP requests, per HTTP request method and status (together with the first metric, this can be used to compute the average response time)
  • The total sum of the “Upstream Time”, which is the time NGINX spends waiting for PHP-FPM or other FastCGI backends
  • Various quantiles of the response times and upstream times
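To give an idea of what this looks like when scraped, the exported output is roughly shaped like the following (metric names are prefixed with the configured namespace, described in the next section; the values and exact label sets here are made up for illustration):

app1_http_response_count_total{method="GET",status="200"} 12423
app1_http_response_time_seconds_sum{method="GET",status="200"} 1832.7
app1_http_response_time_seconds{method="GET",status="200",quantile="0.9"} 0.431
app1_http_upstream_time_seconds_sum{method="GET",status="200"} 1490.2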

Configuration

The NGINX exporter is configured using an HCL config file. In this config file, you can configure multiple “namespaces” – these will then be exported as separate sets of metrics. Have a look at the following configuration file as an example:

listen {
  port = 4040
  address = "0.0.0.0"
}

namespace "app1" {
  format = "$remote_addr - $remote_user [$time_local] \"$request\" $status $body_bytes_sent \"$http_referer\" \"$http_user_agent\" \"$http_x_forwarded_for\""
  source_files = ["/var/log/nginx/access.log"]
  labels {
    app = "my-application"
    environment = "production"
    foo = "bar"
  }
}

The exporter starts its own HTTP server that listens on the configured IP address and port (by default, 0.0.0.0:4040). Using the URL http://<IP>:4040/metrics, Prometheus can then scrape the respective performance metrics.
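On the Prometheus side, this only requires an additional scrape target in prometheus.yml; the job name and target address below are placeholders:

scrape_configs:
  - job_name: 'nginxlog'           # arbitrary job name
    scrape_interval: 15s
    static_configs:
      - targets: ['<IP>:4040']     # address of the exporter's /metrics endpoint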

The namespace name (app1, in this case) will later be used in the name of the exported metrics – for example, app1_http_response_time_seconds. The format (format) describes the format in which NGINX writes its access logs (for more information, refer to the documentation).
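One caveat worth mentioning: the format in the example above only contains the standard combined log fields. If your access log does not already include timing information, the response time and upstream time metrics have nothing to work with; an extended log_format along the following lines (the name timed is arbitrary) would provide the necessary variables:

# Extend the access log with request and upstream timing information
log_format timed '$remote_addr - $remote_user [$time_local] "$request" '
                 '$status $body_bytes_sent "$http_referer" "$http_user_agent" '
                 '$request_time $upstream_response_time';

access_log /var/log/nginx/access.log timed;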

Starting the Exporter

The exporter is a statically compiled Go binary that you can run immediately after downloading:

wget https://github.com/martin-helmich/prometheus-nginxlog-exporter/releases/download/v1.0.0/prometheus-nginxlog-exporter
chmod +x prometheus-nginxlog-exporter
./prometheus-nginxlog-exporter -config-file /path/to/config.hcl

To ensure that the exporter starts automatically on system startup, you can configure a systemd unit (available since Debian 8, Ubuntu 16.04, or CentOS 7). Place this file at /etc/systemd/system/prometheus-nginxlog-exporter.service:

[Unit]
Description=NGINX metrics exporter for Prometheus
After=network-online.target

[Service]
ExecStart=/usr/local/bin/prometheus-nginxlog-exporter -config-file /etc/prometheus-nginxlog-exporter.hcl
Restart=always
ProtectSystem=full
CapabilityBoundingSet=

[Install]
WantedBy=multi-user.target

Note that this unit file expects the executable to be located at /usr/local/bin/prometheus-nginxlog-exporter and the configuration file to be located at /etc/prometheus-nginxlog-exporter.hcl. Of course, you can adjust these paths as required.
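After placing the unit file, the service can be enabled and started with the usual systemd commands:

sudo systemctl daemon-reload
sudo systemctl enable prometheus-nginxlog-exporter
sudo systemctl start prometheus-nginxlog-exporter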

The Result

I have been operating the prometheus-nginxlog-exporter in production for some time now. Especially together with Grafana, you can easily create excellent reports and monitoring dashboards:

NGINX monitoring in action

The diagrams in the screenshot above were generated from the following Prometheus queries:

  • Average response time: sum(rate(app_http_response_time_seconds_sum[5m])) by (instance) / sum(rate(app_http_response_time_seconds_count[5m])) by (instance)
  • Requests per second: sum(rate(app_http_response_time_seconds_count[1m])) by (instance)
  • Response time (90% quantile): app_http_response_time_seconds{quantile="0.9",method="GET",status="200"}
  • HTTP traffic: sum(rate(app_http_response_size_bytes[5m])) by (instance)
  • Status codes per second: sum(rate(app_http_response_count_total[1m])) by (status)
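The same expressions can also be used outside of dashboards. Purely as a sketch (the rule file layout, threshold, and labels are assumptions; the metric prefix app matches the queries above), a Prometheus alerting rule built on the average response time query could look like this:

groups:
  - name: nginx-latency
    rules:
      - alert: HighAverageResponseTime
        # average response time over the last 5 minutes, per instance
        expr: sum(rate(app_http_response_time_seconds_sum[5m])) by (instance) / sum(rate(app_http_response_time_seconds_count[5m])) by (instance) > 0.5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Average response time above 500ms on {{ $labels.instance }}"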
