Parsing NGINX error logs with PHP

Joeri Poesen

nginx php fpm

I needed a way to fetch and parse various system logs, including the NGINX access and error logs.

Here's an example log entry that I needed to parse into its separate fields (it is a single line in the log file, wrapped here for readability):

2017/07/16 10:35:16 [error] 1365#1365: *5 FastCGI sent in stderr: "PHP message: PHP Parse error: syntax error, unexpected '$autoloader' (T_VARIABLE) in /var/www/path/to/project/index.php on line 14" while 
reading response header from upstream, client: 192.168.33.1, server: example.dev, request: "GET / HTTP/1.1", 
upstream: "fastcgi://unix:/run/php/php7.0-fpm.sock:", host: "example.dev"

A quick way to start parsing NGINX error logs with PHP is by using kassner/log-parser.

While I could have used a big one-off regular expression, I like using this parsing library: it has a clean interface and is extensible enough to reuse on different log formats.
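For comparison, here is a rough sketch of what the one-off regular expression route could look like in plain PHP. The pattern and the variable names ($line, $regex, $entry) are illustrative assumptions, not code from the library or from the original project.

```php
<?php
// The sample entry from above, as the single line it occupies in the log file.
$line = '2017/07/16 10:35:16 [error] 1365#1365: *5 FastCGI sent in stderr: '
    . '"PHP message: PHP Parse error: syntax error, unexpected \'$autoloader\' (T_VARIABLE) '
    . 'in /var/www/path/to/project/index.php on line 14" while reading response header from upstream, '
    . 'client: 192.168.33.1, server: example.dev, request: "GET / HTTP/1.1", '
    . 'upstream: "fastcgi://unix:/run/php/php7.0-fpm.sock:", host: "example.dev"';

// One monolithic pattern with a named group per field.
$regex = '~^(?P<datetime>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) '
    . '\[(?P<errorlevel>\w+)\] '
    . '(?P<processid>\d+)#(?P<threadid>\d+): '
    . '\*(?P<connectionid>\d+) '
    . '(?P<message>.+), client: (?P<client>[^,]+), server: (?P<server>[^,]+), '
    . 'request: "(?P<request>[^"]*)", '
    . 'upstream: "(?P<upstream>[^"]*)", '
    . 'host: "(?P<host>[^"]*)"$~';

$entry = [];
if (preg_match($regex, $line, $m) === 1) {
    // preg_match returns numeric keys as well; keep only the named groups.
    $entry = array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);
}

echo $entry['client'], PHP_EOL; // 192.168.33.1
```

It works, but every change to the format means editing one long expression, which is the maintenance cost the library approach avoids.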

1. Use composer to fetch kassner/log-parser:

composer require kassner/log-parser:~1.0

2. Instantiate a parser:

$parser = new \Kassner\LogParser\LogParser();

3. Tell the parser which log file format to expect.

This is where it gets tricky. The parser is built on pre-defined regular-expression patterns, each tied to a placeholder, and you define a log format by combining those placeholders into a format string.

The parser assumes the standard Apache access log format by default, but that won't work for NGINX error logs. Its documentation mentions the format to use for NGINX access logs, but not how to parse NGINX error logs.

The solution:

  • add additional named regex patterns
  • use the additional patterns to build up the format used in NGINX error logs
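The general idea behind these two steps can be sketched in plain PHP: a map of placeholders to regex fragments, and a format string that is expanded into one full expression. This only illustrates the technique and is not the library's actual implementation; buildRegex, the placeholder names, and the sample line are hypothetical.

```php
<?php
// Hypothetical helper: expand placeholders in a format string into one regex.
function buildRegex(array $patterns, string $format): string
{
    // strtr tries the longest placeholder first and never rescans its own output.
    return '~^' . strtr($format, $patterns) . '$~';
}

// Each placeholder maps to a small regex with its own named capture group.
$patterns = [
    '%DT'  => '(?P<datetime>[\d/ :]+)',
    '%LVL' => '\[(?P<errorlevel>\w+)\]',
    '%MSG' => '(?P<message>.+)',
];

$regex = buildRegex($patterns, '%DT %LVL %MSG');
preg_match($regex, '2017/07/16 10:35:16 [error] something went wrong', $m);

echo $m['datetime'], ' | ', $m['errorlevel'], ' | ', $m['message'], PHP_EOL;
// 2017/07/16 10:35:16 | error | something went wrong
```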

4. Define new named patterns.

Have a look at the example error log entry at the top of this article. It consists of a number of elements, each of which needs to be defined and targeted with an individual regular expression.

I've chosen to prefix the pattern placeholder names with '%NGX' to be sure their names won't clash with the pre-defined patterns. You're free to name the placeholders ('%NGXDT') and the capture groups inside their regex patterns ('datetime') however you want.

$parser->addPattern('%NGXDT', '(?P<datetime>[\d/ :]+)');
$parser->addPattern('%NGXLVL', '\[(?P<errorlevel>.+)\]');
$parser->addPattern('%NGXPID', '(?P<processid>\d+(?=\#))');
$parser->addPattern('%NGXTID', '(?P<threadid>(?<=\#)\d+)');
$parser->addPattern('%NGXCID', '(?P<connectionid>(?<=\:\s\*)\d+)');
$parser->addPattern('%NGXMSG', '(?P<message>.+)');
$parser->addPattern('%NGXCL', '(?P<client>.+)');
$parser->addPattern('%NGXSRV', '(?P<server>.+)');
$parser->addPattern('%NGXREQ', '(?P<request>.+)');
$parser->addPattern('%NGXUPS', '(?P<upstream>.+)');
$parser->addPattern('%NGXHST', '(?P<host>.+)');
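Before wiring the patterns into a format, it can help to try the less obvious ones in isolation. The sketch below runs the three lookaround-based patterns against a small fragment with plain preg_match. The fragment is a made-up example with different process and thread ids, so the two captures are easy to tell apart.

```php
<?php
// Hypothetical fragment in the "pid#tid: *connection" layout of an NGINX error line.
$fragment = '1365#1370: *5';

// Digits that are followed by "#" (lookahead): the process id.
preg_match('~(?P<processid>\d+(?=\#))~', $fragment, $pid);

// Digits that are preceded by "#" (lookbehind): the thread id.
preg_match('~(?P<threadid>(?<=\#)\d+)~', $fragment, $tid);

// Digits that are preceded by ": *" (lookbehind): the connection id.
preg_match('~(?P<connectionid>(?<=\:\s\*)\d+)~', $fragment, $cid);

echo $pid['processid'], ' ', $tid['threadid'], ' ', $cid['connectionid'], PHP_EOL;
// 1365 1370 5
```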

5. Build up the NGINX error log format using the new named patterns:

$parser->setFormat('%NGXDT %NGXLVL %NGXPID\#%NGXTID: \*%NGXCID %NGXMSG, client: %NGXCL, server: %NGXSRV, request: "%NGXREQ", upstream: "%NGXUPS", host: "%NGXHST"');

NOTE: pay particular attention to whitespace and special characters. The asterisk and pound sign must be escaped with a backslash so they are matched literally.
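Why the backslashes matter can be shown with plain preg_match. An unescaped asterisk is a quantifier, so ": *" means "a colon followed by any number of spaces" rather than a literal asterisk. The pound sign is presumably escaped because the library wraps the format in "#" delimiters. That is an assumption about its internals, so the sketch only demonstrates the general PCRE behaviour on a fragment of the sample line.

```php
<?php
$fragment = '1365#1365: *5 FastCGI sent in stderr';

// Unescaped: " *" is "zero or more spaces", so the literal asterisk blocks the match.
$unescaped = preg_match('~^\d+#\d+: *(?P<connectionid>\d+)~', $fragment);

// Escaped: "\*" matches the literal asterisk in front of the connection id.
$escaped = preg_match('~^\d+#\d+: \*(?P<connectionid>\d+)~', $fragment, $m);

// With "#" as the delimiter, a literal "#" inside the pattern has to be escaped;
// left unescaped, it would end the pattern early.
$hashDelimited = preg_match('#^\d+\#\d+: \*\d+#', $fragment);

var_dump($unescaped, $escaped, $m['connectionid'], $hashDelimited);
```

The first call finds no match, while the second and third both match and the connection id comes out as "5".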

6. Read the log file and parse each line:

$entries = [];
$lines = file('/var/log/nginx/error.log', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
foreach ($lines as $line) {
    $entries[] = $parser->parse($line);
}
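A real error log usually also contains lines without the full client/server/request context, and those will not match the format defined above. The library appears to throw an exception from parse() in that case (check the installed version for the exact class), so wrapping the call in try/catch is one option. The sketch below shows the same "collect what parses, keep what doesn't" idea with plain preg_match so it runs without the library. The simplified pattern and both sample lines are assumptions made for illustration.

```php
<?php
// Hypothetical sample: one request-related error and one line without a connection id.
$lines = [
    '2017/07/16 10:35:16 [error] 1365#1365: *5 open() "/var/www/favicon.ico" failed (2: No such file or directory), client: 192.168.33.1, server: example.dev, request: "GET /favicon.ico HTTP/1.1", host: "example.dev"',
    '2017/07/16 10:35:10 [notice] 1365#1365: signal process started',
];

// Simplified pattern: the header fields, then the rest of the line as the message.
$regex = '~^(?P<datetime>[\d/ :]+) \[(?P<errorlevel>\w+)\] '
    . '(?P<processid>\d+)#(?P<threadid>\d+): \*(?P<connectionid>\d+) (?P<message>.+)$~';

$entries = [];
$unparsed = [];

foreach ($lines as $number => $line) {
    if (preg_match($regex, $line, $m) !== 1) {
        // Keep the raw line, keyed by its index, so nothing is silently lost.
        $unparsed[$number] = $line;
        continue;
    }
    // Drop the numeric keys and keep only the named groups.
    $entries[] = array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);
}

printf("%d parsed, %d skipped\n", count($entries), count($unparsed));
// 1 parsed, 1 skipped
```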

If all went well, the $entries array should look like this:

Array
(
    [0] => stdClass Object
        (
            [datetime] => 2017/07/16 10:35:16
            [errorlevel] => error
            [processid] => 1365
            [threadid] => 1365
            [connectionid] => 5
            [message] => FastCGI sent in stderr: "PHP message: PHP Parse error:  syntax error, unexpected '$autoloader' (T_VARIABLE) in /var/www/path/to/project/index.php on line 14" while reading response header from upstream
            [client] => 192.168.33.1
            [server] => example.dev
            [request] => GET / HTTP/1.1
            [upstream] => fastcgi://unix:/run/php/php7.0-fpm.sock:
            [host] => example.dev
        )

)

Success!

Conclusion

Unless you find a tool that is able to parse all the log formats you're interested in, you'll need to come up with one or more regular expressions to parse and process your log lines. kassner/log-parser makes this process easier by allowing you to break down what would otherwise be a huge regex into smaller regex patterns, one for each element in your log line.

Once defined, you can reuse your patterns across different projects. Never reinvent the wheel again.