Grok logs stored in OpenStack Swift via middleware
OpenStack Swift is a distributed storage product that lets you store blobs efficiently and easily. Many people use such storage engines for textual data, often system logs or other kinds of structured text, so it makes sense to make reading this data easier.
Logstash is a well-known log shipper and processor, best known for its grok filter. With grok you can easily parse log data whose lines follow a known pattern: you provide the pattern as a regex (or as aliases to predefined regexes), and it is applied to every line of the log file to produce structured JSON with the extracted data.
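To make the mechanism concrete, here is a minimal Python sketch of the idea: `%{NAME:field}` aliases are expanded into named regex groups, and the resulting regex is applied to each line. This is not how Logstash implements grok; the `PATTERNS` table is a tiny hypothetical subset of the real pattern library, and `grok_to_regex` and `grok_lines` are names made up for this illustration.

```python
import json
import re

# A tiny, hypothetical subset of grok's pattern library; real grok ships many more.
PATTERNS = {
    "WORD": r"\b\w+\b",
    "NUMBER": r"[+-]?\d+(?:\.\d+)?",
    "NOTSPACE": r"\S+",
}


def grok_to_regex(pattern):
    """Expand %{NAME} and %{NAME:field} references into a compiled regex."""
    def expand(match):
        name, field = match.group(1), match.group(2)
        body = PATTERNS[name]
        # A reference with a field name becomes a named capture group;
        # one without a field name is matched but not captured.
        return f"(?P<{field}>{body})" if field else f"(?:{body})"

    return re.compile(re.sub(r"%\{(\w+)(?::(\w+))?\}", expand, pattern))


def grok_lines(pattern, text):
    """Apply the pattern to every line; return one dict per matching line."""
    regex = grok_to_regex(pattern)
    results = []
    for line in text.splitlines():
        match = regex.search(line)
        if match:
            results.append(match.groupdict())
    return results


if __name__ == "__main__":
    log = "GET /v1/AUTH_test/sample-log 200\nPUT /v1/AUTH_test/other 201\n"
    pattern = "%{WORD:verb} %{NOTSPACE:request} %{NUMBER:response}"
    for record in grok_lines(pattern, log):
        print(json.dumps(record))
```

Lines that do not match the pattern are simply skipped in this sketch; a real grok filter typically tags them as parse failures instead.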
For instance, an OpenStack Swift log line looks like this, very similar to an HTTP server log line:
Dec 12 23:35:48 vagrant-ubuntu-trusty-64 proxy-server: 127.0.0.1 127.0.0.1 12/Dec/2015/23/35/48 GET /v1/AUTH_test/sample-log/sample-log HTTP/1.0 200 - python-swiftclient-2.6.1.dev26 AUTH_tk262ff273b... - 71 - tx6398b237474f4a69a9a37-00566caf54 - 0.0057 - - 1449963348.351684093 1449963348.357352018 0
A grok pattern that extracts the important information from each line and outputs it as JSON looks like this:
%{SYSLOGTIMESTAMP:date} %{HOSTNAME:client} %{SYSLOGPROG:program} %{HOSTNAME} %{HOSTNAME} %{NOTSPACE} (?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest}) %{NUMBER:response} - %{QS:agent} %{NOTSPACE} - %{NUMBER:client_etag} - %{NOTSPACE:transaction_id} - %{NUMBER} - - %{NUMBER:request_start_time} %{NUMBER:request_end_time} %{NUMBER:policy_index}
Quite nice and easy to work with. However, if you store logs on Swift you have to read whole files from storage to execute the grok operation. Wouldn't it be nicer if you could get the grokked content directly from the storage engine?
This is what https://github.com/synhershko/swift-middleware-grok is for. It is a Swift middleware that, once installed, lets you specify a grok pattern and get the grokked content of a file instead of the file itself.
Usage looks something like this:
vagrant@saio:~/$ echo "awesome" > test
vagrant@saio:~/$ swift upload test test
vagrant@saio:~/$ swift download test test -o -
awesome
vagrant@saio:~/$ swift download test test --header "grok-pattern":"%{WORD:word}" -o -
{"word": "awesome"}
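Swift middleware are WSGI filters, so the behaviour above can be approximated with a plain WSGI wrapper. The following is a rough sketch of the idea only, not the code from the swift-middleware-grok repo: the `GrokMiddleware` class, the `compile_grok` helper, and the three-entry `PATTERNS` table are hypothetical, and only the `grok-pattern` header name comes from the usage example. It also assumes the whole object fits in memory, which a production middleware would want to avoid.

```python
import json
import re

# Hypothetical, minimal pattern table; a real middleware would use a full grok library.
PATTERNS = {"WORD": r"\b\w+\b", "NUMBER": r"[+-]?\d+(?:\.\d+)?", "NOTSPACE": r"\S+"}


def compile_grok(pattern):
    """Expand %{NAME:field} references into named regex groups."""
    def expand(m):
        body = PATTERNS[m.group(1)]
        return f"(?P<{m.group(2)}>{body})" if m.group(2) else f"(?:{body})"

    return re.compile(re.sub(r"%\{(\w+)(?::(\w+))?\}", expand, pattern))


class GrokMiddleware:
    """WSGI filter: on GET with a grok-pattern header, return grokked JSON lines."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # WSGI exposes the "grok-pattern" request header as HTTP_GROK_PATTERN.
        pattern = environ.get("HTTP_GROK_PATTERN")
        if environ.get("REQUEST_METHOD") != "GET" or not pattern:
            return self.app(environ, start_response)

        captured = {}

        def capture(status, headers, exc_info=None):
            # Hold back the wrapped app's status and headers until the body is rewritten.
            captured["status"], captured["headers"] = status, headers

        # Simplification: buffer the whole object body in memory.
        body = b"".join(self.app(environ, capture))
        if not captured["status"].startswith("2"):
            # Pass errors (404, 401, ...) through untouched.
            start_response(captured["status"], captured["headers"])
            return [body]

        regex = compile_grok(pattern)
        records = []
        for line in body.decode("utf-8", "replace").splitlines():
            match = regex.search(line)
            if match:
                records.append(json.dumps(match.groupdict()))
        out = ("\n".join(records) + "\n").encode("utf-8")

        # The body changed, so the original length, type, and ETag no longer apply.
        dropped = ("content-length", "content-type", "etag")
        headers = [(k, v) for k, v in captured["headers"] if k.lower() not in dropped]
        headers += [("Content-Type", "application/json"), ("Content-Length", str(len(out)))]
        start_response(captured["status"], headers)
        return [out]
```

Requests without the header fall straight through to the wrapped application, so ordinary downloads are unaffected in this sketch.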
More details are in the README in the GitHub repo. Contributions and comments are welcome.