LogStash
LogStash is a utility that can read, filter and store or transmit log data on a server.
The program runs as a Java Application using Java 1.6.
Multiple inputs can be used such as files, stdin, syslog etc.
There are multiple different filters that can be applied using grep, grok, json, etc.
Then the output from the program can be sent to one of many different destinations such as amqp, file, mongodb (webscale >.>;), etc.
Basic Setup for LogStash
The basic setup mainly requires Java 1.6 installed on the server. There will be a few other requirements in order to use the Java wrapper by Tanuki.
For my setup I set everything in /opt/logstash.
The basic setup will not cover configuration of logstash itself since that will need to be set for each role.
[bash,n]
yum install java-1.6.0-openjdk
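You can quickly confirm the JVM is in place afterwards:
[bash,n]
java -version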
Create directory structure.
This may change a little as I continue to play around with the system but here is the basic structure for now.
Please note that you will need to set the basedir to where you would like to install logstash.
[bash,n]
basedir=/opt/logstash
mkdir -p $basedir/conf
mkdir $basedir/lib
mkdir $basedir/logs
mkdir $basedir/tmp
Download the Tanuki Java Wrapper for your system.
Make sure you download the community version unless you want to give them all your monies.
Modify the wget line if you are using something other than Linux x64.
$basedir/shipper can be changed to describe the function logstash is going to be performing. I originally had it set to logstash but since the first 4 characters match the logs dir it made it hard to call manually.
[bash,n]
wget http://wrapper.tanukisoftware.com/download/3.5.14/wrapper-linux-x86-64-3.5.14.tar.gz -O /home/temp/wrapper.tar.gz
tar zxf /home/temp/wrapper.tar.gz -C /home/temp
cp /home/temp/wrapper-linux*/src/conf/wrapper.conf.in $basedir/conf/wrapper.conf
cp /home/temp/wrapper-linux*/src/bin/sh.script.in $basedir/shipper
cp /home/temp/wrapper-linux*/src/bin/wrapper $basedir/lib/
cp /home/temp/wrapper-linux*/lib/libwrapper.so $basedir/lib/
cp /home/temp/wrapper-linux*/lib/wrapper.jar $basedir/lib/
rm -rf /home/temp/wrapper-linux*
rm -f /home/temp/wrapper.tar.gz
chmod +x $basedir/shipper
Configure wrapper
There will be several different files to edit for the wrapper.
logstash sh script
[bash,n]
vim $basedir/shipper
| APP_NAME | This will be the name of the application, and will be the service name if you activate it as a service. You can set it to something generic like logstash or define what it is like logstash-shipper. |
| APP_LONG_NAME | This is the plain text name for the application, this will be displayed when the service is starting. |
| WRAPPER_CMD | This should be set to "./lib/wrapper" |
| WRAPPER_CONF | This should be set to "./conf/wrapper.conf" |
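For reference, once edited the relevant lines in the sh script should end up looking something like this (the APP_NAME and APP_LONG_NAME values are just examples for a shipper):
[bash,n]
APP_NAME="logstash-shipper"
APP_LONG_NAME="LogStash Shipper"
WRAPPER_CMD="./lib/wrapper"
WRAPPER_CONF="./conf/wrapper.conf"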
Wrapper configuration
[bash,n]
vim $basedir/conf/wrapper.conf
Do note that some of the values will be commented out by default; obviously they need to be uncommented.
Also, some of the numbered properties will not exist by default; just add them below the existing .1 entry.
These use relative paths instead of a fixed path, which means you can put the install directory anywhere and just run the script.
| wrapper.java.mainclass | This needs to be changed to org.tanukisoftware.wrapper.WrapperJarApp |
| wrapper.java.classpath.1 | ../lib/wrapper.jar |
| wrapper.java.classpath.2 | ../lib/logstash.jar |
| wrapper.java.library.path.1 | ../lib |
| wrapper.java.additional.1 | -Djava.io.tmpdir=../tmp |
| wrapper.app.parameter.1 | ../lib/logstash.jar |
| wrapper.app.parameter.2 | agent |
| wrapper.app.parameter.3 | -f |
| wrapper.app.parameter.4 | ../conf/logstash.conf |
| wrapper.app.parameter.5 | -l |
| wrapper.app.parameter.6 | ../logs/logstash.log |
| wrapper.logfile | ../logs/wrapper.log |
| wrapper.logfile.maxsize | 100m |
| wrapper.logfile.maxfiles | 2 |
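Put together, the edited properties in wrapper.conf should look roughly like this:
[bash,n]
wrapper.java.mainclass=org.tanukisoftware.wrapper.WrapperJarApp
wrapper.java.classpath.1=../lib/wrapper.jar
wrapper.java.classpath.2=../lib/logstash.jar
wrapper.java.library.path.1=../lib
wrapper.java.additional.1=-Djava.io.tmpdir=../tmp
wrapper.app.parameter.1=../lib/logstash.jar
wrapper.app.parameter.2=agent
wrapper.app.parameter.3=-f
wrapper.app.parameter.4=../conf/logstash.conf
wrapper.app.parameter.5=-l
wrapper.app.parameter.6=../logs/logstash.log
wrapper.logfile=../logs/wrapper.log
wrapper.logfile.maxsize=100m
wrapper.logfile.maxfiles=2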
Obviously you can change the logfile settings depending on how much logging you want.
Get the logstash jar file
You will need the logstash jar file from the logstash site.
This can also be used to update to the latest version (obviously the url will need to be updated).
[bash,n]
wget http://semicomplete.com/files/logstash/logstash-1.1.0-monolithic.jar -O $basedir/lib/logstash.jar
Logstash Configuration
Ok here is where things get interesting. As of right now you should have a working wrapper and logstash will be ready once this configuration is added.
There are multiple roles you can assign the logstash agent, the two main ones we will be exploring will be shipper and indexer. Obviously there are more roles if you wish.
Logstash Shipper
The goal of the shipper is to get the logs from the local server and send them somewhere. You have the option of processing the logs either on the server or just shipping them off.
The main advantage of processing them on the server is that you can offload some of the workload across multiple servers and keep the resource usage on the receiving end to a minimum.
The main advantage of processing them on the receiving end is that you can manage your filter rules in one place, which is the way I am going to set it up. But obviously this is all dependent on how you are planning on using logstash. If you want it to only send alerts you will want to do the filtering on the source servers.
[bash,n]
vim $basedir/conf/logstash.conf
[bash,n]
input {
file {
type => "syslog"
# Wildcards work here :)
path => [ "/var/log/messages", "/var/log/syslog", "/var/log/maillog", "/var/log/faillog", "/var/log/cron", "/var/log/lfd.log", "/var/log/secure", "/var/log/yum.log" ]
}
file {
type => "exim"
path => [ "/var/log/exim_mainlog", "/var/log/exim_paniclog", "/var/log/exim_reject_log" ]
}
file {
type => "apache-access"
path => [ "/usr/local/apache/domlogs/*/*", "/usr/local/apache/logs/access_log" ]
}
file {
type => "apache-error"
path => "/usr/local/apache/logs/error_log"
}
file {
type => "cpanel"
path => [ "/usr/local/cpanel/logs/access_log", "/usr/local/cpanel/logs/error_log", "/usr/local/cpanel/logs/login_log" ]
}
exec {
type => "system-memory"
command => "free"
interval => "120"
}
}
output {
# Output events to stdout for debugging. Feel free to remove
# this output if you don't need it.
#stdout { }
# Ship events to the amqp fanout exchange named 'rawlogs'
amqp {
host => "10.0.123.12"
exchange_type => "fanout"
name => "rawlogs"
}
}
This is the actual configuration for my current shipper on my web server. I will go through and explain the main points.
First there are three main sections to a logstash configuration file: input, filter, output.
Notice I do not have a filter section in this config file; we will cover that in the indexer configuration section.
Input
The first thing you will notice is under the input section there are individual inputs that are labeled file or exec.
These are the individual input streams that are being grabbed by logstash. Keep in mind that there are actually many different inputs.
(amqp, exec, file, gelf, redis, stdin, stomp, syslog, tcp, twitter, xmpp, zeromq)
For our purposes we mainly want to tail files and execute commands.
So let's look at an example of a file input stream:
[bash,n]
file {
type => "syslog"
# Wildcards work here :)
path => [ "/var/log/messages", "/var/log/syslog", "/var/log/maillog", "/var/log/faillog", "/var/log/cron", "/var/log/lfd.log", "/var/log/secure", "/var/log/yum.log" ]
}
Type The type tag identifies the stream name; this will be used later to determine which filters to apply to it. You can define these any way you want, but they should be grouped together by log format type.
Path This is the actual file or list of files that you want to include in the input stream.
This can actually be written in several different formats:
[bash,n]
path => [ "/var/log/messages", "/var/log/syslog", "/var/log/maillog", "/var/log/faillog", "/var/log/cron", "/var/log/lfd.log", "/var/log/secure", "/var/log/yum.log" ]
[bash,n]
path => "/var/log/messages"
[bash,n]
path => "/var/log/messages" path => "/var/log/syslog" path => "/var/log/maillog" path => "/var/log/faillog" path => "/var/log/cron" path => "/var/log/lfd.log" path => "/var/log/secure" path => "/var/log/yum.log"
The first example is to set up all of the files in an array. Obviously this saves the most space, but can be hard to read once you get a few files added.
The second example is if you want to include only a single file in to the stream.
The third example is showing that you can have multiple path defines. This is the easiest to read, but will also make the configuration file larger.
There are many different defines for the file input, you can see them all here: http://logstash.net/docs/1.1.0/inputs/file
The second input type we will use is exec.
[bash,n]
exec {
type => "system-memory"
command => "free"
interval => "120"
}
Type functions the exact same way that it does in the file input.
Command This is the command that you wish to execute. You can also supply flags to the command, such as free -m.
Interval This sets how often you would like to execute the command in seconds.
As of the current version the output from a command is recorded as a single log line; there are plans to add a split value in the future to allow splitting on newlines or other delimiters. For something like free this is fine, as the output really is a single snapshot. For something like ps you may want to split it, although logging-wise it is still one output.
Output
So now that we have the log information, what do we want to do with it?
Just as there are multiple different inputs there are multiple different outputs:
(amqp, elasticsearch, elasticsearch_river, file, ganglia, gelf, graphite, internal, loggly, mongodb, nagios, null, redis, statsd, stdout, stomp, tcp, websocket, xmpp, zabbix, zeromq)
For our shipper example we are going to use amqp.
amqp is basically a message queueing system that will be running on the remote server. It acts as a queue for log messages as they are being processed. The nice thing about this is you can stop the indexers without worrying about losing messages.
We will explore a few more outputs on the indexer configuration, but here is the basic output configuration for amqp.
[bash,n]
amqp {
host => "10.0.123.12"
exchange_type => "fanout"
name => "rawlogs"
}
Host Obviously this is the host name or IP for the amqp server.
Exchange_Type This defines how the exchange will be handled, I am not sure what the values mean.
Name This is the overall container name for the output. If you want to run different indexers with different functions you can split this up into different named groups such as rawlogs and alerts.
There are different defines for this also, you can see them all here: http://logstash.net/docs/1.1.0/outputs/amqp
The example configuration also has an output called stdout that is commented out.
This allows you to mirror the output information to the terminal, or in our case to the logstash.log.
The advantage to this is you can see exactly what the shipper is sending to the amqp server; obviously, if you are not actively using it, turn it off to save space.
Execution and autostart
Ok, now for some testing. At this point we do not have the remote server set up, so the logs will complain about not being able to talk to the amqp server, but we can still test that everything works and set up the chkconfig entry.
You can start the logstash shipper with the following command
[bash,n]
$basedir/shipper start
If everything goes as planned it should say that it started properly and you can check the status with:
[bash,n]
$basedir/shipper status
One other helpful option may be to start the script using console to see the output directly on the screen (for troubleshooting):
[bash,n]
$basedir/shipper console
If for some reason it is not running you are likely getting a java error somewhere :|.
You can check the logs to see what is going on or try to manually run the application to see if there is an error there.
To set up auto start (init.d) you can simply run this command.
[bash,n]
$basedir/shipper install
At this point you can execute all of the logstash commands using init.d or the service command.
Do keep in mind that the application name set in the configuration will be the actual service name, so for our example:
[bash,n]
service logstash-shipper status
Package and go
Ok, here is the really cool part about this setup: once you get a working shipper with the log files you want to ship out, you can tar.gz the directory and move this package to as many servers as you want.
The only real requirement of this system is that you are running Java 1.6. Everything else is now included in the install directory.
This means that you can do the first build and quickly deploy it to many different servers. The logstash.conf file may need to be altered if you are running different software that has different logs but that is fairly trivial.
You may as well package the files since we can use the exact same setup on the receiving side as a baseline.
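A minimal sketch of what that deployment could look like (the archive name and target host here are just placeholders):
[bash,n]
cd /opt
tar czf logstash-shipper.tar.gz logstash
scp logstash-shipper.tar.gz web02:/opt/
ssh web02 "cd /opt && tar xzf logstash-shipper.tar.gz && /opt/logstash/shipper install && service logstash-shipper start"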
Logstash Indexer
You will probably want to rename the application in the wrapper sh script (and perhaps the script itself) to something like logstash-indexer:
[bash,n]
vim $basedir/shipper
If you are looking for a web interface for searching (not using graylog2) you can enable it in the wrapper.conf file
[bash,n]
vim $basedir/conf/wrapper.conf
We just need to add one option to be passed to the script
[bash,n]
wrapper.app.parameter.7=web
Here is our basic code for the indexer ($basedir/conf/logstash.conf):
[bash,n]
input {
amqp {
# Pull events from the 'rawlogs' fanout exchange.
type => "all"
host => "127.0.0.1"
exchange => "rawlogs"
name => "rawlogs_consumer"
durable => "true"
exclusive => "false"
auto_delete => "false"
}
}
filter {
grok {
type => "syslog" # for logs of type "syslog"
pattern => "%{SYSLOGLINE}"
# You can specify multiple 'pattern' lines
}
grok {
type => "apache-access" # for logs of type 'apache-access'
pattern => "%{COMBINEDAPACHELOG}"
}
date {
type => "syslog"
# The 'timestamp' and 'timestamp8601' names are for fields in the
# logstash event. The 'SYSLOGLINE' grok pattern above includes a field
# named 'timestamp' that is set to the normal syslog timestamp if it
# exists in the event.
timestamp => "MMM  d HH:mm:ss" # syslog 'day' value can be space-leading
timestamp => "MMM dd HH:mm:ss"
timestamp8601 => ISO8601 # Some syslogs use ISO8601 time format
}
date {
type => "apache-access"
timestamp => "dd/MMM/yyyy:HH:mm:ss Z"
}
}
output {
#stdout { }
# If your elasticsearch server is discoverable with multicast, use this:
#elasticsearch { }
# If you can't discover using multicast, set the address explicitly
elasticsearch {
host => "127.0.0.1"
}
}
Input
Just like with the shipper you will need to set where you would like to get the information for the indexer. In this case this will be from the amqp server.
Type This is the name of the filter group you want these inputs to be labeled as; "all" will match all filter types. This allows you to direct the input to specific filters.
Host This is the hostname or ip for the amqp server, in this case we are running it on the same machine.
Exchange This should be the name of the stream that was sent from the shipper, in this case it is rawlogs but it could also be something like alerts.
Name This is the queue name for the amqp server, this is optional but will make things a little easier to understand on the amqp server.
I do want to point out that the following defines were not provided by the logstash wiki; these were added as tweaks to force RabbitMQ to store the logs on the server if the indexer is not running and to allow more than one indexer to run on the queue. I think their idea is that different indexers will have different queues, so if one closes, its own queue can go away. But if you only have one indexer you will lose any messages from the time that the indexer was not running.
Durable This sets the queue to be durable so it remains even if the RabbitMQ server is restarted.
Exclusive By default the queue only allows one consumer; setting this to false allows more than one indexer to run on the queue.
Auto_Delete The default action is to nuke the queue when the last indexer disconnects, which is kinda silly if you want to keep your log messages that are not yet processed.
Filter
So, what are filters? Basically filters allow you to sort through incoming data, decide what to do with it, and process the log into something more meaningful.
You can set up a filter to remove any 200 responses from an apache log, since they really do not need to be logged if you do not want them to be.
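For example, a rough sketch of that with the grep filter might look like the following (placed on the indexer after the grok filter so the response field exists; negate drops matching events, but double-check the grep filter docs for your version before relying on it):
[bash,n]
grep {
type => "apache-access"
# drop apache events whose parsed response code is 200
match => [ "response", "^200$" ]
negate => true
}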
You can also set up a filter to take the raw apache log line and transform it into a more usable format that can be cataloged and browsed more easily. The main part of this is identifying the individual fields inside the log.
As stated before these can be set up in the shipper or on the indexer, in our setup we are doing these on the indexer to make management a little easier.
There are multiple different filter mechanisms:
(date, dns, gelfify, grep, grok, grokdiscovery, json, multiline, mutate, split)
The main ones we will be using are grok and date. Anything that does not have a filter will be passed through raw, which is fine too.
[bash,n]
grok {
type => "syslog" # for logs of type "syslog"
pattern => "%{SYSLOGLINE}"
# You can specify multiple 'pattern' lines
}
Grok is a filter manager that allows you to create complex filters from smaller parts and easily reference them with a single name. There are a few default patterns, and you can also build your own.
Type This is the name of the log stream as specified by the shipper. This should be something like 'apache-access' or 'apache-error'.
Pattern This is the filter identifier; you can use the few defaults that are provided or make your own. In this example we are using the SYSLOGLINE pattern.
[bash,n]
date {
type => "syslog"
# The 'timestamp' and 'timestamp8601' names are for fields in the
# logstash event. The 'SYSLOGLINE' grok pattern above includes a field
# named 'timestamp' that is set to the normal syslog timestamp if it
# exists in the event.
timestamp => "MMM  d HH:mm:ss" # syslog 'day' value can be space-leading
timestamp => "MMM dd HH:mm:ss"
timestamp8601 => ISO8601 # Some syslogs use ISO8601 time format
}
You can specify more than one filter for a single file stream.
In this example we create an additional filter to handle the date in the message.
This provides the actual date from the message instead of when it was read from the file. Obviously every log file seems to have its own date format so you will be using this a lot.
Output
Just like the shipper, we have to do something with the information once we have processed it.
[bash,n]
output {
#stdout { }
# If your elasticsearch server is discoverable with multicast, use this:
#elasticsearch { }
# If you can't discover using multicast, set the address explicitly
elasticsearch {
host => "127.0.0.1"
}
}
In this case we are going to store all of our data in elasticsearch to allow for full text searching of the logs.
Host Self-explanatory: this is the hostname or IP of the elasticsearch server. In this case we have it on the same server.
Just like the shipper, there is a stdout output for debugging the filtered logs before they are sent to elasticsearch (or wherever); the output will be saved to the logstash.log file.
Setup amqp server
In this example we are going to run RabbitMQ.
One thing to keep in mind with RabbitMQ: with the current version it does not look like there is an easy way to cap its memory usage, so it could use a lot of memory if the queue gets large and is not getting indexed fast enough. Do keep in mind that it will not use more memory than the system has, but the other Java applications may need some too. I have had a few instances where swapping occurred on my server, but then again I only gave it 2GB of RAM.
Enable EPEL Repo
CentOS 5
[bash,n]
rpm -Uvh http://download.fedoraproject.org/pub/epel/5/i386/epel-release-5-4.noarch.rpm
CentOS 6
[bash,n]
rpm -Uvh http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-5.noarch.rpm
Obviously it is up to you whether you want to enable the yum priorities plugin and set up the repos properly, or just disable the repos when you are done.
Enable EPEL Erlang Repo
You only need to do this if you are running CentOS 5
CentOS 6 already has a new enough version of Erlang, so you can skip this step there.
[bash,n]
wget -O /etc/yum.repos.d/epel-erlang.repo http://repos.fedorapeople.org/repos/peter/erlang/epel-erlang.repo
Install RabbitMQ
Their install instructions tell you to install Erlang first... But since we are using yum, the Erlang dependency will be pulled in automatically.
[bash,n]
wget http://www.rabbitmq.com/releases/rabbitmq-server/v2.8.1/rabbitmq-server-2.8.1-1.noarch.rpm
rpm --import http://www.rabbitmq.com/rabbitmq-signing-key-public.asc
yum install rabbitmq-server-2.8.1-1.noarch.rpm
At this point you should be able to start the service and it does not really require any additional configuration.
Start RabbitMQ
This runs like any other service; you can start it and set it to auto start with:
[bash,n]
chkconfig rabbitmq-server on
service rabbitmq-server start
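To sanity check that the broker is up, and later to watch the rawlogs queue fill and drain, rabbitmqctl is handy:
[bash,n]
rabbitmqctl status
rabbitmqctl list_queues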
Setup ElasticSearch
The LogStash docs give us this nifty install script; run it from the location where you want ElasticSearch installed. In my setup I used /opt/elasticsearch.
[bash,n]
ES_PACKAGE=elasticsearch-0.18.7.zip
ES_DIR=${ES_PACKAGE%%.zip}
SITE=https://github.com/downloads/elasticsearch/elasticsearch
if [ ! -d "$ES_DIR" ] ; then
wget --no-check-certificate $SITE/$ES_PACKAGE
unzip $ES_PACKAGE
fi
If you want a wrapper like we have for logstash (Tanuki) you can get it here:
https://github.com/elasticsearch/elasticsearch-servicewrapper
I will probably rewrite this later with the same instructions that I have for the logstash wrapper.
Essentially it is the same thing, just set up for ElasticSearch
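If you skip the service wrapper for now, you can start ElasticSearch straight from the extracted directory for a quick test; on the old 0.18.x releases the -f flag keeps it in the foreground (adjust the path to match where the script above unpacked it):
[bash,n]
cd /opt/elasticsearch/elasticsearch-0.18.7
bin/elasticsearch -f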
One thing that will probably come up is the open file limit with ulimit. By default this is set to 1024, which is way too low for ElasticSearch. You will want to change this to something higher:
[bash,n]
vim /etc/security/limits.conf
[bash,n]
root hard nofile 102400
root soft nofile 102400
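The limits.conf change only applies to new login sessions, so log back in and verify it before starting ElasticSearch:
[bash,n]
ulimit -n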
Web Interfaces
There are several different web interfaces; it is up to you which ones you allow through the firewall.
LogStash
If you enabled the web interface you can access it using $ip:9292
This allows you to view and search through your logs. The interface is kinda limited, many installs instead opt to use graylog2.
RabbitMQ
If you want to use the web interface you will need to enable the plugin and restart RabbitMQ
[bash,n]
rabbitmq-plugins enable rabbitmq_management
service rabbitmq-server restart
You can log in using guest/guest on port 55672.
You will probably want to create a new admin user and assign it all the rights that guest has. Then you will want to remove the permissions from guest.
By default logstash connects via the guest user, so it will need to have access or you will need to specify a user/password in the configurations.
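A hedged sketch of creating such a user with rabbitmqctl (the user name and password here are placeholders; the administrator tag is what the management UI looks for in RabbitMQ 2.7+):
[bash,n]
rabbitmqctl add_user logstash mysecretpassword
rabbitmqctl set_user_tags logstash administrator
rabbitmqctl set_permissions -p / logstash ".*" ".*" ".*"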
The web interface will allow you to view the queue and the basic statistics on it such as messages per second and queue size.
ElasticSearch
By default there is not a web interface installed, you can quickly install one using:
[bash,n]
/opt/elasticsearch/bin/plugin -install mobz/elasticsearch-head
Then from there you can open the following url in your browser:
http://localhost:9200/_plugin/head/
You do need the / at the end or you will get a blank white page.
Custom Patterns
Here are some custom patterns I am working on for GROK. I will continue to add to these for all of the different log types I encounter.
Do keep in mind that if there is a grok parse error (logstash is not able to parse based on the filter) it will let you know in the message... It just will not tell you anything other than that there was an error.
These will go in to $basedir/patterns/
I usually name the files logically based on the application name.
The only other main thing you will need to add is a specific reference to your patterns directory in the actual grok filter:
[bash,n]
grok {
type => "cpanel-access"
patterns_dir => "../patterns"
pattern => "%{CPANELACCESSLOG}"
}
$basedir/patterns/cpanel:
[bash,n]
# cPanel uses the following date stamp for the access_log: 02/25/2012:13:40:23 -0000
CPANELACCESSDATE %{MONTHNUM}/%{MONTHDAY}/%{YEAR}:%{TIME} %{INT:ZONE}
# cPanel uses the following date stamp for the error_log: 2012-02-25 13:40:23 -0500
CPANELERRORDATE %{DATESTAMP} %{INT:ZONE}
# cPanel access logs are built almost identically to the apache access logs; the main difference is the time stamp.
CPANELACCESSLOG %{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{CPANELACCESSDATE:timestamp}\] "%{WORD:verb} %{URIPATHPARAM:request} HTTP/%{NUMBER:httpversion}" %{NUMBER:response} (?:%{NUMBER:bytes}|-) "(?:%{URI:referrer}|-)" %{QS:agent}
# cPanel error logs are built in the following format: [2012-02-25 13:40:23 -0500] info [tailwatchd] message
CPANELERRORLOG \[%{CPANELERRORDATE:timestamp}\] %{WORD:level} \[%{PROG:program}\]
I stole and edited this one since it also handled modsec messages. Updated the code for better naming and to allow for a single call vs two calls as in the original script.
https://gist.github.com/1346387
$basedir/patterns/apache:
[bash,n]
APACHEERRORDATE %{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{YEAR}
APACHEERRORPREFIX \[%{APACHEERRORDATE:date}\] \[%{WORD:level}\] \[%{WORD} %{IPORHOST}\]
GENERICAPACHEERROR %{APACHEERRORPREFIX} %{GREEDYDATA:message}
MODSECPREFIX %{APACHEERRORPREFIX} ModSecurity: %{NOTSPACE:modsecseverity}\. %{GREEDYDATA:modsecmessage}
MODSECRULEFILE \[file %{QUOTEDSTRING:rulefile}\]
MODSECRULELINE \[line %{QUOTEDSTRING:ruleline}\]
MODSECMATCHOFFSET \[offset %{QUOTEDSTRING:matchoffset}\]
MODSECRULEID \[id %{QUOTEDSTRING:ruleid}\]
MODSECRULEREV \[rev %{QUOTEDSTRING:rulerev}\]
MODSECRULEMSG \[msg %{QUOTEDSTRING:rulemessage}\]
MODSECRULEDATA \[data %{QUOTEDSTRING:ruledata}\]
MODSECRULESEVERITY \[severity %{QUOTEDSTRING:ruleseverity}\]
MODSECRULETAGS (?:\[tag %{QUOTEDSTRING:ruletag0}\] )?(?:\[tag %{QUOTEDSTRING:ruletag1}\] )?(?:\[tag %{QUOTEDSTRING:ruletag2}\] )?(?:\[tag %{QUOTEDSTRING:ruletag3}\] )?(?:\[tag %{QUOTEDSTRING:ruletag4}\] )?(?:\[tag %{QUOTEDSTRING:ruletag5}\] )?(?:\[tag %{QUOTEDSTRING:ruletag6}\] )?(?:\[tag %{QUOTEDSTRING:ruletag7}\] )?(?:\[tag %{QUOTEDSTRING:ruletag8}\] )?(?:\[tag %{QUOTEDSTRING:ruletag9}\] )?(?:\[tag %{QUOTEDSTRING}\] )*
MODSECHOSTNAME \[hostname %{QUOTEDSTRING:targethost}\]
MODSECURI \[uri %{QUOTEDSTRING:targeturi}\]
MODSECUID \[unique_id %{QUOTEDSTRING:uniqueid}\]
MODSECAPACHEERROR %{MODSECPREFIX} %{MODSECRULEFILE} %{MODSECRULELINE} (?:%{MODSECMATCHOFFSET} )?(?:%{MODSECRULEID} )?(?:%{MODSECRULEREV} )?(?:%{MODSECRULEMSG} )?(?:%{MODSECRULEDATA} )?(?:%{MODSECRULESEVERITY} )?%{MODSECRULETAGS}%{MODSECHOSTNAME} %{MODSECURI} %{MODSECUID}
APACHEERRORLOG %{MODSECAPACHEERROR}|%{GENERICAPACHEERROR}
Troubleshooting
There are a few things to keep in mind when troubleshooting this setup. The most common error that I have found is that the RabbitMQ queue is not being processed.
More often than not this is caused by ElasticSearch either not accepting connections or accepting them very slowly.
Tailing /opt/elasticsearch/logs/elasticsearch.log will give you an idea if it is causing the problems. Generally if it is an issue with ulimits it will spew messages constantly.
Alternatively, check the logstash logs to see if something is up with logstash itself.
/opt/logstash/logs/logstash.log
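Another quick check is to ask ElasticSearch directly for its cluster health over the REST API:
[bash,n]
curl http://localhost:9200/_cluster/health?pretty=true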
Also to quickly check if the processes are even running you can use service $servicename status. This will give you a message like:
[bash,n]
service elasticsearch status
ElasticSearch is running: PID:2779, Wrapper:STARTED, Java:STARTED
Complete Package
If you are lazy and want the complete package you can download it here:
http://jamesdooley.us/logstash.tar.gz
This is currently set up as a shipper, but you can modify it to work in a different role.
And yes I put this at the very bottom so you would have to read all the stuff above :P