Jul 31, 2017 3 min read

Adventures In Hindsight 2


This is the follow-up to my previous post about my Hindsight experiments and discoveries.

Outputting to ES

This one is fairly easy, since it uses the Lua sandbox extensions for Elasticsearch. I ended up with the following run/output/es.cfg config:

filename        = "elasticsearch_bulk_api.lua"
message_matcher = "TRUE"
memory_limit    = 200 * 1024 * 1024
-- ticker_interval = 10

address             = "es"
port                = 9200
timeout             = 10
flush_count         = 1
flush_on_shutdown   = false
preserve_data       = not flush_on_shutdown --in most cases this should be the inverse of flush_on_shutdown
discard_on_error    = true
max_retry           = 1

-- See the elasticsearch module directory for the various encoders and configuration documentation.
encoder_module  = "encoders.elasticsearch.payload"
encoders_elasticsearch_common    = {
    es_index_from_timestamp = true,
    index                   = "logs-v1-%{%Y.%m.%d}",
    type_name               = "%{Logger}",
}

Several things to note here: these configuration values are for testing only; flushing the bulk at every message (flush_count = 1) makes no sense outside of that context. I also discard messages on error, which I would not do in production. An interesting detail, and maybe a best practice, is that the index name embeds a “version”. Note also that it is built through variable interpolation.
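As a rough sketch of what a production-leaning version of the same block might look like (the exact thresholds are my own guesses, not recommendations — tune them to your load):

```lua
-- Hypothetical production-leaning values (my own guesses, adjust to your load)
flush_count         = 5000   -- batch documents instead of flushing one by one
flush_on_shutdown   = true   -- drain the buffer on clean shutdown
preserve_data       = not flush_on_shutdown
discard_on_error    = false  -- keep retrying instead of dropping messages
max_retry           = 5
```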

Getting input from the network

To ingest syslog input (RFC 3164 formatted), I rely on the udp module, configured as follows in run/input/syslog1.cfg (the file name matters here, as we'll see later):

filename            = "udp.lua"
instruction_limit   = 0

-- listen on all interfaces
address = ""

-- unprivileged port
port = 1514

-- decode the flow!
decoder_module = "decoders.syslog"

-- display errors when decoder fails
send_decode_failures = true

With this in place, Hindsight listens on port 1514. The interesting part is the use of a decoder: it splits the message, separating the standard syslog fields from the payload. We can exercise it with logger:

logger -n --rfc3164 -P 1514 test

This will give output looking like this:

:Uuid: EAB2822B-985-4CAA-AA96-A83D0834A58
:Timestamp: 2017-07-31T07:45:58.000000000Z
:Type: <nil>
:Logger: input.syslog1                           <<<<< Note that the Logger name reflects the input file name
:Severity: 5
:Payload: test
:EnvVersion: <nil>
:Pid: <nil>
:Hostname: fe5a39526650
    | name: syslogfacility type: 3 representation: <nil> value: 1
    | name: programname type: 0 representation: <nil> value: root
    | name: sender_ip type: 0 representation: <nil> value:
    | name: sender_port type: 2 representation: <nil> value: 44439

Here I still need to explore (and probably write my own) subdecoders, which may be the subject of another post. Regarding the Logger field, you could use it in your index name to split sources across indices and apply different retention policies in your ES storage; for example, for a logger named awesome_app you could create an index named logs-v1-awesome_app-2017.07.31.
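Sketching that idea as a change to the encoder block in es.cfg (untested; it assumes the same %{} interpolation seen in the type_name above also works inside the index name):

```lua
encoders_elasticsearch_common    = {
    es_index_from_timestamp = true,
    -- one index per logger, e.g. logs-v1-input.syslog1-2017.07.31
    index                   = "logs-v1-%{Logger}-%{%Y.%m.%d}",
    type_name               = "%{Logger}",
}
```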

Pruning input

This one deletes messages that have been processed by all outputs (using their checkpoints to determine this), which is something you clearly want to do. It relies on the standard prune_input module:

filename = "prune_input.lua"
message_matcher = "TRUE"
input = true