file

Milestone: 2

Stream events from files.

By default, each event is assumed to be one line. If you would like to join multiple log lines into one event, you’ll want to use the multiline codec.

Files are followed in a manner similar to “tail -0F”. File rotation is detected and handled by this input.

Synopsis

This is what it might look like in your config file:
input {
  file {
    add_field => ... # hash (optional), default: {}
    codec => ... # codec (optional), default: "plain"
    discover_interval => ... # number (optional), default: 15
    exclude => ... # array (optional)
    path => ... # array (required)
    sincedb_path => ... # string (optional)
    sincedb_write_interval => ... # number (optional), default: 15
    start_position => ... # string, one of ["beginning", "end"] (optional), default: "end"
    stat_interval => ... # number (optional), default: 1
    tags => ... # array (optional)
    type => ... # string (optional)
  }
}

Details

add_field

  • Value type is hash
  • Default value is {}

Add a field to an event.
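
As a minimal sketch, a hash value might be written as follows. The path, field name, and value are hypothetical and chosen only for illustration:

input {
  file {
    path => "/var/log/*.log"
    # hypothetical field: each event from this input gets environment = "staging"
    add_field => { "environment" => "staging" }
  }
}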

charset DEPRECATED

  • DEPRECATED WARNING: This config item is deprecated. It may be removed in a future version.
  • Value can be any of: "ASCII-8BIT", "Big5", "Big5-HKSCS", "Big5-UAO", "CP949", "Emacs-Mule", "EUC-JP", "EUC-KR", "EUC-TW", "GB18030", "GBK", "ISO-8859-1", "ISO-8859-2", "ISO-8859-3", "ISO-8859-4", "ISO-8859-5", "ISO-8859-6", "ISO-8859-7", "ISO-8859-8", "ISO-8859-9", "ISO-8859-10", "ISO-8859-11", "ISO-8859-13", "ISO-8859-14", "ISO-8859-15", "ISO-8859-16", "KOI8-R", "KOI8-U", "Shift_JIS", "US-ASCII", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-32BE", "UTF-32LE", "Windows-1251", "GB2312", "IBM437", "IBM737", "IBM775", "CP850", "IBM852", "CP852", "IBM855", "CP855", "IBM857", "IBM860", "IBM861", "IBM862", "IBM863", "IBM864", "IBM865", "IBM866", "IBM869", "Windows-1258", "GB1988", "macCentEuro", "macCroatian", "macCyrillic", "macGreek", "macIceland", "macRoman", "macRomania", "macThai", "macTurkish", "macUkraine", "CP950", "CP951", "stateless-ISO-2022-JP", "eucJP-ms", "CP51932", "GB12345", "ISO-2022-JP", "ISO-2022-JP-2", "CP50220", "CP50221", "Windows-1252", "Windows-1250", "Windows-1256", "Windows-1253", "Windows-1255", "Windows-1254", "TIS-620", "Windows-874", "Windows-1257", "Windows-31J", "MacJapanese", "UTF-7", "UTF8-MAC", "UTF-16", "UTF-32", "UTF8-DoCoMo", "SJIS-DoCoMo", "UTF8-KDDI", "SJIS-KDDI", "ISO-2022-JP-KDDI", "stateless-ISO-2022-JP-KDDI", "UTF8-SoftBank", "SJIS-SoftBank", "BINARY", "CP437", "CP737", "CP775", "IBM850", "CP857", "CP860", "CP861", "CP862", "CP863", "CP864", "CP865", "CP866", "CP869", "CP1258", "Big5-HKSCS:2008", "eucJP", "euc-jp-ms", "eucKR", "eucTW", "EUC-CN", "eucCN", "CP936", "ISO2022-JP", "ISO2022-JP2", "ISO8859-1", "CP1252", "ISO8859-2", "CP1250", "ISO8859-3", "ISO8859-4", "ISO8859-5", "ISO8859-6", "CP1256", "ISO8859-7", "CP1253", "ISO8859-8", "CP1255", "ISO8859-9", "CP1254", "ISO8859-10", "ISO8859-11", "CP874", "ISO8859-13", "CP1257", "ISO8859-14", "ISO8859-15", "ISO8859-16", "CP878", "CP932", "csWindows31J", "SJIS", "PCK", "MacJapan", "ASCII", "ANSI_X3.4-1968", "646", "CP65000", "CP65001", "UTF-8-MAC", "UTF-8-HFS", "UCS-2BE", "UCS-4BE", 
"UCS-4LE", "CP1251", "external", "locale"
  • There is no default value for this setting.

The character encoding used in this input. Examples include “UTF-8” and “cp1252”.

This setting is useful if your log files are in Latin-1 (or the closely related cp1252) or in any other character set besides UTF-8.

This only affects “plain” format logs since json is UTF-8 already.

codec

  • Value type is codec
  • Default value is "plain"

The codec used for input data. Input codecs are a convenient method for decoding your data before it enters the input, without needing a separate filter in your Logstash pipeline.
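
For instance, the multiline codec mentioned at the top of this page can be set here. The following is a sketch that relies on the multiline codec's pattern and what options, and assumes that continuation lines begin with whitespace (as in a typical stack trace). The path is hypothetical:

input {
  file {
    path => "/var/log/myapp/app.log"   # hypothetical path
    codec => multiline {
      # assumption: a line starting with whitespace belongs to the previous event
      pattern => "^\s"
      what => "previous"
    }
  }
}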

debug DEPRECATED

  • DEPRECATED WARNING: This config item is deprecated. It may be removed in a future version.
  • Value type is boolean
  • Default value is false

discover_interval

  • Value type is number
  • Default value is 15

How often we expand globs to discover new files to watch.

exclude

  • Value type is array
  • There is no default value for this setting.

Patterns to exclude (matched against the filename, not the full path). Globs are valid here, too. For example, if you have:

path => "/var/log/*"

You might want to exclude gzipped files:

exclude => "*.gz"
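
Since the value type is array, several patterns can be listed at once. A sketch of a complete input block, where the second pattern is only an illustrative addition:

input {
  file {
    path => "/var/log/*"
    # patterns are compared with the bare filename, e.g. "messages.1.gz"
    exclude => [ "*.gz", "*.bz2" ]
  }
}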

format DEPRECATED

  • DEPRECATED WARNING: This config item is deprecated. It may be removed in a future version.
  • Value can be any of: "plain", "json", "json_event", "msgpack_event"
  • There is no default value for this setting.

The format of input data (plain, json, json_event, or msgpack_event).

message_format DEPRECATED

  • DEPRECATED WARNING: This config item is deprecated. It may be removed in a future version.
  • Value type is string
  • There is no default value for this setting.

If format is “json”, this is an event sprintf string used to build the displayed @message (defaults to the raw JSON). sprintf format strings look like %{fieldname}.

If format is “json_event”, ALL fields except for @type are expected to be present. Not receiving all fields will cause unexpected results.

path (required setting)

  • Value type is array
  • There is no default value for this setting.

The path(s) to the file(s) to use as an input. You can use globs here, such as /var/log/*.log. Paths must be absolute and cannot be relative.

You may also configure multiple paths. See an example on the Logstash configuration page.
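
A short sketch of multiple paths given as an array, mixing a literal file with a glob (both paths are examples only):

input {
  file {
    # every entry must be an absolute path; globs are expanded by the input
    path => [ "/var/log/messages", "/var/log/*.log" ]
  }
}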

sincedb_path

  • Value type is string
  • There is no default value for this setting.

Where to write the sincedb database (which keeps track of the current position of monitored log files). By default, sincedb files are written to a path matching “$HOME/.sincedb*”.
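
To keep the position data somewhere other than the home directory, the path can be set explicitly. This is a sketch with hypothetical paths. It assumes the user running Logstash can write to that location:

input {
  file {
    path => "/var/log/myapp/*.log"                    # hypothetical path
    sincedb_path => "/var/lib/logstash/sincedb-myapp" # hypothetical location
  }
}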

sincedb_write_interval

  • Value type is number
  • Default value is 15

How often (in seconds) to write the sincedb database with the current position of monitored log files.

start_position

  • Value can be any of: "beginning", "end"
  • Default value is "end"

Choose where Logstash starts initially reading files: at the beginning or at the end. The default behavior treats files like live streams and thus starts at the end. If you have old data you want to import, set this to ‘beginning’.

This option only modifies “first contact” situations where a file is new and not seen before. If a file has already been seen before, this option has no effect.
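
For a one-time import of existing logs, a sketch might look like this (the path is hypothetical). Files that have already been seen keep their recorded position regardless of this setting:

input {
  file {
    path => "/var/log/archive/*.log"   # hypothetical path to old data
    # read files not seen before from their first line
    start_position => "beginning"
  }
}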

stat_interval

  • Value type is number
  • Default value is 1

How often (in seconds) we stat files to see if they have been modified. Increasing this interval will decrease the number of system calls we make, but increase the time to detect new log lines.
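
As an illustration of that trade-off, the sketch below raises stat_interval above its default of 1. The value and path are arbitrary examples, not recommendations:

input {
  file {
    path => "/var/log/*.log"
    # fewer stat calls than the default of 1, but new lines are noticed later
    stat_interval => 5
  }
}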

tags

  • Value type is array
  • There is no default value for this setting.

Add any number of arbitrary tags to your event.

This can help with processing later.
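
A minimal sketch with two hypothetical tag names:

input {
  file {
    path => "/var/log/nginx/access.log"   # hypothetical path
    # hypothetical tags; later filters and outputs can match on them
    tags => [ "web", "production" ]
  }
}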

type

  • Value type is string
  • There is no default value for this setting.

Add a ‘type’ field to all events handled by this input.

Types are used mainly for filter activation.

The type is stored as part of the event itself, so you can also use the type to search for it in the web interface.

If you try to set a type on an event that already has one (for example when you send an event from a shipper to an indexer) then a new input will not override the existing type. A type set at the shipper stays with that event for its life even when sent to another Logstash server.
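
The sketch below shows a type being set and then used to activate a filter. It assumes a Logstash version that supports conditionals in the configuration file. The type name, path, and tag are hypothetical, and the mutate filter is only a stand-in for whatever processing is needed:

input {
  file {
    path => "/var/log/apache2/access.log"   # hypothetical path
    type => "apache"                        # hypothetical type name
  }
}

filter {
  # only events whose type is "apache" reach this filter
  if [type] == "apache" {
    mutate {
      add_tag => [ "from_apache_log" ]
    }
  }
}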


This is documentation from lib/logstash/inputs/file.rb