Configure Data Collection¶
Automatic Node Run Data Collection with Chef Server¶
Note
Requires Chef Client 12.16.42 or greater, and Chef server 12.11.0 or greater.
Nodes can send their run data to Chef Automate automatically through the Chef server. To enable this, complete two steps: configure a Data Collector token in Chef Automate (Step 1), then configure your Chef server to point to Chef Automate (Step 2).
Multiple Chef servers can send data to a single Chef Automate server.
Step 1: Configure a Data Collector token in Chef Automate¶
All messages are sent to Chef Automate over HTTP and are authenticated with a pre-shared key called a token. Every Chef Automate installation configures a token by default, but we strongly recommend that you create your own.
Note
The Data Collector token has no minimum or maximum character length restrictions. While the UTF-8 character set is supported, US-ASCII is recommended for best results.
To set your own token, add the following to your /etc/delivery/delivery.rb file:
data_collector['token'] = 'sometokenvalue'
# Save and close the file
To apply the changes, run:
sudo automate-ctl reconfigure
If you do not configure a token, the default token value is: 93a49a4f2482c64126f7b6015e6b0f30284287ee4054ff8807fb63d9cbd1c506
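One way to produce a token value of your own is to generate random hexadecimal text, which stays within the recommended US-ASCII range. The sketch below is not part of the Chef Automate tooling; it uses Ruby's standard SecureRandom library, the helper name is hypothetical, and the 32-byte size is an assumption chosen only because it yields 64 characters, the same length as the default token.

```ruby
require 'securerandom'

# Hypothetical helper: returns a random token made of hex characters only.
# 32 random bytes become 64 hex characters. The size is an arbitrary choice,
# because the Data Collector token has no length restrictions.
def generate_data_collector_token(bytes = 32)
  SecureRandom.hex(bytes)
end

token = generate_data_collector_token

# Print a line that could be pasted into /etc/delivery/delivery.rb.
puts "data_collector['token'] = '#{token}'"
```

Whatever value you choose must also be supplied to the Chef server in Step 2.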
Step 2: Configure your Chef server to point to Chef Automate¶
In addition to forwarding Chef run data to Automate, Chef server will send messages to Chef Automate whenever an action is taken on a Chef server object, such as when a cookbook is uploaded to the Chef server or when a user edits a role.
Warning
If you run chef-client releases prior to Chef 14, disable the Ohai Passwd and Sessions plugins on your nodes, either in /etc/chef/client.rb or with the chef-client cookbook, to keep the data sent to your Automate system to a minimum. This improves search performance and reduces disk space requirements.
ohai.disabled_plugins = [ :Passwd, :Sessions ]
Setting up data collection on Chef server versions 12.14 and higher¶
Because the token is considered a secret, it cannot appear in /etc/opscode/chef-server.rb. Set it through the Veil secrets library instead:
sudo chef-server-ctl set-secret data_collector token 'TOKEN'
sudo chef-server-ctl restart nginx
sudo chef-server-ctl restart opscode-erchef
Then add the following setting to /etc/opscode/chef-server.rb on the Chef server:
data_collector['root_url'] = 'https://my-automate-server.mycompany.com/data-collector/v0/'
# Add for compliance scanning
profiles['root_url'] = 'https://my-automate-server.mycompany.com'
# Save and close the file
To apply the changes, run:
chef-server-ctl reconfigure
Here, my-automate-server.mycompany.com is the fully qualified domain name of your Chef Automate server.
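Both URL values are derived from the same fully qualified domain name: the data collector URL adds the /data-collector/v0/ path, while the profiles URL is the bare HTTPS origin. The following sketch makes that relationship explicit. The helper name and hash keys are hypothetical and exist only for illustration; they are not part of Chef server.

```ruby
# Hypothetical helper: derives the two chef-server.rb URL values from the
# fully qualified domain name of the Chef Automate server.
def automate_urls(fqdn)
  base = "https://#{fqdn}"
  {
    data_collector_root_url: "#{base}/data-collector/v0/", # note the trailing slash
    profiles_root_url: base                                # used for compliance scanning
  }
end

urls = automate_urls('my-automate-server.mycompany.com')

# Print lines in the form expected by /etc/opscode/chef-server.rb.
puts "data_collector['root_url'] = '#{urls[:data_collector_root_url]}'"
puts "profiles['root_url'] = '#{urls[:profiles_root_url]}'"
```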
Setting up data collection on Chef server versions 12.13 and lower¶
On versions 12.13 and lower, add the root_url and token values to /etc/opscode/chef-server.rb:
data_collector['root_url'] = 'https://my-automate-server.mycompany.com/data-collector/v0/'
data_collector['token'] = 'TOKEN'
# Add for compliance scanning
profiles['root_url'] = 'https://my-automate-server.mycompany.com'
# Save and close the file
To apply the changes, run:
chef-server-ctl reconfigure
Here, my-automate-server.mycompany.com is the fully qualified domain name of your Chef Automate server, and TOKEN is either the default value or the token value you configured in the prior section.
Additional options¶
Option | Description | Default |
---|---|---|
data_collector['timeout'] | Timeout in milliseconds to abort an attempt to send a message to the Chef Automate server. | 30000 |
data_collector['http_init_count'] | Number of Chef Automate HTTP workers Chef server should start. | 25 |
data_collector['http_max_count'] | Maximum number of Chef Automate HTTP workers Chef server should allow to exist at any time. | 100 |
data_collector['http_max_age'] | Maximum age a Chef Automate HTTP worker should be allowed to live, specified as an Erlang tuple. | {70, sec} |
data_collector['http_cull_interval'] | How often Chef server should cull aged-out Chef Automate HTTP workers that have exceeded their http_max_age, specified as an Erlang tuple. | {1, min} |
data_collector['http_max_connection_duration'] | Maximum duration an HTTP connection is allowed to exist before it is terminated, specified as an Erlang tuple. | {70, sec} |
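As a rough illustration, the options above might be written in /etc/opscode/chef-server.rb as shown below, using the documented defaults. Two points are assumptions rather than statements from this page: that Erlang tuples are entered as quoted strings, and the empty hash on the first line, which is only a stand-in so the sketch runs on its own (inside chef-server.rb, data_collector is already provided by Chef server).

```ruby
# Stand-in so this sketch runs outside chef-server.rb. Do not copy this line
# into the real configuration file.
data_collector = {}

data_collector['timeout'] = 30000        # milliseconds
data_collector['http_init_count'] = 25   # HTTP workers started initially
data_collector['http_max_count'] = 100   # upper bound on HTTP workers

# Assumption: Erlang tuples are written as quoted strings.
data_collector['http_max_age'] = '{70, sec}'
data_collector['http_cull_interval'] = '{1, min}'
data_collector['http_max_connection_duration'] = '{70, sec}'
```

As with the other settings in this file, run chef-server-ctl reconfigure after saving to apply the changes.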
Use an external Elasticsearch cluster (optional)¶
Chef Automate uses Elasticsearch to store its data, and the default Chef Automate install includes a single Elasticsearch service. This is sufficient to run production workloads; however, for greater data retention, we recommend using a multi-node Elasticsearch cluster with replication and sharding to store and protect your data.
As of Automate 1.7.114, the compliance service uses a compliance-latest Elasticsearch index to improve the performance of the reporting APIs at scale. Automate creates this index automatically as part of the upgrade to Automate 1.7.114, and updates it with each new compliance report. If the compliance-latest index falls out of sync with the time-series data, you can regenerate it with the automate-ctl migrate-compliance subcommand. For more information, see migrate-compliance.
Prerequisites¶
- Chef Automate server
- Elasticsearch (version 2.4.1 or greater; version 5.x is required for Chef Automate 1.6 and above)
Elasticsearch configuration¶
To use an external Elasticsearch installation, set the following configuration option in your /etc/delivery/delivery.rb:
elasticsearch['urls'] = ['https://my-elasticsearch-cluster.mycompany.com']
Or, for a three-node on-premises install:
elasticsearch['urls'] = ['http://172.16.0.100:9200', 'http://172.16.0.101:9200', 'http://172.16.0.102:9200']
The elasticsearch['urls'] attribute should be an array of Elasticsearch nodes over which Chef Automate will round-robin requests. You can also supply a single entry that corresponds to a load balancer or a third-party Elasticsearch-as-a-service offering.
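To illustrate what round-robin means here, the sketch below cycles through a list of node URLs so that consecutive requests go to consecutive nodes and then wrap around. It only demonstrates the concept; how Chef Automate rotates requests internally is not described on this page, and the addresses are placeholder values.

```ruby
# Placeholder node addresses for illustration only.
urls = [
  'http://172.16.0.100:9200',
  'http://172.16.0.101:9200',
  'http://172.16.0.102:9200'
]

# Array#cycle without a block returns an endless enumerator over the list.
rotation = urls.cycle

# Pick the target node for each of four consecutive requests.
targets = Array.new(4) { rotation.next }

targets.each_with_index do |url, i|
  puts "request #{i + 1} -> #{url}"
end
```

With three nodes, the fourth request returns to the first node in the list.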
After saving the file, run sudo automate-ctl reconfigure.
An additional Elasticsearch option is elasticsearch['host_header']. This is the HTTP Host header to send with the request.
When this attribute is unspecified, the default behavior is as follows:
- If the urls parameter contains a single entry, the host of the supplied URI will be sent as the Host header.
- If the urls parameter contains more than one entry, no Host header will be sent.
When this attribute is specified, the supplied string will be sent as the Host header on all requests. This may be required for some third-party Elasticsearch offerings.
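The Host header rules above can be summarized as a small decision function. This is a sketch of the documented behavior, not Chef Automate's actual code: the function name is hypothetical, and it assumes "the host of the supplied URI" means the hostname without the port.

```ruby
require 'uri'

# Hypothetical helper: returns the Host header value to send, or nil when
# no Host header would be sent.
def effective_host_header(urls, host_header = nil)
  # An explicitly configured value is used on all requests.
  return host_header unless host_header.nil?

  # With a single URL, the host of that URI is used (assumed: no port).
  return URI.parse(urls.first).host if urls.length == 1

  # With more than one URL and no explicit value, no header is sent.
  nil
end

single = ['https://my-elasticsearch-cluster.mycompany.com']
multi  = ['http://172.16.0.100:9200', 'http://172.16.0.101:9200']

p effective_host_header(single)
p effective_host_header(multi)
p effective_host_header(multi, 'search.example.com')
```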
Troubleshooting: My data does not show up in the UI¶
If an organization does not have any nodes associated with it, it does not show up in the Nodes section of the Chef Automate UI. This is also true for roles, cookbooks, recipes, attributes, resources, node names, and environments. Only those items that have a node associated with them will appear in the UI. Chef Automate has all the data for all of these, but does not highlight them in the UI. This is designed to keep the UI focused on the nodes in your cluster.