Chef Push Jobs
Chef Push Jobs is an extension of the Chef Infra Server that allows jobs to be run against nodes independently of a Chef Infra Client run. A job is an action or a command to be executed against a subset of nodes; the nodes against which a job is run are determined by the results of a search query made to the Chef Infra Server.
Chef Push Jobs uses the Chef Infra Server API and a Ruby client to initiate all connections to the Chef Infra Server. Connections use the same authentication and authorization model as any other request made to the Chef Infra Server. A knife plugin is used to initiate job creation and job tracking.
Install Push Jobs using the push-jobs cookbook and a Chef Infra Client run on each of the target nodes.
Requirements
Chef Push Jobs has the following requirements:
- An on-premises Chef Infra Server. Hosted Chef does not support Chef Push Jobs.
- The Chef Push Jobs client can be configured using a push-jobs cookbook, but Chef Infra Client must also be present on the node. Only Chef Infra Client can use a cookbook to configure a node.
- TCP ports 10000, 10002, and 10003. Port 10000 is the default heartbeat port, 10002 is the default command port, and 10003 is the default API port. These ports may be changed in the Chef Push Jobs configuration file. The command port allows Chef Push Jobs clients to communicate with the Chef Push Jobs server, and also allows Chef Infra Server components to communicate with the Chef Push Jobs server. In a configuration with both front-end and back-end servers, this port only needs to be open on the back-end servers. The Chef Push Jobs server waits for connections from the Chef Push Jobs client and never initiates a connection to a Chef Push Jobs client. Where the Chef Infra Server has a public address that is not locally assigned (for example, in a cloud deployment or behind NAT), and the Chef Infra Server hostname resolves to that public IP address, add the API port to the network security configuration so that the Chef Infra Server can connect to itself on the public IP address.
Components
Chef Push Jobs has three main components: jobs (managed by the Chef Push Jobs server), a client that is installed on every node in the organization, and one (or more) workstations from which job messages are initiated.
All communication between these components is done with the following:
- A heartbeat message between the Chef Push Jobs server and each managed node
- A knife plugin named knife push jobs with four subcommands: job list, job start, job status, and node status
- Various job messages sent from a workstation to the Chef Push Jobs server
- A single job message that is sent (per job) from the Chef Push Jobs server to one (or more) nodes that are being managed by the Chef server
The following diagram shows the various components of Chef Push Jobs:
Jobs
The Chef Push Jobs server is used to send job messages to one (or more) managed nodes and also to manage the list of jobs that are available to be run against nodes.
A heartbeat message is used to let all of the nodes in an organization know that the Chef Push Jobs server is available. The Chef Push Jobs server listens for heartbeat messages from each Chef Push Jobs client. If there is no heartbeat from a Chef Push Jobs client, the Chef Push Jobs server will mark that node as unavailable for job messages until the heartbeat resumes.
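The availability rule described here can be sketched as a small model. This is only an illustration, not the Chef Push Jobs implementation: the `HeartbeatTracker` class and the `missed_allowed` tolerance are hypothetical, and only the 1000 ms interval corresponds to the documented `heartbeat_interval` default.

```ruby
# Hypothetical model of heartbeat-based availability tracking.
# This is NOT the Chef Push Jobs implementation; it only illustrates the rule.
class HeartbeatTracker
  # interval_ms mirrors the documented heartbeat_interval default (1000 ms).
  # missed_allowed is an assumed tolerance, not a documented setting.
  def initialize(interval_ms: 1000, missed_allowed: 3)
    @timeout = (interval_ms * missed_allowed) / 1000.0
    @last_seen = {}
  end

  # Record a heartbeat from a node at the given time (in seconds).
  def beat(node, at)
    @last_seen[node] = at
  end

  # A node is available only if its last heartbeat is recent enough.
  def available?(node, now)
    seen = @last_seen[node]
    !seen.nil? && (now - seen) <= @timeout
  end

  # All nodes that would currently be eligible for job messages.
  def available_nodes(now)
    @last_seen.keys.select { |node| available?(node, now) }
  end
end

tracker = HeartbeatTracker.new
tracker.beat('chico', 0.0)
tracker.beat('zeppo', 0.0)
tracker.beat('chico', 4.0)
puts tracker.available_nodes(5.0).inspect # prints ["chico"]
```

In this sketch, a node that resumes sending heartbeats becomes available again on the next check, which matches the "until the heartbeat resumes" behavior described in the text.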
Nodes
The Chef Push Jobs client is used to receive job messages from the Chef Push Jobs server and to verify the heartbeat status. The Chef Push Jobs client uses the same authorization / authentication model as Chef Infra Client. The Chef Push Jobs client listens for heartbeat messages from the Chef Push Jobs server. If there is no heartbeat from the Chef Push Jobs server, the Chef Push Jobs client will finish its current job, but then stop accepting any new jobs until the heartbeat from the Chef Push Jobs server resumes.
Workstations
A workstation is used to manage Chef Push Jobs jobs: maintaining the push-jobs cookbook, using knife to start and stop jobs, viewing job status, and managing job lists.
push-jobs Cookbook
The push-jobs cookbook contains attributes that are used to configure the Chef Push Jobs client. In addition, Chef Push Jobs relies on the whitelist attribute to manage the list of jobs (and commands) that are available to Chef Push Jobs.
Whitelist
A whitelist is a list of jobs and commands that are used by Chef Push Jobs. A whitelist is saved as an attribute in the push-jobs cookbook. For example:
default['push_jobs']['whitelist'] = {
  'job_name' => 'command',
}
The whitelist is accessed from a recipe using the node['push_jobs']['whitelist'] attribute. For example:
template 'name' do
  source 'name'
  ...
  variables(:whitelist => node['push_jobs']['whitelist'])
end
Use the knife exec subcommand to add a job to the whitelist. For example:
knife exec -E 'nodes.transform("name:A_NODE_NAME") do |n|
  n.set["push_jobs"]["whitelist"]["ntpdate"] = "ntpdate -u time"
end'
where ["ntpdate"] = "ntpdate -u time" is added to the whitelist:
default['push_jobs']['whitelist'] = {
  'ntpdate' => 'ntpdate -u time',
}
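To show how a whitelist maps job names to commands, here is a minimal plain-Ruby sketch. The `command_for` helper is hypothetical and is not part of the push-jobs cookbook. The hash mirrors the attribute structure shown in this section, with illustrative entries.

```ruby
# Illustrative whitelist with the same shape as node['push_jobs']['whitelist'].
whitelist = {
  'chef-client' => 'chef-client',
  'ntpdate'     => 'ntpdate -u time',
}

# Hypothetical helper: resolve a job name to its command,
# or raise if the job is not listed in the whitelist.
def command_for(whitelist, job_name)
  whitelist.fetch(job_name) do
    raise ArgumentError, "job '#{job_name}' is not in the whitelist"
  end
end

puts command_for(whitelist, 'ntpdate') # prints ntpdate -u time
```

The point of the sketch is that only names present as keys resolve to a command; anything else is rejected.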
Reference
The following sections describe the knife subcommands, the Push Jobs API, and configuration settings used by Chef Push Jobs.
knife push jobs
The knife push jobs subcommand is used by Chef Push Jobs to start jobs, view job status, view job lists, and view node status.
job list
Use the job list argument to view a list of Chef Push Jobs jobs.
Syntax
This argument has the following syntax:
knife job list
Options
This command does not have any specific options.
job start
Use the job start argument to start a Chef Push Jobs job.
Syntax
This argument has the following syntax:
knife job start (options) COMMAND [NODE, NODE, ...]
Options
This argument has the following options:
--timeout TIMEOUT
The maximum amount of time (in seconds) by which a job must complete, before it is stopped.
-q QUORUM, --quorum QUORUM
The minimum number of nodes that match the search criteria, are available, and acknowledge the job request. This can be expressed as a percentage (e.g. 50%) or as an absolute number of nodes (e.g. 145). Default value: 100%.
For example, there are ten total nodes. If --quorum 80% is used and eight of those nodes acknowledge the job request, the command will be run against all of the available nodes. If two of the nodes were unavailable, the command would still be run against the remaining eight available nodes because quorum was met.
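The quorum arithmetic can be sketched as follows. The helper names are hypothetical, and rounding a percentage up to a whole node is an assumption: the text does not state the rounding rule. That assumption is consistent with the ten-node example here and with the five-node example later in this section.

```ruby
# Hypothetical helper: how many acknowledging nodes are needed for quorum.
# ASSUMPTION: a percentage is rounded up to a whole number of nodes.
def required_nodes(quorum, total)
  if quorum.end_with?('%')
    percent = Integer(quorum.chomp('%'))
    (total * percent + 99) / 100 # integer ceiling of total * percent / 100
  else
    Integer(quorum) # absolute number of nodes, e.g. "145"
  end
end

# Quorum is met when enough nodes acknowledged the job request.
def quorum_met?(quorum, total, acknowledged)
  acknowledged >= required_nodes(quorum, total)
end

puts quorum_met?('80%', 10, 8) # prints true
puts quorum_met?('90%', 5, 4)  # prints false
puts quorum_met?('80%', 5, 4)  # prints true
```

Integer arithmetic is used for the ceiling so that no floating-point rounding can change the result at an exact boundary such as 80% of 10.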
Examples
Run a job
To run a job named add-glasses against a node named ricardosalazar, run the following command:
knife job start add-glasses 'ricardosalazar'
Run a job using quorum percentage
To search for nodes assigned the role webapp, and where 90% of those nodes must be available, run the following command:
knife job start --quorum 90% 'chef-client' --search 'role:webapp'
Run a job using node names
To search for a specific set of nodes (named chico, harpo, groucho, gummo, zeppo), and where 90% of those nodes must be available, run the following command:
knife job start --quorum 90% 'chef-client' chico harpo groucho gummo zeppo
to return something similar to:
Started. Job ID: GUID12345abc
quorum_failed
Command: chef-client
Created_at: date
unavailable: zeppo
was_ready:
gummo
groucho
chico
harpo
On_timeout: 3600
Status: quorum_failed
Note
If quorum had been set at 80% (--quorum 80%), then quorum would have passed with the previous example.
job status
Use the job status argument to view the status of Chef Push Jobs jobs. Each job is always in one of the following states:
new
New job status.
voting
Waiting for nodes to commit or refuse to run the command.
running
Running the command on the nodes.
complete
Ran the command. Check individual node statuses to see if they completed or had issues.
quorum_failed
Did not run the command on any nodes.
crashed
Crashed while running the job.
timed_out
Timed out while running the job.
aborted
Job aborted by user.
Syntax
This argument has the following syntax:
knife job status <job id>
Options
This command does not have any specific options.
Examples
View job status by job identifier
To view the status of a job that has the identifier of 235, run the following command:
knife job status 235
to return something similar to:
Node name Status Last updated
foo Failed 2012-05-04 00:00
bar Done 2012-05-04 00:01
node status
Use the node status argument to identify nodes that Chef Push Jobs may interact with. Each node is always in one of the following states:
new
Node has neither committed nor refused to run the command.
ready
Node has committed to run the command but has not yet run it.
running
Node is presently running the command.
succeeded
Node successfully ran the command (an exit code of 0 was returned).
failed
Node failed to run the command (an exit code of non-zero was returned).
aborted
Node ran the command but stopped before completion.
crashed
Node went down after it started running the job.
nacked
Node was busy when asked to be part of the job.
unavailable
Node went down before it started running.
was_ready
Node was ready but quorum failed.
timed_out
Node timed out.
Syntax
This argument has the following syntax:
knife node status [<node> <node> ...]
Options
This command does not have any specific options.
Push Jobs API
The Push Jobs API is used to create jobs and retrieve status using Chef Push Jobs, a tool that pushes jobs against a set of nodes in the organization. All requests are signed using the Chef Infra Server API and the validation key on the workstation from which the requests are made. All commands are sent to the Chef Infra Server using the knife exec subcommand.
Each authenticated request must include /organizations/ORG_NAME/pushy/ as part of the endpoint name. For example: /organizations/ORG_NAME/pushy/jobs/ID or /organizations/ORG_NAME/pushy/node_states.
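As a small illustration of the path convention, the following sketch builds endpoint paths from an organization name. The `pushy_path` helper is hypothetical and is not part of any Chef library. It only joins path segments and does no escaping.

```ruby
# Hypothetical helper: build a Push Jobs endpoint path for an organization.
# It only joins segments; it does not escape or validate them.
def pushy_path(org, *segments)
  (['', 'organizations', org, 'pushy'] + segments).join('/')
end

puts pushy_path('ORG_NAME', 'jobs', 'ID')   # prints /organizations/ORG_NAME/pushy/jobs/ID
puts pushy_path('ORG_NAME', 'node_states')  # prints /organizations/ORG_NAME/pushy/node_states
```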
jobs
The /organizations/ORG_NAME/pushy/jobs endpoint has the following methods: GET and POST.
GET
The GET method is used to get a list of jobs.
This method has no parameters.
Request
GET /organizations/ORG_NAME/pushy/jobs
Response
The response is similar to:
[
  "aaaaaaaaaaaa25fd67fa8715fd547d3d",
  "aaaaaaaaaaaa6af7b14dd8a025777cf0"
]
Response Code | Description |
---|---|
200 | OK. The request was successful. |
400 | Bad request. The contents of the request are not formatted correctly. |
401 | Unauthorized. The user or client who made the request could not be authenticated. Verify the user/client name, and that the correct key was used to sign the request. |
403 | Forbidden. The user who made the request is not authorized to perform the action. |
404 | Not found. The requested object does not exist. |
POST
The POST method is used to start a job.
This method has no parameters.
Request
POST /organizations/ORG_NAME/pushy/jobs
with a request body similar to:
{
"command": "chef-client",
"run_timeout": 300,
"nodes": ["NODE1", "NODE2", "NODE3", "NODE4", "NODE5", "NODE6"]
}
Response
The response is similar to:
{
"id": "aaaaaaaaaaaa25fd67fa8715fd547d3d"
}
Response Code | Description |
---|---|
201 | Created. The object was created. |
400 | Bad request. The contents of the request are not formatted correctly. |
401 | Unauthorized. The user or client who made the request could not be authenticated. Verify the user/client name, and that the correct key was used to sign the request. |
403 | Forbidden. The user who made the request is not authorized to perform the action. |
404 | Not found. The requested object does not exist. |
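The request body for starting a job can be assembled with Ruby's standard `json` library. This is a sketch only: the `job_request_body` helper is hypothetical, and it covers just the three fields shown in the example body (`command`, `run_timeout`, `nodes`). Signing and sending the request are not shown.

```ruby
require 'json'

# Hypothetical helper: build the JSON body for POST .../pushy/jobs.
# Only the fields shown in the documented example body are included.
def job_request_body(command, nodes, run_timeout: 300)
  body = {
    'command'     => command,
    'run_timeout' => run_timeout,
    'nodes'       => nodes,
  }
  JSON.generate(body)
end

puts job_request_body('chef-client', %w[NODE1 NODE2])
# prints {"command":"chef-client","run_timeout":300,"nodes":["NODE1","NODE2"]}
```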
jobs/ID
The /organizations/ORG_NAME/pushy/jobs/ID endpoint has the following methods: GET.
GET
The GET method is used to get the status of an individual job, including node state (running, complete, crashed).
This method has no parameters.
Request
GET /organizations/ORG_NAME/pushy/jobs/ID
Response
The response will return something similar to:
{
  "id": "aaaaaaaaaaaa25fd67fa8715fd547d3d",
  "command": "chef-client",
  "run_timeout": 300,
  "status": "running",
  "created_at": "Tue, 04 Sep 2012 23:01:02 GMT",
  "updated_at": "Tue, 04 Sep 2012 23:17:56 GMT",
  "nodes": {
    "running": ["NODE1", "NODE5"],
    "complete": ["NODE2", "NODE3", "NODE4"],
    "crashed": ["NODE6"]
  }
}
where:
- nodes is one of the following: aborted (node ran command, stopped before completion), complete (node ran command to completion), crashed (node went down after command started running), nacked (node was busy), new (node has not accepted or rejected command), ready (node has accepted command, command has not started running), running (node has accepted command, command is running), and unavailable (node went down before command started).
- status is one of the following: aborted (the job was aborted), complete (the job completed; see nodes for individual node status), quorum_failed (the command was not run on any nodes), running (the command is running), timed_out (the command timed out), and voting (waiting for nodes; quorum not yet met).
- updated_at is the date and time at which the job entered its present status.
Response Code | Description |
---|---|
200 | OK. The request was successful. |
400 | Bad request. The contents of the request are not formatted correctly. |
401 | Unauthorized. The user or client who made the request could not be authenticated. Verify the user/client name, and that the correct key was used to sign the request. |
403 | Forbidden. The user who made the request is not authorized to perform the action. |
404 | Not found. The requested object does not exist. |
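A job status response of the shape documented for this endpoint can be summarized per node state with the standard `json` library. The helper names below are hypothetical, and the sample document is a shortened copy of the example response rather than output from a real server.

```ruby
require 'json'

# Shortened copy of the documented example response (not real server output).
response = <<~BODY
  {
    "id": "aaaaaaaaaaaa25fd67fa8715fd547d3d",
    "status": "running",
    "nodes": {
      "running": ["NODE1", "NODE5"],
      "complete": ["NODE2", "NODE3", "NODE4"],
      "crashed": ["NODE6"]
    }
  }
BODY

# Hypothetical helper: number of nodes in each reported state.
def node_counts(job)
  job.fetch('nodes', {}).transform_values(&:length)
end

# Hypothetical helper: node names in one state (empty if the state is absent).
def nodes_in(job, state)
  job.fetch('nodes', {}).fetch(state, [])
end

job = JSON.parse(response)
puts node_counts(job).inspect
puts nodes_in(job, 'crashed').inspect # prints ["NODE6"]
```

States with no nodes are simply absent from the example response, so `nodes_in` returns an empty list for them instead of failing.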
node_states
The /organizations/ORG_NAME/pushy/node_states endpoint has the following methods: GET.
GET
The GET method is used to get a list of nodes and their status (up or down).
This method has no parameters.
Request
GET /organizations/ORG_NAME/pushy/node_states
Response
The response is similar to:
[
  {
    "node_name": "FARQUAD",
    "status": "up",
    "updated_at": "Tue, 04 Sep 2012 23:17:56 GMT"
  },
  {
    "node_name": "DONKEY",
    "status": "up",
    "updated_at": "Tue, 04 Sep 2012 23:17:56 GMT"
  },
  {
    "node_name": "FIONA",
    "status": "down",
    "updated_at": "Tue, 04 Sep 2012 23:17:56 GMT"
  }
]
The following values are possible for status: up or down.
Response Code | Description |
---|---|
200 | OK. The request was successful. |
400 | Bad request. The contents of the request are not formatted correctly. |
401 | Unauthorized. The user or client who made the request could not be authenticated. Verify the user/client name, and that the correct key was used to sign the request. |
403 | Forbidden. The user who made the request is not authorized to perform the action. |
404 | Not found. The requested object does not exist. |
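A node status listing can be filtered for nodes that are down. This sketch assumes the response parses to a JSON array of objects with `node_name` and `status` fields, as in the example. The `down_nodes` helper is hypothetical and the sample data is illustrative.

```ruby
require 'json'

# Illustrative sample, ASSUMING the response is a JSON array of node objects.
response = <<~BODY
  [
    { "node_name": "FARQUAD", "status": "up" },
    { "node_name": "DONKEY", "status": "up" },
    { "node_name": "FIONA", "status": "down" }
  ]
BODY

# Hypothetical helper: names of all nodes whose status is "down".
def down_nodes(node_states)
  node_states.select { |node| node['status'] == 'down' }
             .map { |node| node['node_name'] }
end

puts down_nodes(JSON.parse(response)).inspect # prints ["FIONA"]
```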
node_states/NODE_NAME
The /organizations/ORG_NAME/pushy/node_states/NODE_NAME endpoint has the following methods: GET.
GET
The GET method is used to get the status (up or down) for an individual node.
This method has no parameters.
Request
GET /organizations/ORG_NAME/pushy/node_states/NODE_NAME
Response
The response is similar to:
{
"node_name": "FIONA",
"status": "down",
"updated_at": "Tue, 04 Sep 2012 23:17:56 GMT"
}
where updated_at shows the date and time at which a node's status last changed.
Response Code | Description |
---|---|
200 | OK. The request was successful. |
400 | Bad request. The contents of the request are not formatted correctly. |
401 | Unauthorized. The user or client who made the request could not be authenticated. Verify the user/client name, and that the correct key was used to sign the request. |
403 | Forbidden. The user who made the request is not authorized to perform the action. |
404 | Not found. The requested object does not exist. |
push-jobs-client
The Chef Push Jobs executable can be run as a command-line tool.
Options
This command has the following syntax:
push-jobs-client OPTION VALUE OPTION VALUE ...
This command has the following options:
-c CONFIG, --config CONFIG
The configuration file to use. Chef Infra Client and Chef Push Jobs client use the same configuration file: client.rb. Default value: Chef::Config.platform_specific_path("/etc/chef/client.rb").
-h, --help
Show help for the command.
-k KEY_FILE, --client-key KEY_FILE
The location of the file that contains the client key.
-l LEVEL, --log_level LEVEL
The level of logging to be stored in a log file.
-L LOCATION, --logfile LOCATION
The location of the log file. This is recommended when starting any executable as a daemon.
-N NODE_NAME, --node-name NODE_NAME
The unique identifier of the node.
-S URL, --server URL
The URL for the Chef Infra Server.
-v, --version
The version of Chef Push Jobs.
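As an illustration of the OPTION VALUE syntax, the following sketch assembles a push-jobs-client command line from the long options listed here. The `push_jobs_client_command` helper, the node name, the log file path, and the server URL are all hypothetical. The sketch only builds and prints the command; it does not run it.

```ruby
require 'shellwords'

# Hypothetical helper: assemble a push-jobs-client argument list using
# the long options documented in this section.
def push_jobs_client_command(config:, node_name:, server:, logfile: nil)
  argv = ['push-jobs-client',
          '--config', config,
          '--node-name', node_name,
          '--server', server]
  argv += ['--logfile', logfile] if logfile
  argv
end

argv = push_jobs_client_command(
  config: '/etc/chef/client.rb',
  node_name: 'chico',                                         # hypothetical node
  server: 'https://chef.example.com/organizations/ORG_NAME',  # hypothetical URL
  logfile: '/var/log/push-jobs-client.log'                    # hypothetical path
)

# Shellwords.join quotes each argument so the line is safe to paste into a shell.
puts Shellwords.join(argv)
```

Keeping the arguments as an array until the last step avoids quoting mistakes when a value contains spaces.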
opscode-push-jobs-server.rb
The opscode-push-jobs-server.rb file is used to specify the configuration settings used by the Chef Push Jobs server.
This file is the default configuration file and is located at: /etc/opscode-push-jobs-server.
Settings
This configuration file has the following settings:
api_port
NGINX forwards requests to this port on the push-jobs server as part of the push-jobs communication channel. Default value: 10003.
command_port
The port on which a Chef Push Jobs server listens for requests that are to be executed on managed nodes. Default value: 10002.
heartbeat_interval
The frequency of the Chef Push Jobs server heartbeat message. Default value: 1000 (milliseconds).
server_heartbeat_port
The port on which the Chef Push Jobs server receives heartbeat messages from each Chef Push Jobs client. (This port is the ROUTER half of the ZeroMQ DEALER / ROUTER pattern.) Default value: 10000.
server_name
The name of the Chef Push Jobs server.
zeromq_listen_address
The IP address used by ZeroMQ. Default value: tcp://*.
© Chef Software, Inc.
Licensed under the Creative Commons Attribution 3.0 Unported License.
The Chef™ Mark and Chef Logo are either registered trademarks/service marks or trademarks/servicemarks of Chef, in the United States and other countries and are used with Chef Inc's permission.
We are not affiliated with, endorsed or sponsored by Chef Inc.
https://docs.chef.io/push_jobs/