Commit 78fd25cb authored by Michael Schurter, committed by GitHub

Merge pull request #10550 from hashicorp/docs-rtd

Remote Task Driver docs
Showing with 293 additions and 24 deletions
---
layout: docs
page_title: 'Task Driver Plugins: ECS Task Driver'
description: >-
  The AWS ECS Task Driver is an example Remote Task Driver.
---
# ECS Task Driver
Name: `nomad-driver-ecs`
Homepage: https://github.com/hashicorp/nomad-driver-ecs
~> **Note:** The ECS Task Driver is an example Remote Task Driver and **not
intended for production use.**
The ECS task driver plugin for Nomad allows running [AWS ECS][ecs] tasks via
Nomad. Allocations for these jobs are scheduled onto Nomad clients as with
traditional task drivers; however, the actual task is executed remotely in
AWS ECS. The Nomad client agent manages the remote ECS task like any other
local Nomad task: restarting it if it fails, stopping it when requested, etc.

When a Nomad node that has been assigned allocations with ECS tasks is drained,
the ECS tasks are *not stopped.* Instead, the replacement allocations reconnect
to the original ECS tasks to avoid unnecessary downtime.

If a Nomad node that has been assigned allocations with ECS tasks crashes and is
considered `down`, the replacement allocations for the `lost` allocations
reconnect to the original ECS tasks to avoid unnecessary downtime. If the
original crashed Nomad node restarts, it will detect the `lost` allocations and
stop monitoring them since a new node has taken over.
## Client Requirements
The AWS ECS Task Driver is not currently built into Nomad and must be
[downloaded][download] onto the client host in the configured [plugin
directory][plugin_dir].
### Plugin Options
Once the plugin binary is installed, the plugin must be configured in your
Nomad client agent's HCL:
```hcl
plugin "nomad-driver-ecs" {
  config {
    enabled = true

    # AWS ECS Cluster to run tasks in
    cluster = "nomad-remote-driver-cluster"

    # AWS ECS Region to run tasks in
    region = "us-east-1"
  }
}
```
- `cluster` - The [AWS ECS cluster][cluster] to run tasks in.
- `region` - The [AWS region][region] to run tasks in.
## Task Configuration
Nomad ECS tasks must first be defined for the ECS cluster. See the [Nomad ECS
driver demo][demo] for an example ECS task provisioned by Terraform.
Once the ECS task is provisioned, Nomad may run it via a job:
```hcl
job "nomad-ecs-demo" {
  datacenters = ["dc1"]

  group "ecs-remote-task-demo" {
    restart {
      attempts = 0
      mode     = "fail"
    }

    reschedule {
      delay = "5s"
    }

    task "http-server" {
      driver       = "ecs"
      kill_timeout = "1m" // increased from default to accommodate ECS.

      config {
        task {
          launch_type     = "FARGATE"
          task_definition = "nomad-remote-driver-demo:1"

          network_configuration {
            aws_vpc_configuration {
              assign_public_ip = "ENABLED"
              security_groups  = ["sg-0d647d4c7ce15034f"]
              subnets          = ["subnet-010b03f1a021887ff"]
            }
          }
        }
      }
    }
  }
}
```
- `config.task` - The stanza that defines the configuration of the ECS task:
  - `launch_type` - The launch type on which to run your task.
  - `task_definition` - The family and revision (`family:revision`) or full ARN
    of the task definition to run.
  - `network_configuration` - The network configuration for the task (e.g.
    `awsvpc` for `FARGATE` tasks).
    - `aws_vpc_configuration` - The VPC subnets and security groups associated with a task.
      - `assign_public_ip` - Whether the task's elastic network interface receives a public IP address.
      - `security_groups` - The security groups associated with the task or service.
      - `subnets` - The subnets associated with the task or service.
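
The two accepted forms of `task_definition` can be told apart with a few lines of Go. The following is a minimal sketch, not code from `nomad-driver-ecs`: the `TaskDefRef` type and `ParseTaskDefinition` function are hypothetical names, and the sketch assumes that anything starting with `arn:` is a full ARN and that revisions are positive integers.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// TaskDefRef is a hypothetical helper type for this sketch; it is not part
// of nomad-driver-ecs.
type TaskDefRef struct {
	ARN      string // set when the value was given as a full ARN
	Family   string // set when the value was given as family:revision
	Revision int
}

// ParseTaskDefinition accepts either "family:revision" or a full ARN.
// Assumption: any value starting with "arn:" is treated as an ARN and is
// passed through without further validation.
func ParseTaskDefinition(s string) (TaskDefRef, error) {
	if strings.HasPrefix(s, "arn:") {
		return TaskDefRef{ARN: s}, nil
	}
	family, rev, ok := strings.Cut(s, ":")
	if !ok || family == "" {
		return TaskDefRef{}, fmt.Errorf("expected family:revision, got %q", s)
	}
	n, err := strconv.Atoi(rev)
	if err != nil || n < 1 {
		return TaskDefRef{}, fmt.Errorf("invalid revision %q in %q", rev, s)
	}
	return TaskDefRef{Family: family, Revision: n}, nil
}

func main() {
	ref, err := ParseTaskDefinition("nomad-remote-driver-demo:1")
	fmt.Println(ref.Family, ref.Revision, err) // nomad-remote-driver-demo 1 <nil>
}
```

Validating the value before submitting a job surfaces typos early instead of at task start time.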
[cluster]: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/clusters.html
[demo]: https://github.com/hashicorp/nomad-driver-ecs/blob/main/demo/terraform/ecs.tf
[download]: https://releases.hashicorp.com/nomad-ecs-driver/
[ecs]: https://aws.amazon.com/ecs/
[plugin_dir]: /docs/configuration#plugin_dir
[region]: https://docs.aws.amazon.com/general/latest/gr/ecs-service.html
---
layout: docs
page_title: 'Task Driver Plugins: Remote Task Drivers'
description: >-
  Remote Task Drivers support cloud and other nonlocal task runtime
  environments.
---
# Remote Task Drivers
~> **Note:** Remote Task Driver support is experimental and subject to backward
incompatible changes between Nomad releases or deprecation. Please refer to the
[Upgrade Guide][upgrade] to find breaking changes.
~> **Known Bugs:** When a Nomad node running a remote task driver goes down,
another node must be available and able to run the replacement allocation in
order to take advantage of the remote task driver's ability to avoid restarting
lost tasks. If no such node is immediately available and one is started later,
it will start a new instance of the remote task instead of reconnecting to the
old one. Follow [#10592][gh-10592] for the fix.
Nomad 1.1.0 introduces support for Remote Task Drivers. Remote Task Drivers
allow custom task driver plugins to execute tasks using nonlocal runtimes such
as cloud container runtimes. Without this support, task driver plugins trying
to manage remote tasks would run into the following problems:
1. When [draining][drain] a node, Nomad stops all allocations on that node
before rescheduling them.
2. When a node is `down`, Nomad reschedules all allocations onto other nodes.
These 2 behaviors are optimal for traditional task drivers where the task
process is colocated with the Nomad agent. If the Nomad node is down or
drained, the allocations should be considered down or be drained as well.
However, these 2 behaviors do not apply to tasks executing on remote runtimes.
If the Nomad node managing them goes down, a new Nomad node should be able to
manage them without restarting the task. Likewise, if the Nomad node managing
the remote task is drained, a new Nomad node should manage the remote task
without requiring that it be stopped and restarted.
The Remote Task Driver feature in Nomad 1.1.0 improves these behaviors for
custom plugins that advertise the [`RemoteTasks` capability][remote-cap].
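
The difference between the two models can be sketched as a single decision made for a replacement allocation. This is an illustrative sketch only, not Nomad's client code: `TaskHandle`, `Action`, and `ReplacementAction` are hypothetical names, and the sketch assumes that reconnecting needs both the `RemoteTasks` capability and a handle carried over from the previous allocation.

```go
package main

import "fmt"

// TaskHandle is a hypothetical stand-in for the opaque handle a task driver
// returns when it starts a task.
type TaskHandle struct {
	DriverState string // e.g. an identifier for the remote task
}

// Action describes what a replacement allocation does with its task.
type Action string

const (
	StartNew  Action = "start a new task"
	Reconnect Action = "reconnect to the existing remote task"
)

// ReplacementAction models the choice for a replacement allocation after a
// drain or node failure. Assumption: without the RemoteTasks capability, or
// without a handle from the previous allocation, a new task is started.
func ReplacementAction(remoteTasks bool, prev *TaskHandle) Action {
	if remoteTasks && prev != nil {
		return Reconnect
	}
	return StartNew
}

func main() {
	prev := &TaskHandle{DriverState: "remote-task-id"}
	fmt.Println(ReplacementAction(false, prev)) // traditional driver
	fmt.Println(ReplacementAction(true, prev))  // remote driver, handle available
	fmt.Println(ReplacementAction(true, nil))   // remote driver, no handle available
}
```

The third call models the case where no previous handle is available to the replacement, in which the remote task is started anew.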
## Caveats
Due to their experimental nature, Remote Task Drivers do not support a number
of standard Nomad features by default.
### Resources
~> See [*Remote Task Drivers and Resources* #10549][gh-10549] on GitHub for
details. Comments, ideas, and use cases welcome!
The [`resources`][resources] stanza has not been altered for remote task
drivers. Since remote tasks do not consume local resources, remote task drivers
should not use the existing `resources` stanza and instead implement their own
resource parameters in their [`task.config`][task-config] block.
Jobs using remote task drivers should use the minimum allowed resources in
their [`task.resources`][resources] stanza:
```hcl
resources {
  cpu    = 1
  memory = 10
}
```
### Nomad Client Features
Remote task drivers defer most Nomad client features to the driver plugin.
Since the allocation directory is local to the Nomad node, unless a remote task
driver is able to remotely mount or copy its contents, the following features
will be unavailable:
- [`artifact`][artifact] - artifacts are downloaded to the local Nomad
allocation directory.
- [`dispatch_payload`][dispatch-payload] - dispatch payloads are written to the
local Nomad allocation directory.
- [`ephemeral_disk`][ephemeral-disk] - ephemeral disks are local to the Nomad
node and therefore not applicable to remote task drivers.
- [`template`][template] - templates are rendered in the local Nomad allocation
directory.
- [`vault`][vault] - a secret token may be retrieved but the task driver may
not place the token file in the expected location.
- [`volume`][volume] - volumes and volume mounts assume tasks have access to
the Nomad node's local disk and are unlikely to work with remote task drivers.
Furthermore, since networking is handled entirely by the remote runtime, the
behavior of the following features is completely driver dependent:
- [`network`][network] - group networks are created on the local Nomad node,
and task networks are up to the remote task driver to implement.
- [`connect`][connect] - since group networks and sidecars are local to the
Nomad node, Consul Connect sidecars will not work as expected.
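
The two lists above lend themselves to a simple pre-submission check. The following sketch is a hypothetical lint, not a Nomad feature: the `Warnings` function and both lookup tables are invented for illustration, with the block names and reasons paraphrased from the lists.

```go
package main

import (
	"fmt"
	"sort"
)

// localOnly lists blocks that rely on the local allocation directory or
// local disk. Reasons paraphrase the documentation above.
var localOnly = map[string]string{
	"artifact":         "downloaded to the local allocation directory",
	"dispatch_payload": "written to the local allocation directory",
	"ephemeral_disk":   "local to the Nomad node",
	"template":         "rendered in the local allocation directory",
	"vault":            "token file may not be placed where the task expects it",
	"volume":           "assumes access to the Nomad node's local disk",
}

// driverDependent lists blocks whose behavior is up to the remote driver.
var driverDependent = map[string]string{
	"network": "task networking is up to the remote task driver",
	"connect": "sidecars and group networks are local to the Nomad node",
}

// Warnings is a hypothetical lint: given the block names used by a job that
// targets a remote task driver, it returns one sorted warning per block that
// the caveats above mention. Other blocks are ignored.
func Warnings(blocks []string) []string {
	var out []string
	for _, b := range blocks {
		if why, ok := localOnly[b]; ok {
			out = append(out, fmt.Sprintf("%s: likely unavailable (%s)", b, why))
		} else if why, ok := driverDependent[b]; ok {
			out = append(out, fmt.Sprintf("%s: driver dependent (%s)", b, why))
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	for _, w := range Warnings([]string{"template", "service", "network"}) {
		fmt.Println(w)
	}
}
```

A driver that can mount or copy the allocation directory remotely would need a shorter `localOnly` table.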
[artifact]: /docs/job-specification/artifact
[connect]: /docs/job-specification/connect
[dispatch-payload]: /docs/job-specification/dispatch_payload
[drain]: /docs/commands/node/drain
[ephemeral-disk]: /docs/job-specification/ephemeral_disk
[gh-10549]: https://github.com/hashicorp/nomad/issues/10549
[gh-10592]: https://github.com/hashicorp/nomad/issues/10592
[network]: /docs/job-specification/network
[remote-cap]: /docs/internals/plugins/task-drivers#capabilities-capabilities-error
[resources]: /docs/job-specification/resources
[task-config]: /docs/job-specification/task#config
[template]: /docs/job-specification/template
[upgrade]: /docs/upgrade/upgrade-specific
[vault]: /docs/job-specification/vault
[volume]: /docs/job-specification/volume
@@ -45,36 +45,32 @@ plugin][baseplugin] documentation.
Capabilities define what features the driver implements. Example:
```go
type Capabilities struct {
	// SendSignals marks the driver as being able to send signals
	SendSignals bool

	// Exec marks the driver as being able to execute arbitrary commands
	// such as health checks. Used by the ScriptExecutor interface.
	Exec bool

	// FSIsolation indicates what kind of filesystem isolation the driver supports.
	FSIsolation FSIsolation

	// NetIsolationModes lists the set of isolation modes supported by the driver
	NetIsolationModes []NetIsolationMode

	// MustInitiateNetwork tells Nomad that the driver must create the network
	// namespace and that the CreateNetwork and DestroyNetwork RPCs are implemented.
	MustInitiateNetwork bool

	// MountConfigs tells Nomad which mounting config options the driver supports.
	MountConfigs MountConfigSupport

	// RemoteTasks indicates this driver runs tasks on remote systems
	// instead of locally. The Nomad client can use this information to
	// adjust behavior such as propagating task handles between allocations
	// to avoid downtime when a client is lost.
	RemoteTasks bool
}
```
@@ -96,6 +92,30 @@ The network isolation modes are:
- `NetIsolationModeNone`: There is no network to isolate. This is used for
  tasks that the client manages remotely.
#### Remote Task Drivers
[Remote Task Drivers][rtd] should set `RemoteTasks` to `true`. Remote Task
Drivers are task driver plugins that execute tasks on a different system than
the Nomad client. This means the task's lifecycle is distinct from the Nomad
client's.
For task driver plugin authors there are 2 important new behaviors when
`RemoteTasks` is `true`:
1. The `TaskHandle` returned by `StartTask` will be propagated to replacement
allocations if the Nomad client is drained or down. Nomad will call
`RecoverTask` instead of `StartTask` for remote tasks in replacement
allocations when a `TaskHandle` has been propagated from the previous
allocation.
2. If the Nomad client managing a remote task is drained or if the allocation
was `lost`, the remote task is sent a special `DETACH` kill signal. This
indicates the plugin should stop managing the remote task, but *not* stop
it.
These behaviors are meant to keep remote tasks running even when the Nomad
client managing them is shut down. As with traditional tasks, remote tasks are
stopped when the job is explicitly stopped.
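
From the plugin's side, the two behaviors might look roughly like the sketch below. It is a toy model, not the real plugin interface: the `Driver`, `TaskHandle`, and `remoteRuntime` types are local stand-ins, the method signatures are simplified, and the `detachSignal` constant simply reuses the `DETACH` name from the text above.

```go
package main

import (
	"errors"
	"fmt"
)

// detachSignal mirrors the special DETACH kill signal described above.
// Every other name in this sketch is hypothetical.
const detachSignal = "DETACH"

// TaskHandle carries what a replacement needs to find the remote task.
type TaskHandle struct{ RemoteID string }

// remoteRuntime fakes a remote system that outlives any single Nomad client.
type remoteRuntime struct{ running map[string]bool }

// Driver is a toy remote task driver instance running on one Nomad client.
type Driver struct {
	remote  *remoteRuntime
	managed map[string]bool // remote IDs this instance is monitoring
}

// StartTask launches a new remote task and returns its handle.
func (d *Driver) StartTask(name string) *TaskHandle {
	id := "remote-" + name
	d.remote.running[id] = true
	d.managed[id] = true
	return &TaskHandle{RemoteID: id}
}

// RecoverTask resumes managing an already running remote task.
func (d *Driver) RecoverTask(h *TaskHandle) error {
	if h == nil || !d.remote.running[h.RemoteID] {
		return errors.New("remote task not found")
	}
	d.managed[h.RemoteID] = true
	return nil
}

// StopTask always stops managing the task, but only stops the remote task
// itself when the signal is not DETACH.
func (d *Driver) StopTask(h *TaskHandle, signal string) {
	delete(d.managed, h.RemoteID)
	if signal == detachSignal {
		return // leave the remote task running for a replacement to recover
	}
	delete(d.remote.running, h.RemoteID)
}

func main() {
	remote := &remoteRuntime{running: map[string]bool{}}
	oldNode := &Driver{remote: remote, managed: map[string]bool{}}
	newNode := &Driver{remote: remote, managed: map[string]bool{}}

	h := oldNode.StartTask("http-server")
	oldNode.StopTask(h, detachSignal) // node drained: detach only
	err := newNode.RecoverTask(h)     // replacement reconnects via the handle
	fmt.Println(remote.running[h.RemoteID], err) // true <nil>

	newNode.StopTask(h, "SIGTERM") // job stopped: the remote task is stopped too
	fmt.Println(remote.running[h.RemoteID]) // false
}
```

Keeping the detach check inside `StopTask` means the driver has a single exit path for a task, whether it is being handed over or stopped.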
### `Fingerprint(context.Context) (<-chan *Fingerprint, error)`
This function is called by the client when the plugin is started. It allows the
@@ -233,3 +253,4 @@ inside the running container. `ExecTask` is called for Consul script checks.
[taskconfig]: https://godoc.org/github.com/hashicorp/nomad/plugins/drivers#TaskConfig
[taskhandle]: https://godoc.org/github.com/hashicorp/nomad/plugins/drivers#TaskHandle
[fifopackage]: https://godoc.org/github.com/hashicorp/nomad/client/lib/fifo
[rtd]: /docs/drivers/remote
@@ -1414,6 +1414,19 @@
"path": "drivers/external/iis"
}
]
},
{
"title": "Remote",
"routes": [
{
"title": "Overview",
"path": "drivers/remote"
},
{
"title": "ECS",
"path": "drivers/remote/ecs"
}
]
}
]
},