This project is mirrored from https://gitee.com/mirrors/nomad.git.
- 23 Mar, 2020 (40 commits)
-
-
Mahmood Ali authored
Display task lifecycle info in `nomad alloc status <alloc_id>` output. I chose to embed it in the Task header and only add it for tasks with lifecycle info. Also, I chose to order the tasks as follows (a sorting sketch follows the sample output):

1. prestart non-sidecar tasks
2. prestart sidecar tasks
3. main tasks

The tasks are sorted lexicographically within each tier.

Sample output:

```
$ nomad alloc status 6ec0eb52
ID      = 6ec0eb52-e6c8-665c-169c-113d6081309b
Eval ID = fb0caa98
Name    = lifecycle.cache[0]
[...]

Task "init" (prestart) is "dead"
Task Resources
CPU        Memory       Disk     Addresses
0/500 MHz  0 B/256 MiB  300 MiB

[...]

Task "some-sidecar" (prestart sidecar) is "running"
Task Resources
CPU        Memory          Disk     Addresses
0/500 MHz  68 KiB/256 MiB  300 MiB

[...]

Task "redis" is "running"
Task Resources
CPU         Memory           Disk     Addresses
10/500 MHz  984 KiB/256 MiB  300 MiB

[...]
```
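The ordering above amounts to a tiered sort. A minimal sketch, using a hypothetical `taskInfo` type rather than the CLI's actual structs:

```go
package main

import (
	"fmt"
	"sort"
)

// taskInfo is a hypothetical stand-in for the CLI's view of a task.
type taskInfo struct {
	Name     string
	Prestart bool // task has a prestart lifecycle hook
	Sidecar  bool // prestart task is a sidecar
}

// tier maps a task to its display tier: prestart non-sidecar tasks first,
// then prestart sidecar tasks, then main tasks.
func tier(t taskInfo) int {
	switch {
	case t.Prestart && !t.Sidecar:
		return 0
	case t.Prestart && t.Sidecar:
		return 1
	default:
		return 2
	}
}

func sortForDisplay(tasks []taskInfo) {
	sort.Slice(tasks, func(i, j int) bool {
		if tier(tasks[i]) != tier(tasks[j]) {
			return tier(tasks[i]) < tier(tasks[j])
		}
		// lexicographic within each tier
		return tasks[i].Name < tasks[j].Name
	})
}

func main() {
	tasks := []taskInfo{
		{Name: "redis"},
		{Name: "some-sidecar", Prestart: true, Sidecar: true},
		{Name: "init", Prestart: true},
	}
	sortForDisplay(tasks)
	for _, t := range tasks {
		fmt.Println(t.Name)
	}
	// Output order: init, some-sidecar, redis
}
```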
-
Mahmood Ali authored
build darwin binaries in CI
-
Mahmood Ali authored
-
Drew Bailey authored
fix compilation with correct func
-
Drew Bailey authored
-
Tim Gross authored
Container Storage Interface Support
-
Drew Bailey authored
Audit config, seams for enterprise audit features
-
Lang Martin authored
Add mount_options to both the volume definition on registration and to the volume block in the group where the volume is requested. If both are specified, the options provided in the request replace the options defined in the volume (see the sketch below). They get passed to NodePublishVolume, which causes the node plugin to actually mount the volume on the host. Individual tasks just bind-mount into the host-mounted volume (unchanged behavior). An operator can mount the same volume with different options by specifying it twice in the group context.

closes #7007

* nomad/structs/volumes: add MountOptions to volume request
* jobspec/test-fixtures/basic.hcl: add mount_options to volume block
* jobspec/parse_test: add expected MountOptions
* api/tasks: add mount_options
* jobspec/parse_group: use hcl decode not mapstructure, mount_options
* client/allocrunner/csi_hook: pass MountOptions through

  client/allocrunner/csi_hook: add a VolumeMountOptions
  client/allocrunner/csi_hook: drop Options
  client/allocrunner/csi_hook: use the structs options
* client/pluginmanager/csimanager/interface: UsageOptions.MountOptions
* client/pluginmanager/csimanager/volume: pass MountOptions in capabilities
* plugins/csi/plugin: remove todo 7007 comment
* nomad/structs/csi: MountOptions
* api/csi: add options to the api for parsing, match structs
* plugins/csi/plugin: move VolumeMountOptions to structs
* api/csi: use specific type for mount_options
* client/allocrunner/csi_hook: merge MountOptions here
* rename CSIOptions to CSIMountOptions
* client/allocrunner/csi_hook
* client/pluginmanager/csimanager/volume
* nomad/structs/csi
* plugins/csi/fake/client: add PrevVolumeCapability
* plugins/csi/plugin
* client/pluginmanager/csimanager/volume_test: remove debugging
* client/pluginmanager/csimanager/volume: fix odd merging logic
* api: rename CSIOptions -> CSIMountOptions
* nomad/csi_endpoint: remove a 7007 comment
* command/alloc_status: show mount options in the volume list
* nomad/structs/csi: include MountOptions in the volume stub
* api/csi: add MountOptions to stub
* command/volume_status_csi: clean up csiVolMountOption, add it
* command/alloc_status: csiVolMountOption lives in volume_csi_status
* command/node_status: display mount flags
* nomad/structs/volumes: npe
* plugins/csi/plugin: npe in ToCSIRepresentation
* jobspec/parse_test: expand volume parse test cases
* command/agent/job_endpoint: ApiTgToStructsTG needs MountOptions
* command/volume_status_csi: copy paste error
* jobspec/test-fixtures/basic: hclfmt
* command/volume_status_csi: clean up csiVolMountOption
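A minimal sketch of the override behavior described above, using a hypothetical `mountOptions` type rather than Nomad's structs; whether the real code replaces options wholesale or field by field is an implementation detail, and this sketch does it field by field:

```go
package main

import "fmt"

// mountOptions is a hypothetical stand-in for a volume's mount options.
type mountOptions struct {
	FSType     string
	MountFlags []string
}

// merge returns the registered volume's options overridden by any options
// provided in the group's volume request.
func merge(registered, requested *mountOptions) *mountOptions {
	out := &mountOptions{}
	if registered != nil {
		*out = *registered
	}
	if requested != nil {
		if requested.FSType != "" {
			out.FSType = requested.FSType
		}
		if len(requested.MountFlags) > 0 {
			out.MountFlags = requested.MountFlags
		}
	}
	return out
}

func main() {
	registered := &mountOptions{FSType: "ext4", MountFlags: []string{"noatime"}}
	requested := &mountOptions{MountFlags: []string{"ro"}}
	// The merged result is what would be handed to the node plugin.
	fmt.Printf("%+v\n", merge(registered, requested)) // &{FSType:ext4 MountFlags:[ro]}
}
```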
-
Tim Gross authored
-
Tim Gross authored
Run the plugin fingerprint one last time with a closed client during instance manager shutdown. This will return quickly and will give us a correctly-populated `PluginInfo` marked as unhealthy so the Nomad client can update the server about plugin health.
-
Tim Gross authored
Allow for faster updates to plugin status when allocations become terminal by listening for register/deregister events from the dynamic plugin registry (which in turn are triggered by the plugin supervisor hook).

The deregistration function closures that we pass up to the CSI plugin manager don't properly close over the name and type of the registration, causing monolith-type plugins to deregister only one of their two plugins on alloc shutdown. Rebind plugin supervisor deregistration targets to fix that.

Includes log message and comment improvements.
-
Lang Martin authored
* acl/policy: add PolicyList for global ACLs
* acl/acl: plugin policy
* acl/acl: maxPrivilege is required to allow "list"
* nomad/csi_endpoint: enforce plugin access with PolicyPlugin
* nomad/csi_endpoint: check job ACL swapped params
* nomad/csi_endpoint_test: test alloc filtering
* acl/policy: add namespace csi-register-plugin
* nomad/job_endpoint: check csi-register-plugin ACL on registration
* nomad/job_endpoint_test: add plugin job cases
-
Lang Martin authored
* acl/policy: add the volume ACL policies
* nomad/csi_endpoint: enforce ACLs for volume access
* nomad/search_endpoint_oss: volume acls
* acl/acl: add plugin read as a global policy
* acl/policy: add PluginPolicy global cap type
* nomad/csi_endpoint: check the global plugin ACL policy
* nomad/mock/acl: PluginPolicy
* nomad/csi_endpoint: fix list rebase
* nomad/core_sched_test: new test since #7358
* nomad/csi_endpoint_test: use correct permissions for list
* nomad/csi_endpoint: allowCSIMount keeps ACL checks together
* nomad/job_endpoint: check mount permission for jobs
* nomad/job_endpoint_test: need plugin read, too
-
Lang Martin authored
* nomad/state/schema: use the namespace compound index
* scheduler/scheduler: CSIVolumeByID interface signature namespace
* scheduler/stack: SetJob on CSIVolumeChecker to capture namespace
* scheduler/feasible: pass the captured namespace to CSIVolumeByID
* nomad/state/state_store: use namespace in csi_volume index
* nomad/fsm: pass namespace to CSIVolumeDeregister & Claim
* nomad/core_sched: pass the namespace in volumeClaimReap
* nomad/node_endpoint_test: namespaces in Claim testing
* nomad/csi_endpoint: pass RequestNamespace to state.*
* nomad/csi_endpoint_test: appropriately failed test
* command/alloc_status_test: appropriately failed test
* node_endpoint_test: avoid notTheNamespace for the job
* scheduler/feasible_test: call SetJob to capture the namespace
* nomad/csi_endpoint: ACL check the req namespace, query by namespace
* nomad/state/state_store: remove deregister namespace check
* nomad/state/state_store: remove unused CSIVolumes
* scheduler/feasible: CSIVolumeChecker SetJob -> SetNamespace
* nomad/csi_endpoint: ACL check
* nomad/state/state_store_test: remove call to state.CSIVolumes
* nomad/core_sched_test: job namespace match so claim gc works
-
Tim Gross authored
-
Tim Gross authored
This changeset implements the remaining controller detach RPCs: server-to-client and client-to-controller.

The tests also uncovered a bug in our RPC for claims, which is fixed here: the volume claim RPC is used for both claiming and releasing a claim on a volume. We should only submit a controller publish RPC when the claim is new, not when it's being released.
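A minimal sketch of the claim-versus-release distinction, with hypothetical names (`claimMode`, `maybeControllerPublish`) rather than the actual RPC handler:

```go
package main

import "fmt"

// claimMode is a hypothetical stand-in for the claim RPC's mode: the same
// RPC is used both to take a claim and to release one.
type claimMode int

const (
	claimWrite claimMode = iota
	claimRelease
)

// maybeControllerPublish only issues the controller publish call for new
// claims; releasing a claim must not re-publish the volume.
func maybeControllerPublish(mode claimMode, controllerRequired bool, publish func() error) error {
	if mode == claimRelease || !controllerRequired {
		return nil
	}
	return publish()
}

func main() {
	publish := func() error { fmt.Println("controller publish"); return nil }
	_ = maybeControllerPublish(claimWrite, true, publish)   // publishes
	_ = maybeControllerPublish(claimRelease, true, publish) // no publish on release
}
```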
-
Tim Gross authored
This changeset provides two basic e2e tests for CSI plugins targeting common AWS use cases.

The EBS test launches the EBS plugin (controller + nodes) and registers an EBS volume as a Nomad CSI volume. We deploy a job that writes to the volume, stop that job, and reuse the volume for another job which should be able to read the data written by the first job.

The EFS test launches the EFS plugin (nodes-only) and registers an EFS volume as a Nomad CSI volume. We deploy a job that writes to the volume, stop that job, and reuse the volume for another job which should be able to read the data written by the first job.

The writer jobs mount the CSI volume at a location within the alloc dir.
-
Tim Gross authored
Nomad clients will push node updates during client restart, which can cause an extra claim for a volume by the same alloc. If an alloc already claims a volume, we can allow it to be treated as a valid claim and continue.
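A minimal sketch of the idempotent claim check, on hypothetical types rather than the actual state-store structs:

```go
package main

import "fmt"

// volume is a hypothetical stand-in for a CSI volume's claim bookkeeping.
type volume struct {
	MaxWriters  int
	WriteClaims map[string]struct{} // alloc IDs holding write claims
}

// claimWrite accepts a repeated claim from the same alloc (e.g. after a
// client restart re-pushes node updates) instead of counting it twice.
func (v *volume) claimWrite(allocID string) error {
	if _, ok := v.WriteClaims[allocID]; ok {
		return nil // already claimed by this alloc: treat as valid and continue
	}
	if len(v.WriteClaims) >= v.MaxWriters {
		return fmt.Errorf("volume max claims reached")
	}
	v.WriteClaims[allocID] = struct{}{}
	return nil
}

func main() {
	v := &volume{MaxWriters: 1, WriteClaims: map[string]struct{}{}}
	fmt.Println(v.claimWrite("alloc-1")) // <nil>
	fmt.Println(v.claimWrite("alloc-1")) // <nil>: duplicate claim from the same alloc
	fmt.Println(v.claimWrite("alloc-2")) // error: volume max claims reached
}
```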
-
Tim Gross authored
In order to correctly fingerprint dynamic plugins on client restarts, we need to persist a handle to the plugin (that is, connection info) to the client state store. The dynamic registry will sync automatically to the client state whenever it receives a register/deregister call.
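A minimal sketch of the register/deregister sync, with a hypothetical `PluginHandle` and `stateDB` standing in for the real dynamic registry and client state store:

```go
package main

import "fmt"

// PluginHandle is a hypothetical persisted handle: enough connection info
// (e.g. the socket path) to re-fingerprint the plugin after a client restart.
type PluginHandle struct {
	Name       string
	Type       string
	SocketPath string
}

// stateDB stands in for the client state store.
type stateDB map[string]PluginHandle

// registry syncs every register/deregister call to the state store so the
// handles survive client restarts.
type registry struct{ state stateDB }

func (r *registry) Register(h PluginHandle) {
	r.state[h.Name] = h // persist connection info
}

func (r *registry) Deregister(name string) {
	delete(r.state, name)
}

func main() {
	r := &registry{state: stateDB{}}
	r.Register(PluginHandle{Name: "ebs-node", Type: "csi-node", SocketPath: "/csi/csi.sock"})
	fmt.Printf("%+v\n", r.state)
	r.Deregister("ebs-node")
	fmt.Printf("%+v\n", r.state)
}
```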
-
Lang Martin authored
* nomad/structs/csi: new RemoteID() uses the ExternalID if set (sketched below)
* nomad/csi_endpoint: pass RemoteID to volume request types
* client/pluginmanager/csimanager/volume: pass RemoteID to NodePublishVolume
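A minimal sketch of the `RemoteID()` behavior from the first bullet, defined on a hypothetical local struct rather than Nomad's `CSIVolume`:

```go
package main

import "fmt"

// csiVolume is a hypothetical stand-in with just the two ID fields.
type csiVolume struct {
	ID         string // Nomad's ID for the volume
	ExternalID string // the storage provider's ID, if it differs
}

// RemoteID returns the ID to use when talking to the storage provider:
// the ExternalID if set, otherwise the Nomad ID.
func (v *csiVolume) RemoteID() string {
	if v.ExternalID != "" {
		return v.ExternalID
	}
	return v.ID
}

func main() {
	fmt.Println((&csiVolume{ID: "vol-nomad"}).RemoteID())                            // vol-nomad
	fmt.Println((&csiVolume{ID: "vol-nomad", ExternalID: "vol-0abc123"}).RemoteID()) // vol-0abc123
}
```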
-
Tim Gross authored
Fix some docstring typos and fix a noisy log message during client restarts. A log for the common case where the plugin socket isn't ready yet isn't actionable by the operator, so having it at info is just noise.
-
Lang Martin authored
* command/agent/csi_endpoint: support type filter in volumes & plugins
* command/agent/http: use /v1/volume/csi & /v1/plugin/csi
* api/csi: use /v1/volume/csi & /v1/plugin/csi
* api/nodes: use /v1/volume/csi & /v1/plugin/csi
* api/nodes: not /volumes/csi, just /volumes
* command/agent/csi_endpoint: fix ot parameter parsing
-
Lang Martin authored
* api/allocations: GetTaskGroup finds the taskgroup struct
* command/node_status: display CSI volume names
* nomad/state/state_store: new CSIVolumesByNodeID
* nomad/state/iterator: new SliceIterator type implements memdb.ResultIterator (sketched below)
* nomad/csi_endpoint: deal with a slice of volumes
* nomad/state/state_store: CSIVolumesByNodeID return a SliceIterator
* nomad/structs/csi: CSIVolumeListRequest takes a NodeID
* nomad/csi_endpoint: use the return iterator
* command/agent/csi_endpoint: parse query params for CSIVolumes.List
* api/nodes: new CSIVolumes to list volumes by node
* command/node_status: use the new list endpoint to print volumes
* nomad/state/state_store: error messages consider the operator
* command/node_status: include the Provider
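A minimal sketch of a slice-backed iterator like the `SliceIterator` mentioned above, written against a locally defined interface with the same two-method shape as `memdb.ResultIterator` to stay self-contained:

```go
package main

import "fmt"

// resultIterator mirrors the shape of memdb.ResultIterator.
type resultIterator interface {
	WatchCh() <-chan struct{}
	Next() interface{}
}

// sliceIterator adapts a plain slice to the iterator interface, so code that
// consumes memdb query results can also consume an assembled slice of volumes.
type sliceIterator struct {
	items []interface{}
	idx   int
}

func newSliceIterator(items []interface{}) *sliceIterator {
	return &sliceIterator{items: items}
}

// WatchCh returns a channel that never fires; a static slice has no updates.
func (s *sliceIterator) WatchCh() <-chan struct{} { return nil }

// Next returns the next item, or nil when the slice is exhausted.
func (s *sliceIterator) Next() interface{} {
	if s.idx >= len(s.items) {
		return nil
	}
	item := s.items[s.idx]
	s.idx++
	return item
}

func main() {
	var it resultIterator = newSliceIterator([]interface{}{"vol-a", "vol-b"})
	for raw := it.Next(); raw != nil; raw = it.Next() {
		fmt.Println(raw)
	}
}
```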
-
Lang Martin authored
* client/allocrunner/csi_hook: tag errors
* nomad/client_csi_endpoint: tag errors
* nomad/client_rpc: remove an unnecessary error tag
* nomad/state/state_store: ControllerRequired fix intent

  We use ControllerRequired to indicate that a volume should use the publish/unpublish workflow, rather than that it has a controller. We need to check both RequiresControllerPlugin and SupportsAttachDetach from the fingerprint to determine that.
* nomad/csi_endpoint: tag errors
* nomad/csi_endpoint_test: longer error messages, mock fingerprints
-
Tim Gross authored
We denormalize the `CSIVolume` struct when we query it from the state store by getting the plugin and its health. But unless we copy the volume, this denormalization gets synced back to the state store without passing through the fsm (which is invalid).
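A minimal sketch of the copy-before-denormalize pattern this fix (and the plugin fix below) relies on, using hypothetical types rather than the actual state store:

```go
package main

import "fmt"

// csiVolume is a hypothetical stand-in; Healthy here is denormalized data
// that belongs to the query response, not to the canonical record.
type csiVolume struct {
	ID      string
	Healthy bool
}

// store stands in for the state store's memdb-backed table.
type store struct {
	volumes map[string]*csiVolume
}

// volumeByID returns a copy so callers can denormalize (attach plugin health,
// allocations, etc.) without mutating the canonical record outside the FSM.
func (s *store) volumeByID(id string) *csiVolume {
	v, ok := s.volumes[id]
	if !ok {
		return nil
	}
	out := *v // a shallow copy is enough for this sketch
	return &out
}

func main() {
	s := &store{volumes: map[string]*csiVolume{"vol-a": {ID: "vol-a"}}}
	resp := s.volumeByID("vol-a")
	resp.Healthy = true                     // denormalization applied to the copy only
	fmt.Println(s.volumes["vol-a"].Healthy) // false: canonical record untouched
}
```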
-
Tim Gross authored
We denormalize the `CSIPlugin` struct when we query it from the state store by getting the current set of allocations that provide the plugin. But unless we copy the plugin, this denormalization gets synced back to the state store and each time we query we'll add another copy of the current allocations.
-
Tim Gross authored
Derive a provider name and version for plugins (and the volumes that use them) from the CSI identity API `GetPluginInfo`. Expose the vendor name as `Provider` in the API and CLI commands.
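A minimal sketch of deriving the provider fields, with a locally defined identity client shaped like the CSI `GetPluginInfo` call rather than the real gRPC client:

```go
package main

import (
	"context"
	"fmt"
)

// pluginInfo mirrors the relevant parts of a GetPluginInfo response.
type pluginInfo struct {
	Name          string
	VendorVersion string
}

// identityClient is a hypothetical stand-in for the CSI identity service.
type identityClient interface {
	GetPluginInfo(ctx context.Context) (*pluginInfo, error)
}

// providerInfo derives the Provider name and version exposed in the API/CLI
// from the plugin's self-reported identity.
func providerInfo(ctx context.Context, c identityClient) (provider, version string, err error) {
	info, err := c.GetPluginInfo(ctx)
	if err != nil {
		return "", "", err
	}
	return info.Name, info.VendorVersion, nil
}

// fakeIdentity lets the sketch run without a real plugin.
type fakeIdentity struct{}

func (fakeIdentity) GetPluginInfo(context.Context) (*pluginInfo, error) {
	return &pluginInfo{Name: "ebs.csi.aws.com", VendorVersion: "0.5.0"}, nil
}

func main() {
	p, v, _ := providerInfo(context.Background(), fakeIdentity{})
	fmt.Println(p, v)
}
```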
-
Lang Martin authored
* command/csi: csi, csi_plugin, csi_volume
* helper/funcs: move ExtraKeys from parse_config to UnusedKeys
* command/agent/config_parse: use helper.UnusedKeys
* api/csi: annotate CSIVolumes with hcl fields
* command/csi_plugin: add Synopsis
* command/csi_volume_register: use hcl.Decode style parsing
* command/csi_volume_list
* command/csi_volume_status: list format, cleanup
* command/csi_plugin_list
* command/csi_plugin_status
* command/csi_volume_deregister
* command/csi_volume: add Synopsis
* api/contexts/contexts: add csi search contexts to the constants
* command/commands: register csi commands
* api/csi: fix struct tag for linter
* command/csi_plugin_list: unused struct vars
* command/csi_plugin_status: unused struct vars
* command/csi_volume_list: unused struct vars
* api/csi: add allocs to CSIPlugin
* command/csi_plugin_status: format the allocs
* api/allocati...
-
Tim Gross authored
Adds a stanza for both Host Volumes and CSI Volumes to the CLI output for `nomad alloc status`. Mostly relies on information already in the API structs, but in the case where there are CSI Volumes we need to make extra API calls to get the volume status. To reduce overhead, these extra calls are hidden behind the `-verbose` flag.
-
Tim Gross authored
In #7252 we removed the `DevDisableBootstrap` flag to require tests to honor only `BootstrapExpect`, in order to reduce a source of test flakiness. This changeset applies the same fix to the CSI tests.
-
Lang Martin authored
* structs: add ControllerRequired, volume.Name, no plug.Type
* structs: Healthy -> Schedulable
* state_store: Healthy -> Schedulable
* api: add ControllerRequired to api data types
* api: copy csi structs changes
* nomad/structs/csi: include name and external id
* api/csi: include Name and ExternalID
* nomad/structs/csi: comments for the 3 ids
-
Lang Martin authored
* structs: CSIInfo include AllocID, CSIPlugins no Jobs
* state_store: eliminate plugin Jobs, delete an empty plugin
* nomad/structs/csi: detect empty plugins correctly
* client/allocrunner/taskrunner/plugin_supervisor_hook: option AllocID
* client/pluginmanager/csimanager/instance: allocID
* client/pluginmanager/csimanager/fingerprint: set AllocID
* client/node_updater: split controller and node plugins
* api/csi: remove Jobs

  The CSI Plugin API will map plugins to allocations, which allows plugins to be defined by jobs in many configurations. In particular, multiple plugins can be defined in the same job, and multiple jobs can be used to define a single plugin. Because we now map the allocation context directly from the node, it's no longer necessary to track the jobs associated with a plugin directly.
* nomad/csi_endpoint_test: CreateTestPlugin & register via fingerprint
* client/dynamicplugins: lift AllocI...
-
Danielle Lancashire authored
This commit introduces support for providing VolumeCapabilities during requests to `ControllerPublishVolume`, as this is a required field.
-
Danielle Lancashire authored
Currently the handling of CSINode RPCs does not correctly forward RPCs to Nodes. This commit fixes that by introducing a shim RPC (nomad/client_csi_endpoint) that will correctly forward the request to the owning node, or submit the RPC to the client. In the process it also cleans up handling a little by adding the `CSIControllerQuery` embedded struct for the required forwarding state. Because `CSIControllerQuery` embeds the requirement of a `PluginID`, node targeting could also be moved into the shim RPC in the future if desired.
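A minimal sketch of the forward-or-serve-locally decision such a shim makes, with hypothetical helpers rather than Nomad's actual RPC plumbing:

```go
package main

import (
	"errors"
	"fmt"
)

// nodeRequest is a hypothetical request carrying the target node and plugin.
type nodeRequest struct {
	NodeID   string
	PluginID string
}

// shim decides whether the RPC can be submitted to a locally connected client
// or must be forwarded to the server that owns the node's connection.
type shim struct {
	localNodes map[string]bool // node IDs connected to this server
}

func (s *shim) handle(req *nodeRequest) error {
	if req.NodeID == "" {
		return errors.New("missing node ID")
	}
	if s.localNodes[req.NodeID] {
		return s.serveLocally(req)
	}
	return s.forwardToNodeServer(req)
}

func (s *shim) serveLocally(req *nodeRequest) error {
	fmt.Println("submitting RPC to locally connected client", req.NodeID)
	return nil
}

func (s *shim) forwardToNodeServer(req *nodeRequest) error {
	fmt.Println("forwarding RPC to the server owning node", req.NodeID)
	return nil
}

func main() {
	s := &shim{localNodes: map[string]bool{"node-1": true}}
	s.handle(&nodeRequest{NodeID: "node-1", PluginID: "ebs"})
	s.handle(&nodeRequest{NodeID: "node-2", PluginID: "ebs"})
}
```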
-
Danielle Lancashire authored
-
Danielle Lancashire authored
CSI Plugins that manage devices need not just access to the CSI directory, but also the ability to manage devices inside `/dev`. This commit introduces a `/dev:/dev` mount to the container so that they may do so.
-
Tim Gross authored
When an alloc is marked terminal (and after node unstage/unpublish have been called), the client syncs the terminal alloc state with the server via the `Node.UpdateAlloc` RPC.

For each job that has a terminal alloc, the `Node.UpdateAlloc` RPC handler at the server will emit an eval for a new core job to garbage collect CSI volume claims. When this eval is handled on the core scheduler, it will call a `volumeReap` method to release the claims for all terminal allocs on the job.

The volume reap will issue a `ControllerUnpublishVolume` RPC for any node that has no alloc claiming the volume. Once this returns (or is skipped), the volume reap will send a new `CSIVolume.Claim` RPC that releases the volume claim for that allocation in the state store, making it available for scheduling again.

This same `volumeReap` method will be called from the core job GC, which gives us a second chance to reclaim volumes during GC if there were controller RPC failu...
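A minimal sketch of the reap sequence, with function values standing in for the `ControllerUnpublishVolume` and `CSIVolume.Claim` RPCs; all names are illustrative:

```go
package main

import "fmt"

// volumeClaim is a hypothetical view of one terminal alloc's claim.
type volumeClaim struct {
	VolumeID string
	AllocID  string
	NodeID   string
}

// reapClaims releases claims for terminal allocs: controller unpublish first
// (only when no other alloc on that node still claims the volume), then the
// claim-release RPC that frees the volume for scheduling again.
func reapClaims(
	claims []volumeClaim,
	nodeStillClaims func(volumeID, nodeID string) bool,
	controllerUnpublish func(volumeID, nodeID string) error,
	releaseClaim func(volumeID, allocID string) error,
) error {
	for _, c := range claims {
		if !nodeStillClaims(c.VolumeID, c.NodeID) {
			if err := controllerUnpublish(c.VolumeID, c.NodeID); err != nil {
				return err // a later GC pass gets a second chance
			}
		}
		if err := releaseClaim(c.VolumeID, c.AllocID); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	claims := []volumeClaim{{VolumeID: "vol-a", AllocID: "alloc-1", NodeID: "node-1"}}
	_ = reapClaims(claims,
		func(vol, node string) bool { return false },
		func(vol, node string) error { fmt.Println("unpublish", vol, "from", node); return nil },
		func(vol, alloc string) error { fmt.Println("release claim on", vol, "for", alloc); return nil },
	)
}
```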
-
Danielle Lancashire authored
-
Danielle Lancashire authored
-
Danielle Lancashire authored
-