This project is mirrored from https://gitee.com/mirrors/nomad.git.
Pull mirroring failed.
Repository mirroring has been paused due to too many failed attempts. It can be resumed by a project maintainer.
- 13 Mar, 2020 40 commits
-
Tim Gross authored
This changeset provides two basic e2e tests for CSI plugins targeting common AWS use cases.

The EBS test launches the EBS plugin (controller + nodes) and registers an EBS volume as a Nomad CSI volume. We deploy a job that writes to the volume, stop that job, and reuse the volume for another job which should be able to read the data written by the first job.

The EFS test launches the EFS plugin (nodes-only) and registers an EFS volume as a Nomad CSI volume. We deploy a job that writes to the volume, and share the volume with another job which should be able to read the data written by the first job.
-
Tim Gross authored
Nomad clients will push node updates during client restart which can cause an extra claim for a volume by the same alloc. If an alloc already claims a volume, we can allow it to be treated as a valid claim and continue.
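The idempotent-claim behavior described above can be sketched as follows. This is a minimal illustration, not Nomad's actual code: the `CSIVolume` struct and `claim` method here are simplified stand-ins for the real types in `nomad/structs`.

```go
package main

import "fmt"

// CSIVolume is a simplified stand-in for Nomad's volume struct; the
// field names here are illustrative, not Nomad's actual API.
type CSIVolume struct {
	ID         string
	ReadAllocs map[string]bool // alloc IDs holding claims
}

// claim treats a repeated claim by the same alloc as a no-op success,
// so a node update replayed during client restart cannot double-claim.
func (v *CSIVolume) claim(allocID string) error {
	if v.ReadAllocs[allocID] {
		return nil // already claimed by this alloc: treat as valid
	}
	v.ReadAllocs[allocID] = true
	return nil
}

func main() {
	v := &CSIVolume{ID: "vol-1", ReadAllocs: map[string]bool{}}
	_ = v.claim("alloc-1")
	_ = v.claim("alloc-1") // replayed claim from a restarted client
	fmt.Println(len(v.ReadAllocs)) // still a single claim entry
}
```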
-
Tim Gross authored
In order to correctly fingerprint dynamic plugins on client restarts, we need to persist a handle to the plugin (that is, connection info) to the client state store. The dynamic registry will sync automatically to the client state whenever it receives a register/deregister call.
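The sync-on-register pattern might look like the sketch below, assuming a persistence callback standing in for the client state store write; the `PluginInfo` and `Registry` names are hypothetical, not the real `client/dynamicplugins` types.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// PluginInfo holds the connection handle needed to re-fingerprint a
// dynamic plugin after a client restart. Names are illustrative.
type PluginInfo struct {
	Name       string `json:"name"`
	SocketPath string `json:"socket_path"`
}

// Registry syncs on every register call via a persistence callback,
// standing in for the client state store write.
type Registry struct {
	mu      sync.Mutex
	plugins map[string]PluginInfo
	persist func(state []byte) error
}

func (r *Registry) Register(p PluginInfo) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.plugins[p.Name] = p
	return r.sync() // automatic sync to client state
}

func (r *Registry) sync() error {
	blob, err := json.Marshal(r.plugins)
	if err != nil {
		return err
	}
	return r.persist(blob)
}

func main() {
	var saved []byte
	r := &Registry{
		plugins: map[string]PluginInfo{},
		persist: func(b []byte) error { saved = b; return nil },
	}
	_ = r.Register(PluginInfo{Name: "ebs", SocketPath: "/csi/ebs.sock"})
	fmt.Println(len(saved) > 0) // handle persisted for restart recovery
}
```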
-
Lang Martin authored
* nomad/structs/csi: new RemoteID() uses the ExternalID if set
* nomad/csi_endpoint: pass RemoteID to volume request types
* client/pluginmanager/csimanager/volume: pass RemoteID to NodePublishVolume
-
Tim Gross authored
Fix some docstring typos and fix noisy log message during client restarts. A log for the common case where the plugin socket isn't ready yet isn't actionable by the operator so having it at info is just noise.
-
Lang Martin authored
* command/agent/csi_endpoint: support type filter in volumes & plugins
* command/agent/http: use /v1/volume/csi & /v1/plugin/csi
* api/csi: use /v1/volume/csi & /v1/plugin/csi
* api/nodes: use /v1/volume/csi & /v1/plugin/csi
* api/nodes: not /volumes/csi, just /volumes
* command/agent/csi_endpoint: fix ot parameter parsing
-
Lang Martin authored
* api/allocations: GetTaskGroup finds the taskgroup struct
* command/node_status: display CSI volume names
* nomad/state/state_store: new CSIVolumesByNodeID
* nomad/state/iterator: new SliceIterator type implements memdb.ResultIterator
* nomad/csi_endpoint: deal with a slice of volumes
* nomad/state/state_store: CSIVolumesByNodeID return a SliceIterator
* nomad/structs/csi: CSIVolumeListRequest takes a NodeID
* nomad/csi_endpoint: use the return iterator
* command/agent/csi_endpoint: parse query params for CSIVolumes.List
* api/nodes: new CSIVolumes to list volumes by node
* command/node_status: use the new list endpoint to print volumes
* nomad/state/state_store: error messages consider the operator
* command/node_status: include the Provider
-
Lang Martin authored
* client/allocrunner/csi_hook: tag errors
* nomad/client_csi_endpoint: tag errors
* nomad/client_rpc: remove an unnecessary error tag
* nomad/state/state_store: fix ControllerRequired intent. We use ControllerRequired to indicate that a volume should use the publish/unpublish workflow, rather than that it has a controller. We need to check both RequiresControllerPlugin and SupportsAttachDetach from the fingerprint to determine that.
* nomad/csi_endpoint: tag errors
* nomad/csi_endpoint_test: longer error messages, mock fingerprints
-
Tim Gross authored
We denormalize the `CSIVolume` struct when we query it from the state store by getting the plugin and its health. But unless we copy the volume, this denormalization gets synced back to the state store without passing through the fsm (which is invalid).
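The copy-before-denormalize pattern can be sketched as below. This is an illustrative reduction, not the real `nomad/structs` code: the struct is cut down to one denormalized field.

```go
package main

import "fmt"

// CSIVolume is a cut-down stand-in for the real struct in nomad/structs.
type CSIVolume struct {
	ID      string
	Healthy bool // denormalized from the plugin at query time
}

// Copy returns a copy so query-time denormalization cannot mutate the
// object shared with the state store.
func (v *CSIVolume) Copy() *CSIVolume {
	nv := *v
	return &nv
}

// denormalize fills in plugin health on a copy, leaving the stored
// struct untouched: writes must go through the FSM.
func denormalize(stored *CSIVolume, pluginHealthy bool) *CSIVolume {
	v := stored.Copy()
	v.Healthy = pluginHealthy
	return v
}

func main() {
	stored := &CSIVolume{ID: "vol-1"}
	out := denormalize(stored, true)
	fmt.Println(stored.Healthy, out.Healthy) // stored state is unchanged
}
```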
-
Tim Gross authored
We denormalize the `CSIPlugin` struct when we query it from the state store by getting the current set of allocations that provide the plugin. But unless we copy the plugin, this denormalization gets synced back to the state store and each time we query we'll add another copy of the current allocations.
-
Tim Gross authored
Derive a provider name and version for plugins (and the volumes that use them) from the CSI identity API `GetPluginInfo`. Expose the vendor name as `Provider` in the API and CLI commands.
-
Lang Martin authored
* command/csi: csi, csi_plugin, csi_volume
* helper/funcs: move ExtraKeys from parse_config to UnusedKeys
* command/agent/config_parse: use helper.UnusedKeys
* api/csi: annotate CSIVolumes with hcl fields
* command/csi_plugin: add Synopsis
* command/csi_volume_register: use hcl.Decode style parsing
* command/csi_volume_list
* command/csi_volume_status: list format, cleanup
* command/csi_plugin_list
* command/csi_plugin_status
* command/csi_volume_deregister
* command/csi_volume: add Synopsis
* api/contexts/contexts: add csi search contexts to the constants
* command/commands: register csi commands
* api/csi: fix struct tag for linter
* command/csi_plugin_list: unused struct vars
* command/csi_plugin_status: unused struct vars
* command/csi_volume_list: unused struct vars
* api/csi: add allocs to CSIPlugin
* command/csi_plugin_status: format the allocs
* api/allocations: copy Allocation.Stub in from structs
* nomad/client_rpc: add some error context with Errorf
* api/csi: collapse read & write alloc maps to a stub list
* command/csi_volume_status: cleanup allocation display
* command/csi_volume_list: use Schedulable instead of Healthy
* command/csi_volume_status: use Schedulable instead of Healthy
* command/csi_volume_list: sprintf string
* command/csi: delete csi.go, csi_plugin.go
* command/plugin: refactor csi components to sub-command plugin status
* command/plugin: remove csi
* command/plugin_status: remove csi
* command/volume: remove csi
* command/volume_status: split out csi specific
* helper/funcs: add RemoveEqualFold
* command/agent/config_parse: use helper.RemoveEqualFold
* api/csi: do ,unusedKeys right
* command/volume: refactor csi components to `nomad volume`
* command/volume_register: split out csi specific
* command/commands: use the new top level commands
* command/volume_deregister: hardwired type csi for now
* command/volume_status: csiFormatVolumes rescued from volume_list
* command/plugin_status: avoid a panic on no args
* command/volume_status: avoid a panic on no args
* command/plugin_status: predictVolumeType
* command/volume_status: predictVolumeType
* nomad/csi_endpoint_test: move CreateTestPlugin to testing
* command/plugin_status_test: use CreateTestCSIPlugin
* nomad/structs/structs: add CSIPlugins and CSIVolumes search consts
* nomad/state/state_store: add CSIPlugins and CSIVolumesByIDPrefix
* nomad/search_endpoint: add CSIPlugins and CSIVolumes
* command/plugin_status: move the header to the csi specific
* command/volume_status: move the header to the csi specific
* nomad/state/state_store: CSIPluginByID prefix
* command/status: rename the search context to just Plugins/Volumes
* command/plugin,volume_status: test return ids now
* command/status: rename the search context to just Plugins/Volumes
* command/plugin_status: support -json and -t
* command/volume_status: support -json and -t
* command/plugin_status_csi: comments
* command/*_status: clean up text
* api/csi: fix stale comments
* command/volume: make deregister sound less fearsome
* command/plugin_status: set the id length
* command/plugin_status_csi: more compact plugin health
* command/volume: better error message, comment
-
Tim Gross authored
Adds a stanza for both Host Volumes and CSI Volumes to the CLI output for `nomad alloc status`. Mostly relies on information already in the API structs, but in the case where there are CSI Volumes we need to make extra API calls to get the volume status. To reduce overhead, these extra calls are hidden behind the `-verbose` flag.
-
Tim Gross authored
In #7252 we removed the `DevDisableBootstrap` flag to require tests to honor only `BootstrapExpect`, in order to reduce a source of test flakiness. This changeset applies the same fix to the CSI tests.
-
Lang Martin authored
* structs: add ControllerRequired, volume.Name, no plug.Type
* structs: Healthy -> Schedulable
* state_store: Healthy -> Schedulable
* api: add ControllerRequired to api data types
* api: copy csi structs changes
* nomad/structs/csi: include name and external id
* api/csi: include Name and ExternalID
* nomad/structs/csi: comments for the 3 ids
-
Lang Martin authored
* structs: CSIInfo include AllocID, CSIPlugins no Jobs
* state_store: eliminate plugin Jobs, delete an empty plugin
* nomad/structs/csi: detect empty plugins correctly
* client/allocrunner/taskrunner/plugin_supervisor_hook: option AllocID
* client/pluginmanager/csimanager/instance: allocID
* client/pluginmanager/csimanager/fingerprint: set AllocID
* client/node_updater: split controller and node plugins
* api/csi: remove Jobs. The CSI Plugin API will map plugins to allocations, which allows plugins to be defined by jobs in many configurations. In particular, multiple plugins can be defined in the same job, and multiple jobs can be used to define a single plugin. Because we now map the allocation context directly from the node, it's no longer necessary to track the jobs associated with a plugin directly.
* nomad/csi_endpoint_test: CreateTestPlugin & register via fingerprint
* client/dynamicplugins: lift AllocID into the struct from Options
* api/csi_test: remove Jobs test
* nomad/structs/csi: CSIPlugins has an array of allocs
* nomad/state/state_store: implement CSIPluginDenormalize
* nomad/state/state_store: CSIPluginDenormalize npe on missing alloc
* nomad/csi_endpoint_test: defer deleteNodes for clarity
* api/csi_test: disable this test awaiting mocks: https://github.com/hashicorp/nomad/issues/7123
-
Danielle Lancashire authored
This commit introduces support for providing VolumeCapabilities during requests to `ControllerPublishVolume`, as this is a required field.
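The shape of the requirement can be illustrated with the sketch below. The structs here loosely mirror the CSI spec's `ControllerPublishVolume` request but are hypothetical Go types, not the generated proto API.

```go
package main

import "fmt"

// VolumeCapability loosely mirrors the CSI spec's capability message;
// string fields stand in for the real enum/oneof types.
type VolumeCapability struct {
	AccessMode string // e.g. "single-node-writer"
	FsType     string
}

// ControllerPublishVolumeRequest is an illustrative stand-in for the
// CSI request; the spec makes the capability a required field.
type ControllerPublishVolumeRequest struct {
	VolumeID         string
	NodeID           string
	VolumeCapability *VolumeCapability
}

func (r *ControllerPublishVolumeRequest) Validate() error {
	if r.VolumeCapability == nil {
		return fmt.Errorf("volume capability is required")
	}
	return nil
}

func main() {
	req := &ControllerPublishVolumeRequest{VolumeID: "vol-1", NodeID: "node-1"}
	fmt.Println(req.Validate()) // rejected: missing capability
	req.VolumeCapability = &VolumeCapability{AccessMode: "single-node-writer", FsType: "ext4"}
	fmt.Println(req.Validate()) // accepted
}
```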
-
Danielle Lancashire authored
Currently the handling of CSINode RPCs does not correctly handle forwarding RPCs to Nodes. This commit fixes this by introducing a shim RPC (nomad/client_csi_endpoint) that will correctly forward the request to the owning node, or submit the RPC to the client. In the process it also cleans up handling a little by adding the `CSIControllerQuery` embedded struct for required forwarding state. Because `CSIControllerQuery` embeds the requirement of a `PluginID`, we could also move node targeting into the shim RPC in the future if wanted.
-
Danielle Lancashire authored
-
Danielle Lancashire authored
CSI Plugins that manage devices need not just access to the CSI directory, but also to manage devices inside `/dev`. This commit introduces a `/dev:/dev` mount to the container so that they may do so.
-
Tim Gross authored
When an alloc is marked terminal (and after node unstage/unpublish have been called), the client syncs the terminal alloc state with the server via the `Node.UpdateAlloc` RPC. For each job that has a terminal alloc, the `Node.UpdateAlloc` RPC handler at the server will emit an eval for a new core job to garbage collect CSI volume claims. When this eval is handled on the core scheduler, it will call a `volumeReap` method to release the claims for all terminal allocs on the job. The volume reap will issue a `ControllerUnpublishVolume` RPC for any node that has no alloc claiming the volume. Once this returns (or is skipped), the volume reap will send a new `CSIVolume.Claim` RPC that releases the volume claim for that allocation in the state store, making it available for scheduling again. This same `volumeReap` method will be called from the core job GC, which gives us a second chance to reclaim volumes during GC if there were controller RPC failures.
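The reap ordering described above (unpublish only when a node has no remaining live claim, then release the claim) can be sketched as follows. All names here are illustrative; the real logic lives in Nomad's core scheduler.

```go
package main

import "fmt"

// claim is an illustrative stand-in for a volume claim record.
type claim struct {
	volID, nodeID, allocID string
	terminal               bool
}

// volumeReap releases claims for terminal allocs. For each node with no
// remaining live claim on a volume it first unpublishes at the
// controller, then releases the claim itself.
func volumeReap(claims []claim, unpublish func(volID, nodeID string), release func(volID, allocID string)) {
	live := map[string]int{} // volID/nodeID -> live claim count
	for _, c := range claims {
		if !c.terminal {
			live[c.volID+"/"+c.nodeID]++
		}
	}
	for _, c := range claims {
		if !c.terminal {
			continue
		}
		if live[c.volID+"/"+c.nodeID] == 0 {
			unpublish(c.volID, c.nodeID) // ControllerUnpublishVolume
		}
		release(c.volID, c.allocID) // CSIVolume.Claim release
	}
}

func main() {
	var unpubs, releases int
	claims := []claim{
		{"vol-1", "node-a", "alloc-1", true},
		{"vol-1", "node-a", "alloc-2", false}, // still live on node-a
	}
	volumeReap(claims,
		func(v, n string) { unpubs++ },
		func(v, a string) { releases++ })
	fmt.Println(unpubs, releases) // no unpublish: node-a still has a live claim
}
```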
-
Danielle Lancashire authored
-
Danielle Lancashire authored
-
Danielle Lancashire authored
-
Danielle Lancashire authored
This commit is the initial implementation of claiming volumes from the server and passes through any publishContext information as appropriate. There's nothing too fancy here.
-
Danielle Lancashire authored
Currently, the client has to ship an entire allocation to the server as part of performing a VolumeClaim, which has a few problems. Firstly, it means the client is sending significantly more data than is required (an allocation contains the entire contents of a Nomad job, alongside other irrelevant state), which has a non-zero (de)serialization cost. Secondly, because the allocation was never re-fetched from the state store, it means that we were potentially open to issues caused by stale state on a misbehaving or malicious client. The change removes both of those issues at the cost of a couple more state store lookups, but they should be relatively cheap. We also now provide the CSIVolume in the response for a claim, so the client can perform a Claim without first fetching all of the volumes.
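A slimmed-down request along these lines might look like the sketch below; the struct and function names are hypothetical, but the idea matches the commit: send only identifiers, re-read the alloc from authoritative state, and return the volume in the response.

```go
package main

import "fmt"

// VolumeClaimRequest carries only the identifiers the server needs,
// instead of shipping a whole Allocation. Names are illustrative.
type VolumeClaimRequest struct {
	VolumeID   string
	AllocID    string
	NodeID     string
	ClaimWrite bool
}

// serverClaim re-reads the alloc from authoritative state rather than
// trusting client-supplied data, then returns the volume ID so the
// client need not fetch volumes separately.
func serverClaim(req *VolumeClaimRequest, allocExists func(id string) bool) (string, error) {
	if !allocExists(req.AllocID) {
		return "", fmt.Errorf("unknown alloc %q", req.AllocID)
	}
	return req.VolumeID, nil
}

func main() {
	state := map[string]bool{"alloc-1": true} // stand-in for the state store
	vol, err := serverClaim(
		&VolumeClaimRequest{VolumeID: "vol-1", AllocID: "alloc-1", ClaimWrite: true},
		func(id string) bool { return state[id] })
	fmt.Println(vol, err)
}
```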
-
Danielle Lancashire authored
-
Danielle Lancashire authored
-
Danielle Lancashire authored
This PR implements some initial support for doing deeper validation of a volume during its registration with the server. This allows us to validate the capabilities before users attempt to use the volumes in most cases, and also prevents registering volumes without first setting up a plugin, which should help to catch typos and the like during registration. This does have the downside of requiring users to wait for at least one instance of a plugin to be running in their cluster before they can register volumes.
-
Danielle Lancashire authored
-
Danielle Lancashire authored
-
Danielle Lancashire authored
-
Danielle Lancashire authored
-
Danielle Lancashire authored
The CSI Spec requires us to attach and stage volumes based on different types of usage information when it may affect how they are bound. Here we pass through some basic usage options in the CSI Hook (specifically the volume alias's ReadOnly field), and the attachment/access mode from the volume. We pass the attachment/access mode separately from the volume as it simplifies some handling and doesn't necessarily force every attachment to use the same mode, should more be supported (i.e. if we let each `volume "foo" {}` specify an override in the future).
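A usage-options struct along these lines might look like the sketch below; the type, its fields, and the `ToFS` helper are illustrative stand-ins for the `csimanager` types, keyed so that claims with different modes don't share staged state.

```go
package main

import "fmt"

// UsageOptions captures how a claim will use the volume; a CSI node
// plugin stages and publishes differently depending on these. This is
// an illustrative stand-in for the real csimanager types.
type UsageOptions struct {
	ReadOnly       bool   // from the volume stanza's read-only setting
	AttachmentMode string // e.g. "file-system"
	AccessMode     string // e.g. "single-node-writer"
}

// ToFS renders a per-usage staging directory name, so two claims with
// different modes do not share staged state on the node.
func (u *UsageOptions) ToFS() string {
	ro := "rw"
	if u.ReadOnly {
		ro = "ro"
	}
	return fmt.Sprintf("%s-%s-%s", ro, u.AttachmentMode, u.AccessMode)
}

func main() {
	u := &UsageOptions{ReadOnly: true, AttachmentMode: "file-system", AccessMode: "single-node-writer"}
	fmt.Println(u.ToFS()) // ro-file-system-single-node-writer
}
```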
-
Danielle Lancashire authored
This commit introduces initial support for unmounting csi volumes. It takes a relatively simplistic approach to performing NodeUnpublishVolume calls, optimising for cleaning up any leftover state rather than terminating early in the case of errors. This is because it happens during an allocation's shutdown flow and may not always have a corresponding call to `NodePublishVolume` that succeeded.
-
Danielle Lancashire authored
-
Danielle Lancashire authored
-
Danielle Lancashire authored
This commit implements support for creating driver mounts for CSI Volumes. It works by fetching the created mounts from the allocation resources and then iterates through the volume requests, creating driver mount configs as required. It's a little bit messy primarily because there's _so_ much terminology overlap and it's a bit difficult to follow.
-
Danielle Lancashire authored
-
Danielle Lancashire authored
This commit is an initial (read: janky) approach to forwarding state from an allocrunner hook to a taskrunner, using a `hookResources` approach similar to the one taskrunners use internally. It should eventually be replaced with something a little more message-based, but for things that only come from pre-run hooks and don't change, it's probably fine for now.
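The shared-resources pattern can be sketched as follows: a struct set once by a pre-run hook and read by taskrunners, guarded by a mutex. The field names here are illustrative, not Nomad's actual hook types.

```go
package main

import (
	"fmt"
	"sync"
)

// hookResources is a minimal sketch of passing state from an
// allocrunner hook to taskrunners: written in a pre-run hook, read
// afterwards. Field names are illustrative.
type hookResources struct {
	mu     sync.RWMutex
	mounts map[string]string // volume name -> host mount path
}

func (h *hookResources) setMounts(m map[string]string) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.mounts = m
}

func (h *hookResources) getMounts() map[string]string {
	h.mu.RLock()
	defer h.mu.RUnlock()
	return h.mounts
}

func main() {
	res := &hookResources{}
	// Written once by the alloc-level pre-run hook...
	res.setMounts(map[string]string{"data": "/csi/per-alloc/vol-1"})
	// ...read later by a taskrunner when building task mounts.
	fmt.Println(res.getMounts()["data"])
}
```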
-