Commits · 66084d7ed49a4fe747a196f3c5a8f32bf18d1b97 · 小白蛋 / Nomad

This project is mirrored from https://gitee.com/mirrors/nomad.git. Pull mirroring failed 2 years ago.

13 Mar, 2020 · 40 commits

client: Rename ClientCSI -> CSIController · 66084d7e
Danielle Lancashire authored 5 years ago


csi: Add /dev mounts to CSI Plugins · a4113724

Danielle Lancashire authored 5 years ago

CSI plugins that manage devices need access not only to the CSI
directory but also to the devices inside `/dev`.

This commit introduces a `/dev:/dev` mount to the plugin container so
that they have that access.


csi: volume claim garbage collection (#7125) · cb6bb550

Tim Gross authored 5 years ago

When an alloc is marked terminal (and after node unstage/unpublish
have been called), the client syncs the terminal alloc state with the
server via the `Node.UpdateAlloc` RPC.

For each job that has a terminal alloc, the `Node.UpdateAlloc` RPC
handler at the server will emit an eval for a new core job to garbage
collect CSI volume claims. When this eval is handled on the core
scheduler, it will call a `volumeReap` method to release the claims
for all terminal allocs on the job.

The volume reap will issue a `ControllerUnpublishVolume` RPC for any
node that has no alloc claiming the volume. Once this returns (or
is skipped), the volume reap will send a new `CSIVolume.Claim` RPC
that releases the volume claim for that allocation in the state store,
making it available for scheduling again.

This same `volumeReap` method will be called from the core job GC,
which gives us a second chance to reclaim volumes during GC if there
were controller RPC failures.


csimanager/volume: Update MountVolume docstring · bf1b381d
Danielle Lancashire authored 5 years ago

api: Register CSIPlugin before registering a Volume · fe812a5c
Danielle Lancashire authored 5 years ago

hook resources: Init with empty resources during setup · f5f1d21f
Danielle Lancashire authored 5 years ago


csi: Claim CSI Volumes during csi_hook.Prerun · a34b930d

Danielle Lancashire authored 5 years ago

This commit is the initial implementation of claiming volumes from the
server and passes through any publishContext information as appropriate.

There's nothing too fancy here.


csi_endpoint: Provide AllocID in req, and return Volume · 27006c24

Danielle Lancashire authored 5 years ago

Currently, the client has to ship an entire allocation to the server as
part of performing a VolumeClaim. This has a few problems:

Firstly, it means the client is sending significantly more data than is
required (an allocation contains the entire contents of a Nomad job,
alongside other irrelevant state) which has a non-zero (de)serialization
cost.

Secondly, because the allocation was never re-fetched from the state
store, it means that we were potentially open to issues caused by stale
state on a misbehaving or malicious client.

This change removes both of those issues at the cost of a couple more
state store lookups, but they should be relatively cheap.

We also now provide the CSIVolume in the response for a claim, so the
client can perform a Claim without first going ahead and fetching all of
the volumes.
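A minimal Go sketch of the request shape described above, assuming an in-memory map in place of the state store. `ClaimRequest`, `ClaimResponse` and `Server.Claim` are hypothetical names, not Nomad's actual types. The node-ownership check is an illustrative example of what re-fetching the allocation makes possible; the commit itself does not describe it.

```go
package main

import (
	"errors"
	"fmt"
)

// ClaimRequest carries identifiers only; the server looks everything else up.
type ClaimRequest struct {
	VolumeID string
	AllocID  string
	NodeID   string // the authenticated calling client
}

type Allocation struct {
	ID     string
	NodeID string
}

type Volume struct {
	ID     string
	Claims map[string]bool // claiming allocation IDs
}

// ClaimResponse returns the volume so the client need not fetch it separately.
type ClaimResponse struct {
	Volume *Volume
}

// Server is a hypothetical stand-in for the server's state store.
type Server struct {
	allocs  map[string]*Allocation
	volumes map[string]*Volume
}

func (s *Server) Claim(req ClaimRequest) (*ClaimResponse, error) {
	// Re-fetch the allocation from server state rather than trusting a copy
	// shipped by the client.
	alloc, ok := s.allocs[req.AllocID]
	if !ok {
		return nil, fmt.Errorf("unknown allocation %q", req.AllocID)
	}
	if alloc.NodeID != req.NodeID {
		return nil, errors.New("allocation is not placed on the requesting node")
	}
	vol, ok := s.volumes[req.VolumeID]
	if !ok {
		return nil, fmt.Errorf("unknown volume %q", req.VolumeID)
	}
	vol.Claims[alloc.ID] = true
	return &ClaimResponse{Volume: vol}, nil
}

func main() {
	s := &Server{
		allocs:  map[string]*Allocation{"a1": {ID: "a1", NodeID: "n1"}},
		volumes: map[string]*Volume{"v1": {ID: "v1", Claims: map[string]bool{}}},
	}
	resp, err := s.Claim(ClaimRequest{VolumeID: "v1", AllocID: "a1", NodeID: "n1"})
	fmt.Println(resp.Volume.ID, resp.Volume.Claims, err)
}
```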


csi: Basic volume usage tracking · 05204b53
Danielle Lancashire authored 5 years ago

csi: Add comment to UsageOptions.ToFS() · 299c30fb
Danielle Lancashire authored 5 years ago


csi: Validate Volumes during registration · 34f3e90a

Danielle Lancashire authored 5 years ago

This PR implements some initial support for doing deeper validation of
a volume during its registration with the server. In most cases this
lets us validate the capabilities before users attempt to use the
volumes, and it also prevents registering volumes without first setting
up a plugin, which should help to catch typos and the like during
registration.

This does have the downside of requiring users to wait for at least one
instance of a plugin to be running in their cluster before they can
register volumes.
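The plugin-existence part of this validation can be sketched as below. This is a hypothetical illustration: `Plugin`, `VolumeRegistration` and `validateRegistration` are invented names, and the capability check against the running plugin (the `ControllerValidateVolumeCapabilities` step) is left out.

```go
package main

import "fmt"

// Plugin is a hypothetical summary of a CSI plugin's running instances.
type Plugin struct {
	ID                 string
	ControllersHealthy int
	NodesHealthy       int
}

// VolumeRegistration is a hypothetical subset of a volume registration.
type VolumeRegistration struct {
	ID       string
	PluginID string
}

// validateRegistration rejects volumes whose plugin is unknown (for example a
// typo in the plugin ID) or has no running instance to validate against.
func validateRegistration(vol VolumeRegistration, plugins map[string]*Plugin) error {
	if vol.PluginID == "" {
		return fmt.Errorf("volume %q: missing plugin ID", vol.ID)
	}
	p, ok := plugins[vol.PluginID]
	if !ok {
		return fmt.Errorf("volume %q: no CSI plugin %q is registered", vol.ID, vol.PluginID)
	}
	if p.ControllersHealthy+p.NodesHealthy == 0 {
		return fmt.Errorf("volume %q: plugin %q has no running instances", vol.ID, vol.PluginID)
	}
	return nil
}

func main() {
	plugins := map[string]*Plugin{"ebs": {ID: "ebs", ControllersHealthy: 1, NodesHealthy: 3}}
	fmt.Println(validateRegistration(VolumeRegistration{ID: "v1", PluginID: "ebs"}, plugins))
	fmt.Println(validateRegistration(VolumeRegistration{ID: "v1", PluginID: "ebz"}, plugins))
}
```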


client: Implement ClientCSI.ControllerValidateVolume · 031765ac
Danielle Lancashire authored 5 years ago

plugins/csi: Implement ControllerValidateCapabilities RPC · 1177670f
Danielle Lancashire authored 5 years ago

csi: Move VolumeCapabilities helper to package · 8e3d3d6a
Danielle Lancashire authored 5 years ago

sched/feasible: Return more detailed CSI Failure messages · 8707f295
Danielle Lancashire authored 5 years ago


csi: Pass through usage options to the csimanager · 7fe2389d

Danielle Lancashire authored 5 years ago

The CSI spec requires us to attach and stage volumes based on different
types of usage information when it may affect how they are bound. Here
we pass through some basic usage options in the CSI hook (specifically
the volume alias's ReadOnly field), and the attachment/access mode from
the volume. We pass the attachment/access mode separately from the
volume as it simplifies some handling and doesn't necessarily force
every attachment to use the same mode should more modes be supported
(i.e. if we let each `volume "foo" {}` specify an override in the future).
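One way to let different usages of the same volume coexist is to derive a filesystem-safe path component from the usage options, in the spirit of the `UsageOptions.ToFS()` helper mentioned earlier in this log. The sketch below is hypothetical: the field names and the output format are assumptions, not Nomad's implementation, and the mode strings are assumed to contain no `/`.

```go
package main

import "fmt"

// UsageOptions is a hypothetical sketch of per-claim usage information.
type UsageOptions struct {
	ReadOnly       bool
	AttachmentMode string // e.g. "file-system" or "block-device"
	AccessMode     string // e.g. "single-node-writer"
}

// ToFS renders the options as a single path component, so that different
// usages of one volume can be staged at distinct paths. It assumes the mode
// strings never contain a path separator.
func (u UsageOptions) ToFS() string {
	mode := "rw"
	if u.ReadOnly {
		mode = "ro"
	}
	return fmt.Sprintf("%s-%s-%s", mode, u.AttachmentMode, u.AccessMode)
}

func main() {
	u := UsageOptions{ReadOnly: true, AttachmentMode: "file-system", AccessMode: "single-node-writer"}
	fmt.Println(u.ToFS())
}
```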


csi: Unpublish volumes during ar.Postrun · cb92657a

Danielle Lancashire authored 5 years ago

This commit introduces initial support for unmounting csi volumes.

It takes a relatively simplistic approach to performing
NodeUnpublishVolume calls, optimising for cleaning up any leftover state
rather than terminating early in the case of errors.

This is because it happens during an allocation's shutdown flow and may
not always have a corresponding call to `NodePublishVolume` that
succeeded.


csiclient: Add grpc.CallOption support to NodeUnpublishVolume · 272530b7
Danielle Lancashire authored 5 years ago

taskrunner/volume_hook: Cleanup arg order of prepareHostVolumes · 7b726f69
Danielle Lancashire authored 5 years ago


taskrunner/volume_hook: Mounts for CSI Volumes · 65ec3593

Danielle Lancashire authored 5 years ago

This commit implements support for creating driver mounts for CSI
Volumes.

It works by fetching the created mounts from the allocation resources
and then iterating through the volume requests, creating driver mount
configs as required.

It's a little bit messy primarily because there's _so_ much terminology
overlap and it's a bit difficult to follow.
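To untangle the terminology a little, here is a hypothetical Go sketch of the mapping: a task's volume mount names a group-level volume alias, the alloc-level CSI hook has produced host mount info per alias, and the two are combined into a driver mount. All type and function names are invented for illustration.

```go
package main

import "fmt"

// MountInfo is where the CSI hook mounted a volume on the host (hypothetical).
type MountInfo struct{ Source string }

// VolumeMount is a task's request to mount a group volume (hypothetical).
type VolumeMount struct {
	Volume      string // alias of the group-level volume
	Destination string // path inside the task
	ReadOnly    bool
}

// DriverMount is a hypothetical mount config handed to a task driver.
type DriverMount struct {
	HostPath string
	TaskPath string
	ReadOnly bool
}

// csiDriverMounts builds driver mounts for task volume mounts that refer to
// CSI volumes, using the mount info produced by the alloc-level CSI hook.
func csiDriverMounts(mounts []VolumeMount, csiAliases map[string]bool,
	csiMounts map[string]MountInfo) ([]DriverMount, error) {
	var out []DriverMount
	for _, m := range mounts {
		if !csiAliases[m.Volume] {
			continue // not a CSI volume, e.g. a host volume handled elsewhere
		}
		info, ok := csiMounts[m.Volume]
		if !ok {
			return nil, fmt.Errorf("no mount info for CSI volume %q", m.Volume)
		}
		out = append(out, DriverMount{HostPath: info.Source, TaskPath: m.Destination, ReadOnly: m.ReadOnly})
	}
	return out, nil
}

func main() {
	mounts := []VolumeMount{{Volume: "data", Destination: "/srv/data"}, {Volume: "cache", Destination: "/cache"}}
	got, err := csiDriverMounts(mounts, map[string]bool{"data": true},
		map[string]MountInfo{"data": {Source: "/host/csi/data"}})
	fmt.Println(got, err)
}
```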


volume_hook: Loosen validation in host volume prep · dbeaa466
Danielle Lancashire authored 5 years ago


allocrunner: Push state from hooks to taskrunners · 6802d37e

Danielle Lancashire authored 5 years ago

This commit is an initial (read: janky) approach to forwarding state
from an allocrunner hook to a taskrunner using a `hookResources`
approach similar to the one taskrunners use internally.

It should eventually probably be replaced with something a little bit
more message based, but for things that only come from pre-run hooks,
and don't change, it's probably fine for now.
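A shared, lock-protected struct is enough for state that a pre-run hook writes once and task runners only read. The Go sketch below is hypothetical (`hookResources`, `MountInfo` and the accessor names are assumptions); it copies the map on both set and get so callers cannot mutate the shared map, while the `MountInfo` values themselves are shared.

```go
package main

import (
	"fmt"
	"sync"
)

// MountInfo is a hypothetical description of where a volume was mounted.
type MountInfo struct {
	Source   string
	IsDevice bool
}

// hookResources is written by an alloc runner hook and read by task runners.
type hookResources struct {
	mu        sync.RWMutex
	csiMounts map[string]*MountInfo
}

// SetCSIMounts stores a copy of the map (the MountInfo pointers are shared).
func (h *hookResources) SetCSIMounts(m map[string]*MountInfo) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.csiMounts = make(map[string]*MountInfo, len(m))
	for k, v := range m {
		h.csiMounts[k] = v
	}
}

// GetCSIMounts returns a copy of the map so callers cannot mutate shared state.
func (h *hookResources) GetCSIMounts() map[string]*MountInfo {
	h.mu.RLock()
	defer h.mu.RUnlock()
	out := make(map[string]*MountInfo, len(h.csiMounts))
	for k, v := range h.csiMounts {
		out[k] = v
	}
	return out
}

func main() {
	h := &hookResources{}
	// A pre-run hook (writer) publishes the mounts once...
	h.SetCSIMounts(map[string]*MountInfo{"data": {Source: "/host/csi/data"}})
	// ...and task runners (readers) can fetch them concurrently.
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			fmt.Println(h.GetCSIMounts()["data"].Source)
		}()
	}
	wg.Wait()
}
```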


csi_hook: Stage/Mount volumes as required · 8989c685

Danielle Lancashire authored 5 years ago

This commit introduces the first stage of volume mounting for an
allocation. The csimanager.VolumeMounter interface manages the blocking
and the actual minutiae of the CSI implementation, allowing this hook to
do the minimal work of volume retrieval and creating mount info.

In the future the `CSIVolume.Get` request should be replaced by
`CSIVolume.Claim(Batch?)` to minimize the number of RPCs and to handle
external triggering of a ControllerPublishVolume request as required.

We also need to ensure that if pre-run hooks fail, we still get a full
unwinding of any published and staged volumes so that there are no
hanging references to volumes. That is not handled in this commit.


client: Pass an RPC Client to AllocRunners · ce8351d9

Danielle Lancashire authored 5 years ago

As part of introducing support for CSI, AllocRunner hooks need to be
able to communicate with Nomad Servers for validation of and interaction
with storage volumes. Here we create a small RPCer interface and pass
the client (rpc client) to the AR in preparation for making these RPCs.


csi: server-to-controller publish/unpublish RPCs (#7124) · c1072a61

Tim Gross authored 5 years ago

Nomad servers need to make requests to CSI controller plugins running
on a client for publish/unpublish. The RPC needs to look up the client
node based on the plugin, load balancing across controllers, and then
perform the required client RPC to that node (via server forwarding if
necessary).


csi: stub methods for server-to-controller RPC calls (#7117) · 525d09cf
Tim Gross authored 5 years ago


csi_endpoint: Support No ACLs and restrict Nodes · b1213db4

Danielle Lancashire authored 5 years ago

This commit refactors the ACL code for the CSI endpoint to support
environments that run without ACLs enabled (e.g. development environments)
and also provides an easy way to restrict which endpoints may be
accessed with a client's SecretID, limiting the blast radius of a
malicious client on the state of the environment.


sched/feasible: Validate CSIVolumes correctly · 3be09e61

Danielle Lancashire authored 5 years ago

Previously we were looking up plugins based on the Alias Name for a CSI
Volume within the context of its task group.

Here we first look up a volume based on its identifier and then validate
the existence of the plugin based on its `PluginID`.
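A hypothetical Go sketch of that two-step lookup on a single candidate node: resolve each requested volume by its ID, then check the node for a healthy instance of the volume's `PluginID`. `Volume`, `Node` and `csiFeasible` are invented names; the function returns a short failure message, in the spirit of the "more detailed CSI failure messages" commit above.

```go
package main

import (
	"fmt"
	"sort"
)

type Volume struct {
	ID       string
	PluginID string
}

// Node is a hypothetical node view: plugin ID -> whether its node plugin is healthy.
type Node struct {
	CSINodePlugins map[string]bool
}

// csiFeasible checks that every requested volume exists and that the node
// runs a healthy instance of that volume's plugin. requests maps the
// task-group alias to the volume ID.
func csiFeasible(n Node, requests map[string]string, volumes map[string]Volume) (bool, string) {
	aliases := make([]string, 0, len(requests))
	for alias := range requests {
		aliases = append(aliases, alias)
	}
	sort.Strings(aliases) // deterministic failure messages
	for _, alias := range aliases {
		vol, ok := volumes[requests[alias]]
		if !ok {
			return false, fmt.Sprintf("missing CSI volume %q", requests[alias])
		}
		if !n.CSINodePlugins[vol.PluginID] {
			return false, fmt.Sprintf("CSI plugin %q is missing or unhealthy on node", vol.PluginID)
		}
	}
	return true, ""
}

func main() {
	vols := map[string]Volume{"vol-1": {ID: "vol-1", PluginID: "ebs"}}
	node := Node{CSINodePlugins: map[string]bool{"ebs": true}}
	fmt.Println(csiFeasible(node, map[string]string{"data": "vol-1"}, vols))
}
```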


csi: Disable validation of volume topology · ab28c87e
Danielle Lancashire authored 5 years ago


api: Parse CSI Volumes · bb3dee2a

Danielle Lancashire authored 5 years ago

Previously, when deserializing volumes, we skipped over volumes that were
not of type `host`. This commit ensures that we parse both host and CSI
volumes correctly.


sched/feasible: CSI - Filter applicable volumes · 8c97a097

Danielle Lancashire authored 5 years ago

This commit filters the job's volumes when setting them on the
feasibility checker. This ensures that the rest of the checker does not
have to worry about non-csi volumes.
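The filter itself is small; a hypothetical Go version is shown below. The `host` and `csi` type strings come from the volume types named in this log; `VolumeRequest` and `csiOnly` are invented names.

```go
package main

import "fmt"

const (
	VolumeTypeHost = "host"
	VolumeTypeCSI  = "csi"
)

// VolumeRequest is a hypothetical group-level volume request.
type VolumeRequest struct {
	Type   string
	Source string
}

// csiOnly returns the subset of a group's volume requests that are CSI
// volumes, so later feasibility checks never see host volumes.
func csiOnly(reqs map[string]*VolumeRequest) map[string]*VolumeRequest {
	out := make(map[string]*VolumeRequest)
	for alias, r := range reqs {
		if r.Type == VolumeTypeCSI {
			out[alias] = r
		}
	}
	return out
}

func main() {
	reqs := map[string]*VolumeRequest{
		"cache": {Type: VolumeTypeHost, Source: "shared"},
		"data":  {Type: VolumeTypeCSI, Source: "vol-1"},
	}
	for alias, r := range csiOnly(reqs) {
		fmt.Println(alias, r.Source)
	}
}
```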


csi: add PublishContext to CSIVolumeClaimResponse (#7113) · 8227aa7d

Tim Gross authored 5 years ago

The `ControllerPublishVolumeResponse` CSI RPC includes the publish
context intended to be passed by the orchestrator as an opaque value
to the node plugins. This changeset adds it to our response to a
volume claim request to proxy the controller's response back to the
client node.


csi: implement CSI controller detach request/response (#7107) · f7017594

Tim Gross authored 5 years ago

This changeset implements the minimal structs on the client-side we
need to compile the work-in-progress implementation of the
server-to-controller RPCs. It doesn't include implementing the
`ClientCSI.DettachVolume` RPC on the client.


csi: Fix broken call to newVolumeManager · 3e6c5cac
Danielle Lancashire authored 5 years ago


csi: Provide plugin-scoped paths during RPCs · 61169e66

Danielle Lancashire authored 5 years ago

When providing paths to plugins, the path needs to be in the scope of
the plugin's container, rather than that of the host.

Here we enable that by providing the mount point through the plugin
registration and then use it when constructing request target paths.
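A hypothetical sketch of the path translation: given the host directory that is bind-mounted into the plugin and the mount point it appears at inside the plugin's container, a host path is rewritten to the plugin's view. `toPluginPath` and the example paths are assumptions, not Nomad's code; paths outside the mounted directory are rejected because the plugin cannot see them.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// toPluginPath rewrites a host path that lives under hostRoot (the directory
// bind-mounted into the plugin) to the same location under pluginRoot (where
// that directory appears inside the plugin's container).
func toPluginPath(hostRoot, pluginRoot, hostPath string) (string, error) {
	rel, err := filepath.Rel(hostRoot, hostPath)
	if err != nil {
		return "", err
	}
	// A path outside hostRoot is not visible to the plugin at all.
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("%s is not under %s", hostPath, hostRoot)
	}
	return filepath.Join(pluginRoot, rel), nil
}

func main() {
	p, err := toPluginPath("/var/nomad/csi/plug", "/csi", "/var/nomad/csi/plug/staging/vol-1")
	fmt.Println(p, err)
}
```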


csimanager: Cleanup volumemanager setup · b69bd7d9
Danielle Lancashire authored 5 years ago

csimanager: Instantiate fingerprint manager's csiclient · af810483
Danielle Lancashire authored 5 years ago


csi: implement releasing volume claims for terminal allocs (#7076) · 83444a38

Tim Gross authored 5 years ago

When an alloc is marked terminal, and after node unstage/unpublish
have been called, the client will sync the terminal alloc state with
the server via `Node.UpdateAlloc` RPC.

This changeset implements releasing the volume claim for each volume
associated with the terminal alloc. It doesn't yet implement the RPC
call we need to make to the `ControllerUnpublishVolume` CSI RPC.


csi: implement VolumeClaimRPC (#7048) · bce8dac5

Tim Gross authored 5 years ago

When the client receives an allocation which includes a CSI volume,
the alloc runner will block its main `Run` loop. The alloc runner will
issue a `VolumeClaim` RPC to the Nomad servers. This changeset
implements the portions of the `VolumeClaim` RPC endpoint that have
not been previously completed.


nomad: csi_endpoint send register & deregister requests to raft (#7059) · 25462906
Lang Martin authored 5 years ago
