Commits · df48c5eabdf63f7f96084333154ac746ac689571 · 小白蛋 / Nomad

This project is mirrored from https://gitee.com/mirrors/nomad.git. Pull mirroring failed 2 years ago.

17 Mar, 2022 3 commits

cli: display Raft version in `server members` (#12317) · eca4ac67

Luiz Aoqui authored 3 years ago

The `nomad server members` command previously included a column named
`Protocol` that displayed the Serf protocol version currently used by
servers.

This is not a configurable option, so it holds very little value to
operators. It is also easy to confuse it with the Raft Protocol version,
which is configurable and highly relevant to operators.

This commit replaces the previous `Protocol` column with the new `Raft
Version` column. It also renames the `-detailed` flag to `-verbose` so
it matches other commands. The verbose output now shows the same
information as the standard output, plus the previous `Protocol` column
and `Tags`.

eca4ac67

Luiz Aoqui authored 3 years ago

The `related` query param is used to indicate that the request should
return a list of related (next, previous, and blocked) evaluations.

Co-authored-by: Jasmine Dahilig <jasmine@hashicorp.com>

81687c1c

server: transfer leadership in case of error (#12293) · dfe520a9

Luiz Aoqui authored 3 years ago

When a Nomad server becomes the Raft leader, it must perform several
actions defined in the establishLeadership function. If any of these
actions fail, Raft will think the node is the leader, but it will not
actually be able to act as a Nomad leader.

In this scenario, leadership must be revoked and transferred to another
server if possible, or the node should retry the establishLeadership
steps.
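The recovery path described above can be sketched roughly as follows. This is a minimal illustration, not Nomad's implementation: `leaderOps`, `handleLeadership`, `errStep`, and the retry budget are all hypothetical names and parameters introduced for the example, and the order (transfer first, then retry) is an assumption drawn from the wording of the message.

```go
package main

import (
	"errors"
	"fmt"
)

// errStep is a stand-in failure for any establishLeadership step.
var errStep = errors.New("leadership step failed")

// leaderOps bundles the operations described above. The names are
// hypothetical and do not mirror Nomad's internal API.
type leaderOps struct {
	establish func() error // stands in for the establishLeadership steps
	transfer  func() error // asks Raft to move leadership to another server
}

// handleLeadership reports how the node ended up: "established",
// "transferred", or "failed" once the retry budget is spent.
func handleLeadership(ops leaderOps, maxRetries int) (string, error) {
	err := ops.establish()
	if err == nil {
		return "established", nil
	}
	// Prefer handing leadership to a healthy server.
	if ops.transfer() == nil {
		return "transferred", nil
	}
	// No server could take over, so retry the setup steps locally.
	for i := 0; i < maxRetries; i++ {
		if err = ops.establish(); err == nil {
			return "established", nil
		}
	}
	return "failed", fmt.Errorf("gave up after %d retries: %w", maxRetries, err)
}

func main() {
	attempts := 0
	ops := leaderOps{
		establish: func() error {
			attempts++
			if attempts < 3 {
				return errStep
			}
			return nil
		},
		transfer: func() error { return errStep }, // e.g. no other voter available
	}
	outcome, err := handleLeadership(ops, 5)
	fmt.Println(outcome, attempts, err)
}
```

The point of the sketch is that a failed setup never leaves the node in the "Raft leader but not Nomad leader" state: it either hands off, recovers, or reports the failure.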

dfe520a9

09 Mar, 2022 1 commit
- Add pagination, filtering and sort to more API endpoints (#12186) · 154264fc
  Luiz Aoqui authored 3 years ago
  
  154264fc
07 Mar, 2022 5 commits

csi: add pagination args to `volume snapshot list` (#12193) · bc40222e

Tim Gross authored 3 years ago

The snapshot list API supports pagination as part of the CSI
specification, but we didn't have it plumbed through to the command
line.

bc40222e

CSI: allow updates to volumes on re-registration (#12167) · 7d0f87b9

Tim Gross authored 3 years ago

The CSI `CreateVolume` RPC is idempotent provided that the topology,
capabilities, and parameters are unchanged. CSI volumes have many
user-defined fields that are immutable once set, and many fields that
are not user-settable.

Update the `Register` RPC so that updating a volume via the API merges
onto any existing volume without touching Nomad-controlled fields,
while validating it with the same strict requirements expected for
idempotent `CreateVolume` RPCs.
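A rough sketch of such a merge is shown below. Everything in it is an assumption made for illustration: the `volume` struct, which fields count as immutable, user-updatable, or Nomad-controlled, and the `mergeVolume` helper are hypothetical and do not reflect the actual state store code.

```go
package main

import (
	"errors"
	"fmt"
)

// volume is a simplified, hypothetical stand-in for a CSI volume record.
type volume struct {
	ID         string
	PluginID   string            // assumed immutable once set
	Parameters map[string]string // assumed immutable, as for CreateVolume
	Secrets    map[string]string // assumed user-updatable
	ClaimCount int               // assumed Nomad-controlled
}

var errImmutable = errors.New("volume update changes an immutable field")

// sameMap reports whether two string maps hold identical entries.
func sameMap(a, b map[string]string) bool {
	if len(a) != len(b) {
		return false
	}
	for k, v := range a {
		if bv, ok := b[k]; !ok || bv != v {
			return false
		}
	}
	return true
}

// mergeVolume validates the update against the existing record and then
// copies over only the user-updatable fields.
func mergeVolume(existing, update volume) (volume, error) {
	if update.PluginID != existing.PluginID ||
		!sameMap(update.Parameters, existing.Parameters) {
		return existing, errImmutable
	}
	merged := existing // keeps Nomad-controlled fields such as ClaimCount
	merged.Secrets = update.Secrets
	return merged, nil
}

func main() {
	old := volume{ID: "v1", PluginID: "ebs", ClaimCount: 2}
	upd := volume{ID: "v1", PluginID: "ebs", Secrets: map[string]string{"token": "new"}}
	merged, err := mergeVolume(old, upd)
	fmt.Println(merged.ClaimCount, merged.Secrets["token"], err)
}
```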

Also, clarify that this state store method is used for everything, not just
for the `Register` RPC.

7d0f87b9

csi: volume snapshot list plugin option is required (#12197) · 711a9d9a

Tim Gross authored 3 years ago

The RPC for listing volume snapshots requires a plugin ID. Update the
`volume snapshot list` command to find the specific plugin from the
provided prefix.

711a9d9a

csi: get plugin ID for creating snapshot from volume, not args (#12195) · bec44cc6

Tim Gross authored 3 years ago

The `CreateSnapshot` RPC expects a plugin ID to be set by the API, but
in the common case of the `nomad volume snapshot create` command, we
don't ask the user for the plugin ID because it's available from the
volume we're snapshotting.

Change the order of the RPC so that we get the volume first and then
use the volume's plugin ID for the plugin if the API didn't set the
value.

bec44cc6

Add changelog file. Add meta to ns mock for testing · 451586af

Jorge Marey authored 3 years ago

451586af

04 Mar, 2022 1 commit

csi: fix prefix queries for plugin list RPC (#12194) · 9ed4d962

Tim Gross authored 3 years ago

The `CSIPlugin.List` RPC was intended to accept a prefix to filter the
list of plugins being listed. This was accidentally being done
in the state store instead, which contributed to incorrect filtering
behavior for plugins in the `volume plugin status` command.

Move the prefix matching into the RPC so that it calls the
prefix-matching method in the state store if we're looking for a
prefix.

Update the `plugin status` command to accept a prefix for the plugin
ID argument so that it matches the expected behavior of other commands.

9ed4d962

03 Mar, 2022 2 commits

Fix CSI volume list with prefix and `*` namespace (#12184) · ad99a450

Luiz Aoqui authored 3 years ago

When using a prefix value and the `*` wildcard for namespace, the endpoint
would not take the prefix value into consideration, due both to the order
in which the checks were executed and to the logic for retrieving volumes
from the state store.

This commit changes the order to check for a prefix first and wraps the
result iterator of the state store query in a filter to apply the
prefix.
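The idea of wrapping a result iterator in a prefix filter can be sketched as below. This is a self-contained stand-in, not the endpoint's code: the `iterator` interface, `sliceIter`, `filterIter`, and `listVolumes` are hypothetical types written for the example, with a plain slice playing the role of the state store.

```go
package main

import (
	"fmt"
	"strings"
)

// volume is a hypothetical, trimmed-down volume record.
type volume struct {
	Namespace string
	ID        string
}

// iterator mimics a "call Next until nil" result iterator.
type iterator interface {
	Next() *volume
}

// sliceIter walks a slice and stands in for a state store query result.
type sliceIter struct {
	items []volume
	pos   int
}

func (s *sliceIter) Next() *volume {
	if s.pos >= len(s.items) {
		return nil
	}
	v := &s.items[s.pos]
	s.pos++
	return v
}

// filterIter wraps another iterator and skips entries keep rejects.
type filterIter struct {
	inner iterator
	keep  func(*volume) bool
}

func (f *filterIter) Next() *volume {
	for v := f.inner.Next(); v != nil; v = f.inner.Next() {
		if f.keep(v) {
			return v
		}
	}
	return nil
}

// listVolumes checks for a prefix first, then applies the namespace,
// so the prefix is honored even when namespace is the "*" wildcard.
func listVolumes(all []volume, namespace, prefix string) []string {
	var iter iterator = &sliceIter{items: all}
	if prefix != "" {
		iter = &filterIter{inner: iter, keep: func(v *volume) bool {
			return strings.HasPrefix(v.ID, prefix)
		}}
	}
	if namespace != "*" {
		iter = &filterIter{inner: iter, keep: func(v *volume) bool {
			return v.Namespace == namespace
		}}
	}
	var out []string
	for v := iter.Next(); v != nil; v = iter.Next() {
		out = append(out, v.Namespace+"/"+v.ID)
	}
	return out
}

func main() {
	vols := []volume{{"default", "db-main"}, {"prod", "db-replica"}, {"prod", "cache"}}
	fmt.Println(listVolumes(vols, "*", "db"))
}
```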

ad99a450

csi: add missing fields to HTTP API response (#12178) · cd928d2c

Tim Gross authored 3 years ago

The HTTP endpoint for CSI manually serializes the internal struct to
the API struct for purposes of redaction (see also #10470). Add fields
that were missing from this serialization so they don't show up as
always empty in the API response.

cd928d2c

01 Mar, 2022 5 commits
- CSI: implement support for topology (#12129) · 03a8d72d
  Tim Gross authored 3 years ago
  
  03a8d72d
- CSI: use HTTP headers for passing CSI secrets (#12144) · 3fd96831
  Tim Gross authored 3 years ago
  
  3fd96831
- csi: fix redaction of `volume status` mount flags (#12150) · 8ccb9a32
  Tim Gross authored 3 years ago
```
The `volume status` command and associated API redact the entire
mount options instead of just the `MountFlags` field that can contain
sensitive data. Return a redacted value so that the return value makes
sense to operators who have set this field.
```
  8ccb9a32
- CSI: sort capabilities in `plugin status` (#12154) · c06f31ee
  Tim Gross authored 3 years ago
```
Also fix `LIST_SNAPSHOTS` capability name
```
  c06f31ee
- csi: respect -verbose flag for allocs in volume status (#12153) · 8c8b997f
  Tim Gross authored 3 years ago
  
  8c8b997f
25 Feb, 2022 1 commit
- docs: add changelog for #10808 · 29461444
  Michael Schurter authored 3 years ago
  
  29461444
24 Feb, 2022 7 commits

tests: deflake test that joins a server with non-voting servers to form quorum · bd03d254

Seth Hoenig authored 3 years ago

This PR
 - upgrades the serf library
 - has the test start the join process using the un-joined server first
 - disables schedulers on the servers
 - uses the WaitForLeader and wantPeers helpers

Not sure which, if any, of these actually improves the flakiness of this test.

bd03d254

CSI: display plugin capabilities in verbose status (#12116) · 59f6c753

Tim Gross authored 3 years ago

The behaviors of CSI plugins are governed by their capabilities as
defined by the CSI specification. When debugging plugin issues, it's
useful to know which behaviors are expected so they can be matched
against RPC calls made to the plugin allocations.

Expose the plugin capabilities as named in the CSI spec in the `nomad
plugin status -verbose` output.

59f6c753

CSI: retry claims from client when max claims are reached (#12113) · 649f1e39

Tim Gross authored 3 years ago

When the alloc runner claims a volume, an allocation for a previous
version of the job may still have the volume claimed because it's
still shutting down. In this case we'll receive an error from the
server. Retry this error until we succeed or until a very long timeout
expires, to give operators a chance to recover broken plugins.

Make the alloc runner hook tolerant of temporary RPC failures.
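A retry loop of this shape might look like the following. It is only a sketch under assumptions: `errMaxClaims`, `claimWithRetry`, `simulateClaim`, and the backoff values are hypothetical, and for brevity the sketch treats every other error as fatal rather than modelling the temporary RPC failures the hook also tolerates.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errMaxClaims is a hypothetical sentinel for "volume max claims reached".
var errMaxClaims = errors.New("volume max claims reached")

// claimWithRetry calls claim until it succeeds, fails with a different
// error, or ctx expires. The wait doubles after each attempt up to maxBackoff.
func claimWithRetry(ctx context.Context, claim func() error, backoff, maxBackoff time.Duration) error {
	for {
		err := claim()
		if err == nil {
			return nil
		}
		if !errors.Is(err, errMaxClaims) {
			return err // treated as non-retryable in this sketch
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("claim not released before timeout: %w", err)
		case <-time.After(backoff):
		}
		backoff *= 2
		if backoff > maxBackoff {
			backoff = maxBackoff
		}
	}
}

// simulateClaim runs claimWithRetry against a claim that is rejected
// `failures` times before succeeding, and returns the attempt count.
func simulateClaim(failures, timeoutMS int) (int, error) {
	timeout := time.Duration(timeoutMS) * time.Millisecond
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	attempts := 0
	err := claimWithRetry(ctx, func() error {
		attempts++
		if attempts <= failures {
			return errMaxClaims // previous allocation still holds the claim
		}
		return nil
	}, time.Millisecond, 8*time.Millisecond)
	return attempts, err
}

func main() {
	attempts, err := simulateClaim(2, 1000)
	fmt.Println(attempts, err)
}
```

Capping the backoff keeps the client responsive once the old allocation finally releases its claim, while the context deadline plays the role of the "very long timeout".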

649f1e39

CSI: enforce usage at claim time (#12112) · 6b6b8279

Tim Gross authored 3 years ago

* Remove redundant schedulable check in `FreeWriteClaims`. If a volume
  has been created but not yet claimed, its capabilities will be checked
  in `WriteSchedulable` at both scheduling time and claim time. We don't
  need to also check them in the `FreeWriteClaims` method.

* Enforce maximum volume claims for writers.

  When the scheduler checks feasibility for CSI volumes, the check is
  fairly loose: earlier versions of the same job are not counted as
  active claims. This allows the scheduler to place new allocations
  for the new version of a job, under the assumption that we'll replace
  the existing allocations and their volume claims.

  But when the alloc runner claims the volume, we need to enforce the
  active claims even if they're for allocations of an earlier version of
  the job. Otherwise we'll try to mount a volume that's currently being
  unmounted, and this will cause replacement allocations to frequently
  fail.

* Enforce ...

6b6b8279

add go-sockaddr templating support to nomad consul address (#12084) · 0ae76b1a

Sander Mol authored 3 years ago

0ae76b1a

namespaces: allow enabling/disabling allowed drivers per namespace · b84f70ae

Florian Apolloner authored 3 years ago

b84f70ae

csi: tolerate missing plugins on job delete (#12114) · bfbb6509

Tim Gross authored 3 years ago

If a plugin job fails before successfully fingerprinting the plugins,
the plugin will not exist when we try to delete the job. Tolerate
missing plugins.

bfbb6509

23 Feb, 2022 3 commits

agent: switch to go.etcd.io/bbolt for state store · b2fe196e

Seth Hoenig authored 3 years ago

This PR modifies the server and client agents to use `go.etcd.io/bbolt` as the
implementation for their state stores.

b2fe196e

core: switch to go.etcd.io/bbolt · 16efcf4e

Seth Hoenig authored 3 years ago

This PR swaps the underlying BoltDB implementation from boltdb/bolt
to go.etcd.io/bbolt.

In addition, the Server has a new configuration option for disabling
NoFreelistSync on the underlying database.

Freelist option: https://github.com/etcd-io/bbolt/blob/master/db.go#L81
Consul equivalent PR: https://github.com/hashicorp/consul/pull/11720

16efcf4e

CSI: allow for concurrent plugin allocations (#12078) · 7bcf0afd

Tim Gross authored 3 years ago

The dynamic plugin registry assumes that plugins are singletons, which
matches the behavior of other Nomad plugins. But because dynamic
plugins like CSI are implemented by allocations, we need to handle the
possibility of multiple allocations for a given plugin type + ID, as
well as behaviors around interleaved allocation starts and stops.

Update the data structure for the dynamic registry so that more recent
allocations take over as the instance manager singleton, but we still
preserve the previous running allocations so that restores work
without racing.

Multiple allocations can run on a client for the same plugin, even if
only during updates. Provide each plugin task a unique path for the
control socket so that the tasks don't interfere with each other.
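The data structure described above can be pictured with a small sketch. The `registry`, `instance`, and socket path layout here are hypothetical simplifications chosen for illustration and are not the client's actual dynamic plugin registry.

```go
package main

import (
	"fmt"
	"path"
	"sync"
)

// instance is one allocation providing a plugin (hypothetical shape).
type instance struct {
	AllocID    string
	SocketPath string
}

// registry keeps every running allocation per plugin. The key is
// "type/id"; the most recently registered instance is the active one.
type registry struct {
	mu        sync.Mutex
	instances map[string][]instance
}

func newRegistry() *registry {
	return &registry{instances: make(map[string][]instance)}
}

// register appends the allocation and gives it its own socket path so
// concurrent plugin tasks do not share a control socket.
func (r *registry) register(key, allocID string) instance {
	r.mu.Lock()
	defer r.mu.Unlock()
	inst := instance{
		AllocID:    allocID,
		SocketPath: path.Join("/csi", key, allocID, "csi.sock"),
	}
	r.instances[key] = append(r.instances[key], inst)
	return inst
}

// deregister removes one allocation; older ones remain available.
func (r *registry) deregister(key, allocID string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	kept := r.instances[key][:0]
	for _, inst := range r.instances[key] {
		if inst.AllocID != allocID {
			kept = append(kept, inst)
		}
	}
	r.instances[key] = kept
}

// active returns the most recently registered instance, if any.
func (r *registry) active(key string) (instance, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	list := r.instances[key]
	if len(list) == 0 {
		return instance{}, false
	}
	return list[len(list)-1], true
}

func main() {
	reg := newRegistry()
	reg.register("csi-node/ebs", "alloc-old")
	reg.register("csi-node/ebs", "alloc-new")
	cur, _ := reg.active("csi-node/ebs")
	fmt.Println(cur.AllocID, cur.SocketPath)
}
```

Because older instances stay in the list, stopping the newest allocation falls back to the previous one instead of leaving the plugin unregistered.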

7bcf0afd

19 Feb, 2022 1 commit
- docs: add changelog for #11600 · 62ea60d0
  Michael Schurter authored 3 years ago
  
  62ea60d0
18 Feb, 2022 2 commits

connect: bootstrap envoy using -proxy-id · efee15f1

Seth Hoenig authored 3 years ago

This PR modifies the Consul CLI arguments used to bootstrap envoy for
Connect sidecars to make use of '-proxy-id' instead of '-sidecar-for'.

Nomad registers the sidecar service, so we know what ID it has. The
'-sidecar-for' was intended for use when you only know the name of the
service for which the sidecar is being created.

The improvement here is that using '-proxy-id' does not require an underlying
request for listing Consul services. This will make the interaction
between Nomad and Consul more efficient.

Closes #10452

efee15f1

connect: write envoy bootstrap debugging info · d4767807

Michael Schurter authored 3 years ago

When Consul Connect just works, it's wonderful. When it doesn't work it
can be exceedingly difficult to debug: operators have to check task
events, Nomad logs, Consul logs, Consul APIs, and even then critical
information is missing.

Using Consul to generate a bootstrap config for Envoy is notoriously
difficult. Nomad doesn't even log stderr, so operators are left trying
to piece together what went wrong.

This patch attempts to provide *maximal* context which unfortunately
includes secrets. **Secrets are always restricted to the secrets/
directory.** This makes debugging a little harder, but allows operators
to know exactly what operation Nomad was trying to perform.

What's added:

- stderr is sent to alloc/logs/envoy_bootstrap.stderr.0
- the CLI is written to secrets/.envoy_bootstrap.cmd
- the environment is written to secrets/.envoy_bootstrap.env as JSON

Accessing this information is unfortunately awkward:
```
nomad alloc exec -task connect-proxy-count-countdash b36a cat secrets/.envoy_bootstrap.env
nomad alloc exec -task connect-proxy-count-countdash b36a cat secrets/.envoy_bootstrap.cmd
nomad alloc fs b36a alloc/logs/envoy_bootstrap.stderr.0
```

The above assumes an alloc id that starts with `b36a` and a Connect
sidecar proxy for a service named `count-countdash`.

If the alloc is unable to start successfully, the debugging files are
only accessible from the host filesystem.

d4767807

17 Feb, 2022 1 commit
- deps: upgrade hashicorp/raft to v1.3.5 · 49b97eb5
  Seth Hoenig authored 3 years ago
  
  49b97eb5
16 Feb, 2022 3 commits
- build: respect GOBIN when using make targets · 06613d65
  Seth Hoenig authored 3 years ago
```
This PR updates GNUMakefile to respect $GOBIN if it is set in the
environment or via a $GOENV file. Previously we hard-coded the output
to $GOPATH/bin, which is not necessarily the desired behavior.
```
  06613d65
- Add `go-bexpr` filters to evals and deployment list endpoints (#12034) · fafb7cec
  Luiz Aoqui authored 3 years ago
  
  fafb7cec
- interpolate network.dns block on client (#12021) · 1fabefd2
  Tiernan authored 3 years ago
  
  1fabefd2
15 Feb, 2022 4 commits

CSI: make gRPC client creation more robust (#12057) · b775a73d

Tim Gross authored 3 years ago

Nomad communicates with CSI plugin tasks via gRPC. The plugin
supervisor hook uses this to ping the plugin for health checks which
it emits as task events. After the first successful health check the
plugin supervisor registers the plugin in the client's dynamic plugin
registry, which in turn creates a CSI plugin manager instance that has
its own gRPC client for fingerprinting the plugin and sending mount
requests.

If the plugin manager instance fails to connect to the plugin on its
first attempt, it exits. The plugin supervisor hook is unaware that
connection failed so long as its own pings continue to work. A
transient failure during plugin startup may mislead the plugin
supervisor hook into thinking the plugin is up (so there's no need to
restart the allocation) but no fingerprinter is started.

* Refactors the gRPC client to connect on first use. This provides the
  plugin manager instance the ability to retry the gRPC client
  conn...

b775a73d

api: return sorted results in certain list endpoints · b432f377

Seth Hoenig authored 3 years ago

These API endpoints now return results in chronological order. They
can return results in reverse chronological order by setting the
query parameter ascending=true.

- Eval.List
- Deployment.List

b432f377

cl: shorten changelog entry · 5ac59de9

Seth Hoenig authored 3 years ago

5ac59de9

changelog entry (#12072) · 7c027503
Tim Gross authored 3 years ago

7c027503

11 Feb, 2022 1 commit

csi: volume cli prefix matching should accept exact match (#12051) · 4afc67b7

Tim Gross authored 3 years ago

The `volume detach`, `volume deregister`, and `volume status` commands
accept a prefix argument for the volume ID. Update the behavior on
exact matches so that if more than one volume matches the prefix, we
return an error only if none of the volume IDs is an exact match for
the argument. Otherwise we won't be able to use these commands
at all on those volumes. This also makes the behavior of these commands
consistent with `job stop`.
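The matching rule can be illustrated with a short sketch. `resolveVolumeID` is a hypothetical helper written for this example, operating on a plain list of IDs instead of an API query, and the error messages are invented.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveVolumeID picks the volume an argument refers to. An exact match
// always wins; otherwise the argument must be an unambiguous prefix.
func resolveVolumeID(ids []string, arg string) (string, error) {
	var matches []string
	for _, id := range ids {
		if id == arg {
			return id, nil // exact match, even if other IDs share the prefix
		}
		if strings.HasPrefix(id, arg) {
			matches = append(matches, id)
		}
	}
	switch len(matches) {
	case 0:
		return "", fmt.Errorf("no volume matches %q", arg)
	case 1:
		return matches[0], nil
	default:
		return "", fmt.Errorf("prefix %q is ambiguous: %s", arg, strings.Join(matches, ", "))
	}
}

func main() {
	ids := []string{"web", "web-data", "db-1"}
	for _, arg := range []string{"web", "we", "db"} {
		id, err := resolveVolumeID(ids, arg)
		fmt.Printf("%q -> %q, err=%v\n", arg, id, err)
	}
}
```

Without the exact-match shortcut, a volume named `web` could never be addressed while `web-data` exists, which is the problem the commit describes.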

4afc67b7