Commits · 5683fdf75ab286e31162844d81dae38d020b5f0c · 小白蛋 / Nomad

This project is mirrored from https://gitee.com/mirrors/nomad.git.

25 Feb, 2022 1 commit
- Merge pull request #12130 from hashicorp/flakey-serf-non-voter · 5683fdf7
  Seth Hoenig authored 3 years ago
```
tests: deflake test that joins a server with non-voting servers to form quorum
```
  5683fdf7
24 Feb, 2022 14 commits

tests: deflake test that joins a server with non-voting servers to form quorum · bd03d254

Seth Hoenig authored 3 years ago

This PR
 - upgrades the serf library
 - has the test start the join process using the un-joined server first
 - disables schedulers on the servers
 - uses the WaitForLeader and wantPeers helpers

Not sure which of these, if any, actually reduces the flakiness of this test.

bd03d254

chore: bump docs-page for code-block fix (#12117) · a10af1bc

Zachary Shilton authored 3 years ago

* chore: bump to latest docs-page

* fix: bump to react-consent-manager patch

* chore: bump to consent-manager with events dep

* chore: bump to stable consent-manager release

a10af1bc

CSI: ensure all fields are mapped from structs to api response (#12124) · 21aa7641

Tim Gross authored 3 years ago

In PR #12108 we added missing fields to the plugin response, but we
didn't include the manual serialization steps that we need until
issue #10470 is resolved.

21aa7641

CSI: display plugin capabilities in verbose status (#12116) · 59f6c753

Tim Gross authored 3 years ago

The behaviors of CSI plugins are governed by their capabilities as
defined by the CSI specification. When debugging plugin issues, it's
useful to know which behaviors are expected so they can be matched
against RPC calls made to the plugin allocations.

Expose the plugin capabilities as named in the CSI spec in the `nomad
plugin status -verbose` output.

59f6c753

docs: add docs for the autoscaler `on_error` and `on_check_error` configuration (#12083) · 48184772
Luiz Aoqui authored 3 years ago

48184772
Merge pull request #12122 from hashicorp/b-api-remove-namespace-test-ent-tag · d8352186
James Rasell authored 3 years ago
```
api: remove ent build tag on namespace test file.
```
d8352186
api: remove ent build tag on namespace test file. · e7d3220d
James Rasell authored 3 years ago

e7d3220d

CSI: retry claims from client when max claims are reached (#12113) · 649f1e39

Tim Gross authored 3 years ago

When the alloc runner claims a volume, an allocation for a previous
version of the job may still have the volume claimed because it's
still shutting down. In this case we'll receive an error from the
server. Retry this error until we succeed or until a very long timeout
expires, to give operators a chance to recover broken plugins.

Make the alloc runner hook tolerant of temporary RPC failures.
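
The retry behavior described above can be sketched roughly as follows. This is a minimal illustration and not the alloc runner hook itself: `claimWithRetry`, `errMaxClaims`, `demoClaim`, and the backoff values are all hypothetical names and numbers chosen for the example.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errMaxClaims is a hypothetical sentinel standing in for the server's
// "max claims reached" error.
var errMaxClaims = errors.New("volume max claims reached")

// claimWithRetry calls claim until it succeeds, returns an error that this
// sketch treats as non-retryable, or ctx expires. The wait between attempts
// doubles each time, capped at maxBackoff.
func claimWithRetry(ctx context.Context, claim func() error, backoff, maxBackoff time.Duration) error {
	for {
		err := claim()
		if err == nil {
			return nil
		}
		if !errors.Is(err, errMaxClaims) {
			return err
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("giving up on claim: %w", err)
		case <-time.After(backoff):
		}
		backoff *= 2
		if backoff > maxBackoff {
			backoff = maxBackoff
		}
	}
}

// demoClaim simulates a claim that is rejected `failures` times (for example
// while an older allocation is still shutting down) before it succeeds.
func demoClaim(failures int) (attempts int, err error) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	err = claimWithRetry(ctx, func() error {
		attempts++
		if attempts <= failures {
			return errMaxClaims
		}
		return nil
	}, time.Millisecond, 10*time.Millisecond)
	return attempts, err
}

func main() {
	attempts, err := demoClaim(2)
	fmt.Println("attempts:", attempts, "err:", err)
}
```

The context deadline plays the role of the "very long timeout" in the commit message; a real hook would presumably use a much longer deadline than the one-second value used here.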

649f1e39

CSI: enforce usage at claim time (#12112) · 6b6b8279

Tim Gross authored 3 years ago

* Remove redundant schedulable check in `FreeWriteClaims`. If a volume
  has been created but not yet claimed, its capabilities will be checked
  in `WriteSchedulable` at both scheduling time and claim time. We don't
  need to also check them in the `FreeWriteClaims` method.

* Enforce maximum volume claims for writers.

  When the scheduler checks feasibility for CSI volumes, the check is
  fairly loose: earlier versions of the same job are not counted as
  active claims. This allows the scheduler to place new allocations
  for the new version of a job, under the assumption that we'll replace
  the existing allocations and their volume claims.

  But when the alloc runner claims the volume, we need to enforce the
  active claims even if they're for allocations of an earlier version of
  the job. Otherwise we'll try to mount a volume that's currently being
  unmounted, and this will cause replacement allocations to frequently
  fail.

* Enforce single-node reader check for read-only volumes. When the
  alloc runner makes a claim for a read-only volume, we only check that
  the volume is potentially schedulable and not that it actually has
  free read claims.
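
The difference between the loose scheduling-time check and the strict claim-time check for writers might look something like this. The `claim` and `volume` types and both method names are hypothetical simplifications for illustration, not Nomad's actual structs.

```go
package main

import "fmt"

// claim is a hypothetical record of a write claim held by an allocation.
type claim struct {
	jobID      string
	jobVersion int
}

// volume is a simplified stand-in for a CSI volume with a writer limit.
type volume struct {
	maxWriters int
	writers    map[string]claim // keyed by allocation ID
}

// feasibleForScheduler is the loose check: claims held by earlier versions
// of the same job are ignored, on the assumption they will be replaced.
func (v *volume) feasibleForScheduler(jobID string, version int) bool {
	active := 0
	for _, c := range v.writers {
		if c.jobID == jobID && c.jobVersion < version {
			continue
		}
		active++
	}
	return active < v.maxWriters
}

// claimableNow is the strict check used at claim time: every outstanding
// claim counts, including those of allocations that are still shutting down.
func (v *volume) claimableNow() bool {
	return len(v.writers) < v.maxWriters
}

// demoVolume returns a single-writer volume still claimed by version 1 of
// a job named "web".
func demoVolume() *volume {
	return &volume{
		maxWriters: 1,
		writers:    map[string]claim{"alloc-old": {jobID: "web", jobVersion: 1}},
	}
}

func main() {
	v := demoVolume()
	fmt.Println("scheduler allows v2 placement:", v.feasibleForScheduler("web", 2))
	fmt.Println("claim allowed right now:", v.claimableNow())
}
```

In this sketch the scheduler would place version 2 of the job, while the claim itself is refused until the old allocation releases the volume, which is the situation the client-side retry is meant to cover.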

6b6b8279

add go-sockaddr templating support to nomad consul address (#12084) · 0ae76b1a
Sander Mol authored 3 years ago

0ae76b1a
namespaces: allow enabling/disabling allowed drivers per namespace · b84f70ae
Florian Apolloner authored 3 years ago

b84f70ae
Merge pull request #12107 from hashicorp/use-bbolt · 5b65c97c
Seth Hoenig authored 3 years ago
```
core: swap bolt impl and enable configuring raft freelist sync behavior
```
5b65c97c
docs: emphasize snapshot before upgrading · 96a6f2c9
Seth Hoenig authored 3 years ago

96a6f2c9

csi: tolerate missing plugins on job delete (#12114) · bfbb6509

Tim Gross authored 3 years ago

If a plugin job fails before successfully fingerprinting the plugins,
the plugin will not exist when we try to delete the job. Tolerate
missing plugins.

bfbb6509

23 Feb, 2022 11 commits

command: switch from raft-boltdb to raft-boltdb/v2 · 42c6d5a5
Seth Hoenig authored 3 years ago

42c6d5a5
client: resolve rebase conflict · a6cc062c
Seth Hoenig authored 3 years ago

a6cc062c
build: disallow old boltdb during build · 615d08bb
Seth Hoenig authored 3 years ago

615d08bb

agent: switch to go.etcd.io/bbolt for state store · b2fe196e

Seth Hoenig authored 3 years ago

This PR modifies the server and client agents to use `go.etcd.io/bbolt` as the
implementation for their state stores.

b2fe196e

core: switch to go.etcd.io/bbolt · 16efcf4e

Seth Hoenig authored 3 years ago

This PR swaps the underlying BoltDB implementation from boltdb/bolt
to go.etcd.io/bbolt.

In addition, the Server has a new configuration option for disabling
NoFreelistSync on the underlying database.

Freelist option: https://github.com/etcd-io/bbolt/blob/master/db.go#L81
Consul equivalent PR: https://github.com/hashicorp/consul/pull/11720

16efcf4e

CSI: allow for concurrent plugin allocations (#12078) · 7bcf0afd

Tim Gross authored 3 years ago

The dynamic plugin registry assumes that plugins are singletons, which
matches the behavior of other Nomad plugins. But because dynamic
plugins like CSI are implemented by allocations, we need to handle the
possibility of multiple allocations for a given plugin type + ID, as
well as behaviors around interleaved allocation starts and stops.

Update the data structure for the dynamic registry so that more recent
allocations take over as the instance manager singleton, but we still
preserve the previous running allocations so that restores work
without racing.

Multiple allocations can run on a client for the same plugin, even if
only during updates. Provide each plugin task a unique path for the
control socket so that the tasks don't interfere with each other.
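
One way to picture the registry change is a per-plugin list of allocations in which the most recent entry is the active instance and older entries are kept as fallbacks. The sketch below is an assumption-laden illustration: `pluginKey`, `registry`, its methods, and the socket path layout are hypothetical and do not mirror Nomad's real data structures.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// pluginKey identifies a plugin by type and ID.
type pluginKey struct{ pluginType, id string }

// registry keeps, per plugin, the allocation IDs that provide it, oldest
// first. The last entry is treated as the active instance.
type registry struct {
	instances map[pluginKey][]string
}

func newRegistry() *registry {
	return &registry{instances: map[pluginKey][]string{}}
}

// register makes allocID the active instance while keeping earlier ones.
func (r *registry) register(k pluginKey, allocID string) {
	r.deregister(k, allocID) // avoid duplicates on re-registration
	r.instances[k] = append(r.instances[k], allocID)
}

// deregister removes allocID; an older allocation, if any, becomes active.
func (r *registry) deregister(k pluginKey, allocID string) {
	var kept []string
	for _, id := range r.instances[k] {
		if id != allocID {
			kept = append(kept, id)
		}
	}
	if len(kept) == 0 {
		delete(r.instances, k)
		return
	}
	r.instances[k] = kept
}

// active returns the most recently registered allocation for the plugin.
func (r *registry) active(k pluginKey) (string, bool) {
	ids := r.instances[k]
	if len(ids) == 0 {
		return "", false
	}
	return ids[len(ids)-1], true
}

// socketPath gives each allocation its own control socket location so that
// concurrent plugin tasks do not collide. The layout is hypothetical.
func socketPath(base string, k pluginKey, allocID string) string {
	return filepath.Join(base, k.pluginType, k.id, allocID, "csi.sock")
}

func main() {
	r := newRegistry()
	k := pluginKey{pluginType: "csi-node", id: "example-plugin"}
	r.register(k, "alloc-old")
	r.register(k, "alloc-new")
	id, _ := r.active(k)
	fmt.Println("active:", id, "socket:", socketPath("/tmp/plugins", k, id))
	r.deregister(k, "alloc-new")
	id, _ = r.active(k)
	fmt.Println("active after stop:", id)
}
```

Keeping the older entries around is what lets an interleaved stop of the newer allocation fall back to the one that is still running, rather than leaving the plugin unregistered.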

7bcf0afd

CSI: add missing plugin capabilities to api response (#12108) · 822285fa

Tim Gross authored 3 years ago

Detection of the full set of plugin capabilities was added in Nomad
1.1 for the volume creation workflow, but these were not added to the
API response for plugins.

822285fa

csi: fix broken test (#12110) · 85fb42fb
Tim Gross authored 3 years ago

85fb42fb
Fixed scheduler config examples (#12049) · 53d55ee9
Charlie Voiselle authored 3 years ago

53d55ee9

CSI: minor refactoring (#12105) · 7f5a0c54

Tim Gross authored 3 years ago

* rename method checking that free write claims are available
* use package-level variables for claim errors
* semgrep fix for testify
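
As an illustration of the second item, package-level error variables let callers compare against a known value instead of matching on message strings. The error names, messages, and the `claimWrite` and `isNoFreeWriteClaims` helpers below are hypothetical, not the identifiers used in Nomad.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical package-level sentinel errors for claim failures.
var (
	errNoFreeWriteClaims = errors.New("volume has no free write claims")
	errNoFreeReadClaims  = errors.New("volume has no free read claims")
)

// claimWrite rejects the claim when the writer limit is already reached,
// wrapping the sentinel so callers can still detect it with errors.Is.
func claimWrite(inUse, limit int) error {
	if inUse >= limit {
		return fmt.Errorf("claim rejected: %w", errNoFreeWriteClaims)
	}
	return nil
}

// isNoFreeWriteClaims reports whether err is, or wraps, the write-claim
// sentinel.
func isNoFreeWriteClaims(err error) bool {
	return errors.Is(err, errNoFreeWriteClaims)
}

func main() {
	err := claimWrite(1, 1)
	fmt.Println(err, isNoFreeWriteClaims(err), errors.Is(err, errNoFreeReadClaims))
}
```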

7f5a0c54

csi: fix mocked modes in volumewatcher test (#12104) · 88a80828

Tim Gross authored 3 years ago

The volumewatcher test incorrectly represents the change in attachment
and access modes introduced in Nomad 1.1.0 to support volume
creation. This leads to a test that happens to pass but only
accidentally.

Update the test to correctly represent the volume modes set by the
existing claims on the test volumes.

88a80828

22 Feb, 2022 4 commits
- Merge pull request #12065 from hashicorp/docs-add-form-link · 950ccaf1
  Mike Nomitch authored 3 years ago
```
Adding link to interview form
```
  950ccaf1
- csi: don't wait to fire initial unmount RPC (#12102) · 89ca3d9d
  Tim Gross authored 3 years ago
```
In PR #11892 we updated the `csi_hook` to unmount the volume locally
via the CSI node RPCs before releasing the claim from the server. The
timer for this hook was initialized with the retry time, forcing us to
wait 1s before making the first unmount RPC calls.

Use the new helper for timers to ensure we clean up the timer nicely.
```
  89ca3d9d
- docs: update link to `mount` in Docker task driver (#12101) · a9407111
  Luiz Aoqui authored 3 years ago
  
  a9407111
- Merge pull request #11600 from hashicorp/f-remove-unused-version · 85abc2de
  Michael Schurter authored 3 years ago
```
core: remove all traces of unused protocol version
```
  85abc2de
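
The timer change described in the `csi: don't wait to fire initial unmount RPC (#12102)` entry above can be sketched with the standard library alone. This is an illustrative assumption of the pattern, not the `csi_hook` code: `unmountWithRetry` and `demoUnmount` are hypothetical names, and the commit's own timer helper is not reproduced here.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// unmountWithRetry makes the first attempt immediately and then waits
// `retry` between later attempts, until unmount succeeds or ctx is done.
func unmountWithRetry(ctx context.Context, retry time.Duration, unmount func() error) error {
	// A zero duration makes the timer fire right away, so the first RPC is
	// not delayed by the retry interval.
	t := time.NewTimer(0)
	defer t.Stop() // release the timer however we leave the loop

	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-t.C:
		}
		if err := unmount(); err == nil {
			return nil
		}
		// The channel was just drained, so Reset is safe here.
		t.Reset(retry)
	}
}

// demoUnmount simulates an unmount that fails `failures` times first.
func demoUnmount(failures int) (attempts int, err error) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	err = unmountWithRetry(ctx, time.Millisecond, func() error {
		attempts++
		if attempts <= failures {
			return errors.New("volume busy")
		}
		return nil
	})
	return attempts, err
}

func main() {
	attempts, err := demoUnmount(2)
	fmt.Println("attempts:", attempts, "err:", err)
}
```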
19 Feb, 2022 2 commits

docs: add changelog for #11600 · 62ea60d0
Michael Schurter authored 3 years ago

62ea60d0

core: remove all traces of unused protocol version · 2411d3af

Michael Schurter authored 3 years ago

Nomad inherited protocol version numbering configuration from Consul and
Serf, but unlike those projects Nomad has never used it. Nomad's
`protocol_version` has always been `1`.

While the code is effectively unused and therefore poses no runtime
risks to leave, I felt like removing it was best because:

1. Nomad's RPC subsystem has been able to evolve extensively without
   needing to increment the version number.
2. Nomad's HTTP API has evolved extensively without incrementing
   `API{Major,Minor}Version`. If we want to version the HTTP API in the
   future, I doubt this is the mechanism we would choose.
3. The presence of the `server.protocol_version` configuration
   parameter is confusing since `server.raft_protocol` *is* an important
   parameter for operators to consider. Even more confusing is that
   there is a distinct Serf protocol version which is included in `nomad
   server members` output under the heading `Protocol`. `raft_protocol`
   is the...

2411d3af

18 Feb, 2022 7 commits

Update autoscaler AWS ASG target docs: AWS keypair can be empty (#11977) · 1ad08c2e
Adrián López authored 3 years ago

1ad08c2e
docs: add autoscaler hcloud target plugin link. (#12087) · 36cc1702
James Rasell authored 3 years ago

36cc1702
Merge pull request #11975 from hashicorp/f-connect-debugging · bdeea4b0
Michael Schurter authored 3 years ago
```
connect: write envoy bootstrap debugging info
```
bdeea4b0
Merge pull request #12011 from hashicorp/cc-use-proxyid · 5138f00b
Seth Hoenig authored 3 years ago
```
connect: bootstrap envoy using -proxy-id
```
5138f00b

connect: bootstrap envoy using -proxy-id · efee15f1

Seth Hoenig authored 3 years ago

This PR modifies the Consul CLI arguments used to bootstrap envoy for
Connect sidecars to make use of '-proxy-id' instead of '-sidecar-for'.

Nomad registers the sidecar service, so we know what ID it has. The
'-sidecar-for' was intended for use when you only know the name of the
service for which the sidecar is being created.

The improvement here is that using '-proxy-id' does not require an underlying
request for listing Consul services. This will make the interaction
between Nomad and Consul more efficient.

Closes #10452
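
The argument change can be illustrated with a small sketch that builds both forms of the Consul CLI invocation. Only the `-bootstrap`, `-proxy-id`, and `-sidecar-for` flags of `consul connect envoy` are shown; the function names and the example IDs are hypothetical, and the other arguments Nomad passes are omitted.

```go
package main

import (
	"fmt"
	"strings"
)

// bootstrapArgs builds arguments that identify the sidecar by its own
// proxy service ID, which Nomad knows because it registered the sidecar.
func bootstrapArgs(proxyID string) []string {
	return []string{"connect", "envoy", "-bootstrap", "-proxy-id", proxyID}
}

// legacyBootstrapArgs builds the older form, which names the service the
// sidecar is for and leaves Consul to look up the matching proxy.
func legacyBootstrapArgs(service string) []string {
	return []string{"connect", "envoy", "-bootstrap", "-sidecar-for", service}
}

func main() {
	// The IDs below are placeholders, not real Nomad-generated service IDs.
	fmt.Println("consul", strings.Join(legacyBootstrapArgs("example-service"), " "))
	fmt.Println("consul", strings.Join(bootstrapArgs("example-service-sidecar-proxy"), " "))
}
```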

efee15f1

connect: write envoy bootstrap debugging info · d4767807

Michael Schurter authored 3 years ago

When Consul Connect just works, it's wonderful. When it doesn't work it
can be exceedingly difficult to debug: operators have to check task
events, Nomad logs, Consul logs, Consul APIs, and even then critical
information is missing.

Using Consul to generate a bootstrap config for Envoy is notoriously
difficult. Nomad doesn't even log stderr, so operators are left trying
to piece together what went wrong.

This patch attempts to provide *maximal* context which unfortunately
includes secrets. **Secrets are always restricted to the secrets/
directory.** This makes debugging a little harder, but allows operators
to know exactly what operation Nomad was trying to perform.

What's added:

- stderr is sent to alloc/logs/envoy_bootstrap.stderr.0
- the CLI is written to secrets/.envoy_bootstrap.cmd
- the environment is written to secrets/.envoy_bootstrap.env as JSON

Accessing this information is unfortunately awkward:
```
nomad alloc exec -task connect-proxy-count-countdash b36a cat secrets/.envoy_bootstrap.env
nomad alloc exec -task connect-proxy-count-countdash b36a cat secrets/.envoy_bootstrap.cmd
nomad alloc fs b36a alloc/logs/envoy_bootstrap.stderr.0
```

The above assumes an alloc id that starts with `b36a` and a Connect
sidecar proxy for a service named `count-countdash`.

If the alloc is unable to start successfully, the debugging files are
only accessible from the host filesystem.
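
For orientation, writing the three debugging files listed above might look roughly like the following. Only the file names come from the commit message; the `writeBootstrapDebug` and `demoWrite` functions, the `taskDir` layout, the file permissions, and the space-joined command format are assumptions made for this sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// writeBootstrapDebug records the command, its environment (as JSON), and
// its stderr output under taskDir. Files that may contain secrets go in
// secrets/; stderr goes in alloc/logs/.
func writeBootstrapDebug(taskDir string, cmd []string, env map[string]string, stderr []byte) error {
	secrets := filepath.Join(taskDir, "secrets")
	logs := filepath.Join(taskDir, "alloc", "logs")
	for _, dir := range []string{secrets, logs} {
		if err := os.MkdirAll(dir, 0o700); err != nil {
			return err
		}
	}
	envJSON, err := json.MarshalIndent(env, "", "  ")
	if err != nil {
		return err
	}
	files := []struct {
		path string
		data []byte
	}{
		{filepath.Join(secrets, ".envoy_bootstrap.cmd"), []byte(strings.Join(cmd, " "))},
		{filepath.Join(secrets, ".envoy_bootstrap.env"), envJSON},
		{filepath.Join(logs, "envoy_bootstrap.stderr.0"), stderr},
	}
	for _, f := range files {
		if err := os.WriteFile(f.path, f.data, 0o600); err != nil {
			return err
		}
	}
	return nil
}

// demoWrite writes example data to a temporary directory and returns the
// recorded command line.
func demoWrite() (string, error) {
	dir, err := os.MkdirTemp("", "envoy-debug")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	err = writeBootstrapDebug(dir,
		[]string{"consul", "connect", "envoy", "-bootstrap"},
		map[string]string{"CONSUL_HTTP_ADDR": "127.0.0.1:8500"},
		[]byte("example stderr output\n"))
	if err != nil {
		return "", err
	}
	data, err := os.ReadFile(filepath.Join(dir, "secrets", ".envoy_bootstrap.cmd"))
	return string(data), err
}

func main() {
	cmd, err := demoWrite()
	fmt.Println(cmd, err)
}
```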

d4767807

Merge pull request #12079 from hashicorp/deps-update-raft · dd4a3a9f
Seth Hoenig authored 3 years ago
```
deps: upgrade hashicorp/raft to v1.3.5
```
dd4a3a9f

17 Feb, 2022 1 commit
- deps: upgrade hashicorp/raft to v1.3.5 · 49b97eb5
  Seth Hoenig authored 3 years ago
  
  49b97eb5