This project is mirrored from https://gitee.com/mirrors/nomad.git.
Pull mirroring failed.
Repository mirroring has been paused due to too many failed attempts. It can be resumed by a project maintainer.
- 25 Feb, 2022 1 commit
Seth Hoenig authored
tests: deflake test that joins a server with non-voting servers to form quorum
-
- 24 Feb, 2022 14 commits
Seth Hoenig authored
This PR
- upgrades the serf library
- has the test start the join process using the un-joined server first
- disables schedulers on the servers
- uses the WaitForLeader and wantPeers helpers
Not sure which, if any, of these actually reduces the flakiness of this test.
-
Zachary Shilton authored
* chore: bump to latest docs-page
* fix: bump to react-consent-manager patch
* chore: bump to consent-manager with events dep
* chore: bump to stable consent-manager release
-
Tim Gross authored
In PR #12108 we added missing fields to the plugin response, but we didn't include the manual serialization steps that we need until issue #10470 is resolved.
-
Tim Gross authored
The behaviors of CSI plugins are governed by their capabilities as defined by the CSI specification. When debugging plugin issues, it's useful to know which behaviors are expected so they can be matched against RPC calls made to the plugin allocations. Expose the plugin capabilities as named in the CSI spec in the `nomad plugin status -verbose` output.
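To illustrate reporting capabilities by their CSI spec names, here is a minimal Go sketch. The `pluginCapabilities` struct and `specNames` function are hypothetical and cover only a handful of capabilities; the strings are RPC capability names from the CSI specification (the first three from the controller service, the last two from the node service). Nomad's actual output format may differ.

```go
package main

import "sort"

// pluginCapabilities is a hypothetical, simplified view of what a CSI plugin
// reports during fingerprinting. Field names are illustrative only.
type pluginCapabilities struct {
	CreateDeleteVolume     bool // controller: CREATE_DELETE_VOLUME
	PublishUnpublishVolume bool // controller: PUBLISH_UNPUBLISH_VOLUME
	ListVolumes            bool // controller: LIST_VOLUMES
	StageUnstageVolume     bool // node: STAGE_UNSTAGE_VOLUME
	GetVolumeStats         bool // node: GET_VOLUME_STATS
}

// specNames returns the CSI spec name of every advertised capability,
// sorted so that verbose output is stable between runs.
func specNames(c pluginCapabilities) []string {
	var names []string
	add := func(enabled bool, name string) {
		if enabled {
			names = append(names, name)
		}
	}
	add(c.CreateDeleteVolume, "CREATE_DELETE_VOLUME")
	add(c.PublishUnpublishVolume, "PUBLISH_UNPUBLISH_VOLUME")
	add(c.ListVolumes, "LIST_VOLUMES")
	add(c.StageUnstageVolume, "STAGE_UNSTAGE_VOLUME")
	add(c.GetVolumeStats, "GET_VOLUME_STATS")
	sort.Strings(names)
	return names
}
```

Using the spec's own names means an operator can match the listed capabilities directly against the RPCs they see in plugin logs.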
-
Luiz Aoqui authored
-
James Rasell authored
api: remove ent build tag on namespace test file.
-
James Rasell authored
-
Tim Gross authored
When the alloc runner claims a volume, an allocation for a previous version of the job may still have the volume claimed because it's still shutting down. In this case we'll receive an error from the server. Retry this error until we succeed or until a very long timeout expires, to give operators a chance to recover broken plugins. Make the alloc runner hook tolerant of temporary RPC failures.
-
Tim Gross authored
* Remove redundant schedulable check in `FreeWriteClaims`. If a volume has been created but not yet claimed, its capabilities will be checked in `WriteSchedulable` at both scheduling time and claim time. We don't need to also check them in the `FreeWriteClaims` method.
* Enforce maximum volume claims for writers. When the scheduler checks feasibility for CSI volumes, the check is fairly loose: earlier versions of the same job are not counted as active claims. This allows the scheduler to place new allocations for the new version of a job, under the assumption that we'll replace the existing allocations and their volume claims. But when the alloc runner claims the volume, we need to enforce the active claims even if they're for allocations of an earlier version of the job. Otherwise we'll try to mount a volume that's currently being unmounted, and this will cause replacement allocations to frequently fail.
* Enforce single-node reader check for read-only volumes. When the alloc runner makes a claim for a read-only volume, we only check that the volume is potentially schedulable and not that it actually has free read claims.
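The difference between the loose scheduler check and the strict claim-time check can be sketched in Go. This is a simplified model: `claim`, `volume`, `schedulerCanPlace`, and `claimWrite` are hypothetical names, and real claims also track access and attachment modes, which are omitted here.

```go
package main

import "fmt"

// claim is a hypothetical, simplified record of one allocation's write claim.
type claim struct {
	AllocID    string
	JobID      string
	JobVersion uint64
}

// volume is a hypothetical volume that allows at most MaxWriters writers.
type volume struct {
	MaxWriters  int
	WriteClaims map[string]claim // keyed by allocation ID
}

// schedulerCanPlace is the loose feasibility check: claims held by earlier
// versions of the same job are not counted, on the assumption that those
// allocations are about to be replaced.
func (v *volume) schedulerCanPlace(jobID string, jobVersion uint64) bool {
	active := 0
	for _, c := range v.WriteClaims {
		if c.JobID == jobID && c.JobVersion < jobVersion {
			continue
		}
		active++
	}
	return active < v.MaxWriters
}

// claimWrite is the strict check made when the alloc runner claims the
// volume: every existing claim counts, including those of older job versions.
func (v *volume) claimWrite(c claim) error {
	if _, ok := v.WriteClaims[c.AllocID]; ok {
		return nil // this allocation already holds a claim
	}
	if len(v.WriteClaims) >= v.MaxWriters {
		return fmt.Errorf("volume max claims reached: %d active writer(s)", len(v.WriteClaims))
	}
	if v.WriteClaims == nil {
		v.WriteClaims = make(map[string]claim)
	}
	v.WriteClaims[c.AllocID] = c
	return nil
}
```

In this model the new allocation is placed, but its claim is refused until the old allocation releases the volume; a retry loop in the alloc runner then picks it up.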
-
Sander Mol authored
-
Florian Apolloner authored
-
Seth Hoenig authored
core: swap bolt impl and enable configuring raft freelist sync behavior
-
Seth Hoenig authored
-
Tim Gross authored
If a plugin job fails before successfully fingerprinting the plugins, the plugin will not exist when we try to delete the job. Tolerate missing plugins.
-
- 23 Feb, 2022 11 commits
Seth Hoenig authored
-
Seth Hoenig authored
-
Seth Hoenig authored
-
Seth Hoenig authored
This PR modifies the server and client agents to use `go.etcd.io/bbolt` as the implementation for their state stores.
-
Seth Hoenig authored
This PR swaps the underlying BoltDB implementation from boltdb/bolt to go.etcd.io/bbolt. In addition, the Server has a new configuration option for controlling the NoFreelistSync setting on the underlying database.
Freelist option: https://github.com/etcd-io/bbolt/blob/master/db.go#L81
Consul equivalent PR: https://github.com/hashicorp/consul/pull/11720
-
Tim Gross authored
The dynamic plugin registry assumes that plugins are singletons, which matches the behavior of other Nomad plugins. But because dynamic plugins like CSI are implemented by allocations, we need to handle the possibility of multiple allocations for a given plugin type + ID, as well as behaviors around interleaved allocation starts and stops. Update the data structure for the dynamic registry so that more recent allocations take over as the instance manager singleton, but we still preserve the previous running allocations so that restores work without racing. Multiple allocations can run on a client for the same plugin, even if only during updates. Provide each plugin task a unique path for the control socket so that the tasks don't interfere with each other.
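A minimal Go sketch of such a registry, under stated assumptions: `registry`, `pluginKey`, and `socketPath` are hypothetical, and the fallback (when the newest allocation stops, the next most recent one becomes active again) and the socket path layout are illustrative choices, not taken from Nomad.

```go
package main

import "path/filepath"

// pluginKey identifies a dynamic plugin by type and ID.
type pluginKey struct{ Type, ID string }

// registry keeps every running allocation for each plugin, oldest first.
// The most recently registered allocation acts as the singleton; earlier
// ones are preserved until they stop.
type registry struct {
	plugins map[pluginKey][]string // allocation IDs, oldest first
}

func newRegistry() *registry {
	return &registry{plugins: make(map[pluginKey][]string)}
}

// register records that allocID is running the plugin, moving it to the
// newest position if it was already present (e.g. on restore).
func (r *registry) register(k pluginKey, allocID string) {
	r.deregister(k, allocID)
	r.plugins[k] = append(r.plugins[k], allocID)
}

// deregister removes allocID; if it was the active instance, the next most
// recent allocation (if any) becomes active.
func (r *registry) deregister(k pluginKey, allocID string) {
	allocs := r.plugins[k]
	for i, id := range allocs {
		if id == allocID {
			// allocs[:i:i] forces append to copy rather than alias.
			r.plugins[k] = append(allocs[:i:i], allocs[i+1:]...)
			break
		}
	}
	if len(r.plugins[k]) == 0 {
		delete(r.plugins, k)
	}
}

// active returns the allocation currently acting as the plugin singleton.
func (r *registry) active(k pluginKey) (string, bool) {
	allocs := r.plugins[k]
	if len(allocs) == 0 {
		return "", false
	}
	return allocs[len(allocs)-1], true
}

// socketPath gives each plugin task its own control socket so concurrent
// allocations for the same plugin don't interfere (hypothetical layout).
func socketPath(mountRoot, allocID string) string {
	return filepath.Join(mountRoot, allocID, "csi.sock")
}
```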
-
Tim Gross authored
Detection of the full set of plugin capabilities was added in Nomad 1.1 for the volume creation workflow, but these were not added to the API response for plugins.
-
Tim Gross authored
-
Charlie Voiselle authored
-
Tim Gross authored
* rename method checking that free write claims are available
* use package-level variables for claim errors
* semgrep fix for testify
-
Tim Gross authored
The volumewatcher test incorrectly represents the change in attachment and access modes introduced in Nomad 1.1.0 to support volume creation. As a result, the test passes, but only by accident. Update the test to correctly represent the volume modes set by the existing claims on the test volumes.
-
- 22 Feb, 2022 4 commits
Mike Nomitch authored
Adding link to interview form
-
Tim Gross authored
In PR #11892 we updated the `csi_hook` to unmount the volume locally via the CSI node RPCs before releasing the claim from the server. The timer for this hook was initialized with the retry time, forcing us to wait 1s before making the first unmount RPC calls. Use the new helper for timers to ensure we clean up the timer nicely.
-
Luiz Aoqui authored
-
Michael Schurter authored
core: remove all traces of unused protocol version
-
- 19 Feb, 2022 2 commits
Michael Schurter authored
-
Michael Schurter authored
Nomad inherited protocol version numbering configuration from Consul and Serf, but unlike those projects, Nomad has never used it. Nomad's `protocol_version` has always been `1`. While the code is effectively unused and therefore poses no runtime risk if left in place, I felt like removing it was best because:
1. Nomad's RPC subsystem has been able to evolve extensively without needing to increment the version number.
2. Nomad's HTTP API has evolved extensively without incrementing `API{Major,Minor}Version`. If we want to version the HTTP API in the future, I doubt this is the mechanism we would choose.
3. The presence of the `server.protocol_version` configuration parameter is confusing since `server.raft_protocol` *is* an important parameter for operators to consider. Even more confusing is that there is a distinct Serf protocol version which is included in `nomad server members` output under the heading `Protocol`. `raft_protocol` is the...
-
- 18 Feb, 2022 7 commits
Adrián López authored
-
James Rasell authored
-
Michael Schurter authored
connect: write envoy bootstrap debugging info
-
Seth Hoenig authored
connect: bootstrap envoy using -proxy-id
-
Seth Hoenig authored
This PR modifies the Consul CLI arguments used to bootstrap envoy for Connect sidecars to make use of '-proxy-id' instead of '-sidecar-for'. Nomad registers the sidecar service, so we know what ID it has. The '-sidecar-for' flag was intended for use when you only know the name of the service for which the sidecar is being created. The improvement here is that using '-proxy-id' does not require an underlying request for listing Consul services. This will make the interaction between Nomad and Consul more efficient. Closes #10452
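The change can be pictured as the argument list passed to `consul connect envoy`. This is a minimal Go sketch, not Nomad's actual code: `envoyBootstrapArgs` is hypothetical, it uses only the `-proxy-id`, `-admin-bind`, `-grpc-addr`, and `-bootstrap` flags of the Consul CLI, and the values in the test are placeholders.

```go
package main

// envoyBootstrapArgs builds a hypothetical argument list for
// `consul connect envoy`. Passing -proxy-id names the sidecar by the service
// ID Nomad registered, so Consul does not need to list services to resolve
// it, as it would for -sidecar-for <service-name>.
func envoyBootstrapArgs(proxyID, adminBind, grpcAddr string) []string {
	return []string{
		"connect", "envoy",
		"-proxy-id", proxyID,
		"-admin-bind", adminBind,
		"-grpc-addr", grpcAddr,
		"-bootstrap", // print the bootstrap config instead of running envoy
	}
}
```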
-
Michael Schurter authored
When Consul Connect just works, it's wonderful. When it doesn't work, it can be exceedingly difficult to debug: operators have to check task events, Nomad logs, Consul logs, and Consul APIs, and even then critical information is missing. Using Consul to generate a bootstrap config for Envoy is notoriously difficult. Nomad doesn't even log stderr, so operators are left trying to piece together what went wrong.
This patch attempts to provide *maximal* context, which unfortunately includes secrets. **Secrets are always restricted to the secrets/ directory.** This makes debugging a little harder, but allows operators to know exactly what operation Nomad was trying to perform.
What's added:
- stderr is sent to alloc/logs/envoy_bootstrap.stderr.0
- the CLI is written to secrets/.envoy_bootstrap.cmd
- the environment is written to secrets/.envoy_bootstrap.env as JSON
Accessing this information is unfortunately awkward:
```
nomad alloc exec -task connect-proxy-count-countdash b36a cat secrets/.envoy_bootstrap.env
nomad alloc exec -task connect-proxy-count-countdash b36a cat secrets/.envoy_bootstrap.cmd
nomad alloc fs b36a alloc/logs/envoy_bootstrap.stderr.0
```
The above assumes an alloc ID that starts with `b36a` and a Connect sidecar proxy for a service named `count-countdash`. If the alloc is unable to start successfully, the debugging files are only accessible from the host filesystem.
-
Seth Hoenig authored
deps: upgrade hashicorp/raft to v1.3.5
-
- 17 Feb, 2022 1 commit
Seth Hoenig authored
-