- 10 Nov, 2022 1 commit
-
-
Seth Hoenig authored
This PR protects access to `templateHook.templateManager` with its lock. So far we have not been able to reproduce the panic, but it seems that either Poststart is running without a Prestart having run first (which should be impossible), or the Update hook is running concurrently with Poststart, nil-ing out the templateManager in a race with Poststart. Fixes #15189
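A minimal sketch of the fix, assuming a simplified `templateHook` shape (the real hook in `client/allocrunner/taskrunner` carries more state): every reader and writer of `templateManager` takes the hook's mutex first, so Update can no longer nil the manager out from under Poststart.

```go
package taskrunner

import "sync"

// templateManager is a stand-in for the real template manager type.
type templateManager struct{}

func (tm *templateManager) Stop() {}

// templateHook is a simplified sketch of the real hook.
type templateHook struct {
	mu              sync.Mutex
	templateManager *templateManager
}

// Poststart holds the lock while reading templateManager so a
// concurrent Update cannot nil it out mid-read.
func (h *templateHook) Poststart() {
	h.mu.Lock()
	defer h.mu.Unlock()
	if h.templateManager == nil {
		return // avoid the nil-pointer panic seen in #15189
	}
	// ... use h.templateManager ...
}

// Update replaces the manager under the same lock.
func (h *templateHook) Update(newTM *templateManager) {
	h.mu.Lock()
	defer h.mu.Unlock()
	if h.templateManager != nil {
		h.templateManager.Stop()
	}
	h.templateManager = newTM
}
```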
-
- 08 Nov, 2022 2 commits
-
-
Seth Hoenig authored
* make: add target `cl` for creating changelog entries This PR adds `tools/cl-entry` and the `make cl` Makefile target for conveniently creating correctly formatted changelog entries. * Update tools/cl-entry/main.go Co-authored-by:
Luiz Aoqui <luiz@hashicorp.com> * Update tools/cl-entry/main.go Co-authored-by:
Luiz Aoqui <luiz@hashicorp.com> Co-authored-by:
Luiz Aoqui <luiz@hashicorp.com>
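A rough sketch of what such a helper might look like. The prompts, the `.changelog/<PR>.txt` layout, and the `release-note:` fence format are assumptions based on common HashiCorp changelog tooling, not the actual `tools/cl-entry` implementation.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

func main() {
	in := bufio.NewReader(os.Stdin)

	fmt.Print("PR number: ")
	pr, _ := in.ReadString('\n')
	pr = strings.TrimSpace(pr)

	fmt.Print("Type (bug|improvement|feature): ")
	kind, _ := in.ReadString('\n')
	kind = strings.TrimSpace(kind)

	fmt.Print("Note: ")
	note, _ := in.ReadString('\n')
	note = strings.TrimSpace(note)

	// Assumed entry format: a fenced release-note block in .changelog/<PR>.txt.
	fence := strings.Repeat("`", 3)
	body := fmt.Sprintf("%srelease-note:%s\n%s\n%s\n", fence, kind, note, fence)

	path := filepath.Join(".changelog", pr+".txt")
	if err := os.WriteFile(path, []byte(body), 0o644); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("wrote", path)
}
```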
-
Derek Strickland authored
This PR solves a defect in the deserialization of api.Port structs when returning structs from the EventStream. Previously, the api.Port struct's fields were decorated with both mapstructure and hcl tags to support the network.port stanza's use of the keyword static when posting a static port value. This works fine when posting a job and when retrieving any struct that has an embedded api.Port instance, as long as the value is deserialized using JSON decoding. The EventStream, however, uses mapstructure to decode event payloads in the api package. mapstructure expects an underlying field named static, which does not exist. The result was that the Port.Value field would always be set to 0. Upon further inspection, a few things became apparent: the struct already has hcl tags that support the indirection during job submission; serialization/deserialization with both the json and hcl packages produces the desired result; and the mapstructure tags provided no value, as the Port struct contains only fields with primitive types. This PR removes the mapstructure tags from the api.Port structs and updates the job parsing logic to use hcl instead of mapstructure when decoding Port instances. Closes #11044 Co-authored-by:
DerekStrickland <dstrickland@hashicorp.com> Co-authored-by:
Piotr Kazmierczak <470696+pkazmierczak@users.noreply.github.com>
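The crux of the fix, sketched with an abbreviated version of the struct (the exact field set and tag spellings here are an approximation of the real `api.Port`): only `hcl` tags remain, and since every field is a primitive, both JSON and HCL decoding round-trip the `static` indirection correctly without mapstructure.

```go
package api

// Port is an abbreviated sketch of the api.Port struct. The hcl tag on
// Value supports the `static = <port>` keyword in the network.port
// stanza. The mapstructure tags were removed because mapstructure
// looked for a field literally named "static" and, not finding one,
// left Value at its zero value (0) when decoding EventStream payloads.
type Port struct {
	Label       string `hcl:",label"`
	Value       int    `hcl:"static,optional"`
	To          int    `hcl:"to,optional"`
	HostNetwork string `hcl:"host_network,optional"`
}
```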
-
- 07 Nov, 2022 8 commits
-
-
twunderlich-grapl authored
* Fix s3 URLs so that they work Unfortunately, S3 URLs prefixed with https:// do NOT work with the underlying go-getter library. As such, this fixes the examples so that they are working examples that won't cause problems for people reading the docs. See discussion in https://github.com/hashicorp/nomad/issues/1113 circa 2016. * Use s3:// protocol schema for artifact examples Per the discussion in https://github.com/hashicorp/nomad/pull/15123, we're going to use the explicit s3 protocol in the examples since it is the likeliest to work in all scenarios.
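For context, a minimal sketch of why the scheme matters, calling go-getter directly (the bucket and object names are placeholders): the `s3://` scheme routes the URL to go-getter's dedicated S3 getter, whereas a plain `https://` S3 URL falls through to the generic HTTP getter and fails on non-public buckets.

```go
package main

import (
	"log"

	getter "github.com/hashicorp/go-getter"
)

func main() {
	// Placeholder bucket/object. The s3:// scheme selects go-getter's
	// S3 getter, which knows how to sign requests; a https:// URL to
	// the same object would be fetched by the generic HTTP getter.
	src := "s3://my-bucket.s3-us-west-2.amazonaws.com/my_app.tar.gz"
	if err := getter.Get("./local", src); err != nil {
		log.Fatal(err)
	}
}
```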
-
Drew Gonzales authored
-
Phil Renaud authored
* Remove animation from task logs sidebar * changelog
-
Tim Gross authored
Add a new `Eval.Count` RPC and associated HTTP API endpoints. This API is designed to support interactive use in the `nomad eval delete` command to get a count of evals expected to be deleted before doing so. The state store operations to do this sort of thing are somewhat expensive, but it's cheaper than serializing a big list of evals to JSON. Note that although it seems like this could be done as an extra parameter and response field on `Eval.List`, having it as its own endpoint avoids having to change the response body shape and lets us avoid handling the legacy filter params supported by `Eval.List`.
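A sketch of how a client might call the new endpoint. The `/v1/evaluations/count` path and the response shape shown in the comment are assumptions inferred from the RPC name and the existing eval list API, not confirmed by this commit message.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	// Assumed endpoint path on a local agent; the filter query
	// parameter is URL-encoded `Status == "complete"`, mirroring the
	// filter support on the eval list API.
	url := "http://127.0.0.1:4646/v1/evaluations/count?filter=" +
		`Status%20%3D%3D%20%22complete%22`

	resp, err := http.Get(url)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body)) // e.g. a JSON object containing a Count field
}
```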
-
dependabot[bot] authored
Bumps [github.com/zclconf/go-cty-yaml](https://github.com/zclconf/go-cty-yaml) from 1.0.2 to 1.0.3. - [Release notes](https://github.com/zclconf/go-cty-yaml/releases) - [Changelog](https://github.com/zclconf/go-cty-yaml/blob/master/CHANGELOG.md) - [Commits](https://github.com/zclconf/go-cty-yaml/compare/v1.0.2...v1.0.3 ) --- updated-dependencies: - dependency-name: github.com/zclconf/go-cty-yaml dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by:
dependabot[bot] <support@github.com> Signed-off-by:
dependabot[bot] <support@github.com> Co-authored-by:
dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
-
dependabot[bot] authored
Bumps [github.com/hashicorp/go-plugin](https://github.com/hashicorp/go-plugin) from 1.4.3 to 1.4.5. - [Release notes](https://github.com/hashicorp/go-plugin/releases) - [Changelog](https://github.com/hashicorp/go-plugin/blob/master/CHANGELOG.md) - [Commits](https://github.com/hashicorp/go-plugin/compare/v1.4.3...v1.4.5 ) --- updated-dependencies: - dependency-name: github.com/hashicorp/go-plugin dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by:
dependabot[bot] <support@github.com> Signed-off-by:
dependabot[bot] <support@github.com> Co-authored-by:
dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
-
dependabot[bot] authored
Bumps [github.com/shirou/gopsutil/v3](https://github.com/shirou/gopsutil) from 3.22.9 to 3.22.10. - [Release notes](https://github.com/shirou/gopsutil/releases) - [Commits](https://github.com/shirou/gopsutil/compare/v3.22.9...v3.22.10 ) --- updated-dependencies: - dependency-name: github.com/shirou/gopsutil/v3 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by:
dependabot[bot] <support@github.com> Signed-off-by:
dependabot[bot] <support@github.com> Co-authored-by:
dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
-
dependabot[bot] authored
Bumps [github.com/zclconf/go-cty](https://github.com/zclconf/go-cty) from 1.11.0 to 1.12.0. - [Release notes](https://github.com/zclconf/go-cty/releases) - [Changelog](https://github.com/zclconf/go-cty/blob/main/CHANGELOG.md) - [Commits](https://github.com/zclconf/go-cty/compare/v1.11.0...v1.12.0 ) --- updated-dependencies: - dependency-name: github.com/zclconf/go-cty dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by:
dependabot[bot] <support@github.com> Signed-off-by:
dependabot[bot] <support@github.com> Co-authored-by:
dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
-
- 06 Nov, 2022 1 commit
-
-
dependabot[bot] authored
* build(deps): bump github.com/shoenig/test from 0.4.3 to 0.4.4 in /api Bumps [github.com/shoenig/test](https://github.com/shoenig/test) from 0.4.3 to 0.4.4. - [Release notes](https://github.com/shoenig/test/releases) - [Commits](https://github.com/shoenig/test/compare/v0.4.3...v0.4.4 ) --- updated-dependencies: - dependency-name: github.com/shoenig/test dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by:
dependabot[bot] <support@github.com> * deps: also update root go mod Signed-off-by:
dependabot[bot] <support@github.com> Co-authored-by:
dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by:
Seth Hoenig <shoenig@duck.com>
-
- 04 Nov, 2022 4 commits
-
-
Luiz Aoqui authored
* scheduler: allow updates after alloc reconnects

When an allocation reconnects to a cluster the scheduler needs to run special logic to handle the reconnection, check if a replacement was created, and stop one of them. If the allocation kept running while the node was disconnected, it will be reconnected with `ClientStatus: running` and the node will have `Status: ready`. This combination is the same as the normal steady state of an allocation, where everything is running as expected.

In order to differentiate between the two states (an allocation that is reconnecting and one that is just running) the scheduler needs an extra piece of state. The current implementation uses the presence of a `TaskClientReconnected` task event to detect when the allocation has reconnected and thus must go through the reconnection process. But this event remains even after the allocation is reconnected, causing all future evals to consider the allocation as still reconnecting.

This commit changes the reconnect logic to use an `AllocState` to register when the allocation was reconnected. This provides the following benefits:
- Only a limited number of task states are kept, and they are used for many other events. It's possible that, upon reconnecting, several actions are triggered that could cause the `TaskClientReconnected` event to be dropped.
- Task events are set by clients, so their timestamps are subject to time skew from servers. This prevents using time to determine if an allocation reconnected after a disconnect event.
- Disconnect events are already stored as `AllocState`, so storing reconnects there as well makes it the only source of information required.

With the new logic, the reconnection flow is only triggered if the last `AllocState` is a disconnect event, meaning that the allocation has not been reconnected yet. After the reconnection is handled, the new `ClientStatus` is stored in `AllocState`, allowing future evals to skip the reconnection logic.

* scheduler: prevent spurious placement on reconnect

When a client reconnects it makes two independent RPC calls:
- `Node.UpdateStatus` to heartbeat and set its status as `ready`.
- `Node.UpdateAlloc` to update the status of its allocations.

These two calls can happen in any order, and in case the allocations are updated before a heartbeat the state looks the same as a node being disconnected: the node status will still be `disconnected` while the allocation `ClientStatus` is set to `running`. The current implementation did not handle this order of events properly, and the scheduler would create an unnecessary placement since it considered the allocation as being disconnected. This extra allocation would then be quickly stopped by the heartbeat eval.

This commit adds a new code path to handle this order of events. If the node is `disconnected` and the allocation `ClientStatus` is `running`, the scheduler will check if the allocation is actually reconnecting using its `AllocState` events.

* rpc: only allow alloc updates from `ready` nodes

Clients interact with servers using three main RPC methods:
- `Node.GetAllocs` reads allocation data from the server and writes it to the client.
- `Node.UpdateAlloc` reads allocations from the client and writes them to the server.
- `Node.UpdateStatus` writes the client status to the server and is used as the heartbeat mechanism.

These three methods are called periodically by the clients and are done so independently from each other, meaning that there can't be any assumptions about their ordering.

This can generate scenarios that are hard to reason about and to code for. For example, when a client misses too many heartbeats it will be considered `down` or `disconnected` and the allocations it was running are set to `lost` or `unknown`. When connectivity to the rest of the cluster is restored, the natural mental model is to think that the client will heartbeat first and then update its allocation status on the servers. But since there's no inherent order in these calls the reverse is just as possible: the client updates the alloc status and then heartbeats. This results in a state where allocs are, for example, `running` while the client is still `disconnected`.

This commit adds a new verification to the `Node.UpdateAlloc` method to reject updates from nodes that are not `ready`, forcing clients to heartbeat first. Since this check is done server-side there is no need to coordinate operations client-side: clients can continue sending these requests independently, and alloc updates will succeed after the heartbeat is done.

* changelog: add entry for #15068

* code review

* client: skip terminal allocations on reconnect

When the client reconnects with the server it synchronizes the state of its allocations by sending data using the `Node.UpdateAlloc` RPC and fetching data using the `Node.GetClientAllocs` RPC. If the data fetch happens before the data write, `unknown` allocations will still be in this state and would trigger the `allocRunner.Reconnect` flow. But when the server's `DesiredStatus` for the allocation is `stop`, the client should not reconnect the allocation.

* apply more code review changes

* scheduler: persist changes to reconnected allocs

Reconnected allocs have a new AllocState entry that must be persisted by the plan applier.

* rpc: read node ID from allocs in UpdateAlloc

The AllocUpdateRequest struct is used in three disjoint use cases:
1. Stripped allocs from clients' Node.UpdateAlloc RPC, using the Allocs and WriteRequest fields
2. Raft log messages, using the Allocs, Evals, and WriteRequest fields
3. Plan updates, using the AllocsStopped, AllocsUpdated, and Job fields

Adding a new field that would only be used in one of these cases (1) made things more confusing and error prone. While in theory an AllocUpdateRequest could send allocations from different nodes, in practice this never actually happens, since only clients call this method with their own allocations.

* scheduler: remove logic to handle exceptional case

This condition could only be hit if, somehow, the allocation status was set to "running" while the client was "unknown". This was addressed by enforcing an order in "Node.UpdateStatus" and "Node.UpdateAlloc" RPC calls, so this scenario is not expected to happen. Adding unnecessary code to the scheduler makes it harder to read and reason about.

* more code review

* remove another unused test
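A condensed sketch of the new decision logic, with types simplified from the real `nomad/structs` package: the scheduler only treats an allocation as reconnecting when its most recent `AllocState` entry is still the disconnect event.

```go
package scheduler

import "time"

// AllocState is a simplified stand-in for structs.AllocState.
type AllocState struct {
	ClientStatus string // e.g. "unknown", "running"
	Time         time.Time
}

// needsReconnect sketches the new check: an alloc goes through the
// reconnect flow only if the last recorded state transition was the
// disconnect (ClientStatus "unknown"). Once the reconnect is handled,
// a new "running" entry is appended, so later evals skip this path.
func needsReconnect(states []AllocState) bool {
	if len(states) == 0 {
		return false
	}
	last := states[len(states)-1]
	return last.ClientStatus == "unknown"
}
```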
-
Luiz Aoqui authored
When a Nomad service starts it tries to establish a connection with servers, but it also runs alloc runners to manage whatever allocations it needs to run. The alloc runner will invoke several hooks to perform actions, with some of them requiring access to the Nomad servers, such as Native Service Discovery registration. If the alloc runner starts before a connection is established, the alloc runner will fail, causing the allocation to be shut down. This is particularly problematic for disconnected allocations that are reconnecting, as they may fail as soon as the client reconnects. This commit changes the RPC request logic to retry, using the existing retry mechanism, if there are no servers available.
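The shape of the change, sketched with placeholder names (the real client reuses its existing retry machinery rather than the plain ticker shown here): when no servers are known yet, back off and retry instead of failing the hook and killing the allocation.

```go
package client

import (
	"context"
	"errors"
	"time"
)

var errNoServers = errors.New("no servers available")

// rpcOnce is a placeholder for a single RPC attempt.
type rpcOnce func() error

// rpcWithRetry sketches the retry behaviour: if there are no servers
// to talk to yet (e.g. a client still reconnecting), wait and try
// again until the context is cancelled, rather than surfacing the
// error to the alloc runner hook.
func rpcWithRetry(ctx context.Context, call rpcOnce) error {
	for {
		err := call()
		if err == nil || !errors.Is(err, errNoServers) {
			return err
		}
		select {
		case <-time.After(time.Second): // stand-in for the retry mechanism
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}
```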
-
Charlie Voiselle authored
* Support error_on_missing_value for templates * Update docs for template stanza
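On the API side the new knob surfaces roughly like this. The Go field name is an assumption based on the consul-template option the stanza key maps to (`ErrMissingKey`); only the new field is shown.

```go
package api

// Template is abbreviated to the new field. In job specs it is set as
// `error_on_missing_value = true` inside the template stanza; it maps
// through to consul-template's ErrMissingKey option, which makes
// rendering fail when a lookup returns no value instead of silently
// writing an empty string.
type Template struct {
	// ... existing fields elided ...
	ErrMissingKey *bool `hcl:"error_on_missing_value,optional"`
}
```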
-
Seth Hoenig authored
Also add some debug log lines for this test, because it doesn't make sense for the allocation to be complete yet a task in the allocation to be not started yet, which is what the test failures are implying.
-
- 03 Nov, 2022 5 commits
-
-
Charlie Voiselle authored
-
Ethan authored
* fix: update device in batch first footprint * cl: add cl note Co-authored-by:
Seth Hoenig <shoenig@duck.com>
-
Phil Renaud authored
-
Tim Gross authored
Allocations created before 1.4.0 will not have a workload identity token. When the client running these allocs is upgraded to 1.4.x, the identity hook will run and replace the node secret ID token used previously with an empty string. This causes service discovery queries to fail. Fall back to the node's secret ID when the allocation doesn't have a signed identity. Note that pre-1.4.0 allocations won't have templates that read Variables, so there's no threat that the node secret ID will be able to read data that the allocation shouldn't have access to.
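A boiled-down sketch of the fallback, with names simplified from the real identity hook and structs:

```go
package client

// Alloc and Node are simplified stand-ins for the real structs.
type Alloc struct{ SignedIdentity string }
type Node struct{ SecretID string }

// lookupToken sketches the fallback: allocations created before 1.4.0
// carry no signed workload identity, so service discovery requests use
// the node's secret ID instead of sending an empty token.
func lookupToken(alloc *Alloc, node *Node) string {
	if alloc.SignedIdentity != "" {
		return alloc.SignedIdentity
	}
	return node.SecretID
}
```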
-
Phil Renaud authored
-
- 02 Nov, 2022 2 commits
-
-
Phil Renaud authored
* Adds meta to job list stub and displays a pack logo on the jobs index * Changelog * Modifying struct for optional meta param * Explicitly ask for meta anytime I look up a job from index or job page * Test case for the endpoint * adding meta field to API struct and omitting it from the response if empty * passthrough method added to api/jobs.list * Meta param listed in docs for jobs list * Update api/jobs.go Co-authored-by:
Tim Gross <tgross@hashicorp.com> Co-authored-by:
Tim Gross <tgross@hashicorp.com>
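The API-side shape of the change, abbreviated to the new field (the exact stub layout is simplified here): `omitempty` keeps the map out of list responses unless the caller opted in to meta.

```go
package api

// JobListStub is abbreviated to the new field. Meta is only populated
// when the caller asks for it (the UI requests it to detect Nomad Pack
// jobs and show the pack logo), and omitempty drops it from the JSON
// response when empty.
type JobListStub struct {
	// ... existing stub fields elided ...
	Meta map[string]string `json:",omitempty"`
}
```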
-
Phil Renaud authored
* Job spec upload by click or drag * pseudo-restrict formats * Changelog * Tweak to job spec upload to be above editor layer * Within the job-editor again tho * Beginning testcase cleanup * Test progression * refact: update codemirror fillin logic Co-authored-by:
Jai Bhagat <jaybhagat841@gmail.com>
-
- 01 Nov, 2022 4 commits
-
-
Seth Hoenig authored
-
Tim Gross authored
If a GC claim is written and then the volume is deleted before the `volumewatcher` enters its run loop, we panic on the nil-pointer access. Simply doing a nil-check at the top of the loop reveals a race condition around shutting down the loop just as a new update is coming in. Have the parent `volumewatcher` send an initial update on the channel before returning, so that we're still holding the lock. Update the watcher's `Stop` method to set the running state, which lets us avoid having a second context and makes stopping synchronous. This reduces the cases we have to handle in the run loop. Updated the tests now that we'll safely return from the goroutine and stop the runner in a larger set of cases. Ran the tests with the `-race` detection flag and fixed up any problems found here as well.
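The gist of the run-loop hardening, sketched with heavily simplified types: a nil check at the top of the loop, and a `Stop` that flips the running flag under the lock so shutdown is synchronous.

```go
package volumewatcher

import "sync"

// Volume is a stand-in for the real CSI volume struct.
type Volume struct{}

// volumeWatcher is a simplified sketch of the per-volume watcher.
type volumeWatcher struct {
	mu      sync.Mutex
	running bool
	updates chan *Volume
}

// Stop sets the running state under the lock, making shutdown
// synchronous and removing the need for a second context.
func (w *volumeWatcher) Stop() {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.running = false
}

func (w *volumeWatcher) isRunning() bool {
	w.mu.Lock()
	defer w.mu.Unlock()
	return w.running
}

// run drains updates; a nil volume (e.g. one deleted before the loop
// started) exits cleanly instead of panicking on a nil pointer.
func (w *volumeWatcher) run() {
	for w.isRunning() {
		vol, ok := <-w.updates
		if !ok || vol == nil {
			return
		}
		// ... process GC claims for vol ...
		_ = vol
	}
}
```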
-
Tim Gross authored
In order to limit how much the rekey job can monopolize a scheduler worker, we limit how long it can run to 1 minute before stopping work and emitting a new eval. But this exactly matches the default nack timeout, so it'll fail the eval rather than getting a chance to emit a new one. Set the timeout for the rekey eval to half the configured nack timeout.
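The fix in miniature (function and parameter names assumed): derive the rekey deadline from the configured nack timeout rather than a fixed minute, so the job always stops with time to spare.

```go
package core

import (
	"context"
	"time"
)

// rekeyDeadline sketches the change: cap the rekey job's run time at
// half the configured nack timeout, so it can emit a follow-up eval
// before the current eval would be nacked instead of exactly racing it.
func rekeyDeadline(ctx context.Context, nackTimeout time.Duration) (context.Context, context.CancelFunc) {
	return context.WithTimeout(ctx, nackTimeout/2)
}
```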
-
Tim Gross authored
When replication of a single key fails, the replication loop breaks early, and therefore keys that fall later in the sorting order will never get replicated. This is particularly a problem for clusters impacted by the bug that caused #14981 and that were later upgraded; the keys that were never replicated can now never be replicated, so we need to handle them safely. Included in the replication fix: * Refactor the replication loop so that each key is replicated in a function call that returns an error, to make the workflow clearer and reduce nesting. Log the error and continue. * Improve stability of keyring replication tests. We no longer block leadership on initializing the keyring, so there's a race condition in the keyring tests where we can test for the existence of the root key before the keyring has been initialized. Change this to an "eventually" test. But these fixes aren't enough to fix #14981 because they'll end up seeing an error once a second complaining about the missing key, so we also need to fix keyring GC so the keys can be removed from the state store. Now we'll store the key ID used to sign a workload identity in the Allocation, and we'll index the Allocation table on that so we can track whether any live Allocation was signed with a particular key ID.
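The refactored loop in outline (helper names are placeholders for the per-key function the commit describes): each key is handled by one call that returns an error, and a failure logs and continues instead of breaking out and starving keys later in the sort order.

```go
package keyring

import "log"

// replicateOne is a placeholder for the per-key replication helper:
// it fetches one root key's material from a peer and installs it.
func replicateOne(keyID string) error {
	// ... fetch and install the key ...
	return nil
}

// replicateAll sketches the fixed loop: an error on one key no longer
// breaks the loop, so keys sorted after a bad one still get replicated.
func replicateAll(keyIDs []string) {
	for _, id := range keyIDs {
		if err := replicateOne(id); err != nil {
			log.Printf("failed to replicate key %q: %v", id, err)
			continue
		}
	}
}
```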
-
- 31 Oct, 2022 6 commits
-
-
dependabot[bot] authored
build(deps): bump github.com/docker/cli from 20.10.18+incompatible to 20.10.21+incompatible (#15078) * build(deps): bump github.com/docker/cli Bumps [github.com/docker/cli](https://github.com/docker/cli) from 20.10.18+incompatible to 20.10.21+incompatible. - [Release notes](https://github.com/docker/cli/releases) - [Commits](https://github.com/docker/cli/compare/v20.10.18...v20.10.21 ) --- updated-dependencies: - dependency-name: github.com/docker/cli dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by:
dependabot[bot] <support@github.com> * deps: updated github.com/docker/cli from 20.10.18+incompatible to 20.10.21+incompatible Signed-off-by:
dependabot[bot] <support@github.com> Co-authored-by:
dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by:
Seth Hoenig <shoenig@duck.com>
-
dependabot[bot] authored
Bumps [github.com/hashicorp/serf](https://github.com/hashicorp/serf) from 0.10.0 to 0.10.1. - [Release notes](https://github.com/hashicorp/serf/releases) - [Changelog](https://github.com/hashicorp/serf/blob/master/CHANGELOG.md) - [Commits](https://github.com/hashicorp/serf/compare/v0.10.0...v0.10.1 ) --- updated-dependencies: - dependency-name: github.com/hashicorp/serf dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by:
dependabot[bot] <support@github.com> Signed-off-by:
dependabot[bot] <support@github.com> Co-authored-by:
dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
-
dependabot[bot] authored
* build(deps): bump github.com/aws/aws-sdk-go from 1.44.84 to 1.44.126 Bumps [github.com/aws/aws-sdk-go](https://github.com/aws/aws-sdk-go) from 1.44.84 to 1.44.126. - [Release notes](https://github.com/aws/aws-sdk-go/releases) - [Commits](https://github.com/aws/aws-sdk-go/compare/v1.44.84...v1.44.126 ) --- updated-dependencies: - dependency-name: github.com/aws/aws-sdk-go dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by:
dependabot[bot] <support@github.com> * deps: update github.com/aws/aws-sdk-go from 1.44.84 to 1.44.126 Signed-off-by:
dependabot[bot] <support@github.com> Co-authored-by:
dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by:
Seth Hoenig <shoenig@duck.com>
-
dependabot[bot] authored
Bumps [github.com/stretchr/testify](https://github.com/stretchr/testify) from 1.8.0 to 1.8.1. - [Release notes](https://github.com/stretchr/testify/releases) - [Commits](https://github.com/stretchr/testify/compare/v1.8.0...v1.8.1 ) --- updated-dependencies: - dependency-name: github.com/stretchr/testify dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by:
dependabot[bot] <support@github.com> Signed-off-by:
dependabot[bot] <support@github.com> Co-authored-by:
dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
-
dependabot[bot] authored
Bumps [github.com/hashicorp/memberlist](https://github.com/hashicorp/memberlist) from 0.4.0 to 0.5.0. - [Release notes](https://github.com/hashicorp/memberlist/releases) - [Commits](https://github.com/hashicorp/memberlist/compare/v0.4.0...v0.5.0 ) --- updated-dependencies: - dependency-name: github.com/hashicorp/memberlist dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by:
dependabot[bot] <support@github.com> Signed-off-by:
dependabot[bot] <support@github.com> Co-authored-by:
dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
-
dependabot[bot] authored
Bumps [go.uber.org/goleak](https://github.com/uber-go/goleak) from 1.1.12 to 1.2.0. - [Release notes](https://github.com/uber-go/goleak/releases) - [Changelog](https://github.com/uber-go/goleak/blob/master/CHANGELOG.md) - [Commits](https://github.com/uber-go/goleak/compare/v1.1.12...v1.2.0 ) --- updated-dependencies: - dependency-name: go.uber.org/goleak dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by:
dependabot[bot] <support@github.com> Signed-off-by:
dependabot[bot] <support@github.com> Co-authored-by:
dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
-
- 28 Oct, 2022 1 commit
-
-
Tim Gross authored
The `Eval.Delete` endpoint has a helper that takes a list of jobs and allocs and determines whether the eval associated with those is safe to delete (based on their state). Filtering improvements to the `Eval.Delete` endpoint are going to need this check to run in the state store itself for consistency. Refactor to push this check down into the state store to keep the eventual diff for that work reasonable.
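A schematic of the check being pushed down, with types simplified to stand-ins (the real predicate lives in the state store and operates within a transaction; the exact status rules here are an approximation): an eval is safe to delete only when its job is gone or stopped and every associated alloc is terminal.

```go
package state

// Minimal stand-ins for the real structs.
type Job struct{ Stop bool }
type Alloc struct{ ClientStatus string }

// terminal approximates the terminal client statuses.
func terminal(status string) bool {
	switch status {
	case "complete", "failed", "lost":
		return true
	}
	return false
}

// evalIsSafeToDelete sketches the helper now living in the state
// store, so the RPC endpoint and future filtering improvements share
// one consistent, transaction-scoped implementation.
func evalIsSafeToDelete(job *Job, allocs []*Alloc) bool {
	if job != nil && !job.Stop {
		return false
	}
	for _, a := range allocs {
		if !terminal(a.ClientStatus) {
			return false
		}
	}
	return true
}
```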
-
- 27 Oct, 2022 6 commits
-
-
Seth Hoenig authored
Remove dead linters and add some interesting new ones.
-
Tim Gross authored
While working on filtering improvements to the `Eval.Delete` endpoint I noticed that this test was going to need to expand significantly and needed some refactoring to make that work nicely. In order to reduce the size of the eventual diff, I've pulled this refactoring out into its own changeset.
-
Charlie Voiselle authored
-
Tim Gross authored
Post 1.4.2 release
-
Tim Gross authored
Changelog updates for 1.4.2 and backports.
-
hc-github-team-nomad-core authored
-