This project is mirrored from https://gitee.com/mirrors/nomad.git.
Pull mirroring failed.
Repository mirroring has been paused due to too many failed attempts. It can be resumed by a project maintainer.
- 15 Feb, 2022 3 commits
  - Alex Holyoake authored
  - Seth Hoenig authored: scheduler: fix dropped test error
  - Lars Lehtonen authored
- 14 Feb, 2022 1 commit
  - James Rasell authored: client: track service deregister call so it's only called once.
- 11 Feb, 2022 5 commits
  - Tim Gross authored: The `volume detach`, `volume deregister`, and `volume status` commands accept a prefix argument for the volume ID. Update the behavior so that if more than one volume matches the prefix, we return an error only when none of the volume IDs is an exact match. Otherwise we would not be able to use these commands at all on those volumes. This also makes the behavior of these commands consistent with `job stop`.
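The matching rule above can be sketched in Go. This is a minimal illustration, not Nomad's implementation; the helper name and signature are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveVolumeID sketches the prefix-matching rule: an exact match always
// wins, and with multiple prefix matches we error only when none is exact.
func resolveVolumeID(prefix string, volumeIDs []string) (string, error) {
	var matches []string
	for _, id := range volumeIDs {
		if id == prefix {
			// Exact match wins even if other volume IDs share the prefix.
			return id, nil
		}
		if strings.HasPrefix(id, prefix) {
			matches = append(matches, id)
		}
	}
	switch len(matches) {
	case 0:
		return "", fmt.Errorf("no volumes with prefix %q", prefix)
	case 1:
		return matches[0], nil
	default:
		return "", fmt.Errorf("prefix %q matched %d volumes", prefix, len(matches))
	}
}

func main() {
	// "ebs" is itself a volume ID, so it resolves even though "ebs-backup"
	// also matches the prefix.
	id, err := resolveVolumeID("ebs", []string{"ebs", "ebs-backup"})
	fmt.Println(id, err)
}
```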
  - Tim Gross authored: The CSI specification says: "The CO SHALL provide the listen-address for the Plugin by way of the `CSI_ENDPOINT` environment variable." Note that plugins without filesystem isolation won't have the plugin dir bind-mounted to their alloc dir, but we can provide a path to the socket anyway. Refactor to use an opts struct for the plugin supervisor hook config. The parameter list for configuring the plugin supervisor hook has grown enough that it makes sense to use an options struct similar to many of the other task runner hooks (e.g. template).
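The options-struct refactor mentioned above follows a common Go pattern, sketched here with invented field names (not Nomad's actual hook config):

```go
package main

import "fmt"

// Once a hook's parameter list grows, bundling it into one config struct
// keeps call sites readable and lets new fields be added without touching
// every caller. All names here are illustrative.
type supervisorHookOptions struct {
	PluginName  string
	SocketPath  string // path the task will see via CSI_ENDPOINT
	IntervalSec int    // seconds between health probes
}

type supervisorHook struct {
	opts supervisorHookOptions
}

func newSupervisorHook(opts supervisorHookOptions) *supervisorHook {
	return &supervisorHook{opts: opts}
}

func main() {
	// Named fields document each argument at the call site.
	h := newSupervisorHook(supervisorHookOptions{
		PluginName:  "csi-example",
		SocketPath:  "/csi/csi.sock",
		IntervalSec: 30,
	})
	fmt.Println(h.opts.PluginName, h.opts.SocketPath)
}
```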
  - James Rasell authored: doc(typo): technical typo in advertised example
  - James Rasell authored: changelog: add entry for #12040
  - James Rasell authored: In certain task lifecycles the taskrunner service deregister call could be made three times for a task that is exiting. Whilst each hook caller of deregister has its own purpose, we should ensure it is only called once during the shutdown lifecycle of a task. This change therefore tracks when deregister has been called, so that subsequent calls are a no-op. In the event the task is restarting, the deregister value is reset to ensure proper operation.
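The tracking described above can be sketched as a guarded boolean: repeated deregister calls become no-ops within one shutdown, and a restart re-arms it. The types here are hypothetical, not the real taskrunner.

```go
package main

import (
	"fmt"
	"sync"
)

type serviceHandle struct {
	mu           sync.Mutex
	deregistered bool
	calls        int // how many real deregistrations happened
}

func (h *serviceHandle) deregister() {
	h.mu.Lock()
	defer h.mu.Unlock()
	if h.deregistered {
		return // already deregistered during this shutdown
	}
	h.deregistered = true
	h.calls++
}

// restart re-arms the guard so the next shutdown deregisters again.
func (h *serviceHandle) restart() {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.deregistered = false
}

func main() {
	h := &serviceHandle{}
	h.deregister()
	h.deregister() // no-op: several hook callers may all reach here
	h.deregister() // no-op
	h.restart()
	h.deregister() // real again after restart
	fmt.Println(h.calls)
}
```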
- 10 Feb, 2022 15 commits
  - Derek Strickland authored: The allocReconciler's computeGroup function contained a significant amount of inline logic whose intent was difficult to understand. This commit extracts that inline logic into intention-revealing subroutines, updates the function internals to improve maintainability, and renames some existing functions for the same purpose.
    Renamed functions:
    - handleGroupCanaries -> cancelUnneededCanaries
    - handleDelayedLost -> createLostLaterEvals
    - handleDelayedReschedules -> createRescheduleLaterEvals
    New functions:
    - filterAndStopAll
    - initializeDeploymentState
    - requiresCanaries
    - computeCanaries
    - computeUnderProvisionedBy
    - computeReplacements
    - computeDestructiveUpdates
    - computeMigrations
    - createDeployment
    - isDeploymentComplete
  - Luiz Aoqui authored
  - Luiz Aoqui authored
  - Luiz Aoqui authored: Merge release 1.2.6 branch
  - Luiz Aoqui authored
  - Luiz Aoqui authored: Version 1.2.6
  - Marc-Aurèle Brothier authored
  - James Rasell authored
  - Nomad Release Bot authored
  - Nomad Release bot authored
  - Luiz Aoqui authored
  - Tim Gross authored: The spread iterator can panic when processing an evaluation, resulting in an unrecoverable state in the cluster. Whenever a panicked server restarts and quorum is restored, the next server to dequeue the evaluation will panic. To trigger this state:
    - The job must have `max_parallel = 0` and a `canary >= 1`.
    - The job must not have a `spread` block.
    - The job must have a previous version.
    - The previous version must have a `spread` block and at least one failed allocation.
    In this scenario, the desired changes include `(place 1+) (stop 1+), (ignore n) (canary 1)`. Before the scheduler can place the canary allocation, it tries to find out which allocations can be stopped. This passes back through the stack so that we can determine previous-node penalties, etc. We call `SetJob` on the stack with the previous version of the job, which will include assessing the `spread` block (even though the results are unused). The task group spread info state from that pass through the spread iterator is not reset when we call `SetJob` again. When the new job version iterates over the `groupPropertySets`, it will get an empty `spreadAttributeMap`, resulting in an unexpected nil pointer dereference. This changeset resets the spread iterator's internal state when setting the job, adds logging with a bypass around the bug in case we hit similar cases, and adds a test that panics the scheduler without the patch.
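The failure mode and fix above reduce to a general rule: an iterator that caches per-job state must rebuild that cache on every `SetJob` call, or a later job version reads stale results from the previous pass. A minimal sketch with illustrative types (not the scheduler's real spread iterator):

```go
package main

import "fmt"

type cachingIterator struct {
	jobID        string
	propertySets map[string][]string // per task-group cached info
}

func (it *cachingIterator) SetJob(jobID string) {
	it.jobID = jobID
	// The fix: reset cached state here instead of letting the previous
	// job version's pass leak into the next one.
	it.propertySets = make(map[string][]string)
}

func main() {
	it := &cachingIterator{}
	it.SetJob("job-v1")
	it.propertySets["web"] = []string{"dc1", "dc2"} // state from the old version
	it.SetJob("job-v2")
	fmt.Println(len(it.propertySets)) // stale state is gone
}
```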
  - Luiz Aoqui authored: Add a new namespace ACL requirement for the /v1/jobs/parse endpoint and return early if HCLv2 parsing fails. The endpoint now requires the new `parse-job` ACL capability or `submit-job`.
  - Seth Hoenig authored: This PR adds symlink resolution when validating paths to ensure they do not escape client allocation directories.
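The containment test behind that validation can be sketched as follows. In practice the candidate path would first pass through `filepath.EvalSymlinks` so a symlink cannot smuggle the resolved target outside the allocation directory; this hypothetical helper shows only the final "is it still inside?" check.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// escapesDir reports whether resolvedPath lies outside allocDir.
func escapesDir(allocDir, resolvedPath string) bool {
	rel, err := filepath.Rel(allocDir, resolvedPath)
	if err != nil {
		return true
	}
	// Any relative path that starts by climbing out of allocDir escapes it.
	return rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator))
}

func main() {
	fmt.Println(escapesDir("/alloc/abc123", "/alloc/abc123/local/data")) // false
	fmt.Println(escapesDir("/alloc/abc123", "/etc/passwd"))              // true
}
```

Note that a plain string-prefix check would wrongly accept `/alloc/abc123x`; the `filepath.Rel` form catches that sibling-directory case.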
  - Seth Hoenig authored: go-getter creates a circular dependency between a Client and Getter, which means each is inherently thread-unsafe if you try to re-use one or the other. This PR fixes Nomad to no longer make use of the default Getter objects provided by the go-getter package. Nomad must create a new Client object on every artifact download, as the Client object controls the Src and Dst among other things. When calling Client.Get, the Getter modifies its own Client reference, creating the circular reference and race condition. We can still achieve most of the desired connection caching behavior by re-using a shared HTTP client with transport pooling enabled.
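The shape of that fix can be sketched without the go-getter API itself: mutable per-download state lives in a fresh struct per artifact, while connection caching comes from one shared `http.Client` whose pooled `Transport` is safe for concurrent use. Names are illustrative.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// One shared client: http.Transport pools idle connections and is safe
// for concurrent use, giving the connection re-use we want.
var sharedHTTP = &http.Client{
	Timeout: 30 * time.Second,
	Transport: &http.Transport{
		MaxIdleConns:        100,
		MaxIdleConnsPerHost: 10,
		IdleConnTimeout:     90 * time.Second,
	},
}

type download struct {
	Src, Dst string       // mutable, so never shared between goroutines
	HTTP     *http.Client // shared; provides the connection caching
}

// newDownload builds a fresh state object per artifact fetch, avoiding
// re-use of mutable state across concurrent downloads.
func newDownload(src, dst string) *download {
	return &download{Src: src, Dst: dst, HTTP: sharedHTTP}
}

func main() {
	a := newDownload("https://example.com/a.tgz", "/tmp/a")
	b := newDownload("https://example.com/b.tgz", "/tmp/b")
	fmt.Println(a != b, a.HTTP == b.HTTP) // distinct state, shared transport
}
```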
- 09 Feb, 2022 3 commits
  - Tim Gross authored: When an allocation is updated, the job summary for the associated job is also updated. CSI uses the job summary to set the expected count for controller and node plugins. We incorrectly used the allocation's server status instead of the job status when deciding whether to update or remove the job from the plugins. This caused a node drain or other terminal state for an allocation to clear the expected count for the entire plugin. Use the job status to guide whether to update or remove the expected count. The existing CSI tests for the state store incorrectly modeled the updates we received from servers vs those we received from clients, leading to test assertions that passed when they should not. Rework the tests to clarify each step in the lifecycle and rename CSI state store functions for clarity.
  - Tim Gross authored
  - Kevin Schoonover authored
- 08 Feb, 2022 9 commits
  - Thomas Lefebvre authored
  - Tim Gross authored
  - Seth Hoenig authored: env: update aws cpu configs
  - Seth Hoenig authored: By running the tools/ec2info tool
  - Tim Gross authored: Processing an evaluation is nearly a pure function over the state snapshot, but we randomly shuffle the nodes. This means that developers can't take a given state snapshot, pass an evaluation through it, and be guaranteed the same plan results. But the evaluation ID is already random, so if we use it as the seed for shuffling the nodes we can greatly reduce the sources of non-determinism. Unfortunately, golang map iteration uses a global source of randomness and not a goroutine-local one, but arguably if the scheduler behavior is impacted by this, that's a bug in the iteration.
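The determinism trick above can be sketched by hashing the (already random) evaluation ID into a seed, so shuffling the same node list for the same eval always yields the same order. The helper name is illustrative.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
)

// shuffleNodes shuffles nodes in place, seeded from the evaluation ID,
// making the shuffle reproducible per evaluation.
func shuffleNodes(evalID string, nodes []string) {
	h := fnv.New64a()
	h.Write([]byte(evalID))
	r := rand.New(rand.NewSource(int64(h.Sum64())))
	r.Shuffle(len(nodes), func(i, j int) {
		nodes[i], nodes[j] = nodes[j], nodes[i]
	})
}

func main() {
	a := []string{"node1", "node2", "node3", "node4"}
	b := []string{"node1", "node2", "node3", "node4"}
	shuffleNodes("eval-8af5", a)
	shuffleNodes("eval-8af5", b) // same eval ID, same order
	fmt.Println(a, b)
}
```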
  - Seth Hoenig authored: changelog: update changelog for DO
  - Seth Hoenig authored (Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>)
  - Seth Hoenig authored
  - Seth Hoenig authored: client/fingerprint: add digitalocean fingerprinter
- 07 Feb, 2022 3 commits
  - Dylan Staley authored: website: display warning in IE 11
  - Kevin Schoonover authored (Co-authored-by: Seth Hoenig <seth.a.hoenig@gmail.com>)
  - Tim Gross authored: If processing a specific evaluation causes the scheduler (and therefore the entire server) to panic, that evaluation will never get a chance to be nack'd and cleared from the state store. It will get dequeued by another scheduler, causing that server to panic, and so forth until all servers are in a panic loop. This prevents the operator from intervening to remove the evaluation or update the state. Recover the goroutine from the top-level `Process` methods for each scheduler so that this condition can be detected without panicking the server process. This will lead to a loop of recovering the scheduler goroutine until the eval can be removed or nack'd, but that's much better than taking downtime.
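The recovery pattern above is a deferred `recover` in the top-level processing function, turning a panic on one evaluation into an ordinary error (which can be nack'd) instead of crashing the server. The wrapper name here is illustrative.

```go
package main

import "fmt"

// processEval runs the scheduler work and converts any panic into an error.
// The named return value lets the deferred closure set err after a panic.
func processEval(process func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("scheduler panicked processing eval: %v", r)
		}
	}()
	return process()
}

func main() {
	err := processEval(func() error {
		panic("nil pointer dereference in an iterator")
	})
	fmt.Println(err) // the server survives and can nack the eval
}
```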
- 06 Feb, 2022 1 commit
  - Kevin Schoonover authored