This project is mirrored from https://gitee.com/mirrors/nomad.git.
Pull mirroring failed.
Repository mirroring has been paused due to too many failed attempts. It can be resumed by a project maintainer.
- 15 Feb, 2022 3 commits
  - Alex Holyoake authored
  - Seth Hoenig authored: scheduler: fix dropped test error
  - Lars Lehtonen authored
- 14 Feb, 2022 1 commit
  - James Rasell authored: client: track service deregister call so it's only called once.
- 11 Feb, 2022 5 commits
  - Tim Gross authored: The `volume detach`, `volume deregister`, and `volume status` commands accept a prefix argument for the volume ID. Update the behavior so that if more than one volume matches the prefix, we return an error only when none of the volume IDs is an exact match. Otherwise we would not be able to use these commands at all on those volumes. This also makes the behavior of these commands consistent with `job stop`.
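The matching rule above can be sketched in Go. This is a minimal illustration, not Nomad's implementation; the helper name and signature are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveVolumeID sketches the prefix-matching rule: an exact match always
// wins, and with multiple prefix matches we error only when none is exact.
func resolveVolumeID(prefix string, volumeIDs []string) (string, error) {
	var matches []string
	for _, id := range volumeIDs {
		if id == prefix {
			// Exact match wins even if other volume IDs share the prefix.
			return id, nil
		}
		if strings.HasPrefix(id, prefix) {
			matches = append(matches, id)
		}
	}
	switch len(matches) {
	case 0:
		return "", fmt.Errorf("no volumes with prefix %q", prefix)
	case 1:
		return matches[0], nil
	default:
		return "", fmt.Errorf("prefix %q matched %d volumes", prefix, len(matches))
	}
}

func main() {
	// "ebs" is itself a volume ID, so it resolves even though "ebs-backup"
	// also matches the prefix.
	id, err := resolveVolumeID("ebs", []string{"ebs", "ebs-backup"})
	fmt.Println(id, err)
}
```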
  - Tim Gross authored: The CSI specification says: "The CO SHALL provide the listen-address for the Plugin by way of the `CSI_ENDPOINT` environment variable." Note that plugins without filesystem isolation won't have the plugin dir bind-mounted to their alloc dir, but we can provide a path to the socket anyway. Refactor to use an opts struct for the plugin supervisor hook config. The parameter list for configuring the plugin supervisor hook has grown enough that it makes sense to use an options struct similar to many of the other task runner hooks (e.g. template).
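The options-struct refactor mentioned above follows a common Go pattern, sketched here with invented field names (not Nomad's actual hook config):

```go
package main

import "fmt"

// Once a hook's parameter list grows, bundling it into one config struct
// keeps call sites readable and lets new fields be added without touching
// every caller. All names here are illustrative.
type supervisorHookOptions struct {
	PluginName  string
	SocketPath  string // path the task will see via CSI_ENDPOINT
	IntervalSec int    // seconds between health probes
}

type supervisorHook struct {
	opts supervisorHookOptions
}

func newSupervisorHook(opts supervisorHookOptions) *supervisorHook {
	return &supervisorHook{opts: opts}
}

func main() {
	// Named fields document each argument at the call site.
	h := newSupervisorHook(supervisorHookOptions{
		PluginName:  "csi-example",
		SocketPath:  "/csi/csi.sock",
		IntervalSec: 30,
	})
	fmt.Println(h.opts.PluginName, h.opts.SocketPath)
}
```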
  - James Rasell authored: doc(typo): technical typo in advertised example
  - James Rasell authored: changelog: add entry for #12040
  - James Rasell authored: In certain task lifecycles the taskrunner service deregister call could be made three times for a task that is exiting. Whilst each hook caller of deregister has its own purpose, we should ensure it is only called once during the shutdown lifecycle of a task. This change therefore tracks when deregister has been called, so that subsequent calls are a no-op. In the event the task is restarting, the deregister value is reset to ensure proper operation.
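The tracking described above can be sketched as a guarded boolean: repeated deregister calls become no-ops within one shutdown, and a restart re-arms it. The types here are hypothetical, not the real taskrunner.

```go
package main

import (
	"fmt"
	"sync"
)

type serviceHandle struct {
	mu           sync.Mutex
	deregistered bool
	calls        int // how many real deregistrations happened
}

func (h *serviceHandle) deregister() {
	h.mu.Lock()
	defer h.mu.Unlock()
	if h.deregistered {
		return // already deregistered during this shutdown
	}
	h.deregistered = true
	h.calls++
}

// restart re-arms the guard so the next shutdown deregisters again.
func (h *serviceHandle) restart() {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.deregistered = false
}

func main() {
	h := &serviceHandle{}
	h.deregister()
	h.deregister() // no-op: several hook callers may all reach here
	h.deregister() // no-op
	h.restart()
	h.deregister() // real again after restart
	fmt.Println(h.calls)
}
```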
- 10 Feb, 2022 15 commits
  - Derek Strickland authored: The allocReconciler's computeGroup function contained a significant amount of inline logic whose intent was difficult to understand. This commit extracts that inline logic into intention-revealing subroutines, updates the function internals to improve maintainability, and renames some existing functions for the same purpose.
    Renamed functions:
    - handleGroupCanaries -> cancelUnneededCanaries
    - handleDelayedLost -> createLostLaterEvals
    - handleDelayedReschedules -> createRescheduleLaterEvals
    New functions:
    - filterAndStopAll
    - initializeDeploymentState
    - requiresCanaries
    - computeCanaries
    - computeUnderProvisionedBy
    - computeReplacements
    - computeDestructiveUpdates
    - computeMigrations
    - createDeployment
    - isDeploymentComplete
  - Luiz Aoqui authored
  - Luiz Aoqui authored
  - Luiz Aoqui authored: Merge release 1.2.6 branch
  - Luiz Aoqui authored
  - Luiz Aoqui authored: Version 1.2.6
  - Marc-Aurèle Brothier authored
  - James Rasell authored
  - Nomad Release Bot authored
  - Nomad Release bot authored
  - Luiz Aoqui authored
  - Tim Gross authored: The spread iterator can panic when processing an evaluation, resulting in an unrecoverable state in the cluster. Whenever a panicked server restarts and quorum is restored, the next server to dequeue the evaluation will panic. To trigger this state:
    - The job must have `max_parallel = 0` and a `canary >= 1`.
    - The job must not have a `spread` block.
    - The job must have a previous version.
    - The previous version must have a `spread` block and at least one failed allocation.
    In this scenario, the desired changes include `(place 1+) (stop 1+), (ignore n) (canary 1)`. Before the scheduler can place the canary allocation, it tries to find out which allocations can be stopped. This passes back through the stack so that we can determine previous-node penalties, etc. We call `SetJob` on the stack with the previous version of the job, which will include assessing the `spread` block (even though the results are unused). The task group spread info state from that pass through the spread iterator is not reset when we call `SetJob` again. When the new job version iterates over the `groupPropertySets`, it will get an empty `spreadAttributeMap`, resulting in an unexpected nil pointer dereference. This changeset resets the spread iterator's internal state when setting the job, adds logging with a bypass around the bug in case we hit similar cases, and adds a test that panics the scheduler without the patch.
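The failure mode and fix above reduce to a general rule: an iterator that caches per-job state must rebuild that cache on every `SetJob` call, or a later job version reads stale results from the previous pass. A minimal sketch with illustrative types (not the scheduler's real spread iterator):

```go
package main

import "fmt"

type cachingIterator struct {
	jobID        string
	propertySets map[string][]string // per task-group cached info
}

func (it *cachingIterator) SetJob(jobID string) {
	it.jobID = jobID
	// The fix: reset cached state here instead of letting the previous
	// job version's pass leak into the next one.
	it.propertySets = make(map[string][]string)
}

func main() {
	it := &cachingIterator{}
	it.SetJob("job-v1")
	it.propertySets["web"] = []string{"dc1", "dc2"} // state from the old version
	it.SetJob("job-v2")
	fmt.Println(len(it.propertySets)) // stale state is gone
}
```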
  - Luiz Aoqui authored: Add a new namespace ACL requirement for the /v1/jobs/parse endpoint and return early if HCLv2 parsing fails. The endpoint now requires the new `parse-job` ACL capability or `submit-job`.
  - Seth Hoenig authored: This PR adds symlink resolution when validating paths to ensure they do not escape client allocation directories.
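The containment test behind that validation can be sketched as follows. In practice the candidate path would first pass through `filepath.EvalSymlinks` so a symlink cannot smuggle the resolved target outside the allocation directory; this hypothetical helper shows only the final "is it still inside?" check.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// escapesDir reports whether resolvedPath lies outside allocDir.
func escapesDir(allocDir, resolvedPath string) bool {
	rel, err := filepath.Rel(allocDir, resolvedPath)
	if err != nil {
		return true
	}
	// Any relative path that starts by climbing out of allocDir escapes it.
	return rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator))
}

func main() {
	fmt.Println(escapesDir("/alloc/abc123", "/alloc/abc123/local/data")) // false
	fmt.Println(escapesDir("/alloc/abc123", "/etc/passwd"))              // true
}
```

Note that a plain string-prefix check would wrongly accept `/alloc/abc123x`; the `filepath.Rel` form catches that sibling-directory case.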
  - Seth Hoenig authored: go-getter creates a circular dependency between a Client and Getter, which means each is inherently thread-unsafe if you try to re-use one or the other. This PR fixes Nomad to no longer make use of the default Getter objects provided by the go-getter package. Nomad must create a new Client object on every artifact download, as the Client object controls the Src and Dst among other things. When calling Client.Get, the Getter modifies its own Client reference, creating the circular reference and race condition. We can still achieve most of the desired connection caching behavior by re-using a shared HTTP client with transport pooling enabled.
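The shape of that fix can be sketched without the go-getter API itself: mutable per-download state lives in a fresh struct per artifact, while connection caching comes from one shared `http.Client` whose pooled `Transport` is safe for concurrent use. Names are illustrative.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// One shared client: http.Transport pools idle connections and is safe
// for concurrent use, giving the connection re-use we want.
var sharedHTTP = &http.Client{
	Timeout: 30 * time.Second,
	Transport: &http.Transport{
		MaxIdleConns:        100,
		MaxIdleConnsPerHost: 10,
		IdleConnTimeout:     90 * time.Second,
	},
}

type download struct {
	Src, Dst string       // mutable, so never shared between goroutines
	HTTP     *http.Client // shared; provides the connection caching
}

// newDownload builds a fresh state object per artifact fetch, avoiding
// re-use of mutable state across concurrent downloads.
func newDownload(src, dst string) *download {
	return &download{Src: src, Dst: dst, HTTP: sharedHTTP}
}

func main() {
	a := newDownload("https://example.com/a.tgz", "/tmp/a")
	b := newDownload("https://example.com/b.tgz", "/tmp/b")
	fmt.Println(a != b, a.HTTP == b.HTTP) // distinct state, shared transport
}
```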
- 09 Feb, 2022 3 commits
  - Tim Gross authored: When an allocation is updated, the job summary for the associated job is also updated. CSI uses the job summary to set the expected count for controller and node plugins. We incorrectly used the allocation's server status instead of the job status when deciding whether to update or remove the job from the plugins. This caused a node drain or other terminal state for an allocation to clear the expected count for the entire plugin. Use the job status to guide whether to update or remove the expected count. The existing CSI tests for the state store incorrectly modeled the updates we received from servers vs those we received from clients, leading to test assertions that passed when they should not. Rework the tests to clarify each step in the lifecycle and rename CSI state store functions for clarity.
  - Tim Gross authored
  - Kevin Schoonover authored
- 08 Feb, 2022 9 commits
  - Thomas Lefebvre authored
  - Tim Gross authored
  - Seth Hoenig authored: env: update aws cpu configs
  - Seth Hoenig authored: By running the tools/ec2info tool
  - Tim Gross authored: Processing an evaluation is nearly a pure function over the state snapshot, but we randomly shuffle the nodes. This means that developers can't take a given state snapshot, pass an evaluation through it, and be guaranteed the same plan results. But the evaluation ID is already random, so if we use it as the seed for shuffling the nodes we can greatly reduce the sources of non-determinism. Unfortunately, golang map iteration uses a global source of randomness and not a goroutine-local one, but arguably if the scheduler behavior is impacted by this, that's a bug in the iteration.
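The determinism trick above can be sketched by hashing the (already random) evaluation ID into a seed, so shuffling the same node list for the same eval always yields the same order. The helper name is illustrative.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
)

// shuffleNodes shuffles nodes in place, seeded from the evaluation ID,
// making the shuffle reproducible per evaluation.
func shuffleNodes(evalID string, nodes []string) {
	h := fnv.New64a()
	h.Write([]byte(evalID))
	r := rand.New(rand.NewSource(int64(h.Sum64())))
	r.Shuffle(len(nodes), func(i, j int) {
		nodes[i], nodes[j] = nodes[j], nodes[i]
	})
}

func main() {
	a := []string{"node1", "node2", "node3", "node4"}
	b := []string{"node1", "node2", "node3", "node4"}
	shuffleNodes("eval-8af5", a)
	shuffleNodes("eval-8af5", b) // same eval ID, same order
	fmt.Println(a, b)
}
```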
  - Seth Hoenig authored: changelog: update changelog for DO
  - Seth Hoenig authored (Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>)
  - Seth Hoenig authored
  - Seth Hoenig authored: client/fingerprint: add digitalocean fingerprinter
- 07 Feb, 2022 3 commits
  - Dylan Staley authored: website: display warning in IE 11
  - Kevin Schoonover authored (Co-authored-by: Seth Hoenig <seth.a.hoenig@gmail.com>)
  - Tim Gross authored: If processing a specific evaluation causes the scheduler (and therefore the entire server) to panic, that evaluation will never get a chance to be nack'd and cleared from the state store. It will get dequeued by another scheduler, causing that server to panic, and so forth until all servers are in a panic loop. This prevents the operator from intervening to remove the evaluation or update the state. Recover the goroutine from the top-level `Process` methods for each scheduler so that this condition can be detected without panicking the server process. This will lead to a loop of recovering the scheduler goroutine until the eval can be removed or nack'd, but that's much better than taking downtime.
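The recovery pattern above is a deferred `recover` in the top-level processing function, turning a panic on one evaluation into an ordinary error (which can be nack'd) instead of crashing the server. The wrapper name here is illustrative.

```go
package main

import "fmt"

// processEval runs the scheduler work and converts any panic into an error.
// The named return value lets the deferred closure set err after a panic.
func processEval(process func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("scheduler panicked processing eval: %v", r)
		}
	}()
	return process()
}

func main() {
	err := processEval(func() error {
		panic("nil pointer dereference in an iterator")
	})
	fmt.Println(err) // the server survives and can nack the eval
}
```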
- 06 Feb, 2022 1 commit
  - Kevin Schoonover authored