Commits · angrycub-patch-1 · 小白蛋 / Nomad

This project is mirrored from https://gitee.com/mirrors/nomad.git. Pull mirroring failed 2 years ago.
Repository mirroring has been paused due to too many failed attempts. It can be resumed by a project maintainer.

10 Feb, 2022 13 commits

Fixed scheduler config examples · c95a4a38
Charlie Voiselle authored 3 years ago

Unverified

c95a4a38
docs: add upgrade note and ACL requirements for the job submit endpoint (#12046) · 6a3368a0
Luiz Aoqui authored 3 years ago

Unverified

6a3368a0
update download to Nomad v1.2.6 (#12042) · 6d7813d5
Luiz Aoqui authored 3 years ago

Unverified

6d7813d5
Merge pull request #12045 from hashicorp/merge-release-1.2.6-branch · af332373
Luiz Aoqui authored 3 years ago
```
Merge release 1.2.6 branch
```
Unverified

af332373
prepare for next release · 096934a5
Luiz Aoqui authored 3 years ago

Unverified

096934a5
Merge tag 'v1.2.6' into merge-release-1.2.6-branch · bc333c25
Luiz Aoqui authored 3 years ago
```
Version 1.2.6
```
Unverified

bc333c25
Release v1.2.6 · 95514d56
Nomad Release Bot authored 3 years ago

Unverified

95514d56
Generate files for 1.2.6 release · a6c6b475
Nomad Release bot authored 3 years ago

a6c6b475
docs: add 1.2.6 to changelog · a3319d7d
Luiz Aoqui authored 3 years ago

Unverified

a3319d7d

scheduler: prevent panic in spread iterator during alloc stop · c49359ad

Tim Gross authored 3 years ago

The spread iterator can panic when processing an evaluation, resulting
in an unrecoverable state in the cluster. Whenever a panicked server
restarts and quorum is restored, the next server to dequeue the
evaluation will panic.

To trigger this state:
* The job must have `max_parallel = 0` and a `canary >= 1`.
* The job must not have a `spread` block.
* The job must have a previous version.
* The previous version must have a `spread` block and at least one
  failed allocation.

In this scenario, the desired changes include `(place 1+) (stop
1+), (ignore n) (canary 1)`. Before the scheduler can place the canary
allocation, it tries to find out which allocations can be
stopped. This passes back through the stack so that we can determine
previous-node penalties, etc. We call `SetJob` on the stack with the
previous version of the job, which will include assessing the `spread`
block (even though the results are unused). The task group spread info
state from that pass through the spread iterator is not reset when we
call `SetJob` again. When the new job version iterates over the
`groupPropertySets`, it will get an empty `spreadAttributeMap`,
resulting in an unexpected nil pointer dereference.

This changeset resets the spread iterator internal state when setting
the job, logging with a bypass around the bug in case we hit similar
cases, and a test that panics the scheduler without the patch.

Unverified

c49359ad

api: prevent excessice CPU load on job parse · 1aa3b561

Luiz Aoqui authored 3 years ago

Add new namespace ACL requirement for the /v1/jobs/parse endpoint and
return early if HCLv2 parsing fails.

The endpoint now requires the new `parse-job` ACL capability or
`submit-job`.

Unverified

1aa3b561

client: check escaping of alloc dir using symlinks · b3c0e6a7

Seth Hoenig authored 3 years ago

This PR adds symlink resolution when doing validation of paths
to ensure they do not escape client allocation directories.

Unverified

b3c0e6a7

client: fix race condition in use of go-getter · 6445da9b

Seth Hoenig authored 3 years ago

go-getter creates a circular dependency between a Client and Getter,
which means each is inherently thread-unsafe if you try to re-use
on or the other.

This PR fixes Nomad to no longer make use of the default Getter objects
provided by the go-getter package. Nomad must create a new Client object
on every artifact download, as the Client object controls the Src and Dst
among other things. When Caling Client.Get, the Getter modifies its own
Client reference, creating the circular reference and race condition.

We can still achieve most of the desired connection caching behavior by
re-using a shared HTTP client with transport pooling enabled.

Unverified

6445da9b

09 Feb, 2022 3 commits

CSI: use job status not alloc status for plugin updates from summary (#12027) · 05b99001

Tim Gross authored 3 years ago

When an allocation is updated, the job summary for the associated job
is also updated. CSI uses the job summary to set the expected count
for controller and node plugins. We incorrectly used the allocation's
server status instead of the job status when deciding whether to
update or remove the job from the plugins. This caused a node drain or
other terminal state for an allocation to clear the expected count for
the entire plugin.

Use the job status to guide whether to update or remove the expected
count.

The existing CSI tests for the state store incorrectly modeled the
updates we received from servers vs those we received from clients,
leading to test assertions that passed when they should not.

Rework the tests to clarify each step in the lifecycle and rename CSI state
store functions for clarity

Unverified

05b99001

docs and changelog for `nomad config validate` (#12031) · b3212a5b
Tim Gross authored 3 years ago

Unverified

b3212a5b
fingerprint: remove metadata from digitalocean (#12032) · 6633f8d9
Kevin Schoonover authored 3 years ago

Unverified

6633f8d9

08 Feb, 2022 9 commits
- Add config command and config validate subcommand to nomad CLI (#9198) · 41f84c65
  Thomas Lefebvre authored 3 years ago
  
  Unverified
  
  41f84c65
- fingerprint: digitalocean fingerprint test requires metadata header (#12028) · 79e8d394
  Tim Gross authored 3 years ago
  
  Unverified
  
  79e8d394
- Merge pull request #12026 from hashicorp/f-update-aws · 0ae882a3
  Seth Hoenig authored 3 years ago
```
env: update aws cpu configs
```
  Unverified
  
  0ae882a3
- env: update aws cpu configs · 652de761
  Seth Hoenig authored 3 years ago
```
By running the tools/ec2info tool
```
  652de761
- scheduler: seed random shuffle nodes with eval ID (#12008) · b0b7a494
  Tim Gross authored 3 years ago
```
Processing an evaluation is nearly a pure function over the state
snapshot, but we randomly shuffle the nodes. This means that
developers can't take a given state snapshot and pass an evaluation
through it and be guaranteed the same plan results.

But the evaluation ID is already random, so if we use this as the seed
for shuffling the nodes we can greatly reduce the sources of
non-determinism. Unfortunately golang map iteration uses a global
source of randomness and not a goroutine-local one, but arguably
if the scheduler behavior is impacted by this, that's a bug in the
iteration.
```
  Unverified
  
  b0b7a494
- Merge pull request #12024 from hashicorp/docs-update-cl · e2b69dcb
  Seth Hoenig authored 3 years ago
```
changelog: update changelog for DO
```
  Unverified
  
  e2b69dcb
- cl: fix DO name · da42b284
  Seth Hoenig authored 3 years ago
```
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
```
  Unverified
  
  da42b284
- changelog: update changelog for DO · 06b73afd
  Seth Hoenig authored 3 years ago
  
  06b73afd
- Merge pull request #12015 from kevinschoonover/main · fa0d8901
  Seth Hoenig authored 3 years ago
```
client/fingerprint: add digitalocean fingerprinter
```
  Unverified
  
  fa0d8901
07 Feb, 2022 3 commits

Merge pull request #11936 from hashicorp/ds.ie11-warning · 21f7d011
Dylan Staley authored 3 years ago
```
website: display warning in IE 11
```
Unverified

21f7d011
address comments · 5cea3663
Kevin Schoonover authored 3 years ago
```
Co-authored-by: Seth Hoenig <seth.a.hoenig@gmail.com>
```
5cea3663

scheduler: recover from panic (#12009) · f8111692

Tim Gross authored 3 years ago

If processing a specific evaluation causes the scheduler (and
therefore the entire server) to panic, that evaluation will never
get a chance to be nack'd and cleared from the state store. It will
get dequeued by another scheduler, causing that server to panic, and
so forth until all servers are in a panic loop. This prevents the
operator from intervening to remove the evaluation or update the
state.

Recover the goroutine from the top-level `Process` methods for each
scheduler so that this condition can be detected without panicking the
server process. This will lead to a loop of recovering the scheduler
goroutine until the eval can be removed or nack'd, but that's much
better than taking a downtime.

Unverified

f8111692

06 Feb, 2022 2 commits
- small fixes · 7b6f9540
  Kevin Schoonover authored 3 years ago
  
  7b6f9540
- add digitalocean fingerprinter · 4d4c8397
  Kevin Schoonover authored 3 years ago
  
  4d4c8397
05 Feb, 2022 4 commits

reconciler: improve variable names and extract methods from inline logic (#12010) · 0263650f
Derek Strickland authored 3 years ago
```
* reconciler: improved variable names and extract methods from inline logic
Co-authored-by: Tim Gross <tgross@hashicorp.com>
```
Unverified

0263650f

fix mTLS certificate check on agent to agent RPCs (#11998) · 29ffa026

Luiz Aoqui authored 3 years ago

PR #11956 implemented a new mTLS RPC check to validate the role of the
certificate used in the request, but further testing revealed two flaws:

1. client-only endpoints did not accept server certificates so the
request would fail when forwarded from one server to another.
2. the certificate was being checked after the request was forwarded,
so the check would happen over the server certificate, not the
actual source.

This commit checks for the desired mTLS level, where the client level
accepts both, a server or a client certificate. It also validates the
cercertificate before the request is forwarded.

Unverified

29ffa026

style: fix up very long tag word breaking the allocation service table width (#11995) · 3fc73896
ttys3 authored 3 years ago

Unverified

3fc73896
improve error message on service length (#12012) · 16485f40
Karthick Ramachandran authored 3 years ago

Unverified

16485f40

04 Feb, 2022 3 commits
- feat: display warning in IE 11 · 26e3b0f0
  Dylan Staley authored 3 years ago
  
  26e3b0f0
- Merge pull request #12002 from... · e9ef2c05
  Seth Hoenig authored 3 years ago
```
Merge pull request #12002 from hashicorp/dependabot/go_modules/github.com/hashicorp/go-version-1.4.0

build(deps): bump github.com/hashicorp/go-version from 1.3.0 to 1.4.0
```
  Unverified
  
  e9ef2c05
- Merge pull request #11937 from hashicorp/dependabot/go_modules/google.golang.org/grpc-1.44.0 · 4eb00265
  Seth Hoenig authored 3 years ago
```
build(deps): bump google.golang.org/grpc from 1.42.0 to 1.44.0
```
  Unverified
  
  4eb00265
03 Feb, 2022 3 commits

add semgrep rule to check for potential time.After leaks (#12001) · 290bd0d5
Luiz Aoqui authored 3 years ago

Unverified

290bd0d5

build(deps): bump github.com/hashicorp/go-version from 1.3.0 to 1.4.0 · de7e94cf

dependabot[bot] authored 3 years ago

Bumps [github.com/hashicorp/go-version](https://github.com/hashicorp/go-version) from 1.3.0 to 1.4.0.
- [Release notes](https://github.com/hashicorp/go-version/releases)
- [Changelog](https://github.com/hashicorp/go-version/blob/main/CHANGELOG.md)
- [Commits](https://github.com/hashicorp/go-version/compare/v1.3.0...v1.4.0

)

---
updated-dependencies:
- dependency-name: github.com/hashicorp/go-version
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>

Unverified

de7e94cf

build(deps): bump google.golang.org/grpc from 1.42.0 to 1.44.0 · 3efd36eb

dependabot[bot] authored 3 years ago

Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from 1.42.0 to 1.44.0.
- [Release notes](https://github.com/grpc/grpc-go/releases)
- [Commits](https://github.com/grpc/grpc-go/compare/v1.42.0...v1.44.0

)

---
updated-dependencies:
- dependency-name: google.golang.org/grpc
  dependency-type: direct:production
  update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>

Unverified

3efd36eb