Commits · c5479d30d95fcdb5e3e22cac01b5432a059e9430 · 小白蛋 / Nomad

This project is mirrored from https://gitee.com/mirrors/nomad.git. Pull mirroring failed 2 years ago.
Repository mirroring has been paused due to too many failed attempts. It can be resumed by a project maintainer.

10 Feb, 2022 5 commits

docs: add 1.1.12 to changelog · c5479d30
Luiz Aoqui authored 3 years ago

Unverified

c5479d30

scheduler: prevent panic in spread iterator during alloc stop · 9565ce3f

Tim Gross authored 3 years ago

The spread iterator can panic when processing an evaluation, resulting
in an unrecoverable state in the cluster. Whenever a panicked server
restarts and quorum is restored, the next server to dequeue the
evaluation will panic.

To trigger this state:
* The job must have `max_parallel = 0` and a `canary >= 1`.
* The job must not have a `spread` block.
* The job must have a previous version.
* The previous version must have a `spread` block and at least one
  failed allocation.

In this scenario, the desired changes include `(place 1+) (stop
1+), (ignore n) (canary 1)`. Before the scheduler can place the canary
allocation, it tries to find out which allocations can be
stopped. This passes back through the stack so that we can determine
previous-node penalties, etc. We call `SetJob` on the stack with the
previous version of the job, which will include assessing the `spread`
block (even though the results are unused). The task group spread info
state from that pass through the spread iterator is not reset when we
call `SetJob` again. When the new job version iterates over the
`groupPropertySets`, it will get an empty `spreadAttributeMap`,
resulting in an unexpected nil pointer dereference.

This changeset resets the spread iterator internal state when setting
the job, logging with a bypass around the bug in case we hit similar
cases, and a test that panics the scheduler without the patch.

Unverified

9565ce3f

api: prevent excessice CPU load on job parse · 820c8e4f

Luiz Aoqui authored 3 years ago

Add new namespace ACL requirement for the /v1/jobs/parse endpoint and
return early if HCLv2 parsing fails.

The endpoint now requires the new `parse-job` ACL capability or
`submit-job`.

Unverified

820c8e4f

client: check escaping of alloc dir using symlinks · fcb3a5d0

Seth Hoenig authored 3 years ago

This PR adds symlink resolution when doing validation of paths
to ensure they do not escape client allocation directories.

Unverified

fcb3a5d0

client: fix race condition in use of go-getter · 1064431c

Seth Hoenig authored 3 years ago

go-getter creates a circular dependency between a Client and Getter,
which means each is inherently thread-unsafe if you try to re-use
on or the other.

This PR fixes Nomad to no longer make use of the default Getter objects
provided by the go-getter package. Nomad must create a new Client object
on every artifact download, as the Client object controls the Src and Dst
among other things. When Caling Client.Get, the Getter modifies its own
Client reference, creating the circular reference and race condition.

We can still achieve most of the desired connection caching behavior by
re-using a shared HTTP client with transport pooling enabled.

Unverified

1064431c

31 Jan, 2022 2 commits
- Release v1.1.11 · 97a0f897
  Nomad Release Bot authored 3 years ago
  
  Unverified
  
  97a0f897
- Generate files for 1.1.11 release · 17fbe0e6
  Nomad Release bot authored 3 years ago
  
  17fbe0e6
28 Jan, 2022 10 commits

docs: add 1.1.11 to changelog · e860d073
Tim Gross authored 3 years ago

e860d073
set LAST_RELEASE to 1.1.10 for the 1.1.11 release branch · 96c5c628
Tim Gross authored 3 years ago

96c5c628
docs: missing changelog for #11892 (#11959) · 5c6aeebb
Tim Gross authored 3 years ago

5c6aeebb

CSI: node unmount from the client before unpublish RPC (#11892) · c2b850b1

Tim Gross authored 3 years ago

When an allocation stops, the `csi_hook` makes an unpublish RPC to the
servers to unpublish via the CSI RPCs: first to the node plugins and
then the controller plugins. The controller RPCs must happen after the
node RPCs so that the node has had a chance to unmount the volume
before the controller tries to detach the associated device.

But the client has local access to the node plugins and can
independently determine if it's safe to send unpublish RPC to those
plugins. This will allow the server to treat the node plugin as
abandoned if a client is disconnected and `stop_on_client_disconnect`
is set. This will let the server try to send unpublish RPCs to the
controller plugins, under the assumption that the client will be
trying to unmount the volume on its end first.

Note that the CSI `NodeUnpublishVolume`/`NodeUnstageVolume` RPCs can
return ignorable errors in the case where the volume has already been
unmounted from the ...

c2b850b1

CSI: tests to exercise csi_hook (#11788) · 8af384a9

Tim Gross authored 3 years ago

Small refactoring of the allocrunner hook for CSI to make it more
testable, and a unit test that covers most of its logic.

8af384a9

CSI: move terminal alloc handling into denormalization (#11931) · 2c6de3e8

Tim Gross authored 3 years ago

* The volume claim GC method and volumewatcher both have logic
collecting terminal allocations that duplicates most of the logic
that's now in the state store's `CSIVolumeDenormalize` method. Copy
this logic into the state store so that all code paths have the same
view of the past claims.
* Remove logic in the volume claim GC that now lives in the state
store's `CSIVolumeDenormalize` method.
* Remove logic in the volumewatcher that now lives in the state
store's `CSIVolumeDenormalize` method.
* Remove logic in the node unpublish RPC that now lives in the state
store's `CSIVolumeDenormalize` method.

2c6de3e8

csi: ensure that PastClaims are populated with correct mode (#11932) · 26b50083

Tim Gross authored 3 years ago

In the client's `(*csiHook) Postrun()` method, we make an unpublish
RPC that includes a claim in the `CSIVolumeClaimStateUnpublishing`
state and using the mode from the client. But then in the
`(*CSIVolume) Unpublish` RPC handler, we query the volume from the
state store (because we only get an ID from the client). And when we
make the client RPC for the node unpublish step, we use the _current
volume's_ view of the mode. If the volume's mode has been changed
before the old allocations can have their claims released, then we end
up making a CSI RPC that will never succeed.

Why does this code path get the mode from the volume and not the
claim? Because the claim written by the GC job in `(*CoreScheduler)
csiVolumeClaimGC` doesn't have a mode. Instead it just writes a claim
in the unpublishing state to ensure the volumewatcher detects a "past
claim" change and reaps all the claims on the volumes.

Fix this by ensuring that...

26b50083

CSI: resolve invalid claim states (#11890) · 6e0119de

Tim Gross authored 3 years ago

* csi: resolve invalid claim states on read

It's currently possible for CSI volumes to be claimed by allocations
that no longer exist. This changeset asserts a reasonable state at
the state store level by registering these nil allocations as "past
claims" on any read. This will cause any pass through the periodic GC
or volumewatcher to trigger the unpublishing workflow for those claims.

* csi: make feasibility check errors more understandable

When the feasibility checker finds we have no free write claims, it
checks to see if any of those claims are for the job we're currently
scheduling (so that earlier versions of a job can't block claims for
new versions) and reports a conflict if the volume can't be scheduled
so that the user can fix their claims. But when the checker hits a
claim that has a GCd allocation, the state is recoverable by the
server once claim reaping completes and no user intervention is
required; the blocked eval should complete. Differentiate the
scheduler error produced by these two conditions.

6e0119de

csi: update leader's ACL in volumewatcher (#11891) · 41c2daf4

Tim Gross authored 3 years ago

The volumewatcher that runs on the leader needs to make RPC calls
rather than writing to raft (as we do in the deploymentwatcher)
because the unpublish workflow needs to make RPC calls to the
clients. This requires that the volumewatcher has access to the
leader's ACL token.

But when leadership transitions, the new leader creates a new leader
ACL token. This ACL token needs to be passed into the volumewatcher
when we enable it, otherwise the volumewatcher can find itself with a
stale token.

41c2daf4

csi: reap unused volume claims at leadership transitions (#11776) · ad8166de

Tim Gross authored 3 years ago

When `volumewatcher.Watcher` starts on the leader, it starts a watch
on every volume and triggers a reap of unused claims on any change to
that volume. But if a reaping is in-flight during leadership
transitions, it will fail and the event that triggered the reap will
be dropped. Perform one reap of unused claims at the start of the
watcher so that leadership transitions don't drop this event.

ad8166de

19 Jan, 2022 1 commit
- Release v1.1.10 · 2f08fe23
  Nomad Release Bot authored 3 years ago
  
  Unverified
  
  2f08fe23
18 Jan, 2022 18 commits

Generate files for 1.1.10 release · 028cef25
Nomad Release bot authored 3 years ago

028cef25
docs: add 1.1.10 to changelog · d7ae04eb
Luiz Aoqui authored 3 years ago

Unverified

d7ae04eb
Merge pull request #11744 from hashicorp/b-node-copy · 5d5bb262
Michael Schurter authored 3 years ago
```
Fix Node.Copy()
```
Unverified

5d5bb262
changelog: add entry for #11793 (#11862) · 2ba4892c
Luiz Aoqui authored 3 years ago

Unverified

2ba4892c

drivers: set world-readable permissions on copied resolv.conf (#11856) · 0d14741d

Tim Gross authored 3 years ago

When we copy the system DNS to a task's `resolv.conf`, we should set
the permissions as world-readable so that unprivileged users within
the task can read it.

Unverified

0d14741d

freebsd: build fix for ARM7 32-bit (#11854) · 2fb80225

Tim Gross authored 3 years ago

The size of `stat_t` fields is architecture dependent, which was
reportedly causing a build failure on FreeBSD ARM7 32-bit
systems. This changeset matches the behavior we have on Linux.

Unverified

2fb80225

csi: when warning for multiple prefix matches, use full ID (#11853) · 54203561

Tim Gross authored 3 years ago

When the `volume deregister` or `volume detach` commands get an ID
prefix that matches multiple volumes, show the full length of the
volume IDs in the list of volumes shown so so that the user can select
the correct one.

Unverified

54203561

csi: volume deregistration should require exact ID (#11852) · cd0139d1

Tim Gross authored 3 years ago

The command line client sends a specific volume ID, but this isn't
enforced at the API level and we were incorrectly using a prefix match
for volume deregistration, resulting in cases where a volume with a
shorter ID that's a prefix of another volume would be deregistered
instead of the intended volume.

Unverified

cd0139d1

Merge pull request #11849 from hashicorp/b-changelog-11848 · c48d40d1
James Rasell authored 3 years ago
```
changelog: add entry for #11848
```
Unverified

c48d40d1
Merge pull request #11833 from hashicorp/deps-go-getter-v1.5.11 · 10c786a0
Michael Schurter authored 3 years ago
```
deps: update go-getter to v1.5.11
```
Unverified

10c786a0
Merge pull request #11830 from hashicorp/b-validate-reserved-ports · cc1e4847
Michael Schurter authored 3 years ago
```
agent: validate reserved_ports are valid
```
Unverified

cc1e4847
docs: improve changelog for PR #11783 (#11818) · 8d34f9b4
Tim Gross authored 3 years ago

Unverified

8d34f9b4
scheduler: detect and log unexpected scheduling collisions (#11793) · 58017f99
Luiz Aoqui authored 3 years ago

Unverified

58017f99

Un-break templates when using vault stanza change_mode noop (#11783) · 14c8bbb5

grembo authored 3 years ago

Templates in nomad jobs make use of the vault token defined in
the vault stanza when issuing credentials like client certificates.

When using change_mode "noop" in the vault stanza, consul-template
is not informed in case a vault token is re-issued (which can
happen from time to time for various reasons, as described
in https://www.nomadproject.io/docs/job-specification/vault).

As a result, consul-template will keep using the old vault token
to renew credentials and - once the token expired - stop renewing
credentials. The symptom of this problem is a vault_token
file that is newer than the issued credential (e.g., TLS certificate)
in a job's /secrets directory.

This change corrects this, so that h.updater.updatedVaultToken(token)
is called, which will inform stakeholders about the new
token and make sure, the new token is used by consul-template.

Example job template fragment:

    vault {
        policies = ["nomad-job-policy"]
        change_mode = "noop"
    }

    template {
      data = <<-EOH
        {{ with secret "pki_int/issue/nomad-job"
        "common_name=myjob.service.consul" "ttl=90m"
        "alt_names=localhost" "ip_sans=127.0.0.1"}}
        {{ .Data.certificate }}
        {{ .Data.private_key }}
        {{ .Data.issuing_ca }}
        {{ end }}
      EOH
      destination = "${NOMAD_SECRETS_DIR}/myjob.crt"
      change_mode = "noop"
    }

This fix does not alter the meaning of the three change modes of vault

- "noop" - Take no action
- "restart" - Restart the job
- "signal" - send a signal to the task

as the switch statement following line 232 contains the necessary
logic.

It is assumed that "take no action" was never meant to mean "don't tell
consul-template about the new vault token".

Successfully tested in a staging cluster consisting of multiple
nomad client nodes.

Unverified

14c8bbb5

task runner: fix goroutine leak in prestart hook (#11741) · a24bd934

Tim Gross authored 3 years ago

The task runner prestart hooks take a `joincontext` so they have the
option to exit early if either of two contexts are canceled: from
killing the task or client shutdown. Some tasks exit without being
shutdown from the server, so neither of the joined contexts ever gets
canceled and we leak the `joincontext` (48 bytes) and its internal
goroutine. This primarily impacts batch jobs and any task that fails
or completes early such as non-sidecar prestart lifecycle tasks.
Cancel the `joincontext` after the prestart call exits to fix the
leak.

Unverified

a24bd934

fix host network reserved port fingerprint (#11728) · f730bc28
Luiz Aoqui authored 3 years ago

Unverified

f730bc28

scheduler: fix quadratic performance with spread blocks (#11712) · d63e628a

Tim Gross authored 3 years ago

When the scheduler picks a node for each evaluation, the
`LimitIterator` provides at most 2 eligible nodes for the
`MaxScoreIterator` to choose from. This keeps scheduling fast while
producing acceptable results because the results are binpacked.

Jobs with a `spread` block (or node affinity) remove this limit in
order to produce correct spread scoring. This means that every
allocation within a job with a `spread` block is evaluated against
_all_ eligible nodes. Operators of large clusters have reported that
jobs with `spread` blocks that are eligible on a large number of nodes
can take longer than the nack timeout to evaluate (60s). Typical
evaluations are processed in milliseconds.

In practice, it's not necessary to evaluate every eligible node for
every allocation on large clusters, because the `RandomIterator` at
the base of the scheduler stack produces enough variation in each pass
that the likelihood of an uneven spread is negligible. Note that
feasibility is checked before the limit, so this only impacts the
number of _eligible_ nodes available for scoring, not the total number
of nodes.

This changeset sets the iterator limit for "large" `spread` block and
node affinity jobs to be equal to the number of desired
allocations. This brings an example problematic job evaluation down
from ~3min to ~10s. The included tests ensure that we have acceptable
spread results across a variety of large cluster topologies.

Unverified

d63e628a

cli: ensure `-stale` flag is respected by `nomad operator debug` (#11678) · 03e5425e

Tim Gross authored 3 years ago

When a cluster doesn't have a leader, the `nomad operator debug`
command can safely use stale queries to gracefully degrade the
consistency of almost all its queries. The query parameter for these
API calls was not being set by the command.

Some `api` package queries do not include `QueryOptions` because
they target a specific agent, but they can potentially be forwarded to
other agents. If there is no leader, these forwarded queries will
fail. Provide methods to call these APIs with `QueryOptions`.

Unverified

03e5425e

17 Jan, 2022 4 commits

Override TLS flags individually for meta commands (#11592) · 060a4740

Derek Strickland authored 3 years ago


* Override TLS flags individually for meta commands

* Update command/meta.go
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Tim Gross <tgross@hashicorp.com>

Unverified

060a4740

client: respect `client_auto_join` after connection loss (#11585) · 13047b65

Tim Gross authored 3 years ago

The `consul.client_auto_join` configuration block tells the Nomad
client whether to use Consul service discovery to find Nomad
servers. By default it is set to `true`, but contrary to the
documentation it was only respected during the initial client
registration. If a client missed a heartbeat, failed a
`Node.UpdateStatus` RPC, or if there was no Nomad leader, the client
would fallback to Consul even if `client_auto_join` was set to
`false`. This changeset returns early from the client's trigger for
Consul discovery if the `client_auto_join` field is set to `false`.

Unverified

13047b65

Merge pull request #11579 from hashicorp/b-getscalingpolicy-rpc-index-response · b830afbb
Michael Schurter authored 3 years ago
```
rpc: fix scaling policy get index response when policy is found.
```
Unverified

b830afbb

build(deps): bump github.com/hashicorp/cronexpr from 1.1.0 to 1.1.1 in /api (#11132) · 782d5398

dependabot[bot] authored 3 years ago

* build(deps): bump github.com/hashicorp/cronexpr in /api

Bumps [github.com/hashicorp/cronexpr](https://github.com/hashicorp/cronexpr) from 1.1.0 to 1.1.1.
- [Release notes](https://github.com/hashicorp/cronexpr/releases)
- [Commits](https://github.com/hashicorp/cronexpr/compare/v1.1.0...v1.1.1

)

---
updated-dependencies:
- dependency-name: github.com/hashicorp/cronexpr
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>

* go mod tidy
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Tim Gross <tim@0x74696d.com>

Unverified

782d5398