This project is mirrored from https://gitee.com/mirrors/nomad.git.
Pull mirroring failed.
Repository mirroring has been paused due to too many failed attempts. It can be resumed by a project maintainer.
- 08 Mar, 2022 1 commit
-
-
temp authored
-
- 07 Mar, 2022 1 commit
-
-
hc-github-team-nomad-core authored
This pull request was automerged via backport-assistant
-
- 01 Mar, 2022 1 commit
-
-
hc-github-team-nomad-core authored
This pull request was automerged via backport-assistant
-
- 28 Feb, 2022 1 commit
-
-
Tim Gross authored
-
- 18 Feb, 2022 1 commit
-
-
Ignacio Torres Masdeu authored
-
- 01 Feb, 2022 1 commit
-
-
Tim Gross authored
-
- 31 Jan, 2022 3 commits
-
-
Nomad Release Bot authored
-
Nomad Release Bot authored
-
Nomad Release bot authored
-
- 28 Jan, 2022 10 commits
-
-
Tim Gross authored
-
Tim Gross authored
-
Tim Gross authored
-
Tim Gross authored
When an allocation stops, the `csi_hook` makes an unpublish RPC to the servers to unpublish via the CSI RPCs: first to the node plugins and then the controller plugins. The controller RPCs must happen after the node RPCs so that the node has had a chance to unmount the volume before the controller tries to detach the associated device.

But the client has local access to the node plugins and can independently determine if it's safe to send unpublish RPCs to those plugins. This will allow the server to treat the node plugin as abandoned if a client is disconnected and `stop_on_client_disconnect` is set. This will let the server try to send unpublish RPCs to the controller plugins, under the assumption that the client will be trying to unmount the volume on its end first.

Note that the CSI `NodeUnpublishVolume`/`NodeUnstageVolume` RPCs can return ignorable errors in the case where the volume has already been unmounted from the node. Handle all other errors by retrying until we get success so as to give operators the opportunity to reschedule a failed node plugin (ex. in the case where they accidentally drained a node without `-ignore-system`). Fan-out the work for each volume into its own goroutine so that we can release a subset of volumes if only one is stuck.
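As a rough sketch of the fan-out described above (names like `unpublishVolume` and `errAlreadyUnmounted` are illustrative, not Nomad's actual csi_hook code):

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

var errAlreadyUnmounted = errors.New("volume already unmounted")

// unpublishVolume stands in for the NodeUnpublishVolume/NodeUnstageVolume calls.
func unpublishVolume(ctx context.Context, volID string) error {
	return nil
}

// unpublishAll releases each volume in its own goroutine so one stuck
// volume does not block releasing the others, retrying non-ignorable errors.
func unpublishAll(ctx context.Context, volIDs []string) {
	var wg sync.WaitGroup
	for _, id := range volIDs {
		wg.Add(1)
		go func(id string) {
			defer wg.Done()
			for {
				err := unpublishVolume(ctx, id)
				if err == nil || errors.Is(err, errAlreadyUnmounted) {
					return // success, or an ignorable "already unmounted" error
				}
				select {
				case <-ctx.Done():
					return
				case <-time.After(time.Second): // retry until success
				}
			}
		}(id)
	}
	wg.Wait()
}

func main() {
	unpublishAll(context.Background(), []string{"vol-a", "vol-b"})
	fmt.Println("all volumes released")
}
```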
-
Tim Gross authored
Small refactoring of the allocrunner hook for CSI to make it more testable, and a unit test that covers most of its logic.
-
Tim Gross authored
* csi: resolve invalid claim states on read

  It's currently possible for CSI volumes to be claimed by allocations that no longer exist. This changeset asserts a reasonable state at the state store level by registering these nil allocations as "past claims" on any read. This will cause any pass through the periodic GC or volumewatcher to trigger the unpublishing workflow for those claims.

* csi: make feasibility check errors more understandable

  When the feasibility checker finds we have no free write claims, it checks to see if any of those claims are for the job we're currently scheduling (so that earlier versions of a job can't block claims for new versions) and reports a conflict if the volume can't be scheduled so that the user can fix their claims. But when the checker hits a claim that has a GCd allocation, the state is recoverable by the server once claim reaping completes and no user intervention is required; the blocked eval should complete. Differentia...
-
Mahmood Ali authored
Glint pulled in an updated version of mitchellh/go-testing-interface which broke some existing tests because the update added a Parallel() method to testing.T. This switches to the standard library testing.TB which doesn't have a Parallel() method.
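For illustration, a helper written against the standard library's `testing.TB` looks like the sketch below; both `*testing.T` and `*testing.B` satisfy the interface, and nothing here depends on a `Parallel()` method:

```go
package example

import "testing"

// newTestFixture works for both tests and benchmarks because it only
// needs the methods defined on testing.TB (Helper, Cleanup, Fatalf, ...).
func newTestFixture(t testing.TB) string {
	t.Helper()
	fixture := "configured fixture"
	t.Cleanup(func() {
		// release resources when the test or benchmark ends
	})
	return fixture
}

func TestFixture(t *testing.T) {
	if got := newTestFixture(t); got == "" {
		t.Fatalf("expected a fixture, got empty string")
	}
}
```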
-
Tim Gross authored
The volumewatcher that runs on the leader needs to make RPC calls rather than writing to raft (as we do in the deploymentwatcher) because the unpublish workflow needs to make RPC calls to the clients. This requires that the volumewatcher has access to the leader's ACL token. But when leadership transitions, the new leader creates a new leader ACL token. This ACL token needs to be passed into the volumewatcher when we enable it, otherwise the volumewatcher can find itself with a stale token.
-
Tim Gross authored
When `volumewatcher.Watcher` starts on the leader, it starts a watch on every volume and triggers a reap of unused claims on any change to that volume. But if a reaping is in-flight during leadership transitions, it will fail and the event that triggered the reap will be dropped. Perform one reap of unused claims at the start of the watcher so that leadership transitions don't drop this event.
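A conceptual sketch of the change, with illustrative names rather than the real volumewatcher types: reap once on startup before entering the watch loop, so an event dropped during a leadership transition is still handled.

```go
package main

import (
	"context"
	"fmt"
)

// watchVolumes reaps unused claims once on startup (covering any reap
// that was in flight when leadership changed), then reaps again on
// every volume change event.
func watchVolumes(ctx context.Context, reapUnusedClaims func() error, changes <-chan string) {
	if err := reapUnusedClaims(); err != nil {
		fmt.Println("initial reap failed:", err)
	}
	for {
		select {
		case <-ctx.Done():
			return
		case vol := <-changes:
			fmt.Println("volume changed, reaping claims for:", vol)
			_ = reapUnusedClaims()
		}
	}
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	cancel() // stop immediately in this toy example
	watchVolumes(ctx, func() error { return nil }, make(chan string))
}
```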
-
James Rasell authored
volumewatcher: fix test data race.
-
- 18 Jan, 2022 5 commits
-
-
Nomad Release Bot authored
-
Nomad Release bot authored
-
Luiz Aoqui authored
-
Luiz Aoqui authored
-
Michael Schurter authored
Fix Node.Copy()
-
- 17 Jan, 2022 16 commits
-
-
Tim Gross authored
When we copy the system DNS to a task's `resolv.conf`, we should set the permissions as world-readable so that unprivileged users within the task can read it.
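The idea, sketched outside the actual client code (the helper name and destination path are hypothetical):

```go
package main

import (
	"os"
	"path/filepath"
)

// copyResolvConf copies the system DNS config into the task directory
// with 0644 permissions so unprivileged users inside the task can read it.
func copyResolvConf(taskDir string) error {
	data, err := os.ReadFile("/etc/resolv.conf")
	if err != nil {
		return err
	}
	dest := filepath.Join(taskDir, "resolv.conf")
	return os.WriteFile(dest, data, 0o644) // world-readable
}

func main() {
	if err := copyResolvConf(os.TempDir()); err != nil {
		panic(err)
	}
}
```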
-
Tim Gross authored
The size of `stat_t` fields is architecture dependent, which was reportedly causing a build failure on FreeBSD ARM7 32-bit systems. This changeset matches the behavior we have on Linux.
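Illustrative only (a generic Unix stat call rather than the FreeBSD code in question): the portability issue is of the kind below, where architecture-dependent field widths are converted explicitly before arithmetic.

```go
package main

import (
	"fmt"
	"syscall"
)

func main() {
	var st syscall.Stat_t
	if err := syscall.Stat("/tmp", &st); err != nil {
		panic(err)
	}
	// Blocks and Blksize have architecture-dependent widths on some
	// platforms, so convert explicitly to int64 before multiplying.
	allocated := int64(st.Blocks) * int64(st.Blksize)
	fmt.Println("allocated bytes:", allocated)
}
```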
-
Tim Gross authored
When the `volume deregister` or `volume detach` commands get an ID prefix that matches multiple volumes, show the full length of the volume IDs in the list of volumes shown, so that the user can select the correct one.
-
Tim Gross authored
The command line client sends a specific volume ID, but this isn't enforced at the API level and we were incorrectly using a prefix match for volume deregistration, resulting in cases where a volume with a shorter ID that's a prefix of another volume would be deregistered instead of the intended volume.
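A minimal sketch of the behavior after the fix, with hypothetical helper names: destructive operations require an exact ID match, so a short ID that happens to be a prefix of another volume cannot deregister the wrong one.

```go
package main

import (
	"errors"
	"fmt"
)

// deregisterVolume only acts on an exact ID match; prefix matching is
// left to read-only lookups where the user can confirm the result.
func deregisterVolume(id string, existing map[string]bool) error {
	if !existing[id] {
		return errors.New("no volume with exact ID " + id)
	}
	delete(existing, id)
	return nil
}

func main() {
	vols := map[string]bool{"data": true, "data-archive": true}
	// "data" removes only the volume literally named "data", never
	// "data-archive", even though "data" is a prefix of it.
	if err := deregisterVolume("data", vols); err != nil {
		fmt.Println(err)
	}
	fmt.Println("remaining volumes:", len(vols))
}
```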
-
grembo authored
Templates in nomad jobs make use of the vault token defined in the vault stanza when issuing credentials like client certificates. When using change_mode "noop" in the vault stanza, consul-template is not informed in case a vault token is re-issued (which can happen from time to time for various reasons, as described in https://www.nomadproject.io/docs/job-specification/vault). As a result, consul-template will keep using the old vault token to renew credentials and, once the token has expired, stop renewing credentials. The symptom of this problem is a vault_token file that is newer than the issued credential (e.g., TLS certificate) in a job's /secrets directory.

This change corrects this, so that h.updater.updatedVaultToken(token) is called, which will inform stakeholders about the new token and make sure the new token is used by consul-template.

Example job template fragment:

    vault {
      policies    = ["nomad-job-policy"]
      change_mode = "noop"
    }

    template {
      data = <<-EOH
      {{ with secret "pki_int/issue/nomad-job"
        "common_name=myjob.service.consul" "ttl=90m"
        "alt_names=localhost" "ip_sans=127.0.0.1"}}
      {{ .Data.certificate }}
      {{ .Data.private_key }}
      {{ .Data.issuing_ca }}
      {{ end }}
      EOH
      destination = "${NOMAD_SECRETS_DIR}/myjob.crt"
      change_mode = "noop"
    }

This fix does not alter the meaning of the three change modes of vault:

- "noop" - Take no action
- "restart" - Restart the job
- "signal" - Send a signal to the task

as the switch statement following line 232 contains the necessary logic. It is assumed that "take no action" was never meant to mean "don't tell consul-template about the new vault token".

Successfully tested in a staging cluster consisting of multiple nomad client nodes.
-
Tim Gross authored
The task runner prestart hooks take a `joincontext` so they have the option to exit early if either of two contexts are canceled: from killing the task or client shutdown. Some tasks exit without being shutdown from the server, so neither of the joined contexts ever gets canceled and we leak the `joincontext` (48 bytes) and its internal goroutine. This primarily impacts batch jobs and any task that fails or completes early such as non-sidecar prestart lifecycle tasks. Cancel the `joincontext` after the prestart call exits to fix the leak.
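A simplified sketch of the pattern using only the standard library (the real hook uses the `joincontext` package): cancel the joined context as soon as the prestart call returns, so its helper goroutine can exit instead of leaking.

```go
package main

import (
	"context"
	"fmt"
)

// runPrestart derives a context that is canceled when either killCtx or
// shutdownCtx is done. The leak fix: cancel the derived context as soon
// as prestart returns, instead of waiting for a parent to be canceled.
func runPrestart(killCtx, shutdownCtx context.Context, prestart func(context.Context) error) error {
	joined, cancel := context.WithCancel(killCtx)
	defer cancel() // releases the goroutine below even if neither parent fires

	go func() {
		select {
		case <-shutdownCtx.Done():
			cancel()
		case <-joined.Done():
		}
	}()

	return prestart(joined)
}

func main() {
	err := runPrestart(context.Background(), context.Background(),
		func(ctx context.Context) error { return nil })
	fmt.Println("prestart returned:", err)
}
```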
-
Luiz Aoqui authored
-
Michael Schurter authored
agent: validate reserved_ports are valid
-
Michael Schurter authored
deps: update go-getter to v1.5.11
-
Luiz Aoqui authored
-
Tim Gross authored
When a cluster doesn't have a leader, the `nomad operator debug` command can safely use stale queries to gracefully degrade the consistency of almost all its queries. The query parameter for these API calls was not being set by the command. Some `api` package queries do not include `QueryOptions` because they target a specific agent, but they can potentially be forwarded to other agents. If there is no leader, these forwarded queries will fail. Provide methods to call these APIs with `QueryOptions`.
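As an example of the pattern (a sketch, not the debug command's actual wiring), a caller of the `api` package can degrade to stale reads like this:

```go
package main

import (
	"fmt"

	"github.com/hashicorp/nomad/api"
)

func main() {
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		panic(err)
	}

	// AllowStale lets any server answer from its local state, so the
	// query still succeeds while the cluster has no leader.
	q := &api.QueryOptions{AllowStale: true}

	nodes, _, err := client.Nodes().List(q)
	if err != nil {
		panic(err)
	}
	fmt.Println("nodes:", len(nodes))
}
```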
-
dependabot[bot] authored
* build(deps): bump github.com/hashicorp/cronexpr in /api

  Bumps [github.com/hashicorp/cronexpr](https://github.com/hashicorp/cronexpr) from 1.1.0 to 1.1.1.
  - [Release notes](https://github.com/hashicorp/cronexpr/releases)
  - [Commits](https://github.com/hashicorp/cronexpr/compare/v1.1.0...v1.1.1)

  ---
  updated-dependencies:
  - dependency-name: github.com/hashicorp/cronexpr
    dependency-type: direct:production
    update-type: version-update:semver-patch
  ...

  Signed-off-by: dependabot[bot] <support@github.com>

* go mod tidy

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Tim Gross <tim@0x74696d.com>
-
Luiz Aoqui authored
-
James Rasell authored
changelog: add entry for #11848
-
Tim Gross authored
-
Tim Gross authored
When the scheduler picks a node for each evaluation, the `LimitIterator` provides at most 2 eligible nodes for the `MaxScoreIterator` to choose from. This keeps scheduling fast while producing acceptable results because the results are binpacked. Jobs with a `spread` block (or node affinity) remove this limit in order to produce correct spread scoring. This means that every allocation within a job with a `spread` block is evaluated against _all_ eligible nodes.

Operators of large clusters have reported that jobs with `spread` blocks that are eligible on a large number of nodes can take longer than the nack timeout to evaluate (60s). Typical evaluations are processed in milliseconds.

In practice, it's not necessary to evaluate every eligible node for every allocation on large clusters, because the `RandomIterator` at the base of the scheduler stack produces enough variation in each pass that the likelihood of an uneven sprea...
-