  1. 04 Nov, 2022 2 commits
    • rpc: read node ID from allocs in UpdateAlloc · e57379c2
      Luiz Aoqui authored
      The AllocUpdateRequest struct is used in three disjoint use cases:
      
      1. Stripped allocs from clients Node.UpdateAlloc RPC using the Allocs,
         and WriteRequest fields
      2. Raft log message using the Allocs, Evals, and WriteRequest fields
      3. Plan updates using the AllocsStopped, AllocsUpdated, and Job fields
      
      Adding a new field that would only be used in one of these cases (1) made
      things more confusing and error prone. While in theory an
      AllocUpdateRequest could send allocations from different nodes, in
      practice this never actually happens since only clients call this method
      with their own allocations.
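
      Since the struct itself is not shown here, a hedged sketch of its shape,
      with field names taken from the three cases above and placeholder types
      (the real definition lives in Nomad's `nomad/structs` package):

      ```go
      package structs

      // Placeholder types so the sketch compiles on its own.
      type Allocation struct{}
      type AllocationDiff struct{}
      type Evaluation struct{}
      type Job struct{}
      type WriteRequest struct{}

      // AllocUpdateRequest as described above: three disjoint use cases
      // sharing one struct (field names follow the commit message and may
      // differ slightly from the real definition).
      type AllocUpdateRequest struct {
          // Case 1 (Node.UpdateAlloc) and case 2 (Raft log message).
          Allocs []*Allocation
          // Case 2 only.
          Evals []*Evaluation
          // Case 3 (plan updates) only.
          AllocsStopped []*AllocationDiff
          AllocsUpdated []*Allocation
          Job           *Job

          WriteRequest
      }
      ```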
    • scheduler: persist changes to reconnected allocs · cb51a281
      Luiz Aoqui authored
      Reconnected allocs have a new AllocState entry that must be persisted by
      the plan applier.
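
      A minimal sketch of what the plan applier must preserve, using
      illustrative stand-in types (the real ones live in `nomad/structs`):

      ```go
      package main

      import (
          "fmt"
          "time"
      )

      // Illustrative stand-ins for the Nomad types involved.
      type AllocState struct {
          Field string // e.g. "ClientStatus"
          Value string // e.g. "running"
          Time  time.Time
      }

      type Allocation struct {
          ClientStatus string
          AllocStates  []*AllocState
      }

      // persistAllocUpdate sketches the fix: when the plan applier copies
      // client-derived fields onto the alloc it persists, AllocStates must
      // be carried over too, or the reconnect entry is lost.
      func persistAllocUpdate(existing, updated *Allocation) *Allocation {
          out := *existing
          out.ClientStatus = updated.ClientStatus
          out.AllocStates = updated.AllocStates // keep the reconnect entry
          return &out
      }

      func main() {
          updated := &Allocation{
              ClientStatus: "running",
              AllocStates: []*AllocState{
                  {Field: "ClientStatus", Value: "running", Time: time.Now()},
              },
          }
          merged := persistAllocUpdate(&Allocation{ClientStatus: "unknown"}, updated)
          fmt.Println(merged.ClientStatus, len(merged.AllocStates)) // running 1
      }
      ```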
  2. 03 Nov, 2022 1 commit
  3. 02 Nov, 2022 6 commits
    • client: skip terminal allocations on reconnect · f5ce8a96
      Luiz Aoqui authored
      When the client reconnects with the server it synchronizes the state of
      its allocations by sending data using the `Node.UpdateAlloc` RPC and
      fetching data using the `Node.GetClientAllocs` RPC.
      
      If the data fetch happens before the data write, allocations in the
      `unknown` state will still appear as such and will trigger the
      `allocRunner.Reconnect` flow.
      
      But when the server's `DesiredStatus` for the allocation is `stop`, the
      client should not reconnect it.
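
      A hedged sketch of that client-side check, with illustrative names
      (the real status constants and the `allocRunner.Reconnect` flow live
      in Nomad's client code):

      ```go
      package main

      import "fmt"

      // Illustrative stand-ins for Nomad's alloc status values.
      const (
          AllocDesiredStatusStop    = "stop"
          AllocClientStatusComplete = "complete"
          AllocClientStatusFailed   = "failed"
      )

      type Allocation struct {
          DesiredStatus string
          ClientStatus  string
      }

      // terminal reports whether the alloc reached a state the client
      // should not resurrect.
      func (a *Allocation) terminal() bool {
          return a.DesiredStatus == AllocDesiredStatusStop ||
              a.ClientStatus == AllocClientStatusComplete ||
              a.ClientStatus == AllocClientStatusFailed
      }

      // shouldReconnect sketches the fix: an `unknown` alloc fetched from
      // the server only goes through the reconnect flow if the server
      // still wants it running.
      func shouldReconnect(serverAlloc *Allocation) bool {
          return !serverAlloc.terminal()
      }

      func main() {
          stopped := &Allocation{DesiredStatus: AllocDesiredStatusStop}
          fmt.Println(shouldReconnect(stopped)) // false: skip terminal allocs
      }
      ```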
    • code review · aa324ad0
      Luiz Aoqui authored
    • changelog: add entry for #15068 · 3e081c1f
      Luiz Aoqui authored
    • rpc: only allow alloc updates from `ready` nodes · 956a3fa9
      Luiz Aoqui authored
      Clients interact with servers using three main RPC methods:
      
        - `Node.GetAllocs` reads allocation data from the server and writes it
          to the client.
        - `Node.UpdateAlloc` reads allocation data from the client and writes
          it to the server.
        - `Node.UpdateStatus` writes the client status to the server and is
          used as the heartbeat mechanism.
      
      Clients call these three methods periodically and independently of each
      other, so no assumptions can be made about their ordering.
      
      This can generate scenarios that are hard to reason about and to code
      for. For example, when a client misses too many heartbeats it will be
      considered `down` or `disconnected` and the allocations it was running
      are set to `lost` or `unknown`.
      
      When connectivity to the rest of the cluster is restored, the natural
      mental model is that the client will heartbeat first and then update the
      status of its allocations on the servers.
      
      But since there's no inherent order between these calls, the reverse is just as
      possible: the client updates the alloc status and then heartbeats. This
      results in a state where allocs are, for example, `running` while the
      client is still `disconnected`.
      
      This commit adds a new verification to the `Node.UpdateAlloc` method to
      reject updates from nodes that are not `ready`, forcing clients to
      heartbeat first. Since the check is done server-side there is no need to
      coordinate operations client-side: clients can keep sending these requests
      independently, and the alloc update will succeed once the heartbeat has
      gone through.
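
      A minimal sketch of that server-side guard, assuming a hypothetical
      state-store lookup (names are illustrative, not Nomad's exact API):

      ```go
      package rpc

      import (
          "errors"
          "fmt"
      )

      const NodeStatusReady = "ready" // illustrative stand-in

      type Node struct {
          ID     string
          Status string
      }

      // NodeGetter is a hypothetical slice of the state-store interface.
      type NodeGetter interface {
          NodeByID(id string) (*Node, error)
      }

      // validateNodeForAllocUpdate sketches the guard added to
      // Node.UpdateAlloc: reject alloc updates unless the calling node has
      // already heartbeated its way back to ready.
      func validateNodeForAllocUpdate(state NodeGetter, nodeID string) error {
          node, err := state.NodeByID(nodeID)
          if err != nil {
              return err
          }
          if node == nil {
              return errors.New("node not found")
          }
          if node.Status != NodeStatusReady {
              return fmt.Errorf("node %s is %s, not %s", node.ID, node.Status, NodeStatusReady)
          }
          return nil
      }
      ```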
    • scheduler: prevent spurious placement on reconnect · 720513f0
      Luiz Aoqui authored
      When a client reconnects it makes two independent RPC calls:
      
        - `Node.UpdateStatus` to heartbeat and set its status as `ready`.
        - `Node.UpdateAlloc` to update the status of its allocations.
      
      These two calls can happen in any order, and if the allocations are
      updated before the heartbeat the state looks the same as a node being
      disconnected: the node status will still be `disconnected` while the
      allocation `ClientStatus` is set to `running`.
      
      The existing implementation did not handle this order of events properly:
      the scheduler would create an unnecessary placement because it treated the
      allocation as disconnecting. This extra allocation would then be quickly
      stopped by the heartbeat eval.
      
      This commit adds a new code path to handle this order of events. If the
      node is `disconnected` and the allocation `ClientStatus` is `running`,
      the scheduler checks whether the allocation actually reconnected by
      inspecting its `AllocState` events.
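
      A hedged sketch of the decision, with the reconnect detection reduced
      to a boolean (how it is derived from `AllocState` is sketched under
      the next commit):

      ```go
      package scheduler

      // Illustrative stand-in; the real reconciler lives in Nomad's
      // scheduler package.
      type allocInfo struct {
          ClientStatus string
          NodeStatus   string // status of the node the alloc is placed on
          Reconnected  bool   // derived from the alloc's AllocState events
      }

      // needsReplacement sketches the new code path: previously a running
      // alloc on a disconnected node always looked like a fresh disconnect,
      // so the scheduler placed a replacement that the heartbeat eval then
      // had to stop.
      func needsReplacement(a allocInfo) bool {
          if a.NodeStatus != "disconnected" {
              return false
          }
          if a.ClientStatus == "running" && a.Reconnected {
              // The alloc already reconnected; the node's heartbeat just
              // hasn't landed yet. No replacement needed.
              return false
          }
          return true
      }
      ```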
    • scheduler: allow updates after alloc reconnects · 882904bf
      Luiz Aoqui authored
      When an allocation reconnects to a cluster the scheduler needs to run
      special logic to handle the reconnection, check if a replacement was
      created, and stop one of them.
      
      If the allocation kept running while the node was disconnected, it will
      be reconnected with `ClientStatus: running` and the node will have
      `Status: ready`. This combination is indistinguishable from the normal
      steady state of an allocation, where everything is running as expected.
      
      In order to differentiate between the two states (an allocation that is
      reconnecting and one that is just running) the scheduler needs an extra
      piece of state.
      
      The current implementation uses the presence of a
      `TaskClientReconnected` task event to detect when the allocation has
      reconnected and thus must go through the reconnection process. But this
      event remains even after the allocation is reconnected, causing all
      future evals to consider the allocation as still reconnecting.
      
      This commit changes the reconnect logic to use an `AllocState` to
      register when the allocation was reconnected. This provides the
      following benefits:
      
        - Only a limited number of task events are kept, and they are used for
          many other purposes. It's possible that, upon reconnecting, several
          actions are triggered that could cause the `TaskClientReconnected`
          event to be dropped.
        - Task events are set by clients and so their timestamps are subject
          to time skew from servers. This prevents using time to determine if
          an allocation reconnected after a disconnect event.
        - Disconnect events are already stored as `AllocState` and so storing
          reconnects there as well makes it the only source of information
          required.
      
      With this change, the reconnect logic is only triggered if the last
      `AllocState` entry is a disconnect event, meaning that the allocation has
      not reconnected yet. Once the reconnection is handled, the new
      `ClientStatus` is stored in `AllocState`, allowing future evals to skip
      the reconnect logic.
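
      A minimal sketch of both sides of that logic, using illustrative
      stand-in types (the real `AllocState` handling lives in
      `nomad/structs`):

      ```go
      package structs

      import "time"

      // Illustrative stand-ins for the real types.
      type AllocState struct {
          Field string // e.g. "ClientStatus"
          Value string // e.g. "unknown", "running"
          Time  time.Time
      }

      type Allocation struct {
          ClientStatus string
          AllocStates  []*AllocState
      }

      // NeedsReconnect sketches the new check: the reconnect logic only
      // runs if the last recorded client-status transition is the
      // disconnect (unknown), i.e. nothing has been stored since.
      func (a *Allocation) NeedsReconnect() bool {
          for i := len(a.AllocStates) - 1; i >= 0; i-- {
              if s := a.AllocStates[i]; s.Field == "ClientStatus" {
                  return s.Value == "unknown"
              }
          }
          return false
      }

      // Reconnect sketches the bookkeeping once the reconnection has been
      // handled: storing the new ClientStatus in AllocState lets future
      // evals skip the reconnect logic for this alloc.
      func (a *Allocation) Reconnect() {
          a.AllocStates = append(a.AllocStates, &AllocState{
              Field: "ClientStatus",
              Value: a.ClientStatus,
              Time:  time.Now().UTC(),
          })
      }
      ```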
  4. 26 Oct, 2022 1 commit
  5. 25 Oct, 2022 1 commit
  6. 24 Oct, 2022 7 commits
  7. 21 Oct, 2022 6 commits
    • ci: use the same go mod cache across test-core jobs (#15006) · dbd742d8
      Seth Hoenig authored
      * ci: use the same go mod cache for test-core jobs
      
      * ci: precache go modules
      
      * ci: add a mods precache job
    • keyring: fixes for keyring replication on cluster join (#14987) · 5732eb2c
      Tim Gross authored
      * keyring: don't unblock early if rate limit burst exceeded
      
      The rate limiter returns an error and unblocks early if its burst limit is
      exceeded (unless the burst limit is Inf). Ensure we're not unblocking early,
      otherwise we'll only slow down the cases where we're already pausing to make
      external RPC requests.
      
      * keyring: set MinQueryIndex on stale queries
      
      When keyring replication makes a stale query to non-leader peers to find a key
      the leader doesn't have, we need to make sure the peer we're querying has had a
      chance to catch up to the most current index for that key. Otherwise it's
      possible for newly-added servers to query another newly-added server and get a
      non-error nil response for that key ID.
      
      Ensure that we're setting the correct reply index in the blocking query.
      
      Note that the "not found" case does not return an error, just an empty key. So
      as a belt-and-suspenders, update the handling of empty responses so tha...
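
      The burst pitfall from the first fix is easy to reproduce with
      `golang.org/x/time/rate`; a minimal sketch (the limiter configuration
      in Nomad's keyring replicator may differ):

      ```go
      package main

      import (
          "context"
          "fmt"
          "time"

          "golang.org/x/time/rate"
      )

      func main() {
          // With a finite limit and a burst of 0, Wait does not throttle:
          // it returns an error immediately, so the replication loop it
          // was meant to slow down runs unthrottled.
          bad := rate.NewLimiter(rate.Every(100*time.Millisecond), 0)
          fmt.Println(bad.Wait(context.Background())) // error, no blocking

          // A burst of at least 1 makes Wait block between calls as
          // intended instead of unblocking early.
          good := rate.NewLimiter(rate.Every(100*time.Millisecond), 1)
          fmt.Println(good.Wait(context.Background())) // <nil>
      }
      ```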
    • test: use port collision instead of cpu exhaustion (#14994) · 5ed74049
      Michael Schurter authored
      Originally this test relied on Job 1 blocking Job 2 until Job 1 had a
      terminal *ClientStatus.* Job 2 ensured it would get blocked using 2
      mechanisms:
      
      1. A constraint requiring it to be placed on the same node as Job 1.
      2. Job 2 would require all unreserved CPU on the node to ensure it would
         be blocked until Job 1's resources were free.
      
      That 2nd assertion breaks if *any previous job is still running on the
      target node!* That seems very likely to happen in the flaky world of our
      e2e tests. In fact there may be some jobs we intentionally want running
      throughout; in hindsight it was never safe to assume my test would be
      the only thing scheduled when it ran.
      
      *Ports to the rescue!* Reserving a static port means that Job 2 will now
      block on Job 1 being terminal. It will only conflict with other tests if
      those tests use that port *on every node.* I ensured no existing tests
      were using the port I chose.
      
      Other changes:
      - Gave job a bit more breathing room resource-wise.
      - Tightened timings a bit since previous failure ran into the `go test`
        time limit.
      - Cleaned up the DumpEvals output. It's quite nice and handy now!
    • docs: use of `node_class` when autoscaling (#14950) · f2318ed2
      Luiz Aoqui authored
      Document how the value of `node_class` is used during cluster scaling.
      
      https://github.com/hashicorp/nomad-autoscaler/issues/255
    • ci: use gotestsum for CI tests (#14995) · b52d40d4
      Seth Hoenig authored
      Use gotestsum in both GHA and Circle with retries enabled.
    • acl: allow tokens to read policies linked via roles to the token. (#14982) · fbe9f590
      James Rasell authored
      ACL tokens are granted permissions either by direct policy links or via
      ACL role links. Callers should therefore be able to read policies that are
      assigned to their token directly or linked indirectly through ACL roles.
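
      A hedged sketch of the resulting policy resolution, using illustrative
      stand-ins for Nomad's ACL token and role shapes:

      ```go
      package acl

      // Illustrative stand-ins for the real ACL types.
      type ACLRoleLink struct{ ID string }

      type ACLToken struct {
          Policies []string       // direct policy links
          Roles    []*ACLRoleLink // indirect links via ACL roles
      }

      type ACLRole struct {
          ID       string
          Policies []string
      }

      // readablePolicies sketches the change: a token may read any policy
      // it is linked to, whether directly or through one of its roles.
      func readablePolicies(token *ACLToken, rolesByID map[string]*ACLRole) map[string]bool {
          allowed := make(map[string]bool, len(token.Policies))
          for _, p := range token.Policies {
              allowed[p] = true
          }
          for _, link := range token.Roles {
              if role, ok := rolesByID[link.ID]; ok {
                  for _, p := range role.Policies {
                      allowed[p] = true
                  }
              }
          }
          return allowed
      }
      ```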
  8. 20 Oct, 2022 7 commits
  9. 19 Oct, 2022 9 commits