Commits · 8ac47f5b20f879f48143a4ba94a39a405d728c65 · 小白蛋 / Nomad

This project is mirrored from https://gitee.com/mirrors/nomad.git. Pull mirroring failed 2 years ago.

23 Feb, 2022 1 commit
- backport of commit 848c4b0c97f7691eb7f02b13ec4b9209dd34fb7b · 889601b3
  Tim Gross authored 3 years ago
  
  889601b3
08 Feb, 2022 2 commits
- backport of commit 6e545d4f240a6de117aeae8de01c67ff7df52414 · 33176d07
  Tim Gross authored 3 years ago
  
  33176d07
- backport of commit 84810dced9727ff435f3e18b53a30eee7ae69d71 · 83b98186
  Tim Gross authored 3 years ago
  
  83b98186
28 Jan, 2022 3 commits

CSI: move terminal alloc handling into denormalization (#11931) · 2c6de3e8

Tim Gross authored 3 years ago

* The volume claim GC method and volumewatcher both have logic
collecting terminal allocations that duplicates most of the logic
that's now in the state store's `CSIVolumeDenormalize` method. Copy
this logic into the state store so that all code paths have the same
view of the past claims.
* Remove logic in the volume claim GC that now lives in the state
store's `CSIVolumeDenormalize` method.
* Remove logic in the volumewatcher that now lives in the state
store's `CSIVolumeDenormalize` method.
* Remove logic in the node unpublish RPC that now lives in the state
store's `CSIVolumeDenormalize` method.

2c6de3e8

csi: ensure that PastClaims are populated with correct mode (#11932) · 26b50083

Tim Gross authored 3 years ago

In the client's `(*csiHook) Postrun()` method, we make an unpublish
RPC that includes a claim in the `CSIVolumeClaimStateUnpublishing`
state and uses the mode from the client. But then in the
`(*CSIVolume) Unpublish` RPC handler, we query the volume from the
state store (because we only get an ID from the client). And when we
make the client RPC for the node unpublish step, we use the _current
volume's_ view of the mode. If the volume's mode has been changed
before the old allocations can have their claims released, then we end
up making a CSI RPC that will never succeed.

Why does this code path get the mode from the volume and not the
claim? Because the claim written by the GC job in `(*CoreScheduler)
csiVolumeClaimGC` doesn't have a mode. Instead it just writes a claim
in the unpublishing state to ensure the volumewatcher detects a "past
claim" change and reaps all the claims on the volumes.

Fix this by ensuring that `CSIVolumeDenormalize` creates past
claims for all nil allocations with a correct access mode set.
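
As a rough illustration of the fix described above, the sketch below turns every claim whose allocation no longer exists into a past claim that keeps the claim's own access mode. The `Claim` and `Volume` types, the `denormalize` function, and the `"unpublishing"` state string are simplified, hypothetical stand-ins, not Nomad's actual `CSIVolumeDenormalize` code.

```go
package main

import "fmt"

// Claim and Volume are hypothetical, reduced types for illustration only.
type Claim struct {
	AllocID    string
	AccessMode string
	State      string
}

type Volume struct {
	ReadClaims map[string]*Claim // keyed by allocation ID
	PastClaims map[string]*Claim
}

// denormalize registers a past claim for every claim whose allocation is
// missing from allocs, copying the access mode from the original claim.
func denormalize(vol *Volume, allocs map[string]bool) {
	for id, claim := range vol.ReadClaims {
		if allocs[id] {
			continue // allocation still exists
		}
		if _, ok := vol.PastClaims[id]; ok {
			continue // already tracked as a past claim
		}
		vol.PastClaims[id] = &Claim{
			AllocID:    id,
			AccessMode: claim.AccessMode,
			State:      "unpublishing", // assumed state name
		}
	}
}

func main() {
	vol := &Volume{
		ReadClaims: map[string]*Claim{
			"alloc-1": {AllocID: "alloc-1", AccessMode: "reader"},
			"alloc-2": {AllocID: "alloc-2", AccessMode: "reader"},
		},
		PastClaims: map[string]*Claim{},
	}
	// Only alloc-1 still exists, so alloc-2 becomes a past claim.
	denormalize(vol, map[string]bool{"alloc-1": true})
	fmt.Println(len(vol.PastClaims), vol.PastClaims["alloc-2"].AccessMode)
}
```

Because the past claim carries its own mode, a later unpublish step would not need to consult the volume's current mode.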

26b50083

CSI: resolve invalid claim states (#11890) · 6e0119de

Tim Gross authored 3 years ago

* csi: resolve invalid claim states on read

It's currently possible for CSI volumes to be claimed by allocations
that no longer exist. This changeset asserts a reasonable state at
the state store level by registering these nil allocations as "past
claims" on any read. This will cause any pass through the periodic GC
or volumewatcher to trigger the unpublishing workflow for those claims.

* csi: make feasibility check errors more understandable

When the feasibility checker finds we have no free write claims, it
checks to see if any of those claims are for the job we're currently
scheduling (so that earlier versions of a job can't block claims for
new versions) and reports a conflict if the volume can't be scheduled
so that the user can fix their claims. But when the checker hits a
claim that has a GCd allocation, the state is recoverable by the
server once claim reaping completes and no user intervention is
required; the blocked eval should complete. Differentiate the
scheduler error produced by these two conditions.

6e0119de

18 Jan, 2022 1 commit

csi: volume deregistration should require exact ID (#11852) · cd0139d1

Tim Gross authored 3 years ago

The command line client sends a specific volume ID, but this isn't
enforced at the API level and we were incorrectly using a prefix match
for volume deregistration, resulting in cases where a volume with a
shorter ID that's a prefix of another volume would be deregistered
instead of the intended volume.

cd0139d1

07 May, 2021 1 commit
- Node Drain Metadata (#10250) · 140e7b3a
  Chris Baker authored 4 years ago
  
  140e7b3a
07 Apr, 2021 1 commit

CSI: use AccessMode/AttachmentMode from CSIVolumeClaim · a37af310

Tim Gross authored 4 years ago

Registration of Nomad volumes previously allowed for a single volume
capability (access mode + attachment mode pair). The recent `volume create`
command requires that we pass a list of requested capabilities, but the
existing workflow for claiming volumes and attaching them on the client
assumed that the volume's single capability was correct and unchanging.

Add `AccessMode` and `AttachmentMode` to `CSIVolumeClaim`, use these fields to
set the initial claim value, and add backwards compatibility logic to handle
the existing volumes that already have claims without these fields.

a37af310

21 Mar, 2021 1 commit

removed deprecated fields from Drain structs and API · 93d5187e

Chris Baker authored 4 years ago

node drain: use msgtype on txn so that events are emitted
wip: encoding extension to add Node.Drain field back to API responses

new approach for hiding Node.SecretID in the API, using `json` tag
documented this approach in the contributing guide
refactored the JSON handlers with extensions
modified event stream encoding to use the go-msgpack encoders with the extensions
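
One common way to hide a field with a `json` tag in Go is the `json:"-"` form, which makes `encoding/json` skip the field. The sketch below assumes that form; the reduced `Node` struct and the `encode` helper are hypothetical and are not taken from the commit.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Node is a hypothetical, reduced struct. The `json:"-"` tag tells
// encoding/json to omit SecretID from the encoded output.
type Node struct {
	ID       string
	SecretID string `json:"-"`
}

// encode returns the JSON form of n as a string.
func encode(n Node) (string, error) {
	b, err := json.Marshal(n)
	return string(b), err
}

func main() {
	out, err := encode(Node{ID: "node-1", SecretID: "s3cr3t"})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // {"ID":"node-1"}
}
```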

93d5187e

18 Mar, 2021 2 commits

CSI: unique volume per allocation · 7c756967

Tim Gross authored 4 years ago

Add a `PerAlloc` field to volume requests that directs the scheduler to test
feasibility for volumes with a source ID that includes the allocation index
suffix (ex. `[0]`), rather than the exact source ID.

Read the `PerAlloc` field when making the volume claim at the client to
determine if the allocation index suffix (ex. `[0]`) should be added to the
volume source ID.
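
A minimal sketch of the naming rule described above, assuming the suffix is simply the allocation index in square brackets appended to the source ID. The `volumeSourceID` helper is hypothetical and not a Nomad function.

```go
package main

import "fmt"

// volumeSourceID is a hypothetical helper: when perAlloc is set, the
// allocation index is appended to the source ID as a "[n]" suffix;
// otherwise the source ID is used exactly as given.
func volumeSourceID(source string, perAlloc bool, allocIndex int) string {
	if !perAlloc {
		return source
	}
	return fmt.Sprintf("%s[%d]", source, allocIndex)
}

func main() {
	fmt.Println(volumeSourceID("data", false, 0)) // data
	fmt.Println(volumeSourceID("data", true, 0))  // data[0]
	fmt.Println(volumeSourceID("data", true, 2))  // data[2]
}
```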

7c756967

CSI: remove prefix matching from CSIVolumeByID and fix CLI prefix matching (#10158) · a1eaad9c

Tim Gross authored 4 years ago

Callers of `CSIVolumeByID` are generally assuming they should receive a single
volume. This potentially results in feasibility checking being performed
against the wrong volume if a volume's ID is a prefix of another
volume's ID (for example: "test" and "testing").

Removing the incorrect prefix matching from `CSIVolumeByID` breaks prefix
matching in the command line client. Add the required elements for prefix
matching to the commands and API.

a1eaad9c

16 Mar, 2021 1 commit

Fixup uses of `sanity` (#10187) · d914990e

Charlie Voiselle authored 4 years ago

* Fixup uses of `sanity`
* Remove unnecessary comments.

These checks are better explained by earlier comments about
the context of the test. Per @tgross, moved the tests together
to better reinforce the overall shared context.

* Update nomad/fsm_test.go

d914990e

10 Mar, 2021 2 commits

RPC endpoints to support 'nomad ui -login' · a12f4470

Tim Gross authored 4 years ago

RPC endpoints for the user-driven APIs (`UpsertOneTimeToken` and
`ExchangeOneTimeToken`) and token expiration (`ExpireOneTimeTokens`).
Includes adding expiration to the periodic core GC job.

a12f4470

state store updates for one-time tokens · b4a516be

Tim Gross authored 4 years ago

The `OneTimeToken` struct is to support the `nomad ui -login` command. This
changeset adds the struct to the Nomad state store.

b4a516be

22 Feb, 2021 1 commit

deploymentwatcher: reset progress deadline on promotion (#10042) · 174c206b

Tim Gross authored 4 years ago

In a deployment with two groups (ex. A and B), if group A's canary becomes
healthy before group B's, the deadline for the overall deployment will be set
to that of group A. When the deployment is promoted, if group A is done it
will not contribute to the next deadline cutoff. Group B's old deadline will
be used instead, which will be in the past and immediately trigger a
deployment progress failure. Reset the progress deadline when the job is
promoted to avoid this bug, and to better conform with implicit user
expectations around how the progress deadline should interact with promotions.
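
The sketch below illustrates the idea of resetting the cutoff at promotion time so that a stale deadline from before the promotion cannot fail the deployment. The `groupState` type and `resetOnPromotion` function are hypothetical simplifications and do not mirror the deploymentwatcher's real data structures.

```go
package main

import (
	"fmt"
	"time"
)

// groupState is a hypothetical, pared-down view of one task group's
// deployment state.
type groupState struct {
	ProgressDeadline  time.Duration // configured progress window
	RequireProgressBy time.Time     // absolute cutoff derived from it
}

// resetOnPromotion moves every group's cutoff forward to now plus its
// configured window.
func resetOnPromotion(groups map[string]*groupState, now time.Time) {
	for _, g := range groups {
		g.RequireProgressBy = now.Add(g.ProgressDeadline)
	}
}

func main() {
	now := time.Now()
	groups := map[string]*groupState{
		"A": {ProgressDeadline: 10 * time.Minute, RequireProgressBy: now.Add(5 * time.Minute)},
		// Group B's old cutoff is already in the past.
		"B": {ProgressDeadline: 10 * time.Minute, RequireProgressBy: now.Add(-time.Minute)},
	}
	resetOnPromotion(groups, now)
	fmt.Println(groups["B"].RequireProgressBy.After(now)) // true
}
```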

174c206b

25 Jan, 2021 1 commit
- ignore setting job summary when oldstatus == newstatus (#9884) · 85129bb7
  Drew Bailey authored 4 years ago
  
  85129bb7
22 Jan, 2021 1 commit

prevent double job status update (#9768) · 3cb11326

Drew Bailey authored 4 years ago

* Prevent Job Statuses from being calculated twice

https://github.com/hashicorp/nomad/pull/8435 introduced atomic eval
insertion with job (de-)registration. This change removes a now-obsolete
guard which checked if the index was equal to the job.CreateIndex, which
would empty the status. Now that the job registration eval insertion is
atomic with the registration, this check is no longer necessary to set
the job statuses correctly.

* test to ensure only single job event for job register

* periodic e2e

* separate job update summary step

* fix updatejobstability to use copy instead of modified reference of job

* update envoygatewaybindaddresses copy to prevent job diff on null vs empty

* set ConsulGatewayBindAddress to empty map instead of nil

fix nil assertions for empty map

rm unnecessary guard

3cb11326

11 Dec, 2020 1 commit

Events/acl events (#9595) · 3e793ea3

Drew Bailey authored 4 years ago

* fix acl event creation

* allow way to access secretID without exposing it to stream

test that values are omitted

test event creation

test acl events

payloads are pointers

fix failing tests, do all security steps inside constructor

* increase time

* ignore empty tokens

* uncomment line

* changelog

3e793ea3

08 Dec, 2020 1 commit
- Add gocritic to golangci-lint config (#9556) · 071f4c75
  Kris Hicks authored 4 years ago
  
  071f4c75
01 Dec, 2020 3 commits

pass in msgType for UpsertJob (#9475) · 246855c0
Drew Bailey authored 4 years ago

246855c0

Event Stream: Track ACL changes, unsubscribe on invalidating changes (#9447) · 61ce7432

Drew Bailey authored 4 years ago


* upsertaclpolicies

* delete acl policies msgtype

* upsert acl policies msgtype

* delete acl tokens msgtype

* acl bootstrap msgtype

wip unsubscribe on token delete

test that subscriptions are closed after an ACL token has been deleted

Start writing policyupdated test

* update test to use before/after policy

* add SubscribeWithACLCheck to run acl checks on subscribe

* update rpc endpoint to use broker acl check

* Add and use subscriptions.closeSubscriptionFunc

This fixes the issue of not being able to defer unlocking the mutex on
the event broker in the for loop.

handle acl policy updates

* rpc endpoint test for terminating acl change

* add comments
Co-authored-by: Kris Hicks <khicks@hashicorp.com>

61ce7432

return potential errors from txn.Commit (#9483) · d9257f73
Drew Bailey authored 4 years ago

d9257f73

30 Nov, 2020 1 commit

Remove Managed Sinks from Nomad (#9470) · bf225f71

Drew Bailey authored 4 years ago

* Remove Managed Sinks from Nomad

Managed Sinks were a beta feature in Nomad 1.0-beta2. During the beta
period it was determined that this was not a scalable approach to
support community and third party sinks.

* update comment

* changelog

bf225f71

25 Nov, 2020 1 commit

CSI: fix transaction handling in state store (#9438) · c2aaa517

Tim Gross authored 4 years ago

When making updates to CSI plugins, the state store methods that have open
write transactions were querying the state store using the same methods used
by the CSI RPC endpoint, but these methods create their own top-level read
transactions. During concurrent plugin updates (as happens when a plugin job
is stopped), this can cause write skew in the plugin counts.

* Refactor the CSIPlugin query methods to have an implementation method that
accepts a transaction, which can be called with either a read txn or a write
txn.
* Refactor the CSIVolume query methods to have an implementation method that
accepts a transaction, which can be called with either a read txn or a write
txn.
* CSI volumes need to be "denormalized" with their plugins and (optionally)
allocations. Read-only RPC endpoints should take a snapshot so that we can
make multiple state store method calls with a consistent view.
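
The refactoring pattern in the bullets above can be sketched as a public query method that opens its own transaction and delegates to an implementation method that accepts one. The `Store` and `Txn` types below are toy stand-ins invented for this example (a copy-on-open map, not go-memdb), and the method names are hypothetical.

```go
package main

import "fmt"

// Store and Txn are toy stand-ins for a state store and its transactions.
type Store struct {
	committed map[string]int // plugin ID -> count
}

type Txn struct {
	view  map[string]int
	store *Store
}

// Txn opens a transaction over a private copy of the committed data.
func (s *Store) Txn() *Txn {
	view := make(map[string]int, len(s.committed))
	for k, v := range s.committed {
		view[k] = v
	}
	return &Txn{view: view, store: s}
}

// Commit publishes the transaction's view.
func (t *Txn) Commit() { t.store.committed = t.view }

// PluginCount is the public query: it opens its own transaction, so it
// cannot see another transaction's uncommitted writes.
func (s *Store) PluginCount(id string) int {
	return s.pluginCountTxn(s.Txn(), id)
}

// pluginCountTxn is the implementation method: it reads through whatever
// transaction it is given, read or write.
func (s *Store) pluginCountTxn(txn *Txn, id string) int {
	return txn.view[id]
}

func main() {
	s := &Store{committed: map[string]int{"ebs": 1}}
	txn := s.Txn()
	txn.view["ebs"]++ // uncommitted write

	fmt.Println(s.PluginCount("ebs"))         // 1: separate transaction, stale view
	fmt.Println(s.pluginCountTxn(txn, "ebs")) // 2: same transaction as the write
}
```

A writer that calls the public method mid-update bases its next write on the stale value, which is the kind of skew the commit describes.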

c2aaa517

18 Nov, 2020 1 commit

CSI: fix struct copying errors (#9239) · 71a378e6

Tim Gross authored 4 years ago

The CSIVolume struct "denormalizes" allocations when it's first queried from
the state store. The CSIVolumeByID method on the state store copies the volume
before denormalizing so that we don't end up with unexpected changes. The
copying has some subtle bugs that meant that Allocations (as well as
Topologies and MountOptions) were not getting copied when expected.

Also, ensure we never write allocations attached to volumes to the state store
during claims.
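
The commit does not spell out the exact copying bugs, so the sketch below only illustrates the general class of problem: a struct copy in Go still shares its maps and slices with the original unless they are duplicated explicitly. The reduced `Volume` type and both copy methods are hypothetical.

```go
package main

import "fmt"

// Volume is a hypothetical, reduced version of a CSI volume.
type Volume struct {
	ID           string
	MountOptions []string
	Allocations  map[string]string // alloc ID -> status
}

// ShallowCopy copies the struct value but shares the map and slice.
func (v *Volume) ShallowCopy() *Volume {
	out := *v
	return &out
}

// Copy also duplicates the slice and map, so the copy can be modified
// without touching the original.
func (v *Volume) Copy() *Volume {
	out := *v
	out.MountOptions = append([]string(nil), v.MountOptions...)
	out.Allocations = make(map[string]string, len(v.Allocations))
	for id, status := range v.Allocations {
		out.Allocations[id] = status
	}
	return &out
}

func main() {
	stored := &Volume{ID: "vol", Allocations: map[string]string{}}
	stored.ShallowCopy().Allocations["a1"] = "running"
	fmt.Println(len(stored.Allocations)) // 1: the stored volume changed

	stored = &Volume{ID: "vol", Allocations: map[string]string{}}
	stored.Copy().Allocations["a1"] = "running"
	fmt.Println(len(stored.Allocations)) // 0: the stored volume is untouched
}
```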

71a378e6

11 Nov, 2020 1 commit

csi: Postrun hook should not change mode (#9323) · 0ed0b945

Tim Gross authored 4 years ago

The unpublish workflow requires that we know the mode (RW vs RO) if we want to
unpublish the node. Update the hook and the Unpublish RPC so that we mark the
claim for release in a new state but leave the mode alone. This fixes a bug
where RO claims were failing node unpublish.

The core job GC doesn't know the mode, but we don't need it for that workflow,
so add a mode specifically for GC; the volumewatcher uses this as a sentinel
to check whether claims (with their specific RW vs RO modes) need to be claimed.

0ed0b945

10 Nov, 2020 1 commit
- fix #9227: use both job and type query on scaling policy list endpoint · ece8cde7
  Chris Baker authored 4 years ago
  
  ece8cde7
05 Nov, 2020 2 commits
- events: Use single eventsFromChanges func (#9281) · f2d2669f
  Kris Hicks authored 4 years ago
  
  f2d2669f
- updated Allocation.List to properly handle ACL checking for namespace=* · 4aeb8900
  Chris Baker authored 4 years ago
  
  4aeb8900
28 Oct, 2020 1 commit

added new policy capabilities for recommendations API · 9e2eadc7

Chris Baker authored 4 years ago

state store: call-out to generic update of job recommendations from job update method
recommendations API work, and http endpoint errors for OSS
support for scaling policies in task block of job spec
add query filters for ScalingPolicy list endpoint
command: nomad scaling policy list: added -job and -type

9e2eadc7

26 Oct, 2020 1 commit

Send events to EventSinks (#9171) · da45c959

Drew Bailey authored 4 years ago

* Process to send events to configured sinks

This PR adds a SinkManager to a server which is responsible for managing
managed sinks. Managed sinks subscribe to the event broker and send
events to a sink writer (webhook). When changes to the eventstore are
made the sinkmanager and managed sink are responsible for reloading or
starting a new managed sink.

* periodically check in sink progress to raft

Save progress on the last successfully sent index to raft. This allows a
managed sink to resume close to where it left off in the event of a lost
server or leadership change

dereference eventsink so we can accurately use the watchch

When using a pointer to eventsink struct it was updated immediately and our reload logic would not trigger

da45c959

23 Oct, 2020 1 commit

event sink crud operation api (#9155) · fbb199d4

Drew Bailey authored 4 years ago

* network sink rpc/api plumbing

state store methods and restore

upsert sink test

get sink

delete sink

event sink list and tests

go generate new msg types

validate sink on upsert

* go generate

fbb199d4

22 Oct, 2020 2 commits

core: open source namespaces · ecfcb002
Michael Schurter authored 4 years ago

ecfcb002

remove event durability (#9147) · 3347b40d

Drew Bailey authored 4 years ago

* remove event durability

temporarily removing go-memdb event durability until a new strategy is developed on how best to handle increased durability needs

* drop events table schema and state store methods

* fix neweventbuffer invocations

3347b40d

19 Oct, 2020 1 commit

Events/msgtype cleanup (#9117) · 7ce0b501

Drew Bailey authored 4 years ago

* use msgtype in upsert node

adds message type to signature for upsert node, update tests, remove placeholder method

* UpsertAllocs msg type test setup

* use upsertallocs with msg type in signature

update test usage of delete node

delete placeholder msgtype method

* add msgtype to upsert evals signature, update test call sites with test setup msg type

handle snapshot upsert eval outside of FSM and ignore eval event

remove placeholder upsertevalsmsgtype

handle job plan rpc and prevent event creation for plan

msgtype cleanup upsertnodeevents

updatenodedrain msgtype

msg type 0 is a node registration event, so set the default to the ignore type

* fix named import

* fix signature ordering on upsertnode to match

7ce0b501

14 Oct, 2020 4 commits

filter on additional filter keys, remove switch statement duplication · 3c15f414

Drew Bailey authored 4 years ago

properly wire up durable event count

move newline responsibility

moves newline creation from NDJson to the http handler, json stream only encodes and sends now

ignore snapshot restore if broker is disabled

enable dev mode to access event stream without acl

use mapping instead of switch

use pointers for config sizes, remove unused ttl, simplify closed conn logic

3c15f414

api: add field filters to /v1/{allocations,nodes} · a55f46e9

Michael Schurter authored 4 years ago

Fixes #9017

The ?resources=true query parameter includes resources in the object
stub listings. Specifically:

- For `/v1/nodes?resources=true` both the `NodeResources` and
  `ReservedResources` fields are included.
- For `/v1/allocations?resources=true` the `AllocatedResources` field is
  included.

The ?task_states=false query parameter removes TaskStates from
/v1/allocations responses. (By default TaskStates are included.)
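
A minimal sketch of how boolean query parameters with these defaults might be read on the server side (`resources` off unless requested, `task_states` on unless disabled). The `boolParam` helper is hypothetical, and falling back to the default on a malformed value is an assumption made for brevity, not documented Nomad behavior.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// boolParam is a hypothetical helper that reads a boolean query
// parameter and falls back to def when it is absent or malformed.
func boolParam(q url.Values, name string, def bool) bool {
	raw := q.Get(name)
	if raw == "" {
		return def
	}
	v, err := strconv.ParseBool(raw)
	if err != nil {
		return def // assumption: ignore malformed values
	}
	return v
}

func main() {
	u, err := url.Parse("/v1/allocations?resources=true&task_states=false")
	if err != nil {
		panic(err)
	}
	q := u.Query()
	fmt.Println(boolParam(q, "resources", false))  // true
	fmt.Println(boolParam(q, "task_states", true)) // false
}
```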

a55f46e9

handle txn returning error · 8711376e
Drew Bailey authored 4 years ago

8711376e

Add EvictCallbackFn to handle removing entries from go-memdb when they · 39ef3263

Drew Bailey authored 4 years ago

are removed from the event buffer.

Wire up event buffer size config, use pointers for structs.Events
instead of copying.

39ef3263