Commits · 79d1c11e2435b1f36984a7d9c6d1ab58894a56ea · 小白蛋 / Nomad


28 Jan, 2022 6 commits

feat: add meta evaluations · 79d1c11e
Jai Bhagat authored 3 years ago
```
To support pagination on evaluations queries.
```
79d1c11e
feat: extract status cell logic into component · 3d848654
Jai Bhagat authored 3 years ago

3d848654
fix: move evaluations template to index and inside page layout · 6ffee675
Jai Bhagat authored 3 years ago

6ffee675
chore: run prettier on gutter-menu · 2ef93947
Jai Bhagat authored 3 years ago

2ef93947
feat: add evaluations view with table · 0b70c1a4
Jai Bhagat authored 3 years ago

0b70c1a4

CSI: node unmount from the client before unpublish RPC (#11892) · 8364eda1

Tim Gross authored 3 years ago

When an allocation stops, the `csi_hook` makes an unpublish RPC to the
servers to unpublish via the CSI RPCs: first to the node plugins and
then the controller plugins. The controller RPCs must happen after the
node RPCs so that the node has had a chance to unmount the volume
before the controller tries to detach the associated device.

But the client has local access to the node plugins and can
independently determine if it's safe to send unpublish RPCs to those
plugins. This will allow the server to treat the node plugin as
abandoned if a client is disconnected and `stop_on_client_disconnect`
is set. This will let the server try to send unpublish RPCs to the
controller plugins, under the assumption that the client will be
trying to unmount the volume on its end first.

Note that the CSI `NodeUnpublishVolume`/`NodeUnstageVolume` RPCs can
return ignorable errors in the case where the volume has already been
unmounted from the node. Handle all other errors by retrying until we
get success so as to give operators the opportunity to reschedule a
failed node plugin (e.g., in the case where they accidentally drained a
node without `-ignore-system`). Fan out the work for each volume into
its own goroutine so that we can release a subset of volumes if only
one is stuck.

8364eda1

27 Jan, 2022 8 commits

Merge pull request #11942 from hashicorp/f-ui/test-tooling · f2fef6ff
Jai authored 3 years ago
```
ui: test tooling
```
f2fef6ff
Merge pull request #11951 from hashicorp/b-cgroups-broken-part1-oss · 2b93ae67
Seth Hoenig authored 3 years ago
```
client: change test to not poke cgroupv2 edge case
```
2b93ae67

CSI: move terminal alloc handling into denormalization (#11931) · 2e357163

Tim Gross authored 3 years ago

* The volume claim GC method and volumewatcher both have logic
collecting terminal allocations that duplicates most of the logic
that's now in the state store's `CSIVolumeDenormalize` method. Copy
this logic into the state store so that all code paths have the same
view of the past claims.
* Remove logic in the volume claim GC that now lives in the state
store's `CSIVolumeDenormalize` method.
* Remove logic in the volumewatcher that now lives in the state
store's `CSIVolumeDenormalize` method.
* Remove logic in the node unpublish RPC that now lives in the state
store's `CSIVolumeDenormalize` method.

2e357163

csi: ensure that PastClaims are populated with correct mode (#11932) · b588a7bd

Tim Gross authored 3 years ago

In the client's `(*csiHook) Postrun()` method, we make an unpublish
RPC that includes a claim in the `CSIVolumeClaimStateUnpublishing`
state and using the mode from the client. But then in the
`(*CSIVolume) Unpublish` RPC handler, we query the volume from the
state store (because we only get an ID from the client). And when we
make the client RPC for the node unpublish step, we use the _current
volume's_ view of the mode. If the volume's mode has been changed
before the old allocations can have their claims released, then we end
up making a CSI RPC that will never succeed.

Why does this code path get the mode from the volume and not the
claim? Because the claim written by the GC job in `(*CoreScheduler)
csiVolumeClaimGC` doesn't have a mode. Instead it just writes a claim
in the unpublishing state to ensure the volumewatcher detects a "past
claim" change and reaps all the claims on the volumes.

Fix this by ensuring that the `CSIVolumeDenormalize` creates past
claims for all nil allocations with a correct access mode set.
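
As an illustration of the fix, the sketch below creates past claims for claims whose allocation no longer exists and sets the access mode from the claim map the allocation was found in. All types and names here (`Volume`, `Claim`, `denormalize`, `allocExists`) are simplified, hypothetical stand-ins, not Nomad's actual structs or the real `CSIVolumeDenormalize` signature.

```go
package main

import "fmt"

// AccessMode and ClaimState are simplified stand-ins for illustration only.
type AccessMode string

const (
	ModeReader AccessMode = "reader"
	ModeWriter AccessMode = "writer"
)

type ClaimState int

const (
	ClaimStateTaken ClaimState = iota
	ClaimStateUnpublishing
)

type Claim struct {
	AllocID string
	Mode    AccessMode
	State   ClaimState
}

type Volume struct {
	ReadClaims  map[string]*Claim
	WriteClaims map[string]*Claim
	PastClaims  map[string]*Claim
}

// denormalize registers a past claim for every claim whose allocation is
// gone. The mode comes from the map the claim lives in, so the past claim
// never carries an empty mode or the volume's current (possibly changed) mode.
func denormalize(vol *Volume, allocExists func(allocID string) bool) {
	if vol.PastClaims == nil {
		vol.PastClaims = map[string]*Claim{}
	}
	addPast := func(claims map[string]*Claim, mode AccessMode) {
		for id := range claims {
			if allocExists(id) {
				continue
			}
			if _, ok := vol.PastClaims[id]; !ok {
				vol.PastClaims[id] = &Claim{
					AllocID: id,
					Mode:    mode,
					State:   ClaimStateUnpublishing,
				}
			}
		}
	}
	addPast(vol.ReadClaims, ModeReader)
	addPast(vol.WriteClaims, ModeWriter)
}

func main() {
	vol := &Volume{
		ReadClaims:  map[string]*Claim{"alloc-r": {AllocID: "alloc-r", Mode: ModeReader}},
		WriteClaims: map[string]*Claim{"alloc-w": {AllocID: "alloc-w", Mode: ModeWriter}},
	}
	// Pretend both allocations have been garbage collected.
	denormalize(vol, func(string) bool { return false })
	for id, c := range vol.PastClaims {
		fmt.Printf("%s mode=%s\n", id, c.Mode)
	}
}
```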

b588a7bd

CSI: resolve invalid claim states (#11890) · d0624fc0

Tim Gross authored 3 years ago

* csi: resolve invalid claim states on read

It's currently possible for CSI volumes to be claimed by allocations
that no longer exist. This changeset asserts a reasonable state at
the state store level by registering these nil allocations as "past
claims" on any read. This will cause any pass through the periodic GC
or volumewatcher to trigger the unpublishing workflow for those claims.

* csi: make feasibility check errors more understandable

When the feasibility checker finds we have no free write claims, it
checks to see if any of those claims are for the job we're currently
scheduling (so that earlier versions of a job can't block claims for
new versions) and reports a conflict if the volume can't be scheduled
so that the user can fix their claims. But when the checker hits a
claim that has a GCd allocation, the state is recoverable by the
server once claim reaping completes and no user intervention is
required; the blocked eval should complete. Differentiate the
scheduler error produced by these two conditions.
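
A rough sketch of how the two conditions could be told apart follows. The types, constants, and the `checkWriteClaims` function are hypothetical and much simpler than the scheduler's real feasibility checker; a `nil` entry in the map models a claim whose allocation has been garbage collected.

```go
package main

import "fmt"

// allocation is a simplified stand-in; nil models a GC'd allocation.
type allocation struct{ JobID string }

type claimCheck int

const (
	claimFeasible  claimCheck = iota // a claim belongs to the job being scheduled
	claimConflict                    // user must fix their claims
	claimPendingGC                   // server recovers once claim reaping completes
)

// checkWriteClaims is meant to be called only when the volume has no free
// write claims. The result does not depend on map iteration order.
func checkWriteClaims(jobID string, writeAllocs map[string]*allocation) claimCheck {
	sawGCd := false
	for _, a := range writeAllocs {
		switch {
		case a == nil:
			sawGCd = true
		case a.JobID == jobID:
			// An earlier version of the same job must not block the new one.
			return claimFeasible
		}
	}
	if sawGCd {
		return claimPendingGC
	}
	return claimConflict
}

func main() {
	fmt.Println(checkWriteClaims("web", map[string]*allocation{"a1": {JobID: "web"}}) == claimFeasible)
	fmt.Println(checkWriteClaims("web", map[string]*allocation{"a1": nil}) == claimPendingGC)
	fmt.Println(checkWriteClaims("web", map[string]*allocation{"a1": {JobID: "db"}}) == claimConflict)
}
```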

d0624fc0

client: change test to not poke cgroupv2 edge case · 87d54b8c

Seth Hoenig authored 3 years ago

This PR tweaks the TestCpusetManager_AddAlloc unit test to not break
when being run on a machine using cgroupsv2. The behavior of writing
an empty cpuset.cpu changes in cgroupv2, where such a group now inherits
the value of its parent group, rather than remaining empty.

The test in question was written such that a task would consume all available
cores shared on an alloc, causing the empty set to be written to the shared
group, which works fine on cgroupsv1 but breaks on cgroupsv2. By adjusting
the test to consume only 1 core instead of all cores, it no longer triggers
that edge case.

The actual fix for the new cgroupsv2 behavior will be in #11933

87d54b8c

fix: differentiate commands for circleci and local use · 7f5e0b82
Jai Bhagat authored 3 years ago

7f5e0b82
Merge pull request #11940 from hashicorp/b-docs-add-client-reserved-cores · 402e36bb
James Rasell authored 3 years ago
```
docs: add `cores` to client reserved config block.
```
402e36bb

26 Jan, 2022 15 commits
- ci: add semgrep (#11934) · f6575298
  Luiz Aoqui authored 3 years ago
  
  f6575298
- ui: move volume link to the source column and fix the link target (#11896) · 045bcd79
  André authored 3 years ago
```
The link target used the volume name instead of the volume id.
Fixes issue #11884.
```
  045bcd79
- ui: add local testing script · 9863aa45
  Jai Bhagat authored 3 years ago
  
  9863aa45
- ui: replace qunit start tests with ember-exam start · 757799d4
  Jai Bhagat authored 3 years ago
  
  757799d4
- ui: allow parallel test-runs · b5e3e32d
  Jai Bhagat authored 3 years ago
  
  b5e3e32d
- ui: add ember-exam · 765c04c4
  Jai Bhagat authored 3 years ago
  
  765c04c4
- Merge pull request #11780 from hashicorp/f-ui/job-page-refactor · 94e55fcf
  Jai authored 3 years ago
```
fix: authorization bug for `job-client-status-summary`
```
  94e55fcf
- ui: add npm script for running ember test server · 2c4a9d2c
  Jai Bhagat authored 3 years ago
  
  2c4a9d2c
- refact: extract setPolicy into utils · 8d8fe0bd
  Jai Bhagat authored 3 years ago
  
  8d8fe0bd
- Update IsEmpty to check for pre-1.2.4 fields (#11930) · a30c7dd5
  Derek Strickland authored 3 years ago
  
  a30c7dd5
- refact: fix tests after contextual job page changes · 6c65966c
  Jai Bhagat authored 3 years ago
  
  6c65966c
- ui: prettify remaining files · c1bb21da
  Jai Bhagat authored 3 years ago
  
  c1bb21da
- docs: add `cores` to client reserved config block. · 08d30323
  James Rasell authored 3 years ago
  
  08d30323
- Merge pull request #11927 from hashicorp/b-hcl1-sidecar_task-resources · 4ca43352
  Seth Hoenig authored 3 years ago
```
connect: fix bug where sidecar_task.resources was ignored with hcl1
```
  4ca43352
- changelog: use pr number not issue number · 629d861a
  Seth Hoenig authored 3 years ago
  
  629d861a
25 Jan, 2022 4 commits

Merge pull request #11920 from hashicorp/dependabot/go_modules/github.com/rs/cors-1.8.2 · 94b744c5
Seth Hoenig authored 3 years ago
```
build(deps): bump github.com/rs/cors from 1.8.0 to 1.8.2
```
94b744c5

connect: fix bug where sidecar_task.resources was ignored with hcl1 · 15442b35

Seth Hoenig authored 3 years ago

The HCL1 parser did not respect connect.sidecar_task.resources if the
connect.sidecar_service block was not set (an optimization that no longer
makes sense with connect gateways).

Fixes #10899

15442b35

fix integer bounds checks (#11815) · 358a4681

Tim Gross authored 3 years ago

* driver: fix integer conversion error

The shared executor incorrectly parsed the user's group into int32 and
then cast to uint32 without bounds checking. This is harmless because
an out-of-bounds gid will throw an error later, but it triggers
security and code quality scans. Parse directly to uint32 so that we
get correct error handling.

* helper: fix integer conversion error

The autopilot flags helper incorrectly parses a uint64 to a uint, which
is a machine-specific size. Although we don't have 32-bit builds, this
sets off security and code quality scans. Parse to the machine-sized
uint.

* driver: restrict bounds of port map

The plugin server doesn't constrain the maximum integer for port
maps. This could result in a user-visible misconfiguration, but it
also triggers security and code quality scans. Restrict the bounds
before casting to int32 and return an error.

* cpuset: restrict upper bounds of cpuset values

Our cpuset configuration expects values in the range of uint16 to
match the expectations set by the kernel, but we don't constrain the
values before downcasting. An underflow could lead to allocations
failing on the client rather than being caught earlier. This also makes
security and code quality scanners happy.

* http: fix integer downcast for per_page parameter

The parser for the `per_page` query parameter downcasts to int32
without bounds checking. This could result in underflow and
nonsensical paging, but there are no server-side consequences for
this. Fixing this will silence some security and code quality scanners
though.

358a4681

Merge pull request #11907 from hashicorp/f-state-store-nomad-file · 34231188
James Rasell authored 3 years ago
```
state: move restore functionality into its own file.
```
34231188

24 Jan, 2022 7 commits

build(deps): bump github.com/rs/cors from 1.8.0 to 1.8.2 · cce845ab

dependabot[bot] authored 3 years ago

Bumps [github.com/rs/cors](https://github.com/rs/cors) from 1.8.0 to 1.8.2.
- [Release notes](https://github.com/rs/cors/releases)
- [Commits](https://github.com/rs/cors/compare/v1.8.0...v1.8.2)

---
updated-dependencies:
- dependency-name: github.com/rs/cors
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>

cce845ab

Merge pull request #11918 from hashicorp/deps-update-api-deps · 6c51333e
Seth Hoenig authored 3 years ago
```
deps: update api go version and dependencies
```
6c51333e

Merge pull request #11883 from... · 204e2d7f

Seth Hoenig authored 3 years ago

Merge pull request #11883 from hashicorp/dependabot/go_modules/github.com/prometheus/client_golang-1.12.0

build(deps): bump github.com/prometheus/client_golang from 1.7.1 to 1.12.0

204e2d7f

deps: update api go version and dependencies · 5ef844c8

Seth Hoenig authored 3 years ago

This PR sets the minimum Go version for the `api` submodule to Go 1.17.

It also upgrades
 - gorilla/websocket 1.4.1 -> 1.4.2
 - mitchellh/mapstructure 1.4.2 -> 1.4.3
 - stretchr/testify 1.5.1 -> 1.7.0

Closes #11518 #11602 #11528

5ef844c8

Merge pull request #11836 from... · ab6dcebd

Seth Hoenig authored 3 years ago

Merge pull request #11836 from hashicorp/dependabot/go_modules/github.com/hashicorp/memberlist-0.3.1

chore(deps): bump github.com/hashicorp/memberlist from 0.2.2 to 0.3.1

ab6dcebd

csi: update leader's ACL in volumewatcher (#11891) · 9d60df2f

Tim Gross authored 3 years ago

The volumewatcher that runs on the leader needs to make RPC calls
rather than writing to raft (as we do in the deploymentwatcher)
because the unpublish workflow needs to make RPC calls to the
clients. This requires that the volumewatcher has access to the
leader's ACL token.

But when leadership transitions, the new leader creates a new leader
ACL token. This ACL token needs to be passed into the volumewatcher
when we enable it, otherwise the volumewatcher can find itself with a
stale token.
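
The idea of handing the current leader token to the watcher each time it is enabled could be sketched as below. The `volumeWatcher` type and its `SetEnabled` and `token` methods are hypothetical simplifications for illustration and do not reproduce Nomad's actual volumewatcher API.

```go
package main

import (
	"fmt"
	"sync"
)

// volumeWatcher is a simplified, hypothetical watcher that needs the
// leader's ACL token to make RPC calls.
type volumeWatcher struct {
	mu        sync.RWMutex
	enabled   bool
	leaderACL string
}

// SetEnabled takes the token as a parameter so that every leadership
// transition replaces whatever token the watcher held before.
func (w *volumeWatcher) SetEnabled(enabled bool, leaderACL string) {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.enabled = enabled
	if enabled {
		w.leaderACL = leaderACL
	} else {
		w.leaderACL = "" // drop the token when this server is not the leader
	}
}

// token returns the token to use for RPCs and whether the watcher is enabled.
func (w *volumeWatcher) token() (string, bool) {
	w.mu.RLock()
	defer w.mu.RUnlock()
	return w.leaderACL, w.enabled
}

func main() {
	w := &volumeWatcher{}
	w.SetEnabled(true, "token-term-1")  // becomes leader
	w.SetEnabled(false, "")             // loses leadership
	w.SetEnabled(true, "token-term-2")  // regains leadership with a new token
	tok, ok := w.token()
	fmt.Println(tok, ok)
}
```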

9d60df2f

docs: Update volume create/register mount options to use []string example (#11912) · a01b70cc

Dan Norris authored 3 years ago

The examples for `nomad volume create` and `nomad volume register` are
not setting `mount_flags` using an array of strings.

This fixes the issue by changing the example to be `mount_flags =
["noatime"]`.

a01b70cc