This project is mirrored from https://gitee.com/mirrors/nomad.git.
- 06 May, 2022 5 commits
-
-
Morgan Drake authored
-
Chetan Sarva authored
-
Phil Renaud authored
-
Luiz Aoqui authored
Remove the step to automatically backport `backport/website` PRs to the latest release. This will be done manually by adding the proper tags. Also use squash backports to match the pattern we use for `main`.
-
James Rasell authored
-
- 05 May, 2022 8 commits
-
-
Luiz Aoqui authored
During the release there are several files that need to be modified:

- .release/ci.hcl: the notification channel needs to be updated to a channel with greater team visibility during the release.
- version/version.go: the Version and VersionPrerelease variables need to be set so they match the release version.

After the release these files need to be reverted. For GA releases the following additional changes also need to happen:

- version/version.go: the Version variable needs to be bumped to the next version number.
- GNUMakefile: the LAST_RELEASE variable needs to be set to the version that was just released.

Since the release process will commit file changes to the branch being used for the release, it should _never_ run on main, so the first step is now to protect against that. It also adds a validation to make sure the user-provided version is correct. After looking at the different release options and st...
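For context, a minimal sketch of what the version/version.go variables described above look like; the values shown are illustrative, not taken from any particular release.

```go
// version/version.go (illustrative values only)
package version

var (
	// Version is set to match the release version during the release and,
	// for GA releases, bumped to the next version number afterwards.
	Version = "1.3.1"

	// VersionPrerelease is cleared for a final release and restored to a
	// development marker (e.g. "dev") once the release is done.
	VersionPrerelease = ""
)
```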
-
Phil Renaud authored
* Chronological most-recent evals by default
* Adding reverse: true to the list of expected queryparams in test
* changelog
-
Phil Renaud authored
* Sample percy test added
* Node engine up to 14.x for UI prep
* Force ui test rerun
* Updated config.yml
* Node v upgraded to 14 for docker image
* Expect length in test
* Running ember tests under percy exec
* Percy exec format
* Percy cli added
* Noop to rerun tests with updated percy_token
* Evals full list and details open snapshots
* Pretty legit use of assert so disable the warning
* Jobs list tests
* Snapshots for top-level clients, servers, ACL, topology, and storage lists
* Expect caveat for Topology test
* Stabilizing tests with faker seeded to 1
* Seed-stabilizing any tests with percySnapshots
* Faker import
* Drop unused param
* Assets and test audit using an older node version
* New strategy: avoid seeding, just use percyCSS to hide certain things
-
Seth Hoenig authored
cgroups: make sure cgroup still exists after task restart
-
Tim Gross authored
This option was added to both constraints and affinities in 0.9.x but was only documented for affinities. Close that documentation gap.
-
Bryce Kalow authored
* remove website source code and related circle jobs
* remove data files
* updates platform-cli
* update local instructions
* updates package-lock
-
Seth Hoenig authored
This PR modifies raw_exec and exec to ensure the cgroup for a task they are driving still exists during a task restart. These drivers have the same bug but with different root causes.

For raw_exec, we were removing the cgroup in 2 places: the cpuset manager, and the unix containment implementation (the thing that uses the freezer cgroup to clean house). During a task restart, the containment would remove the cgroup, and when the task runner hooks went to start again they would block waiting for the cgroup to exist, which would never happen, because it gets created by the cpuset manager, which only runs as an alloc pre-start hook. The fix here is to simply not delete the cgroup in the containment implementation; killing the PIDs is enough. The removal happens in the cpuset manager later anyway.

For exec, it's the same idea, except DestroyTask is called on task failure, which in turn calls into libcontainer, which in turn deletes the cgroup. In this cas...
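A hedged sketch of the raw_exec side of this fix; the type and method names below are illustrative stand-ins, not Nomad's actual internals.

```go
package main

import "fmt"

// containment is an illustrative stand-in for the raw_exec containment
// helper discussed above; it is not Nomad's real type.
type containment struct {
	cgroupPath string
}

// killAllPIDs stands in for freezing the cgroup and signalling every PID
// in it so the task's processes are reaped.
func (c *containment) killAllPIDs() error {
	fmt.Println("killing PIDs in", c.cgroupPath)
	return nil
}

// Cleanup kills the task's processes but deliberately leaves the cgroup
// directory in place: on restart the task runner waits for the cgroup to
// exist, and the cpuset manager removes it later anyway.
func (c *containment) Cleanup() error {
	// No cgroup removal here; killing the PIDs is enough.
	return c.killAllPIDs()
}

func main() {
	c := &containment{cgroupPath: "/sys/fs/cgroup/nomad/example.scope"}
	_ = c.Cleanup()
}
```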
-
James Rasell authored
-
- 04 May, 2022 2 commits
-
-
James Rasell authored
-
Michele Degges authored
-
- 03 May, 2022 3 commits
-
-
Michele Degges authored
-
Tim Gross authored
In #12324 we made it so that plugins wait until the node drain is complete, as we do for system jobs. But we neglected to mark the node drain as complete once only plugins (or system jobs) remain, which means that the node drain is left in a draining state until the `deadline` time expires. This was incorrectly documented as expected behavior in #12324.
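A minimal sketch of the kind of check this implies, using hypothetical field names; it is not the actual drainer code.

```go
package main

import "fmt"

// remainingAlloc is an illustrative stand-in for the drainer's view of an
// allocation still on the node; the field names are assumptions.
type remainingAlloc struct {
	JobType     string // e.g. "service", "batch", "system"
	IsCSIPlugin bool
}

// drainComplete reports whether a draining node should be marked done:
// true once every remaining alloc is a system job or a CSI plugin, since
// those are allowed to stay until the end of the drain.
func drainComplete(remaining []remainingAlloc) bool {
	for _, a := range remaining {
		if a.JobType != "system" && !a.IsCSIPlugin {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(drainComplete([]remainingAlloc{
		{JobType: "system"},
		{JobType: "service", IsCSIPlugin: true},
	})) // true: only system jobs and plugins remain
}
```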
-
Alex Carpenter authored
* feat: connect homepage and use case pages
* fix: internalLink usage
* fix: query name
* chore: add homepage patterns
* chore: remove offerings
* chore: add intro features
* chore: bump subnav
* chore: updating patterns
* chore: add use case to the subnav
* chore: cleanup unused import
* chore: remove subnav border
-
- 02 May, 2022 6 commits
-
-
Luiz Aoqui authored
-
Seth Hoenig authored
build: add missing help descriptions to makefile
-
Seth Hoenig authored
docs: update nvidia driver documentation
-
Luiz Aoqui authored
Clicking in a task group row in the job details page would throw the error:

Uncaught Error: You didn't provide enough string/numeric parameters to satisfy all of the dynamic segments for route jobs.job.task-group. Missing params: name
  createParamHandlerInfo http://localhost:4646/ui/assets/vendor-194b1e0d68d11ef7a4bf334eb30ba74d.js:4814
  applyToHandlers http://localhost:4646/ui/assets/vendor-194b1e0d68d11ef7a4bf334eb30ba74d.js:4804
  applyToState http://localhost:4646/ui/assets/vendor-194b1e0d68d11ef7a4bf334eb30ba74d.js:4801
  getTransitionByIntent http://localhost:4646/ui/assets/vendor-194b1e0d68d11ef7a4bf334eb30ba74d.js:4843
  transitionByIntent http://localhost:4646/ui/assets/vendor-194b1e0d68d11ef7a4bf334eb30ba74d.js:4836
  refresh http://localhost:4646/ui/assets/vendor-194b1e0d68d11ef7a4bf334eb30ba74d.js:4885
  refresh http://localhost:4646/ui/assets/vendor-194b1e0d68d11ef7a4bf334eb30ba74d.js:2254
  queryParamsDid...
-
Seth Hoenig authored
Notably:

- name of the compiled binary is 'nomad-device-nvidia', not 'nvidia-gpu'
- link to Nvidia docs for installing the container runtime toolkit
- list docker v19.03 as minimum version, to track with nvidia's new container runtime toolkit
-
Matus Goljer authored
-
- 29 Apr, 2022 2 commits
-
-
Luiz Aoqui authored
This Makefile was used to generate the full config.yml from smaller sub-files, but this is not done anymore.
-
Tim Gross authored
The capacity fields for `create volume` set bounds on the resulting size of the volume, but the ultimate size of the volume will be determined by the storage provider (between the min and max). Clarify this in the documentation and provide a suggestion for how to set an exact size.
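A small sketch of the suggestion mentioned above, pinning an exact size by making the minimum and maximum capacity identical; the struct and field names are illustrative, not the real volume spec keys.

```go
package main

import "fmt"

// volumeRequest is an illustrative stand-in for a `create volume` request;
// the field names are assumptions, not Nomad's actual spec keys.
type volumeRequest struct {
	CapacityMinBytes int64
	CapacityMaxBytes int64
}

// exactSize pins the storage provider to a single size by making the
// lower and upper capacity bounds the same value.
func exactSize(bytes int64) volumeRequest {
	return volumeRequest{CapacityMinBytes: bytes, CapacityMaxBytes: bytes}
}

func main() {
	req := exactSize(10 << 30) // exactly 10 GiB
	fmt.Printf("%+v\n", req)
}
```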
-
- 28 Apr, 2022 7 commits
-
-
Derek Strickland authored
* docs: Add known limitations callouts to Max Client Disconnect section
-
Phil Renaud authored
-
Luiz Aoqui authored
-
Jai authored
* chore: run prettier on hbs files
* ui: ensure to pass a real job object to task-group link
* chore: add changelog entry
* chore: prettify template
* ui: template helper for formatting jobId in LinkTo component
* ui: handle async relationship
* ui: pass in job id to model arg instead of job model
* update test for serialized namespace
* ui: defend against null in tests
* ui: prettified template added whitespace
* ui: rollback ember-data to 3.24 because watcher returns undefined on abort
* ui: use format-job-helper instead of job model via alloc
* ui: fix whitespace in template caused by prettier using template helper
* ui: update test for new namespace
* ui: revert prettier change

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
-
Dave May authored
-
Luiz Aoqui authored
-
Luiz Aoqui authored
-
- 27 Apr, 2022 3 commits
-
-
Derek Strickland authored
* Wait for deployment to finish
* Don't reschedule disconnect or restart-node jobs
-
Phil Renaud authored
* Linear and Branching mock evaluations
* De-comment
* test-trigger
* Making evaluation trees dynamic
* Reinstated job relationship on eval mock
* Dasherize job prefix back to normal
* Handle bug where UUIDKey is not present on job
* Appending node to eval
* Job ID as a passed property
* Remove unused import
* Branching evals set up as generatable
-
Tim Gross authored
This test exercises upgrades between 0.8 and Nomad versions greater than 0.9. We have not supported 0.8.x in a very long time and in any case the test has been marked to skip because the downloader doesn't work.
-
- 26 Apr, 2022 4 commits
-
-
Michael Schurter authored
Fixes #10200

**The bug**

A user reported receiving the following error when an alloc was placed that needed to preempt existing allocs:

```
[ERROR] client.alloc_watcher: error querying previous alloc: alloc_id=28... previous_alloc=8e... error="rpc error: alloc lookup failed: index error: UUID must be 36 characters"
```

The previous alloc (8e) was already complete on the client. This is possible if an alloc stops *after* the scheduling decision was made to preempt it, but *before* the node running both allocations was able to pull and start the preemptor. While that is hopefully a narrow window of time, you can expect it to occur in high throughput batch scheduling heavy systems. However the RPC error made no sense! `previous_alloc` in the logs was a valid 36 character UUID!

**The fix**

The fix is:

```
- prevAllocID: c.Alloc.PreviousAllocation,
+ prevAllocID: watchedAllocID,
```

The alloc watcher new func used for preemption improperly referenced Alloc.PreviousAllocation instead of the passed in watchedAllocID. When multiple allocs are preempted, a watcher is created for each with watchedAllocID set properly by the caller. In this case Alloc.PreviousAllocation="" -- which is where the `UUID must be 36 characters` error was coming from! Sadly we were properly referencing watchedAllocID in the log, so it made the error make no sense!

**The repro**

I was able to reproduce this with a dev agent with [preemption enabled](https://gist.github.com/schmichael/53f79cbd898afdfab76865ad8c7fc6a0#file-preempt-hcl) and [lowered limits](https://gist.github.com/schmichael/53f79cbd898afdfab76865ad8c7fc6a0#file-limits-hcl) for ease of repro. First I started a [low priority count 3 job](https://gist.github.com/schmichael/53f79cbd898afdfab76865ad8c7fc6a0#file-preempt-lo-nomad), then a [high priority job](https://gist.github.com/schmichael/53f79cbd898afdfab76865ad8c7fc6a0#file-preempt-hi-nomad) that evicts 2 low priority jobs. Everything worked as expected. However if I force it to use the [remotePrevAlloc implementation](https://github.com/hashicorp/nomad/blob/v1.3.0-beta.1/client/allocwatcher/alloc_watcher.go#L147), it reproduces the bug because the watcher references PreviousAllocation instead of watchedAllocID.
-
Tim Gross authored
Part of ongoing work to remove the old E2E framework code.
-
Tim Gross authored
We moved off the old provisioning process for nightly E2E to one driven entirely by Terraform quite a while back now. We're slowly removing the framework code for this, test by test, but this chunk of code no longer has any callers.
-
Tim Gross authored
We enforce exactly one plugin supervisor loop by checking whether `running` is set and returning early. This works but is fairly subtle. It can briefly result in two goroutines where one quickly exits before doing any work. Clarify the intent by using `sync.Once`. The goroutine we've spawned only exits when the entire task runner is being torn down, and not when the task driver restarts the workload, so it should never be re-run.
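A minimal sketch of the `sync.Once` pattern described above; the surrounding type is illustrative, not Nomad's actual plugin supervisor.

```go
package main

import (
	"fmt"
	"sync"
)

// supervisor is an illustrative stand-in for the plugin supervisor loop
// described above; it is not Nomad's real type.
type supervisor struct {
	once    sync.Once
	started int
}

// start launches the supervisor loop at most once, no matter how many
// times the task driver restarts the workload. sync.Once makes the
// "exactly one loop" intent explicit instead of checking a `running`
// flag and returning early.
func (s *supervisor) start() {
	s.once.Do(func() {
		s.started++
		// In the real code this would spawn the long-lived goroutine that
		// only exits when the entire task runner is torn down.
	})
}

func main() {
	s := &supervisor{}
	s.start()
	s.start() // no-op: the loop is already running
	fmt.Println("times started:", s.started) // 1
}
```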
-