- 11 May, 2022 1 commit
-
-
Michael Schurter authored
Whenever a node joins the cluster, either for the first time or after being `down`, we emit an evaluation for every system job to ensure all applicable system jobs are running on the node. This patch adds an optimization to skip creating evaluations for system jobs not in the current node's DC. While the scheduler performs the same feasibility check, skipping the creation of the evaluation altogether saves disk, network, and memory.
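A minimal sketch of the optimization in Go, with simplified types and hypothetical helper names (not the actual Nomad scheduler code): before creating an evaluation for each system job, check whether the joining node's datacenter appears in the job's `Datacenters` list.

```go
package sketch

// Simplified stand-ins for Nomad's real structs.
type Node struct{ ID, Datacenter string }

type Job struct {
	ID          string
	Datacenters []string
}

type Evaluation struct{ JobID, NodeID string }

// systemJobEvalsForNode creates evaluations only for system jobs whose
// Datacenters list includes the joining node's datacenter. The scheduler's
// feasibility check would reject the others anyway, so skipping eval
// creation entirely saves disk, network, and memory.
func systemJobEvalsForNode(node *Node, systemJobs []*Job) []*Evaluation {
	var evals []*Evaluation
	for _, job := range systemJobs {
		if !containsDC(job.Datacenters, node.Datacenter) {
			continue // not in this node's DC: no eval needed
		}
		evals = append(evals, &Evaluation{JobID: job.ID, NodeID: node.ID})
	}
	return evals
}

func containsDC(dcs []string, dc string) bool {
	for _, d := range dcs {
		if d == dc {
			return true
		}
	}
	return false
}
```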
-
- 26 Apr, 2022 4 commits
-
-
Michael Schurter authored
Fixes #10200

**The bug**

A user reported receiving the following error when an alloc was placed that needed to preempt existing allocs:

```
[ERROR] client.alloc_watcher: error querying previous alloc: alloc_id=28... previous_alloc=8e... error="rpc error: alloc lookup failed: index error: UUID must be 36 characters"
```

The previous alloc (8e) was already complete on the client. This is possible if an alloc stops *after* the scheduling decision was made to preempt it, but *before* the node running both allocations was able to pull and start the preemptor. While that is hopefully a narrow window of time, you can expect it to occur in systems with heavy high-throughput batch scheduling. However, the RPC error made no sense: `previous_alloc` in the logs was a valid 36-character UUID!

**The fix**

The fix is:

```
- prevAllocID: c.Alloc.PreviousAllocation,
+ prevAllocID: watchedAllocID,
```

The alloc watcher constructor used for preemption improperly referenced `Alloc.PreviousAllocation` instead of the passed-in `watchedAllocID`. When multiple allocs are preempted, a watcher is created for each, with `watchedAllocID` set properly by the caller. In this case `Alloc.PreviousAllocation` was `""`, which is where the `UUID must be 36 characters` error was coming from. Sadly, the log line correctly referenced `watchedAllocID`, which is why the error looked so nonsensical!

**The repro**

I was able to reproduce this with a dev agent with [preemption enabled](https://gist.github.com/schmichael/53f79cbd898afdfab76865ad8c7fc6a0#file-preempt-hcl) and [lowered limits](https://gist.github.com/schmichael/53f79cbd898afdfab76865ad8c7fc6a0#file-limits-hcl) for ease of repro. First I started a [low priority count 3 job](https://gist.github.com/schmichael/53f79cbd898afdfab76865ad8c7fc6a0#file-preempt-lo-nomad), then a [high priority job](https://gist.github.com/schmichael/53f79cbd898afdfab76865ad8c7fc6a0#file-preempt-hi-nomad) that evicts 2 low priority jobs. Everything worked as expected. However, if I force it to use the [remotePrevAlloc implementation](https://github.com/hashicorp/nomad/blob/v1.3.0-beta.1/client/allocwatcher/alloc_watcher.go#L147), it reproduces the bug because the watcher references `PreviousAllocation` instead of `watchedAllocID`.
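A hedged sketch of the bug pattern, with simplified, hypothetical types (the real constructor lives in `client/allocwatcher/alloc_watcher.go`): a constructor that takes an explicit ID parameter but reads a struct field instead silently breaks for every caller that passes a different ID.

```go
package sketch

// Simplified, hypothetical stand-ins for the real types.
type Allocation struct{ PreviousAllocation string }

type config struct{ Alloc *Allocation }

type watcher struct{ prevAllocID string }

// Buggy: ignores the watchedAllocID argument. For preempted allocs,
// Alloc.PreviousAllocation is "", which is what produced the
// "UUID must be 36 characters" RPC error.
func newWatcherBuggy(c config, watchedAllocID string) *watcher {
	return &watcher{prevAllocID: c.Alloc.PreviousAllocation}
}

// Fixed: uses the ID the caller actually asked to watch.
func newWatcherFixed(c config, watchedAllocID string) *watcher {
	return &watcher{prevAllocID: watchedAllocID}
}
```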
-
Tim Gross authored
Part of ongoing work to remove the old E2E framework code.
-
Tim Gross authored
We moved nightly E2E off the old provisioning process to one driven entirely by Terraform quite a while ago. We're in the slow process of removing the framework code test-by-test, and this chunk of code no longer has any callers.
-
Tim Gross authored
We enforce exactly one plugin supervisor loop by checking whether `running` is set and returning early. This works but is fairly subtle. It can briefly result in two goroutines where one quickly exits before doing any work. Clarify the intent by using `sync.Once`. The goroutine we've spawned only exits when the entire task runner is being torn down, and not when the task driver restarts the workload, so it should never be re-run.
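A minimal sketch of the pattern described, with a simplified supervisor struct (not the actual plugin supervisor code): `sync.Once` makes the run-exactly-once intent explicit and removes the brief window in which a second goroutine could be spawned only to exit early.

```go
package sketch

import "sync"

type supervisor struct {
	loopOnce sync.Once
}

// ensureSupervisorLoop starts the supervisor loop exactly once, no matter
// how many times it is called. Unlike a guarded `running` boolean, there
// is no window where a second goroutine is spawned only to exit early.
func (s *supervisor) ensureSupervisorLoop() {
	s.loopOnce.Do(func() {
		go s.supervisorLoop()
	})
}

func (s *supervisor) supervisorLoop() {
	// Runs until the entire task runner is torn down; task driver restarts
	// do not stop it, so it must never be re-run.
}
```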
-
- 25 Apr, 2022 3 commits
-
-
Michael Schurter authored
The existing `ParseHCL` func didn't allow setting `HCLv1=true`.
-
Tim Gross authored
-
Luiz Aoqui authored
-
- 22 Apr, 2022 19 commits
-
-
Michael Schurter authored
* docs: update json jobs docs

Did you know that Nomad has not 1 but 2 JSON formats for jobs? 2½ if you want to acknowledge that sometimes our JSON job representations have a `Job` top-level wrapper and sometimes do not. The 2½ formats are:

```
1. HCL JSON
2. Input API JSON (top-level Job field)
2.5. Output API JSON (lacks top-level Job field)
```

`#2` is what our docs consider our API JSON. `#2.5` seems to be an accident of history we can't fix without breaking API compatibility. `#1` is an even more interesting accident of history: the `jobspec2` package automatically detects if the input to Parse is JSON and switches to a JSON parser. This behavior is undocumented, the format is unspecified, and there is no official HashiCorp tooling to produce this JSON from HCL. The plot thickens when you discover popular third-party tools like hcl2json.com and https://github.com/tmccombs/hcl2json seem to produce JSON that `nomad run` accepts! Since...
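For illustration, minimal sketches of the two API shapes; the job fields shown are a hypothetical minimum, not a complete jobspec. Input API JSON (`#2`) wraps the job in a top-level `Job` field:

```json
{"Job": {"ID": "example", "Name": "example", "Type": "service"}}
```

Output API JSON (`#2.5`) is the same object without the wrapper:

```json
{"ID": "example", "Name": "example", "Type": "service"}
```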
-
Jai authored
* chore: remove commented-out code and skipped tests
* refact: triggeredBy requires filter expression not qp
* refact: use filter expression DSL instead of named params (see the example request below)
* fix: add type
* docs: add in-line reference to filter expression DSL
* fix: update filter copy for non-matches
* fix: correct conditional logic to render no-match copy
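As a hedged illustration of the switch from a dedicated query parameter to the filter expression DSL (endpoint and field names are assumed from context, not verified against the UI code):

```
GET /v1/evaluations?filter=TriggeredBy%20%3D%3D%20%22job-register%22
```

which URL-decodes to the filter expression `TriggeredBy == "job-register"`.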
-
Phil Renaud authored
-
Phil Renaud authored
-
Tim Gross authored
The task runner hook `Prestart` response object includes a `Done` field that's intended to tell the client not to run the hook again. The plugin supervisor creates mount points for the task during prestart and saves these mounts in the hook resources. But if a client restarts, the hook resources will not be populated. If the plugin task restarts at any time after the client restarts, it will fail to have the correct mounts and will crash-loop until restart attempts run out. Fix this by not returning `Done` in the response, just as we do for the `volume_mount_hook`.
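A simplified sketch of the fix; the hook and response names are modeled on the description above, not the exact Nomad task runner interfaces:

```go
package sketch

// PrestartResponse mirrors the shape described above.
type PrestartResponse struct {
	Done bool // true tells the client never to run this hook again
}

type pluginSupervisorHook struct{}

func (h *pluginSupervisorHook) createMountPoints() error {
	// ... create the task's mount points and save them in hook resources ...
	return nil
}

// Prestart recreates the mounts on every run. Leaving Done as false means
// the hook runs again after a client restart, repopulating the hook
// resources the restarted client no longer has.
func (h *pluginSupervisorHook) Prestart(resp *PrestartResponse) error {
	if err := h.createMountPoints(); err != nil {
		return err
	}
	resp.Done = false // idempotent: safe and necessary to re-run
	return nil
}
```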
-
James Rasell authored
* deps: update consul-template to v0.29.0
* changelog: add entry for #12747
-
Phil Renaud authored
-
Phil Renaud authored
* Unknown status for allocations accounted for
* Canary string removed
* Test cleanup
* Generate unknown in mirage
* aacidentally oovervoowled
* Update ui/app/components/allocation-status-bar.js
* Disconnected state on job status in client
* Renaming Disconnected to Unknown in the job-status-in-client
* Unknown accounted for on job rows filtering and tests fix
* Adding lostAllocs as a computed dependency
* Unknown client status within acceptance test
* Swatches updated and PR comments addressed
* Unknown and disconnected added to test fixtures

Co-authored-by: Derek Strickland <1111455+DerekStrickland@users.noreply.github.com>
-
Luiz Aoqui authored
After a more detailed analysis of this feature, the approach taken in PR #12449 was found to be not ideal due to poor UX (users are responsible for setting the entity alias they would like to use) and issues around jobs potentially masquerading as other Vault entities.
-
Seth Hoenig authored
services: enable setting arbitrary address value in service registrations
-
Seth Hoenig authored
-
Seth Hoenig authored
-
Seth Hoenig authored
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
-
Seth Hoenig authored
This PR introduces the `address` field in the `service` block so that Nomad or Consul services can be registered with a custom address to advertise. The address can be an IP address or domain name. If the `address` field is set, `service.address_mode` must be set to `auto`.
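A hedged sketch using the Go API client, assuming the new `Address` field is exposed on `api.Service` alongside the existing `AddressMode` (check the actual release before relying on this):

```go
package sketch

import "github.com/hashicorp/nomad/api"

// exampleService advertises a custom address instead of the one Nomad
// would otherwise derive from the task's network configuration.
func exampleService() *api.Service {
	return &api.Service{
		Name:        "db",
		PortLabel:   "db",
		Address:     "db.example.com", // IP address or domain name to advertise
		AddressMode: "auto",           // must be "auto" when Address is set
	}
}
```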
-
Tim Gross authored
We don't need the absolute path for any of the commands in this script so long as we `cd` into the source directory path. Doing this removes the need for the weird platform-specific tricks we had to do with realpath vs GNU realpath.
-
James Rasell authored
-
Tim Gross authored
-
Tim Gross authored
-
James Rasell authored
-
- 21 Apr, 2022 13 commits
-
-
Tim Gross authored
The E2E test runner currently assumes it is running from the root of the Nomad repository. Make this run independent of the working directory, for the convenience of both developers and the test runner.
-
Michael Schurter authored
* cli: add -json flag to support job commands

While the CLI has always supported running JSON jobs, its support has been via HCLv2's JSON parsing. I have no idea what format it expects the job to be in, but it's absolutely not the same format the API expects. So I ignored that and added a new `-json` flag to explicitly support *API*-style JSON jobspecs. The jobspecs can even have the wrapping `{"Job": {...}}` envelope or not!

* docs: fix example for `nomad job validate`

We haven't been able to validate inside driver config stanzas ever since the move to task driver plugins.
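For example, assuming the flag behaves as described for `job run` (file name hypothetical):

```
$ nomad job run -json example.json
```

where `example.json` can be either a bare job object or wrapped in a top-level `{"Job": {...}}` envelope.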
-
Tim Gross authored
The new `namespace apply` feature that allows passing a namespace specification file detects the difference between an empty namespace and a namespace specification by checking whether the file exists. In most cases the file will have an extension like `.hcl`, so there's little danger that a user will apply a spec file when they intended to apply a namespace name. But because directory names typically don't include an extension, you're much more likely to collide when trying to `namespace apply` by name only, and then you get a confusing error message of the form:

    Failed to read file: read $namespace: is a directory

Detect the case where the namespace name collides with a directory in the current working directory, and skip trying to load the directory.
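A minimal sketch of the detection described, simplified from the prose (the real CLI code differs):

```go
package sketch

import "os"

// looksLikeSpecFile reports whether arg should be treated as a namespace
// specification file rather than a namespace name. A directory that merely
// shares the namespace's name must not be read as a spec file.
func looksLikeSpecFile(arg string) bool {
	info, err := os.Stat(arg)
	if err != nil {
		return false // nothing on disk: treat arg as a namespace name
	}
	return !info.IsDir() // skip directories that collide with the name
}
```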
-
Phil Renaud authored
* Allocation page link fix
* fix added to task page and computed prop moved to allocation model
* Fallback query added to task group when specific volume isn't knowable
* Delog
* link text reflects alloc suffix
* Helper instead of in-template conditionals
* formatVolumeName unit test
* Removing unused helper import
-
Seth Hoenig authored
build: update golang to 1.17.9
-
Seth Hoenig authored
-
Seth Hoenig authored
build: update ec2 instance profiles
-
Seth Hoenig authored
using tools/ec2info
-
Seth Hoenig authored
-
Tim Gross authored
When shutting down an allocation that ends up needing to be force-killed, we're getting a spurious "OOM Killed (137)" message on the task termination event. We introduced this as part of cgroups v2 support because the Docker daemon isn't detecting the container status correctly. Although exit code 137 is the exit code we get for OOM-killed processes, that's only because an OOM kill is delivered as `SIGKILL`: a signal-terminated process exits with 128 plus the signal number, and `SIGKILL` is signal 9, so 128 + 9 = 137. Any SIGKILLed process gets that exit code, OOM-killed or not.
-
Tim Gross authored
The CSI plugin allocations take a while to be marked healthy, sometimes causing E2E test flakes during the setup phase of the tests. There's nothing CSI-specific about marking plugin allocs healthy, as the plugin supervisor hook does all the fingerprinting in the postrun hook (the prestart hook just makes a couple of empty directories). The timeouts we're seeing may be because of where we're pulling the images from; most of our jobs pull from a CDN-backed public registry, whereas these pull from ECR. Set a 1-minute timeout for these to make sure we have enough time to pull the image and start the task.
-
James Rasell authored
-
Tim Gross authored
-