Commits · docs-autoscaler-on-error · 小白蛋 / Nomad

This project is mirrored from https://gitee.com/mirrors/nomad.git. Pull mirroring failed 2 years ago.

17 Feb, 2022 1 commit
- docs: add docs for the autoscaler `on_error` and `on_check_error` configuration · 42863113
  Luiz Aoqui authored 3 years ago
  
  42863113
16 Feb, 2022 5 commits
- initial base work for implementing sorting and filter across API endpoints (#12076) · 36e31c51
  Luiz Aoqui authored 3 years ago
  
  36e31c51
- Merge pull request #12077 from hashicorp/b-makefile-use-gobin · 3ebfd7b4
  Seth Hoenig authored 3 years ago
```
build: respect GOBIN when using make targets
```
  3ebfd7b4
- build: respect GOBIN when using make targets · 06613d65
  Seth Hoenig authored 3 years ago
```
This PR updates GNUMakefile to respect $GOBIN if it is set in the
environment or via a $GOENV file. Previously we hard-coded the output
to $GOPATH/bin, which is not necessarily the desired behavior.
```
  06613d65
- Add `go-bexpr` filters to evals and deployment list endpoints (#12034) · fafb7cec
  Luiz Aoqui authored 3 years ago
  
  fafb7cec
- interpolate network.dns block on client (#12021) · 1fabefd2
  Tiernan authored 3 years ago
  
  1fabefd2
15 Feb, 2022 10 commits

CSI: make gRPC client creation more robust (#12057) · b775a73d

Tim Gross authored 3 years ago

Nomad communicates with CSI plugin tasks via gRPC. The plugin
supervisor hook uses this to ping the plugin for health checks which
it emits as task events. After the first successful health check the
plugin supervisor registers the plugin in the client's dynamic plugin
registry, which in turn creates a CSI plugin manager instance that has
its own gRPC client for fingerprinting the plugin and sending mount
requests.

If the plugin manager instance fails to connect to the plugin on its
first attempt, it exits. The plugin supervisor hook is unaware that
connection failed so long as its own pings continue to work. A
transient failure during plugin startup may mislead the plugin
supervisor hook into thinking the plugin is up (so there's no need to
restart the allocation) but no fingerprinter is started.

* Refactor the gRPC client to connect on first use. This provides the
  plugin manager instance the ability to retry the gRPC client
  connection until success.
* Add a 30s timeout to the plugin supervisor so that we don't poll
  forever waiting for a plugin that will never come back up.

Minor improvements:
* The plugin supervisor hook creates a new gRPC client for every probe
  and then throws it away. Instead, reuse the client as we do for the
  plugin manager.
* The gRPC client constructor has a 1 second timeout. Clarify that this
  timeout applies to the connection and not the rest of the client
  lifetime.
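
The connect-on-first-use idea described above can be pictured with a minimal Go sketch. This is not Nomad's code: `lazyClient`, `dialFunc`, `conn`, and `flakyDialer` are hypothetical names, the socket path is made up, and the dialer is a stand-in for a real gRPC dial.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// conn is a hypothetical stand-in for a gRPC client connection.
type conn struct{ addr string }

// dialFunc is a hypothetical dialer; real code would dial the plugin socket.
type dialFunc func(addr string) (*conn, error)

// lazyClient defers dialing until the first call that needs a connection.
type lazyClient struct {
	mu   sync.Mutex
	addr string
	dial dialFunc
	c    *conn
}

// ensureConn dials on first use. A failed dial leaves c nil, so the next
// call retries instead of the client giving up permanently.
func (l *lazyClient) ensureConn() (*conn, error) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.c != nil {
		return l.c, nil
	}
	c, err := l.dial(l.addr)
	if err != nil {
		return nil, err
	}
	l.c = c
	return c, nil
}

// flakyDialer fails the first `failures` attempts, simulating a plugin
// that is still starting up.
func flakyDialer(failures int) dialFunc {
	attempts := 0
	return func(addr string) (*conn, error) {
		attempts++
		if attempts <= failures {
			return nil, errors.New("plugin not ready")
		}
		return &conn{addr: addr}, nil
	}
}

func main() {
	client := &lazyClient{addr: "unix:///csi/csi.sock", dial: flakyDialer(2)}
	for i := 1; i <= 3; i++ {
		_, err := client.ensureConn()
		fmt.Printf("attempt %d: err=%v\n", i, err)
	}
}
```

Because a failed dial is not cached, a caller such as a fingerprinting loop can simply call `ensureConn` again on its next tick.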

b775a73d

Merge pull request #12054 from hashicorp/b-creation-indexes · 07f4227d
Seth Hoenig authored 3 years ago
```
api: return sorted results in certain list endpoints
```
07f4227d

api: return sorted results in certain list endpoints · b432f377

Seth Hoenig authored 3 years ago

These API endpoints now return results in chronological order. They
can return results in reverse chronological order by setting the
query parameter ascending=true.

- Eval.List
- Deployment.List

b432f377

Merge pull request #11955 from hashicorp/f-update-gopsutil · 53577ea3
Seth Hoenig authored 3 years ago
```
Update gopsutil to 3.21.12
```
53577ea3
cl: shorten changelog entry · 5ac59de9
Seth Hoenig authored 3 years ago

5ac59de9
changelog entry (#12072) · 7c027503
Tim Gross authored 3 years ago

7c027503
Merge pull request #12066 from hashicorp/f-make-golint-faster · a56b7958
Seth Hoenig authored 3 years ago
```
build: allow golangci-lint to use more than 1 core
```
a56b7958
config: merge ReservableCores in clientConfig (#12044) · 11dcb875
Alex Holyoake authored 3 years ago

11dcb875
Merge pull request #12069 from alrs/scheduler-test-err · 34de8b56
Seth Hoenig authored 3 years ago
```
scheduler: fix dropped test error
```
34de8b56
scheduler: fix dropped test error · f8d472a1
Lars Lehtonen authored 3 years ago

f8d472a1

14 Feb, 2022 2 commits

build: allow golangci-lint to use more than 1 core · 9ec605ea

Seth Hoenig authored 3 years ago

Since switching to `golangci-lint` we have set the `-j 1` flag, which
restricts the tool to using 1 CPU thread.

This PR removes the flag so `make check` takes less time on good
computers.

9ec605ea

Merge pull request #12052 from hashicorp/b-taskrunner-track-deregistered-call · 282eb10a
James Rasell authored 3 years ago
```
client: track service deregister call so it's only called once.
```
282eb10a

11 Feb, 2022 5 commits

csi: volume cli prefix matching should accept exact match (#12051) · 4afc67b7

Tim Gross authored 3 years ago

The `volume detach`, `volume deregister`, and `volume status` commands
accept a prefix argument for the volume ID. Update the behavior on
exact matches so that if more than one volume matches the prefix, we
return an error only when none of the volume IDs is an exact match.
Otherwise we won't be able to use these commands
at all on those volumes. This also makes the behavior of these commands
consistent with `job stop`.
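
A minimal sketch of this matching rule in Go, assuming the candidate IDs are already available as a slice. `resolveVolumeID` is a hypothetical helper for illustration, not the function used by the Nomad CLI.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveVolumeID returns the single volume ID selected by prefix.
// An exact match wins even when other IDs share the prefix; otherwise
// the prefix must match exactly one ID.
func resolveVolumeID(prefix string, ids []string) (string, error) {
	var matches []string
	for _, id := range ids {
		if id == prefix {
			return id, nil
		}
		if strings.HasPrefix(id, prefix) {
			matches = append(matches, id)
		}
	}
	switch len(matches) {
	case 0:
		return "", fmt.Errorf("no volume matches prefix %q", prefix)
	case 1:
		return matches[0], nil
	default:
		return "", fmt.Errorf("prefix %q matched multiple volumes: %s",
			prefix, strings.Join(matches, ", "))
	}
}

func main() {
	ids := []string{"vol", "vol-a", "vol-b"}

	// "vol" is a prefix of all three IDs but also an exact match.
	id, err := resolveVolumeID("vol", ids)
	fmt.Println(id, err)

	// "vol-" is ambiguous and has no exact match, so it is an error.
	id, err = resolveVolumeID("vol-", ids)
	fmt.Println(id, err)
}
```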

4afc67b7

csi: provide `CSI_ENDPOINT` env var to plugins (#12050) · 16baefcb

Tim Gross authored 3 years ago

The CSI specification says:
> The CO SHALL provide the listen-address for the Plugin by way of the
`CSI_ENDPOINT` environment variable.

Note that plugins without filesystem isolation won't have the plugin
dir bind-mounted to their alloc dir, but we can provide a path to the
socket anyways.
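
From the plugin's side, consuming the variable might look like the following Go sketch. `endpointFromEnv` is a hypothetical helper, the socket path is made up, and accepting both a `unix://` URL and a bare path is an assumption about what a plugin may want to tolerate.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// endpointFromEnv reads CSI_ENDPOINT through lookup (os.LookupEnv in real
// use) and returns the filesystem path of the Unix socket to listen on.
func endpointFromEnv(lookup func(string) (string, bool)) (string, error) {
	raw, ok := lookup("CSI_ENDPOINT")
	if !ok || raw == "" {
		return "", errors.New("CSI_ENDPOINT is not set")
	}
	// Assumption: tolerate both "unix:///path" and a bare "/path".
	path := strings.TrimPrefix(raw, "unix://")
	if !strings.HasPrefix(path, "/") {
		return "", fmt.Errorf("unsupported CSI_ENDPOINT %q", raw)
	}
	return path, nil
}

func main() {
	path, err := endpointFromEnv(os.LookupEnv)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("would listen on unix socket", path)
}
```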

Refactor to use opts struct for plugin supervisor hook config.
The parameter list for configuring the plugin supervisor hook has
grown enough that it makes sense to use an options struct similar to
many of the other task runner hooks (ex. template).

16baefcb

Merge pull request #12053 from marcaurele/fix-typo · 26061886
James Rasell authored 3 years ago
```
doc(typo): technical typo in advertised example
```
26061886
Merge pull request #12041 from hashicorp/b-gh-12040 · d1ffc237
James Rasell authored 3 years ago
```
changelog: add entry for #12040
```
d1ffc237

client: track service deregister call so it's only called once. · 72f411c9

James Rasell authored 3 years ago

In certain task lifecycles the taskrunner service deregister call
could be called three times for a task that is exiting. Whilst
each hook caller of deregister has its own purpose, we should try
and ensure it is only called once during the shutdown lifecycle of
a task.

This change therefore tracks when deregister has been called, so
that subsequent calls are noop. In the event the task is
restarting, the deregister value is reset to ensure proper
operation.
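
The tracking described here can be sketched in Go as a guarded flag. `serviceHook`, its fields, and its methods are hypothetical names for illustration; the real task runner code is organized differently.

```go
package main

import (
	"fmt"
	"sync"
)

// serviceHook is a hypothetical holder for the deregistration state.
type serviceHook struct {
	mu           sync.Mutex
	deregistered bool
	calls        int // number of real deregistrations, kept for the demo
}

// deregister performs the deregistration once; later calls are no-ops
// until restart resets the flag.
func (h *serviceHook) deregister() {
	h.mu.Lock()
	defer h.mu.Unlock()
	if h.deregistered {
		return
	}
	h.deregistered = true
	h.calls++ // real code would remove the task's services here
}

// restart clears the flag so the next shutdown deregisters again.
func (h *serviceHook) restart() {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.deregistered = false
}

func main() {
	h := &serviceHook{}

	// Three hooks call deregister during one shutdown; only one takes effect.
	h.deregister()
	h.deregister()
	h.deregister()
	fmt.Println("deregistrations after shutdown:", h.calls)

	// After a restart, the next shutdown deregisters again.
	h.restart()
	h.deregister()
	fmt.Println("deregistrations after restart and shutdown:", h.calls)
}
```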

72f411c9

10 Feb, 2022 16 commits

reconciler: refactor `computeGroup` (#12033) · cefc58dd

Derek Strickland authored 3 years ago

The allocReconciler's computeGroup function contained a significant amount of inline logic whose intent was difficult to understand. This commit extracts that inline logic into intention-revealing subroutines. It also updates the function internals to improve maintainability and renames some existing functions for the same purpose. The new and renamed functions are listed below.

Renamed functions

- handleGroupCanaries -> cancelUnneededCanaries
- handleDelayedLost -> createLostLaterEvals
- handleDelayedReschedules -> createRescheduleLaterEvals

New functions

- filterAndStopAll
- initializeDeploymentState
- requiresCanaries
- computeCanaries
- computeUnderProvisionedBy
- computeReplacements
- computeDestructiveUpdates
- computeMigrations
- createDeployment
- isDeploymentComplete

cefc58dd

docs: add upgrade note and ACL requirements for the job submit endpoint (#12046) · 6a3368a0
Luiz Aoqui authored 3 years ago

6a3368a0
update download to Nomad v1.2.6 (#12042) · 6d7813d5
Luiz Aoqui authored 3 years ago

6d7813d5
Merge pull request #12045 from hashicorp/merge-release-1.2.6-branch · af332373
Luiz Aoqui authored 3 years ago
```
Merge release 1.2.6 branch
```
af332373
prepare for next release · 096934a5
Luiz Aoqui authored 3 years ago

096934a5
Merge tag 'v1.2.6' into merge-release-1.2.6-branch · bc333c25
Luiz Aoqui authored 3 years ago
```
Version 1.2.6
```
bc333c25
small typo in advertised example · 0cc28e95
Marc-Aurèle Brothier authored 3 years ago

0cc28e95
changelog: add entry for #12040 · 7f0435ae
James Rasell authored 3 years ago

7f0435ae
Release v1.2.6 · 95514d56
Nomad Release Bot authored 3 years ago

95514d56
Generate files for 1.2.6 release · a6c6b475
Nomad Release bot authored 3 years ago

a6c6b475
docs: add 1.2.6 to changelog · a3319d7d
Luiz Aoqui authored 3 years ago

a3319d7d

scheduler: prevent panic in spread iterator during alloc stop · c49359ad

Tim Gross authored 3 years ago

The spread iterator can panic when processing an evaluation, resulting
in an unrecoverable state in the cluster. Whenever a panicked server
restarts and quorum is restored, the next server to dequeue the
evaluation will panic.

To trigger this state:
* The job must have `max_parallel = 0` and a `canary >= 1`.
* The job must not have a `spread` block.
* The job must have a previous version.
* The previous version must have a `spread` block and at least one
  failed allocation.

In this scenario, the desired changes include `(place 1+) (stop
1+), (ignore n) (canary 1)`. Before the scheduler can place the canary
allocation, it tries to find out which allocations can be
stopped. This passes back through the stack so that we can determine
previous-node penalties, etc. We call `SetJob` on the stack with the
previous version of the job, which will include assessing the `spread`
block (even though the results are unused). The task group spread info
sta...

c49359ad

api: prevent excessive CPU load on job parse · 1aa3b561

Luiz Aoqui authored 3 years ago

Add new namespace ACL requirement for the /v1/jobs/parse endpoint and
return early if HCLv2 parsing fails.

The endpoint now requires the new `parse-job` ACL capability or
`submit-job`.
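
The either-or capability requirement amounts to a check like the one below. This is a sketch only: `canParseJob` is a hypothetical function and the plain string slice is a simplification of how namespace capabilities are actually represented.

```go
package main

import "fmt"

// canParseJob reports whether a token's namespace capabilities allow
// calling the parse endpoint: either parse-job or submit-job suffices.
func canParseJob(capabilities []string) bool {
	for _, c := range capabilities {
		if c == "parse-job" || c == "submit-job" {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canParseJob([]string{"read-job"}))
	fmt.Println(canParseJob([]string{"read-job", "parse-job"}))
	fmt.Println(canParseJob([]string{"submit-job"}))
}
```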

1aa3b561

client: check escaping of alloc dir using symlinks · b3c0e6a7

Seth Hoenig authored 3 years ago

This PR adds symlink resolution when doing validation of paths
to ensure they do not escape client allocation directories.
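
One way to picture such a check is the Go sketch below, which resolves symlinks on both paths before comparing them. `escapesDir` and `runDemo` are hypothetical names and the directory layout is invented for the demo; this is not the validation code from the commit.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// escapesDir reports whether path, after resolving symlinks, lies outside
// root. Both paths must exist, since filepath.EvalSymlinks inspects them.
func escapesDir(root, path string) (bool, error) {
	realRoot, err := filepath.EvalSymlinks(root)
	if err != nil {
		return false, err
	}
	realPath, err := filepath.EvalSymlinks(path)
	if err != nil {
		return false, err
	}
	rel, err := filepath.Rel(realRoot, realPath)
	if err != nil {
		return false, err
	}
	sep := string(filepath.Separator)
	return rel == ".." || strings.HasPrefix(rel, ".."+sep), nil
}

// runDemo builds a temporary layout with one plain directory and one
// symlink that points outside the root, then checks both.
func runDemo() (bool, bool, error) {
	base, err := os.MkdirTemp("", "allocdir-demo")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(base)

	root := filepath.Join(base, "alloc")
	outside := filepath.Join(base, "secret")
	inside := filepath.Join(root, "local")
	for _, d := range []string{root, outside, inside} {
		if err := os.Mkdir(d, 0o755); err != nil {
			return false, false, err
		}
	}
	link := filepath.Join(root, "link")
	if err := os.Symlink(outside, link); err != nil {
		return false, false, err
	}

	plain, err := escapesDir(root, inside)
	if err != nil {
		return false, false, err
	}
	viaLink, err := escapesDir(root, link)
	return plain, viaLink, err
}

func main() {
	plain, viaLink, err := runDemo()
	fmt.Println("plain dir escapes:", plain, "symlink escapes:", viaLink, "err:", err)
}
```

A purely lexical check would accept the `link` path because it sits under the root by name; resolving it first is what exposes the escape.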

b3c0e6a7

client: fix race condition in use of go-getter · 6445da9b

Seth Hoenig authored 3 years ago

go-getter creates a circular dependency between a Client and Getter,
which means each is inherently thread-unsafe if you try to re-use
one or the other.

This PR fixes Nomad to no longer make use of the default Getter objects
provided by the go-getter package. Nomad must create a new Client object
on every artifact download, as the Client object controls the Src and Dst
among other things. When calling Client.Get, the Getter modifies its own
Client reference, creating the circular reference and race condition.

We can still achieve most of the desired connection caching behavior by
re-using a shared HTTP client with transport pooling enabled.
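
The shape of that approach, fresh per-download state on top of a shared pooled HTTP client, can be sketched with the standard library alone. The sketch does not use go-getter: `download`, `newDownload`, and `sharedHTTP` are hypothetical names, and the timeout and pool sizes are arbitrary assumptions.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// sharedHTTP is reused by every download so idle connections are pooled.
// The values below are illustrative assumptions.
var sharedHTTP = &http.Client{
	Timeout: 30 * time.Second,
	Transport: &http.Transport{
		MaxIdleConnsPerHost: 4,
		IdleConnTimeout:     90 * time.Second,
	},
}

// download is a hypothetical per-artifact state object. It holds the
// mutable source and destination, so it must never be shared.
type download struct {
	src        string
	dst        string
	httpClient *http.Client
}

// newDownload builds fresh state for each artifact while pointing at the
// shared HTTP client.
func newDownload(src, dst string) *download {
	return &download{src: src, dst: dst, httpClient: sharedHTTP}
}

func main() {
	a := newDownload("https://example.com/a.tgz", "local/a")
	b := newDownload("https://example.com/b.tgz", "local/b")

	fmt.Println("distinct state:", a != b)
	fmt.Println("shared HTTP client:", a.httpClient == b.httpClient)
}
```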

6445da9b

Add changelog · 1e29872d
Charlie Voiselle authored 3 years ago

1e29872d

09 Feb, 2022 1 commit

CSI: use job status not alloc status for plugin updates from summary (#12027) · 05b99001

Tim Gross authored 3 years ago

When an allocation is updated, the job summary for the associated job
is also updated. CSI uses the job summary to set the expected count
for controller and node plugins. We incorrectly used the allocation's
server status instead of the job status when deciding whether to
update or remove the job from the plugins. This caused a node drain or
other terminal state for an allocation to clear the expected count for
the entire plugin.

Use the job status to guide whether to update or remove the expected
count.

The existing CSI tests for the state store incorrectly modeled the
updates we received from servers vs those we received from clients,
leading to test assertions that passed when they should not.

Rework the tests to clarify each step in the lifecycle and rename CSI state
store functions for clarity.

05b99001