This project is mirrored from https://gitee.com/mirrors/nomad.git.
Pull mirroring failed.
Repository mirroring has been paused due to too many failed attempts. It can be resumed by a project maintainer.
- 24 Apr, 2019 1 commit
-
-
Preetha authored
Docs ea update 0.9
-
- 23 Apr, 2019 9 commits
-
-
Michael Schurter authored
docs: add download link to 0.9.1-rc1
-
Michael Schurter authored
-
Nick Ethier authored
website: add plugin docs
-
Nick Ethier authored
-
Mahmood Ali authored
fix crash when executor parent nomad process dies
-
Mahmood Ali authored
Fixes https://github.com/hashicorp/nomad/issues/5593

The executor seems to die unexpectedly after the nomad agent dies or is restarted. The crash occurs at the first log message written after the nomad agent dies.

To ease debugging we forward executor log messages to executor.log as well as to Stderr. `go-plugin` sets up plugins with Stderr pointing to a pipe read by the plugin client, the nomad agent in our case [1]. When the nomad agent dies, the pipe is closed, and any subsequent executor log write fails with ErrClosedPipe and a SIGPIPE signal. SIGPIPE results in the executor process dying.

I considered adding a handler to ignore SIGPIPE, but the hc-log library currently panics when a logging write operation fails [2]. Thus we opt to revert to the v0.8 behavior of exclusively writing logs to executor.log while we investigate alternative options.

[1] https://github.com/hashicorp/nomad/blob/v0.9.0/vendor/github.com/hashicorp/go-plugin/client.go#L528-L535
[2] https://github.c...
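Routing executor output to a file rather than the agent-owned Stderr pipe is what avoids the fatal SIGPIPE. A minimal sketch, assuming a hypothetical `newExecutorLogger` helper and a task-directory path, of building a go-hclog logger that writes only to executor.log:

```go
package executor

import (
	"os"
	"path/filepath"

	hclog "github.com/hashicorp/go-hclog"
)

// newExecutorLogger is a hypothetical helper: it opens executor.log in the
// task directory and builds an hclog.Logger that writes only to that file,
// never to Stderr (which go-plugin connects to a pipe read by the agent).
func newExecutorLogger(taskDir string) (hclog.Logger, *os.File, error) {
	f, err := os.OpenFile(filepath.Join(taskDir, "executor.log"),
		os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0644)
	if err != nil {
		return nil, nil, err
	}
	logger := hclog.New(&hclog.LoggerOptions{
		Name:   "executor",
		Level:  hclog.Debug,
		Output: f, // if the agent dies, writes here still succeed: no SIGPIPE
	})
	return logger, f, nil
}
```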
-
Danielle Lancashire authored
-
Danielle authored
alloc-lifecycle: nomad alloc stop
-
Danielle Lancashire authored
This adds a `nomad alloc stop` command that can be used to stop an allocation and force-migrate it to a different node. It is built on top of the AllocUpdateDesiredTransitionRequest and explicitly limits the scope of access to that transition so it can be exposed under the alloc-lifecycle ACL. The API returns the follow-up eval, which can be used for monitoring in the CLI or parsed and used by an external tool.
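A minimal sketch of driving the same operation from the Go `api` client; the `Stop` method and the `EvalID` field on its response are assumed here to match the endpoint described above:

```go
package main

import (
	"fmt"
	"log"

	"github.com/hashicorp/nomad/api"
)

func main() {
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// "b1b2f9e1" is a placeholder allocation ID.
	alloc, _, err := client.Allocations().Info("b1b2f9e1", nil)
	if err != nil {
		log.Fatal(err)
	}

	// Stop the allocation; the response is assumed to carry the follow-up
	// evaluation ID, which can be fed into eval monitoring.
	resp, err := client.Allocations().Stop(alloc, nil)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("follow-up eval:", resp.EvalID)
}
```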
-
- 22 Apr, 2019 7 commits
-
-
Chris Baker authored
changelog: added entry for #5540 fix
-
Michael Schurter authored
docs: bump deployment guide to 0.9.0
-
Chris Baker authored
-
Chris Baker authored
client/metrics: fixed stale metrics
-
Mahmood Ali authored
logging: Attempt to recover logmon failures
-
Michael Schurter authored
Co-Authored-By: notnoop <mahmood@notnoop.com>
-
Chris Baker authored
client/metrics: modified metrics to use (updated) client copy of allocation instead of (unupdated) server copy
-
- 19 Apr, 2019 15 commits
-
-
Michael Schurter authored
-
Michael Schurter authored
fix nil pointer in fingerprinting AWS env leading to crash
-
Mahmood Ali authored
Use upstream libcontainer package
-
Mahmood Ali authored
-
Mahmood Ali authored
-
Mahmood Ali authored
-
Mahmood Ali authored
-
Mahmood Ali authored
client: wait for batched driver updates before registering nodes
-
Mahmood Ali authored
-
Mahmood Ali authored
Noticed that the `detected drivers` log line was misleading: when a driver doesn't fingerprint before the timeout, its health status is the empty string `""`, which we would mark as detected. Now we log all drivers along with their state to ease debugging of driver fingerprinting.
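A hedged sketch of the logging change, with illustrative names and health-state strings, showing the empty string treated as "not yet fingerprinted" rather than "detected":

```go
package fingerprint

import (
	hclog "github.com/hashicorp/go-hclog"
)

// logDriverStates is an illustrative helper (names and state strings assumed):
// rather than a single "detected drivers" list that counted an empty health
// string as detected, log every driver with its current state so fingerprint
// problems are visible.
func logDriverStates(logger hclog.Logger, health map[string]string) {
	detected := make([]string, 0, len(health))
	states := make(map[string]string, len(health))
	for name, h := range health {
		if h == "" {
			h = "undetected" // never fingerprinted before the timeout
		}
		states[name] = h
		if h == "healthy" {
			detected = append(detected, name)
		}
	}
	logger.Debug("detected drivers", "drivers", detected)
	logger.Debug("driver health states", "states", states)
}
```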
-
Mahmood Ali authored
I noticed that `watchNodeUpdates()` calls `retryRegisterNode()` almost immediately after `registerAndHeartbeat()`, well within 5 seconds. This call is unnecessary and made debugging a bit harder. So here, we ensure that we only re-register the node for new node events, not for the initial registration.
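A rough sketch of the guard described above, using placeholder types rather than the real client internals:

```go
package client

// nodeRegistrar is an illustrative stand-in for the client; the real code
// lives in client/client.go and uses different types.
type nodeRegistrar struct {
	registered bool
}

func (r *nodeRegistrar) retryRegisterNode() { /* re-register with servers */ }

// watchNodeUpdates sketches the fix: skip the re-registration triggered by
// the initial registration itself and only re-register on later node changes.
func (r *nodeRegistrar) watchNodeUpdates(changes <-chan struct{}) {
	for range changes {
		if !r.registered {
			// The first notification corresponds to the initial registration
			// performed by registerAndHeartbeat; nothing to do here.
			r.registered = true
			continue
		}
		r.retryRegisterNode()
	}
}
```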
-
Preetha authored
-
Mahmood Ali authored
Here we retain the 0.8.7 behavior of waiting for driver fingerprints before registering a node, with some timeout. This is needed for system jobs, as system job scheduling for a node occurs at node registration, and the race might mean that a system job does not get placed on the node because of missing drivers. The timeout isn't strictly necessary, but it is raised to 1 minute as that is closer to "blocked indefinitely" than 1 second. We need to keep the value high enough to capture as many drivers/devices as possible, but low enough that it doesn't risk blocking too long due to a misbehaving plugin. Fixes https://github.com/hashicorp/nomad/issues/5579
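A minimal sketch of the wait-with-timeout described above, using an illustrative channel and the one-minute deadline:

```go
package client

import (
	"log"
	"time"
)

// registerAfterFingerprint sketches the described behaviour: block node
// registration until the initial driver/device fingerprint batch finishes,
// but give up after a timeout so a misbehaving plugin can't block forever.
// The channel, callback, and names are illustrative.
func registerAfterFingerprint(fingerprinted <-chan struct{}, register func()) {
	select {
	case <-fingerprinted:
		// All drivers/devices have reported at least once.
	case <-time.After(1 * time.Minute):
		log.Println("timed out waiting for driver fingerprints; registering anyway")
	}
	register()
}
```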
-
Yorick Gersie authored
The HTTP client returns a nil response if an error has occurred, so we first need to check for an error before we can check the HTTP response code.
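A small sketch of the corrected ordering, with a placeholder URL parameter: the error check must come first, because `resp` is nil whenever `err` is non-nil:

```go
package checks

import (
	"fmt"
	"net/http"
)

// checkEndpoint shows the ordering the fix enforces: when the HTTP client
// returns an error, resp is nil, so the error must be checked before
// touching resp.StatusCode.
func checkEndpoint(url string) error {
	resp, err := http.Get(url)
	if err != nil {
		return err // resp is nil here; reading resp.StatusCode would panic
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status: %d", resp.StatusCode)
	}
	return nil
}
```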
-
Preetha authored
Add preemption related fields to AllocationListStub
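An illustrative, trimmed-down version of what the stub might look like with those fields; the field names follow the full Allocation struct and are assumed here:

```go
package structs

// AllocationListStub (illustrative, not the full definition): the preemption
// fields added by this change record which allocations this one preempted,
// and which allocation, if any, preempted it.
type AllocationListStub struct {
	ID            string
	Name          string
	NodeID        string
	JobID         string
	DesiredStatus string
	ClientStatus  string

	// Preemption-related fields (names assumed to mirror Allocation).
	PreemptedAllocations  []string
	PreemptedByAllocation string
}
```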
-
- 18 Apr, 2019 5 commits
-
-
Preetha Appan authored
-
Danielle authored
Switch to pre-0.9 behaviour for handling volumes
-
Danielle authored
docs: Clarify docker volume behaviour
-
Danielle Lancashire authored
In Nomad 0.9, we made volume driver handling the same for `""` and `"local"` volumes. Prior to Nomad 0.9, however, these had slightly different behaviour for relative paths and named volumes. Before 0.9, the empty string would expand relative paths within the task dir, and `"local"` volumes that are not absolute paths would be treated as Docker named volumes. This commit reverts to the previous behaviour as follows:

| Nomad Version | Driver    | Volume Spec      | Behaviour                 |
|---------------|-----------|------------------|---------------------------|
| all           | `""`      | testing:/testing | allocdir/testing          |
| 0.8.7         | `"local"` | testing:/testing | "testing" as named volume |
| 0.9.0         | `"local"` | testing:/testing | allocdir/testing          |
| 0.9.1         | `"local"` | testing:/testing | "testing" as named volume |
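A hedged sketch of the restored branching, with an illustrative function signature; see the table above for the resulting behaviour:

```go
package docker

import "path/filepath"

// expandVolumeSource sketches the restored pre-0.9 rules for docker volumes:
// with an empty driver, relative host paths are expanded inside the
// allocation/task directory; with driver "local", a non-absolute source is
// passed through unchanged so Docker treats it as a named volume. Names and
// signature are illustrative, not the real implementation.
func expandVolumeSource(driver, source, allocDir string) string {
	switch driver {
	case "":
		if !filepath.IsAbs(source) {
			return filepath.Join(allocDir, source)
		}
	case "local":
		// Leave relative sources alone: Docker interprets "testing" in
		// "testing:/testing" as a named volume.
	}
	return source
}
```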
-
Danielle Lancashire authored
Currently, when logmon fails to reattach, we will retry reattachment to the same pid until the task restart specification is exhausted. Because we cannot clear hook state during error conditions, it is not possible for us to signal to a future restart that it _shouldn't_ attempt to reattach to the plugin.

Here we revert to explicitly detecting reattachment separately from the launch of a new logmon, so we can recover from scenarios where a logmon plugin has failed. This is a net improvement over the current hard-failure situation, as it means that in the most common case (the pid has gone away) we can recover.

Other reattachment failure modes, where the plugin may still be running, could potentially cause a duplicate process or a subsequent failure to launch a new plugin. If there were a duplicate process, it could potentially cause duplicate logging, which is better than a production workload outage. If there were a subsequent failure to launch a new plugin, it would fail in the same way (retry until restarts are exhausted) as the current failure mode.
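A rough sketch of the reattach-then-fallback flow using go-plugin; the handshake config, plugin map, and launch command are placeholders:

```go
package logmon

import (
	"os/exec"

	plugin "github.com/hashicorp/go-plugin"
)

// connectLogmon sketches the recovery path described above: first try to
// reattach to an existing logmon process, and if that fails (for example the
// pid is gone), fall back to launching a fresh plugin instead of retrying the
// stale reattach config until restarts are exhausted.
func connectLogmon(reattach *plugin.ReattachConfig, handshake plugin.HandshakeConfig,
	plugins map[string]plugin.Plugin) (*plugin.Client, error) {

	if reattach != nil {
		c := plugin.NewClient(&plugin.ClientConfig{
			HandshakeConfig: handshake,
			Plugins:         plugins,
			Reattach:        reattach,
		})
		if _, err := c.Client(); err == nil {
			return c, nil // reattached to the running logmon
		}
		c.Kill() // clean up the failed reattach attempt
	}

	// Reattach failed or there was no previous state: start a new plugin.
	c := plugin.NewClient(&plugin.ClientConfig{
		HandshakeConfig: handshake,
		Plugins:         plugins,
		Cmd:             exec.Command("nomad", "logmon"), // placeholder command
	})
	if _, err := c.Client(); err != nil {
		c.Kill()
		return nil, err
	}
	return c, nil
}
```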
-
- 17 Apr, 2019 3 commits
-
-
Chris Baker authored
list singularity as a community driver
-
Charlie Voiselle authored
-
Danielle Lancashire authored
-