This project is mirrored from https://gitee.com/mirrors/nomad.git.
Pull mirroring failed.
Repository mirroring has been paused due to too many failed attempts. It can be resumed by a project maintainer.
- 24 Apr, 2019 1 commit
-
-
Preetha authored
Docs ea update 0.9
-
- 23 Apr, 2019 9 commits
-
-
Michael Schurter authored
docs: add download link to 0.9.1-rc1
-
Michael Schurter authored
-
Nick Ethier authored
website: add plugin docs
-
Nick Ethier authored
-
Mahmood Ali authored
fix crash when executor parent nomad process dies
-
Mahmood Ali authored
Fixes https://github.com/hashicorp/nomad/issues/5593

The executor seems to die unexpectedly after the nomad agent dies or is restarted. The crash occurs at the first log message written after the nomad agent dies.

To ease debugging we forward executor log messages to executor.log as well as to Stderr. `go-plugin` sets up plugins with Stderr pointing to a pipe read by the plugin client, the nomad agent in our case [1]. When the nomad agent dies, the pipe is closed, and any subsequent executor log write fails with ErrClosedPipe and a SIGPIPE signal. SIGPIPE results in the executor process dying.

I considered adding a handler to ignore SIGPIPE, but the hc-log library currently panics when a logging write operation fails [2]. Thus we opt to revert to the v0.8 behavior of exclusively writing logs to executor.log while we investigate alternative options.

[1] https://github.com/hashicorp/nomad/blob/v0.9.0/vendor/github.com/hashicorp/go-plugin/client.go#L528-L535
[2] https://github.c...
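Routing executor output to a file rather than the agent-owned Stderr pipe is what avoids the fatal SIGPIPE. A minimal sketch, assuming a hypothetical `newExecutorLogger` helper and a task-directory path, of building a go-hclog logger that writes only to executor.log:

```go
package executor

import (
	"os"
	"path/filepath"

	hclog "github.com/hashicorp/go-hclog"
)

// newExecutorLogger is a hypothetical helper: it opens executor.log in the
// task directory and builds an hclog.Logger that writes only to that file,
// never to Stderr (which go-plugin connects to a pipe read by the agent).
func newExecutorLogger(taskDir string) (hclog.Logger, *os.File, error) {
	f, err := os.OpenFile(filepath.Join(taskDir, "executor.log"),
		os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0644)
	if err != nil {
		return nil, nil, err
	}
	logger := hclog.New(&hclog.LoggerOptions{
		Name:   "executor",
		Level:  hclog.Debug,
		Output: f, // if the agent dies, writes here still succeed: no SIGPIPE
	})
	return logger, f, nil
}
```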
-
Danielle Lancashire authored
-
Danielle authored
alloc-lifecycle: nomad alloc stop
-
Danielle Lancashire authored
This adds a `nomad alloc stop` command that can be used to stop an allocation and force-migrate it to a different node. It is built on top of the AllocUpdateDesiredTransitionRequest and explicitly limits the scope of access to that transition so it can be exposed under the alloc-lifecycle ACL. The API returns the follow-up eval, which can be used for monitoring in the CLI or parsed and used by an external tool.
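A minimal sketch of driving the same operation from the Go `api` client; the `Stop` method and the `EvalID` field on its response are assumed here to match the endpoint described above:

```go
package main

import (
	"fmt"
	"log"

	"github.com/hashicorp/nomad/api"
)

func main() {
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// "b1b2f9e1" is a placeholder allocation ID.
	alloc, _, err := client.Allocations().Info("b1b2f9e1", nil)
	if err != nil {
		log.Fatal(err)
	}

	// Stop the allocation; the response is assumed to carry the follow-up
	// evaluation ID, which can be fed into eval monitoring.
	resp, err := client.Allocations().Stop(alloc, nil)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("follow-up eval:", resp.EvalID)
}
```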
-
- 22 Apr, 2019 7 commits
-
-
Chris Baker authored
changelog: added entry for #5540 fix
-
Michael Schurter authored
docs: bump deployment guide to 0.9.0
-
Chris Baker authored
-
Chris Baker authored
client/metrics: fixed stale metrics
-
Mahmood Ali authored
logging: Attempt to recover logmon failures
-
Michael Schurter authored
Co-Authored-By: notnoop <mahmood@notnoop.com>
-
Chris Baker authored
client/metrics: modified metrics to use (updated) client copy of allocation instead of (unupdated) server copy
-
- 19 Apr, 2019 15 commits
-
-
Michael Schurter authored
-
Michael Schurter authored
fix nil pointer in fingerprinting AWS env leading to crash
-
Mahmood Ali authored
Use upstream libcontainer package
-
Mahmood Ali authored
-
Mahmood Ali authored
-
Mahmood Ali authored
-
Mahmood Ali authored
-
Mahmood Ali authored
client: wait for batched driver updates before registering nodes
-
Mahmood Ali authored
-
Mahmood Ali authored
Noticed that the `detected drivers` log line was misleading: when a driver doesn't fingerprint before the timeout, its health status is the empty string `""`, which we would mark as detected. Now we log all drivers along with their state to ease debugging of driver fingerprinting.
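A hedged sketch of the logging change, with illustrative names and health-state strings, showing the empty string treated as "not yet fingerprinted" rather than "detected":

```go
package fingerprint

import (
	hclog "github.com/hashicorp/go-hclog"
)

// logDriverStates is an illustrative helper (names and state strings assumed):
// rather than a single "detected drivers" list that counted an empty health
// string as detected, log every driver with its current state so fingerprint
// problems are visible.
func logDriverStates(logger hclog.Logger, health map[string]string) {
	detected := make([]string, 0, len(health))
	states := make(map[string]string, len(health))
	for name, h := range health {
		if h == "" {
			h = "undetected" // never fingerprinted before the timeout
		}
		states[name] = h
		if h == "healthy" {
			detected = append(detected, name)
		}
	}
	logger.Debug("detected drivers", "drivers", detected)
	logger.Debug("driver health states", "states", states)
}
```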
-
Mahmood Ali authored
I noticed that `watchNodeUpdates()` calls `retryRegisterNode()` almost immediately after `registerAndHeartbeat()`, well within 5 seconds. This call is unnecessary and made debugging a bit harder. So here, we ensure that we only re-register the node for new node events, not for the initial registration.
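A rough sketch of the guard described above, using placeholder types rather than the real client internals:

```go
package client

// nodeRegistrar is an illustrative stand-in for the client; the real code
// lives in client/client.go and uses different types.
type nodeRegistrar struct {
	registered bool
}

func (r *nodeRegistrar) retryRegisterNode() { /* re-register with servers */ }

// watchNodeUpdates sketches the fix: skip the re-registration triggered by
// the initial registration itself and only re-register on later node changes.
func (r *nodeRegistrar) watchNodeUpdates(changes <-chan struct{}) {
	for range changes {
		if !r.registered {
			// The first notification corresponds to the initial registration
			// performed by registerAndHeartbeat; nothing to do here.
			r.registered = true
			continue
		}
		r.retryRegisterNode()
	}
}
```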
-
Preetha authored
-
Mahmood Ali authored
Here we retain the 0.8.7 behavior of waiting for driver fingerprints before registering a node, with some timeout. This is needed for system jobs, as system job scheduling for a node occurs at node registration, and the race might mean that a system job does not get placed on the node because of missing drivers. The timeout isn't strictly necessary, but it is raised to 1 minute as that is closer to "blocked indefinitely" than 1 second. We need to keep the value high enough to capture as many drivers/devices as possible, but low enough that it doesn't risk blocking too long due to a misbehaving plugin. Fixes https://github.com/hashicorp/nomad/issues/5579
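A minimal sketch of the wait-with-timeout described above, using an illustrative channel and the one-minute deadline:

```go
package client

import (
	"log"
	"time"
)

// registerAfterFingerprint sketches the described behaviour: block node
// registration until the initial driver/device fingerprint batch finishes,
// but give up after a timeout so a misbehaving plugin can't block forever.
// The channel, callback, and names are illustrative.
func registerAfterFingerprint(fingerprinted <-chan struct{}, register func()) {
	select {
	case <-fingerprinted:
		// All drivers/devices have reported at least once.
	case <-time.After(1 * time.Minute):
		log.Println("timed out waiting for driver fingerprints; registering anyway")
	}
	register()
}
```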
-
Yorick Gersie authored
The HTTP client returns a nil response if an error has occurred, so we first need to check for an error before we can check the HTTP response code.
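A small sketch of the corrected ordering, with a placeholder URL parameter: the error check must come first, because `resp` is nil whenever `err` is non-nil:

```go
package checks

import (
	"fmt"
	"net/http"
)

// checkEndpoint shows the ordering the fix enforces: when the HTTP client
// returns an error, resp is nil, so the error must be checked before
// touching resp.StatusCode.
func checkEndpoint(url string) error {
	resp, err := http.Get(url)
	if err != nil {
		return err // resp is nil here; reading resp.StatusCode would panic
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status: %d", resp.StatusCode)
	}
	return nil
}
```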
-
Preetha authored
Add preemption related fields to AllocationListStub
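An illustrative, trimmed-down version of what the stub might look like with those fields; the field names follow the full Allocation struct and are assumed here:

```go
package structs

// AllocationListStub (illustrative, not the full definition): the preemption
// fields added by this change record which allocations this one preempted,
// and which allocation, if any, preempted it.
type AllocationListStub struct {
	ID            string
	Name          string
	NodeID        string
	JobID         string
	DesiredStatus string
	ClientStatus  string

	// Preemption-related fields (names assumed to mirror Allocation).
	PreemptedAllocations  []string
	PreemptedByAllocation string
}
```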
-
- 18 Apr, 2019 5 commits
-
-
Preetha Appan authored
-
Danielle authored
Switch to pre-0.9 behaviour for handling volumes
-
Danielle authored
docs: Clarify docker volume behaviour
-
Danielle Lancashire authored
In Nomad 0.9, we made volume driver handling the same for `""` and `"local"` volumes. Prior to Nomad 0.9, however, these had slightly different behaviour for relative paths and named volumes. Before 0.9, the empty string would expand relative paths within the task dir, and `"local"` volumes that are not absolute paths would be treated as Docker named volumes. This commit reverts to the previous behaviour as follows:

| Nomad Version | Driver    | Volume Spec      | Behaviour                 |
|---------------|-----------|------------------|---------------------------|
| all           | `""`      | testing:/testing | allocdir/testing          |
| 0.8.7         | `"local"` | testing:/testing | "testing" as named volume |
| 0.9.0         | `"local"` | testing:/testing | allocdir/testing          |
| 0.9.1         | `"local"` | testing:/testing | "testing" as named volume |
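A hedged sketch of the restored branching, with an illustrative function signature; see the table above for the resulting behaviour:

```go
package docker

import "path/filepath"

// expandVolumeSource sketches the restored pre-0.9 rules for docker volumes:
// with an empty driver, relative host paths are expanded inside the
// allocation/task directory; with driver "local", a non-absolute source is
// passed through unchanged so Docker treats it as a named volume. Names and
// signature are illustrative, not the real implementation.
func expandVolumeSource(driver, source, allocDir string) string {
	switch driver {
	case "":
		if !filepath.IsAbs(source) {
			return filepath.Join(allocDir, source)
		}
	case "local":
		// Leave relative sources alone: Docker interprets "testing" in
		// "testing:/testing" as a named volume.
	}
	return source
}
```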
-
Danielle Lancashire authored
Currently, when logmon fails to reattach, we will retry reattachment to the same pid until the task restart specification is exhausted. Because we cannot clear hook state during error conditions, it is not possible for us to signal to a future restart that it _shouldn't_ attempt to reattach to the plugin.

Here we revert to explicitly detecting reattachment separately from the launch of a new logmon, so we can recover from scenarios where a logmon plugin has failed. This is a net improvement over the current hard-failure situation, as it means that in the most common case (the pid has gone away) we can recover.

Other reattachment failure modes, where the plugin may still be running, could potentially cause a duplicate process or a subsequent failure to launch a new plugin. If there were a duplicate process, it could potentially cause duplicate logging, which is better than a production workload outage. If there were a subsequent failure to launch a new plugin, it would fail in the same way (retry until restarts are exhausted) as the current failure mode.
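A rough sketch of the reattach-then-fallback flow using go-plugin; the handshake config, plugin map, and launch command are placeholders:

```go
package logmon

import (
	"os/exec"

	plugin "github.com/hashicorp/go-plugin"
)

// connectLogmon sketches the recovery path described above: first try to
// reattach to an existing logmon process, and if that fails (for example the
// pid is gone), fall back to launching a fresh plugin instead of retrying the
// stale reattach config until restarts are exhausted.
func connectLogmon(reattach *plugin.ReattachConfig, handshake plugin.HandshakeConfig,
	plugins map[string]plugin.Plugin) (*plugin.Client, error) {

	if reattach != nil {
		c := plugin.NewClient(&plugin.ClientConfig{
			HandshakeConfig: handshake,
			Plugins:         plugins,
			Reattach:        reattach,
		})
		if _, err := c.Client(); err == nil {
			return c, nil // reattached to the running logmon
		}
		c.Kill() // clean up the failed reattach attempt
	}

	// Reattach failed or there was no previous state: start a new plugin.
	c := plugin.NewClient(&plugin.ClientConfig{
		HandshakeConfig: handshake,
		Plugins:         plugins,
		Cmd:             exec.Command("nomad", "logmon"), // placeholder command
	})
	if _, err := c.Client(); err != nil {
		c.Kill()
		return nil, err
	}
	return c, nil
}
```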
-
- 17 Apr, 2019 3 commits
-
-
Chris Baker authored
list singularity as a community driver
-
Charlie Voiselle authored
-
Danielle Lancashire authored
-