This project is mirrored from https://gitee.com/mirrors/nomad.git.
Pull mirroring failed.
Repository mirroring has been paused due to too many failed attempts. It can be resumed by a project maintainer.
- 18 Jun, 2021 5 commits
-
-
Seth Hoenig authored
This PR adds validation during job submission that Connect proxy upstreams within a task group use different listener addresses. Otherwise, a duplicate Envoy listener would be created and would fail to bind. Closes #7833
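The kind of check described above can be sketched as follows. This is only an illustration in Python, not the validation code from the PR: the dictionary keys mirror the upstream fields `destination_name`, `local_bind_address` and `local_bind_port`, and the `127.0.0.1` default address is an assumption made for the example.

```python
from collections import Counter


def duplicate_upstream_listeners(upstreams):
    """Return the listener addresses claimed by more than one upstream.

    Each upstream is a dict; the "127.0.0.1" default bind address is an
    assumption for this sketch, not a documented Nomad default.
    """
    counts = Counter(
        (u.get("local_bind_address", "127.0.0.1"), u["local_bind_port"])
        for u in upstreams
    )
    # A listener address used more than once cannot be bound twice.
    return sorted(addr for addr, n in counts.items() if n > 1)


upstreams = [
    {"destination_name": "api", "local_bind_port": 8080},
    {"destination_name": "db", "local_bind_port": 8080},
    {"destination_name": "cache", "local_bind_port": 9090},
]
print(duplicate_upstream_listeners(upstreams))
```

A job submission would be rejected whenever the returned list is non-empty.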
-
Seth Hoenig authored
e2e: fix a couple recent e2e bugs
-
Seth Hoenig authored
This PR changes the e2e helper to set the -detach option when registering a job with the CLI instead of the API. This is necessary for jobs which never become healthy, as the deployment never finishes for failing jobs and the command never returns, causing the test to time out after 10 minutes.
-
Seth Hoenig authored
This PR fixes a bug introduced in a refactoring https://github.com/hashicorp/nomad/pull/10764/files#diff-56b3c82fcbc857f8fb93a903f1610f6e6859b3610a4eddf92bad9ea27fdc85ec where task-level service checks would inherit the task name field when they shouldn't. Fixes #10781
-
Tim Gross authored
The `client/allocrunner` tests fail to compile on macOS because the CNI test file depends on the CNI network configurator, which is in a Linux-only file.
-
- 17 Jun, 2021 4 commits
-
-
Tim Gross authored
-
Seth Hoenig authored
consul/connect: in-place update service definition when connect upstreams are modified
-
Tim Gross authored
-
Tim Gross authored
-
- 16 Jun, 2021 4 commits
-
-
Seth Hoenig authored
This PR fixes a bug where modifying the upstreams of a Connect sidecar proxy would not result in Consul applying the changes, unless an additional change to the job triggered a task replacement (thus replacing the service definition). The fix is to check whether upstreams differ between Nomad's view of the sidecar service definition and the service definition for the sidecar that is actually registered in Consul. Fixes #8754
-
Tim Gross authored
When `network.mode = "bridge"`, we create a pause container in Docker with no networking so that we have a process to hold the network namespace we create in Nomad. The default `/etc/hosts` file of that pause container is then used for all the Docker tasks that share that network namespace. Some applications rely on this file being populated. This changeset generates a `/etc/hosts` file and bind-mounts it to the container when Nomad owns the network, so that the container's hostname has an IP in the file as expected. The hosts file will include the entries added by the Docker driver's `extra_hosts` field. In this changeset, only the Docker task driver will take advantage of this option, as the `exec`/`java` drivers currently copy the host's `/etc/hosts` file and this can't be changed without breaking backwards compatibility. But the fields are available in the task driver protobuf for community task drivers to use if they'd like.
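As a rough illustration of what generating such a file involves, the sketch below renders hosts-file content from a hostname, an IP, and `extra_hosts` entries. It is not the changeset's code: the function name `render_hosts` and the exact loopback lines are assumptions, and the `name:ip` entry format follows the Docker driver's `extra_hosts` convention.

```python
def render_hosts(hostname, ip, extra_hosts=()):
    """Render hosts-file content for a container (illustrative sketch).

    extra_hosts entries are assumed to use the "name:ip" form.
    """
    lines = [
        "127.0.0.1 localhost",
        "::1 localhost ip6-localhost ip6-loopback",
        # The container's own hostname gets an address, as applications expect.
        f"{ip} {hostname}",
    ]
    for entry in extra_hosts:
        # Split on the first colon only, so IPv6 addresses stay intact.
        name, _, addr = entry.partition(":")
        lines.append(f"{addr} {name}")
    return "\n".join(lines) + "\n"


print(render_hosts("web", "172.26.64.2", ["db:10.0.0.5"]), end="")
```

The rendered text would then be written to a file on the host and bind-mounted at `/etc/hosts` in the container.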
-
dependabot[bot] authored
Bumps [postcss](https://github.com/postcss/postcss) from 7.0.35 to 7.0.36.
- [Release notes](https://github.com/postcss/postcss/releases)
- [Changelog](https://github.com/postcss/postcss/blob/main/CHANGELOG.md)
- [Commits](https://github.com/postcss/postcss/compare/7.0.35...7.0.36)

---
updated-dependencies:
- dependency-name: postcss
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
-
dependabot[bot] authored
Bumps [ws](https://github.com/websockets/ws) from 7.3.1 to 7.4.6.
- [Release notes](https://github.com/websockets/ws/releases)
- [Commits](https://github.com/websockets/ws/compare/7.3.1...7.4.6)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
-
- 15 Jun, 2021 9 commits
-
-
Seth Hoenig authored
client/fingerprint/java: improve java version string regex matching
-
Seth Hoenig authored
This PR improves the regular expression used for matching the java version string, which varies a lot depending on the java vendor and version. These are the example strings we now test for:

    java version "1.7.0_80"
    openjdk version "11.0.1" 2018-10-16
    openjdk version "11.0.1" 2018-10-16
    java version "1.6.0_36"
    openjdk version "1.8.0_192"
    openjdk 11.0.11 2021-04-20 LTS

The last one is a new test added on behalf of #6081, which is still broken on today's CentOS 7 default JDK package.

    openjdk 11.0.11 2021-04-20 LTS
    OpenJDK Runtime Environment 18.9 (build 11.0.11+9-LTS)
    OpenJDK 64-Bit Server VM 18.9 (build 11.0.11+9-LTS, mixed mode, sharing)

    ==> Evaluation "21c6caf7" finished with status "complete" but failed to place all allocations:
        Task Group "example" (failed to place 1 allocation):
          * Constraint "${driver.java.version} >= 11.0.0": 1 nodes excluded by filter
        Evaluation "2b737d48" waiting for additional capacity to place r...
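To show why the vendor variations matter, here is a small Python sketch of a pattern that accepts the example strings listed above. The pattern `VERSION_RE` and the helper `java_version` are hypothetical and written for this illustration; they are not the expression used in Nomad's fingerprinter.

```python
import re

# Hypothetical pattern: an optional " version" word and optional quotes
# around the version number, covering the example strings above.
VERSION_RE = re.compile(r'^(?:java|openjdk)(?: version)? "?([0-9][0-9._]*)"?')


def java_version(first_line):
    """Extract the version from the first line of `java -version` output."""
    m = VERSION_RE.match(first_line.strip())
    return m.group(1) if m else None


for line in (
    'java version "1.7.0_80"',
    'openjdk version "11.0.1" 2018-10-16',
    "openjdk 11.0.11 2021-04-20 LTS",
):
    print(java_version(line))
```

The third input is the CentOS 7 style line, which has neither the word "version" nor quotes around the number.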
-
Seth Hoenig authored
consul: make failures_before_critical and success_before_passing work with group services
-
Seth Hoenig authored
-
Seth Hoenig authored
This PR fixes some job submission plumbing to make sure the Consul check parameters `failures_before_critical` and `success_before_passing` work with group-level services. They already work with task-level services.
-
Seth Hoenig authored
docs: update changelog
-
Seth Hoenig authored
-
James Rasell authored
plugins: fix test data race.
-
James Rasell authored
-
- 14 Jun, 2021 7 commits
-
-
Isabel Suchanek authored
System and batch jobs don't create deployments, which means Nomad tries to monitor a non-existent deployment when it runs a job and outputs an error message. This adds a check to make sure a deployment exists before monitoring. Also fixes some formatting. Co-authored-by: Tim Gross <tgross@hashicorp.com>
-
Mahmood Ali authored
Fix deployment watchers to avoid creating unnecessary deployment watcher goroutines and blocking queries. `deploymentWatcher.getAllocsCh` creates a new goroutine that makes a blocking query to fetch updates of deployment allocs.

## Background

When operators submit a new or updated service job, Nomad creates a new deployment by default. The deployment object controls how fast to place the allocations through [`max_parallel`](https://www.nomadproject.io/docs/job-specification/update#max_parallel) and health check configurations.

The `scheduler` and `deploymentwatcher` packages collaborate to achieve deployment logic: the scheduler only places the canaries and `max_parallel` allocations for a new deployment; the `deploymentwatcher` monitors alloc progress and then enqueues a new evaluation whenever the scheduler should reprocess a job and place the next `max_parallel` round of allocations.

The `deploymentwatcher` package makes blocking queries against the state store to fetch all deployments and the relevant allocs for each running deployment. If `deploymentwatcher` fails or is hindered from fetching the state, the deployments fail to make progress.

`deploymentwatcher` logic only runs on the leader.

## Why unnecessary deployment watchers can halt cluster progress

Previously, `getAllocsCh` was called on every for-loop iteration in the `deploymentWatcher.watch()` function. However, the for-loop may iterate many times before the allocs get updated. In fact, whenever a new deployment is created/updated/deleted, *all* `deploymentWatcher`s get notified through `w.deploymentUpdateCh`. The `getAllocsCh` goroutines and blocking queries spike significantly and grow quadratically with respect to the number of running deployments. The growth leads to two adverse outcomes:

1. it spikes the CPU/memory usage, potentially leading to OOM or very slow processing
2. it activates the [query rate limiter](https://github.com/hashicorp/nomad/blob/abaa9c5c5bd09af774fda30d76d5767b06128df4/nomad/deploymentwatcher/deployment_watcher.go#L896-L898), so later the watcher fails to get updates and consequently fails to make progress towards placing new allocations for the deployment!

So the cluster fails to catch up and fails to make progress in almost all deployments. The cluster recovers after a leader transition: the deposed leader stops all watchers and frees up goroutines and blocking queries; the new leader recreates the watchers without the quadratic growth and remains under the rate limiter. Well, until a spike of deployments is created, triggering the condition again.

### Relevant Code References

Path for deployment monitoring:

* [`Watcher.watchDeployments`](https://github.com/hashicorp/nomad/blob/abaa9c5c5bd09af774fda30d76d5767b06128df4/nomad/deploymentwatcher/deployments_watcher.go#L164-L192) loops waiting for deployment updates.
* On every deployment update, [`w.getDeploys`](https://github.com/hashicorp/nomad/blob/abaa9c5c5bd09af774fda30d76d5767b06128df4/nomad/deploymentwatcher/deployments_watcher.go#L194-L229) returns all deployments in the system.
* `watchDeployments` calls `w.add(d)` on every active deployment
* which in turn [updates the existing watcher if one is found](https://github.com/hashicorp/nomad/blob/abaa9c5c5bd09af774fda30d76d5767b06128df4/nomad/deploymentwatcher/deployments_watcher.go#L251-L255).
* The deployment watcher [updates the local deployment field and triggers the `deploymentUpdateCh` channel](https://github.com/hashicorp/nomad/blob/abaa9c5c5bd09af774fda30d76d5767b06128df4/nomad/deploymentwatcher/deployment_watcher.go#L136-L147)
* The [deployment watcher `deploymentUpdateCh` selector is activated](https://github.com/hashicorp/nomad/blob/abaa9c5c5bd09af774fda30d76d5767b06128df4/nomad/deploymentwatcher/deployment_watcher.go#L455-L489). Most of the time the selector clause is a no-op, because the flow was triggered by another deployment update.
* The `watch` for-loop iterates again, and in the previous code we created yet another goroutine and blocking call that risks being rate limited.

Co-authored-by: Tim Gross <tgross@hashicorp.com>
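The quadratic growth described in this commit message can be made concrete with a toy count. The Python model below is an assumption-laden sketch, not Nomad's Go code: it supposes that each deployment is updated once, that every update wakes every watcher, and that the fix amounts to a watcher keeping the alloc query it already has pending. The name `queries_started` is hypothetical.

```python
def queries_started(num_deployments, reuse_pending):
    """Count blocking queries started when each deployment is updated once.

    Toy model: every update wakes all watchers. A watcher either starts a
    new alloc query on each wake-up (the behaviour described as the bug) or
    keeps the query it already has pending (assumed shape of the fix).
    """
    pending = [False] * num_deployments
    started = 0
    for _update in range(num_deployments):
        for watcher in range(num_deployments):
            if reuse_pending and pending[watcher]:
                continue  # the existing query is still waiting for allocs
            pending[watcher] = True
            started += 1
    return started


print(queries_started(100, reuse_pending=False))  # grows as n * n
print(queries_started(100, reuse_pending=True))   # grows as n
```

In this model no alloc updates arrive, which matches the case in the message where the loop iterates many times before the allocs change.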
-
Seth Hoenig authored
consul/connect: remove unnecessary connect constraint on clients
-
James Rasell authored
volumewatcher: fix test data race.
-
Tim Gross authored
The `QuotaIterator` is used as the source of nodes passed into feasibility checking for constraints. Every node that passes the quota check counts the allocation resources against the quota, and as a result we count nodes which will later be filtered out by constraints. Therefore, for jobs with constraints, nodes that are feasibility checked but fail have been counted against quotas. This failure mode is order dependent; if all the unfiltered nodes happen to be quota checked first, everything works as expected. This changeset moves the `QuotaIterator` to happen last among all feasibility checkers (but before ranking). The `QuotaIterator` will never receive filtered nodes, so it will calculate quotas correctly.
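The order dependence is easier to see in a toy model. The Python sketch below is a deliberate simplification and not Nomad's scheduler: the function `place`, the one-unit-per-node accounting, and the node names are all hypothetical. It only reproduces the property described above, that a node passing the quota check is counted whether or not a later constraint filters it out.

```python
def place(nodes, constraint, quota_limit, quota_first):
    """Return the first node that passes all checks, or None.

    Toy model: each node that passes the quota check consumes one unit of
    quota, even if a later check rejects the node.
    """
    counted = 0

    def quota_ok(_node):
        nonlocal counted
        if counted + 1 > quota_limit:
            return False
        counted += 1
        return True

    # Checks run in order and stop at the first failure.
    checks = [quota_ok, constraint] if quota_first else [constraint, quota_ok]
    for node in nodes:
        if all(check(node) for check in checks):
            return node
    return None


def is_arm(node):
    return node.startswith("arm")


nodes = ["amd-1", "arm-1"]
print(place(nodes, is_arm, quota_limit=1, quota_first=True))   # quota spent on amd-1
print(place(nodes, is_arm, quota_limit=1, quota_first=False))  # arm-1 is placed
```

With the node order reversed, the quota-first variant also succeeds, which is the order dependence the commit message points out.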
-
Seth Hoenig authored
PR https://github.com/hashicorp/nomad/pull/10702 added 2 new constraints for connect jobs - one for Consul gRPC listener, and one for Connect being enabled on Clients. Connect does not need to be enabled on clients, only on Consul servers. Remove the extra constraint. Discuss: https://discuss.hashicorp.com/t/nomad-1-1-1-and-consul-connect-enabled-on-consul-clients/25295
-
James Rasell authored
-
- 11 Jun, 2021 11 commits
-
-
Brandon Romano authored
Fix headshot image 404
-
Brandon Romano authored
-
Luiz Aoqui authored
-
James Rasell authored
deploymentwatcher: fix test data race.
-
James Rasell authored
chore: remove duplicate import statements
-
Mahmood Ali authored
Deflaking Test 2021 June edition
-
James Rasell authored
core: remove unused types pkg and PeriodicCallback type.
-
James Rasell authored
-
James Rasell authored
-
James Rasell authored
-
James Rasell authored
-