This project is mirrored from https://gitee.com/mirrors/nomad.git.
Pull mirroring failed .
Repository mirroring has been paused due to too many failed attempts. It can be resumed by a project maintainer.
Repository mirroring has been paused due to too many failed attempts. It can be resumed by a project maintainer.
- 28 Jan, 2022 6 commits
-
-
Jai Bhagat authored
To support pagination on evaluations queries.
-
Jai Bhagat authored
-
Jai Bhagat authored
-
Jai Bhagat authored
-
Jai Bhagat authored
-
Tim Gross authored
When an allocation stops, the `csi_hook` makes an unpublish RPC to the servers to unpublish via the CSI RPCs: first to the node plugins and then the controller plugins. The controller RPCs must happen after the node RPCs so that the node has had a chance to unmount the volume before the controller tries to detach the associated device. But the client has local access to the node plugins and can independently determine if it's safe to send unpublish RPC to those plugins. This will allow the server to treat the node plugin as abandoned if a client is disconnected and `stop_on_client_disconnect` is set. This will let the server try to send unpublish RPCs to the controller plugins, under the assumption that the client will be trying to unmount the volume on its end first. Note that the CSI `NodeUnpublishVolume`/`NodeUnstageVolume` RPCs can return ignorable errors in the case where the volume has already been unmounted from the node. Handle all other errors by retrying until we get success so as to give operators the opportunity to reschedule a failed node plugin (ex. in the case where they accidentally drained a node without `-ignore-system`). Fan-out the work for each volume into its own goroutine so that we can release a subset of volumes if only one is stuck.
-
- 27 Jan, 2022 8 commits
-
-
Jai authored
ui: test tooling
-
Seth Hoenig authored
client: change test to not poke cgroupv2 edge case
-
Tim Gross authored
* The volume claim GC method and volumewatcher both have logic collecting terminal allocations that duplicates most of the logic that's now in the state store's `CSIVolumeDenormalize` method. Copy this logic into the state store so that all code paths have the same view of the past claims. * Remove logic in the volume claim GC that now lives in the state store's `CSIVolumeDenormalize` method. * Remove logic in the volumewatcher that now lives in the state store's `CSIVolumeDenormalize` method. * Remove logic in the node unpublish RPC that now lives in the state store's `CSIVolumeDenormalize` method.
-
Tim Gross authored
In the client's `(*csiHook) Postrun()` method, we make an unpublish RPC that includes a claim in the `CSIVolumeClaimStateUnpublishing` state and using the mode from the client. But then in the `(*CSIVolume) Unpublish` RPC handler, we query the volume from the state store (because we only get an ID from the client). And when we make the client RPC for the node unpublish step, we use the _current volume's_ view of the mode. If the volume's mode has been changed before the old allocations can have their claims released, then we end up making a CSI RPC that will never succeed. Why does this code path get the mode from the volume and not the claim? Because the claim written by the GC job in `(*CoreScheduler) csiVolumeClaimGC` doesn't have a mode. Instead it just writes a claim in the unpublishing state to ensure the volumewatcher detects a "past claim" change and reaps all the claims on the volumes. Fix this by ensuring that the `CSIVolumeDenormalize` creates past claims for all nil allocations with a correct access mode set.
-
Tim Gross authored
* csi: resolve invalid claim states on read It's currently possible for CSI volumes to be claimed by allocations that no longer exist. This changeset asserts a reasonable state at the state store level by registering these nil allocations as "past claims" on any read. This will cause any pass through the periodic GC or volumewatcher to trigger the unpublishing workflow for those claims. * csi: make feasibility check errors more understandable When the feasibility checker finds we have no free write claims, it checks to see if any of those claims are for the job we're currently scheduling (so that earlier versions of a job can't block claims for new versions) and reports a conflict if the volume can't be scheduled so that the user can fix their claims. But when the checker hits a claim that has a GCd allocation, the state is recoverable by the server once claim reaping completes and no user intervention is required; the blocked eval should complete. Differentiate the scheduler error produced by these two conditions.
-
Seth Hoenig authored
This PR tweaks the TestCpusetManager_AddAlloc unit test to not break when being run on a machine using cgroupsv2. The behavior of writing an empty cpuset.cpu changes in cgroupv2, where such a group now inherits the value of its parent group, rather than remaining empty. The test in question was written such that a task would consume all available cores shared on an alloc, causing the empty set to be written to the shared group, which works fine on cgroupsv1 but breaks on cgroupsv2. By adjusting the test to consume only 1 core instead of all cores, it no longer triggers that edge case. The actual fix for the new cgroupsv2 behavior will be in #11933
-
Jai Bhagat authored
-
James Rasell authored
docs: add `cores` to client reserved config block.
-
- 26 Jan, 2022 15 commits
-
-
Luiz Aoqui authored
-
André authored
The link target used the volume name instead of the volume id. Fixes issue #11884.
-
Jai Bhagat authored
-
Jai Bhagat authored
-
Jai Bhagat authored
-
Jai Bhagat authored
-
Jai authored
fix: authorization bug for `job-client-status-summary`
-
Jai Bhagat authored
-
Jai Bhagat authored
-
Derek Strickland authored
-
Jai Bhagat authored
-
Jai Bhagat authored
-
James Rasell authored
-
Seth Hoenig authored
connect: fix bug where sidecar_task.resources was ignored with hcl1
-
Seth Hoenig authored
-
- 25 Jan, 2022 4 commits
-
-
Seth Hoenig authored
build(deps): bump github.com/rs/cors from 1.8.0 to 1.8.2
-
Seth Hoenig authored
The HCL1 parser did not respect connect.sidecar_task.resources if the connect.sidecar_service block was not set (an optimiztion that no longer makes sense with connect gateways). Fixes #10899
-
Tim Gross authored
* driver: fix integer conversion error The shared executor incorrectly parsed the user's group into int32 and then cast to uint32 without bounds checking. This is harmless because an out-of-bounds gid will throw an error later, but it triggers security and code quality scans. Parse directly to uint32 so that we get correct error handling. * helper: fix integer conversion error The autopilot flags helper incorrectly parses a uint64 to a uint which is machine specific size. Although we don't have 32-bit builds, this sets off security and code quality scaans. Parse to the machine sized uint. * driver: restrict bounds of port map The plugin server doesn't constrain the maximum integer for port maps. This could result in a user-visible misconfiguration, but it also triggers security and code quality scans. Restrict the bounds before casting to int32 and return an error. * cpuset: restrict upper bounds of cpuset values Our cpuset configuration expects values in the range of uint16 to match the expectations set by the kernel, but we don't constrain the values before downcasting. An underflow could lead to allocations failing on the client rather than being caught earlier. This also make security and code quality scanners happy. * http: fix integer downcast for per_page parameter The parser for the `per_page` query parameter downcasts to int32 without bounds checking. This could result in underflow and nonsensical paging, but there's no server-side consequences for this. Fixing this will silence some security and code quality scanners though.
-
James Rasell authored
state: move restore functionality into its own file.
-
- 24 Jan, 2022 7 commits
-
-
dependabot[bot] authored
Bumps [github.com/rs/cors](https://github.com/rs/cors) from 1.8.0 to 1.8.2. - [Release notes](https://github.com/rs/cors/releases) - [Commits](https://github.com/rs/cors/compare/v1.8.0...v1.8.2 ) --- updated-dependencies: - dependency-name: github.com/rs/cors dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by:
dependabot[bot] <support@github.com>
-
Seth Hoenig authored
deps: update api go version and dependencies
-
Seth Hoenig authored
Merge pull request #11883 from hashicorp/dependabot/go_modules/github.com/prometheus/client_golang-1.12.0 build(deps): bump github.com/prometheus/client_golang from 1.7.1 to 1.12.0
-
Seth Hoenig authored
This PR sets the minimum Go version for the `api` submodule to Go 1.17. It also upgrades - gorilla/websocket 1.4.1 -> 1.4.2 - mitchelh/mapstructure 1.4.2 -> 1.4.3 - stretchr/testify 1.5.1 -> 1.7.0 Closes #11518 #11602 #11528
-
Seth Hoenig authored
Merge pull request #11836 from hashicorp/dependabot/go_modules/github.com/hashicorp/memberlist-0.3.1 chore(deps): bump github.com/hashicorp/memberlist from 0.2.2 to 0.3.1
-
Tim Gross authored
The volumewatcher that runs on the leader needs to make RPC calls rather than writing to raft (as we do in the deploymentwatcher) because the unpublish workflow needs to make RPC calls to the clients. This requires that the volumewatcher has access to the leader's ACL token. But when leadership transitions, the new leader creates a new leader ACL token. This ACL token needs to be passed into the volumewatcher when we enable it, otherwise the volumewatcher can find itself with a stale token.
-
Dan Norris authored
The examples for `nomad volume create` and `nomad volume register` are not setting `mount_flags` using an array of strings. This fixes the issue by changing the example to be `mount_flags = ["noatime"]`.
-