This project is mirrored from https://gitee.com/mirrors/nomad.git.
Pull mirroring failed.
Repository mirroring has been paused due to too many failed attempts. It can be resumed by a project maintainer.
- 28 Jan, 2022 16 commits
-
-
Seth Hoenig authored
Tools should be pinned to a specific version when `make bootstrap` uses `go install` to install the tools we use to build Nomad. Using `@latest` tags means a tool, and what it produces, could change out from under us.
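The following is a minimal sketch of the kind of check this implies. It assumes each tool is installed with a `go install module/path@version` argument. The helper `isPinned` and the `example.com/...` module paths are hypothetical and are not taken from Nomad's Makefile. The sketch treats only full semantic versions (including pseudo-versions) as pinned, so `@latest` and prefix queries such as `@v1` are flagged.

```go
package main

import (
	"fmt"
	"strings"
)

// isPinned reports whether a `go install` argument of the form
// "module/path@version" names a full semantic version (vMAJOR.MINOR.PATCH,
// optionally with a pre-release or build suffix) rather than a moving query
// such as @latest or a prefix such as @v1. Hypothetical helper; commit
// hashes are treated as unpinned only to keep the sketch short.
func isPinned(spec string) bool {
	at := strings.LastIndex(spec, "@")
	if at < 0 {
		return false // no version suffix at all
	}
	version := spec[at+1:]
	if !strings.HasPrefix(version, "v") {
		return false // "latest", branch names, empty string, ...
	}
	core := strings.TrimPrefix(version, "v")
	// Drop pre-release / build metadata, e.g. "1.2.3-rc.1+meta".
	if i := strings.IndexAny(core, "-+"); i >= 0 {
		core = core[:i]
	}
	parts := strings.Split(core, ".")
	if len(parts) != 3 {
		return false // prefix queries such as @v1 or @v1.2 still float
	}
	for _, p := range parts {
		if p == "" {
			return false
		}
		for _, r := range p {
			if r < '0' || r > '9' {
				return false
			}
		}
	}
	return true
}

func main() {
	// Hypothetical tool list; the module paths are placeholders.
	tools := []string{
		"example.com/tools/cmd/lint@v1.2.3",
		"example.com/tools/cmd/gen@latest",
		"example.com/tools/cmd/fmt@v1",
	}
	for _, t := range tools {
		fmt.Printf("%s pinned=%v\n", t, isPinned(t))
	}
}
```

Running such a check over the bootstrap tool list would fail the build as soon as someone reintroduces a floating version.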
-
Tim Gross authored
-
Noel Quiles authored
* chore: Add Demandbase tag to consent manager
* fix: Add services to manager options
-
Jai authored
feat: add evaluations view with table
-
Jai Bhagat authored
-
Jai Bhagat authored
-
Jai Bhagat authored
-
Jai Bhagat authored
-
Jai Bhagat authored
-
Jai Bhagat authored
-
Jai Bhagat authored
To support pagination on evaluations queries.
-
Jai Bhagat authored
-
Jai Bhagat authored
-
Jai Bhagat authored
-
Jai Bhagat authored
-
Tim Gross authored
When an allocation stops, the `csi_hook` makes an unpublish RPC to the servers, which unpublish the volume via the CSI RPCs: first to the node plugins and then to the controller plugins. The controller RPCs must happen after the node RPCs so that the node has had a chance to unmount the volume before the controller tries to detach the associated device. But the client has local access to the node plugins and can independently determine whether it's safe to send unpublish RPCs to those plugins. This will allow the server to treat the node plugin as abandoned if a client is disconnected and `stop_on_client_disconnect` is set, so that the server can try to send unpublish RPCs to the controller plugins, under the assumption that the client will be trying to unmount the volume on its end first. Note that the CSI `NodeUnpublishVolume`/`NodeUnstageVolume` RPCs can return ignorable errors in the case where the volume has already been unmounted from the node. Handle all other errors by retrying until we get success, so as to give operators the opportunity to reschedule a failed node plugin (e.g. in the case where they accidentally drained a node without `-ignore-system`). Fan out the work for each volume into its own goroutine so that we can release a subset of volumes if only one is stuck.
-
- 27 Jan, 2022 8 commits
-
-
Jai authored
ui: test tooling
-
Seth Hoenig authored
client: change test to not poke cgroupv2 edge case
-
Tim Gross authored
* The volume claim GC method and volumewatcher both have logic collecting terminal allocations that duplicates most of the logic that's now in the state store's `CSIVolumeDenormalize` method. Copy this logic into the state store so that all code paths have the same view of the past claims.
* Remove logic in the volume claim GC that now lives in the state store's `CSIVolumeDenormalize` method.
* Remove logic in the volumewatcher that now lives in the state store's `CSIVolumeDenormalize` method.
* Remove logic in the node unpublish RPC that now lives in the state store's `CSIVolumeDenormalize` method.
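The following is a simplified sketch of that shared logic. It uses pared-down `Volume`, `Claim`, and `Alloc` types rather than Nomad's real structs. `denormalizeClaims` is a hypothetical name and does not reflect the `CSIVolumeDenormalize` signature. It moves each claim whose allocation is terminal, or (as in the "resolve invalid claim states on read" change) no longer exists, into the past claims in the unpublishing state.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for Nomad's CSI types.
type ClaimState int

const (
	ClaimTaken ClaimState = iota
	ClaimUnpublishing
)

type Claim struct {
	AllocID string
	Mode    string // access mode recorded with the claim, e.g. "read" or "write"
	State   ClaimState
}

type Alloc struct {
	ID       string
	Terminal bool
}

type Volume struct {
	ReadClaims  map[string]*Claim
	WriteClaims map[string]*Claim
	PastClaims  map[string]*Claim
}

// denormalizeClaims moves every claim whose allocation is terminal, or is
// missing entirely, into PastClaims in the unpublishing state, so that every
// caller (GC, volumewatcher, RPC handlers) sees the same past claims.
func denormalizeClaims(vol *Volume, lookup func(allocID string) *Alloc) {
	if vol.PastClaims == nil {
		vol.PastClaims = make(map[string]*Claim)
	}
	for _, claims := range []map[string]*Claim{vol.ReadClaims, vol.WriteClaims} {
		for id, c := range claims {
			if a := lookup(id); a == nil || a.Terminal {
				past := *c // copy, keeping the claim's mode
				past.State = ClaimUnpublishing
				vol.PastClaims[id] = &past
				delete(claims, id) // deleting during range is safe in Go
			}
		}
	}
}

func main() {
	allocs := map[string]*Alloc{
		"a1": {ID: "a1"},
		"a2": {ID: "a2", Terminal: true},
		// "a3" has already been garbage collected.
	}
	vol := &Volume{WriteClaims: map[string]*Claim{
		"a1": {AllocID: "a1", Mode: "write"},
		"a2": {AllocID: "a2", Mode: "write"},
		"a3": {AllocID: "a3", Mode: "write"},
	}}
	denormalizeClaims(vol, func(id string) *Alloc { return allocs[id] })
	fmt.Println(len(vol.WriteClaims), len(vol.PastClaims))
}
```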
-
Tim Gross authored
In the client's `(*csiHook) Postrun()` method, we make an unpublish RPC that includes a claim in the `CSIVolumeClaimStateUnpublishing` state and uses the mode from the client. But then in the `(*CSIVolume) Unpublish` RPC handler, we query the volume from the state store (because we only get an ID from the client), and when we make the client RPC for the node unpublish step, we use the _current volume's_ view of the mode. If the volume's mode has been changed before the old allocations can have their claims released, then we end up making a CSI RPC that will never succeed. Why does this code path get the mode from the volume and not the claim? Because the claim written by the GC job in `(*CoreScheduler) csiVolumeClaimGC` doesn't have a mode. Instead, it just writes a claim in the unpublishing state to ensure the volumewatcher detects a "past claim" change and reaps all the claims on the volumes. Fix this by ensuring that...
-
Tim Gross authored
* csi: resolve invalid claim states on read. It's currently possible for CSI volumes to be claimed by allocations that no longer exist. This changeset asserts a reasonable state at the state store level by registering these nil allocations as "past claims" on any read. This will cause any pass through the periodic GC or volumewatcher to trigger the unpublishing workflow for those claims.
* csi: make feasibility check errors more understandable. When the feasibility checker finds we have no free write claims, it checks to see if any of those claims are for the job we're currently scheduling (so that earlier versions of a job can't block claims for new versions) and reports a conflict if the volume can't be scheduled so that the user can fix their claims. But when the checker hits a claim that has a GC'd allocation, the state is recoverable by the server once claim reaping completes and no user intervention is required; the blocked eval should comp...
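The following hedged sketch illustrates the distinction drawn in the second change. If any write claim belongs to an allocation that has already been garbage collected, the block resolves itself once claim reaping runs, so the reason can say so instead of asking the user to fix their claims. `writeClaim`, `blockedReason`, and both message strings are hypothetical. The job-version check mentioned above is not modeled.

```go
package main

import "fmt"

// writeClaim is a hypothetical, pared-down view of a volume write claim.
type writeClaim struct {
	AllocID string
}

// blockedReason explains why a volume with no free write claims is
// infeasible. A claim held by a GC'd allocation will be freed by claim
// reaping, so that case is reported as recoverable. Messages are
// illustrative only, not Nomad's actual output.
func blockedReason(claims []writeClaim, allocExists func(id string) bool) (msg string, recoverable bool) {
	for _, c := range claims {
		if !allocExists(c.AllocID) {
			return "write claim held by a garbage-collected allocation is pending release", true
		}
	}
	return "volume has no free write claims", false
}

func main() {
	live := map[string]bool{"a1": true}
	exists := func(id string) bool { return live[id] }

	msg, ok := blockedReason([]writeClaim{{AllocID: "a1"}}, exists)
	fmt.Println(ok, msg)
	msg, ok = blockedReason([]writeClaim{{AllocID: "a1"}, {AllocID: "a2"}}, exists)
	fmt.Println(ok, msg)
}
```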
-
Seth Hoenig authored
This PR tweaks the `TestCpusetManager_AddAlloc` unit test so that it does not break when run on a machine using cgroups v2. The behavior of writing an empty `cpuset.cpus` changes in cgroups v2: such a group now inherits the value of its parent group rather than remaining empty. The test was written so that a task would consume all of the available shared cores on an alloc, causing the empty set to be written to the shared group; this works fine on cgroups v1 but breaks on cgroups v2. Because the test now consumes only 1 core instead of all cores, it no longer triggers that edge case. The actual fix for the new cgroups v2 behavior will be in #11933.
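The following is a deliberately simplified model of the edge case, assuming a 4-core node. `effectiveCPUs` and `remaining` are hypothetical helpers, not Nomad's cpuset manager. The v2 branch follows the kernel's documented rule that an empty (or entirely ungrantable) `cpuset.cpus` makes `cpuset.cpus.effective` fall back to the parent's CPUs. Partitions and exclusive CPUs are ignored.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedCopy returns a sorted copy of xs (nil when xs is empty).
func sortedCopy(xs []int) []int {
	out := append([]int(nil), xs...)
	sort.Ints(out)
	return out
}

// effectiveCPUs models which CPUs a cpuset group ends up with after
// `requested` is written to its cpuset.cpus file.
//   - cgroups v1: the group gets exactly what was written (the kernel
//     rejects CPUs outside the parent), so an empty write stays empty.
//   - cgroups v2: the group gets the request intersected with the parent;
//     if that is empty, it falls back to the parent's effective CPUs.
func effectiveCPUs(requested, parent []int, v2 bool) []int {
	if !v2 {
		return sortedCopy(requested)
	}
	inParent := make(map[int]bool, len(parent))
	for _, c := range parent {
		inParent[c] = true
	}
	var granted []int
	for _, c := range requested {
		if inParent[c] {
			granted = append(granted, c)
		}
	}
	if len(granted) == 0 {
		return sortedCopy(parent)
	}
	return sortedCopy(granted)
}

// remaining returns the CPUs in all that are not in used, i.e. a
// hypothetical shared pool left after tasks reserve cores.
func remaining(all, used []int) []int {
	taken := make(map[int]bool, len(used))
	for _, c := range used {
		taken[c] = true
	}
	var out []int
	for _, c := range all {
		if !taken[c] {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	node := []int{0, 1, 2, 3} // assumed 4-core node
	for _, used := range [][]int{{0, 1, 2, 3}, {0}} {
		shared := remaining(node, used)
		fmt.Printf("task uses %v: shared v1=%v v2=%v\n",
			used, effectiveCPUs(shared, node, false), effectiveCPUs(shared, node, true))
	}
}
```

When all four cores are consumed, the shared set written is empty. Under v1 the group stays empty, while under v2 it silently inherits all four cores. Consuming a single core leaves a non-empty shared set, which behaves the same under both versions.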
-
Jai Bhagat authored
-
James Rasell authored
docs: add `cores` to client reserved config block.
-
- 26 Jan, 2022 15 commits
-
-
Luiz Aoqui authored
-
André authored
The link target used the volume name instead of the volume id. Fixes issue #11884.
-
Jai Bhagat authored
-
Jai Bhagat authored
-
Jai Bhagat authored
-
Jai Bhagat authored
-
Jai authored
fix: authorization bug for `job-client-status-summary`
-
Jai Bhagat authored
-
Jai Bhagat authored
-
Derek Strickland authored
-
Jai Bhagat authored
-
Jai Bhagat authored
-
James Rasell authored
-
Seth Hoenig authored
connect: fix bug where sidecar_task.resources was ignored with hcl1
-
Seth Hoenig authored
-
- 25 Jan, 2022 1 commit
-
-
Seth Hoenig authored
build(deps): bump github.com/rs/cors from 1.8.0 to 1.8.2
-