This project is mirrored from https://gitee.com/mirrors/nomad.git. Pull mirroring failed.
  1. 10 Nov, 2022 2 commits
    • ci: re-enable tests on main · f692d8e4
      Luiz Aoqui authored
      Now that the tests are grouped more tightly we don't use as many runners
      as before, so we can re-enable these without clogging the queue.
    • template: protect use of template manager with a lock (#15192) · 00c8cd37
      Seth Hoenig authored
      This PR protects access to `templateHook.templateManager` with its lock. So
      far we have not been able to reproduce the panic - but it seems either Poststart
      is running without a Prestart being run first (should be impossible), or the
      Update hook is running concurrently with Poststart, nil-ing out the templateManager
      in a race with Poststart.
      
      Fixes #15189
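
The locking described above can be sketched roughly as follows. This is a minimal illustration, not the actual Nomad code: the `templateManager` type, the `managerLock` field name, and the hook method signatures are simplified, hypothetical stand-ins, and `Update` here only clears the manager to mimic the suspected race.

```go
package main

import (
	"fmt"
	"sync"
)

// templateManager is a hypothetical stand-in for the real manager type.
type templateManager struct{ stopped bool }

func (m *templateManager) Stop() { m.stopped = true }

// templateHook is a simplified stand-in for the hook named in the commit.
// Every access to templateManager goes through managerLock.
type templateHook struct {
	managerLock     sync.Mutex
	templateManager *templateManager
}

// Prestart creates the manager while holding the lock.
func (h *templateHook) Prestart() {
	h.managerLock.Lock()
	defer h.managerLock.Unlock()
	h.templateManager = &templateManager{}
}

// Poststart reads the manager under the lock and returns an error,
// rather than dereferencing nil, if no manager is set.
func (h *templateHook) Poststart() error {
	h.managerLock.Lock()
	defer h.managerLock.Unlock()
	if h.templateManager == nil {
		return fmt.Errorf("poststart called with no template manager")
	}
	return nil
}

// Update stops and clears the manager under the same lock, so it can
// no longer interleave with Poststart.
func (h *templateHook) Update() {
	h.managerLock.Lock()
	defer h.managerLock.Unlock()
	if h.templateManager != nil {
		h.templateManager.Stop()
		h.templateManager = nil
	}
}

func main() {
	h := &templateHook{}
	fmt.Println(h.Poststart() != nil) // true: no Prestart has run yet
	h.Prestart()
	fmt.Println(h.Poststart() == nil) // true: manager is set
}
```

Holding one mutex in all three hooks serializes them, so a concurrent `Update` can only run entirely before or entirely after `Poststart`.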
  2. 08 Nov, 2022 2 commits
    • make: add target cl for create changelog entry (#15186) · 72d58fcf
      Seth Hoenig authored
      
      * make: add target cl for create changelog entry
      
      This PR adds `tools/cl-entry` and the `make cl` Makefile target for
      conveniently creating correctly formatted Changelog entries.
      
      * Update tools/cl-entry/main.go
      Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
      
      * Update tools/cl-entry/main.go
      Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
      Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
    • api: remove `mapstructure` tags from `Port` struct (#12916) · 7e8306e4
      Derek Strickland authored
      
      This PR solves a defect in the deserialization of api.Port structs when returning structs from the EventStream.
      
      Previously, the api.Port struct's fields were decorated with both mapstructure and hcl tags to support the network.port stanza's use of the keyword static when posting a static port value. This works fine when posting a job and when retrieving any struct that has an embedded api.Port instance as long as the value is deserialized using JSON decoding. The EventStream, however, uses mapstructure to decode event payloads in the api package. mapstructure expects an underlying field named static which does not exist. The result was that the Port.Value field would always be set to 0.
      
      Upon further inspection, a few things became apparent:
      
      * The struct already has hcl tags that support the indirection during job submission.
      * Serialization/deserialization with both the json and hcl packages produces the desired result.
      * The use of the mapstructure tags provided no value, as the Port struct contains only fields with primitive types.
      
      This PR:
      
      * Removes the mapstructure tags from the api.Port structs.
      * Updates the job parsing logic to use hcl instead of mapstructure when decoding Port instances.
      
      Closes #11044
      Co-authored-by: DerekStrickland <dstrickland@hashicorp.com>
      Co-authored-by: Piotr Kazmierczak <470696+pkazmierczak@users.noreply.github.com>
  3. 07 Nov, 2022 8 commits
  4. 06 Nov, 2022 1 commit
  5. 04 Nov, 2022 4 commits
    • Update alloc after reconnect and enforce client heartbeat order (#15068) · 7828c02a
      Luiz Aoqui authored
      * scheduler: allow updates after alloc reconnects
      
      When an allocation reconnects to a cluster the scheduler needs to run
      special logic to handle the reconnection, check if a replacement was
      created and stop one of them.
      
      If the allocation kept running while the node was disconnected, it will
      be reconnected with `ClientStatus: running` and the node will have
      `Status: ready`. This combination is the same as the normal steady state
      of an allocation, where everything is running as expected.
      
      In order to differentiate between the two states (an allocation that is
      reconnecting and one that is just running) the scheduler needs an extra
      piece of state.
      
      The current implementation uses the presence of a
      `TaskClientReconnected` task event to detect when the allocation has
      reconnected and thus must go through the reconnection process. But this
      event remains even after the allocation is reconnected, causing all
      future evals to co...
    • client: retry RPC call when no server is available (#15140) · f33bb5ec
      Luiz Aoqui authored
      When a Nomad service starts it tries to establish a connection with
      servers, but it also runs alloc runners to manage whatever allocations
      it needs to run.
      
      The alloc runner will invoke several hooks to perform actions, with some
      of them requiring access to the Nomad servers, such as Native Service
      Discovery Registration.
      
      If the alloc runner starts before a connection is established the alloc
      runner will fail, causing the allocation to be shut down. This is
      particularly problematic for disconnected allocations that are
      reconnecting, as they may fail as soon as the client reconnects.
      
      This commit changes the RPC request logic to retry it, using the
      existing retry mechanism, if there are no servers available.
    • template: error on missing key (#15141) · 52a254ba
      Charlie Voiselle authored
      * Support error_on_missing_value for templates
      * Update docs for template stanza
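
To illustrate the general idea of failing a render on a missing key, here is a small sketch using Go's standard `text/template` package and its `missingkey=error` option. This is only an analogous mechanism: Nomad renders templates through consul-template, whose internals are not shown in this commit, and the `render` helper and its `errorOnMissing` parameter are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// render executes src against data. When errorOnMissing is true, a lookup
// of a key that is absent from the map aborts execution with an error.
func render(src string, data map[string]string, errorOnMissing bool) (string, error) {
	t := template.New("example")
	if errorOnMissing {
		t = t.Option("missingkey=error")
	}
	t, err := t.Parse(src)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// The key "addr" is missing, so strict mode reports an error.
	_, err := render("addr={{.addr}}", map[string]string{}, true)
	fmt.Println(err != nil) // true

	out, err := render("addr={{.addr}}", map[string]string{"addr": "10.0.0.1"}, true)
	fmt.Println(out, err) // addr=10.0.0.1 <nil>
}
```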
    • e2e: explicitly wait on task status in chroot download exec test (#15145) · 3c17552d
      Seth Hoenig authored
      Also add some debug log lines for this test, because it doesn't make sense
      for the allocation to be complete yet a task in the allocation to be not
      started yet, which is what the test failures are implying.
  6. 03 Nov, 2022 5 commits
  7. 02 Nov, 2022 2 commits
  8. 01 Nov, 2022 4 commits
    • build: update to go1.19.3 (#15099) · 152f8af9
      Seth Hoenig authored
    • volumewatcher: prevent panic on nil volume (#15101) · ffbae782
      Tim Gross authored
      If a GC claim is written and then the volume is deleted before the `volumewatcher`
      enters its run loop, we panic on the nil-pointer access. Simply doing a
      nil-check at the top of the loop reveals a race condition around shutting down
      the loop just as a new update is coming in.
      
      Have the parent `volumeswatcher` send an initial update on the channel before
      returning, so that we're still holding the lock. Update the watcher's `Stop`
      method to set the running state, which lets us avoid having a second context and
      makes stopping synchronous. This reduces the cases we have to handle in the run
      loop.
      
      Updated the tests now that we'll safely return from the goroutine and stop the
      runner in a larger set of cases. Ran the tests with the `-race` detection flag
      and fixed up any problems found here as well.
    • variables: limit rekey eval to half the nack timeout (#15102) · 18cb9c76
      Tim Gross authored
      In order to limit how much the rekey job can monopolize a scheduler worker, we
      limit how long it can run to 1min before stopping work and emitting a new
      eval. But this exactly matches the default nack timeout, so it'll fail the eval
      rather than getting a chance to emit a new one.
      
      Set the timeout for the rekey eval to half the configured nack timeout.
    • keyring: safely handle missing keys and restore GC (#15092) · 6b2da83f
      Tim Gross authored
      When replication of a single key fails, the replication loop breaks early and
      therefore keys that fall later in the sorting order will never get
      replicated. This is particularly a problem for clusters impacted by the bug that
      caused #14981 and that were later upgraded; the keys that were never replicated
      can now never be replicated, and so we need to handle them safely.
      
      Included in the replication fix:
      * Refactor the replication loop so that each key is replicated in a function call
        that returns an error, to make the workflow more clear and reduce nesting. Log
        the error and continue.
      * Improve stability of keyring replication tests. We no longer block leadership
        on initializing the keyring, so there's a race condition in the keyring tests
        where we can test for the existence of the root key before the keyring has
        been initialized. Change this to an "eventually" test.
      
      But these fixes aren't enough to fix #14981 because they'll end up seeing an
      error once a second complaining about the missing key, so we also need to fix
      keyring GC so the keys can be removed from the state store. Now we'll store the
      key ID used to sign a workload identity in the Allocation, and we'll index the
      Allocation table on that so we can track whether any live Allocation was signed
      with a particular key ID.
  9. 31 Oct, 2022 6 commits
  10. 28 Oct, 2022 1 commit
    • refactor eval delete safety check (#15070) · 8c19a126
      Tim Gross authored
      The `Eval.Delete` endpoint has a helper that takes a list of jobs and allocs and
      determines whether the eval associated with those is safe to delete (based on
      their state). Filtering improvements to the `Eval.Delete` endpoint are going to
      need this check to run in the state store itself for consistency.
      
      Refactor to push this check down into the state store to keep the eventual diff
      for that work reasonable.
  11. 27 Oct, 2022 5 commits