This project is mirrored from https://gitee.com/mirrors/nomad.git. Pull mirroring failed.
Repository mirroring has been paused due to too many failed attempts. It can be resumed by a project maintainer.
  1. 22 Nov, 2022 1 commit
  2. 21 Nov, 2022 5 commits
  3. 19 Nov, 2022 1 commit
  4. 18 Nov, 2022 5 commits
  5. 17 Nov, 2022 10 commits
  6. 16 Nov, 2022 3 commits
    • eval broker: shed all but one blocked eval per job after ack (#14621) · 1c4307b8
      Tim Gross authored
      When an evaluation is acknowledged by a scheduler, the resulting plan is
      guaranteed to cover up to the `waitIndex` set by the worker based on the most
      recent evaluation for that job in the state store. At that point, we no longer
      need to retain blocked evaluations in the broker that are older than that index.
      
      Move all but the highest priority / highest `ModifyIndex` blocked eval into a
      canceled set. When the `Eval.Ack` RPC returns from the eval broker it will
      signal a reap of a batch of cancelable evals to write to raft. This paces the
      cancelations by how frequently the schedulers acknowledge evals, which should
      reduce the risk of cancelations overwhelming raft relative to scheduler
      progress. In order to avoid straggling batches when the cluster is
      quiet, we also include a periodic sweep through the cancelable list.
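
The shedding step described above can be sketched as follows. This is a minimal illustration, not the broker's actual code: the `Eval` type, `shedBlocked`, and `reapBatch` are hypothetical names, and the assumption is that "highest priority / highest `ModifyIndex`" means priority is compared first, with `ModifyIndex` as the tie-breaker.

```go
package main

import "fmt"

// Eval is a simplified, hypothetical stand-in for a blocked evaluation.
type Eval struct {
	ID          string
	JobID       string
	Priority    int
	ModifyIndex uint64
}

// shedBlocked keeps the single highest-priority eval (ties broken by the
// highest ModifyIndex) and returns every other eval as cancelable.
func shedBlocked(blocked []Eval) (keep Eval, cancelable []Eval) {
	if len(blocked) == 0 {
		return Eval{}, nil
	}
	keep = blocked[0]
	for _, e := range blocked[1:] {
		newer := e.Priority > keep.Priority ||
			(e.Priority == keep.Priority && e.ModifyIndex > keep.ModifyIndex)
		if newer {
			cancelable = append(cancelable, keep)
			keep = e
		} else {
			cancelable = append(cancelable, e)
		}
	}
	return keep, cancelable
}

// reapBatch takes at most n evals off the cancelable list, so that each
// acknowledgement (or periodic sweep) writes a bounded batch.
func reapBatch(cancelable []Eval, n int) (batch, rest []Eval) {
	if n > len(cancelable) {
		n = len(cancelable)
	}
	return cancelable[:n], cancelable[n:]
}

func main() {
	blocked := []Eval{
		{ID: "a", JobID: "web", Priority: 50, ModifyIndex: 10},
		{ID: "b", JobID: "web", Priority: 50, ModifyIndex: 12},
		{ID: "c", JobID: "web", Priority: 70, ModifyIndex: 5},
	}
	keep, cancelable := shedBlocked(blocked)
	batch, rest := reapBatch(cancelable, 1)
	fmt.Println(keep.ID, len(batch), len(rest))
}
```

Bounding the batch size in `reapBatch` is what ties the cancelation rate to the rate of acknowledgements in this sketch; the batch size of 1 in `main` is arbitrary.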
    • e2e: swap bionic image for jammy (#15220) · 0e3606af
      Seth Hoenig authored
    • test: ensure leader is still valid in reelection test (#15267) · 460f19b6
      Tim Gross authored
      The `TestLeader_Reelection` test waits for a leader to be elected and then makes
      some other assertions. But it implicitly assumes that there's no failure of
      leadership before shutting down the leader, which can lead to a panic in the
      tests. Assert there's still a leader before the shutdown.
  7. 15 Nov, 2022 4 commits
  8. 14 Nov, 2022 3 commits
    • eval delete: move batching of deletes into RPC handler and state (#15117) · 65b3d01a
      Tim Gross authored
      During unusual outage recovery scenarios on large clusters, a backlog of
      millions of evaluations can appear. In these cases, the `eval delete` command can
      put excessive load on the cluster by listing large sets of evals to extract the
      IDs and then sending large batches of IDs. Although the command's batch size
      was carefully tuned, we still need to JSON deserialize, re-serialize to
      MessagePack, send the log entries through raft, and get the FSM applied.
      
      To improve performance of this recovery case, move the batching process into the
      RPC handler and the state store. The design here is a little weird, so let's
      look at the failed options first:
      
      * A naive solution here would be to just send the filter as the raft request and
        let the FSM apply delete the whole set in a single operation. Benchmarking with
        1M evals on a 3 node cluster demonstrated this can block the FSM apply for
        several minutes, which puts the cluster at risk if there's a leadership
        failover (the barrier write can't be made while this apply is in-flight).
      
      * A less naive but still bad solution would be to have the RPC handler filter
        and paginate, and then hand a list of IDs to the existing raft log
        entry. Benchmarks showed this blocked the FSM apply for 20-30s at a time and
        took roughly an hour to complete.
      
      Instead, we're filtering and paginating in the RPC handler to find a page token,
      and then passing both the filter and page token in the raft log. The FSM apply
      recreates the paginator using the filter and page token to get roughly the same
      page of evaluations, which it then deletes. The pagination process is fairly
      cheap (only about 5% of the total FSM apply time), so counter-intuitively this
      rework ends up being much faster. A benchmark of 1M evaluations showed this
      blocked the FSM apply for 20-30ms at a time (typical for normal operations) and
      completes in less than 4 minutes.
      
      Note that, as with the existing design, this delete is not consistent: a new
      evaluation inserted "behind" the cursor of the pagination will fail to be
      deleted.
    • Fix wrong reference to `vault` (#15228) · 1217a96e
      Douglas Jose authored
    • Kyle Root's avatar
      263ed6f9
  9. 11 Nov, 2022 3 commits
    • [bug] Return a spec on reconnect (#15214) · 9ad90290
      Charlie Voiselle authored
      client: fixed a bug where non-`docker` tasks with network isolation would leak network namespaces and iptables rules if the client was restarted while they were running
    • client: avoid unconsumed channel in timer construction (#15215) · 5f3f5215
      Seth Hoenig authored
      * client: avoid unconsumed channel in timer construction
      
      This PR fixes a bug introduced in #11983 where a Timer initialized with 0
      duration causes an immediate tick, even if Reset is called before reading the
      channel. The fix is to avoid doing that, instead creating a Timer with a non-zero
      initial wait time, and then immediately calling Stop.
      
      * pr: remove redundant stop
    • exec: allow running commands from host volume (#14851) · 11a5f790
      Tim Gross authored
      The exec driver and other drivers derived from the shared executor check the
      path of the command before handing off to libcontainer to ensure that the
      command doesn't escape the sandbox. But we don't check any host volume mounts,
      which should be safe to use as a source for executables if we're letting the
      user mount them to the container in the first place.
      
      Check the mount config to verify the executable lives in the mount's host path,
      but then return an absolute path within the mount's task path so that we can hand
      that off to libcontainer to run.
      
      Includes a good bit of refactoring here because the anchoring of the final task
      path has different code paths for inside the task dir vs inside a mount. But
      I've fleshed out the test coverage of this a good bit to ensure we haven't
      created any regressions in the process.
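
The mount check can be sketched roughly as follows. The `Mount` type and the `relWithin` and `resolveInMounts` functions are hypothetical names, not the executor's real API, and the sketch assumes Unix-style paths and does not resolve symlinks, which a real sandbox check would need to consider.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Mount is a hypothetical, simplified mount config: a directory on the host
// and the location where it appears inside the task.
type Mount struct {
	HostPath string
	TaskPath string
}

// relWithin returns path relative to dir, and false if path escapes dir.
func relWithin(dir, path string) (string, bool) {
	rel, err := filepath.Rel(dir, path)
	if err != nil || rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", false
	}
	return rel, true
}

// resolveInMounts verifies that hostCmd lives under some mount's host path
// and returns the corresponding absolute path under that mount's task path.
func resolveInMounts(hostCmd string, mounts []Mount) (string, error) {
	hostCmd = filepath.Clean(hostCmd)
	for _, m := range mounts {
		if rel, ok := relWithin(filepath.Clean(m.HostPath), hostCmd); ok {
			return filepath.Join(m.TaskPath, rel), nil
		}
	}
	return "", fmt.Errorf("%q is not inside any mount", hostCmd)
}

func main() {
	mounts := []Mount{{HostPath: "/opt/tools", TaskPath: "/usr/local"}}

	p, err := resolveInMounts("/opt/tools/bin/run", mounts)
	fmt.Println(p, err)

	_, err = resolveInMounts("/opt/tools/../secret/run", mounts)
	fmt.Println(err != nil)
}
```

Cleaning the path before computing the relative path is what rejects `..` traversal in this sketch.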
  10. 10 Nov, 2022 5 commits
    • docs: clarify how to access task meta values in templates (#15212) · 106dce9c
      Seth Hoenig authored
      This PR updates template and meta docs pages to give examples of accessing
      meta values in templates. To do so one must use the environment variable form
      of the meta key name, which isn't obvious and wasn't yet documented.
    • Luiz Aoqui's avatar
      a2fed26f
    • ci: re-enable tests on main (#15204) · e20af3cf
      Luiz Aoqui authored
      Now that the tests are grouped more tightly we don't use as many runners
      as before, so we can re-enable these without clogging the queue.
    • acl: sso auth method schema and store functions (#15191) · 02253e6f
      Piotr Kazmierczak authored
      This PR implements the ACLAuthMethod type, the acl_auth_methods table schema, and CRUD state store methods. It also updates the nomadSnapshot.Persist and nomadSnapshot.Restore methods so that they work with the new table, and adds two new Raft messages: ACLAuthMethodsUpsertRequestType and ACLAuthMethodsDeleteRequestType.
      
      This PR is part of the SSO work captured under ticket #13120.
    • template: protect use of template manager with a lock (#15192) · 00c8cd37
      Seth Hoenig authored
      This PR protects access to `templateHook.templateManager` with its lock. So
      far we have not been able to reproduce the panic - but it seems either Poststart
      is running without a Prestart being run first (should be impossible), or the
      Update hook is running concurrently with Poststart, nil-ing out the templateManager
      in a race with Poststart.
      
      Fixes #15189