  1. 22 Nov, 2022 2 commits
  2. 21 Nov, 2022 10 commits
    • ensure engineering has merge authority on build pipeline (#15350) · 0235280b
      Tim Gross authored
      Adds @hashicorp/nomad-eng to the codeowners list for the build and release
      workflow files, so that we can fix problems that arise without being
      bottlenecked on another team.
    • pin build/release pipeline to ubuntu 20.04 (#15348) · ba81ae18
      Tim Gross authored
      The `ubuntu-latest` runner has been migrated to Ubuntu 22.04, which doesn't have
      all the same multilib packages as 20.04. Although we'll probably want to migrate
      eventually, we should ship Nomad 1.4.3 with the same toolchain as we did
      previously so that we're not introducing new issues.
    • e2e: fixup oversubscription test case for jammy (#15347) · 2372c6d2
      Seth Hoenig authored
      * e2e: fixup oversubscription test case for jammy
      
      jammy uses cgroups v2, so we need to look up the max memory limit from the
      unified hierarchy format (see the sketch after this commit message)
      
      * e2e: set constraint to require cgroups v2 on oversub docker test
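
      As a rough illustration of the cgroups difference driving this fix, here is a
      minimal Go sketch. `readMemoryLimit` is a hypothetical helper, not the actual
      e2e test code: cgroups v1 exposes the limit in `memory.limit_in_bytes`, while
      the v2 unified hierarchy uses `memory.max`, which may hold the literal string
      "max".

      ```go
      package main

      import (
          "fmt"
          "os"
          "path/filepath"
          "strconv"
          "strings"
      )

      // readMemoryLimit reads the memory limit for a cgroup directory, handling
      // the v1 and v2 file layouts described above.
      func readMemoryLimit(cgroupDir string, v2 bool) (int64, error) {
          name := "memory.limit_in_bytes" // cgroups v1
          if v2 {
              name = "memory.max" // cgroups v2 unified hierarchy
          }
          raw, err := os.ReadFile(filepath.Join(cgroupDir, name))
          if err != nil {
              return 0, err
          }
          s := strings.TrimSpace(string(raw))
          if v2 && s == "max" {
              return -1, nil // -1: no limit configured
          }
          return strconv.ParseInt(s, 10, 64)
      }

      func main() {
          limit, err := readMemoryLimit("/sys/fs/cgroup/mygroup", true)
          if err != nil {
              fmt.Println("error:", err)
              return
          }
          fmt.Println("memory limit:", limit)
      }
      ```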
    • client: accommodate Consul 1.14.0 gRPC and agent self changes. (#15309) · 847c2cc5
      James Rasell authored
      * client: accommodate Consul 1.14.0 gRPC and agent self changes.
      
      Consul 1.14.0 changed the way in which gRPC listeners are
      configured, particularly when using TLS. Prior to the change, a
      single listener was responsible for handling plain-text and
      encrypted gRPC requests. In 1.14.0 and beyond, separate listeners
      will be used for each, defaulting to 8502 and 8503 for plain-text
      and TLS respectively.
      
      The change means that Nomad’s Consul Connect integration would not
      work when integrated with Consul clusters using TLS and running
      1.14.0 or greater.
      
      The Nomad Consul fingerprinter identifies the gRPC port Consul has
      exposed using the "DebugConfig.GRPCPort" value from Consul’s
      “/v1/agent/self” endpoint. In Consul 1.14.0 and greater, this only
      represents the plain-text gRPC port, which is likely to be disabled
      in clusters running TLS. To fix this issue, Nomad now takes the
      Consul version and configured scheme into account to optionally use
      the “DebugConfig.GRPCTLSPort” value from Consul’s agent self
      response.
      
      The “consul_grpc_socket” allocrunner hook has also been updated so
      that the fingerprinted gRPC port attribute is passed in. This
      provides a better fallback when the operator has not configured the
      “consul.grpc_address” option. (A port-selection sketch follows this
      commit message.)
      
      * docs: modify Consul Connect entries to detail 1.14.0 changes.
      
      * changelog: add entry for #15309
      
      * fixup: tidy tests and clean version match from review feedback.
      
      * fixup: use strings tolower func.
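
      As a hedged illustration of the port-selection logic this commit describes,
      here is a minimal Go sketch. The `DebugConfig.GRPCPort` and
      `DebugConfig.GRPCTLSPort` field names come from the commit message itself;
      the map-based agent-self payload and the helper function are assumptions for
      the example, not Nomad's actual fingerprinter code.

      ```go
      package main

      import (
          "fmt"
          "strings"

          version "github.com/hashicorp/go-version"
      )

      // grpcPortFromSelf picks the gRPC port from an agent self response: on
      // Consul >= 1.14.0 with a TLS ("https") scheme it reads GRPCTLSPort,
      // otherwise it falls back to the plain-text GRPCPort.
      func grpcPortFromSelf(self map[string]map[string]interface{}, consulVersion, scheme string) (int, error) {
          v, err := version.NewVersion(consulVersion)
          if err != nil {
              return 0, err
          }
          cutoff := version.Must(version.NewVersion("1.14.0"))

          key := "GRPCPort" // plain-text gRPC port (pre-1.14.0 behavior)
          if !v.LessThan(cutoff) && strings.ToLower(scheme) == "https" {
              key = "GRPCTLSPort" // TLS gRPC listener, split out in 1.14.0
          }

          port, ok := self["DebugConfig"][key].(float64) // JSON numbers decode as float64
          if !ok {
              return 0, fmt.Errorf("%s not found in agent self response", key)
          }
          return int(port), nil
      }

      func main() {
          self := map[string]map[string]interface{}{
              // -1 models the plain-text port being disabled in a TLS cluster.
              "DebugConfig": {"GRPCPort": float64(-1), "GRPCTLSPort": float64(8503)},
          }
          fmt.Println(grpcPortFromSelf(self, "1.14.0", "https")) // 8503 <nil>
      }
      ```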
    • respect casing on service tags (#15329) · 2aff20e8
      Jai authored
      * styles: add service tag style
      
      * refact: update service tag on alloc
      
      * refact: update service tag in component
    • style: wrap secret value in tag (#15331) · 34004b09
      Jai authored
    • consul: add trace logging around service registrations (#15311) · 3b14db4b
      Seth Hoenig authored
      This PR adds trace logging around the diff computed between a Nomad service
      registration and its corresponding Consul service registration, in an effort
      to shed light on why a service registration request is being made.
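
      As a hedged sketch of what such trace logging might look like with
      `go-hclog` (the logger HashiCorp tools use), here is a self-contained
      example; the helper name and diff flags are invented for illustration, not
      Nomad's actual diff structure.

      ```go
      package main

      import hclog "github.com/hashicorp/go-hclog"

      // logRegistrationDiff records which parts of a service registration differ
      // from what Consul currently has, so operators can see why a registration
      // request is about to be made.
      func logRegistrationDiff(logger hclog.Logger, serviceID string, tagsDiffer, portDiffers bool) {
          logger.Trace("syncing service registration",
              "service_id", serviceID,
              "tags_differ", tagsDiffer,
              "port_differs", portDiffers,
          )
      }

      func main() {
          logger := hclog.New(&hclog.LoggerOptions{
              Name:  "consul.sync",
              Level: hclog.Trace, // trace level is normally off by default
          })
          logRegistrationDiff(logger, "redis-cache", true, false)
      }
      ```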
    • acl: sso auth method RPC endpoints (#15221) · b7ddd5bf
      Piotr Kazmierczak authored
      This PR implements RPC endpoints for SSO auth methods.
      
      This PR is part of the SSO work captured under ticket #13120.
    • acl: sso auth method event stream (#15280) · fee85dac
      Piotr Kazmierczak authored
      This PR implements SSO auth method support in the event stream.
      
      This PR is part of the SSO work captured under ticket #13120.
  3. 19 Nov, 2022 1 commit
  4. 18 Nov, 2022 5 commits
  5. 17 Nov, 2022 10 commits
  6. 16 Nov, 2022 3 commits
    • eval broker: shed all but one blocked eval per job after ack (#14621) · 1c4307b8
      Tim Gross authored
      When an evaluation is acknowledged by a scheduler, the resulting plan is
      guaranteed to cover up to the `waitIndex` set by the worker based on the most
      recent evaluation for that job in the state store. At that point, we no longer
      need to retain blocked evaluations in the broker that are older than that index.
      
      Move all but the highest priority / highest `ModifyIndex` blocked eval into a
      canceled set. When the `Eval.Ack` RPC returns from the eval broker it will
      signal a reap of a batch of cancelable evals to write to raft. This paces the
      cancelations by how frequently the schedulers acknowledge evals, which should
      reduce the risk of cancelations overwhelming raft relative to scheduler
      progress. To avoid straggling batches when the cluster is quiet, we also
      include a periodic sweep through the cancelable list.
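
      As a compact illustration (not Nomad's actual broker code), here is a sketch
      of the shedding policy, assuming the blocked evals for a job are held in a
      slice; the `Eval` type is pared down to the fields the policy uses.

      ```go
      package main

      import "fmt"

      // Eval carries only the fields the shedding decision needs.
      type Eval struct {
          ID          string
          Priority    int
          ModifyIndex uint64
      }

      // shedBlocked returns the single eval to keep blocked and appends all
      // others to the cancelable list, which a later reap writes to raft in
      // batches.
      func shedBlocked(blocked, cancelable []*Eval) (*Eval, []*Eval) {
          if len(blocked) == 0 {
              return nil, cancelable
          }
          keep := blocked[0]
          for _, e := range blocked[1:] {
              // Prefer higher priority; break ties on the newer ModifyIndex.
              if e.Priority > keep.Priority ||
                  (e.Priority == keep.Priority && e.ModifyIndex > keep.ModifyIndex) {
                  cancelable = append(cancelable, keep)
                  keep = e
              } else {
                  cancelable = append(cancelable, e)
              }
          }
          return keep, cancelable
      }

      func main() {
          blocked := []*Eval{{"a", 50, 10}, {"b", 50, 12}, {"c", 70, 8}}
          keep, cancel := shedBlocked(blocked, nil)
          fmt.Println(keep.ID, len(cancel)) // c 2
      }
      ```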
    • e2e: swap bionic image for jammy (#15220) · 0e3606af
      Seth Hoenig authored
    • test: ensure leader is still valid in reelection test (#15267) · 460f19b6
      Tim Gross authored
      The `TestLeader_Reelection` test waits for a leader to be elected and then makes
      some other assertions. But it implicitly assumes that there's no failure of
      leadership before shutting down the leader, which can lead to a panic in the
      tests. Assert there's still a leader before the shutdown.
  7. 15 Nov, 2022 4 commits
  8. 14 Nov, 2022 3 commits
    • eval delete: move batching of deletes into RPC handler and state (#15117) · 65b3d01a
      Tim Gross authored
      During unusual outage recovery scenarios on large clusters, a backlog of
      millions of evaluations can appear. In these cases, the `eval delete` command can
      put excessive load on the cluster by listing large sets of evals to extract the
      IDs and then sending large batches of IDs. Although the command's batch size
      was carefully tuned, we still need to JSON-deserialize, re-serialize to
      MessagePack, send the log entries through raft, and get the FSM applied.
      
      To improve performance of this recovery case, move the batching process into the
      RPC handler and the state store. The design here is a little weird, so let's
      look at the failed options first:
      
      * A naive solution here would be to just send the filter as the raft request and
        let the FSM apply delete the whole set in a single operation. Benchmarking with
        1M evals on a 3 node cluster demonstrated this can block the FSM apply for
        several minutes, which puts the cluster at risk if there's a leadership
        failover (the barrier write can't be made while this apply is in-flight).
      
      * A less naive but still bad solution would be to have the RPC handler filter
        and paginate, and then hand a list of IDs to the existing raft log
        entry. Benchmarks showed this blocked the FSM apply for 20-30s at a time and
        took roughly an hour to complete.
      
      Instead, we're filtering and paginating in the RPC handler to find a page token,
      and then passing both the filter and page token in the raft log. The FSM apply
      recreates the paginator using the filter and page token to get roughly the same
      page of evaluations, which it then deletes. The pagination process is fairly
      cheap (only about 5% of the total FSM apply time), so counter-intuitively this
      rework ends up being much faster. A benchmark of 1M evaluations showed this
      blocked the FSM apply for 20-30ms at a time (typical for normal operations) and
      completes in less than 4 minutes.
      
      Note that, as with the existing design, this delete is not consistent: a new
      evaluation inserted "behind" the cursor of the pagination will fail to be
      deleted.
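
      Below is a simplified, self-contained sketch of this design, with
      hypothetical types standing in for Nomad's real filter and paginator. The
      key property is that pagination is deterministic: the RPC handler (to
      compute the next page token) and every FSM apply (to find the page to
      delete) derive the same page from the same filter and token.

      ```go
      package main

      import (
          "fmt"
          "sort"
          "strings"
      )

      // deleteEvalsRequest is the small, deterministic payload carried through
      // raft instead of millions of explicit eval IDs.
      type deleteEvalsRequest struct {
          Prefix    string // stand-in for the real filter expression
          PageToken string // first eval ID of the page to operate on
          PerPage   int
      }

      // page selects the next batch of matching IDs at or after the token.
      // Note the caveat from the commit message: an ID inserted "behind" the
      // cursor after a page is computed will be missed.
      func page(sortedIDs []string, req deleteEvalsRequest) (batch []string, nextToken string) {
          i := sort.SearchStrings(sortedIDs, req.PageToken)
          for ; i < len(sortedIDs) && len(batch) < req.PerPage; i++ {
              if strings.HasPrefix(sortedIDs[i], req.Prefix) {
                  batch = append(batch, sortedIDs[i])
              }
          }
          if i < len(sortedIDs) {
              nextToken = sortedIDs[i]
          }
          return batch, nextToken
      }

      func main() {
          ids := []string{"a1", "a2", "a3", "b1", "b2"}
          req := deleteEvalsRequest{Prefix: "a", PageToken: "", PerPage: 2}
          batch, next := page(ids, req)
          fmt.Println(batch, next) // [a1 a2] a3
      }
      ```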
    • Fix wrong reference to `vault` (#15228) · 1217a96e
      Douglas Jose authored
    • 263ed6f9
      Kyle Root authored
  9. 11 Nov, 2022 2 commits
    • [bug] Return a spec on reconnect (#15214) · 9ad90290
      Charlie Voiselle authored
      client: fixed a bug where non-`docker` tasks with network isolation would leak network namespaces and iptables rules if the client was restarted while they were running
    • client: avoid unconsumed channel in timer construction (#15215) · 5f3f5215
      Seth Hoenig authored
      * client: avoid unconsumed channel in timer construction
      
      This PR fixes a bug introduced in #11983 where a Timer initialized with 0
      duration causes an immediate tick, even if Reset is called before the channel
      is read. The fix is to instead create the Timer with a non-zero initial wait
      time and immediately call Stop (see the sketch below).
      
      * pr: remove redundant stop
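
      A minimal standard-library sketch of the pattern described above;
      `newStoppedTimer` is an illustrative name, not the helper's actual name in
      Nomad.

      ```go
      package main

      import (
          "fmt"
          "time"
      )

      // newStoppedTimer avoids the bug class described in the commit: a Timer
      // created with a zero duration fires immediately, leaving a stale value in
      // its channel even if Reset is called before anyone reads it. Creating the
      // timer with a non-zero wait and stopping it right away leaves the channel
      // empty until the first real Reset.
      func newStoppedTimer() *time.Timer {
          t := time.NewTimer(time.Hour) // any non-zero duration works
          if !t.Stop() {
              <-t.C // drain in the unlikely case it already fired
          }
          return t
      }

      func main() {
          t := newStoppedTimer()
          t.Reset(10 * time.Millisecond)
          <-t.C // first tick arrives only after the Reset duration
          fmt.Println("ticked once, no stale tick")
      }
      ```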