  1. 10 Feb, 2022 13 commits
    • Charlie Voiselle's avatar
      Fixed scheduler config examples · c95a4a38
      Charlie Voiselle authored
      c95a4a38
    • Luiz Aoqui's avatar
      update download to Nomad v1.2.6 (#12042) · 6d7813d5
      Luiz Aoqui authored
      6d7813d5
    • Luiz Aoqui's avatar
      Merge pull request #12045 from hashicorp/merge-release-1.2.6-branch · af332373
      Luiz Aoqui authored
      Merge release 1.2.6 branch
      af332373
    • Luiz Aoqui's avatar
      prepare for next release · 096934a5
      Luiz Aoqui authored
      096934a5
    • Luiz Aoqui's avatar
      Merge tag 'v1.2.6' into merge-release-1.2.6-branch · bc333c25
      Luiz Aoqui authored
      Version 1.2.6
      bc333c25
    • Nomad Release Bot's avatar
      Release v1.2.6 · 95514d56
      Nomad Release Bot authored
      95514d56
    • Nomad Release bot's avatar
      Generate files for 1.2.6 release · a6c6b475
      Nomad Release bot authored
      a6c6b475
    • Luiz Aoqui's avatar
      docs: add 1.2.6 to changelog · a3319d7d
      Luiz Aoqui authored
      a3319d7d
    • Tim Gross's avatar
      scheduler: prevent panic in spread iterator during alloc stop · c49359ad
      Tim Gross authored
      The spread iterator can panic when processing an evaluation, resulting
      in an unrecoverable state in the cluster. Whenever a panicked server
      restarts and quorum is restored, the next server to dequeue the
      evaluation will panic.
      
      To trigger this state:
      * The job must have `max_parallel = 0` and `canary >= 1`.
      * The job must not have a `spread` block.
      * The job must have a previous version.
      * The previous version must have a `spread` block and at least one
        failed allocation.
      
      In this scenario, the desired changes include `(place 1+) (stop
      1+) (ignore n) (canary 1)`. Before the scheduler can place the canary
      allocation, it tries to find out which allocations can be
      stopped. This passes back through the stack so that we can determine
      previous-node penalties, etc. We call `SetJob` on the stack with the
      previous version of the job, which will include assessing the `spread`
      block (even though the results are unused). The task group spread info
      state from that pass through the spread iterator is not reset when we
      call `SetJob` again. When the new job version iterates over the
      `groupPropertySets`, it will get an empty `spreadAttributeMap`,
      resulting in an unexpected nil pointer dereference.
      
      This changeset resets the spread iterator's internal state when the
      job is set, adds logging and a bypass around the bug in case we hit
      similar cases, and adds a test that panics the scheduler without the
      patch (see the sketch after this commit list).
      c49359ad
    • Luiz Aoqui's avatar
      api: prevent excessive CPU load on job parse · 1aa3b561
      Luiz Aoqui authored
      Add a new namespace ACL requirement for the /v1/jobs/parse endpoint
      and return early if HCLv2 parsing fails.
      
      The endpoint now requires either the new `parse-job` ACL capability
      or `submit-job` (see the sketch after this commit list).
      1aa3b561
    • Seth Hoenig's avatar
      client: check escaping of alloc dir using symlinks · b3c0e6a7
      Seth Hoenig authored
      This PR resolves symlinks when validating paths to ensure they do
      not escape client allocation directories (see the sketch after this
      commit list).
      b3c0e6a7
    • Seth Hoenig's avatar
      client: fix race condition in use of go-getter · 6445da9b
      Seth Hoenig authored
      go-getter creates a circular dependency between a Client and Getter,
      which means each is inherently thread-unsafe if you try to re-use
      one or the other.
      
      This PR fixes Nomad to no longer use the default Getter objects
      provided by the go-getter package. Nomad must create a new Client
      object for every artifact download, as the Client object controls
      the Src and Dst among other things. When calling Client.Get, the
      Getter modifies its own Client reference, creating the circular
      reference and race condition.
      
      We can still achieve most of the desired connection-caching behavior
      by re-using a shared HTTP client with transport pooling enabled (see
      the sketch after this commit list).
      6445da9b
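
The spread-iterator fix above boils down to rebuilding per-job state whenever a new job version is set, instead of carrying stale state between `SetJob` calls. A minimal Go sketch of that idea follows; the types and field names are illustrative stand-ins, not Nomad's actual scheduler internals:

```go
package main

// Illustrative stand-ins for Nomad's scheduler types.
type Spread struct{ Attribute string }
type Job struct{ Spreads []*Spread }
type propertySet struct{}

type SpreadIterator struct {
	job               *Job
	jobSpreads        []*Spread
	groupPropertySets map[string][]*propertySet
}

// SetJob rebuilds the spread state from scratch for every job version.
// Carrying over the previous version's group property sets is what led
// to the empty spreadAttributeMap and the nil pointer dereference.
func (iter *SpreadIterator) SetJob(job *Job) {
	iter.job = job

	// Reset all state derived from the previous job version.
	iter.jobSpreads = nil
	iter.groupPropertySets = make(map[string][]*propertySet)

	if len(job.Spreads) > 0 {
		iter.jobSpreads = job.Spreads
	}
}
```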
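The job-parse hardening is essentially a guard clause ahead of the expensive HCL work. A hedged sketch, where `ParseRequest`, `AllowNamespaceOperation`, and `parseHCL2` are hypothetical stand-ins for Nomad's internal API:

```go
package main

import "errors"

var errPermissionDenied = errors.New("Permission denied")

// Hypothetical stand-ins for Nomad's request and ACL types.
type ACL interface {
	AllowNamespaceOperation(namespace, capability string) bool
}

type ParseRequest struct {
	ACL       ACL
	Namespace string
	JobHCL    string
}

type Job struct{}

// jobParse sketches the guard: require parse-job or submit-job on the
// namespace before parsing, and bail out as soon as HCLv2 parsing fails
// rather than continuing with expensive follow-up work.
func jobParse(req *ParseRequest) (*Job, error) {
	if !req.ACL.AllowNamespaceOperation(req.Namespace, "parse-job") &&
		!req.ACL.AllowNamespaceOperation(req.Namespace, "submit-job") {
		return nil, errPermissionDenied
	}

	job, err := parseHCL2(req.JobHCL)
	if err != nil {
		return nil, err // return early on parse failure
	}
	return job, nil
}

// parseHCL2 stands in for the real HCLv2 job parser.
func parseHCL2(src string) (*Job, error) {
	// ... parse the HCL2 job specification ...
	return &Job{}, nil
}
```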
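The alloc-dir escape check can be expressed with the standard library alone: resolve symlinks first, then test containment. A sketch, assuming a helper name of our own invention rather than Nomad's real one:

```go
package main

import (
	"path/filepath"
	"strings"
)

// escapesAllocDir reports whether path, after resolving any symlinks,
// falls outside the allocation directory allocDir. The function name is
// ours; Nomad's real helper may differ.
func escapesAllocDir(allocDir, path string) (bool, error) {
	// Resolve symlinks so a link pointing at /etc (for example) cannot
	// masquerade as a path inside the alloc dir.
	resolved, err := filepath.EvalSymlinks(path)
	if err != nil {
		return true, err
	}
	rel, err := filepath.Rel(allocDir, resolved)
	if err != nil {
		return true, err
	}
	// A relative path beginning with ".." walks out of allocDir.
	return rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)), nil
}
```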
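The go-getter fix pattern is a fresh Client and fresh Getters per download, with connection reuse pushed down into one shared http.Client. A rough sketch against the go-getter v1 API (`fetchArtifact` is an illustrative name, not Nomad's):

```go
package main

import (
	"net/http"

	getter "github.com/hashicorp/go-getter"
)

// One shared HTTP client keeps transport-level connection pooling even
// though every download builds its own getter.Client.
var httpClient = &http.Client{Transport: http.DefaultTransport}

// fetchArtifact builds a fresh Client and fresh Getters per call, so no
// Getter ever holds a stale back-reference to a Client used by another
// goroutine.
func fetchArtifact(src, dst string) error {
	httpGetter := &getter.HttpGetter{Client: httpClient}
	client := &getter.Client{
		Src:  src,
		Dst:  dst,
		Mode: getter.ClientModeAny,
		Getters: map[string]getter.Getter{
			"http":  httpGetter,
			"https": httpGetter,
		},
	}
	return client.Get()
}
```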
  2. 09 Feb, 2022 3 commits
    • Tim Gross's avatar
      CSI: use job status not alloc status for plugin updates from summary (#12027) · 05b99001
      Tim Gross authored
      When an allocation is updated, the job summary for the associated job
      is also updated. CSI uses the job summary to set the expected count
      for controller and node plugins. We incorrectly used the allocation's
      server status instead of the job status when deciding whether to
      update or remove the job from the plugins. This caused a node drain or
      other terminal state for an allocation to clear the expected count for
      the entire plugin.
      
      Use the job status to guide whether to update or remove the expected
      count.
      
      The existing CSI tests for the state store incorrectly modeled the
      updates we received from servers vs. those we received from clients,
      leading to test assertions that passed when they should not have.
      
      Rework the tests to clarify each step in the lifecycle, and rename
      the CSI state store functions for clarity (see the sketch after this
      commit list).
      05b99001
    • Tim Gross's avatar
      b3212a5b
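
The CSI fix above is, in spirit, a one-line decision change: key the plugin's expected allocation count off the job's status, not off any single allocation's status. A sketch with illustrative types:

```go
package main

// Job is an illustrative stand-in for Nomad's job struct.
type Job struct {
	Status string // e.g. "pending", "running", "dead"
	Stop   bool
}

// shouldClearExpectedCount sketches the corrected decision: only a
// stopped or dead *job* clears a plugin's expected allocation count. A
// single terminal allocation (say, from a node drain) no longer does.
func shouldClearExpectedCount(job *Job) bool {
	return job.Stop || job.Status == "dead"
}
```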
  3. 08 Feb, 2022 9 commits
  4. 07 Feb, 2022 3 commits
    • Dylan Staley's avatar
      Merge pull request #11936 from hashicorp/ds.ie11-warning · 21f7d011
      Dylan Staley authored
      website: display warning in IE 11
      21f7d011
    • Kevin Schoonover's avatar
      address comments · 5cea3663
      Kevin Schoonover authored
      Co-authored-by: Seth Hoenig <seth.a.hoenig@gmail.com>
      5cea3663
    • Tim Gross's avatar
      scheduler: recover from panic (#12009) · f8111692
      Tim Gross authored
      If processing a specific evaluation causes the scheduler (and
      therefore the entire server) to panic, that evaluation will never
      get a chance to be nack'd and cleared from the state store. It will
      get dequeued by another scheduler, causing that server to panic, and
      so forth until all servers are in a panic loop. This prevents the
      operator from intervening to remove the evaluation or update the
      state.
      
      Recover from panics at the top-level `Process` method of each
      scheduler so that this condition can be detected without crashing
      the server process. This will lead to a loop of recovering the
      scheduler goroutine until the eval can be removed or nack'd, but
      that's much better than taking downtime (see the sketch after this
      commit list).
      f8111692
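
The recovery described above is the standard Go defer/recover pattern applied at the scheduler's entry point. A minimal sketch, with `GenericScheduler`, `Evaluation`, and `processImpl` as illustrative stand-ins for Nomad's real types:

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// Illustrative stand-ins for Nomad's scheduler types.
type Evaluation struct{ ID string }
type GenericScheduler struct{}

// Process converts a panic raised while scheduling one evaluation into
// an ordinary error, so the eval can be nack'd or failed instead of
// crashing every server that dequeues it.
func (s *GenericScheduler) Process(eval *Evaluation) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("processing eval %q panicked: %v\n%s",
				eval.ID, r, debug.Stack())
		}
	}()
	return s.processImpl(eval)
}

func (s *GenericScheduler) processImpl(eval *Evaluation) error {
	// ... the real scheduling work that might panic ...
	return nil
}
```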
  5. 06 Feb, 2022 2 commits
  6. 05 Feb, 2022 4 commits
  7. 04 Feb, 2022 3 commits
  8. 03 Feb, 2022 3 commits