This project is mirrored from https://gitee.com/mirrors/nomad.git. Pull mirroring failed.
Repository mirroring has been paused due to too many failed attempts. It can be resumed by a project maintainer.
  1. 08 Mar, 2022 1 commit
  2. 07 Mar, 2022 1 commit
  3. 01 Mar, 2022 1 commit
  4. 28 Feb, 2022 1 commit
  5. 18 Feb, 2022 1 commit
  6. 01 Feb, 2022 1 commit
  7. 31 Jan, 2022 3 commits
  8. 28 Jan, 2022 10 commits
    • docs: add 1.1.17 to changelog · beb996b3
      Tim Gross authored
      beb996b3
    • docs: missing changelog for #11892 (#11959) · 8665724c
      Tim Gross authored
      8665724c
    • Tim Gross's avatar
      45b7aab8
    • CSI: node unmount from the client before unpublish RPC (#11892) · 136e2a8e
      Tim Gross authored
      When an allocation stops, the `csi_hook` sends an unpublish RPC to the
      servers, which unpublish the volume via the CSI RPCs: first to the node
      plugins and then to the controller plugins. The controller RPCs must
      happen after the node RPCs so that the node has had a chance to unmount
      the volume before the controller tries to detach the associated device.
      
      But the client has local access to the node plugins and can
      independently determine whether it's safe to send unpublish RPCs to
      those plugins. This allows the server to treat the node plugin as
      abandoned if a client is disconnected and `stop_after_client_disconnect`
      is set. The server can then try to send unpublish RPCs to the
      controller plugins, under the assumption that the client will be
      trying to unmount the volume on its end first.
      
      Note that the CSI `NodeUnpublishVolume`/`NodeUnstageVolume` RPCs can
      return ignorable errors in the case where the volume has already been
      unmounted from the node. Handle all ...
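
The ordering and the ignorable-error handling described above can be illustrated with a minimal sketch. It is not the `csi_hook` code: `errNotMounted`, `nodeUnmount`, and `stopVolume` are hypothetical names, and the sentinel error only stands in for whatever a node plugin reports when the volume is already unmounted.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotMounted is a hypothetical sentinel for "volume already unmounted".
var errNotMounted = errors.New("volume not mounted")

// nodeUnmount is a hypothetical wrapper around the node plugin RPCs. It treats
// an "already unmounted" error as success, so the call is safe to repeat.
func nodeUnmount(unpublish func() error) error {
	err := unpublish()
	if err == nil || errors.Is(err, errNotMounted) {
		return nil
	}
	return fmt.Errorf("node unmount failed: %w", err)
}

// stopVolume unmounts on the client first and only then asks the server to
// unpublish, so the controller never detaches a device that is still mounted.
func stopVolume(unpublish, serverUnpublish func() error) error {
	if err := nodeUnmount(unpublish); err != nil {
		return err
	}
	return serverUnpublish()
}

func main() {
	err := stopVolume(
		func() error { return errNotMounted }, // volume was already unmounted
		func() error { return nil },
	)
	fmt.Println(err) // prints <nil>
}
```

Treating the "already unmounted" case as success keeps the client-side step idempotent, which matters if both the client and the server end up attempting the unpublish.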
      136e2a8e
    • CSI: tests to exercise csi_hook (#11788) · 37dea7c2
      Tim Gross authored
      Small refactoring of the allocrunner hook for CSI to make it more
      testable, and a unit test that covers most of its logic.
      37dea7c2
    • CSI: resolve invalid claim states (#11890) · 9154033a
      Tim Gross authored
      * csi: resolve invalid claim states on read
      
      It's currently possible for CSI volumes to be claimed by allocations
      that no longer exist. This changeset asserts a reasonable state at
      the state store level by registering these nil allocations as "past
      claims" on any read. This will cause any pass through the periodic GC
      or volumewatcher to trigger the unpublishing workflow for those claims.
      
      * csi: make feasibility check errors more understandable
      
      When the feasibility checker finds we have no free write claims, it
      checks to see if any of those claims are for the job we're currently
      scheduling (so that earlier versions of a job can't block claims for
      new versions) and reports a conflict if the volume can't be scheduled
      so that the user can fix their claims. But when the checker hits a
      claim whose allocation has been garbage collected, the state is
      recoverable by the server once claim reaping completes and no user
      intervention is required; the blocked eval should complete.
      Differentiate the scheduler errors produced by these two conditions.
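
The first change in this commit, registering claims for missing allocations as "past claims" on read, could look roughly like the sketch below. The `Volume` type, its fields, and `resolveClaims` are simplified, hypothetical stand-ins, not the state store's actual types.

```go
package main

import "fmt"

// Volume is a hypothetical, trimmed-down stand-in for a CSI volume record.
type Volume struct {
	WriteClaims map[string]string // alloc ID -> node ID
	PastClaims  map[string]string // alloc ID -> node ID, awaiting unpublish
}

// resolveClaims moves every claim whose allocation is no longer known into
// PastClaims. allocExists is a hypothetical lookup into the state store.
func resolveClaims(v *Volume, allocExists func(id string) bool) {
	for allocID, nodeID := range v.WriteClaims {
		if !allocExists(allocID) {
			v.PastClaims[allocID] = nodeID
			delete(v.WriteClaims, allocID) // deleting during range is safe in Go
		}
	}
}

func main() {
	v := &Volume{
		WriteClaims: map[string]string{"alloc-live": "node-1", "alloc-gone": "node-2"},
		PastClaims:  map[string]string{},
	}
	live := map[string]bool{"alloc-live": true}
	resolveClaims(v, func(id string) bool { return live[id] })
	fmt.Println(len(v.WriteClaims), len(v.PastClaims)) // prints 1 1
}
```

Once a claim sits in the past-claims set, any later pass that processes past claims can release it without the user having to intervene.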
      9154033a
    • tests: use standard library testing.TB · 73740b28
      Mahmood Ali authored
      Glint pulled in an updated version of mitchellh/go-testing-interface
      which broke some existing tests because the update added a Parallel()
      method to testing.T. This switches to the standard library testing.TB
      which doesn't have a Parallel() method.
      73740b28
    • csi: update leader's ACL in volumewatcher (#11891) · 0d8952a3
      Tim Gross authored
      The volumewatcher that runs on the leader needs to make RPC calls
      rather than writing to raft (as we do in the deploymentwatcher)
      because the unpublish workflow needs to make RPC calls to the
      clients. This requires that the volumewatcher has access to the
      leader's ACL token.
      
      But when leadership transitions, the new leader creates a new leader
      ACL token. This ACL token needs to be passed into the volumewatcher
      when we enable it, otherwise the volumewatcher can find itself with a
      stale token.
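
One way to picture the fix is an enable call that always takes the current leader token, so a re-enabled watcher cannot keep an old one. This is a sketch only: the `watcher` type and its `setEnabled` and `currentToken` methods are hypothetical and do not mirror the real volumewatcher API.

```go
package main

import (
	"fmt"
	"sync"
)

// watcher is a hypothetical stand-in for a leader-only background watcher.
type watcher struct {
	mu      sync.Mutex
	enabled bool
	token   string
}

// setEnabled records the leader token on every call, so a leadership
// transition that re-enables the watcher also replaces the stored token.
func (w *watcher) setEnabled(enabled bool, leaderToken string) {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.enabled = enabled
	w.token = leaderToken
}

// currentToken returns the token the watcher would use for its RPCs.
func (w *watcher) currentToken() string {
	w.mu.Lock()
	defer w.mu.Unlock()
	return w.token
}

func main() {
	w := &watcher{}
	w.setEnabled(true, "token-from-first-term")
	w.setEnabled(false, "")
	w.setEnabled(true, "token-from-second-term")
	fmt.Println(w.currentToken()) // prints token-from-second-term
}
```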
      0d8952a3
    • csi: reap unused volume claims at leadership transitions (#11776) · d30ceb6f
      Tim Gross authored
      When `volumewatcher.Watcher` starts on the leader, it starts a watch
      on every volume and triggers a reap of unused claims on any change to
      that volume. But if a reap is in flight during a leadership
      transition, it will fail and the event that triggered the reap will
      be dropped. Perform one reap of unused claims at the start of the
      watcher so that leadership transitions don't drop this event.
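
The idea of one unconditional reap at startup, followed by a reap per change, can be sketched as below. The `watch` function, the `changes` channel, and the `reap` callback are hypothetical simplifications of the watcher loop.

```go
package main

import "fmt"

// watch reaps once up front and then once per change notification.
// changes and reap are hypothetical stand-ins for the volume watch and the
// claim-reaping workflow.
func watch(changes <-chan struct{}, reap func()) {
	reap() // initial pass: covers an event dropped during a leadership transition
	for range changes {
		reap()
	}
}

func main() {
	changes := make(chan struct{}, 2)
	changes <- struct{}{}
	changes <- struct{}{}
	close(changes)

	reaps := 0
	watch(changes, func() { reaps++ })
	fmt.Println(reaps) // prints 3
}
```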
      d30ceb6f
    • Merge pull request #10752 from hashicorp/b-fix-test-datarace-volumewatcher · 628959b3
      James Rasell authored
      volumewatcher: fix test data race.
      628959b3
  9. 18 Jan, 2022 5 commits
  10. 17 Jan, 2022 16 commits
    • drivers: set world-readable permissions on copied resolv.conf (#11856) · e7caa8cf
      Tim Gross authored
      When we copy the system DNS to a task's `resolv.conf`, we should set
      the permissions as world-readable so that unprivileged users within
      the task can read it.
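
As an illustration of the permission fix, the sketch below copies a file and forces mode 0644 on the copy. It assumes a Unix-like system; `copyWorldReadable` and `demo` are hypothetical helpers, not the driver code.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// copyWorldReadable copies src to dst and forces mode 0644 on the copy.
func copyWorldReadable(src, dst string) error {
	data, err := os.ReadFile(src)
	if err != nil {
		return err
	}
	if err := os.WriteFile(dst, data, 0o644); err != nil {
		return err
	}
	// WriteFile's mode is filtered by the process umask, so set it explicitly.
	return os.Chmod(dst, 0o644)
}

// demo copies a private temp file and reports the permissions of the copy.
func demo() (os.FileMode, error) {
	dir, err := os.MkdirTemp("", "resolv")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)

	src := filepath.Join(dir, "src.conf")
	dst := filepath.Join(dir, "resolv.conf")
	if err := os.WriteFile(src, []byte("nameserver 127.0.0.1\n"), 0o600); err != nil {
		return 0, err
	}
	if err := copyWorldReadable(src, dst); err != nil {
		return 0, err
	}
	info, err := os.Stat(dst)
	if err != nil {
		return 0, err
	}
	return info.Mode().Perm(), nil
}

func main() {
	mode, err := demo()
	fmt.Println(mode, err)
}
```

The explicit `os.Chmod` is the part that matters: relying on the mode passed at creation time can still produce a file that other users cannot read when the umask is restrictive.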
      e7caa8cf
    • freebsd: build fix for ARM7 32-bit (#11854) · 30ac5ae1
      Tim Gross authored
      The size of `stat_t` fields is architecture dependent, which was
      reportedly causing a build failure on FreeBSD ARM7 32-bit
      systems. This changeset matches the behavior we have on Linux.
      30ac5ae1
    • csi: when warning for multiple prefix matches, use full ID (#11853) · 3f059258
      Tim Gross authored
      When the `volume deregister` or `volume detach` commands get an ID
      prefix that matches multiple volumes, show the full length of the
      volume IDs in the list of volumes so that the user can select
      the correct one.
      3f059258
    • csi: volume deregistration should require exact ID (#11852) · 77f4a254
      Tim Gross authored
      The command line client sends a specific volume ID, but this isn't
      enforced at the API level and we were incorrectly using a prefix match
      for volume deregistration, resulting in cases where a volume with a
      shorter ID that's a prefix of another volume would be deregistered
      instead of the intended volume.
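
The difference between a prefix lookup and an exact lookup is easy to see with two IDs where one is a prefix of the other. The helpers `byPrefix` and `exact` and the volume IDs are hypothetical and exist only for this illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// byPrefix returns every volume ID that starts with prefix, sorted.
func byPrefix(ids []string, prefix string) []string {
	var out []string
	for _, id := range ids {
		if strings.HasPrefix(id, prefix) {
			out = append(out, id)
		}
	}
	sort.Strings(out)
	return out
}

// exact reports whether id is present verbatim.
func exact(ids []string, id string) bool {
	for _, candidate := range ids {
		if candidate == id {
			return true
		}
	}
	return false
}

func main() {
	ids := []string{"vol-backup", "vol"}
	fmt.Println(byPrefix(ids, "vol")) // prints [vol vol-backup]
	fmt.Println(exact(ids, "vol"))    // prints true
	fmt.Println(exact(ids, "vol-b"))  // prints false
}
```

A destructive operation that acts on the first prefix match can pick the wrong volume, whereas the exact lookup either finds the one requested ID or nothing.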
      77f4a254
    • Un-break templates when using vault stanza change_mode noop (#11783) · c1b4238f
      grembo authored
      Templates in nomad jobs make use of the vault token defined in
      the vault stanza when issuing credentials like client certificates.
      
      When using change_mode "noop" in the vault stanza, consul-template
      is not informed when a vault token is re-issued (which can
      happen from time to time for various reasons, as described
      in https://www.nomadproject.io/docs/job-specification/vault).
      
      As a result, consul-template keeps using the old vault token
      to renew credentials and, once that token has expired, stops renewing
      credentials. The symptom of this problem is a vault_token
      file that is newer than the issued credential (e.g., TLS certificate)
      in a job's /secrets directory.
      
      This change corrects this by calling h.updater.updatedVaultToken(token),
      which informs stakeholders about the new token and makes sure
      the new token is used by consul-template.
      
      Example job template fragment:
      
          vault {
              policies = ["nomad-job-policy"]
              change_mode = "noop"
          }
      
          template {
            data = <<-EOH
              {{ with secret "pki_int/issue/nomad-job"
              "common_name=myjob.service.consul" "ttl=90m"
              "alt_names=localhost" "ip_sans=127.0.0.1"}}
              {{ .Data.certificate }}
              {{ .Data.private_key }}
              {{ .Data.issuing_ca }}
              {{ end }}
            EOH
            destination = "${NOMAD_SECRETS_DIR}/myjob.crt"
            change_mode = "noop"
          }
      
      This fix does not alter the meaning of the three change modes of vault
      
      - "noop" - Take no action
      - "restart" - Restart the job
      - "signal" - Send a signal to the task
      
      as the switch statement following line 232 contains the necessary
      logic.
      
      It is assumed that "take no action" was never meant to mean "don't tell
      consul-template about the new vault token".
      
      Successfully tested in a staging cluster consisting of multiple
      nomad client nodes.
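
The intended behavior can be summarized in a small sketch: the new token is always handed to consumers, and the change mode only decides what happens to the task. `onNewToken` is a hypothetical reduction of the hook's renewal path, and `notify` stands in for `h.updater.updatedVaultToken` from the message above; the returned strings are illustrative.

```go
package main

import "fmt"

// onNewToken is a hypothetical reduction of the token-renewal path.
// notify stands in for h.updater.updatedVaultToken.
func onNewToken(changeMode, token string, notify func(string)) string {
	// Always hand the new token to consumers such as consul-template.
	notify(token)

	// The change mode only governs what happens to the task itself.
	switch changeMode {
	case "restart":
		return "restart task"
	case "signal":
		return "signal task"
	default: // "noop"
		return "no task action"
	}
}

func main() {
	seen := ""
	action := onNewToken("noop", "new-token", func(tok string) { seen = tok })
	fmt.Println(action, seen) // prints no task action new-token
}
```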
      c1b4238f
    • task runner: fix goroutine leak in prestart hook (#11741) · 1a58d176
      Tim Gross authored
      The task runner prestart hooks take a `joincontext` so they have the
      option to exit early if either of two contexts is canceled: from
      killing the task or client shutdown. Some tasks exit without being
      shut down by the server, so neither of the joined contexts ever gets
      canceled and we leak the `joincontext` (48 bytes) and its internal
      goroutine. This primarily impacts batch jobs and any task that fails
      or completes early such as non-sidecar prestart lifecycle tasks.
      Cancel the `joincontext` after the prestart call exits to fix the
      leak.
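
The leak and its fix can be reproduced with the standard library alone. The sketch below is an assumption-laden stand-in for the `joincontext` package: `join`, `runPrestart`, and `demo` are hypothetical, and the `exited` channel exists only so the example can observe that the internal goroutine has finished.

```go
package main

import (
	"context"
	"fmt"
)

// join returns a context that is canceled when either parent is canceled,
// its cancel func, and a channel closed when the internal goroutine exits.
func join(a, b context.Context) (context.Context, context.CancelFunc, <-chan struct{}) {
	ctx, cancel := context.WithCancel(context.Background())
	exited := make(chan struct{})
	go func() {
		defer close(exited)
		select {
		case <-a.Done():
		case <-b.Done():
		case <-ctx.Done(): // released by an explicit cancel
		}
		cancel()
	}()
	return ctx, cancel, exited
}

// runPrestart runs hook with a joined context and cancels it afterwards, so
// the goroutine exits even if neither parent context is ever canceled.
func runPrestart(killCtx, shutdownCtx context.Context, hook func(context.Context)) <-chan struct{} {
	ctx, cancel, exited := join(killCtx, shutdownCtx)
	defer cancel()
	hook(ctx)
	return exited
}

// demo returns true once the join goroutine has exited.
func demo() bool {
	exited := runPrestart(context.Background(), context.Background(), func(context.Context) {})
	<-exited // would block forever without the deferred cancel
	return true
}

func main() {
	fmt.Println(demo()) // prints true
}
```

Removing the `defer cancel()` line makes `demo` hang, which is the same goroutine that would otherwise linger for every task that exits on its own.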
      1a58d176
    • Luiz Aoqui's avatar
    • Merge pull request #11830 from hashicorp/b-validate-reserved-ports · 6cf2c529
      Michael Schurter authored
      agent: validate reserved_ports are valid
      6cf2c529
    • Merge pull request #11833 from hashicorp/deps-go-getter-v1.5.11 · a7df948f
      Michael Schurter authored
      deps: update go-getter to v1.5.11
      a7df948f
    • Luiz Aoqui's avatar
      4c5ff512
    • cli: ensure `-stale` flag is respected by `nomad operator debug` (#11678) · 6cf4af9b
      Tim Gross authored
      When a cluster doesn't have a leader, the `nomad operator debug`
      command can safely use stale queries to gracefully degrade the
      consistency of almost all its queries. The query parameter for these
      API calls was not being set by the command.
      
      Some `api` package queries do not include `QueryOptions` because
      they target a specific agent, but they can potentially be forwarded to
      other agents. If there is no leader, these forwarded queries will
      fail. Provide methods to call these APIs with `QueryOptions`.
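
At the HTTP level, a stale read is requested with the `stale` query parameter. The sketch below only builds such a URL with the standard library; `withStale` is a hypothetical helper, and the agent address and the `/v1/nodes` path are used purely as an example.

```go
package main

import (
	"fmt"
	"net/url"
)

// withStale returns rawURL with the `stale` query parameter added when
// allowStale is true, and unchanged otherwise.
func withStale(rawURL string, allowStale bool) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	if allowStale {
		q := u.Query()
		q.Set("stale", "") // presence of the key is what matters
		u.RawQuery = q.Encode()
	}
	return u.String(), nil
}

func main() {
	// The address and path are illustrative assumptions.
	out, err := withStale("http://127.0.0.1:4646/v1/nodes", true)
	fmt.Println(out, err) // prints http://127.0.0.1:4646/v1/nodes?stale= <nil>
}
```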
      6cf4af9b
    • build(deps): bump github.com/hashicorp/cronexpr from 1.1.0 to 1.1.1 in /api (#11132) · 4c0d26ac
      dependabot[bot] authored
      * build(deps): bump github.com/hashicorp/cronexpr in /api
      
      Bumps [github.com/hashicorp/cronexpr](https://github.com/hashicorp/cronexpr) from 1.1.0 to 1.1.1.
      - [Release notes](https://github.com/hashicorp/cronexpr/releases)
      - [Commits](https://github.com/hashicorp/cronexpr/compare/v1.1.0...v1.1.1)
      
      ---
      updated-dependencies:
      - dependency-name: github.com/hashicorp/cronexpr
        dependency-type: direct:production
        update-type: version-update:semver-patch
      ...
      Signed-off-by: dependabot[bot] <support@github.com>
      
      * go mod tidy
      Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
      Co-authored-by: Tim Gross <tim@0x74696d.com>
      4c0d26ac
    • changelog: add entry for #11793 (#11862) · 0312c4ab
      Luiz Aoqui authored
      0312c4ab
    • Merge pull request #11849 from hashicorp/b-changelog-11848 · 90d3f577
      James Rasell authored
      changelog: add entry for #11848
      90d3f577
    • Tim Gross's avatar
      6016325f
    • scheduler: fix quadratic performance with spread blocks (#11712) · 4a11b079
      Tim Gross authored
      When the scheduler picks a node for each evaluation, the
      `LimitIterator` provides at most 2 eligible nodes for the
      `MaxScoreIterator` to choose from. This keeps scheduling fast while
      producing acceptable results because the results are binpacked.
      
      Jobs with a `spread` block (or node affinity) remove this limit in
      order to produce correct spread scoring. This means that every
      allocation within a job with a `spread` block is evaluated against
      _all_ eligible nodes. Operators of large clusters have reported that
      jobs with `spread` blocks that are eligible on a large number of nodes
      can take longer than the nack timeout (60s) to evaluate. Typical
      evaluations are processed in milliseconds.
      
      In practice, it's not necessary to evaluate every eligible node for
      every allocation on large clusters, because the `RandomIterator` at
      the base of the scheduler stack produces enough variation in each pass
      that the likelihood of an uneven spread is negligible. Note that
      feasibility is checked before the limit, so this only impacts the
      number of _eligible_ nodes available for scoring, not the total number
      of nodes.
      
      This changeset sets the iterator limit for "large" `spread` block and
      node affinity jobs to be equal to the number of desired
      allocations. This brings an example problematic job evaluation down
      from ~3min to ~10s. The included tests ensure that we have acceptable
      spread results across a variety of large cluster topologies.
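
The limit selection described above can be sketched as a single function. This is an illustration under assumptions: `selectLimit` is a hypothetical name, and `spreadFloor` is an invented cutoff for what counts as a "large" job, since the message does not state one. Only the default of 2 and the "limit equals desired allocations" rule come from the text.

```go
package main

import "fmt"

const (
	defaultLimit = 2   // from the message: at most 2 eligible nodes per pick
	spreadFloor  = 100 // hypothetical cutoff below which a job is not "large"
)

// selectLimit returns how many eligible nodes are scored per placement.
func selectLimit(hasSpreadOrAffinity bool, desiredAllocs int) int {
	if !hasSpreadOrAffinity {
		return defaultLimit
	}
	if desiredAllocs < spreadFloor {
		// Assumption: small jobs keep a wider window so spread stays accurate.
		return spreadFloor
	}
	// Large spread or affinity jobs: score as many nodes as desired allocations.
	return desiredAllocs
}

func main() {
	fmt.Println(selectLimit(false, 500)) // prints 2
	fmt.Println(selectLimit(true, 500))  // prints 500
	fmt.Println(selectLimit(true, 10))   // prints 100
}
```

Capping the number of scored nodes per placement is what bounds the per-evaluation work, instead of letting it grow with both the allocation count and the number of eligible nodes.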
      4a11b079