This project is mirrored from https://gitee.com/mirrors/nomad.git. Pull mirroring failed.
  1. 17 Mar, 2022 3 commits
    • cli: display Raft version in `server members` (#12317) · eca4ac67
      Luiz Aoqui authored
      The `nomad server members` command previously displayed a column
      named `Protocol` that showed the Serf protocol version currently
      used by servers.
      
      This is not a configurable option, so it holds very little value to
      operators. It is also easy to confuse it with the Raft Protocol version,
      which is configurable and highly relevant to operators.
      
      This commit replaces the previous `Protocol` column with the new `Raft
      Version` column. It also renames the `-detailed` flag to `-verbose`
      so it matches other commands. The verbose output now shows the same
      information as the standard output, plus the previous `Protocol`
      column and `Tags`.
    • api: add related evals to eval details (#12305) · 81687c1c
      Luiz Aoqui authored
      
      The `related` query param is used to indicate that the request should
      return a list of related (next, previous, and blocked) evaluations.
      Co-authored-by: Jasmine Dahilig <jasmine@hashicorp.com>
    • server: transfer leadership in case of error (#12293) · dfe520a9
      Luiz Aoqui authored
      When a Nomad server becomes the Raft leader, it must perform several
      actions defined in the establishLeadership function. If any of these
      actions fail, Raft will think the node is the leader, but it will not
      actually be able to act as a Nomad leader.
      
      In this scenario, leadership must be revoked and transferred to another
      server if possible, or the node should retry the establishLeadership
      steps.
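The fallback described above can be sketched roughly as follows. This is a minimal illustration, not Nomad's implementation: the `leader` type, its `establish` and `transfer` callbacks, `becomeLeader`, and the attempt limit are all hypothetical names chosen for the example. Only the idea of running the `establishLeadership` steps comes from the commit message.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical errors used by the demo below.
var (
	errEstablish = errors.New("establish leadership failed")
	errNoPeer    = errors.New("no other server can take leadership")
)

// leader is a hypothetical stand-in for the parts of a server involved in
// taking over leadership.
type leader struct {
	establish func() error // runs the establishLeadership steps
	transfer  func() error // asks Raft to hand leadership to another server
}

// becomeLeader runs the establish steps. If they fail, it first tries to
// transfer leadership away; if the transfer also fails, it retries the
// establish steps, up to maxAttempts times in total.
func becomeLeader(l leader, maxAttempts int) (string, error) {
	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err := l.establish()
		if err == nil {
			return "leader", nil
		}
		lastErr = err
		if l.transfer() == nil {
			return "transferred", nil
		}
	}
	return "failed", fmt.Errorf("giving up after %d attempts: %w", maxAttempts, lastErr)
}

func main() {
	calls := 0
	l := leader{
		// Fails twice, then succeeds.
		establish: func() error {
			calls++
			if calls < 3 {
				return errEstablish
			}
			return nil
		},
		// Single-server cluster: nobody to transfer to.
		transfer: func() error { return errNoPeer },
	}
	state, err := becomeLeader(l, 5)
	fmt.Println(state, err, calls)
}
```

In this sketch a successful transfer ends the loop immediately, so the retry path is only taken when no other server can accept leadership.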
  2. 09 Mar, 2022 1 commit
  3. 07 Mar, 2022 5 commits
    • csi: add pagination args to `volume snapshot list` (#12193) · bc40222e
      Tim Gross authored
      The snapshot list API supports pagination as part of the CSI
      specification, but we didn't have it plumbed through to the command
      line.
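Token-based pagination of the kind mentioned above usually follows the loop sketched below. This is a generic illustration under stated assumptions: `page`, `fetcher`, `listAll`, and `fakeBackend` are hypothetical names, and the in-memory backend merely stands in for a paginated list API. It does not reproduce the Nomad CLI flags or the CSI RPC types.

```go
package main

import (
	"fmt"
	"strconv"
)

// page is one batch of results plus the token for the next batch.
// An empty next token means there are no more results.
type page struct {
	items []string
	next  string
}

// fetcher is a hypothetical paginated list call.
type fetcher func(token string, perPage int) page

// listAll keeps requesting pages until the backend stops returning a token.
func listAll(fetch fetcher, perPage int) []string {
	var all []string
	token := ""
	for {
		p := fetch(token, perPage)
		all = append(all, p.items...)
		if p.next == "" {
			return all
		}
		token = p.next
	}
}

// fakeBackend serves data from memory, using the slice offset as the token.
func fakeBackend(data []string) fetcher {
	return func(token string, perPage int) page {
		if perPage < 1 {
			perPage = 1
		}
		start := 0
		if token != "" {
			start, _ = strconv.Atoi(token)
		}
		end := start + perPage
		if end >= len(data) {
			return page{items: data[start:]}
		}
		return page{items: data[start:end], next: strconv.Itoa(end)}
	}
}

func main() {
	snapshots := []string{"snap-1", "snap-2", "snap-3", "snap-4", "snap-5"}
	fmt.Println(listAll(fakeBackend(snapshots), 2))
}
```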
    • CSI: allow updates to volumes on re-registration (#12167) · 7d0f87b9
      Tim Gross authored
      CSI `CreateVolume` RPC is idempotent given that the topology,
      capabilities, and parameters are unchanged. CSI volumes have many
      user-defined fields that are immutable once set, and many fields that
      are not user-settable.
      
      Update the `Register` RPC so that updating a volume via the API merges
      onto any existing volume without touching Nomad-controlled fields,
      while validating it with the same strict requirements expected for
      idempotent `CreateVolume` RPCs.
      
      Also, clarify that this state store method is used for everything, not just
      for the `Register` RPC.
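A merge of this kind might look like the sketch below. The `volume` struct and `mergeVolume` function are hypothetical and heavily simplified. In particular, which fields count as immutable, user-updatable, or Nomad-controlled here is an assumption made for illustration, not a statement about Nomad's actual volume schema.

```go
package main

import (
	"fmt"
	"reflect"
)

// volume is a simplified, hypothetical model of a CSI volume.
type volume struct {
	ID          string
	PluginID    string            // assumed immutable once set
	Parameters  map[string]string // assumed immutable once set
	CapacityMax int64             // assumed user-updatable
	Schedulable bool              // assumed server-controlled, never user-set
}

// mergeVolume validates an update against the existing volume and merges only
// the user-updatable fields, leaving server-controlled fields untouched.
func mergeVolume(existing, update volume) (volume, error) {
	if update.PluginID != existing.PluginID {
		return existing, fmt.Errorf("volume %q: plugin ID cannot be updated", existing.ID)
	}
	if len(update.Parameters) != 0 &&
		!reflect.DeepEqual(update.Parameters, existing.Parameters) {
		return existing, fmt.Errorf("volume %q: parameters cannot be updated", existing.ID)
	}
	merged := existing // keeps Schedulable and other server-side state
	merged.CapacityMax = update.CapacityMax
	return merged, nil
}

func main() {
	existing := volume{
		ID:          "vol-1",
		PluginID:    "ebs",
		Parameters:  map[string]string{"type": "gp2"},
		CapacityMax: 10,
		Schedulable: true,
	}
	update := volume{ID: "vol-1", PluginID: "ebs", CapacityMax: 20}
	merged, err := mergeVolume(existing, update)
	fmt.Println(merged.CapacityMax, merged.Schedulable, err)
}
```

Starting from a copy of the existing volume, rather than from the update, is what keeps fields the user cannot set from being reset to zero values.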
    • csi: volume snapshot list plugin option is required (#12197) · 711a9d9a
      Tim Gross authored
      The RPC for listing volume snapshots requires a plugin ID. Update the
      `volume snapshot list` command to find the specific plugin from the
      provided prefix.
    • csi: get plugin ID for creating snapshot from volume, not args (#12195) · bec44cc6
      Tim Gross authored
      The `CreateSnapshot` RPC expects a plugin ID to be set by the API, but
      in the common case of the `nomad volume snapshot create` command, we
      don't ask the user for the plugin ID because it's available from the
      volume we're snapshotting.
      
      Change the order of the RPC so that we get the volume first and then
      use the volume's plugin ID for the plugin if the API didn't set the
      value.
    • Jorge Marey's avatar
      451586af
  4. 04 Mar, 2022 1 commit
    • csi: fix prefix queries for plugin list RPC (#12194) · 9ed4d962
      Tim Gross authored
      The `CSIPlugin.List` RPC was intended to accept a prefix to filter the
      list of plugins. This was accidentally being done in the state store
      instead, which contributed to incorrect filtering behavior for plugins
      in the `volume plugin status` command.
      
      Move the prefix matching into the RPC so that it calls the
      prefix-matching method in the state store if we're looking for a
      prefix.
      
      Update the `plugin status` command to accept a prefix for the plugin
      ID argument so that it matches the expected behavior of other commands.
  5. 03 Mar, 2022 2 commits
    • Fix CSI volume list with prefix and `*` namespace (#12184) · ad99a450
      Luiz Aoqui authored
      When using a prefix value and the `*` wildcard for namespace, the endpoint
      would not take the prefix value into consideration, due both to the order
      in which the checks were executed and to the logic for retrieving volumes
      from the state store.
      
      This commit changes the order to check for a prefix first and wraps the
      result iterator of the state store query in a filter to apply the
      prefix.
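Wrapping a result iterator in a prefix filter can be sketched as below. The `iterator` interface, `sliceIter`, `prefixFilter`, and `collect` are hypothetical names for this example and do not mirror the state store's real iterator types.

```go
package main

import (
	"fmt"
	"strings"
)

// vol is a minimal volume record for the example.
type vol struct {
	Namespace string
	ID        string
}

// iterator yields volumes one at a time; ok is false when exhausted.
type iterator interface {
	Next() (v vol, ok bool)
}

// sliceIter iterates over an in-memory slice, standing in for a state store
// query that returns volumes from every namespace.
type sliceIter struct {
	vols []vol
	pos  int
}

func (s *sliceIter) Next() (vol, bool) {
	if s.pos >= len(s.vols) {
		return vol{}, false
	}
	v := s.vols[s.pos]
	s.pos++
	return v, true
}

// prefixFilter wraps another iterator and skips volumes whose ID does not
// start with the prefix.
type prefixFilter struct {
	inner  iterator
	prefix string
}

func (p *prefixFilter) Next() (vol, bool) {
	for {
		v, ok := p.inner.Next()
		if !ok {
			return vol{}, false
		}
		if strings.HasPrefix(v.ID, p.prefix) {
			return v, true
		}
	}
}

// collect drains an iterator into a slice of IDs.
func collect(it iterator) []string {
	var ids []string
	for v, ok := it.Next(); ok; v, ok = it.Next() {
		ids = append(ids, v.ID)
	}
	return ids
}

func main() {
	all := &sliceIter{vols: []vol{
		{"default", "db-data"},
		{"prod", "db-logs"},
		{"prod", "cache"},
	}}
	fmt.Println(collect(&prefixFilter{inner: all, prefix: "db-"}))
}
```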
    • csi: add missing fields to HTTP API response (#12178) · cd928d2c
      Tim Gross authored
      The HTTP endpoint for CSI manually serializes the internal struct to
      the API struct for purposes of redaction (see also #10470). Add fields
      that were missing from this serialization so they don't show up as
      always empty in the API response.
  6. 01 Mar, 2022 5 commits
  7. 25 Feb, 2022 1 commit
  8. 24 Feb, 2022 7 commits
    • tests: deflake test that joins a server with non-voting servers to form quorum · bd03d254
      Seth Hoenig authored
      This PR
       - upgrades the serf library
       - has the test start the join process using the un-joined server first
       - disables schedulers on the servers
       - uses the WaitForLeader and wantPeers helpers
      
      Not sure which, if any, of these actually improves the flakiness of this test.
    • CSI: display plugin capabilities in verbose status (#12116) · 59f6c753
      Tim Gross authored
      The behaviors of CSI plugins are governed by their capabilities as
      defined by the CSI specification. When debugging plugin issues, it's
      useful to know which behaviors are expected so they can be matched
      against RPC calls made to the plugin allocations.
      
      Expose the plugin capabilities as named in the CSI spec in the `nomad
      plugin status -verbose` output.
    • CSI: retry claims from client when max claims are reached (#12113) · 649f1e39
      Tim Gross authored
      When the alloc runner claims a volume, an allocation for a previous
      version of the job may still have the volume claimed because it's
      still shutting down. In this case we'll receive an error from the
      server. Retry this error until we succeed or until a very long timeout
      expires, to give operators a chance to recover broken plugins.
      
      Make the alloc runner hook tolerant of temporary RPC failures.
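The retry behavior described above can be approximated with a loop like the following. This is a sketch under assumptions: `errMaxClaims`, `retryClaim`, and the demo durations are hypothetical, and the real hook's timeout, backoff policy, and error classification are not shown in the commit message.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errMaxClaims is a hypothetical sentinel for "the volume is still claimed by
// an allocation that is shutting down".
var errMaxClaims = errors.New("volume max claims reached")

// Short durations so the demo finishes quickly; a real timeout would be
// much longer.
const (
	demoTimeout = 300 * time.Millisecond
	demoBackoff = 5 * time.Millisecond
)

// retryClaim calls claim until it succeeds, fails with an error other than
// errMaxClaims, or the timeout expires.
func retryClaim(claim func() error, timeout, backoff time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	for {
		err := claim()
		if err == nil || !errors.Is(err, errMaxClaims) {
			return err // success, or an error we do not retry
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("gave up claiming volume: %w", err)
		case <-time.After(backoff):
			// wait, then try again
		}
	}
}

func main() {
	attempts := 0
	err := retryClaim(func() error {
		attempts++
		if attempts < 3 {
			return errMaxClaims // previous allocation still holds the claim
		}
		return nil
	}, demoTimeout, demoBackoff)
	fmt.Println(attempts, err)
}
```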
    • CSI: enforce usage at claim time (#12112) · 6b6b8279
      Tim Gross authored
      * Remove redundant schedulable check in `FreeWriteClaims`. If a volume
        has been created but not yet claimed, its capabilities will be checked
        in `WriteSchedulable` at both scheduling time and claim time. We don't
        need to also check them in the `FreeWriteClaims` method.
      
      * Enforce maximum volume claims for writers.
      
        When the scheduler checks feasibility for CSI volumes, the check is
        fairly loose: earlier versions of the same job are not counted as
        active claims. This allows the scheduler to place new allocations
        for the new version of a job, under the assumption that we'll replace
        the existing allocations and their volume claims.
      
        But when the alloc runner claims the volume, we need to enforce the
        active claims even if they're for allocations of an earlier version of
        the job. Otherwise we'll try to mount a volume that's currently being
        unmounted, and this will cause replacement allocations to frequently
        fail.
      
      * Enforce ...
    • Sander Mol's avatar
    • Florian Apolloner's avatar
    • csi: tolerate missing plugins on job delete (#12114) · bfbb6509
      Tim Gross authored
      If a plugin job fails before successfully fingerprinting the plugins,
      the plugin will not exist when we try to delete the job. Tolerate
      missing plugins.
  9. 23 Feb, 2022 3 commits
    • agent: switch to go.etcd.io/bbolt for state store · b2fe196e
      Seth Hoenig authored
      This PR modifies the server and client agents to use `go.etcd.io/bbolt` as the
      implementation for their state stores.
    • core: switch to go.etcd.io/bbolt · 16efcf4e
      Seth Hoenig authored
      This PR swaps the underlying BoltDB implementation from boltdb/bolt
      to go.etcd.io/bbolt.
      
      In addition, the Server has a new configuration option for disabling
      NoFreelistSync on the underlying database.
      
      Freelist option: https://github.com/etcd-io/bbolt/blob/master/db.go#L81
      Consul equivalent PR: https://github.com/hashicorp/consul/pull/11720
    • CSI: allow for concurrent plugin allocations (#12078) · 7bcf0afd
      Tim Gross authored
      The dynamic plugin registry assumes that plugins are singletons, which
      matches the behavior of other Nomad plugins. But because dynamic
      plugins like CSI are implemented by allocations, we need to handle the
      possibility of multiple allocations for a given plugin type + ID, as
      well as behaviors around interleaved allocation starts and stops.
      
      Update the data structure for the dynamic registry so that more recent
      allocations take over as the instance manager singleton, but we still
      preserve the previous running allocations so that restores work
      without racing.
      
      Multiple allocations can run on a client for the same plugin, even if
      only during updates. Provide each plugin task a unique path for the
      control socket so that the tasks don't interfere with each other.
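One way to picture the updated data structure is a per-plugin list of allocations in which the most recent one is active. The sketch below is an assumption-laden illustration: `registry`, `register`, `deregister`, and `active` are hypothetical names, and the real registry tracks far more than allocation IDs.

```go
package main

import "fmt"

// registry maps a plugin key (for example "csi-node/ebs") to the allocation
// IDs currently providing that plugin, oldest first.
type registry struct {
	allocs map[string][]string
}

func newRegistry() *registry {
	return &registry{allocs: make(map[string][]string)}
}

// register records a new allocation; it becomes the active instance.
func (r *registry) register(key, allocID string) {
	r.allocs[key] = append(r.allocs[key], allocID)
}

// deregister removes one allocation. Older allocations are preserved, so
// stopping the newest one falls back to the previous one.
func (r *registry) deregister(key, allocID string) {
	var kept []string
	for _, id := range r.allocs[key] {
		if id != allocID {
			kept = append(kept, id)
		}
	}
	if len(kept) == 0 {
		delete(r.allocs, key)
		return
	}
	r.allocs[key] = kept
}

// active returns the most recently registered allocation for the plugin.
func (r *registry) active(key string) (string, bool) {
	ids := r.allocs[key]
	if len(ids) == 0 {
		return "", false
	}
	return ids[len(ids)-1], true
}

func main() {
	r := newRegistry()
	r.register("csi-node/ebs", "alloc-old")
	r.register("csi-node/ebs", "alloc-new") // job update starts a new alloc
	r.deregister("csi-node/ebs", "alloc-old")
	id, ok := r.active("csi-node/ebs")
	fmt.Println(id, ok)
}
```

Because the old allocation is removed by ID rather than by position, interleaved starts and stops leave the newest running allocation in charge.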
  10. 19 Feb, 2022 1 commit
  11. 18 Feb, 2022 2 commits
    • connect: bootstrap envoy using -proxy-id · efee15f1
      Seth Hoenig authored
      This PR modifies the Consul CLI arguments used to bootstrap envoy for
      Connect sidecars to make use of '-proxy-id' instead of '-sidecar-for'.
      
      Nomad registers the sidecar service, so we know what ID it has. The
      '-sidecar-for' was intended for use when you only know the name of the
      service for which the sidecar is being created.
      
      The improvement here is that using '-proxy-id' does not require an underlying
      request for listing Consul services. This will make the interaction
      between Nomad and Consul more efficient.
      
      Closes #10452
    • connect: write envoy bootstrap debugging info · d4767807
      Michael Schurter authored
      When Consul Connect just works, it's wonderful. When it doesn't work, it
      can be exceedingly difficult to debug: operators have to check task
      events, Nomad logs, Consul logs, Consul APIs, and even then critical
      information is missing.
      
      Using Consul to generate a bootstrap config for Envoy is notoriously
      difficult. Nomad doesn't even log stderr, so operators are left trying
      to piece together what went wrong.
      
      This patch attempts to provide *maximal* context, which unfortunately
      includes secrets. **Secrets are always restricted to the secrets/
      directory.** This makes debugging a little harder, but allows operators
      to know exactly what operation Nomad was trying to perform.
      
      What's added:
      
      - stderr is sent to alloc/logs/envoy_bootstrap.stderr.0
      - the CLI is written to secrets/.envoy_bootstrap.cmd
      - the environment is written to secrets/.envoy_bootstrap.env as JSON
      
      Accessing this information is unfortunately awkward:
      ```
      nomad alloc exec -task connect-proxy-count-countdash b36a cat secrets/.envoy_bootstrap.env
      nomad alloc exec -task connect-proxy-count-countdash b36a cat secrets/.envoy_bootstrap.cmd
      nomad alloc fs b36a alloc/logs/envoy_bootstrap.stderr.0
      ```
      
      The above assumes an alloc id that starts with `b36a` and a Connect
      sidecar proxy for a service named `count-countdash`.
      
      If the alloc is unable to start successfully, the debugging files are
      only accessible from the host filesystem.
  12. 17 Feb, 2022 1 commit
  13. 16 Feb, 2022 3 commits
  14. 15 Feb, 2022 4 commits
    • CSI: make gRPC client creation more robust (#12057) · b775a73d
      Tim Gross authored
      Nomad communicates with CSI plugin tasks via gRPC. The plugin
      supervisor hook uses this to ping the plugin for health checks which
      it emits as task events. After the first successful health check the
      plugin supervisor registers the plugin in the client's dynamic plugin
      registry, which in turn creates a CSI plugin manager instance that has
      its own gRPC client for fingerprinting the plugin and sending mount
      requests.
      
      If the plugin manager instance fails to connect to the plugin on its
      first attempt, it exits. The plugin supervisor hook is unaware that
      the connection failed so long as its own pings continue to work. A
      transient failure during plugin startup may mislead the plugin
      supervisor hook into thinking the plugin is up (so there's no need to
      restart the allocation) but no fingerprinter is started.
      
      * Refactors the gRPC client to connect on first use. This provides the
        plugin manager instance the ability to retry the gRPC client
        conn...
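The "connect on first use" idea from the first bullet can be sketched without any gRPC dependency, as below. The `lazyClient` and `conn` types and the `dial` callback are hypothetical placeholders for a real client and connection; the remainder of the commit message is truncated in the source, so this only illustrates the general pattern.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errDial is a hypothetical transient failure during plugin startup.
var errDial = errors.New("plugin socket not ready")

// conn stands in for an established client connection.
type conn struct {
	target string
}

// lazyClient defers dialing until the connection is first needed. A failed
// dial is not cached, so the next caller simply tries again.
type lazyClient struct {
	mu   sync.Mutex
	dial func() (*conn, error)
	c    *conn
}

// get returns the cached connection, dialing if none exists yet.
func (l *lazyClient) get() (*conn, error) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.c != nil {
		return l.c, nil
	}
	c, err := l.dial()
	if err != nil {
		return nil, err
	}
	l.c = c
	return c, nil
}

func main() {
	dials := 0
	client := &lazyClient{dial: func() (*conn, error) {
		dials++
		if dials == 1 {
			return nil, errDial // first attempt fails
		}
		return &conn{target: "csi.sock"}, nil
	}}

	_, err := client.get()
	fmt.Println("first use:", err)

	c, err := client.get()
	fmt.Println("second use:", c.target, err)
}
```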
    • api: return sorted results in certain list endpoints · b432f377
      Seth Hoenig authored
      These API endpoints now return results in chronological order. They
      can return results in reverse chronological order by setting the
      query parameter ascending=true.
      
      - Eval.List
      - Deployment.List
    • cl: shorten changelog entry · 5ac59de9
      Seth Hoenig authored
    • changelog entry (#12072) · 7c027503
      Tim Gross authored
  15. 11 Feb, 2022 1 commit
    • csi: volume cli prefix matching should accept exact match (#12051) · 4afc67b7
      Tim Gross authored
      The `volume detach`, `volume deregister`, and `volume status` commands
      accept a prefix argument for the volume ID. Update the behavior so that
      when more than one volume matches the prefix, we only return an error if
      none of the volume IDs is an exact match. Otherwise we won't be able to
      use these commands at all on volumes whose ID is a prefix of another
      volume's ID. This also makes the behavior of these commands consistent
      with `job stop`.
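The matching rule can be sketched as a small resolver that prefers an exact match over an ambiguous prefix. The `resolveID` function and the sample volume IDs are hypothetical and only illustrate the rule, not the CLI's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveID picks the volume ID that arg refers to. An exact match always
// wins; otherwise arg must be a prefix of exactly one ID.
func resolveID(arg string, ids []string) (string, error) {
	var matches []string
	for _, id := range ids {
		if id == arg {
			return id, nil // exact match beats any other prefix match
		}
		if strings.HasPrefix(id, arg) {
			matches = append(matches, id)
		}
	}
	switch len(matches) {
	case 0:
		return "", fmt.Errorf("no volume matches %q", arg)
	case 1:
		return matches[0], nil
	default:
		return "", fmt.Errorf("prefix %q matches multiple volumes: %s",
			arg, strings.Join(matches, ", "))
	}
}

func main() {
	ids := []string{"data", "data-replica"}
	fmt.Println(resolveID("data", ids)) // exact match, even though it prefixes both
	fmt.Println(resolveID("da", ids))   // ambiguous prefix, returns an error
}
```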