This project is mirrored from https://gitee.com/mirrors/nomad.git. Pull mirroring failed.
  1. 29 Aug, 2022 1 commit
  2. 26 Aug, 2022 6 commits
  3. 25 Aug, 2022 5 commits
  4. 24 Aug, 2022 13 commits
    • ui: task lifecycle restart all tasks (#14223) · 546bdb8b
      Luiz Aoqui authored
      Now that tasks that have finished running can be restarted, the UI needs
      to use the actual task state to determine which CSS class to use when
      rendering the task lifecycle chart element.
    • Task lifecycle restart (#14127) · f74f5080
      Luiz Aoqui authored
      * allocrunner: handle lifecycle when all tasks die
      
      When all tasks die the Coordinator must transition to its terminal
      state, coordinatorStatePoststop, to unblock poststop tasks. Since this
      could happen at any time (for example, a prestart task dies), all states
      must be able to transition to this terminal state.
      
      * allocrunner: implement different alloc restarts
      
      Add a new alloc restart mode where all tasks are restarted, even if they
      have already exited. Also unifies the alloc restart logic to use the
      implementation that restarts tasks concurrently and ignores
      ErrTaskNotRunning errors since those are expected when restarting the
      allocation.
      
      * allocrunner: allow tasks to run again
      
      Prevent the task runner Run() method from exiting to allow a dead task
      to run again. When the task runner is signaled to restart, the function
      will jump back to the MAIN loop and run it again.
      
      The task runner determines if a task needs to run again based on two new
      task events that were added to differentiate between a request to
      restart a specific task, the tasks that are currently running, or all
      tasks that have already run.
      
      * api/cli: add support for all tasks alloc restart
      
      Implement the new -all-tasks alloc restart CLI flag and its API
      counterpart, AllTasks. The client endpoint calls the appropriate restart
      method from the allocrunner depending on the restart parameters used.
      
      * test: fix tasklifecycle Coordinator test
      
      * allocrunner: kill taskrunners if all tasks are dead
      
      When all non-poststop tasks are dead we need to kill the taskrunners so
      we don't leak their goroutines, which are blocked in the alloc restart
      loop. This also ensures the allocrunner exits on its own.
      
      * taskrunner: fix tests that waited on WaitCh
      
      Now that "dead" tasks may run again, the taskrunner Run() method will
      not return when the task finishes running, so tests must wait for the
      task state to be "dead" instead of using the WaitCh, since it won't be
      closed until the taskrunner is killed.
      
      * tests: add tests for all tasks alloc restart
      
      * changelog: add entry for #14127
      
      * taskrunner: fix restore logic.
      
      The first implementation of the task runner restore process relied on
      server data (`tr.Alloc().TerminalStatus()`) which may not be available
      to the client at the time of restore.
      
      It also followed an incorrect code path. When restoring a dead task the
      driver handle always needs to be cleared cleanly using
      `clearDriverHandle`; otherwise, after exiting the MAIN loop, the task
      may be killed by `tr.handleKill`.
      
      The fix is to store the state of the Run() loop in the task runner local
      client state: if the task runner ever exits this loop cleanly (not with
      a shutdown) it will never be able to run again. So if the Run() loop
      starts with this local state flag set, it must exit early.
      
      This local state flag is also being checked on task restart requests. If
      the task is "dead" and its Run() loop is not active it will never be
      able to run again.
      
      * address code review requests
      
      * apply more code review changes
      
      * taskrunner: add different Restart modes
      
      Using the task event to differentiate between the allocrunner restart
      methods made it hard for developers to understand how it all
      worked.
      
      So instead of relying on the event type, this commit separates the logic
      of restarting a taskRunner into two methods:
      - `Restart` retains the current behaviour and will only restart
        the task if it's currently running.
      - `ForceRestart` is the new method where a `dead` task is allowed to
        restart if its `Run()` method is still active. Callers will need to
        restart the allocRunner taskCoordinator to make sure it will allow the
        task to run again.
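
The difference between the two methods can be sketched as follows. Only the names `Restart`, `ForceRestart` and `ErrTaskNotRunning` come from the commit message; the struct, its fields and the method bodies are simplified assumptions and do not reproduce the real task runner.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrTaskNotRunning mirrors the error name mentioned in the commit message.
var ErrTaskNotRunning = errors.New("task not running")

// taskRunner is a hypothetical, simplified stand-in for the real task runner.
type taskRunner struct {
	running       bool // the task process is currently running
	runLoopActive bool // Run() has not exited, so the task may run again
	restarts      int
}

// Restart keeps the existing behaviour: only a running task is restarted.
func (tr *taskRunner) Restart() error {
	if !tr.running {
		return ErrTaskNotRunning
	}
	tr.restarts++
	return nil
}

// ForceRestart also restarts a dead task, provided its Run() loop is active.
func (tr *taskRunner) ForceRestart() error {
	if !tr.running && !tr.runLoopActive {
		return ErrTaskNotRunning
	}
	tr.running = true
	tr.restarts++
	return nil
}

func main() {
	dead := &taskRunner{running: false, runLoopActive: true}
	fmt.Println(dead.Restart())      // task not running
	fmt.Println(dead.ForceRestart()) // <nil>
}
```

Splitting the behaviour into two explicit methods makes the caller's intent visible at the call site instead of encoding it in an event type.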
      
      * minor fixes
    • vault: detect namespace change in config reload (#14298) · e886d5d0
      Tim Gross authored
      The `namespace` field was not included in the equality check between old and new
      Vault configurations, which meant that a Vault config change that only changed
      the namespace would not be detected as a change and the clients would not be
      reloaded.
      
      Also, the comparison for boolean fields such as `enabled` and
      `allow_unauthenticated` was on the pointer and not the value of that pointer,
      which results in spurious reloads during real config reloads, a problem that
      is easily missed in typical test scenarios.
      
      Includes a minor refactor of the order of fields for `Copy` and `Merge` to match
      the struct fields in hopes it makes it harder to make this mistake in the
      future, as well as additional test coverage.
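
The pointer-versus-value pitfall is easy to reproduce in isolation. The sketch below uses a hypothetical, trimmed-down `vaultConfig` type (not the real Nomad struct) to show why two freshly parsed configs with identical values compare as different when the pointers themselves are compared.

```go
package main

import "fmt"

// vaultConfig is a hypothetical, trimmed-down config with one pointer field.
type vaultConfig struct {
	Enabled   *bool
	Namespace string
}

// boolPtrEqual compares the values behind two *bool. Two nil pointers are
// equal; a nil pointer never equals a set one.
func boolPtrEqual(a, b *bool) bool {
	if a == nil || b == nil {
		return a == b
	}
	return *a == *b
}

// equal compares every field by value, including the namespace.
func (c vaultConfig) equal(o vaultConfig) bool {
	return boolPtrEqual(c.Enabled, o.Enabled) && c.Namespace == o.Namespace
}

func boolPtr(b bool) *bool { return &b }

func main() {
	old := vaultConfig{Enabled: boolPtr(true), Namespace: "a"}
	reloaded := vaultConfig{Enabled: boolPtr(true), Namespace: "a"}

	fmt.Println(old.Enabled == reloaded.Enabled) // false: distinct pointers
	fmt.Println(old.equal(reloaded))             // true: same values

	reloaded.Namespace = "b"
	fmt.Println(old.equal(reloaded)) // false: namespace changed
}
```

Tests that reuse the same struct instance on both sides share the pointer, which is one way a pointer comparison like this can go unnoticed.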
    • Merge pull request #14283 from hashicorp/f-java-corretto-test-case · ee501f4f
      Seth Hoenig authored
      drivers/java: add parsing test case for corretto 17
    • Merge pull request #14297 from hashicorp/b-logmon-fork-mystery-bin · bb72d81a
      Seth Hoenig authored
      client/logmon: acquire executable in init block
    • testing: fix flakey check status test · 3ae6db66
      Seth Hoenig authored
      This PR fixes a flakey test where we did not wait on the check
      status to actually become failing (go too fast and you just get
      a pending check).
      
      Instead add a helper for waiting on any check in the alloc to become
      the state we are looking for.
    • client/logmon: acquire executable in init block · 24a1c48f
      Seth Hoenig authored
      This PR causes the logmon task runner to acquire the binary of the
      Nomad executable in an 'init' block, so as to almost certainly get
      the name while the nomad file still exists.
      
      This is an attempt at fixing the case where a deleted Nomad file
      (e.g. during upgrade) may be getting renamed with a mysterious
      suffix first.
      
      If this doesn't work, as a last resort we can literally just trim
      the mystery string.
      
      Fixes: #14079
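
The idea of resolving the executable path as early as possible can be sketched with the standard library alone. The variable names below are hypothetical and the snippet is not the logmon code; it only illustrates capturing the result of `os.Executable` in an `init` block so that later callers reuse the value obtained at process start.

```go
package main

import (
	"fmt"
	"os"
)

// exePath and exeErr hold the result of the lookup done at process start.
// Both names are assumptions made for this sketch.
var (
	exePath string
	exeErr  error
)

// init runs before main, while the file backing the process most likely
// still exists under its original name.
func init() {
	exePath, exeErr = os.Executable()
}

func main() {
	if exeErr != nil {
		fmt.Println("could not resolve executable:", exeErr)
		return
	}
	fmt.Println("executable resolved at startup:", exePath)
}
```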
    • template: custom change_mode scripts (#13972) · 34e4b080
      Piotr Kazmierczak authored
      This PR adds the functionality of allowing custom scripts to be executed on template change. Resolves #2707
    • Luiz Aoqui's avatar
      9e775824
    • deps: sync versions of go-discover in go.mod (#14269) · abeeecbe
      Luiz Aoqui authored
      In #13491 the version of `go-discover` was updated in `go.mod` but the
      comment above it mentions that it also needs to be updated in the
      `replace` directive.
    • docs: Update upgrade guide to reflect enterprise changes introduced in nomad-enterprise (#14212) · 2d4acce3
      Piotr Kazmierczak authored
      This PR documents a change made in the enterprise version of nomad that addresses the following issue:
      
      When a user tries to filter audit logs, they do so with a stanza that looks like the following:
      
      audit {
        enabled = true
      
        filter "remove deletes" {
          type = "HTTPEvent"
          endpoints  = ["*"]
          stages = ["OperationComplete"]
          operations = ["DELETE"]
        }
      }
      
      When specifying both an "endpoint" and a "stage", events with both a matching "endpoint" AND a matching "stage" will be filtered.

      When specifying both an "endpoint" and an "operation", events with both a matching "endpoint" AND a matching "operation" will be filtered.

      When specifying both a "stage" and an "operation", events with a matching "stage" OR a matching "operation" will be filtered.
      
      The "OR" logic with stages and operations is unexpected and doesn't allow customers to get specific on which events they want to filter. For instance the following use-case is impossible to achieve: "I want to filter out all OperationReceived events that have the DELETE verb".
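
The effect of OR versus AND matching on stages and operations can be illustrated with a small sketch. The types and method names are hypothetical and do not reflect the enterprise audit implementation; `matchesAnd` only shows what AND semantics would look like for the use case quoted above, under the assumption that this is the desired behaviour.

```go
package main

import "fmt"

// event and filter are hypothetical, simplified stand-ins.
type event struct {
	Stage     string
	Operation string
}

type filter struct {
	Stages     []string
	Operations []string
}

// contains reports whether v is present in list.
func contains(list []string, v string) bool {
	for _, s := range list {
		if s == v {
			return true
		}
	}
	return false
}

// matchesOr reproduces the behaviour described as unexpected: an event is
// filtered if either its stage or its operation matches.
func (f filter) matchesOr(e event) bool {
	return contains(f.Stages, e.Stage) || contains(f.Operations, e.Operation)
}

// matchesAnd filters an event only if both stage and operation match.
func (f filter) matchesAnd(e event) bool {
	return contains(f.Stages, e.Stage) && contains(f.Operations, e.Operation)
}

func main() {
	f := filter{
		Stages:     []string{"OperationReceived"},
		Operations: []string{"DELETE"},
	}
	get := event{Stage: "OperationReceived", Operation: "GET"}
	del := event{Stage: "OperationReceived", Operation: "DELETE"}

	fmt.Println(f.matchesOr(get))  // true: the GET event is filtered as well
	fmt.Println(f.matchesAnd(get)) // false: only DELETE events are targeted
	fmt.Println(f.matchesAnd(del)) // true
}
```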
    • Seth Hoenig's avatar
      cd4be96b
    • build: set osusergo build tag by default (#14248) · 2d425bf2
      Seth Hoenig authored
      This PR activates the osusergo build tag in GNUMakefile. This forces the os/user
      package to be compiled without CGO. Doing so seems to resolve a race condition
      in getpwnam_r that causes alloc creation to hang or panic on `user.Lookup("nobody")`.
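
For reference, the lookup in question can be exercised with a small program. `lookupUID` is a hypothetical wrapper written for this sketch; building it with `go build -tags osusergo` selects the pure Go implementation of `os/user` instead of the cgo one. Whether a `nobody` user exists depends on the host, so both outcomes are handled.

```go
package main

import (
	"fmt"
	"os/user"
)

// lookupUID resolves a username to its UID via os/user. With the osusergo
// build tag the lookup is done in pure Go rather than through cgo.
func lookupUID(name string) (string, error) {
	u, err := user.Lookup(name)
	if err != nil {
		return "", err
	}
	return u.Uid, nil
}

func main() {
	uid, err := lookupUID("nobody")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println("uid of nobody:", uid)
}
```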
  5. 23 Aug, 2022 8 commits
  6. 22 Aug, 2022 7 commits
    • allocrunner: refactor task coordinator (#14009) · 6070fa0c
      Luiz Aoqui authored
      The current implementation for the task coordinator unblocks tasks by
      performing destructive operations over its internal state (like closing
      channels and deleting keys from maps).
      
      This presents a problem in situations where we would like to revert the
      state of a task, such as when restarting an allocation with tasks that
      have already exited.
      
      With this new implementation the task coordinator behaves more like a
      finite state machine where tasks may be blocked/unblocked multiple times
      by performing a state transition.
      
      This initial part of the work only refactors the task coordinator and
      is functionally equivalent to the previous implementation. Future work
      will build upon this to provide bug fixes and enhancements.
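
A toy version of the state-machine idea is sketched below. Only the name `coordinatorStatePoststop` appears in the commit history; the other states, the hook strings and the `allowed` and `transition` methods are assumptions made for illustration, and the real coordinator tracks more lifecycle hooks than this.

```go
package main

import "fmt"

// coordinatorState enumerates the states of a toy coordinator. Only
// coordinatorStatePoststop is named in the commit messages; the other
// states are assumptions made for this sketch.
type coordinatorState int

const (
	coordinatorStatePrestart coordinatorState = iota
	coordinatorStateMain
	coordinatorStatePoststop
)

// coordinator derives which tasks may run from its current state instead of
// closing channels or deleting map entries, so a transition can be undone.
type coordinator struct {
	state coordinatorState
}

// allowed reports whether tasks with the given lifecycle hook may run now.
func (c *coordinator) allowed(hook string) bool {
	switch c.state {
	case coordinatorStatePrestart:
		return hook == "prestart"
	case coordinatorStateMain:
		return hook == "main"
	case coordinatorStatePoststop:
		return hook == "poststop"
	}
	return false
}

// transition moves to the next state; nothing is destroyed in the process.
func (c *coordinator) transition(next coordinatorState) {
	c.state = next
}

func main() {
	c := &coordinator{}
	fmt.Println(c.allowed("prestart")) // true

	c.transition(coordinatorStatePoststop)
	fmt.Println(c.allowed("prestart")) // false

	// Reverting the state, for example on an allocation restart, unblocks
	// prestart tasks again.
	c.transition(coordinatorStatePrestart)
	fmt.Println(c.allowed("prestart")) // true
}
```

Because the blocked or unblocked status is computed from the state rather than stored by mutating channels and maps, moving back to an earlier state is just another transition.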
    • Phil Renaud's avatar
    • allow ACL policies to be associated with workload identity (#14140) · 2eaf3d72
      Tim Gross authored
      The original design for workload identities and ACLs allows for operators to
      extend the automatic capabilities of a workload by using a specially-named
      policy. This has been shown to be potentially unsafe because of naming collisions, so
      instead we'll allow operators to explicitly attach a policy to a workload
      identity.
      
      This changeset adds workload identity fields to ACL policy objects and threads
      that all the way down to the command line. It also adds a new secondary index to the
      ACL policy table on namespace and job so that claim resolution can efficiently
      query for related policies.
    • Charlie Voiselle's avatar
      ab67f30e
    • Add .0 to make goenv happy (#14218) · 36c1c686
      Charlie Voiselle authored
    • Merge pull request #14219 from hashicorp/e2e-nsd-checks · 422971af
      Seth Hoenig authored
      e2e: add e2e tests for nomad service disco checks
    • e2e: add e2e tests for nomad service disco checks · f8beb534
      Seth Hoenig authored
      This PR adds 2 e2e tests for ensuring nomad service discovery checks
      get created and produce status results as expected.