Commit f8111692 authored by Tim Gross, committed by GitHub

scheduler: recover from panic (#12009)

If processing a specific evaluation causes the scheduler (and
therefore the entire server) to panic, that evaluation will never
get a chance to be nack'd and cleared from the state store. It will
get dequeued by another scheduler, causing that server to panic, and
so forth until all servers are in a panic loop. This prevents the
operator from intervening to remove the evaluation or update the
state.

Recover the goroutine from the top-level `Process` methods for each
scheduler so that this condition can be detected without panicking the
server process. This will lead to a loop of recovering the scheduler
goroutine until the eval can be removed or nack'd, but that's much
better than taking downtime.
parent 0263650f
Showing with 18 additions and 2 deletions
```release-note:improvement
scheduler: recover scheduler goroutines on panic
```
@@ -125,7 +125,14 @@ func NewBatchScheduler(logger log.Logger, eventsCh chan<- interface{}, state Sta
 }
 // Process is used to handle a single evaluation
-func (s *GenericScheduler) Process(eval *structs.Evaluation) error {
+func (s *GenericScheduler) Process(eval *structs.Evaluation) (err error) {
+	defer func() {
+		if r := recover(); r != nil {
+			err = fmt.Errorf("processing eval %q panicked scheduler - please report this as a bug! - %v", eval.ID, r)
+		}
+	}()
 	// Store the evaluation
 	s.eval = eval
@@ -72,7 +72,13 @@ func NewSysBatchScheduler(logger log.Logger, eventsCh chan<- interface{}, state
 }
 // Process is used to handle a single evaluation.
-func (s *SystemScheduler) Process(eval *structs.Evaluation) error {
+func (s *SystemScheduler) Process(eval *structs.Evaluation) (err error) {
+	defer func() {
+		if r := recover(); r != nil {
+			err = fmt.Errorf("processing eval %q panicked scheduler - please report this as a bug! - %v", eval.ID, r)
+		}
+	}()
 	// Store the evaluation
 	s.eval = eval
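
Both hunks change the signature from `error` to `(err error)`. The sketch below shows why that part of the change probably matters: a deferred closure can only alter the value returned after a recovered panic if the result is named. The function names `unnamedResult` and `namedResult` are illustrative and do not appear in the commit.

```go
package main

import "fmt"

// unnamedResult recovers from the panic but cannot report it: after recovery
// the function returns the zero value of its result type, a nil error.
func unnamedResult() error {
	defer func() {
		_ = recover()
	}()
	panic("boom")
}

// namedResult assigns to the named result inside the deferred closure, so the
// caller receives an error describing the recovered panic.
func namedResult() (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	panic("boom")
}

func main() {
	fmt.Println(unnamedResult()) // <nil>
	fmt.Println(namedResult())   // recovered: boom
}
```

With an unnamed result the panic would be silently swallowed and the caller would see success, so the named result is what lets the failure surface as an ordinary error.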