recourse

recourse (n.): a source of help or strength.

Policy-driven, observable resilience for Go services: retries, hedging, circuit breaking, and budgets.

New here? Start with Design overview, then move to Getting started.

Why recourse?

Retry logic is deceptively easy to write and notoriously hard to operate.

  • Resilience code drifts: each service ends up with slightly different retry semantics, timeouts, logging, and metrics.
  • Naive retries amplify outages: retries turn “a little latency” into “a lot more load” unless there is explicit backpressure.
  • Protocol/domain semantics matter: a timeout, a 429, and a 404 should not all be treated the same.
  • Debuggability is non-negotiable: when an incident happens, you need to answer “what happened on each attempt, and why?”.

recourse centralizes resilience behavior behind a low-cardinality policy key and makes every decision observable.

Concretely, recourse gives you:

  • Deterministic envelopes: bounded attempts, bounded backoff, and explicit timeouts.
  • Domain-aware retry decisions: pluggable classifiers (instead of “retry on any error”).
  • Backpressure: per-attempt budgets to prevent retry storms.
  • Structured observability: timelines and hooks that make behavior explainable in production.

The problem with ad-hoc retries

This is a common shape:

var lastErr error
for attempt := 0; attempt < 3; attempt++ {
    err := callDependency(ctx)
    if err == nil {
        return nil
    }
    lastErr = err
    time.Sleep(time.Duration(attempt+1) * 50 * time.Millisecond)
}
return lastErr

But production requirements pile up quickly, as the sketch after this list shows:

  • Is the error retryable (HTTP 500 vs 404 vs 429)? Should we treat context.Canceled differently?
  • Are we respecting per-attempt timeouts vs overall timeouts?
  • Are we emitting consistent logs/metrics/traces across all call sites?
  • Do we have backpressure so retries don’t turn an outage into a storm?
  • When this goes wrong at 3am, can we reconstruct the exact sequence of attempts and decisions?
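To see how quickly that grows, here is a rough sketch in plain Go of the same loop once it handles only the timeout and retryability questions. recourse is not involved; callDependency and isRetryable are placeholders, and imports (context, time) are omitted as in the snippet above.

func callWithRetry(ctx context.Context) error {
    // Overall deadline for the whole operation, on top of any caller deadline.
    overall, cancel := context.WithTimeout(ctx, 2*time.Second)
    defer cancel()

    var lastErr error
    for attempt := 0; attempt < 3; attempt++ {
        // Per-attempt timeout, bounded by the overall deadline.
        attemptCtx, cancelAttempt := context.WithTimeout(overall, 500*time.Millisecond)
        err := callDependency(attemptCtx)
        cancelAttempt()
        if err == nil {
            return nil
        }
        lastErr = err
        // Placeholder classification: e.g. retry 429/5xx and timeouts,
        // give up on other 4xx and on context.Canceled.
        if !isRetryable(err) {
            return err
        }
        select {
        case <-overall.Done():
            return lastErr // overall deadline exhausted; stop retrying
        case <-time.After(time.Duration(attempt+1) * 50 * time.Millisecond):
        }
    }
    return lastErr
}

And this version still has no jitter, no budget, no consistent telemetry, and no record of what each attempt did or why.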

What “policy-driven” means

In recourse, call sites supply a key (e.g., "payments.Charge"). Policies decide the retry envelope for that key:

  • maximum attempts
  • backoff/jitter
  • per-attempt and overall timeouts
  • classifier selection (how to interpret errors/results)
  • optional budgets/backpressure (whether to allow each attempt)

This makes behavior consistent, tunable, and observable without re-implementing retry loops everywhere.
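At the call site, the key is the only coupling to all of this. Sketching against the DoValue facade shown in the quick start below (Receipt, chargeCard, and req are placeholders for your own types and dependency call):

// The policy for "payments.Charge" decides attempts, backoff, timeouts,
// classification, and budgets; nothing at this call site changes when it does.
receipt, err := recourse.DoValue[Receipt](ctx, "payments.Charge",
    func(ctx context.Context) (Receipt, error) {
        return chargeCard(ctx, req)
    })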

Quick start

The facade API takes a string key like "user-service.GetUser":

package main

import (
    "context"

    "github.com/aponysus/recourse/recourse"
)

type User struct{ ID string }

func main() {
    user, err := recourse.DoValue[User](context.Background(), "user-service.GetUser", func(ctx context.Context) (User, error) {
        // call dependency here
        return User{ID: "123"}, nil
    })
    _ = user
    _ = err
}

When you need to know what happened, request a timeline:

// op is the operation function from the quick start example above.
ctx, capture := observe.RecordTimeline(ctx)
user, err := recourse.DoValue(ctx, "user-service.GetUser", op)
_ = user
_ = err

tl := capture.Timeline()
for _, a := range tl.Attempts {
    // a.Attempt, a.Outcome, a.BudgetAllowed, a.Backoff, a.Err, ...
    _ = a
}

Observability-first

Retries are only “safe” if they are observable.

recourse captures a structured observe.Timeline (attempt timings, outcomes, budget decisions, errors) and can also stream attempt/timeline events to your own logging/metrics/tracing via observe.Observer.
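For example, the timeline captured in the snippet above can be reduced to one log line per attempt when reconstructing an incident. A minimal sketch, reusing tl from that snippet and the standard library log package (%v is used because the exact field types belong to recourse):

// One line per attempt: number, outcome, budget decision, backoff, error.
for _, a := range tl.Attempts {
    log.Printf("attempt=%v outcome=%v budget_allowed=%v backoff=%v err=%v",
        a.Attempt, a.Outcome, a.BudgetAllowed, a.Backoff, a.Err)
}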

What’s inside

  • Policy keys: stable, low-cardinality keys ("svc.Method") that select behavior.
  • Policies + providers: policy.EffectivePolicy resolved via controlplane.PolicyProvider (currently an in-process static provider).
  • Retry executor: bounded attempts with backoff/jitter and per-attempt/overall timeouts.
  • Classifiers: pluggable (value, err) → Outcome so retry decisions are protocol/domain-aware (see the sketch after this list).
  • Budgets/backpressure: per-attempt gates to prevent retry storms (with optional release semantics).
  • Observability: structured observe.Timeline plus streaming hooks via observe.Observer.
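To illustrate the classifier idea only (this is a conceptual sketch, not recourse's actual Classifier or Outcome types), an HTTP-aware classification might look like the following, where the "value" is just the status code and the errors/context imports are omitted:

// Conceptual sketch: map an HTTP result to a retry decision.
type outcome int

const (
    success outcome = iota
    retryable
    permanent
)

func classifyHTTP(status int, err error) outcome {
    switch {
    case errors.Is(err, context.Canceled):
        return permanent // the caller gave up; retrying is pointless
    case err != nil:
        return retryable // transport errors and timeouts are worth retrying
    case status == 429 || status >= 500:
        return retryable // throttled or server-side failure
    case status >= 400:
        return permanent // 404 and friends: retrying will not help
    default:
        return success
    }
}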

Where to go next