TRUST & SAFETY

Human-in-the-Loop in AI SRE: When to Trust, When to Override

Building the right protocols for AI-human collaboration in production environments

By Shafi Khan · June 18, 2025 · 14 min read

The scariest moment in my AI SRE career wasn't when the system failed. It was when a junior engineer asked: "The AI wants to rollback production. Should I approve it?" And I realized we had no protocol for that decision.

Human-in-the-loop (HIL) sounds simple: "Keep humans involved in AI decisions." But deciding how humans are involved, when to trust, when to override, and how to build the right safety rails is what separates successful AI SRE from production disasters.

This guide is the protocol manual we wish we had. It's based on real production incidents, near-misses, and the hard lessons from teams running AI SRE at scale.

The Trust Spectrum: Not All Decisions Are Equal

The first mistake teams make: treating all AI actions the same. "The AI should never act without approval" or "Let the AI handle everything" are both wrong. Reality is a spectrum.

Low Risk: Auto-Approve
  • Restart pod
  • Clear cache
  • Scale replicas +1

Medium Risk: Human Approval
  • Rollback deploy
  • Scale replicas +5
  • Change config

High Risk: Manual Only
  • Database migration
  • Failover to backup region
  • Delete data

Key insight: The goal isn't to automate everything or approve everything manually. It's to match automation level to risk level.

The Risk-Based HIL Framework

Here's the decision tree we use at AutonomOps. Every action is scored across the dimensions below; blast radius, reversibility, and confidence always apply, while compliance (and occasionally cost) is added when relevant:

Risk Factor        | Low (1-3)           | Medium (4-6)         | High (7-10)
Blast Radius       | Single pod/service  | 5-10 services        | Payment/auth/core infra
Reversibility      | Instant (restart)   | 5-10 min (rollback)  | Irreversible (delete)
Confidence Score   | >95%                | 80-95%               | <80%
Compliance Impact  | None                | Audit trail required | SOX/PCI/HIPAA

Decision Matrix

Total Risk Score 3-9: Auto-approve. AI acts autonomously, posts notification to Slack.
Total Risk Score 10-20: Request approval. AI suggests action + rationale, waits for human thumbs-up/down.
Total Risk Score 21+: Escalate only. AI provides RCA + options, but does NOT suggest a specific action. Human decides.
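To make the matrix concrete, here's a minimal Python sketch of how the scoring and routing could be wired together. The factor names and thresholds come from the tables above; the `RiskScore` dataclass and `route_action` function are illustrative names for this sketch, not AutonomOps APIs.

```python
from dataclasses import dataclass

# Thresholds from the decision matrix above.
AUTO_APPROVE_MAX = 9   # total 3-9   -> act autonomously, notify Slack
APPROVAL_MAX = 20      # total 10-20 -> suggest action + rationale, wait for human

@dataclass
class RiskScore:
    blast_radius: int      # 1-10: single pod ... payment/auth/core infra
    reversibility: int     # 1-10: instant restart ... irreversible delete
    confidence: int        # 1-10: inverse of model confidence (>95% -> 1-3)
    compliance: int = 0    # 0 when no regulated system is touched
    cost: int = 0          # 0 unless the action has a meaningful cost impact

    @property
    def total(self) -> int:
        return (self.blast_radius + self.reversibility
                + self.confidence + self.compliance + self.cost)

def route_action(score: RiskScore) -> str:
    """Map a total risk score to one of the three handling modes."""
    if score.total <= AUTO_APPROVE_MAX:
        return "auto_approve"        # act, then post a notification
    if score.total <= APPROVAL_MAX:
        return "request_approval"    # suggest fix + rationale, wait for thumbs-up
    return "escalate_only"           # provide RCA + options, human decides

# Example 1 below: OOM kill on payment-service scores 2 + 1 + 1 = 4.
oom_restart = RiskScore(blast_radius=2, reversibility=1, confidence=1)
assert route_action(oom_restart) == "auto_approve"
```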

Real-World Examples: Trust vs. Override

When to TRUST the AI

Example 1: OOM Kill → Auto-Restart

Scenario: payment-service pod killed due to OOM (out of memory)

AI Decision: "Restart pod, increase memory limit from 2GB → 4GB"

Risk Score: Blast radius (2) + Reversibility (1) + Confidence (1) = 4/30 (LOW)

✓ AUTO-APPROVED. AI restarts pod, posts Slack notification. MTTR: 30 seconds.
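As a rough illustration of what the auto-approve path executes, the sketch below shells out to kubectl to raise the memory limit and restart the deployment, then posts a Slack notification via an incoming webhook. The namespace, webhook URL, and `remediate_oom` helper are assumptions for the example, not AutonomOps internals.

```python
import json
import subprocess
import urllib.request

SLACK_WEBHOOK = "https://hooks.slack.com/services/T000/B000/XXXX"  # placeholder URL

def remediate_oom(deployment: str, namespace: str, new_limit: str) -> None:
    """Auto-approved low-risk fix: bump the memory limit, restart, notify."""
    # Raise the container memory limit (e.g. 2Gi -> 4Gi).
    subprocess.run(
        ["kubectl", "-n", namespace, "set", "resources",
         f"deploy/{deployment}", f"--limits=memory={new_limit}"],
        check=True,
    )
    # Roll the pods so the new limit takes effect.
    subprocess.run(
        ["kubectl", "-n", namespace, "rollout", "restart", f"deploy/{deployment}"],
        check=True,
    )
    # Post the notification the team sees in Slack.
    msg = {"text": f"[auto-approved] Restarted {deployment} with memory limit {new_limit}."}
    req = urllib.request.Request(
        SLACK_WEBHOOK,
        data=json.dumps(msg).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

# remediate_oom("payment-service", "payments", "4Gi")  # namespace is assumed
```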

Example 2: Redis Timeout → Clear Cache

Scenario: auth-service timing out on Redis calls

AI Decision: "Flush Redis cache, restart Redis sentinel"

Risk Score: Blast radius (2) + Reversibility (1) + Confidence (1) = 4/30 (LOW)

✓ AUTO-APPROVED. This pattern has been seen 15 times with a 100% success rate.

⚠️ When to APPROVE (with Human Judgment)

Example 3: Bad Deploy → Rollback

Scenario: v2.5.0 causing 10x error rate increase

AI Decision: "Rollback from v2.5.0 → v2.4.9"

Risk Score: Blast radius (5) + Reversibility (5) + Confidence (4) + Compliance (2) = 16/40 (MEDIUM)

⚠ APPROVAL REQUIRED. Why?

  • v2.5.0 may have DB migrations (rollback could break schema)
  • Need to check: Are there any one-way migrations? (see the sketch after this list)
  • AI flags: "No migrations detected, safe to rollback"
  • Human approves in 30 seconds. MTTR: 2 minutes.
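The migration check the human performed here is exactly the kind of guard worth codifying as a pre-rollback gate. Below is a hedged sketch; it assumes a hypothetical registry of applied migrations with a `reversible` flag, since the real check depends on your migration tool (Alembic, Django, Flyway, and so on).

```python
from dataclasses import dataclass

@dataclass
class Migration:
    version: str       # release that shipped the migration, e.g. "v2.5.0"
    name: str
    reversible: bool   # False for one-way migrations (drops, destructive rewrites)

def rollback_is_safe(applied: list[Migration], from_version: str) -> tuple[bool, list[str]]:
    """Return (safe, blockers): rollback is unsafe if the release being
    rolled back shipped any one-way migrations."""
    blockers = [m.name for m in applied
                if m.version == from_version and not m.reversible]
    return (not blockers, blockers)

# Hypothetical data for the v2.5.0 -> v2.4.9 rollback in Example 3.
applied = [Migration("v2.4.9", "add_index_orders", True),
           Migration("v2.5.0", "add_column_discount", True)]
safe, blockers = rollback_is_safe(applied, "v2.5.0")
print("safe to rollback" if safe else f"blocked by: {blockers}")
```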

Example 4: Traffic Spike → Scale Up

Scenario: Unexpected traffic spike (5x normal)

AI Decision: "Scale from 10 → 50 replicas"

Risk Score: Blast radius (2) + Reversibility (2) + Confidence (3) + Cost (6) = 13/40 (MEDIUM)

⚠ APPROVAL REQUIRED. Why?

  • 5x scale = $1k/hour cloud cost spike
  • Could be a DDoS attack (don't scale, enable rate limiting instead)
  • Human checks traffic patterns → Legitimate (Black Friday sale)
  • Approves scale-up. MTTR: 5 minutes.

🛑 When to OVERRIDE the AI

Example 5: Database Connection Pool Exhaustion

Scenario: payment-service can't connect to Postgres (pool exhausted)

AI Suggestion: "Restart payment-service pods to reset connections"

Risk Score: Blast radius (8) + Reversibility (5) + Confidence (4) + Compliance (4) = 21/40 (HIGH)

🛑 HUMAN OVERRIDE. Why?

  • Restarting pods won't fix pool exhaustion (symptom, not cause)
  • Human investigates: Sees 1000+ idle connections from a leaking service
  • Correct fix: Kill the leaking service, increase pool size, add connection timeouts (sketched after this list)
  • AI learned: Next time, it suggests this fix pattern.
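For the "correct fix" above, the key application-side change is bounding pool size and idle time so one leaking caller can't exhaust Postgres. A minimal sketch using psycopg_pool follows; this is one of several ways to do it, the connection string is a placeholder, and the sizes and timeouts are illustrative rather than recommendations.

```python
from psycopg_pool import ConnectionPool

# Bounded pool with timeouts: leaked connections get reclaimed instead of
# piling up until Postgres refuses new clients.
pool = ConnectionPool(
    conninfo="postgresql://payments:PASSWORD@db:5432/payments",  # placeholder
    min_size=5,
    max_size=50,          # raise the ceiling, but keep it finite
    timeout=5,            # seconds to wait for a free connection before erroring
    max_idle=60,          # close connections idle longer than this
    max_lifetime=1800,    # recycle long-lived connections periodically
)

with pool.connection() as conn:
    conn.execute("SELECT 1")
```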

Example 6: False Positive Anomaly

Scenario: AI detects "CPU spike" in ML training job

AI Suggestion: "Scale down training job (CPU at 95%)"

Risk Score: Blast radius (2) + Reversibility (2) + Confidence (7) = 11/40 (MEDIUM)

🛑 HUMAN OVERRIDE. Why?

  • ML training jobs are SUPPOSED to use 95% CPU (that's not a problem)
  • AI's anomaly detection doesn't understand "normal for ML workloads"
  • Human rejects. AI learns: "training-job pods → high CPU is expected"

Break-Glass Protocols: When Humans Must Take Over

Even with perfect AI, there are moments when humans need to say: "Stop. I'm taking over."

The 3 Break-Glass Scenarios

1. AI is Making Things Worse

Trigger: AI takes an action, but error rate increases (not decreases)

→ IMMEDIATE OVERRIDE. Human hits "Stop AI + Rollback Last Action" button.

2. Novel Incident (No Confidence)

Trigger: AI confidence <60%, or incident pattern never seen before

→ AI escalates: "I don't know how to fix this. Paging senior SRE."

3. Security/Compliance Event

Trigger: Suspected breach, data leak, or audit-critical system failure

→ AI enters read-only mode. All actions require VP Eng approval + legal review.
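Here is a rough sketch of how these three triggers might be evaluated after every AI action. The error-rate inputs and the stop/rollback/escalate return values are placeholders for whatever hooks your platform exposes.

```python
CONFIDENCE_FLOOR = 0.60   # trigger 2: novel incident / low confidence

def break_glass_check(error_rate_before: float,
                      error_rate_after: float,
                      confidence: float,
                      pattern_seen_before: bool,
                      security_event: bool) -> str:
    """Return which break-glass path applies, or 'continue' if none do."""
    if security_event:
        return "read_only_mode"          # 3: VP Eng approval + legal review required
    if error_rate_after > error_rate_before:
        return "stop_and_rollback"       # 1: the AI is making things worse
    if confidence < CONFIDENCE_FLOOR or not pattern_seen_before:
        return "escalate_to_senior_sre"  # 2: novel incident, no confidence
    return "continue"
```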

The Override Button: UX Matters

The override button must be instant, obvious, and frictionless. No "Are you sure?" dialogs. No entering justification first. Just STOP.

AutonomOps approach: Big red "TAKE CONTROL" button in Slack and UI. One click = AI stops current action + enters read-only mode. Human takes over. Justification can be added later (for audit trail).
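With Slack's Bolt framework for Python, a frictionless override handler looks roughly like the sketch below. The `take_control` action id and the `stop_current_action` / `enter_read_only_mode` hooks are assumptions for illustration, not AutonomOps code.

```python
from slack_bolt import App

# Placeholders; token verification is disabled only so the sketch constructs cleanly.
app = App(token="xoxb-placeholder", signing_secret="placeholder",
          token_verification_enabled=False)

def stop_current_action():
    """Hypothetical hook: abort whatever the AI is currently executing."""

def enter_read_only_mode():
    """Hypothetical hook: the AI keeps observing but takes no further action."""

@app.action("take_control")
def handle_take_control(ack, body, say):
    ack()  # acknowledge instantly: no confirmation dialog, no justification prompt
    stop_current_action()
    enter_read_only_mode()
    user = body["user"]["id"]
    say(f"<@{user}> has taken control. AI is now read-only; "
        "justification can be added later for the audit trail.")
```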

Building Trust Over Time: The 90-Day Ramp-Up

Trust isn't binary. Teams don't go from "AI scares us" to "AI runs prod autonomously" overnight. Here's the gradual ramp:

Days 1-30: Shadow Mode (Trust = 0%)

AI watches incidents but never acts. It posts "What I would have done" to Slack. Team reviews daily: "Would this have worked?"

Success Metric:

>80% of AI suggestions match what the human eventually did. Team says: "Yeah, the AI got it right."

Days 31-60: Approval Mode (Trust = 40%)

AI suggests fixes with rationale. Humans click "Approve" or "Reject." Track: approval rate, time-to-approve, and whether AI fixes work.

Success Metric:

>90% approval rate + <5% regression rate. Team starts approving in <30 seconds (not 5 minutes of deliberation).

Days 61-90: Auto-Pilot (Low Risk) (Trust = 80%)

Enable auto-remediation for proven low-risk patterns (restart pod, clear cache). AI still requests approval for medium/high-risk actions.

Success Metric:

60%+ of incidents auto-resolved without human involvement. No major regressions. Team says: "We trust it."

Days 90+: Full Augmentation (Trust = 95%)

AI handles 80%+ of incidents autonomously. Humans only paged for novel/high-risk issues. Team's role shifts from firefighting to AI tuning + strategic reliability work.

Success Metric:

MTTR <5 minutes for automated incidents. Team satisfaction >8/10. No one wants to go back to manual on-call.
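One way to encode the ramp is as an explicit operating mode that gates what the risk router sketched earlier is allowed to do. The phase names mirror the stages above; the routing strings and `allowed_action` helper are illustrative only.

```python
from enum import Enum

class TrustPhase(Enum):
    SHADOW = "shadow"                  # days 1-30: suggest only, never act
    APPROVAL = "approval"              # days 31-60: every action needs a human click
    AUTOPILOT_LOW = "autopilot_low"    # days 61-90: auto-act on low risk only
    FULL_AUGMENTATION = "full"         # day 90+: follow the risk matrix

def allowed_action(phase: TrustPhase, routed: str) -> str:
    """Combine the ramp phase with the risk-matrix routing decision."""
    if phase is TrustPhase.SHADOW:
        return "suggest_only"              # post "what I would have done", nothing more
    if routed == "escalate_only":
        return "escalate_only"             # high risk stays human-only in every phase
    if phase is TrustPhase.APPROVAL:
        return "request_approval"          # all actions gated on a human approve/reject
    if phase is TrustPhase.AUTOPILOT_LOW:
        return routed if routed == "auto_approve" else "request_approval"
    return routed                          # FULL_AUGMENTATION: trust the matrix
```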

Compliance & Legal Considerations

If you're in a regulated industry (finance, healthcare, critical infrastructure), HIL isn't optional—it's legally required in some cases.

Compliance Framework   | HIL Requirement                                          | AutonomOps Solution
SOX (Sarbanes-Oxley)   | Human must approve changes to financial systems          | Tag "financial" services → AI requests approval (never auto-acts)
PCI DSS (Payment Card) | Full audit trail of changes to cardholder data systems   | Every AI action logged with: who, what, when, why, outcome
HIPAA (Healthcare)     | Human oversight for PHI (protected health info) access   | AI never accesses PHI logs, only aggregated metrics
GDPR (EU Data)         | Right to human review of automated decisions             | Break-glass override + explainability (why the AI made the decision)

Legal recommendation: Work with your compliance team to tag services by regulation. AutonomOps can enforce: "This service is SOX-tagged → AI must always request human approval, even for low-risk actions."
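Tag enforcement can sit as a final override on top of the risk routing: if a service carries a regulated tag, its decision is never downgraded below human approval (or, for the strictest tags, below escalation). A hedged sketch with made-up tag names:

```python
# Tags that force a human into the loop regardless of the risk score.
APPROVAL_REQUIRED_TAGS = {"sox", "financial", "pci"}
ESCALATE_ONLY_TAGS = {"hipaa_phi"}

def apply_compliance_policy(routed: str, service_tags: set[str]) -> str:
    """Never let a compliance-tagged service drop below human approval."""
    if service_tags & ESCALATE_ONLY_TAGS:
        return "escalate_only"
    if service_tags & APPROVAL_REQUIRED_TAGS and routed == "auto_approve":
        return "request_approval"
    return routed

# A SOX-tagged service never auto-acts, even for a low-risk restart.
assert apply_compliance_policy("auto_approve", {"sox"}) == "request_approval"
```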

Key Takeaways

  • Not all decisions are equal: Match automation level to risk level (low-risk = auto, high-risk = human only)
  • Risk-based framework: Score every action on blast radius, reversibility, confidence, compliance
  • Break-glass protocols: Humans must be able to override instantly, no friction, no delays
  • Trust is earned gradually: 90-day ramp from shadow mode → approval mode → auto-pilot
  • Compliance matters: SOX/PCI/HIPAA may legally require human approval for certain actions

See HIL Protocols in Action

AutonomOps has these trust frameworks built-in: risk scoring, break-glass override, gradual ramp-up modes. Start in shadow mode, build trust, then let AI handle the toil.


About Shafi Khan

Shafi Khan is the founder of AutonomOps AI. He's lived through the "AI made production worse" moment and built these HIL protocols to ensure it never happens again.
