Building a Temporary Admin Access Workflow

The Problem

Every IT team knows the conversation. A developer needs to install a dependency. An engineer needs to run a system-level diagnostic. A designer needs to update a font cache. The ask is always the same: "Can you just make me a local admin for a bit?"

The traditional answers are both bad. Option A: give them permanent local admin rights, accept the expanded attack surface, and hope they don't accidentally break their system or run a malicious installer. Option B: have IT remote in every single time, creating a bottleneck that interrupts both the user and the IT team.

I wanted a third path: just-in-time admin access, approved in Slack, time-limited to 5–30 minutes, and fully audited. No persistent privilege. No IT babysitting. A complete audit trail of every sudo command run during the window.

Goal: Users get self-service access in under 60 seconds. IT maintains approval control. Every session auto-expires. Every sudo command is logged and shipped to the Slack thread.

How It Works

Step 1: User opens Iru Self Service and clicks "Request Admin Access"
A compiled AppKit form presents a single window with a reason field, a duration picker (5, 10, 15, or 30 minutes), and a category selector (Install, Debug, Config, Security, Developer, Other). The script collects device identity (hostname, serial) and POSTs a signed request to an API Gateway endpoint.

Step 2: IT receives an interactive Slack approval message
The message, posted to a dedicated IT channel, shows user, hostname, serial, reason, and category. IT sees four duration-labeled Approve buttons (5/10/15/30 min), so they can approve at the user's requested duration or override it, plus a Deny button.

Step 3: IT clicks Approve; a duration-specific Iru tag is assigned and the background monitor detects it
A background LaunchDaemon polls /status every 20 seconds. On approval, it calls iru run directly, which is the fastest path to processing Library Items. The user sees an "approved" alert within 20 seconds. Each duration maps to a distinct Iru tag, which scopes its own SAP Privileges MDM profile with the matching ExpirationInterval.

Step 4: Device runs elevation-start.sh via an Iru Library Item
The script calls PrivilegesCLI --add to grant admin, enables a sudoers drop-in for command logging, notifies the backend to start the N-minute timer, and starts a persistent network monitor LaunchDaemon. The monitor loops continuously, checking admin group membership every 1s, network every 5s, and backend status every 60s. If network is lost, admin is stripped immediately, and any re-elevation attempt via the Privileges app is reversed within 1 second until connectivity is restored.

Step 5: EventBridge sends a 5-minute warning DM, then fires expiration at T+N
Timers are anchored to when the device confirms elevation, not when IT clicks Approve, so the user always gets the full approved duration. The warning DM is skipped entirely for 5-minute sessions (a T+0 warning would be instant noise).

Step 6: On expiration, Iru removes the tag and collects the sudo log
A second Iru tag triggers collect-sudo-log.sh, which ships the sudoers log to the backend. The backend uploads it as a file attachment in the original Slack approval thread.

Step 7: IT reviews per-user risk scores on the AI dashboard (v1.2.0)
Every session log is scored by Claude Haiku 4.5 via Amazon Bedrock, which evaluates the actual sudo commands, not just metadata. Scores are cached 48h per user. The dashboard is served over CloudFront and is accessible directly from each Slack approval message via a 30-minute session token.

Screenshots, each labelled with the step it corresponds to:

Step 1, Request Form: Admin access request form showing reason, duration, and category fields.
Step 1, Pending Approval: On-device dialog confirming the request was submitted and is pending IT approval.
Step 1, Already Elevated: On-device alert showing the user already has temporary admin access, with time remaining.
Step 2, Slack Approval: Slack approval card with AI risk score and approve/deny buttons.
Step 2, Pending Nudges: Slack thread showing repeated nudge replies when a request goes unanswered.
Step 2, Auto-Denied: Slack message showing the request expired after no IT response.
Step 3, Access Approved: On-device alert showing admin access was approved and elevation is being applied.
Step 3, Privileges Granted: macOS notification confirming administrator privileges were granted.
Step 3, User DM: Slack DM confirming admin access was approved, with an Iru check-in reminder.
Step 4, IT Controls: Slack thread showing the active session with expiry time and Revoke and Lock Device buttons.
Step 4, Session Ended: On-device dialog confirming the admin session was ended early by the user.
Step 5, 5-Min Warning: Slack thread showing the 5-minute warning before session expiry.
Step 6, Session Expired: Slack thread showing the session expired and the sudo log collected inline.
Step 7, Risk Dashboard: Dashboard showing AI risk score, behavioral analysis, and request history.

Architecture

The Iru Blueprint — tag-based conditional logic assigns the correct SAP Privileges profile (5, 10, 15, or 30 min) when the matching elevation tag is present, then triggers elevation-start and log collection scripts in sequence.

The backend is a fully serverless AWS SAM application. There's no always-on infrastructure — all compute is Lambda functions invoked by API Gateway or EventBridge Scheduler.

API Gateway + Lambda

9 Lambda functions handle request intake, Slack actions, device confirmations, log receipt, status polling, and expiration. All endpoints behind API key auth.

DynamoDB

Single-table design stores each request's full lifecycle: status, timestamps, Slack thread IDs, device ID, and actor identity for every state transition.

EventBridge Scheduler

One-time schedules per session for the 5-minute warning (T+N−5) and expiration (T+N), where N is the approved duration; for a 30-minute session that is T+25 and T+30. Schedules auto-delete after firing via ActionAfterCompletion: DELETE.

Iru MDM

Two kinds of tags act as signals. A duration-specific elevation tag triggers the matching Privileges profile. The log-collection tag triggers log shipping. Device-side iru run forces immediate processing.

SAP Privileges

Open-source macOS app providing controlled, time-limited local admin via a LaunchAgent. Scoped via an Iru config profile — only activates on tagged devices.

System Keychain

API key stored in the macOS system keychain (accessible by root) via a provisioning script. Retrieved at runtime — never hardcoded in source.

The Slack ↔ Lambda Handshake

Slack requires a 200 response within 3 seconds of an interactive action. But processing an approval — hitting Iru, writing to DynamoDB, creating EventBridge schedules — takes longer. The solution is a two-Lambda pattern:

handleSlackAction verifies the Slack HMAC-SHA256 signature and immediately invokes processSlackAction asynchronously (InvocationType: 'Event').
handleSlackAction returns 200 to Slack within milliseconds.
processSlackAction runs independently and handles all the heavy work.
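
The split above can be sketched roughly as follows. This is a minimal illustration, not the project's handler: `makeHandleSlackAction`, `verifySignature`, and `invokeAsync` are hypothetical names, and the two dependencies are injected so the example runs without AWS or Slack. In a deployed Lambda, `invokeAsync` would presumably wrap an asynchronous Lambda invoke (InvocationType: 'Event').

```javascript
// Acknowledge-then-process sketch. All names here are illustrative.
function makeHandleSlackAction({ verifySignature, invokeAsync }) {
  return async function handleSlackAction(event) {
    // Reject anything that does not carry a valid Slack signature.
    if (!verifySignature(event)) {
      return { statusCode: 401, body: "invalid signature" };
    }
    // Hand off the slow work. An asynchronous invoke returns once the event
    // is queued, not when processing finishes.
    await invokeAsync("processSlackAction", { body: event.body });
    // Acknowledge immediately so Slack's 3-second deadline is met.
    return { statusCode: 200, body: "" };
  };
}
```

The point of the design is that nothing slow (MDM calls, database writes, schedule creation) sits between signature verification and the 200 response.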

Timer Anchoring

An early design mistake: the 30-minute timer was started at approval time. But there's latency between IT clicking Approve and the device actually being elevated — MDM check-in, Iru running the script, PrivilegesCLI executing. A user could lose 3–5 minutes before they even had admin.

The fix: elevation-start.sh POSTs to a /start endpoint when elevation is confirmed on device. The backend creates EventBridge schedules from that timestamp. The user always gets the full approved window (30 minutes in the original design) from the moment they're actually elevated.

```bash
# elevation-start.sh: notify backend that elevation is confirmed.
# API_ENDPOINT, API_KEY, REQUEST_ID, SERIAL, and ELEVATION_RESPONSE_FILE
# are expected to be set earlier in the script.
HTTP_STATUS=$(curl -s -o "$ELEVATION_RESPONSE_FILE" -w "%{http_code}" \
  -X POST "$API_ENDPOINT" \
  -H "Content-Type: application/json" \
  -H "x-api-key: $API_KEY" \
  --max-time 15 \
  -d "{\"requestId\":\"$REQUEST_ID\",\"serial\":\"$SERIAL\"}")
```
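
On the backend side, the schedule times can then be derived from the device-confirmed timestamp. The sketch below is an assumption about how that arithmetic might look; `computeScheduleTimes` and `toAtExpression` are hypothetical names. It also covers the rule that 5-minute sessions get no warning. EventBridge Scheduler one-time schedules use an `at(yyyy-mm-ddThh:mm:ss)` expression, which is UTC unless a timezone is supplied.

```javascript
// Schedule times anchored to device confirmation. Names are illustrative.
const WARNING_LEAD_MINUTES = 5;

function computeScheduleTimes(confirmedAtMs, durationMinutes) {
  const expiresAtMs = confirmedAtMs + durationMinutes * 60_000;
  // A 5-minute session would warn at T+0, so the warning is skipped.
  const warnAtMs =
    durationMinutes > WARNING_LEAD_MINUTES
      ? expiresAtMs - WARNING_LEAD_MINUTES * 60_000
      : null;
  return { warnAtMs, expiresAtMs };
}

// Format a timestamp as a one-time schedule expression (UTC, no fraction).
function toAtExpression(ms) {
  return `at(${new Date(ms).toISOString().slice(0, 19)})`;
}
```

For example, a 30-minute session confirmed at 12:00:00 UTC yields a warning at 12:25:00 and expiration at 12:30:00, regardless of when IT clicked Approve.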

Security Features

After eleven rounds of security audits, the system incorporates defense-in-depth across every layer:

Slack Signature Verification

Every webhook verified with HMAC-SHA256. Requests older than 5 minutes rejected. Timing-safe comparison via crypto.timingSafeEqual.
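
A minimal sketch of that check, following Slack's documented signing scheme (base string `v0:<timestamp>:<raw body>`, HMAC-SHA256 with the signing secret, compared against the `X-Slack-Signature` header). The function name and parameter shape are assumptions, and `nowSeconds` is passed in so the example is deterministic.

```javascript
const { createHmac, timingSafeEqual } = require("node:crypto");

const MAX_SKEW_SECONDS = 5 * 60;

// Returns true only for a fresh request whose signature matches.
function verifySlackSignature({ signingSecret, timestamp, signature, rawBody, nowSeconds }) {
  const ts = Number(timestamp);
  if (!Number.isFinite(ts)) return false;
  // Reject stale requests (and, in this sketch, far-future timestamps too).
  if (Math.abs(nowSeconds - ts) > MAX_SKEW_SECONDS) return false;

  const expected =
    "v0=" +
    createHmac("sha256", signingSecret)
      .update(`v0:${timestamp}:${rawBody}`)
      .digest("hex");

  const a = Buffer.from(expected, "utf8");
  const b = Buffer.from(String(signature), "utf8");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return a.length === b.length && timingSafeEqual(a, b);
}
```

The raw request body must be used exactly as received; re-serializing parsed JSON or form data would change the bytes and break the comparison.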

DynamoDB Conditional Writes

All status transitions use ConditionExpression atomically. Two IT admins clicking Approve simultaneously results in exactly one approval.
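
As an illustration, the input for such a conditional update might be built like this. The table name, key, and attribute names are hypothetical; the shape is the one accepted by the AWS SDK document client's UpdateCommand. `status` is a DynamoDB reserved word, which is why it is aliased through ExpressionAttributeNames.

```javascript
// Build an UpdateItem input that only succeeds if the request is still in
// the expected state. All table and attribute names are illustrative.
function buildTransition({ tableName, requestId, from, to, actor, nowIso }) {
  return {
    TableName: tableName,
    Key: { requestId },
    UpdateExpression: "SET #status = :to, #updatedAt = :now, #actor = :actor",
    // The check and the write happen atomically inside DynamoDB.
    ConditionExpression: "#status = :from",
    ExpressionAttributeNames: {
      "#status": "status",
      "#updatedAt": "updatedAt",
      "#actor": "lastActor",
    },
    ExpressionAttributeValues: {
      ":from": from,
      ":to": to,
      ":now": nowIso,
      ":actor": actor,
    },
  };
}
```

When two admins click Approve at once, the first write moves the item out of the expected state, and the second fails with a ConditionalCheckFailedException that the caller can treat as "already handled".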

Input Validation Everywhere

UUID format validation on all device endpoints. Field length limits. Serial validated as 8–14 uppercase alphanumeric. Lambda endpoints reject non-object JSON bodies.
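
A sketch of what validation for a device call could look like, using the rules stated above. `parseDeviceBody` and the result shape are assumptions, and the field names are taken from the curl payload shown earlier. Field length limits for free-text fields are not covered here.

```javascript
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
const SERIAL_RE = /^[A-Z0-9]{8,14}$/;

// Parse and validate a device request body. Never throws; returns a result.
function parseDeviceBody(rawBody) {
  let body;
  try {
    body = JSON.parse(rawBody);
  } catch {
    return { ok: false, error: "invalid JSON" };
  }
  // JSON.parse happily returns arrays, strings, numbers, and null.
  if (body === null || typeof body !== "object" || Array.isArray(body)) {
    return { ok: false, error: "body must be a JSON object" };
  }
  if (typeof body.requestId !== "string" || !UUID_RE.test(body.requestId)) {
    return { ok: false, error: "invalid requestId" };
  }
  if (typeof body.serial !== "string" || !SERIAL_RE.test(body.serial)) {
    return { ok: false, error: "invalid serial" };
  }
  return { ok: true, value: { requestId: body.requestId, serial: body.serial } };
}
```

Returning only the validated fields, instead of the parsed object, keeps unexpected extra properties from flowing into later code.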

Slack mrkdwn Injection Prevention

All user-controlled fields passed through escapeSlack() before embedding in Block Kit messages. Prevents link injection via <URL|text> syntax.
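
The body of escapeSlack() is not shown in this write-up; a minimal version, assuming it follows Slack's documented rule of escaping the three control characters `&`, `<`, and `>`, could look like this.

```javascript
// Escape the characters Slack treats as control characters in mrkdwn.
// "&" goes first so already-produced entities are not escaped twice.
function escapeSlack(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}
```

With this in place, a reason field containing `<https://evil.example|Click here>` is shown literally and is not rendered as a link.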

Device Identity Binding

Serial stored at request time is validated against every subsequent device call. A device can only interact with its own session — not another device's.

Network Loss Revocation & Offline Enforcement

The network monitor runs as a persistent loop — not a periodic LaunchDaemon job — so timing is no longer dependent on a 60-second StartInterval. Admin group membership is checked every 1 second. Network connectivity is checked every 5 seconds via curl to captive.apple.com (requiring HTTP 200 exactly — not just any response). Backend status is polled every 60 seconds.

When network loss is detected, admin is stripped immediately and an offline enforcement loop begins: the SAP Privileges mobileconfig remains on the device while offline (it can't be removed without MDM connectivity), so a user could theoretically re-elevate themselves via the Privileges app. The loop strips the admin group membership within 1 second of any re-elevation attempt — both via PrivilegesCLI --remove and directly via dscl. The loop runs until network is restored and the backend is notified, or a 2-hour TTL expires.

Auth errors (401/403) from the backend fail-secure — access is revoked immediately rather than retried. HTTP 000 (no response) is treated as network loss rather than a transient error.
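
The status handling described above amounts to a small decision table. The device scripts are shell; the sketch below expresses the same idea in JavaScript only to keep the examples in one language. The function name and the handling of any status other than 000, 401, and 403 are assumptions.

```javascript
// Map a curl %{http_code} value to the monitor's next action.
function classifyBackendStatus(httpStatus) {
  const code = Number(httpStatus);
  // curl prints 000 when no HTTP response was received at all.
  // Unparseable values are treated the same way (assumption).
  if (!Number.isInteger(code) || code === 0) return "network-loss";
  // Auth failures fail secure: revoke instead of retrying.
  if (code === 401 || code === 403) return "revoke";
  if (code >= 200 && code < 300) return "ok";
  // Everything else is treated as transient here (assumption).
  return "retry";
}
```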

IT Slash Command

/admin-status restricted to a configured Slack user ID allowlist. Empty allowlist defaults to denying all access (fail closed).
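
A fail-closed allowlist check can be very small. In this sketch the allowlist is assumed to arrive as a comma-separated string (for example from an environment variable); `parseAllowlist` and `isAllowed` are hypothetical names.

```javascript
// Parse "U123, U456" into a Set, ignoring blanks and stray whitespace.
function parseAllowlist(raw) {
  return new Set(
    String(raw ?? "")
      .split(",")
      .map((id) => id.trim())
      .filter(Boolean)
  );
}

// An empty or missing allowlist authorizes nobody.
function isAllowed(allowlist, userId) {
  return allowlist.size > 0 && allowlist.has(userId);
}
```

The failure mode to avoid is the inverse shape, "deny if the list is non-empty and the user is missing from it", which silently allows everyone when the list is empty.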

Off-Hours Delegation

Optional off-hours auto-approval routes to an on-call admin. Configuration errors fail closed — requests held for manual review, never auto-approved on misconfiguration.
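
One way to make the fail-closed behavior concrete is shown below. The configuration shape (`startHour`, `endHour`, `onCallUserId`) is an assumption for illustration and does not reflect the project's actual parameters. Any invalid input routes the request to manual review.

```javascript
const isHour = (h) => Number.isInteger(h) && h >= 0 && h <= 23;

// Decide whether a request is auto-approved by the on-call admin.
// startHour..endHour describes the off-hours window and may wrap midnight.
function decideOffHoursRouting(config, localHour) {
  const valid =
    config !== null &&
    typeof config === "object" &&
    isHour(config.startHour) &&
    isHour(config.endHour) &&
    config.startHour !== config.endHour &&
    typeof config.onCallUserId === "string" &&
    /^[UW][A-Z0-9]+$/.test(config.onCallUserId) &&
    isHour(localHour);
  // Fail closed: misconfiguration never results in auto-approval.
  if (!valid) return { mode: "manual", reason: "invalid off-hours configuration" };

  const { startHour, endHour } = config;
  const offHours =
    startHour < endHour
      ? localHour >= startHour && localHour < endHour
      : localHour >= startHour || localHour < endHour;
  return offHours
    ? { mode: "auto-approve", approver: config.onCallUserId }
    : { mode: "manual" };
}
```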

Transient Failure Resilience

Iru API calls use exponential backoff (1s, 2s) for 5xx and 429 responses, up to 3 attempts. Any other 4xx throws immediately. This prevents a single rate-limit response from dropping an entire operation.
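
A retry wrapper along these lines might look as follows. It is a sketch under assumptions: `withRetry` is a hypothetical name, errors are assumed to carry a numeric `status` property, and errors without a status are not retried. The `wait` function is injectable so the delays can be tested without real sleeping.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry an async operation on 5xx/429 with exponential backoff.
async function withRetry(operation, { maxAttempts = 3, baseDelayMs = 1000, wait = sleep } = {}) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      const retryable =
        err.status === 429 || (err.status >= 500 && err.status <= 599);
      // Other 4xx errors, and the final failed attempt, propagate immediately.
      if (!retryable || attempt >= maxAttempts) throw err;
      await wait(baseDelayMs * 2 ** (attempt - 1)); // 1s, then 2s
    }
  }
}
```

With the defaults, a call that keeps failing with 503 is attempted three times with waits of 1s and 2s between attempts, then the last error is rethrown.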

Partial Failure Resilience

Elevation removal is the critical path — if it fails, EventBridge retries. Log collection failure is non-critical: session is still marked expired and IT is alerted.

Audit Trail & Delayed Notifications

Every transition records timestamp and actor. User DMs are deliberately delayed until sudo log collection succeeds — audit trail secured before user is notified.

Secrets Management

API key in macOS system keychain — never in scripts. Lambda secrets via AWS SSM. Module-load-time validation ensures Lambdas fail fast if secrets are missing.

Atomic Metadata Writes

Session metadata at /var/root/.iru-elevation/meta.json (mode 600). mktemp + mv pattern — a crash mid-write never leaves a partial file.
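
The device scripts do this with mktemp and mv in shell. The same write-then-rename pattern is sketched here in JavaScript only to keep the examples in one language; `writeJsonAtomic` is a hypothetical name.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Write JSON so readers see either the old file or the new one, never a
// partial write.
function writeJsonAtomic(targetPath, data) {
  const dir = path.dirname(targetPath);
  // The temp file lives in the same directory so the rename stays on one
  // filesystem, which is what makes it atomic on POSIX systems.
  const tmpPath = path.join(
    dir,
    `.${path.basename(targetPath)}.${process.pid}.${Date.now()}.tmp`
  );
  try {
    // "wx" refuses to reuse an existing temp file; mode 600 is owner-only.
    fs.writeFileSync(tmpPath, JSON.stringify(data), { mode: 0o600, flag: "wx" });
    fs.renameSync(tmpPath, targetPath);
  } catch (err) {
    fs.rmSync(tmpPath, { force: true });
    throw err;
  }
}
```

A crash before the rename leaves only a stray temp file behind; the previous metadata file stays intact.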

iru run Mutex Lock

File lock at /var/run/iru-run.lock serializes all iru run invocations across three daemons. PID-aware — detects and clears stale locks from killed processes.

Post-Run State Verification

After each iru run, daemons verify the expected state change occurred. Single retry after 120s if not confirmed — absorbs Iru tag propagation latency.

Key Design Decisions

Why Iru Tags as Signals?

Iru Library Items can be scoped to specific device tags. By scoping a Library Item to the temp-admin-elevation tag, we get Iru's built-in delivery guarantees: retry on failure, run-at-install semantics, and immediate execution on iru run. We don't need to build our own device delivery mechanism — Iru handles it.

Why SAP Privileges Instead of dseditgroup?

Direct dseditgroup calls add the user to the local admin group and require explicit cleanup. SAP Privileges integrates with macOS's authorization model, provides a visible UI indicator, supports an ExpirationInterval MDM key as a safety-net fallback, and is open-source with active maintenance. The MDM profile approach means the app only works on tagged devices.

Why Two Iru Tags?

Separation of concerns. The elevation tag is removed on revocation or expiration. The log-collection tag is assigned on revocation or expiration. These are often simultaneous but not always. Keeping them separate avoids race conditions and makes each Library Item's trigger unambiguous.

Why EventBridge Scheduler Instead of SQS Delayed Messages?

EventBridge Scheduler supports named one-time schedules that can be deleted by name. Critical for the revoke flow: if IT revokes at T+15, we cancel the T+25 warning and T+30 expiration schedules. SQS delayed messages can't be cancelled after enqueuing.

Lessons Learned

The real attack surface is the device, not the backend

Most of the interesting security findings were in shell scripts — unvalidated data in generated scripts, metadata files with wrong permissions, Python subprocesses without timeouts. Lambda code is easy to reason about; device-side bash is where subtle bugs hide. Treat shell scripts as first-class security artifacts.

Race conditions require database-level guards, not application-level checks

The "fetch → check status → update" pattern is a TOCTOU race. Two concurrent Lambda invocations can both pass the check and both apply the update. DynamoDB's ConditionExpression moves the check into the atomic write. Non-negotiable for state machines where each transition must happen exactly once.

Anchor timers to device confirmation, not approval

Any time you have an async pipeline (approve → MDM deliver → device run → confirm), the user experience is only as good as the last step. Anchoring timers to device confirmation cost one extra API call but resulted in users always getting the full window they were promised.

Iterative security auditing finds what point-in-time reviews miss

Eleven rounds of security audits found meaningful issues in almost every round — not because earlier rounds were bad, but because fixing issues and adding features creates new surface area. Build security review into your iteration cycle, not just your launch gate.

Zero open findings is achievable — accept risk explicitly, not by omission

The accepted-risk items were each evaluated deliberately. Accepted risk with documented rationale is categorically different from unfixed risk with no explanation.

New features are new attack surface — review them immediately

The slash command was the highest-severity new finding: signature verification was in place, but no authorization check existed. Any workspace member could enumerate all active admin sessions. The fix was three lines. The gap between "implemented" and "authorized" is where high-severity findings live.

Fail closed beats fail open, especially for access control

The off-hours auto-approval feature had a subtle misconfiguration path that auto-approved every request on invalid configuration. The correct behavior: if you're not sure it's off-hours, require manual approval. Any access-control feature that grants permissions by default on config error is a security risk.

Escape at the output boundary, not at ingestion

Early versions sanitized input at ingestion time — this leads to double-encoding bugs and false confidence. Store raw data; escape at every output boundary. Each output context (Slack mrkdwn, JSON, shell) has different escaping requirements.

Never call a process runner from within a script being run by that process runner

elevation-start.sh was calling iru run at the end of its own execution — but it runs inside an iru run triggered by the approval monitor. The agent holds an internal lock during execution; the nested call deadlocked the outer agent. A nested iru run inside an Iru script is a deadlock by construction.

Verify the check actually checks something

The UTF-8 validation in receiveLog was Buffer.from(x).equals(Buffer.from(x)) — a tautology that always returns true. It passed code review because it looked correct. Always test security checks with input that should fail. A check that never rejects is not a check.
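
For comparison, here is one way to write a UTF-8 check that can actually fail, using the standard TextDecoder in fatal mode. This is a sketch and not the project's fix; `isValidUtf8` is a hypothetical name.

```javascript
// Returns true only if the bytes decode as well-formed UTF-8.
function isValidUtf8(bytes) {
  try {
    // With fatal: true, decode() throws on malformed input and does not
    // substitute U+FFFD.
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch {
    return false;
  }
}

// The tautological version: both sides are built from the same input, so
// the comparison is true for any bytes, valid or not.
function tautologicalCheck(bytes) {
  return Buffer.from(bytes).equals(Buffer.from(bytes));
}
```

Feeding both functions an invalid sequence such as the bytes 0xC3 0x28 shows the difference: the tautology accepts it, the fatal decoder rejects it.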

By the Numbers

Lambda functions: 9

Device shell scripts

Max elevation window: 30m

Approval → elevated: <60s

Slack latency: ~3s

Security fixes

Recently Shipped

Pending request nudges & auto-deny: EventBridge Scheduler posts Slack thread reminders to IT every 10 minutes for the first hour, then hourly for up to 24 hours. After 24 hours with no response, the request is automatically closed as expired_unanswered — does not affect the user's AI risk score. All intervals configurable via SAM parameters.
Adaptive device polling phases: Approval monitor switches from 20-second polls (first 15 min) to 5-minute (15 min–1 hr) to hourly (1–24 hr) — same LaunchDaemon, no reinstall needed. State is persisted in a root-only file so non-root users can't manipulate it.
Background approval monitor: Self Service script exits in under 5 seconds. A background LaunchDaemon polls for approval every 20 seconds — no more blocking the Iru Self Service app.
Removed all blankPush calls: blankPush triggers an MDM check-in, not a Library Item run. Device-side iru run is now the exclusive mechanism for picking up tag changes.
iru run mutex lock: Three background daemons could previously call iru run concurrently. A PID-aware file lock serializes all invocations.
Delayed revocation DMs: Users are not notified until after sudo logs are successfully collected. Audit trail secured first.
Slack original message lifecycle: The approval message is updated to a "completed" state when log collection succeeds — outcome and timeline shown, all buttons removed.
Fixed nested iru run deadlock: Removed the redundant inner iru run call inside elevation-start.sh that deadlocked the outer agent.
Switched to iru run --reset-daily: Forces full re-evaluation even if the agent's daily run already completed.
Post-run state verification with 120s retry: Daemons confirm expected state changes after every iru run.
Off-hours failure visibility: If off-hours auto-approval fails, a warning posts to the IT Slack thread immediately.
Degraded log warning: If timezone conversion fails during log collection, a visible warning is prepended to the uploaded log.
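
The adaptive polling phases in the list above reduce to a function of elapsed time since the request was submitted. This sketch assumes the monitor stops polling once the 24-hour window has passed, in line with the auto-deny at 24 hours; `pollIntervalSeconds` is a hypothetical name, and the on-device implementation is shell.

```javascript
const MINUTE = 60;
const HOUR = 60 * MINUTE;

// Seconds to wait before the next status poll, or null to stop polling.
function pollIntervalSeconds(elapsedSeconds) {
  if (elapsedSeconds < 15 * MINUTE) return 20;     // first 15 minutes
  if (elapsedSeconds < HOUR) return 5 * MINUTE;    // 15 minutes to 1 hour
  if (elapsedSeconds < 24 * HOUR) return HOUR;     // 1 to 24 hours
  return null;                                     // assumption: give up
}
```

Deriving the phase from a stored start timestamp means a restarted daemon resumes in the right phase, with no attempt counter to carry over.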

What Does It Cost?

Fully serverless means you only pay for what runs. Lambda and EventBridge stay within the permanent free tier at all realistic volumes — Bedrock (AI risk scoring) is the only real line item. Estimates below are based on typical request frequency by team size: engineers average 2–3 sudo sessions per day when they need access, but most engineers only need it a few times a month. A 30-minute window often covers 8–15 sudo commands, which slightly increases Bedrock token cost per session but doesn't change call count.

Team size	Requests per month
5 engineers	~35
20 engineers	~100
100 engineers	~350
250 engineers	~625

[Chart: monthly cost by team size, split into Bedrock, API Gateway, and DynamoDB]

[Table: cost breakdown per team, with columns Team, Bedrock, API GW, Dynamo, Total]

Lambda, EventBridge & CloudFront remain within the permanent free tier at all volumes shown.

Bedrock is ~95% of total cost because every completed session triggers a Claude Haiku 4.5 risk re-score. Scores are cached 48h per user, so frequent users cost less over time. Requests that expire unanswered (expired_unanswered) are never scored — no session, no cost. See full cost breakdown →

Estimates are based on assumed average request frequency by team size. Actual costs will vary based on how often your engineers request access, session duration, and the number of sudo commands run per session.

Version History

Each release is tagged on GitHub — download any version as a zip or clone the tag directly.

v1.2.0 Pre-release

AI Risk Dashboard · Sudo Log Storage · Security Hardening

IT risk dashboard hosted on S3 + CloudFront — per-user request history, AI risk scores (Low/Medium/High/Critical), expandable sudo logs
Single-use Slack links — every approval message includes a one-click dashboard link with a UUID session token; no API key entry required
AI-powered risk scoring — Amazon Bedrock (Claude Haiku 4.5) evaluates actual sudo commands, not just request metadata; re-scored after each session log arrives. Haiku 4.5 was chosen over larger models for its low latency, sub-cent per-call cost, and because risk scoring is a structured classification task — not a reasoning-heavy one. The full sudo command list and session history fit comfortably in its context window.
Sudo log stored in DynamoDB — viewable inline in the dashboard; normalized to HH:MM:SS <command> with PrivilegesCLI revocation entries stripped
Pending request nudges & auto-deny — EventBridge Scheduler posts thread reminders to the IT Slack channel every 10 min for the first hour, then hourly for up to 24 h. After 24 h with no response, the request is auto-closed as expired_unanswered (does not affect the user's AI risk score). All intervals configurable via SAM parameters.
Device polling phases — approval monitor now uses timestamp-based phases (20s → 5 min → 60 min) instead of a fixed attempt counter; persists state in a root-owned file to prevent tampering
Security fixes — 14 issues resolved across two full audits: race condition in schedule creation, Bedrock IAM scope, CORS wildcard, XSS escaping, token session window, HTTPS enforcement, and more


v1.1.0 Stable

Network Enforcement · AppKit Form · Duration Selection

Network enforcement — LaunchDaemon revokes admin if device goes offline during elevation; offline enforcement loop blocks re-elevation until backend is notified
Native AppKit request form — replaces AppleScript dialog; single-screen window with duration picker and reason category selector
Duration & category selection — users choose 5/10/15/30 min; IT can override at approval time; reason categories drive trend analysis
Off-hours auto-approval — configurable on-call Slack user ID for automatic approval outside business hours
IT device lock — MDM-lock button in Slack thread for emergency response during active sessions


v1.0.0

Initial Release

Self Service Slack-approval workflow for temporary local admin elevation
Interactive Approve/Deny buttons, EventBridge-scheduled expiration, 5-minute warning
Sudo log collection via macOS unified log, uploaded to Slack thread on session end
API key in System Keychain, serial number binding, DynamoDB request tracking


What's Next

Extended audit retention: Export DynamoDB records to S3 before TTL expiration for long-term compliance storage.
Rotation-aware off-hours: Pull the on-call rotation from PagerDuty rather than a static Slack user ID.
Dashboard pagination: Cursor-based paging for large request histories instead of full table scans.

Built for a macOS-first environment using Iru as the MDM, but the core pattern — Slack-gated JIT access with EventBridge timers and MDM tag signaling — translates to other MDM platforms with an API. Source on GitHub →
