How AI Agents Replaced an HR Department

A mid-market HR services company was running with a team of human HR managers serving its customer base — small and medium businesses that couldn’t afford or didn’t need a full-time HR person in-house. Then a significant reduction in force hit. The HR management team was largely eliminated.

The question stopped being theoretical: could AI agents do the work of an HR department?

The answer turned out to be yes — and then some.

The Situation

This company’s product was HR expertise delivered as a service. Customers would submit requests and human HR managers would respond with compliant, professionally drafted documents: corrective action plans, performance improvement plans, written warnings, incident reports, policy documents. The whole stack.

After the RIF, there were two obvious options: rehire (expensive, months to ramp) or reduce service quality (customers churn, revenue drops). Neither was acceptable. I built a third option — AI agents that could handle the task queue.

What the Agents Did

These weren’t chatbots answering HR questions. They were task-completing agents that took an HR action from request to finished deliverable, including document generation, compliance checks, and stakeholder communication. Within weeks of deployment, agents were handling over 80% of the tasks the human HR managers had performed the prior year.

The coverage spanned the full HR services stack:

Corrective Action Plans — generated compliant CAPs based on employee history, incident context, and company policy, including specific corrective steps, timelines, and consequences
Performance Improvement Plans — structured PIPs with measurable goals, check-in schedules, and success criteria tailored to the employee’s role and performance gaps
Written Warnings — drafted with proper documentation, policy references, and escalation language, ensuring consistency across the customer base
Final Warnings — full incident history, prior corrective actions referenced, and termination implications clearly stated
Incident Reporting — structured reports from raw descriptions, required fields captured, compliance-relevant details flagged
Policy Creation and Rollout — the flagship capability, and the one that changed everything

Every one of these tasks went from taking hours or days to taking minutes.

The Policy Creation Flagship

Policy creation was the highest-value transformation. Under the old model, when a customer requested a new policy — say, a remote work policy, or a PTO policy, or an anti-harassment policy — the process looked like this: an HR manager researched compliance requirements for the relevant state and industry, drafted the policy over a day or two, it went through internal review, revisions, back to the customer for approval, then formatting and rollout. Start to finish: 7 to 10 days.

The agent process compressed all of that. When a policy request comes in, the agent gathers context — company details, state, industry, existing policies — then generates a compliant draft with inline compliance checks. An LLM quality evaluation gate reviews the output across multiple dimensions: relevance, regulatory compliance, clarity, and completeness. If it passes, it goes to the customer for review and approval. If it doesn’t, the agent retries with feedback from the evaluator. Once approved, rollout to employees is automated.
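The generate, evaluate, and retry loop described above can be sketched roughly as follows. This is a minimal illustration, not the production pipeline: the function names, the 0-to-1 scoring scale, the pass threshold, and the attempt cap are all assumptions, and `generate` and `evaluate` stand in for whatever model calls a real system would make.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# The four dimensions named in the text; the 0-1 scoring scale is an assumption.
DIMENSIONS = ("relevance", "regulatory_compliance", "clarity", "completeness")


@dataclass
class Evaluation:
    scores: dict    # dimension -> score in [0, 1]
    feedback: str   # evaluator's notes, handed to the next generation attempt

    def passed(self, threshold: float) -> bool:
        # Pass only if every dimension clears the threshold; a missing score fails.
        return all(self.scores.get(d, 0.0) >= threshold for d in DIMENSIONS)


def run_policy_pipeline(
    context: dict,
    generate: Callable[[dict, Optional[str]], str],
    evaluate: Callable[[str, dict], Evaluation],
    threshold: float = 0.8,   # hypothetical pass mark
    max_attempts: int = 3,    # hypothetical retry cap
) -> Optional[str]:
    """Return a draft that passed the gate, or None if every attempt failed."""
    feedback = None
    for _ in range(max_attempts):
        draft = generate(context, feedback)
        result = evaluate(draft, context)
        if result.passed(threshold):
            return draft  # ready for customer review and approval
        feedback = result.feedback  # retry with the evaluator's feedback
    return None  # hypothetical fallback: escalate to a human reviewer
```

Capping the number of attempts keeps a draft that never satisfies the evaluator from looping forever; what happens after the cap (here, returning `None`) is a design choice the sketch leaves open.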

Start to finish: under 20 minutes.

A speedup of that size, several hundred times faster in elapsed time, changes the nature of the service. When a policy takes a week, customers only request policies they urgently need. When a policy takes 20 minutes, they start requesting policies they’d always wanted but never bothered to ask for. The volume of policy work increased significantly, and the agents absorbed it without adding headcount.

For the full technical architecture behind how this pipeline worked, see my writeup on the Predictive HR system.

From Red to Profitable

The business impact was direct and measurable.

Before the RIF, the unit economics were simple: revenue from the customer base minus the cost of the HR management team left the business operating at a loss. The human HR function was expensive. After the RIF and agent deployment, the math changed: the same revenue minus compute costs for the AI agents and a small engineering team left the business profitable.
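In symbols, the comparison looks like this. The notation is shorthand introduced here for illustration, not figures from the business: R is revenue from the customer base, C_HR is the cost of the HR management team, C_compute is the compute cost of the agents, and C_eng is the cost of the small engineering team.

```latex
\begin{aligned}
\text{Before:}\quad & R - C_{\text{HR}} < 0 \\
\text{After:}\quad  & R - \left(C_{\text{compute}} + C_{\text{eng}}\right) > 0
\end{aligned}
```

With R unchanged, both conditions hold together exactly when C_compute + C_eng < R < C_HR.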

No customers churned due to service degradation. Revenue held. The agents maintained full service for the entire existing customer base during the transition. The business went from red to profitable — not because revenue increased (though it eventually did), but because the cost structure was fundamentally different.

Linear headcount scaling became compute scaling. Adding more customers no longer required hiring more HR managers. It required more compute and better agents.

Architecture Patterns That Made It Work

A few architectural decisions were essential to getting this right.

Gap detection before generation. The agents were built to surface missing context and collect it rather than invent it. A dynamic wizard system prompted users for the specific pieces of information needed for each task type before generation began. This produced dramatically better outputs than prompting an agent with incomplete information and hoping it filled in the gaps correctly.
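A minimal sketch of the gap-detection idea, assuming each task type declares the fields it needs up front. The task types, field names, and question wording below are hypothetical; a real wizard would define its own.

```python
# Hypothetical required fields per task type; a real system would define its own.
REQUIRED_FIELDS = {
    "written_warning": ["employee_name", "incident_date", "policy_violated"],
    "policy_creation": ["company_name", "state", "industry", "policy_type"],
}


def find_gaps(task_type: str, provided: dict) -> list:
    """Return required fields that are missing or blank, in definition order."""
    required = REQUIRED_FIELDS[task_type]
    return [f for f in required if not str(provided.get(f) or "").strip()]


def next_wizard_questions(task_type: str, provided: dict) -> list:
    """Turn each gap into a question for the user instead of letting the model guess."""
    return [
        f"Please provide: {f.replace('_', ' ')}"
        for f in find_gaps(task_type, provided)
    ]


def ready_to_generate(task_type: str, provided: dict) -> bool:
    # Generation starts only once nothing required is missing.
    return not find_gaps(task_type, provided)
```

For example, a policy request that arrives with a company name and policy type but no state or industry would produce two wizard questions, and generation would wait until both are answered.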

LLM-as-judge quality gates. Every generated document went through a multi-criteria evaluation pass before delivery. The evaluator checked for compliance gaps, missing required elements, unclear language, and contextual appropriateness. Documents that failed the gate went back for regeneration with specific feedback. Customers almost never saw a document that wasn’t ready.
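One way to structure a multi-criteria gate like this is to ask the judge model for a structured verdict and treat anything missing or malformed as a failure. The sketch below assumes a JSON reply format and criterion names derived from the checks listed above; both are illustrative choices, and the call to the judge model itself is left out.

```python
import json

# Criteria taken from the text; the JSON verdict format is an assumption.
CRITERIA = [
    "compliance_gaps",
    "missing_required_elements",
    "unclear_language",
    "contextual_appropriateness",
]


def build_judge_prompt(document: str) -> str:
    """Ask the judge model for a pass/fail verdict and a reason per criterion."""
    lines = [
        "Review the HR document below against each criterion.",
        'Reply with JSON only: {"<criterion>": {"pass": true|false, "reason": "..."}}',
        "Criteria: " + ", ".join(CRITERIA),
        "---",
        document,
    ]
    return "\n".join(lines)


def parse_verdict(raw: str) -> tuple:
    """Return (passed, feedback). Missing or malformed verdicts count as failures."""
    try:
        verdict = json.loads(raw)
    except json.JSONDecodeError:
        return False, ["judge returned unparseable output"]
    if not isinstance(verdict, dict):
        return False, ["judge returned an unexpected structure"]
    feedback = []
    for name in CRITERIA:
        entry = verdict.get(name)
        if not isinstance(entry, dict):
            feedback.append(f"{name}: criterion not evaluated")
        elif entry.get("pass") is not True:
            feedback.append(f"{name}: {entry.get('reason', 'no reason given')}")
    return (not feedback), feedback
```

Failing closed matters here: if the judge skips a criterion or returns garbage, the document goes back for regeneration rather than out to the customer, and the feedback list gives the generator something specific to fix.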

A data flywheel. Every policy created, every CAP generated, every wizard interaction completed fed back into the system. Agents serving later customers had dramatically richer context than agents serving early customers. The system got better as the customer base grew, which is the opposite of what happens with human teams under load.
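The text does not say how accumulated work was fed back to the agents, so the following is only one plausible shape for it: an in-memory store, keyed by task type and state, that returns prior approved documents as reference context for a new request. The class name, the keying scheme, and the same-industry-first ranking are all assumptions.

```python
from collections import defaultdict


class ContextStore:
    """In-memory stand-in for whatever store accumulates finished work."""

    def __init__(self):
        self._by_key = defaultdict(list)

    def record(self, task_type: str, state: str, industry: str, document: str) -> None:
        # Every approved deliverable becomes reference material for later requests.
        self._by_key[(task_type, state)].append(
            {"industry": industry, "document": document}
        )

    def retrieve(self, task_type: str, state: str, industry: str, limit: int = 3) -> list:
        """Prior documents for the same task type and state, same-industry first."""
        candidates = self._by_key.get((task_type, state), [])
        # sorted() is stable, so insertion order is kept within each group.
        ranked = sorted(candidates, key=lambda c: c["industry"] != industry)
        return [c["document"] for c in ranked[:limit]]
```

An early request finds an empty store and gets no reference material; a later request for the same task type and state gets up to `limit` prior documents, which is the sense in which later customers are served with richer context without any code change.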

Micro team, high velocity. The engineering team that built and operated all of this was intentionally small, which was possible because the same AI-first development practices applied to the customer-facing agents also applied internally. A smaller team shipped more features faster, with AI-assisted development across the codebase. The leverage stacks.

Takeaways

AI agents can replace structured professional work, not just assist with it. These agents didn’t help HR managers work faster; they did the work. Task coverage above 80% isn’t augmentation; it’s replacement of the function. The distinction matters.

Compression changes behavior. When a multi-day process becomes a 20-minute process, customers start using the service differently. The demand curve shifts. This is underappreciated in most AI implementation discussions — the efficiency gain isn’t just operational, it changes what’s possible for end users.

Quality gates are non-negotiable in professional services. HR documents carry legal weight. A CAP or a final warning issued incorrectly can create liability. The LLM-as-judge pattern — where a separate model evaluates outputs against explicit criteria before delivery — was the thing that made deploying these agents into a production professional services context safe.

Data flywheels compound. Every interaction made the next one better. The agents serving customer #500 were meaningfully better than the agents serving customer #1, without any code changes. This is the actual moat in AI-native businesses — not the model, but the accumulated context.

Sometimes a crisis forces the right architecture. The RIF made AI agents a survival requirement, not an optimization. That urgency produced a system that turned out to be fundamentally better than what it replaced — not just cheaper, but faster, more consistent, and continuously improving.

If this kind of architecture is relevant to your business — whether you’re dealing with a headcount constraint or just want to understand what AI agents can actually own end-to-end — let’s talk.

Originally adapted from content on nateross.dev.