A walkthrough of what a forward-deployed engineering engagement looks like end-to-end. Synthetic client (Northwind Treasury, $420M ARR fintech infrastructure), real methodology, real artifacts. Eight agents shipped to production. Built on the customer’s stack, owned by the customer, instrumented for evals from day one.
What we walked into. Three weeks of discovery, eight systems mapped, the heatmap of where time was actually leaking.
How undocumented playbook rules and the conventions that lived only in people’s heads became agent-runnable rules.
What an AE, an RVP, and a CRO each saw on Monday morning after the agents went live.
How a single deal flows through five sub-agents in parallel and what the feedback loop looks like.
Before/after metrics, accuracy curve, full audit trail, and what the team kept after we left.
Northwind’s sales team had outgrown its tooling. The product was a developer-led PLG fintech infra platform, but the deals had become enterprise — security questionnaires, redlines, multi-stakeholder buying committees. Reps were closing deals by force of will, not because the system was working. The CRO wanted forecasts she could defend at board meetings. The CFO wanted commission disputes to stop. Three weeks in, we had the picture below.
Heatmap from time-and-motion shadowing of 14 reps and 4 RVPs across two weeks. Reps and managers were spending more time on internal coordination than on the customer.
The deliverable from week three: which agent to build first and why. P1 is the foundation — every downstream agent depends on clean CRM data and structured call intelligence.
The forecast wasn’t the broken thing — the rep admin tax was. Reps were spending 38% of their week on internal work, and pipeline-review prep alone consumed every Sunday night for every RVP. The team had been chasing forecast accuracy for two quarters; we found it was a downstream symptom of upstream signal loss. Fix the upstream, the forecast follows.
Half of how a sales org actually runs is in nobody’s playbook. It’s in Marcus’s head, or in a Slack thread from 2024, or in the way the deal-desk lead grants exceptions in Q4 but not Q1. Before agents could run, we needed those rules captured, scored, and made revisable. We call this the SOP Compiler step.
Northwind had a 38-page deal playbook in Notion. The agent extracted 12 executable rules from it; the six most frequently cited are shown below. Confidence scores reflect how unambiguously each rule could be applied without further interpretation.
What lived in people’s heads. Captured through structured interviews with the AE bench, the deal-desk lead, the CRO, the VP Customer Success, and the regional sales managers. Each rule has a named source so it can be revisited as the org changes; a sketch of how these compile into agent-runnable records follows the list.
Marcus always handles Meridian Health deals because he ran the relationship at his last company. Don't auto-route them.
Q4 pricing exceptions get more flexibility because of annual quota pressure; deals pushed to January get standard rates.
Healthcare-vertical deals always need a HIPAA addendum even when the prospect doesn't ask for it.
Compensation disputes from the top 10% of quota carriers get escalated to VP Sales Ops, not the standard queue.
If a deal has been in negotiation >60 days, the forecast probability should be downgraded one tier regardless of stage.
EMEA renewals route to the regional rep, not the original closer, because of follow-up timezone & language match.
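To make that revisability real, each captured rule can be stored as a structured record with a named source and a confidence score. A minimal sketch in Python; the `Rule` fields and the example values are illustrative, not Northwind’s actual schema:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Rule:
    """One agent-runnable rule, compiled from the playbook or an interview."""
    rule_id: str
    condition: str            # when the rule fires, in plain language
    action: str               # what the agent does when it fires
    source: str               # named person or document, so the rule can be revisited
    confidence: float         # how unambiguously the rule applies without interpretation
    captured_on: date
    tags: list[str] = field(default_factory=list)

# Example: the Meridian Health routing convention captured above
# (source attribution and date here are illustrative).
meridian = Rule(
    rule_id="routing-007",
    condition="account == 'Meridian Health'",
    action="assign to Marcus Chen; never auto-route",
    source="deal-desk lead interview, discovery week 2",
    confidence=0.97,
    captured_on=date(2025, 3, 14),
    tags=["routing", "named-account"],
)
```

Keeping the source on the record is what makes “revisit as the org changes” operational: when Marcus leaves, every rule that cites him is one query away.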
Every agent ships with an explicit confidence threshold. Below it, the agent escalates to a named human; never silently fails. These thresholds are tuned weekly during the first quarter, then quarterly thereafter.
An agent that pretends to be right when it isn’t is the failure mode we design against. The whole stack is built around the agent declaring uncertainty rather than guessing. Below threshold, work routes to a named human with the full reasoning trace attached. The human decides; the agent learns from the decision; the threshold tunes itself.
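In code, the pattern is small. A minimal sketch of execute-or-escalate, assuming a per-agent threshold and a named owner; names and signatures are illustrative, not the deployed stack:

```python
from dataclasses import dataclass

@dataclass
class AgentDecision:
    action: str
    confidence: float
    reasoning: str   # the full trace that travels with any escalation

def execute_or_escalate(decision: AgentDecision, threshold: float, owner: str) -> str:
    """Below the threshold, route to the named human with the reasoning
    trace attached. The agent never acts silently on low-confidence output."""
    if decision.confidence >= threshold:
        return f"EXECUTE: {decision.action}"
    return (
        f"ESCALATE to {owner}: {decision.action} "
        f"(confidence {decision.confidence:.2f} < threshold {threshold:.2f})\n"
        f"trace: {decision.reasoning}"
    )
```

The human’s eventual decision feeds the tuning loop described above: thresholds move weekly in the first quarter, quarterly after that.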
Three perspectives on the same Monday morning, three weeks after deployment. Each role saw a different surface of the same agent system — tuned to the decisions that role actually needed to make.
Marcus Chen, Enterprise AE, opened his queue Monday morning and found four items, down from 23 unresolved CRM tasks before the agents shipped. Everything else had been handled overnight.
One deal enters the system. The Main Sales Agent spawns five specialized sub-agents in parallel, each applying a different rule set, checking different thresholds, surfacing different decisions. The orchestrator synthesizes the output. The feedback loop catches the cases where humans correct the agents, and the next similar pattern auto-adjusts. A minimal orchestration sketch follows the sub-agent outputs below.
Updated 14 fields: MEDDPICC score, decision criteria, champion identified, timeline confirmed, competitive landscape populated from last 3 Gong calls.
Enterprise deal >$500k: routed to deal desk. Named-account match (Marcus Chen). Non-standard discount (22%) flagged for VP approval.
Discount at 22% exceeds 15% standard threshold. Multi-year premium applies. Competitive displacement justification attached. Recommend: VP approval with supporting data.
Matched 142/200 questions (71%) from knowledge base. 46 routed to security team. 12 flagged for updated SOC2 attestation. Draft response document assembled.
Draft contract assembled. Awaiting pricing approval resolution before generating final DocuSign package.
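The fan-out/fan-in shape above reduces to a few lines. A minimal sketch using asyncio; the sub-agent names mirror the five outputs above, but the functions are stand-ins, not the production orchestrator:

```python
import asyncio

async def run_sub_agent(name: str, deal_id: str) -> dict:
    """Stand-in for one specialized sub-agent (CRM update, deal routing,
    pricing validation, security questionnaire, contract assembly)."""
    await asyncio.sleep(0)  # a real sub-agent calls tools and models here
    return {"agent": name, "deal": deal_id, "output": f"{name} result"}

async def process_deal(deal_id: str) -> list[dict]:
    sub_agents = [
        "crm_update", "deal_routing", "pricing_validation",
        "security_questionnaire", "contract_assembly",
    ]
    # Fan out: all five sub-agents work the same deal in parallel.
    results = await asyncio.gather(
        *(run_sub_agent(name, deal_id) for name in sub_agents)
    )
    # Fan in: the orchestrator synthesizes one consolidated picture.
    return list(results)

if __name__ == "__main__":
    print(asyncio.run(process_deal("OPP-2041")))  # hypothetical deal id
```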
When a human corrects an agent’s output, the correction is captured, the relevant rule is updated, and the next similar pattern auto-adjusts. The system learns explicitly, with the rule update visible in the audit trail.
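A minimal sketch of that correction path, assuming rules live in a keyed store; the shapes and field names are illustrative:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Correction:
    rule_id: str
    agent_output: str
    human_output: str
    corrected_by: str

def apply_correction(rules: dict, c: Correction, audit_log: list) -> None:
    """Fold a human correction back into the rule set and log it, so the
    next similar pattern is handled the corrected way."""
    rule = rules[c.rule_id]
    rule["action"] = c.human_output      # the human's decision wins
    rule["verified_by_human"] = True     # the update is explicit, not implicit drift
    audit_log.append({
        "event": "rule_updated",
        "rule_id": c.rule_id,
        "was": c.agent_output,
        "now": c.human_output,
        "by": c.corrected_by,
        "at": datetime.now(timezone.utc).isoformat(),
    })
```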
Org changes always break agent systems unless they’re absorbed automatically. Three real changes from this engagement; all auto-detected, rules auto-updated, with an audit log of what changed.
Deal routing, pricing validation, and commission calculation rules updated automatically. Zero manual reconfiguration.
Deal routing, lead assignment, and territory rules updated. 34 active opportunities re-mapped to correct AEs without disruption.
Commission calculation rules auto-adjusted. Retroactive impact analysis flagged 3 reps who crossed the new threshold.
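Mechanically, absorption can be as simple as an event-to-domain map that marks affected rules stale for re-derivation. A minimal sketch; the mapping and field names are illustrative, not Northwind’s configuration:

```python
# Which rule domains each class of org change touches (illustrative mapping).
IMPACT_MAP = {
    "comp_plan_change":      ["pricing_validation", "commission"],
    "territory_realignment": ["deal_routing", "lead_assignment", "territory"],
    "rep_departure":         ["deal_routing", "named_accounts", "commission"],
}

def absorb_org_change(event: dict, rules_by_domain: dict, audit_log: list) -> list[str]:
    """Detect which rule domains an org change touches, queue those rules
    for automatic re-derivation, and log what changed; no manual reconfig."""
    touched = IMPACT_MAP.get(event["type"], [])
    for domain in touched:
        for rule in rules_by_domain.get(domain, []):
            rule["stale"] = True  # queued for re-derivation against the new org state
    audit_log.append({"event": event, "domains_updated": touched})
    return touched
```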
Month 1 vs Month 4. The agents got smarter, pipeline execution got faster, and the team shifted from process work to selling. What follows is the full set of measurable outcomes — plus the audit trail that backs every number.
Agents get smarter every week. Human feedback and SOP changes are absorbed automatically. Overall agent accuracy lifted from 82% in week 1 to 96.4% by week 16.
Every agent action with timestamp, reasoning, confidence, and human approvals. Searchable. Filterable. Exportable for compliance review.
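A handful of fields is enough to support that. A minimal sketch of the record shape and one compliance-style export, with illustrative names:

```python
from dataclasses import dataclass, asdict
from datetime import datetime
from typing import Optional
import json

@dataclass
class AuditEntry:
    timestamp: datetime
    agent: str
    action: str
    reasoning: str
    confidence: float
    approved_by: Optional[str]  # the named human, when the action was escalated

def export_below_confidence(entries: list[AuditEntry], bar: float) -> str:
    """Filter + export: everything under a confidence bar, as JSON for review."""
    rows = [
        asdict(e) | {"timestamp": e.timestamp.isoformat()}
        for e in entries
        if e.confidence < bar
    ]
    return json.dumps(rows, indent=2)
```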
Northwind owns the agents, the data, the rules, the methodology. We did the work; they keep everything.
Every workflow, every rule, every model. Deployed on their infrastructure, inside their VPC, within their security perimeter.
Processing happens in their environment. No deal data sent to external servers. Full compliance with their security policies.
Zero platform lock-in. They keep everything if the engagement ends. Our IP is in the methodology, not the output.
They own the building. We designed and built it. The blueprints, the structure, the systems. All theirs.
Range, not point estimate. We publish the methodology because mid-market CFOs read these numbers carefully and ours need to survive scrutiny. Below is how the value gets created — with each line tied to a specific agent and a specific measurable outcome.
Rep admin time fell from 41% → 18% of their week. At 95 reps × 23pp × 40h/week × $180/hr fully loaded, that’s 6.4–9.5 FTE-equivalents of capacity reclaimed. It did not result in headcount cuts; the capacity was redeployed to the expansion and new-logo motion.
Average enterprise cycle 7.2mo → 5.4mo (25% compression). Cash flow value of pulling forward closes computed at quarterly weighted-average value of the closed-won cohort. Conservative — doesn’t count expansion deals.
+4.5pp win rate on >$250k deals, attributable to faster SecQ turnaround (5–8d → 18–30h) and cleaner deal desk routing (median 2.8d → 8h). Computed against the prior-quarter loss cohort with a stage-weighted attribution model.
CS handoffs happen same-day instead of in 10–15 days. Earlier cohort onboarding correlates with a 22% higher first-90-day expansion-conversation rate relative to our control group. Conservative; the lift may be larger as the cohort matures.
All baselines are pre-engagement (the prior fiscal quarter at Northwind). Headcount equivalent uses fully-loaded comp ($180k median for AE benchmarked against 2025 Pavilion data). Cycle compression value uses the contribution-margin-weighted close cohort. The range exists because cohort sizes are still small (n=47 for forecast accuracy, n=23 for win rate); we’ll tighten the band as more deals close. We never bill more than the lower bound of created value.
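For scrutiny’s sake, the capacity line back-solves as follows; the gross figure and the implied realization factor are our reading of the published range, not a stated step in the methodology:

```python
reps = 95
pp_reclaimed = 0.23        # admin time: 41% -> 18% of the week
hours_per_week = 40

gross_hours = reps * pp_reclaimed * hours_per_week   # 874 reclaimed hours/week
gross_fte = gross_hours / hours_per_week             # ~21.9 FTE-equivalents, gross

# The published range is 6.4-9.5 FTE. Back-solving the implied realization
# factor (reclaimed hours that actually convert to redeployed capacity):
for published in (6.4, 9.5):
    print(f"{published} FTE implies realization of {published / gross_fte:.0%}")
# -> roughly 29% and 43%; the exact discount is not stated above, so treat
#    this back-solve as an assumption, not Northwind's model.
```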
We have to say this because most consultancies won’t. Knowing the limits is the only way to deploy something that survives contact with reality.
Quarter two: scope expansion to outbound prospecting agent, pre-call brief generator, and renewal-risk scoring. Quarter three: BOT (build-operate-transfer) optionality. The methodology is portable; the agents are theirs.