Founder's Field Notes · AI-Assisted Development · 2025–2026

One Founder. One AI.
One Production System.

How Claude Code and I built a multi-tenant, 12-stage AI hiring intelligence platform — and shipped, this week, the audit system that proves every score it produces.

01 Origin & Architecture

What is Intelletto.ai, and Why Does It Exist?

My first enterprise system wasn't built with a framework, a CI/CD pipeline, or a team of specialists. It was Pinnacle Software — a logistics ERP I wrote in C, deployed across clients in multiple countries, managing operations at a scale that, looking back, should have been impossible for one person to attempt. The fact that I also have dyslexia makes that chapter feel even more unlikely in hindsight. Reading is hard. Writing code — where the computer doesn't care about your spelling, only your logic — turned out to be a different thing entirely.

I didn't know then that I was building a mental model that would define the next twenty years of my career: technology is leverage. The right system, built well, multiplies human capability in ways that are genuinely hard to quantify until you've lived them.

Since Pinnacle, I've worked across most of the industries that matter — fintech, BPO, eCommerce, logistics, enterprise digital transformation — for companies ranging from scrappy startups to publicly listed institutions. And every single software development project, regardless of sector or scale, followed the same exhausting playbook. The business wants the solution yesterday. The development team — developers, QA engineers, PMs, product owners, infrastructure, security, compliance — already has more work than they can physically cope with. We paper over the gap with Agile methodologies, RAD tools, low-code platforms, code generators. We hire more people. We extend the timeline. We descope. We repeat.

I never stopped looking for a better way. What I found — eventually — is Claude Code. But I'll get to that.

An Industry Most People Underestimate

Before I explain what Intelletto.ai is and why it exists, I need to talk about an industry that most people outside of it genuinely don't understand: Business Process Outsourcing.

The global BPO market was valued at approximately $327 billion in 2025 and is projected to reach over $525 billion by 2030, growing at nearly 10% annually. It employs more than 25 million professionals worldwide. The Philippines alone accounts for 1.7 million BPO workers, representing one of the country's largest economic pillars. Over 72% of Fortune 500 companies outsource some portion of their operations. This is not a niche industry. It is one of the largest service ecosystems on the planet.

And a significant portion of that ecosystem runs on recruiting.

The volume of resumes that flows through large BPO operations is, frankly, staggering. A single mid-to-large BPO with continuous hiring cycles can process tens of thousands of candidate applications per month. Across the region's largest operators, we're talking hundreds of thousands of candidates per year, flowing across multiple client accounts and role types simultaneously. Each one of those candidates submitted a resume. Each one of those resumes had a human being on the other side of it, trying to make a fair assessment in the time they had available.

Here is the brutal arithmetic of that reality. Manually screening 300 resumes takes approximately 30 recruiter hours. Manual hiring averages $1,000–$4,700 per hire, largely due to recruiter time and overhead. Preparing a candidate shortlist costs an average of two additional hours — before a single interview has been scheduled. Multiply that across hundreds of open roles running simultaneously, and the recruiter capacity problem becomes obvious: it is structurally impossible to give every candidate the evaluation they deserve. Not because the recruiters aren't good at their jobs. Because the math simply doesn't work.

AI-powered resume screening can reduce initial screening time by up to 75%. Around 67% of hiring managers cite time savings as the single biggest advantage of AI in recruitment. These aren't projections from a vendor brochure — they're outcomes reported by organisations that have implemented even basic AI-assisted screening. Having worked inside large-scale BPO hiring operations, I saw impact of that order firsthand. Resume parsing and basic AI scoring didn't just save recruiter hours; it changed what recruiters were able to do with their time. Instead of triaging volume, they were doing actual talent evaluation.

But working that close to the problem gave me an uncomfortable feeling that has never left me: we were only scratching the surface. Basic parsing and keyword-based scoring is better than nothing. It is nowhere near what's possible. A resume is a compressed record of a career — years of decisions, pivots, promotions, lateral moves, skill accumulation, industry transitions. Keyword matching reads almost none of that. It sees the words. It misses the story.

Keyword matching is not intelligence. Applicant Tracking Systems track. They don't think. That gap is where Intelletto.ai begins.

The Platform

Intelletto.ai is an AI-native candidate intelligence platform. At its core, it does three things that incumbents cannot:

The technical stack is not timid. The backend is a FastAPI/Python platform with hundreds of documented endpoints, backed by Google Cloud SQL PostgreSQL (295 tables in production), Google Cloud Storage for document management, Google Document AI for parsing, and Gemini as the primary inference layer. Pipeline orchestration runs on Netflix Conductor with supervised workers, a watchdog, and per-stage advisory locks. Infrastructure is managed in Terraform across GCP's asia-southeast1 region.

[Figure: Intelletto.ai system architecture. Components shown: Resume Upload (GCS), 12-Stage Pipeline Orchestrator, Inference Artifacts, Document AI (OCR), Gemini LLM (Extraction), Scoring Engine, Candidate Scorecard. Platform layer: FastAPI (200+ endpoints) · PostgreSQL (295 tables) · RBAC v1 (12 permissions, 6 system roles). Pipeline modes: Mode A (Pool Prep) | Mode B (Pool Activate) | Mode C (Full Auto) | Mode D (Bulk Import). Output: Sealed Audit Packet · SHA-chain · Signed Bundle · Verifier CLI.]

At a glance: 12 pipeline stages per resume · 295 production database tables · 200+ API endpoints · 4 pipeline modes · 9 months from first commit
02 The Tool

What Claude Code Actually Is

Most people who haven't used Claude Code think it's a smarter autocomplete. It is not. Claude Code is Anthropic's agentic coding tool — a command-line interface that gives Claude direct, persistent access to your codebase, your terminal, your database connections, and your file system. It reads files. It writes files. It runs commands. It reasons across your entire project context and executes multi-step plans.

The distinction from tools like GitHub Copilot or even ChatGPT with code execution is architectural: Claude Code operates with agentic persistence. You don't paste a function in and wait for a suggestion. You describe an objective — "refactor the scoring engine to use typed protocol interfaces with per-stage artifact contracts" — and it navigates the codebase, identifies every relevant file, proposes and executes the changes, and verifies the result. It is the closest thing that currently exists to a senior engineer who is also infinitely patient and works at the speed of thought.

What Claude Code Can Do

Read, write, and refactor across an entire codebase with full context of all interdependencies. Execute shell commands, run tests, interact with databases, manage git history, and chain complex multi-file changes as a single coordinated operation.

Produce production-grade code with proper typing, error handling, observability hooks, and documentation — not prototype-grade sketches that need extensive cleanup.

Operate from a governance file (CLAUDE.md) that constrains its behaviour, enforces conventions, and establishes hard rules — making it a collaborator that respects the architecture you've established, not one that free-lances around it.

What makes it genuinely different for a solo founder-CTO is the compression of cognitive overhead. The mental tax of holding an entire system architecture in your head while writing implementation details is one of the core bottlenecks of solo technical work. Claude Code carries that architectural context. I can think at the strategy layer while it executes at the implementation layer.

That said — and this is critical — it is not magic. It requires disciplined collaboration. The difference between Claude Code as a force multiplier and Claude Code as a liability is almost entirely a function of how well you govern it.

03 The Trials

The Good, the Bad, and the Genuinely Ugly

I want to be honest here, because most AI development content reads like a vendor brochure. The reality of building a production system with an AI coding partner involves real failures, real wasted time, and some genuinely alarming moments. It also involves breakthroughs that would have taken weeks by conventional means.

The Good

The moments that made me a believer were when Claude Code demonstrated something I can only describe as systems thinking. Early in the project, I needed to design a formal stage contract for the pipeline — a typed Python Protocol interface that every one of the 12 stages would implement, with strict input/output artifact typing, failure classification between TRANSIENT and PERMANENT errors, and per-stage observability hooks.

I described the architectural intent. Within a single session, we had a complete Protocol definition, a base class implementation, an orchestrator that respected stage independence, a telemetry event schema with a PostgreSQL migration, and stub implementations for all 12 stages that compiled and type-checked. That work represents at minimum a week of focused senior engineering effort. We did it in an afternoon.
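To make the idea of a stage contract concrete, here is a minimal sketch of what a typed Protocol with TRANSIENT/PERMANENT failure classification could look like. It is not the production code: the names Artifact, StageError, FailureKind, and TextExtractionStage are hypothetical, and the real contract also carries observability hooks that are left out here.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Protocol, runtime_checkable


class FailureKind(Enum):
    TRANSIENT = "transient"   # worth retrying (timeouts, rate limits)
    PERMANENT = "permanent"   # retrying will not help (corrupt input)


class StageError(Exception):
    """Raised by a stage; carries the failure classification."""

    def __init__(self, message: str, kind: FailureKind) -> None:
        super().__init__(message)
        self.kind = kind


@dataclass(frozen=True)
class Artifact:
    """Hypothetical typed envelope passed between stages by the orchestrator."""
    name: str
    payload: dict[str, Any] = field(default_factory=dict)


@runtime_checkable
class Stage(Protocol):
    """The contract every stage satisfies: a name and a run method."""
    name: str

    def run(self, artifact: Artifact) -> Artifact: ...


class TextExtractionStage:
    """Illustrative stage; it knows nothing about any other stage."""
    name = "text_extraction"

    def run(self, artifact: Artifact) -> Artifact:
        raw = artifact.payload.get("raw_bytes")
        if not raw:
            # An empty document will never succeed on retry.
            raise StageError("empty document", FailureKind.PERMANENT)
        return Artifact("extracted_text", {"text": raw.decode("utf-8")})
```

Because the contract is structural, a stage never has to inherit from anything; it only has to match the shape, which keeps stubs and real implementations interchangeable under a type checker.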

Similarly, the multi-tenant RBAC model — a six-role hierarchy spanning Platform Admin, EOR Admin, EOR Recruiter, Customer Admin, Customer Recruiter, and Read-Only, with JWT payloads carrying dynamic tenant context — was architecturally defined and scaffolded across the FastAPI route layer in a session that would have taken a team of two engineers several days to produce at the same quality.

The moments that made me a believer were when Claude Code demonstrated genuine architectural reasoning — not code generation, but systems thinking that matched my intent.

The Bad

Here is where I'll be uncomfortably honest with you. Early in the project, I discovered that my working codebase had only a single git commit ("Initial commit") and that nothing had been pushed to the remote repository. Every time I started a new Claude Code session, it was operating on whatever state existed on disk, with no reliable baseline. Because I hadn't established a disciplined push cadence, Claude Code was on more than one occasion working from incomplete context, and the results showed it.

This is entirely my failure, not the tool's. But it illustrates something important: Claude Code amplifies your engineering discipline as much as it amplifies your velocity. Poor git hygiene in a traditional team produces confusion. With an agentic AI tool, it produces sessions where you don't immediately realise the context is corrupted.

The fix was straightforward once identified — commit everything, push to GitHub, establish a cadence — but the time lost before that diagnosis was real and avoidable.

The Ugly

The ugliest moments were the P0 bug discoveries. After the pipeline had been running for some time, a systematic audit revealed three critical failures that had been silently executing incorrectly for every resume processed:

P0 Bugs Discovered in Production Audit

Gate B always returning true: The conditional gate that determines whether a resume passes quality thresholds was unconditionally returning True — meaning every resume advanced regardless of quality signals.

Evidence validation checking list presence, not content: The validator confirmed that evidence arrays existed but never inspected whether they contained valid data. Empty arrays passed validation silently.

Scoring using placeholder arithmetic: A critical scoring stage was computing results using placeholder addition rather than the defined rubric formula. Every score produced was mathematically incorrect.
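The evidence-validation failure is easy to reproduce in miniature. The sketch below contrasts a presence-only check with a content check; the function names and the "quote" field are hypothetical, chosen only to illustrate the shape of the bug.

```python
def validate_presence_only(evidence):
    # Buggy shape: an empty list is still a list, so this passes silently.
    return isinstance(evidence, list)


def validate_content(evidence):
    # Stricter shape: at least one item, and every item has a non-blank quote.
    if not isinstance(evidence, list) or not evidence:
        return False
    return all(
        isinstance(item, dict) and str(item.get("quote", "")).strip()
        for item in evidence
    )
```

The first function is the kind of check that looks plausible in review and passes every happy-path test, which is why it survived as long as it did.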

These bugs did not originate entirely from Claude Code — they were the product of iterative scaffolding where stubs got promoted to production without sufficient scrutiny. But they highlight the governance challenge: when an AI can generate plausible-looking code at high velocity, you need systematic verification processes that match that velocity. We implemented these — formal acceptance criteria, explicit stage contracts, structured audit protocols — but we should have established them earlier.

The lesson I took from this is that the CLAUDE.md governance file is not optional. It is as important as any other architectural document in the project.

The two gate bugs have since been resolved. Gate B now separates blocking from advisory schema errors and fails the pipeline on the former; Gate C verifies per-section fact coverage with a documented density floor. The third P0, the bucket dispatch / rubric persistence problem, has been substantially closed and is tracked in the public spec. Every fix landed with new regression tests in the suite. What turned the corner wasn't a single change but a pattern: every time we found a silent-failure mode, the next step was to make that mode loud, with typed contracts, gate verdicts persisted to the database, and audit packets that surface "evidence missing" instead of swallowing it. The system got more honest as it got more sophisticated, which is not the usual direction software travels.
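As a rough illustration of a gate that separates blocking from advisory errors, consider the sketch below. The SchemaIssue and GateVerdict types are hypothetical, and persisting the verdict to the database is left out; the point is only that the verdict is computed from the errors rather than hard-coded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaIssue:
    path: str        # where in the extracted document the issue was found
    message: str
    blocking: bool   # True: the pipeline must stop; False: advisory only


@dataclass(frozen=True)
class GateVerdict:
    passed: bool
    blocking: tuple[SchemaIssue, ...]
    advisory: tuple[SchemaIssue, ...]


def evaluate_gate(issues: list[SchemaIssue]) -> GateVerdict:
    """Pass only when no blocking issue exists; keep advisories visible."""
    blocking = tuple(i for i in issues if i.blocking)
    advisory = tuple(i for i in issues if not i.blocking)
    return GateVerdict(passed=not blocking, blocking=blocking, advisory=advisory)
```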

# CLAUDE.md — v3.0 (excerpt)
# This file governs ALL Claude Code behaviour in this repository.

HARD RULES — NO EXCEPTIONS:
  git: Never force push. Commit descriptively. Push after every session.
  database: Never DROP, TRUNCATE or ALTER production tables without explicit confirmation.
  pipeline: No stage may import or reference another stage. Orchestrator authority only.
  scoring: Never use placeholder arithmetic. Rubric formula is canonical.
  stubs: All stub implementations must be flagged TODO:STUB. Never ship stubs silently.

DIAGNOSTIC PROTOCOL:
  Diagnose before fixing. State the root cause before writing any code.
  Confirm destructive operations. No silent migrations.
// The governance contract between founder and AI collaborator
04 What Was Built

The Final Product — A Detailed Look

Let me give you a complete picture of what actually exists in production, because I think the aggregate is more impressive than any individual component.

The Intelligence Pipeline

The 12-stage pipeline is the intellectual core of Intelletto. Each stage is an independent, typed module implementing a formal Python Protocol contract. The orchestrator is the only entity with authority to sequence stages, pass artifacts between them, handle failures, and emit telemetry. No stage has knowledge of any other stage — this isn't just good engineering hygiene, it's a deliberate architectural constraint that makes the system testable, observable, and safe to evolve.
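The orchestrator-only authority rule can be sketched in a few lines. This is an assumption-laden illustration rather than the Conductor-based production orchestrator: run_pipeline, the (name, callable) stage list, the retry limit, and the event dictionaries are all hypothetical.

```python
from enum import Enum


class FailureKind(Enum):
    TRANSIENT = "transient"
    PERMANENT = "permanent"


class StageError(Exception):
    def __init__(self, message, kind):
        super().__init__(message)
        self.kind = kind


def run_pipeline(stages, artifact, max_retries=2, emit=print):
    """Run stages in order. Only this function sequences stages,
    hands artifacts from one to the next, retries, and emits telemetry.

    stages: ordered list of (name, callable) pairs; each callable takes
    the current artifact and returns the next one.
    """
    for name, stage_fn in stages:
        attempt = 0
        while True:
            try:
                artifact = stage_fn(artifact)
                emit({"stage": name, "status": "ok", "attempt": attempt})
                break
            except StageError as exc:
                retryable = (
                    exc.kind is FailureKind.TRANSIENT and attempt < max_retries
                )
                status = "retry" if retryable else "failed"
                emit({"stage": name, "status": status, "attempt": attempt})
                if not retryable:
                    raise  # permanent failure or retries exhausted
                attempt += 1
    return artifact
```

Since stages are plain callables that receive and return artifacts, each one can be unit-tested in isolation, and the retry policy lives in exactly one place.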

Stage outputs include: raw text extraction, entity normalisation, career segment identification, skills extraction with 4-signal proficiency scoring (years of experience, recency decay, role frequency, seniority context), career trajectory classification with named flags — RAPID_ASCENT, STAGNATION, LATE_BLOOM, REGRESSION — stability risk scoring, role-switch risk modelling, and a final rubric-based scorecard with explainable component weights.

The Scoring Engine

The scoring engine is not a black box. Every candidate score is the product of a documented rubric formula with weighted components. Each component is individually auditable. The system produces a structured scorecard — not just a number — that includes skills match depth, experience alignment, trajectory assessment, and risk-adjusted candidacy signals.

The skills proficiency module operates on a 4-signal composite: raw years of exposure, recency decay (skills not exercised in recent roles score lower), role frequency (how consistently a skill appeared across positions), and seniority context (a skill claimed at executive level carries more weight than the same skill at entry level). This is the kind of nuance that keyword-matching systems cannot produce.
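One way a 4-signal composite like this could be expressed is shown below. The weights, the ten-year exposure cap, the three-year half-life, and the 0-to-1 seniority scale are illustrative assumptions of mine, not the platform's rubric values.

```python
def recency_decay(years_since_last_use, half_life_years=3.0):
    """Exponential decay: a skill unused for one half-life scores 0.5."""
    return 0.5 ** (years_since_last_use / half_life_years)


def proficiency(
    years,                  # raw years of exposure
    years_since_last_use,   # feeds recency decay
    roles_with_skill,       # how many roles featured the skill
    total_roles,            # total roles on the resume
    seniority_weight,       # 0.0 (entry level) to 1.0 (executive), assumed scale
    weights=(0.35, 0.25, 0.20, 0.20),  # hypothetical signal weights
):
    exposure = min(years / 10.0, 1.0)  # cap exposure at ten years
    recency = recency_decay(years_since_last_use)
    frequency = roles_with_skill / total_roles if total_roles else 0.0
    signals = (exposure, recency, frequency, seniority_weight)
    return sum(w * s for w, s in zip(weights, signals))
```

With these assumed values, the same five years of exposure scores lower when the skill was last used six years ago than when it is current, which is exactly the distinction a keyword match cannot draw.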

The GCS Pipeline Architecture

The Google Cloud Storage architecture uses path-based mode detection to route resume documents through the correct pipeline configuration. The RESUME-POOL/ prefix triggers Mode A (pool build). A job-code path segment triggers Mode B (pool activation). A direct upload with an attached job requisition triggers Mode C. The design keeps the pipeline stateless from a trigger perspective — intent is encoded in the file path, not in application state.
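Path-based mode detection might look something like the following sketch. Only the RESUME-POOL/ prefix comes from the design described above; the JOB-1234 job-code pattern and the REQUISITIONS/ prefix for Mode C are hypothetical stand-ins, since the real path conventions are not spelled out here.

```python
import re
from enum import Enum


class Mode(Enum):
    A = "pool_prep"
    B = "pool_activate"
    C = "full_auto"


POOL_PREFIX = "RESUME-POOL/"
REQUISITION_PREFIX = "REQUISITIONS/"     # hypothetical Mode C prefix
JOB_CODE = re.compile(r"^JOB-\d+$")      # hypothetical job-code format


def detect_mode(object_path: str) -> Mode:
    """Derive pipeline mode from the object path alone (no application state)."""
    if object_path.startswith(POOL_PREFIX):
        return Mode.A
    if object_path.startswith(REQUISITION_PREFIX):
        return Mode.C
    # Inspect directory segments only; the last segment is the file name.
    segments = object_path.split("/")[:-1]
    if any(JOB_CODE.match(segment) for segment in segments):
        return Mode.B
    raise ValueError(f"cannot determine pipeline mode for {object_path!r}")
```

Raising on an unrecognised path, rather than defaulting to a mode, keeps a misplaced upload from being silently processed under the wrong configuration.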

The Multi-Tenant Data Model

The database schema (169 tables in the original build, 295 in production today) is designed around a three-tier tenancy hierarchy: EOR at the top, Customer in the middle, Employee/Candidate at the leaf level. Every JWT payload carries eor_id unconditionally, with customer_id populated conditionally based on the authenticated role. Tenancy is enforced at the data layer, not just the application layer.
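The claim-building rule can be sketched without any JWT library, since the interesting part is which fields are present. In this illustration the "sub" and "role" claim names, the lowercase role identifiers, and the choice of which roles count as customer-scoped are assumptions; only eor_id and customer_id come from the design described above.

```python
# Assumed: these two roles are the customer-scoped ones.
CUSTOMER_SCOPED_ROLES = {"customer_admin", "customer_recruiter"}


def build_claims(user_id, role, eor_id, customer_id=None):
    """Build the tenant-context portion of a token payload.

    eor_id is always required; customer_id is included only for
    customer-scoped roles, and is then mandatory.
    """
    if not eor_id:
        raise ValueError("eor_id is required for every token")
    claims = {"sub": user_id, "role": role, "eor_id": eor_id}
    if role in CUSTOMER_SCOPED_ROLES:
        if not customer_id:
            raise ValueError(f"role {role!r} requires a customer_id")
        claims["customer_id"] = customer_id
    return claims
```

Leaving customer_id out entirely for EOR-level roles, instead of setting it to null, means downstream code cannot mistake an absent tenant for a valid one.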

The Platform Interface

The frontend is built on a consistent design system with a permanent two-column navigation pattern: a dark icon strip paired with a white item panel. The JD Orchestrator provides a real-time interface for job descriptor management. A mission-control style monitoring dashboard provides per-stage observability into the pipeline telemetry system, with event-level visibility into every resume as it moves through all 12 stages.

Infrastructure and Deployment

The platform runs on Google Cloud Platform managed entirely through Terraform. Cloud SQL PostgreSQL in asia-southeast1. GCS buckets for resume intake, inference artifact storage, and Terraform state. The FastAPI application is containerised with Docker multi-stage builds for production efficiency. All infrastructure is version-controlled, reproducible, and auditable.

The Surfaces Built Since

The original cut of this piece talked about the pipeline, the scoring engine, the RBAC model, and the infrastructure. What follows is everything that's joined them in production since — because the rate at which a solo founder + AI can extend a platform is, I think, the part of this story that's easy to underestimate until you see it written out.

And then — this week — the part of the story I most want to write about.

The Auditor Validation System

Most AI hiring tools talk about audit. They mean: "the database has timestamps and you can export a CSV." What an actual auditor — an external regulator, a fairness firm, a tenant's compliance team — needs is something different. They need to be able to walk every claim on a candidate dashboard back to a source byte they can verify with their own eyes, and re-run the scoring on their own machine and confirm they get the same number. That bar is much higher than "we logged it."

The system that ships this is built out of pieces that were already in place — the SHA-stamped pipeline, the immutable scoring-input snapshot, the per-fact evidence spans, the rescore history — but it ties them together into a single artefact the auditor can hold in their hand. The deliverable is a fourteen-page Audit Report attached to every sealed scorecard.

What's in the Audit Report

Cover sheet: the candidate, the JD they were scored against, the final score with band, the seal timestamp, the packet's SHA-256 fingerprint, and the signing-key fingerprint.

Executive summary: the engine's plain-language verdict, top strengths and gaps with rationale, red flags with the modifier points they cost.

Bucket-by-bucket breakdown: all nine scoring buckets with weight, contribution, matched evidence quotes, matched-vs-missing token counts, and cert-implied source attribution. Empty buckets are explicitly marked EVIDENCE MISSING — claim suppressed, not silently omitted.

Modifier trace: every modifier with the points it awarded and its rationale, plus the net impact on the final score.

Provenance chain: the source PDF SHA, the page-map SHA, the cleaned-text SHA, the snapshot SHA, the pipeline run ID, the twelve stage outcomes with timing, the gate verdicts, the rescore history.

Methodology and verify-yourself instructions: step-by-step, including the command line for the open-source verifier CLI.

Recruiters open the report inline in a browser tab. Auditors download a signed bundle — the PDF, the canonical machine-readable JSON, a detached cryptographic signature, a README — and run intelletto-verify packet.zip on their own machine. The verifier independently checks the signature against our published public key, recomputes every SHA in the chain, and re-runs the deterministic scoring routine on the immutable input snapshot. The output is a single line: VERIFIED, or VERIFICATION FAILED at a specific node. That's the bar. Not "trust our signature" — re-run the scoring yourself and see you get the same number.
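The core of that verification loop, recompute every hash and re-run the scoring, fits in a short sketch. This is not the intelletto-verify source: the packet layout, the node names, and the weighted-sum score function are hypothetical stand-ins, and the signature check against the published public key is deliberately omitted.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def canonical_json(obj) -> bytes:
    """Stable byte form: sorted keys, no insignificant whitespace."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def score(snapshot: dict) -> float:
    """Stand-in deterministic scoring routine: a weighted sum of buckets."""
    return round(sum(b["weight"] * b["value"] for b in snapshot["buckets"]), 4)


def verify(packet: dict, artefacts: dict) -> str:
    """Recompute each SHA in the chain, then re-run scoring on the snapshot."""
    for node, expected in packet["sha_chain"].items():
        if node not in artefacts or sha256_hex(artefacts[node]) != expected:
            return f"VERIFICATION FAILED at {node}"
    snapshot = json.loads(artefacts["snapshot"])
    if score(snapshot) != packet["final_score"]:
        return "VERIFICATION FAILED at score"
    return "VERIFIED"
```

Hashing a canonical JSON form matters here: two semantically identical snapshots with different key order would otherwise produce different fingerprints and a false failure.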

The reason this matters in the context of this piece — and I want to be precise here — is that it is the strongest possible answer to the question every AI-assisted system has to eventually face: how do you know it wasn't fabricated? The Audit Report is the system producing its own evidence, on demand, in a form an auditor can verify without taking anyone's word for anything. Every architectural choice that came before — the per-fact source attribution, the immutable snapshot, the SHA-stamped chain, the gate framework, the rubric versioning — was building toward the moment when that report could be assembled without compromise. This week it can.

The work itself was a tight loop. A spec written in an afternoon. A four-phase plan. Subagent-driven execution where Claude Code did most of the wiring with me steering on the architectural calls — what does each audience profile redact, where does the signature go in the canonical JSON, which fields are pinned by the SHA and which are cover-sheet metadata. The first draft of the report PDF was thin — one page with empty placeholders — and the user feedback (very direct) was "this makes no sense to any human." That was correct. Rewriting the template to actually deliver against the spec section, rewiring the export endpoint to call the rich builder, and re-running the regression suite took another session. The final report is 14 pages of substantive content backed by a 71 KB canonical JSON. Whether you trust the recommendation or not, you can verify the work.

05 Reality Check

What This Actually Means

I want to be precise here, because the productivity claims around AI-assisted development are frequently vague and self-serving. Let me give you concrete, grounded estimates based on my experience managing engineering teams at scale.

The Traditional Development Equivalent

The system described above represents a specific and estimable body of engineering work. Here is my honest assessment of what it would have required through conventional development:

Component | Traditional (3-person team) | With Claude Code (solo)
12-stage pipeline + contracts | 4–6 weeks | ~2 weeks
Scoring engine + rubric modules | 6–8 weeks | ~3 weeks
FastAPI scaffolding + 72 endpoints | 4–5 weeks | ~1.5 weeks
PostgreSQL schema (169 tables) | 6–8 weeks | ~3 weeks
Multi-tenant RBAC + JWT model | 3–4 weeks | ~1 week
GCS 3-mode pipeline architecture | 2–3 weeks | ~1 week
Frontend design system + interfaces | 5–6 weeks | ~2 weeks
Terraform + GCP infrastructure | 2–3 weeks | ~1 week
Monitoring, telemetry, observability | 3–4 weeks | ~1 week

Shipped since the original draft: Q4 2025 through Q2 2026

Component | Traditional (3-person team) | With Claude Code (solo)
Talent CRM (Bank · Pool · Connect · Segments · Campaigns · Referrals) | 10–14 weeks | ~4 weeks
Career site + builder + apply experience | 6–8 weeks | ~2.5 weeks
Interview Operations (plans, kits, conduct, decision, approval) | 8–10 weeks | ~3 weeks
Hire & Onboard (offers, generalised approval engine, templates) | 6–8 weeks | ~2 weeks
RBAC v1 (permissions catalogue, override semantics, admin UI) | 4–5 weeks | ~1.5 weeks
Conductor migration + reliability hardening | 4–6 weeks | ~2 weeks
AI surface area (JD Generator, one-liner, red flags, skill-gap, outreach, Bias Auditor) | 6–8 weeks | ~2.5 weeks
Auditor Validation System (Audit Report PDF + signed bundle + verifier CLI) | 5–7 weeks | ~1.5 weeks

Original-scope traditional total: 35–47 weeks with a team of 3 engineers. Extended-scope traditional total — what the platform actually is today: 84–113 weeks, which is roughly two years of a four-to-five-person team. At fully-loaded Manila market rates — conservative at $8,000–12,000 USD per month per senior engineer — that's a burn of $640,000–$1,400,000 USD for the team-and-time it would have taken through conventional development.
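The totals above can be checked by summing the table rows directly. The week ranges below are copied from the table; the 20-month/4-engineer and 24-month/5-engineer combinations are one reading that lands on or near the quoted burn range, offered as an assumption rather than as the exact calculation behind it.

```python
# (low, high) traditional estimates in weeks, copied from the table.
ORIGINAL = [(4, 6), (6, 8), (4, 5), (6, 8), (3, 4), (2, 3), (5, 6), (2, 3), (3, 4)]
EXTENDED = [(10, 14), (6, 8), (8, 10), (6, 8), (4, 5), (4, 6), (6, 8), (5, 7)]

# Solo-with-Claude-Code estimates in weeks, same row order.
ORIGINAL_SOLO = [2, 3, 1.5, 3, 1, 1, 2, 1, 1]
EXTENDED_SOLO = [4, 2.5, 3, 2, 1.5, 2, 2.5, 1.5]


def total_range(rows):
    """Sum the low and high ends of each estimate separately."""
    return sum(low for low, _ in rows), sum(high for _, high in rows)


def burn(months, engineers, monthly_rate_usd):
    """Fully loaded team cost for a given duration."""
    return months * engineers * monthly_rate_usd


original = total_range(ORIGINAL)              # (35, 47)
overall = total_range(ORIGINAL + EXTENDED)    # (84, 113)
solo_weeks = sum(ORIGINAL_SOLO) + sum(EXTENDED_SOLO)  # 34.5

low_burn = burn(20, 4, 8_000)     # 640,000 USD (assumed combination)
high_burn = burn(24, 5, 12_000)   # 1,440,000 USD (assumed combination)
```

The solo column sums to 34.5 working weeks, which sits comfortably inside nine months of elapsed time once debugging sessions and rework are counted.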

With Claude Code, one person built the equivalent in nine months of elapsed time, at the cost of a Claude subscription and GCP infrastructure. The capital efficiency is not incremental. It is categorical.

One person. Nine months. A fraction of the cost. And a system that is genuinely sophisticated, not a prototype dressed up in production language.

The Important Qualifications

I want to be careful not to oversell this, because the nuances matter for anyone considering a similar path.

What This Means for the Industry

I've built software teams in some of the largest technology organisations in the Philippines. I've managed engineers across multiple countries, negotiated vendor contracts, and overseen technology transformations at institutional scale. My perspective on what this moment represents is therefore not that of someone who has only ever worked alone.

What is happening right now is a structural shift in the production function of software. The minimum viable team for building a sophisticated SaaS platform is collapsing. Not because engineers are becoming less valuable — but because the leverage available to a skilled technical leader has increased by an order of magnitude.

This has profound implications for how early-stage ventures are funded, staffed, and evaluated. A solo technical founder with deep domain expertise and disciplined AI collaboration practices can now produce what previously required a seed-funded team. The "build it to prove it" phase — the most capital-intensive and time-intensive phase of any startup — just got dramatically shorter and cheaper.

For enterprise technology leaders — CIOs, CTOs, VPs of Engineering — the implication is different but equally significant: the teams you are managing need to evolve toward this mode of working, not resist it. The engineer who is 10x effective with AI tools is not a curiosity. They are the new baseline expectation. Your job as a technology executive is to build the governance frameworks, the quality assurance practices, and the architectural discipline that makes AI-augmented development trustworthy at scale.

The Honest Summary

Intelletto.ai is a production-ready, multi-tenant, AI-powered hiring intelligence platform built by one person in nine months using Claude Code as a collaborative engineering partner. It is sophisticated. It is deployed. It is real.

The journey was not frictionless. There were bugs, wasted sessions, governance lessons learned the hard way, and moments of genuine frustration. There were also moments of remarkable productivity that I could not have achieved by any other means available to a solo founder.

The conclusion I've reached is simple: the tools have changed the game. Whether you're a founder, a CTO, or a senior engineer, the question is no longer whether to engage with this mode of working. It's how to govern it well enough to trust it with production systems.

I started this piece with a logistics ERP written in C, built by someone with dyslexia who had no business pulling it off. The throughline from that system to Intelletto.ai is the same one it's always been: technology is leverage. What's changed is the magnitude of that leverage. And what stays the same — what will always stay the same — is that the person holding it still has to know what they're building, and why.

There's a tidy symmetry to the Audit Report being the last thing we shipped before publishing this. The article is fundamentally an argument for accountability — that solo founders building serious systems with AI should be expected to govern their work in a way that's trustworthy at scale. The Audit Report is the system literally producing a trustworthy artefact of its own work, on demand, in a form an auditor can verify without trusting us. If I'm going to make the claim that one founder and one AI can build something a regulator would be willing to evaluate, the system needs to make the same claim about itself — and prove it. Now it does.

If you're building something ambitious, I'd encourage you to find out what's now possible. The barrier to production-grade sophistication has never been lower. The discipline required to use that capability responsibly has never been more important. And the bar for what counts as "trust me, it works" has, for systems that matter, risen to "here is the signed report — verify it yourself."