Debug Log

Scaling for Ghosts: 7 Microservices, 47 Users, and the Trap of Resume-Driven Development

April 25, 2026 · 14:40 · Debug Log

This episode explores "Resume-Driven Development" through the case of an engineer at a pre-seed startup who built an enterprise-grade distributed system designed for 100,000 users when the product had only 47. It highlights how engineers may prioritize resume-boosting infrastructure over a startup's actual needs, at significant financial and human capital cost. Listeners will learn about the dangers of over-engineering and the misalignment of incentives in early-stage tech development.

Detailed Report

The Case of the Over-Engineered Startup

A seasoned engineer with eight years of experience in high-scale systems joined a pre-seed startup building a project management tool for remote teams. Given total autonomy, the engineer constructed a full-blown, enterprise-grade distributed system. The setup included seven distinct microservices, a Kubernetes cluster on AWS, Kong API Gateway, Jenkins CI/CD, a PostgreSQL database prepared for sharding, a 3-node sharded Redis cluster, Kafka for event-driven messaging, Prometheus, Grafana, and an ELK stack, complete with circuit breakers and exponential backoff.

Remarkably, this system, designed for 100,000 users, was built for a startup that, after three months of development and a month in beta, had only 47 total users. The author, writing anonymously under the pseudonym "Production Systems" in *Stackademic*, candidly described it as a "crime scene" of over-engineering, built for "ghosts" rather than actual customers.

The Trap of Resume-Driven Development

This incident is not an isolated case but exemplifies a broader behavioral phenomenon: Resume-Driven Development. The core thesis is that the software engineering job market heavily rewards familiarity with complex, distributed systems. An engineer with "Deployed and maintained a 7-service Kubernetes cluster with Kafka event streaming" on their resume commands a significantly higher market premium than someone who simply "Maintained a single Spring Boot monolith."

The engineer themselves confessed, "I had 8 years of experience at companies with scale problems... I knew how to build systems that survive Black Friday." They were not optimizing for the startup's immediate needs but for their own perceived career trajectory, effectively using the pre-seed startup as a paid sandbox. This creates a fundamental conflict: the company needs rapid iteration and user acquisition, while the engineer is incentivized to build complex infrastructure that looks impressive on a resume, regardless of its actual utility to the business.

Tangible Costs of Over-Engineering

The consequences of building for "ghosts" were substantial.

Financial and Operational Drain

During the fourth month, this hyper-scalable system designed for 100,000 users was handling extremely low traffic. The database had minimal active connections, the Kafka cluster processed very few messages, and the Redis cache hit rate was negligible. It was, as described, a Rolls-Royce engine idling in a traffic jam.

For 47 users, the monthly AWS bill represented a significant cost per user, per month, purely in cloud infrastructure. For a pre-seed startup with limited runway, this was a critical drain on resources.
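The per-user figure is simple division, and it is worth computing explicitly. The sketch below shows the arithmetic only: the 47 users come from the case study, while the bill amount is a made-up placeholder, since the actual figure is not stated here.

```python
def cost_per_user(monthly_bill_usd: float, active_users: int) -> float:
    """Return monthly infrastructure spend divided by active users."""
    if active_users <= 0:
        raise ValueError("active_users must be positive")
    return monthly_bill_usd / active_users


# ASSUMPTION: placeholder bill for illustration; not the real AWS bill.
HYPOTHETICAL_MONTHLY_BILL_USD = 940.0
USERS = 47  # from the case study

per_user = cost_per_user(HYPOTHETICAL_MONTHLY_BILL_USD, USERS)
print(f"${per_user:.2f} per user per month")
```

Substituting the real bill and user count gives a number that can be compared directly against what a user pays, or is expected to pay, each month.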

Human Capital Cost

The most chilling metric was the human capital cost. The engineer admitted to spending more than half their work week debugging infrastructure—wrestling with Kubernetes and Kafka—rather than building features or engaging with users.

In Month 5, the founder confronted the engineer, asking, "We need a frontend engineer. Can you handle backend alone?" The engineer's response: "Uh… maybe? I'm spending most of my time on infrastructure." The founder reasonably pushed back: "Why? We have 47 users. Serious question. What are you working on?" The founder's perspective was clear: "We need features to get users. Not infrastructure for users we don't have." This highlights that a startup's most precious resource is developer hours, and burning that time on unnecessary complexity is akin to suffocating the company.

Cargo-Culting and Premature Optimization

The report also points to the "cargo-culting" of buzzwords. Kafka, for instance, was developed by LinkedIn to handle trillions of messages per day. Here, it was used to manage to-do list updates for 47 people, generating very few messages per hour—the architectural equivalent of using a commercial jetliner to cross the street.

This is part of a broader industry culture susceptible to "cargo-cult programming," where developers mimic the outward forms of successful systems without understanding *why* those systems were built that way. Tech vendors and cloud providers aggressively market complex architectures, and startup engineers often prematurely adopt solutions detailed in hyperscaler engineering blogs (the "Netflix/Uber effect"). However, Netflix adopted microservices to solve organizational scaling problems with thousands of engineers, not just technical traffic problems. As the report aptly states, "If you have fewer users than can fit on a standard city bus, you do not have a distributed systems problem. You have a customer acquisition problem."

The Path to Simplicity: A Successful Rewrite

Following the confrontation and a moment of clarity, the engineer pitched a total rewrite. In just three weeks during Month 6, they merged all seven microservices into a single Spring Boot application with a PostgreSQL database. Kafka was replaced with simple PostgreSQL queues, Redis was removed entirely, and Kubernetes was deleted. The entire simplified system was deployed on Railway, a Platform-as-a-Service.
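To make "simple PostgreSQL queues" concrete, here is a minimal sketch of a table-backed job queue. It is not the engineer's actual implementation, which the source does not show. So that the example runs without a database server, it uses Python's built-in `sqlite3` as a stand-in for PostgreSQL. The table name `jobs`, its columns, and the function names are all hypothetical.

```python
import sqlite3
from typing import Optional, Tuple

# For reference only: with several concurrent workers on PostgreSQL, a common
# way to claim a row is
#   SELECT id, payload FROM jobs WHERE status = 'pending'
#   ORDER BY id LIMIT 1 FOR UPDATE SKIP LOCKED;
# SQLite has no SKIP LOCKED, so this single-process sketch relies on a plain
# transaction instead.


def create_queue(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) jobs table if it does not exist."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " payload TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending')"
    )
    conn.commit()


def enqueue(conn: sqlite3.Connection, payload: str) -> int:
    """Insert a pending job and return its id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))
    return cur.lastrowid


def claim_next(conn: sqlite3.Connection) -> Optional[Tuple[int, str]]:
    """Mark the oldest pending job as running and return (id, payload)."""
    with conn:
        row = conn.execute(
            "SELECT id, payload FROM jobs"
            " WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        conn.execute("UPDATE jobs SET status = 'running' WHERE id = ?", (row[0],))
    return (row[0], row[1])


def complete(conn: sqlite3.Connection, job_id: int) -> None:
    """Mark a claimed job as done."""
    with conn:
        conn.execute("UPDATE jobs SET status = 'done' WHERE id = ?", (job_id,))


conn = sqlite3.connect(":memory:")
create_queue(conn)
enqueue(conn, "todo-updated:42")
job = claim_next(conn)
if job is not None:
    complete(conn, job[0])
```

At a few messages per hour, a table like this sits inside the database the application already runs, so there is no separate broker to deploy or monitor.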

The results were staggering: the monthly AWS bill fell sharply, deploy times and debugging time both dropped, and, most critically for a startup, feature velocity doubled. The author's conclusion was that "Constraints Are Gifts. A monolith forces you to keep things simple. A single database forces you to think about data design. Constraints force good decisions."

Beyond the Monolith: Nuance and Trade-offs

While the outcome was positive, the idea that "a monolith forces good decisions" is a convenient and arguably simplistic takeaway. A sloppy developer can build a disastrous, tightly-coupled, unmaintainable "Big Ball of Mud" monolith just as easily as they can build messy microservices. The monolith simply limits the *blast radius* of bad decisions by removing network boundaries, latency, and serialization pain; it doesn't magically cure poor design principles.

This critique aligns with a broader industry trend of high-profile companies abandoning microservices when operational overhead outweighs benefits. Martin Fowler, a renowned software architect, has long advocated for a "Monolith First" strategy, noting that most successful microservice stories started with a monolith that grew too big. Our case study is a textbook example of the "serious trouble" that can arise from building a microservice system from scratch.

Even Amazon Prime Video, a poster child for cloud microservices, saw its Video Quality Analysis team move from a distributed, serverless microservices architecture back to a single-process monolith, resulting in a 90% reduction in infrastructure costs and massively improved scaling. Similarly, Segment, a customer data platform, consolidated over 150 microservices into a "distributed monolith" due to the operational overhead of updating shared libraries. Kelsey Hightower, a prominent Kubernetes advocate, has also stated, "Microservices will not fix a bad Monolith." These examples reinforce that architectural choices are trade-offs, not universal truths, and a "monolith-first" approach is often pragmatic, deferring complexity until absolutely necessary.

The User's Perspective and Future Heuristics

Perhaps the most poignant realization from the engineer's story is what the actual users experienced throughout this architectural odyssey. The 47 users noticed "zero difference" between the complex architecture and its simplified replacement. They never saw the seven microservices, Kubernetes, or Kafka, and they never cared about the architecture. Users subscribe to a product for a problem solved, not for its impressive Kubernetes manifest.

The engineer's eventual heuristic, after this painful lesson, was to design for "Now + One Order of Magnitude." If you have 50 users, design for 500, not 100,000. This is a simple, practical rule for scaling responsibly—building just enough to meet current and immediate future needs, rather than chasing hypothetical future scale.
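The heuristic can be written down as a one-line calculation. The sketch below is illustrative only; the function names and the `headroom_factor` parameter are assumptions, not part of the original postmortem.

```python
def design_target(current_users: int, headroom_factor: int = 10) -> int:
    """Capacity to design for: current load plus one order of magnitude."""
    if current_users < 0:
        raise ValueError("current_users must be non-negative")
    return current_users * headroom_factor


def overbuild_factor(designed_capacity: int, current_users: int) -> float:
    """How many times larger the designed capacity is than the heuristic target."""
    return designed_capacity / design_target(current_users)


print(design_target(50))  # 500, matching the example in the text
print(design_target(47))  # 470 for the startup in the case study
print(round(overbuild_factor(100_000, 47)))  # designed capacity relative to target
```

By this measure, a system designed for 100,000 users while serving 47 was built for a few hundred times the capacity the heuristic would have suggested.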

Ultimately, the engineering culture needs to shift its rewards system. We need to celebrate the delivery of value and problem-solving, not just the deployment of complex, buzzword-heavy infrastructure. The opportunity cost of an engineer spending a significant portion of their week wrestling with Kafka consumer lag instead of building features that would actually acquire customers is a hard but crucial number for engineering leaders to quantify.
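One rough way to start putting a number on that opportunity cost is to price the hours spent on infrastructure. The sketch below takes "more than half the work week" as a lower bound of 50%. The 40-hour week and the hourly cost are placeholder assumptions, not figures from the source.

```python
def weekly_infra_hours(hours_per_week: float, infra_fraction: float) -> float:
    """Hours per week spent on infrastructure rather than features."""
    if not 0.0 <= infra_fraction <= 1.0:
        raise ValueError("infra_fraction must be between 0 and 1")
    return hours_per_week * infra_fraction


def weekly_opportunity_cost(
    hours_per_week: float, infra_fraction: float, loaded_hourly_cost_usd: float
) -> float:
    """Salary cost of the hours diverted from feature work in one week."""
    return weekly_infra_hours(hours_per_week, infra_fraction) * loaded_hourly_cost_usd


# ASSUMPTIONS: 40-hour week and 100 USD/hour are placeholders for illustration.
hours_lost = weekly_infra_hours(40.0, 0.5)
cost = weekly_opportunity_cost(40.0, 0.5, 100.0)
print(f"{hours_lost:.0f} hours/week, ${cost:.0f}/week diverted from features")
```

This captures only the direct salary cost. The features that were never shipped, and the users they might have attracted, are the harder part to estimate.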

Show Notes

Works Referenced

  • I Spent 3 Months Building a Scalable Architecture. We Have 47 Users. Here's What I Should've Built Instead: The original anonymous postmortem by an engineer detailing their experience of over-engineering a pre-seed startup's product for hypothetical scale.
  • Kubernetes: An open-source system for automating deployment, scaling, and management of containerized applications, often prematurely adopted by early-stage startups.
  • Apache Kafka: A distributed streaming platform originally developed by LinkedIn for massive data processing, often misapplied to low-traffic scenarios.
  • Amazon Web Services (AWS): A comprehensive, broadly adopted cloud platform that can incur significant costs when infrastructure is over-provisioned for a startup's actual needs.
  • Railway: A developer platform (PaaS) that simplifies deployment, used by the engineer in the case study to host their refactored, simplified monolith.
  • Martin Fowler: A renowned software architect and author, known for advocating a 'Monolith First' strategy for most new software projects.
  • Scaling up the Prime Video audio/video monitoring service and reducing costs by 90%: A 2023 case study from Amazon Prime Video detailing their move from a distributed, serverless microservices architecture back to a single-process monolith, resulting in significant cost reductions.
  • Goodbye Microservices: From 100s of problem children to 1 superstar: A blog post by Segment detailing their journey of breaking a monolith into over 150 microservices, only to consolidate back into a 'distributed monolith' due to operational overhead.
  • Kelsey Hightower: A prominent advocate for Kubernetes and cloud-native technologies, who has also publicly stated that microservices do not fix a bad monolith and are often misapplied.
  • Cargo Cult Science: A concept introduced by physicist Richard Feynman, describing the practice of mimicking the outward forms of successful endeavors without understanding their underlying principles, applied here to software development.

Glossary

  • Microservices: An architectural style where an application is built as a collection of small, independent services that communicate with each other, each focusing on a specific business capability.
  • Kubernetes: An open-source system for automating the deployment, scaling, and management of containerized applications, often used for orchestrating microservices.
  • Kafka: A distributed streaming platform used for building real-time data pipelines and streaming applications, designed to handle high volumes of messages.
  • Monolith: A traditional software architecture where all components of an application are tightly coupled into a single, unified program.
  • CI/CD: (Continuous Integration/Continuous Delivery) A set of practices that enable rapid and reliable software releases through automation of building, testing, and deployment.
  • API Gateway: A server that acts as a single entry point for clients to access multiple backend services, often used in microservices architectures.
  • Sharding: A database partitioning technique that divides a large database into smaller, more manageable parts called shards, distributing data across multiple servers.
  • Event-Driven Architecture: A software design pattern where components communicate by producing and consuming events, allowing for loose coupling and asynchronous processing.
  • Product-Market Fit: The degree to which a product satisfies a strong market demand, indicating that it has found a viable customer base.
  • Runway: In a startup context, the amount of time a company can operate before running out of money, typically measured in months.
  • Resume-Driven Development: The practice of selecting technologies or architectural patterns primarily to enhance an engineer's resume and marketability, rather than to meet the actual needs of the project or company.
  • Cargo Cult Programming: The practice of including code or architectural patterns without fully understanding why they are needed, often mimicking examples from successful but vastly different projects or companies.

Full Transcript

Host: The engineer had eight years of experience building systems for "real scale problems." They joined a pre-seed startup, building a project management tool for remote teams, with total autonomy.
Expert: And what did they build? A full-blown, enterprise-grade distributed system. Seven distinct microservices, a Kubernetes cluster on AWS, Kong API Gateway, Jenkins CI/CD, PostgreSQL ready for sharding, a 3-node sharded Redis cluster, Kafka for event-driven messaging, Prometheus, Grafana, ELK stack, circuit breakers, exponential backoff. The full nine yards.
Host: For how many users?
Expert: Forty-seven. Total. At its peak, after three months of development and a month in beta. They launched, got 12 sign-ups on day one, and hit 47 users by day 30.
Host: Forty-seven users. And this full-blown system was designed for 100,000. It sounds less like engineering and more like an architectural hallucination.
Expert: Indeed. The author, writing anonymously under the pseudonym "Production Systems" in *Stackademic*, admits as much. They describe it as a "crime scene" of over-engineering, built for "ghosts" rather than actual customers.
Host: This isn't just an isolated incident of an engineer getting carried away. The report frames this as a deeply ingrained behavioral phenomenon: Resume-Driven Development.
Expert: That's right. The core thesis is that the software engineering job market heavily rewards familiarity with complex, distributed systems. An engineer with "Deployed and maintained a 7-service Kubernetes cluster with Kafka event streaming" on their resume commands a significantly higher market premium.
Host: As opposed to what? "Maintained a single Spring Boot monolith on Railway?" That's not exactly going to get you past the HR filter at a FAANG company, is it?
Expert: Precisely. The author themselves confessed, "I had 8 years of experience at companies with scale problems... I knew how to build systems that survive Black Friday." They weren't optimizing for the startup's needs; they were optimizing for their own perceived career trajectory, using the pre-seed startup as a paid sandbox.
Host: So, early-stage companies effectively become proving grounds for engineers to acquire marketable skills, even if those skills are completely irrelevant, or even detrimental, to the product's immediate survival. The incentive structure is clearly misaligned.
Expert: It's a fundamental conflict. The company needs to find product-market fit, which means rapid iteration, shipping features, and acquiring users. The engineer, perhaps subconsciously, is incentivized to build complex infrastructure that looks impressive on a resume, regardless of whether it actually helps the company achieve its goals.
Host: The cost of this misalignment must be substantial. Beyond the architectural bloat, what were the tangible consequences of building for these "ghosts"?
Expert: The source material provides some truly stark metrics. During Month 4, this hyper-scalable system designed for 100,000 users was handling extremely low traffic.
Host: Let that sink in. For a system built for Black Friday traffic, it was handling negligible load.
Expert: Exactly. The database had minimal active connections. The Kafka cluster, designed for massive event streaming, was processing very few messages. And the Redis cache hit rate was extremely low. It was a Rolls-Royce engine idling in a traffic jam.
Host: The absurdity is almost comedic. But beyond the technical metrics, there's the financial and human cost. The report details a substantial AWS bill for this setup.
Expert: For 47 users, this represented a significant cost per user, per month, purely in cloud infrastructure. For a pre-seed startup, that's not just a rounding error; it's a significant drain on their limited runway.
Host: But the really chilling number is the human capital cost. The engineer admitted to spending a significant portion of their work week debugging infrastructure. More than half their time, not building features, not talking to users, but wrestling with a Kubernetes cluster and a Kafka setup that had no business being there.
Expert: And this is where the "emotional climax" of the episode, as the research calls it, occurs. In Month 5, the founder finally confronted the engineer. The founder asked, "We need a frontend engineer. Can you handle backend alone?" The engineer's response: "Uh… maybe? I'm spending most of my time on infrastructure."
Host: And the founder, completely reasonably, pushed back, "Why? We have 47 users. Serious question. What are you working on?"
Expert: The engineer then listed off "Scaling the Kafka consumers, fixing Kubernetes networking, optimizing Redis..." and the founder cut them off: "For 47 users? We need features to get users. Not infrastructure for users we don't have."
Host: The founder's perspective is absolutely spot on. A startup's most precious resource is runway, directly correlated to developer hours. Burning that time playing sysadmin for an empty room is akin to suffocating the company. The complexity of these distributed systems inherently devours time through network latency debugging, serialization issues, and deployment orchestration.
Expert: It's a stark reminder that infrastructure is not a goal in itself; it's a means to an end. And if the means are consuming a significant portion of the primary developer's time for negligible load, it's actively working against business objectives.
Host: That founder's line, "We need features to get users. Not infrastructure for users we don't have," should be tattooed on the forehead of every early-stage startup engineer.
Expert: It certainly highlights the disconnect. And part of that disconnect stems from what the report calls the "cargo-culting" of buzzwords.
Host: Kafka was mentioned earlier, and it's a prime example. The source material notes Kafka was developed by LinkedIn to handle trillions of messages per day.
Expert: And the engineer was using it to manage to-do list updates for 47 people, generating very few messages per hour. That's the architectural equivalent of using a commercial jetliner to cross the street. It's a tool designed for intercontinental travel being used for a quick trip to the corner store.
Host: This isn't just about Kafka, though, is it? It's about a broader industry culture susceptible to "cargo-cult programming," where developers mimic the outward forms of successful systems without understanding *why* those systems were built that way. Richard Feynman's concept of cargo cult science, applied to software.
Expert: Exactly. Tech vendors and cloud providers aggressively market "event-driven architectures" and "serverless microservices" as the default modern standard. And then you have the "Netflix/Uber effect." Startup engineers read these hyperscaler engineering blogs, detailing solutions to massive scaling bottlenecks, and prematurely adopt those solutions.
Host: But Netflix adopted microservices to solve organizational scaling problems with thousands of engineers, not just technical traffic problems. It wasn't about the number of users at first; it was about managing independent teams.
Expert: A crucial distinction often overlooked. The report highlights this perfectly: "If you have fewer users than can fit on a standard city bus, you do not have a distributed systems problem. You have a customer acquisition problem." That should be another maxim for early-stage development.
Host: So, the engineer had this moment of clarity. The confrontation with the founder, the realization they were solving the wrong problem. What was the outcome? A total rewrite?
Expert: Yes, a total rewrite. In Month 6, the engineer pitched the founder on it, and in just three weeks, they merged all seven microservices into a single Spring Boot application with a PostgreSQL database. Kafka was replaced with simple PostgreSQL queues, Redis was removed entirely, and Kubernetes was deleted. They deployed it all on Railway, a PaaS.
Host: And the results of this dramatic simplification?
Expert: They were staggering. The monthly AWS bill plummeted significantly. Deploy times were drastically reduced. The debugging time dropped substantially. And, most critically for a startup, feature velocity doubled.
Host: So, a clear win for simplicity. The author's conclusion to their postmortem was that "Constraints Are Gifts. A monolith forces you to keep things simple. A single database forces you to think about data design. Constraints force good decisions."
Expert: And this is where it's important to push back on the author's reasoning, however well-intentioned. While the outcome was positive for them, stating that "a monolith forces good decisions" is a rather convenient and arguably simplistic takeaway.
Host: Convenient how?
Expert: A sloppy developer can build a disastrous, tightly-coupled, unmaintainable "Big Ball of Mud" monolith just as easily as they can build seven messy microservices. The monolith simply limits the *blast radius* of bad decisions by removing the network boundary. It removes the latency and serialization pain, but it doesn't magically cure bad code or poor design principles. It doesn't *force* good decisions; it merely removes a layer of complexity that amplifies bad ones.
Host: So, the monolith didn't make them a better engineer, it just removed the distributed systems overhead that was previously punishing their architectural choices.
Expert: Precisely. And this critique is grounded in a broader industry trend. We've seen high-profile companies abandoning microservices when the operational overhead outweighs the benefits.
Host: For instance, Martin Fowler, the renowned software architect, has long advocated for a "Monolith First" strategy.
Expert: He famously said, "Almost all the successful microservice stories have started with a monolith that got too big and was broken up. Almost all the cases where I've heard of a system that was built as a microservice system from scratch, it has ended up in serious trouble." Our case study is a textbook example of that "serious trouble."
Host: And Amazon, of all companies. Amazon Prime Video, the poster child for cloud microservices. The source references a 2023 case study where their Video Quality Analysis team moved from a distributed, serverless microservices architecture back to a single-process monolith.
Expert: And the result? A 90% reduction in infrastructure costs and massively improved scaling capabilities. If Amazon, the architects of AWS, is reverting to monoliths to save money and improve performance for a highly scalable video service, then a 47-user task management app certainly doesn't need a Kubernetes cluster.
Host: We also saw Segment, the customer data platform, famously break their monolith into over 150 microservices, only to consolidate back into a "distributed monolith" called Centrifuge, because the operational overhead of updating shared libraries across so many repositories crushed developer productivity.
Expert: And Kelsey Hightower, a prominent Kubernetes advocate, has been quoted saying, "Monoliths are the future because the problem people are trying to solve with microservices doesn't really line up with reality... Microservices will not fix a bad Monolith." The point isn't that monoliths are inherently superior, but that they are the appropriate starting point for most organizations and that microservices do not automatically solve underlying code quality issues.
Host: These examples reinforce that architectural choices are trade-offs, not universal truths. The "monolith-first" approach is less about dogma and more about pragmatism and deferring complexity until it's absolutely necessary.
Expert: Exactly. And the most poignant realization from the engineer's story is what the actual users experienced throughout this architectural odyssey.
Host: The 47 users. What did they get from the initial three-month build?
Expert: Seven microservices they'll never see. Kubernetes they'll never know about. Kafka they don't care about. The author's final kicker really hits home: "The kicker? Users noticed zero difference. Because they never cared about our architecture."
Host: This highlights a deep disconnect in software engineering. We often equate technical elegance, or more accurately, technical complexity, with quality. But for early-stage products, the metrics that truly matter are speed of iteration and feature completeness. Users don't subscribe to your product for its impressive Kubernetes manifest; they subscribe for a problem solved.
Expert: The author's eventual heuristic, after this painful lesson, was to design for "Now + One Order of Magnitude." If you have 50 users, design for 500. Don't design for 100,000. It's a simple, practical rule for scaling responsibly.
Host: It's about building just enough to meet current and immediate future needs, rather than chasing hypothetical future scale. A boring, highly functional monolith that acquires 1,000 paying customers is infinitely superior to a pristine, horizontally scalable Kubernetes cluster serving an empty room.
Expert: That's the ultimate takeaway. The engineering culture needs to shift its rewards system. We need to celebrate the delivery of value and problem-solving, not just the deployment of complex, buzzword-heavy infrastructure.
Host: So, looking back at this entire saga, the trap of Resume-Driven Development is clearly a powerful force. How much of the infrastructure bloat observed in the industry is truly driven by engineers optimizing for their next job interview, rather than their current employer's survival?
Expert: And beyond the AWS bill, which is easy to quantify, how do engineering leaders truly measure the opportunity cost? The cost of an engineer spending a significant portion of their week wrestling with Kafka consumer lag instead of building features that would actually acquire customers? That's a much harder number to put on a spreadsheet.
Host: And while the engineer's rewrite was successful, it's important to remember that a monolith doesn't magically "force" good code.
Expert: Indeed. The fundamental point, as Kelsey Hightower notes, is that "Microservices will not fix a bad Monolith." Architecture choice is about managing complexity and trade-offs, not a panacea for poor design or a substitute for understanding user needs. Ultimately, users don't care about your stack. If Amazon Prime Video can save 90% by ditching microservices, why should a 47-user startup think they need them?