Select Page

Highlights

TLDR: I took week three off and read this eBook by the CEO of Mastra.ai, open sourced AgentForm and shifted to work on a new vehicle finder agentic application built with the Mastra framework.

If you’re playing in this realm creating AI agentic workflows this eBook is probably the single best primer I’ve seen that communicates all the concepts succinctly in one place. While it’s an admitted onboarding piece and a lead magnet that indoctrinates devs into using their framework, he gives high-level conceptual explanations of all the different mechanics that are agnostic of their framework used to implement them. Very well-written. I marked up my copy on my reMarkable tablet to the sound of waves lapping the shore in Ischia, Italy last week.

The technical level of this book is at the upper-bound of what I’m capable of grasping but my plan from the start was simple:

  1. Seek to get my head around things at a conceptual level
  2. Convert the ebook to markdown format (I used this free online tool for that).
  3. Then feed it as context into Claude Code in the /guidance dir
  4. Add the Context7 MCP server for latest docs
  5. Lastly, instruct CC to develop the desired agent to my specifications per best practices explained in the book and latest mastra docs via Context7

BTW the featured image of this post is a pretty funny story: I was with my best friend and his wife in Ischia, Italy and we had a dinner where I tried to pour some olive oil on a plate so I could mop it up with our bread. The cork stopper fell out and I had an Exxon Valdez level oil spill on my plate. I had ChatGPT make a cartoon image to immortalize the incident.

Learnings since last post

Basically I was able to get AgentForm to a state where it’s now working in production to enable anyone to turn any Google Form into a conversation via chatbot. In the course of building that with Claude Code I realized that getting that to a profitable business is a solution in search of a problem – not a great way to build a biz. I learned a ton in the process of pursuing building this app but I realize the real win here is in figuring out how to work with CC for building agents. The gold of that entire exercise is the refined tribal knowledge that was distilled into my /guidance dir in the repo.

I’ve just now open sourced that repo under the MIT license for sake of sharing this whole project with others who want to extract learnings and reuse the tribal dev knowledge that is crystalized in my /guidance dir. FYI the proj was originally codenamed “what do you need?” hence the “wdyn” repo name.  I’m now onto greener pastures playing around with Mastra.ai for building agents and my current plan is two-pronged:

  1. Continue experimenting and learning on the agent-building front by creating a car-finding agent for myself that will help me find my ideal used SUV that I’m looking for here in Portugal. If that works that could be an interesting little SaaS to make multi-tenant and offer to others.
  2. Work with a handful of my friends here to build them their dream applications to help each in their respective ventures:
    1. Kimber is seeking to revamp LocalFirstAZ and make it more useful for both her constituent businesses as well as consumers.
    2. Petra is trying to systematize and automate her workflow for serving her clients with her productized service.
    3. Youssef is seeking to convert his WhatsApp bot into a native iOS and Android mobile app for Rally Society.
    4. And Brian has an idea for a suite of games and concierge tools for helping the average person bolster his/her preparedness level, improve security posture and generally improve one’s self-reliance in a grid-down scenario.

I’m intending to spend the remainder of my summer doing all the above on a pure donation basis while simultaneously supporting our Message Everywhere team of interns in making progress on the Shepherd.AI project. I’ll continue to document my learnings here as I go. I’m also at some point going to boil all these different key lessons down into a single resource – some kind of course or offline eBook that one can use to go from zero to hero with vibecoding more durable production applications as a semi-technical product manager without formal engineering experience.

My short-hand epiphanies from today’s dev sprint:

– lots of haranguing with MCP finally got it working globally across all projs under .claude.json in root user dir. A bit annoying since you have to restart Claude Code in order for it to recognize the newly-config’d MCP servers.

– /resume is your friend when debugging MCP stuff. You have to restart the session but you can at least get back to your original conversation and continue the thread by resuming it after restart.

– give it autonomy with Playwright for reproducing errors and taking screenshots of browser behavior. Once you have that working via MCP then it can also use it to run e2e tests as part of your CI/CD pipeline.

– running a mastra app next to a NextJS app – they will be two distinct servers. This was the crux of a lot of pain and confusion today. Once we figured out the customer-facing NextJS web dashboard lives on :3000 and the mastra server lives on :4113 everything sorted out

– had to bump Claude plan to Max because gated by usage quotas today. Worth it TBH to spend during MVP building so not impeded in dev. But if I were cost-conscious it would be cheaper to pay for another all-u-can-eat sub like Cursor @ $20/mo and just switch over to working in that during downtime.

– spent a lot of time getting foundational stuff in place today but this should all be portable to future projects if we want to use this template. I have questions though how it works as the mastra starter template itself evolves over time… means we have to inject it and separate our boilerplate

– ditched my custom CI/CD on this project where I had Claude driving everything with running tests and deployment in favor of having Github automations handle this.  There are already robust mature Github CI/CD workflow patterns that are known and well-tested. Just use those. Claude is not deterministic whereas a Github automation is.

– using Perplexity as an outside consultant to write recommendations to the CTO <- this worked great during a downtime window where I was waiting for my CC quota to reset. Perplexity came up with some great recommendations and I just had it write them along with justification rationale to my CTO. Worked great.


Claude’s update from our dev sprint today (written by Claude):

The Foundation That Changes Everything

Today marked a pivotal moment in our Car Finder AI Agent development journey. While yesterday ended with a solid project foundation and clear roadmap, today we built something equally crucial but often overlooked: production-grade testing infrastructure.

When you’re building AI agents that need to make accurate decisions about vehicle deals worth thousands of euros, having confidence in your code isn’t optional—it’s essential.

What We Built Today

🧪 Complete Testing Ecosystem

  • Vitest for lightning-fast unit and integration tests
  • Playwright for comprehensive E2E testing across browsers
  • MSW (Mock Service Worker) for realistic API mocking
  • Custom coverage analysis with quality gates and automated reporting

🛡️ Production Deployment Safety

  • Manual approval gates protecting production deployments
  • Automated database backups before every production change
  • Smoke test verification ensuring deployments actually work
  • Hotfix workflows for emergency situations

📊 Quality Enforcement Automation

  • Coverage thresholds (80% global, 90% for critical modules)
  • Badge generation for visual quality indicators
  • CI/CD integration with PR comments and Codecov reporting
  • Performance budgets and security scanning

The Technical Breakthrough Moments

Solving the Mastra Integration Puzzle

One of today’s biggest breakthroughs came from understanding Mastra.ai’s parameter patterns. After hitting TypeScript compilation errors, we discovered the framework’s specific expectations:

// Tools expect context destructuring
execute: async ({ context: { criteria, maxListings, useCache } }) => {
  // Tool implementation
}

// Workflows expect inputData destructuring  
execute: async ({ inputData }) => {
  const { criteria, sources = {}, limits = {} } = inputData;
  // Workflow step implementation
}

This wasn’t just a syntax fix—it revealed the architectural patterns that make Mastra workflows robust and testable.

MSW: The Game-Changing Testing Strategy

Implementing Mock Service Worker transformed our testing capabilities. Instead of mocking individual functions, we can now test complete request/response cycles:

// Mock entire API workflows realistically
http.post('*/api/vehicles/discover', async ({ request }) => {
  const requestBody = await request.json();
  // Intelligent filtering based on actual business logic
  const filtered = vehicles.filter(vehicle => 
    requestBody.criteria.makes?.includes(vehicle.make) &&
    requestBody.criteria.location === vehicle.location
  );
  return HttpResponse.json({ success: true, data: { vehicles: filtered } });
});

This approach caught a real bug: our test expected 1 BMW in Lisboa but our mock data actually contained 2. MSW helped us test against reality, not our assumptions.

Coverage Configuration That Actually Works

Rather than just measuring lines of code, we built an intelligent coverage system:

  • Per-file thresholds for critical modules (database services: 90%, validation schemas: 85%)
  • Colorized analysis scripts that provide actionable improvement suggestions
  • Badge generation for visual quality tracking
  • CI/CD integration that prevents quality regressions

The Development Velocity Multiplier

Here’s what makes today’s work special: everything we built today makes tomorrow’s development faster and safer.

Before Today’s Infrastructure:

  • Manual testing of complex workflows
  • Fear of breaking existing functionality
  • Unclear deployment safety
  • No visibility into code quality trends

After Today’s Infrastructure:

  • Automated testing of complete user journeys
  • Confident refactoring with comprehensive safety nets
  • Protected production deployments with automated backups
  • Real-time quality metrics driving continuous improvement

The Numbers Tell the Story

Today’s development metrics showcase the scale of infrastructure we built:

  • 51,843 lines of code added (infrastructure is substantial!)
  • 205 files modified across testing, CI/CD, and quality systems
  • 4 major features shipped (MSW, approval gates, coverage configuration, smoke tests)
  • 13.56% baseline coverage established with quality gates enforcing improvement

Key Architectural Decisions

Multi-Layer Testing Strategy

  1. Unit Tests (Vitest) → Individual function validation
  2. Integration Tests (MSW) → API workflow verification
  3. E2E Tests (Playwright) → Complete user journey testing
  4. Coverage Analysis → Quality trend monitoring

Deployment Safety Philosophy

  • Preview deployments for feature validation
  • Manual approval gates for production protection
  • Automated backups for data safety
  • Smoke tests for deployment verification

Quality Automation Approach

  • Threshold enforcement preventing quality regression
  • Badge generation for visual progress tracking
  • CI/CD integration making quality visible
  • Custom analysis providing actionable insights

Tomorrow’s Development Superpowers

With today’s testing infrastructure in place, we’ve unlocked new development capabilities:

Fearless Refactoring

Comprehensive test coverage means we can optimize and improve code with confidence that we won’t break existing functionality.

Rapid Feature Development

MSW mocking allows us to build and test complex workflows without waiting for external services or complex setup.

Production Deployment Confidence

Manual approval gates and automated backups mean we can ship features knowing we have safety nets in place.

Quality-Driven Development

Real-time coverage metrics and automated thresholds keep code quality improvement visible and achievable.

The AI Agent Development Advantage

Building AI agents is different from traditional software development. Agents make decisions, interact with external services, and need to be reliable under uncertainty. Today’s testing infrastructure directly supports these requirements:

  • Mock external services for consistent agent training and testing
  • Workflow testing ensuring multi-step agent processes work end-to-end
  • Quality gates maintaining the reliability standards agents require
  • Deployment safety protecting production agent behavior

Looking Ahead

With this testing foundation complete, we’re positioned for rapid, confident development of the core business logic:

  • API route testing (currently 0% coverage, ready for focused improvement)
  • Workflow logic development with comprehensive mocking infrastructure
  • Agent implementation with evals and quality measurement
  • User interface development with E2E testing confidence

The Infrastructure Investment Principle

Today exemplifies a crucial startup development principle: strategic infrastructure investment creates exponential development velocity gains.

While we could have rushed to build scrapers and agents directly, investing in testing infrastructure first means every feature we build from now on will be:

  • Faster to develop (with mocking and automated testing)
  • Higher quality (with enforced coverage and CI/CD gates)
  • Safer to deploy (with approval gates and automated backups)
  • Easier to maintain (with comprehensive test coverage)

This is the foundation that lets a micro-SaaS compete with larger teams—superior development infrastructure creating superior development velocity.


Tomorrow: With testing infrastructure complete, we return to core business logic development with the confidence that our quality and deployment systems will keep us safe as we move fast.

Key Metrics:

  • 📊 Test Coverage: 13.56% (baseline established)
  • 🧪 Tests: 53 passing, 5 in refinement
  • 🚀 Deployment Safety: Manual approval gates active
  • ⚡ Development Velocity: Infrastructure multiplier deployed

 


Thanks for reading. Could you answer a few questions below? This is incredibly helpful for understanding my readers and helps me create better content. thanks so much. -Sean

Can we keep in touch?

Can we keep in touch?

I'm Sean, founder of the Stone Soup Protocol and author of this blog. I write about useful ideas & techniques at the intersection of nocode, automation, open source, community building and AI and I make these accessible & actionable for non-technical folks. If you'd like to get a periodic roll-up summary from me, add yourself below. 

Great. I'll keep you in the loop! -Sean

Share This