
SpecStory Studio Case Study

Greg Ceccarelli · 10 min read · Product

Executive Summary

This process analysis examines the development of SpecStory Studio from May 8 to June 2, 2025, revealing how four human developers successfully orchestrated AI agents to build a complex software system. Through 87 conversations, 520 commits, and 66 workflow documents, the project demonstrates a novel development paradigm where humans acted as architects and validators while an AI agent performed the bulk of implementation work.

NOTE TO READER: This analysis was generated using a combination of process mining software, chained LLM prompts, and human review for clarity. It is the complement to the Beyond Code Centric White Paper, available for download on our website. It details the end-to-end build of our Pre-Alpha Studio product (not released).

Key Findings

  • Development Velocity: 20.6 commits per day with 85.7% first-time success rate

  • Collaboration Model: 2.6:1 human-to-agent interaction ratio with specialized human roles emerging

  • Architecture Evolution: Rapid progression from monolithic to service-oriented design

  • Process Maturity: Evolution from ad-hoc development to formal plan-driven methodology


Part I: The Development Journey

Chapter 1: Genesis and First Pivots (May 8-12)

The Original Vision

The project began with an ambitious goal: create a system comprising a Flutter desktop app, Cloudflare Workers API, and Remix web app. The initial architecture specified Python/FastAPI for the backend, but this vision would survive less than an hour.

The First Issue (May 8, Hour 1)

Human: "What might this be? error: failed to run custom build command for pydantic-core v2.16.2..."
Agent: [Diagnoses Python 3.13 compatibility issue]
Human: "OK, this Cloudflare API worker in /apps/api on Python isn't working out. Let's move that to Typescript."

This immediate pivot from Python to TypeScript set the project's tone: rapid adaptation based on real-world constraints rather than rigid adherence to initial plans.

The Second Issue (May 8, Hour 3)

The TypeScript implementation using itty-router failed with a hanging Promise error. After two unsuccessful fix attempts:

Agent: "Looking at Cloudflare Workers documentation... I see we should use Hono instead of itty-router."
Human: "groovy."

This established the final API architecture: TypeScript with Hono framework.

Key Architectural Decisions

Day 1-2 Decisions:

  • Backend: Python → TypeScript/Hono (stability over familiarity)

  • Framework: itty-router → Hono (official support over lightweight option)

  • Environment: Local development with explicit path configurations

Day 4 Strategic Pivot (May 12):

  • Auth/Database: Clerk + Custom Backend → All-in-one Supabase

  • Process: Ad-hoc BOOTSTRAP.md → Formal phased workplans

  • Architecture: Monolithic files → Service-oriented design

Chapter 2: The Team Dynamics

The Human Orchestrators

Jake Levirne - The Conductor (308 commits)

  • Primary implementer of the "Guide-Build-Fix" cycle

  • Master of agent-human translation

  • Led critical refactoring efforts (DocState/DocCollection)

Greg Ceccarelli - The Specialist (90 commits)

  • Complex feature architect (Notion-style editor, theme system)

  • Production firefighter (CI/CD fixes)

  • Technical pivot decision maker

Sean Johnson - The Feature Pioneer (81 commits)

  • Zero-to-one feature creator

  • Claude Loop saga protagonist

  • Large-scale scaffolding expert

Eric Musgrove - The Integrator (41 commits)

  • CI/CD pipeline architect

  • Cross-team collaboration hub (highest shared file count)

  • Monorepo stability guardian

Collaboration Patterns

The data reveals three dominant collaboration modes:

  1. Vision → Implementation (Most Common)

    • Human provides high-level goal
    • Agent generates comprehensive implementation
    • Human validates and refines
  2. Debugging Partnership (Critical Moments)

    • Agent attempts solution
    • Human provides key insight
    • Agent applies insight successfully
  3. Iterative Refinement (UI/UX Polish)

    • Rapid back-and-forth cycles
    • Small, specific improvements
    • Real-time feedback integration

Chapter 3: Major Development Sagas

The Editor Evolution (May 16-22)

Phase 1: Rapid Feature Addition

  • Initial implementation: 17,000 lines for Notion-style features

  • Custom blocks: Callouts, Dividers, Quotes, Checklists

  • Slash command system integration

Phase 2: Technical Debt Recognition

  • Problem: Monolithic project_detail_screen.dart becoming unmaintainable

  • Diagnosis: State management tightly coupled to UI

Phase 3: Architectural Refactor

Commit b4c2f1cc0 (May 22): Jake Levirne introduces DocState and DocCollection
Commit 1c15dbcf4 (May 22): Agent integrates new architecture
Commit b70c71ff6 (May 22): Jake performs "Aggressive refactor" (-650 lines)

Outcome: Modular, extensible editor architecture

The Claude Loop Challenge (May 16-27)

The Vision: Integrate AI agent execution directly into the development environment

Technical Hurdles:

  1. Streaming Problem: Node.js buffering prevented real-time output

    • Solution: PTY (Pseudo-Terminal) implementation
    • Key Insight: "We already know Claude needs a PTY to respond properly"
  2. JSON Corruption: Malformed data in session logs

    • Initial Approach: Build robust parser for broken JSON
    • Pivot: "HOW did we get this malformed JSON content?"
    • Solution: Abandon stream parsing, use file watching instead
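
The final approach, re-reading the session log on file-system change instead of parsing the live stream, can be sketched as follows. This is an illustrative TypeScript/Node sketch, not the project's actual code (the real implementation lives in the Flutter desktop app); `parseSessionLog` and `watchSessionLog` are hypothetical names:

```typescript
import * as fs from 'fs';

// Re-read the whole session log and keep only lines that parse as
// complete JSON. A partial trailing line (still being written by the
// agent) simply fails to parse and is picked up on the next read,
// so no fragile "repair broken JSON" logic is needed.
function parseSessionLog(path: string): unknown[] {
  const events: unknown[] = [];
  for (const line of fs.readFileSync(path, 'utf8').split('\n')) {
    if (!line.trim()) continue;
    try {
      events.push(JSON.parse(line));
    } catch {
      // Incomplete or corrupt line: ignore; it will parse on a later read.
    }
  }
  return events;
}

// Re-parse whenever the agent appends to the log.
function watchSessionLog(path: string, onEvents: (e: unknown[]) => void): void {
  fs.watch(path, () => onEvents(parseSessionLog(path)));
}
```

The design choice is what made the pivot work: rather than trusting every chunk of streamed output, the file on disk becomes the single source of truth.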

Implementation Timeline:

  • May 22: Metadata and session file creation

  • May 23: UI parsing with corruption fixes

  • May 27: Continue/chat functionality

  • May 29: Full file execution pivot

The CI/CD Gauntlet (May 28-30)

The Challenge: Production macOS builds failing despite working in development

Failed Attempts:

Commit b8af6ed0 (18:37): Agent attempts PATH fix
Commit 37b36998 (18:44): Greg reverts (7 minutes later)
Commit 5fa6178e (19:02): Agent's second attempt
Commit 738ef343 (19:18): Greg reverts again (16 minutes later)

Human Solution:

Commit 593eaedc: Disable problematic features temporarily
Commit 675a6864: Direct modification of Info.plist and project.pbxproj

Lesson: Environment-specific production issues remain beyond agent capabilities


Part II: Technical Architecture Evolution

From Monolith to Services

Initial State (May 8-10)

apps/desktop/lib/main.dart (contains everything)
├── UI logic
├── Business logic
├── API calls
└── State management

Service Extraction Timeline

Week 1: Core Services

  • AIService: Extracted API calls and streaming logic

  • FileTreeService: Centralized file system operations

Week 2: Architecture Services

  • ThemeService: UI customization and dark mode

  • SpecialFilesConfig: Centralized configuration

  • DiffCalculationService: LCS algorithm implementation
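
The LCS approach named above is the classic dynamic-programming line diff. A hedged TypeScript sketch of the idea (illustrative only; the project's DiffCalculationService is Dart, and `lcsDiff` is a hypothetical name):

```typescript
// Classic LCS-based diff: dp[i][j] holds the longest-common-subsequence
// length of oldLines[i..] and newLines[j..]; walking the table emits a
// unified-style line diff.
function lcsDiff(oldLines: string[], newLines: string[]): string[] {
  const m = oldLines.length, n = newLines.length;
  const dp: number[][] = Array.from({ length: m + 1 }, () => new Array(n + 1).fill(0));
  for (let i = m - 1; i >= 0; i--) {
    for (let j = n - 1; j >= 0; j--) {
      dp[i][j] = oldLines[i] === newLines[j]
        ? dp[i + 1][j + 1] + 1
        : Math.max(dp[i + 1][j], dp[i][j + 1]);
    }
  }
  const out: string[] = [];
  let i = 0, j = 0;
  while (i < m && j < n) {
    if (oldLines[i] === newLines[j]) { out.push('  ' + oldLines[i]); i++; j++; }
    else if (dp[i + 1][j] >= dp[i][j + 1]) { out.push('- ' + oldLines[i]); i++; }
    else { out.push('+ ' + newLines[j]); j++; }
  }
  while (i < m) out.push('- ' + oldLines[i++]);
  while (j < n) out.push('+ ' + newLines[j++]);
  return out;
}
```

For example, `lcsDiff(['a','b','c'], ['a','c','d'])` keeps `a` and `c`, marks `b` removed and `d` added.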

Week 3: Advanced Patterns

  • DocState/DocCollection: Document state management

  • TaskStreamingService: Message bus pattern for async communication

  • ClaudeService: PTY-based process management
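
The message-bus pattern behind a service like TaskStreamingService can be sketched generically. A minimal TypeScript illustration, assuming a simple subscribe/publish contract (the class and its API are hypothetical, not the project's code):

```typescript
type Handler<T> = (msg: T) => void;

// Minimal typed message bus: producers publish, consumers subscribe,
// and subscribe() returns an unsubscribe function for cleanup.
class MessageBus<T> {
  private handlers: Handler<T>[] = [];

  subscribe(handler: Handler<T>): () => void {
    this.handlers.push(handler);
    return () => {
      this.handlers = this.handlers.filter((h) => h !== handler);
    };
  }

  publish(msg: T): void {
    for (const handler of this.handlers) handler(msg);
  }
}
```

The point of the pattern is decoupling: the streaming side publishes task events without knowing which UI widgets are listening.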

Technology Stack Evolution

Backend Journey

Python/FastAPI (1 hour)
    ↓ [Build failure]
TypeScript/itty-router (2 hours)
    ↓ [Hanging promises]
TypeScript/Hono (Stable)

Authentication Evolution

Planned: Clerk (external) + Supabase (database only)
    ↓ [Complexity reduction]
Final: Supabase (all-in-one auth + database)

Key Dependencies Added

  • State Management: provider

  • UI Enhancement: flutter_svg, flutter_quill

  • Process Management: flutter_pty

  • Development: logging (replaced all print statements)

Code Quality Journey

Phase 1: "Make it Work" (Days 1-7)

  • Rapid prototyping

  • Monolithic files acceptable

  • Technical debt accumulation

Phase 2: "Make it Right" (Days 8-14)

  • Service extraction begins

  • Linting rules established

  • First refactoring efforts

Phase 3: "Make it Maintainable" (Days 15-25)

  • Comprehensive refactoring

  • Architectural patterns established

  • Living documentation via workplans


Part III: Process Innovation

The Workplan Revolution

Before (May 8-11)

  • Single BOOTSTRAP.md file

  • Ad-hoc task management

  • Informal communication

After (May 12+)

  • Structured workplans/ directory

  • Phased implementation plans

  • Living documentation with:

    • Task checklists [x]
    • Known issues sections
    • Architectural decisions
    • Implementation notes
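
Putting those elements together, a workplan in this style might look like the following hypothetical skeleton (names and items are illustrative):

```markdown
# Feature_Name.md

## Phase 1: Scaffolding
- [x] Create service in services/
- [ ] Wire feature into navigation

## Known Issues
- Theme flickers briefly on first load

## Architectural Decisions
- Singleton service; state kept out of widgets
```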

Example Workplan Lifecycle

Dark Mode Implementation

1. Creation (May 21, 03:42): Human requests plan
2. Planning: Agent creates Dark_Mode_Theming.md
3. Approval: Human reviews and approves
4. Implementation: Agent executes in 2h 39m
5. Validation: Human tests and requests tweaks
6. Completion: Plan updated with [x] markers

The Guide-Build-Fix Loop

This became the dominant development pattern:

  1. Guide (Human)

    • Create detailed workplan
    • Define architecture
    • Set constraints
  2. Build (Agent)

    • Generate implementation
    • Follow patterns
    • Update documentation
  3. Fix (Human)

    • Test thoroughly
    • Handle edge cases
    • Resolve environment issues

Part IV: Lessons and Insights

What Worked Exceptionally Well

1. Rapid Feature Scaffolding

The agent could generate entire features in hours that would take days manually:

  • Supabase authentication: 2.5 hours from plan to implementation

  • Dark mode theming: 2.6 hours including UI polish

  • Initial editor with custom blocks: Single 17,000-line commit

2. Systematic Refactoring

Agent excelled at project-wide changes:

  • Replacing all print() with logging service

  • Extracting widgets from monolithic files

  • Implementing consistent patterns

3. Learning from Examples

Once a pattern existed, the agent could replicate it perfectly:

  • Quote blocks learned from Callout blocks

  • New services followed established patterns

  • UI components maintained consistency

Critical Friction Points

1. The "First Mile" Problem

Environmental issues consistently blocked progress:

  • Python version incompatibility

  • macOS network permissions

  • Node.js port conflicts

  • Flutter path configuration

Impact: ~20% of conversations involved environmental debugging

2. The "Last Mile" Problem

Agent code required human cleanup:

  • Linting errors

  • Minor logical flaws

  • Integration issues

  • Production-specific bugs

Impact: Every agent commit needed 15-30 minutes of human review

3. Architectural Blind Spots

Without explicit guidance, agent defaulted to:

  • Monolithic implementations

  • Tight coupling

  • Missing error handling

  • Incomplete documentation

Key Success Factors

1. Embracing Human-Agent Complementarity

  • Humans: Vision, architecture, problem-solving

  • Agent: Implementation, refactoring, pattern replication

2. Rapid Pivoting

Major pivots executed without hesitation:

  • Python → TypeScript (Day 1)

  • Custom auth → Supabase (Day 4)

  • Stream parsing → File watching (Day 15)

3. Living Documentation

Workplans served triple duty:

  • Implementation guide

  • Progress tracker

  • Historical record

Quantitative Insights

Development Metrics:

  • Velocity: 20.6 commits/day (10x traditional)

  • Quality: 85.7% first-time success rate

  • Rework: Only 14.2% of features needed significant revision

Collaboration Metrics:

  • Human/Agent Ratio: 2.6:1 interactions

  • Handoff Efficiency: 85% successful first attempts

  • Clarification Cycles: 1.01 average (very low)


Part V: Future Recommendations

Process Improvements

1. Pre-Flight Checklist

Implement automated environment validation:

#!/bin/bash
# Pre-development validation (illustrative; adjust tools and URLs per project)
set -euo pipefail
for tool in node flutter wrangler; do      # check required tool versions
  command -v "$tool" >/dev/null || { echo "missing: $tool" >&2; exit 1; }
  "$tool" --version
done
curl -fsS "http://localhost:8787/health" >/dev/null  # verify API endpoint (example URL)
flutter build macos --debug                          # hello-world build smoke test
# (macOS network-permission checks are hard to script and stay manual)

2. Architectural Templates

Create AGENT_PATTERNS.md:

## Service Creation Pattern
1. Create service in services/
2. Use singleton pattern
3. Include error handling
4. Add logging
5. Write tests
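
The pattern above can be sketched in code. A hedged TypeScript illustration (the Studio app itself is Flutter/Dart; `ExampleService` and its log messages are hypothetical):

```typescript
// Singleton service following the pattern: single access point,
// error handling around work, and logging on both paths.
class ExampleService {
  private static instance: ExampleService | null = null;

  private constructor() {}

  static get(): ExampleService {
    if (!ExampleService.instance) ExampleService.instance = new ExampleService();
    return ExampleService.instance;
  }

  run(task: () => string): string {
    try {
      const result = task();
      console.log(`[ExampleService] ok: ${result}`);
      return result;
    } catch (err) {
      console.error('[ExampleService] failed', err);
      throw err;
    }
  }
}
```

A template like this gives the agent a concrete shape to replicate, which the project found it does reliably once an example exists.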

3. Automated Quality Gates

Enforce on all agent commits:

  • Linting passes

  • Tests run

  • Documentation updated

  • Patterns followed

Scaling Considerations

For Larger Teams

  • Dedicated "Agent Wranglers" per feature area

  • Centralized workplan coordination

  • Automated merge conflict resolution

For Complex Projects

  • Multi-agent orchestration

  • Specialized agents (UI, Backend, Testing)

  • Human architects for system design


Conclusion

The SpecStory Studio project demonstrates that agent-driven development is viable. The insight is not that AI replaces developers, but that it fundamentally changes what developers do. In this new paradigm, humans become orchestrators, architects, and problem-solvers, while AI handles the implementation heavy lifting.

The 10x productivity gains achieved here came not from the agent alone, but from the sophisticated dance between human vision and AI execution. As we move forward, the teams that master this dance will have a significant competitive advantage in software development.

The future of software engineering has arrived, and it's a partnership.


Analysis completed using ProcessMiningAnalyzer V2
Repository: specstoryai/specstory-studio
Period: May 8 - June 2, 2025