Case Study

How We Built DevboardAI: A Local-First AI Coding Agent Orchestrator for Mac

Sophylabs Engineering

Quick answer

DevboardAI is a native macOS AI coding agent orchestrator built by Sophylabs that gives developers a local-first Kanban dashboard for Claude Code, OpenAI Codex, and Kimi. Users describe a feature in plain English; the app generates a full sprint, then runs parallel coding agents in isolated git worktrees with automatic retries and inline diffs. It ships as a notarized Electron app for Apple Silicon and Intel, sold as a $74 lifetime license — no cloud backend, no subscription. Read the full DevboardAI case study.

Most developers do not have an AI problem. They have an orchestration problem. Claude Code, Cursor, and Copilot are excellent assistants — they wait for a prompt, suggest code, and leave the loop to the human. You prompt, you paste, you run it, you paste the error back into chat, you repeat. Side projects pile up because every feature requires hours of supervision.

The DevboardAI founder came to us with a sharper framing: agent CLIs are finally capable of writing real code against a real repo. They just need a builder layer on top to handle planning, task delegation, parallelism, retries, and review. Build that as a local-first Mac app and the developer goes from babysitting a chat window to describing a sprint and walking away.

The Problem DevboardAI Solves

The pain points the founder brought us were the same ones we have hit on our own client projects:

  • Assistant, not builder. Existing tools wait for a prompt. The human is still the orchestrator, deciding what to do next, what file to edit, what to run, and what to retry.
  • One task at a time. There is no clean way to run five agents on five tasks against one repo without them clobbering each other.
  • Subscription stacking. Cursor at $20, Claude Max at $100, Devin at $20 plus ACUs, plus whatever else you tried this month. The total bill grows but the loop is still manual.
  • Cloud-only execution. Sending the contents of a private repo to a hosted agent is a non-starter for many developers and for most clients.

Weeks 1–2: Architecture Before Features

We spent the first two weeks resolving the decisions that would shape every line of code after. For a desktop AI product, those decisions matter more than usual — you cannot redeploy your way out of a bad one.

  • Desktop, not SaaS. Electron with a Node.js main process and a React renderer. Local SQLite for state. No backend server, no auth flow, no "send us your repo".
  • Agent-agnostic execution layer. A thin adapter wraps each CLI (Claude Code, OpenAI Codex, Kimi) behind a common interface for spawn, prompt, stream, and cancel. Adding a fourth agent is a single adapter, not a refactor.
  • Worktree-based parallelism. Each running task gets its own git worktree so parallel coding agents can edit the same repo simultaneously. Merges and discards happen per task from inline diffs.
  • One-time license, not subscription. Stripe checkout, signed license keys verified locally, no entitlements server in the hot path. The economics of a $74 lifetime app forced us to keep operational cost near zero.
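
To make the adapter idea concrete, here is a minimal sketch of what a common interface for spawn, prompt, stream, and cancel could look like. The names AgentAdapter, EchoAdapter, and runTask are hypothetical and not taken from the DevboardAI codebase. The echo adapter stands in for a real CLI so the example runs without one installed.

```typescript
// Hypothetical names throughout: AgentAdapter, EchoAdapter, and runTask are
// illustrative and not taken from the DevboardAI codebase.

type OutputListener = (chunk: string) => void;

// The four operations the article names: spawn, prompt, stream, cancel.
interface AgentAdapter {
  readonly name: string;
  spawn(worktreePath: string): void;
  prompt(text: string): void;
  stream(listener: OutputListener): void;
  cancel(): void;
}

// Stand-in adapter that echoes prompts instead of launching a real CLI.
// A real adapter would start the CLI process inside worktreePath.
class EchoAdapter implements AgentAdapter {
  readonly name = "echo";
  private listeners: OutputListener[] = [];
  private cwd: string | null = null;
  private cancelled = false;

  spawn(worktreePath: string): void {
    this.cwd = worktreePath;
    this.cancelled = false;
  }

  prompt(text: string): void {
    if (this.cwd === null) {
      throw new Error("spawn() must be called before prompt()");
    }
    if (this.cancelled) return; // prompts after cancel are ignored
    for (const listener of this.listeners) {
      listener(`[${this.cwd}] ${text}`);
    }
  }

  stream(listener: OutputListener): void {
    this.listeners.push(listener);
  }

  cancel(): void {
    this.cancelled = true;
  }
}

// The orchestrator side depends only on the interface, never on a specific CLI.
function runTask(adapter: AgentAdapter, worktreePath: string, prompt: string): string[] {
  const transcript: string[] = [];
  adapter.stream((chunk) => transcript.push(chunk));
  adapter.spawn(worktreePath);
  adapter.prompt(prompt);
  return transcript;
}
```

Under this shape, supporting another agent means writing one more class that implements AgentAdapter. Code like runTask does not change.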

The Orchestrator: Three Loops That Earn Their Keep

A common failure mode for AI products is sprinkling LLM calls across every surface. The result feels impressive in a demo and frustrating in real use. DevboardAI started narrow with three loops that each map to a measurable user outcome.

  • Sprint generation. The user types what they want to build. The model returns a structured plan (tasks, dependencies, story points, and validation criteria) that drops straight onto the Kanban board, ready to edit or run.
  • Task execution. When a task starts, the orchestrator spins up a worktree, picks an agent, streams the task context plus relevant files, and tails the agent's output into the in-app terminal. Every run is logged with status, exit code, time, model, and full transcript.
  • QA retry loop. When validation fails, the error output is fed into the next attempt as structured context. The orchestrator retries with exponential backoff, switches models on rate limits, and only escalates to the user after the retry budget is exhausted.
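
The QA retry loop can be pictured as a small decision function. It takes the outcome of the attempt that just finished and says whether to stop, retry, or escalate. The sketch below is an illustration under assumptions: the type names, the doubling backoff schedule, and the wrap-around model fallback are all hypothetical, since the article does not publish the real policy.

```typescript
// Hypothetical types and policy values; the real ones are not published.
type Outcome =
  | { kind: "passed" }
  | { kind: "validation_failed"; errorOutput: string }
  | { kind: "rate_limited" };

interface RetryPolicy {
  maxAttempts: number; // retry budget, counting the first attempt
  baseDelayMs: number; // delay before the second attempt
  models: string[];    // fallback order when a provider rate-limits
}

interface AttemptState {
  attempt: number;    // 1-based number of the attempt that just finished
  modelIndex: number; // index into policy.models used for that attempt
}

type NextStep =
  | { action: "done" }
  | { action: "escalate"; reason: string }
  | { action: "retry"; delayMs: number; model: string; modelIndex: number; context: string | null };

function planNextStep(policy: RetryPolicy, state: AttemptState, outcome: Outcome): NextStep {
  if (outcome.kind === "passed") return { action: "done" };

  // Budget exhausted: hand the task back to the user.
  if (state.attempt >= policy.maxAttempts) {
    return { action: "escalate", reason: `retry budget of ${policy.maxAttempts} attempts exhausted` };
  }

  // Switch to the next model only on rate limits, wrapping around the list.
  const modelIndex =
    outcome.kind === "rate_limited"
      ? (state.modelIndex + 1) % policy.models.length
      : state.modelIndex;
  const model = policy.models[modelIndex];
  if (model === undefined) throw new Error("policy.models must not be empty");

  // Exponential backoff: base, 2x base, 4x base, and so on.
  const delayMs = policy.baseDelayMs * 2 ** (state.attempt - 1);

  // Carry the validation error forward as context for the next attempt.
  const context = outcome.kind === "validation_failed" ? outcome.errorOutput : null;

  return { action: "retry", delayMs, model, modelIndex, context };
}
```

Keeping the decision pure, with no timers or process handles, lets the policy be unit-tested without running an agent. The caller does the actual waiting and respawning.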

Keeping Parallel Agents From Stepping On Each Other

The hardest engineering problem was running multiple coding agents against the same repository at the same time. The naïve approach — point them all at the same working tree — turns into a merge bloodbath in minutes. We landed on four rules:

  • One worktree per task. Each task runs in its own checkout. Agents never share a filesystem state.
  • Dependencies in the graph, not in the prompt. The orchestrator walks the dependency graph and only releases a task once its prerequisites have merged. Agents never have to reason about ordering.
  • Inline diff review. Each finished task surfaces a diff inside the app: accept, reject, or send back to QA with a comment. No leaving DevboardAI to review changes.
  • Value Mode routing. Cheap models for small edits, flagship models for risky refactors. The orchestrator picks per task so the user does not have to think about model pricing while sprinting.
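
The first two rules can be sketched in a few lines: release only tasks whose prerequisites have merged, and give each released task its own worktree. The Task shape and the branch and directory naming below are assumptions for illustration. The only real interface used is the documented `git worktree add -b <new-branch> <path> <commit-ish>` form.

```typescript
// Hypothetical shapes; the real DevboardAI schema is not shown in the article.
type TaskStatus = "pending" | "running" | "merged";

interface Task {
  id: string;
  dependsOn: string[];
  status: TaskStatus;
}

// A task is releasable when it is still pending and every prerequisite has merged.
function releasableTasks(tasks: Task[]): Task[] {
  const merged = new Set(tasks.filter((t) => t.status === "merged").map((t) => t.id));
  return tasks.filter(
    (t) => t.status === "pending" && t.dependsOn.every((dep) => merged.has(dep))
  );
}

// Arguments for `git worktree add -b <new-branch> <path> <commit-ish>`.
// The "task/<id>" branch name and "<root>/<id>" directory are assumed conventions.
function worktreeAddArgs(taskId: string, worktreeRoot: string, baseBranch: string): string[] {
  return ["worktree", "add", "-b", `task/${taskId}`, `${worktreeRoot}/${taskId}`, baseBranch];
}
```

An orchestrator built this way would call releasableTasks after every merge and pass each result's worktreeAddArgs to git before spawning an agent in the new directory. Ordering then lives in the graph, never in a prompt.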

What Shipped

  • Kanban dashboard with five columns, drag-and-drop, per-card model and priority, and live execution status.
  • Natural-language sprint generation with editable tasks, dependencies, story points, and validation criteria.
  • Orchestrator that runs the full backlog in dependency order with automatic retries, Value Mode model routing, and a full audit trail.
  • Built-in multi-tab terminal, file explorer, inline git diffs, and project export, so there is no context switching out of the app.
  • Notarized Electron binary for Apple Silicon and Intel with Sparkle auto-update, native notifications, and dock badges.
  • Marketing site on Next.js 16 with Stripe checkout and locally-verified license keys.

The Bet

  • $74 once, not $20–200/mo. Breaks even against a single mid-tier subscription (around $100 a month, such as the Claude Max price quoted above) in about 23 days.
  • No DevboardAI backend ever sees your code. Code, repo context, and run history all stay on the machine. The only outbound traffic is the agent CLIs talking to their own providers.
  • Multi-provider by design. Claude Code, OpenAI Codex, and Kimi on day one, so the app survives any single provider's pricing or policy changes.
  • Builder, not assistant. The product removes the human-in-the-loop tax for routine work and keeps the human firmly in the loop for review.

What Made the Build Possible

  • Senior engineers from day one. The architecture decisions in weeks one and two (worktree isolation, adapter-based agent layer, local-first data) are not ones to hand to junior staff. The compounding cost of a bad early decision in a desktop AI product is brutal.
  • Weekly demos to the founder. Same pattern we use on every project. Misalignment surfaces in week three instead of week ten. We have written about it before.
  • A narrow, opinionated scope. Three loops (sprint generation, task execution, QA retry), each tied to a measurable outcome. We said no to a dozen tempting AI features that would have looked good in a demo and broken the orchestrator's reliability story.
  • A stack that eliminates infrastructure work. Electron, Node, SQLite, Stripe, and Vercel for the marketing site. No Kubernetes, no queues, no cloud database, and no DevOps weeks burned on any of them.

Want the Full Case Study?

The DevboardAI case study has the full architecture, the complete tech stack, and the trade-offs we made along the way. If you are evaluating an AI development partner, it is the most honest look we can give you at how we work.