Open-source legal intake doesn't exist yet — and that's about to change

Open-source legal intake software in 2026 — the missing layer of the legal AI stack

By Tiago Strammiello, Founder, ClaireAI

In brief

  • There is no production-grade open-source legal intake software comparable to Lawmatics, Lead Docket, or Intaker in 2026 — every "open-source legal AI" project either assists the lawyer after intake or is a generic voice agent with no legal calibration.
  • Mike OSS, an open-source Harvey/Legora clone shipped by a solo developer in 14 days, hit 3,550 GitHub stars and 1,089 forks in 28 days under AGPL-3.0 — the clearest signal yet that demand for self-hostable legal AI is real.
  • Three forces opened the door: Thomson Reuters shut down standalone Casetext in April 2025 with 5–10× price hikes; Harvey scaled to an $11B valuation at ~$1,200 per seat; ABA Formal Opinion 512 made "I don't know how my vendor handles client data" a malpractice question.
  • The ethics framework is more permissive than firms realize. Florida Bar Op. 24-1 and Oregon State Bar Op. 2026-208 spell out exactly what an AI chatbot intake must do: disclose its non-lawyer status, decline to give legal advice, refer to a licensed attorney.
  • Honest economics: self-hosted legal intake only beats SaaS on raw cost above 10–15 attorneys. Below that, the real choice is monolithic SaaS vs. an orchestrated stack of single-purpose APIs.

The state of the question. There is no production-grade open-source legal intake software competitive with Lawmatics, Lead Docket, or Intaker in May 2026. There are open-source document assistants (Mike), document-automation engines (Docassemble), e-signature platforms (DocuSeal), case-management workspaces (Stella), and generic voice receptionists (AIReceptionist, Pipecat) — and a fast-growing community putting them together by hand. The one piece of the legal-AI stack still missing from the open-source movement is the front door.

  • 3,550: GitHub stars Mike OSS earned in its first 28 days under AGPL-3.0 (source: github.com/willchen96/mike)
  • $11B: Harvey's March 2026 valuation at roughly $1,200 per attorney seat per month (source: CNBC / Artificial Lawyer)
  • 0: production-grade, customer-facing, open-source legal intake products on the market (source: ClaireAI inventory, May 2026)

The Mike OSS moment

Will Chen, a solo developer, shipped Mike — an open-source clone of Harvey and Legora — in 14 days and posted it on Hacker News on April 29, 2026. The Mike repository collected 3,550 stars, 1,089 forks, and 49 subscribers within a month. The license is AGPL-3.0, which forces any commercial fork to publish its own changes. The stack is Next.js + Express + Supabase + Cloudflare R2, with Claude, Gemini, and GPT wired up as interchangeable model providers.

What Mike actually does: lets a user sign up, upload matter documents, organize them into project workspaces, and chat with them across multiple frontier models. It's a legal-document research assistant. Downstream forks already exist — MikeRust ports it to a native Tauri desktop app; Emilie rebrands it as a Swiss sovereign-data variant; another fork wires in cryptographic verification.

Having read the repo cover-to-cover this past weekend, I can say the code is rougher than the comparisons to Harvey would suggest. There are TODOs in the auth flow, the document-chunking strategy is naïve, and the workspace model has corners that haven't been fully thought through. That is part of the point. The codebase is small enough that a firm's IT consultant could read it on a Saturday and understand exactly what is happening to client data, which is the inverse of every closed legal-AI product I have evaluated this year.

The traction isn't the surprise. The narrative is. Harvey raised a $200M round at an $11B valuation in March 2026; Mike, built in two weeks by one person, demonstrates that the technical moat around legal AI is thinner than the valuation implies.

What Mike doesn't do — the missing layer of the legal stack

Mike is a legal-document research assistant. It is not a legal intake system. The distinction matters because the open-source legal-AI conversation routinely conflates them.

A legal intake system is the front door: it picks up a phone call or web chat from a prospective client, screens for conflicts of interest, qualifies the matter, books a consult, and hands off to a case-management system. That workflow involves telephony, speech-to-text, calendaring, e-signatures, CRM write-back, and — most importantly — a model that performs reliably under Rule 1.18 confidentiality constraints while a stranger describes a legal problem at 11 p.m. on a Sunday.
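The stages in that workflow can be sketched as a small state machine. This is a minimal illustration, not any vendor's design: the `Stage` names, the `Intake` record, and the `advance` helper are all hypothetical, and a real system would attach telephony, calendaring, and CRM calls to each transition.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Stage(Enum):
    """Intake stages in the order the paragraph above lists them."""
    CAPTURE = auto()          # phone call or web chat arrives
    CONFLICT_SCREEN = auto()  # check for conflicts of interest
    QUALIFY = auto()          # decide whether the matter fits the firm
    BOOK_CONSULT = auto()     # schedule time with an attorney
    HANDOFF = auto()          # write the record to case management
    DECLINED = auto()         # terminal state for a conflict or poor fit


@dataclass
class Intake:
    caller_name: str
    channel: str                      # "phone" or "web_chat"
    stage: Stage = Stage.CAPTURE
    history: list[Stage] = field(default_factory=list)


# Allowed forward transitions (hypothetical; a firm would define its own).
NEXT = {
    Stage.CAPTURE: Stage.CONFLICT_SCREEN,
    Stage.CONFLICT_SCREEN: Stage.QUALIFY,
    Stage.QUALIFY: Stage.BOOK_CONSULT,
    Stage.BOOK_CONSULT: Stage.HANDOFF,
}


def advance(intake: Intake, passed: bool = True) -> Intake:
    """Move one stage forward, or to DECLINED if the current check failed."""
    if intake.stage in (Stage.HANDOFF, Stage.DECLINED):
        raise ValueError(f"intake already finished at {intake.stage.name}")
    intake.history.append(intake.stage)
    intake.stage = NEXT[intake.stage] if passed else Stage.DECLINED
    return intake
```

Calling `advance` four times on a fresh `Intake` walks it from `CAPTURE` to `HANDOFF`; passing `passed=False` at the conflict screen ends the intake in `DECLINED` before any qualification questions are asked.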

Map the open-source legal stack today and the gap is unmistakable:

| Layer | What it does | Open-source incumbent | Production-ready? |
|---|---|---|---|
| Document research | Chat with matter documents across models | Mike (willchen96/mike), AGPL-3.0 | Yes, May 2026 |
| Document automation | Guided interviews, form generation | Docassemble (jhpyle/docassemble), MIT | Yes, deployed by most U.S. legal aid |
| Court forms / A2J | Self-represented-litigant interviews | Suffolk LIT AssemblyLine, MIT | Yes, multiple state courts |
| E-signature | Signed retainer + matter docs | OpenSign (AGPL-3.0), DocuSeal | Yes |
| Case management workspace | Matters, documents, review tabs | Stella (Apache-2.0) | Beta |
| CRM / pipeline | Lead tracking, pipeline automation | Twenty, EspoCRM, SuiteCRM | Yes (generic, not legal) |
| Voice / phone capture | Speech-to-speech voice agent | Pipecat, LiveKit Agents, AIReceptionist | Generic only; no legal template |
| Customer-facing intake | Qualify caller, screen conflicts, book consult | (none) | No |

The open-source legal stack as of May 28, 2026. The customer-facing intake layer is the one with no production-grade option.

Three forces pried the door open

The Mike moment didn't happen in a vacuum. Three concurrent events made the legal-AI market structurally vulnerable to open-source disruption in 2025–2026.

1. Casetext's shutdown

Thomson Reuters acquired Casetext for $650 million in 2023, kept the brand alive for two years, then shut down standalone CoCounsel on April 1, 2025. Users with promotional pricing locked in "for life" were forced onto Westlaw-bundled tiers running 5–10× the original $65/month rate. The case study is now a permanent feature of legal-tech buyer conversations: this is what acquisition risk looks like.

2. Harvey's pricing wall

Harvey ran from a $3B valuation in February 2025 to $8B in December 2025 to $11B in March 2026. Reported seat pricing is approximately $1,200 per lawyer per month with 20-seat minimums and 12-month commits. The June 2025 LexisNexis alliance is projected by Artificial Lawyer to push all-in costs toward $3,000 per seat once Lexis content overlaps with Harvey workflows. For any firm under 50 attorneys, the economics simply do not work.

3. The ethics floor moved

ABA Formal Opinion 512 (July 29, 2024) was the first comprehensive ABA guidance on generative AI. It cross-references Model Rules 1.1 (competence), 1.4 (communication), 1.5 (fees), 1.6 (confidentiality), 3.1 (meritorious claims), 3.3 (candor toward the tribunal), and 5.1 and 5.3 (supervision). The load-bearing passage for intake:

Before lawyers input information relating to the representation of a client into a GAI tool, they must evaluate the risks that the information will be disclosed to or accessed by others.

ABA Formal Opinion 512 (July 29, 2024), p. 6

A boilerplate engagement letter does not satisfy that obligation. The natural architectural answer is a model the firm controls — and self-hostable Llama 3.3, Mistral, and Qwen finally crossed the usability threshold for that in 2025.

What actually exists in open-source legal intake today

An honest inventory of the field, broken into three buckets:

Lawyer-side intake assistants (not customer-facing)

  • LawDroid Legal Aid Plugin — Tom Martin's Apache-2.0 Claude plugin pack, released v0.1.0 on May 20, 2026. Includes a /client-intake skill (practice-area templates, cross-area issue spotting, conflict flags, urgency triage) and /eligibility-screening (income, residency, citizenship per funder rules). Brand-new and explicitly scoped to civil legal aid — not retained private practice.
  • Anthropic claude-for-legal — 7,800-star Apache-2.0 repo launched April 2026 with 12 plugins and 80+ skills. The intake-relevant ones (/legal-clinic:client-intake, /litigation-legal:matter-intake, /litigation-legal:demand-intake) run inside Claude Code; they are markdown prompt packs, not deployable apps. A licensed attorney uses them to structure intake notes after the prospect has already called.

Generic voice receptionists (no legal calibration)

  • AIReceptionist (kirklandsig/AIReceptionist) — 40 stars, AGPL-3.0. OpenAI Realtime + LiveKit SIP voice agent. The only built-in template is dental. The README warns of breaking changes.
  • Pipecat — BSD-2 voice framework with a medical patient-intake example. A community PR to add a legal example (#631, October 2024) was explicitly rejected as redundant in January 2025. No legal example currently in the tree.

Court-facing and legal-aid (deterministic, no LLM in the intake path)

  • Docassemble — 951 stars, MIT. Jonathan Pyle's guided-interview engine. Powers most U.S. legal-aid intake in production. No LLM front-end.
  • Suffolk LIT Lab docassemble-AssemblyLine — 62 stars, MIT, v4.6.0 in May 2026. A framework on top of Docassemble that turns paper court forms into guided web interviews. Drives CourtFormsOnline.org and replicators in IL, FL, and CA. Purely deterministic; the LLM work (Weaver, Steenhuis et al.) helps authors write interviews, not run them.

The verdict. No project on the list above is a production-grade, customer-facing AI legal intake system. The closest is LawDroid's Legal Aid Plugin, which is one week old, alpha-grade, scoped to civil legal aid funders, and runs inside Claude as a markdown skill pack.

The ethics question, answered cleanly

The ethics layer is the question every law-firm partner asks first, and the answer is more permissive than the legal-tech press makes it sound.

Model Rule 1.18 attaches the moment a chatbot collects facts from a prospective client. Confidentiality, conflicts imputation, and the duty not to use the information adversely begin pre-retainer. An intake bot must therefore (a) run a conflict pre-screen before collecting substantive facts and (b) be architected so the model does not train on inputs.
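Requirement (a) amounts to a gate: names first, facts second. The sketch below shows one way that gate could look, assuming the firm can export a set of existing client and adverse-party names. The function names and the exact-match strategy are illustrative assumptions; real conflict checking usually needs fuzzy matching, entity relationships, and human review.

```python
import re
import unicodedata


def normalize(name: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    cleaned = re.sub(r"[^a-z0-9 ]", " ", ascii_name.lower())
    return " ".join(cleaned.split())


def conflict_prescreen(caller: str, adverse_party: str, known_parties: set[str]) -> list[str]:
    """Return the supplied names that match an existing party (empty list = clean)."""
    index = {normalize(p) for p in known_parties}
    return [n for n in (caller, adverse_party) if normalize(n) in index]


def may_collect_facts(hits: list[str]) -> bool:
    """Gate: substantive facts are collected only when the pre-screen is clean."""
    return not hits


# Hypothetical party list, for illustration only.
known = {"Acme Holdings, LLC", "José Ramírez"}
hits = conflict_prescreen("Jane Roe", "ACME Holdings LLC", known)
print(hits, may_collect_facts(hits))
```

In this run the adverse party matches an existing record despite the different casing and punctuation, so the gate stays closed and the bot would route the caller to a human instead of asking about the matter.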

Florida Bar Ethics Opinion 24-1 (January 19, 2024) is the cleanest opinion on chatbot intake specifically. Its four requirements: (1) the lawyer must inform prospective clients that they are communicating with an AI program; (2) the chatbot must clearly identify its non-lawyer status; (3) the bot must limit itself to factual intake and refer legal questions to a lawyer; (4) the lawyer is "ultimately responsible should the chatbot provide misleading information."

ABA Formal Opinion 512 (July 29, 2024) names "client intake" explicitly as one of the four use cases it covers. Boilerplate engagement-letter consent does not satisfy Rule 1.6 for a self-learning AI tool — the lawyer must understand how the GAI tool uses data specifically.

Oregon State Bar Opinion 2026-208 treats the AI agent as a non-lawyer assistant under Oregon RPC 5.3 (parallel to ABA Model Rule 5.3). Lawyers must monitor the AI to prevent "creating false impressions of attorney-client relationships, promising services the firm cannot deliver, guaranteeing particular outcomes." Disclaimers help but do not fully discharge the supervision duty.
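The Florida and Oregon requirements translate fairly directly into two mechanical checks: open every conversation with the disclosure, and review each drafted reply before it is sent. The sketch below is an illustration under loud assumptions. The disclosure wording, the regular expressions, and the function names are all hypothetical, and a keyword filter like this does not by itself discharge the supervision duty the opinions describe.

```python
import re

# Hypothetical wording; a firm would have counsel approve the actual script.
DISCLOSURE = (
    "You are chatting with an automated AI assistant, not a lawyer. "
    "I can collect basic information but cannot give legal advice."
)

REFERRAL = "That is a question for a licensed attorney at the firm, so I will pass it along."

# Illustrative patterns for the kinds of statements the opinions warn against.
BLOCKED = {
    "claims_lawyer_status": re.compile(r"\b(i am|i'm) (a|an|your) (lawyer|attorney)\b", re.I),
    "guarantees_outcome": re.compile(r"\b(guarantee[sd]?|will definitely win|certain to win)\b", re.I),
    "implies_representation": re.compile(r"\b(we (now )?represent you|you are (now )?our client)\b", re.I),
}


def open_conversation() -> list[str]:
    """Every conversation starts with the AI and non-lawyer disclosure."""
    return [DISCLOSURE]


def review_reply(draft: str) -> tuple[str, list[str]]:
    """Return (reply_to_send, violated_rule_names) for a drafted bot reply."""
    violations = [name for name, pattern in BLOCKED.items() if pattern.search(draft)]
    return (REFERRAL if violations else draft, violations)
```

A neutral factual question passes through unchanged, while a draft such as "I guarantee we will win" is replaced with the referral message and logged under `guarantees_outcome`, which gives the supervising lawyer something concrete to audit.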

Implication for open-source vs. SaaS. No U.S. bar opinion treats open-source differently from commercial software. The supervision duty under Rule 5.3 does not disappear when a firm self-hosts; it relocates. If you self-host the model, you become the vendor, and the architectural burden (no training on inputs, encryption, BAA, retention controls) is yours. If you use a commercial vendor, the contractual burden is yours.

The honest economics

The legal-tech press over-promises self-hosting. The actual breakdown:

| Scenario | Self-hosted stack (monthly) | Equivalent SaaS (monthly) | Notes |
|---|---|---|---|
| Solo, ~50 intakes/mo | ~$492 | ~$537 (Lawmatics + Smith.ai + DocuSign) | Roughly tied; SaaS wins if the attorney's time has value |
| 5-attorney, ~250 intakes/mo | ~$1,210 | ~$927 | SaaS still wins until per-seat scaling kicks in |
| 20-attorney, ~1,000 intakes/mo | ~$3,730 | ~$2,527 | SaaS wins on raw cost, but per-seat curve worsens at 30+ |
| 50-attorney, ~3,000 intakes/mo | ~$6,800 | ~$8,500+ | Self-hosted wins, assuming a ¼-time engineer is already on payroll |

Self-hosted costs include compute, voice (Telnyx + LiveKit), Deepgram STT, ElevenLabs TTS, LLM API (Claude Haiku 4.5), DocuSeal, and a fractional IT contractor. SaaS bundle pricing from Lawmatics, Smith.ai, and DocuSign published rates, May 2026.
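Dividing each monthly figure by its intake volume makes the table easier to compare across firm sizes. The snippet below does only that arithmetic, using the approximate numbers from the table (and treating the "$8,500+" SaaS figure as its lower bound); the dictionary and function names are illustrative.

```python
# Monthly figures copied from the table above: (intakes, self_hosted_usd, saas_usd).
# The 50-attorney SaaS value is the table's lower bound ("$8,500+").
SCENARIOS = {
    "solo": (50, 492, 537),
    "5-attorney": (250, 1210, 927),
    "20-attorney": (1000, 3730, 2527),
    "50-attorney": (3000, 6800, 8500),
}


def per_intake(scenario: str) -> tuple[float, float]:
    """Return (self_hosted, saas) cost per intake in USD, rounded to cents."""
    intakes, self_hosted, saas = SCENARIOS[scenario]
    return round(self_hosted / intakes, 2), round(saas / intakes, 2)


def cheaper(scenario: str) -> str:
    """Name the option with the lower raw cost per intake."""
    self_cost, saas_cost = per_intake(scenario)
    return "self-hosted" if self_cost < saas_cost else "SaaS"


for name in SCENARIOS:
    print(name, per_intake(name), cheaper(name))
```

On raw cost alone the solo row comes out slightly in favor of self-hosting ($9.84 vs. $10.74 per intake), which is why the table calls it roughly tied: the gap is under a dollar per intake before counting any of the attorney's own time.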

The break-even. Self-hosting beats SaaS on raw monthly cost at roughly 10–15 attorneys, assuming the firm has — or can hire — a quarter-time engineer to keep the stack alive. Below that scale, the IT contractor line item alone (4–8 hours per month at $100 per hour) erases the apparent savings.

The trap. Running Llama 3.3 70B on your own GPU costs $3,000–$5,000 per month at production volume (cloud A100 plus ops). At realistic intake token volumes — under 10 million tokens per month for most firms — API providers like Claude Haiku and GPT-4o-mini are 10–100× cheaper. Self-hosting the model rarely pays unless you are doing more than 50M tokens per month or have a hard data-residency requirement.
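A firm can check this against its own volumes with a few lines of arithmetic. The sketch below assumes a blended API price of $5 per million tokens purely for illustration; that figure is a placeholder, not a quoted rate for any provider, and the function names are hypothetical.

```python
# Hypothetical blended API price in USD per million tokens. Not a quoted rate;
# replace it with the figure from your provider's current pricing page.
ASSUMED_USD_PER_MILLION_TOKENS = 5.0


def api_monthly_cost(tokens_per_month: int,
                     usd_per_million: float = ASSUMED_USD_PER_MILLION_TOKENS) -> float:
    """Monthly API spend for a given token volume."""
    return tokens_per_month / 1_000_000 * usd_per_million


def gpu_cost_multiple(gpu_monthly_usd: float, tokens_per_month: int,
                      usd_per_million: float = ASSUMED_USD_PER_MILLION_TOKENS) -> float:
    """How many times more the self-hosted GPU costs than the API at this volume."""
    api_cost = api_monthly_cost(tokens_per_month, usd_per_million)
    if api_cost <= 0:
        raise ValueError("token volume and price must be positive")
    return gpu_monthly_usd / api_cost


# GPU cost range from the paragraph above, at 10 million tokens per month.
low = gpu_cost_multiple(3_000, 10_000_000)
high = gpu_cost_multiple(5_000, 10_000_000)
print(f"API spend: ${api_monthly_cost(10_000_000):.0f}/mo; GPU costs {low:.0f}x to {high:.0f}x more")
```

Under that assumed price, 10 million tokens cost about $50 per month through an API, so the $3,000 to $5,000 GPU bill is 60 to 100 times higher. The multiple moves linearly with both price and volume, so the result is only as good as the price you plug in.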

Where SaaS still wins

  • Compliance inheritance. Lawmatics, Clio Grow, and Smith.ai ship SOC 2 and HIPAA BAAs out of the box. DocuSeal Community self-hosted does not hand you a BAA — that's on you.
  • Case-management integrations. Lawmatics, Lead Docket, and Intaker have prebuilt sync with Clio, Filevine, MyCase, and PracticePanther. Rebuilding those integrations in-house is weeks of engineering.
  • Human escalation. Smith.ai's $9.75–$11-per-call human receptionist is irreplaceable for the confused 2 a.m. caller. Self-hosted gives you no human fallback.

The reframe. The honest question is not self-host vs. SaaS. It is one monolithic SaaS vs. several single-purpose APIs glued together. Every public "I built my own intake stack" case study turns out, on inspection, to be the second option: Twilio + Deepgram + ElevenLabs + Claude + DocuSeal + a CRM, orchestrated by n8n or custom code. The data lives in the firm's tenant of each service; the firm is not actually self-hosting anything.
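The "glued together" pattern usually reduces to a thin orchestrator that calls one adapter per service. The sketch below shows that shape with in-memory stand-ins so it runs without any vendor account. The interface names, method names, and fake classes are all hypothetical and do not reflect the actual SDKs of Twilio, Deepgram, ElevenLabs, or any CRM.

```python
from typing import Protocol


class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class Reasoner(Protocol):
    def reply(self, transcript: str) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str) -> bytes: ...


class Crm(Protocol):
    def save_lead(self, transcript: str, reply: str) -> str: ...


def handle_turn(audio: bytes, stt: SpeechToText, llm: Reasoner,
                tts: TextToSpeech, crm: Crm) -> tuple[bytes, str]:
    """One caller turn: audio in, spoken reply and CRM record id out."""
    transcript = stt.transcribe(audio)
    answer = llm.reply(transcript)
    lead_id = crm.save_lead(transcript, answer)
    return tts.synthesize(answer), lead_id


# In-memory stand-ins; real adapters would each call out to a vendor API.
class FakeStt:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class FakeLlm:
    def reply(self, transcript: str) -> str:
        return f"Noted: {transcript}"


class FakeTts:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


class FakeCrm:
    def __init__(self) -> None:
        self.leads: list[tuple[str, str]] = []

    def save_lead(self, transcript: str, reply: str) -> str:
        self.leads.append((transcript, reply))
        return f"lead-{len(self.leads)}"
```

Each adapter in a real deployment sends the transcript or audio to a different vendor's tenant, which is the point of the reframe: swapping one provider is a one-class change, but the firm's data still passes through every service behind those interfaces.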

What this means for law firms in 2026

If you're a solo or small firm (under 10 attorneys)

Skip the open-source hype. The realistic choice is a vendor whose contract you can actually read — short data-retention policy, no model-training rights over your conversations, clean export path, month-to-month or short-commit pricing. That is the meaningful version of "data sovereignty" at your scale, and it does not require self-hosting.

If you're a 15+ attorney firm with operational capacity

The building blocks exist. Pipecat or LiveKit (voice), Deepgram (STT), ElevenLabs (TTS), DocuSeal (e-signature), Twenty or EspoCRM (CRM), and a self-hosted Llama or Mistral can be assembled into a credible intake stack. But you are accepting responsibility for the integration, the supervision under Rule 5.3, and the bar opinions in every jurisdiction where you practice. Plan on at least one quarter of engineering work and a permanent quarter-time operational owner.

If you're an open-source developer

This is the wedge. The community has document research, document automation, e-signature, case management, and voice infrastructure as separate open-source projects. The customer-facing intake layer is the missing piece — and the first project that ships a credible, legal-calibrated, multi-channel intake under a permissive license will define the next chapter of the conversation Mike OSS opened.

Where ClaireAI sits in this picture

Editorial disclosure. ClaireAI is the publisher of this guide. We are not open-source, and we say that clearly below. Every named vendor, repo, and statistic above is verifiable through the GitHub, ABA, state bar, and pricing-page URLs that source them.

ClaireAI 365 is not open-source. It's a commercial product, purpose-built for law firms that need the legal calibration, CRM integrations, compliance attestations, and human escalation paths that no open-source project ships today.

That's the honest framing. The reason this guide exists is that the legal-tech press treats "AI legal intake" as if it were a solved category, and it isn't. The open-source side is fertile and growing — Mike OSS is real, Docassemble is foundational, LawDroid's plugin pack is the most interesting new entrant in twelve months — but none of it is a turnkey customer-facing AI receptionist for a firm that needs to be live next week.

Where we differ from the alternatives covered above

  • Calibrated per-practice — personal injury, criminal defense, family law, immigration, and general civil each get a tuned intake script.
  • Conflict screening per Rule 1.18 before any privileged facts are collected.
  • 66 native case-management integrations including Clio, Filevine, MyCase, PracticePanther, CASEpeer, Litify, Lawmatics, CloudLex, Smokeball, Rocket Matter, and CosmoLex.
  • SOC 2 Type II infrastructure and HIPAA-aligned BAAs signed at onboarding.
  • Live human escalation when the AI reaches the limit of what it should be answering on its own.

If you'd like to see ClaireAI handle a calibrated intake call for your practice area, the demo is the most efficient way to evaluate the fit. Pricing is published on the pricing page — no sales-gated quotes.

Frequently asked questions

What is open-source legal intake software?

Software for capturing, qualifying, and routing prospective-client inquiries at a law firm — the front door that precedes case management — distributed under a license (typically MIT, Apache-2.0, or AGPL-3.0) that lets a firm read, modify, and self-host the code. As of May 2026, no production-grade open-source product fills this category for retained private practice.

Is Mike OSS legal intake software?

No. Mike is a document-research assistant, analogous to Harvey or Legora, for chatting with matter documents. It does not answer phones, screen for conflicts, or book consults. It's the most-watched open-source legal-AI project of 2026, but it sits downstream of intake, on the lawyer's side of the front door.

What's the difference between free, open-source, and self-hosted legal intake?

Three different things. "Free" usually means a vendor's free trial or limited tier of a closed product (Clio Grow, Lawmatics, and Gavel all market free tiers). "Open-source" means the source code is published under a license that permits modification and self-hosting (AGPL-3.0, Apache-2.0, MIT). "Self-hosted" describes where the software runs — on infrastructure the firm controls — and applies to both open-source projects and certain commercial products with on-premise deployment options. The three overlap but are not synonyms.

Can a law firm self-host its client intake software?

Technically yes. A firm can wire up Pipecat or LiveKit for voice, Deepgram or AWS Transcribe for speech-to-text, ElevenLabs or Cartesia for text-to-speech, Claude or self-hosted Llama for reasoning, DocuSeal for e-signature, and an open-source CRM (Twenty, EspoCRM, SuiteCRM) on its own infrastructure. Break-even versus a SaaS like Lawmatics or Intaker is around 10–15 attorneys, assuming a quarter-time engineer is already on payroll. Below that, the IT line item alone wipes out the savings.

Does the ABA permit AI client intake?

Yes, with conditions. ABA Formal Opinion 512 (July 29, 2024) addresses generative-AI use including for client intake, and requires lawyers to understand the tool's data-handling, obtain informed client consent where Rule 1.6 is triggered, supervise the tool under Rule 5.3, and avoid billing clients for time saved by AI. Florida Bar Opinion 24-1 (January 2024) and Oregon State Bar Opinion 2026-208 add intake-specific rules: the bot must disclose it is non-lawyer software, limit itself to factual information, and refer legal questions to a licensed attorney.

What's the best open-source alternative to Lawmatics?

There isn't a like-for-like alternative in May 2026. The closest functional substitute is an assembled stack — an open-source CRM (Twenty, EspoCRM, or SuiteCRM) for pipeline, Docassemble for guided intake forms, DocuSeal for e-signature, and Pipecat or LiveKit Agents for phone capture. None of those projects market themselves as a Lawmatics replacement, and stitching them together is non-trivial.

Can open-source legal intake meet bar confidentiality rules?

Yes, when implemented carefully. The architectural win of self-hosting is that inputs never leave the firm's control, which directly satisfies California's Practical Guidance on Generative AI (November 2023) and Rule 1.6 generally. The catch: the firm becomes the vendor for purposes of supervision under Rule 5.3, so the technical controls — encryption at rest, access logs, retention policies, prompt-injection defenses — become the firm's responsibility rather than a vendor's.

Tiago Strammiello, Founder, ClaireAI. Tiago founded ClaireAI after watching law firms lose six-figure cases to missed intake calls, and has spent the last two years benchmarking voice-AI stacks against the realities of Rule 1.18 and state-bar advertising rules. He reads every ABA and state ethics opinion the day it drops and tracks the open-source legal stack repo-by-repo. This piece was reported from primary GitHub repositories, the published text of every bar opinion cited, and the public pricing pages of every named vendor.
