vebrOS Chat — Foundation Document

12 June 2026

The working blueprint for an embeddable AI chatbot SaaS, shown on-camera in episode 1 and shared here as a living reference.

Working codename. Branding TBD. Last updated: 12 June 2026

1. Essence

A tailored AI chatbot, trained on your content and ready in a few clicks — right there when your users need it.

2. Architecture Principle

One engine, many skins.

The core is a single engine with parameters. Every product surface is a different skin/delivery mechanism wrapped around that same engine. We are not building three products — we are building one and wrapping it.
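As a rough illustration of "one engine, many skins", the sketch below models each surface as a preset of parameters handed to the same engine entry point. All names here (`EngineParams`, `SKINS`, `describeSession`) are hypothetical and not taken from the actual codebase, and the WordPress preset is a guess, since the document only says it wraps the same engine.

```typescript
// Hypothetical sketch: every surface is a parameter preset over one engine.
type Surface = "website" | "wordpress" | "extension";

interface EngineParams {
  // Whether conversation history survives beyond a single session.
  memory: "none" | "persistent";
  // Who may talk to the bot.
  audience: "public" | "internal";
}

// Assumed presets, loosely based on the surface descriptions in section 3.
const SKINS: Record<Surface, EngineParams> = {
  website: { memory: "none", audience: "public" },
  wordpress: { memory: "none", audience: "public" }, // assumption
  extension: { memory: "persistent", audience: "internal" },
};

// One engine entry point; the skin only selects parameters.
function describeSession(surface: Surface): string {
  const p = SKINS[surface];
  return `${surface}: ${p.audience} audience, ${p.memory} memory`;
}
```

Calling `describeSession("website")` returns `"website: public audience, none memory"`. Adding a surface would then mean adding a preset rather than a second engine.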

3. Product Surfaces

Surface 1 — Website Chat (customer-facing) — SHIPS FIRST

Embedded on a website
Fresh session every visitor, no cross-session memory
Strictly scoped to the content the owner uploads
Public-facing

Surface 2 — WordPress Plugin (later)

Same engine, wrapped for the WordPress market
Large existing market, easy distribution

Surface 3 — Browser Extension (enterprise, later)

For internal company use
Solo chats or team chats, scoped by team and role
Persistent sessions (unlike the website chat's fresh sessions)
Governed from a central web panel — admins control who sees what
Enterprise-grade, higher value per customer

4. Build Order

Engine — ingest content, store it, serve scoped answers
Website embed — first skin, first live demo, first users
WordPress plugin + browser extension — same engine, wrapped later

5. The Buyer

Buyer differs per skin:

Website embed → solo devs & small agencies. Building sites for clients, want a quick AI chat to drop in. Comfortable with a snippet/API, want clean docs and control. This is the primary launch buyer — and overlaps directly with the YouTube audience, so the channel feeds the product.
WordPress plugin → site owners / non-technical. Install-and-go, no terminal, no API keys to wrangle.
Browser extension → corporations. High security demands (SSO, data residency, audit logs). Long sales cycles, high value per customer.

Ladder from technical → non-technical → enterprise.

6. The Differentiator

Against alternatives (Kundo, Crisp, Intercom AI, or rolling your own with the raw API):

Extremely easy setup — fastest path from signup to a live, working chat.
Great UI, front and back — the embedded widget looks great; the admin backend is a pleasure to use.
Very customizable — handles and levers to adjust everything... but great defaults first. Works beautifully out of the box; advanced levers are tucked away for those who want them. (Easy and customizable pull in opposite directions — we resolve this with strong defaults + optional depth.)
Decent free tier, great features — generous enough to build trust and drive word of mouth.

North star: so good that people recommend it unprompted. Product-led growth.

Competitor note: Kundo is a heavy enterprise customer-service platform. Our angle is leaner, self-serve, AI-first, embed-anywhere.

7. Tech Stack

Next.js: App + admin
MongoDB: Data + logs
DeepSeek: Chat model
Vercel: Hosting
Better Auth: Auth + roles
Resend
WordPress: Skin (later)

Framework

App + admin panel: Next.js (React). Chosen for speed-to-ship, the largest ecosystem, easy future hiring, and familiarity from Splitte/Grundo. Scales well into the millions of requests. The biggest risk to future-proofing is the project stalling, not hitting a framework ceiling, and Next.js minimizes that risk.
Embed widget: vanilla JS — tiny, framework-agnostic bundle. Must load fast and never conflict with the host site. Explicitly NOT React.

Database

MongoDB Atlas holds everything — documents (users, accounts, settings, chat logs). No second database needed.
MongoDB Atlas Vector Search is available natively but NOT used in v1 (no embeddings in v1). Reserved for v2 retrieval work.

LLM & Retrieval

Chat model: DeepSeek. Chosen primarily for cost — dramatically cheaper per token than frontier models, which matters when every customer chat runs through it and we want a generous free tier.
v1 retrieval: content-stuffing. No embeddings in v1. Uploaded content is stored and injected directly into DeepSeek's context window per chat, scoped to the owner's content. Works well for single-site-sized knowledge (FAQ, product pages, a few docs). Simpler, faster first build.
v2 retrieval: our own LLM-based methods. Semantic search / retrieval for customers whose content is too big to fit in context. Native MongoDB Atlas Vector Search is available when we get there. This is deliberately deferred — and makes for strong build-in-public content when we tackle it.
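A minimal sketch of the v1 content-stuffing step, assuming the owner's content is already loaded as plain text. The types, the `buildMessages` helper, the character budget, and the system prompt wording are all hypothetical. The message shape follows the common role/content chat format, and the actual model call is left out.

```typescript
// Hypothetical types for the sketch.
interface ContentDoc { title: string; body: string }
interface ChatMessage { role: "system" | "user" | "assistant"; content: string }

// Assumed budget in characters, a crude stand-in for a real token count.
const MAX_CONTEXT_CHARS = 20000;

function buildMessages(
  docs: ContentDoc[],
  history: ChatMessage[],
  question: string,
  maxChars: number = MAX_CONTEXT_CHARS,
): ChatMessage[] {
  const parts: string[] = [];
  let used = 0;
  for (const doc of docs) {
    const block = `## ${doc.title}\n${doc.body}`;
    // Stop adding content once the budget would be exceeded.
    if (used + block.length > maxChars) break;
    parts.push(block);
    used += block.length;
  }
  const system: ChatMessage = {
    role: "system",
    content:
      "Answer only from the content below. If the answer is not there, say so.\n\n" +
      parts.join("\n\n"),
  };
  // The stuffed context is rebuilt and re-sent with every message.
  return [system, ...history, { role: "user", content: question }];
}
```

Content that does not fit the budget is simply dropped in this sketch, which is the limitation the v2 retrieval work is meant to address.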

Hosting

Vercel. Zero-config Next.js deploys, auto-scaling, generous free tier; already familiar. Caveat: serverless execution time limits on lower tiers — fine for streaming chat responses in v1, worth monitoring.

Auth

Better Auth. Open-source, data lives in our own MongoDB (not a third party), free. Has a MongoDB adapter. Handles email/password, social logins, sessions. Future-proofs the enterprise skin — org/roles/SSO plugins available for the browser-extension corporate buyers later, so nothing needs ripping out.

8. Business Model / Pricing

Metering

Headline billing unit: chats/month (a chat = one conversation/session, ~20 messages avg). This is what's advertised and what tiers are built around.
Hidden rail 1: messages-per-chat cap — prevents one conversation running forever. Soft "fair use per conversation" in docs, no scary number on pricing page.
Hidden rail 2: tokens-per-message cap — prevents a single message blowing up cost/abuse.
Model selection as a feature gate — free gets DeepSeek; premium models could unlock on paid (future).
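The two hidden rails could be enforced with a check like the one below. This is a sketch under stated assumptions: the `Rails` shape, the `checkRails` helper, and the four-characters-per-token estimate are hypothetical, and the document leaves the real limits open.

```typescript
// Hypothetical limits; real values are not set in this document.
interface Rails {
  maxMessagesPerChat: number;
  maxTokensPerMessage: number;
}

type RailResult =
  | { ok: true }
  | { ok: false; reason: "message_cap" | "token_cap" };

// Crude token estimate (about 4 characters per token), an assumption for the sketch.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Checks an incoming message against both rails before it reaches the model.
function checkRails(messagesSoFar: number, incoming: string, rails: Rails): RailResult {
  if (messagesSoFar >= rails.maxMessagesPerChat) {
    return { ok: false, reason: "message_cap" };
  }
  if (estimateTokens(incoming) > rails.maxTokensPerMessage) {
    return { ok: false, reason: "token_cap" };
  }
  return { ok: true };
}
```

Returning a reason rather than throwing lets the widget show a friendly "fair use" message instead of an error.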

Two-track pricing (customer chooses)

Track A — Simple Tiers: fixed monthly price, bucket of included chats. Cheaper within expected volume because it's predictable. Overages are deliberately expensive: that is the price of choosing a fixed tier over the usage-based track.

Track B — Pay-as-you-go ramp: transparent graduated per-chat pricing that steps down with volume. Fully public, self-serve to any scale — NO "contact sales," no gatekeeping. A whale lands on the lowest rate without ever emailing us. Transparency is itself a selling point vs. competitors who hide pricing.

Example ramp (numbers illustrative, measure-first):

First N chats: included / base
N+1 – 2,000: $0.10 / chat
2,001 – 10,000: $0.07 / chat
10,001 – 50,000: $0.05 / chat
50,001+: $0.03 / chat
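To make the ramp concrete, here is a small calculator using the illustrative rates above. It assumes "graduated" means marginal pricing, where each chat is billed at the rate of the band it falls in. The included allowance N is not fixed in this document, so it appears as a hypothetical `includedChats` parameter. The $5 platform floor is not modeled.

```typescript
// Illustrative bands from the ramp above; `upTo` is the last chat number in the band.
// Rates are in cents to avoid floating-point drift.
interface Band { upTo: number; centsPerChat: number }

function rampBands(includedChats: number): Band[] {
  return [
    { upTo: includedChats, centsPerChat: 0 },
    { upTo: 2000, centsPerChat: 10 },
    { upTo: 10000, centsPerChat: 7 },
    { upTo: 50000, centsPerChat: 5 },
    { upTo: Infinity, centsPerChat: 3 },
  ];
}

// Graduated (marginal) pricing: sum each band's chats times that band's rate.
function usageCostCents(chats: number, includedChats: number): number {
  let cost = 0;
  let prevUpTo = 0;
  for (const band of rampBands(includedChats)) {
    const inBand = Math.max(0, Math.min(chats, band.upTo) - prevUpTo);
    cost += inBand * band.centsPerChat;
    prevUpTo = Math.max(prevUpTo, band.upTo);
  }
  return cost;
}
```

With a hypothetical 100 included chats, a month of 3,000 chats comes to 1,900 × $0.10 + 1,000 × $0.07 = $260, so `usageCostCents(3000, 100)` returns `26000`.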

Free tier

50 chats/month — positioned as a taste/demo, expect fast upgrade.
Graceful cap when exhausted — soft "demo limit reached" message, NEVER a dead/broken bot (protects the dev who installed it in front of their client).
DeepSeek only.
"Powered by vebrOS Chat" badge — free advertising + upgrade nudge. Every free bot is a billboard (product-led growth engine).

Platform floor

$5/month to keep a bot active (applies to the pay-as-you-go track; Simple Tiers already have a price floor).
Purpose is filtering + revenue predictability, NOT covering idle cost. Idle technical cost is pennies (storage ~2–3¢, embed serving negligible). The $5 floor filters out dead-weight/spam accounts (skin in the game) and clears Stripe's cut cleanly (~9% at $5).

Overage behavior

Customer's choice in settings. Defaults to soft cap (bot pauses, no surprise charge). Opt into auto-overage for guaranteed uptime. Removes the #1 objection to usage-based pricing (fear of surprise bills).
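The overage choice can be sketched as a single decision made before each new chat. The names (`UsageState`, `decideNextChat`) and the pause message text are hypothetical. The only behavior taken from the document is that soft cap is the default and auto-overage is opt-in.

```typescript
type OverageMode = "soft_cap" | "auto_overage";

interface UsageState {
  chatsUsed: number;
  chatsIncluded: number;
  mode: OverageMode; // customer's choice in settings; soft cap is the default
}

type Decision =
  | { action: "serve"; overage: boolean }
  | { action: "pause"; message: string };

// Decides whether the next chat is served, served as paid overage, or paused.
function decideNextChat(state: UsageState): Decision {
  if (state.chatsUsed < state.chatsIncluded) {
    return { action: "serve", overage: false };
  }
  if (state.mode === "auto_overage") {
    return { action: "serve", overage: true };
  }
  // Soft cap: pause gracefully with a friendly message, never a broken bot.
  return { action: "pause", message: "This assistant is paused for now. Please check back soon." };
}
```

Keeping the pause path as an explicit, friendly response matches the free tier's graceful-cap rule as well.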

Cost reality (working assumptions — MEASURE FIRST)

~$0.03/chat working DeepSeek cost estimate (content-stuffing re-sends context every message, so chats cost more than bare API calls). This is a guess until measured on real traffic.
Fixed monthly baseline: ~$35/mo (Vercel, MongoDB Atlas, domains, services) before any customer usage. Drops per-customer fast as customer count grows.
Stripe ~2.9% + $0.30/transaction is the real per-charge floor. The $0.30 flat fee is why sub-$5 subscriptions don't work. Above ~$9 the effective cut falls to about 6% and keeps settling toward the percentage rate.
Content storage capped per tier — makes idle cost bounded and predictable rather than open-ended.
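The Stripe arithmetic can be checked with a few lines. This sketch uses only the ~2.9% + $0.30 figure quoted above; the function names are hypothetical, and actual fees vary by region and card type.

```typescript
// Fee in cents for a charge, using the ~2.9% + $0.30 working figure.
// Integer-friendly form (29/1000) avoids floating-point surprises.
function stripeFeeCents(chargeCents: number): number {
  return (chargeCents * 29) / 1000 + 30;
}

// Fee as a percentage of the charge, rounded to one decimal place.
function feeSharePercent(chargeCents: number): number {
  return Math.round((stripeFeeCents(chargeCents) / chargeCents) * 1000) / 10;
}
```

Under these assumptions a $1 charge loses 32.9% to fees, a $5 charge 8.9% (the ~9% quoted for the platform floor), and a $9 charge 6.2%.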

Open / to revisit after first measurements

Exact included-chat allowances per Simple Tier.
Exact ramp breakpoints and rates.
Real per-chat cost (the single biggest variable).
Minimum billing threshold for pay-as-you-go (avoid processing tiny charges).

9. The Channel Tie-In

This product is built in public. The build is the content. Series premise: "Watch me build and sell a real AI chatbot SaaS from zero."
