vebrOS Chat — Foundation Document

12 June 2026

The working blueprint for an embeddable AI chatbot SaaS, shown on-camera in episode 1 and shared here as a living reference.

Working codename. Branding TBD. Last updated: 12 June 2026


1. Essence

A tailored AI chatbot, trained on your content and ready in a few clicks — right there when your users need it.


2. Architecture Principle

One engine, many skins.

The core is a single engine with parameters. Every product surface is a different skin/delivery mechanism wrapped around that same engine. We are not building three products — we are building one and wrapping it.


3. Product Surfaces

Surface 1 — Website Chat (customer-facing) — SHIPS FIRST

  • Embedded on a website
  • Fresh session every visitor, no cross-session memory
  • Strictly scoped to the content the owner uploads
  • Public-facing

Surface 2 — WordPress Plugin (later)

  • Same engine, wrapped for the WordPress market
  • Large existing market, easy distribution

Surface 3 — Browser Extension (enterprise, later)

  • For internal company use
  • Solo chats or team chats, scoped by team and role
  • Persistent: chats carry over between sessions (unlike the website chat)
  • Governed from a central web panel — admins control who sees what
  • Enterprise-grade, higher value per customer
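
One way to read "one engine, many skins" is that each surface above is just a preset of engine parameters. The sketch below is a minimal illustration of that idea, not actual vebrOS Chat code: EngineParams, SURFACES and startSession are hypothetical names, and the WordPress preset is assumed to match the website chat because the document does not specify it.

```typescript
// Hypothetical sketch: every name here is illustrative.

type Surface = "website" | "wordpress" | "extension";

interface EngineParams {
  persistHistory: boolean; // keep chat history across sessions?
  publicFacing: boolean;   // reachable by anonymous visitors?
  scope: "owner-content" | "team-and-role";
}

// Each skin is only a preset of parameters for the same engine.
// Assumption: the WordPress plugin behaves like the website chat.
const SURFACES: Record<Surface, EngineParams> = {
  website:   { persistHistory: false, publicFacing: true,  scope: "owner-content" },
  wordpress: { persistHistory: false, publicFacing: true,  scope: "owner-content" },
  extension: { persistHistory: true,  publicFacing: false, scope: "team-and-role" },
};

// Start a session: the website chat always starts fresh,
// the extension resumes whatever history it is given.
function startSession(surface: Surface, previousHistory: string[]): string[] {
  const params = SURFACES[surface];
  return params.persistHistory ? [...previousHistory] : [];
}
```

Adding a fourth surface later would then mean adding one more preset rather than a new code path.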

4. Build Order

  1. Engine — ingest content, store it, serve scoped answers
  2. Website embed — first skin, first live demo, first users
  3. WordPress plugin + browser extension — same engine, wrapped later

5. The Buyer

Buyer differs per skin:

  • Website embed → solo devs & small agencies. Building sites for clients, want a quick AI chat to drop in. Comfortable with a snippet/API, want clean docs and control. This is the primary launch buyer — and overlaps directly with the YouTube audience, so the channel feeds the product.
  • WordPress plugin → site owners / non-technical. Install-and-go, no terminal, no API keys to wrangle.
  • Browser extension → corporations. High security demands (SSO, data residency, audit logs). Long sales cycles, high value per customer.

The ladder runs from technical buyers to non-technical buyers to enterprise.


6. The Differentiator

Against alternatives (Kundo, Crisp, Intercom AI, or rolling your own with the raw API):

  • Extremely easy setup — fastest path from signup to a live, working chat.
  • Great UI, front and back — the embedded widget looks great; the admin backend is a pleasure to use.
  • Very customizable — handles and levers to adjust everything... but great defaults first. Works beautifully out of the box; advanced levers are tucked away for those who want them. (Easy and customizable pull in opposite directions — we resolve this with strong defaults + optional depth.)
  • Decent free tier, great features — generous enough to build trust and drive word of mouth.

North star: so good that people recommend it unprompted. Product-led growth.

Competitor note: Kundo is a heavy enterprise customer-service platform. Our angle is leaner, self-serve, AI-first, embed-anywhere.


7. Tech Stack

Framework

  • App + admin panel: Next.js (React). Chosen for speed-to-ship, a large ecosystem, easy future hiring, and familiarity from Splitte/Grundo. Scales well into the millions of requests. The biggest risk to future-proofing is the project stalling, not hitting a framework ceiling, and a familiar stack minimizes that risk.
  • Embed widget: vanilla JS — tiny, framework-agnostic bundle. Must load fast and never conflict with the host site. Explicitly NOT React.

Database

  • MongoDB Atlas holds everything as documents: users, accounts, settings, chat logs. No second database needed.
  • MongoDB Atlas Vector Search is available natively but NOT used in v1 (no embeddings in v1). Reserved for v2 retrieval work.

LLM & Retrieval

  • Chat model: DeepSeek. Chosen primarily for cost — dramatically cheaper per token than frontier models, which matters when every customer chat runs through it and we want a generous free tier.
  • v1 retrieval: content-stuffing. No embeddings in v1. Uploaded content is stored and injected directly into DeepSeek's context window per chat, scoped to the owner's content. Works well for single-site-sized knowledge (FAQ, product pages, a few docs). Simpler, faster first build.
  • v2 retrieval: our own LLM-based methods. Semantic search / retrieval for customers whose content is too big to fit in context. Native MongoDB Atlas Vector Search is available when we get there. This is deliberately deferred — and makes for strong build-in-public content when we tackle it.
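
As a rough illustration of v1 content-stuffing, the sketch below packs the owner's documents into a system message until a token budget is hit, then appends the chat history and the new question. Assumptions: the OpenAI-style role/content message shape, the 4-characters-per-token heuristic, the instruction wording, and the names buildMessages and estimateTokens are all placeholders, not the actual implementation.

```typescript
// Hypothetical sketch of content-stuffing; no embeddings involved.

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Assumption: roughly 4 characters per token for English text.
const CHARS_PER_TOKEN = 4;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

function buildMessages(
  ownerDocs: string[],
  history: ChatMessage[],
  question: string,
  contextBudgetTokens: number,
): ChatMessage[] {
  const included: string[] = [];
  let used = 0;
  for (const doc of ownerDocs) {
    const cost = estimateTokens(doc);
    // Content that no longer fits is exactly the case v2 retrieval is meant for.
    if (used + cost > contextBudgetTokens) break;
    included.push(doc);
    used += cost;
  }
  const system: ChatMessage = {
    role: "system",
    content:
      "Answer only from the content below. If the answer is not there, say so.\n\n" +
      included.join("\n\n---\n\n"),
  };
  // The stuffed context is re-sent with every request in the chat.
  return [system, ...history, { role: "user", content: question }];
}
```

Because the full context rides along on every request, the size of the uploaded content directly drives per-chat cost.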

Hosting

  • Vercel. Zero-config Next.js deploys, auto-scaling, generous free tier; already familiar. Caveat: serverless execution time limits on lower tiers — fine for streaming chat responses in v1, worth monitoring.

Auth

  • Better Auth. Open-source, data lives in our own MongoDB (not a third party), free. Has a MongoDB adapter. Handles email/password, social logins, sessions. Future-proofs the enterprise skin — org/roles/SSO plugins available for the browser-extension corporate buyers later, so nothing needs ripping out.

8. Business Model / Pricing

Metering

  • Headline billing unit: chats/month (a chat = one conversation/session, ~20 messages avg). This is what's advertised and what tiers are built around.
  • Hidden rail 1: messages-per-chat cap — prevents one conversation running forever. Soft "fair use per conversation" in docs, no scary number on pricing page.
  • Hidden rail 2: tokens-per-message cap — prevents a single message blowing up cost/abuse.
  • Model selection as a feature gate — free gets DeepSeek; premium models could unlock on paid (future).
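
The three limits above could be enforced in one check per incoming message. This is a sketch under assumptions: the 50 chats/month figure comes from the free tier, while the 40-message and 2,000-token caps are placeholder numbers, and checkMessage, PlanLimits and UsageSnapshot are hypothetical names.

```typescript
// Hypothetical sketch of the headline unit plus the two hidden rails.

interface PlanLimits {
  chatsPerMonth: number;       // headline billing unit
  maxMessagesPerChat: number;  // hidden rail 1
  maxTokensPerMessage: number; // hidden rail 2
}

interface UsageSnapshot {
  chatsThisMonth: number;     // chats already started this billing month
  messagesInThisChat: number; // messages already sent in the current chat
}

type Verdict =
  | "ok"
  | "chat-quota-exhausted"
  | "conversation-too-long"
  | "message-too-long";

function checkMessage(
  limits: PlanLimits,
  usage: UsageSnapshot,
  messageTokens: number,
  startsNewChat: boolean,
): Verdict {
  if (messageTokens > limits.maxTokensPerMessage) return "message-too-long";
  if (startsNewChat) {
    // Only a NEW chat counts against the monthly bucket.
    return usage.chatsThisMonth >= limits.chatsPerMonth ? "chat-quota-exhausted" : "ok";
  }
  return usage.messagesInThisChat >= limits.maxMessagesPerChat
    ? "conversation-too-long"
    : "ok";
}

// Placeholder numbers except chatsPerMonth, which is the free-tier figure.
const FREE_LIMITS: PlanLimits = {
  chatsPerMonth: 50,
  maxMessagesPerChat: 40,
  maxTokensPerMessage: 2000,
};
```

One design choice baked in here: a chat that is already running is allowed to finish even after the monthly bucket is used up, so a visitor is never cut off mid-conversation.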

Two-track pricing (customer chooses)

Track A (Simple Tiers): fixed monthly price, bucket of included chats. Cheaper within expected volume because it's predictable. Overages are deliberately expensive: that is the price of choosing a fixed bucket over the usage-based track.

Track B — Pay-as-you-go ramp: transparent graduated per-chat pricing that steps down with volume. Fully public, self-serve to any scale — NO "contact sales," no gatekeeping. A whale lands on the lowest rate without ever emailing us. Transparency is itself a selling point vs. competitors who hide pricing.

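Example ramp (numbers illustrative, measure-first; the lowest rate here equals the ~$0.03/chat working cost estimate, so it leaves no margin if that estimate holds):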

First N chats     included / base
N+1 – 2,000       $0.10 / chat
2,001 – 10,000    $0.07 / chat
10,001 – 50,000   $0.05 / chat
50,001+           $0.03 / chat
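
To make the ramp concrete, here is a minimal calculator using the illustrative rates above. It assumes "graduated" means each band is billed at its own rate (like tax brackets), not that the whole volume drops to the lowest rate reached. rampCostCents and BANDS are hypothetical names, and amounts are kept in cents to avoid floating-point drift.

```typescript
// Hypothetical sketch of graduated per-chat pricing.

interface Band {
  upTo: number;         // last chat number billed at this rate (inclusive)
  centsPerChat: number; // illustrative rate from the example ramp
}

const BANDS: Band[] = [
  { upTo: 2_000, centsPerChat: 10 },
  { upTo: 10_000, centsPerChat: 7 },
  { upTo: 50_000, centsPerChat: 5 },
  { upTo: Infinity, centsPerChat: 3 },
];

// includedChats is the "N" in the table: chats covered by the base price.
function rampCostCents(chats: number, includedChats: number): number {
  let total = 0;
  let lower = includedChats; // everything up to here is already paid for
  for (const band of BANDS) {
    if (chats <= lower) break;
    const upper = Math.min(chats, band.upTo);
    if (upper > lower) {
      total += (upper - lower) * band.centsPerChat;
      lower = upper;
    }
  }
  return total;
}
```

With a hypothetical N of 500, 12,000 chats would come to 1,500 × $0.10 + 8,000 × $0.07 + 2,000 × $0.05 = $810, which is rampCostCents(12000, 500) = 81000.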

Free tier

  • 50 chats/month — positioned as a taste/demo, expect fast upgrade.
  • Graceful cap when exhausted — soft "demo limit reached" message, NEVER a dead/broken bot (protects the dev who installed it in front of their client).
  • DeepSeek only.
  • "Powered by vebrOS Chat" badge — free advertising + upgrade nudge. Every free bot is a billboard (product-led growth engine).

Platform floor

  • $5/month to keep a bot active (applies to the pay-as-you-go track; Simple Tiers already have a price floor).
  • Purpose is filtering and revenue predictability, NOT covering idle cost. Idle technical cost is pennies (storage ~2–3¢, embed serving negligible). The $5 floor filters out dead-weight/spam accounts (skin in the game) and keeps Stripe's cut manageable: about 9% of a $5 charge (2.9% + $0.30 ≈ $0.45).
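
The "~9% at $5" figure can be checked with a few lines of arithmetic. The sketch assumes the 2.9% + $0.30 per-transaction rate this document uses as its working Stripe figure, and rounds the percentage part to the nearest cent, which is an assumption about rounding. stripeFeeCents and feeShare are hypothetical helper names.

```typescript
// Hypothetical sketch: payment fee on a single charge, in cents.

const PERCENT_FEE_PER_MILLE = 29; // 2.9%, kept as an integer to avoid float drift
const FLAT_FEE_CENTS = 30;        // $0.30 per transaction

function stripeFeeCents(chargeCents: number): number {
  // Assumption: the percentage part is rounded to the nearest cent.
  return Math.round((chargeCents * PERCENT_FEE_PER_MILLE) / 1000) + FLAT_FEE_CENTS;
}

// Fraction of the charge that goes to fees.
function feeShare(chargeCents: number): number {
  return stripeFeeCents(chargeCents) / chargeCents;
}
```

Under these assumptions a $5 charge pays 45 cents in fees (9%), while a $1 charge would pay 33 cents (33%), which is why the floor sits at $5 rather than lower.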

Overage behavior

  • Customer's choice in settings. Defaults to soft cap (bot pauses, no surprise charge). Opt into auto-overage for guaranteed uptime. Removes the #1 objection to usage-based pricing (fear of surprise bills).
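
The free-tier graceful cap and the paid overage setting can share one decision point when a new chat starts. A minimal sketch, assuming hypothetical names (Bot, decideNewChat) and placeholder visitor-facing wording:

```typescript
// Hypothetical sketch: what happens when a visitor opens a new chat.

type OverageMode = "soft-cap" | "auto-overage";

interface Bot {
  plan: "free" | "paid";
  overageMode: OverageMode; // paid bots default to "soft-cap"
  chatsUsed: number;
  chatsIncluded: number;
}

type Decision =
  | { kind: "answer"; billOverage: boolean }
  | { kind: "paused"; visitorMessage: string };

function decideNewChat(bot: Bot): Decision {
  if (bot.chatsUsed < bot.chatsIncluded) {
    return { kind: "answer", billOverage: false };
  }
  if (bot.plan === "free") {
    // Never a dead or broken bot: the visitor gets a polite message instead.
    return { kind: "paused", visitorMessage: "Demo limit reached. Please check back soon." };
  }
  if (bot.overageMode === "auto-overage") {
    // Owner opted in: keep answering and bill the extra chat.
    return { kind: "answer", billOverage: true };
  }
  // Default soft cap: pause with no surprise charge.
  return { kind: "paused", visitorMessage: "This chat is paused for now. Please try again later." };
}
```

Either way the widget always has something to render, so the bot never looks broken in front of the owner's client.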

Cost reality (working assumptions — MEASURE FIRST)

  • ~$0.03/chat working DeepSeek cost estimate (content-stuffing re-sends context every message, so chats cost more than bare API calls). This is a guess until measured on real traffic.
  • Fixed monthly baseline: ~$35/mo (Vercel, MongoDB Atlas, domains, services) before any customer usage. Drops per-customer fast as customer count grows.
  • Stripe ~2.9% + $0.30/transaction is the real per-charge floor. The $0.30 flat fee is why sub-$5 subscriptions don't work: on a $1 charge the fee is about 33%, on $5 about 9%. From roughly $9 upward it is about 6% and keeps falling toward 2.9%.
  • Content storage capped per tier — makes idle cost bounded and predictable rather than open-ended.
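
Why content-stuffing makes a chat cost more than a bare API call: every request re-sends the stuffed context plus the conversation so far. The sketch below counts input tokens for one chat under stated assumptions. The token sizes are placeholders, the price per million tokens is left as a parameter to be measured rather than a real figure, and inputTokensPerChat is a hypothetical name. Output tokens are not counted.

```typescript
// Hypothetical sketch: input tokens consumed by one content-stuffed chat.

function inputTokensPerChat(
  contextTokens: number,  // size of the stuffed owner content
  userTurns: number,      // ~20 messages avg means about 10 user turns
  avgMessageTokens: number,
): number {
  let total = 0;
  for (let turn = 1; turn <= userTurns; turn++) {
    // Prior messages: one user + one assistant message per earlier turn.
    const priorMessages = 2 * (turn - 1);
    // Each request = full context + prior messages + the new question.
    total += contextTokens + (priorMessages + 1) * avgMessageTokens;
  }
  return total;
}

// Cost in dollars for a given input price; the price is an input, not a known value.
function inputCostDollars(tokens: number, dollarsPerMillionTokens: number): number {
  return (tokens / 1_000_000) * dollarsPerMillionTokens;
}
```

With placeholder values of 8,000 context tokens, 10 user turns and 100 tokens per message, a single chat sends 90,000 input tokens, of which 80,000 are the same context repeated ten times. That repetition is the part worth measuring first.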

Open / to revisit after first measurements

  • Exact included-chat allowances per Simple Tier.
  • Exact ramp breakpoints and rates.
  • Real per-chat cost (the single biggest variable).
  • Minimum billing threshold for pay-as-you-go (avoid processing tiny charges).

9. The Channel Tie-In

This product is built in public. The build is the content. Series premise: "Watch me build and sell a real AI chatbot SaaS from zero."