Dmitry Zorin · @medonomator · Batumi, GE

Booking · Q3 2026

AI engineer.
I build LLM systems
that survive real users.

The plumbing under products with paying users: token-level billing, cognitive memory, multi-stage evals, voice.

Three years on AI infra. Twelve on backend.

2026-05 → MindForge v2 shipped: GPT-5 routing, cognitive memory, pgvector · mind-forge.org

Start a project · See selected work

TypeScript

LLM systems in production

01 What I build

Four areas, each shipped to production more than once.

These are the cases that break under load: cost runaway, hallucinated entities, provider outages, prompt drift, rate-limit storms. I plan for them up front, because patching after the first incident takes ten times longer.

LLM infrastructure

Token-level billing across text, TTS, and Whisper. Provider fallback chains. Prompt versioning with hash-based drift detection. Full Langfuse traces from request to model output.

billing · evals · observability
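A minimal sketch of what hash-based drift detection for prompts can look like, assuming a prompt template is approved once and its SHA-256 is stored next to a version number. The `PromptRecord` shape and the whitespace normalization rule are illustrative assumptions, not the production registry.

```typescript
import { createHash } from "node:crypto";

// Hypothetical registry entry: the hash recorded when a prompt version was approved.
interface PromptRecord {
  name: string;
  version: number;
  approvedHash: string;
}

// Normalize line endings and trailing whitespace so cosmetic edits do not count as drift.
function normalizePrompt(template: string): string {
  return template
    .replace(/\r\n/g, "\n")
    .split("\n")
    .map((line) => line.trimEnd())
    .join("\n")
    .trim();
}

function hashPrompt(template: string): string {
  return createHash("sha256").update(normalizePrompt(template), "utf8").digest("hex");
}

// True when the template in code no longer matches the approved hash.
function hasDrifted(record: PromptRecord, currentTemplate: string): boolean {
  return hashPrompt(currentTemplate) !== record.approvedHash;
}

const approved = "You are a tutor.\nAnswer in two sentences.";
const record: PromptRecord = {
  name: "tutor-hint", // hypothetical prompt name
  version: 3,
  approvedHash: hashPrompt(approved),
};

// Whitespace-only edit: not drift.
console.log(hasDrifted(record, "You are a tutor.  \r\nAnswer in two sentences.\n"));
// Wording change: drift.
console.log(hasDrifted(record, "You are a tutor.\nAnswer in three sentences."));
```

A check like this fits in CI: a changed hash without a bumped version fails the build, so no prompt edit reaches production unreviewed.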

Cognitive memory

Episodic and semantic memory over Qdrant or pgvector. Bayesian Knowledge Tracing for skill mastery. Hidden Markov Models for emotional state. Thompson sampling for adaptive content.

memory · BKT · bandits
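For readers new to Bayesian Knowledge Tracing, here is a sketch of the standard single-skill update: a Bayes step on the observed answer, then a learning transition. The parameter values in the example are made up for illustration.

```typescript
interface BktParams {
  pLearn: number; // P(T): chance of acquiring the skill after one attempt
  pSlip: number;  // P(S): chance of a wrong answer despite mastery
  pGuess: number; // P(G): chance of a right answer without mastery
}

// One BKT step: returns the updated mastery probability after an observed attempt.
function bktUpdate(pMastery: number, correct: boolean, p: BktParams): number {
  // Posterior probability of mastery given the observation (Bayes' rule).
  const posterior = correct
    ? (pMastery * (1 - p.pSlip)) /
      (pMastery * (1 - p.pSlip) + (1 - pMastery) * p.pGuess)
    : (pMastery * p.pSlip) /
      (pMastery * p.pSlip + (1 - pMastery) * (1 - p.pGuess));
  // The learner may also have picked up the skill during this attempt.
  return posterior + (1 - posterior) * p.pLearn;
}

// Illustrative parameters, not fitted to any real data.
const params: BktParams = { pLearn: 0.1, pSlip: 0.1, pGuess: 0.2 };

const afterCorrect = bktUpdate(0.3, true, params); // rises above 0.3
const afterWrong = bktUpdate(0.3, false, params);  // falls below 0.3
console.log(afterCorrect.toFixed(4), afterWrong.toFixed(4));
```

The same update applies whether the "skill" is a lesson concept or a habit; only the meaning of a correct observation changes.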

AI pipelines

Multi-stage quality gates with Zod-validated outputs. Entity reference checks against retrieved context. Retry loops that pass rejection reasons back into the prompt instead of blind regeneration.

RAG · validation · agents
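The retry idea can be sketched as a loop that appends the concrete rejection reasons to the next prompt. This is a dependency-free illustration: `validatePush` is a hand-written stand-in for a Zod schema, and `generate` is an injected placeholder for the model call. All names here are hypothetical.

```typescript
type Verdict<T> = { ok: true; value: T } | { ok: false; reasons: string[] };

interface RetryOptions<T> {
  basePrompt: string;
  maxAttempts: number;
  generate: (prompt: string) => Promise<string>; // model call, injected
  validate: (raw: string) => Verdict<T>;         // schema and content checks
}

// Returns the first valid output, or null so the caller can use a fallback.
async function generateWithFeedback<T>(opts: RetryOptions<T>): Promise<T | null> {
  let prompt = opts.basePrompt;
  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    const raw = await opts.generate(prompt);
    const verdict = opts.validate(raw);
    if (verdict.ok) return verdict.value;
    // Feed the rejection reasons into the next attempt instead of regenerating blindly.
    prompt =
      opts.basePrompt +
      "\n\nYour previous answer was rejected for these reasons:\n" +
      verdict.reasons.map((r) => `- ${r}`).join("\n") +
      "\nReturn a corrected answer.";
  }
  return null;
}

interface Push {
  title: string;
}

// Hand-written validator standing in for a schema library.
function validatePush(raw: string): Verdict<Push> {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, reasons: ["output is not valid JSON"] };
  }
  const title = (parsed as { title?: unknown } | null)?.title;
  if (typeof title !== "string") {
    return { ok: false, reasons: ["missing string field `title`"] };
  }
  if (title.length > 40) {
    return { ok: false, reasons: [`title is ${title.length} chars, limit is 40`] };
  }
  return { ok: true, value: { title } };
}
```

Returning `null` after the last attempt keeps the fallback decision with the caller, which matches a pipeline where the final stage is a curated default rather than another model call.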

Multi-modal

Streaming chat over μWebSockets. TTS and Whisper with hallucination sanitization on both ends. Function calling with project-aware context. Real-time interrupts.

voice · streaming · WS

02 Selected work

Four projects in production - one open, three under NDA.

Adaptive learning

Solo architect and engineer

MindForge. AI engineering curriculum.

64 lessons of AI engineering, taught by a Socratic tutor that grades exercises semantically and adapts coding tasks to the learner's GitHub history. I built the whole stack, from database to UI.

Prompt composition framework. 15+ task prompts, hash-based drift detection on every change
Token-level billing across text, TTS, Whisper, and function calls
WebSocket streaming via μWebSockets.js for chat, hints, and live transcription
GS2 lesson schema validated with Zod, with misconception flags attached per concept

Visit MindForge
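To make "token-level billing" concrete, here is a sketch of one way to price mixed-modality usage: each provider call emits a usage event in its native unit (tokens, characters, audio seconds), and a rate table converts it to integer micro-dollars. The model names and every rate below are placeholders, not real provider prices or MindForge's actual tariff.

```typescript
// One event per provider call. Units differ by modality.
type UsageEvent =
  | { kind: "text"; model: string; inputTokens: number; outputTokens: number }
  | { kind: "tts"; model: string; characters: number }
  | { kind: "stt"; model: string; audioSeconds: number };

// Rates in micro-dollars (1e-6 USD) to keep arithmetic in integers.
interface Rates {
  text: Record<string, { inputPerMTok: number; outputPerMTok: number }>;
  tts: Record<string, { perMChar: number }>;
  stt: Record<string, { perMinute: number }>;
}

function costMicroUsd(e: UsageEvent, rates: Rates): number {
  switch (e.kind) {
    case "text": {
      const r = rates.text[e.model];
      if (!r) throw new Error(`no text rate for ${e.model}`);
      const raw = e.inputTokens * r.inputPerMTok + e.outputTokens * r.outputPerMTok;
      return Math.ceil(raw / 1_000_000);
    }
    case "tts": {
      const r = rates.tts[e.model];
      if (!r) throw new Error(`no TTS rate for ${e.model}`);
      return Math.ceil((e.characters * r.perMChar) / 1_000_000);
    }
    case "stt": {
      const r = rates.stt[e.model];
      if (!r) throw new Error(`no STT rate for ${e.model}`);
      return Math.ceil((e.audioSeconds * r.perMinute) / 60);
    }
  }
}

// PLACEHOLDER rates and model names for illustration only.
const rates: Rates = {
  text: { "example-chat": { inputPerMTok: 2_000_000, outputPerMTok: 8_000_000 } },
  tts: { "example-tts": { perMChar: 15_000_000 } },
  stt: { "example-stt": { perMinute: 6_000 } },
};

const session: UsageEvent[] = [
  { kind: "text", model: "example-chat", inputTokens: 1200, outputTokens: 300 },
  { kind: "tts", model: "example-tts", characters: 500 },
  { kind: "stt", model: "example-stt", audioSeconds: 90 },
];

const totalMicroUsd = session.reduce((sum, e) => sum + costMicroUsd(e, rates), 0);
console.log(totalMicroUsd); // 4800 + 7500 + 9000 = 21300 micro-dollars
```

Throwing on an unknown model is deliberate in this sketch: an unpriced call is a billing bug, and failing loudly is cheaper than silently charging zero.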

Behavior change

Lead AI engineer, NDA

Cognitive notification engine for a habit app.

Per-user push generation for ~2k MAU. A Thompson-sampling bandit learns the tone and angle each user responds to. Four-stage LLM quality gate: generate, judge, entity-check against context, fallback. When the HMM detects emotional risk, an override forces the model into supportive tone.

8 notification types with tier-based budgets (ACTIVE, SLOWING, DORMANT, COLD)
Bayesian Knowledge Tracing for habit mastery probabilities
Hidden Markov Model for emotional trajectory and crisis prediction
Adaptive timing on a Redis bloom prefilter with SQL verify pass on hits
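A sketch of Thompson sampling over notification tones, assuming a Beta-Bernoulli model where "opened" counts as success. The arm names, the uniform Beta(1, 1) prior, and the seeded generator are illustrative choices. The Gamma sampler is the Marsaglia-Tsang method, valid here because every shape parameter is at least 1.

```typescript
type Rng = () => number; // uniform in [0, 1)

// Small seeded PRNG (mulberry32) so runs are reproducible.
function mulberry32(seed: number): Rng {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Standard normal via Box-Muller.
function sampleNormal(rng: Rng): number {
  const u = 1 - rng(); // in (0, 1], avoids log(0)
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * rng());
}

// Gamma(shape, 1) via Marsaglia-Tsang. Requires shape >= 1.
function sampleGamma(shape: number, rng: Rng): number {
  if (shape < 1) throw new Error("shape must be >= 1");
  const d = shape - 1 / 3;
  const c = 1 / Math.sqrt(9 * d);
  for (;;) {
    let x = 0;
    let v = 0;
    do {
      x = sampleNormal(rng);
      v = 1 + c * x;
    } while (v <= 0);
    v = v * v * v;
    const u = 1 - rng();
    if (Math.log(u) < 0.5 * x * x + d - d * v + d * Math.log(v)) return d * v;
  }
}

function sampleBeta(alpha: number, beta: number, rng: Rng): number {
  const x = sampleGamma(alpha, rng);
  const y = sampleGamma(beta, rng);
  return x / (x + y);
}

interface Arm {
  id: string;
  successes: number; // e.g. pushes opened
  failures: number;  // e.g. pushes ignored
}

// Draw one sample from each arm's posterior and play the arm with the largest draw.
function pickArm(arms: Arm[], rng: Rng): Arm {
  if (arms.length === 0) throw new Error("no arms");
  let best = arms[0];
  let bestDraw = -1;
  for (const arm of arms) {
    const draw = sampleBeta(1 + arm.successes, 1 + arm.failures, rng);
    if (draw > bestDraw) {
      best = arm;
      bestDraw = draw;
    }
  }
  return best;
}

function recordOutcome(arm: Arm, opened: boolean): void {
  if (opened) arm.successes += 1;
  else arm.failures += 1;
}
```

Usage: call `pickArm` per user before generating a push, then `recordOutcome` when the open or ignore signal arrives. Arms with little data still get sampled occasionally because their posteriors are wide, which is the exploration behaviour a bandit needs.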

Long-context memory

Architect and engineer

Episodic and semantic memory layer.

A cognitive context service that enriches every LLM prompt with relevant past interactions, learned facts as subject-predicate-object triples, emotional trends, and behavioral predictions. Each module degrades gracefully on its own.

OpenAI text-embedding-3-small over Qdrant with filtered similarity
Survival analysis for churn prediction, injected as warnings into the prompt
Concept ingestion from PR reviews feeds automatic mastery tracking
Sub-100ms P95 enrichment with async-safe failure modes
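One way to get modules that degrade independently is to run each enricher under its own timeout and treat any failure as "skip this section". The sketch below assumes that design; the `Enricher` shape and the module names in the usage note are hypothetical.

```typescript
interface Enricher {
  name: string;
  run: (userId: string) => Promise<string>;
}

// Reject if the work does not settle within `ms`; always clear the timer.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

interface Enrichment {
  context: string;    // sections that arrived in time, joined for the prompt
  degraded: string[]; // names of modules that failed or timed out
}

// Runs all enrichers in parallel. A slow or failing module never blocks the others.
async function enrichPrompt(
  userId: string,
  enrichers: Enricher[],
  budgetMs: number,
): Promise<Enrichment> {
  const results = await Promise.all(
    enrichers.map(async (e) => {
      try {
        return { name: e.name, text: await withTimeout(e.run(userId), budgetMs) };
      } catch {
        return { name: e.name, text: null };
      }
    }),
  );
  const sections: string[] = [];
  const degraded: string[] = [];
  for (const r of results) {
    if (r.text === null) degraded.push(r.name);
    else sections.push(r.text);
  }
  return { context: sections.join("\n"), degraded };
}
```

Usage: `await enrichPrompt(userId, [episodes, facts, trends], 100)` returns whatever context arrived inside the budget plus the list of degraded modules, which can be logged as a trace attribute. The timeout only stops waiting; it does not cancel the underlying request.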

Developer tools

Solo build

PR-aware AI tutor for production repos.

A GitHub App that generates personalized coding tasks against the learner's own repos, then reviews their PRs with inline comments. Difficulty calibrates from past review decisions: three rounds of changes make the next task easier, two clean approvals raise the bar.

Octokit-based diff fetching with line-number validation and SHA-dedup
Stack-aware persona for ML, distributed systems, or web, each with its own antipatterns
Concept extraction from PR review flows into the learner mastery model
BullMQ async pipeline, observability via Sentry and Langfuse
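The calibration rule described above can be written down in a few lines. This sketch assumes a 1 to 5 difficulty scale, one-step adjustments, and that "two clean approvals" means the two most recent PRs were approved with no change requests. Those details are assumptions; only the two triggers come from the description.

```typescript
// Rounds of requested changes before a PR was approved. 0 means a clean approval.
interface ReviewOutcome {
  changeRounds: number;
}

function nextDifficulty(
  current: number,
  history: ReviewOutcome[], // oldest first
  min = 1,
  max = 5,
): number {
  const last = history[history.length - 1];
  if (!last) return current; // no reviews yet: keep the starting level

  // Three or more rounds of changes on the latest PR: ease off.
  if (last.changeRounds >= 3) return Math.max(min, current - 1);

  // Two clean approvals in a row: raise the bar.
  const lastTwo = history.slice(-2);
  if (lastTwo.length === 2 && lastTwo.every((r) => r.changeRounds === 0)) {
    return Math.min(max, current + 1);
  }
  return current;
}

console.log(nextDifficulty(3, [{ changeRounds: 3 }]));                      // 2
console.log(nextDifficulty(3, [{ changeRounds: 0 }, { changeRounds: 0 }])); // 4
console.log(nextDifficulty(3, [{ changeRounds: 0 }, { changeRounds: 1 }])); // 3
```

Keeping the rule a pure function of review history makes it easy to replay against past PRs before changing a threshold.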

03 Shipping log

What actually went out - the last few weeks, dated, no embellishment.

A trimmed slice of recent commits across mind-forge, exocortex, confyday-back, second-brain, math-research. Maintenance commits, lint passes, and merges removed.

2026-05-09
mind-forge
voice mentor vocab hints + billing energy reset countdown
2026-05-08
mind-forge
voice mentor multi-layer orchestration (intent + facts + planner + critic)
2026-05-07
mind-forge
native streaming for gpt-5 reasoning models, per-chunk TTS
2026-05-06
mind-forge
energy billing rewrite + voice mentor observability
2026-05-06
mind-forge
pre-flight model gate + estimate-shortfall clamp
2026-05-05
mind-forge
PWA last-route persistence + Lab dashboard expansion
2026-05-04
mind-forge
ai-engineering course +10 lessons (59-68), full candy audit pass on 68
2026-05-04
mind-forge
project-lab v2: portfolio signing, leaderboard, PR review prompt v2
2026-05-04
ai-pulse
weekly LLM industry pulse via Claude CLI to Telegram, sources-first strategy
2026-05-03
confyday-back
per-locale push source/keywords (CON-847)
2026-05-02
confyday-back
curated push library replaces LLM/fact-pushes pipeline
2026-05-01
mind-forge
live-practice pendingCorrections + selective corrections + fluency turns
2026-05-01
mind-forge
learning-theory lessons lt-11 margin-bounds, lt-12 online-regret, lt-13 deep-generalization
2026-05-01
mind-forge
CS lessons 03-04 candy rewrite (51 + 53 topics RU+EN)
2026-04-30
mind-forge
mathlikeanim block type + consistent-hashing animation pilot
2026-04-30
mind-forge
CS lesson 02 candy rewrite, all 50 topics RU+EN
2026-04-29
mind-forge
lessons hot-reload + schema-driven cross-link filter
2026-04-29
mind-forge
forbidden-phrase ratchet gate + LaTeX validator wired into CI
2026-04-28
mind-forge
Yandex sign-in (RU), web-push permissions, unified PWA manifest
2026-04-27
math-research
optimal hallucination scorer - AUROC 0.979 QA, 0.772 Sum (pure gzip)
2026-04-26
math-research
Kolmogorov Structure Function + 4-method benchmark
2026-04-26
math-research
hallucination detection benchmark on HaluEval
2026-04-25
exocortex
cross-domain noise gates + Unicode-aware skip patterns in recall
2026-04-24
exocortex
HyDE for short-query embedding lift + LLM-as-reranker (gpt-4o-mini listwise)
2026-04-24
exocortex
graph-first retrieval via mem_item_entities
2026-04-24
second-brain
astrology weekly transit alerts (EN Pro feature)
2026-04-24
second-brain
extract cities.ts (12k LOC) to JSON asset
2026-04-23
exocortex
cognitive memory v2 - universal items, lifecycle, multi-signal recall
2026-04-23
second-brain
migrate runtime and CI from PM2 to Docker
2026-04-21
confyday-back
langfuse prompt management - centralized constants, datasets, experiment runner
2026-04-21
confyday-back
LLM quality gate + 7-day cross-type anti-repetition for pushes
2026-04-19
confyday-back
BKT/survival models bug - every habit tracking treated as failure
2026-04-19
second-brain
EN i18n + Stripe payments for Astro AI bot, locale-aware system prompts
2026-04-18
exocortex
cheap swarm testing - replay-consensus and --cheap mode
2026-04-17
exocortex
Langfuse swarm tracer - 20 event types as spans/generations/events
2026-04-17
exocortex
evolution engine v2 - swarm-evolution adapter, enhanced GA
2026-04-17
confyday-back
rolling deploy for zero-downtime production updates
2026-04-17
confyday-back
ban em dashes and AI-typical patterns from push notifications
2026-04-16
exocortex
file-triggered intelligence - per-file context on Read/Edit
2026-04-16
exocortex
git history indexing - batch indexer, commit hook, recall integration
2026-04-16
exocortex
PostgreSQL graph, GA self-optimization, memory consolidation
2026-04-15
confyday-back
gpt-5.4-mini upgrade, sampling params for premium users
2026-04-15
confyday-back
living mind - narrative threads, change detection, follow-up awareness
2026-04-15
confyday-back
ai_notification push support - runtime type guards, entityId deep linking
2026-02-06
second-brain
YooKassa payment polling cron as webhook fallback
2025-12-07
second-brain
achievements system + brain balance economy
2025-12-06
second-brain
brain balance system, living mind, streaming responses

47 entries · pulled from public + private repos · last refreshed 2026-05-09

04 What clients say

Two clients, both with AI live in production today.

“Dmitry Zorin shipped our entire AI layer - voice mode, user-portrait personalization, structured-output notifications, cost-tier model routing. Eval coverage from day one, Langfuse observability baked in. The kind of engineer you keep on speed dial.”

Alex Pikunov, Founder · Confyday

“Built our LLM evaluation harness from scratch. Structured outputs, prompt-judging for candidate scoring, regression tests on every model swap. We stopped deploying AI on vibes - every prompt change now has a measurable delta before it ships.”

Flomni · AI engineering team

05 Stack

What I reach for. Boring infra, sharp on the AI layer.

TypeScript across the stack, Python at the AI layer. Postgres, Docker, and Hetzner for plumbing. Specialized tools where the difference shows up in the bill.

Models: gpt-5 (default reasoning) · gpt-5.4 · gpt-5.4-mini (premium routing) · gpt-4o · gpt-4o-mini (rerankers, judges) · o1 · o1-mini · Claude Opus · Claude Sonnet · Claude Haiku · text-embedding-3-small · Whisper-1 · TTS-1
LLM tooling: Langfuse (traces + prompt mgmt) · Zod schemas · JSON Schema · structured outputs · function calling · HyDE · listwise rerank · Thompson sampling · BKT · HMM · ts-fsrs (spaced repetition) · Promptfoo
Vector + queue: Qdrant (most projects) · pgvector (Postgres-only stacks) · Redis bloom · KeyDB (ARM, not :alpine) · BullMQ · Postgres 16 · HNSW
Backend: TypeScript (default) · Python (AI layer only) · NestJS · Express · FastAPI · TypeORM · μWebSockets.js · Pino · Sentry · Stripe · YooKassa
Frontend: Next.js 16 · React 19 · Tailwind v4 · Zustand · TanStack Query · Framer Motion · PWA + service worker · mathlikeanim-rs
Infra: Docker (linux/amd64 builds from M1) · Hetzner (most prod) · Oracle Cloud (free-tier ARM) · GCP (fallback) · DigitalOcean · Cloudflare · nginx · GitHub Actions · GitLab CI · PM2 · systemd · certbot (Let's Encrypt)

06 How to engage

Three engagement modes - all of them end with code in your repo.

Fixed scope, fixed price. I bill outcomes instead of hours, so every engagement leaves you with code and docs you can actually hand to the next engineer.

4 to 12 weeks · from $12K

Build

One AI feature, spec to production.

I take a defined system from spec to deploy: cognitive memory, RAG, evals harness, voice mode, or a billing layer. Weekly demos against a checklist you signed off on, no scope creep.

Architecture doc and delivery plan up front
Deployed to your infra, not mine
Evals and observability shipped with the feature

Discuss this engagement

1 to 2 weeks · from $4K

Audit

Architecture review of a live system.

Your AI system is in production but something feels off - costs creeping, hallucinations slipping past your checks, latency spiking on hot paths. I find what is broken and write a prioritized fix plan with an effort estimate for each item.

Code and infra walkthrough
Cost, risk, and drift report
Remediation roadmap with effort estimates

Discuss this engagement

Monthly engagement · from $5K/month

Embed

Senior engineer on your team.

I drop into your team for a focused push. Pair on the hardest features, ship them, leave behind PRs and design docs your engineers can extend. No onboarding ramp on basics.

Delivery measured in merged PRs
Architecture pairing with your leads
Decisions captured in writing

Discuss this engagement
