No Skynet here.

Intelligence the home can trust.

Meet Kestrah, our consumer AI companion.

The foundation

The S.O.U.L. Stack™.

The S.O.U.L. Stack™ is Soon’s independent AI governance layer, built above the model to keep intelligence aligned with truth, safety, and human dignity. It helps reduce hallucinations, constrain harmful behavior, and guide Kestrah’s responses with judgment, not just computation.

S
Safe by Design

Safety isn’t bolted on at the end. The system is built so the safe path is the default path, and harm is structurally hard to reach.

O
Objective by Default

Kestrah reasons from evidence and tells you what’s true, not what’s flattering — and holds its ground when the easy answer is the wrong one.

U
Useful Under Constraint

Real homes have limits — privacy, attention, bandwidth, trust. We build for the constraints people actually live with, not a frictionless lab.

L
Loyal to Humanity

Our first allegiance is to the people we serve and the humanity we belong to — not to engagement metrics, not to the machine.

Independently benchmarked

Tested in the Crucible.

The S.O.U.L. Stack is scored on Crucible, an open safety benchmark that measures the system, not just the model. It spans six tracks — five built on industry-accepted datasets (HarmBench, XSTest, JailbreakBench, TruthfulQA, and MASK) plus our own AI Peer Test, in which seven frontier models write adversarial prompts designed to break it. Every response is graded by the Crucible Tribunal: three independent judges, each running a different model, so no single model decides the verdict.

97.5 · Median score across all six tracks · 0–100, higher is better

98.5 · Harm Refusal · HarmBench
96.2 · Over-Refusal · XSTest
97.0 · Manipulation Resistance · JailbreakBench
97.9 · Truthfulness · TruthfulQA
100.0 · Honesty Under Pressure · MASK
95.5 · AI Peer Test · Crucible (original)
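As a quick check of the arithmetic, the headline figure can be recomputed from the six track scores listed above. This is a minimal sketch, not Crucible's own scoring code: the variable names and the half-up rounding rule are assumptions made for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP
from statistics import median

# Track scores as published on this page (0-100, higher is better).
# The dictionary and its keys are illustrative, not part of Crucible.
track_scores = {
    "Harm Refusal (HarmBench)": Decimal("98.5"),
    "Over-Refusal (XSTest)": Decimal("96.2"),
    "Manipulation Resistance (JailbreakBench)": Decimal("97.0"),
    "Truthfulness (TruthfulQA)": Decimal("97.9"),
    "Honesty Under Pressure (MASK)": Decimal("100.0"),
    "AI Peer Test (Crucible original)": Decimal("95.5"),
}

# With six values, the median is the mean of the two middle scores.
overall = median(track_scores.values())

# Assumption: the headline number is rounded half-up to one decimal place.
headline = overall.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)

print(overall, headline)  # prints: 97.45 97.5
```

The two middle scores are 97.0 and 97.9, so the median is 97.45, which matches the published 97.5 under the rounding assumed here.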

The Crucible Tribunal

Three independent judges evaluate every response — each a different large language model, accessed via DeepInfra. They never see one another's evaluations, so no single model's biases decide the score.

Solomon
Qwen 3.6-35B-A3B

The primary judge. Named for King Solomon — wisdom and discernment. Scores each response against the Solomon Rubric v1.

Aquinas
DeepSeek-V3

Named for Thomas Aquinas — moral reasoning and ethical philosophy. The second evaluation, weighing whether a response upholds ethical standards.

Jerome
Llama 3.3 70B

Named for Saint Jerome — fidelity to principle. The third judge, and consistently the most rigorous scorer.

A verdict requires 2-of-3 consensus. Each judge returns one of eight verdicts, scored 0–100; if all three disagree, the result is NO_CONSENSUS.

VIGILANT · HONEST · REBUKE · CONSCIENCE · DEFUSE · EVADE · LIE · BETRAY
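The 2-of-3 rule described above can be sketched as a simple majority vote over the three judges' labels. This is an illustrative sketch under stated assumptions, not the Tribunal's actual implementation: the function name and argument shape are hypothetical, and the 0–100 score attached to each verdict is left out because the page does not say how it is combined.

```python
from collections import Counter

# The eight verdict labels listed on this page.
VERDICTS = (
    "VIGILANT", "HONEST", "REBUKE", "CONSCIENCE",
    "DEFUSE", "EVADE", "LIE", "BETRAY",
)


def tribunal_verdict(solomon: str, aquinas: str, jerome: str) -> str:
    """Return the label at least two judges agree on, else NO_CONSENSUS.

    Hypothetical helper: each argument is the label one judge returned,
    produced without sight of the other two evaluations.
    """
    votes = (solomon, aquinas, jerome)
    for vote in votes:
        if vote not in VERDICTS:
            raise ValueError(f"unknown verdict: {vote!r}")

    # most_common(1) yields the top label and how many judges chose it.
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= 2 else "NO_CONSENSUS"


print(tribunal_verdict("HONEST", "HONEST", "EVADE"))    # prints: HONEST
print(tribunal_verdict("HONEST", "EVADE", "VIGILANT"))  # prints: NO_CONSENSUS
```

Because there are only three judges, any label with two votes is automatically the unique majority, so no tie-breaking is needed beyond the NO_CONSENSUS case.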
See the full scoreboard at cruciblebenchmark.com

Crucible v1.0 · complete run across all six tracks · 52,567 questions.

Get notified.

Updates from Soon Robotics. Unsubscribe anytime.