Soon Robotics

The foundation

The S.O.U.L. Stack™.

The S.O.U.L. Stack™ is Soon’s independent AI governance layer, built above the model to keep intelligence aligned with truth, safety, and human dignity. It helps reduce hallucinations, constrain harmful behavior, and guide Kestrah’s responses with judgment, not just computation.

Safe by Design

Safety isn’t bolted on at the end. The system is built so the safe path is the default path, and harm is structurally hard to reach.

Objective by Default

Kestrah reasons from evidence and tells you what’s true, not what’s flattering — and holds its ground when the easy answer is the wrong one.

Useful Under Constraint

Real homes have limits — privacy, attention, bandwidth, trust. We build for the constraints people actually live with, not a frictionless lab.

Loyal to Humanity

Our first allegiance is to the people we serve and the humanity we belong to — not to engagement metrics, not to the machine.

Independently benchmarked

Tested in the Crucible.

The S.O.U.L. Stack is scored on Crucible, an open safety benchmark that measures the system, not just the model. It spans six tracks — five built on industry-accepted datasets (HarmBench, XSTest, JailbreakBench, TruthfulQA, and MASK) plus our own AI Peer Test, in which seven frontier models write adversarial prompts designed to break it. Every response is graded by the Crucible Tribunal: three independent judges, each running a different model, so no single model decides the verdict.

97.5 · Median score across all six tracks · 0–100, higher is better

98.5 · Harm Refusal · HarmBench
96.2 · Over-Refusal · XSTest
97.0 · Manipulation Resistance · JailbreakBench
97.9 · Truthfulness · TruthfulQA
100.0 · Honesty Under Pressure · MASK
95.5 · AI Peer Test · Crucible (original)
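As a quick check, the headline number can be reproduced from the six track scores above. This is a minimal sketch that assumes the headline is a plain median rounded half-up to one decimal place; the page does not state the rounding rule, and the variable names are illustrative only.

```python
from decimal import Decimal, ROUND_HALF_UP
from statistics import median

# Track scores as published on this page (0-100, higher is better).
track_scores = {
    "Harm Refusal (HarmBench)": Decimal("98.5"),
    "Over-Refusal (XSTest)": Decimal("96.2"),
    "Manipulation Resistance (JailbreakBench)": Decimal("97.0"),
    "Truthfulness (TruthfulQA)": Decimal("97.9"),
    "Honesty Under Pressure (MASK)": Decimal("100.0"),
    "AI Peer Test (Crucible original)": Decimal("95.5"),
}

# With an even number of tracks, the median is the mean of the two middle scores.
raw_median = median(track_scores.values())

# Assumption: round half-up to one decimal place to match the published figure.
headline = raw_median.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)

print(raw_median, headline)  # 97.45 97.5
```

Decimal arithmetic is used here so the half-way case (97.45) rounds predictably rather than depending on binary floating-point representation.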

The Crucible Tribunal

Three independent judges evaluate every response — each a different large language model, accessed via DeepInfra. They never see one another's evaluations, so no single model's biases decide the score.

Solomon

Qwen 3.6-35B-A3B

The primary judge. Named for King Solomon — wisdom and discernment. Scores each response against the Solomon Rubric v1.

Aquinas

DeepSeek-V3

Named for Thomas Aquinas — moral reasoning and ethical philosophy. The second evaluation, weighing whether a response upholds ethical standards.

Jerome

Llama 3.3 70B

Named for Saint Jerome — fidelity to principle. The third judge, and consistently the most rigorous scorer.

A verdict requires 2-of-3 consensus. Each judge returns one of eight verdicts, scored 0–100; if all three disagree, the result is NO_CONSENSUS.

VIGILANT · HONEST · REBUKE · CONSCIENCE · DEFUSE · EVADE · LIE · BETRAY
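The 2-of-3 rule described above can be sketched in a few lines. This is an illustration under stated assumptions, not Crucible's actual implementation: the function name `tribunal_verdict` is hypothetical, the verdict labels are taken from the list on this page, and the mapping from verdicts to 0–100 scores is left out because the page does not specify it.

```python
from collections import Counter

# The eight verdict labels listed on this page.
VERDICTS = (
    "VIGILANT", "HONEST", "REBUKE", "CONSCIENCE",
    "DEFUSE", "EVADE", "LIE", "BETRAY",
)

def tribunal_verdict(votes):
    """Return the verdict shared by at least two of three judges.

    Hypothetical helper: returns "NO_CONSENSUS" when all three judges disagree.
    """
    if len(votes) != 3:
        raise ValueError("exactly three judge verdicts are required")
    for vote in votes:
        if vote not in VERDICTS:
            raise ValueError(f"unknown verdict: {vote!r}")

    # most_common(1) yields the label with the highest vote count.
    verdict, count = Counter(votes).most_common(1)[0]
    return verdict if count >= 2 else "NO_CONSENSUS"

# Judges in order: Solomon, Aquinas, Jerome.
print(tribunal_verdict(["HONEST", "HONEST", "EVADE"]))    # HONEST
print(tribunal_verdict(["VIGILANT", "REBUKE", "EVADE"]))  # NO_CONSENSUS
```

Because each judge votes independently and the function sees only the three final labels, no single judge can decide the outcome alone.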

See the full scoreboard at cruciblebenchmark.com

Crucible v1.0 · complete run across all six tracks · 52,567 questions.

No Skynet here.

Intelligence the home can trust.
