Research
Building Human-Centered AI
Systematic evaluation of CAF 5-2.2D against validated Theory of Mind methodologies (Kosinski et al., 2024; Wimmer & Perner, 1983). Results: 100% Tier 1 accuracy on ~70B substrate versus documented 88% GPT-4 ceiling; 93% on novel Tier 2 cognitive integration rubric. Full test battery documentation included.
Epistemic Integrity Reasoning Testing: 60 adversarial questions measuring if AI recognizes uncertainty, resists false certainty, and passes the Dunning-Kruger threshold before real-world deployment.
The AI industry keeps pushing ‘autonomy’, but what we actually need is evidence based reasoning and adversarial rigor that fact-checks under pressure. Real world gave an opportunity to test Clarisa in the wild.
The AI industry is convinced capability scales with parameters. But what if the bottleneck is architecture, not size? Sub-frontier models can match frontier performance - when the infrastructure doesn't fight you. Here's the evidence.