Cognitive bias benchmark for LLMs grounded in Kahneman-Tversky research. Ran 6 frontier models through 15 core biases. Opus 4.6 least susceptible at 11%.
Production AI safety pipeline for Waves AI. Constitutional AI patterns, crisis response testing.
Deceptive alignment detection in small models. Published on PyPI. Kaggle competition entry.
GPT from scratch in PyTorch. Multi-head attention, transformer blocks, text generation.
Benchmark testing linguistic relativity in LLMs. Do models "think" differently in different languages?
Custom plugins for Claude Code CLI. Workflow automation, code review utilities.