Experiment Study
Structuring Social Data for AI
How Vivly used Reddit, X, and Hacker News discussions around Meta Ray-Ban glasses to build a structured JSONL training dataset, processed through a multi-stage pipeline and ingested into Aquin end-to-end.
Fine-tuning LLaMA 3.2 Instruct 1B with QLoRA on a Healthcare Dataset
Fine-tuning LLaMA 3.2 Instruct 1B with QLoRA on a healthcare dataset covering gene editing, regenerative medicine, AI-assisted diagnostics, and brain-computer interfaces, monitored end-to-end with the Aquin Experimental SDK.
The Weight Editing System
Agentic ROME on Pythia 2.8B: layer localization via causal tracing, rank-one MLP updates, and a three-check validation loop that rolls back and retries on failure. Includes case studies on factuality, bias correction, and censor auditing.
Applied Research
Embedding Models
Geometry inspection, retrieval evaluation, fine-tuning monitoring, and embedding diff for any sentence-transformers-compatible encoder. BGE, E5, GTE, Nomic, Jina, Instructor, MiniLM, and SBERT all get anisotropy scoring, UMAP exploration, layer-wise similarity, OOD detection, and hard-negative gap analysis.
Transformers & LLMs
How Aquin supports dense transformer LLMs, Mixture-of-Experts models, and hybrid architectures, from Llama and Mistral to Mixtral, DeepSeek, and Grok. Covers architecture-aware inspection, attribution, training monitoring, and evaluation across the full transformer family.
Security
Adversarial risk detection across the full ML pipeline: prompt injection and poisoned samples in training data, red teaming and a jailbreak taxonomy in model inspection, model robustness scoring, weight trojan detection, and attack surface comparison across model versions in the training monitor.
Training
Live signal detection across five failure modes, gradient and loss monitoring per step, SAE feature diffs and behavioral model diffs post-training, and an agentic chat that reads from live training state at send time.
Attribution
Causal mediation analysis, SAE feature extraction, circuit attribution graphs, logit lens, feature steering, UMAP exploration, fact verification, bias detection, and censor auditing, all in one pipeline on Llama 3.2 1B Instruct.
Evals
Consistency, suppression detection, and knowledge boundary probing. Behavioral evals that surface failure modes without requiring a trained SAE and work on any TransformerLens-compatible model.
Benchmarks
InterpScore, FeaturePurityScore, and MUI for SAE feature evaluation, plus a conversational Benchmark Builder that works across all supported architectures: dense LLMs, MoE, hybrid, and embedding models. Describe what to measure and get a scored inline card exportable as CSV, JSON, image, or PDF.
Work with us
Interpretability tooling, custom SAE databases, mechanistic audits, circuit reports, and hands-on research, experiments, and studies for teams of all sizes. Reach us at aquin@aquin.app.
Not sure if Aquin is right for you?
Aquin
