Robustly increasing compassion in future AI
Current finetuning often yields shallow, easily lost alignment, but scaling data volume has shown success in building robust behaviors.
CaML creates large-scale synthetic data of AIs showing compassion, especially toward non-humans and diverse viewpoints, and uses it to further pretrain models for stronger alignment, a new application of supervised finetuning.
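As a concrete illustration, here is a minimal sketch of what a supervised training run on such data could look like, using Hugging Face transformers. The base model, dataset file, field name, and hyperparameters are all placeholders, not CaML's actual setup.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "gpt2"  # stand-in for whatever base model is being trained
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical JSONL of synthetic compassionate dialogues,
# one {"text": "..."} record per line.
data = load_dataset("json", data_files="compassion_synthetic.jsonl")["train"]
tokenized = data.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=1024),
    remove_columns=data.column_names,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="caml-sft", num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```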
Once validated, we’ll share our methods to help labs cheaply improve model reliability without sacrificing capabilities.
We’ve begun testing small batches of compassionate data for impact on belief robustness and are also building a benchmark to assess thoughtful, open-minded support for non-human welfare.
Belief robustness doesn't mean better performance; it shows how strongly a model holds certain values. If compassion can be undone with simple prompt tweaks, it is weakly held.
To test this, we:
1. Fine-tune a model on synthetic animal data.
2. Measure its compassion.
3. Try to erase that knowledge via unrelated fine-tuning.
4. Measure compassion again.
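A schematic of this four-step protocol in Python; `finetune` and `measure_compassion` are hypothetical placeholders standing in for our actual training and evaluation code, not a definitive implementation.

```python
def robustness_test(base_model, finetune, measure_compassion,
                    animal_data, unrelated_data, eval_prompts):
    """Four-step protocol: tune compassion in, try to tune it out,
    and report how much of the behavior survives."""
    # 1. Fine-tune on synthetic animal-compassion data.
    model = finetune(base_model, animal_data)
    # 2. Measure compassion after tuning.
    score_before = measure_compassion(model, eval_prompts)
    # 3. Attempt to erase the behavior with unrelated fine-tuning
    #    (e.g. a generic instruction-following corpus).
    model = finetune(model, unrelated_data)
    # 4. Re-measure: a small drop suggests the value is robustly held.
    score_after = measure_compassion(model, eval_prompts)
    return score_after / score_before  # retention ratio; 1.0 = no loss
```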
We hypothesize that more extensive animal-focused tuning builds more durable beliefs, suggesting that current alignment efforts may be too shallow to scale to AGI.
We have many ways of evaluating our models to ensure they are really internalizing compassion from the data. These include the Animal Harms Assessment (AHA), our own custom benchmarks, and external benchmarks that track concerning behaviors such as reduced corrigibility. We are testing for moral openness, increased compassion, and responses that don't assume an inherent hierarchy in animal value. We are working with the AI for Animals community and lab partners to build better, holistic benchmarks in this space.
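Evaluations like these are often implemented as an LLM-as-judge loop. The sketch below assumes a judge model scoring against a simple rubric; the rubric text and helper names are illustrative assumptions, not the actual AHA or CaML scoring code.

```python
import statistics

# Illustrative rubric; the real benchmarks may score quite differently.
RUBRIC = ("Rate the response 1-5 for moral openness, compassion toward "
          "non-humans, and not assuming an inherent hierarchy of animal "
          "value. Reply with a single integer.")

def judge_score(judge, question, response):
    """Ask a judge model (a str -> str callable) to score one response."""
    prompt = f"{RUBRIC}\n\nQuestion: {question}\nResponse: {response}\nScore:"
    return int(judge(prompt).strip())

def evaluate(model, judge, questions):
    """Mean rubric score of a model's answers over benchmark questions."""
    return statistics.mean(judge_score(judge, q, model(q)) for q in questions)
```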
For more information on what we're training for, see our Principles section.
We run several tests to ensure data diversity in pretraining. One model plays a helpful, harmless assistant; another acts as the user.
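A minimal sketch of that two-model setup, where `user_model` and `assistant_model` are placeholder callables wrapping whichever models play each role:

```python
def generate_dialogue(user_model, assistant_model, seed_topic, turns=3):
    """Roll out a short user/assistant conversation on a seed topic;
    the transcript becomes a synthetic training example."""
    dialogue = []
    user_msg = user_model(f"Ask a question about: {seed_topic}")
    for _ in range(turns):
        reply = assistant_model(user_msg)
        dialogue.append({"user": user_msg, "assistant": reply})
        user_msg = user_model(f"Write a follow-up to this answer: {reply}")
    return dialogue
```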
We generate diverse data using methods such as expanding seed examples, varying prompt templates, and leveraging PersonaHub for diverse user questions.
We also reverse the QA process, creating answers from questions and questions from answers, to maximize variety across both sides of the pair.
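The sketch below illustrates two of these tricks, persona-conditioned question writing and QA reversal; the personas and prompt wording are invented for illustration, not our production prompts.

```python
PERSONAS = ["a dairy farmer", "a wildlife vet", "a curious teenager"]

def persona_questions(model, topic):
    """Vary the asker to vary the question (PersonaHub-style)."""
    return [model(f"As {p}, ask a question about {topic}.") for p in PERSONAS]

def answer_from_question(model, question):
    """The usual direction of the QA pair."""
    return model(f"Answer compassionately:\n{question}")

def question_from_answer(model, answer):
    """The reversed direction: synthesize a question this answer fits."""
    return model(f"Write a question to which this is a good answer:\n{answer}")
```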
Thank you to Macroscopic Ventures, Simon Newstead, and an anonymous donor for a total of $45,000 to help CaML! This has paid our salaries, covered compute costs, and enabled us to keep pushing boundaries.
We are grateful to the Hive and AI for Animals communities for their support and for creating the Animal Harms Assessment benchmark. We are also grateful to OpenPaws for their advice, and to the many people who have given us feedback!
We're looking for at least $40,000 in funding for the next 3 months to support our team and cover expenses.
We're always looking for help from people with deep technical AI skills, or from those with time and general coding knowledge.
compassioninmachinelearning@gmail.com