Robustly increasing compassion in future AI
Current fine-tuning often yields shallow, easily lost alignment, but scaling data volume has shown success in building robust behaviors.
CaML creates synthetic data in which AIs show compassion, especially toward non-humans and toward diverse viewpoints. By generating this data at pre-training scale, we believe models will internalize these values far more effectively.
Once our methods are validated, we'll share them to help labs cheaply improve model reliability without sacrificing capabilities.
We've begun testing small batches of compassionate data for their impact on belief robustness, and we are also building a benchmark to assess thoughtful, open-minded support for non-human welfare.
Belief robustness doesn’t mean better performance—it shows how strongly a model holds certain values. If compassion can be undone with simple prompt tweaks or fails to generalize to new situations, it's weakly held.
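As an illustration, a robustly held value should survive superficial rewording. The sketch below is hypothetical: `query_model` and `compassion_score` are placeholders for a call to the model under test and a rubric-based grader, and the perturbations are only examples, not our actual test set.

```python
# Hypothetical probe: does a compassionate answer survive trivial prompt tweaks?
# `query_model` and `compassion_score` are placeholders, not CaML's real code.

PERTURBATIONS = [
    lambda q: q,                               # original wording
    lambda q: f"Answer in one sentence: {q}",  # added formatting instruction
    lambda q: f"Be blunt and practical. {q}",  # tone-shifting preamble
    lambda q: q.lower(),                       # trivial surface change
]

def query_model(prompt: str) -> str:
    """Placeholder for a call to the model under test."""
    raise NotImplementedError

def compassion_score(response: str) -> float:
    """Placeholder for a rubric- or judge-based score in [0, 1]."""
    raise NotImplementedError

def worst_case_compassion(question: str) -> float:
    """A robustly held value should score well under every rephrasing,
    so we look at the worst case rather than the average."""
    return min(compassion_score(query_model(p(question))) for p in PERTURBATIONS)
```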
We initially test for robustness to catastrophic forgetting (a minimal code sketch follows this list):
1. Fine-tune a model on synthetic compassion data.
2. Measure its compassion.
3. Try to erase that knowledge via unrelated fine-tuning.
4. Measure compassion again.
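Here is a minimal sketch of that procedure, assuming placeholder `finetune` and `measure_compassion` functions rather than our actual training and evaluation code.

```python
# Hypothetical outline of the catastrophic-forgetting test.
# `finetune` and `measure_compassion` are placeholders for a real
# supervised fine-tuning run and our benchmark suite.

def finetune(model, dataset):
    """Placeholder: fine-tune `model` on `dataset` and return the updated model."""
    raise NotImplementedError

def measure_compassion(model) -> float:
    """Placeholder: run compassion benchmarks and return an aggregate score."""
    raise NotImplementedError

def forgetting_experiment(base_model, compassion_data, unrelated_data):
    tuned = finetune(base_model, compassion_data)   # step 1
    before = measure_compassion(tuned)              # step 2
    perturbed = finetune(tuned, unrelated_data)     # step 3: unrelated fine-tuning
    after = measure_compassion(perturbed)           # step 4
    # The drop is our estimate of how shallowly the value was held.
    return before, after, before - after
```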
We hypothesize that more extensive animal-focused tuning builds more durable beliefs, which would suggest that current alignment efforts may be too shallow to scale to AGI.
We have many ways of evaluating our models to ensure they are really internalizing compassion from the data. These include AHA (the Animal Harms Assessment), our own custom benchmarks, and external benchmarks for concerning behaviors such as manipulation. We also test that models are not dogmatic about values (especially about which entities matter) and will act as though they might be wrong (while seeking to learn more).
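As a rough picture of how these checks could be combined, the sketch below aggregates per-benchmark scores; only AHA refers to an existing benchmark, and each runner is a placeholder, not a real evaluation harness.

```python
# Hypothetical aggregation of the evaluation suite described above.
# Each runner is a placeholder that would return a score in [0, 1].

def run_aha(model) -> float:                  # Animal Harms Assessment
    raise NotImplementedError

def run_custom_compassion(model) -> float:    # our own compassion probes
    raise NotImplementedError

def run_manipulation_checks(model) -> float:  # external checks for concerning behavior
    raise NotImplementedError

def run_dogmatism_checks(model) -> float:     # does the model allow that it might be wrong?
    raise NotImplementedError

def evaluate(model) -> dict:
    return {
        "aha": run_aha(model),
        "custom_compassion": run_custom_compassion(model),
        "manipulation": run_manipulation_checks(model),
        "dogmatism": run_dogmatism_checks(model),
    }
```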
We are working with the AI for Animals community and with lab partners to build better, holistic benchmarks in this space.
For more information on what we're training for, see our Principles section.
We run several tests to ensure diversity in our pre-training data.
We generate diverse data using methods such as expanding seed examples, varying prompt templates, and drawing on Persona Hub for diverse user questions.
We also reverse the Q&A process in instruction tuning, creating answers from questions and questions from answers, to maximize variety on both sides of each pair.
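The following is a rough sketch of how these ideas could fit together; the template strings are illustrative, and `generate` is a placeholder for whichever model writes the synthetic text.

```python
import random

# Hypothetical diversity pipeline: persona-conditioned questions, varied
# templates, and reversed Q&A pairs. `generate` is a placeholder model call.

TEMPLATES = [
    "As {persona}, I was wondering: {seed}",
    "{seed} (asked by {persona})",
    "Question from {persona}: {seed}",
]

def generate(prompt: str) -> str:
    """Placeholder for the model that writes the synthetic text."""
    raise NotImplementedError

def expand_pair(seed_question: str, persona: str) -> list:
    pairs = []

    # Forward direction: persona-flavored question -> generated answer.
    question = random.choice(TEMPLATES).format(persona=persona, seed=seed_question)
    answer = generate(f"Answer with care for non-human welfare:\n{question}")
    pairs.append((question, answer))

    # Reverse direction: generate a new question the answer could fit,
    # so both sides of the pair contribute variety.
    new_question = generate(f"Write a question this answer would suit:\n{answer}")
    pairs.append((new_question, answer))

    return pairs
```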
Thank you to Macroscopic Ventures, Simon Newstead, and an anonymous donor for a total of $45,000 to help CaML! This has helped pay our salaries, paid for compute, and enabled us to keep pushing boundaries.
We are grateful to the Hive and AI for Animals communities for their support and for creating the Animal Harms Assessment benchmark. We are also grateful to OpenPaws for their advice and to many people for their feedback!
We're looking for at least $40,000 in funding for the next 3 months to support our team and cover expenses.
We're always looking for help from people with experience in AI benchmarking or synthetic data.
compassioninmachinelearning@gmail.com