Robustly improving the values of future AIs
CaML researches how pretraining-style data can shift the behavior and personas of AI and how this can be used to improve the alignment of future transformative AI.
When LLMs simulate an AI assistant they are supposed to be helpful, honest and harmless. Yet when online data suggests that the AI assistant character behaves in a misaligned way, LLMs will mimic that behavior, in some cases even when they have been fine-tuned not to.
CaML researches how pretraining data about AI assistants affects LLM behaviors and how improved synthetic data generation can shape future AI personas to be broadly compassionate and morally thoughtful.
As AI capabilities and autonomy grow rapidly relative to humans, these AIs will increasingly reshape the world. AGI and superintelligence that has not internalized desirable goals would be a disaster.
We have already found evidence that Synthetic Document Finetuning can shift an LLM to be more robustly compassionate and open-minded towards non-humans (animals and digital minds), and that remains after typical fine-tuning.
Fine-tuning can train models to adopt specific personas, but these often contain flaws—perpetrating harm when prompted certain ways or mimicking problematic AI behaviors.
We use Synthetic Document Finetuning to shape AI behaviors more carefully, researching better data generation methods to reduce these mistakes.
We test whether models genuinely internalize compassion through custom benchmarks, including animal welfare assessments and human compassion and deception benchmarks.
We also evaluate moral open-mindedness—treating ethics as complex while avoiding both awful outcomes and decision paralysis.
See our Principles section for training details.
Thank you to the Survival and Flourishing Fund (SFF), Longview Philanthropy, Marcus Abramovitch, Ryan Kidd, Macroscopic Ventures, Simon Newstead and two anonymous donor for your support! This has helped pay our salaries, pay for compute and enabled us to keep researching how to improve AI values.
We are grateful to SFF for providing additional matching funding for future donors.
We are grateful to the Sentient Futures community for their support, especially in creating the AHB 2.0 benchmark.
We are also grateful to our volunteers for their support in accelerating our project.
CaML extends sincere thanks to Strong Compute for their invaluable support in the form of donated compute time, enabling us to advance our alignment research.