This page outlines our current and future projects at the Meaning Alignment Institute. For completed research, see our homepage. We also have a library of Related Academic Work and background reading, if you’d like to help with these projects and want to understand where we come from.
These are projects we are currently working on.
<aside> 🧪 Large-scale synthetic moral graph ⇒ fine-tuning a wise, open-source LLM
Funded!
Work in progress!
Project leads: Oliver Klingefjord, Joe Edelman, Ryan Lowe
</aside>
<aside> 🧪 Academic Paper Comparing ‘Moral Graph Elicitation’ to CCAI, RLHF, etc.
Draft available on request!
Oliver Klingefjord, Joe Edelman, Ryan Lowe
</aside>
<aside> 🧪 Eval for Multi-Agent Coordination Problems
We believe wise models can solve coordination problems by negotiating their way from win-lose situations to win-win solutions. We want to build an eval suite for models’ ability to solve multi-agent coordination problems that occur in a simulated setting. The model can also be asked to negotiate win-win solutions based on real-life difficult court cases, interpersonal conflicts, management conundrums, peace negotiations, governance issues, etc.
Not funded yet!
Full proposal available on request
</aside>
<aside> 🧪 Scalable Supervision of AI Morality
We will develop AI systems capable of evaluating the moral reasoning of stronger AI systems. We will also create an evaluation suite for moral reasoning in AI, focusing on whether a stronger AI can generate and justify, in complex situations, new moral values that a weaker AI can understand and endorse.
Not funded yet!
Full proposal available on request
</aside>
We would love to collaborate on these projects, but are currently not seeking funding for them.
<aside> 🧪 How Do Existing LLMs Represent Values?
AI Interpretability
There is evidence that existing models represent aesthetic and moral values in their intermediate layers, and that ‘steering vectors’ can be mixed into these intermediate layers to affect which values or concepts are used. Building on this work and using our own representation of values, we will test whether we can verify which values were used in outputting a particular token, and which values were learned in training.
More info on request!
</aside>
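As a rough illustration of the ‘steering vector’ idea mentioned above: a direction in activation space is added, scaled by some strength, to an intermediate-layer hidden state. The sketch below uses plain Python lists in place of real model activations. The names `steer`, `projection`, and `value_direction`, along with the numbers, are hypothetical and chosen only for illustration; this is not our method or any particular library's API.

```python
def steer(hidden, direction, strength):
    """Return hidden + strength * direction, computed elementwise."""
    if len(hidden) != len(direction):
        raise ValueError("hidden state and steering vector must have the same size")
    return [h + strength * d for h, d in zip(hidden, direction)]


def projection(hidden, direction):
    """Scalar projection of hidden onto direction.

    A crude proxy for how strongly the direction is expressed in the state.
    """
    norm = sum(d * d for d in direction) ** 0.5
    return sum(h * d for h, d in zip(hidden, direction)) / norm


# Toy 3-dimensional "hidden state" and a hypothetical value direction.
hidden = [0.5, -1.0, 2.0]
value_direction = [1.0, 0.0, 0.0]  # assumption: a unit vector found by some probing method

steered = steer(hidden, value_direction, strength=2.0)
before = projection(hidden, value_direction)
after = projection(steered, value_direction)
```

In a real model the same addition would be applied to a layer's activations during the forward pass. Reading the projection before and after is one simple way to check that the intervention moved the state along the intended direction.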
<aside> 🧪 Values-based AGI
We believe there are non-transformer architectures which would be better at having, learning, improving, and acting from values. There are also interventions in transformer architectures worth exploring, such as locking attention layers to “values-space”, or encoding values as steering vectors.
More info on request!
</aside>
These are projects we plan to begin as soon as we raise funds.
<aside> 🧪 Meaning Economies and Coordination
Building on our work using explicit representations of individual values to replace democratic systems, we believe it is also possible to replace markets with values-explicit systems. If we understand a market as a social choice mechanism that aggregates individual preferences and outputs a collective allocation of resources, or a matching of producers and consumers, we can imagine a meaning market as a mechanism that aggregates individuals’ values or sources of meaning, and outputs allocations or matchings.
We’d like to build such a mechanism and test it with a population of around 300.
Not funded yet!
Full proposal available on request
</aside>
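To make the "aggregate values, output a matching" framing of a meaning market concrete, here is a toy sketch. It assumes each participant and each offering is described by a set of value labels, and it greedily pairs them by how many labels they share. The function name, the data, and the greedy rule are all hypothetical stand-ins for illustration, not the mechanism we propose to build.

```python
def match_by_shared_values(participants, offerings):
    """Greedy one-to-one matching on the number of shared value labels.

    participants: dict mapping participant name -> set of value labels
    offerings:    dict mapping offering name -> set of value labels it serves
    Returns a dict mapping participant name -> offering name.
    """
    # Score every participant/offering pair by the size of the overlap.
    scored = [
        (len(p_vals & o_vals), p, o)
        for p, p_vals in participants.items()
        for o, o_vals in offerings.items()
    ]
    # Highest overlap first; ties broken alphabetically for determinism.
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))

    matching, used_offerings = {}, set()
    for score, p, o in scored:
        if score == 0:
            break  # remaining pairs share no values
        if p in matching or o in used_offerings:
            continue
        matching[p] = o
        used_offerings.add(o)
    return matching


participants = {
    "ana": {"craft", "solitude"},
    "ben": {"community", "craft"},
}
offerings = {
    "workshop": {"craft", "community"},
    "cabin": {"solitude"},
}
result = match_by_shared_values(participants, offerings)
```

Here `ben` is matched to `workshop` (two shared values) and `ana` to `cabin` (one shared value). A real mechanism would need a much richer representation of values than flat labels, and a matching rule with better guarantees than this greedy pass.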
<aside> 🏛️ Democratic Applications of Moral Graphs
We are looking to support teams in using moral graphs to resolve contentious or ideological issues through a democratic process. We have written a proposal to make a moral graph for the debates over San Francisco housing policy. We are also talking with groups who want to use moral graphs for peacebuilding and in other contentious political situations.
Not funded yet!
Full proposal available on request
</aside>