
MIT: Enabling AI to explain its predictions in plain language

Using LLMs to convert machine-learning explanations into readable narratives could help users make better decisions about when to trust a model.


Machine-learning models can make mistakes and be difficult to use, so scientists have developed explanation methods to help users understand when and how they should trust a model’s predictions.


These explanations are often complex, however, perhaps containing information about hundreds of model features. And they are sometimes presented as multifaceted visualizations that can be difficult for users who lack machine-learning expertise to fully comprehend.


To help people make sense of AI explanations, MIT researchers used large language models (LLMs) to transform plot-based explanations into plain language.


They developed a two-part system that converts a machine-learning explanation into a paragraph of human-readable text and then automatically evaluates the quality of the narrative, so an end-user knows whether to trust it.


By prompting the system with a few example explanations, the researchers can customize its narrative descriptions to meet the preferences of users or the requirements of specific applications.


In the long run, the researchers hope to build upon this technique by enabling users to ask a model follow-up questions about how it came up with predictions in real-world settings.

“Our goal with this research was to take the first step toward allowing users to have full-blown conversations with machine-learning models about the reasons they made certain predictions, so they can make better decisions about whether to listen to the model,” says Alexandra Zytek, an electrical engineering and computer science (EECS) graduate student and lead author of a paper on this technique.


She is joined on the paper by Sara Pido, an MIT postdoc; Sarah Alnegheimish, an EECS graduate student; Laure Berti-Équille, a research director at the French National Research Institute for Sustainable Development; and senior author Kalyan Veeramachaneni, a principal research scientist in the Laboratory for Information and Decision Systems. The research will be presented at the IEEE Big Data Conference.


Elucidating explanations


The researchers focused on a popular type of machine-learning explanation called SHAP. In a SHAP explanation, a value is assigned to every feature the model uses to make a prediction. For instance, if a model predicts house prices, one feature might be the location of the house. Location would be assigned a positive or negative value that represents how much that feature modified the model’s overall prediction.
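As a concrete illustration of the idea (not code from the paper), here is a minimal sketch using the open-source shap package with a toy scikit-learn house-price model; the feature names, data, and model choice are invented for the example.

```python
# Minimal sketch of a SHAP explanation for a toy house-price model.
# Assumes numpy, pandas, scikit-learn, and the shap package are installed;
# the features and data below are illustrative, not from the paper.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestRegressor

# Toy training data: three features, including a numeric "location" score.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "location_score": rng.random(200),
    "square_feet": rng.integers(500, 4000, 200),
    "year_built": rng.integers(1950, 2020, 200),
})
y = 50_000 + 200 * X["square_feet"] + 100_000 * X["location_score"] + rng.normal(0, 10_000, 200)

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Each SHAP value says how much a feature pushed this one prediction
# above or below the model's average output.
explainer = shap.Explainer(model, X)
explanation = explainer(X.iloc[:1])
for name, value in zip(X.columns, explanation.values[0]):
    print(f"{name}: {value:+.0f}")
```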


Often, SHAP explanations are presented as bar plots that show which features are most or least important. But for a model with more than 100 features, that bar plot quickly becomes unwieldy.
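Continuing the sketch above (it reuses the `shap` import and the `explanation` object computed there), shap's built-in bar plot illustrates the problem: it shows only the top features and collapses everything else into a single "sum of other features" bar.

```python
# Local bar plot for one prediction; only the top 10 features are drawn,
# so the remaining features are hidden behind an aggregate bar.
shap.plots.bar(explanation[0], max_display=10)
```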


“As researchers, we have to make a lot of choices about what we are going to present visually. If we choose to show only the top 10, people might wonder what happened to another feature that isn’t in the plot. Using natural language unburdens us from having to make those choices,” Veeramachaneni says.


However, rather than asking a large language model to generate an explanation from scratch, the researchers use the LLM to translate an existing SHAP explanation into a readable narrative.


Having the LLM handle only the natural-language part of the process limits the opportunity to introduce inaccuracies into the explanation, Zytek explains.

Their system, called EXPLINGO, is divided into two pieces that work together.


The first component, called NARRATOR, uses an LLM to create narrative descriptions of SHAP explanations that meet user preferences. Given three to five written examples of narrative explanations up front, the LLM mimics that style when generating text.
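To make the few-shot idea concrete, here is a hedged sketch of how such a component might be prompted: a handful of hand-written narratives plus a new SHAP explanation are passed to an LLM, which is asked to describe the new one in the same style. The prompt wording, the `narrate_shap` helper, the model name, and the OpenAI client setup are illustrative assumptions, not the paper's implementation.

```python
# Sketch of few-shot style transfer for SHAP narratives. Assumes the
# openai Python client (v1) and an OPENAI_API_KEY in the environment.
from openai import OpenAI

# Hand-written example pairing a SHAP explanation with a narrative,
# used to set the style the LLM should imitate.
EXAMPLE_NARRATIVES = """\
SHAP: location_score +42000, square_feet +18000, year_built -3000
Narrative: The predicted price is driven mostly by the desirable location,
with the home's size adding a smaller boost and its older construction
trimming the estimate slightly.
"""

def narrate_shap(shap_line: str, examples: str = EXAMPLE_NARRATIVES) -> str:
    """Ask the LLM to restyle a SHAP explanation as a short narrative."""
    client = OpenAI()
    prompt = (
        "Rewrite the SHAP explanation below as a short narrative, matching "
        "the style of the examples. Do not change any feature's direction "
        "or relative importance.\n\n"
        f"Examples:\n{examples}\n"
        f"SHAP: {shap_line}\nNarrative:"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(narrate_shap("location_score +55000, square_feet +9000, year_built -12000"))
```

Keeping the numeric SHAP values fixed and letting the LLM rephrase only the wording is what limits the room for the model to introduce factual errors about the explanation itself.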

