Enhancing Trust in Large Language Models: The Quest for Calibrated Uncertainties
Large language models (LLMs) have revolutionized natural language processing, but they still face a significant challenge: accurately representing uncertainty about the correctness of their outputs. This issue is critical in high-stakes applications such as healthcare, where misplaced confidence in a wrong answer can lead to dangerous outcomes.
The task is further complicated by linguistic variation in free-form generation, which cannot be exhaustively accounted for during training. LLM practitioners must also navigate the split between black-box and white-box estimation methods: the former have gained popularity because many models are available only through restricted APIs, while the latter have become more accessible as open-source models have matured.
Existing attempts to address this challenge have explored various approaches. Some methods rely on the distribution an LLM naturally expresses over possible outcomes, using predicted token probabilities to score multiple-choice answers. These probabilities become less reliable for sentence-length answers, however, because probability mass must be spread over many equivalent phrasings. Other approaches use prompting to elicit uncertainty estimates, capitalizing on the notions of “correctness” and probability that LLMs pick up during training. Linear probes have also been trained on a model’s hidden representations to classify whether its answers are correct.
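To make the multiple-choice case concrete, here is a minimal sketch (not the authors’ code; the model name, prompt format, and answer letters are placeholder assumptions) that reads the next-token probabilities a Hugging Face causal LM assigns to each answer option and treats them as confidences:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder open-source model; any causal LM that exposes logits works the same way.
model_name = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Q: What is the capital of France?\nChoices: (A) Paris (B) Lyon (C) Nice\nAnswer:"
choices = ["A", "B", "C"]

with torch.no_grad():
    inputs = tokenizer(prompt, return_tensors="pt")
    next_token_logits = model(**inputs).logits[0, -1]   # logits for the token after "Answer:"
    probs = torch.softmax(next_token_logits, dim=-1)

# Confidence in each option is the probability mass on its answer letter.
# Note: whether the letter token carries a leading space depends on the tokenizer;
# adjust the encoding of the choices accordingly.
choice_ids = [tokenizer.encode(c, add_special_tokens=False)[0] for c in choices]
confidences = {c: probs[i].item() for c, i in zip(choices, choice_ids)}
print(confidences)
```

For sentence-length answers this recipe degrades for exactly the reason noted above: the same answer can be phrased many ways, and each phrasing receives only a slice of the probability mass.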
Despite these efforts, black-box methods often fail to generate useful uncertainties for popular open-source models, necessitating careful fine-tuning interventions.
Calibrating uncertainties in large language models
To advance the debate on necessary interventions for good calibration, researchers from New York University, Abacus AI, and Cambridge University have conducted a deep investigation into the uncertainty calibration of LLMs. They propose fine-tuning for better uncertainties, which provides faster and more reliable estimates while using relatively few additional parameters. This method shows promise in generalizing to new question types and tasks beyond the fine-tuning dataset.
Teaching language models to recognize what they don't know
The approach uses a calibration dataset to teach the model to recognize what it doesn’t know, explores which parameterizations are effective, and determines how much data is required for good generalization.
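As a rough sketch of what such a fine-tuning intervention could look like, the snippet below trains a small LoRA adapter so that the model’s stated confidence matches graded correctness on the calibration data. It is an assumption-laden illustration rather than the authors’ recipe: the use of the `peft` library, the “True/False” prompt format, and the contents of `calibration_set` are all hypothetical choices.

```python
import torch
from torch.nn.functional import binary_cross_entropy_with_logits
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Llama-2-7b-hf"                      # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_name)
base_model = AutoModelForCausalLM.from_pretrained(model_name)
# A LoRA adapter keeps the number of trainable parameters small.
model = get_peft_model(base_model, LoraConfig(task_type="CAUSAL_LM", r=8, lora_alpha=16))

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
true_id = tokenizer.encode("True", add_special_tokens=False)[0]
false_id = tokenizer.encode("False", add_special_tokens=False)[0]

# Toy stand-in for the calibration dataset: questions, the model's own answers,
# and 0/1 labels for whether each answer was graded as correct (assumed format).
calibration_set = [
    {"question": "What is the capital of France?", "answer": "Paris", "label": 1},
    {"question": "Who wrote 'Dune'?", "answer": "Isaac Asimov", "label": 0},
]

for example in calibration_set:
    prompt = (
        f"Q: {example['question']}\n"
        f"Proposed answer: {example['answer']}\n"
        "Is the proposed answer correct? "
    )
    ids = tokenizer(prompt, return_tensors="pt")
    logits = model(**ids).logits[0, -1]
    # Confidence logit = the model's preference for "True" over "False".
    confidence_logit = logits[true_id] - logits[false_id]
    loss = binary_cross_entropy_with_logits(
        confidence_logit, torch.tensor(float(example["label"]))
    )
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

At inference time, the sigmoid of the same “True”-versus-“False” logit serves as the model’s confidence, produced in a single forward pass.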
Evaluating the quality of black-box uncertainty estimates
The quality of black-box uncertainty estimates produced by open-source models was examined against accuracy, using models such as LLaMA-2, Mistral, and LLaMA-3. The results show that fine-tuning for uncertainties significantly outperforms commonly used baselines.
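One standard way to score such estimates, sketched below with toy inputs (the confidence and correctness arrays are illustrative placeholders, not results from the paper), is to combine expected calibration error with the AUROC of confidence as a predictor of correctness:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy within confidence bins."""
    confidences, correct = np.asarray(confidences), np.asarray(correct)
    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        mask = (confidences >= lo) & (confidences < hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece

# Toy values standing in for graded answers from a held-out test set.
confidences = [0.9, 0.6, 0.8, 0.3, 0.7]
correct     = [1,   0,   1,   0,   1]
print("ECE:  ", expected_calibration_error(confidences, correct))
print("AUROC:", roc_auc_score(correct, confidences))
```

Low ECE means stated confidences track empirical accuracy; high AUROC means the confidences separate correct from incorrect answers, which is what downstream selective use of the model requires.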
Using perplexity as a length-normalized metric
The proposed method focuses on black-box techniques for estimating a language model’s uncertainty, particularly those requiring only a single sample or forward pass. For open-ended generation, where answers are not limited to individual tokens or a prescribed set of options, the researchers use perplexity as a length-normalized measure of sequence likelihood. The approach also explores prompting methods as an alternative to sequence likelihood, introducing formats that recent work has built upon.
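As a minimal sketch of that length-normalized score (the model name and question-answer pair are placeholders, and this is an illustration rather than the authors’ exact implementation), one can average the per-token log-probabilities of the generated answer given the question and exponentiate to obtain perplexity:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-v0.1"    # placeholder open-source model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

question = "Q: Who wrote 'Pride and Prejudice'?\nA:"
answer = " Jane Austen wrote the novel."

q_ids = tokenizer(question, return_tensors="pt").input_ids
a_ids = tokenizer(answer, add_special_tokens=False, return_tensors="pt").input_ids
ids = torch.cat([q_ids, a_ids], dim=1)

with torch.no_grad():
    logits = model(ids).logits

# Log-probability of each token given everything before it.
log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
targets = ids[:, 1:]
token_logprobs = log_probs.gather(2, targets.unsqueeze(-1)).squeeze(-1)

# Keep only the answer tokens, then length-normalize by averaging.
answer_logprobs = token_logprobs[:, q_ids.shape[1] - 1:]
avg_logprob = answer_logprobs.mean()
perplexity = torch.exp(-avg_logprob)
print(f"avg log-prob: {avg_logprob.item():.3f}  perplexity: {perplexity.item():.3f}")
```

Lower perplexity (equivalently, higher average log-probability) is then read as higher confidence in the free-form answer, and the length normalization keeps longer answers from being penalized simply for containing more tokens.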
The future of calibrated uncertainties in large language models
In conclusion, fine-tuning for calibrated uncertainties is a promising approach to enhancing trust in large language models. By recognizing the limitations of current methods and exploring new techniques, researchers can improve the reliability and accuracy of LLMs in high-stakes applications.