The Rise of AI in Interpreting: A New Era of Quality Evaluation
In the complex world of simultaneous interpreting, evaluating quality has always been a challenge. The nuances of real-time multilingual communication, coupled with the layered strategies adopted by interpreters, make it a daunting task. However, a recent study by researchers Xiaoman Wang and Claudio Fantinuoli has opened up new possibilities for measuring interpreting quality with the assistance of artificial intelligence (AI).
The research explores the correlation between automated metrics and expert evaluations of both human simultaneous interpreting and AI speech translation. The study finds that prompting OpenAI’s GPT-3.5 to assess the quality of translated speech approximates expert judgments, with the model’s scores aligning positively with human scores across the evaluation methods tested.
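To make the idea more concrete, the sketch below shows one way such an LLM-based quality assessment could be wired up in Python: each translated segment is scored by the model, and the automatic scores are then compared with expert ratings. The prompt wording, the 0–100 scale, the gpt-3.5-turbo model name, and the sample segments and human scores are illustrative assumptions, not the setup reported in the study.

```python
# Illustrative sketch only: an "LLM as judge" pipeline for translated speech.
# The prompt wording, 0-100 scale, model name, and sample data are assumptions
# made for this example, not the prompt or protocol used in the study.
from openai import OpenAI
from scipy.stats import spearmanr

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def llm_quality_score(source: str, translation: str) -> float:
    """Ask the model to rate a translated segment on a 0-100 quality scale."""
    prompt = (
        "Rate the quality of the following translation on a scale from 0 "
        "(unusable) to 100 (perfect), considering accuracy and fluency. "
        "Reply with a single number only.\n\n"
        f"Source: {source}\nTranslation: {translation}"
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    # A production version would parse the reply more defensively.
    return float(response.choices[0].message.content.strip())


# Toy comparison against placeholder expert ratings for the same segments.
segments = [
    ("Buenos días a todos.", "Good morning, everyone."),
    ("El plazo vence mañana.", "The deadline is tomorrow."),
    ("Gracias por su atención.", "Thank you for your attention."),
]
llm_scores = [llm_quality_score(src, tgt) for src, tgt in segments]
human_scores = [95, 88, 92]  # placeholder expert scores, not real data
rho, _ = spearmanr(llm_scores, human_scores)
print(f"Spearman correlation with human judgments: {rho:.2f}")
```

In practice, the value of such a pipeline is judged much as in the snippet’s final step: by how closely the automatic scores track expert judgments across many segments, for example via rank correlation.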
The potential applications of AI-enabled quality evaluation are vast. Interpreters can use AI feedback for continuing professional development, making adjustments that enhance their overall performance. Interpreter trainers and students can use automated quality evaluation as an additional resource for analyzing interpreting processes in the classroom. Moreover, designers of speech translation systems can use automatic evaluations to streamline the assessment of their technology, accelerating development cycles.
While the integration of AI-enabled quality evaluation offers new resources and perspectives, it is essential to acknowledge its limitations. The technology still provides only approximate, constrained estimates of quality and requires expert guidance to compensate for its shortcomings.
Furthermore, stakeholders contracting or evaluating interpreting services cannot treat it as a standalone solution for consistently and objectively measuring, examining, or monitoring interpreting quality. Given the study’s limited scope, language coverage, and range of domains, its findings can be generalized only with great caution.
Nevertheless, the study marks an exciting step forward in the development of AI-powered evaluation tools. As the authors note, “before these metrics can be used in production, more research needs to be conducted.” The future of interpreting quality evaluation looks promising, and it will be exciting to see how AI continues to shape this landscape.