The Future of Language: How AI is Preserving Endangered Languages

The rise of AI is helping to preserve endangered languages, including those in Indonesia and Southeast Asia. Learn how large language models are being developed to understand and generate text in these languages.

In a world where technology is advancing rapidly, it's easy to overlook the importance of preserving our cultural heritage, including our languages. However, the rise of artificial intelligence (AI) is driving a new wave of innovation that is helping to preserve endangered languages.

Indonesia, for instance, is home to more than 700 languages, many of which are at risk of extinction. Researchers and linguists there are working together to develop large language models (LLMs) that can understand and generate text in these languages.

One such example is Komodo-7B, an LLM trained on Bahasa Indonesia and 11 regional languages, including Javanese, Balinese, and Sundanese. Its training data draws on Indonesian textbooks, among other sources, to improve diversity and accuracy.

But why is it so important to preserve these languages? For one, language is a crucial part of cultural identity, and losing a language means losing part of our heritage. Preserving languages also promotes linguistic diversity and makes the digital realm more accessible and inclusive.

“We are heading toward monolingualism due to globalization and modernization. We are working on revitalizing languages to keep them from extinction. AI technology and LLMs, I think, will help.” - Endang Aminudin Aziz, head of the language development agency at the Ministry of Education and Culture

The use of AI in language preservation is not limited to Indonesia. Across Southeast Asia, firms are developing multilingual LLMs that cater to the many languages spoken in the region. For instance, Singaporean startup Wiz.AI has launched an LLM for Bahasa Indonesia that is designed to capture the language's linguistic nuances and the region's cultural contexts.

The SEA-LION family of open-source LLMs, launched last year, is also trained on Bahasa Indonesia and other Southeast Asian languages. It, too, aims to promote linguistic diversity and make digital technology more accessible to the region's speakers.

“By preserving Bahasa Indonesia and its dialects, we promote linguistic diversity and enhance accessibility and inclusivity in the digital realm.” - Vikram Sinha, CEO of Indosat Ooredoo Hutchison

While the development of LLMs is a significant step forward for language preservation, it is not without challenges. One of the major hurdles is the scarcity of high-quality text in these languages, and such data is essential for training accurate LLMs.

Community efforts to digitize texts can help fill this gap by creating data for training LLMs. For instance, Antariksawan Jusuf, who helped publish a Bahasa Indonesia-Using dictionary, has set up a collective in Banyuwangi, East Java, to preserve the Using (also spelled Osing) language and culture.

“My hope is that the younger generation can learn Using from an early age, and won’t have to struggle as much as I did to find texts in the language. I hope AI technology and LLMs can take us to the next level.” - Antariksawan Jusuf

As we move forward in this digital age, it’s essential that we prioritize language preservation and promote linguistic diversity. With the help of AI and LLMs, we can ensure that our cultural heritage is preserved for generations to come.