This AI Can Automatically Decipher Lost Ancient Languages – InApps 2025

Key Summary

This article from InApps Technology, published in 2022 and authored by Phu Nguyen, describes a machine learning model developed at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) to help decipher lost ancient languages, including those written in scriptio continua (text without word dividers). Led by researcher Jiaming Luo, the work automates decipherment by matching word pairs between an ancient language and a known, related language (e.g., Gothic with Proto-Germanic, Ugaritic with Hebrew) based on regular sound correspondences. An embedding approach represents sounds as points in a multidimensional space, which lets the model detect regular changes (e.g., p to b), segment unsegmented text into words, and map those words to a known language. Applied to Iberian, the model corroborated that the language is very likely not related to Basque, nor to Germanic, Turkic, or Uralic languages, in line with other recent findings. While not as thorough as human analysis, it is much quicker and requires far less human effort, giving linguists a quick analysis tool. The team now aims to extend the model to handle multiple, potentially unrelated languages.

Context:
- Author: Phu Nguyen, summarizing research from MIT CSAIL.
- Theme: AI-driven decipherment of lost languages preserves cultural knowledge by automating the analysis of unsegmented ancient texts.
- Source: InApps article, based on a paper by Jiaming Luo and team.
Key Points:
- Significance of Lost Languages:
  - Languages reflect cultural worldviews; their extinction is a collective loss for humanity.
  - Many ancient languages use scriptio continua (no word spaces), complicating decipherment.
- AI Decipherment Model:
  - Purpose: Automates decoding of undeciphered languages using machine learning.
  - Mechanism: Matches word pairs between an unknown language and a known related language (e.g., Gothic to Proto-Germanic, Ugaritic to Hebrew).
  - Sound Correspondences: Identifies consistent patterns (e.g., p to b) to confirm linguistic relationships.
  - Embedding Framework: Represents language sounds in a multidimensional space, where pronunciation variations are distances, enabling word segmentation in unsegmented text.
- Case Study:
  - Tested on Iberian, corroborating that it is very likely not related to Basque, Germanic, Turkic, or Uralic languages, in line with other recent findings.
  - Uses already-deciphered languages with known relatives (Gothic, Ugaritic) as a ground-truth baseline to validate the model.
- Advantages:
  - Speed: Much faster than manual decipherment, requiring less human effort.
  - Utility: Provides linguists with a preliminary analysis tool for assessing language relationships.
- Limitations and Future Work:
  - Currently evaluates how related two languages are; the team aims to extend the model to multiple, potentially unrelated languages.
  - Less thorough than human analysis but valuable for quick insights.
- Impact:
  - Preserves cultural heritage by recovering lost languages.
  - Potential applications in linguistic research and historical analysis.
InApps Insight:
- InApps Technology, ranked 1st in Vietnam and 5th in Southeast Asia for app and software development, specializes in AI-driven solutions and machine learning applications, using React Native, ReactJS, Node.js, Vue.js, Microsoft’s Power Platform, Azure, Power Fx (low-code), Azure Durable Functions, and GraphQL APIs (e.g., Apollo).
- Offers outsourcing services for startups and enterprises, delivering cost-effective solutions at 30% of local vendor costs, supported by Vietnam’s 430,000 software developers and 1.03 million ICT professionals.
- Relevance: Expertise in AI and natural language processing aligns with developing tools like MIT’s decipherment model for cultural preservation or data analysis.
Call to Action:
- Contact InApps Technology at www.inapps.net or sales@inapps.net to develop AI-powered language processing tools or custom software solutions for innovative applications.

Finding Linguistic Cousins

Typically, cracking the code of an unknown language is easier when at least one related language is already known. For instance, experts were able to decipher Gothic, an extinct East Germanic language, thanks to its relatedness to known languages like Proto-Germanic, Old Norse and Old English. Inspired by this concept, the team developed their decipherment algorithm along similar lines; an earlier version was introduced in a previous paper the year before.

“Our machine learning model works by trying to match as many word pairs as possible, between the ancient language and some known one, while handling the uncertainty in segmentation,” explained Luo. “What exactly counts as a matched pair depends on their sound correspondences on the character level, and how regular these correspondences are. For instance, if you find many pairs with a consistent change like p to b, then you are fairly confident that these pairs are truly matched. Why does this work? Because historical linguistics tells us that language changes happen in regular and consistent ways. If two languages are truly related (for example, as Spanish and Italian are), then you would see these patterns emerge over and over again.”
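
To make the idea of regular correspondences concrete, here is a minimal Python sketch. It is not the team's model: the word pairs are invented, words are aligned position by position (a toy assumption), and the `regularity` score is a hypothetical stand-in for whatever the real system learns. It only illustrates why a change like p to b that recurs across many pairs makes each of those pairs more believable.

```python
from collections import Counter

def correspondence_counts(pairs):
    """Count how often each (unknown_char, known_char) mapping occurs."""
    counts = Counter()
    for unknown, known in pairs:
        if len(unknown) != len(known):
            continue  # toy assumption: only position-by-position alignment
        for u, k in zip(unknown, known):
            counts[(u, k)] += 1
    return counts

def regularity(pair, counts):
    """Average share of each character's observed mappings that agree with this pair.

    Only meaningful for pairs that were included when building `counts`.
    """
    unknown, known = pair
    totals = Counter()
    for (u, _), n in counts.items():
        totals[u] += n
    shares = [counts[(u, k)] / totals[u] for u, k in zip(unknown, known)]
    return sum(shares) / len(shares)

# Invented toy words, not real linguistic data.
candidates = [("pata", "bata"), ("pano", "bano"), ("tapa", "taba"), ("pata", "kita")]
counts = correspondence_counts(candidates)

# Pairs that follow the recurring p -> b pattern rank above the odd one out.
ranked = sorted(candidates, key=lambda p: regularity(p, counts), reverse=True)
for pair in ranked:
    print(pair, round(regularity(pair, counts), 3))
```

In this toy data the mapping p to b appears three times and p to k only once, so the pair ("pata", "kita") ends up with the lowest score.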

In addition to incorporating these linguistic tendencies, the model handles the uncertainties that come with unsegmented text by “embedding” the language sounds into an imaginary multidimensional space, where the variations in pronunciation are represented as distances between points in this space. With this framework, the model can detect patterns in the evolution of related languages, which allows it to segment undeciphered text into words and map those words to words in known, related languages.
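
The following Python sketch shows one way such a framework could look in miniature. Everything in it is an assumption made for illustration: the feature vectors in `EMBED` are hand-made (the actual model learns its embeddings), the vocabulary is invented, and `segment` is a simple dynamic program rather than the method from the paper. It treats sounds as points, measures pronunciation differences as distances, and picks the segmentation of an unspaced string whose words lie closest to a known vocabulary.

```python
import math

# Hypothetical hand-made vectors: (voiced, place of articulation, vowel-ness).
# Invented for illustration; a real model would learn these.
EMBED = {
    "p": (0.0, 0.0, 0.0), "b": (1.0, 0.0, 0.0),
    "t": (0.0, 1.0, 0.0), "d": (1.0, 1.0, 0.0),
    "k": (0.0, 2.0, 0.0), "g": (1.0, 2.0, 0.0),
    "a": (1.0, 1.0, 1.0), "o": (1.0, 2.0, 1.0),
}

def sound_distance(x, y):
    """Euclidean distance between two sounds in the embedding space."""
    return math.dist(EMBED[x], EMBED[y])

def word_distance(span, word):
    """Sum of per-sound distances; spans of a different length never match."""
    if len(span) != len(word):
        return math.inf
    return sum(sound_distance(x, y) for x, y in zip(span, word))

def segment(text, vocab, skip_cost=1.5):
    """Return (cost, segments) for the cheapest split of `text`.

    best[i] holds the cheapest way to explain text[:i]. Each step either
    leaves one character unmatched (paying skip_cost) or matches the span
    ending at i to a vocabulary word (paying their distance).
    """
    best = [(0.0, [])] + [(math.inf, [])] * len(text)
    for end in range(1, len(text) + 1):
        cost, segs = best[end - 1]
        options = [(cost + skip_cost, segs + [(text[end - 1], None)])]
        for word in vocab:
            start = end - len(word)
            if start < 0:
                continue
            cost, segs = best[start]
            d = word_distance(text[start:end], word)
            options.append((cost + d, segs + [(text[start:end], word)]))
        best[end] = min(options, key=lambda o: o[0])
    return best[-1]

# Invented unsegmented "inscription" and invented known-language words.
cost, segments = segment("patatoka", ["bada", "doga"])
print(cost, segments)
```

With these toy values, "pata" and "toka" differ from "bada" and "doga" only in voicing, so the cheapest split is pata | toka. The `skip_cost` parameter is a hypothetical knob: raising it pushes the program to match more of the text, lowering it lets more characters go unexplained.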

As outlined in the team’s paper, the known relatedness between an already-deciphered language and its relatives can serve as a kind of baseline, a “ground truth” for checking whether such AI-powered decipherment models actually work. In this study, the team validated the model on two deciphered languages with known relatives: Gothic and Ugaritic (a Semitic language somewhat similar to ancient Hebrew). They then applied it to an undeciphered language, Iberian. The results corroborate that Iberian is very likely not related to Basque, nor to other proposed candidates such as Germanic, Turkic, and Uralic languages, a conclusion that is supported by other recent findings.
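
One simple way to picture how matching can turn into a relatedness signal is a coverage score: the share of an unknown text that can be explained by near-matches to words of a candidate language. The Python sketch below is an illustrative assumption, not the metric from the paper; the `coverage` function, the mismatch threshold, and all the words are invented.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def coverage(text, vocab, max_mismatch=1):
    """Share of characters in `text` covered by near-matches to `vocab` words.

    Greedy left-to-right scan: at each position, take the first vocabulary
    word (longest first) that fits with at most `max_mismatch` differing
    characters, otherwise move on by one character.
    """
    covered, i = 0, 0
    words = sorted(vocab, key=len, reverse=True)
    while i < len(text):
        for word in words:
            span = text[i:i + len(word)]
            if len(span) == len(word) and hamming(span, word) <= max_mismatch:
                covered += len(word)
                i += len(word)
                break
        else:
            i += 1  # no word fits here; leave this character unexplained
    return covered / len(text) if text else 0.0

# Invented data: one unsegmented text and two candidate vocabularies.
text = "patatokapano"
related_vocab = ["bata", "toka", "bano"]
unrelated_vocab = ["miru", "sele"]

print(coverage(text, related_vocab))
print(coverage(text, unrelated_vocab))
```

A candidate language that explains most of the text scores near 1, while one that explains almost nothing scores near 0. A consistently low score across several candidates is the kind of evidence that would point away from a relationship.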

While the model appears to work well in evaluating how related two languages might be, the team now aims to extend it so that it can handle multiple, potentially unrelated languages. For now, the team hopes that the model can help automate what is usually a long, tedious process and take some of the guesswork out of it.

“Our work could be useful for linguists to get a quick analysis of the relationship between two languages, especially when one of them is unknown,” said Luo. “It is by no means as adequate or thorough as human analysis, but it’s much much quicker and requires much much less human effort.”

Source: InApps.net

Phu Nguyen

As a Senior Tech Enthusiast, I bring a decade of experience to the realm of tech writing, blending deep industry knowledge with a passion for storytelling. With expertise ranging from software development to emerging tech trends like AI and IoT, my articles not only inform but also inspire. My journey in tech writing has been marked by a commitment to accuracy, clarity, and engaging storytelling, making me a trusted voice in the tech community.