

LearnAndRecord 2022-11-03

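Meta CEO Mark Zuckerberg recently announced that Meta has developed the first AI speech translation system built specifically for unwritten languages (such as Hokkien), and he personally demonstrated real-time translation between Hokkien and English.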


1. What does “daunting” mean?
2. How many languages like Hokkien are there in the world, without a standard or widely known writing system?
3. For speakers of unwritten languages, what is the challenge of communicating with speakers of a different language?


AI translates Hokkien, an unwritten language, for the first time

From: Facebook

Peng-Jen Chen is well aware of how language barriers can affect people’s ability to communicate.

Chen grew up in Taiwan speaking Mandarin, but his father, Sheng-Jiang Chen, a 70-year-old retired factory lead technician, hails from Southern Taiwan, where Hokkien is widely spoken. Though the two languages are related, they’re different enough that Chen’s father sometimes finds it tricky to conduct complex conversations in Mandarin. “I have always wished my father could communicate with everyone in Hokkien, which is the language he’s most comfortable speaking,” said Chen, a Meta AI researcher. “He understands Mandarin well but speaks more slowly when communicating about complex topics.”

But rather than simply worrying, Chen is doing something about the problem — he’s leading the development of new technology to translate between Hokkien and English.

This is a daunting task, because while languages like Mandarin, English, and Spanish are both written and spoken, Hokkien — which is widely spoken within the Chinese diaspora — is primarily oral. In fact, Chen and his team of researchers are among the first to use artificial intelligence (AI) to construct a translation system for languages like Hokkien that lack a formal or widely known writing system. While the initial stage of the project translates between English and Hokkien, researchers plan to allow the translation of more unwritten languages. It’s part of Meta’s ongoing effort to develop a Universal Speech Translator that will allow the translation of many languages in real time and could eventually help millions of people around the world like Chen’s father become more effective communicators.

“The ability to communicate with anyone in any language — that’s a superpower people have dreamed of forever, and AI is going to deliver that within our lifetimes,” said Meta Founder and CEO Mark Zuckerberg in an online presentation earlier this year.

Using computers to translate languages isn’t a new concept, but previous efforts have focused on written languages. Yet of the 7,000-plus living languages, over 40 percent are primarily oral and do not have a standard or widely known writing system like Hokkien.

AI translation

Building an AI speech translation system for Hokkien was no easy task. These tools are usually trained on large quantities of text. But for Hokkien, there is no widely known standard writing system. Furthermore, Hokkien is what’s known as an underresourced language, which means there isn’t much paired speech data available in comparison with, say, Spanish or English. Also, with few human English-to-Hokkien translators, it was difficult to collect and annotate data to train the model. 

To get around these problems, Meta researchers used text written in Mandarin, which is similar to Hokkien. The team also worked closely with Hokkien speakers to ensure that the translations were correct. “Our team first translated English or Hokkien speech to Mandarin text, and then translated it to Hokkien or English — both with human annotators and automatically,” said Meta researcher Juan Pino. “They then added the paired sentences to the data used to train the AI model.”

The researchers will make their model, code, and benchmark data freely available to allow others to build on their work. While the model is still a work in progress and can currently translate only one full sentence at a time, it’s a step toward a future where simultaneous translation between many languages is possible. 

Challenges of communication

Speakers of unwritten languages often face hurdles when trying to participate in online communities, said Laura Brown, a Meta researcher and linguistic anthropologist. Many of these speakers are not able to easily communicate in the digital realm because they are not used to writing in their language. 

“It can be a barrier to confidence, fluency, and authenticity,” Brown said. “We know at Meta that there are tons of people all over the world who have their interface set to English, who use English on our platforms — even though they are much more confident in other languages and writing systems. As soon as we give them the ability to do audio in their own language, their comfort and confidence in the digital space shoot way up.”

Communicating with speakers of a different language can be challenging for speakers of unwritten languages. It can be hard to recognize the units of sound in an unwritten language when it’s transcribed in a way meant to be understood as it’s heard. This complication often makes it harder to teach unwritten languages and can result in younger generations losing the ability to communicate in the language of their parents. 

Some languages without a standardized written form are at risk of dying out. Linguists are trying to preserve languages with a dwindling number of speakers by writing the languages down, but that can be challenging when they don’t have a conventional written form. Mexico’s National Institute of Indigenous Languages is one institution that is working to preserve the unwritten languages of Indigenous peoples by recording the vocabulary. 

The many possibilities of AI translation

Meta researchers believe AI could help solve many communication challenges for speakers of unwritten languages. Pino said that the new translation system could eventually make it easier to navigate the internet and communicate in different languages, whether virtually or in real life. 

For Chen, though, the goal of the new Hokkien translation system is more personal. “I just want my father to be able to speak to whomever he wants,” he said.

- ◆ -



AI translates Hokkien, an unwritten language, for the first time

From: Facebook

Peng-Jen Chen is well aware of how language barriers can affect people’s ability to communicate.

Paraphrase: Peng-Jen Chen knows very well how language barriers can affect people’s ability to communicate.


The Hokkien (/ˈhɒkiɛn/) variety of Chinese is a Southern Min language native to and originating from the Minnan region, where it is widely spoken in the south-eastern part of Fujian.

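According to Baidu Baike, Hokkien (Minnan) is said to have originated in the Yellow River and Luo River basins, moved to southern Fujian during the Western Jin, Tang, and Northern Song dynasties, and taken shape in Quanzhou, Fujian. Besides the Minnan region and Taiwan, it is now also spoken in northeastern Fujian, southeastern Zhejiang, the Chaoshan area of Guangdong (Jieyang, Shantou, Chaozhou), the Hailufeng area, western Guangdong (Zhanjiang, Maoming, Yangjiang), the Guangdong-Hong Kong-Macao Greater Bay Area (Zhongshan, Hong Kong), Hainan Island, and most Chinese communities in Southeast Asia. More than 70 million people worldwide speak Hokkien.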

Chen grew up in Taiwan speaking Mandarin, but his father, Sheng-Jiang Chen, a 70-year-old retired factory lead technician, hails from Southern Taiwan, where Hokkien is widely spoken. Though the two languages are related, they’re different enough that Chen’s father sometimes finds it tricky to conduct complex conversations in Mandarin. “I have always wished my father could communicate with everyone in Hokkien, which is the language he’s most comfortable speaking,” said Chen, a Meta AI researcher. “He understands Mandarin well but speaks more slowly when communicating about complex topics.”

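Paraphrase: Peng-Jen Chen grew up in Taiwan speaking Mandarin, but his father, Sheng-Jiang Chen, a 70-year-old retired lead technician at a factory, comes from southern Taiwan, where Hokkien is widely spoken. The two languages are related, yet they differ so much that Chen’s father sometimes struggles to hold complex conversations in Mandarin. “I have always hoped my father could talk with everyone in Hokkien, the language he is most at ease speaking,” said Chen, an AI researcher at Meta. “He understands Mandarin well but speaks more slowly when discussing complex topics.”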


Mandarin means “the official standard form of Chinese (also called Putonghua or Guoyu)”; English definition: “a Chinese language that is the official language of China, and an official language of Singapore”

hail from somewhere

It means “to come from; to be born in”; English definition: “to come from or have been born in a particular place”. For example:

His father hailed from Italy.



tricky means “hard to handle; awkward to deal with”; English definition: “If a piece of work or problem is tricky, it is difficult to deal with and needs careful attention or skill.” For example:

I'm in a tricky situation - whatever I do I'll offend someone.


But rather than simply worrying, Chen is doing something about the problem — he’s leading the development of new technology to translate between Hokkien and English.


This is a daunting task, because while languages like Mandarin, English, and Spanish are both written and spoken, Hokkien — which is widely spoken within the Chinese diaspora — is primarily oral. In fact, Chen and his team of researchers are among the first to use artificial intelligence (AI) to construct a translation system for languages like Hokkien that lack a formal or widely known writing system. While the initial stage of the project translates between English and Hokkien, researchers plan to allow the translation of more unwritten languages. It’s part of Meta’s ongoing effort to develop a Universal Speech Translator that will allow the translation of many languages in real time and could eventually help millions of people around the world like Chen’s father become more effective communicators.



daunting /ˈdɔːntɪŋ/ means “discouraging; frightening; intimidating”; English definition: “Something that is daunting makes you feel slightly afraid or worried about dealing with it.” For example:

He and his wife Jane were faced with the daunting task of restoring the gardens to their former splendour.


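📺 A line from the British series Downton Abbey: “and those standards can at first seem daunting,” meaning that the rules feel intimidating at first.

📺 A line from the American series Breaking Bad: “Just the idea of owning a car wash seems daunting,” that is, the very idea of taking over a car wash feels overwhelming.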


diaspora /daɪˈæs.pər.ə/ means “the dispersal or mass migration (of a country’s people to other countries)”; English definition: “the spreading of people from one original country to other countries”

“The ability to communicate with anyone in any language — that’s a superpower people have dreamed of forever, and AI is going to deliver that within our lifetimes,” said Meta Founder and CEO Mark Zuckerberg in an online presentation earlier this year.

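Paraphrase: In an online talk earlier this year, Meta founder and CEO Mark Zuckerberg said that being able to communicate with anyone in any language is a superpower people have long dreamed of, and that AI will make it real within our lifetimes.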

Using computers to translate languages isn’t a new concept, but previous efforts have focused on written languages. Yet of the 7,000-plus living languages, over 40 percent are primarily oral and do not have a standard or widely known writing system like Hokkien.


AI translation

Building an AI speech translation system for Hokkien was no easy task. These tools are usually trained on large quantities of text. But for Hokkien, there is no widely known standard writing system. Furthermore, Hokkien is what’s known as an underresourced language, which means there isn’t much paired speech data available in comparison with, say, Spanish or English. Also, with few human English-to-Hokkien translators, it was difficult to collect and annotate data to train the model. 



annotate means “to add notes to; to label”; English definition: “If you annotate written work or a diagram, you add notes to it, especially in order to explain it.” For example:

Historians annotate, check and interpret the diary selections. 


To get around these problems, Meta researchers used text written in Mandarin, which is similar to Hokkien. The team also worked closely with Hokkien speakers to ensure that the translations were correct. “Our team first translated English or Hokkien speech to Mandarin text, and then translated it to Hokkien or English — both with human annotators and automatically,” said Meta researcher Juan Pino. “They then added the paired sentences to the data used to train the AI model.”

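Paraphrase: To solve these problems, Meta researchers used Mandarin text, since Mandarin is similar to Hokkien. The team also worked closely with Hokkien speakers to make sure the translations were correct. According to Meta researcher Juan Pino, the team first translated English or Hokkien speech into Mandarin text and then translated that text into Hokkien or English, using both human annotators and automatic methods. The resulting sentence pairs were then added to the data used to train the AI model.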
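
The two-step route Pino describes (speech to Mandarin text, then Mandarin text to the target language, with the resulting pairs added to the training data) can be sketched in a few lines of Python. This is only a toy illustration and not Meta’s implementation: the function names, the `Pair` class, and the lookup tables are all hypothetical, and plain strings with placeholder tags stand in for real speech and real translations.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# A "translator" here is any function from one string to another.
Translator = Callable[[str], str]


@dataclass(frozen=True)
class Pair:
    """One training example: a source utterance and its translation."""
    source: str
    target: str


def pivot_translate(utterance: str, to_pivot: Translator, from_pivot: Translator) -> str:
    """Translate in two hops through an intermediate (pivot) text."""
    pivot_text = to_pivot(utterance)   # e.g. English speech -> Mandarin text
    return from_pivot(pivot_text)      # e.g. Mandarin text -> Hokkien


def build_pairs(utterances: Iterable[str], to_pivot: Translator,
                from_pivot: Translator) -> List[Pair]:
    """Create paired sentences that could be added to a training set."""
    return [Pair(u, pivot_translate(u, to_pivot, from_pivot)) for u in utterances]


# Hypothetical lookup tables; the tagged strings are placeholders,
# not real Mandarin or Hokkien.
EN_TO_ZH = {"hello": "ZH<hello>", "thank you": "ZH<thank you>"}
ZH_TO_HOK = {"ZH<hello>": "HOK<hello>", "ZH<thank you>": "HOK<thank you>"}


def en_to_mandarin_text(utterance: str) -> str:
    return EN_TO_ZH[utterance]


def mandarin_text_to_hokkien(text: str) -> str:
    return ZH_TO_HOK[text]


training_data = build_pairs(["hello", "thank you"],
                            en_to_mandarin_text, mandarin_text_to_hokkien)
for pair in training_data:
    print(pair.source, "->", pair.target)
```

The point of the pivot is that each hop can rely on resources that already exist for Mandarin, even when direct English-to-Hokkien data is scarce.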

The researchers will make their model, code, and benchmark data freely available to allow others to build on their work. While the model is still a work in progress and can currently translate only one full sentence at a time, it’s a step toward a future where simultaneous translation between many languages is possible. 



benchmark means “a reference standard”; English definition: “something which can be measured and used as a standard that other things can be compared with”.


simultaneous /ˌsɪm.əlˈteɪ.ni.əs/ means “occurring at the same time”; English definition: “happening or being done at exactly the same time”. For example:

There were several simultaneous explosions in different cities.


Challenges of communication

Speakers of unwritten languages often face hurdles when trying to participate in online communities, said Laura Brown, a Meta researcher and linguistic anthropologist. Many of these speakers are not able to easily communicate in the digital realm because they are not used to writing in their language. 

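Paraphrase: Laura Brown, a Meta researcher and linguistic anthropologist, said that speakers of unwritten languages often run into obstacles when they try to take part in online communities. Many of them cannot communicate easily in the digital world because they are not used to writing in their own language.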


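hurdle is used here as a noun meaning “a difficulty; an obstacle”; English definition: “a problem or difficulty that must be solved or dealt with before you can achieve sth.”

Another common meaning is the barrier that runners jump in a race; English definition: “each of a series of vertical frames that a person or horse jumps over in a race”. The plural form, hurdles, refers to the race itself, as in: the 400-metre hurdles.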


linguistic means “relating to language or to linguistics”; English definition: “connected with language or the scientific study of language”, as in: linguistic and cultural barriers.


anthropologist /ˌænθrəˈpɒlədʒɪst/ means “a scholar of anthropology”; English definition: “a person who studies anthropology”

📍anthropology /ˌænθrəˈpɒlədʒɪ/: the study of the human race, especially of its origins, development, customs and beliefs


realm /rɛlm/ 1) means “a field; a domain”; English definition: “an area of activity, interest, or knowledge”. For example:

At the end of the speech he seemed to be moving into the realms of fantasy.


2) means “a kingdom” (a country ruled by a king or queen)

📍beyond the realm of possibility means “out of range; impossible” (not possible). The opposite, within the realm of possibility, means “within the range of what is possible” (possible). For example:

A successful outcome is not beyond the realms of possibility.


🎬 A line from the film Avengers: Age of Ultron: “In every realm, there's a reflection.” Here realm is used in the sense of a kingdom or world.

“It can be a barrier to confidence, fluency, and authenticity,” Brown said. “We know at Meta that there are tons of people all over the world who have their interface set to English, who use English on our platforms — even though they are much more confident in other languages and writing systems. As soon as we give them the ability to do audio in their own language, their comfort and confidence in the digital space shoot way up.”



authenticity /ˌɔː.θenˈtɪs.ə.ti/ means “genuineness; truthfulness; reliability”; English definition: “the quality of being genuine or true.” For example:

The authenticity of her story is beyond doubt.



interface /ˈɪn.tə.feɪs/ means “a point of connection; a user interface”; English definition: “a connection between two pieces of electronic equipment, or between a person and a computer”. For example:

My computer has a network interface, which allows me to get to other computers.


shoot up

shoot /ʃuːt/ (in shoot up) means “to grow fast; to increase sharply; to rise quickly”; English definition: “to grow in size, or increase in number or level, very quickly”. For example:

He has really shot up since I saw him last.



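way, as an adverb, is often used with a preposition or another adverb and means “very far; by a large amount; excessively” (especially to stress degree or distance in time or space); English definition: “used to emphasize degree or separation, especially in space or time; very far; by a large amount”. For example: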

She finished the race way ahead of the other runners.


He spends way too much money on clothes.


Communicating with speakers of a different language can be challenging for speakers of unwritten languages. It can be hard to recognize the units of sound in an unwritten language when it’s transcribed in a way meant to be understood as it’s heard. This complication often makes it harder to teach unwritten languages and can result in younger generations losing the ability to communicate in the language of their parents.



transcribe means “to convert (into another written form)”; English definition: “to change a piece of writing or music into another form, for example into a different writing system or into music for different instruments”.

Some languages without a standardized written form are at risk of dying out. Linguists are trying to preserve languages with a dwindling number of speakers by writing the languages down, but that can be challenging when they don’t have a conventional written form. Mexico’s National Institute of Indigenous Languages is one institution that is working to preserve the unwritten languages of Indigenous peoples by recording the vocabulary. 



standardize /ˈstæn.də.daɪz/ means “to make standard; to bring into line with a standard”; English definition: “to make things of the same type all have the same basic features”. For example:

We standardize parts such as rear-view mirrors, so that one type will fit any model of car we make.


die out

It means “to gradually disappear; to become extinct”; English definition: “to become less common and finally stop existing”. For example:

Dinosaurs died out millions of years ago.



preserve means “to protect; to maintain; to keep intact”; English definition: “to keep something as it is, especially in order to prevent it from decaying or being damaged or destroyed”, as in: to preserve the environment.


dwindle /ˈdwɪndəl/ means “to shrink or decrease little by little” (to gradually become less and less or smaller and smaller). For example:

The factory's workforce has dwindled from over 1,000 to a few hundred.



📍fall means “to drop; to decline” (to go down to a lower level, amount, price etc, especially a much lower one)

📍slide means “to slip lower” (if prices, amounts, rates etc slide, they become lower)

📍diminish means “to reduce; to lessen” (to become or make something become smaller or less)

📍dip means “to go down briefly”; English definition: “if an amount or level dips, it becomes less, usually for just a short time”, as in: Profits dipped slightly last year.


conventional means “traditional; customary; ordinary”; English definition: “traditional and ordinary”, as in: conventional behaviour/attitudes/clothes.


indigenous /ɪnˈdɪdʒɪnəs/ means “native; originating in a place”; English definition: “indigenous people or things have always been in the place where they are, rather than being brought there from somewhere else”.

🎬 A line from the film Avatar: “We have an indigenous population of humanoids called the Na'vi.”

The many possibilities of AI translation

Meta researchers believe AI could help solve many communication challenges for speakers of unwritten languages. Pino said that the new translation system could eventually make it easier to navigate the internet and communicate in different languages, whether virtually or in real life. 



navigate /ˈnæv.ɪ.ɡeɪt/ 1) means “to browse or find one’s way around (a website)”; English definition: “to move around a website or computer screen, or between websites or screens”. For example:

Their website is fairly plain, but very easy to navigate.


2) means “to steer; to find the way”; English definition: “to direct the way that a ship, aircraft, etc. will travel, or to find a direction across, along, or over an area of water or land, often by using a map”. For example:

There weren't any road signs to help us navigate through the maze of one-way streets.


For Chen, though, the goal of the new Hokkien translation system is more personal. “I just want my father to be able to speak to whomever he wants,” he said. 


- Today’s recap -

hail from somewhere
shoot up
die out














- END -






