TW201132108A - System and method for translating in communication immediately - Google Patents

System and method for translating in communication immediately

Info

Publication number
TW201132108A
Authority
TW
Taiwan
Prior art keywords
translated
language
speech
electronic device
sentence
Prior art date
Application number
TW99105999A
Other languages
Chinese (zh)
Inventor
de-hong Peng
Original Assignee
Chi Mei Comm Systems Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chi Mei Comm Systems Inc
Priority to TW99105999A
Publication of TW201132108A

Landscapes

  • Machine Translation (AREA)

Abstract

The invention provides a method for real-time translation of calls. The method includes: recording an identification number of an electronic device and a translation service type specified by the electronic device; obtaining the voice of another electronic device that is in a call with the registered electronic device; determining whether the obtained voice needs to be translated; and, if so, translating the speech to produce translated voice and sending the translated voice to the registered electronic device. The invention also provides a system for real-time translation of calls. The invention helps parties who speak different languages communicate easily.

Description

VI. Description of the Invention:
[Technical Field]
[0001] The present invention relates to a translation system and method, and in particular to a system and method for real-time translation of calls.
[Prior Art]
[0002] Today, when two parties talk over mobile phones or home telephones, a language barrier arises if the speaker speaks English while the listener understands only Chinese, or if the speaker speaks Japanese while the listener understands only English.
[0003] Real-time translation software for mobile phones is currently available. When Japanese speech is received, a phone with this software installed immediately displays the content of the call on its screen, and the corresponding English translation appears on the screen one second later. However, this function does not let the recipient hear the translated speech, and when the other party speaks quickly, the on-screen translation cannot be followed in real time.
[Summary of the Invention]
[0004] In view of the above, it is necessary to provide a real-time call translation system that translates the other party's language in real time through the speech recognition, translation, and synthesis technologies of a server, thereby facilitating communication.
[0005] It is also necessary to provide a real-time call translation method that translates the other party's language in real time through the speech recognition, translation, and synthesis technologies of a server, thereby facilitating communication.
[0006] A real-time call translation system runs on a server and includes: a recording module, configured to record, when an electronic device registers with the server under its own identification number to apply for a translation service, the identification number of the electronic device, the language to be translated, and the language to translate into; a capture module, configured to obtain the voice of another electronic device when the registered electronic device is in a call with that other electronic device, and further configured to retrieve, according to the recorded language to be translated, the acoustic model and the language model of that language from a speech model connected to the server; a conversion module, configured to convert the obtained voice of the other electronic device into corresponding sentences according to the retrieved acoustic model and language model; a sending module, configured to send the converted sentences to translation software so that they are translated into sentences in the language to translate into; and an analysis module, configured to analyze the grammar and semantics of the translated sentences and convert them into corresponding linguistic feature parameters. The sending module is further configured to send the linguistic feature parameters to a prosody generator to obtain the prosody parameters of the translated sentences, and to send the translated sentences to a synthesis unit generator to obtain synthesis units. The sending module is further configured to send the synthesis units and the prosody parameters to a speech synthesizer, thereby generating translated speech, and to send that speech to the registered electronic device.
[0007] A real-time call translation method is applied to a server and includes the following steps:
a. when an electronic device registers with the server under its own identification number to apply for a translation service, recording the identification number of the electronic device, the language to be translated, and the language to translate into;
b. when the registered electronic device is in a call with another electronic device, obtaining the voice of the other electronic device, and retrieving the acoustic model and the language model of the language to be translated from a speech model;
c. converting the obtained voice of the other electronic device into corresponding sentences according to the retrieved acoustic model and language model;
d. when the obtained voice of the other electronic device can be successfully converted into corresponding sentences, sending the converted sentences to translation software to obtain sentences in the language to translate into;
e. analyzing the grammar and semantics of the translated sentences and converting them into corresponding linguistic feature parameters;
f. sending the linguistic feature parameters to a prosody generator to obtain prosody parameters, and sending the translated sentences to a synthesis unit generator to obtain synthesis units;
h. sending the synthesis units and the prosody parameters to a speech synthesizer, thereby generating translated speech, and sending that speech to the registered electronic device.
[0008] Compared with the prior art, when the other party speaks to the user in a different language, the described system and method use the server's speech recognition, translation, and synthesis technologies to translate the other party's language in real time and provide the result to the user, making it easier for both parties to communicate.
[Embodiment]
[0009] FIG. 1 is an environment architecture diagram of a preferred embodiment of the real-time call translation system of the present invention. The system runs on a server 2, is connected to at least two electronic devices through a network (not shown), and translates in real time a call between two electronic devices (electronic device A and electronic device B). Electronic device A and electronic device B each include a call unit 10. The server 2 further includes translation software 22, a prosody generator 24, a synthesis unit generator 26, and a speech synthesizer 28. The server 2 is also connected through the network to a speech model 4 and a speech database 5.
[0010] The speech database 5 includes sound spectra, pitch, articulation (syllables, phonemes, and so on), and monosyllabic phoneme speech waveform samples. For example, in Chinese, "人" (person) is formed from the three phonemes "r", "e", and "n". The speech database 5 also includes acoustic parameters, which are parameters such as duration, intonation, and energy exhibited by different types of speech.
[0011] The speech model 4 includes acoustic models and language models for various languages, for example an English acoustic model and language model and a Chinese acoustic model and language model. An acoustic model is a data model that is computed from a series of training data and that simulates, in a specific environment, the probability that a given segment of speech occurs; that is, the acoustic model is trained on spoken-language training data. A language model is a data model that uses historical information to predict the word most likely to occur next; that is, the language model is trained on written-language training data.
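The language model idea in paragraph [0011], predicting the most likely next word from history, can be illustrated with a minimal sketch. The patent does not say which kind of language model is used; a bigram count model is assumed here purely for illustration, and the function names train_bigram and most_likely_next are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Count, for every word, which words follow it in the training text."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def most_likely_next(counts, word):
    """Return the word seen most often after `word`, or None if unseen."""
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical training text standing in for written-language training data.
model = train_bigram(["how are you", "how are they", "how are you today"])
print(most_likely_next(model, "are"))
```

A production recognizer would use a far larger corpus and a smoothed or neural model; the sketch only shows the "history in, next word out" behaviour the paragraph describes.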
[0012] The real-time call translation system 20 captures speech and, using the acoustic models and language models for the various languages in the speech model 4, converts the captured speech into sentences in the language of that speech.
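How the acoustic model and the language model might be combined when converting speech into a sentence can be sketched as follows. The patent does not describe the scoring; choosing the candidate with the highest combined log probability is assumed here as one common approach. The function name decode, the lm_weight parameter, and the candidate probabilities are all hypothetical.

```python
import math


def decode(candidates, lm_weight=1.0):
    """Pick the candidate sentence with the best combined score.

    candidates: iterable of (sentence, acoustic_prob, language_prob).
    Returns None when no candidate has a usable (non-zero) probability.
    """
    best_sentence, best_score = None, -math.inf
    for sentence, acoustic_prob, language_prob in candidates:
        if acoustic_prob <= 0 or language_prob <= 0:
            continue  # a zero probability cannot be scored in log space
        score = math.log(acoustic_prob) + lm_weight * math.log(language_prob)
        if score > best_score:
            best_sentence, best_score = sentence, score
    return best_sentence


# Made-up probabilities: the second candidate sounds slightly closer,
# but the language model finds it far less plausible.
print(decode([("recognize speech", 0.4, 0.05),
              ("wreck a nice beach", 0.5, 0.001)]))
```

Returning None when nothing can be scored is one possible way to represent a conversion that did not succeed.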

[0013] The translation software 22 translates sentences in one language into sentences in another language to obtain translated sentences, for example English into Chinese, Chinese into English, or Japanese into English.
[0014] The real-time call translation system 20 also analyzes the grammar and semantics of a sentence and converts them into linguistic feature parameters. That is, it determines which clauses and words the sentence contains, which sounds are produced and how they are pronounced, and where and for how long to pause.
[0015] The prosody generator 24 takes the linguistic feature parameters as input and generates the prosodic information for each syllable of the sentence. This prosodic information includes the fundamental frequency contour, volume, and duration. The prosody generator converts the tone, mood, pausing, and length of the utterance into corresponding prosody parameters.
[0016] The synthesis unit generator 26 selects, according to the translated sentences, the corresponding samples from the monosyllabic phoneme speech waveform samples in the speech database 5 and outputs synthesis units. For example, in Chinese, the three phonemes "r", "e", and "n" yield the synthesis unit "人" (person).
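The selection performed by the synthesis unit generator can be illustrated with a minimal sketch, assuming the speech database is a plain lookup table. The names PHONEME_SAMPLES, LEXICON, and synthesis_unit are hypothetical, and the waveform values are made-up placeholders rather than real audio samples.

```python
# Hypothetical stand-in for the waveform samples held in the speech database.
PHONEME_SAMPLES = {
    "r": [0.1, 0.2],
    "e": [0.3, 0.1],
    "n": [0.0, -0.1],
}

# Hypothetical pronunciation lexicon: syllable -> phoneme sequence.
LEXICON = {"人": ["r", "e", "n"]}


def synthesis_unit(syllable):
    """Look up the phonemes of a syllable and join their waveform samples."""
    phonemes = LEXICON[syllable]
    waveform = [value for p in phonemes for value in PHONEME_SAMPLES[p]]
    return {"syllable": syllable, "phonemes": phonemes, "waveform": waveform}


unit = synthesis_unit("人")
print(unit["phonemes"], len(unit["waveform"]))
```

Simple concatenation is used only to keep the example short; the way the patent's speech synthesizer actually joins and shapes the samples is not specified beyond naming waveform concatenation as one possible algorithm.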
[0017] The speech synthesizer 28 selects from the speech database 5 the acoustic parameters of the synthesis units output by the synthesis unit generator 26, including duration, intonation, and energy, and then, according to the prosody parameters obtained from the prosody generator 24, generates the corresponding speech with a speech algorithm such as waveform concatenation. For example, given the synthesis unit "人" output by the synthesis unit generator 26, the duration, intonation, energy, and other parameters of that unit are selected from the speech database 5 and combined with the prosody parameters produced by the prosody generator 24 (including the tone and mood of the utterance), and the speech synthesizer 28 then generates speech through a speech algorithm (for example, waveform concatenation).
[0018] FIG. 2 is a functional module diagram of the real-time call translation system of the present invention. The real-time call translation system 20 includes a recording module 200, a capture module 202, a conversion module 204, a determination module 206, a sending module 208, and an analysis module 210. This preferred embodiment is described using a call between the call unit 10 of electronic device A and the call unit 10 of electronic device B, with English as the language to be translated and Chinese as the language to translate into.
[0019] The recording module 200 records, when electronic device A registers with the server 2 under its own identification number to apply for a translation service, the identification number of electronic device A and the translation service type specified by electronic device A. For example, when a mobile phone with the number 12345678901 registers with the server 2 for a service that translates English into Chinese, the recording module 200 records the identification number 12345678901 and the corresponding translation service type English-Chinese; that is, the language to be translated is English and the language to translate into is Chinese.
[0020] The capture module 202 obtains the voice of electronic device B when the call unit 10 of electronic device A, under its own identification number, is in a call with the call unit 10 of electronic device B.
[0021] The capture module 202 also retrieves from the speech model 4, according to the language to be translated that is recorded for the identification number of electronic device A (here English), the corresponding English acoustic model and language model.
[0022] The conversion module 204 converts the obtained voice of electronic device B into corresponding English sentences according to the retrieved English acoustic model and language model.
[0023] The determination module 206 determines whether the obtained voice of electronic device B can be successfully converted into sentences in the language to be translated, and thereby determines whether translation is needed.
[0024] When the obtained voice of electronic device B needs to be translated, the sending module 208 sends the converted English sentences to the translation software 22 for translation. That is, the English sentences in the language to be translated are translated into Chinese sentences in the language to translate into, yielding the translated Chinese sentences.
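The registration record kept by the recording module, an identification number mapped to a translation service type, could be held in a structure like the following. This is a minimal sketch under assumptions: the class and method names are hypothetical, the record lives in memory only, and the language codes "en" and "zh" are an illustrative encoding of the English-Chinese service type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranslationService:
    source_language: str  # the language to be translated
    target_language: str  # the language to translate into


class RecordingModule:
    """Keeps one translation service record per registered device."""

    def __init__(self):
        self._records = {}

    def register(self, device_id, source_language, target_language):
        self._records[device_id] = TranslationService(
            source_language, target_language
        )

    def lookup(self, device_id):
        """Return the recorded service, or None for an unregistered device."""
        return self._records.get(device_id)


recorder = RecordingModule()
recorder.register("12345678901", "en", "zh")
print(recorder.lookup("12345678901"))
```

Looking up the record by the registered device's identification number is what would let a capture step choose the English acoustic model and language model for an incoming call.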
[0025] The analysis module 210 analyzes the grammar and semantics of the translated sentences and converts them into corresponding linguistic feature parameters. That is, it determines which clauses and words the sentences contain, which sounds are produced and how they are pronounced, and where and for how long to pause.
[0026] The sending module 208 also sends the linguistic feature parameters as input to the prosody generator 24, thereby obtaining the prosody parameters.
[0027] The sending module 208 also sends the translated sentences to the synthesis unit generator 26. The synthesis unit generator 26 selects, according to the translated sentences, the corresponding samples from the monosyllabic phoneme speech waveform samples in the speech database 5, thereby obtaining synthesis units.
[0028] The sending module 208 also sends the obtained synthesis units and the prosody parameters to the speech synthesizer 28, thereby generating the translated speech, and sends that speech to electronic device A.
[0029] When the captured voice does not need to be translated, the sending module 208 sends the voice of electronic device B directly to electronic device A, that is, the untranslated voice is passed through to electronic device A.
[0030] FIG. 3 is a flow chart of a preferred embodiment of the real-time call translation method of the present invention. In step S110, when electronic device A registers with the server 2 under its own identification number to apply for a translation service, the recording module 200 records the identification number of electronic device A and the translation service type specified by electronic device A, that is, the language to be translated and the language to translate into. In this preferred embodiment, the language to be translated is English and the language to translate into is Chinese.
[0031] In step S111, when the call unit 10 of electronic device A, under its own identification number, is in a call with the call unit 10 of electronic device B, the capture module 202 obtains the voice of electronic device B and, according to the language to be translated that is recorded for the identification number of electronic device A (English), retrieves the corresponding English acoustic model and language model from the speech model 4.
[0032] In step S112, the conversion module 204 converts the obtained voice of electronic device B into corresponding English sentences according to the retrieved English acoustic model and language model.
[0033] In step S113, the determination module 206 determines whether the obtained voice of electronic device B can be successfully converted into sentences in the language to be translated. When the conversion module 204 successfully converts the obtained voice of electronic device B into corresponding English sentences, the captured voice needs to be translated and the flow proceeds to step S115. When the conversion module 204 cannot convert the obtained voice of electronic device B into corresponding English sentences, the captured voice is not translated and the flow proceeds to step S114.
[0034] In step S114, the sending module 208 sends the voice of electronic device B directly to electronic device A without any translation processing, and the flow ends.
[0035] In step S115, the sending module 208 sends the converted English sentences to the translation software 22 for translation, that is, the English sentences are translated into Chinese sentences, yielding the translated Chinese sentences.
[0036] In step S116, the analysis module 210 analyzes the grammar and semantics of the translated Chinese sentences and converts them into corresponding linguistic feature parameters, and the sending module 208 sends the linguistic feature parameters to the prosody generator 24, thereby obtaining the prosody parameters.
[0037] In step S117, the sending module 208 sends the translated Chinese sentences to the synthesis unit generator 26. The synthesis unit generator 26 selects, from the monosyllabic phoneme speech waveform samples in the speech database 5, the samples corresponding to the translated Chinese sentences, thereby obtaining synthesis units.
[0038] In step S118, the sending module 208 sends the obtained synthesis units and the obtained prosody parameters to the speech synthesizer 28, thereby generating the translated speech, and sends that speech to electronic device A.
[0039] In summary, the present invention meets the requirements for an invention patent, and a patent application is filed in accordance with the law. The above describes only preferred embodiments of the present invention, and the scope of the invention is not limited to these embodiments. Equivalent modifications or variations made by persons skilled in the art in accordance with the spirit of the present invention shall be covered by the following claims.
[Brief Description of the Drawings]
[0040] FIG. 1 is an environment architecture diagram of a preferred embodiment of the real-time call translation system of the present invention.
[0041] FIG. 2 is a functional module diagram of the real-time call translation system of the present invention.
[0042] FIG. 3 is a flow chart of a preferred embodiment of the real-time call translation method of the present invention.
[Description of Main Component Symbols]
[0043] Electronic device A: 1
[0044] Server: 2
[0045] Electronic device B: 3
[0046] Speech model: 4
[0047] Speech database: 5
[0048] Call unit: 10
[0049] Real-time call translation system: 20
[0050] Translation software: 22
[0051] Prosody generator: 24
[0052] Synthesis unit generator: 26
[0053] Speech synthesizer: 28
[0054] Recording module: 200
[0055] Capture module: 202
[0056] Conversion module: 204
[0057] Determination module: 206
[0058] Sending module: 208
[0059] Analysis module: 210
Flow chart steps (FIG. 3):
S110: When electronic device A registers with the server for a translation service, record its identification number, the specified language to be translated, and the language to translate into.
S111: When electronic device A is in a call with electronic device B, obtain the voice of electronic device B and retrieve the speech model of the language to be translated.
S112: Convert the obtained voice of electronic device B into its corresponding sentences according to the retrieved speech model.
S113: Determine whether the conversion succeeded.
S114: Send the voice of electronic device B directly to electronic device A.
S115: Send the converted sentences to the translation software to obtain the translated sentences.
S116: Analyze the translated sentences, convert them into corresponding linguistic feature parameters, and send the linguistic feature parameters to the prosody generator to obtain the prosody parameters.
S117: Send the translated sentences to the synthesis unit generator and obtain synthesis units from the monosyllabic phoneme speech waveform samples in the speech database.
S118: Send the synthesis units and the obtained prosody parameters as input to the speech synthesizer to generate the translated speech, and send that speech to electronic device A.
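The per-call flow of steps S112 to S118 can be sketched as a single function. This is a minimal illustration under stated assumptions, not the patented implementation: the name handle_call_audio and the six callables passed in are hypothetical stand-ins for the conversion module, the translation software 22, the analysis module, the prosody generator 24, the synthesis unit generator 26, and the speech synthesizer 28, and a failed conversion is assumed to be signalled by returning None. Registration (S110) and model retrieval (S111) are taken to have happened already.

```python
def handle_call_audio(audio, recognize, translate, analyze,
                      make_prosody, make_units, synthesize):
    """Return the audio to deliver to the registered device."""
    sentence = recognize(audio)           # S112: speech -> source sentence
    if sentence is None:                  # S113: conversion did not succeed
        return audio                      # S114: pass the voice through as-is
    translated = translate(sentence)      # S115: source -> target sentence
    features = analyze(translated)        # S116: linguistic feature parameters
    prosody = make_prosody(features)      # S116: prosody parameters
    units = make_units(translated)        # S117: synthesis units
    return synthesize(units, prosody)     # S118: translated speech


# Toy stand-ins, just to show the data flowing through the steps.
result = handle_call_audio(
    b"hello",
    recognize=lambda a: "hello",
    translate=lambda s: "你好",
    analyze=lambda s: {"words": [s]},
    make_prosody=lambda f: {"pause": 0},
    make_units=lambda s: list(s),
    synthesize=lambda u, p: ("speech", u, p),
)
print(result)
```

Passing the stages in as callables keeps the sketch independent of any particular recognition, translation, or synthesis engine, none of which the patent names.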

Claims (1)

201132108 七、申請專利範圍: 1 . 一種通話即時翻譯系統,運行於伺服器上,該系統包括: 記錄模組,用於當電子裝置以本身的識別號碼向伺服器進 行註冊申請翻譯服務時,記錄該電子裝置的識別號碼以及 被翻譯的語言與需要翻譯成的語言; 擷取模組,用於當所述註冊的電子裝置中與其他電子裝置 進行通話時,獲取該其他電子裝置的語音; 所述的擷取模組還用於根據所記錄的被翻譯的語言,從與 該伺服器連接的語音模型中擷取該被翻譯語言的聲學模型 以及語言模型; 轉換模組,用於根據所擷取的聲學模型以及語言模型,將 所獲取的其他電子裝置的語音轉換成相應的文句; 發送模組,用於將所述轉換後的被翻譯語言的文句發送至 翻譯軟體,以翻譯成為需要翻譯成的語言的文句; 分析模組,用於分析翻譯後的文句的語法與語意,並轉化 成相應的語言特徵參數; 所述的發送模組,還用於將轉化後的語言特徵參數發送至 韻律產生器中以得到翻譯後的語言的文句的韻律參數,及 將翻譯後的語言的文句發送至合成單元產生器以得到合成 單元; 所述的發送模組,還用於將合成單元以及韻律參數發送至 語音合成器中,從而產生翻譯後的語音,並將該語音發送 至該註冊的電子裝置。 2 .如申請專利範圍第1項所述之通話即時翻譯系統,該系統 還包括: 099105999 表單編號A0101 第14頁/共20頁 0992010926-0 201132108 判斷模組,用於判斷所獲取的其他電子裝置的語音是否能 成功地轉換成相應的文句; I 所述的發送模組,還用於當所獲取的其他電子裝置的語音 不能成功地轉換成相應的文句時,直接將該其他電子裝置 的語音發送至該註冊的電子裝置。 3 .如申請專利範圍第1項所述之通話即時翻譯系統,所述的 韻律產生器以語言特徵參數為輸入,產生翻譯後的語言的 文句的每個音節的對應韻律訊息,將說話的聲調、語氣、 > 停頓方式、及發音長短轉換成相應的韻律參數·,及 所述的合成單元產生器根據翻譯後的語言的文句從與伺服 器連接的語音資料庫中的單音節音素語音:波形樣本選擇其 所對應的單音節音素語音波形樣本,輪出合成單元。 4 ·如申請專利範圍第3項所述之通話即時翻譯系統,所述的 語音資料庫包括音譜、音高、發聲、單音節音素語音波形 樣本以及聲學參數。 5 .如申凊專利範圍第i項所述之通話即時翻譯系統,所述的 》 成器根據合成單元產生器輸出的合成單元從與伺服 器連接的語音資料庫中選擇咄該合成單元的聲學參數然 後根據在韻律產生器中得到的韻律參數,利用語音演算法 產生相應的語音。 6. 
_種通話即時翻譯方法,應用於舰器,該方純括如下 步驟: a备電子裝置以本身的識別號碼向伺服器進行註冊申請 翻譯服務時,記錄該電子裝置的識別號碼以及被翻譯的語 言與需要翻譯成的語言; 0992010926-0 b·當所述註冊的電子裝置與其他電子裝置進行通話時, 099105999 表單編號 A0101 201132108 獲取該其他電子裝置的語音,並從語音模型中操取該被翻 譯語言的聲學模型以及語言模型; C .根據所掏取的聲學模型以及語言模型,將所獲取的其 他電子I置的語音轉換成相應的文句; d田所獲取的其他電子裝置的語音能成功地被轉換成相 應的文句時’將所述轉換後的文句發送至翻譯軟體獲得 需要翻譯成的語言的文句; e .分析翻譯後的文句的語法與語意,並轉化成相應的語 言特徵參數; f.將轉化後的語言特徵參數發送至韻律產生 器中以得到 韻律參數’及將翻譯後的文句發送至合成單元產生器以得 到合成單元; h .將合成單元以及韻律參數發送至語音合成器中,從而 產生翻譯後的語音,並將該語音發送至該註冊的 電子裝置 〇 7 .如申請專利知•圍第6項所述之通話.即.時;翻譯i方法 ,該方法 還包括步驟: *所獲取的其他電子裝置的語音_成功地被轉換成相應 的文句時,直接將該其他電子裝置的語音發送至該註冊的 電子裝置。 8 ·如申明專利範圍第6項所述之通話即時翻譯方法 ,所述的 0員律產生器以語言特徵參數為輸入,產生翻譯後的語言的 文句的每個音節對應的韻律訊息,將說話的聲調、語氣、 停頓方式、及發音長短轉換成相應的韻律參數; 及所述的合成單元產生器根據翻譯後的語言的文句從與伺 服器連接的語音資料庫中的單音節音素語音波形樣本選擇 099105999 表單編號 A0101 第 16 頁/共 20 頁 0992010926-0 201132108 其所對應的單音節音素語音波形樣本,輸出合成單元。 9 .如申請專利範圍第8項所述之通話即時翻譯方法,所述的 語音資料庫包括音譜、音高、發聲、單音節音素語音波形 樣本以及聲學參數。 10 .如申請專利範圍第6項所述之通話即時翻譯方法,所述的 語音合成器根據合成單元產生器輸出的合成單元從語音資 料庫中選擇出該合成單元的聲學參數,然後根據在韻律產 生器中得到的韻律參數,利用語音演算法產生相應的語音 〇 〇 ❹ 099105999 表單編號A0101 第17頁/共20頁 0992010926-0201132108 VII. Patent application scope: 1. 
A call instant translation system running on a server, the system comprising:
a recording module, configured to record, when an electronic device registers with the server under its own identification number to apply for a translation service, the identification number of the electronic device, the language to be translated, and the language to translate into;
a capture module, configured to acquire the speech of another electronic device when the registered electronic device is in a call with that other electronic device;
the capture module being further configured to retrieve, according to the recorded language to be translated, an acoustic model and a language model of that language from a speech model connected to the server;
a conversion module, configured to convert the acquired speech of the other electronic device into corresponding sentences according to the retrieved acoustic model and language model;
a sending module, configured to send the converted sentences in the language to be translated to translation software, to be translated into sentences in the language to translate into;
an analysis module, configured to analyze the grammar and semantics of the translated sentences and convert them into corresponding linguistic feature parameters;
the sending module being further configured to send the converted linguistic feature parameters to a prosody generator to obtain prosody parameters of the sentences in the translated language, and to send the sentences in the translated language to a synthesis unit generator to obtain synthesis units;
the sending module being further configured to send the synthesis units and the prosody parameters to a speech synthesizer, thereby generating translated speech, and to send that speech to the registered electronic device.
2. The call instant translation system according to claim 1, the system further comprising:
a judging module, configured to judge whether the acquired speech of the other electronic device can be successfully converted into corresponding sentences;
the sending module being further configured to send the speech of the other electronic device directly to the registered electronic device when the acquired speech cannot be successfully converted into corresponding sentences.
3. The call instant translation system according to claim 1, wherein the prosody generator takes the linguistic feature parameters as input, generates the prosodic information corresponding to each syllable of the sentences in the translated language, and converts the tone, intonation, pausing pattern, and pronunciation duration of the speech into corresponding prosody parameters; and the synthesis unit generator selects, according to the sentences in the translated language, the corresponding monosyllabic phoneme speech waveform samples from the monosyllabic phoneme speech waveform samples in a speech database connected to the server, and outputs synthesis units.
4. The call instant translation system according to claim 3, wherein the speech database includes a sound spectrum, pitch, vocalization, monosyllabic phoneme speech waveform samples, and acoustic parameters.
5. The call instant translation system according to claim 1, wherein the speech synthesizer selects the acoustic parameters of the synthesis units from a speech database connected to the server according to the synthesis units output by the synthesis unit generator, and then, based on the prosody parameters obtained in the prosody generator, generates the corresponding speech using a speech algorithm.
6. A call instant translation method, applied to a server, the method comprising the following steps:
a. when an electronic device registers with the server under its own identification number to apply for a translation service, recording the identification number of the electronic device, the language to be translated, and the language to translate into;
b. when the registered electronic device is in a call with another electronic device, acquiring the speech of the other electronic device, and retrieving the acoustic model and the language model of the language to be translated from a speech model;
c. converting the acquired speech of the other electronic device into corresponding sentences according to the retrieved acoustic model and language model;
d. when the acquired speech of the other electronic device can be successfully converted into corresponding sentences, sending the converted sentences to translation software to obtain sentences in the language to translate into;
e. analyzing the grammar and semantics of the translated sentences and converting them into corresponding linguistic feature parameters;
f. sending the converted linguistic feature parameters to a prosody generator to obtain prosody parameters, and sending the translated sentences to a synthesis unit generator to obtain synthesis units;
h. sending the synthesis units and the prosody parameters to a speech synthesizer, thereby generating translated speech, and sending that speech to the registered electronic device.
7. The call instant translation method according to claim 6, further comprising the step of: when the acquired speech of the other electronic device cannot be successfully converted into corresponding sentences, sending the speech of the other electronic device directly to the registered electronic device.
8. The call instant translation method according to claim 6, wherein the prosody generator takes the linguistic feature parameters as input, generates the prosodic information corresponding to each syllable of the sentences in the translated language, and converts the tone, intonation, pausing pattern, and pronunciation duration of the speech into corresponding prosody parameters; and the synthesis unit generator selects, according to the sentences in the translated language, the corresponding monosyllabic phoneme speech waveform samples from the monosyllabic phoneme speech waveform samples in a speech database connected to the server, and outputs synthesis units.
9. The call instant translation method according to claim 8, wherein the speech database includes a sound spectrum, pitch, vocalization, monosyllabic phoneme speech waveform samples, and acoustic parameters.
10. The call instant translation method according to claim 6, wherein the speech synthesizer selects the acoustic parameters of the synthesis units from the speech database according to the synthesis units output by the synthesis unit generator, and then, based on the prosody parameters obtained in the prosody generator, generates the corresponding speech using a speech algorithm.
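The method of claims 6 and 7 can be read as a relay pipeline on the server. The Python sketch below is illustrative only and is not taken from the patent: the names `Registration`, `Components`, `TranslationServer`, `register`, and `relay` are hypothetical, and each engine (recognition, translation, analysis, prosody generation, synthesis unit generation, speech synthesis) is assumed to be supplied as a plain callable. It shows only the order of steps a to h and the pass-through fallback when recognition fails.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Registration:
    """Step a: what the server records for a registered device."""
    device_id: str
    source_lang: str  # the language to be translated
    target_lang: str  # the language to translate into


@dataclass
class Components:
    """Hypothetical stand-ins for the engines named in the claims."""
    # Steps b-c: speech -> sentence; returns None if conversion fails.
    recognize: Callable[[bytes, str], Optional[str]]
    # Step d: (sentence, source language, target language) -> sentence.
    translate: Callable[[str, str, str], str]
    # Step e: translated sentence -> linguistic feature parameters.
    analyze: Callable[[str], Dict[str, str]]
    # Step f: linguistic feature parameters -> prosody parameters.
    prosody: Callable[[Dict[str, str]], List[float]]
    # Step f: translated sentence -> synthesis units.
    units: Callable[[str], List[str]]
    # Step h: (synthesis units, prosody parameters) -> speech.
    synthesize: Callable[[List[str], List[float]], bytes]


class TranslationServer:
    def __init__(self, components: Components) -> None:
        self.components = components
        self.registry: Dict[str, Registration] = {}

    def register(self, device_id: str, source_lang: str, target_lang: str) -> None:
        # Step a: record the identification number and the language pair.
        self.registry[device_id] = Registration(device_id, source_lang, target_lang)

    def relay(self, device_id: str, caller_audio: bytes) -> bytes:
        """Return the audio to deliver to the registered device."""
        reg = self.registry.get(device_id)
        if reg is None:
            # Assumption: unregistered devices receive the audio unchanged.
            return caller_audio
        c = self.components
        sentence = c.recognize(caller_audio, reg.source_lang)
        if sentence is None:
            # Claim 7: conversion failed, so forward the original speech.
            return caller_audio
        translated = c.translate(sentence, reg.source_lang, reg.target_lang)
        features = c.analyze(translated)
        prosody_params = c.prosody(features)
        synthesis_units = c.units(translated)
        return c.synthesize(synthesis_units, prosody_params)
```

Passing the engines in as callables keeps the sketch independent of any particular recognition or synthesis library; a real deployment would also have to stream audio rather than handle one buffer at a time, which this sketch does not attempt.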
TW99105999A 2010-03-02 2010-03-02 System and method for translating in communication immediately TW201132108A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW99105999A TW201132108A (en) 2010-03-02 2010-03-02 System and method for translating in communication immediately

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW99105999A TW201132108A (en) 2010-03-02 2010-03-02 System and method for translating in communication immediately

Publications (1)

Publication Number Publication Date
TW201132108A (en) 2011-09-16

Family

ID=50180494

Family Applications (1)

Application Number Title Priority Date Filing Date
TW99105999A TW201132108A (en) 2010-03-02 2010-03-02 System and method for translating in communication immediately

Country Status (1)

Country Link
TW (1) TW201132108A (en)
