TWM532593U

TWM532593U - Voice-translation system

Info

Publication number: TWM532593U
Application number: TW105212108U
Authority: TW
Inventors: Chin-Min Lin; Yun-Lin Tsai
Original assignee: Nat Taichung University Science & Technology
Priority date: 2016-08-10
Filing date: 2016-08-10
Publication date: 2016-11-21

Description

Speech translation system

The present utility model relates to a translation system, and in particular to a speech translation system.

In current machine translation from speech in one language to speech in another, the speech is first recognized as text, the text is translated into text in the target language, and speech synthesis is then used to turn the target-language text into target-language speech. Speech carries far more information than text, such as paralinguistic sounds like laughter and sighs, as well as prosodic information such as the duration, pitch, and energy of each word, syllable, or other speech unit. This information is very helpful for understanding what the speaker really means. However, synthesized speech depends only on the translated text, so much of the information behind the original speech is lost, which results in insufficient translation accuracy.

In view of the above, the creators of the present utility model, after years of dedicated research, have designed a speech translation system that remedies the shortcomings of the prior art and thereby enhances its industrial applicability.

In view of the problems of the prior art described above, the purpose of the present utility model is to provide a speech translation system that remedies those shortcomings.

To this end, the present utility model provides a speech translation system comprising a text import module, a voice transceiver module, a database creation module, a target language selection module, and a translation processing module. The text import module allows a plurality of text materials, in a plurality of languages and containing a plurality of words, to be imported. The voice transceiver module receives a plurality of reference speech samples, which correspond to the text materials and contain a plurality of reference audio signals; it also receives speech to be translated, which contains a plurality of audio signals to be translated. The database creation module is connected to the text import module and the voice transceiver module; it compares the words with the reference audio signals and builds a speech-text correspondence database from the comparison result. The target language selection module provides an interface with a plurality of language options corresponding to the languages of the text materials, through which a target language can be selected from the language options. The translation processing module is connected to the voice transceiver module, the database creation module, and the target language selection module; it compares the audio signals to be translated with the reference audio signals in the speech-text correspondence database and, based on the reference audio signals that match, retrieves from the database the reference audio signals, the words, or a combination thereof that correspond to the target language, and outputs a translation result.

Preferably, the target language selection module allows a plurality of target languages of a plurality of countries to be selected from the language options through the interface, and the translation processing module can output a plurality of translation results corresponding to those target languages.

Preferably, the translation processing module can compare the pitch, loudness, timbre, duration, or any combination thereof of the audio signals to be translated in the speech to be translated with those of the reference audio signals in the speech-text correspondence database.

Preferably, the translation processing module can store a plurality of location data for the countries corresponding to the plurality of languages; it can retrieve from the location data the location data corresponding to the target language, and can then, according to that location data, output the translation result to an external electronic device located in the country corresponding to the target language.

Preferably, a server may include an instant chat system connected to the voice transceiver module; the instant chat system can receive the sounds a user makes while chatting, create the corresponding speech to be translated, and then output the speech to be translated to the voice transceiver module.

Preferably, the speech to be translated received by the instant chat system may come from an external sound transceiver device held by the user.

Preferably, the external sound transceiver device may be a wireless earphone device.

Preferably, the server may further include a translation result display system connected to the translation processing module; the translation result display system can display the translation result from the translation processing module to the user.

In summary, the speech translation system of the present utility model uses the text import module and the voice transceiver module to collect, respectively, the multilingual text files needed to build the database and the corresponding speech files; in particular, these speech files may be recorded by the users themselves. The database creation module can then use this data to build a database that accurately recognizes a user's timbre, so that the corresponding text can later be looked up and translated. Finally, the target language selection module and the translation processing module can use the database to perform translation. In this way, the present utility model can accurately convert speech with individual pronunciation characteristics into speech, text, or both in the desired language.

Embodiments of the speech translation system according to the present utility model are described below with reference to the accompanying drawings. For ease of understanding, identical elements in the following embodiments are denoted by identical reference numerals.

Refer to FIG. 1, a block diagram of the speech translation system according to the first embodiment of the present utility model. As shown, the speech translation system 10 includes a text import module 100, a voice transceiver module 200, a database creation module 300, a target language selection module 400, and a translation processing module 500.

An administrator or user of the speech translation system 10 can collect in advance a large number of articles containing vocabulary likely to be needed in later translation and, if necessary, texts in a plurality of languages. The collected text materials 110 are then imported into the text import module 100 using its text file import function; for example, the text import module 100 can accept text files in formats such as doc, txt, html, and rtf. The text materials 110 may contain a plurality of words 111, and the words 111 may be terms used in a plurality of different countries, such as the United Kingdom, France, Japan, and Korea; these are merely examples and not limitations.
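The import step can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions: the function name `import_texts`, the `(language, text)` pair layout, and the regular-expression word splitting are hypothetical and are not specified in this document, which also leaves open how doc, html, or rtf files would be parsed.

```python
import re
from collections import defaultdict


def import_texts(texts):
    """Collect the distinct words found in each language's text materials.

    `texts` is a list of (language, text) pairs. The pair layout and the
    word-splitting rule are assumptions made for this sketch.
    """
    vocab = defaultdict(set)
    for language, text in texts:
        # \w+ splits on whitespace and punctuation; languages written
        # without spaces would need a language-specific tokenizer.
        for word in re.findall(r"\w+", text.lower()):
            vocab[language].add(word)
    return dict(vocab)


vocabulary = import_texts([
    ("en", "Hello, thank you."),
    ("fr", "Bonjour, merci."),
    ("en", "Thank you very much."),
])
```

Here `vocabulary` groups the words by language, which is roughly the role the words 111 play for the later database-building step.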

In addition, the user or administrator can collect a plurality of reference audio signals 211 corresponding to the text materials 110. For example, the administrator may engage professional recording personnel to record reference audio signals 211 with standard pronunciation for the words 111; alternatively, as a customization, each user may record reference audio signals 211 with his or her own pronunciation for the words 111. The voice transceiver module 200 can then receive the reference speech samples 210 provided by the administrator or user, where the reference speech samples 210 contain the reference audio signals 211 that represent the pronunciations of the different words 111.

For example, the voice transceiver module 200 may include mono or stereo speakers, a microphone, and the like, such as a Bluetooth headset or another hands-free device that can transmit and receive sound signals over Bluetooth, WiFi, or other communication technologies. The reference speech samples 210 that the voice transceiver module 200 can receive may be in file formats such as MP3, MIDI, AAC, or WAV, without limitation.

Furthermore, the database creation module 300 is connected to the text import module 100 and the voice transceiver module 200, so as to receive the text materials 110 from the text import module 100 and the reference audio signals 211 corresponding to the text materials 110 (for example, streamed reference audio signals 211) from the voice transceiver module 200. The database creation module 300 can thus compare the words 111 in the text materials 110 with the reference audio signals 211 in the reference speech samples 210 and then build the speech-text correspondence database 310 from the comparison result; that is, it compiles the pronunciation corresponding to each word.
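As a rough illustration of this pairing step, the Python sketch below builds a small word-to-audio table. The `Entry` record, the `concept` key that links equivalent words across languages, and the use of file names as stand-ins for audio signals are all assumptions of the sketch; the document does not say how the database 310 is organized internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    concept: str   # hypothetical key linking equivalent words across languages
    language: str
    word: str
    audio: str     # stand-in for a reference audio signal, e.g. a file name


def build_database(words, recordings):
    """Pair each (language, word) with its recording; skip unrecorded words."""
    database = []
    for concept, language, word in words:
        audio = recordings.get((language, word))
        if audio is not None:
            database.append(Entry(concept, language, word, audio))
    return database


words = [
    ("greeting", "en", "hello"),
    ("greeting", "fr", "bonjour"),
    ("thanks", "en", "thanks"),   # no recording supplied, so it is skipped
]
recordings = {
    ("en", "hello"): "en_hello.wav",
    ("fr", "bonjour"): "fr_bonjour.wav",
}
database = build_database(words, recordings)
```

Skipping words without a recording is one possible design choice; a real system might instead flag them so that the missing reference audio can be recorded later.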

Once the speech-text correspondence database 310 has been built through the text import module 100, the voice transceiver module 200, and the database creation module 300 as described above, the user can perform speech translation with the other modules of the speech translation system 10, as detailed below.

The target language selection module 400 can provide a touch or non-touch interface 410, which may have a plurality of language options 411 corresponding to the languages of the text materials 110. Before performing speech translation, the user can select in advance, through the interface 410, the desired target language 412 from the language options 411. Preferably, the target language selection module 400 allows a plurality of target languages 412 to be selected from the language options 411, so that translation results 510 in one or more different languages can be obtained by the translation processing module 500 in the subsequent procedure.

Next, the voice transceiver module 200 can receive speech to be translated 220 containing a plurality of audio signals to be translated 221; that is, the user can produce sound within the reception range of the voice transceiver module 200, for example by speaking or by playing an audio file on an audio playback device (such as a loudspeaker or microphone). Here, the audio signals to be translated 221 are the pronunciations of the individual words 111 before translation. It should be understood that the language of the audio signals to be translated 221 differs from the target language 412; otherwise there would be no need for translation.

Furthermore, the translation processing module 500 is connected to the voice transceiver module 200, the database creation module 300, and the target language selection module 400. The translation processing module 500 should receive in advance the speech-text correspondence database 310 built by the database creation module 300. Then, when the translation processing module 500 receives the speech to be translated 220 from the voice transceiver module 200, it can compare the audio signals to be translated 221 with the reference audio signals 211 in the speech-text correspondence database 310.

Finally, when the translation processing module 500 finds reference audio signals 211 that match the audio signals to be translated 221, it can, based on the matching reference audio signals 211, retrieve from the speech-text correspondence database 310 the reference audio signals 211, the words 111, or both that correspond to the target language 412, and combine them to output the translation result 510.
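The match-then-retrieve flow can be sketched in Python as follows. This is a simplified illustration, not the implementation described here: the table layout, the `concept` key linking entries across languages, the caller-supplied `match` predicate, and the `mode` parameter are all hypothetical, and file names stand in for real audio signals.

```python
# Each row: (concept, language, word, audio). "concept" is a hypothetical
# key that links equivalent entries across languages.
DATABASE = [
    ("greeting", "en", "hello", "en_hello.wav"),
    ("greeting", "ja", "こんにちは", "ja_konnichiwa.wav"),
    ("thanks", "en", "thank you", "en_thanks.wav"),
    ("thanks", "ja", "ありがとう", "ja_arigatou.wav"),
]


def translate(segments, target_language, match, mode="both"):
    """Translate audio segments into the target language.

    `match(segment, reference_audio)` is a caller-supplied predicate that
    decides whether a segment matches a reference audio signal.
    `mode` selects text, audio, or both for each result.
    """
    results = []
    for segment in segments:
        # Step 1: find a reference audio signal that the segment matches.
        concept = next(
            (c for c, _, _, audio in DATABASE if match(segment, audio)), None
        )
        if concept is None:
            continue  # no matching reference, so nothing to retrieve
        # Step 2: fetch the target-language entry for the same concept.
        for c, language, word, audio in DATABASE:
            if c == concept and language == target_language:
                if mode == "text":
                    results.append(word)
                elif mode == "audio":
                    results.append(audio)
                else:
                    results.append((word, audio))
    return results


output = translate(
    ["en_hello.wav", "en_thanks.wav"], "ja", match=lambda s, a: s == a
)
```

In this toy run the predicate is plain equality of file names; in practice it would be replaced by an acoustic comparison of the incoming audio against the stored references.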

Preferably, the translation processing module 500 can compare the pitch, loudness, timbre, duration, or any combination thereof of the audio signals to be translated 221 in the speech to be translated 220 with those of the reference audio signals 211 in the speech-text correspondence database 310, thereby improving translation accuracy.
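One way to picture a comparison over these four properties is a nearest-neighbor search on a small feature vector, sketched below in Python. The document does not name a comparison method, so the Euclidean distance, the `threshold` value, and the assumption that each feature is already extracted and normalized to the range 0 to 1 are choices of this sketch only.

```python
import math

FEATURES = ("pitch", "loudness", "timbre", "duration")


def distance(a, b, features=FEATURES):
    """Euclidean distance over the selected features of two audio signals."""
    return math.sqrt(sum((a[f] - b[f]) ** 2 for f in features))


def best_match(query, references, features=FEATURES, threshold=0.5):
    """Return the name of the closest reference, or None if none is close enough."""
    name, ref = min(
        references.items(),
        key=lambda item: distance(query, item[1], features),
    )
    return name if distance(query, ref, features) <= threshold else None


# Feature values are assumed to be normalized to the range 0..1.
references = {
    "hello": {"pitch": 0.40, "loudness": 0.60, "timbre": 0.30, "duration": 0.50},
    "thanks": {"pitch": 0.70, "loudness": 0.50, "timbre": 0.60, "duration": 0.80},
}
query = {"pitch": 0.42, "loudness": 0.58, "timbre": 0.33, "duration": 0.52}
matched = best_match(query, references)
```

Passing a shorter `features` tuple, for example `("pitch", "duration")`, corresponds to comparing only some of the properties, in the spirit of "any combination thereof".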

For example, when the speech to be translated 220 is in English and the target language 412 is Japanese, the translation processing module 500 performs a translation procedure that translates the English speech file (the speech to be translated 220) into a Japanese speech file (the translation result 510); that is, only a translation result 510 in audio file format is obtained.

However, a person of ordinary skill in the art should understand that, with simple modifications to the present utility model and according to actual needs, the translation processing module 500 can translate an English speech file (the speech to be translated 220) into a Japanese text file (the translation result 510); conversely, an English text file (serving as the speech to be translated 220) can be translated into a Japanese text file (the translation result 510); or these approaches can be combined to obtain translation results 510 in both audio file and text file formats at the same time, which helps the user understand and learn.

Refer to FIG. 2 and FIG. 3, a block diagram and a schematic diagram of the speech translation system according to the second embodiment of the present utility model. As shown in FIG. 2, the speech translation system 10 includes a text import module 100, a voice transceiver module 200, a database creation module 300, a target language selection module 400, and a translation processing module 500. In this embodiment, elements with the same reference numerals operate and are configured similarly to those of the preceding embodiment and are not described again here.

The server 30 may include an instant chat system 31 connected to the voice transceiver module 200. When users in different countries chat on the instant chat system 31, which the server 30 provides as a chat platform, the instant chat system 31 can receive the sounds the users make while chatting and create the corresponding speech to be translated 220.

Next, the instant chat system 31 of the server 30 can output the speech to be translated 220 to the voice transceiver module 200 of the speech translation system 10, which then carries out the translation procedure described above. For example, the speech to be translated 220 received by the instant chat system 31 may come from an external sound transceiver device 40 held by the user, such as a wireless earphone device, without limitation.

The server 30 may further include a translation result display system 32 connected to the translation processing module 500. The translation result display system 32 of the server 30 can display the translation result 510 from the translation processing module 500 of the speech translation system 10 to the user.

In detail, the translation processing module 500 can store a plurality of location data 530 for the countries corresponding to the plurality of languages. After the translation processing module 500 of the speech translation system 10 produces the translation result 510, it can retrieve from the location data 530 the location data 530 corresponding to the target language 412.

Next, the translation processing module 500 can, according to the location data 530, output the translation result 510 to the external electronic device 20 located in the country corresponding to the target language 412. For greater precision, the translation processing module 500 can let the user store in advance, or enter in real time while chatting, the location data 530 of the intended destination.
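The routing step might look like the following Python sketch. The shape of the location data, the device identifiers, and the `overrides` parameter (standing in for a destination entered by the user at chat time) are all hypothetical; the document does not define what the location data 530 contains.

```python
# Hypothetical stored location data: target language -> country and device.
LOCATION_DATA = {
    "en": {"country": "US", "device": "device-us-01"},
    "ja": {"country": "JP", "device": "device-jp-01"},
}


def route(results, overrides=None):
    """Pair each translation result with the device that should receive it.

    `results` maps target language -> translation result. `overrides` lets
    a destination supplied at chat time replace the stored one.
    """
    locations = {**LOCATION_DATA, **(overrides or {})}
    deliveries = []
    for language, result in results.items():
        location = locations.get(language)
        if location is not None:  # skip languages with no known destination
            deliveries.append((location["device"], result))
    return deliveries


deliveries = route({"ja": "こんにちは", "en": "hello"})
```

Merging the overrides on top of the stored data reflects the two options mentioned above, a destination stored in advance or one entered while chatting.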

As shown in FIG. 3, users at particular locations in countries of North America, South America, and Africa chat through the instant chat system 31 provided by the server 30. The speech to be translated 220 produced by the user in the African country is, for example, in Malagasy (an Austronesian language); that user can select English and American English as target languages 412 to produce English or American English translation results 510 for the other two users, located in North America and South America. In this way, users around the world can communicate with and learn from one another effectively without additional human translators, which promotes cultural exchange.

The foregoing is illustrative only and not restrictive. Any equivalent modification or change that does not depart from the spirit and scope of the present utility model shall be included in the scope of the appended claims.

10‧‧‧Speech translation system
20‧‧‧External electronic device
30‧‧‧Server
31‧‧‧Instant chat system
32‧‧‧Translation result display system
40‧‧‧External sound transceiver device
100‧‧‧Text import module
110‧‧‧Text materials
111‧‧‧Words
200‧‧‧Voice transceiver module
210‧‧‧Reference speech samples
211‧‧‧Reference audio signals
220‧‧‧Speech to be translated
221‧‧‧Audio signals to be translated
300‧‧‧Database creation module
310‧‧‧Speech-text correspondence database
400‧‧‧Target language selection module
410‧‧‧Interface
411‧‧‧Language options
412‧‧‧Target language
500‧‧‧Translation processing module
510‧‧‧Translation result
530‧‧‧Location data

FIG. 1 is a block diagram of the speech translation system according to the first embodiment of the present utility model.

FIG. 2 is a block diagram of the speech translation system according to the second embodiment of the present utility model.

FIG. 3 is a schematic diagram of the speech translation system according to the second embodiment of the present utility model.


Claims

1. A speech translation system, comprising: a text import module, which allows a plurality of text materials, in a plurality of languages and containing a plurality of words, to be imported; a voice transceiver module, which receives a plurality of reference speech samples that correspond to the text materials and contain a plurality of reference audio signals, and which receives speech to be translated containing a plurality of audio signals to be translated; a database creation module, connected to the text import module and the voice transceiver module, which compares the words with the reference audio signals and builds a speech-text correspondence database according to the comparison result; a target language selection module, which provides an interface having a plurality of language options corresponding to the languages of the text materials, so that a target language can be selected from the language options through the interface; and a translation processing module, connected to the voice transceiver module, the database creation module, and the target language selection module, which compares the audio signals to be translated with the reference audio signals in the speech-text correspondence database and, according to the reference audio signals that match, retrieves from the speech-text correspondence database the reference audio signals, the words, or a combination thereof corresponding to the target language and outputs a translation result.

2. The speech translation system of claim 1, wherein the target language selection module allows a plurality of target languages of a plurality of countries to be selected from the language options through the interface, and the translation processing module outputs a plurality of translation results corresponding to the plurality of target languages.

3. The speech translation system of claim 1, wherein the translation processing module compares the pitch, loudness, timbre, duration, or any combination thereof of the audio signals to be translated in the speech to be translated with those of the reference audio signals in the speech-text correspondence database.

4. The speech translation system of claim 1, wherein the translation processing module stores a plurality of location data for a plurality of countries corresponding to the plurality of languages, retrieves from the plurality of location data the location data corresponding to the target language, and then, according to that location data, outputs the translation result to an external electronic device located in the country corresponding to the target language.

5. The speech translation system of claim 1, wherein a server comprises an instant chat system connected to the voice transceiver module, and the instant chat system receives the sounds a user makes while chatting, creates the corresponding speech to be translated, and then outputs the speech to be translated to the voice transceiver module.

6. The speech translation system of claim 5, wherein the speech to be translated received by the instant chat system comes from an external sound transceiver device held by the user.

7. The speech translation system of claim 6, wherein the external sound transceiver device is a wireless earphone device.

8. The speech translation system of claim 5, wherein the server further comprises a translation result display system connected to the translation processing module, and the translation result display system displays the translation result from the translation processing module to the user.