WO2011007595A1 - Speech translation system, dictionary server device, and program - Google Patents
- Publication number: WO2011007595A1
- PCT application: PCT/JP2010/053418 (JP2010053418W)
- Authority: WIPO (PCT)
Classifications
- G10L15/10—Speech classification or search using distance or distortion measures between unknown speech and reference templates
- G06F40/58—Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
- G06F3/16—Sound input; Sound output
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
- G10L13/00—Speech synthesis; Text to speech systems
- G10L15/26—Speech to text systems
Definitions
- the present invention relates to a speech translation system that performs speech translation.
- the speech translation system is a speech translation system having a dictionary server device, one or more speech recognition server devices, one or more translation server devices, and one or more speech synthesis server devices.
- the dictionary server device includes an all-language-pair dictionary storage unit that can store two or more pieces of all-language term information, where each piece associates, for every one of two or more languages, term information containing the notation of a term consisting of one or more words, speech recognition information (information for speech recognition of the term), and speech synthesis information (information for speech synthesis of the term).
- the dictionary server device further includes: a speech recognition information transmission unit that acquires, for all of the two or more languages or a subset of them, information for speech recognition including the speech recognition information of terms from the all-language-pair dictionary storage unit and transmits it to the one or more speech recognition server devices; a translation information transmission unit that acquires information for translation including the notation of terms from the all-language-pair dictionary storage unit and transmits it to the one or more translation server devices; and a speech synthesis information transmission unit that acquires information for speech synthesis including the speech synthesis information of terms from the all-language-pair dictionary storage unit and transmits it to the one or more speech synthesis server devices.
- each speech recognition server device includes: a speech recognition information storage unit that can store speech recognition information for all of the two or more languages or a subset of them; a speech recognition information reception unit that receives such information from the dictionary server device; a speech recognition information accumulation unit that stores the received information in the speech recognition information storage unit; a speech information reception unit that receives speech information input to a first terminal device; a speech recognition unit that recognizes the received speech information using the speech recognition information in the speech recognition information storage unit and acquires a speech recognition result; and a speech recognition result transmission unit that transmits the speech recognition result.
- each translation server device includes: a translation information storage unit that can store translation information for all of the two or more languages or a subset of them; a translation information reception unit that receives such information from the dictionary server device; a translation information accumulation unit that stores the received information in the translation information storage unit; a speech recognition result reception unit that receives a speech recognition result; a translation unit that translates the received speech recognition result into the target language using the translation information in the translation information storage unit and acquires a translation result; and a translation result transmission unit that transmits the translation result.
- each speech synthesis server device includes: a speech synthesis information storage unit that can store speech synthesis information for all of the two or more languages or a subset of them; a speech synthesis information reception unit that receives such information from the dictionary server device; a speech synthesis information accumulation unit that stores the received information in the speech synthesis information storage unit; a translation result reception unit that receives a translation result; a speech synthesis unit that synthesizes speech for the received translation result using the speech synthesis information in the speech synthesis information storage unit and acquires a speech synthesis result; and a speech synthesis result transmission unit that transmits the speech synthesis result to a second terminal device.
- Such a configuration can eliminate inconsistencies between dictionaries used in speech translation.
- the speech recognition unit of the speech recognition server device includes: speech recognition determination means for determining whether the speech recognition processing for the speech information received by the speech information reception unit has succeeded or failed; speech recognition information transmission instruction means for instructing the dictionary server device to transmit speech recognition information when the speech recognition determination means determines that the speech recognition processing has failed; and speech recognition means for recognizing the received speech information using the speech recognition information in the speech recognition information storage unit and acquiring a speech recognition result, and, when the instruction has been transmitted, acquiring the speech recognition result using the speech recognition information that the speech recognition information reception unit receives from the dictionary server device.
- in this speech translation system, the speech recognition information reception unit receives the speech recognition information from the dictionary server device in response to the transmission of the instruction.
- the translation unit of the translation server device includes: translation determination means for determining whether the translation processing for the speech recognition result received by the speech recognition result reception unit has succeeded or failed; translation information transmission instruction means for instructing the dictionary server device to transmit the notation of a target-language term when the translation determination means determines that the translation processing has failed; and translation means for translating the received speech recognition result into the target language using the translation information in the translation information storage unit and acquiring a translation result, and, when the instruction has been transmitted, translating the speech recognition result into the target language using the notation of the target-language term that the translation information reception unit receives from the dictionary server device.
- in this speech translation system, the translation information reception unit receives the notation of the target-language term from the dictionary server device in response to the transmission of the instruction.
- the speech synthesis unit of the speech synthesis server device includes: speech synthesis determination means for determining whether the speech synthesis processing for the translation result received by the translation result reception unit has succeeded or failed; speech synthesis information transmission instruction means for instructing the dictionary server device to transmit speech synthesis information when the speech synthesis determination means determines that the speech synthesis processing has failed; and speech synthesis means for synthesizing speech for the received translation result using the speech synthesis information in the speech synthesis information storage unit and acquiring a speech synthesis result, and, when the instruction has been transmitted, acquiring the speech synthesis result using the speech synthesis information that the speech synthesis information reception unit receives from the dictionary server device in response to the instruction.
- the speech translation system further includes, in the dictionary server device, a notation acquisition unit that acquires, from the web pages of one or more web server devices on the Internet, the notations of terms that do not exist in the all-language-pair dictionary storage unit, and a notation accumulation unit that stores the notations acquired by the notation acquisition unit in the all-language-pair dictionary storage unit.
- the speech translation system further includes, in the dictionary server device, an information reception unit that receives any element of the term information from one or more third terminal devices, and an information accumulation unit that stores the received information in the all-language-pair dictionary storage unit in association with the notation of the corresponding term in the corresponding language.
- the dictionary server device further includes an output unit that outputs all-language term information or a part of it. When doing so, the output unit outputs the all-language term information, or the part of it, in a visually different manner depending on whether all of the predetermined information is present for every one of the two or more languages or only some of that information is present.
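The "visually different" output described above can be sketched as follows. This is a minimal illustration, not the patented implementation; the required field names (`notation`, `reading`, `accent`) and the text markers standing in for a visual distinction are assumptions.

```python
# Sketch: output all-language term information, distinguishing entries for
# which every language has all predetermined fields from entries where some
# field is missing. Field names and markers are illustrative assumptions.
REQUIRED_FIELDS = ("notation", "reading", "accent")

def render_entry(all_language_term_info):
    """all_language_term_info: dict mapping language -> term-info dict."""
    complete = all(
        term_info.get(field)
        for term_info in all_language_term_info.values()
        for field in REQUIRED_FIELDS
    )
    # A real output unit might use color instead of a text marker.
    marker = "[complete]" if complete else "[incomplete]"
    notations = ", ".join(
        f"{lang}: {info.get('notation', '?')}"
        for lang, info in all_language_term_info.items()
    )
    return f"{marker} {notations}"

entry = {
    "Japanese": {"notation": "大阪", "reading": "おおさか", "accent": "0"},
    "English": {"notation": "Osaka", "reading": "Osaka", "accent": "1"},
}
print(render_entry(entry))
```

An entry missing, say, an English reading would be rendered with the `[incomplete]` marker, which is how a user of the third terminal device could spot gaps to fill.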
- the speech translation system according to the present invention can eliminate inconsistencies between dictionaries used in speech translation.
- the all-language-pair dictionary is a dictionary that centrally manages the information necessary for speech recognition, translation, and speech synthesis.
- the all-language-pair dictionary is a dictionary storing two or more pieces of all-language term information.
- the all-language term information has one piece of term information per language, for each of the two or more languages that can be targets of speech translation.
- the term information includes speech recognition information (information necessary for speech recognition), translation information (information necessary for translation), and speech synthesis information (information necessary for speech synthesis).
- the term information is information related to one term. Further, the structure of the term information may differ depending on the language.
- the two or more languages that can be targets of speech translation are preferably three or more languages.
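As a concrete illustration of term information whose structure differs by language, the following sketch uses nested Python dictionaries. The example term, the field names, and the acoustic-model references are hypothetical, not the patent's schema; the point is only that a German record carries a gender flag the others lack.

```python
# Hypothetical all-language term information for one term:
# one term-info record per language, with language-specific structure.
all_language_term_info = {
    "Japanese": {
        "notation": "駅",                      # notation of the term
        "speech_recognition": "hmm-ja-eki",    # e.g. a reference to an acoustic-model entry
        "speech_synthesis": {"reading": "えき", "accent": "1"},
    },
    "English": {
        "notation": "station",
        "speech_recognition": "hmm-en-station",
        "speech_synthesis": {"reading": "station", "accent": "1"},
    },
    "German": {
        "notation": "Bahnhof",
        "gender": "m",                         # flag information: German nouns carry gender
        "speech_recognition": "hmm-de-bahnhof",
        "speech_synthesis": {"reading": "Bahnhof", "accent": "1"},
    },
}

# Every language of the speech translation target has term information:
assert set(all_language_term_info) == {"Japanese", "English", "German"}
```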
- described below is a speech translation system in which, when information necessary at a given processing stage (speech recognition, translation, or speech synthesis) does not exist in the device performing that stage, the device acquires the necessary information in real time from the dictionary server device holding the all-language-pair dictionary (this is referred to as real-time complement processing).
- also described is a dictionary server device that has a function of enriching the all-language-pair dictionary by acquiring information such as newly appearing terms from one or more web servers by crawling and the like, and of accepting information to be stored in the all-language-pair dictionary from unspecified users or a large number of specified users.
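The real-time complement processing just described can be sketched as follows: a processing server first tries its local information and, only on failure, requests the missing information from the dictionary server and retries. The function names and the dictionary-server lookup are hypothetical stand-ins for the actual network exchange between the server devices.

```python
# Sketch of real-time complement processing (hypothetical names).
# local_info: a processing server's local speech-recognition / translation /
#             speech-synthesis information, keyed by term.
# fetch_from_dictionary_server: stand-in for the request that the
#             transmission-instruction means sends to the dictionary server.

def process_with_complement(term, local_info, fetch_from_dictionary_server):
    if term in local_info:                        # normal case: local info suffices
        return local_info[term]
    fetched = fetch_from_dictionary_server(term)  # real-time complement
    if fetched is None:
        raise KeyError(f"term {term!r} unknown even to the dictionary server")
    local_info[term] = fetched                    # accumulate for next time
    return fetched

# Usage: the dictionary server holds the full all-language-pair dictionary.
dictionary_server = {"Osaka": {"notation": "Osaka", "reading": "おおさか"}}
local = {}
result = process_with_complement("Osaka", local, dictionary_server.get)
```

Note that the fetched information is stored locally, so the same term does not trigger another round trip on the next utterance.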
- FIG. 1 is a conceptual diagram of the speech translation system 1 in the present embodiment.
- the speech translation system 1 includes one or more first terminal devices 11, one or more second terminal devices 12, a dictionary server device 13, one or more speech recognition server devices 14, one or more translation server devices 15, one or more speech synthesis server devices 16, and one or more third terminal devices 17.
- for example, the speech recognition server device 14 recognizes a Japanese utterance meaning "good morning". Then, the translation server device 15 translates the speech recognition result into, for example, the English "Good morning". Next, the speech synthesis server device 16 creates speech information for "Good morning" from the English text "Good morning". Then, the voice "Good morning" is output from the second terminal device 12 of user B, a native English speaker.
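The "good morning" flow above, with recognition, translation, and synthesis each performed on a separate server, can be sketched as a chain of three stages. The stage functions below are trivial stand-ins for the actual server devices, keyed on made-up identifiers.

```python
# Sketch of the speech translation pipeline (stand-in stage functions).
def speech_recognition(audio):           # speech recognition server device 14
    return {"good-morning-audio-ja": "おはよう"}[audio]

def translation(text, source, target):   # translation server device 15
    return {("おはよう", "ja", "en"): "Good morning"}[(text, source, target)]

def speech_synthesis(text):              # speech synthesis server device 16
    return f"<synthesized speech for {text!r}>"

def speech_translate(audio, source, target):
    recognized = speech_recognition(audio)
    translated = translation(recognized, source, target)
    return speech_synthesis(translated)

print(speech_translate("good-morning-audio-ja", "ja", "en"))
# The second terminal device would then output the synthesized "Good morning".
```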
- the first terminal device 11 and the second terminal device 12 are, for example, terminals (including telephones and mobile phones) that make calls.
- in the following description, the first terminal device 11 is mainly described as the terminal that speaks and the second terminal device 12 as the terminal that receives the translated speech, but it goes without saying that the two roles may be interchanged.
- the dictionary server device 13 has all the information used by the speech recognition server device 14, the translation server device 15, and the speech synthesis server device 16. This information is the all language pair dictionary described above.
- the third terminal device 17 is a terminal for inputting information in order to add information to the all language pair dictionary and to enrich the all language pair dictionary.
- FIG. 2 is a block diagram of the speech translation system 1 in the present embodiment.
- FIG. 3 is a block diagram of the dictionary server device 13.
- FIG. 4 is a block diagram of the voice recognition server device 14.
- FIG. 5 is a block diagram of the translation server device 15.
- FIG. 6 is a block diagram of the speech synthesis server device 16.
- the first terminal device 11 includes a first voice acceptance unit 111, a first voice transmission unit 112, a first voice receiving unit 113, and a first voice output unit 114.
- the second terminal device 12 includes a second voice acceptance unit 121, a second voice transmission unit 122, a second voice receiving unit 123, and a second voice output unit 124.
- the dictionary server device 13 includes an all-language-pair dictionary storage unit 131, a speech recognition information transmission unit 132, a translation information transmission unit 133, a speech synthesis information transmission unit 134, a notation acquisition unit 135, a notation accumulation unit 136, an information reception unit 137, an information accumulation unit 138, and an output unit 139.
- the voice recognition server device 14 includes a voice recognition information storage unit 141, a voice recognition information reception unit 142, a voice recognition information accumulation unit 143, a voice information reception unit 144, a voice recognition unit 145, and a voice recognition result transmission unit 146.
- the voice recognition unit 145 includes a voice recognition determination unit 1451, a voice recognition information transmission instruction unit 1452, and a voice recognition unit 1453.
- the translation server device 15 includes a translation information storage unit 151, a translation information reception unit 152, a translation information accumulation unit 153, a speech recognition result reception unit 154, a translation unit 155, and a translation result transmission unit 156.
- the translation unit 155 includes a translation determination unit 1551, a translation information transmission instruction unit 1552, and a translation unit 1553.
- the speech synthesis server device 16 includes a speech synthesis information storage unit 161, a speech synthesis information reception unit 162, a speech synthesis information accumulation unit 163, a translation result reception unit 164, a speech synthesis unit 165, and a speech synthesis result transmission unit 166.
- the voice synthesis unit 165 includes a voice synthesis determination unit 1651, a voice synthesis information transmission instruction unit 1652, and a voice synthesis unit 1653.
- the third terminal device 17 includes an input reception unit 171, an information reception unit 172, an information output unit 173, and an input information transmission unit 174.
- the first voice acceptance unit 111 accepts voice from the user of the first terminal device 11 (referred to as user A).
- the first voice acceptance unit 111 can be realized by, for example, a microphone and its device driver.
- the first voice transmission unit 112 transmits the voice accepted by the first voice acceptance unit 111.
- the voice transmission destination is any one of the one or more voice recognition server apparatuses 14.
- the first voice transmission unit 112 may transmit voice to two or more voice recognition server devices 14.
- the voice is voice information, and it is preferable that the voice to be transmitted is digitized.
- the first voice transmission unit 112 may transmit the voice together with speech translation control information.
- the speech translation control information includes the information needed by the speech recognition server device 14, the translation server device 15, and the speech synthesis server device 16 to perform speech recognition, translation, and speech synthesis, respectively, and to transmit their processing results.
- the speech translation control information includes, for example, information (IP address, telephone number, etc.) specifying a destination to which the processing result is transmitted, information (Japanese, English, German, etc.) specifying the source language and the target language, and the like.
- the first terminal device 11 and the second terminal device 12 accept the source language and the target language from their users, or automatically determine them, for example from the telephone number or IP address of the counterpart device. In the latter case, each terminal device either holds information such as telephone numbers or IP addresses in association with information specifying a language, or acquires the information specifying a language from another device using the telephone number or IP address as a key.
- similarly, the first terminal device 11 and the second terminal device 12 acquire, from the source language or the target language, information (such as an IP address) identifying the speech recognition server device 14 to be used, information identifying the translation server device 15, and information identifying the speech synthesis server device 16. That is, each terminal device either holds the source and target languages in association with information identifying each server device, or acquires the information identifying each server device from another device.
- the speech translation control information may also include information indicating the format of the input speech, information indicating the format of the output speech, information specifying the voice quality of the input and output speech, information indicating the format of the input text, information indicating the format of the output text, and the like.
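Collecting the fields enumerated above, a speech-translation-control-information record might look like the following sketch. The key names and example values are assumptions for illustration, not the patent's wire format.

```python
# Sketch of speech translation control information (illustrative key names).
speech_translation_control_info = {
    "destination": "192.0.2.12",          # where to send the processing result (IP address or phone number)
    "source_language": "Japanese",
    "target_language": "English",
    "input_speech_format": "PCM 16 kHz",
    "output_speech_format": "PCM 16 kHz",
    "voice_quality": "female",            # voice quality of the output speech
    "input_text_format": "UTF-8 plain text",
    "output_text_format": "UTF-8 plain text",
}
```

Each server device would read the fields it needs (for example, the translation server reads the source and target languages) and forward the record with its processing result.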
- the first voice transmission unit 112 may transmit the voice to the one or more voice recognition server devices 14 directly, or indirectly via another device.
- the first audio transmission unit 112 is usually realized by a wireless or wired communication unit, but may be realized by a broadcasting unit.
- the first voice receiving unit 113 receives voice (usually digitized voice information). This voice is a voice obtained by translating the contents of the voice uttered by the user (referred to as user B) of the second terminal device 12 into a language that the user A of the first terminal device 11 can understand.
- the first voice receiving unit 113 receives voice directly or indirectly from the voice synthesis server device 16.
- the first audio receiving unit 113 is usually realized by a wireless or wired communication means, but may be realized by a means for receiving a broadcast.
- the first voice output unit 114 outputs the voice received by the first voice receiving unit 113. The first voice output unit 114 may be considered as including or not including a speaker (output device).
- the first voice output unit 114 may be realized by driver software for an output device, or by driver software for an output device together with the output device.
- the second voice acceptance unit 121 accepts voice from user B of the second terminal device 12.
- the second voice acceptance unit 121 can be realized by, for example, a microphone and its device driver.
- the second voice transmission unit 122 transmits the voice accepted by the second voice acceptance unit 121.
- the voice transmission destination is any one of the one or more voice recognition server apparatuses 14.
- the second voice transmission unit 122 may transmit voice to two or more voice recognition server devices 14.
- the voice is voice information, and it is preferable that the voice to be transmitted is digitized.
- the second voice transmission unit 122 may transmit the voice to the one or more voice recognition server devices 14 directly, or indirectly via another device.
- the second audio transmission unit 122 is usually realized by a wireless or wired communication unit, but may be realized by a broadcasting unit.
- the second voice receiving unit 123 receives voice (usually digitized voice information). This voice is a voice obtained by translating the contents of the voice uttered by the user A of the first terminal device 11 into a language (target language) that can be understood by the user B of the second terminal device 12.
- the second voice receiving unit 123 receives voice directly or indirectly from the voice synthesis server device 16.
- the second audio receiving unit 123 is usually realized by a wireless or wired communication means, but may be realized by a means for receiving a broadcast.
- the second audio output unit 124 outputs the audio received by the second audio receiving unit 123.
- the second audio output unit 124 may be considered as including or not including a speaker (output device).
- the second audio output unit 124 may be realized by driver software for an output device, or by driver software for an output device together with the output device.
- the all language pair dictionary storage unit 131 stores all language pair dictionaries.
- the all language pair dictionary has been described above.
- the speech recognition information included in the all language pair dictionary is, for example, an acoustic model such as a hidden Markov model (HMM).
- the translation information held by the all-language-pair dictionary is, for example, the notation of a term.
- for example, when the Japanese notation of a term is "大阪" (Osaka), the English notation is "Osaka" and the Chinese notation is "大阪".
- a term is understood as a broad concept including one word, one or more words, one or more clauses, a sentence, and the like.
- the speech synthesis information is, for example, reading information (referred to as “reading” as appropriate) and accent information (referred to as “accent” as appropriate).
- the term information usually has a reading of the term.
- the term information includes, for example, information such as notation, reading, accent, and tone.
- the term information usually has a different structure depending on the language. For example, when the language is German, the term information includes flag information indicating whether the term is masculine or feminine. Such flag information does not exist in term information for languages such as Japanese and English.
- the all-language-pair dictionary usually has structure information for each language.
- for example, the all-language-pair dictionary has structure information such as "Japanese <HMM> <notation> <reading> <accent>", "English <HMM> <notation> <reading> <accent> <tone>", and "German <notation> <flag information> <reading> <accent>".
- alternatively, structure information common to the languages may be managed as a single piece of common structure information, with only the structure information indicating language-specific information managed per language.
- in that case, the structure information consists of, for example, the common structure information "<HMM> <notation> <reading> <accent>" and language-specific entries such as "German <flag information>".
- the all-language group dictionary storage unit 131 is preferably a non-volatile recording medium, but can also be realized by a volatile recording medium.
- the process in which various types of information are stored in the all language pair dictionary storage unit 131 is not limited.
- various information may be stored in the all language pair dictionary storage unit 131 via a recording medium, and various information transmitted via a communication line or the like may be stored in the all language pair dictionary storage unit 131.
- various types of information input via the input device may be stored in the all language pair dictionary storage unit 131.
- the speech recognition information transmitting unit 132 reads the speech recognition information from the all-language pair dictionary storage unit 131 and transmits the speech recognition information to one or more speech recognition server devices 14.
- the information for speech recognition is information including the term speech recognition information, and is information used for speech recognition by the speech recognition server device 14.
- the information for speech recognition includes speech recognition information for all of the two or more languages or a subset of them.
- the information for speech recognition may be the same information as the speech recognition information, or may be the speech recognition information with other information added. Further, it is preferable that the speech recognition information transmitting unit 132 does not transmit the speech recognition information of a term whose speech recognition information is partially missing.
- the voice recognition information transmitting unit 132 is usually realized by a wireless or wired communication means, but may be realized by a broadcasting means.
- the translation information transmission unit 133 reads the translation information from the all-language pair dictionary storage unit 131 and transmits the translation information to one or more translation server devices 15.
- the information for translation is information including notation of terms, and is information used for translation by the translation server device 15.
- the information for translation is information including notation for all or some of the two or more languages.
- the information for translation may be notation only, or information obtained by adding other information to the notation.
- it is preferable that the translation information transmission unit 133 does not transmit the translation information of a term whose translation information is partially missing. In other words, when a piece of term information has only the notation of a Japanese term, it is preferable that none of the information in that term information is transmitted.
- the translation information transmitting unit 133 is usually realized by a wireless or wired communication means, but may be realized by a broadcasting means.
- the speech synthesis information transmission unit 134 reads the speech synthesis information from the all-language pair dictionary storage unit 131 and transmits the speech synthesis information to one or more speech synthesis server devices 16.
- the information for speech synthesis is information including the term speech synthesis information, and is information used for speech synthesis by the speech synthesis server device 16.
- the information for speech synthesis is information including speech synthesis information for all or some of the two or more languages.
- the information for speech synthesis may be the same information as the speech synthesis information, or may be information obtained by adding other information to the speech synthesis information.
- it is preferable that the speech synthesis information transmission unit 134 does not transmit the speech synthesis information of a term whose speech synthesis information is partially missing.
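The rule shared by the three transmission units above (132, 133, 134) is to withhold any term whose information is partially missing. The following is a minimal sketch of such a completeness filter; the field names and the required-field sets are illustrative assumptions, not taken from the specification.

```python
# Hypothetical required fields per service; the actual structure information
# held by the dictionary is not specified here.
REQUIRED_FIELDS = {
    "recognition": ("notation", "acoustic_model"),
    "translation": ("notation",),
    "synthesis": ("notation", "reading", "accent"),
}

def complete_terms(term_records, purpose):
    """Return only the records whose required fields for the purpose are all present."""
    required = REQUIRED_FIELDS[purpose]
    return [r for r in term_records if all(r.get(f) for f in required)]

terms = [
    {"notation": "大阪", "reading": "おおさか", "accent": "0", "acoustic_model": "hmm-1"},
    {"notation": "京都"},  # reading and accent missing: withheld for synthesis
]
```

Under this sketch, the second record would be transmitted for translation (only the notation is required) but withheld for recognition and synthesis.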
- the speech synthesis information transmission unit 134 is usually realized by a wireless or wired communication means, but may be realized by a broadcasting means.
- the notation acquisition unit 135 acquires, from web pages of one or more web server devices on the Internet, the notation of terms that do not exist in the all-language pair dictionary storage unit 131. It is preferable that, when acquiring a notation, the notation acquisition unit 135 also acquires a language identifier, which is information for identifying the language.
- the notation acquisition unit 135 acquires, for example, the language identifier “Japanese” when the URL of the web page from which the term is acquired includes “.jp”, and the language identifier “Korean” when the URL includes “.kr”.
- the notation acquisition unit 135 may, for example, identify the language automatically from the character code of the database or web page from which the term was acquired. Further, the notation acquisition unit 135 may acquire a term from a web page and then query the user to input the language.
- when the notation acquisition unit 135 acquires a term from a web page, it may search the all-language pair dictionary storage unit 131 using the term as a key, and treat the term as acquired only when it determines that the term does not exist in the all-language pair dictionary storage unit 131. That is, in such a case, the notation acquisition unit 135 may discard an acquired term once it finds that the term already exists in the all-language pair dictionary storage unit 131.
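The acquisition flow just described, guessing a language identifier from the URL and discarding terms that already exist in the dictionary, can be sketched as follows. The ccTLD mapping and the function names are illustrative assumptions.

```python
# Illustrative mapping from a top-level-domain fragment to a language identifier.
TLD_TO_LANGUAGE = {".jp": "Japanese", ".kr": "Korean"}

def guess_language(url):
    """Guess the language identifier from the URL of the source web page."""
    for tld, language in TLD_TO_LANGUAGE.items():
        if tld in url:
            return language
    return None  # fall back to character-code detection or asking the user

def acquire_notation(term, url, dictionary):
    """Return (term, language) only when the term is not yet in the dictionary."""
    if term in dictionary:
        return None  # discard: the term already exists
    return (term, guess_language(url))

dictionary = {"東京"}
```

For instance, a new term found on a `.jp` page is returned with the identifier "Japanese", while a term already present in the dictionary is discarded.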
- the notation acquisition unit 135 can usually be realized by an MPU, memory, communication means, or the like.
- the processing procedure of the notation acquisition unit 135 is usually realized by software, and the software is recorded in a recording medium such as a ROM. However, it may be realized by hardware (dedicated circuit).
- the notation acquisition unit 135 may perform processing that invokes a so-called search engine. In such a case, the notation acquisition unit 135 may be regarded either as including or as not including the search engine.
- the notation accumulation unit 136 accumulates the notation of the term acquired by the notation acquisition unit 135 in the all-language pair dictionary storage unit 131. Normally, the notation accumulation unit 136 stores the acquired term notation as the notation of the language corresponding to the language identifier acquired by the notation acquisition unit 135.
- the notation accumulation unit 136 can usually be realized by an MPU, a memory, or the like.
- the processing procedure of the notation accumulation unit 136 is usually realized by software, and the software is recorded on a recording medium such as a ROM. However, it may be realized by hardware (dedicated circuit).
- the information receiving unit 137 receives any of the term information from one or more third terminal devices 17.
- the received information is, for example, HMM, notation, reading, accent, tone, flag information, and the like.
- the information receiving unit 137 normally receives information together with information identifying the language and the corresponding notation. That is, the received information must naturally be in a state where it can be determined which notation, and which language, each piece of information (such as a reading) corresponds to.
- here, reception is a concept that includes receiving information input from an input device such as a keyboard, mouse, or touch panel; receiving information transmitted via a wired or wireless communication line; and receiving information read from a recording medium such as an optical disk, magnetic disk, or semiconductor memory.
- the information receiving unit 137 can be realized by, for example, a wired or wireless communication unit.
- the information storage unit 138 stores the information received by the information receiving unit 137 in association with the corresponding term notation in the corresponding language.
- the received information is stored in the area for the corresponding language. Further, when the information receiving unit 137 receives a notation and other information such as a reading already exists, the notation is stored in association with the reading of the corresponding term in the corresponding language.
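The accumulation just described, attaching a newly received piece of information to the corresponding term in the corresponding language, might look like the following sketch. The nested-dictionary layout and field names are assumptions for illustration.

```python
def accumulate(dictionary, language, notation, field, value):
    """Store one piece of term information under (language, notation)."""
    record = dictionary.setdefault(language, {}).setdefault(notation, {"notation": notation})
    record[field] = value
    return record

all_language_dict = {}
accumulate(all_language_dict, "Japanese", "大阪", "reading", "おおさか")
accumulate(all_language_dict, "Japanese", "大阪", "accent", "0")
```

Each received field is merged into the record for its term, so a reading and an accent received separately end up associated with the same notation.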
- the information storage unit 138 can usually be realized by an MPU, a memory, or the like.
- the processing procedure of the information storage unit 138 is usually realized by software, and the software is recorded in a recording medium such as a ROM. However, it may be realized by hardware (dedicated circuit).
- the output unit 139 outputs all language term information or partial information of all language term information.
- the output unit 139 normally outputs information according to an instruction from the third terminal device 17 or an instruction from the user.
- the output here is usually transmission of information to the third terminal device 17 that has transmitted the instruction.
- it is preferable that the output unit 139 outputs the all-language term information, or partial information of the all-language term information, in a visually different manner depending on whether all of the predetermined information exists for all of the two or more languages or some of that information is missing. Here, all of the predetermined information means the information corresponding to the structure information held by the all-language pair dictionary storage unit 131.
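A minimal sketch of this visually distinct output: records missing any of the predetermined fields are marked so that a client can render them differently. The marker text and the field names are illustrative assumptions.

```python
PREDETERMINED_FIELDS = ("notation", "reading", "accent")  # illustrative

def format_term(record):
    """Mark incomplete records so they can be rendered differently (e.g. highlighted)."""
    missing = [f for f in PREDETERMINED_FIELDS if not record.get(f)]
    if missing:
        return f"{record.get('notation', '?')} [missing: {', '.join(missing)}]"
    return f"{record['notation']} [complete]"
```

A complete record is labeled as such, while one lacking a reading or accent is flagged with the names of the missing fields, which invites users of the third terminal device 17 to supplement them.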
- the output unit 139 is normally realized by a wireless or wired communication unit, but may be realized by a broadcasting unit.
- the voice recognition information storage unit 141 can store the voice recognition information for all of the two or more languages or for some of the two or more languages.
- the voice recognition information storage unit 141 is preferably a non-volatile recording medium, but can also be realized by a volatile recording medium.
- the voice recognition information is stored in the voice recognition information storage unit 141, for example, by receiving the voice recognition information from the dictionary server device 13. However, the process by which the voice recognition information comes to be stored in the voice recognition information storage unit 141 does not matter.
- the voice recognition information may be stored in the voice recognition information storage unit 141 via a recording medium, or voice recognition information input via an input device may be stored in the voice recognition information storage unit 141.
- the voice recognition information receiving unit 142 receives, from the dictionary server device 13, the voice recognition information for all of the two or more languages or for some of the two or more languages.
- the voice recognition information receiving unit 142 may receive the voice recognition information from the dictionary server device 13 in response to the transmission of the instruction to the dictionary server device 13 by the voice recognition information transmission instruction unit 1452.
- the voice recognition information receiving unit 142 is usually realized by a wireless or wired communication means, but may be realized by a means for receiving a broadcast.
- the voice recognition information accumulating unit 143 accumulates the voice recognition information received by the voice recognition information receiving unit 142 in the voice recognition information storage unit 141.
- the speech recognition information accumulating unit 143 can be usually realized by an MPU, a memory, or the like.
- the processing procedure of the voice recognition information accumulating unit 143 is usually realized by software, and the software is recorded in a recording medium such as a ROM. However, it may be realized by hardware (dedicated circuit).
- the voice information receiving unit 144 receives voice information that is voice information input to the first terminal device 11.
- the voice information receiving unit 144 receives voice information directly or indirectly from the first terminal device 11.
- the audio information receiving unit 144 is usually realized by a wireless or wired communication means, but may be realized by a means for receiving a broadcast.
- the voice recognition unit 145 recognizes the voice information received by the voice information reception unit 144 using the voice recognition information stored in the voice recognition information storage unit 141, and acquires a voice recognition result.
- the voice recognition result is usually a character string in the original language (the language of the voice spoken by the user A of the first terminal device 11).
- the voice recognition method performed by the voice recognition unit 145 may be any voice recognition method. Since the voice recognition unit 145 is a known technique, a detailed description thereof is omitted.
- the voice recognition unit 145 can usually be realized by an MPU, a memory, or the like.
- the processing procedure of the voice recognition unit 145 is usually realized by software, and the software is recorded in a recording medium such as a ROM. However, it may be realized by hardware (dedicated circuit).
- the voice recognition determination unit 1451 determines whether the voice recognition process for the voice information received by the voice information receiving unit 144 is successful or unsuccessful.
- the voice recognition determination unit 1451 determines, for example, whether the voice recognition information corresponding to partial voice information, which is information that is included in the voice information received by the voice information receiving unit 144 and that corresponds to one or more terms, exists in the voice recognition information storage unit 141.
- the partial voice information is usually a part of the voice information, but may be the same as the voice information.
- the voice recognition determination unit 1451 may cause the voice recognition unit 1453 to perform voice recognition processing on the voice information received by the voice information receiving unit 144, and determine whether the result is a success or a failure. In the case of success, the voice recognition information corresponding to the partial voice information exists in the voice recognition information storage unit 141. In the case of failure, the voice recognition information corresponding to the partial voice information does not exist in the voice recognition information storage unit 141.
- the speech recognition determination unit 1451 may determine that the voice recognition process has succeeded when the likelihood of the speech recognition result is greater than (or equal to or greater than) a predetermined value, and that the voice recognition process has failed when the likelihood is less than (or equal to or less than) the predetermined value.
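The two determination styles described above, checking local storage for the needed recognition information and thresholding the likelihood of a recognition result, can be sketched as follows. The threshold value is arbitrary and the function names are illustrative.

```python
def info_exists(partial_terms, stored_info):
    """Determination (a): recognition information for every term is stored locally."""
    return all(term in stored_info for term in partial_terms)

def likelihood_succeeded(likelihood, threshold=0.8):
    """Determination (b): success when the likelihood clears a predetermined value."""
    return likelihood > threshold

stored = {"hello": "hmm-hello", "world": "hmm-world"}
```

Either check can drive the failure branch that triggers a transmission instruction to the dictionary server device 13.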
- the voice recognition determination unit 1451 can be usually realized by an MPU, a memory, or the like.
- the processing procedure of the voice recognition determination unit 1451 is usually realized by software, and the software is recorded on a recording medium such as a ROM. However, it may be realized by hardware (dedicated circuit).
- when it is determined that the speech recognition process for the speech information received by the speech information reception unit 144 has failed, the speech recognition information transmission instruction unit 1452 instructs the dictionary server device 13 to transmit the speech recognition information.
- for example, when the speech recognition determination unit 1451 determines that the speech recognition information does not exist in the speech recognition information storage unit 141, the speech recognition information transmission instruction unit 1452 instructs the dictionary server device 13 to transmit the speech recognition information.
- This instruction includes, for example, partial voice information and a language identifier. Further, this instruction includes, for example, a phoneme string generated from partial speech information, a language identifier, and the like.
- the voice recognition information transmission instructing unit 1452 is usually realized by a wireless or wired communication unit, but may be realized by a broadcasting unit.
- the voice recognition unit 1453 recognizes the voice information received by the voice information receiving unit 144 using the voice recognition information in the voice recognition information storage unit 141, and acquires a voice recognition result.
- the voice recognition unit 1453 recognizes the voice using the voice recognition information received by the voice recognition information receiving unit 142 from the dictionary server device 13 in response to the transmission of the instruction by the voice recognition information transmission instruction unit 1452, and acquires the voice recognition result.
- the voice recognition unit 1453 performs voice recognition using the voice recognition information in the voice recognition information storage unit 141 when the voice recognition determination unit 1451 determines that the voice recognition information exists in the voice recognition information storage unit 141, and acquires a voice recognition result.
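Putting units 1451 to 1453 together, the fallback flow is: try the local recognition information first, and only on a miss instruct the dictionary server and retry with the received information. Below is a sketch with a lookup table standing in for a real decoder; all interfaces here are hypothetical.

```python
def recognize(partial_terms, local_info, fetch_from_dictionary_server):
    """Use local recognition info when present; otherwise request the missing
    entries from the dictionary server, accumulate them, and then recognize."""
    missing = [t for t in partial_terms if t not in local_info]
    if missing:
        local_info.update(fetch_from_dictionary_server(missing))
    return [local_info[t] for t in partial_terms]

server_side = {"osaka": "大阪", "nara": "奈良"}
local = {"osaka": "大阪"}
result = recognize(["osaka", "nara"], local,
                   lambda terms: {t: server_side[t] for t in terms})
```

Note that the fetched entries are accumulated locally, so the next utterance containing the same term no longer needs the dictionary server.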
- the voice recognition unit 1453 can be usually realized by an MPU, a memory, or the like.
- the processing procedure of the voice recognition unit 1453 is usually realized by software, and the software is recorded in a recording medium such as a ROM. However, it may be realized by hardware (dedicated circuit).
- the voice recognition result transmission unit 146 transmits the voice recognition result acquired by the voice recognition unit 145.
- the speech recognition result transmission unit 146 transmits the speech recognition result to the translation server device 15 directly or indirectly.
- the voice recognition result transmission unit 146 is usually realized by a wireless or wired communication unit, but may be realized by a broadcasting unit.
- the translation information storage unit 151 can store the translation information for all of the two or more languages or for some of the two or more languages.
- the information for translation is, for example, a translation model and a language model.
- the translation information storage unit 151 is preferably a non-volatile recording medium, but can also be realized by a volatile recording medium.
- the translation information is stored in the translation information storage unit 151 by receiving the translation information from the dictionary server device 13.
- the process in which the translation information is stored in the translation information storage unit 151 does not matter.
- the translation information may be stored in the translation information storage unit 151 via a recording medium, or translation information input via an input device may be stored in the translation information storage unit 151.
- the translation information receiving unit 152 receives translation information from the dictionary server device 13 for all of the two or more languages or for some of the two or more languages.
- the translation information receiving unit 152 receives the notation of the target language term from the dictionary server device 13 in response to the transmission of the instruction to the dictionary server device 13 by the translation information transmission instruction means 1552.
- the translation information receiving unit 152 is usually realized by a wireless or wired communication means, but may be realized by a means for receiving a broadcast.
- the translation information accumulating unit 153 accumulates the translation information received by the translation information receiving unit 152 in the translation information storage unit 151.
- the translation information accumulating unit 153, for example, accumulates the notation of the target-language term received by the translation information receiving unit 152 in the translation information storage unit 151.
- the translation information accumulating unit 153 can usually be realized by an MPU, a memory, or the like.
- the processing procedure of the translation information accumulating unit 153 is usually realized by software, and the software is recorded on a recording medium such as a ROM. However, it may be realized by hardware (dedicated circuit).
- the voice recognition result receiving unit 154 receives the voice recognition result acquired by the voice recognition server device 14.
- the voice recognition result receiving unit 154 receives the voice recognition result directly or indirectly from the voice recognition server device 14.
- the voice recognition result receiving unit 154 is usually realized by a wireless or wired communication means, but may be realized by a means for receiving a broadcast.
- the translation unit 155 translates the speech recognition result received by the speech recognition result reception unit 154 into the target language using the translation information in the translation information storage unit 151, and acquires the translation result. Any translation algorithm may be used.
- the translation unit 155 can usually be realized by an MPU, a memory, or the like.
- the processing procedure of the translation unit 155 is usually realized by software, and the software is recorded in a recording medium such as a ROM. However, it may be realized by hardware (dedicated circuit).
- the translation determination unit 1551 determines whether the translation process for the speech recognition result received by the speech recognition result receiving unit 154 is successful or unsuccessful.
- the translation determination unit 1551 determines, for example, whether the notation of the target-language term corresponding to a partial speech recognition result, which is information that is included in the speech recognition result received by the speech recognition result receiving unit 154 and that corresponds to one or more terms (usually a character string of a source-language term), exists in the translation information storage unit 151.
- the partial speech recognition result is usually a part of the speech recognition result, but may be the same as the speech recognition result.
- the translation determination unit 1551 may cause the translation unit 1553 to perform translation processing on the speech recognition result received by the speech recognition result receiving unit 154, and determine whether the result is a success or a failure. In the case of success, the notation of the target-language term corresponding to the partial speech recognition result exists in the translation information storage unit 151. In the case of failure, the notation of the target-language term corresponding to the partial speech recognition result does not exist in the translation information storage unit 151.
- the translation determination unit 1551 may determine that the translation process has succeeded when the likelihood of the translation result is greater than (or equal to or greater than) a predetermined value, and that the translation process has failed when the likelihood is less than (or equal to or less than) the predetermined value.
- the translation determining unit 1551 can be usually realized by an MPU, a memory, or the like.
- the processing procedure of the translation determination unit 1551 is usually realized by software, and the software is recorded on a recording medium such as a ROM. However, it may be realized by hardware (dedicated circuit).
- when the translation determination unit 1551 determines that the notation of the target-language term does not exist in the translation information storage unit 151, the translation information transmission instruction unit 1552 instructs the dictionary server device 13 to transmit the notation of the target-language term. This instruction includes, for example, the notation of the source-language term and the language identifier of the target language.
- the translation information transmission instructing unit 1552 is usually realized by a wireless or wired communication unit, but may be realized by a broadcasting unit.
- the translation unit 1553 translates the speech recognition result received by the speech recognition result receiving unit 154 into a target language using the translation information stored in the translation information storage unit 151, and acquires the translation result. Also, the translation unit 1553 uses the notation of the target language term received by the translation information receiving unit 152 from the dictionary server device 13 in response to the transmission of the instruction by the translation information transmission instruction unit 1552, and the speech recognition result. Translate to the target language and obtain the translation results. Also, the translation unit 1553 uses, for example, the notation of the target language term in the translation information storage unit 151 when the translation determination unit 1551 determines that the notation of the target language term exists in the translation information storage unit 151. To translate the speech recognition results and obtain the translation results.
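The same pattern appears in units 1551 to 1553: the translation determination is a local lookup of the target-language notation, and a miss triggers a transmission instruction carrying the source-language notation and the target-language identifier. A sketch under these assumptions (the tuple-keyed dictionary is illustrative):

```python
def translate_terms(source_terms, target_lang, local_dict, request_from_server):
    """Look up the target-language notation locally; on a miss, ask the
    dictionary server and accumulate the answer for subsequent translations."""
    results = []
    for term in source_terms:
        notation = local_dict.get((term, target_lang))
        if notation is None:  # translation determination: not stored locally
            notation = request_from_server(term, target_lang)
            local_dict[(term, target_lang)] = notation
        results.append(notation)
    return results

server = {("大阪", "en"): "Osaka", ("奈良", "en"): "Nara"}
local = {("大阪", "en"): "Osaka"}
translated = translate_terms(["大阪", "奈良"], "en", local,
                             lambda t, l: server[(t, l)])
```

As with recognition, the received notation is accumulated so the same source-language term translates locally next time.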
- the translation unit 1553 can usually be realized by an MPU, a memory, or the like.
- the processing procedure of the translation unit 1553 is usually realized by software, and the software is recorded in a recording medium such as a ROM. However, it may be realized by hardware (dedicated circuit).
- the translation result transmission unit 156 transmits the translation result acquired by the translation unit 155.
- the translation result transmission unit 156 transmits the translation result directly or indirectly to the speech synthesis server device 16.
- the translation result transmitting unit 156 is usually realized by a wireless or wired communication unit, but may be realized by a broadcasting unit.
- the speech synthesis information storage unit 161 can store speech synthesis information for all of the two or more languages or for some of the two or more languages.
- the voice synthesis information storage unit 161 is preferably a non-volatile recording medium, but can also be realized by a volatile recording medium.
- the speech synthesis information is stored in the speech synthesis information storage unit 161.
- the process for storing the speech synthesis information in the speech synthesis information storage unit 161 is not limited.
- the speech synthesis information may be stored in the speech synthesis information storage unit 161 via a recording medium, or speech synthesis information input via an input device may be stored in the speech synthesis information storage unit 161.
- the speech synthesis information receiving unit 162 receives speech synthesis information from the dictionary server device 13 for all of the two or more languages or for some of the two or more languages.
- the speech synthesis information receiving unit 162 receives the speech synthesis information from the dictionary server device 13 in response to the transmission of the instruction to the dictionary server device 13 by the speech synthesis information transmission instruction unit 1652.
- the speech synthesis information receiving unit 162 is usually realized by a wireless or wired communication means, but may be realized by a means for receiving a broadcast.
- the speech synthesis information accumulating unit 163 accumulates the speech synthesis information received by the speech synthesis information reception unit 162 in the speech synthesis information storage unit 161.
- the speech synthesis information accumulating unit 163 can usually be realized by an MPU, a memory, or the like.
- the processing procedure of the speech synthesis information accumulating unit 163 is usually realized by software, and the software is recorded in a recording medium such as a ROM. However, it may be realized by hardware (dedicated circuit).
- the translation result receiving unit 164 receives the translation result acquired by the translation server device 15.
- the translation result receiving unit 164 receives the translation result directly or indirectly from the translation server device 15.
- the translation result receiving unit 164 is usually realized by a wireless or wired communication means, but may be realized by a means for receiving a broadcast.
- the speech synthesizer 165 synthesizes speech from the translation result received by the translation result receiver 164 using the speech synthesis information stored in the speech synthesis information storage unit 161, and acquires the speech synthesis result. Any speech synthesis algorithm may be used. Since the speech synthesizer 165 uses a known technique, a detailed description thereof is omitted.
- the speech synthesizer 165 can usually be realized by an MPU, a memory, or the like.
- the processing procedure of the speech synthesizer 165 is usually realized by software, and the software is recorded on a recording medium such as a ROM. However, it may be realized by hardware (dedicated circuit).
- the speech synthesis determination unit 1651 determines whether the speech synthesis process for the translation result received by the translation result receiving unit 164 is successful or unsuccessful.
- the speech synthesis determination unit 1651 determines, for example, whether the speech synthesis information corresponding to a partial translation result, which is information that is included in the translation result received by the translation result receiving unit 164 and that corresponds to one or more terms, exists in the speech synthesis information storage unit 161.
- the partial translation result is usually a part of the translation result, but may be the same as the translation result.
- the speech synthesis determination unit 1651 may cause the speech synthesis unit 1653 to perform speech synthesis processing on the translation result received by the translation result receiving unit 164, and determine whether the result is a success or a failure. In the case of success, the speech synthesis information corresponding to the partial translation result exists in the speech synthesis information storage unit 161. In the case of failure, the speech synthesis information corresponding to the partial translation result does not exist in the speech synthesis information storage unit 161.
- the speech synthesis determination unit 1651 may determine that the speech synthesis process has succeeded when the likelihood of the speech synthesis result is greater than (or equal to or greater than) a predetermined value, and that the speech synthesis process has failed when the likelihood is less than (or equal to or less than) the predetermined value.
- the speech synthesis determination unit 1651 can be usually realized by an MPU, a memory, or the like.
- the processing procedure of the speech synthesis determination unit 1651 is usually realized by software, and the software is recorded on a recording medium such as a ROM. However, it may be realized by hardware (dedicated circuit).
- when it is determined that the speech synthesis process for the translation result received by the translation result receiving unit 164 has failed, the speech synthesis information transmission instruction unit 1652 instructs the dictionary server device 13 to transmit the speech synthesis information.
- for example, when the speech synthesis determination unit 1651 determines that the speech synthesis information does not exist in the speech synthesis information storage unit 161, the speech synthesis information transmission instruction unit 1652 instructs the dictionary server device 13 to transmit the speech synthesis information. This instruction includes, for example, the notation of the target-language term (the partial translation result) and the language identifier of the target language.
- the voice synthesis information transmission instruction unit 1652 is normally realized by a wireless or wired communication unit, but may be realized by a broadcasting unit.
- the speech synthesizer 1653 synthesizes the translation result received by the translation result receiving unit 164 using the speech synthesis information stored in the speech synthesis information storage unit 161, and acquires the speech synthesis result.
- the speech synthesis unit 1653 uses the speech synthesis information received by the speech synthesis information receiving unit 162 from the dictionary server device 13 in response to the transmission of the instruction by the speech synthesis information transmission instruction unit 1652 to convert the partial translation result into a speech. Synthesize and obtain speech synthesis result.
- the speech synthesis unit 1653 may synthesize speech from the partial translation result using the speech synthesis information in the speech synthesis information storage unit 161 when the speech synthesis determination unit 1651 determines that the speech synthesis information exists in the speech synthesis information storage unit 161, and acquire the speech synthesis result.
- the speech synthesis result is usually speech information of a target language.
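Units 1651 to 1653 follow the same fallback shape as the recognition and translation sides. In the sketch below, concatenating stored readings stands in for a real synthesizer, and the record layout is a hypothetical assumption.

```python
def synthesize(partial_results, local_info, request_from_server):
    """Fetch missing synthesis information from the dictionary server first,
    then 'synthesize' by concatenating the stored readings (a stand-in)."""
    missing = [t for t in partial_results if t not in local_info]
    if missing:
        local_info.update(request_from_server(missing))
    return "".join(local_info[t]["reading"] for t in partial_results)

server = {"奈良": {"reading": "なら", "accent": "1"}}
local = {"大阪": {"reading": "おおさか", "accent": "0"}}
speech = synthesize(["大阪", "奈良"], local,
                    lambda terms: {t: server[t] for t in terms})
```

The reading and accent received from the dictionary server are accumulated locally before synthesis, mirroring the behavior of the speech synthesis information accumulating unit 163.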
- the voice synthesizing unit 1653 can be usually realized by an MPU, a memory, or the like.
- the processing procedure of the voice synthesizing means 1653 is usually realized by software, and the software is recorded on a recording medium such as a ROM. However, it may be realized by hardware (dedicated circuit).
- the speech synthesis result transmission unit 166 transmits the speech synthesis result acquired by the speech synthesis unit 165 to the second terminal device 12.
- the speech synthesis result transmission unit 166 transmits the speech synthesis result to the second terminal device 12 directly or indirectly.
- the speech synthesis result transmission unit 166 is usually realized by a wireless or wired communication unit, but may be realized by a broadcasting unit.
- the input reception unit 171 receives input of various instructions and various information from the user.
- the means for inputting instructions and information may be anything, such as a numeric keypad, keyboard, mouse, or menu screen.
- the input receiving unit 171 can be realized by a device driver for input means such as a numeric keypad or a keyboard, control software for a menu screen, and the like.
- the information receiving unit 172 receives information from the dictionary server device 13.
- the received information is all language term information or a part of all language term information.
- the information receiving unit 172 is usually realized by a wireless or wired communication means, but may be realized by a means for receiving a broadcast.
- the information output unit 173 outputs the information received by the information receiving unit 172.
- the information output unit 173 outputs the received information (the all-language term information or partial information of the all-language term information) in a visually different manner depending on whether all of the predetermined information exists for all of the two or more languages or some of that information is missing.
- the information output unit 173 may be considered as including or not including an output device such as a display or a speaker.
- the information output unit 173 can be implemented by output device driver software, or output device driver software and an output device.
- the input information transmission unit 174 transmits the instruction or information received by the input reception unit 171 to the dictionary server device 13.
- the input information transmission unit 174 is usually realized by a wireless or wired communication means, but may be realized by a broadcasting means.
- the first voice reception unit 111 of the first terminal device 11 receives the language A voice uttered by the user A. Then, the first voice transmission unit 112 uses the received voice as voice information, and transmits the voice information (sometimes simply referred to as “voice”).
- the first voice receiving unit 113 receives voice information of the language A obtained by speech translation of the voice information of the language B uttered by the user B. Then, the first voice output unit 114 outputs the received voice information of the language A.
- the second voice receiving unit 123 of the second terminal device 12 receives the voice information of the language B obtained by speech translation of the voice of the language A uttered by the user A. Then, the second audio output unit 124 outputs the received voice information of the language B.
- the second voice receiving unit 121 receives a language B voice from the user B of the second terminal device 12. Then, the second voice transmission unit 122 transmits the received voice as voice information.
- Step S701 The dictionary server device 13 determines whether or not an instruction is received from an external device. If an instruction is received, the process goes to step S702, and if no instruction is received, the process goes to step S706.
- Step S702 The dictionary server device 13 determines whether or not the instruction received in step S701 is an information transmission instruction. If it is an information transmission instruction, the process proceeds to step S703, and if it is not an information transmission instruction, the process proceeds to step S705.
- Step S703 The speech recognition information transmission unit 132, the translation information transmission unit 133, or the speech synthesis information transmission unit 134 of the dictionary server device 13 searches the all-language pair dictionary storage unit 131 for the information corresponding to the instruction received in step S701, and acquires information necessary for speech recognition, information necessary for translation, or information necessary for speech synthesis.
- More specifically, for example, the speech recognition information transmission unit 132 searches the all-language pair dictionary storage unit 131 using, as keys, the voice information that failed in speech recognition and the language identifier included in the instruction received in step S701, and acquires speech recognition information (for example, an acoustic model).
- For example, the translation information transmission unit 133 searches the all-language pair dictionary storage unit 131 using, as keys, the speech recognition result (or a part of it) that failed in translation, the source language identifier, and the target language identifier included in the instruction received in step S701, and acquires translation information (for example, the notation of the target-language term). Also for example, the speech synthesis information transmission unit 134 searches the all-language pair dictionary storage unit 131 using, as keys, the translation result (or a part of it) that failed in speech synthesis and the target language identifier included in the instruction received in step S701, and acquires speech synthesis information (for example, term reading and accent information).
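As a concrete illustration of these key-based searches in steps S703-S704, the following minimal Python sketch models the all-language pair dictionary as an in-memory map keyed by (language identifier, term notation). All names, field values, and the flat layout are assumptions for illustration, not the embodiment's actual storage format.

```python
# Hypothetical in-memory stand-in for the all-language pair dictionary
# storage unit 131: (language identifier, notation) -> term information.
ALL_LANGUAGE_PAIR_DICT = {
    ("ja", "大阪"): {"notation": "大阪", "reading": "おおさか",
                     "acoustic_model": "hmm-ja-osaka", "accent": "4 morae, type 0"},
    ("en", "Osaka"): {"notation": "Osaka", "reading": "oʊsɑːkə",
                      "acoustic_model": "hmm-en-osaka", "accent": "O-sa-ka"},
}

# (source language, source notation, target language) -> target notation.
TRANSLATION_PAIRS = {("ja", "大阪", "en"): "Osaka"}

def speech_recognition_info(language, notation):
    """Acoustic-model side of a term entry (what unit 132 would return)."""
    entry = ALL_LANGUAGE_PAIR_DICT.get((language, notation))
    return None if entry is None else {"notation": entry["notation"],
                                       "acoustic_model": entry["acoustic_model"]}

def translation_info(source_lang, notation, target_lang):
    """Target-language notation paired with a source term (unit 133)."""
    return TRANSLATION_PAIRS.get((source_lang, notation, target_lang))

def speech_synthesis_info(language, notation):
    """Reading and accent information used for synthesis (unit 134)."""
    entry = ALL_LANGUAGE_PAIR_DICT.get((language, notation))
    return None if entry is None else {"reading": entry["reading"],
                                       "accent": entry["accent"]}
```

A lookup that misses (for example, a term the crawler has not yet accumulated) simply returns None, which corresponds to the "information does not exist" case elsewhere in the description.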
- Step S704 The speech recognition information transmission unit 132, the translation information transmission unit 133, or the speech synthesis information transmission unit 134 transmits the information acquired in step S703.
- the transmission destination is a device (speech recognition server device 14, translation server device 15, or speech synthesis server device 16) that has transmitted the instruction. The process returns to step S701.
- Step S705 The dictionary server device 13 performs processing corresponding to the instruction received in step S701. For example, when the instruction is an information output instruction (transmission instruction), the output unit 139 searches the all-language pair dictionary storage unit 131 for information corresponding to the instruction, and transmits the information acquired by the search. Transmit to the device (usually the third terminal device 17). The process returns to step S701.
- Step S706 The information receiving unit 137 determines whether information is received from an external device (usually the third terminal device 17). If the information is received, the process goes to step S707, and if the information is not received, the process goes to step S708.
- Step S707 The information storage unit 138 stores the information received in step S706 in the corresponding area of the all-language pair dictionary storage unit 131.
- the information is, for example, term reading, HMM, accent information, tone information, and the like, and is information that can be used for speech recognition, translation, or speech synthesis.
- the corresponding area means the following.
- the received information includes, for example, information for identifying a language identifier and term (such as term notation) and information to be added (such as term reading, HMM, accent information, tone information).
- the information storage unit 138 determines an area in which information to be additionally written is to be written using information for specifying a language identifier and a term, and accumulates information to be additionally written in the area. The process returns to step S701.
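The "corresponding area" logic of step S707 can be sketched as follows: the language identifier and term notation identify the area, and the supplemental fields are merged into it. The record shape and field names are illustrative assumptions.

```python
def store_received_information(dictionary, received):
    """Sketch of step S707: write supplemental information (e.g. reading,
    HMM, accent or tone information) into the area identified by the
    language identifier and the term notation."""
    key = (received["language"], received["notation"])   # identifies the area
    area = dictionary.setdefault(key, {"notation": received["notation"]})
    area.update(received["additional"])                  # information to add
    return dictionary

# Example: a user supplies the missing reading and accent of "Osaka".
dic = {("en", "Osaka"): {"notation": "Osaka"}}
store_received_information(dic, {
    "language": "en", "notation": "Osaka",
    "additional": {"reading": "oʊsɑːkə", "accent": "O-sa-ka"},
})
```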
- Step S708 The speech recognition information transmission unit 132, the translation information transmission unit 133, or the speech synthesis information transmission unit 134 determines whether it is time to transmit information. If it is time to transmit information, the process proceeds to step S714. If it is not time to transmit information, the process proceeds to step S709.
- the information is information for speech recognition, information for translation, or information for speech synthesis.
- the timing for transmitting the information is, for example, the timing at which an instruction from the user is received.
- Step S709 The notation acquisition unit 135 determines whether it is time to start crawling the web. If it is time to start crawling, the process goes to step S710; if not, the process returns to step S701. For example, the notation acquisition unit 135 determines that it is time to start crawling periodically.
- Step S710 The notation acquisition unit 135 crawls on the web and acquires the notation of terms.
- It is preferable that the notation acquisition unit 135 retrieves updated web pages, compares each with its older version, extracts the updated portion, and acquires terms (nouns, noun phrases, verbs, adjectives, and the like) from the updated portion.
- Such terminology is also referred to as a "term" as appropriate. Since crawling is a known technique, a detailed description is omitted. The notation acquisition unit 135 may acquire one term or a plurality of terms in this step.
- Step S711 The notation acquisition unit 135 searches the all-language group dictionary storage unit 131 using the notation of one or more terms acquired in step S710 as a key.
- Step S712 The notation acquisition unit 135 determines whether or not the notation of one or more terms acquired in step S710 exists in the all-language group dictionary storage unit 131 as a result of the search in step S711. If all terminology is present, the process returns to step S701, and if any term is not present, the process proceeds to step S713.
- Step S713 The notation accumulation unit 136 accumulates in the all language pair dictionary storage unit 131 the notation of one or more terms determined not to exist in the all language pair dictionary storage unit 131 in step S712. Note that the notation storage unit 136 stores notation of one or more terms in an area corresponding to a language identifier associated with the notation of terms.
- the notation acquisition unit 135 acquires a language identifier in association with a term when acquiring the term. The process returns to step S701.
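The crawl-and-accumulate cycle of steps S709-S713 can be sketched as a diff of page text followed by insert-if-absent. The word-level diff below is a very rough stand-in for real term extraction, and all names are hypothetical.

```python
def extract_new_terms(old_page, new_page):
    """Rough stand-in for step S710: keep only words that appear in the
    updated page but not in the old one (real extraction would keep only
    nouns, noun phrases, verbs, adjectives, etc.)."""
    return sorted(set(new_page.split()) - set(old_page.split()))

def accumulate_notations(dictionary, language, terms):
    """Steps S711-S713: store notations not yet in the dictionary, in the
    area corresponding to the language identifier."""
    added = []
    for term in terms:
        if (language, term) not in dictionary:        # search (S711/S712)
            dictionary[(language, term)] = {"notation": term}  # store (S713)
            added.append(term)
    return added

new_terms = extract_new_terms("visit Osaka", "visit Osaka and Kobe")
dic = {("en", "Osaka"): {"notation": "Osaka"}}
added = accumulate_notations(dic, "en", new_terms)
```

Note that a newly accumulated entry holds only the notation; its reading, acoustic model, and accent are filled in later, which is exactly the "missing information" case handled by the third terminal device 17.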
- Step S714 The voice recognition information transmitting unit 132 determines whether it is time to transmit the voice recognition information. If it is time to transmit the information for speech recognition, the process goes to step S715. If it is not time to transmit the information for speech recognition, the process goes to step S717.
- the timing for transmitting the speech recognition information is, for example, the timing when an instruction from the user is received.
- Step S715 The speech recognition information transmitting unit 132 reads the speech recognition information from the all-language group dictionary storage unit 131.
- Step S716 The voice recognition information transmitting unit 132 transmits the voice recognition information read in step S715 to one or more voice recognition server devices 14. Note that information (IP address, URL, etc.) for transmitting information to one or more voice recognition server devices 14 is held in advance by the voice recognition information transmitting unit 132. The process returns to step S701.
- Step S717 The translation information transmitting unit 133 determines whether or not it is time to transmit the translation information. If it is time to transmit the information for translation, the process goes to step S718. If it is not time to send the information for translation, the process goes to step S720.
- the timing for transmitting the information for translation is, for example, the timing at which an instruction from the user is received.
- Step S718 The translation information transmitting unit 133 reads the translation information from the all-language pair dictionary storage unit 131.
- Step S719 The translation information transmitting unit 133 transmits the translation information read in step S718 to one or more translation server devices 15. It is assumed that the information (IP address, URL, etc.) for transmitting to the one or more translation server devices 15 is held in advance by the translation information transmitting unit 133. The process returns to step S701.
- Step S720 The speech synthesis information transmission unit 134 determines whether it is time to transmit the speech synthesis information. If it is time to transmit the information for speech synthesis, the process goes to step S721. If it is not time to transmit the information for speech synthesis, the process returns to step S701. Note that the timing for transmitting the information for speech synthesis is, for example, the timing when an instruction from the user is received.
- Step S721 The speech synthesis information transmission unit 134 reads the speech synthesis information from the all-language pair dictionary storage unit 131.
- Step S722 The speech synthesis information transmission unit 134 transmits the speech synthesis information read in step S721 to one or more speech synthesis server devices 16. It is assumed that the information (IP address, URL, etc.) for transmitting to the one or more speech synthesis server devices 16 is held in advance by the speech synthesis information transmission unit 134. The process returns to step S701.
- Step S801 The voice information receiving unit 144 determines whether voice information has been received. If voice information is received, the process goes to step S802; if not, the process goes to step S810.
- Step S802 The voice recognition unit 1453 performs voice recognition processing on the voice information received in step S801, and obtains a voice recognition result.
- Step S803 The voice recognition determination unit 1451 determines whether or not the result of the voice recognition in step S802 is successful. Here, for example, the determination is made using the likelihood of the speech recognition result. If successful, go to step S804, and if not successful, go to step S805.
- Step S804 The speech recognition result transmitting unit 146 transmits the speech recognition result acquired in Step S802 to the translation server device 15. The process returns to step S801.
- Step S805 The voice recognition information transmission instruction unit 1452 acquires the information to be transmitted to the dictionary server device 13 in order to obtain the information (speech recognition information) necessary for speech recognition of the voice information.
- the information to be acquired is, for example, voice information that has failed in voice recognition (may be a part), information that specifies a language (language identifier), or the like.
- Step S806 The voice recognition information transmission instructing unit 1452 transmits an instruction including the information acquired in Step S805 to the dictionary server device 13. This instruction is an instruction for prompting transmission of voice recognition information.
- Step S807 In response to the transmission of the instruction in step S806, the voice recognition information receiving unit 142 determines whether or not the voice recognition information has been received. If the voice recognition information is received, the process goes to step S808. If the voice recognition information is not received, the process returns to step S807.
- Step S808 The voice recognition information storage unit 143 stores the voice recognition information received in step S807 in the voice recognition information storage unit 141.
- Step S809 The voice recognition unit 1453 uses the voice recognition information received in step S807, performs voice recognition processing on the voice information received in step S801, and obtains a voice recognition result. Go to step S804.
- Step S810 The speech recognition information receiving unit 142 determines whether or not the speech recognition information has been received from the dictionary server device 13 for all of the two or more languages or for some of the two or more languages. to decide. If the voice recognition information is received, the process goes to step S811, and if the voice recognition information is not received, the process returns to step S801.
- Step S811 The voice recognition information storage unit 143 stores the voice recognition information received in step S810 in the voice recognition information storage unit 141. The process returns to step S801.
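The retry path of steps S802-S809 amounts to: recognize, check the likelihood against a threshold, and on failure fetch speech recognition information from the dictionary server and recognize again. A minimal sketch, in which every function and the threshold value are hypothetical stand-ins for the real components:

```python
THRESHOLD = 0.6  # assumed likelihood threshold for step S803

def recognize_with_fallback(voice_info, language, recognize, dictionary_server):
    """Sketch of the FIG. 8 flow for one utterance."""
    text, likelihood = recognize(voice_info, language)        # step S802
    if likelihood >= THRESHOLD:                               # step S803
        return text                                           # step S804
    # steps S805-S806: send the failed voice info and language identifier,
    # steps S807-S808: receive and store the returned information.
    extra_info = dictionary_server(voice_info, language)
    text, _ = recognize(voice_info, language, extra_info)     # step S809
    return text

def demo():
    """Stub components: recognition only succeeds once the dictionary
    server has supplied an acoustic model for the term."""
    def recognize(voice, lang, extra=None):
        return ("", 0.1) if extra is None else ("Osaka", 0.9)
    def dictionary_server(voice, lang):
        return {"acoustic_model": "hmm-ja-osaka"}  # hypothetical payload
    return recognize_with_fallback(b"...", "ja", recognize, dictionary_server)
```

The translation server (steps S902-S909) and the speech synthesis server (steps S1002-S1009) follow the same fail-query-retry shape with their own information types.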
- Step S901 The speech recognition result receiving unit 154 determines whether a speech recognition result has been received. If a speech recognition result is received, the process goes to step S902; if not, the process goes to step S910.
- Step S902 The translation unit 1553 performs a translation process on the speech recognition result received in step S901 to obtain a translation result.
- Step S903 The translation determination means 1551 determines whether or not the result of translation in Step S902 is successful. Here, for example, the determination is made using the likelihood of the translation result. If successful, go to step S904, otherwise go to step S905.
- Step S904 The translation result transmission unit 156 transmits the translation result acquired in Step S902 to the speech synthesis server device 16. The process returns to step S901.
- Step S905 The translation information transmission instruction means 1552 acquires the information to be transmitted to the dictionary server device 13 in order to obtain the information (translation information) necessary for translating the speech recognition result.
- the information to be acquired includes, for example, a speech recognition result (which may be a part) that has failed in translation, an identifier of the source language, an identifier of the target language, and the like.
- Step S906 The translation information transmission instructing unit 1552 transmits an instruction including the information acquired in Step S905 to the dictionary server device 13. This instruction is an instruction for prompting transmission of translation information.
- Step S907 In response to the transmission of the instruction in step S906, the translation information receiving unit 152 determines whether or not translation information has been received. If translation information is received, it will go to step S908, and if translation information is not received, it will return to step S907.
- Step S908 The translation information storage unit 153 stores the translation information received in step S907 in the translation information storage unit 151.
- Step S909 The translation unit 1553 performs translation processing on the speech recognition result received in step S901 using the translation information received in step S907, and obtains a translation result. Go to step S904.
- Step S910 The translation information receiving unit 152 determines whether translation information has been received from the dictionary server device 13 for all of the two or more languages or for some of the two or more languages. If the translation information is received, the process goes to step S911; if not, the process returns to step S901.
- Step S911 The translation information accumulation unit 153 accumulates the translation information received in Step S910 in the translation information storage unit 151. The process returns to step S901.
- Step S1001 The translation result receiving unit 164 determines whether a translation result has been received. If a translation result is received, the process goes to step S1002; if not, the process goes to step S1010.
- Step S1002 The speech synthesis means 1653 performs speech synthesis processing on the translation result received in step S1001, and obtains a speech synthesis result.
- Step S1003 The speech synthesis determination means 1651 determines whether or not the result of speech synthesis in step S1002 is successful. Here, for example, the determination is made using the likelihood of the speech synthesis result. If successful, go to step S1004, otherwise go to step S1005.
- Step S1004 The speech synthesis result transmission unit 166 transmits the speech synthesis result acquired in Step S1002 to the second terminal device 12. The process returns to step S1001.
- Step S1005 The speech synthesis information transmission instruction means 1652 acquires the information to be transmitted to the dictionary server device 13 in order to obtain the information (speech synthesis information) necessary for speech synthesis of the translation result.
- the information to be acquired is, for example, a translation result (may be a part) that has failed in speech synthesis and an identifier of the target language.
- Step S1006 The speech synthesis information transmission instruction unit 1652 transmits an instruction including the information acquired in Step S1005 to the dictionary server device 13. This instruction is an instruction for prompting transmission of speech synthesis information.
- Step S1007 In response to the transmission of the instruction in step S1006, the speech synthesis information receiving unit 162 determines whether or not the speech synthesis information has been received. If the speech synthesis information is received, the process goes to step S1008. If the speech synthesis information is not received, the process returns to step S1007.
- Step S1008 The voice synthesis information storage unit 163 stores the voice synthesis information received in step S1007 in the voice synthesis information storage unit 161.
- Step S1009 The speech synthesis unit 1653 uses the speech synthesis information received in step S1007, performs speech synthesis processing on the translation result received in step S1001, and obtains a speech synthesis result. Go to step S1004.
- Step S1010 The speech synthesis information receiving unit 162 determines whether or not the speech synthesis information has been received from the dictionary server device 13 for all of the two or more languages or for some of the two or more languages. to decide. If the voice synthesis information is received, the process goes to step S1011. If the voice synthesis information is not received, the process returns to step S1001.
- Step S1011 The speech synthesis information storage unit 163 stores the speech synthesis information received in step S1010 in the speech synthesis information storage unit 161. The process returns to step S1001.
- the input receiving unit 171 of the third terminal device 17 receives various instructions and various information inputs from the user. For example, the input receiving unit 171 receives from the user an instruction to output the all-language pair dictionary. Then, the input information transmission unit 174 transmits the output instruction received by the input receiving unit 171 to the dictionary server device 13. In response to the transmission of the output instruction, the information receiving unit 172 receives all or part of the all-language pair dictionary from the dictionary server device 13. Next, the information output unit 173 outputs all or part of the all-language pair dictionary received by the information receiving unit 172.
- the information output unit 173 outputs the received information (all-language term information or a part of it) in a visually different manner depending on whether all the predetermined information exists for all of the two or more languages or some of the predetermined information is missing. That is, all or a part of the all-language pair dictionary (information of one or more terms) is output so as to clearly indicate to the user which terms have missing information.
- the input receiving unit 171 receives input of missing information in the all-language group dictionary from the user. Then, the input information transmission unit 174 transmits the information received by the input reception unit 171 to the dictionary server device 13. Then, such information is accumulated in the all language pair dictionary. As a result, the dictionary for all languages is enriched.
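The visually different output for incomplete entries can be sketched as follows. The text rendering and the set of required attributes are assumptions; the embodiment only requires that complete and incomplete entries be distinguishable so the user knows what to supply.

```python
# Assumed set of attributes a complete term entry must carry.
REQUIRED = ("notation", "reading", "accent")

def render_entry(language, entry):
    """Return a display line for one term in one language, marking any
    attributes that are still missing (the 'visually different manner')."""
    missing = [f for f in REQUIRED if f not in entry]
    mark = " [MISSING: " + ", ".join(missing) + "]" if missing else ""
    return f"{language}: {entry.get('notation', '?')}{mark}"
```

A user seeing `Chinese: 大阪 [MISSING: reading, accent]` knows exactly which fields to input, and the input information transmission unit would then send those fields back to the dictionary server for accumulation.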
- A conceptual diagram of the speech translation system 1 is shown in FIG. 1.
- the all language pair dictionary storage unit 131 of the dictionary server device 13 stores the all language pair dictionary of FIG.
- the all-language group dictionary of FIG. 11 has the following structure in order to solve the problem of inconsistency between dictionaries in speech translation.
- the all language pair dictionary is information in which information necessary for the speech recognition dictionary, the translation parallel dictionary, and the synthesized speech dictionary is centrally managed for all language pairs.
- the all-language pair dictionary includes structure information such as "Japanese <notation> <reading> <acoustic model> <accent> ...", "English <notation> <reading> <acoustic model> <accent> <tone> ...", "Chinese <notation> <reading> <acoustic model> <accent> <voice tone> ...", and "German <notation> <reading> <acoustic model> <accent> <flag information> ...".
- the all-language pair dictionary also has, for each language, term information corresponding to the structure information. In FIG. 11, for example, the Japanese term information of the term "Osaka" is "<notation> 大阪 <reading> おおさか <acoustic model> ... <accent> 4 morae, type 0 ...".
- the information enclosed in angle brackets ("<" ">") indicates the elements (attributes) of the dictionary, and a group of such bracketed pieces of information constitutes the structure information.
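The FIG. 11 layout, per-language structure information plus per-term information, can be sketched as the following data model. The concrete attribute lists mirror the structure information quoted above; the term values for "Osaka" are illustrative, and the helper is an assumption showing how completeness against the structure information could be checked.

```python
# Structure information: which attributes each language's entries carry.
STRUCTURE_INFO = {
    "Japanese": ["notation", "reading", "acoustic model", "accent"],
    "English":  ["notation", "reading", "acoustic model", "accent", "tone"],
    "Chinese":  ["notation", "reading", "acoustic model", "accent", "voice tone"],
    "German":   ["notation", "reading", "acoustic model", "accent", "flag information"],
}

# Term information for one term across languages (values illustrative).
OSAKA = {
    "Japanese": {"notation": "大阪", "reading": "おおさか",
                 "acoustic model": "...", "accent": "4 morae, type 0"},
    "English":  {"notation": "Osaka", "reading": "oʊsɑːkə",
                 "acoustic model": "...", "accent": "...", "tone": "..."},
}

def missing_attributes(term_info, language):
    """Attributes required by the structure information but absent from
    (or not yet filled in) the term's information for that language."""
    have = term_info.get(language, {})
    return [a for a in STRUCTURE_INFO[language] if a not in have]
```

Because every language's view of a term lives in one record, a single lookup reveals which languages still lack information, which is what lets the system detect and report missing data centrally.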
- the voice recognition information transmitting unit 132 of the dictionary server device 13 determines that it is time to transmit the voice recognition information according to an instruction from the user.
- the voice recognition information transmitting unit 132 reads out the voice recognition information shown in FIG. 12 (which has the same meaning as the voice recognition dictionary) from the all language pair dictionary storage unit 131.
- the speech recognition dictionary has information such as “notation” and “acoustic model” for each term and each language.
- the voice recognition information transmitting unit 132 transmits the read voice recognition dictionary to one or more voice recognition server devices 14.
- the information receiving unit 142 for speech recognition of each of the one or more speech recognition server devices 14 receives the speech recognition dictionary from the dictionary server device 13.
- the speech recognition information storage unit 143 stores the received speech recognition dictionary (speech recognition information) in the speech recognition information storage unit 141.
- the translation information transmitting unit 133 of the dictionary server device 13 determines that it is time to transmit the translation information.
- the translation information transmitting unit 133 reads out the translation information (which has the same meaning as the translation dictionary) shown in FIG. 13 from the all language pair dictionary storage unit 131.
- the translation dictionary has information such as “notation” for each term and each language.
- the translation information transmitting unit 133 transmits the read translation dictionary to one or more translation server devices 15.
- the translation information receiving unit 152 of each of the one or more translation server devices 15 receives the translation dictionary from the dictionary server device 13.
- the translation information storage unit 153 stores the received translation dictionary (translation information) in the translation information storage unit 151.
- the speech synthesis information transmission unit 134 determines that it is time to transmit the speech synthesis information.
- the speech synthesis information transmitting unit 134 reads the speech synthesis information shown in FIG. 14 (which has the same meaning as the speech synthesis dictionary) from the all-language pair dictionary storage unit 131.
- the speech synthesis dictionary has information such as “notation”, “reading”, and “accent” for each term and each language.
- the speech synthesis information transmitting unit 134 transmits the read speech synthesis dictionary to one or more speech synthesis server devices 16.
- the speech synthesis information receiving unit 162 receives the speech synthesis dictionary from the dictionary server device 13.
- the speech synthesis information storage unit 163 stores the received speech synthesis dictionary (speech synthesis information) in the speech synthesis information storage unit 161.
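The three per-purpose dictionaries of FIGS. 12-14 can be thought of as projections of the single all-language pair dictionary, each keeping only the fields its server type needs. The following sketch shows that idea; field names and the field sets per purpose are assumptions guided by the figures described above.

```python
# Assumed field subsets for the three derived dictionaries.
SPEECH_RECOGNITION_FIELDS = ("notation", "acoustic model")   # cf. FIG. 12
TRANSLATION_FIELDS        = ("notation",)                    # cf. FIG. 13
SPEECH_SYNTHESIS_FIELDS   = ("notation", "reading", "accent")  # cf. FIG. 14

def project(all_language_pair_dict, fields):
    """Keep, for every (language, term) entry, only the listed fields."""
    return {key: {f: entry[f] for f in fields if f in entry}
            for key, entry in all_language_pair_dict.items()}

dic = {("en", "Osaka"): {"notation": "Osaka", "reading": "oʊsɑːkə",
                         "acoustic model": "hmm-en", "accent": "O-sa-ka"}}
speech_recognition_dict = project(dic, SPEECH_RECOGNITION_FIELDS)
translation_dict        = project(dic, TRANSLATION_FIELDS)
speech_synthesis_dict   = project(dic, SPEECH_SYNTHESIS_FIELDS)
```

Since all three projections are cut from the same source at transmission time, the servers receive mutually consistent term sets, which is the point of centralizing the dictionary.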
- As a result, a consistent dictionary is stored in each of the speech recognition server device 14, the translation server device 15, and the speech synthesis server device 16, and speech translation can usually be performed without any problem.
- the respective dictionaries of the speech recognition server device 14, the translation server device 15, and the speech synthesis server device 16 may be independently expanded, and as a result, inconsistencies between the dictionaries may occur.
- Suppose that the user A of the first terminal device 11 and the user B of the second terminal device 12 have a telephone conversation using the speech translation system.
- the language of the user A is, for example, Japanese.
- the language of user B is, for example, English.
- the speech translation control information includes source language information (here, the <source language> information), which identifies the language spoken by the user, and target language information (here, the <target language> information), which identifies the language spoken by the conversation partner.
- It also includes information for communicating with the speech recognition server device 14 (here, the <speech recognition server> information), information for communicating with the translation server device 15 (here, the <translation server> information), information for communicating with the speech synthesis server device 16 (here, the <speech synthesis server> information), the identifier of the second terminal device 12 or the first terminal device 11 (here, the <partner terminal> information), and the identifier of the first terminal device 11 or the second terminal device 12 (here, the <own terminal> information).
- Each piece of the <speech recognition server>, <translation server>, and <speech synthesis server> information is here an IP address of the corresponding device, but it goes without saying that other information such as a URL or a telephone number may be used.
- Each piece of the <partner terminal> and <own terminal> information is here a telephone number, but it goes without saying that other information such as an IP address or a MAC address may be used.
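The speech translation control information of FIG. 15 can be sketched as a simple record. The concrete values below are the ones quoted in the surrounding text; the <own terminal> value is not spelled out there, so it is left as a labeled placeholder.

```python
# Sketch of the FIG. 15 speech translation control information as a record.
speech_translation_control_info = {
    "source language": "Japanese",
    "target language": "English",
    "speech recognition server": "186.221.1.27",   # IP of device 14
    "translation server": "225.68.21.129",         # IP of device 15
    "speech synthesis server": "56.72.128.202",    # IP of device 16
    "partner terminal": "090-1445-1122",           # phone number of device 12
    "own terminal": None,  # also carried; value not quoted in the text
}
```

Because this record travels with the utterance through every hop, each server can read the fields it needs (source/target language, next server address) without any per-session state of its own.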
- the first terminal device 11 constructs the voice information of "Osaka". Then, the first terminal device 11 reads the speech translation control information shown in FIG. 15, and transmits the voice information of "Osaka" and the speech translation control information shown in FIG. 15 to the speech recognition server device 14 specified by "<speech recognition server> 186.221.1.27".
- the speech recognition server device 14 receives the speech information of “Osaka” and the speech translation control information shown in FIG.
- the speech recognition means 1453 of the speech recognition server device 14 acquires "<source language> Japanese" from the speech translation control information shown in FIG. 15. Then, the speech recognition means 1453 performs speech recognition processing on the received voice information of "Osaka" using a Japanese acoustic model (see FIG. 12), and obtains the speech recognition result (the character string "Osaka").
- the voice recognition determination unit 1451 determines whether or not the likelihood of the voice recognition result is equal to or greater than a predetermined threshold (whether or not the voice recognition is successful). Here, it is assumed that the voice recognition is successful.
- the speech recognition result transmitting unit 146 sends the acquired speech recognition result to the translation server device 15 indicated by "<translation server> 225.68.21.129" included in the speech translation control information shown in FIG. 15. Further, the speech recognition result transmitting unit 146 also transmits the speech translation control information shown in FIG. 15 to the translation server device 15 indicated by "<translation server> 225.68.21.129".
- the speech recognition result receiving unit 154 of the translation server device 15 receives the speech recognition result (having “Osaka”) and the speech translation control information.
- the translation unit 1553 reads "<source language> Japanese" and "<target language> English" included in the speech translation control information.
- the translation unit 1553 determines that the speech recognition result (having “Osaka”) is in Japanese, and the term “Osaka” of “ ⁇ target language> English” paired with the term “Osaka” is changed to FIG. Read from the translation dictionary. Then, the translation unit 1553 obtains the translation result “Osaka”.
- the translation determination unit 1551 determines that the translation succeeded because the term corresponding to “<target language> English” was found in the translation dictionary.
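The success test described here — the translation succeeds exactly when a target-language term paired with the source notation is found — can be sketched as follows; the dictionary contents and function names are illustrative assumptions:

```python
# Toy bilingual dictionary: (source language, notation) -> target-language terms.
TRANSLATION_DICT = {
    ("Japanese", "大阪"): {"English": "Osaka", "Korean": "오사카"},
}

def translate_term(source_lang: str, notation: str, target_lang: str):
    """Return the target-language term, or None when the lookup fails
    (the case in which the server would query the dictionary server)."""
    entry = TRANSLATION_DICT.get((source_lang, notation))
    if entry is None or target_lang not in entry:
        return None
    return entry[target_lang]
```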
- the translation result transmission unit 156 transmits the acquired translation result “Osaka” and the speech translation control information shown in FIG. 15 to the speech synthesis server device 16 indicated by “<speech synthesis server> 56.72.128.202”.
- the translation result receiving unit 164 of the speech synthesis server device 16 receives the translation result “Osaka” and the speech translation control information shown in FIG.
- the speech synthesis unit 1653 reads “<target language> English” from the speech translation control information.
- the speech synthesis means 1653 reads the speech synthesis information (reading, accent, etc.) corresponding to the term “Osaka” and “<target language> English” from the speech synthesis dictionary of FIG.
- the speech synthesis means 1653 performs speech synthesis processing using the read speech synthesis information and obtains a speech synthesis result.
- the speech synthesis determination unit 1651 determines that the speech synthesis result is successful because the likelihood of the speech synthesis result is equal to or greater than a predetermined threshold.
- the speech synthesis result transmitting unit 166 transmits the acquired speech synthesis result to the second terminal device 12 specified by “<other party terminal> 090-1445-1122”.
- the second speech receiving unit 123 of the second terminal device 12 receives the English speech (the speech “Osaka”) obtained by speech translation of the Japanese speech “Osaka” uttered by user A.
- the second speech output unit 124 outputs the speech (the speech “Osaka”) received by the second speech receiving unit 123.
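The end-to-end flow just traced can be condensed into one function. In this sketch the three stage implementations are passed in as placeholders, and the numeric threshold is an assumption (the text says only “predetermined threshold”):

```python
THRESHOLD = 0.5  # assumed value; the embodiment only says "predetermined threshold"

def speech_translate(audio, recognize, translate, synthesize):
    """Recognize -> translate -> synthesize, with the likelihood check that
    the speech recognition determination unit performs at the first step."""
    text, likelihood = recognize(audio)      # e.g. "大阪", 0.9
    if likelihood < THRESHOLD:
        raise RuntimeError("speech recognition failed")
    translated = translate(text)             # e.g. "Osaka"
    return synthesize(translated)            # synthesized waveform
```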
- When a process fails at any of the speech recognition, translation, and speech synthesis steps, the speech recognition server device 14, the translation server device 15, or the speech synthesis server device 16 queries the dictionary server device 13 in real time and obtains the necessary information.
- As described above, the speech recognition server device 14, the translation server device 15, or the speech synthesis server device 16 then proceeds with its speech recognition, translation, or speech synthesis processing after receiving the information it needs from the dictionary server device 13.
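This fail-then-inquire behaviour can be sketched generically; `process` and `fetch_from_dictionary_server` stand in for the real recognition/translation/synthesis step and for the (unspecified) dictionary server protocol:

```python
def run_with_fallback(process, fetch_from_dictionary_server, data, local_info):
    """Try the step with local information; on failure (None), query the
    dictionary server in real time, merge its answer, and retry."""
    result = process(data, local_info)
    if result is not None:          # step succeeded with local information
        return result
    extra = fetch_from_dictionary_server(data)  # real-time inquiry
    return process(data, {**local_info, **extra})
```

The same wrapper applies to all three servers, which is why the text describes the fallback once for the whole pipeline.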
- the notation acquisition unit 135 of the dictionary server device 13 determines that it is time to start crawling the web.
- the notation acquisition unit 135 crawls the web and acquires, for example, the notation “Nagoya” of the term “Nagoya”. The notation acquisition unit 135 then searches the all-language pair dictionary using the acquired notation “Nagoya” as a key, and determines from the search result that the acquired notation “Nagoya” does not exist in the all-language pair dictionary.
- the notation accumulating unit 136 accumulates in the all-language pair dictionary the notation “Nagoya” of the term determined not to exist there.
- the all-language pair dictionary is then, for example, as shown in FIG. 17. In FIG. 17, the term “Nagoya” has only a notation; it has no reading or acoustic model.
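One plausible in-memory shape for such dictionary entries; the field names and placeholder model identifiers are assumptions, with `None` marking the information that FIG. 17 lacks for “Nagoya”:

```python
# Hypothetical shape of the all-language pair dictionary after crawling.
ALL_LANGUAGE_PAIR_DICT = {
    "名古屋": {  # crawled entry: notation only, no reading or acoustic model yet
        "Japanese": {"notation": "名古屋", "reading": None, "acoustic_model": None},
    },
    "大阪": {    # complete entry ("am-ja" / "am-en" are placeholder identifiers)
        "Japanese": {"notation": "大阪", "reading": "おおさか", "acoustic_model": "am-ja"},
        "English": {"notation": "Osaka", "reading": "Osaka", "acoustic_model": "am-en"},
    },
}
```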
- the input information transmission unit 174 transmits the output instruction received by the input reception unit 171 to the dictionary server device 13.
- the dictionary server device 13 receives the output instruction. Then, the output unit 139 searches the all language pair dictionary for information corresponding to the output instruction, and transmits the information acquired by the search to the device (usually the third terminal device 17) that has transmitted the instruction.
- the information receiving unit 172 of the third terminal device 17 receives a part of the all-language pair dictionary from the dictionary server device 13.
- the information output unit 173 outputs the part of the all-language pair dictionary received by the information receiving unit 172.
- An example of such output is shown in FIG. 18.
- In FIG. 18, the information is output in visually different manners depending on whether all of the predetermined information exists for all of the two or more languages (the term “Osaka”) or some of the predetermined information is missing (the term “Nagoya”).
- empty data areas (cells) are shaded. This is to prompt the user of the third terminal device 17 to input the missing information.
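Detecting which cells to shade amounts to scanning an entry for absent values. A sketch under the same assumed entry shape (language/field names are illustrative):

```python
# Assumed entry shape: None marks a cell that is empty in the displayed
# dictionary — the shaded cells described for FIG. 18.
nagoya_entry = {
    "Japanese": {"notation": "名古屋", "reading": None, "acoustic_model": None},
    "English": {"notation": None, "reading": None},
}

def missing_fields(entry):
    """Return (language, field) pairs whose value is absent: the cells that
    would be shaded to prompt the user for input."""
    return [(lang, name)
            for lang, fields in entry.items()
            for name, value in fields.items()
            if value is None]
```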
- the user inputs, for example, the reading “Nagoya” of the term “Nagoya”, a Japanese acoustic model, the Korean notation and reading, the English notation and reading, and so on.
- the input reception unit 171 receives the input from the user.
- the input information transmission unit 174 of the third terminal device 17 transmits the information received by the input reception unit 171 to the dictionary server device 13.
- the information receiving unit 137 of the dictionary server device 13 receives some information from the third terminal device 17 (for example, the reading “Nagoya” of the term “Nagoya”, a Japanese acoustic model, the Korean notation and reading, the English notation and reading, etc.).
- the information storage unit 138 stores the information received by the information reception unit 137 in association with the notation of the corresponding term in the corresponding language.
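Storing the received piece of information against the corresponding term and language is a simple nested update; the function and field names here are assumptions, not the actual interface of the information storage unit 138:

```python
def store_term_info(dictionary, term, lang, field, value):
    """Associate newly received information (e.g. a reading supplied from the
    third terminal device) with the term's notation in the given language."""
    dictionary.setdefault(term, {}).setdefault(lang, {})[field] = value
```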
- As a result, the all-language pair dictionary of the dictionary server device 13 is enriched.
- As described above, according to this embodiment, when a process fails at any of the speech recognition, translation, and speech synthesis steps, the speech recognition server device 14, the translation server device 15, or the speech synthesis server device 16 queries the dictionary server device 13 and receives the necessary information. As a result, speech translation succeeds with a very high probability.
- Furthermore, the all-language pair dictionary of the dictionary server device 13 can be enriched by crawling the web and by receiving input of information from users. For this reason, speech translation becomes possible for various terms, including new terms.
- In this embodiment, the speech recognition unit 145 has mainly been described as including the speech recognition determination means 1451, the speech recognition information transmission instruction means 1452, and the speech recognition means 1453.
- However, the speech recognition unit 145 may omit the speech recognition determination means 1451 and the speech recognition information transmission instruction means 1452. In such a case, it is assumed that the speech recognition unit 145 does not fail in the speech recognition process.
- Likewise, the translation unit 155 may omit the translation determination means 1551 and the translation information transmission instruction means 1552. In such a case, it is assumed that the translation unit 155 does not fail in the translation process.
- Similarly, the speech synthesis unit 165 has been described as including the speech synthesis determination means 1651, the speech synthesis information transmission instruction means 1652, and the speech synthesis means 1653. However, the speech synthesis unit 165 may omit the speech synthesis determination means 1651 and the speech synthesis information transmission instruction means 1652. In such a case, it is assumed that the speech synthesis unit 165 does not fail in the speech synthesis process.
- That is, the speech translation system 1 is a speech translation system having a dictionary server device, one or more speech recognition server devices, one or more translation server devices, and one or more speech synthesis server devices.
- The dictionary server device includes: an all-language pair dictionary storage unit that can store two or more pieces of all-language term information, each of which associates, for all of two or more languages, term information having the notation of a term composed of one or more words, speech recognition information that is information for speech-recognizing the term, and speech synthesis information that is information for speech-synthesizing the term; a speech recognition information transmission unit that acquires, from the all-language pair dictionary storage unit, speech recognition information including the speech recognition information of the terms for all of the two or more languages or for two or more of them, and transmits it to the one or more speech recognition server devices; a translation information transmission unit that acquires, from the all-language pair dictionary storage unit, translation information including the notation of the terms for all of the two or more languages or for two or more of them, and transmits it to the one or more translation server devices; and a speech synthesis information transmission unit that acquires, from the all-language pair dictionary storage unit, speech synthesis information including the speech synthesis information of the terms for all of the two or more languages or for two or more of them, and transmits it to the one or more speech synthesis server devices.
- The speech recognition server device includes: a speech recognition information storage unit that can store speech recognition information for all of the two or more languages or for two or more of them; a speech recognition information receiving unit that receives such speech recognition information from the dictionary server device; a speech recognition information accumulation unit that accumulates the received speech recognition information in the speech recognition information storage unit; a speech information receiving unit that receives speech information, which is information on speech input to the first terminal device; a speech recognition unit that speech-recognizes the received speech information using the speech recognition information in the speech recognition information storage unit and acquires a speech recognition result; and a speech recognition result transmission unit that transmits the speech recognition result.
- The translation server device includes: a translation information storage unit that can store translation information for all of the two or more languages or for two or more of them; a translation information receiving unit that receives such translation information from the dictionary server device; a translation information accumulation unit that accumulates the received translation information in the translation information storage unit; a speech recognition result receiving unit that receives the speech recognition result; a translation unit that translates the received speech recognition result into a target language using the translation information in the translation information storage unit and acquires a translation result; and a translation result transmission unit that transmits the translation result.
- The speech synthesis server device includes: a speech synthesis information storage unit that can store speech synthesis information for all of the two or more languages or for two or more of them; a speech synthesis information receiving unit that receives such speech synthesis information from the dictionary server device; a speech synthesis information accumulation unit that accumulates the received speech synthesis information in the speech synthesis information storage unit; a translation result receiving unit that receives the translation result; a speech synthesis unit that speech-synthesizes the received translation result using the speech synthesis information in the speech synthesis information storage unit and acquires a speech synthesis result; and a speech synthesis result transmission unit that transmits the speech synthesis result to the second terminal device.
- The processing in this embodiment may be realized by software. The software may be distributed by download or the like, or may be recorded on a recording medium such as a CD-ROM and distributed.
- The software that implements the dictionary server device in this embodiment is the following program. That is, with a storage medium storing two or more pieces of all-language term information, each of which associates, for all of two or more languages, term information having the notation of a term composed of one or more words, speech recognition information that is information for speech-recognizing the term, and speech synthesis information that is information for speech-synthesizing the term, this program causes a computer to function as: a speech recognition information transmission unit that acquires, from the storage medium, speech recognition information including the speech recognition information of the terms for all of the two or more languages or for two or more of them, and transmits it to the one or more speech recognition server devices; a translation information transmission unit that acquires, from the storage medium, translation information including the notation of the terms for all of the two or more languages or for two or more of them, and transmits it to the one or more translation server devices; and a speech synthesis information transmission unit that acquires, from the storage medium, speech synthesis information including the speech synthesis information of the terms for all of the two or more languages or for two or more of them, and transmits it to the one or more speech synthesis server devices.
- It is preferable that the program causes the computer to further function as: a notation acquisition unit that acquires, from web pages of one or more web server devices on the Internet, notations of terms that do not exist in the storage medium; and a notation accumulation unit that accumulates the notations of the terms acquired by the notation acquisition unit in the storage medium.
- It is preferable that the program causes the computer to further function as: an information receiving unit that receives any piece of term information from one or more third terminal devices; and an information accumulation unit that accumulates the information received by the information receiving unit in the storage medium in association with the notation of the corresponding term in the corresponding language.
- It is preferable that the program causes the computer to further function as an output unit that outputs the all-language term information or a part of the all-language term information, and that, when outputting the all-language term information or a part of it, the output unit outputs the information in visually different manners depending on whether all of the predetermined information exists for all of the two or more languages or only a part of the predetermined information exists.
- The software that implements the speech recognition server device in this embodiment is the following program. That is, with a storage medium storing speech recognition information for all of the two or more languages or for two or more of them, this program causes a computer to function as: a speech recognition information receiving unit that receives, from the dictionary server device, speech recognition information for all of the two or more languages or for two or more of them; a speech recognition information accumulation unit that accumulates the received speech recognition information in the storage medium; a speech information receiving unit that receives speech information, which is information on speech input to the first terminal device; a speech recognition unit that speech-recognizes the received speech information using the speech recognition information in the storage medium and acquires a speech recognition result; and a speech recognition result transmission unit that transmits the speech recognition result.
- It is preferable that the speech recognition unit comprises: speech recognition determination means for determining whether the speech recognition processing for the speech information received by the speech information receiving unit succeeds or fails; speech recognition information transmission instruction means for instructing the dictionary server device to transmit speech recognition information when the speech recognition determination means determines that the speech recognition processing has failed; and speech recognition means for speech-recognizing the received speech information using the speech recognition information in the storage medium to acquire a speech recognition result and, in response to the transmission of the instruction, speech-recognizing the speech information using the speech recognition information received by the speech recognition information receiving unit from the dictionary server device to acquire a speech recognition result, and that the speech recognition information receiving unit receives the speech recognition information from the dictionary server device in response to the transmission of the instruction.
- The software that implements the translation server device in this embodiment is the following program. That is, with a storage medium storing translation information for all of the two or more languages or for two or more of them, this program causes a computer to function as: a translation information receiving unit that receives, from the dictionary server device, translation information for all of the two or more languages or for two or more of them; a translation information accumulation unit that accumulates the received translation information in the storage medium; a speech recognition result receiving unit that receives the speech recognition result; a translation unit that translates the received speech recognition result into a target language using the translation information in the storage medium and acquires a translation result; and a translation result transmission unit that transmits the translation result.
- It is preferable that the translation unit comprises: translation determination means for determining whether the translation processing for the speech recognition result received by the speech recognition result receiving unit succeeds or fails; translation information transmission instruction means for instructing the dictionary server device to transmit the notation of the target-language term when the translation determination means determines that the translation processing has failed; and translation means for translating the received speech recognition result into the target language using the translation information in the storage medium to acquire a translation result and, in response to the transmission of the instruction, translating the speech recognition result into the target language using the notation of the target-language term received by the translation information receiving unit from the dictionary server device to acquire a translation result, and that the translation information receiving unit receives the notation of the target-language term from the dictionary server device in response to the transmission of the instruction.
- The software that implements the speech synthesis server device in this embodiment is the following program. That is, with a storage medium storing speech synthesis information for all of the two or more languages or for two or more of them, this program causes a computer to function as: a speech synthesis information receiving unit that receives, from the dictionary server device, speech synthesis information for all of the two or more languages or for two or more of them; a speech synthesis information accumulation unit that accumulates the received speech synthesis information in the storage medium; a translation result receiving unit that receives the translation result; a speech synthesis unit that speech-synthesizes the received translation result using the speech synthesis information in the storage medium and acquires a speech synthesis result; and a speech synthesis result transmission unit that transmits the speech synthesis result to the second terminal device.
- It is preferable that the speech synthesis unit comprises: speech synthesis determination means for determining whether the speech synthesis processing for the translation result received by the translation result receiving unit succeeds or fails; speech synthesis information transmission instruction means for instructing the dictionary server device to transmit speech synthesis information when the speech synthesis determination means determines that the speech synthesis processing has failed; and speech synthesis means for speech-synthesizing the received translation result using the speech synthesis information in the storage medium to acquire a speech synthesis result and, in response to the transmission of the instruction, speech-synthesizing the translation result using the speech synthesis information received by the speech synthesis information receiving unit from the dictionary server device to acquire a speech synthesis result, and that the speech synthesis information receiving unit receives the speech synthesis information from the dictionary server device in response to the transmission of the instruction.
- FIG. 19 shows the external appearance of a computer that executes the program described in this specification to realize the speech translation system or the like of the above-described embodiment.
- the above-described embodiments can be realized by computer hardware and a computer program executed thereon.
- FIG. 19 is an overview diagram of the computer system 340.
- FIG. 20 is a block diagram of the computer system 340.
- the computer system 340 includes a computer 341 including an FD drive and a CD-ROM drive, a keyboard 342, a mouse 343, and a monitor 344.
- In addition to the FD drive 3411 and the CD-ROM drive 3412, the computer 341 includes an MPU 3413, a bus 3414 connected to the CD-ROM drive 3412 and the FD drive 3411, a ROM for storing a program such as a boot-up program, a RAM 3416 for temporarily storing application program instructions and providing a temporary storage space, and a hard disk 3417 for storing application programs, system programs, and data.
- the computer 341 may further include a network card that provides connection to the LAN.
- A program that causes the computer system 340 to execute the functions, such as speech recognition, of the above-described embodiment may be stored in the CD-ROM 3501 or the FD 3502, inserted into the CD-ROM drive 3412 or the FD drive 3411, and transferred to the hard disk 3417.
- the program may be transmitted to the computer 341 via a network (not shown) and stored in the hard disk 3417.
- the program is loaded into the RAM 3416 at the time of execution.
- the program may be loaded directly from the CD-ROM 3501, the FD 3502, or the network.
- The program does not necessarily have to include an operating system (OS), a third-party program, or the like for causing the computer 341 to execute the functions, such as speech recognition, of the above-described embodiment.
- the program only needs to include an instruction portion that calls an appropriate function (module) in a controlled manner and obtains a desired result. How the computer system 340 operates is well known and will not be described in detail.
- Note that processing performed by hardware, for example, processing performed by a modem or an interface card in the transmission step (processing performed only by hardware), is not included in the program.
- the computer that executes the program may be singular or plural. That is, centralized processing may be performed, or distributed processing may be performed.
- Needless to say, two or more communication means (such as the speech information receiving unit and the speech recognition information receiving unit) existing in one device may be physically realized by a single medium.
- In each of the above embodiments, each process may be realized by centralized processing by a single device (system), or may be realized by distributed processing by a plurality of devices.
- the speech translation system according to the present invention has the effect of eliminating inconsistencies between dictionaries used in speech translation, and is useful as a speech translation system.
Description
In this embodiment, a speech translation system that can eliminate inconsistencies between the dictionaries used in speech translation by using a dictionary server device having an all-language pair dictionary will be described. The all-language pair dictionary is a dictionary that centrally manages the information required for speech recognition, translation, and speech synthesis. The all-language pair dictionary is dictionary information storing two or more pieces of all-language term information. All-language term information is information having one piece of term information for each of two or more languages that can be targets of speech translation. Term information is information having speech recognition information, which is information required for speech recognition; translation information, which is information required for translation; and speech synthesis information, which is information required for speech synthesis. Term information is information about one term. The structure of term information may differ from language to language. The two or more languages that can be targets of speech translation are preferably three or more languages.
It is preferable that the program comprises speech recognition means for speech-recognizing the speech information received by the speech information receiving unit using the speech recognition information in the storage medium to acquire a speech recognition result and, in response to the transmission of the instruction, speech-recognizing the speech information using the speech recognition information received by the speech recognition information receiving unit from the dictionary server device to acquire a speech recognition result, and that the program causes the speech recognition information receiving unit to function so as to receive the speech recognition information from the dictionary server device in response to the transmission of the instruction.
This is a program for causing a computer to function as a speech synthesis result transmission unit that transmits the speech synthesis result to a second terminal device.
Claims (12)
- A speech translation system comprising a dictionary server device, one or more speech recognition server devices, one or more translation server devices, and one or more speech synthesis server devices, wherein
the dictionary server device comprises:
an all-language pair dictionary storage unit in which two or more pieces of all-language term information can be stored, each piece of all-language term information having, in association with one another for all of two or more languages, term information that includes the notation of a term composed of one or more words, speech recognition information that is information for speech-recognizing the term, and speech synthesis information that is information for speech-synthesizing the term;
a speech recognition information transmission unit that acquires, from the all-language pair dictionary storage unit, speech recognition information including the speech recognition information of the terms for all of the two or more languages or for two or more of the languages, and transmits the acquired information to the one or more speech recognition server devices;
a translation information transmission unit that acquires, from the all-language pair dictionary storage unit, translation information including the notation of the terms for all of the two or more languages or for two or more of the languages, and transmits the acquired information to the one or more translation server devices; and
a speech synthesis information transmission unit that acquires, from the all-language pair dictionary storage unit, speech synthesis information including the speech synthesis information of the terms for all of the two or more languages or for two or more of the languages, and transmits the acquired information to the one or more speech synthesis server devices,
the speech recognition server device comprises:
a speech recognition information storage unit in which speech recognition information for all of the two or more languages or for two or more of the languages can be stored;
a speech recognition information receiving unit that receives, from the dictionary server device, speech recognition information for all of the two or more languages or for two or more of the languages;
a speech recognition information accumulation unit that accumulates the speech recognition information received by the speech recognition information receiving unit in the speech recognition information storage unit;
a speech information receiving unit that receives speech information, which is information on speech input to a first terminal device;
a speech recognition unit that speech-recognizes the speech information received by the speech information receiving unit using the speech recognition information in the speech recognition information storage unit, and acquires a speech recognition result; and
a speech recognition result transmission unit that transmits the speech recognition result,
the translation server device comprises:
a translation information storage unit in which translation information for all of the two or more languages or for two or more of the languages can be stored;
a translation information receiving unit that receives, from the dictionary server device, translation information for all of the two or more languages or for two or more of the languages;
a translation information accumulation unit that accumulates the translation information received by the translation information receiving unit in the translation information storage unit;
a speech recognition result receiving unit that receives the speech recognition result;
a translation unit that translates the speech recognition result received by the speech recognition result receiving unit into a target language using the translation information in the translation information storage unit, and acquires a translation result; and
a translation result transmission unit that transmits the translation result, and
the speech synthesis server device comprises:
a speech synthesis information storage unit in which speech synthesis information for all of the two or more languages or for two or more of the languages can be stored;
a speech synthesis information receiving unit that receives, from the dictionary server device, speech synthesis information for all of the two or more languages or for two or more of the languages;
a speech synthesis information accumulation unit that accumulates the speech synthesis information received by the speech synthesis information receiving unit in the speech synthesis information storage unit;
a translation result receiving unit that receives the translation result;
a speech synthesis unit that speech-synthesizes the translation result received by the translation result receiving unit using the speech synthesis information in the speech synthesis information storage unit, and acquires a speech synthesis result; and
a speech synthesis result transmission unit that transmits the speech synthesis result to a second terminal device.
- The speech translation system according to claim 1, wherein the speech recognition unit of the speech recognition server device comprises:
speech recognition determination means for determining whether speech recognition processing for the speech information received by the speech information receiving unit succeeds or fails;
speech recognition information transmission instruction means for instructing the dictionary server device to transmit speech recognition information when the speech recognition determination means determines that the speech recognition processing has failed; and
speech recognition means for speech-recognizing the speech information received by the speech information receiving unit using the speech recognition information in the speech recognition information storage unit to acquire a speech recognition result and, in response to the transmission of the instruction, speech-recognizing the speech information using the speech recognition information received by the speech recognition information receiving unit from the dictionary server device to acquire a speech recognition result, and
the speech recognition information receiving unit receives the speech recognition information from the dictionary server device in response to the transmission of the instruction.
- The speech translation system according to claim 1, wherein the translation unit of the translation server device comprises:
translation determination means for determining whether translation processing for the speech recognition result received by the speech recognition result receiving unit succeeds or fails;
translation information transmission instruction means for instructing the dictionary server device to transmit the notation of the target-language term when the translation determination means determines that the translation processing has failed; and
translation means for translating the speech recognition result received by the speech recognition result receiving unit into the target language using the translation information in the translation information storage unit to acquire a translation result and, in response to the transmission of the instruction, translating the speech recognition result into the target language using the notation of the target-language term received by the translation information receiving unit from the dictionary server device to acquire a translation result, and
the translation information receiving unit receives the notation of the target-language term from the dictionary server device in response to the transmission of the instruction.
- The speech translation system according to claim 1, wherein the speech synthesis unit of the speech synthesis server device comprises:
speech synthesis determination means for determining whether speech synthesis processing for the translation result received by the translation result receiving unit succeeds or fails;
speech synthesis information transmission instruction means for instructing the dictionary server device to transmit speech synthesis information when the speech synthesis determination means determines that the speech synthesis processing has failed; and
speech synthesis means for speech-synthesizing the translation result received by the translation result receiving unit using the speech synthesis information in the speech synthesis information storage unit to acquire a speech synthesis result and, in response to the transmission of the instruction, speech-synthesizing the translation result using the speech synthesis information received by the speech synthesis information receiving unit from the dictionary server device to acquire a speech synthesis result, and
the speech synthesis information receiving unit receives the speech synthesis information from the dictionary server device in response to the transmission of the instruction.
- The speech translation system according to any one of claims 1 to 4, wherein the dictionary server device further comprises:
a notation acquisition unit that acquires, from web pages of one or more web server devices on the Internet, notations of terms that do not exist in the all-language pair dictionary storage unit; and
a notation accumulation unit that accumulates the notations of the terms acquired by the notation acquisition unit in the all-language pair dictionary storage unit.
- The speech translation system according to claim 5, wherein the dictionary server device further comprises:
an information receiving unit that receives any piece of term information from one or more third terminal devices; and
an information accumulation unit that accumulates the information received by the information receiving unit in the all-language pair dictionary storage unit, in association with the notation of the corresponding term in the corresponding language.
- The speech translation system according to claim 5 or 6, wherein the dictionary server device further comprises an output unit that outputs the all-language term information or a part of the all-language term information, and
when outputting the all-language term information or a part of the all-language term information, the output unit outputs the information in visually different manners depending on whether all of the predetermined information exists for all of the two or more languages or a part of the predetermined information does not exist.
- A dictionary server device constituting the speech translation system according to claim 1.
- A speech recognition server device constituting the speech translation system according to claim 1.
- A translation server device constituting the speech translation system according to claim 1.
- A speech synthesis server device constituting the speech translation system according to claim 1.
- A program, for use with a storage medium storing two or more pieces of all-language term information, each piece of which has, in association with one another for all of two or more languages, term information that includes the notation of a term composed of one or more words, speech recognition information that is information for speech-recognizing the term, and speech synthesis information that is information for speech-synthesizing the term, the program causing a computer to function as:
a speech recognition information transmission unit that acquires, from the storage medium, speech recognition information including the speech recognition information of the terms for all of the two or more languages or for two or more of the languages, and transmits the acquired information to one or more speech recognition server devices;
a translation information transmission unit that acquires, from the storage medium, translation information including the notation of the terms for all of the two or more languages or for two or more of the languages, and transmits the acquired information to one or more translation server devices; and
a speech synthesis information transmission unit that acquires, from the storage medium, speech synthesis information including the speech synthesis information of the terms for all of the two or more languages or for two or more of the languages, and transmits the acquired information to one or more speech synthesis server devices.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP10799657.1A EP2455936B1 (en) | 2009-07-16 | 2010-03-03 | Speech translation system, dictionary server, and program |
CN2010800312696A CN102473413B (zh) | 2009-07-16 | 2010-03-03 | 语音翻译系统、词典服务器装置及语音翻译方法 |
US13/383,742 US9442920B2 (en) | 2009-07-16 | 2010-03-03 | Speech translation system, dictionary server, and program |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2009167501A JP5471106B2 (ja) | 2009-07-16 | 2009-07-16 | 音声翻訳システム、辞書サーバ装置、およびプログラム |
JP2009-167501 | 2009-07-16 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2011007595A1 true WO2011007595A1 (ja) | 2011-01-20 |
Family
ID=43449205
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2010/053418 WO2011007595A1 (ja) | 2009-07-16 | 2010-03-03 | 音声翻訳システム、辞書サーバ装置、およびプログラム |
Country Status (6)
Country | Link |
---|---|
US (1) | US9442920B2 (ja) |
EP (1) | EP2455936B1 (ja) |
JP (1) | JP5471106B2 (ja) |
KR (1) | KR101626887B1 (ja) |
CN (1) | CN102473413B (ja) |
WO (1) | WO2011007595A1 (ja) |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102706343A (zh) * | 2012-06-01 | 2012-10-03 | 陈耀伦 | 一种基于互联网语音导航的方法、装置和系统 |
KR101834546B1 (ko) | 2013-08-28 | 2018-04-13 | 한국전자통신연구원 | 핸즈프리 자동 통역 서비스를 위한 단말 장치 및 핸즈프리 장치와, 핸즈프리 자동 통역 서비스 방법 |
CN103533129B (zh) * | 2013-10-23 | 2017-06-23 | 上海斐讯数据通信技术有限公司 | 实时的语音翻译通信方法、系统及所适用的通讯设备 |
KR101740332B1 (ko) * | 2013-11-05 | 2017-06-08 | 한국전자통신연구원 | 자동 번역 장치 및 방법 |
JP2016133861A (ja) * | 2015-01-16 | 2016-07-25 | 株式会社ぐるなび | 情報多言語変換システム |
JP2016177782A (ja) * | 2015-03-19 | 2016-10-06 | パナソニックIpマネジメント株式会社 | ウェアラブル装置及び翻訳システム |
JP7000671B2 (ja) * | 2016-10-05 | 2022-01-19 | 株式会社リコー | 情報処理システム、情報処理装置、及び情報処理方法 |
KR102384641B1 (ko) * | 2017-02-20 | 2022-04-08 | 엘지전자 주식회사 | 다국어 처리를 수행하는 인공 지능 시스템의 제어 방법 |
CN106998359A (zh) * | 2017-03-24 | 2017-08-01 | 百度在线网络技术(北京)有限公司 | 基于人工智能的语音识别服务的网络接入方法以及装置 |
US10826857B2 (en) * | 2017-10-20 | 2020-11-03 | Sap Se | Message processing for cloud computing applications |
WO2019172946A1 (en) * | 2018-03-07 | 2019-09-12 | Google Llc | Facilitating end-to-end communications with automated assistants in multiple languages |
US11403462B2 (en) * | 2019-09-12 | 2022-08-02 | Oracle International Corporation | Streamlining dialog processing using integrated shared resources |
CN111274828B (zh) * | 2020-01-21 | 2021-02-02 | 陈刚 | 基于留言的语言翻译方法、系统、计算机程序和手持终端 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS6488875A (en) * | 1987-09-30 | 1989-04-03 | Toshiba Corp | Voice translation device |
JP2000148176A (ja) * | 1998-11-18 | 2000-05-26 | Sony Corp | Information processing apparatus and method, providing medium, speech recognition system, speech synthesis system, translation apparatus and method, and translation system |
JP2000194698A (ja) * | 1998-12-25 | 2000-07-14 | Sony Corp | Information processing apparatus and method, and providing medium |
JP2002123282A (ja) * | 2000-10-17 | 2002-04-26 | Brother Ind Ltd | Translation apparatus and recording medium |
JP2002245038A (ja) * | 2001-02-21 | 2002-08-30 | Ricoh Co Ltd | Multilingual translation system using a portable terminal device |
Family Cites Families (38)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2667408B2 (ja) * | 1987-10-08 | 1997-10-27 | Toshiba Corp | Translation communication system |
GB2302199B (en) * | 1996-09-24 | 1997-05-14 | Allvoice Computing Plc | Data processing method and apparatus |
JP3466857B2 (ja) * | 1997-03-06 | 2003-11-17 | Toshiba Corp | Dictionary updating method and dictionary updating system |
JPH11328179A (ja) * | 1998-05-08 | 1999-11-30 | Toshiba Corp | Dictionary management method and dictionary management system |
CN1311881A (zh) * | 1998-06-04 | 2001-09-05 | Matsushita Electric Industrial Co., Ltd. | Language conversion rule generation apparatus, language conversion apparatus, and program recording medium |
US6138099A (en) * | 1998-10-19 | 2000-10-24 | International Business Machines Corp. | Automatically updating language models |
US6219638B1 (en) * | 1998-11-03 | 2001-04-17 | International Business Machines Corporation | Telephone messaging and editing system |
JP2000259632A (ja) * | 1999-03-09 | 2000-09-22 | Toshiba Corp | Automatic interpretation system, interpretation program transmission system, recording medium, and information transmission medium |
JP2001005488A (ja) * | 1999-06-18 | 2001-01-12 | Mitsubishi Electric Corp | Speech dialogue system |
WO2001084535A2 (en) * | 2000-05-02 | 2001-11-08 | Dragon Systems, Inc. | Error correction in speech recognition |
JP3581648B2 (ja) * | 2000-11-27 | 2004-10-27 | Canon Inc | Speech recognition system, information processing apparatus, control method therefor, and program |
US7493259B2 (en) * | 2002-01-04 | 2009-02-17 | Siebel Systems, Inc. | Method for accessing data via voice |
US7210130B2 (en) * | 2002-02-01 | 2007-04-24 | John Fairweather | System and method for parsing data |
US7013262B2 (en) * | 2002-02-12 | 2006-03-14 | Sunflare Co., Ltd | System and method for accurate grammar analysis using a learners' model and part-of-speech tagged (POST) parser |
JP2003295893A (ja) * | 2002-04-01 | 2003-10-15 | Omron Corp | Speech recognition system, apparatus, speech recognition method, speech recognition program, and computer-readable recording medium recording the speech recognition program |
SE0202058D0 (sv) * | 2002-07-02 | 2002-07-02 | Ericsson Telefon Ab L M | Voice browsing architecture based on adaptive keyword spotting |
GB2395029A (en) * | 2002-11-06 | 2004-05-12 | Alan Wilkinson | Translation of electronically transmitted messages |
DE10304229A1 (de) * | 2003-01-28 | 2004-08-05 | Deutsche Telekom Ag | Communication system, communication terminal, and device for recognizing faulty text messages |
JP2005055607A (ja) * | 2003-08-01 | 2005-03-03 | Toyota Motor Corp | Server, information processing terminal, and speech synthesis system |
JP2005202884A (ja) * | 2004-01-19 | 2005-07-28 | Toshiba Corp | Transmitting apparatus, receiving apparatus, relay apparatus, and transmission/reception system |
JP3895766B2 (ja) * | 2004-07-21 | 2007-03-22 | Matsushita Electric Industrial Co., Ltd. | Speech synthesis apparatus |
JP2006099296A (ja) * | 2004-09-29 | 2006-04-13 | NEC Corp | Translation system, translation communication system, machine translation method, and program |
US7620549B2 (en) * | 2005-08-10 | 2009-11-17 | Voicebox Technologies, Inc. | System and method of supporting adaptive misrecognition in conversational speech |
US7949529B2 (en) * | 2005-08-29 | 2011-05-24 | Voicebox Technologies, Inc. | Mobile systems and methods of supporting natural language human-machine interactions |
JP4047885B2 (ja) * | 2005-10-27 | 2008-02-13 | Toshiba Corp | Machine translation apparatus, machine translation method, and machine translation program |
WO2007097176A1 (ja) * | 2006-02-23 | 2007-08-30 | NEC Corp | Speech recognition dictionary creation support system, speech recognition dictionary creation support method, and speech recognition dictionary creation support program |
US7756708B2 (en) * | 2006-04-03 | 2010-07-13 | Google Inc. | Automatic language model update |
KR100859532B1 (ko) * | 2006-11-06 | 2008-09-24 | Electronics and Telecommunications Research Institute | Automatic interpretation method and apparatus based on corresponding sentence patterns |
US8355915B2 (en) * | 2006-11-30 | 2013-01-15 | Rao Ashwin P | Multimodal speech recognition system |
JP5121252B2 (ja) * | 2007-02-26 | 2013-01-16 | Toshiba Corp | Apparatus, method, and program for translating speech in a source language into a target language |
JP5233989B2 (ja) * | 2007-03-14 | 2013-07-10 | NEC Corp | Speech recognition system, speech recognition method, and speech recognition processing program |
JP2008243080A (ja) | 2007-03-28 | 2008-10-09 | Toshiba Corp | Apparatus, method, and program for translating speech |
US7634383B2 (ja) * | 2007-07-31 | 2009-12-15 | Northrop Grumman Corporation | Prognosis adaptation method |
JP5098613B2 (ja) * | 2007-12-10 | 2012-12-12 | Fujitsu Ltd | Speech recognition apparatus and computer program |
CN101458681A (zh) | 2007-12-10 | 2009-06-17 | Toshiba Corp | Speech translation method and speech translation apparatus |
JP3142002U (ja) | 2008-03-17 | 2008-05-29 | Takashi Uchiyama | Translation call system |
US8326637B2 (en) * | 2009-02-20 | 2012-12-04 | Voicebox Technologies, Inc. | System and method for processing multi-modal device interactions in a natural language voice services environment |
US9171541B2 (en) * | 2009-11-10 | 2015-10-27 | Voicebox Technologies Corporation | System and method for hybrid processing in a natural language voice services environment |
- 2009
  - 2009-07-16 JP JP2009167501A patent/JP5471106B2/ja not_active Expired - Fee Related
- 2010
  - 2010-03-03 WO PCT/JP2010/053418 patent/WO2011007595A1/ja active Application Filing
  - 2010-03-03 EP EP10799657.1A patent/EP2455936B1/en not_active Not-in-force
  - 2010-03-03 CN CN2010800312696A patent/CN102473413B/zh not_active Expired - Fee Related
  - 2010-03-03 US US13/383,742 patent/US9442920B2/en not_active Expired - Fee Related
  - 2010-03-03 KR KR1020127001119A patent/KR101626887B1/ko active IP Right Grant
Also Published As
Publication number | Publication date |
---|---|
CN102473413A (zh) | 2012-05-23 |
US20120166176A1 (en) | 2012-06-28 |
CN102473413B (zh) | 2013-08-28 |
EP2455936A1 (en) | 2012-05-23 |
EP2455936A4 (en) | 2018-01-10 |
EP2455936B1 (en) | 2019-01-16 |
KR101626887B1 (ko) | 2016-06-13 |
KR20120040190A (ko) | 2012-04-26 |
JP2011022813A (ja) | 2011-02-03 |
US9442920B2 (en) | 2016-09-13 |
JP5471106B2 (ja) | 2014-04-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP5471106B2 (ja) | Speech translation system, dictionary server device, and program | |
US11437041B1 (en) | Speech interface device with caching component | |
JP5598998B2 (ja) | Speech translation system, first terminal device, speech recognition server device, translation server device, and speech synthesis server device | |
KR101683944B1 (ko) | Speech translation system, control device, and control method | |
US9761241B2 (en) | System and method for providing network coordinated conversational services | |
EP1125279B1 (en) | System and method for providing network coordinated conversational services | |
US6182038B1 (en) | Context dependent phoneme networks for encoding speech information | |
WO2018021237A1 (ja) | Speech dialogue apparatus, speech dialogue method, and recording medium | |
JP5613335B2 (ja) | Speech recognition system, recognition dictionary registration system, and acoustic model identifier sequence generation apparatus | |
KR20030076661A (ко) | Method, module, device, and server for speech recognition | |
JP2010528389A (ja) | Method and apparatus for real-time spoken natural language translation | |
JP2017120616A (ja) | Machine translation method and machine translation system | |
JP2009122989A (ja) | Translation apparatus | |
US20170185587A1 (en) | Machine translation method and machine translation system | |
CN111524508A (zh) | Speech dialogue system and speech dialogue implementation method | |
JP2000259632A (ja) | Automatic interpretation system, interpretation program transmission system, recording medium, and information transmission medium | |
Fischer et al. | Towards multi-modal interfaces for embedded devices | |
JP2016148943A (ja) | Interpretation service providing system, interpretation support method, and interpretation support program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| WWE | Wipo information: entry into national phase | Ref document number: 201080031269.6; Country of ref document: CN |
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 10799657; Country of ref document: EP; Kind code of ref document: A1 |
| WWE | Wipo information: entry into national phase | Ref document number: 13383742; Country of ref document: US |
| ENP | Entry into the national phase | Ref document number: 20127001119; Country of ref document: KR; Kind code of ref document: A |
| NENP | Non-entry into the national phase | Ref country code: DE |
| WWE | Wipo information: entry into national phase | Ref document number: 2010799657; Country of ref document: EP |