JP6723907B2

JP6723907B2 - Language recognition system, language recognition method, and language recognition program

Info

Publication number: JP6723907B2
Application number: JP2016232115A
Authority: JP
Inventors: 藤田　雄介
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2016-11-30
Filing date: 2016-11-30
Publication date: 2020-07-15
Anticipated expiration: 2036-11-30
Also published as: JP2018087945A

Description

The present invention relates to a language recognition system, a language recognition method, and a language recognition program.

Techniques for identifying the language of words spoken by a person (for example, Japanese, English, Chinese, or Korean) are under active development. For example, Patent Document 1 describes a speech recognition engine for language identification that performs a predetermined calling operation toward the user and treats a limited set of phrases likely to be returned in response to that operation as the recognition candidates for language identification.

Patent Document 1: JP 2004-53825 A

However, although the technique of Patent Document 1 is effective for language identification in a specific business such as a bank ATM, the utterances that can be used to recognize the language are limited, so its range of application is narrow.

One conceivable alternative is to input the utterance to a speech recognition device for each language, calculate a reliability for each language from the recognition results that are output, and identify the language by comparing the calculated reliabilities. Even with this method, however, accurate identification is difficult when the utterance is short.

In particular, when special words are uttered in a specific business, for example when the speaker's utterance contains words that are difficult for the speech recognition device of any language to recognize, the reliability calculated for every language is low, and as a result correct language recognition may not be possible. For example, in tourism work, if the words spoken by a tourist from abroad include a place name from a country other than the tourist's home country, it is very difficult to identify the language the tourist is speaking. Thus, there has been a problem that language identification becomes difficult depending on the kind of words spoken.

The present invention has been made against this background, and its object is to provide a language recognition system, a language recognition method, and a language recognition program for performing language recognition accurately.

One aspect of the present invention for solving the above problem is a language recognition system that includes a processor and a memory and recognizes the language of an uttered voice, the system comprising: a voice acquisition unit that acquires a voice; a first language general translation unit that generates, based on first language general vocabulary information, which is information about a first vocabulary in a first language, a first language general translation phrase, which is words in the first language corresponding to the acquired voice; a first language specialized translation unit that generates, based on first language specialized vocabulary information, which is information about a second vocabulary in the first language, a first language specialized translation phrase, which is words in the first language corresponding to the acquired voice; a first corresponding section calculation unit that extracts a first language corresponding portion, which is a portion where the generated first language general translation phrase and the generated first language specialized translation phrase correspond to each other; a second language general translation unit that generates, based on second language general vocabulary information, which is information about a third vocabulary in a second language, a second language general translation phrase, which is words in the second language corresponding to the acquired voice; a second language specialized translation unit that generates, based on second language specialized vocabulary information, which is information about a fourth vocabulary in the second language, a second language specialized translation phrase, which is words in the second language corresponding to the acquired voice; a second corresponding section calculation unit that extracts a second language corresponding portion, which is a portion where the generated second language general translation phrase and the generated second language specialized translation phrase correspond to each other; and a language identification unit that identifies the language of the acquired voice based on the extracted first language corresponding portion and the extracted second language corresponding portion.

According to the present invention, language recognition can be performed accurately.

Problems, configurations, and effects other than those described above will become clear from the following description of the embodiments.

FIG. 1 is a diagram showing an example of the configuration of a language recognition system 1 according to this embodiment.
FIG. 2 is a diagram illustrating an example of the hardware configuration of the service robot 200.
FIG. 3 is a diagram illustrating an example of the hardware configuration of the language identification server 100.
FIG. 4 is a diagram illustrating an example of the functions of the service robot 200.
FIG. 5 is a diagram illustrating an example of the functions of the language identification server 100.
FIG. 6 is a flowchart illustrating an example of language identification processing.
FIG. 7 is a flowchart illustrating an example of language identification processing.
FIG. 8 is a diagram illustrating a specific example of language identification processing.
FIG. 9 is a flowchart illustrating an example of database update processing.
FIG. 10 is a diagram showing an example of the database update screen.
FIG. 11 is a diagram illustrating an example of tourist guidance given to foreign tourists by the service robot 200.

Embodiments of the present invention will be described below with reference to the drawings.

FIG. 1 is a diagram showing an example of the configuration of a language recognition system 1 according to this embodiment. As shown in the figure, the language recognition system 1 includes a language identification server 100, which is an information processing device that identifies languages, a service robot 200, and a wireless access point 300. The language recognition system 1 identifies the language of words spoken by a tourist, mainly in work that deals with tourists from abroad (hereinafter simply referred to as tourists); this work is hereinafter referred to as the business.

The languages spoken by tourists can include various languages such as Japanese, English, Chinese, and Korean (hereinafter referred to as target languages), but the language recognition system 1 of this embodiment distinguishes between Japanese (hereinafter referred to as the first language) and English (hereinafter referred to as the second language).

The service robot 200 is an autonomously mobile information processing device that recognizes a voice uttered by a tourist and transmits the recognized voice to the language identification server 100. The service robot 200 also generates speech based on various kinds of information transmitted from the language identification server 100 and utters the generated speech, thereby conversing with the tourist.

The wireless access point 300 is an information processing device that controls communication between the service robot 200 and the language identification server 100. The wireless access point 300 is communicably connected to the service robot 200 via a wireless communication network 5 or the like, and is communicably connected to the language identification server 100 via a wired communication network such as a wired LAN 4, or via a wireless communication network.

(Hardware configuration)
Next, the hardware configuration of each information processing device constituting the language recognition system 1 will be described.

FIG. 2 is a diagram illustrating an example of the hardware configuration of the service robot 200. As shown in the figure, the service robot 200 includes a processor 21 such as a CPU (Central Processing Unit); a storage device 22 consisting of a main storage device such as a RAM (Random Access Memory), ROM (Read Only Memory), or NVRAM (Non-Volatile RAM) and an auxiliary storage device such as a hard disk drive or SSD (Solid State Drive); a sound collection device 23; a speaker 24; a camera 25; a moving device 26; and a communication device 27 that communicates with the language identification server 100 via the wireless access point 300.

The storage device 22 stores information about the content of the dialogues the service robot 200 conducts (scenario data) and a program that controls the order of the dialogue and the manner of responding based on the scenario data (scenario operation program).

The sound collection device 23 converts a tourist's voice into digital data. The sound collection device 23 includes, for example, a microphone that converts sound into an electric signal and an AD converter that converts the electric signal output from the microphone into digital data. In this embodiment, the sound collection device 23 has a plurality of microphones with different directivities, and sound from a plurality of directions is input through these microphones.

The speaker 24 outputs sound. The speaker 24 includes, for example, a DA converter that converts audio information (digital data) received from the language identification server 100 into an analog electric signal, an amplifier that amplifies the converted analog electric signal, and a loudspeaker that converts the amplified analog electric signal into physical vibration to produce sound.

The camera 25 is an imaging device that photographs subjects within a predetermined range and recognizes, for example, the position and movement of a person facing the service robot 200.

The moving device 26 is a device for making the service robot 200 travel, and consists of, for example, a power mechanism such as a motor and a steering mechanism such as wheels.

FIG. 3 is a diagram illustrating an example of the hardware configuration of the language identification server 100. As shown in the figure, the language identification server 100 includes a processor 11 such as a CPU (Central Processing Unit); a main storage device 12 consisting of a RAM (Random Access Memory), ROM (Read Only Memory), NVRAM (Non-Volatile RAM), or the like; an auxiliary storage device 13 consisting of a hard disk, SSD (Solid State Drive), or the like; an input device 14 such as a touch panel or operation buttons; an output device 15 such as a liquid crystal display or printer; and a communication device 16 for communicating with the service robot 200 via the wireless access point 300.

(Functions)
Next, the functions of each information processing device will be described.

FIG. 4 is a diagram illustrating an example of the functions of the service robot 200. As shown in the figure, the service robot 200 includes a voice acquisition unit 210, a voice transmission unit 220, an information reception unit 230, and a response unit 240.

The voice acquisition unit 210 acquires a voice. Specifically, the voice acquisition unit 210 extracts only the human voice by removing noise and the like from the sound input to the sound collection device 23.

The voice transmission unit 220 transmits information on the voice extracted by the voice acquisition unit 210 to the language identification server 100.

The information reception unit 230 receives information transmitted from the language identification server 100, such as information about the language identified by the language identification server 100 (details are described later).

The response unit 240 outputs speech corresponding to the voice acquired by the voice acquisition unit 210, based on the language identified by the language identification server 100.

The functions of the service robot 200 described above are realized by the hardware of the service robot 200, or by the processor 21 of the service robot 200 reading and executing a program stored in the storage device 22.

FIG. 5 is a diagram illustrating an example of the functions of the language identification server 100. As shown in the figure, the language identification server 100 includes a voice acquisition unit 110, a first language processing unit 120, a second language processing unit 130, a language identification unit 140, a response generation unit 150, and a database updating unit 160.

The voice acquisition unit 110 acquires voice information from the service robot 200.

The first language processing unit 120 includes a first language speech recognition unit 121, a first corresponding section calculation unit 122, and a first reliability calculation unit 123. The first language processing unit 120 also manages the databases of first language general vocabulary information 125, first language specialized vocabulary information 126, and first language syllable information 127.

The first language speech recognition unit 121 includes a first language general translation unit 1211 and a first language specialized translation unit 1212.

The first language general translation unit 1211 generates, based on the first language general vocabulary information, which is information about the first vocabulary in the first language, a first language general translation phrase, which is words in the first language corresponding to the acquired voice. In this embodiment, the first vocabulary is vocabulary that is commonly spoken in the first language and is used frequently in the business (uttered routinely, or often heard, in the business).

The first language specialized translation unit 1212 generates, based on the first language specialized vocabulary information, which is information about the second vocabulary in the first language, a first language specialized translation phrase, which is words in the first language corresponding to the acquired voice. In this embodiment, the second vocabulary is vocabulary that is used infrequently in the business (rarely uttered, or unfamiliar, in the business). For example, the second vocabulary includes vocabulary whose pronunciation generally tends to be unclear depending on the speaker, such as Japanese proper nouns (place names and the like) uttered by foreigners visiting Japan.

The first corresponding section calculation unit 122 extracts a first language corresponding portion, which is a portion where the generated first language general translation phrase and the generated first language specialized translation phrase correspond to each other.

For example, the first corresponding section calculation unit 122 extracts the first language corresponding portion based on the commonality between the character string representing the first language general translation phrase and the character string representing the first language specialized translation phrase. Alternatively, for example, the first corresponding section calculation unit 122 acquires the utterance timing of each word in the first language general translation phrase and the utterance timing of each word in the first language specialized translation phrase, and extracts the first language corresponding portion based on the commonality of these utterance timings.
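The string-commonality variant can be illustrated with a minimal sketch. It assumes both recognition results are already split into word lists and uses Python's standard `difflib.SequenceMatcher` as a stand-in for whatever matching the actual system performs; the function name `corresponding_parts` and the sample hypotheses are hypothetical, not taken from the description.

```python
from difflib import SequenceMatcher

def corresponding_parts(general_words, specialized_words):
    """Return the runs of words that both hypotheses share, in order.

    Assumption: "commonality between character strings" is approximated
    here by common contiguous word runs found by difflib.
    """
    matcher = SequenceMatcher(a=general_words, b=specialized_words, autojunk=False)
    parts = []
    for block in matcher.get_matching_blocks():
        if block.size > 0:  # the final block is a zero-length sentinel
            parts.append(general_words[block.a:block.a + block.size])
    return parts

# Invented sample: the general vocabulary mishears a place name,
# while the specialized vocabulary recognizes it.
general = ["I", "want", "to", "go", "to", "a", "sack", "sir"]
specialized = ["I", "want", "to", "go", "to", "Asakusa"]
print(corresponding_parts(general, specialized))
```

In this sample only the leading five words are common to both hypotheses, so only that run would be treated as the corresponding portion.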

The first language general vocabulary information 125 is a database (language model) that stores the first vocabulary in the first language. Specifically, the first language general vocabulary information 125 stores information such as words, fixed phrases, and sentences in the first vocabulary of the first language.

The first language specialized vocabulary information 126 is a database (language model) that stores the second vocabulary in the first language. Specifically, the first language specialized vocabulary information 126 stores information such as words, fixed phrases, and sentences in the second vocabulary of the first language.

The first language syllable information 127 is a database that stores information on the sounds used in the first language (acoustic model). The first language syllable information 127 includes, for example, information on syllables in the first language (including utterance timing and the like).

The first reliability calculation unit 123 calculates a first reliability, which is described later.

The second language processing unit 130 includes a second language speech recognition unit 131, a second corresponding section calculation unit 132, and a second reliability calculation unit 133. The second language processing unit 130 also manages second language general vocabulary information 135, second language specialized vocabulary information 136, and second language syllable information 137.

The second language speech recognition unit 131 includes a second language general translation unit 1311 and a second language specialized translation unit 1312.

The second language general translation unit 1311 generates, based on the second language general vocabulary information, which is information about the third vocabulary in the second language, a second language general translation phrase, which is words in the second language corresponding to the acquired voice. In this embodiment, the third vocabulary is vocabulary that is commonly spoken in the second language and is used frequently in the business (uttered routinely, or often heard, in the business).

The second language specialized translation unit 1312 generates, based on the second language specialized vocabulary information, which is information about the fourth vocabulary in the second language, a second language specialized translation phrase, which is words in the second language corresponding to the acquired voice. In this embodiment, the fourth vocabulary is vocabulary that is used infrequently in the business (rarely uttered, or unfamiliar, in the business). For example, the fourth vocabulary includes vocabulary whose pronunciation generally tends to be unclear depending on the speaker, such as Japanese proper nouns (place names and the like) uttered by foreigners visiting Japan.

The second corresponding section calculation unit 132 extracts a second language corresponding portion, which is a portion where the generated second language general translation phrase and the generated second language specialized translation phrase correspond to each other.

For example, the second corresponding section calculation unit 132 extracts the second language corresponding portion based on the commonality between the character string representing the second language general translation phrase and the character string representing the second language specialized translation phrase. Alternatively, for example, the second corresponding section calculation unit 132 acquires the utterance timing of each word in the second language general translation phrase and the utterance timing of each word in the second language specialized translation phrase, and extracts the second language corresponding portion based on the commonality of these utterance timings.
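The timing-based variant might look like the following sketch. It assumes each recognizer reports a start and end time per word and treats two words as corresponding when both times agree within a small tolerance; the `TimedWord` type, the `tolerance` parameter, and the sample timings are all hypothetical illustrations, since the description does not specify how timing commonality is measured.

```python
from dataclasses import dataclass

@dataclass
class TimedWord:
    text: str
    start: float  # seconds from the start of the utterance
    end: float

def timing_matched_words(general, specialized, tolerance=0.05):
    """Pair words whose start and end times agree within `tolerance` seconds."""
    pairs = []
    for g in general:
        for s in specialized:
            if abs(g.start - s.start) <= tolerance and abs(g.end - s.end) <= tolerance:
                pairs.append((g, s))
                break  # take the first specialized word that lines up
    return pairs

# Invented sample: the two hypotheses segment the end of the utterance differently.
general = [TimedWord("where", 0.0, 0.3), TimedWord("is", 0.3, 0.45),
           TimedWord("you", 0.45, 0.6), TimedWord("way", 0.6, 0.8),
           TimedWord("no", 0.8, 1.0)]
specialized = [TimedWord("where", 0.0, 0.3), TimedWord("is", 0.3, 0.45),
               TimedWord("Ueno", 0.45, 1.0)]

matched = [g.text for g, _ in timing_matched_words(general, specialized)]
print(matched)
```

Words that one recognizer splits differently from the other have no timing counterpart and therefore fall outside the corresponding portion.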

The second language general vocabulary information 135 is a database (language model) that stores the third vocabulary in the second language. Specifically, the second language general vocabulary information 135 stores information such as words, fixed phrases, and sentences in the third vocabulary of the second language as character strings.

The second language specialized vocabulary information 136 is a database (language model) that stores the fourth vocabulary in the second language. Specifically, the second language specialized vocabulary information 136 stores information such as words, fixed phrases, and sentences in the fourth vocabulary of the second language as character strings.

The second language syllable information 137 is a database that stores information on the sounds used in the second language (acoustic model). The second language syllable information 137 includes, for example, information on syllables in the second language (including utterance timing and the like).

The second reliability calculation unit 133 calculates a second reliability, which is described later.

The language identification unit 140 identifies the language of the input voice based on the extracted first language corresponding portion and the extracted second language corresponding portion.

Specifically, the language identification unit 140 identifies the language of the acquired voice based on a first reliability, which is an index of how plausible the first language corresponding portion is as the first language, and a second reliability, which is an index of how plausible the second language corresponding portion is as the second language.
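One way to picture this decision is the sketch below. It assumes each corresponding portion comes with a list of per-word reliability scores between 0.0 and 1.0, averages them, and picks the language with the higher mean. The averaging, the tie-breaking rule, and the function name `identify_language` are assumptions made for illustration; the description only says the decision is based on the two reliabilities.

```python
from statistics import mean

def identify_language(first_scores, second_scores):
    """Pick the language whose corresponding portion has the higher mean score.

    An empty score list (no corresponding portion found) counts as 0.0.
    Ties go to the first language; that choice is arbitrary.
    """
    first = mean(first_scores) if first_scores else 0.0
    second = mean(second_scores) if second_scores else 0.0
    if first >= second:
        return "first", first
    return "second", second

# Invented per-word scores for the corresponding portions of one utterance.
language, score = identify_language([0.2, 0.3], [0.9, 0.8, 0.85])
print(language, score)
```

Because only the corresponding portions are scored, words that neither vocabulary handles consistently do not drag down the comparison.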

The reliability (first reliability, second reliability) is calculated for each word, and each reliability takes a value (or score) within a predetermined range, for example 0.0 to 1.0. The closer this value is to the lower limit (for example, the closer it is to 0.0), the more other words (word hypotheses) exist that have a score comparable to that of the word for which the reliability was calculated.

The reliability is obtained, for example, by calculating the posterior probability of the word hypothesis for each word in the recognized voice. Details of this calculation method are disclosed in, for example, "Confidence Measures for Large Vocabulary Continuous Speech Recognition, IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 9, NO. 3, MARCH 2001".
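The relationship between competing hypotheses and a low score can be seen in a simplified sketch. It assumes the recognizer exposes a log score for every word hypothesis competing for the same stretch of audio and normalizes those scores into posteriors; a real system would sum over paths in a word lattice as in the cited paper, so this is an illustration of the idea rather than that method. The function name `word_posteriors` and the sample scores are hypothetical.

```python
import math

def word_posteriors(log_scores):
    """Normalize competing hypotheses' log scores into posterior probabilities."""
    peak = max(log_scores.values())  # subtract the peak for numerical stability
    weights = {word: math.exp(score - peak) for word, score in log_scores.items()}
    total = sum(weights.values())
    return {word: weight / total for word, weight in weights.items()}

# A word with no serious competitor keeps a posterior near 1.0 ...
clear = word_posteriors({"Asakusa": -1.0, "a sack": -9.0})
# ... while four equally scored competitors each get 0.25.
ambiguous = word_posteriors({"sack": -2.0, "sock": -2.0, "sac": -2.0, "suck": -2.0})
print(clear["Asakusa"], ambiguous["sack"])
```

This matches the behavior described above: the more hypotheses share a similar score, the closer each one's reliability moves toward the lower limit.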

The response generation unit 150 outputs speech in the identified language that corresponds to the acquired voice.

The database updating unit 160 accepts input of the first language general vocabulary information, the first language specialized vocabulary information, the second language general vocabulary information, or the second language specialized vocabulary information, and outputs the information whose input it has accepted.

The database updating unit 160 also includes a specialized language updating unit 161 and a specialized language conversion unit 162.

The specialized language updating unit 161 acquires words in the first language, extracts some of the acquired first language words based on information about how frequently words are used, and stores the extracted words as the first language specialized vocabulary information. Likewise, it acquires words in the second language, extracts some of the acquired second language words based on information about how frequently words are used, and stores the extracted words as the second language specialized vocabulary information.
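A minimal sketch of frequency-based extraction follows. It assumes the usage-frequency information is a log of words heard in the business and that a word qualifies as specialized when it appears at most `max_count` times in that log; the threshold rule, the function name `extract_specialized`, and the sample data are assumptions for illustration only.

```python
from collections import Counter

def extract_specialized(words, usage_log, max_count=1):
    """Keep the words that occur at most `max_count` times in the usage log."""
    counts = Counter(usage_log)
    seen = set()
    specialized = []
    for word in words:
        if counts[word] <= max_count and word not in seen:
            seen.add(word)  # avoid storing the same word twice
            specialized.append(word)
    return specialized

# Invented sample: a greeting is heard often, place names rarely.
candidates = ["hello", "Asakusa", "Ueno", "Asakusa"]
log = ["hello"] * 5 + ["Ueno"]
print(extract_specialized(candidates, log))
```

The same function would be applied separately to first language and second language word lists, each with its own usage log.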

The specialized language conversion unit 162 converts some of the extracted first language words into second language words and stores the converted second language words as the second language specialized vocabulary information. It also converts some of the extracted second language words into first language words and stores the converted first language words as the first language specialized vocabulary information.
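The cross-language registration can be sketched as follows, assuming the conversion is a simple table lookup and that words without an entry are skipped. The description does not say how the conversion is performed, so the lookup table, the function name `convert_and_store`, and the sample entries are hypothetical.

```python
def convert_and_store(extracted_words, conversion_table, target_vocabulary):
    """Convert each word via the table and add the result to the target vocabulary."""
    for word in extracted_words:
        converted = conversion_table.get(word)
        if converted is not None:  # skip words the table cannot convert
            target_vocabulary.add(converted)
    return target_vocabulary

# Invented sample: Japanese place names registered in the English specialized vocabulary.
ja_to_en = {"浅草": "Asakusa", "上野": "Ueno"}
english_specialized = convert_and_store(["浅草", "上野", "銀座"], ja_to_en, set())
print(sorted(english_specialized))
```

Running the same step in the opposite direction with a second table would populate the first language specialized vocabulary.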

The functions of the language identification server 100 described above are realized by the hardware of the language identification server 100, or by the processor 11 of the language identification server 100 reading and executing a program stored in the main storage device 12 or the auxiliary storage device 13.

また、以上に説明した言語識別サーバ１００の機能の一部は、サービスロボット２００に設けてもよい。また、サービスロボット２００の機能の一部も、言語識別サーバ１００に設けてもよい。 Further, some of the functions of the language identification server 100 described above may be provided in the service robot 200. Further, some of the functions of the service robot 200 may be provided in the language identification server 100.

(Processing)
Next, the processing performed in the language recognition system 1 will be described.

<Language identification processing>
FIGS. 6 and 7 are flowcharts illustrating an example of a process, among the processes performed in the language recognition system 1, of identifying the language of words spoken by a tourist (hereinafter referred to as the language identification process). The flowchart is split across two figures for reasons of space. The language identification process is started, for example, when the language identification server 100 and the service robot 200 are powered on.

同図に示すように、まずサービスロボット２００は、観光客からの音声の受信を待機する（Ｓ１）。 As shown in the figure, the service robot 200 first waits for reception of voice from a tourist (S1).

When audio is received, the service robot 200 acquires only the tourist's voice by removing sounds other than the tourist's voice (incidental sounds, noise, and the like) from the received audio (S3). Specifically, the service robot 200 removes the noise based on the phase differences among the audio signals acquired on the plurality of channels. The audio from which noise and the like have been removed is, for example, only the audio that the service robot 200 acquired from the front direction (one-channel audio).

サービスロボット２００は、Ｓ３で取得した音声を、言語識別サーバ１００に送信する（Ｓ５）。 The service robot 200 transmits the voice acquired in S3 to the language identification server 100 (S5).

When the language identification server 100 receives the voice from the service robot 200 (S7), it generates, based on the first language general vocabulary information 125, a character string of words of the first language (first language general translation phrase) corresponding to the voice received in S7 (S9).

In addition, the language identification server 100 generates, based on the first language specialized vocabulary information 126, a character string of words of the first language (first language specialized translation phrase) corresponding to the voice received in S7 (S11).

Then, the language identification server 100 extracts the part in which the first language general translation phrase generated in S9 and the first language specialized translation phrase generated in S11 correspond to each other (first language corresponding part) (S13). Specifically, the language identification server 100 extracts, as the first language corresponding part, the part common to the character string of the first language general translation phrase and the character string of the first language specialized translation phrase.

Then, the language identification server 100 calculates the first reliability of the first language corresponding part extracted in S13 (S15).

また、言語識別サーバ１００は、第２言語一般語彙情報に基づき、Ｓ７で受信した音声に対応する、第２の言語の言葉の文字列（第２言語一般翻訳句）を生成する（Ｓ１７）。また、言語識別サーバ１００は、第２言語専門語彙情報に基づき、Ｓ７で受信した音声に対応する、第２の言語の言葉の文字列（第２言語専門翻訳句）を生成する（Ｓ１９）。 In addition, the language identification server 100 generates a character string of a word in the second language (second language general translation phrase) corresponding to the voice received in S7, based on the second language general vocabulary information (S17). In addition, the language identification server 100 generates a character string of a word in the second language (second language specialized translation phrase) corresponding to the voice received in S7 based on the second language specialized vocabulary information (S19).

Further, the language identification server 100 extracts the part in which the character string of the second language generated in S17 and the character string of the second language generated in S19 correspond to each other (second language corresponding part) (S21). Specifically, it extracts, as the second language corresponding part, the part common to the character string of the second language general translation phrase and the character string of the second language specialized translation phrase.

そして、言語識別サーバ１００は、Ｓ２１で抽出した第２言語対応部分の第２信頼度を算出する（Ｓ２３）。 Then, the language identification server 100 calculates the second reliability of the second language corresponding part extracted in S21 (S23).

言語識別サーバ１００は、Ｓ１５で算出した第１信頼度と、Ｓ２３で算出した第２信頼度とに基づき、Ｓ７で受信した音声の言語を特定する（Ｓ２５）。具体的には、言語識別サーバ１００は、第１信頼度及び第２信頼度を比較し、信頼度が高かった方の言語を、Ｓ７で受信した音声の言語として記憶する。 The language identification server 100 identifies the language of the voice received in S7 based on the first reliability calculated in S15 and the second reliability calculated in S23 (S25). Specifically, the language identification server 100 compares the first reliability and the second reliability, and stores the language with the higher reliability as the language of the voice received in S7.
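The comparison in S25 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `identify_language`, the language codes, and the reliability values are all hypothetical, and the text does not specify how the reliabilities themselves are computed.

```python
from typing import Dict


def identify_language(reliabilities: Dict[str, float]) -> str:
    """Return the language whose reliability is highest (step S25).

    `reliabilities` maps a language label to the reliability calculated
    for that language's corresponding part. On a tie, the language that
    appears first in the mapping is returned (an assumption; the text
    does not say how ties are resolved).
    """
    if not reliabilities:
        raise ValueError("at least one language reliability is required")
    return max(reliabilities, key=reliabilities.get)


# Hypothetical values: first reliability (Japanese) and second reliability (English).
identified = identify_language({"ja": 0.42, "en": 0.87})
print(identified)
```

Because the function takes a mapping, the same sketch covers more than two candidate languages.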

続いて、言語識別サーバ１００は、Ｓ７で受信した音声の意味内容を解析する（Ｓ２７）。 Subsequently, the language identification server 100 analyzes the meaning content of the voice received in S7 (S27).

具体的には、言語識別サーバ１００は、Ｓ７で受信した音声に対応する、Ｓ２５で特定した言語の文字列を取得する。例えば、Ｓ２５で特定した言語が第１の言語である場合は、言語識別サーバ１００は、Ｓ９又はＳ１１で生成した文字列を取得し、Ｓ２５で特定した言語が第２の言語である場合は、言語識別サーバ１００は、Ｓ１７又はＳ１９で生成した文字列を取得する。そして、言語識別サーバ１００は、取得した文字列の意味内容を解析する。 Specifically, the language identification server 100 acquires the character string of the language specified in S25, which corresponds to the voice received in S7. For example, when the language specified in S25 is the first language, the language identification server 100 acquires the character string generated in S9 or S11, and when the language specified in S25 is the second language, The language identification server 100 acquires the character string generated in S17 or S19. Then, the language identification server 100 analyzes the meaning content of the acquired character string.

言語識別サーバ１００は、Ｓ２７で解析した内容を、サービスロボット２００に送信する（Ｓ２９） The language identification server 100 transmits the content analyzed in S27 to the service robot 200 (S29).

サービスロボット２００は、言語識別サーバ１００から受信した情報に基づき、音声を生成する（Ｓ３１、Ｓ３３）。具体的には、例えば、サービスロボット２００は、受信した情報、及びシナリオ動作プログラムに従って、シナリオデータが示す対話の情報を音声に変換する。 The service robot 200 generates a voice based on the information received from the language identification server 100 (S31, S33). Specifically, for example, the service robot 200 converts the information of the dialogue indicated by the scenario data into voice according to the received information and the scenario operation program.

そして、サービスロボット２００は、Ｓ３３で生成した音声を出力する（Ｓ３５）。以上で言語識別処理は終了する（Ｓ３７）。 Then, the service robot 200 outputs the voice generated in S33 (S35). With that, the language identification process ends (S37).

<Specific example of language identification processing>
Here, a specific example of the language identification process will be described.
FIG. 8 is a diagram illustrating a specific example of the language identification process. As shown in the figure, suppose first that the language identification server 100 receives, from the service robot 200, information on the voice "How to get to KASAI RINKAI KOEN?" (hereinafter referred to as this voice) (reference numeral 1001).

Then, assuming that the language of this voice is the first language (Japanese), the language identification server 100 converts this voice, based on the first language specialized vocabulary information 126, into the character string 「初月葛西臨海公園」 (reference numeral 1002). The language identification server 100 recognizes this character string as a combination of the character string 「初月」 ("first month") and the character string 「葛西臨海公園」 ("Kasai Rinkai Park").

Likewise, assuming that the language of this voice is the first language (Japanese), the language identification server 100 converts this voice, based on the first language general vocabulary information 125, into the character string 「初月火災臨海公園」 (reference numeral 1003). The language identification server 100 recognizes this character string as a combination of the character string 「初月」 ("first month") and the character string 「火災臨海公園」 ("fire seaside park").

The language identification server 100 extracts the first language corresponding part based on the character string indicated by reference numeral 1002 and the character string indicated by reference numeral 1003 (reference numeral 1004). That is, the language identification server 100 extracts 「初月」, which is the contiguous character string common to the character string indicated by reference numeral 1002 and the character string indicated by reference numeral 1003, as the first language corresponding part.

なお、共通の文字列が複数存在する場合は、共通の文字列のうち、その文字列の長さが最大である共通の文字列が抽出される。後記する第２言語対応部分についても同様である。 When there are a plurality of common character strings, the common character string having the maximum length is extracted from the common character strings. The same applies to the second language corresponding part described later.
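The extraction of the corresponding part can be sketched as a longest-common-run search. The sketch below assumes that the comparison is made on the word units into which the server splits each recognition result (for example 「初月」 and 「葛西臨海公園」), and that "length" means the number of characters. Both points are assumptions, and the function name `longest_common_run` is hypothetical.

```python
from typing import List, Sequence


def longest_common_run(a: Sequence[str], b: Sequence[str]) -> List[str]:
    """Return the longest contiguous run of tokens shared by a and b.

    Length is measured in characters; on a tie the earliest run in `a`
    is kept.
    """
    best: List[str] = []
    best_len = 0
    # prev[j] = number of matching tokens ending at a[i-2] and b[j-1]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                candidate = list(a[i - cur[j]:i])
                cand_len = sum(len(tok) for tok in candidate)
                if cand_len > best_len:
                    best, best_len = candidate, cand_len
        prev = cur
    return best


# Recognition results from the example (reference numerals 1002, 1003, 1005, 1006).
specialized_ja = ["初月", "葛西臨海公園"]
general_ja = ["初月", "火災臨海公園"]
specialized_en = ["How to get to", "KASAI RINKAI KOEN"]
general_en = ["How to get to", "as I bring eye hole"]

first_language_part = longest_common_run(specialized_ja, general_ja)
second_language_part = longest_common_run(specialized_en, general_en)
```

Working on word units rather than raw characters matters here: at the character level, 「臨海公園」 would be a longer common substring than 「初月」, which would not match the result described for reference numeral 1004.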

Next, assuming that the language of this voice is the second language (English), the language identification server 100 converts this voice, based on the second language specialized vocabulary information 136, into the character string "How to get to KASAI RINKAI KOEN?" (reference numeral 1005). The language identification server 100 recognizes this character string as a combination of the character string "How to get to" and the character string "KASAI RINKAI KOEN".

Likewise, assuming that the language of this voice is the second language (English), the language identification server 100 converts this voice, based on the second language general vocabulary information 135, into the character string "How to get to as I bring eye hole?" (reference numeral 1006). The language identification server 100 recognizes this character string as a combination of the character string "How to get to" and the character string "as I bring eye hole".

The language identification server 100 extracts the second language corresponding part based on the character string indicated by reference numeral 1005 and the character string indicated by reference numeral 1006 (reference numeral 1007). That is, the language identification server 100 extracts "How to get to", which is the contiguous character string common to the character string indicated by reference numeral 1005 and the character string indicated by reference numeral 1006, as the second language corresponding part.

Note that, instead of or in addition to extracting the second language corresponding part based on the character strings as described above, the language identification server 100 may extract the second language corresponding part based on the timing at which each word is uttered in the second language general translation phrase and the timing at which each word is uttered in the second language specialized translation phrase (that is, based on the acoustic model). Specifically, between the utterance of the character string indicated by reference numeral 1005 and the utterance of the character string indicated by reference numeral 1006, the timing at which each word is uttered and the timing of silence coincide in the portion "How to get to", so the language identification server 100 extracts "How to get to" as the second language corresponding part. The same method may also be used for the first language corresponding part.
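The timing-based variant can be sketched as follows, assuming that each recognizer returns word segments with start and end times. The `Segment` type, the tolerance `tol`, and every timestamp below are hypothetical values chosen for illustration; the text does not state how timing agreement is judged.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    text: str
    start: float  # seconds from the beginning of the utterance
    end: float    # seconds from the beginning of the utterance


def timing_aligned_part(a: List[Segment], b: List[Segment],
                        tol: float = 0.05) -> List[Segment]:
    """Return the segments of `a` whose start and end times both match
    some segment of `b` within `tol` seconds."""
    matched = []
    for seg in a:
        for other in b:
            if (abs(seg.start - other.start) <= tol
                    and abs(seg.end - other.end) <= tol):
                matched.append(seg)
                break
    return matched


# Hypothetical word timings for the results at reference numerals 1005 and 1006.
specialized = [
    Segment("How", 0.00, 0.20), Segment("to", 0.20, 0.35),
    Segment("get", 0.35, 0.60), Segment("to", 0.60, 0.75),
    Segment("KASAI RINKAI KOEN", 0.80, 2.00),
]
general = [
    Segment("How", 0.00, 0.20), Segment("to", 0.20, 0.35),
    Segment("get", 0.35, 0.60), Segment("to", 0.60, 0.75),
    Segment("as", 0.80, 1.00), Segment("I", 1.00, 1.15),
    Segment("bring", 1.15, 1.50), Segment("eye", 1.50, 1.70),
    Segment("hole", 1.70, 2.00),
]

aligned = timing_aligned_part(specialized, general)
aligned_text = " ".join(seg.text for seg in aligned)
```

With these timings, only the four words of "How to get to" have matching boundaries in both results, so that portion is what the sketch returns.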

Then, the language identification server 100 calculates the reliability of the first language corresponding part indicated by reference numeral 1004 (first reliability) and the reliability of the second language corresponding part indicated by reference numeral 1007 (second reliability) (reference numeral 1008).

For example, when the language identification server 100 determines that the first reliability is higher than the second reliability, it recognizes that the first language (Japanese), which is the language of the first language corresponding part for which the first reliability was calculated, is the language of the voice indicated by reference numeral 1001.

<Database update processing>
Next, the update processing of each database performed in the language recognition system 1 will be described.

図９は、データベースを更新する処理（以下、データベース更新処理という）の一例を説明するフローチャートである。データベース更新処理は、例えば、言語識別サーバ１００に所定の操作入力が行われた際に開始される。 FIG. 9 is a flowchart illustrating an example of a process of updating a database (hereinafter referred to as a database update process). The database update process is started, for example, when a predetermined operation input is performed on the language identification server 100.

As shown in the figure, the language identification server 100 first accepts, from the user, input of words of the first language used in the business concerned (for example, sentences or words; hereinafter referred to as business data) (S41). The business data may also be words of the first language generated from voice acquired by the language identification server 100 (voice received by the service robot 200).

When the input of the business data is accepted, the language identification server 100 extracts the second vocabulary of the first language (technical terms, proper nouns uttered by foreigners, and the like) from the business data and stores the extracted second vocabulary in the first language specialized vocabulary information 126 (S42). The second vocabulary is extracted, for example, based on information stored in the language identification server 100 on the frequency of use of each word in the first language. For example, when the frequency of use of a word in the business data is at or below a predetermined value, or within a predetermined range, the language identification server 100 stores that business data in the first language specialized vocabulary information 126.
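The frequency-based extraction in S42 can be sketched as a simple threshold filter. The function name `extract_specialized`, the frequency table, and the threshold are hypothetical. Treating a word that is missing from the table as having frequency 0 is also an assumption, since the text does not say how unknown words are handled.

```python
from typing import Dict, Iterable, List


def extract_specialized(words: Iterable[str],
                        usage_frequency: Dict[str, float],
                        threshold: float) -> List[str]:
    """Return the words whose usage frequency is at or below `threshold`.

    Words absent from `usage_frequency` are treated as frequency 0.0
    (assumption). Duplicates are dropped; input order is preserved.
    """
    seen = set()
    result = []
    for word in words:
        if word in seen:
            continue
        seen.add(word)
        if usage_frequency.get(word, 0.0) <= threshold:
            result.append(word)
    return result


# Hypothetical business data and general usage frequencies.
business_words = ["公園", "葛西臨海公園", "行き方", "葛西臨海公園"]
frequency_table = {"公園": 0.004, "行き方": 0.002}

second_vocabulary = extract_specialized(business_words, frequency_table, 0.0001)
```

The "within a predetermined range" variant mentioned in the text would replace the single comparison with a lower and an upper bound.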

また、言語識別サーバ１００は、Ｓ４２で抽出した第２の語彙を、第４の語彙に変換し、変換した第４の語彙を、第２言語専門語彙情報１３６に記憶する（Ｓ４３）。以上で処理は終了する（Ｓ４４）。 In addition, the language identification server 100 converts the second vocabulary extracted in S42 into a fourth vocabulary, and stores the converted fourth vocabulary in the second language specialized vocabulary information 136 (S43). With that, the process ends (S44).

The database update process described above is performed not only for the first language but also, in the same manner, for the second language. That is, the language identification server 100 extracts the fourth vocabulary of the second language (technical terms, proper nouns uttered by foreigners, and the like) from the business data and stores the extracted fourth vocabulary in the second language specialized vocabulary information 136. The language identification server 100 also converts the extracted fourth vocabulary into the second vocabulary and stores the converted second vocabulary in the first language specialized vocabulary information 126.
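The two-way registration in S42 and S43 can be sketched as follows. The sets standing in for the first and second language specialized vocabulary information, the `register_both_ways` helper, and the lookup table used for conversion are all hypothetical; the text does not specify how a word is converted between the two languages.

```python
from typing import Callable, Set


def register_both_ways(term: str,
                       source_vocab: Set[str],
                       target_vocab: Set[str],
                       convert: Callable[[str], str]) -> None:
    """Store `term` in the vocabulary of its own language and its
    converted form in the vocabulary of the other language."""
    source_vocab.add(term)
    target_vocab.add(convert(term))


# Hypothetical conversion tables (a real system might transliterate or translate).
ja_to_en = {"葛西臨海公園": "KASAI RINKAI KOEN"}
en_to_ja = {"Tokyo Skytree": "東京スカイツリー"}

first_language_specialized: Set[str] = set()   # stands in for information 126
second_language_specialized: Set[str] = set()  # stands in for information 136

# First language word: stored in 126, converted form stored in 136.
register_both_ways("葛西臨海公園", first_language_specialized,
                   second_language_specialized, ja_to_en.__getitem__)
# Second language word: stored in 136, converted form stored in 126.
register_both_ways("Tokyo Skytree", second_language_specialized,
                   first_language_specialized, en_to_ja.__getitem__)
```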

<Database update screen>
Here, the screen displayed in the database update process will be described.

図１０は、データベース更新処理において表示される、各データベースを編集するための画面（以下、データベース更新画面という）の一例を示す図である。同図に示すように、データベース更新画面は、一般言語モデル編集画面５００、及び専門言語モデル編集画面６００を備える。 FIG. 10 is a diagram showing an example of a screen for editing each database (hereinafter referred to as a database update screen) displayed in the database update process. As shown in the figure, the database update screen includes a general language model edit screen 500 and a specialized language model edit screen 600.

The general language model edit screen 500 accepts input of the first vocabulary or the third vocabulary. The general language model edit screen 500 includes a first language input field 501, a second language input field 502, and a registration confirmation field 503. The first language input field 501 accepts input of a character string in the first language (first vocabulary). The second language input field 502 accepts input of a character string in the second language (third vocabulary). The registration confirmation field 503 accepts an instruction to store the character string entered in the first language input field 501 in the first language general vocabulary information 125 and to store the character string entered in the second language input field 502 in the second language general vocabulary information 135.

The specialized language model edit screen 600 accepts input of the second vocabulary or the fourth vocabulary. The specialized language model edit screen 600 includes a first language input field 601, a second language input field 602, and a registration confirmation field 603. The first language input field 601 accepts input of a character string in the first language (second vocabulary). The second language input field 602 accepts input of a character string in the second language (fourth vocabulary). The registration confirmation field 603 accepts an instruction to store the character string entered in the first language input field 601 in the first language specialized vocabulary information 126 and to store the character string entered in the second language input field 602 in the second language specialized vocabulary information 136.

<Example of sightseeing guidance by the service robot>
Here, an example of sightseeing guidance by the service robot 200 will be described.
FIG. 11 is a diagram illustrating an example of sightseeing guidance given to a foreign tourist by the service robot 200. As shown in the figure, words spoken by a foreign tourist in one of a plurality of languages (Japanese, English, Chinese, Korean; hereinafter, the language used is referred to as the input language) are input to the service robot 200, which is equipped with a 14-channel microphone (reference numeral 1101). Unneeded sound such as noise that is input together with the spoken words is removed by predetermined noise suppression processing, yielding one-channel audio (hereinafter referred to as the input voice).

サービスロボット２００から入力音声を受信した言語識別サーバ１００は、言語識別処理を行う。 Upon receiving the input voice from the service robot 200, the language identification server 100 performs language identification processing.

Specifically, the language identification server 100 calculates the reliability of the input voice as Japanese (first reliability) based on the first language general vocabulary information 125, the first language specialized vocabulary information 126, and the first language syllable information 127 (reference numeral 1102). The language identification server 100 also calculates the reliability of the input voice as English (second reliability) based on the second language general vocabulary information 135, the second language specialized vocabulary information 136, and the second language syllable information 137 (reference numeral 1103). Although not shown, the language identification server 100 calculates the reliability for the other languages (Chinese, Korean, and so on) in the same manner.

The language identification server 100 then compares the calculated reliabilities and identifies the language with the highest reliability (English in this example) as the input language of the input voice (reference numeral 1104). Further, by analyzing the meaning of the input voice in the identified language, the language identification server 100 recognizes that the input voice is a question about how to get to a given destination, and transmits the recognized result to the service robot 200 (reference numeral 1105).

サービスロボット２００は、受信した情報に基づき、上記質問に対応する発声、すなわち、所定の行き先への行き方を説明する文章の発声を行う（符号１１０６）。 Based on the received information, the service robot 200 utters a voice corresponding to the above question, that is, utters a sentence explaining how to reach a predetermined destination (reference numeral 1106).

As described above, the language recognition system 1 of the present embodiment extracts the first language corresponding part, in which the first language general translation phrase generated based on the first language general vocabulary information and the first language specialized translation phrase generated based on the first language specialized vocabulary information correspond to each other. It likewise extracts the second language corresponding part, in which the second language general translation phrase generated based on the second language general vocabulary information and the second language specialized translation phrase generated based on the second language specialized vocabulary information correspond to each other, and identifies the language of the voice based on the first language corresponding part and the second language corresponding part. That is, for each of the first language and the second language, the language is recognized based on the corresponding part between the words obtained using different vocabulary information, so that only the part suitable for language recognition is taken out and used to identify the language.

例えば、専門語彙の言語モデルによる音声の認識結果と、一般語彙の言語モデルによる音声の認識結果とが対応する部分に基づき言語を識別することができる。したがって、発言者が発する言葉に外来語が含まれていることによって音声の認識結果に誤りが生じるおそれがある場合であっても、発言者が発した言語を正確に認識することができる。 For example, the language can be identified based on the part where the speech recognition result by the language model of the specialized vocabulary corresponds to the speech recognition result by the language model of the general vocabulary. Therefore, even if there is a risk that an error may occur in the voice recognition result due to the inclusion of foreign words in the words spoken by the speaker, the language spoken by the speaker can be accurately recognized.

このように、本実施形態の言語認識システム１によれば、言語認識を正確に行うことができる。 As described above, according to the language recognition system 1 of the present embodiment, it is possible to accurately perform language recognition.

Further, according to the language recognition system 1 of the present embodiment, the first language corresponding part is extracted based on the commonality between the character string representing the first language general translation phrase and the character string representing the first language specialized translation phrase, and the second language corresponding part is extracted based on the commonality between the character string representing the second language general translation phrase and the character string representing the second language specialized translation phrase. The parts whose meaning is common can therefore be reliably extracted as the first language corresponding part and the second language corresponding part. As a result, language recognition can be performed accurately.

Further, according to the language recognition system 1 of the present embodiment, the first language corresponding part is extracted based on the commonality between the utterance timing of each word in the first language general translation phrase and the utterance timing of each word in the first language specialized translation phrase, and the second language corresponding part is extracted based on the commonality between the utterance timing of each word in the second language general translation phrase and the utterance timing of each word in the second language specialized translation phrase. Language recognition can therefore be performed based on the utterance timing and rhythm characteristic of each language. As a result, language recognition can be performed accurately.

また、本実施形態の言語認識システム１によれば、第１言語対応部分の、第１の言語としての確からしさを示す第１信頼度と、第２言語対応部分の、第２の言語としての確からしさを示す第２信頼度とに基づき、音声の言語を特定するので、言語認識を安定した精度で行うことができる。 Further, according to the language recognition system 1 of the present embodiment, the first reliability indicating the certainty of the first language corresponding part as the first language and the second reliability of the second language corresponding part as the second language. Since the language of the voice is specified based on the second reliability indicating the certainty, it is possible to perform the language recognition with stable accuracy.

また、本実施形態の言語認識システム１によれば、第１言語一般語彙情報、第１言語専門語彙情報、第２言語一般語彙情報、又は第２言語専門語彙情報の入力を受け付け、入力を受け付けた情報を出力するので、言語認識システム１のユーザは、自身による入力内容を確認しながら各情報を適宜に更新し、言語認識の精度を向上させることができる。 Further, according to the language recognition system 1 of the present embodiment, the input of the first language general vocabulary information, the first language specialized vocabulary information, the second language general vocabulary information, or the second language specialized vocabulary information is accepted, and the input is accepted. Since the information is output, the user of the language recognition system 1 can appropriately update each information while checking the input content by himself/herself to improve the accuracy of language recognition.

また、本実施形態の言語認識システム１によれば、発せられた音声に対応する、上記特定した言語の音声を出力するので、人との的確な対話が可能となる。例えば、ロボットによる外国人観光客との自動音声対話を的確に行うことができる。 Further, according to the language recognition system 1 of the present embodiment, since the voice of the specified language corresponding to the uttered voice is output, an accurate dialogue with a person is possible. For example, it is possible to appropriately perform automatic voice dialogue with a foreign tourist by a robot.

Further, according to the language recognition system 1 of the present embodiment, a part of the acquired words of the first and second languages is extracted based on information on the frequency of use of words, and the extracted parts are stored as the first and second language specialized vocabulary information, so that the first and second language specialized vocabulary information can be updated with appropriate content. As a result, the accuracy of language recognition can be improved.

また、本実施形態の言語認識システム１によれば、抽出した第１及び第２の言語の言葉の一部を第２及び第１の言語の言葉に変換し、変換した第２及び第１の言語の言葉を第２及び第１言語専門語彙情報として記憶するので、第１の言語及び第２の言語の語彙を同時に蓄積することができる。これにより、言語認識の精度を向上させることができる。 Further, according to the language recognition system 1 of the present embodiment, a part of the extracted words of the first and second languages is converted into the words of the second and first languages, and the converted second and first words are converted. Since the words of the language are stored as the second and first language specialized vocabulary information, the vocabularies of the first language and the second language can be stored at the same time. As a result, the accuracy of language recognition can be improved.

The above description of the embodiment is intended to facilitate understanding of the present invention and does not limit it. The present invention may be modified and improved without departing from its spirit, and the present invention includes equivalents thereof.

For example, the present embodiment describes the case in which the language recognition system 1 recognizes words spoken by tourists, but the speaker is not limited to a tourist and may be any person. For example, the service robot 200 may be a nursing robot or a caregiving robot, and the speaker may be a patient or a doctor in a hospital or a care facility.

Further, the language identification server 100 may extract the first language corresponding portion and the second language corresponding portion from vocabulary information, stored in advance, relating to the first language corresponding portion and the second language corresponding portion.

Further, in the present embodiment only the first language (Japanese) and the second language (English) are given as examples of target languages, but the types and number of languages are not limited to these. Other languages such as Chinese and Korean can also be made targets of language recognition.
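To illustrate how the identification might extend to any number of languages, the sketch below takes, for each language, a general translation phrase and a specialized translation phrase, extracts their corresponding portion from the commonality of the two character strings, and picks the language with the highest reliability. Several points are assumptions, not the method of the embodiment: the corresponding portion is approximated with matching blocks from Python's standard `difflib.SequenceMatcher`, and the reliability is a simple stand-in (length of the corresponding portion divided by the length of the longer phrase). The function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, Tuple


def corresponding_portion(general: str, specialized: str) -> str:
    """Return the characters of `general` that also appear, in order, in `specialized`."""
    matcher = SequenceMatcher(None, general, specialized, autojunk=False)
    return "".join(
        general[block.a : block.a + block.size]
        for block in matcher.get_matching_blocks()
    )


def reliability(general: str, specialized: str) -> float:
    """Stand-in reliability: share of the longer phrase covered by the common portion."""
    longest = max(len(general), len(specialized))
    if longest == 0:
        return 0.0
    return len(corresponding_portion(general, specialized)) / longest


def identify_language(results: Dict[str, Tuple[str, str]]) -> str:
    """Pick the language whose (general, specialized) phrases agree the most."""
    return max(results, key=lambda lang: reliability(*results[lang]))
```

With this structure, adding Chinese or Korean amounts to adding one more entry to `results`; the comparison itself does not change.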

1: language recognition system, 100: language identification server, 200: service robot, 210: voice acquisition unit, 1211: first language general translation unit, 1212: first language specialized translation unit, 122: first corresponding section calculation unit, 1311: second language general translation unit, 1312: second language specialized translation unit, 132: second corresponding section calculation unit, 133: second reliability calculation unit, 140: language identification unit, 150: response generation unit

Claims

A language recognition system that comprises a processor and a memory and recognizes the language of an uttered voice, the language recognition system comprising:
a voice acquisition unit that acquires a voice;
a first language general translation unit that generates a first language general translation phrase, which is words in a first language corresponding to the acquired voice, based on first language general vocabulary information, which is information about a first vocabulary in the first language;
a first language specialized translation unit that generates a first language specialized translation phrase, which is words in the first language corresponding to the acquired voice, based on first language specialized vocabulary information, which is information about a second vocabulary in the first language;
a first corresponding section calculation unit that extracts a first language corresponding portion, which is a portion in which the generated first language general translation phrase and the generated first language specialized translation phrase correspond to each other;
a second language general translation unit that generates a second language general translation phrase, which is words in a second language corresponding to the acquired voice, based on second language general vocabulary information, which is information about a third vocabulary in the second language;
a second language specialized translation unit that generates a second language specialized translation phrase, which is words in the second language corresponding to the acquired voice, based on second language specialized vocabulary information, which is information about a fourth vocabulary in the second language;
a second corresponding section calculation unit that extracts a second language corresponding portion, which is a portion in which the generated second language general translation phrase and the generated second language specialized translation phrase correspond to each other; and
a language identification unit that identifies the language of the acquired voice based on the extracted first language corresponding portion and the extracted second language corresponding portion.

The language recognition system according to claim 1, wherein
the first corresponding section calculation unit extracts the first language corresponding portion based on commonality between a character string representing the first language general translation phrase and a character string representing the first language specialized translation phrase, and
the second corresponding section calculation unit extracts the second language corresponding portion based on commonality between a character string representing the second language general translation phrase and a character string representing the second language specialized translation phrase.

The language recognition system according to claim 1, wherein
the first corresponding section calculation unit acquires the utterance timing of each word in the first language general translation phrase and the utterance timing of each word in the first language specialized translation phrase, and extracts the first language corresponding portion based on commonality of the acquired utterance timings, and
the second corresponding section calculation unit acquires the utterance timing of each word in the second language general translation phrase and the utterance timing of each word in the second language specialized translation phrase, and extracts the second language corresponding portion based on commonality of the acquired utterance timings.

The language recognition system according to claim 1, wherein the language identification unit identifies the language of the acquired voice based on a first reliability, which is an index indicating the certainty that the first language corresponding portion is in the first language, and a second reliability, which is an index indicating the certainty that the second language corresponding portion is in the second language.

The language recognition system according to claim 1, further comprising: a response unit that outputs a voice in the specified language corresponding to the acquired voice.

The language recognition system according to claim 1, further comprising a database update unit that accepts input of at least one of the first language general vocabulary information, the first language specialized vocabulary information, the second language general vocabulary information, and the second language specialized vocabulary information, and outputs the accepted information.

The language recognition system according to claim 1, further comprising a specialized language update unit that
acquires words of the first language, extracts some of the acquired words of the first language based on information about the frequency of use of words, and stores the extracted words of the first language as the first language specialized vocabulary information, and
acquires words of the second language, extracts some of the acquired words of the second language based on information about the frequency of use of words, and stores the extracted words of the second language as the second language specialized vocabulary information.

The language recognition system according to claim 7, further comprising a specialized language conversion unit that
converts some of the extracted words of the first language into words of the second language and stores the converted words of the second language as the second language specialized vocabulary information, and
converts some of the extracted words of the second language into words of the first language and stores the converted words of the first language as the first language specialized vocabulary information.

The language recognition system according to claim 1, further comprising:
a response unit that outputs a voice in the identified language corresponding to the acquired voice;
a database update unit that accepts input of at least one of the first language general vocabulary information, the first language specialized vocabulary information, the second language general vocabulary information, and the second language specialized vocabulary information, and outputs the accepted information;
a specialized language update unit that acquires words of the first language, extracts some of the acquired words of the first language based on information about the frequency of use of words, and stores the extracted words of the first language as the first language specialized vocabulary information, and that acquires words of the second language, extracts some of the acquired words of the second language based on information about the frequency of use of words, and stores the extracted words of the second language as the second language specialized vocabulary information; and
a specialized language conversion unit that converts some of the extracted words of the first language into words of the second language and stores the converted words of the second language as the second language specialized vocabulary information, and that converts some of the extracted words of the second language into words of the first language and stores the converted words of the first language as the first language specialized vocabulary information,
wherein the first corresponding section calculation unit extracts the first language corresponding portion based on commonality between a character string representing the first language general translation phrase and a character string representing the first language specialized translation phrase, and/or acquires the utterance timing of each word in the first language general translation phrase and the utterance timing of each word in the first language specialized translation phrase and extracts the first language corresponding portion based on commonality of the acquired utterance timings,
the second corresponding section calculation unit extracts the second language corresponding portion based on commonality between a character string representing the second language general translation phrase and a character string representing the second language specialized translation phrase, and/or acquires the utterance timing of each word in the second language general translation phrase and the utterance timing of each word in the second language specialized translation phrase and extracts the second language corresponding portion based on commonality of the acquired utterance timings, and
the language identification unit identifies the language of the acquired voice based on a first reliability, which is an index indicating the certainty that the first language corresponding portion is in the first language, and a second reliability, which is an index indicating the certainty that the second language corresponding portion is in the second language.

A language recognition method for recognizing the language of an uttered voice, wherein an information processing device including a processor and a memory executes:
a voice acquisition process of acquiring a voice;
a first language general translation process of generating a first language general translation phrase, which is words in a first language corresponding to the acquired voice, based on first language general vocabulary information, which is information about a first vocabulary in the first language;
a first language specialized translation process of generating a first language specialized translation phrase, which is words in the first language corresponding to the acquired voice, based on first language specialized vocabulary information, which is information about a second vocabulary in the first language;
a first corresponding section calculation process of extracting a first language corresponding portion, which is a portion in which the generated first language general translation phrase and the generated first language specialized translation phrase correspond to each other;
a second language general translation process of generating a second language general translation phrase, which is words in a second language corresponding to the acquired voice, based on second language general vocabulary information, which is information about a third vocabulary in the second language;
a second language specialized translation process of generating a second language specialized translation phrase, which is words in the second language corresponding to the acquired voice, based on second language specialized vocabulary information, which is information about a fourth vocabulary in the second language;
a second corresponding section calculation process of extracting a second language corresponding portion, which is a portion in which the generated second language general translation phrase and the generated second language specialized translation phrase correspond to each other; and
a language identification process of identifying the language of the acquired voice based on the extracted first language corresponding portion and the extracted second language corresponding portion.

The language recognition method according to claim 10, wherein
the first corresponding section calculation process includes extracting the first language corresponding portion based on commonality between a character string representing the first language general translation phrase and a character string representing the first language specialized translation phrase, and
the second corresponding section calculation process includes extracting the second language corresponding portion based on commonality between a character string representing the second language general translation phrase and a character string representing the second language specialized translation phrase.

The language recognition method according to claim 10, wherein the information processing device further executes a specialized language update process of
acquiring words of the first language, extracting some of the acquired words of the first language based on information about the frequency of use of words, and storing the extracted words of the first language as the first language specialized vocabulary information, and
acquiring words of the second language, extracting some of the acquired words of the second language based on information about the frequency of use of words, and storing the extracted words of the second language as the second language specialized vocabulary information.

A language recognition program for recognizing the language of an uttered voice, the program causing an information processing device including a processor and a memory to execute:
a voice acquisition process of acquiring a voice;
a first language general translation process of generating a first language general translation phrase, which is words in a first language corresponding to the acquired voice, based on first language general vocabulary information, which is information about a first vocabulary in the first language;
a first language specialized translation process of generating a first language specialized translation phrase, which is words in the first language corresponding to the acquired voice, based on first language specialized vocabulary information, which is information about a second vocabulary in the first language;
a first corresponding section calculation process of extracting a first language corresponding portion, which is a portion in which the generated first language general translation phrase and the generated first language specialized translation phrase correspond to each other;
a second language general translation process of generating a second language general translation phrase, which is words in a second language corresponding to the acquired voice, based on second language general vocabulary information, which is information about a third vocabulary in the second language;
a second language specialized translation process of generating a second language specialized translation phrase, which is words in the second language corresponding to the acquired voice, based on second language specialized vocabulary information, which is information about a fourth vocabulary in the second language;
a second corresponding section calculation process of extracting a second language corresponding portion, which is a portion in which the generated second language general translation phrase and the generated second language specialized translation phrase correspond to each other; and
a language identification process of identifying the language of the acquired voice based on the extracted first language corresponding portion and the extracted second language corresponding portion.

The language recognition program according to claim 13, wherein
the first corresponding section calculation process includes extracting the first language corresponding portion based on commonality between a character string representing the first language general translation phrase and a character string representing the first language specialized translation phrase, and
the second corresponding section calculation process includes extracting the second language corresponding portion based on commonality between a character string representing the second language general translation phrase and a character string representing the second language specialized translation phrase.

The language recognition program according to claim 13, further causing the information processing device to execute a specialized language update process of
acquiring words of the first language, extracting some of the acquired words of the first language based on information about the frequency of use of words, and storing the extracted words of the first language as the first language specialized vocabulary information, and
acquiring words of the second language, extracting some of the acquired words of the second language based on information about the frequency of use of words, and storing the extracted words of the second language as the second language specialized vocabulary information.