JP2004271895A

JP2004271895A - Multilingual speech recognition system and pronunciation learning system

Info

Publication number: JP2004271895A
Application number: JP2003062332A
Authority: JP
Inventors: Takeshi Hanazawa (花沢 健); Ryosuke Isotani (磯谷 亮輔)
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2003-03-07
Filing date: 2003-03-07
Publication date: 2004-09-30

Abstract

PROBLEM TO BE SOLVED: Conventional technology is costly because a recognition system and phoneme models must be created for each language, and the processing load during recognition is higher than for recognition of speech in a single language.

SOLUTION: A multilingual speech recognition system according to an embodiment of the present invention comprises a speech recognition unit 103 and a multilingual recognition dictionary 104, in which the pronunciation information for the words of each language to be recognized is written in the pronunciation notation of one common, specific language, together with its attribute. During recognition, words with the same meaning are treated as different words when their pronunciation information differs. For learning the pronunciation of a foreign language, the foreign-language pronunciation and the native-language pronunciation of each word are both written in the pronunciation notation of one language, and a recognition system for that language is used for recognition; the accuracy of the pronunciation is judged by which pronunciation information the spoken word is closer to.

COPYRIGHT: (C)2004, JPO&NCIPI

Description

[0001]
[Technical field of the invention]
The present invention relates to a speech recognition system capable of simultaneously recognizing speech in a plurality of languages, and a pronunciation learning system using the speech recognition system.
[0002]
[Prior art]
In recent years, there has been demand for speech recognition systems that can handle a plurality of languages rather than only a single language. A system that can recognize a plurality of languages can be applied, for example, to interpretation between different languages.
[0003]
As an example of a conventional multilingual speech recognition system, there is a speech recognition device that runs the speech recognition systems for the respective languages in parallel, compares the scores of their recognition results, and selects the result with the higher score (see Patent Document 1).
[0004]
As another example of a conventional multilingual speech recognition system, there is a speech recognition device that runs the speech recognition systems for the respective languages in parallel, normalizes and compares the recognition likelihoods obtained for each language, and outputs the language with the largest recognition likelihood as the recognition result (see Patent Document 2).
[0005]
Attempts have also been made to use such a speech recognition system for learning pronunciation in foreign languages. In that case, the system recognizes the content uttered by the learner, determines how close the uttered content is to the pronunciation of the foreign language in accordance with the recognition result, and feeds back the result to the learner.
[0006]
As an example of a conventional foreign language pronunciation learning system, there is a foreign language learning device that is based on a native-language speech recognition system, newly prepares speech standard patterns that include phoneme models of the foreign language, and evaluates the degree of matching with model speech (see Patent Document 3).
[0007]
[Patent Document 1]
JP 2001-188556 A (page 1, FIG. 1)
[Patent Document 2]
JP H10-116093 A (page 1, FIG. 1)
[Patent Document 3]
JP 2001-282098 A (page 1, FIG. 1)
[0008]
[Problems to be solved by the invention]
The problem with the conventional techniques is that a recognition system and phoneme models must be created for each language, which is costly, and that the processing load during recognition is larger than for recognition of speech in a single language.
[0009]
An object of the present invention is to provide a simple speech recognition system for recognizing pronunciations in a plurality of languages, and to provide a pronunciation learning system that can be easily realized.
[0010]
[Means for Solving the Problems]
The first multilingual speech recognition system of the present invention accepts input speech and comprises a recognition dictionary in which, for at least some words, pronunciation information for a plurality of languages is written in the pronunciation notation of one common, specific language and registered together with information indicating which language each piece of pronunciation information belongs to. The system searches the words registered in the recognition dictionary for the word or word string closest to the input speech, together with its pronunciation information, and changes its output according to at least which language the pronunciation information of the found word or word string belongs to.
[0011]
The second multilingual speech recognition system of the present invention accepts input speech and comprises a recognition dictionary in which, for at least some words, pronunciation information for a plurality of languages is written in the pronunciation notation of one of those languages and registered together with information indicating which language each piece of pronunciation information belongs to. The system searches the words registered in the recognition dictionary for the word or word string closest to the input speech, together with its pronunciation information, and changes its output according to at least which language the pronunciation information of the found word or word string belongs to.
[0012]
The third multilingual speech recognition system of the present invention accepts input speech and comprises a recognition dictionary in which, for at least some words, pronunciation information for a plurality of dialects is written in the pronunciation notation of one of those dialects and registered together with information indicating which dialect each piece of pronunciation information belongs to. The system searches the words registered in the recognition dictionary for the word or word string closest to the input speech, together with its pronunciation information, and outputs at least which dialect the pronunciation information of the found word or word string belongs to.
[0013]
The first pronunciation learning system of the present invention accepts input speech and comprises a recognition dictionary in which, for at least some words, native-language-style and foreign-language-style pronunciation information is written in the pronunciation notation of one of those languages and registered together with information indicating which language each piece of pronunciation information belongs to. The system searches the words registered in the recognition dictionary for the word or word string closest to the input speech, together with its reading, and outputs at least whether the reading of the found word or word string was native-language-style or foreign-language-style.
[0014]
The second pronunciation learning system of the present invention accepts input speech uttered in response to presented utterance content and comprises a recognition dictionary in which, for the words of the utterance content, native-language-style and foreign-language-style pronunciation information is written in the pronunciation notation of one of those languages and registered together with information indicating which language each piece of pronunciation information belongs to. The system searches the words registered in the recognition dictionary for the word or word string closest to the input speech, together with its pronunciation information, and outputs at least whether the pronunciation information of the found word or word string was native-language-style or foreign-language-style, thereby judging how much the utterance resembles foreign-language pronunciation.
[0015]
The third pronunciation learning system of the present invention accepts input speech uttered in response to presented utterance content and comprises a recognition dictionary in which, for the words of the utterance content, native-language-style and foreign-language-style pronunciation information is written in the pronunciation notation of one of those languages and registered together with information indicating which language each piece of pronunciation information belongs to. The system searches the words registered in the recognition dictionary for the word or word string closest to the input speech, together with its pronunciation information, and, as a result of the search, calculates and outputs a foreign-language-likeness score from at least the plausibility score for the native language and the plausibility score for the foreign language.
[0016]
The fourth pronunciation learning system of the present invention is any one of the first to third pronunciation learning systems, further comprising a pronunciation conversion unit that converts pronunciation information in the style of a given language, from the original pronunciation or spelling of that language or a combination of both, into pronunciation information of a different language. The pronunciation information output by the pronunciation conversion unit is registered in the recognition dictionary and thereby used as a search target.
[0017]
The fifth pronunciation learning system of the present invention is any one of the first to fourth pronunciation learning systems in which at least prosody is used as pronunciation information.
[0018]
The sixth pronunciation learning system of the present invention is a pronunciation learning system using speech recognition, characterized in that it outputs the result in the native language if the recognition result is the native language, and outputs the result in the foreign language if the recognition result is the foreign language.
[0019]
[Embodiments of the invention]
Next, embodiments of the present invention will be described in detail with reference to the drawings. FIG. 1 shows the overall configuration of a multilingual speech recognition system according to the present invention. The system comprises a microphone 102 for speech input, a speech recognition unit 103 that searches for the most probable result for the input speech, a multilingual recognition dictionary 104 in which pronunciation information for a plurality of languages is written, together with its attribute, for each word that the speech recognition unit 103 searches, and a result output unit 105 that outputs the search result of the speech recognition unit 103.
[0020]
FIG. 2 is an example of the multilingual recognition dictionary 104 of FIG. 1. For a given word A, pronunciation information Aa of language a and pronunciation information Ab of language b are held as separate entries. Each entry is also assigned its attribute.
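The entry layout of FIG. 2 can be sketched as follows. This is only an illustration: the class name `DictionaryEntry`, its field names, and the helper `entries_for` are assumptions introduced here, not names from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictionaryEntry:
    word: str           # the registered word, e.g. "A"
    pronunciation: str  # pronunciation written in the one shared notation
    language: str       # attribute: the language this pronunciation belongs to


# Word A is registered twice, once per language, as separate entries.
multilingual_dictionary = [
    DictionaryEntry(word="A", pronunciation="Aa", language="a"),
    DictionaryEntry(word="A", pronunciation="Ab", language="b"),
]


def entries_for(word, dictionary):
    """Return every entry registered for a word, one per pronunciation."""
    return [entry for entry in dictionary if entry.word == word]
```

Because each pronunciation is its own entry, the recognizer can treat the two pronunciations of A as two different search candidates and read the language back from the attribute of whichever one it finds.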
[0021]
Next, an example of the operation of the multilingual speech recognition system according to the present invention will be described with reference to the flowchart of FIG. 3.
[0022]
Processing is started for input speech 101 (step 201), and speech is input from the microphone 102 (step 202). The speech recognition unit 103 searches for the most probable word entry (step 203). The attribute of the found word entry is then referenced (step 204), and according to that attribute the result output unit 105 produces either the output for language a (step 205) or the output for language b (step 206).
[0023]
If the attributes of the multilingual recognition dictionary include languages other than language a and language b, so that there are n language identifiers, the corresponding n-way language output is performed from step 204 onward.
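The control flow of steps 203 to 206, including the n-language case, might be sketched as below. The function `recognize`, the `search` callable standing in for the speech recognition unit 103, and the `outputs` table are all hypothetical; the patent does not prescribe any of them.

```python
def recognize(audio, search, outputs):
    """Run steps 203 to 206 for one utterance.

    search:  hypothetical callable standing in for speech recognition
             unit 103; returns the best-matching dictionary entry as a dict.
    outputs: maps each language attribute to an output function, so adding
             an n-th language only means adding one more key.
    """
    entry = search(audio)            # step 203: find the most probable entry
    language = entry["language"]     # step 204: reference its attribute
    return outputs[language](entry)  # steps 205, 206, ...: per-language output


outputs = {
    "a": lambda entry: f"[language a] {entry['word']}",
    "b": lambda entry: f"[language b] {entry['word']}",
}


def stub_search(audio):
    # Stand-in that pretends the recognizer matched word A via its
    # language-b pronunciation Ab.
    return {"word": "A", "pronunciation": "Ab", "language": "b"}


result = recognize(b"raw audio bytes", stub_search, outputs)
```

Keeping the per-language behavior in a lookup table rather than an if/else chain is one way to mirror the statement that the branch at step 204 extends to n languages.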
[0024]
Next, a second embodiment of the present invention will be described. In the second embodiment, the multilingual recognition dictionary holds, for each foreign-language word, a plurality of pieces of pronunciation information: pronunciation in the style of that foreign language and pronunciation in the style of the native language.
[0025]
Next, an example of the operation of the second embodiment of the present invention will be described with reference to the flowchart of FIG. 4. Processing is started for input speech 101 (step 301), speech is input from the microphone 102 (step 302), and the speech recognition unit 103 searches for the most probable word and its pronunciation information (step 303).
[0026]
At this time, the foreign-language-likeness of the input speech is determined by the attribute of the pronunciation information through which the word was found. According to that attribute (step 304), the result output unit 105 produces either the native-language-attribute output (step 305) or the foreign-language-attribute output (step 306).
[0027]
[Examples]
Next, the first and second embodiments will be described using a specific example. FIG. 5 is an example of the multilingual recognition dictionary of the present invention. As shown in FIG. 5, the English-attribute entry for coffee is given the readings "kafi" (かふぃ) and "kahi" (かひ), which approximate the original English pronunciation in Japanese pronunciation notation, and the Japanese-attribute entry for coffee is given the reading "kōhī" (こーひー). This makes coffee recognizable in both its English and its Japanese pronunciation.
[0028]
When the system is used as a foreign language pronunciation learning system, the pronunciation is judged to be accurate as English if the English-attribute coffee is output as the result for input speech of coffee.
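The coffee example and the accuracy judgment can be illustrated with a small sketch. Everything here is an assumption for illustration: the readings are romanized ("koohii" for こーひー), the input is already a romanized string rather than audio, and string similarity from Python's `difflib` stands in for the acoustic search that the speech recognition unit 103 would actually perform.

```python
from difflib import SequenceMatcher

# FIG. 5 as data: (reading in Japanese notation, romanized here; attribute)
COFFEE_ENTRIES = [
    ("kafi", "english"),     # かふぃ
    ("kahi", "english"),     # かひ
    ("koohii", "japanese"),  # こーひー
]


def closest_entry(heard, entries):
    """Stand-in for the search: pick the reading most similar to the input."""
    return max(entries, key=lambda e: SequenceMatcher(None, heard, e[0]).ratio())


def judge_pronunciation(heard, entries=COFFEE_ENTRIES, target="english"):
    """Report which reading matched and whether its attribute is the target."""
    reading, attribute = closest_entry(heard, entries)
    return {
        "matched_reading": reading,
        "attribute": attribute,
        "accurate": attribute == target,
    }
```

For instance, `judge_pronunciation("koohii")` matches the Japanese-attribute reading, so the utterance would be reported as not yet accurate as English, while `judge_pronunciation("kafi")` matches an English-attribute reading.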
[0029]
For result output, there are several methods: outputting the attribute as is; outputting "kōhī" (こーひー) for the Japanese attribute and "coffee" for the English attribute; and outputting these as speech as well as text.
[0030]
Alternatively, the speech recognition unit 103 can calculate an English-likeness score from the plausibility of the Japanese-attribute "kōhī" (こーひー) and the plausibility of the English-attribute "kafi" (かふぃ), and present that score to the learner.
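The patent does not state how the two plausibility values are combined. One possible choice, shown here purely as an assumption, is to treat them as log-likelihoods and report the share of the foreign-language hypothesis, which gives a score between 0 and 1. The function name `foreignness_score` is hypothetical.

```python
import math


def foreignness_score(native_loglik, foreign_loglik):
    """Map two log-likelihoods to a score in [0, 1].

    Assumed formula: exp(foreign) / (exp(foreign) + exp(native)),
    computed from the difference so that large magnitudes do not overflow.
    A value of 0.5 means both readings are equally plausible.
    """
    diff = foreign_loglik - native_loglik
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)
```

With this choice, an utterance whose English-attribute reading is more plausible than its Japanese-attribute reading scores above 0.5, and the score moves toward 1 as the gap widens.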
[0031]
Furthermore, the pronunciation information need not be limited to readings such as those shown in FIG. 5; prosody such as accent and intonation can also be used to judge English-likeness and Japanese-likeness. This method is effective for distinguishing dialects within the same language.
[0032]
FIG. 6 shows an example in which a pronunciation conversion unit 402 is used to create the multilingual recognition dictionary 104 of FIG. 1.
[0033]
The original pronunciation information 401 of a given language is input to the pronunciation conversion unit 402, which converts it into pronunciation information of another language; in this way the multilingual recognition dictionary 104 is generated automatically.
[0034]
For example, suppose the English word coffee originally carries the English pronunciation information /k/ao/f/iy/. If the pronunciation conversion unit is given rules that generate "ka" (か) from /k/ao/ and "fi" (ふぃ) or "hi" (ひ) from /f/iy/, then "kafi" (かふぃ) or "kahi" (かひ) is obtained as pronunciation information for the word coffee.
[0035]
Similarly, if the pronunciation conversion unit is given rules that generate "kō" (こー) from the spelling "co" and "hī" (ひー) from the spelling "ffee", then "kōhī" (こーひー) is also obtained as pronunciation information for the word coffee.
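The two kinds of conversion rule described for the pronunciation conversion unit 402 can be sketched with plain lookup tables. The rule contents come from the coffee example; the table names, the function `convert`, and the greedy longest-match strategy are assumptions made for this illustration.

```python
from itertools import product

# Rules keyed by phoneme sequences (from the original English pronunciation).
PHONEME_RULES = {
    ("k", "ao"): ["か"],
    ("f", "iy"): ["ふぃ", "ひ"],
}

# Rules keyed by spelling fragments (from the written form).
SPELLING_RULES = {
    "co": ["こー"],
    "ffee": ["ひー"],
}


def convert(units, rules):
    """Convert a phoneme tuple or a spelling string into candidate readings.

    Scans left to right, always taking the longest span that has a rule,
    and returns every combination of the alternatives the rules offer.
    """
    options = []
    i = 0
    while i < len(units):
        for length in range(len(units) - i, 0, -1):
            key = units[i:i + length]
            if key in rules:
                options.append(rules[key])
                i += length
                break
        else:
            raise ValueError(f"no rule covers {units[i]!r}")
    return ["".join(parts) for parts in product(*options)]


english_style = convert(("k", "ao", "f", "iy"), PHONEME_RULES)
japanese_style = convert("coffee", SPELLING_RULES)
```

The readings in `english_style` would be registered with the English attribute and the reading in `japanese_style` with the Japanese attribute, which reproduces the dictionary contents described for FIG. 5.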
[0036]
[Effects of the invention]
The first effect is that a monolingual speech recognition system suffices to recognize speech in a plurality of languages. Because there is no need to prepare multiple speech recognition systems or speech standard patterns covering multiple languages, the system is simple to realize, and the increase in processing load during recognition is also kept small.
[0037]
The second effect is that, in pronunciation learning for foreign languages and the like, there is no need to prepare multiple speech recognition systems or speech standard patterns covering multiple languages, so the system is simple to realize.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of first and second embodiments of the present invention.
FIG. 2 is a diagram showing a specific example of the multilingual recognition dictionary 104 of FIG. 1.
FIG. 3 is a flowchart showing the operation of the first embodiment.
FIG. 4 is a flowchart showing the operation of the second embodiment.
FIG. 5 is a diagram showing a specific example of the first and second embodiments.
FIG. 6 is a diagram showing a specific example of the first and second embodiments.
[Explanation of symbols]
101 Input speech
102 Microphone
103 Speech recognition unit
104 Multilingual recognition dictionary
105 Result output unit
401 Original pronunciation information of each language
402 Pronunciation conversion unit
S201 Start of recognition processing
S202 Speech input
S203 Search
S204 Reference to the attribute of the search result
S205 Output of language a
S206 Output of language b
S207 End of recognition processing
S301 Start of recognition processing
S302 Speech input
S303 Search
S304 Reference to the attribute of the search result
S305 Output of native language attribute
S306 Output of foreign language attribute
S307 End of recognition processing

Claims

1. A multilingual speech recognition system that accepts input speech, comprising a recognition dictionary in which, for at least some words, pronunciation information for a plurality of languages is written in the pronunciation notation of one common, specific language and registered together with information indicating which language the pronunciation information belongs to, wherein the system searches the words registered in the recognition dictionary for the word or word string closest to the input speech and its pronunciation information, and changes its output according to at least which language the pronunciation information of the found word or word string belongs to.

2. A multilingual speech recognition system that accepts input speech, comprising a recognition dictionary in which, for at least some words, pronunciation information for a plurality of languages is written in the pronunciation notation of one of those languages and registered together with information indicating which language the pronunciation information belongs to, wherein the system searches the words registered in the recognition dictionary for the word or word string closest to the input speech and its pronunciation information, and changes its output according to at least which language the pronunciation information of the found word or word string belongs to.

3. A multilingual speech recognition system that accepts input speech, comprising a recognition dictionary in which, for at least some words, pronunciation information for a plurality of dialects is written in the pronunciation notation of one of those dialects and registered together with information indicating which dialect the pronunciation information belongs to, wherein the system searches the words registered in the recognition dictionary for the word or word string closest to the input speech and its pronunciation information, and outputs at least which dialect the pronunciation information of the found word or word string belongs to.

4. A pronunciation learning system that accepts input speech, comprising a recognition dictionary in which, for at least some words, native-language-style and foreign-language-style pronunciation information is written in the pronunciation notation of one of those languages and registered together with information indicating which language the pronunciation information belongs to, wherein the system searches the words registered in the recognition dictionary for the word or word string closest to the input speech and its reading, and outputs at least whether the reading of the found word or word string was native-language-style or foreign-language-style.

5. A pronunciation learning system that accepts input speech uttered in response to presented utterance content, comprising a recognition dictionary in which, for the words of the utterance content, native-language-style and foreign-language-style pronunciation information is written in the pronunciation notation of one of those languages and registered together with information indicating which language the pronunciation information belongs to, wherein the system searches the words registered in the recognition dictionary for the word or word string closest to the input speech and its pronunciation information, and outputs at least whether the pronunciation information of the found word or word string was native-language-style or foreign-language-style, thereby judging how much the utterance resembles foreign-language pronunciation.

6. A pronunciation learning system that accepts input speech uttered in response to presented utterance content, comprising a recognition dictionary in which, for the words of the utterance content, native-language-style and foreign-language-style pronunciation information is written in the pronunciation notation of one of those languages and registered together with information indicating which language the pronunciation information belongs to, wherein the system searches the words registered in the recognition dictionary for the word or word string closest to the input speech and its pronunciation information, and, as a result of the search, calculates and outputs a foreign-language-likeness score from at least the plausibility score for the native language and the plausibility score for the foreign language.

7. The pronunciation learning system according to any one of claims 4 to 6, further comprising a pronunciation conversion unit that converts pronunciation information in the style of a given language, from the original pronunciation or spelling of that language or a combination of both, into pronunciation information of a different language, wherein the pronunciation information output by the pronunciation conversion unit is registered in the recognition dictionary and thereby used as a search target.

8. The pronunciation learning system according to any one of claims 4 to 7, wherein at least prosody is used as pronunciation information.

9. A pronunciation learning system using speech recognition, wherein the result is output in the native language if the recognition result is the native language, and is output in the foreign language if the recognition result is the foreign language.