JP2007193350A

JP2007193350A - Speech correction apparatus, speech correction method and recording medium

Info

Publication number: JP2007193350A
Application number: JP2007047403A
Authority: JP
Inventors: Ayako Minematsu; 彩子峰松
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 1997-11-17
Filing date: 2007-02-27
Publication date: 2007-08-02

Abstract

PROBLEM TO BE SOLVED: To provide a speech recognition apparatus and method that correctly recognize English words from non-native English pronunciation.
SOLUTION: A vector data generating part 180 and a label generating part 182 process the speech data of one sentence of English spoken by a Japanese speaker and convert it into a label string. A candidate word generating part 184 associates the label string of the sentence with first candidate words, each consisting of one or more English words. An analogous word adding part 186 uses a word database 160 to search for English words whose pronunciation is analogous to that of a first candidate word (for example, the analogous word "lead" for the first candidate word "read", since Japanese speakers find it difficult to distinguish "l" from "r" in pronunciation) and adds the analogous words found to the first candidate words to form second candidate words. A selection part 188 selects one of the second candidate words as the final recognition result in response to the user's operation, and concatenates the selected words into English text data for output.
COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to a speech identification device and method that identify each word of a specific language contained in speech in that language uttered by a speaker whose native language is a different language; for example, a speech identification device and method that identify English speech spoken by a Japanese speaker and output data (text data) representing the sequence of English words contained in the speech.

The present invention also relates to a pronunciation correction device and method that use data obtained during the processing of the above speech identification device and method (candidate word data) to teach the speaker the correct pronunciation and correct his or her pronunciation.

Speech identification devices that identify each word contained in speech uttered by an unspecified speaker and output the result as text data are in use. For example, Japanese Patent Laid-Open Nos. 06-12483, 08-50493 and 09-22297 (References 1 to 3) disclose such speech identification methods.

For example, when English text data is created from English speech pronounced by a Japanese speaker using an English speech identification device that applies these conventional methods to identify English words, the identification rate drops. This is because English contains sounds that do not exist in Japanese (such as "th") and sounds that are hard to distinguish in Japanese (such as "l" and "r"). Japanese speakers generally cannot pronounce such sounds accurately, and the English speech identification device maps the inaccurate pronunciation directly to a word. For example, even if a Japanese speaker intends to say "rice", an English speech identification device may identify the utterance as "lice" or "louse".

This kind of problem can also arise in many other situations: for example, in the reverse case where an American whose native language is English uses a speech identification device that creates Japanese text from Japanese speech, where a British speaker of British English uses a speech identification device tuned for American English, or where a particular person has, for some reason, become unable to pronounce words accurately. However, none of the speech identification methods disclosed in the above documents can solve this problem.

If the speaker's English pronunciation improves and approaches that of a native speaker, the identification rate of the speech identification device naturally improves; moreover, it is desirable for the speaker to become better at spoken English.

For example, Japanese Patent Laid-Open No. 4-54956 discloses a learning device that identifies a speaker's English speech and lets the speaker check the identified English speech (Reference 4). Japanese Patent Laid-Open No. 60-123884 discloses an English learning machine that uses a speech synthesis LSI to let the speaker hear the speech to be learned (Reference 5). Many other documents, including JP-B-44-7162, JP-A-7-117807, JP-A-61-18068, JP-A-8-27588, JP-A-62-111278, JP-A-62-299985, JP-A-3-75869, JP-B-6-27971, JP-B-8-12535 and JP-A-3-226785, also disclose learning devices for learning the pronunciation of foreign languages (References 6 to 15).

However, even with the learning devices disclosed in these documents, the speaker has to compare the presented pronunciation with his or her own, or cannot tell what is wrong with his or her pronunciation, so a sufficient learning effect is not always obtained.

Reference 1: JP-A-06-12483
Reference 2: JP-A-08-50493
Reference 3: JP-A-09-22297
Reference 4: JP-A-4-54956
Reference 5: JP-A-60-123884
Reference 6: JP-B-44-7162
Reference 7: JP-A-7-117807
Reference 8: JP-A-61-18068
Reference 9: JP-A-8-27588
Reference 10: JP-A-62-111278
Reference 11: JP-A-62-299985
Reference 12: JP-A-3-75869
Reference 13: JP-B-6-27971
Reference 14: JP-B-8-12535
Reference 15: JP-A-3-226785

The present invention has been made in view of the above problems of the prior art. An object of the present invention is to provide a speech identification device and method that can identify each word contained in speech in a given language uttered by a speaker whose native language is not that language (a non-native speaker), replace it with the word of that language the speaker intended, and thereby create accurate text data.

Another object of the present invention is to provide a speech identification device and method that, even when the pronunciation of the same language varies (for example, because it is spoken in different regions), can convert speech by a speaker from any region into the words the speaker intended and create accurate text data. A further object is to provide a speech identification device and method that compensate for individual differences in pronunciation and consistently maintain a high identification rate.

A further object of the present invention is to provide a pronunciation correction device and method that use data obtained during the processing of the above speech identification device and method to point out problems in the speaker's pronunciation, let the speaker learn native-speaker pronunciation, and correct the speaker's pronunciation. Yet another object is to provide a pronunciation correction device and method that automatically compare the speaker's pronunciation with the correct pronunciation to point out errors, and that present detailed information on how the speaker should correct his or her pronunciation.

[First speech identification device]
To achieve the above objects, a first speech identification device according to the present invention identifies each word from speech data representing one or more words contained in spoken speech, and comprises: candidate word association means for associating, with the speech data of each of the one or more words, one or more sets of candidates (candidate words), each a combination of one or more words obtained by identifying one or more items of that speech data; similar word association means for associating, with each candidate word associated with the speech data, zero or more sets of combinations of one or more words (similar words) that may correspond to the pronunciation of that candidate word; and speech data identification means for selecting one of the candidate words associated with the speech data of each word and the similar words associated with each candidate word, and taking it as the identification result for the speech data of that word.

Preferably, the speech data represents one or more words contained in speech in a predetermined language; the candidate word association means associates, with the speech data of each of the one or more words, one or more sets of candidate words in the predetermined language obtained by identifying one or more items of that speech data; the similar word association means associates, with each candidate word associated with the speech data, zero or more sets of similar words in the predetermined language that may correspond to the pronunciation of that candidate word; and the speech data identification means selects one of the candidate words and the similar words associated with them as the identification result for the speech data of each of the one or more words.

Preferably, the speech in the predetermined language is uttered by a speaker who mainly speaks a language other than the predetermined language, and the device further comprises similar word storage means. For each of one or more words of the predetermined language, this means stores in advance, as the similar words of that word, zero or more sets of words of the predetermined language that may correspond to the speech data produced when such a speaker pronounces that word. The similar word association means then associates, with each candidate word, the zero or more sets of similar words stored in advance for that word.

Preferably, the candidate word association means attaches, to each candidate word associated with the speech data, probability data indicating the likelihood of that candidate word, and the speech data identification means selects only candidate words whose probability data falls within a predetermined range as the identification result for the speech data of the word.

Preferably, the candidate word association means attaches, to each candidate word associated with the speech data, error information indicating the pronunciation error corresponding to each of its similar words.
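Purely as an illustration of the two preferred features above, the Python sketch below attaches a probability to each candidate word and an error label to each of its similar words, then keeps only the candidates whose probability falls within a given range. The `Candidate` record, the probabilities, the range and the error strings are all hypothetical; the document does not specify how the probability is computed or how error information is encoded.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """A candidate word (or word sequence) for one item of speech data.

    Hypothetical record: the document only says that probability data and
    per-similar-word error information are attached to candidate words.
    """
    words: str
    probability: float
    # Maps each similar word to an (illustrative) error label describing the
    # pronunciation mistake that would make it look like this candidate.
    similar: dict[str, str] = field(default_factory=dict)


def select_in_range(candidates, low, high):
    """Keep only candidates whose probability lies within [low, high]."""
    return [c for c in candidates if low <= c.probability <= high]


candidates = [
    Candidate("lead", 0.62, {"read": "l/r confusion"}),
    Candidate("led", 0.30),
    Candidate("lid", 0.05),
]

kept = select_in_range(candidates, 0.1, 1.0)
for c in kept:
    print(c.words, c.probability, c.similar)
```

Here "lid" is discarded because its probability is below the assumed lower bound, while "lead" keeps its error information for later use.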

[Operation of the speech identification device]
The speech identification device according to the present invention identifies the words contained in speech in a specific language (English is used as the example in this section) uttered by a speaker whose native, mainly spoken language is another language (likewise, Japanese), replaces them with English words, and creates text data.

In the speech identification device according to the present invention, English speech by a Japanese speaker that is input through a microphone or the like and converted into digital data (speech data) is, for example, converted into vector data quantized by sound feature (pitch, intensity, intonation and so on), then further converted into sound data called labels, which resemble phonetic symbols, and output to the candidate word association means.
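The conversion of speech data into labels described above can be pictured as nearest-neighbour vector quantization against a codebook. The sketch below is a minimal illustration under assumed inputs: a hypothetical two-element feature vector per frame (pitch in Hz and a normalized intensity) and a hand-made three-entry codebook with made-up label names. A real system would learn the codebook from data and use richer, normalized acoustic features.

```python
import math

# Hypothetical codebook: label -> prototype feature vector (pitch Hz, intensity).
# Features are left unscaled for brevity, so pitch dominates the distance.
CODEBOOK = {
    "L1": (100.0, 0.2),
    "L2": (180.0, 0.5),
    "L3": (250.0, 0.9),
}


def nearest_label(vector):
    """Return the codebook label whose prototype is closest (Euclidean)."""
    return min(CODEBOOK, key=lambda lab: math.dist(vector, CODEBOOK[lab]))


def to_label_string(frames):
    """Quantize a sequence of per-frame feature vectors into labels."""
    return [nearest_label(f) for f in frames]


frames = [(105.0, 0.25), (175.0, 0.45), (240.0, 0.85), (110.0, 0.1)]
print(to_label_string(frames))  # ['L1', 'L2', 'L3', 'L1']
```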

[Candidate word association means]
The candidate word association means processes the label-converted speech data one word, or one sequence of several words, at a time, and associates the speech data with single English words or combinations of several English words (collectively called candidate words) as candidate identification results for that speech data.
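The document does not specify at this point how a label string is matched to words. As one hedged illustration, the sketch below scores every entry of a small, hypothetical lexicon of label sequences by Levenshtein distance to the observed labels and returns the closest entries as candidate words. The phoneme-style label names, the lexicon and the parameter `n` are assumptions, and edit distance is a stand-in for whatever scoring the device actually uses.

```python
# Hypothetical lexicon: word -> expected label (phoneme-like) sequence.
LEXICON = {
    "lead": ["L", "IY", "D"],
    "led": ["L", "EH", "D"],
    "load": ["L", "OW", "D"],
    "read": ["R", "IY", "D"],
}


def edit_distance(a, b):
    """Levenshtein distance between two label sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute x -> y
        prev = cur
    return prev[-1]


def candidate_words(labels, n=2):
    """Return the n lexicon entries closest to the observed label string.

    Ties are broken alphabetically so the result is deterministic.
    """
    scored = sorted((edit_distance(labels, seq), word)
                    for word, seq in LEXICON.items())
    return [word for _, word in scored[:n]]


# A Japanese speaker intending "read" but producing an l-like sound:
print(candidate_words(["L", "IY", "D"]))  # ['lead', 'led']
```

Note that the intended word "read" is not among the candidates here; recovering it is the job of the similar word association described below.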

[Similar word storage means]
The similar word storage means stores, for example, lookup dictionary data that associates in advance single English words or combinations of English words that could become candidate words with single English words or combinations of English words (collectively called similar words) that differ from accurate English pronunciation but could correspond to the speech data when a Japanese speaker pronounces English. For example, to handle inaccurate English pronunciation by Japanese speakers, this dictionary data associates the single English word "lead", which could be a candidate word, with the similar word "read" (reflecting the difficulty Japanese speakers have distinguishing "l" from "r"; in general, Japanese speakers find the "r" sound difficult). Some English words have no similar words; in that case, no similar words are associated with them in the dictionary.
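A minimal sketch of this lookup dictionary, assuming it is held as a plain Python mapping from a candidate word to its similar words. The entries are taken from the examples in this document ("lead" with "read", and "lice"/"louse" with "rice"); a word with no stored entry simply has no similar words.

```python
# Similar-word dictionary: candidate word -> words a Japanese speaker may have
# intended when the speech was identified as that candidate.
SIMILAR_WORDS = {
    "lead": ["read"],
    "lice": ["rice"],
    "louse": ["rice"],
}


def similar_words(candidate):
    """Return the similar words for a candidate, or [] if none are stored."""
    return SIMILAR_WORDS.get(candidate, [])


print(similar_words("lead"))   # ['read']
print(similar_words("table"))  # []
```

Leaving words without similar words out of the mapping, rather than storing empty lists, mirrors the statement that no similar words are associated with such words.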

[Similar word association means]
The similar word association means searches the dictionary data stored by the similar word storage means, reads out the similar words associated with each candidate word, and associates them with that candidate word. In the above example, the English word "lead" and the similar word "read" are associated with the speech data corresponding to the English word "read" as pronounced by a Japanese speaker.
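The association step can be sketched as follows, assuming the similar-word dictionary is a Python mapping. Each candidate word is paired with its (possibly empty) list of similar words, and the pairs are then flattened into the combined choice list that the abstract calls second candidate words. The function names are illustrative, and the single dictionary entry follows the "lead"/"read" example.

```python
# Similar-word dictionary; the single entry follows the "lead"/"read" example.
SIMILAR_WORDS = {"lead": ["read"]}


def associate(candidates, table):
    """Pair each candidate word with its similar words (zero or more)."""
    return {cand: list(table.get(cand, [])) for cand in candidates}


def second_candidates(associations):
    """Flatten candidates and their similar words into one choice list."""
    choices = []
    for cand, sims in associations.items():
        choices.append(cand)
        choices.extend(sims)
    return choices


pairs = associate(["lead", "led"], SIMILAR_WORDS)
print(pairs)                     # {'lead': ['read'], 'led': []}
print(second_candidates(pairs))  # ['lead', 'read', 'led']
```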

[Speech data identification means]
The speech data identification means selects one of the candidate words and similar words associated with the speech data, for example based on parsing of the English word string identified so far or in response to a selection operation by the user, and takes it as the identification result.
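The selection step might look like the sketch below. An explicit user choice always wins; otherwise the choice that most often follows the previously identified word is taken. The bigram counts are invented and merely stand in for the parsing-based check the document mentions; they are not the document's method.

```python
from collections import Counter

# Hypothetical bigram counts standing in for the parsing-based check; a real
# system would use a parser or a language model trained on text.
BIGRAMS = Counter({("to", "read"): 5, ("to", "lead"): 2, ("will", "lead"): 4})


def select(choices, previous_word, user_choice=None):
    """Pick the identification result for one item of speech data.

    If the user selected a word explicitly, that choice wins; otherwise the
    choice that most often follows the previously identified word is used.
    """
    if user_choice is not None:
        if user_choice not in choices:
            raise ValueError(f"{user_choice!r} is not one of {choices}")
        return user_choice
    # Counter returns 0 for unseen bigrams; max keeps the first choice on ties.
    return max(choices, key=lambda w: BIGRAMS[(previous_word, w)])


choices = ["lead", "read"]
print(select(choices, "to"))                      # read
print(select(choices, "will"))                    # lead
print(select(choices, "to", user_choice="lead"))  # lead
```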

The components of the speech identification device according to the present invention successively apply the processing described so far to the speech data as it is input, identify the English words contained in the speech data, and create text data by concatenating the identified English words.

Although the description so far uses English speech by a Japanese speaker as the example, the speech identification device according to the present invention can also identify both English speech with British pronunciation and English speech with American pronunciation and generate text data from them, given modifications such as having the similar word storage means store dictionary data that associates candidate words with similar words that could correspond to the speech data when the pronunciation is British rather than American.

[Scope of the predetermined language]
Thus, the scope of the "predetermined language" is defined as the range within which the candidate word association means can associate speech data with words at a sufficient identification rate. Therefore, dialects of what is normally regarded as a single language, whose pronunciations have diverged through regional isolation so that candidate word association means tuned to one of them alone cannot achieve a sufficient identification rate for the others (for example, the English of the United States, the United Kingdom, Australia and South Africa, or the Spanish of Spain and the countries of Central and South America), do not fall within the same scope of the "predetermined language". The same applies when, for some reason, a particular person's pronunciation has become unclear and candidate word association means tuned to that person's native (mainly spoken) language alone can no longer achieve a sufficient identification rate.

[Second speech identification device]
A second speech identification device according to the present invention identifies each of one or more words of a predetermined language from speech data representing one or more words of the predetermined language contained in speech in that language uttered by a speaker who mainly speaks another language, and comprises: word association means for associating, with the speech data of each of the one or more words of the predetermined language, words of the predetermined language obtained by identifying one or more items of that speech data and/or one or more words of the predetermined language that the speaker may have spoken; and speech data identification means for selecting one of the words associated with the speech data of each of the one or more words as the identification result for that speech data.

[Speech identification method]
A first speech identification method according to the present invention identifies each word from speech data representing one or more words contained in spoken speech, and comprises the steps of: associating, with the speech data of each of the one or more words, one or more sets of candidates (candidate words), each a combination of one or more words obtained by identifying one or more items of that speech data; associating, with each candidate word associated with the speech data, zero or more sets of combinations of one or more words (similar words) that may correspond to the pronunciation of that candidate word; and selecting one of the candidate words associated with the speech data of each word and the similar words associated with each candidate word as the identification result for the speech data of that word.

A second speech identification method according to the present invention identifies each of one or more words of a predetermined language from speech data representing one or more words of the predetermined language contained in speech in that language uttered by a speaker who mainly speaks another language, and comprises the steps of: associating, with the speech data of each of the one or more words of the predetermined language, words of the predetermined language obtained by identifying one or more items of that speech data and/or one or more words of the predetermined language that the speaker may have spoken; and selecting one of the words associated with the speech data of each of the one or more words as the identification result for that speech data.

[Speech correction device]
The speech correction device according to the present invention comprises: candidate word association means for associating one or more word candidates (candidate words) obtained by identifying speech data representing a word; similar word association means for associating, with each candidate word associated with the speech data, zero or more words (similar words) that may correspond to the pronunciation of that candidate word; and pronunciation correction data output means for outputting, when the word represented by the speech data matches one of the similar words associated with the candidate words for that speech data, pronunciation correction data that corresponds to the matching similar word and corrects the pronunciation of the word represented by the speech data.

[Operation of the pronunciation correction device]
In the pronunciation correction device according to the present invention, the candidate word association means and the similar word association means associate speech data with candidate words and similar words in the same way as in the speech identification device according to the present invention described above.

[Pronunciation correction data output means]
If the speaker pronounces correctly, close to a native speaker, both the word the speaker intended and the identification result of the speech data are found among the candidate words. If the speaker's pronunciation is wrong or unclear, on the other hand, the word the speaker intended is still among the candidates, but the identification result of the speech data is found among the similar words. Therefore, if the speaker is shown a word to pronounce in advance and that word, once pronounced, matches a similar word in the identification result of the speech data, the user's (speaker's) pronunciation contains some error or lack of clarity.

When the word shown to the speaker matches a similar word, the pronunciation correction data output means displays on the monitor information that is associated with the matching similar word and serves to correct the pronunciation error or lack of clarity (for example, image data showing the movements of a native speaker's mouth and tongue when pronouncing correctly, and text data describing in words where the speaker's pronunciation differs from a native speaker's). It thereby prompts the speaker to correct his or her pronunciation and helps the speaker learn so that the pronunciation approaches a native speaker's.
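The rule in the preceding two paragraphs can be sketched as follows, assuming the identification step yields a mapping from each candidate word to its similar words. If the word shown to the speaker is found among the candidates, no correction is issued; if it is found only among the similar words, the correction data stored for that (candidate, similar word) pair is returned. The table contents, including the image file name, are hypothetical.

```python
# Hypothetical correction data keyed by (candidate word, similar word).
# The document says such data may include images of a native speaker's mouth
# and tongue and explanatory text; the values below are illustrative only.
CORRECTION_DATA = {
    ("lead", "read"): {
        "image": "tongue_r.png",
        "text": "Your 'r' sounded like 'l'. For 'r', keep the tongue tip "
                "from touching the ridge behind the upper teeth.",
    },
}


def correction_for(target, associations):
    """Return correction data for the word the speaker was asked to say.

    associations maps each candidate word to its list of similar words.
    Returns None when the target was recognized as a candidate (pronunciation
    acceptable) or when it does not appear at all.
    """
    if target in associations:
        return None
    for cand, sims in associations.items():
        if target in sims:
            return CORRECTION_DATA.get((cand, target))
    return None


result = correction_for("read", {"lead": ["read"], "led": []})
print(result["text"])
```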

[Speech correction method]
The speech correction method according to the present invention associates one or more word candidates (candidate words) obtained by identifying speech data representing a word; associates, with each candidate word associated with the speech data, zero or more words (similar words) that may correspond to the pronunciation of that candidate word; and, when the word represented by the speech data matches one of the similar words associated with the candidate words for that speech data, outputs pronunciation correction data that corresponds to the matching similar word and corrects the pronunciation of the word represented by the speech data.

[First embodiment]
The first embodiment of the present invention is described below. For clarity and convenience, unless otherwise noted, the following description takes as a concrete example the case where the speech identification processing according to the present invention is tuned to identify English spoken by Japanese speakers.

[Computer 1]
First, a computer 1 that implements the speech identification processing according to the present invention is described with reference to FIG. 1. FIG. 1 illustrates the configuration of the computer 1. As shown in FIG. 1, the computer 1 is, for example, a personal computer with audio input/output capability, and comprises a computer main body 10 including a CPU, memory and their peripheral devices; an output device 100; a storage device 110 such as a magneto-optical (MO) disk drive, hard disk drive or floppy disk drive; and an input device 120. The output device 100 includes a monitor 102 such as a CRT display, a speaker 104 for audio output, a printer 106 and the like. The input device 120 includes a microphone 122, a voice input board 124, a keyboard 126, a mouse 128 and the like.

[Monitor 102]
In the output device 100, the monitor 102 displays to the user of the computer 1 a GUI image for operation, the text data obtained by the computer main body 10 by identifying speech, and the like.

[Speaker 104]
The speaker 104 is used, for example, to output as speech the text data obtained by the computer main body 10 by identifying speech.

[Printer 106]
The printer 106 is used, for example, to output a hard copy of the text data obtained by the computer main body 10 by identifying speech.

[Storage device 110]
The storage device 110 operates under the control of the computer main body 10 and stores the text data obtained by the computer main body 10 by identifying speech. The storage device 110 also stores data needed for speech identification (hereinafter collectively called "word data"), programs and the like, and outputs the stored word data, programs and so on to the computer main body 10. The word data stored in the storage device 110 is, for example, created by the speech identification program 16 or supplied on a recording medium such as a floppy disk, and includes a label string table, an index table, word records, similar word records and an error information code table (all except the label string data are described in detail later with reference to FIGS. 4 to 7).

[Microphone 122]
The microphone 122 picks up the speech uttered by the user, converts it into an analog audio signal, and outputs the signal to the voice input board 124.

[Voice input board 124]
The voice input board 124 operates under the control of the computer main body 10. It samples the audio signal input from the microphone 122, converts it into digital voice data corresponding to the waveform of the signal, and outputs the data to the computer main body 10.

[Keyboard 126, Mouse 128]
The keyboard 126 and the mouse 128 accept, for example, user operations on the GUI displayed on the monitor 102 and output them to the computer main body 10 as operation inputs.

[Software 14]
The configuration of the software that implements the voice identification processing according to the present invention is described below with reference to FIG. 2. FIG. 2 is a diagram showing the configuration of the software 14. Software components unrelated to the voice identification processing are omitted from FIG. 2.

As shown in FIG. 2, the software 14 consists of a hardware (H/W) support unit 142, an operating system (OS) 148, and an application unit. The hardware support unit 142 includes an audio device driver 144 and a storage device driver 146. The operating system 148 is a general-purpose OS such as OS/2 (a product name of IBM Corporation) or Windows (a product name of Microsoft Corporation), and includes a voice interface (IF) unit 150 and a storage device interface unit 152. As its application unit, the software 14 includes a voice identification program 16. These components of the software 14 are stored in the storage device 110 and loaded into the memory of the computer main body 10 for execution as needed.

[Audio device driver 144]
In the hardware support unit 142, the audio device driver 144 controls the voice input board 124 so that it converts the audio signal input from the microphone 122 into voice data. The audio device driver 144 also provides an interface function that outputs the voice data received from the voice input board 124 to the voice interface unit 150. Under the control of the voice interface unit 150 of the operating system 148, the audio device driver 144 further changes settings of the voice input board 124, such as its sampling period, and controls its operation, such as starting and stopping sampling.

[Storage device driver 146]
In response to requests (control) from the storage device interface unit 152 of the operating system 148, the storage device driver 146 controls the operation of the storage device 110, causing it to store word data and the text data obtained as a result of voice identification, or to read out these stored data. The storage device driver 146 also provides an interface function that outputs the word data and text data read from the storage device 110 to the storage device interface unit 152, and passes these data received from the storage device interface unit 152 on to the storage device 110.

[Operating system 148]
In addition to the functions provided by the voice interface unit 150 and the storage device interface unit 152, the operating system 148 controls program execution in the computer main body 10. The operating system 148 also displays the text data and GUI images output by the voice identification program 16 on the monitor 102, converts text data into an audio signal and outputs it through the speaker 104, performs the processing needed to make hard copies on the printer 106, and accepts user operations on the keyboard 126 and the mouse 128.

[Voice interface unit 150]
In the operating system 148, the voice interface unit 150 controls the audio device driver 144 in response to requests (control) from the voice identification program 16. The voice interface unit 150 also provides an interface function that outputs the voice data received from the audio device driver 144 to the voice identification program 16.

[Storage device interface unit 152]
The storage device interface unit 152 manages the storage area of the storage device 110. In response to requests (control) from the voice identification program 16, it also controls the storage device driver 146 to read the requested word data and text data from the storage device 110, and outputs the data read to the voice identification program 16. Further, the storage device interface unit 152 stores word data and text data received from the voice identification program 16 in free space on the storage device 110 via the storage device driver 146.

[Voice identification program 16]
The voice identification program 16 is described below with reference to FIG. 3. FIG. 3 is a diagram showing the configuration of the voice identification program 16 shown in FIG. 2.

As shown in FIG. 3, the voice identification program 16 consists of a word database unit 160, a control unit 162, and a voice identification unit 18. The voice identification unit 18 includes a vector data generation unit 180, a label creation unit 182, a candidate word creation unit 184, a similar word addition unit 186, and a narrowing-down unit 188. Using these components, the voice identification program 16 displays a GUI image for operation and, in accordance with the user's operations on that image, identifies the voice data input from the voice interface unit 150 using the word data input from the storage device interface unit 152. It then outputs the word string obtained as the identification result as text data via the operating system 148.

[Control unit 162]
The control unit 162 displays a GUI image for operation on the monitor 102 and, via the operating system 148, accepts the operations that the user performs on that image with the keyboard 126 and the mouse 128 of the input device 120. In accordance with the accepted operation inputs, the control unit 162 also controls the voice interface unit 150 and the storage device interface unit 152 of the operating system 148.

In accordance with the accepted operation inputs, the control unit 162 also directs the word database unit 160 to create or update the word data and to store it in the storage device 110 via the storage device interface unit 152 and related components. The word data comprises the label string table, which the candidate word creation unit 184 uses to associate voice data with candidate words, and the index table, word records, similar word records, and error information code table (described later with reference to FIGS. 4 to 7), which the similar word addition unit 186 uses to associate similar words with candidate words.

The control unit 162 further displays, in the GUI image, the candidate words and similar words associated with each part of the voice data, and in response to operation inputs on these displayed words causes the narrowing-down unit 188 to select one of the candidate words or similar words as the final identification result. Examples of how the control unit 162 may display candidate words and similar words include: showing the candidate words associated by the candidate word creation unit 184 and the similar word addition unit 186 in reverse video on the monitor 102 and cycling through the candidate and similar words as the user operates the keyboard 126; or, when the user spots an erroneous part among the candidate words shown on the monitor 102 and clicks it with the mouse 128, listing the candidate words and similar words in a window associated with the candidate word at the clicked position.

[Word database unit 160]
As described above, the word database unit 160 creates or updates word data under the control of the control unit 162, stores it in the storage device 110, and manages it. The word database unit 160 also outputs word data (the label string table) to the candidate word creation unit 184.

In response to requests from the similar word addition unit 186, the word database unit 160 also searches the word data (the index table, word records, similar word records, and error information code table; FIGS. 4 to 7) and outputs to the similar word addition unit 186 the word record, similar word records, and error information that the search returns for the first candidate word input to the similar word addition unit 186.

[Word data]
To make the following description easier to follow, the word data other than the label string table (the index table, word records, similar word records, and error information code table) is described here with reference to FIGS. 4 to 7.

[Index table]
FIG. 4 is a diagram illustrating the data contained in the index table of the word data. The word database unit 160 uses the index table to search for word records, which are classified by the first letter (A to Z) of the word. As shown in FIG. 4, the index table associates, for each first letter A to Z, a pointer to the beginning of that letter's recording area with the number of word records whose word begins with that letter.
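The index table can be pictured as a per-letter (start, count) directory over word records kept in alphabetical order. The following is a minimal Python sketch under that assumption; the list-of-headwords storage, `IndexEntry`, `build_index_table`, and `find_record` are hypothetical names, and the "pointer" is modeled as a list position rather than a memory address.

```python
from dataclasses import dataclass
from string import ascii_uppercase
from typing import Dict, List, Optional


@dataclass
class IndexEntry:
    start: int  # position of the first word record for this letter (the "pointer")
    count: int  # number of word records whose headword begins with this letter


def build_index_table(headwords: List[str]) -> Dict[str, IndexEntry]:
    """Build an A-Z index; assumes headwords are sorted and begin with a letter A-Z."""
    table: Dict[str, IndexEntry] = {}
    pos = 0
    for letter in ascii_uppercase:
        start = pos
        while pos < len(headwords) and headwords[pos][0].upper() == letter:
            pos += 1
        table[letter] = IndexEntry(start=start, count=pos - start)
    return table


def find_record(headwords: List[str], table: Dict[str, IndexEntry], word: str) -> Optional[int]:
    """Scan only the records filed under the word's first letter."""
    entry = table.get(word[0].upper())
    if entry is None:
        return None
    for i in range(entry.start, entry.start + entry.count):
        if headwords[i] == word:
            return i
    return None
```

For `hw = ["bitter", "boat", "call", "lead", "read"]`, `find_record(hw, build_index_table(hw), "lead")` returns 3 after scanning only the single record filed under "L".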

[Word record]
FIG. 5 is a diagram illustrating the data contained in a word record of the word data. As shown in FIG. 5, a word record associates a headword (TarWord), which the similar word addition unit 186 matches and compares against the first candidate word data [candidate word data (1)] created by the candidate word creation unit 184, with a pointer to the next headword (NextP), the number of similar words contained in the word record (#Can), and the similar word records (CanWord).

FIG. 5 shows the case where the similar word records are associated directly with the word record, but any implementation may be used; for example, the word record may hold a pointer to its similar word records, which are kept in a file separate from the word records. Each word record may also have a headword (TarWord) consisting of several words, so that the similar word addition unit 186 can associate the labels corresponding to several consecutive words with a headword containing those words.

[Similar word record]
FIG. 6 is a diagram illustrating the data contained in a similar word record of the word data. As shown in FIG. 6, a similar word record associates the number of input words (#m, an integer of 1 or more), the input candidate words (aWord, aWord-1, aWord-2, ..., aWord-m-1), the number of output words (#n, an integer of 0 or more), the similar words (COWord, COWord-1, ..., COWord-n), and an error code (ECode).
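The word record of FIG. 5 and the similar word record of FIG. 6 can be modeled as two small record types. This is a sketch only: the Python field names are hypothetical renderings of TarWord, NextP, #Can, CanWord, aWord, COWord, and ECode; #m and #n are derived from list lengths; and counting #Can as the number of attached similar word records is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SimilarWordRecord:
    a_words: List[str]   # aWord, aWord-1, ..., aWord-m-1 (the m input candidate words)
    co_words: List[str]  # similar words to attach; may be empty
    e_code: int          # ECode: key into the error information code table

    def __post_init__(self) -> None:
        if not self.a_words:
            raise ValueError("#m must be 1 or more")

    @property
    def m(self) -> int:  # #m
        return len(self.a_words)

    @property
    def n(self) -> int:  # #n, may be 0
        return len(self.co_words)


@dataclass
class WordRecord:
    tar_word: str                  # TarWord: headword compared with a first candidate word
    next_p: Optional[int] = None   # NextP: position of the next word record, None if last
    can_words: List[SimilarWordRecord] = field(default_factory=list)  # CanWord

    @property
    def num_can(self) -> int:      # #Can, counted here as attached similar word records
        return len(self.can_words)
```

For example, `WordRecord("read", can_words=[SimilarWordRecord(["read"], ["lead"], 1)])` describes a headword "read" with one single-word similar word, "lead".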

Of these, the number of input words (#m) indicates how many candidate words before and after the first candidate word input from the candidate word creation unit 184 the similar word addition unit 186 refers to when associating that word with similar words. (The following description uses as an example the case where the m-1 words following the first candidate word are referred to.)

The input candidate words (aWord, aWord-1, aWord-2, ..., aWord-m-1) are the word string that is matched and compared against m first candidate words (TarWord, TarWord-1, ..., TarWord-m-1) input consecutively from the candidate word creation unit 184 to the similar word addition unit 186. In other words, the similar word addition unit 186 does not associate the p-th first candidate word (TarWord) with similar words as soon as it is input. Only after m-1 further first candidate words have been input does it compare the m consecutive first candidate words from the p-th to the (p+m-1)-th (TarWord, TarWord-1, ..., TarWord-m-1) with the m input candidate words of the similar word record (aWord, aWord-1, aWord-2, ..., aWord-m-1), respectively. Only if all of them match is the p-th first candidate word (aWord = TarWord) associated with the n similar words (COWord, COWord-1, ..., COWord-n) that follow the input candidate words in the similar word record. If no similar word exists, the number of output words (#n) is set to 0 and no similar word is associated in the similar word record.
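The matching rule in the preceding paragraph (attach similar words to the p-th first candidate word only when all m words from position p to p+m-1 equal the record's input candidate words) can be sketched as a window comparison. The function name and the plain-list representation are hypothetical; positions are 0-based here, unlike the 1-based p in the text.

```python
from typing import List, Optional, Sequence


def similar_words_at(words: Sequence[str], p: int,
                     a_words: Sequence[str], co_words: Sequence[str]) -> Optional[List[str]]:
    """Return the similar words to attach to words[p], or None if the record does not apply.

    The record applies only when words[p], ..., words[p+m-1] equal
    a_words[0], ..., a_words[m-1], where m = len(a_words).
    """
    m = len(a_words)
    window = list(words[p:p + m])
    if len(window) < m:          # fewer than m words remain from position p onward
        return None
    if window != list(a_words):  # every one of the m words must match
        return None
    return list(co_words)        # may be empty when #n == 0
```

With `w = ["I", "have", "to", "go"]`, the record `(["have", "to"], ["hat"])` applies at position 1 and yields `["hat"]`, but does not apply at position 2.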

The mapping between first candidate words and similar words in the similar word record shown in FIG. 6 is explained further below. For example, the following four methods of associating (mapping) first candidate words with similar words are conceivable.

[First method]
To handle the case where one word is misidentified as another single word, a first candidate word consisting of one word is associated with a similar word consisting of one word. For example, in case the "r" of the first candidate word "read" was not pronounced correctly, the first candidate word "read" is associated with the similar word "lead". Other examples of associations under the first method include "sink" and "think", "fell" and "fill", "seat" and "sit", "better" and "bitter", "nut" and "not", "fund" and "found", "boat" and "bought", and "coal" and "call".

[Second method]
To handle the case where one word is misidentified as several other words, a first candidate word consisting of one word is associated with a similar word consisting of several words. For example, in case the "ed" of the first candidate word "jumped" was not correctly pronounced as "t", the first candidate word "jumped" is associated with the similar words "jump" and "and". Another example of an association under the second method is that between "check in" and "chicken".

[Third method]
To handle the case where several words are misidentified as one other word, a first candidate word consisting of several words is associated with a similar word consisting of one word. For example, in case the first candidate words "have" and "to" were pronounced run together, the first candidate words "have", "to" are associated with the similar word "hat". Another example of an association under the third method is that between "I will" and "aisle".

[Fourth method]
To handle the case where several words are misidentified as several other words, a first candidate word consisting of several words is associated with a similar word consisting of several words. Because the first to third methods can be regarded as restricted forms of the fourth method, the similar word record shown in FIG. 6 is created on the basis of the fourth method, so that first candidate words consisting of several words can be associated with similar words consisting of several words.
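Because one record format with m input words and n similar words covers all four methods, which method a given record implements follows from m and n alone. A minimal sketch; the function name is hypothetical, and the multi-word pair used for the fourth method in the test is an arbitrary placeholder, since the text gives no example for that method.

```python
from typing import Optional, Sequence


def mapping_method(a_words: Sequence[str], co_words: Sequence[str]) -> Optional[int]:
    """Classify an (input words -> similar words) pair as the first to fourth method."""
    m, n = len(a_words), len(co_words)
    if m < 1 or n < 1:
        return None                  # #n == 0: the record attaches no similar word
    if m == 1:
        return 1 if n == 1 else 2    # one word -> one word, or one word -> several words
    return 3 if n == 1 else 4        # several words -> one word, or several -> several
```

Applied to the examples above, `(["read"], ["lead"])` is the first method, `(["jumped"], ["jump", "and"])` the second, and `(["have", "to"], ["hat"])` the third.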

A similar word is selected in place of a candidate word when the pronunciation of the speaker (user) of the computer 1 is inaccurate. Therefore, if the narrowing-down unit 188 finally selects a similar word rather than a candidate word, the speaker has made the English pronunciation error that corresponds to the selected similar word. The error code (ECode) is added to the similar word record for this reason and indicates, in coded form, the pronunciation error corresponding to the similar word finally selected.

[Error information code table]
FIG. 7 is a diagram illustrating the error information code table of the word data. As shown in FIG. 7, the error information code table associates error codes (ECode: 0, 1, 2, ...) with information describing each error (for example, error information such as "pronounced r as l", "pronounced l as r", and "pronounced th as s").
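Combining the ECode reasoning with the error information code table, pronunciation feedback can be derived from the user's final choice: keeping the first candidate implies no error, while choosing an attached similar word implies that record's error. A sketch under stated assumptions: the code-to-message assignments are hypothetical (the text lists the messages but not which code each uses), and `pronunciation_feedback` is an illustrative name.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical ECode assignments, using the example messages given for FIG. 7.
ERROR_INFO: Dict[int, str] = {
    0: "pronounced r as l",
    1: "pronounced l as r",
    2: "pronounced th as s",
}


def pronunciation_feedback(first_candidate: str, selected: str,
                           attached: List[Tuple[List[str], int]]) -> Optional[str]:
    """Return error information when the final choice is a similar word, else None.

    attached holds the (similar words, ECode) pairs attached to first_candidate.
    """
    if selected == first_candidate:
        return None  # the first candidate was kept, so no pronunciation error is implied
    for co_words, e_code in attached:
        if selected in co_words:
            return ERROR_INFO.get(e_code)
    return None
```

If "read" was recognized but the user picks the attached similar word "lead", the speaker presumably said "lead" with an r-like sound, so the sketch reports "pronounced l as r".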

[Vector data generation unit 180]
In the voice identification unit 18, the vector data generation unit 180 (FIG. 3) processes the voice data input from the voice interface unit 150, quantizes each of several speech features (pitch, intensity, intonation, and so on), generates vector data containing a numerical value for each feature, and outputs the vector data to the control unit 162. For example, when the sampling frequency of the voice data is 11 kHz, the vector data generation unit 180 processes the voice data in units of 1/100 second and quantizes each of several types of features, producing vector data made up of several elements.
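The text does not specify how pitch, intensity, or intonation are measured, so the sketch below only illustrates the framing arithmetic: at 11 kHz, a 1/100 s frame holds 11,000 × 0.01 = 110 samples. Per frame, it computes two simple stand-in features (log energy and zero-crossing rate) and rounds them as a crude quantization. These features, the rounding steps, and the name `frame_vectors` are assumptions, not the method described in the patent.

```python
import math
from typing import List, Sequence


def frame_vectors(samples: Sequence[float], rate_hz: int = 11_000,
                  frame_s: float = 0.01) -> List[List[float]]:
    """Split audio into 1/100 s frames and return one quantized feature vector per frame."""
    frame_len = int(round(rate_hz * frame_s))  # 11,000 Hz * 0.01 s = 110 samples
    vectors = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = list(samples[start:start + frame_len])
        energy = sum(x * x for x in frame) / frame_len
        log_energy = math.log10(energy + 1e-12)  # loudness-like stand-in feature
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        zcr = crossings / (frame_len - 1)        # crude spectral-content stand-in feature
        vectors.append([round(log_energy, 1), round(zcr, 2)])  # coarse quantization
    return vectors
```

One second of 11 kHz audio (11,000 samples) therefore yields 100 vectors; any trailing partial frame is dropped in this sketch.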

[Label creation unit 182]
The label creation unit 182 converts the vector data input from the vector data generation unit 180 into data called labels, which resemble phonetic symbols, and outputs the labels to the candidate word creation unit 184 one sentence at a time. The label creation unit 182 performs this conversion by, for example, using a label table that associates labels, generated from samples of actual speech by various people (adults, children, men, women, and so on), with patterns of several consecutive vector data, and selecting the label that corresponds to each run of consecutive vector data. Note that the term "sentence" here does not necessarily correspond to a sentence in an actual text; it simply denotes the processing unit of voice identification.
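Selecting "the label that corresponds to each run of consecutive vector data" can be illustrated as a nearest-pattern lookup. In this sketch the label table maps each label to a stored pattern of k vectors, and the label with the smallest squared Euclidean distance wins. The distance measure, the fixed non-overlapping window of k = 2, the label names, and `to_labels` are all assumptions for illustration.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def to_labels(vectors: List[Vector], table: Dict[str, List[Vector]], k: int = 2) -> List[str]:
    """Map each run of k consecutive vectors to the label whose stored pattern is closest."""
    def dist(seq_a: Sequence[Vector], seq_b: Sequence[Vector]) -> float:
        # Squared Euclidean distance summed over the k vectors of the two patterns.
        return sum((x - y) ** 2 for va, vb in zip(seq_a, seq_b) for x, y in zip(va, vb))

    labels = []
    for start in range(0, len(vectors) - k + 1, k):
        window = vectors[start:start + k]
        best = min(table, key=lambda label: dist(window, table[label]))
        labels.append(best)
    return labels
```

In practice the stored patterns would come from the speech samples mentioned above; here they are hand-written placeholders.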

[Candidate word creation unit 184]
Using a label string table that indicates the correspondence between label strings and English words, the candidate word creation unit 184 associates each run of one or more consecutive labels in the voice data that corresponds to one or more English words with one or more of the combinations of one or more English words that the label string may represent, and outputs the associated English word combinations to the similar word addition unit 186 as first candidate words. (To simplify the description, the following takes as an example the case where the candidate word creation unit 184 associates each label string corresponding to one English word with a first candidate word consisting of only one English word.)

Here, the candidate word creation unit 184, for example, does not convert the sounds indicated by the labels into letters and then convert the resulting letter string into English words; it converts the label string directly into English words (first candidate words). That is, when creating "read" as a first candidate word, for example, the candidate word creation unit 184 does not replace the label string with the four letters "r", "e", "a", "d" and then associate the word "read" as the first candidate word; it associates the word "read" with the label string directly.
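The direct conversion described above amounts to keying the label string table on whole label sequences, with no intermediate spelling step. A minimal sketch: the label names such as "R1" and "IY2" and the table entries are hypothetical, and a real table would also carry the match scores discussed below.

```python
from typing import Dict, List, Tuple

# Hypothetical label string table: a whole label sequence maps straight to candidate words;
# no letter string such as "r", "e", "a", "d" is ever produced.
LABEL_STRING_TABLE: Dict[Tuple[str, ...], List[str]] = {
    ("R1", "IY2", "D1"): ["read", "reed"],
    ("L1", "IY2", "D1"): ["lead"],
}


def candidate_words(labels: Tuple[str, ...]) -> List[str]:
    """Return the first candidate words for one label string (empty if none match)."""
    return list(LABEL_STRING_TABLE.get(labels, []))
```

One label string may yield several first candidate words, as with "read" and "reed" here; that is why the input records described next carry a candidate index j for each word position.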

The candidate word creation unit 184 outputs the first candidate words it has created to the similar word addition unit 186 one sentence at a time, in the form of the input records (InWord) shown in FIG. 8 and the input record matrix (InMatrix) shown in FIG. 9. FIG. 8 is a diagram showing the data structure of an input record (InWord) that the candidate word creation unit 184 outputs to the similar word addition unit 186. FIG. 9 is a diagram showing the data structure of the input record matrix (InMatrix) that the candidate word creation unit 184 outputs to the similar word addition unit 186.

As shown in FIG. 8, to each piece of data (InWord) indicating a word associated with a label string and its word length, the candidate word creation unit 184 adds data indicating that the word is the i-th word of the sentence and the j-th first candidate word for that i-th position, thereby creating an input record (InWord), which it outputs to the similar word addition unit 186. Here i and j are integers; i does not exceed the maximum number of words (Maxi), and j does not exceed the maximum number of candidates (Maxj).

Further, as shown in FIG. 8, the candidate word creation unit 184 creates probability data indicating the degree of match between the label string input from the label creation unit 182 and the label string in the label string table that corresponds to the selected English word, in other words, the probability that the label string represents the first candidate word. It adds this probability data to the data indicating the word of the input record and its word length, and outputs the result to the similar word addition unit 186.

When the input records (InWord) for one sentence have been created, the candidate word creation unit 184 creates the input record matrix shown in FIG. 9. The matrix indicates the maximum number of words contained in the sentence (Maxi); the maximum number of candidates (Maxj), that is, the largest number of first candidate words associated with any single label string (reading); and flags Flg(ij) indicating whether a j-th word exists for the i-th word. The unit outputs this matrix to the similar word addition unit 186 together with the input records for the sentence. If the candidate word creation unit 184 could not select any first candidate word for the label string corresponding to the i-th word, the flag Flg(i1) is set to a value (for example, 0) indicating that no first word exists at the i-th position.
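The input records and the input record matrix can be assembled in one pass over per-position candidate lists. In this sketch, `InWord` mirrors FIG. 8's fields and the returned tuple carries InMatrix's Maxi, Maxj, and Flg. The Python names, the 0/1 flag encoding, and taking word length as the number of letters are assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class InWord:
    i: int        # position of the word in the sentence (1-based)
    j: int        # rank among the first candidate words at that position (1-based)
    word: str
    length: int   # word length, taken here as the number of letters (an assumption)
    prob: float   # probability that the label string represents this word


def build_in_matrix(candidates: List[List[Tuple[str, float]]]):
    """candidates[i-1] lists (word, probability) pairs for the i-th position, best first.

    Returns the InWord records plus the InMatrix fields Maxi, Maxj and Flg,
    where flg[i-1][j-1] is 1 if the j-th candidate exists at position i, else 0.
    """
    records: List[InWord] = []
    max_i = len(candidates)
    max_j = max((len(c) for c in candidates), default=0)
    flg = [[0] * max_j for _ in range(max_i)]
    for i, cands in enumerate(candidates, start=1):
        for j, (word, prob) in enumerate(cands, start=1):
            records.append(InWord(i, j, word, len(word), prob))
            flg[i - 1][j - 1] = 1
    return records, max_i, max_j, flg
```

A position for which no first candidate word was found keeps an all-zero row, so Flg(i1) is 0 exactly as described above.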

[Similar word addition unit 186]
For each input record received from the candidate word creation unit 184, the similar word addition unit 186 adds to the first candidate word those English words that could not be associated with the label string but that may resemble the first candidate word, taking into account, for example, habitual features of English pronunciation by Japanese speakers. It thereby generates second candidate words and outputs them to the narrowing-down unit 188.

The operation of the similar word addition unit 186 is now described in more detail. The similar word addition unit 186 first outputs, in turn, each p-th first candidate word contained in the input records (InWord) for one sentence received from the candidate word creation unit 184 to the word database unit 160, requesting the corresponding word record. The word database unit 160 searches the word records (FIG. 5) using the index table (FIG. 4), obtains the word record whose headword (TarWord) matches the word in the input record (InWord), and outputs it to the similar word addition unit 186.

Having obtained the word index of the p-th first candidate word, the similar word addition unit 186 compares two word sequences:
- the p-th through (p+m−1)-th words of the sentence (InWord-p.j, InWord-p+1.j, ..., InWord-p+m-1.j);
- the m input words (aWord, aWord-1, aWord-2, ..., aWord-m-1) of each similar word record (FIG. 6) attached to the word record received from the word database unit 160.

When they match, it adds the record's n similar words (COWord-1, COWord-2, ..., COWord-n) to the p-th first candidate word to create second candidate words.

Alternatively, the similar word addition unit 186 can be modified to create the second candidate words by replacing the first candidate word with the similar words, rather than by adding the similar words to it. In that case, it does not matter whether the similar words include the first candidate word.
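The matching and adding steps above (including the replace variant) might be sketched as follows. This sketch makes several assumptions not found in the source:
- The word database unit 160 is stood in for by a plain dictionary keyed by headword.
- Only a single candidate path is handled (one first candidate word per position, i.e., a fixed j).
- The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SimilarWordRecord:
    """Illustrative similar word record (FIG. 6)."""
    input_words: Tuple[str, ...]    # aWord, aWord-1, ..., aWord-m-1
    similar_words: Tuple[str, ...]  # COWord-1, ..., COWord-n
    ecode: str = ""                 # error code (ECode)


def add_similar_words(sentence: List[str],
                      word_db: Dict[str, List[SimilarWordRecord]],
                      replace: bool = False) -> List[List[str]]:
    """Build second candidate words for one candidate path.

    sentence: one first candidate word per position (the j index is fixed).
    word_db: stand-in for the word database unit 160, keyed by headword (TarWord).
    replace: False adds similar words to the first candidate; True replaces it.
    """
    result = []
    for p, word in enumerate(sentence):
        matched: List[str] = []
        for rec in word_db.get(word, []):
            m = len(rec.input_words)
            # Compare words p .. p+m-1 of the sentence with the record's m input words.
            if tuple(sentence[p:p + m]) == rec.input_words:
                for s in rec.similar_words:
                    if s not in matched:
                        matched.append(s)
        if replace and matched:
            result.append(matched)  # replacement variant
        else:
            result.append([word] + [s for s in matched if s != word])
    return result
```

With a record mapping "read" to ("lead", "reed"), the sentence `["i", "read"]` behaves as follows:
- In adding mode it yields `[["i"], ["read", "lead", "reed"]]`.
- With `replace=True` it yields `[["i"], ["lead", "reed"]]`.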

The similar word addition unit 186 also requests error information from the word database unit 160. For each similar word record whose similar words were added to a first candidate word, it sends the record's error code (ECode) and asks for the error information that the code represents. In response, the word database unit 160 searches the error information code table (FIG. 7) and outputs the retrieved error information to the similar word addition unit 186.

FIG. 10 shows the data structure of the output record (OutWord) that the similar word addition unit 186 outputs to the narrowing-down unit 188. FIG. 11 shows the data structure of the output record matrix (OutMatrix) that the similar word addition unit 186 outputs to the narrowing-down unit 188.

The similar word addition unit 186 attaches the following data to each second candidate word:
- data indicating the word length;
- probability data;
- error information (or an error information code);
- data indicating that the word is the i'-th word of the sentence and the j'-th candidate for that i'-th word.

It outputs the result to the narrowing-down unit 188 as an output record (OutWord). As shown in FIG. 10, the output record has the same format as the input record (FIG. 8).
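One possible in-memory form of the output record, including the error-code lookup described above, is sketched below. The following details are assumptions made for illustration and do not come from the source:
- the error table contents and the code "E_RL";
- the field names;
- taking "word length" as the character count.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical error information code table (FIG. 7): ECode -> error information.
ERROR_TABLE: Dict[str, str] = {
    "E_RL": "r and l are not distinguished",
}


@dataclass
class OutWord:
    """Illustrative output record (FIG. 10)."""
    i: int                       # i': word position in the sentence (1-based)
    j: int                       # j': candidate number for that position (1-based)
    word: str                    # second candidate word
    length: int                  # word length data (character count assumed)
    probability: float           # probability data
    error_info: Optional[str]    # None for words not added as similar words


def make_out_words(i: int,
                   candidates: List[Tuple[str, float, Optional[str]]],
                   error_table: Dict[str, str] = ERROR_TABLE) -> List[OutWord]:
    """candidates: (word, probability, ecode or None) for word position i."""
    records = []
    for j, (word, prob, ecode) in enumerate(candidates, start=1):
        # Resolve the error code to error information, as the database unit would.
        info = error_table.get(ecode) if ecode is not None else None
        records.append(OutWord(i, j, word, len(word), prob, info))
    return records
```

For example, `make_out_words(2, [("read", 0.7, None), ("lead", 0.2, "E_RL")])` produces two records. Only the one for "lead" carries error information.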

When the output records (OutWord) for one sentence have been created, the similar word addition unit 186 creates an output record matrix, shown in FIG. 11, analogous to the input record matrix (FIG. 9). The output record matrix holds three things:
- the maximum number of words in the sentence (Maxi');
- the maximum number of candidates (Maxj'), i.e., the largest number of second candidate words associated with any single label string (reading);
- a flag Flg(i'j') indicating whether a j'-th word exists for the i'-th word.

It outputs this matrix to the narrowing-down unit 188 together with the output records for the sentence.

[Narrowing-down unit 188]
The narrowing-down unit 188 displays on the monitor 102 the second candidate words received as output records from the similar word addition unit 186. For each word it selects one of the second candidate words as the final identification result, for example:
- in response to a user operation; or
- based on a syntactic analysis of the word sequence identified so far.

It then creates text data by arranging the selected words in order and outputs it to the monitor 102, the speaker 104, or the printer 106.

The creation of text data by the narrowing-down unit 188 is described further. Suppose, for example, that:
- the first word of a sentence has n1 second candidate words (OutWord-1.1, OutWord-1.2, ..., OutWord-1.n1);
- the second word has n2 second candidate words;
- and so on for each subsequent word.

The narrowing-down unit 188 then displays on the monitor 102, in word order, one second candidate word for each word: one of those for the first word, one of those for the second word, and so on.

The user clicks a second candidate word displayed in a window of the GUI screen, for example with the mouse 128. The control unit 162 highlights the clicked word. Each subsequent click on the same position replaces the displayed word with another second candidate word. The user settles on a second candidate word either by confirming it with the mouse 128 or the keyboard 126, or by moving on to select the next word. The narrowing-down unit 188 then takes the last displayed second candidate word as the final identification result. The user repeats this as needed, and the narrowing-down unit 188 selects words accordingly and creates the text data.
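The click-to-cycle selection just described behaves like a small state machine per displayed word. The sketch below models that behavior without any GUI toolkit. The class and method names are hypothetical, and the first click is assumed only to highlight the word, as the text describes.

```python
from typing import List


class CandidateSlot:
    """One displayed word position holding its second candidate words."""

    def __init__(self, candidates: List[str]) -> None:
        if not candidates:
            raise ValueError("a slot needs at least one candidate")
        self.candidates = list(candidates)
        self.index = 0
        self.highlighted = False

    @property
    def shown(self) -> str:
        return self.candidates[self.index]

    def click(self) -> None:
        # The first click only highlights; later clicks cycle through candidates.
        if self.highlighted:
            self.index = (self.index + 1) % len(self.candidates)
        else:
            self.highlighted = True

    def confirm(self) -> str:
        # The word shown last becomes the final identification result.
        return self.shown


def build_text(slots: List[CandidateSlot]) -> str:
    """Arrange the confirmed words in sentence order."""
    return " ".join(slot.confirm() for slot in slots)
```

For example, suppose a slot holds `["lead", "read", "reed"]`:
- Clicking it twice first highlights "lead" and then shows "read".
- `build_text` then joins the confirmed words into the output text.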

When displaying the second candidate words, the narrowing-down unit 188 can usefully take the context of neighboring words into account. It can then either display on the monitor 102 only those second candidate words likely to be selected as the final identification result, or display them in descending order of that likelihood.

As a concrete example, take for convenience the case of identifying Japanese:
1. The candidate word creation unit 184 might identify the label string obtained from the Japanese speech "akai hana" as takai hana, i.e., 高い・花 ("tall flower") or 高い・鼻 ("high nose").
2. The similar word addition unit 186 might then add the similar words akai wana, 赤い・罠 ("red trap").
3. These three alternatives are output to the narrowing-down unit 188 as second candidate words.

Suppose the narrowing-down unit 188 has identified the first half of the utterance as 赤い ("red"). The likely identifications of the second half are then, in order, 花 ("flower"), 鼻 ("nose"), and 罠 ("trap"). The narrowing-down unit 188 can then minimize the user's selection effort in either of two ways: by displaying only 花 and 鼻 after 赤い, or by displaying 花, 鼻, and 罠 in that order.

Similarly, the narrowing-down unit 188 can use the probability data attached to the output records from the similar word addition unit 186. For example, it can display on the monitor 102 only those second candidate words whose probability data is at or above a threshold set by the user. Only the second candidate words likely to be selected as the final identification result are then displayed, which further reduces the user's selection effort.
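The two display strategies above (filter by a probability threshold, then order by context) can be combined in a few lines. This is a minimal sketch. The `BIGRAM` table is a hypothetical stand-in for whatever context model the narrowing-down unit 188 might use, and its values and the probability figures below are invented for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical context scores for a word following the previous word.
BIGRAM: Dict[Tuple[str, str], float] = {
    ("赤い", "花"): 0.6,
    ("赤い", "鼻"): 0.3,
    ("赤い", "罠"): 0.1,
}


def bigram_score(prev: str, word: str) -> float:
    return BIGRAM.get((prev, word), 0.0)


def rank_candidates(candidates: List[Tuple[str, float]], prev_word: str,
                    score: Callable[[str, str], float] = bigram_score,
                    threshold: float = 0.0) -> List[str]:
    """Drop candidates whose probability data is below the threshold, then
    order the rest by context score, most plausible first."""
    kept = [(w, p) for w, p in candidates if p >= threshold]
    kept.sort(key=lambda wp: score(prev_word, wp[0]), reverse=True)
    return [w for w, _ in kept]
```

With the candidates `[("罠", 0.1), ("鼻", 0.4), ("花", 0.5)]` after 赤い, the results are:
- with no threshold, the order is 花, 鼻, 罠;
- with a threshold of 0.2, 罠 is dropped.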

[Operation of computer 1]
The speech identification processing in the computer 1 is described below with reference to FIG. 12, a flowchart of the speech identification processing according to the present invention. For simplicity, FIG. 12 shows only the basic steps of the processing. Steps that use the probability data or error information described above are omitted where appropriate.

As shown in FIG. 12, in step 100 (S100), the vector data generation unit 180 and the label creation unit 182 (FIG. 3) of the speech identification program 16 process the speech data for one sentence (unit) spoken by a Japanese user and convert it into a label string. The label string for the sentence generated by the vector data generation unit 180 and the label creation unit 182 is input to the candidate word creation unit 184.

In step 102 (S102), the candidate word creation unit 184 associates the label string for the sentence, received from the label creation unit 182, with first candidate words. It outputs them to the similar word addition unit 186 in the input record (InWordij) format shown in FIG. 8. It also creates the input record matrix (InMatrix) shown in FIG. 9 and outputs it to the similar word addition unit 186.

In step 104 (S104), the similar word addition unit 186 requests the word database unit 160 to search for the word record (FIG. 5) of the first candidate word contained in the input record being processed. The word database unit 160 performs the search using the index table (FIG. 4). If it finds a word record corresponding to the first candidate word (input record), it outputs that record to the similar word addition unit 186 and processing proceeds to S106. Otherwise, processing proceeds to S110.

In step 106 (S106), the similar word addition unit 186 processes the similar word records (FIG. 6) of the word record received from the word database unit 160 and obtains the similar words corresponding to the first candidate word (input record).

In step 108 (S108), the similar word addition unit 186 adds the obtained similar words to the first candidate word to create second candidate words.

In step 110 (S110), the similar word addition unit 186 determines whether all input records in the sentence have been processed. If they have, processing proceeds to S112. If not, the next input record becomes the processing target and processing returns to S104.

In step 112 (S112), the similar word addition unit 186 outputs the second candidate words created in S108 to the narrowing-down unit 188 in the output record format shown in FIG. 10. It also creates the output record matrix (FIG. 11) for the second candidate words and outputs it to the narrowing-down unit 188. The narrowing-down unit 188 displays the second candidate words in a window of the GUI screen on the monitor 102. In response to user operations, it outputs the final identification result as English text data.
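The control flow of FIG. 12 (S100 to S112) can be sketched as below. Several simplifications are assumed:
- Speech processing, label creation, and user selection are replaced by caller-supplied stand-in functions, and speech is passed as a plain string for testing only.
- Each word has a single first candidate.
- The word database is a dictionary from headword to similar words.

```python
from typing import Callable, Dict, List


def identify_sentence(speech: str,
                      to_labels: Callable[[str], List[str]],
                      to_first_candidates: Callable[[List[str]], List[str]],
                      word_db: Dict[str, List[str]],
                      choose: Callable[[List[str]], str]) -> str:
    labels = to_labels(speech)                 # S100: speech -> label string (stand-in)
    first = to_first_candidates(labels)        # S102: one input record per word
    second: List[List[str]] = []
    for word in first:                         # S110 loops back to S104 per record
        similar = word_db.get(word)            # S104: search for the word record
        if similar is None:                    # no record found: go to S110
            second.append([word])
            continue
        # S106/S108: obtain the similar words and add them to the first candidate
        second.append([word] + [s for s in similar if s != word])
    # S112: the narrowing-down unit picks one second candidate per word
    return " ".join(choose(c) for c in second)
```

For example, the speech "i lead" is processed with:
- an identity candidate step;
- a database mapping "lead" to ["read"];
- a chooser that prefers "read".

The result is "i read". Separating the stages into callables keeps the loop testable without any audio front end.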

[Modification]
The narrowing-down unit 188 can be modified to help the user learn pronunciation. Suppose the user selects, as the final identification result, a word that the similar word addition unit 186 added to the second candidate words as a similar word. The narrowing-down unit 188 can then display the error information attached to that word on the monitor 102. The user learns the weaknesses of his or her English pronunciation. The computer 1 can thus be used not only as a speech identification device but also as an English pronunciation learning device. Possible ways of presenting the error information include:
- displaying the mouth shape for correct pronunciation;
- playing the correct pronunciation to the user through speech synthesis.

The first embodiment above illustrates identifying English text from English spoken by a Japanese speaker. However, the speech identification processing according to the present invention is not limited to this case. It can be applied widely to counter a drop in identification rate in situations such as:
- pronunciation that differs within the same language, for example when an American who speaks American English creates text with a speech identification device tuned for British English;
- an individual's pronunciation that has idiosyncrasies or is unclear.

As described above, in the speech identification processing according to the present invention, the similar word addition unit 186 adds similar words to the first candidate words identified by the candidate word creation unit 184 to form second candidate words. This improves the speech identification rate. Again take for convenience the case where the computer 1 identifies Japanese. The user may intend to say "akai hana" (赤い花, "red flower"), but the "a" may for some reason be pronounced unclearly. The label creation unit 182 may then generate a label string that is ambiguous between "akai hana" and "takai hana".

Suppose the candidate word creation unit 184 selects 高い・鼻 (takai hana, "high nose") as the first candidate words for this label string and outputs them to the similar word addition unit 186. Even then, if the word records have been created appropriately, the similar word addition unit 186 can add 赤い・鼻 ("red nose") and 赤い・花 ("red flower") to the first candidate words as similar words, forming the second candidate words. The intended 赤い・花, which was not among the first candidate words created by the candidate word creation unit 184, is therefore also displayed on the monitor 102. The user can select the correct identification result from the second candidate words.

In general, a speech identification method combines pure speech processing with other processing, such as:
- grammatical analysis (for example, that nouns tend to appear at particular positions in a sentence);
- language-model processing (the likelihood of a sequence of words).

Adding similar words with the speech identification method according to the present invention before these steps is therefore highly effective. For example, it can substantially improve the speech identification rate.

[Second Embodiment]
In the speech identification program 16 shown in FIG. 3, the identification result can be a candidate word or a similar word, and each outcome says something about the speaker's pronunciation:
- If a candidate word is selected, the user's (speaker's) pronunciation is relatively close to a native speaker's. It is accurate enough to be identified at a high rate even by a general speech identification device that does not use the present invention.
- If a similar word is selected, the speaker's pronunciation contains an error or is unclear.

The second point also holds when the word the user intended is among the similar words in the speech identification program 16.

Accordingly, when the word the user intended is identified as a similar word, the speaker can be shown pronunciation correction information, such as how the pronunciation is wrong and how to pronounce the word correctly. This helps the speaker learn pronunciation and correct it. The pronunciation correction method described below as the second embodiment builds on this observation. It modifies the speech identification processing of the first embodiment so that, when the speaker's pronunciation can be judged inaccurate, an image presenting pronunciation correction information is displayed to the speaker.

[Computer 2]
FIG. 13 shows the configuration of a computer 2 that implements the speech identification processing and pronunciation correction method according to the present invention. Unless otherwise noted, components in the following drawings are the same as the components with the same reference numerals in the preceding drawings. As shown in FIG. 13, the computer 2 is the computer 1 (FIG. 1) with the input device 120 replaced by an input device 130. The input device 130 consists of the input device 120 plus an image input board 132. The image input board 132 is used, for example, to capture from a video camera the image data used in the pronunciation correction image (FIG. 19).

[Software 20]
FIG. 14 shows software 20 that implements the speech identification processing and pronunciation correction method according to the present invention. As shown in FIG. 14, the software 20 is the software 14 (FIG. 2) with the speech identification program 16 replaced by a speech identification/correction program 22. The software 20 differs from the software 14 in two ways:
- In addition to the data exchanged between components in the software 14, it also handles image data.
- Instead of the identification result (text data) produced by the software 14, it outputs a pronunciation correction image to the monitor 102 or other devices. This image presents pronunciation correction information for correcting the user's (speaker's) pronunciation.

[Speech identification/correction program 22]
FIG. 15 shows the configuration of the speech identification/correction program 22 shown in FIG. 14. As shown in FIG. 15, the speech identification/correction program 22 consists of the speech identification program 16 (FIG. 3), with the narrowing-down unit 188 omitted, and a pronunciation correction program 24.

[Changes to the similar word addition unit 186]
In the speech identification/correction program 22, unlike in the speech identification program 16, the similar word addition unit 186 outputs similar word records (FIG. 18) to two units of the pronunciation correction program 24: the comparison unit 240 and the pronunciation correction image display unit 242.

[Changes to the control unit 162]
FIG. 16 illustrates a pronunciation instruction image displayed by the control unit 162 shown in FIG. 15. FIGS. 17(A) and 17(B) illustrate correction information indexes generated by the word database unit 160 shown in FIG. 15:
- (A) is the index for the pronunciation correction image for correcting the pronunciation of r;
- (B) is the index for the pronunciation correction image for correcting the pronunciation of l.

FIG. 18 shows a similar word record generated by the word database unit 160 in the second embodiment.

As illustrated in FIG. 16, the control unit 162 generates a pronunciation instruction image and displays it on the monitor 102. The image shows the word the user should pronounce ("read" in FIG. 16) and prompts the user to say it. In FIG. 16, it contains the text 'Try pronouncing "read"!'. The control unit 162 outputs the word the user was asked to pronounce (the correct word) to the comparison unit 240.

The control unit 162 may then receive a correction information code (CCode; FIG. 18) from the pronunciation correction image display unit 242. If so, it:
1. outputs that code to the word database unit 160;
2. obtains from the word database unit 160 the correction information index (FIGS. 17(A) and 17(B)) that the code designates;
3. reads from the storage device 110 the image data and text data designated by the n entries in that index (n is an integer; n = 8 in FIGS. 17(A) and 17(B));
4. outputs them to the pronunciation correction image display unit 242.

FIG. 19 illustrates a first pronunciation correction image designated by the correction information index illustrated in FIG. 17(A). For simplicity, FIG. 19 omits the text data corresponding to entries 5 to 7 illustrated in FIG. 17(A). The control unit 162 stores in the storage device 110 the entries of a correction information index such as that of FIG. 17(A). It stores each entry in association with the image data (Image) and text data (Text) that the entry designates, as illustrated in FIG. 19.

The correction information index illustrated in FIG. 17(A) is used to correct the user's pronunciation of r. It contains entries (entries 1 to 4 and 8) designating the following data:
- text data indicating the point of pronunciation to be corrected;
- image data showing the mouth shape when pronouncing r;
- image data showing the mouth shape when pronouncing l;
- text data giving advice for pronouncing r;
- examples of words containing r;
- examples of words containing l;
- text data giving examples of words in which both r and l appear;
- text data giving examples of sentences in which both r and l appear.

From this correction information index, a pronunciation correction image such as that illustrated in FIG. 19 is generated and displayed on the monitor 102.

The correction information index illustrated in FIG. 17(B) is used to correct the user's pronunciation of l. It contains entries designating the following data:
- text data indicating the point of pronunciation to be corrected;
- image data showing the mouth shape when pronouncing l;
- image data showing the mouth shape when pronouncing r;
- text data giving advice for pronouncing l;
- examples of words containing l;
- examples of words containing r;
- text data giving examples of words in which both r and l appear;
- text data giving examples of sentences in which both r and l appear.
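The two correction information indexes share one layout with the roles of r and l swapped, so they can be generated from a single template. The sketch below assumes the following, none of which is taken from FIGS. 17(A) and 17(B):
- CCode values "C_R" and "C_L";
- entry keys such as "mouth_shape_r";
- a dictionary standing in for the storage device 110.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Entry:
    kind: str  # "text" or "image"
    ref: str   # hypothetical key of the Text/Image data held in storage


def make_index(target: str, other: str) -> List[Entry]:
    """Eight entries (n = 8) in the order given for FIGS. 17(A)/(B)."""
    return [
        Entry("text", f"{target}_point_to_correct"),
        Entry("image", f"mouth_shape_{target}"),
        Entry("image", f"mouth_shape_{other}"),
        Entry("text", f"{target}_advice"),
        Entry("text", f"words_with_{target}"),
        Entry("text", f"words_with_{other}"),
        Entry("text", "words_with_r_and_l"),
        Entry("text", "sentences_with_r_and_l"),
    ]


# Hypothetical CCode -> correction information index mapping.
CORRECTION_INDEXES: Dict[str, List[Entry]] = {
    "C_R": make_index("r", "l"),  # like FIG. 17(A)
    "C_L": make_index("l", "r"),  # like FIG. 17(B)
}


def fetch_entries(ccode: str, indexes: Dict[str, List[Entry]],
                  storage: Dict[str, str]) -> List[Optional[str]]:
    """Read the data each entry designates, as the control unit 162 would;
    missing data comes back as None."""
    return [storage.get(e.ref) for e in indexes[ccode]]
```

Generating both indexes from `make_index` keeps the entry order identical, which a display step that places data at fixed positions would rely on.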

[Changes to the word database unit 160]
Unlike in the software 14, the word database unit 160 creates correction information indexes such as those illustrated in FIGS. 17(A) and 17(B) in place of the error information code table (FIG. 7). It also stores them in the storage device 110 as word data.

As shown in FIG. 18, the word database unit 160 also attaches to each similar word record a correction information code (CCode) designating one of the correction information indexes, in place of the error code (ECode; FIG. 6). It stores the record in the storage device 110. The pronunciation correction image display unit 242 may input a correction information code via the control unit 162. The word database unit 160 then reads the corresponding correction information index (FIGS. 17(A) and 17(B)) from the storage device 110 and outputs it to the control unit 162.

[Comparison unit 240]
The comparison unit 240 (FIG. 15) compares the correct word input from the control unit 162 with each similar word in the similar word records (FIG. 18) input from the similar word addition unit 186. It determines whether the correct word matches any of the similar words and notifies the pronunciation correction image display unit 242 whether or not a match was found.

[Pronunciation correction image display unit 242]
The pronunciation correction image display unit 242 acts when the comparison unit 240 determines that the correct word matches one of the similar words:
1. It finds the correction information code (CCode) attached to the similar word record (FIG. 18) and the correction information index (FIGS. 17(A) and 17(B)) that the code designates.
2. It requests the control unit 162 to obtain the image data and text data designated by that index.
3. The control unit 162 reads that data from the storage device 110 and outputs it to the pronunciation correction image display unit 242.
4. The pronunciation correction image display unit 242 arranges the data at positions such as those labeled (a) to (e) in FIG. 19, generates a pronunciation correction image, and displays it on the monitor 102.

[Operation of the speech identification/correction program 22]
The operation of the speech identification/correction program 22 is described below. FIG. 20 is a flowchart of the processing (S20) of the speech identification/correction program 22 (FIG. 15) in the second embodiment.

In step 200 (S200) of FIG. 20, the following happens in response to a user operation:
1. The control unit 162 displays on the monitor 102 a pronunciation instruction image prompting the user to pronounce a word, for example "read" as shown in FIG. 16.
2. The control unit 162 outputs the correct word "read" to the comparison unit 240.
3. The user pronounces "read" in response. The speech identification program 16 identifies the user's speech using its vector data generation unit 180, label creation unit 182, candidate word creation unit 184, and similar word addition unit 186 (FIG. 15).
4. The speech identification program 16 outputs similar word records to the comparison unit 240 and the pronunciation correction image display unit 242.

Here, as in the example of the first embodiment, when the user pronounces "read" correctly, the candidate word creation unit 184 outputs candidate words including "read" to the similar word addition unit 186, which in turn outputs a similar word record (FIG. 18) containing "lead" and other words as similar words to the comparison unit 240 and the pronunciation correction image display unit 242. Conversely, if the user cannot distinguish the "r" sound from the "l" sound and pronounces "read" inaccurately, the candidate word creation unit 184 outputs candidate words that include "lead" and the like instead of "read", and the similar word addition unit 186 outputs a similar word record containing "read" and the like as similar words to the comparison unit 240 and the pronunciation correction image display unit 242.

In step 202 (S202), the comparison unit 240 compares the correct word input from the control unit 162 with each similar word contained in the similar word record input from the similar word addition unit 186. If the correct word matches none of the similar words, the comparison unit 240 notifies the pronunciation correction image display unit 242 accordingly and ends the processing for correcting and practicing the pronunciation of "read"; processing then moves on, for example, to correcting and practicing the pronunciation of the next word. Otherwise, the comparison unit 240 notifies the pronunciation correction image display unit 242 that the correct word matches one of the similar words, and processing proceeds to S204.
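The comparison in S202 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patent's implementation: the table `SIMILAR_WORDS` and the functions `similar_word_record` and `needs_correction` are hypothetical, and the recognizer's candidate words are simply passed in as a list. It shows the key point of this step: a pronunciation problem is flagged when the correct word turns up among the similar words of what the recognizer heard.

```python
# Hypothetical similar-word table: recognized candidate word -> words a
# Japanese speaker may confuse it with (here, r/l pairs). Illustrative only.
SIMILAR_WORDS: dict[str, list[str]] = {
    "read": ["lead"],
    "lead": ["read"],
    "right": ["light"],
    "light": ["right"],
}


def similar_word_record(candidates: list[str]) -> list[str]:
    """Collect the similar words of every recognized candidate word."""
    record: list[str] = []
    for word in candidates:
        record.extend(SIMILAR_WORDS.get(word, []))
    return record


def needs_correction(correct_word: str, candidates: list[str]) -> bool:
    """S202: True if the correct word matches one of the similar words,
    meaning the user was heard saying a confusable word instead."""
    return correct_word in similar_word_record(candidates)


if __name__ == "__main__":
    # Correct "read": candidates contain "read", similar words contain "lead".
    print(needs_correction("read", ["read"]))  # False: go on to the next word
    # r/l confused: recognizer heard "lead", whose similar words contain "read".
    print(needs_correction("read", ["lead"]))  # True: proceed to S204
```

Note that the check runs against the similar words rather than the candidates themselves; this mirrors the text, where a correct "read" yields "lead" only as a similar word and therefore does not trigger correction.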

In step 204 (S204), the pronunciation correction image display unit 242 obtains the correction information code (CCode) from the similar word record. It then outputs the correction information code to the control unit 162 and requests the image data and text data to be used in the pronunciation correction image. In response, the control unit 162 obtains the correction information index illustrated in FIG. 17A from the word database unit 160, reads from the storage device 110 the image data and text data indicated by the index entries (entries 1 to 4 and 8), and outputs them to the pronunciation correction image display unit 242.

In step 206 (S206), the pronunciation correction image display unit 242 generates a pronunciation correction image in which the image data and text data corresponding to the correction information index entries (entries 1 to 4 and 8) received from the control unit 162 are placed at positions (a) to (e) illustrated in FIG. 19, and displays it on the monitor 102. The processing for "read" then ends, and processing moves on, for example, to correcting and practicing the pronunciation of the next word.
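Steps S204 and S206 amount to a lookup from a correction information code to index entries, followed by a layout step. The sketch below assumes a hypothetical in-memory index: the code value "CC_R", the `IndexEntry` fields, and the file names are illustrative placeholders, and the order in which entries 1 to 4 and 8 are assigned to positions (a) to (e) is an assumption, since the text does not specify the mapping.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class IndexEntry:
    """One entry of a correction information index (cf. FIG. 17A)."""
    number: int
    image_path: str  # hypothetical location of the image data
    text: str        # hypothetical explanatory text


# Hypothetical: correction information code -> correction information index.
# "CC_R" and the placeholder contents are illustrative, not from the patent.
CORRECTION_INDEX: Dict[str, List[IndexEntry]] = {
    "CC_R": [IndexEntry(n, f"r_entry{n}.png", f"r, entry {n}") for n in range(1, 9)],
}

# Entries used by the first pronunciation correction image, per S204/S206.
FIRST_IMAGE_ENTRIES = (1, 2, 3, 4, 8)
POSITIONS = ("a", "b", "c", "d", "e")


def layout_first_correction_image(ccode: str) -> Dict[str, IndexEntry]:
    """S204-S206: fetch the index for `ccode`, pick entries 1-4 and 8,
    and assign them to positions (a)-(e) of the correction image."""
    by_number = {entry.number: entry for entry in CORRECTION_INDEX[ccode]}
    chosen = [by_number[n] for n in FIRST_IMAGE_ENTRIES]
    # Assumption: entries are placed on (a)-(e) in ascending entry order.
    return dict(zip(POSITIONS, chosen))


if __name__ == "__main__":
    layout = layout_first_correction_image("CC_R")
    for position, entry in layout.items():
        print(f"({position}) entry {entry.number}: {entry.image_path}")
```

Keeping the index separate from the similar word record, linked only by the code, lets several similar word records share one set of correction material, which matches how the second embodiment manages them in association.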

[Modification]
In the second embodiment, the pronunciation correction information includes only text data and image data, but it may also include other types of data. For example, the pronunciation correction information may include audio data of the correct pronunciation, so that the pronunciation correction information (FIG. 19) is displayed on the monitor 102 while the audio data is output through the speaker 104.

In the second embodiment, similar words and pronunciation correction information codes are managed in association with each other; however, the voice identification/correction program 22 may instead be configured to manage the codes by including them in the similar word record shown in FIG. 6. Also, although in the second embodiment the voice identification/correction program 22 outputs only the pronunciation correction information, it may be configured to output both the pronunciation correction information and the error information. Further, although in the second embodiment the control unit 162 displays the pronunciation instruction image on the monitor 102 and the pronunciation correction image display unit 242 displays the pronunciation correction information on the monitor 102, the voice identification/correction program 22 may be configured so that either one of these components displays both images on the monitor 102.

Not all of the information displayed in the pronunciation correction image (FIG. 19) needs to be registered in the correction information index (FIGS. 17A and 17B). For example, the text data for the points to be corrected and for the re-pronunciation instruction shown in FIG. 17A may be registered in the correction information index, or, depending on how the voice identification/correction program 22 (FIG. 15) is implemented, may instead be written into the pronunciation correction image in advance without being registered in the index.

[Third Embodiment]
As a third embodiment, a pronunciation learning method that applies the pronunciation correction method of the second embodiment is described below. This method refines the operation of the voice identification/correction program 22 shown in FIG. 20: in addition to the pronunciation instruction image (FIG. 16) and the pronunciation correction image showing the basics of pronunciation (FIG. 19), it displays further pronunciation correction images that guide more advanced practice (FIGS. 21 to 23), making pronunciation learning more convenient for the user.

FIGS. 21 to 23 illustrate the second to fourth pronunciation correction images, respectively, used in the pronunciation learning method of the third embodiment. In the third embodiment, the voice identification/correction program 22 displays the pronunciation correction images illustrated in FIGS. 21 to 23 in addition to the images shown in FIGS. 16 and 19.

The second pronunciation correction image shown in FIG. 21 is displayed, for example, when the voice identification/correction program 22 determines that the user, after viewing the first pronunciation correction image (FIG. 19), has pronounced "read" correctly. It is used to check whether the user pronounces "r" and "l" correctly and distinctly. The second pronunciation correction image is not displayed until the user pronounces "read" correctly following the first pronunciation correction image, and it is displayed repeatedly until the user can correctly pronounce all of the words shown in it: "write", "raw", "long", and "light". The second pronunciation correction image also includes entries 5 and 6 of the correction information index (FIG. 17A), which were omitted from the first pronunciation correction image.

The third pronunciation correction image shown in FIG. 22 is displayed, for example, when the voice identification/correction program 22 determines that the user has correctly pronounced every word in the second pronunciation correction image (FIG. 21). It is used for more advanced practice in distinguishing "r" from "l". The third pronunciation correction image is not displayed until the user correctly pronounces all of the words shown in the second pronunciation correction image, and it is displayed repeatedly until the user can correctly pronounce all of the phrases shown in it: "write letters" and "great trouble". The third pronunciation correction image also includes entry 7 of the correction information index (FIG. 17A), which was omitted from the first pronunciation correction image.

The fourth pronunciation correction image shown in FIG. 23 is displayed, for example, when the voice identification/correction program 22 determines that the user has correctly pronounced all of the phrases in the third pronunciation correction image (FIG. 22). It is used to confirm that the user can now pronounce "r" and "l" distinctly. The fourth pronunciation correction image is not displayed until the user correctly pronounces all of the phrases shown in the third pronunciation correction image, and it is displayed repeatedly until the user can correctly pronounce the sentence shown in it: "The river rose several feet and finally overflowed its banks."
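The staged display rule described above (each image is shown only after the previous one is passed, and is repeated until every item in it is pronounced correctly) can be sketched as a simple gated loop. This is a hedged illustration: `run_stages` and the `is_correct` callback standing in for speech identification are hypothetical, passing a stage is assumed to mean getting every item right within one round, and `max_rounds` is an invented safety limit, since the text simply repeats the image.

```python
from typing import Callable, List

# Drill items for the r/l example, taken from FIGS. 19 and 21-23 as described above.
STAGES: List[List[str]] = [
    ["read"],                                  # first correction image (FIG. 19)
    ["write", "raw", "long", "light"],         # second (FIG. 21)
    ["write letters", "great trouble"],        # third (FIG. 22)
    ["The river rose several feet and finally overflowed its banks."],  # fourth (FIG. 23)
]


def run_stages(is_correct: Callable[[str], bool], max_rounds: int = 10) -> int:
    """Show each stage until every item in it is pronounced correctly in one
    round; a stage is shown only after the previous one has been passed.
    Returns how many stages were completed."""
    completed = 0
    for items in STAGES:
        for _ in range(max_rounds):
            # The user attempts every item of the stage in each round.
            results = [is_correct(item) for item in items]
            if all(results):
                completed += 1
                break
        else:
            # Hypothetical safety limit; the text simply repeats the image.
            break
    return completed
```

For example, a learner who never manages "long" completes only the first stage, so the third and fourth images are never shown.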

[Operation of voice identification/correction program 22]
The operation of the voice identification/correction program 22 in the third embodiment is described below. FIG. 24 is a flowchart showing the operation (S30) of the voice identification/correction program 22 (FIG. 15) in the third embodiment. FIG. 25 illustrates the learning item list created in the learning item listing process (S300) shown in FIG. 24.

As shown in FIG. 24, in step 300 (S300), the voice identification/correction program 22 compares correct words with similar words in the manner shown in FIG. 20 and finds the correct words that match a similar word. It then determines which sounds in those correct words the user has difficulty pronouncing (for example, "r" and "th") and lists them as learning items, as illustrated in FIG. 25.
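Step S300 can be sketched as collecting, without duplicates, the sounds behind each detected confusion. The text does not say how the troublesome sound is derived from a matching correct word, so this sketch assumes a hypothetical lookup table `CONFUSIONS` keyed by (correct word, heard word); the pairs and labels are illustrative only, and `list_learning_items` is an invented name.

```python
from typing import Dict, List, Tuple

# Hypothetical: (correct word, word the recognizer heard) -> sound to practice.
# Stands in for information a real system would derive from its similar word
# records; these pairs and labels are illustrative only.
CONFUSIONS: Dict[Tuple[str, str], str] = {
    ("read", "lead"): "r",
    ("right", "light"): "r",
    ("think", "sink"): "th",
}


def list_learning_items(attempts: List[Tuple[str, str]]) -> List[str]:
    """S300: for each (correct word, heard word) pair in which the user was
    heard saying a confusable word, record the troublesome sound once, in
    order of first appearance (cf. the learning item list of FIG. 25)."""
    items: List[str] = []
    for correct, heard in attempts:
        sound = CONFUSIONS.get((correct, heard))
        if sound is not None and sound not in items:
            items.append(sound)
    return items
```

Deduplicating by sound means that repeated "r"/"l" confusions across different words produce a single learning item, which is then practiced with the dedicated correction images.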

In step 302 (S302), the voice identification/correction program 22 determines whether any of the learning items listed in S300 remain unlearned. If so, it proceeds to S304; otherwise, it ends the processing.

In step 304 (S304), the voice identification/correction program 22 takes out one of the learning items listed in S300. In step 306 (S306), it displays the pronunciation instruction image (FIG. 16) and the pronunciation correction image (FIG. 19) as in the second embodiment, and has the user practice the pronunciation.

In step 308 (S308), the voice identification/correction program 22 identifies the user's speech. If the word obtained as the identification result matches the candidate word, it determines that the user pronounced the word correctly and proceeds to S310; otherwise, it determines that the user did not pronounce the word correctly and returns to S306.

In step 310 (S310), the voice identification/correction program 22 determines whether any advanced exercises (the second to fourth pronunciation correction images; FIGS. 21 to 23) remain that have not yet been used for learning. If so, it proceeds to S312; otherwise, it returns to S302.

In step 312 (S312), the voice identification/correction program 22 displays one of the second to fourth pronunciation correction images and has the user practice the pronunciation. In step 314 (S314), it determines whether the user pronounced correctly; if so, it proceeds to S316, and otherwise it returns to S312.

In step 316 (S316), the voice identification/correction program 22 determines whether the user's pronunciation contains errors other than those covered by the learning items listed in S300. If it does, the program proceeds to S318; otherwise, it returns to S310.

In step 318 (S318), the voice identification/correction program 22 adds the pronunciation errors found in S316 to the learning items and returns to S310.
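The whole flowchart S30 (S302 to S318) can be sketched as a work-list loop in which newly discovered errors are appended to the pending learning items. This is a minimal sketch, assuming the user interaction is abstracted into two hypothetical callbacks (`basic_drill` for S306 to S308 and `exercise_drill` for S312 to S316) and that advanced exercises are grouped per learning item in a hypothetical `exercises` mapping. Skipping sounds already pending or studied is an added assumption that keeps the loop finite; the flowchart itself does not state it.

```python
from typing import Callable, Dict, List, Tuple

BasicDrill = Callable[[str], bool]                             # S306-S308
ExerciseDrill = Callable[[str, str], Tuple[bool, List[str]]]   # S312-S316


def run_learning(
    learning_items: List[str],
    basic_drill: BasicDrill,
    exercise_drill: ExerciseDrill,
    exercises: Dict[str, List[str]],
) -> List[str]:
    """Sketch of flowchart S30 (FIG. 24); returns the items in study order.

    basic_drill(item) shows FIGS. 16/19 and returns True if pronounced correctly.
    exercise_drill(item, exercise) shows one of FIGS. 21-23 and returns
    (pronounced correctly?, other pronunciation errors detected).
    """
    pending = list(learning_items)  # list produced by S300
    studied: List[str] = []
    while pending:                  # S302: items left?
        item = pending.pop(0)       # S304: take one item
        studied.append(item)
        while not basic_drill(item):    # S306-S308: repeat until correct
            pass
        for exercise in exercises.get(item, []):  # S310: exercises left?
            while True:                            # S312-S314
                ok, other_errors = exercise_drill(item, exercise)
                if ok:
                    break
            for error in other_errors:             # S316-S318
                # Assumption: skip sounds already listed or studied.
                if error not in pending and error not in studied:
                    pending.append(error)
    return studied
```

For example, if practicing "write letters" for the "r" item reveals a "th" error, "th" is appended to the pending items and studied after "r" is finished, which is the effect of S318 returning control through S310 to S302.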

As described above, the voice identification apparatus and method according to the present invention, made in view of the problems of the prior art described above, can identify each word contained in speech in a given language spoken by a speaker whose native language is not that language (a non-native speaker), replace it with the word in that language that the speaker intended, and thereby produce accurate text data.

In addition, the voice identification apparatus and method according to the present invention can convert speech into the words the speaker intended and produce accurate text data even when the pronunciation of the same language varies, for example between the regions where it is spoken, whatever region the speaker comes from. They can also compensate for individual differences in pronunciation and thereby maintain a consistently high recognition rate.

Furthermore, the pronunciation correction apparatus and method according to the present invention can point out problems in a speaker's pronunciation using data obtained during the processing of the voice identification apparatus and method described above, and can have the speaker learn a native speaker's pronunciation so as to correct their own. They can also automatically compare the speaker's pronunciation with the correct pronunciation to point out errors, and can present detailed information on how the speaker should correct their pronunciation.

FIG. 1 is a diagram illustrating the configuration of a computer that implements the voice identification process according to the present invention.
FIG. 2 is a diagram showing the configuration of software that implements the voice identification process according to the present invention.
FIG. 3 is a diagram showing the configuration of the voice identification program shown in FIG. 2.
FIG. 4 is a diagram illustrating data contained in the index table of the word data.
FIG. 5 is a diagram illustrating data contained in a word record of the word data.
FIG. 6 is a diagram illustrating data contained in a similar word record of the word data.
FIG. 7 is a diagram illustrating the error information code table of the word data.
FIG. 8 is a diagram showing the data structure of the input record (InWord) that the candidate word creation unit outputs to the similar word addition unit.
FIG. 9 is a diagram showing the data structure of the input record matrix (InMatrix) that the candidate word creation unit outputs to the similar word addition unit.
FIG. 10 is a diagram showing the data structure of the output record (OutWord) that the similar word addition unit outputs to the narrowing unit.
FIG. 11 is a diagram showing the data structure of the output record matrix (OutMatrix) that the similar word addition unit outputs to the narrowing unit.
FIG. 12 is a flowchart showing the voice identification process according to the present invention in the computer.
FIG. 13 is a diagram showing the configuration of a computer that implements the voice identification process and the pronunciation correction method according to the present invention.
FIG. 14 is a diagram showing software that implements the voice identification process and the pronunciation correction method according to the present invention.
FIG. 15 is a diagram showing the configuration of the voice identification/correction program shown in FIG. 14.
FIG. 16 is a diagram illustrating a pronunciation instruction image displayed by the control unit shown in FIG. 15.
FIGS. 17A and 17B are diagrams illustrating correction information indexes generated by the word database unit shown in FIG. 15; FIG. 17A relates to the pronunciation correction image for correcting the pronunciation of "r", and FIG. 17B to the one for correcting the pronunciation of "l".
FIG. 18 is a diagram showing a similar word record generated by the word database unit in the second embodiment.
FIG. 19 is a diagram illustrating the first pronunciation correction image indicated by the correction information index illustrated in FIG. 17A.
FIG. 20 is a flowchart showing the process (S20) of the voice identification/correction program (FIG. 15) in the second embodiment.
FIG. 21 is a diagram illustrating the second pronunciation correction image used in the pronunciation learning method of the third embodiment.
FIG. 22 is a diagram illustrating the third pronunciation correction image used in the pronunciation learning method of the third embodiment.
FIG. 23 is a diagram illustrating the fourth pronunciation correction image used in the pronunciation learning method of the third embodiment.
FIG. 24 is a flowchart showing the operation (S30) of the voice identification/correction program (FIG. 15) in the third embodiment.
FIG. 25 is a diagram illustrating the learning item list created in the learning item listing process (S300) shown in FIG. 24.

Explanation of symbols

1, 2 ... Computer
10 ... Computer main body
100 ... Output device
102 ... Monitor
104 ... Speaker
106 ... Printer
120, 130 ... Input device
122 ... Microphone
124 ... Voice input board
126 ... Keyboard
128 ... Mouse
132 ... Image input board
110 ... Storage device
14, 20 ... Software
142 ... Hardware support unit
144 ... Audio device driver
146 ... Storage device driver
148 ... Operating system
150 ... Voice interface unit
152 ... Storage device interface unit
16 ... Voice identification program
160 ... Word database unit
162 ... Control unit
18 ... Voice identification unit
180 ... Vector data generation unit
182 ... Label creation unit
184 ... Candidate word creation unit
186 ... Similar word addition unit
188 ... Narrowing unit
22 ... Voice identification/correction program
24 ... Pronunciation correction program
240 ... Comparison unit
242 ... Pronunciation correction image display unit

Claims

A pronunciation correction apparatus comprising:
candidate word associating means for associating, with speech data representing a word, one or more word candidates (candidate words) obtained by identifying the speech data;
similar word associating means for associating, with each of the candidate words associated with the speech data, zero or more words (similar words) that may correspond to the pronunciation of that candidate word; and
pronunciation correction data output means for outputting, when the word represented by the speech data matches a similar word associated with one of the candidate words associated with the speech data, pronunciation correction data that corresponds to that similar word and serves to correct the pronunciation of the word represented by the speech data.

A pronunciation correction method comprising:
associating, with speech data representing a word, one or more word candidates (candidate words) obtained by identifying the speech data;
associating, with each of the candidate words associated with the speech data, zero or more words (similar words) that may correspond to the pronunciation of that candidate word; and
when the word represented by the speech data matches a similar word associated with one of the candidate words associated with the speech data, outputting pronunciation correction data that corresponds to that similar word and serves to correct the pronunciation of the word represented by the speech data.

A computer-readable recording medium storing a program for causing a computer to execute:
a candidate word associating step of associating, with speech data representing a word, one or more word candidates (candidate words) obtained by identifying the speech data;
a similar word associating step of associating, with each of the candidate words associated with the speech data, zero or more words (similar words) that may correspond to the pronunciation of that candidate word; and
a pronunciation correction data output step of outputting, when the word represented by the speech data matches a similar word associated with one of the candidate words associated with the speech data, pronunciation correction data that corresponds to that similar word and serves to correct the pronunciation of the word represented by the speech data.