JP2015011348A

JP2015011348A - Training and evaluation method for foreign language speaking ability using voice recognition and device for the same

Info

Publication number: JP2015011348A
Application number: JP2014126355A
Authority: JP
Inventors: Ki-Yon Park; Yun-Kuang Lee; Hyeon-Pe Jong
Original assignee: Electronics and Telecommunications Research Institute ETRI
Current assignee: Electronics and Telecommunications Research Institute ETRI
Priority date: 2013-06-26
Filing date: 2014-06-19
Publication date: 2015-01-19
Also published as: KR20150001189A

Abstract

PROBLEM TO BE SOLVED: To provide a method for training and evaluating foreign-language speaking ability using speech recognition, in which a user trains and evaluates his or her own foreign-language speaking ability by means of a computer or other measuring device. SOLUTION: The method includes: a step S310 of receiving a foreign-language speech signal uttered by a first user; a step S320 of recording the received speech signal; a step S330 of playing back the recorded speech signal to a second user; a step S340 of receiving, from the second user, text data corresponding to the played-back speech signal; a step S350 of comparing the received text data with the recorded speech signal to measure accuracy; and a step S360 of providing the first user with an evaluation result based on the measured accuracy.

Description

The present invention relates to a method for training and evaluating foreign-language speaking ability using speech recognition and, more particularly, to a method by which a user trains and evaluates his or her own foreign-language speaking ability using a computer or other measuring device.

As speaking ability has become more important in the teaching of many foreign languages, including English, demand for speaking instruction has grown. Large-scale speaking assessments such as the National English Ability Test (NEAT) are also becoming more common, which in turn is increasing demand for training and evaluation equipment.

FIG. 1 shows examples of conventional foreign-language speaking training or evaluation methods: (a) reading aloud a predetermined script, and (b) speaking freely while a speech recognition system automatically recognizes what was said.

First, as shown in FIG. 1(a), among current foreign-language speaking training and evaluation methods that use a computer and other auxiliary devices, when a script is given in advance, the user looks at the script and reads it aloud, producing a speech signal. The common approach has been to record this speech signal and quantitatively evaluate speaking ability by comparing the recording with the pre-stored script.

However, because the script is prepared in advance and merely read aloud, this differs from real speaking, which reduces the validity of the test.

To address this problem, as shown in FIG. 1(b), an automatic speech recognition system may instead be used to recognize what the user said and to evaluate speaking ability from the recognized content. With current speech recognition technology, however, recognition results are inaccurate, and even a small number of recognition errors can change the final evaluation result substantially, so accurate evaluation has not been possible.

JP 2008-242437 A

The present invention was made to solve the above problems. It provides a method in which the user first speaks freely without a script; the utterance is recorded and played back to the user either immediately or after a certain period; the user listens to it and types what he or she said using an input device such as a keyboard; and speaking ability is evaluated by comparing the typed text with the recorded utterance.

This process solves all of the problems described above. Moreover, because users compose the text while re-checking what they themselves said, their listening ability improves and they also become aware of their own pronunciation problems, which further enhances the educational effect.

To achieve the technical object of the present invention, one embodiment provides a method for training and evaluating foreign-language speaking ability using speech recognition, the method including: receiving a foreign-language speech signal uttered by a first user; recording the received speech signal; playing back the recorded speech signal to a second user; receiving, from the second user, text data corresponding to the played-back speech signal; comparing the received text data with the recorded speech signal to measure accuracy; and providing the first user with an evaluation result based on the measured accuracy.

Preferably, the first user is the person being evaluated in the foreign-language speaking ability training and evaluation using speech recognition, and the second user is the same user as the first user.

Preferably, in the step of playing back the recorded speech signal to the second user, the signal is provided to the second user immediately after the received speech signal is recorded, or after a predetermined period of n hours (where n is a positive real number).

Preferably, in the step of receiving text data corresponding to the played-back speech signal, either the full text corresponding to the speech signal is input, or correction data are input only for those portions of the text recognized by a speech recognition module that contain errors.
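Either input mode ends up as a single reference transcript that the later scoring steps can use. The sketch below is illustrative only: it assumes, hypothetically, that the speech recognition module returns a list of words and that the user's correction data arrive as non-overlapping `(start, end, replacement_words)` spans over that list. The name `build_reference_text` is not from the patent.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical correction format: replace words[start:end] with new words.
Correction = Tuple[int, int, List[str]]


def build_reference_text(
    asr_words: Optional[Sequence[str]],
    full_text: Optional[str] = None,
    corrections: Sequence[Correction] = (),
) -> List[str]:
    """Return the word sequence used as the reference for scoring.

    Either the user typed the whole utterance (full_text), or the user
    corrected only the erroneous spans of the ASR hypothesis.
    """
    if full_text is not None:
        return full_text.split()
    if asr_words is None:
        raise ValueError("need either full_text or an ASR hypothesis")
    words = list(asr_words)
    # Apply spans right to left so earlier indices stay valid after each splice
    # (assumes the spans do not overlap).
    for start, end, replacement in sorted(corrections, key=lambda c: c[0], reverse=True):
        if not (0 <= start <= end <= len(words)):
            raise ValueError(f"bad correction span {start}:{end}")
        words[start:end] = replacement
    return words
```

For example, if the recognizer heard "I want too go home", the user only needs to supply `(2, 3, ["to"])` instead of retyping the sentence.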

Preferably, the step of measuring accuracy includes: converting the recorded speech signal from an analog speech signal into speech data, i.e., a digital speech signal; converting the received text data into a pronunciation string, i.e., a character string representing the text in phonetic symbols; aligning the converted pronunciation string with the converted speech data; and comparing the aligned pronunciation string with the speech data to quantitatively measure accuracy per unit of the pronunciation string.

More preferably, the step of quantitatively measuring accuracy uses the phonetic features of the pronunciation string and measures accuracy according to whether those phonetic features are present in the speech data.

More preferably, the step of quantitatively measuring accuracy defines the signal corresponding to each phoneme of the pronunciation string as a model and measures accuracy by computing, as a score, the difference between the speech data and the defined model.

To achieve the technical object of the present invention, another embodiment provides an apparatus for training and evaluating foreign-language speaking ability using speech recognition, the apparatus including: a speech signal receiving unit that receives a foreign-language speech signal uttered by a first user; a recording unit that records the received speech signal; a speech signal playback unit that plays back the recorded speech signal to a second user; a text receiving unit that receives, from the second user, text data corresponding to the played-back speech signal; an accuracy measuring unit that compares the received text data with the recorded speech signal to measure accuracy; and an evaluation unit that provides the first user with an evaluation result based on the measured accuracy.

Preferably, the speech signal playback unit provides the signal to the second user immediately after the received speech signal is recorded, or after a predetermined period of n hours (where n is a positive real number).

Preferably, the text receiving unit receives either the full text corresponding to the played-back speech signal, or correction data only for those portions of the text recognized by a speech recognition module that contain errors.

Preferably, the accuracy measuring unit includes: an AD converter that converts the recorded speech signal from an analog speech signal into speech data, i.e., a digital speech signal; a pronunciation string conversion unit that converts the received text data into a pronunciation string, i.e., a character string representing the text in phonetic symbols; an alignment unit that aligns the converted pronunciation string with the converted speech data; and a sub-accuracy measuring unit that compares the aligned pronunciation string with the speech data to quantitatively measure accuracy per unit of the pronunciation string.

More preferably, the sub-accuracy measuring unit uses the phonetic features of the pronunciation string and measures accuracy according to whether those phonetic features are present in the speech data.

More preferably, the sub-accuracy measuring unit defines the signal corresponding to each phoneme of the pronunciation string as a model and measures accuracy by computing, as a score, the difference between the speech data and the defined model.

With the method for training and evaluating foreign-language speaking ability using speech recognition according to the present invention, the user is guided to speak freely rather than read a predetermined script, so real speaking ability can be trained and evaluated. In addition, because the text entered by the user is automatically compared with the uttered speech signal, evaluation errors caused by malfunctions of existing speech recognition systems can be prevented.

Furthermore, because speech recognition technology can be used during evaluation to align the utterance with the character string phoneme by phoneme, fluency, pronunciation, and other aspects of the user's speech can be evaluated more accurately.

FIG. 1 shows examples of conventional foreign-language speaking training or evaluation methods: (a) reading aloud a predetermined script, and (b) speaking freely while a speech recognition system automatically recognizes what was said.
FIG. 2 is an exemplary diagram in which a user speaks freely using a foreign-language speaking ability training and evaluation apparatus according to an embodiment of the present invention and is provided with an evaluation result.
FIG. 3 is a flowchart of a method for training and evaluating foreign-language speaking ability using speech recognition according to an embodiment of the present invention.
FIG. 4 is a functional block diagram of an apparatus for training and evaluating foreign-language speaking ability using speech recognition according to an embodiment of the present invention.
FIG. 5 illustrates a method of measuring accuracy by comparing recorded speech data with text data according to an embodiment of the present invention.

Embodiments of the present invention are described in detail below with reference to the accompanying drawings so that those of ordinary skill in the art to which the invention pertains can readily carry them out. The invention may, however, be embodied in many different forms and is not limited to the embodiments described here.

To describe the invention clearly, parts unrelated to the description are omitted from the drawings, and identical or similar parts are given identical or similar reference numerals throughout the specification.

Throughout this specification, when a part is said to "include" a component, this means that, unless stated otherwise, it may further include other components rather than excluding them.

In assigning reference numerals to the components in each drawing, identical components are given the same numerals wherever possible, even when they appear in different drawings. In describing the present invention, detailed descriptions of related known configurations or functions are omitted where they would obscure the gist of the invention.

Terms such as first, second, A, B, (a), and (b) may be used in describing the components of the present invention. These terms only distinguish one component from another and do not limit the nature, sequence, or order of the component. When a component is described as being "linked," "coupled," or "connected" to another component, it should be understood that the component may be directly linked or connected to the other component, or that yet another component may be "linked," "coupled," or "connected" between them.

The present invention relates to a method and apparatus for training and evaluating foreign-language speaking ability using a speech recognition system. When foreign-language utterances are evaluated automatically by converting what the user said into text with speech recognition technology, the user who spoke also enters the content of the utterance as text using an input device such as a keyboard, to compensate for speech recognition errors. The entered text data are compared with the recorded speech data to evaluate the user's foreign-language ability, and the accurate evaluation result is provided to the user so that the user can train on his or her own.

Below, the method for training and evaluating foreign-language speaking ability using speech recognition according to the present invention is described in more detail with reference to the drawings.

FIG. 2 is an exemplary diagram in which a user speaks freely using a foreign-language speaking ability training and evaluation apparatus according to an embodiment of the present invention and receives an evaluation result.

As briefly mentioned above, conventional foreign-language speaking evaluation systems based on speech recognition either have the user read a predetermined sentence and evaluate the user's pronunciation, intonation, and so on, or, when the user speaks freely, attempt automatic speech recognition and evaluate whether the recognized result conforms to the grammar and usage of the foreign language.

The former cannot evaluate the ability to speak freely, and the latter cannot provide consistent evaluation because of errors in the automatic speech recognition system.

Therefore, as shown in FIG. 2, in the present invention the user first speaks freely, and the training and evaluation apparatus records and plays back the utterance. After listening to the recording, the user converts the sentences into text and enters them directly. Based on this text, the evaluation apparatus measures accuracy against the previously recorded speech data and provides the result of the user's speaking ability evaluation.

With this method, the user can practice foreign-language speaking freely without a fixed script, while avoiding evaluation errors caused by malfunctions of existing automatic speech recognition systems.

The operation is described in more detail below with reference to FIGS. 3 and 4.

FIG. 3 is a flowchart of a method for training and evaluating foreign-language speaking ability using speech recognition according to an embodiment of the present invention.

Referring to FIG. 3, the training and evaluation method of the present invention includes: receiving a foreign-language speech signal uttered by a first user (S310); recording the received speech signal (S320); playing back the recorded speech signal to a second user (S330); receiving, from the second user, text data corresponding to the played-back speech signal (S340); comparing the received text data with the recorded speech signal to measure accuracy (S350); and providing the first user with an evaluation result based on the measured accuracy (S360).
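The flow of FIG. 3 can be sketched as a single function. This is a minimal illustration under stated assumptions, not the patent's implementation: the capture, playback, transcription, and scoring steps are passed in as hypothetical callables, and the optional delay before playback ("n hours") is modeled with `time.sleep`, which a real system would more likely replace with a scheduler.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Evaluation:
    transcript: str  # text entered by the second user (S340)
    score: float     # accuracy measured in S350


def run_session(
    capture: Callable[[], List[float]],
    play: Callable[[List[float]], None],
    transcribe: Callable[[], str],
    measure: Callable[[List[float], str], float],
    delay_hours: float = 0.0,
) -> Evaluation:
    """Run S310 to S360 once; every callable is a hypothetical stand-in."""
    samples = capture()                # S310: speech from the first user
    recording = list(samples)          # S320: keep a copy as the recording
    if delay_hours > 0:                # S330: play back immediately or after n hours
        time.sleep(delay_hours * 3600)
    play(recording)
    text = transcribe()                # S340: second user types what was heard
    score = measure(recording, text)   # S350: compare the text with the recording
    return Evaluation(text, score)     # S360: result returned for the first user
```

Because the second user is only a callable here, passing a function that queries a speech recognizer as `transcribe` would correspond to the case where the second user is an automated device.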

The first user and the second user, described separately here, may be the same user or different users. The second user may also be an automated device (e.g., an automatic speech recognition system).

Typically, in evaluating English speaking ability, the user (i.e., the person being evaluated) speaks freely in a given setting, and an evaluator records the speech, listens to it afterward, and quantitatively judges how fluently the user spoke.

The present invention has a computer or other automated device carry out, automatically, the step of listening to and evaluating the recorded audio file in this process. The process can be broadly divided into five steps: (1) the person being evaluated speaks freely according to a given situation or task; (2) the utterance is recorded; (3) the recording is played back to the person being evaluated; (4) the person being evaluated listens to the recording and converts it into text, entering it with an input device such as a keyboard; and (5) the entered text is compared with the recorded data to evaluate speaking ability automatically.

In step (4), in which the person being evaluated listens to the recording and converts it into text with an input device such as a keyboard, the person may write the entire text from beginning to end, or may correct only the erroneous portions of the content recognized by a speech recognition module.

Going through these steps not only gives users a chance to re-check what they said but also prevents recognition and conversion errors in the overall speech recognition system, which greatly improves the accuracy and consistency of evaluating speaking ability itself, such as pronunciation and intonation.

The transfer of signals and data between the modules that make up the training and evaluation apparatus is described in detail below.

FIG. 4 is a functional block diagram of an apparatus for training and evaluating foreign-language speaking ability using speech recognition according to an embodiment of the present invention.

Referring to FIG. 4, the training and evaluation apparatus 400 of the present invention includes: a speech signal receiving unit 410 that receives a foreign-language speech signal uttered by a first user; a recording unit 420 that records the received speech signal; a speech signal playback unit 430 that plays back the recorded speech signal to a second user; a text receiving unit 440 that receives, from the second user, text data corresponding to the played-back speech signal; an accuracy measuring unit 450 that compares the received text data with the recorded speech signal to measure accuracy; and an evaluation unit 460 that provides the first user with an evaluation result based on the measured accuracy.

The data flow between the modules of the training and evaluation apparatus 400 is as follows. First, speech (voice) uttered in a foreign language by the person being evaluated is input to the speech signal receiving unit 410.

The speech signal receiving unit 410 passes the signal to the recording unit 420, which records it and passes the recording to the accuracy measuring unit 450 for subsequent accuracy measurement and evaluation.

Meanwhile, the recorded speech signal is played back to the person being evaluated by the speech signal playback unit 430, either immediately after recording or after a predetermined period of n hours (where n is a positive real number).

As described above, the recorded speech signal may be provided to the person being evaluated, or it may be provided to another user (or a speech recognition system) and then used as the basis for evaluation.

The person being evaluated then listens to the recorded voice and enters the script text, either by entering the full text corresponding to the played-back speech signal or by entering correction data only for the erroneous portions of the full text recognized by the speech recognition system. The text receiving unit 440 in the training and evaluation apparatus 400 receives this text.

The text receiving unit 440 then passes the text data to the accuracy measuring unit 450, where it serves as the basis for evaluating the speech signal.

The accuracy measuring unit 450 may include: an AD converter 451 that converts the recorded speech signal from an analog speech signal into speech data, i.e., a digital speech signal; a pronunciation string conversion unit 453 that converts the received text data into a pronunciation string, i.e., a character string representing the text in phonetic symbols; an alignment unit 452 that aligns the converted pronunciation string with the converted speech data (forced alignment); and a sub-accuracy measuring unit 454 that compares the aligned pronunciation string with the speech data to quantitatively measure accuracy per unit of the pronunciation string.
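In practice the AD converter 451 is hardware, but its job (sampling, then uniform quantization) can be simulated to show what "speech data" means downstream. The sketch below assumes a 16 kHz sample rate, 16-bit signed PCM, and an analog input normalized to the range -1.0 to 1.0; these values and the name `ad_convert` are illustrative assumptions, not taken from the patent.

```python
from typing import Callable, List


def ad_convert(analog: Callable[[float], float], duration_s: float,
               sample_rate: int = 16000, bits: int = 16) -> List[int]:
    """Sample a continuous-time signal and quantize it to signed integers."""
    full_scale = 2 ** (bits - 1) - 1             # 32767 for 16-bit PCM
    n_samples = int(round(duration_s * sample_rate))
    out: List[int] = []
    for n in range(n_samples):
        x = analog(n / sample_rate)              # sample at t = n / fs
        x = max(-1.0, min(1.0, x))               # clip to the assumed input range
        out.append(int(round(x * full_scale)))   # uniform quantization
    return out
```

Clipping before quantization mirrors how a real converter saturates on over-range input instead of wrapping around.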

Finally, based on the measurement result of the accuracy measuring unit 450, the final evaluation of the utterance is fed back to the person being evaluated through the evaluation unit 460.

FIG. 5 illustrates a method of measuring accuracy by comparing recorded speech data with text data according to an embodiment of the present invention.

The following method can be used to measure and evaluate speaking ability from the speech signal uttered by the user (or the converted speech data) and the text data entered by the user.

First, the text data entered by the user are converted into a pronunciation string, that is, a character string that represents the English sentence in phonetic symbols.
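A minimal sketch of this conversion is a dictionary lookup. The mini-lexicon below is hypothetical (ARPAbet-style symbols with stress marks dropped); a real system would use a full pronunciation dictionary and fall back to a grapheme-to-phoneme model for unknown words, which is only signaled here by raising `KeyError`.

```python
from typing import Dict, List

# Hypothetical mini pronunciation lexicon (ARPAbet-style, no stress marks).
LEXICON: Dict[str, List[str]] = {
    "a": ["AH"],
    "big": ["B", "IH", "G"],
    "dog": ["D", "AO", "G"],
    "good": ["G", "UH", "D"],
    "i": ["AY"],
    "saw": ["S", "AO"],
}


def text_to_phones(text: str, lexicon: Dict[str, List[str]] = LEXICON) -> List[str]:
    """Convert typed text into a flat pronunciation string (list of phones)."""
    phones: List[str] = []
    for raw in text.split():
        word = raw.strip(".,!?;:\"'").lower()   # drop surrounding punctuation
        if not word:
            continue
        if word not in lexicon:
            raise KeyError(f"no pronunciation for {word!r}; a G2P model would be needed")
        phones.extend(lexicon[word])
    return phones
```

For instance, "I saw a big dog." becomes the ten-phone string AY S AO AH B IH G D AO G.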

Next, this pronunciation string is aligned with the speech data uttered by the user (forced alignment).

This process uses a general speech recognition system to match precisely the interval of the speech data that corresponds to each part of the pronunciation string.
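Forced alignment is usually solved with Viterbi dynamic programming. The sketch below is a simplified version under stated assumptions: a recognizer has already produced `scores[t][j]`, the log-likelihood of frame t under the j-th phone of the pronunciation string, and each phone is collapsed into a single left-to-right state that occupies at least one frame. Production aligners typically use multi-state HMMs per phone, so treat this as an illustration of the idea rather than the patent's alignment unit 452.

```python
import math
from typing import List, Sequence, Tuple


def force_align(scores: Sequence[Sequence[float]]) -> List[Tuple[int, int]]:
    """Monotonic Viterbi alignment of phones to frames.

    scores[t][j] is the log-likelihood of frame t under the j-th phone.
    Returns a half-open frame span (start, end) for every phone, in order.
    """
    if not scores or not scores[0]:
        raise ValueError("empty score matrix")
    T, N = len(scores), len(scores[0])
    if T < N:
        raise ValueError("fewer frames than phones")
    NEG = -math.inf
    best = [[NEG] * N for _ in range(T)]
    came_from = [[0] * N for _ in range(T)]  # 0 = stayed in phone j, 1 = advanced from j-1
    best[0][0] = scores[0][0]                # the path must start in the first phone
    for t in range(1, T):
        for j in range(N):
            stay = best[t - 1][j]
            advance = best[t - 1][j - 1] if j > 0 else NEG
            if advance > stay:
                best[t][j], came_from[t][j] = advance + scores[t][j], 1
            else:
                best[t][j], came_from[t][j] = stay + scores[t][j], 0
    # Backtrack from the last frame, which must lie in the last phone.
    bounds = [0] * (N + 1)
    bounds[N] = T
    j = N - 1
    for t in range(T - 1, 0, -1):
        if came_from[t][j] == 1:             # phone j began at frame t
            bounds[j] = t
            j -= 1
    return [(bounds[k], bounds[k + 1]) for k in range(N)]
```

Because the typed text fixes the phone order, the search only decides where each boundary falls, which is why a correct transcript makes this step far more reliable than free recognition.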

Because the user has accurately converted the content of the utterance into text (a character string) himself or herself, as described above, errors in this alignment process can be greatly reduced.

Once the pronunciation string and the speech signal are aligned, the speech signal can be analyzed per unit of the pronunciation string, making it possible to measure quantitatively how accurately the user pronounced it.

In this step, the phonetic features of the pronunciation string can be used to measure accuracy, for example by checking whether each feature is present in the speech signal. For voiced sounds such as /b/, /d/, and /g/, whether the characteristics of voicing appear in the speech signal can be used.
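One common way to test for voicing is periodicity: voiced speech has a strong autocorrelation peak at the pitch period. The sketch below uses that as a stand-in for the patent's unspecified feature test; the 60 to 400 Hz pitch range, the 16 kHz sample rate, the 0.3 threshold, and the function names are all illustrative assumptions.

```python
from typing import Optional, Sequence

# Phones checked here for the voicing feature (ARPAbet-style symbols).
VOICED_STOPS = {"B", "D", "G"}


def voicing_strength(segment: Sequence[float], sample_rate: int = 16000,
                     f0_min: float = 60.0, f0_max: float = 400.0) -> float:
    """Peak normalized autocorrelation over lags in an assumed pitch range."""
    n = len(segment)
    if n == 0:
        return 0.0
    mean = sum(segment) / n
    x = [s - mean for s in segment]          # remove DC offset
    energy = sum(v * v for v in x)
    if energy == 0.0:
        return 0.0                           # silence carries no voicing
    lo = max(1, int(sample_rate / f0_max))   # shortest pitch period, in samples
    hi = min(int(sample_rate / f0_min), n - 1)
    best = 0.0
    for lag in range(lo, hi + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        best = max(best, r / energy)
    return best


def voicing_feature_present(phone: str, segment: Sequence[float],
                            sample_rate: int = 16000,
                            threshold: float = 0.3) -> Optional[bool]:
    """True/False for voiced stops; None when voicing is not the feature checked."""
    if phone not in VOICED_STOPS:
        return None
    return voicing_strength(segment, sample_rate) >= threshold
```

The `segment` argument would be the samples of the interval that forced alignment assigned to the phone.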

In another method, instead of using individual phonetic features, the speech signal corresponding to each phoneme is itself defined as a model, and the difference between the current user's signal and the stored model is computed as a score.
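A simple way to realize "a model per phoneme" is one Gaussian per phone over acoustic feature frames (for example MFCC vectors), scoring a segment by its average log-likelihood. The diagonal-Gaussian form, the feature type, and the names `PhoneModel` and `segment_score` are assumptions for illustration; the patent does not specify the model.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PhoneModel:
    mean: Sequence[float]
    var: Sequence[float]  # diagonal covariance; every entry must be > 0

    def log_likelihood(self, frame: Sequence[float]) -> float:
        """Log density of one feature frame under a diagonal Gaussian."""
        return -0.5 * sum(
            math.log(2 * math.pi * v) + (x - m) ** 2 / v
            for x, m, v in zip(frame, self.mean, self.var)
        )


def segment_score(frames: Sequence[Sequence[float]], model: PhoneModel) -> float:
    """Average per-frame log-likelihood; higher means closer to the model."""
    if not frames:
        raise ValueError("empty segment")
    return sum(model.log_likelihood(f) for f in frames) / len(frames)
```

Averaging per frame keeps long and short phones on a comparable scale; a larger distance from the stored model shows up as a lower score.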

Referring to FIG. 5, the input text 520 is converted into a pronunciation string 530 and aligned with the recorded voice data 510. Then, in each section corresponding to a unit of the pronunciation string (541b to 543b), the difference from the corresponding voice signal model (541a to 543a) is calculated as a score, and the scores are totaled to calculate the evaluation result 550.
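The totaling step of FIG. 5 might be sketched as follows. Each `UnitScore` stands for one aligned section (541b to 543b) together with its score against the corresponding model (541a to 543a). Reporting the mean and the weakest unit alongside the total is an assumption added here for feedback to the first user, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UnitScore:
    phone: str    # unit of the pronunciation string, e.g. "B"
    start: int    # first frame of the aligned section
    end: int      # one past the last frame of the section
    score: float  # per-unit score from the comparison step


def aggregate(units: list[UnitScore]) -> dict:
    """Total the per-unit scores into an overall evaluation result."""
    if not units:
        raise ValueError("no aligned units to evaluate")
    total = sum(u.score for u in units)
    weakest = min(units, key=lambda u: u.score)  # lowest-scoring section
    return {
        "total": total,
        "mean": total / len(units),
        "weakest_phone": weakest.phone,
        "weakest_section": (weakest.start, weakest.end),
    }


result = aggregate([
    UnitScore("B", 0, 4, -1.0),
    UnitScore("IH", 4, 9, -4.0),
    UnitScore("G", 9, 12, -2.0),
])
print(result["total"], result["weakest_phone"])
```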

Thus, the foreign language speaking ability training and evaluation method according to the present invention guides the user to speak freely rather than read a prescribed script, so that actual speaking ability can be trained and evaluated. In addition, the sentence input by the user is automatically compared with the uttered voice signal, which prevents evaluation errors caused by malfunctions of automatic speech recognition.

In addition, applying a method that uses speech recognition technology to align the utterance content with the character string in units of phonemes makes it possible to evaluate the fluency and pronunciation of the user's utterance more accurately.

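The description does not define the fluency measures. As one hypothetical example, a phoneme-level alignment that also labels pauses (here with the assumed label `sil`) yields timing statistics such as articulation rate and pause ratio. The metric choices, the 0.5 s long-pause cutoff, and the function name `fluency_metrics` are illustrative only.

```python
def fluency_metrics(segments: list[tuple[str, float, float]],
                    pause_label: str = "sil",
                    long_pause_s: float = 0.5) -> dict[str, float]:
    """Timing-based fluency statistics from a phoneme-level alignment.

    segments: (label, start_s, end_s) tuples in time order; pauses are
    assumed to carry pause_label.
    """
    speech = [e - s for label, s, e in segments if label != pause_label]
    pauses = [e - s for label, s, e in segments if label == pause_label]
    if not speech:
        raise ValueError("alignment contains no speech segments")
    total_time = segments[-1][2] - segments[0][1]
    return {
        # phonemes produced per second of actual speech (pauses excluded)
        "articulation_rate": len(speech) / sum(speech),
        # share of the whole utterance spent silent
        "pause_ratio": sum(pauses) / total_time,
        # number of pauses at or above the long-pause cutoff
        "long_pauses": float(sum(1 for d in pauses if d >= long_pause_s)),
    }


segs = [("AY", 0.0, 0.2), ("sil", 0.2, 0.8), ("L", 0.8, 0.9),
        ("AY", 0.9, 1.1), ("K", 1.1, 1.2)]
print(fluency_metrics(segs))
```
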
In the above description, all the components constituting the embodiment of the present invention have been described as being combined into one or operating in combination, but the present invention is not limited to such an embodiment. That is, within the scope of the object of the present invention, the components may be selectively combined into one or more units and operated. In addition, each of the components may be implemented as independent hardware, or some or all of the components may be selectively combined and implemented as a computer program having program modules that perform some or all of the combined functions on one or more pieces of hardware. The codes and code segments constituting the computer program can be easily inferred by those skilled in the art to which the present invention pertains. Such a computer program may be stored in a computer-readable storage medium and read and executed by a computer to implement the embodiments of the present invention. Storage media for the computer program may include magnetic recording media, optical recording media, and the like.

In addition, terms such as "include", "constitute", and "have" used above mean, unless otherwise stated, that the corresponding component may be present; they should therefore be construed as allowing other components to be further included rather than excluding them. All terms, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs, unless defined otherwise. Commonly used terms, such as those defined in dictionaries, should be construed as having meanings consistent with their contextual meaning in the related art, and are not to be construed in an idealized or overly formal sense unless explicitly so defined in the present invention.

The above description merely illustrates the technical idea of the present invention, and a person of ordinary skill in the art to which the present invention belongs will understand that various modifications and variations are possible without departing from the essential characteristics of the present invention. Accordingly, the embodiments disclosed herein are intended to explain, not to limit, the technical idea of the present invention, and the scope of that technical idea is not limited by these embodiments. The scope of protection of the present invention should be construed according to the claims, and all technical ideas within a scope equivalent thereto should be construed as falling within the scope of rights of the present invention.

Claims

1. A foreign language speaking ability training and evaluation method using speech recognition, the method comprising:
inputting a foreign language voice signal uttered by a first user;
recording the input voice signal;
reproducing the recorded voice signal and providing it to a second user;
receiving, from the second user, text data corresponding to the provided voice signal;
comparing the input text data with the recorded voice signal to measure accuracy; and
providing the first user with an evaluation result based on the measured accuracy.

2. The foreign language speaking ability training and evaluation method using speech recognition according to claim 1, wherein the first user is the person being evaluated in the foreign language speaking ability training and evaluation using speech recognition, and the second user is the same user as the first user.

3. The foreign language speaking ability training and evaluation method using speech recognition according to claim 1, wherein reproducing the recorded voice signal and providing it to the second user comprises:
providing the input voice signal to the second user immediately after recording or after a predetermined n hours (where n is a positive real number).

4. The foreign language speaking ability training and evaluation method using speech recognition according to claim 1, wherein receiving the text data corresponding to the provided voice signal comprises:
receiving either the entire text corresponding to the provided voice signal, or correction data for a partial text that contains an error within the text of the voice signal as recognized by a speech recognition module.

5. The foreign language speaking ability training and evaluation method using speech recognition according to claim 1, wherein measuring the accuracy comprises:
converting the recorded voice signal from an analog voice signal into voice data, which is a digital voice signal;
converting the input text data into a pronunciation string, which is a character string represented by phonetic symbols;
aligning the converted pronunciation string with the converted voice data; and
comparing the aligned pronunciation string and voice data to quantitatively measure the accuracy in units of the pronunciation string.

6. The foreign language speaking ability training and evaluation method using speech recognition according to claim 5, wherein quantitatively measuring the accuracy comprises:
measuring the accuracy based on whether a phonetic feature of the pronunciation string is included in the voice data.

7. The foreign language speaking ability training and evaluation method using speech recognition according to claim 5, wherein quantitatively measuring the accuracy comprises:
defining a signal corresponding to each phoneme of the pronunciation string as a model, and measuring the accuracy by calculating the difference between the voice data and the defined model as a score.

8. A foreign language speaking ability training and evaluation device using speech recognition, the device comprising:
a voice signal receiving unit to which a foreign language voice signal uttered by a first user is input;
a recording unit that records the input voice signal;
a voice signal reproduction unit that reproduces the recorded voice signal and provides it to a second user;
a text receiving unit to which text data corresponding to the provided voice signal is input by the second user;
an accuracy measurement unit that measures accuracy by comparing the input text data with the recorded voice signal; and
an evaluation unit that provides the first user with an evaluation result based on the measured accuracy.

9. The foreign language speaking ability training and evaluation device using speech recognition according to claim 8, wherein the first user is the person being evaluated in the foreign language speaking ability training and evaluation using speech recognition, and the second user is the same user as the first user.

10. The foreign language speaking ability training and evaluation device using speech recognition according to claim 8, wherein the voice signal reproduction unit
provides the input voice signal to the second user immediately after recording or after a predetermined n hours (where n is a positive real number).

11. The foreign language speaking ability training and evaluation device using speech recognition according to claim 8, wherein the text receiving unit
receives either the entire text corresponding to the provided voice signal, or correction data for a partial text that contains an error within the text of the voice signal as recognized by a speech recognition module.

12. The foreign language speaking ability training and evaluation device using speech recognition according to claim 8, wherein the accuracy measurement unit comprises:
an analog-to-digital (AD) converter that converts the recorded voice signal from an analog voice signal into voice data, which is a digital voice signal;
a pronunciation string conversion unit that converts the input text data into a pronunciation string, which is a character string represented by phonetic symbols;
an alignment unit that aligns the converted pronunciation string with the converted voice data; and
a sub-accuracy measurement unit that quantitatively measures the accuracy in units of the pronunciation string by comparing the aligned pronunciation string and voice data.

13. The foreign language speaking ability training and evaluation device using speech recognition according to claim 12, wherein the sub-accuracy measurement unit
measures the accuracy based on whether a phonetic feature of the pronunciation string is included in the voice data.

14. The foreign language speaking ability training and evaluation device using speech recognition according to claim 12, wherein the sub-accuracy measurement unit
defines a signal corresponding to each phoneme of the pronunciation string as a model and measures the accuracy by calculating the difference between the voice data and the defined model as a score.