JP2016045467A

JP2016045467A - Utterance evaluation device, utterance evaluation method and program

Info

Publication number: JP2016045467A
Application number: JP2014171913A
Authority: JP
Inventors: Soichi Sato; Yuya Fujita; Shoe Sato; Yoshiyuki Honda
Original assignee: Nippon Hoso Kyokai NHK
Current assignee: Japan Broadcasting Corp
Priority date: 2014-08-26
Filing date: 2014-08-26
Publication date: 2016-04-04
Anticipated expiration: 2034-08-26
Also published as: JP6366179B2

Abstract

PROBLEM TO BE SOLVED: To evaluate whether an utterance contains a reading error. SOLUTION: A reading error generation unit 161 of an utterance evaluation device 1 takes each word containing kanji (Chinese characters) in a sentence acquired from text data as a processing target word, and generates erroneous readings for it. A pronunciation dictionary update unit 162 updates a pronunciation dictionary by registering the processing target word, with identification information added, together with those erroneous readings. A language model update unit 163 acquires word connections from the text data and generates new connections by adding the identification information to the processing target words they contain. It then updates a language model based on those generated connections and a predetermined occurrence frequency assigned to them. A recognition processing unit 19 recognizes speech data using the updated pronunciation dictionary and language model. A user interface control unit 14 outputs a reading error when the speech recognition result contains a word to which the identification information has been added. SELECTED DRAWING: Figure 1

Description

The present invention relates to an utterance evaluation device, an utterance evaluation method, and a program.

There is a known technique for automatically rating the quality of human speech (see, for example, Patent Document 1). This technique uses an acoustic model, a language model, and a phoneme duration model, all generated in advance from native speakers' utterances. With these models it extracts pronunciation-related features, such as pronunciation accuracy, speaking rate, and fluency, from the speech data. Pronunciation is then evaluated per sentence and per word based on the extracted features.

Patent Document 1: JP 2006-84966 A

A conventional utterance evaluation device evaluates pronunciation against the sentence that is supposed to be uttered (the correct sentence). Pronunciation therefore cannot be evaluated unless the speaker utters exactly the correct sentence. Nor can the evaluation be performed without a native speaker's utterance of the correct sentence. Furthermore, the conventional technique evaluates pronunciation only. It does not determine whether a deviation from the correct sentence comes from poor pronunciation or from the speaker misreading the text. Likewise, no speech recognition model existed that could recognize misreadings.

The present invention has been made in view of these circumstances. It provides an utterance evaluation device, an utterance evaluation method, and a program that can evaluate whether an utterance contains a reading error.

One aspect of the present invention is an utterance evaluation device comprising the following units:
a pronunciation dictionary storage unit that stores a pronunciation dictionary associating words with their readings;
a language model storage unit that stores a language model representing how readily words connect to one another;
a reading error generation unit that takes, as a processing target word, each word containing kanji among the words of a sentence acquired from text data, and generates erroneous readings of that word based on the readings its kanji can take;
a pronunciation dictionary update unit that updates the pronunciation dictionary by registering the processing target word, with identification information indicating a reading error added, together with the erroneous readings the reading error generation unit generated for it;
a language model update unit that acquires word connections from the text data, generates new connections by adding the identification information to the processing target words they contain, and updates the language model based on those generated connections and a predetermined appearance frequency assigned to them;
a recognition processing unit that recognizes speech data based on the pronunciation dictionary updated by the pronunciation dictionary update unit and the language model updated by the language model update unit; and
an output unit that outputs a reading error when the recognition result of the recognition processing unit contains a word to which the identification information has been added.
According to this aspect, the utterance evaluation device takes each word containing kanji in a sentence acquired from text data as a processing target word and generates erroneous readings for it. It registers the processing target word, with identification information added, in the pronunciation dictionary together with those erroneous readings. When the device acquires word connections from the text data, it adds the identification information to the processing target words they contain. This produces word connections that include words assigned erroneous readings, and the device assigns a predetermined appearance frequency to them. The device updates the language model based on the generated word connections and the assigned appearance frequency. It then recognizes speech data using the updated pronunciation dictionary and language model. If the recognition result contains a word with the identification information added, the device outputs a reading error.
As a result, the utterance evaluation device can report a reading error when the utterance represented by the speech data contains one.

One aspect of the present invention is the utterance evaluation device described above, wherein the reading error generation unit takes, as the processing target word, a word that contains kanji and belongs to a predetermined part of speech, among the words of a sentence acquired from the text data.
According to this aspect, the utterance evaluation device assigns erroneous readings only to words that contain kanji and belong to the predetermined part of speech, among the words of the sentence represented by the text data.
As a result, the utterance evaluation device can detect reading errors in words of the predetermined part of speech.

One aspect of the present invention is the utterance evaluation device described above, wherein the language model update unit assigns an appearance frequency to each generated word connection. That frequency is based on the appearance frequency, calculated from the text data, of the same word connection before the identification information was added.
According to this aspect, the utterance evaluation device sets the appearance frequency of a misread word based on the appearance frequency of the correctly read word.
As a result, the utterance evaluation device makes misread words easier to recognize during speech recognition, even when the actual probability of a misread word is low.

One aspect of the present invention is an utterance evaluation method executed by an utterance evaluation device, comprising the following steps:
a reading error generation step of taking, as a processing target word, each word containing kanji among the words of a sentence acquired from text data, and generating erroneous readings of that word based on the readings its kanji can take;
a pronunciation dictionary update step of updating a pronunciation dictionary, which associates words with their readings, by registering the processing target word, with identification information indicating a reading error added, together with the erroneous readings generated for it in the reading error generation step;
a language model update step of acquiring word connections from the text data, generating new connections by adding the identification information to the processing target words they contain, and updating a language model, which represents how readily words connect to one another, based on those generated connections and a predetermined appearance frequency assigned to them;
a recognition processing step of recognizing speech data based on the pronunciation dictionary updated in the pronunciation dictionary update step and the language model updated in the language model update step; and
an output step of outputting a reading error when the recognition result of the recognition processing step contains a word to which the identification information has been added.

One aspect of the present invention is a program that causes a computer to function as an utterance evaluation device comprising the following means:
pronunciation dictionary storage means for storing a pronunciation dictionary associating words with their readings;
language model storage means for storing a language model representing how readily words connect to one another;
reading error generation means for taking, as a processing target word, each word containing kanji among the words of a sentence acquired from text data, and generating erroneous readings of that word based on the readings its kanji can take;
pronunciation dictionary update means for updating the pronunciation dictionary by registering the processing target word, with identification information indicating a reading error added, together with the erroneous readings the reading error generation means generated for it;
language model update means for acquiring word connections from the text data, generating new connections by adding the identification information to the processing target words they contain, and updating the language model based on those generated connections and a predetermined appearance frequency assigned to them;
recognition processing means for recognizing speech data based on the pronunciation dictionary updated by the pronunciation dictionary update means and the language model updated by the language model update means; and
output means for outputting a reading error when the recognition result of the recognition processing means contains a word to which the identification information has been added.

According to the present invention, it is possible to evaluate whether an utterance contains a reading error.

FIG. 1 is a functional block diagram showing the configuration of an utterance evaluation device according to an embodiment of the present invention.
FIG. 2 is a flowchart showing the overall processing of the utterance evaluation device according to the embodiment.
FIG. 3 is a flowchart showing the pronunciation dictionary and language model update processing of the utterance evaluation device according to the embodiment.
FIG. 4 is a flowchart showing the reading error indication processing of the utterance evaluation device according to the embodiment.
FIG. 5 is a diagram showing an example of the reading error indication screen displayed by the utterance evaluation device according to the embodiment.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
The utterance evaluation device according to an embodiment of the present invention assigns erroneous readings to training text data. It then adapts the models used for speech recognition to those erroneous readings, so that reading errors in an utterance become recognizable. The device can therefore evaluate whether the speech recognition result contains a reading error. For example, when an announcer or actor reads a predetermined text such as a script, the device uses speech recognition to point out reading errors automatically.

FIG. 1 is a functional block diagram showing the configuration of an utterance evaluation device 1 according to an embodiment of the present invention. Only the functional blocks relevant to this embodiment are shown. The utterance evaluation device 1 can be implemented on a computer. As shown in the figure, the utterance evaluation device 1 comprises:
an acoustic model storage unit 11;
a pronunciation dictionary storage unit 12;
a language model storage unit 13;
a user interface control unit 14;
a text data acquisition unit 15;
an update unit 16;
a speech data acquisition unit 17;
an acoustic feature extraction unit 18; and
a recognition processing unit 19.

The acoustic model storage unit 11 stores an acoustic model, which is data associating phonemes with their acoustic features. In this embodiment, the acoustic model is an HMM (Hidden Markov Model) acoustic model representing the statistical relationship between acoustic features and phonemes.
The pronunciation dictionary storage unit 12 stores a pronunciation dictionary, which is data representing the correspondence between words and their readings. In this embodiment, the pronunciation dictionary holds the relationship between characters or words and the phonemes of their readings.
The language model storage unit 13 stores a language model, which is data representing how readily words connect to one another. In this embodiment, the language model is an n-gram language model, which gives the statistical probability of the order in which sequences of n elements (characters or words) occur in the language.
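As an illustration of the n-gram language model described above, the following minimal Python sketch estimates bigram (n = 2) probabilities by maximum likelihood, P(w2 | w1) = C(w1, w2) / C(w1). It is an assumption for illustration only. Practical n-gram models add smoothing and back-off, which this sketch omits, and the function name `bigram_probabilities` is hypothetical.

```python
from collections import Counter


def bigram_probabilities(sentences):
    """Estimate P(w2 | w1) = C(w1, w2) / C(w1) by maximum likelihood (no smoothing)."""
    history_counts = Counter()  # C(w1): how often w1 is followed by any word
    bigram_counts = Counter()   # C(w1, w2)
    for words in sentences:
        for w1, w2 in zip(words, words[1:]):
            history_counts[w1] += 1
            bigram_counts[(w1, w2)] += 1
    return {pair: count / history_counts[pair[0]] for pair, count in bigram_counts.items()}


probs = bigram_probabilities([["象潟", "の", "海"], ["象潟", "の", "町"]])
print(probs[("の", "海")])  # 0.5
```

An unseen bigram is simply missing from the result. This corresponds to the near-zero probability that, as described later, makes misread words hard to recognize unless the model is adapted.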

The user interface control unit 14 is an example of an output unit that outputs information. It causes a display device, such as a monitor, to display screens. The text data acquisition unit 15 acquires text data.
The update unit 16 comprises a reading error generation unit 161, a pronunciation dictionary update unit 162, and a language model update unit 163.
The reading error generation unit 161 acquires sentences from the text data. If a sentence contains a word that includes kanji and belongs to a predetermined part of speech, the unit assigns erroneous readings to that word.
The pronunciation dictionary update unit 162 updates the pronunciation dictionary stored in the pronunciation dictionary storage unit 12 based on the erroneous readings assigned by the reading error generation unit 161.
The language model update unit 163 obtains n-grams and their appearance frequencies from the sentences in the text data. For each n-gram that contains a word to which the reading error generation unit 161 assigned erroneous readings, it generates a new n-gram in which that correctly read word is replaced by its misread counterpart. Each generated n-gram receives an appearance frequency based on that of the original n-gram before the replacement. The language model update unit 163 then updates the language model stored in the language model storage unit 13 based on the generated n-grams and the assigned appearance frequencies.

The speech data acquisition unit 17 acquires speech data of an utterance. The acoustic feature extraction unit 18 extracts acoustic features from the speech data acquired by the speech data acquisition unit 17. The recognition processing unit 19 performs speech recognition on the speech data. For this it uses the acoustic model in the acoustic model storage unit 11, the updated pronunciation dictionary in the pronunciation dictionary storage unit 12, the updated language model in the language model storage unit 13, and the acoustic features extracted by the acoustic feature extraction unit 18. The recognition processing unit 19 outputs recognition result data containing the recognition result to the user interface control unit 14. When the recognition result data contains a misread word, the user interface control unit 14 causes the display device to display information reporting the reading error.

FIG. 2 is a flowchart showing the overall processing of the utterance evaluation device 1.
First, the text data acquisition unit 15 of the utterance evaluation device 1 acquires text data. The update unit 16 acquires sentences from that text data. If a sentence contains a word that includes kanji and belongs to a predetermined part of speech, the update unit 16 assigns erroneous readings to that word. Based on those erroneous readings, it updates the pronunciation dictionary in the pronunciation dictionary storage unit 12 and the language model in the language model storage unit 13 (step S105).
The update proceeds as follows. The update unit 16 registers each erroneous reading in the pronunciation dictionary under a word to which identification information marking it as misread has been added. It also acquires n-grams from the sentences in the text data. When an acquired n-gram contains a word to which erroneous readings were assigned, the update unit 16 adds the reading error identification information to that word. This produces an n-gram in which the correctly read word is replaced by the misread word. The update unit 16 updates the language model with these n-grams containing misread words. Each such n-gram receives an appearance frequency based on that of the original n-gram, containing the correct word, from which it was generated.

After the pronunciation dictionary and language model have been updated, the speech data acquisition unit 17 acquires the input speech data. The acoustic feature extraction unit 18 extracts acoustic features from the speech data acquired by the speech data acquisition unit 17. The recognition processing unit 19 performs speech recognition on the input speech data. It uses the acoustic features extracted by the acoustic feature extraction unit 18, the acoustic model in the acoustic model storage unit 11, and the pronunciation dictionary and language model updated in step S105. The recognition processing unit 19 outputs recognition result data containing the recognition result to the user interface control unit 14. When the recognition result data contains a misread word bearing the identification information, the user interface control unit 14 causes the display device to display information reporting the reading error (step S110).
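The check performed in step S110 can be sketched as follows. The sketch assumes the recognition result is available as a list of word strings and that the identification information is the "※" prefix used in this embodiment. The function name `find_misreadings` and its return format are hypothetical.

```python
MISREAD_MARK = "※"  # identification information used in this embodiment


def find_misreadings(recognized_words):
    """Return (position, original word) for every recognized word that carries the misreading mark."""
    return [
        (i, word[len(MISREAD_MARK):])
        for i, word in enumerate(recognized_words)
        if word.startswith(MISREAD_MARK)
    ]


for position, word in find_misreadings(["※象潟", "の", "海"]):
    print(f"Possible misreading of {word} at word {position}")
```

Stripping the mark recovers the word as written in the script. A display such as the reading error indication screen can therefore show which written word was misread.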

FIG. 3 is a flowchart showing the pronunciation dictionary and language model update processing of the utterance evaluation device 1. It details the update processing of step S105 in FIG. 2.
First, the user interface control unit 14 causes the display device to display a screen for entering text data such as a script or manuscript. The text data acquisition unit 15 acquires the text data entered through input means (not shown), such as keyboard input or file input, and outputs it to the update unit 16 (step S205). In steps S210 to S250, the update unit 16 updates the pronunciation dictionary based on the input text data.

If the text data still contains a sentence that has not been acquired (step S210: NO), the reading error generation unit 161 of the update unit 16 acquires one such sentence (step S215). The reading error generation unit 161 performs morphological analysis on that sentence, dividing it into words and obtaining each word's part of speech (step S220). If the analysis yields no words from the sentence (step S225: NO), the processing returns to step S210.

If words can be obtained from the sentence acquired in step S215 (step S225: YES), the reading error generation unit 161 determines whether the sentence contains a noun (step S230). If the sentence contains no noun (step S230: NO), the processing returns to step S210.

If the sentence contains a noun (step S230: YES), the reading error generation unit 161 splits the noun into individual characters (step S235). If none of those characters is a kanji (step S240: NO), the processing returns to step S210.

If, on the other hand, the characters obtained by splitting the noun include a kanji (step S240: YES), the reading error generation unit 161 assigns erroneous readings to that noun (step S245). Hereinafter, a noun containing kanji is referred to as a "processing target word".
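Steps S230 to S240 can be sketched as a filter over morphological-analysis output. The sketch rests on two assumptions. First, the analyzer output has already been reduced to (surface form, part of speech) pairs, with nouns tagged "名詞" as in MeCab's IPADIC tag set. Second, "kanji" is approximated by the CJK Unified Ideographs block U+4E00 to U+9FFF, so, for example, the iteration mark "々" is not counted. The function names are hypothetical.

```python
def contains_kanji(word):
    # Approximation: treat CJK Unified Ideographs (U+4E00..U+9FFF) as kanji.
    return any("\u4e00" <= ch <= "\u9fff" for ch in word)


def processing_target_words(tokens, target_pos="名詞"):
    """tokens: (surface, part_of_speech) pairs from a morphological analyzer (step S220)."""
    targets = []
    for surface, pos in tokens:
        if pos != target_pos:           # step S230: keep only nouns
            continue
        if contains_kanji(surface):     # steps S235-S240: check each character for kanji
            targets.append(surface)
    return targets


tokens = [("象潟", "名詞"), ("の", "助詞"), ("ホテル", "名詞"), ("海", "名詞")]
print(processing_target_words(tokens))  # ['象潟', '海']
```

Making `target_pos` a parameter reflects the claim that the part of speech is "predetermined". The embodiment happens to use nouns.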

For example, the on'yomi (Sino-Japanese readings) and kun'yomi (native Japanese readings) of each kanji are stored in advance in a storage unit (not shown). This storage unit may be inside or outside the utterance evaluation device 1, or the pronunciation dictionary storage unit 12 may be used instead. The on'yomi and kun'yomi data may be represented as phonemes. The reading error generation unit 161 retrieves the on'yomi and kun'yomi of each kanji in the processing target word from the storage unit. Using those readings, it creates erroneous readings for each processing target word as follows.

That is, the reading error generation unit 161 forms every combination of the readings obtained for the kanji in the processing target word and takes these as candidate readings of the word. It then reads the correct reading of the processing target word from the pronunciation dictionary in the pronunciation dictionary storage unit 12. Every candidate other than the correct reading is treated as an erroneous reading.

Take the noun "象潟" (Kisakata) as an example. For "象", the retrieved readings are the on'yomi "しょう" (shō) and "ぞう" (zō) and the kun'yomi "かたち" (katachi) and "かたど（る）" (katado(ru)). For "潟", they are the on'yomi "せき" (seki) and the kun'yomi "かた" (kata). From all combinations of these readings, the reading error generation unit 161 creates "しょうせき" (shōseki), "しょうかた" (shōkata), "ぞうせき" (zōseki), "ぞうかた" (zōkata), … as readings of "象潟". The correct reading retrieved from the pronunciation dictionary is "きさかた" (kisakata), and every other reading is treated as an erroneous reading.
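The combination step in the "象潟" example can be sketched with `itertools.product`. The reading table below is a hypothetical stand-in for the storage unit. It holds readings in hiragana rather than phonemes and reduces かたど（る） to its stem かたど. Non-kanji characters are assumed to read as themselves.

```python
from itertools import product

# Hypothetical reading table standing in for the storage unit (hiragana, not phonemes).
KANJI_READINGS = {
    "象": ["しょう", "ぞう", "かたち", "かたど"],  # on'yomi, then kun'yomi (かたど(る) as stem)
    "潟": ["せき", "かた"],
}


def misreadings(word, correct_reading, readings=KANJI_READINGS):
    """All reading combinations for `word` except the correct one (step S245)."""
    per_char = [readings.get(ch, [ch]) for ch in word]  # non-kanji read as themselves
    candidates = {"".join(parts) for parts in product(*per_char)}
    candidates.discard(correct_reading)
    return sorted(candidates)


print(misreadings("象潟", "きさかた"))
```

With this table, the 4 × 2 combinations give eight erroneous readings for 象潟, including しょうせき and ぞうかた. The number of candidates grows multiplicatively with word length and with the number of readings per kanji.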

The pronunciation dictionary update unit 162 adds the erroneous readings created in step S245 for each processing target word to the pronunciation dictionary (step S250). When it registers a processing target word together with the phonemes of its erroneous readings, it adds identification information indicating a reading error to the word. For example, when "象潟" is registered with the erroneous reading "しょうせき" (shōseki), identification information is added to "象潟". In this embodiment, the identification information is the symbol "※" prefixed to the word, so the misread word generated from "象潟" is "※象潟". Words originally registered in the pronunciation dictionary with their correct readings receive no identification information.
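A minimal sketch of the registration in step S250 follows. It assumes the pronunciation dictionary is held as a Python dict from word entries to lists of readings, using hiragana instead of phoneme strings. `register_misreadings` is a hypothetical name; the "※" prefix follows this embodiment.

```python
MISREAD_MARK = "※"  # identification information used in this embodiment


def register_misreadings(dictionary, word, wrong_readings):
    """Register wrong readings under the marked entry, e.g. '※象潟' (step S250).

    `dictionary` maps a word entry to a list of readings; the correctly read
    entry for `word` is left untouched.
    """
    readings = dictionary.setdefault(MISREAD_MARK + word, [])
    for reading in wrong_readings:
        if reading not in readings:  # avoid duplicate entries on repeated runs
            readings.append(reading)
    return dictionary


pron_dict = {"象潟": ["きさかた"]}
register_misreadings(pron_dict, "象潟", ["しょうせき", "ぞうかた"])
print(pron_dict)
```

Keeping the misread readings under a separate marked entry means the decoder outputs "※象潟" whenever one of those readings wins. That is what makes the later check in the recognition result possible.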

Misread words registered in the pronunciation dictionary are usually not contained in the training data of the language model. The language model therefore assigns them a low appearance probability, such as 0%, and speech recognition can then hardly output a misread word as its result. To address this, in steps S255 to S280 the language model update unit 163 gives n-grams containing misread words a higher appearance frequency than they actually have and adds them to the language model. In this embodiment, the language model is adapted to misreadings on the assumption that a misreading occurs with the same appearance probability as the correct reading.

To update the language model, the language model update unit 163 acquires n-grams, i.e. chains of n words, from the sentences contained in the text data acquired by the text data acquisition unit 15, and calculates the appearance frequency of each n-gram (step S255). If not all of the acquired n-grams have been extracted yet (step S260: NO), the language model update unit 163 extracts one n-gram that has not yet been extracted (step S265). If the language model update unit 163 determines that the extracted n-gram contains no word from which a misread word was generated (step S270: NO), the processing from step S260 is repeated.
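Step S255 is ordinary n-gram counting. The sketch below assumes the text has already been segmented into words, which for Japanese requires a morphological analyzer. It also adds no sentence-boundary symbols, which is a simplifying assumption. `count_ngrams` is a hypothetical helper.

```python
from collections import Counter


def count_ngrams(sentences, n=2):
    """Count word n-grams in pre-segmented sentences (step S255)."""
    counts = Counter()
    for tokens in sentences:
        # Slide a window of n words over each sentence.
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


sentences = [["象潟", "の", "海"], ["象潟", "の", "町"]]
print(count_ngrams(sentences))
```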

If the language model update unit 163 determines that the n-gram extracted in step S265 contains a word from which a misread word was generated (step S270: YES), it creates an n-gram containing the misread word (step S275). Specifically, the language model update unit 163 creates a new n-gram by attaching the identification information indicating a misreading to that source word within the extracted n-gram. The new n-gram receives, for example, the same appearance frequency as the n-gram containing the source word, or a frequency obtained by applying a predetermined operation to that frequency. For example, with n = 2, if the n-gram extracted in step S265 is 「象潟」→「の」 (appearance frequency a), the language model update unit 163 attaches the identification information to 「象潟」 and generates the n-gram 「※象潟」→「の」 (appearance frequency a). The language model update unit 163 then repeats the processing from step S260.
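Steps S260 to S280 can be sketched as a single pass over the n-gram counts. This is an illustration, not the patented implementation. `add_misread_ngrams` is hypothetical. When an n-gram contains several source words, it marks all of them in one new n-gram, a case the description leaves open. Setting `scale=1.0` reproduces the "same appearance frequency" example, and other values stand in for the "predetermined operation".

```python
from collections import Counter

MISREAD_MARK = "※"


def add_misread_ngrams(counts, source_words, scale=1.0, mark=MISREAD_MARK):
    """Return a copy of `counts` extended with marked n-grams (S260-S280).

    `source_words` are the words misreadings were generated from. Each n-gram
    containing one of them (S270: YES) is copied with the mark prefixed to
    those words (S275) and given the original count times `scale`.
    """
    augmented = Counter(counts)
    for ngram, freq in counts.items():
        if not any(w in source_words for w in ngram):
            continue  # S270: NO, keep only the original n-gram
        marked = tuple(mark + w if w in source_words else w for w in ngram)
        augmented[marked] += freq * scale
    return augmented


counts = Counter({("象潟", "の"): 3, ("の", "海"): 1})
print(add_misread_ngrams(counts, {"象潟"}))
```

The original counts are left unchanged. Replacing `freq * scale` with a constant would correspond to the predetermined-frequency variant described below.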

Eventually, the language model update unit 163 determines that all n-grams acquired in step S255 have been extracted (step S260: YES). It then updates the language model stored in the language model storage unit 13, in the same manner as in the related art, using the n-grams acquired in step S255 and those generated in step S275.
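The description says only that the model is rebuilt "in the same manner as in the related art", which in practice involves smoothing and back-off. The sketch below is simplified: it assumes an unsmoothed maximum-likelihood bigram model, P(w2 | w1) = c(w1, w2) / Σ_w c(w1, w). Under that model, copying counts realizes the equal-probability assumption, because each marked history inherits the same conditional distribution as its source word. `bigram_probabilities` is a hypothetical helper.

```python
from collections import Counter, defaultdict


def bigram_probabilities(counts):
    """Unsmoothed maximum-likelihood estimate of P(w2 | w1) from bigram counts."""
    history_totals = defaultdict(float)
    for (w1, _w2), c in counts.items():
        history_totals[w1] += c
    return {(w1, w2): c / history_totals[w1] for (w1, w2), c in counts.items()}


counts = Counter({
    ("象潟", "の"): 3, ("象潟", "に"): 1,      # acquired from the text data (S255)
    ("※象潟", "の"): 3, ("※象潟", "に"): 1,  # generated in S275
    ("の", "海"): 1,
})
probs = bigram_probabilities(counts)
print(probs[("象潟", "の")], probs[("※象潟", "の")])
```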

When it creates a new n-gram by attaching identification information indicating a misreading to a word in an n-gram extracted from the text data, the language model update unit 163 may instead give the new n-gram a predetermined appearance frequency.

FIG. 4 is a flowchart showing the misreading indication process of the utterance evaluation device 1. It details the misreading indication process of step S110 in FIG. 2.

The voice data acquisition unit 17 waits for voice data to be input (step S305: NO). When voice data of an utterance is input to the utterance evaluation device 1, the voice data acquisition unit 17 acquires it (step S305: YES). The utterance may be a reading of the script or manuscript represented by the text data acquired in step S205 of FIG. 3, or a reading of some other text.

The acoustic feature extraction unit 18 extracts acoustic features from the voice data acquired by the voice data acquisition unit 17. The recognition processing unit 19 performs speech recognition in the same manner as in the related art. It uses the acoustic features read from the acoustic feature extraction unit 18 together with the acoustic model, the pronunciation dictionary, and the language model. First, the time series of acoustic features extracted by the acoustic feature extraction unit 18 is matched against the acoustic model stored in the acoustic model storage unit 11 to obtain a phoneme sequence. The recognition processing unit 19 then matches the phonemes in that sequence against the pronunciation dictionary stored in the pronunciation dictionary storage unit 12 to obtain word strings corresponding to the phoneme sequence. Next, it uses the language model stored in the language model storage unit 13 to obtain the appearance probability of each word string. Finally, it outputs recognition result data, in which the word string with the highest appearance probability is set as the speech recognition result, to the user interface control unit 14 (step S310).
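The dictionary and language-model stages of step S310 can be illustrated with a toy Viterbi search. The sketch rests on strong assumptions. The acoustic-model stage is skipped by taking the phoneme sequence as an exactly known kana string, and readings are matched literally. Unseen bigrams get a fixed floor probability. `decode` and the sample tables are hypothetical. A production recognizer instead scores acoustic features inside a beam-search or WFST decoder.

```python
import math


def decode(phonemes, pron_dict, bigram_p, floor=1e-6, start="<s>"):
    """Return the highest-scoring word sequence whose readings spell `phonemes`.

    best[i] maps the last word of a hypothesis covering phonemes[:i] to
    (log probability, word list); keeping one entry per last word is
    sufficient for a bigram model (Viterbi recombination).
    """
    best = [{} for _ in range(len(phonemes) + 1)]
    best[0][start] = (0.0, [])
    for i in range(len(phonemes)):
        for prev, (score, words) in best[i].items():
            for word, readings in pron_dict.items():
                for reading in readings:
                    if not reading or not phonemes.startswith(reading, i):
                        continue
                    j = i + len(reading)
                    # Unseen bigrams get a small floor probability (no smoothing).
                    total = score + math.log(bigram_p.get((prev, word), floor))
                    if word not in best[j] or total > best[j][word][0]:
                        best[j][word] = (total, words + [word])
    if not best[-1]:
        return None  # the phonemes cannot be spelled with dictionary readings
    return max(best[-1].values(), key=lambda hyp: hyp[0])[1]


pron_dict = {
    "象潟": {"きさかた"},
    "※象潟": {"しょうせき", "ぞうせき"},  # marked entry registered in step S250
    "の": {"の"},
    "海": {"うみ"},
}
bigram_p = {
    ("<s>", "象潟"): 0.5, ("<s>", "※象潟"): 0.5,
    ("象潟", "の"): 1.0, ("※象潟", "の"): 1.0, ("の", "海"): 1.0,
}
print(decode("しょうせきのうみ", pron_dict, bigram_p))
```

In this toy, the reading "しょうせき" can only be spelled by 「※象潟」, so the marked word appears in the result. Without the marked dictionary entry, `decode` would find no path for that reading and return None.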

The user interface control unit 14 determines whether the word string indicated by the recognition result data received from the recognition processing unit 19 contains a word to which identification information has been attached (step S315). If the user interface control unit 14 determines that the speech recognition result contains no such word (step S315: NO), the utterance evaluation device 1 repeats the processing from step S305. If it determines that the speech recognition result contains such a word (step S315: YES), the user interface control unit 14 warns the user of the misreading, for example by displaying it on a display device (step S320). The utterance evaluation device 1 then repeats the processing from step S305.
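Step S315 reduces to a prefix test on each recognized word. Here is a minimal sketch. It assumes the recognition result arrives as a list of words and that the "※" prefix from the embodiment marks misread words. `find_misreadings` and `check_result` are hypothetical, and the warning text is illustrative only.

```python
MISREAD_MARK = "※"


def find_misreadings(words, mark=MISREAD_MARK):
    """Step S315: list (position, original word) for each marked word."""
    return [(i, w[len(mark):]) for i, w in enumerate(words) if w.startswith(mark)]


def check_result(words):
    """Return a warning for step S320, or None when S315 answers NO."""
    found = find_misreadings(words)
    if not found:
        return None
    return "Possible misreading: " + ", ".join(w for _, w in found)


print(check_result(["※象潟", "の", "海"]))
```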

FIG. 5 shows the misreading indication screen that the user interface control unit 14 of the utterance evaluation device 1 displays on the display device. As shown in the figure, the user interface control unit 14 displays the speech recognition result indicated by the recognition result data output by the recognition processing unit 19. When that result contains a misread word, it also displays the sentence containing the misread word. Among the sentences detected as containing misread words, the figure shows the most recent one (misread sentence) and the earlier ones (misreading history).
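The screen shows the latest misread sentence together with a history of earlier ones. That behavior can be modeled with a small state object. This is a hypothetical sketch. The class name and the bracket highlighting are not taken from the description, and neither is the assumption that each recognized sentence arrives as a word list.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MISREAD_MARK = "※"


@dataclass
class MisreadingScreen:
    """Hypothetical model of the FIG. 5 screen: latest sentence plus history."""

    latest: Optional[str] = None
    history: List[str] = field(default_factory=list)

    def update(self, words, mark=MISREAD_MARK):
        """Record a recognized sentence; return True if it contained a misreading."""
        if not any(w.startswith(mark) for w in words):
            return False
        if self.latest is not None:
            self.history.append(self.latest)  # previous latest moves to history
        # Show the misread word in brackets instead of the internal marker.
        self.latest = "".join(
            f"[{w[len(mark):]}]" if w.startswith(mark) else w for w in words
        )
        return True


screen = MisreadingScreen()
screen.update(["※象潟", "の", "海"])
print(screen.latest, screen.history)
```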

As described above, the utterance evaluation device 1 extracts from the training text data the words that may be misread. It changes the notation of each such word, attaches misreadings (candidate misread pronunciation strings) to it, and registers them in the pronunciation dictionary. The device also generates additional n-grams in which words of the n-grams extracted from the text data are replaced by the words carrying misreadings, and uses them to update the language model. As a result, when an utterance contains a misreading, the utterance evaluation device 1 outputs the word with the changed notation. A misreading can therefore be detected whenever a word contained in the training text data is not uttered with its correct reading. In addition, because no particular speaker's utterances are used as training data, misreadings can be detected in a wide variety of utterances.

The utterance evaluation device 1 described above can automatically point out a speaker's misreadings in material whose content is fixed in advance, such as manuscripts, scripts, and textbooks. For example, the pronunciation dictionary and the language model of the utterance evaluation device 1 are first updated using a manuscript as the training text data, and the speaker then reads the manuscript aloud. When a misreading occurs, the utterance evaluation device 1 displays it. Broadcasters and stage performers can thus check their readings for errors during rehearsal readings, and students can check their reading aloud while preparing for class. This helps ensure that correct information is delivered and that work and lessons proceed smoothly.

In professions that deal with language, eliminating misreadings is a major challenge. Announcers, for example, keep collections of easily misread words, and reducing misreadings is a serious concern for them. With the utterance evaluation device of this embodiment, misreadings can be found in advance, which helps ensure that correct information is broadcast.
Likewise, actors can catch misreadings of their scripts by practicing in advance with the utterance evaluation device of this embodiment, so that rehearsals and recordings proceed smoothly.
In education, reading aloud in Japanese-language classes is an indispensable teaching method. Using the utterance evaluation device of this embodiment in a game-like way can reduce class delays caused by simple kanji misreadings and the embarrassment students feel when they misread.

The utterance evaluation device 1 described above contains a computer system. The operation of the utterance evaluation device 1 is stored as a program on a computer-readable recording medium. The processing described above is performed when the computer system reads and executes this program. The computer system here includes a CPU, various memories, an OS, and hardware such as peripheral devices.

When a WWW system is used, the "computer system" also includes a homepage providing environment (or display environment).
The "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into computer systems. It also includes media that hold a program dynamically for a short time. An example is a communication line used when the program is transmitted over a network such as the Internet or over a telephone line. It further includes media that hold a program for a certain period, such as volatile memory inside the computer system serving as the server or client in that case. The program may implement only part of the functions described above, or may implement them in combination with a program already recorded in the computer system.

An embodiment of the present invention has been described in detail with reference to the drawings. However, the specific configuration is not limited to this embodiment, and designs and the like that do not depart from the gist of the invention are also included.

DESCRIPTION OF SYMBOLS
1 Utterance evaluation device
11 Acoustic model storage unit
12 Pronunciation dictionary storage unit
13 Language model storage unit
14 User interface control unit
15 Text data acquisition unit
16 Update unit
17 Voice data acquisition unit
18 Acoustic feature extraction unit
19 Recognition processing unit
161 Misreading generation unit
162 Pronunciation dictionary update unit
163 Language model update unit

Claims

A pronunciation dictionary storage unit that stores a pronunciation dictionary in which a word and a reading of the word are associated;
A language model storage unit for storing a language model representing the ease of connection between words;
A misreading generation unit that takes, as a processing target word, a word containing kanji among the words constituting a sentence acquired from text data, and generates a misreading of the processing target word based on the readings that the kanji contained in the processing target word can take;
A pronunciation dictionary update unit that updates the pronunciation dictionary by registering, in association with each other, the processing target word to which identification information indicating a misreading has been added and the misreading generated for that processing target word by the misreading generation unit;
A language model update unit that acquires connections between words from the text data and updates the language model based on connections between words generated by adding the identification information to the processing target word contained in the acquired connections, and on appearance frequencies given to the generated connections;
A recognition processing unit that recognizes speech data based on the pronunciation dictionary updated by the pronunciation dictionary update unit and the language model updated by the language model update unit;
An output unit that outputs a misreading when a word to which the identification information has been added is contained in a result of speech recognition by the recognition processing unit;
An utterance evaluation apparatus comprising:

The misreading generation unit takes, as the processing target word, a word of a predetermined part of speech that contains the kanji, among the words constituting the sentence acquired from the text data.
The utterance evaluation apparatus according to claim 1.

The language model update unit gives each generated connection between words an appearance frequency based on the appearance frequency calculated from the text data for the corresponding connection between words before the identification information was added,
The utterance evaluation apparatus according to claim 1 or 2, characterized by the above.

An utterance evaluation method executed by the utterance evaluation device,
A misreading generation step of taking, as a processing target word, a word containing kanji among the words constituting a sentence acquired from text data, and generating a misreading of the processing target word based on the readings that the kanji contained in the processing target word can take;
A pronunciation dictionary update step of updating a pronunciation dictionary, in which words are associated with their readings, by registering in it, in association with each other, the processing target word to which identification information indicating a misreading has been added and the misreading generated for that processing target word in the misreading generation step;
A language model update step of acquiring connections between words from the text data and updating a language model representing the ease of connection between words, based on connections between words generated by adding the identification information to the processing target word contained in the acquired connections, and on appearance frequencies given to the generated connections;
A recognition processing step for recognizing speech data based on the pronunciation dictionary updated in the pronunciation dictionary update step and the language model updated in the language model update step;
An output step of outputting a reading error when a word to which the identification information is added is included in a result of speech recognition in the recognition processing step;
An utterance evaluation method characterized by comprising:

A program causing a computer to function as:
Pronunciation dictionary storage means for storing a pronunciation dictionary in which a word and a reading of the word are associated with each other;
A language model storage means for storing a language model representing ease of connection between words;
Misreading generation means for taking, as a processing target word, a word containing kanji among the words constituting a sentence acquired from text data, and generating a misreading of the processing target word based on the readings that the kanji contained in the processing target word can take;
Pronunciation dictionary update means for updating the pronunciation dictionary by registering, in association with each other, the processing target word to which identification information indicating a misreading has been added and the misreading generated for that processing target word by the misreading generation means;
Language model updating means for acquiring connections between words from the text data and updating the language model based on connections between words generated by adding the identification information to the processing target word contained in the acquired connections, and on appearance frequencies given to the generated connections;
Recognition processing means for recognizing speech data based on the pronunciation dictionary updated by the pronunciation dictionary update means and the language model updated by the language model update means;
An output means for outputting a reading error when a word to which the identification information is added is included in a result of speech recognition by the recognition processing means;
thereby causing the computer to function as an utterance evaluation apparatus.