JP5336788B2

JP5336788B2 - Speech recognition apparatus and program

Info

Publication number: JP5336788B2
Application number: JP2008208543A
Authority: JP
Inventors: 孝司佐瀬; 栄二宇都宮; 俊樹遠藤
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2008-08-13
Filing date: 2008-08-13
Publication date: 2013-11-06
Anticipated expiration: 2028-08-13
Also published as: JP2010044240A

Description

The present invention relates to speech recognition technology, and more particularly to a technique for notifying a user that an utterance is out of vocabulary when the speech signal input together with the user's speech recognition request is out of vocabulary.

Conventionally, there is a known technique in which a speech recognition apparatus registers words in a language model (speech recognition dictionary) from a free-format document file that contains the words to be added. The speech recognition apparatus disclosed in Patent Document 1 reads a document file containing unknown words, extracts words by morphological analysis or the like, and extracts as unknown words those that do not exist in the language model. The extracted unknown words are displayed in a list; after the user corrects or deletes the unknown words and the readings and parts of speech assigned to them, the selected unknown words are registered in the language model in a batch.
Patent Document 1: JP 2003-316376 A

However, because the above prior art extracts and registers words from a document file written in free format, unnecessary words may be added to the language model. As a result, there is a concern that the language model becomes bloated and speech recognition performance degrades.

The present invention has been made in view of these circumstances, and its object is to provide a speech recognition apparatus and a program capable of notifying a user that an utterance is out of vocabulary when the speech signal input together with the user's speech recognition request is out of vocabulary.

(1) To achieve the above object, the present invention takes the following measures. The speech recognition apparatus of the present invention notifies a user that an utterance is out of vocabulary when the speech signal input together with the user's speech recognition request is out of vocabulary. It comprises recognition process request receiving means for receiving a speech recognition request and a speech signal from the user, an out-of-vocabulary determination unit for determining whether the input speech signal is out of vocabulary, and speech recognition processing means for performing speech recognition processing on the input speech signal using an acoustic model and a language model. When the input speech signal is out of vocabulary, the apparatus notifies the user to that effect.

Because the user is notified when the input speech signal is out of vocabulary, useless repeated utterances can be prevented and the convenience of the speech recognition apparatus can be improved.

(2) Further, when the input speech signal is out of vocabulary, the speech recognition apparatus of the present invention requests the user to register the speech determined to be out of vocabulary in the language model.

Because the user is requested to register in the language model the speech determined to be out of vocabulary, the number of words that can be recognized accurately can be increased.

(3) Further, in the speech recognition apparatus of the present invention, the out-of-vocabulary determination unit comprises similarity measurement means for measuring the similarity between the speech signal input together with the speech recognition request and speech signals input in the past, and out-of-vocabulary utterance determination means for determining that the input is a repeated out-of-vocabulary utterance when, within a predetermined period from the current time, there is past speech highly similar to the speech signal input together with the speech recognition request.

When a highly similar speech signal has been input within the predetermined period from the current time, the input is thus determined to be a repeated out-of-vocabulary utterance, which makes it possible to steer the user away from out-of-vocabulary utterances. The number of repeated utterances within the predetermined period is N (N being a natural number).

(4) Further, the program of the present invention causes a speech recognition apparatus to implement a function of notifying a user that an utterance is out of vocabulary when the speech signal input together with the user's speech recognition request is out of vocabulary. The program encodes, as commands readable and executable by a computer, a series of processes including: a process of receiving a speech recognition request and a speech signal from the user; a process of determining whether the input speech signal is out of vocabulary; a process of notifying the user that the utterance is out of vocabulary when the input speech signal is out of vocabulary; and a process of requesting the user, when the input speech signal is out of vocabulary, to register in the language model the content of the utterance determined to be out of vocabulary.

According to the present invention, the user is notified when the input speech signal is out of vocabulary, so useless repeated utterances can be prevented and the convenience of the speech recognition apparatus can be improved.

Next, an embodiment of the present invention will be described with reference to the drawings. FIG. 1 shows the schematic configuration of the speech recognition apparatus according to this embodiment. In FIG. 1, the recognition process request receiving means 10 receives a speech recognition request, speech data, and a user ID from a user terminal, outputs the speech data and the user ID to the speech recognition processing means 20, and instructs it to perform recognition processing. It also returns the recognition result to the user terminal. The recognition process request receiving means 10 further includes out-of-vocabulary notification means 11, which notifies the user that an utterance is out of vocabulary, and dictionary update means 12, which accepts registrations from the user and updates the language model 22.

The speech recognition processing means 20 performs recognition processing using the acoustic model 21 and the language model 22 in accordance with the recognition request received from the recognition process request receiving means 10, and returns the recognition result to the recognition process request receiving means 10.

When speech is input, the out-of-vocabulary determination unit 30 compares it with past input speech and measures the similarity. If the speech input N times (N being a natural number) within a short time is similar, the unit determines that the utterance is out of vocabulary. In that case it instructs the out-of-vocabulary notification means 11 to notify the user to that effect. If it determines that the utterance is not out of vocabulary, processing proceeds to normal speech recognition. The out-of-vocabulary notification means 11 notifies the user that the utterance is out of vocabulary and requests registration in the language model (dictionary). When the user submits a registration, the out-of-vocabulary notification means 11 receives it, and the dictionary update means 12 registers the word in the language model 22.

FIG. 2 shows the concept of the out-of-vocabulary notification means and the dictionary update means. When the out-of-vocabulary determination unit 30 determines that an utterance is out of vocabulary, "out-of-vocabulary flag information" indicating this is input to the out-of-vocabulary notification means 11 (step S11). Next, the out-of-vocabulary notification means 11 sends the user an "out-of-vocabulary notification" and a "word registration request" (step S12). When the dictionary update means 12 receives "word and reading" information from the user (step S13), it reflects that information in the language model (step S14). For example, when a descriptive grammar is used, the word is registered in the dictionary and the grammar dictionary; in the case of dictation, it is added to the N-gram.
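The four steps S11 to S14 can be illustrated with a minimal Python sketch. It is not the patent's implementation: the `LanguageModel` class, the message strings, and the function names are hypothetical, and the language model is reduced to a plain word-to-reading dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LanguageModel:
    """Hypothetical stand-in for language model 22: word -> reading."""
    lexicon: Dict[str, str] = field(default_factory=dict)

    def register(self, word: str, reading: str) -> None:
        self.lexicon[word] = reading


def notify_out_of_vocabulary(oov_flag: bool) -> List[str]:
    """Steps S11-S12: turn the OOV flag into the messages sent to the user."""
    if not oov_flag:
        return []
    return ["out_of_vocabulary_notification", "word_registration_request"]


def update_dictionary(model: LanguageModel, word: str, reading: str) -> LanguageModel:
    """Steps S13-S14: reflect the user's word and reading in the language model."""
    model.register(word, reading)
    return model
```

A real system would update a grammar dictionary or an N-gram model in step S14; the dictionary here only marks where that update would happen.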

FIG. 3 shows the functional configuration of the out-of-vocabulary determination unit 30. The out-of-vocabulary determination unit 30 includes similarity measurement means 31 and a past history DB 33. When the similarity measurement performed by the similarity measurement means 31 leads to a determination that the utterance is out of vocabulary, "out-of-vocabulary flag information" indicating this is passed from the similarity measurement means 31 to the out-of-vocabulary notification means 11.

FIG. 4 shows the schematic configuration of the past history DB 33 shown in FIG. 3. The stored speech DB 33a stores the input speech. The stored speech data may be speech data in PCM or a similar format, or spectral-domain data, cepstral-domain data, VQ data, and so on. The recognition result DB 33b stores recognition results, namely the recognized characters and the recognition score. The recognition score may additionally be held as separate acoustic likelihood and language probability values. The access information DB 33c stores access information: the access time, the accessing user's ID, the name of the corresponding speech data stored in the stored speech DB, and the name of the corresponding recognition result file stored in the recognition result DB.
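One way to picture the three stores is as in-memory Python records. This is only a sketch under stated assumptions: the class and field names are hypothetical, speech data is represented as a list of feature frames, and a single name is used as the key for both the speech data and its recognition result.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RecognitionResult:
    """Entry of the recognition result DB: text plus the split score."""
    text: str
    acoustic_likelihood: float
    language_probability: float


@dataclass
class AccessRecord:
    """Entry of the access information DB."""
    access_time: float   # e.g. seconds since an arbitrary epoch
    user_id: str
    speech_name: str     # key into the stored speech DB
    result_name: str     # key into the recognition result DB


@dataclass
class PastHistoryDB:
    stored_speech: Dict[str, List[List[float]]] = field(default_factory=dict)
    recognition_results: Dict[str, RecognitionResult] = field(default_factory=dict)
    access_info: List[AccessRecord] = field(default_factory=list)

    def add(self, access_time, user_id, name, frames, result):
        """Store one utterance and link it from the access information."""
        self.stored_speech[name] = frames
        self.recognition_results[name] = result
        self.access_info.append(AccessRecord(access_time, user_id, name, name))

    def records_for(self, user_id):
        """Return all access records of one user, in insertion order."""
        return [r for r in self.access_info if r.user_id == user_id]
```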

FIG. 5 is a block diagram showing the schematic configuration of the similarity measurement means 31 shown in FIG. 3. The similarity measurement means 31 determines the similarity between the input speech and stored speech data uttered in the past. After recognition processing, it also obtains the distance between recognition results (character strings) and stores it in the inter-recognition-result distance information table. When the distance obtained between speech data is at or below a threshold, the two are judged to be similar. In FIG. 5, the similarity determination control means 31a receives the input speech data and the user ID from the recognition process request receiving means 10 and then acquires the speech data with the same user ID from the stored speech DB 33a. The access information analysis means 31b receives the input speech data and the user ID from the recognition process request receiving means 10, then acquires the access information for that user ID and selects the speech data to be used for the similarity determination. Speech data satisfying the following condition is selected for similarity measurement.
(Condition) The speech was uttered by the same user N or more times (N being a natural number) within T of the current time.
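The selection condition can be sketched as a small filter over access records. This is one possible reading of the condition, not the patent's implementation: the record type, the function name, and the choice to return an empty list when fewer than N utterances fall inside the window are all assumptions made for illustration.

```python
from typing import List, NamedTuple


class AccessRecord(NamedTuple):
    access_time: float   # same time unit as `now` and `window_t`
    user_id: str
    speech_name: str     # key of the stored speech data


def select_candidates(records: List[AccessRecord], user_id: str,
                      now: float, window_t: float,
                      min_count_n: int) -> List[AccessRecord]:
    """Pick the same user's utterances made within T of the current time.

    Returns them only if there are at least N; otherwise nothing qualifies.
    """
    recent = [r for r in records
              if r.user_id == user_id and 0 <= now - r.access_time <= window_t]
    return recent if len(recent) >= min_count_n else []
```

For example, with T = 20 and N = 2, two utterances by the same user at times 100 and 105 qualify at time 110, while an utterance at time 50 or by a different user does not.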

The data processing means 31c converts the input speech and the speech data acquired from the stored speech DB 33a into the same type of data format. For example, if both are speech data such as PCM or spectral-domain data, they are converted into spectral-domain data, cepstral-domain data, VQ data, or the like. If both are cepstral-domain data, they are converted into cepstral-domain data, VQ data, or the like. If both are VQ data, they are left as they are.

The distance calculation means 31d calculates the distance between speech data or the distance between recognition results. The calculated distance is output to the inter-speech distance information table, the inter-recognition-result distance information table, and the out-of-vocabulary utterance determination means 31e.

The out-of-vocabulary utterance determination means 31e judges the utterances to be similar when the distance obtained by the distance calculation means 31d is at or below a threshold. The determination result is output to the out-of-vocabulary notification means 11.

FIG. 6 shows the concept of the distance calculation method performed by the distance calculation means 31d shown in FIG. 5. In this calculation, the distance between two utterances with different numbers of frames is obtained using DTW (dynamic time warping). Examples of the distance measure between individual frames include the following.
(1) Euclidean distance between log spectra, LPC spectra, cepstra, or VQ data.
(2) Maximum-likelihood spectral distance using LPC spectra.
(3) Cosh measure.
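As an illustration of the DTW step, the following Python sketch computes the accumulated alignment cost between two frame sequences of different lengths, using the Euclidean distance from item (1) as the per-frame measure. It is a textbook DTW with the basic three-way step pattern and no path-length normalization; the patent does not specify these details, and the function names are hypothetical.

```python
import math
from typing import Callable, Sequence

Frame = Sequence[float]


def euclidean(a: Frame, b: Frame) -> float:
    """Per-frame Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a: Sequence[Frame], seq_b: Sequence[Frame],
                 frame_dist: Callable[[Frame, Frame], float] = euclidean) -> float:
    """Accumulated DTW cost between two utterances (lists of frames)."""
    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning the first i and j frames
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch seq_b
                                 cost[i][j - 1],      # stretch seq_a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]
```

Passing a different `frame_dist` would allow the other measures listed above to be swapped in without changing the alignment logic.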

FIG. 7 shows the concept of the operation of the out-of-vocabulary utterance determination means 31e shown in FIG. 5. The out-of-vocabulary utterance determination means 31e determines that the input speech is out of vocabulary when the access time is within T of the current time, the utterance has occurred N or more times, and the similarity is high (the distance is at or below the threshold).
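The three-part test can be written as a short predicate. The sketch below is one interpretation, assuming that "N or more times" counts the past utterances that are both inside the window T and within the distance threshold; the function name and the `(access_time, distance)` pair representation are hypothetical.

```python
from typing import List, Tuple


def is_out_of_vocabulary(past: List[Tuple[float, float]], now: float,
                         window_t: float, min_count_n: int,
                         threshold: float) -> bool:
    """Judge the current input as a repeated out-of-vocabulary utterance.

    `past` holds (access_time, distance_to_current_input) pairs for the
    same user. The input is judged OOV when at least N past utterances lie
    within T of the current time and are similar (distance <= threshold).
    """
    similar_recent = [d for t, d in past
                      if 0 <= now - t <= window_t and d <= threshold]
    return len(similar_recent) >= min_count_n
```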

FIG. 8 is a flowchart showing the operation of the speech recognition apparatus according to this embodiment. On receiving a recognition request (step S80), the speech recognition apparatus passes the input speech data and the user ID to the out-of-vocabulary determination unit 30, where the similarity measurement means 31 measures the similarity. The out-of-vocabulary determination unit 30 then determines whether the utterance is out of vocabulary (step S81). If it is not, normal recognition processing is performed (step S82); that is, the speech data and the user ID are passed from the recognition process request receiving means 10 to the speech recognition processing means 20.

If, on the other hand, the utterance is determined in step S81 to be out of vocabulary, the out-of-vocabulary determination unit 30 sends the out-of-vocabulary flag information to the out-of-vocabulary notification means 11. The out-of-vocabulary notification means 11 then notifies the user that the utterance is out of vocabulary (step S83) and also sends the user a "word registration request" (step S83). Next, when a "word and reading" is received from the user (step S84), that information is passed to the dictionary update means 12, which registers the word in the language model 22 (step S85).
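The flow of steps S80 to S85 can be summarized in a compact Python sketch. Everything here is illustrative: the function name, the injected callables, the returned dictionary, and the use of a plain dict as the language model are assumptions, not the patent's implementation.

```python
from typing import Callable, Dict, Optional, Tuple


def handle_recognition_request(
        speech, user_id: str,
        is_oov: Callable[[object, str], bool],
        recognize: Callable[[object, str], str],
        ask_user_for_word: Callable[[], Optional[Tuple[str, str]]],
        lexicon: Dict[str, str]) -> dict:
    """Process one recognition request (step S80 onward)."""
    # S81: out-of-vocabulary determination
    if not is_oov(speech, user_id):
        # S82: normal recognition processing
        return {"status": "recognized", "result": recognize(speech, user_id)}
    # S83: notify the user and request word registration;
    # S84: the callable returns (word, reading), or None if nothing was sent
    reply = ask_user_for_word()
    if reply is None:
        return {"status": "out_of_vocabulary", "registered": None}
    word, reading = reply
    lexicon[word] = reading  # S85: register the word in the language model
    return {"status": "out_of_vocabulary", "registered": word}
```

Injecting the determination, recognition, and user-dialogue steps as callables keeps the sketch independent of any particular recognizer or transport.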

The characteristic operations of the present invention described above can be carried out by causing a computer to execute a program. That is, the program of the present invention causes a speech recognition apparatus to implement a function of notifying a user that an utterance is out of vocabulary when the speech signal input together with the user's speech recognition request is out of vocabulary. The program encodes, as commands readable and executable by a computer, a series of processes including: a process of receiving a speech recognition request and a speech signal from the user; a process of determining whether the input speech signal is out of vocabulary; a process of notifying the user that the utterance is out of vocabulary when the input speech signal is out of vocabulary; and a process of requesting the user, when the input speech signal is out of vocabulary, to register in the language model the speech determined to be out of vocabulary.

FIG. 1 is a diagram showing the schematic configuration of the speech recognition apparatus according to this embodiment.
FIG. 2 is a diagram showing the concept of the out-of-vocabulary notification means and the dictionary update means.
FIG. 3 is a diagram showing the functional configuration of the out-of-vocabulary determination unit 30.
FIG. 4 is a diagram showing the schematic configuration of the past history DB 33 shown in FIG. 3.
FIG. 5 is a block diagram showing the schematic configuration of the similarity measurement means 31 shown in FIG. 3.
FIG. 6 is a diagram showing the concept of the distance calculation method performed by the distance calculation means 31d shown in FIG. 5.
FIG. 7 is a diagram showing the concept of the operation of the out-of-vocabulary utterance determination means 31e shown in FIG. 5.
FIG. 8 is a flowchart showing the operation of the speech recognition apparatus according to this embodiment.

Explanation of symbols

10 Recognition process request receiving means
11 Out-of-vocabulary notification means
12 Dictionary update means
20 Speech recognition processing means
21 Acoustic model
22 Language model
30 Out-of-vocabulary determination unit
31 Similarity measurement means
31a Similarity determination control means
31b Access information analysis means
31c Data processing means
31d Distance calculation means
31e Out-of-vocabulary utterance determination means
33 Past history DB
33a Stored speech DB
33b Recognition result DB
33c Access information DB
33d Repeated-utterance type DB

Claims

A speech recognition apparatus that notifies a user that an utterance is out of vocabulary when a speech signal input together with a speech recognition request from the user is out of vocabulary, comprising:
recognition process request receiving means for receiving a speech recognition request and a speech signal from the user;
an out-of-vocabulary determination unit for determining whether the input speech signal is out of vocabulary; and
speech recognition processing means for performing speech recognition processing on the input speech signal using an acoustic model and a language model,
wherein the out-of-vocabulary determination unit comprises:
similarity measurement means for measuring the similarity between the speech signal input together with the speech recognition request and speech signals input in the past; and
out-of-vocabulary utterance determination means for determining that the input is a repeated out-of-vocabulary utterance when, within a predetermined period from the current time, there is past speech having a high similarity to the speech signal input together with the speech recognition request,
and wherein, when the input speech signal is out of vocabulary, the apparatus notifies the user that the utterance is out of vocabulary and requests the user to register the speech determined to be out of vocabulary in the language model.

A program for causing a speech recognition apparatus to implement a function of notifying a user that an utterance is out of vocabulary when a speech signal input together with a speech recognition request from the user is out of vocabulary, the program encoding, as commands readable and executable by a computer, a series of processes including:
a process of receiving a speech recognition request and a speech signal from the user;
a process of measuring the similarity between the speech signal input together with the speech recognition request and speech signals input in the past and, when there is past speech within a predetermined period from the current time having a high similarity to the speech signal input together with the speech recognition request, determining that the input is a repeated out-of-vocabulary utterance, thereby determining whether the input speech signal is out of vocabulary;
a process of notifying the user that the utterance is out of vocabulary when the input speech signal is out of vocabulary; and
a process of requesting the user, when the input speech signal is out of vocabulary, to register the content of the utterance determined to be out of vocabulary in the language model used for the speech recognition processing.