JP3259734B2

JP3259734B2 - Voice recognition device

Info

Publication number: JP3259734B2
Application number: JP00760992A
Authority: JP
Inventors: 喜久美鏑木
Original assignee: Seiko Epson Corp
Current assignee: Seiko Epson Corp
Priority date: 1992-01-20
Filing date: 1992-01-20
Publication date: 2002-02-25
Anticipated expiration: 2017-02-25
Also published as: JPH05197390A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a speech recognition device.

[0002]

2. Description of the Related Art A speech recognition device is a device that recognizes data input by voice as character data. Because such a device allows data to be entered without key operations, anyone can operate it easily without knowing the key layout or key operation method of a key input device. The configuration of such a conventional speech recognition device is shown in the block diagram of FIG. 4. As shown in the figure, in the conventional speech recognition device the input speech is analyzed by the voice input unit 110, converted into a digital signal, and output. The speech recognition unit 120 recognizes the input speech by matching the output data from the voice input unit 110 against basic patterns registered in a speech recognition dictionary. The recognition result is displayed on the recognition result display unit 130 so that the operator can confirm it. The recognition result is also sent to the storage unit 140 and stored as result data of the feature extraction obtained by analyzing the input speech.

[0003]

[Problems to Be Solved by the Invention] In conventional speech recognition devices, however, input was sometimes misrecognized even when the operator pronounced it accurately. Causes of such misrecognition include background noise in the environment where the device is used, and the fact that sentence-level input speech data is not processed using grammatical knowledge or knowledge of sentence meaning. In particular, because particles are rarely pronounced clearly, the recognition rate for the input speech data as a whole can change greatly depending on whether the particles are recognized accurately.

[0004] An object of the present invention is to eliminate these causes and to provide a speech recognition device capable of correctly recognizing input speech data.

[0005]

[Means for Solving the Problems] To solve the above problems, the speech recognition device of the present invention comprises: an utterance unit extraction unit that extracts an appropriate utterance unit for input speech data; an input instruction unit that instructs input in the utterance unit extracted by the utterance unit extraction unit; and a speech recognition unit that has a plurality of dictionaries provided per utterance unit and recognizes the input speech data using the dictionary for the utterance unit extracted by the utterance unit extraction unit.

[0006]

[Operation] According to the speech recognition device of the present invention, the utterance unit extraction unit extracts an appropriate utterance unit for the input speech data. On receiving this utterance unit information, the input instruction unit gives an instruction to input speech in that utterance unit. In the speech recognition unit, the dictionary for the utterance unit extracted by the utterance unit extraction unit is selected from among the plurality of dictionaries provided per utterance unit. Therefore, speech data input in the utterance unit indicated by the input instruction unit is recognized by the speech recognition unit using the dictionary for that same utterance unit.

[0007]

[Embodiment] An embodiment of the present invention will be described below with reference to the accompanying drawings. FIG. 1 is a block diagram showing an outline of the speech recognition device of this embodiment. As shown in the figure, speech data input to the voice input unit 10 is supplied to the speech recognition unit 20, where speech recognition processing is performed. The recognition result is displayed on the recognition result display unit 30 and is also supplied to the utterance unit extraction unit 40. In addition to the recognition result data, the utterance unit extraction unit 40 receives background noise data and correction input data, and extracts an appropriate utterance unit for the speech data input to the voice input unit 10. The extraction result is displayed on the utterance unit display unit 50, which serves as the input instruction unit, instructing the operator to input speech data divided into the appropriate utterance units. The extraction result is also supplied to the speech recognition unit 20, where an appropriate speech recognition dictionary is selected on the basis of the extraction result and the input speech data is recognized using the selected dictionary.

[0008] Before this embodiment is described in detail, the features of its processing will be described. The grammatical units of speech that a person can utter include the "phoneme", "word", "phrase", and "sentence". This embodiment focuses on these utterance units of the speech input to the speech recognition device: from information such as the difficulty of the conversation content and the level of background noise, it determines which utterance unit is optimal for dividing the speech input, and instructs speech input in that optimal unit. That is, when the difficulty of the conversation content is low or the background noise level is low, misrecognition in the recognition processing is unlikely, so speech input divided by "phrase" or "sentence" is instructed; when the difficulty is high or the background noise level is high, misrecognition is likely, so speech input divided by "phoneme" or "word" is instructed. This instruction of the optimal utterance unit for speech input is the first feature of this embodiment and is performed by the utterance unit extraction unit 40. Speech data that the operator inputs, divided into the instructed utterance units in response to the instruction from the utterance unit extraction unit 40, is recognized using the speech recognition dictionary best suited to the instructed utterance unit. This selection of the optimal dictionary for speech recognition processing is the second feature of this embodiment and is performed by the speech recognition unit 20.

[0009] The processing flow of this embodiment closely resembles the way a person talks on the telephone in a noisy environment. For example, if the speaker says "I will arrive at Narita at 3 o'clock tomorrow." and the other party cannot make it out, the speaker repeats it phrase by phrase: "tomorrow", "at 3 o'clock", "at Narita", "will arrive". If the other party still cannot make it out, the speaker simply says "tomorrow", "3 o'clock", "Narita"; and if the other party then mishears "Narita" as "Haneda", the speaker says "na", "ri", "ta", phoneme by phoneme. In this way, people carry on a conversation by judging the utterance unit that seems most appropriate at each moment. In this embodiment, the appropriate utterance unit is determined by the device rather than by the person, and this information is conveyed to the operator, which makes the speech recognition processing smoother.

[0010] Next, the detailed configuration of the speech recognition device of this embodiment will be described with reference to the block diagram of FIG. 2. As shown in the figure, the voice input unit 10 includes a microphone 11 for inputting speech, and a high-frequency emphasis filter 12 and an AD converter 13 that sample the input speech data as a digital signal. The speech recognition unit 20 includes a feature extraction circuit 21 that frequency-converts the speech signal converted into a digital signal and extracts a feature parameter sequence in the frequency domain, a feature parameter sequence storage circuit 22 that stores the extracted feature parameter sequence, a DP matching circuit 23 that performs speech recognition processing on the feature parameter sequence, and a recognition dictionary group 24 having a plurality of speech recognition dictionaries provided per utterance unit. The recognition result display unit 30 includes a display device 31 that displays the speech recognition result and a display control circuit 32 that controls the display device 31. The utterance unit extraction unit 40 includes: a speech recognition storage circuit 41 that stores the recognition result from the speech recognition unit 20; a difficulty level determination unit 42 that analyzes the speech recognition result stored in the speech recognition storage circuit 41 and determines the difficulty of the utterance content; a correction input unit 43 through which correction data is input when the speech recognition result displayed on the display device 31 is found to contain an error; a recognition rate detection unit 44 that detects the speech recognition rate on the basis of this correction data; a microphone 45 that measures background sound; a high-frequency emphasis filter 46, serving as a noise detection unit, that analyzes background noise data from the measured background sound; an appropriate utterance unit determination unit 47 that receives the difficulty data determined by the difficulty level determination unit 42, the recognition rate data detected by the recognition rate detection unit 44, and the background noise data analyzed by the high-frequency emphasis filter 46, and determines the appropriate utterance unit; and an utterance unit storage circuit 48 that stores the determination result. The utterance unit display unit 50 includes a display device 51 that displays the recognition rate detected by the recognition rate detection unit 44 and the appropriate utterance unit stored in the utterance unit storage circuit 48, and a display control circuit 52 that controls the display device 51.

[0011] Next, the processing flow of this embodiment will be described. Speech data input by the operator through the microphone 11 passes through the high-frequency emphasis filter 12 and the AD converter 13 and is converted into a digital signal. The feature extraction circuit 21 frequency-converts this speech signal and extracts a feature parameter sequence in the frequency domain. The extracted feature parameter sequence is stored in the feature parameter sequence storage circuit 22. The DP matching circuit 23 reads the speech signal stored in the feature parameter sequence storage circuit 22 and performs DP matching using one of the plurality of speech recognition dictionaries registered in the recognition dictionary group 24. Through this processing the speech signal is recognized as a code string, which is transferred to the recognition result display unit 30 and the speech recognition storage circuit 41. The code string transferred to the recognition result display unit 30 is displayed on the display device 31 as the recognition result, where the operator can check it. If the recognition result contains an error, the operator can input correction data through the correction input unit 43. This correction data is sent to the recognition rate detection unit 44, which detects the recognition rate. The recognition rate data detected by the recognition rate detection unit 44 is sent to the appropriate utterance unit determination unit 47. Meanwhile, the code string transferred from the DP matching circuit 23 to the speech recognition storage circuit 41 is transferred to the difficulty level determination unit 42. The difficulty level determination unit 42 comprehensively determines the difficulty of the input code string using, as indices, difficulty information calculated from word frequency, politeness information of the sentence (one characteristic of conversational style), length information of a single sentence, dependency information of the sentence, and the like. The determination result is sent to the appropriate utterance unit determination unit 47 as difficulty data. Furthermore, the background sound measured by the microphone 45 is analyzed by the high-frequency emphasis filter 46, and information such as the level and type of the background noise and the strength of its effect on the speech recognition rate is sent to the appropriate utterance unit determination unit 47 as background noise data.
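The description does not state how the recognition rate detection unit 44 derives a rate from the operator's corrections. As a minimal sketch, assuming the rate is simply the fraction of recognized tokens that the operator left unchanged, and assuming the recognized and corrected sequences have equal length, the calculation could look like this (the function name and the token representation are hypothetical):

```python
def recognition_rate(recognized: list[str], corrected: list[str]) -> float:
    """Fraction of recognized tokens that the operator left unchanged.

    Assumption (not from the source): the corrected sequence has the same
    length as the recognized one, so tokens are compared position by position.
    """
    if len(recognized) != len(corrected):
        raise ValueError("sequences must have the same length")
    if not recognized:
        return 1.0  # nothing was recognized, so nothing needed correcting
    unchanged = sum(r == c for r, c in zip(recognized, corrected))
    return unchanged / len(recognized)


# One of four tokens was corrected ("Haneda" -> "Narita").
rate = recognition_rate(
    ["tomorrow", "3 o'clock", "Haneda", "arrive"],
    ["tomorrow", "3 o'clock", "Narita", "arrive"],
)
```

A real device would also have to handle insertions and deletions, for example with an edit-distance alignment; that is left out here.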

[0012] The appropriate utterance unit determination unit 47 determines the appropriate utterance unit on the basis of the recognition rate data, difficulty data, and background noise data input in this way, and the determination result is stored in the utterance unit storage circuit 48. The determination result is also displayed on the display device 51 of the utterance unit display unit 50, giving the operator an instruction on the appropriate input unit for the next speech input. That is, when the appropriate utterance unit determination unit 47 determines that conditions are most suitable for speech recognition, information instructing speech input in units of a sentence or of several sentences is displayed on the display device 51. When a sufficient recognition rate cannot be obtained unless input is made in units of phrases and words, information instructing speech input in units of phrases and words is displayed on the display device 51. Under poor conditions, information instructing speech input in units of phonemes is displayed on the display device 51.
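The determination above combines three inputs but gives no thresholds. The following sketch shows one way such a rule could be written; every threshold, the decibel scale for noise, and the function name are illustrative assumptions rather than values from this publication:

```python
# Utterance units ordered from shortest to longest.
UNITS = ["phoneme", "word", "phrase", "sentence"]


def appropriate_unit(recognition_rate: float, difficulty: int, noise_db: float) -> str:
    """Pick an utterance unit; all thresholds here are hypothetical."""
    penalty = 0
    if recognition_rate < 0.9:   # recognition is starting to degrade
        penalty += 1
    if recognition_rate < 0.7:   # recognition is clearly poor
        penalty += 1
    if difficulty > 5:           # difficult utterance content
        penalty += 1
    if noise_db > 60.0:          # noticeable background noise
        penalty += 1
    if noise_db > 75.0:          # very noisy environment
        penalty += 1
    # Each penalty point shortens the unit by one step, down to "phoneme".
    index = max(0, len(UNITS) - 1 - penalty)
    return UNITS[index]
```

With these assumed thresholds, good conditions yield "sentence", moderately degraded conditions yield "phrase" or "word", and poor conditions yield "phoneme", which mirrors the three cases described in the text.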

[0013] The utterance unit information stored in the utterance unit storage circuit 48 is supplied to the DP matching circuit 23. The speech recognition dictionary best suited to DP matching against speech data input in the instructed utterance unit is then selected from the plurality of speech recognition dictionaries registered in the recognition dictionary group 24. That is, when the instructed utterance unit is "phoneme", the "phoneme information" dictionary registered in the recognition dictionary group 24 is selected. When the instructed utterance unit is "word", the "word information" dictionary registered in the recognition dictionary group 24 is selected. When the instructed utterance unit is "phrase", the "grammar information" and "semantic information" dictionaries relating to phrases are selected. When the instructed utterance unit is "sentence", the "grammar information" and "semantic information" dictionaries relating to sentences are selected. Speech recognition processing is performed using the dictionary selected in this way.
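The selection rule above is a fixed mapping from utterance unit to dictionary. A minimal sketch of that mapping, with hypothetical Python names standing in for the recognition dictionary group 24:

```python
from enum import Enum


class Unit(Enum):
    PHONEME = "phoneme"
    WORD = "word"
    PHRASE = "phrase"
    SENTENCE = "sentence"


# Which kinds of dictionary information each utterance unit uses,
# following the four cases described in the text.
DICTIONARIES: dict[Unit, tuple[str, ...]] = {
    Unit.PHONEME: ("phoneme information",),
    Unit.WORD: ("word information",),
    Unit.PHRASE: ("grammar information (phrase)", "semantic information (phrase)"),
    Unit.SENTENCE: ("grammar information (sentence)", "semantic information (sentence)"),
}


def select_dictionaries(unit: Unit) -> tuple[str, ...]:
    """Return the dictionaries used for DP matching in the given unit."""
    return DICTIONARIES[unit]
```

Because the "word" entry contains no grammar or semantic dictionaries, matching in that unit never touches them.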

[0014] This process of selecting the speech recognition dictionary will be described using a concrete example. First, an operator who has received instruction information that the appropriate utterance unit is "word" divides the next speech input into words. For example, to input "Hanagasa Ondo", the operator divides it into "Hanagasa" and "Ondo". The feature extraction circuit 21 extracts the features of the input speech, and they are stored in the feature parameter sequence storage circuit 22. The information on the appropriate utterance unit stored in the utterance unit storage circuit 48 is then transferred to the DP matching circuit 23, and the "word information" dictionary is selected from the recognition dictionary group 24. Using this "word information" dictionary, the DP matching circuit 23 performs DP matching on the feature parameter sequence, that is, the speech signal read from the feature parameter sequence storage circuit 22. In doing so, the speech recognition processing can proceed without using any information such as "grammar information" or "semantic information", which is useful only when the speech input unit is "phrase" or "sentence", so the processing is very efficient. The code string recognized by the DP matching circuit 23 is then displayed on the display device 31.

[0015] If speech recognition processing in the DP matching circuit 23 fails (that is, if the distance to every entry registered in the speech recognition dictionary used is greater than a threshold), a message that speech recognition has failed is displayed on the display device 31 to prompt the operator to input the speech again. The appropriate utterance unit for the re-input is decided by a re-evaluation of the appropriate input unit performed by the appropriate utterance unit determination unit 47, taking into account the difficulty data from the difficulty level determination unit 42, the recognition rate data from the recognition rate detection unit 44, and the background noise data input from the microphone 45.
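DP matching is commonly implemented as dynamic time warping over feature frames. The sketch below assumes that reading, with a Euclidean frame distance and a basic symmetric step pattern; none of these details are given in this publication, and the function names are hypothetical. It also shows the failure rule described above: recognition is rejected when every dictionary entry lies farther away than the threshold.

```python
import math
from typing import Optional, Sequence

Frame = Sequence[float]


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dp_distance(seq: Sequence[Frame], ref: Sequence[Frame]) -> float:
    """Accumulated distance along the best time alignment of seq and ref."""
    n, m = len(seq), len(ref)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq[i - 1], ref[j - 1])
            # Extend the cheapest of the three neighbouring alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(
    seq: Sequence[Frame],
    dictionary: dict[str, Sequence[Frame]],
    threshold: float,
) -> Optional[str]:
    """Return the closest entry, or None if all entries exceed the threshold."""
    best_label, best = None, math.inf
    for label, ref in dictionary.items():
        d = dp_distance(seq, ref)
        if d < best:
            best_label, best = label, d
    return best_label if best <= threshold else None
```

A return value of `None` corresponds to the case in which the device displays the failure message and prompts for re-input.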

[0016] If an operator who sees the recognition rate data displayed on the display device 51 judges that the expected speech recognition rate is not being obtained, the operator can input information to that effect through the correction input unit 43. The operator's indication that the recognition rate is poor is supplied to the appropriate utterance unit determination unit 47, which then determines, as the appropriate utterance unit, a unit shorter than the one currently regarded as appropriate. Speech data input next in response to the instruction based on this determination is input in shorter utterance units, so the recognition rate improves.
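The step to a shorter unit can be sketched as a move down an ordered list. The ordering below follows the sentence, phrase, word, phoneme progression of the telephone example in the description; stepping exactly one level per complaint is an assumption, since the text says only that a shorter unit is chosen:

```python
# Utterance units ordered from shortest to longest.
UNITS = ["phoneme", "word", "phrase", "sentence"]


def shorter_unit(current: str) -> str:
    """Return the next shorter utterance unit, stopping at "phoneme"."""
    index = UNITS.index(current)
    return UNITS[max(0, index - 1)]
```

Calling `shorter_unit("phoneme")` returns "phoneme" again, because no shorter unit exists.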

[0017] Next, the processing of the difficulty level determination unit 42 will be described in detail. The difficulty level determination unit 42 analyzes the input code string and comprehensively determines the difficulty from frequency information calculated from word frequency, politeness information of the sentence, length information of the sentence, and dependency information of the sentence. The frequency information is obtained by adding up the frequency points of the input code string; this is based on the fact that the less frequent a word is, the more difficult the sentence. The politeness information is obtained by extracting the honorific, humble, and polite expressions contained in the input code string; this is based on the fact that sentences using polite expressions are more difficult. The length information adds 1 point for a sentence of 20 or more words and subtracts 1 point for a sentence of 10 or fewer words; this is based on the empirical rule that sentences with many words are more difficult. The dependency information is obtained by defining the dependencies and other expressions in a sentence grammatically at 12 levels of difficulty and adding up these difficulty values for the sentence; this is based on the correlation between the dependency information expressed in a sentence and its difficulty. The points obtained from each of these items are summed, and the result is sent to the appropriate utterance unit determination unit 47 as the difficulty of the input code string.
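The scoring above can be sketched as a sum of four point values. Only the length rule (plus 1 at 20 or more words, minus 1 at 10 or fewer) is specified numerically in the text. The frequency point table, the one-point-per-polite-expression rule, and the pre-computed dependency levels in the sketch are assumptions made for illustration:

```python
from dataclasses import dataclass


@dataclass
class SentenceInfo:
    words: list[str]
    # Difficulty level (1 to 12) of each dependency, as assigned by a
    # hypothetical grammatical analysis that is not shown here.
    dependency_levels: list[int]


def length_points(word_count: int) -> int:
    """+1 for 20 or more words, -1 for 10 or fewer, otherwise 0."""
    if word_count >= 20:
        return 1
    if word_count <= 10:
        return -1
    return 0


def difficulty(
    sentence: SentenceInfo,
    frequency_points: dict[str, int],
    polite_words: set[str],
) -> int:
    """Sum the frequency, politeness, length, and dependency points."""
    # Rarer words carry more points in the assumed table; unknown words score 0.
    freq = sum(frequency_points.get(w, 0) for w in sentence.words)
    # Assumption: each honorific, humble, or polite expression adds 1 point.
    polite = sum(1 for w in sentence.words if w in polite_words)
    length = length_points(len(sentence.words))
    dependency = sum(sentence.dependency_levels)
    return freq + polite + length + dependency
```

For a three-word sentence containing one rare polite word worth 3 frequency points and two dependencies of levels 2 and 3, the sketch yields 3 + 1 - 1 + 5 = 8.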

[0018] FIG. 3 shows an example in which the speech recognition device of this embodiment is implemented on a personal computer. In this example, the recognition result information and the appropriate utterance unit instruction information are displayed on the same screen, which makes the operator's checking easier.

[0019] In this embodiment, the voice input unit 10 uses the microphone 11, the high-frequency emphasis filter 12, and the AD converter 13 to sample the input as a digital signal, but any other configuration may be used as long as it can sample the input speech quickly. The feature extraction circuit 21 uses a method in which the speech signal converted into a digital signal is frequency-converted, feature parameters in the frequency domain are extracted, and the result is expressed as the feature parameter sequence of the spoken word, but any other method may be used as long as it extracts the features accurately. As the means of notifying the operator of the speech recognition result, this embodiment displays the result on the display device 31, but any other method may be used as long as it notifies the operator of the result promptly. Finally, the difficulty level determination unit 42 uses information on word frequency and on the form of the conversation when determining the difficulty of the utterance content, but a method using other information may be used as long as it can determine the difficulty accurately.

[0020]

[Effects of the Invention] With the speech recognition device of the present invention, the input instruction unit instructs input in an appropriate utterance unit. Speech data input in the utterance unit matching this instruction can then be recognized promptly using a dictionary composed of information suited to that utterance unit. As a result, speech recognition processing can respond very flexibly to changes in the usage environment, changes in the speech input data, and situations in which the speech input unit changes frequently, and the range of uses for speech recognition devices can be greatly expanded.

[0021] In addition, because the utterance unit to use for speech input is indicated in advance, even a beginner can learn how to operate the speech input of the speech recognition device in a short time. This makes it possible to provide a speech recognition device that many people can use.

[Brief description of the drawings]

FIG. 1 is a block diagram showing an outline of the speech recognition device of this embodiment.

FIG. 2 is a block diagram showing the detailed configuration of the speech recognition device of this embodiment.

FIG. 3 is an external view of the speech recognition device of this embodiment.

FIG. 4 is a block diagram of a conventional speech recognition device.

[Explanation of symbols]

10: voice input unit, 20: speech recognition unit, 30: recognition result display unit, 40: utterance unit extraction unit, 50: utterance unit display unit.

Continuation of the front page

(58) Fields searched (Int.Cl.⁷, DB name): G10L 15/28, G10L 15/20, G10L 15/22, JICST file (JOIS)

Claims

(57) [Claims]

1. A speech recognition device comprising: an utterance unit extraction unit that extracts an appropriate utterance unit for input speech data; an input instruction unit that instructs input in the utterance unit extracted by the utterance unit extraction unit; and a speech recognition unit that has a plurality of dictionaries provided per utterance unit and recognizes the input speech data using the dictionary for the utterance unit extracted by the utterance unit extraction unit; characterized by using a difficulty level determination unit that determines the difficulty of the utterance content of the speech data, and an utterance unit determination unit that determines the utterance unit optimal for speech recognition from the difficulty of the utterance content determined by the difficulty level determination unit.

2. A speech recognition device comprising: an utterance unit extraction unit that extracts an appropriate utterance unit for input speech data; an input instruction unit that instructs input in the utterance unit extracted by the utterance unit extraction unit; and a speech recognition unit that has a plurality of dictionaries provided per utterance unit and recognizes the input speech data using the dictionary for the utterance unit extracted by the utterance unit extraction unit; characterized by comprising a recognition rate detection unit that detects the recognition rate of the recognition result of the speech data, and an utterance unit determination unit that determines the utterance unit optimal for speech recognition on the basis of the recognition rate detected by the recognition rate detection unit.

3. A speech recognition device comprising: an utterance unit extraction unit that extracts an appropriate utterance unit for input speech data; an input instruction unit that instructs input in the utterance unit extracted by the utterance unit extraction unit; and a speech recognition unit that has a plurality of dictionaries provided per utterance unit and recognizes the input speech data using the dictionary for the utterance unit extracted by the utterance unit extraction unit; characterized by comprising a noise detection unit that detects background noise, and an utterance unit determination unit that determines the utterance unit optimal for speech recognition according to the level of the background noise detected by the noise detection unit.

4. A speech recognition device having: an utterance unit extraction unit that extracts an appropriate utterance unit for input speech data; an input instruction unit that instructs input in the utterance unit extracted by the utterance unit extraction unit; and a speech recognition unit that has a plurality of dictionaries and recognizes the input speech data using the dictionary for the utterance unit extracted by the utterance unit extraction unit, the speech recognition device being characterized by comprising: a difficulty level determination unit that determines the difficulty level of the utterance content of the input speech data; a recognition rate detection unit that detects a recognition rate of the recognition result of the input speech data; and an utterance unit determination unit that determines an optimal utterance unit for speech recognition based on the difficulty level of the utterance content determined by the difficulty level determination unit and the recognition rate detected by the recognition rate detection unit.

5. A speech recognition device having: an utterance unit extraction unit that extracts an appropriate utterance unit for input speech data; an input instruction unit that instructs input in the utterance unit extracted by the utterance unit extraction unit; and a speech recognition unit that has a plurality of dictionaries and recognizes the input speech data using the dictionary for the utterance unit extracted by the utterance unit extraction unit, the speech recognition device being characterized by comprising: a difficulty level determination unit that determines the difficulty level of the utterance content of the input speech data; a noise detection unit that detects background noise; and an utterance unit determination unit that determines an optimal utterance unit for speech recognition based on the difficulty level of the utterance content determined by the difficulty level determination unit and the level of background noise detected by the noise detection unit.

6. A speech recognition device having: an utterance unit extraction unit that extracts an appropriate utterance unit for input speech data; an input instruction unit that instructs input in the utterance unit extracted by the utterance unit extraction unit; and a speech recognition unit that has a plurality of dictionaries and recognizes the input speech data using the dictionary for the utterance unit extracted by the utterance unit extraction unit, the speech recognition device being characterized by comprising: a recognition rate detection unit that detects a recognition rate of the recognition result of the input speech data; a noise detection unit that detects background noise; and an utterance unit determination unit that determines an optimal utterance unit for speech recognition based on the recognition rate detected by the recognition rate detection unit and the level of background noise detected by the noise detection unit.

7. A speech recognition device having: an utterance unit extraction unit that extracts an appropriate utterance unit for input speech data; an input instruction unit that instructs input in the utterance unit extracted by the utterance unit extraction unit; and a speech recognition unit that has a plurality of dictionaries and recognizes the input speech data using the dictionary for the utterance unit extracted by the utterance unit extraction unit, the speech recognition device being characterized by comprising: a difficulty level determination unit that determines the difficulty level of the utterance content of the input speech data; a recognition rate detection unit that detects a recognition rate of the recognition result of the input speech data; a noise detection unit that detects background noise; and an utterance unit determination unit that determines an optimal utterance unit for speech recognition based on the difficulty level of the utterance content determined by the difficulty level determination unit, the recognition rate detected by the recognition rate detection unit, and the level of background noise detected by the noise detection unit.
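
The determination recited in claim 7 can be pictured with a minimal sketch, which is illustrative only and forms no part of the claims. Everything in it is an assumption made for illustration: the set of utterance units (`UtteranceUnit`), the value scales, the threshold parameters, and in particular the rule that each unfavorable condition shortens the utterance unit by one step. The claims specify only that the three inputs are used, not how they are combined.

```python
from dataclasses import dataclass
from enum import IntEnum


class UtteranceUnit(IntEnum):
    """Hypothetical utterance units, ordered from shortest to longest."""
    SYLLABLE = 0
    WORD = 1
    PHRASE = 2
    SENTENCE = 3


@dataclass
class Conditions:
    difficulty: float        # assumed scale: 0.0 (easy) to 1.0 (hard)
    recognition_rate: float  # assumed scale: 0.0 to 1.0
    noise_level_db: float    # background noise level; the unit is an assumption


def determine_utterance_unit(
    c: Conditions,
    max_difficulty: float = 0.5,   # hypothetical threshold
    min_rate: float = 0.8,         # hypothetical threshold
    max_noise_db: float = 60.0,    # hypothetical threshold
) -> UtteranceUnit:
    """Pick an utterance unit from difficulty, recognition rate, and noise.

    Assumed rule: start from the longest unit and step down one level for
    each condition that is unfavorable, never going below the shortest unit.
    """
    unfavorable = sum([
        c.difficulty > max_difficulty,
        c.recognition_rate < min_rate,
        c.noise_level_db > max_noise_db,
    ])
    return UtteranceUnit(max(UtteranceUnit.SENTENCE - unfavorable, 0))


if __name__ == "__main__":
    print(determine_utterance_unit(Conditions(0.9, 0.95, 40.0)).name)
```

Dropping one or two of the three tests from `unfavorable` gives the corresponding sketch for the single-input determinations of claims 1 to 3 or the two-input determinations of claims 4 to 6.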