JPS6347797A

JPS6347797A - Word voice preselection system

Info

Publication number: JPS6347797A
Application number: JP61191396A
Authority: JP
Inventors: 沢井　秀文 (Hidefumi Sawai)
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1986-08-15
Filing date: 1986-08-15
Publication date: 1988-02-29

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION

Technical Field

The present invention relates to a method for preselecting word speech in a speech recognition device.

Prior Art

Conventionally, word speech is preselected by extracting a certain feature from the unknown input speech, comparing it with the same feature of the standard patterns (word dictionary), and selecting the patterns with similar properties. With such methods, however, the number of recognition candidates left after preselection varies from utterance to utterance. Moreover, with a preselection method based on a single fixed criterion, the nature of the input word can cause problems: for example, when the standard patterns contain many similar words, the correct word is often not selected as a candidate (that is, it is rejected).

Object

The present invention was made in view of the circumstances described above. It concerns word preselection, which limits the candidate words in order to shorten recognition processing time when recognizing large-vocabulary word speech with many recognition targets. Its object is to perform preselection with a plurality of preselection methods until the number of candidates falls to or below a fixed value, thereby eliminating variation in recognition processing time and speeding up processing.

Constitution

To achieve the above object, the present invention comprises a microphone for inputting speech; a feature analysis unit for obtaining a characteristic time series from the speech; a preselection unit that selects word candidates before large-vocabulary word speech is recognized; a preselection dictionary that is consulted during preselection; a recognition processing unit that recognizes the candidate words narrowed down by the preselection unit; a word standard pattern storage unit that is referred to during recognition processing; and a recognition result output terminal that outputs the recognition result. The invention is characterized in that preselection is performed with a plurality of different preselection methods until the number of candidate words falls to or below a predetermined fixed value. The invention is described below on the basis of an embodiment.

FIG. 1 is an electrical block diagram illustrating one embodiment of the present invention. In the figure, 1 is a microphone for speech input, 2 is a speech feature extraction unit, 3 is a word preselection unit, 4 is a preselection dictionary storage unit, 5 is a recognition processing unit, 6 is a standard pattern (dictionary) storage unit, and 7 is a recognition result output unit. Speech input through microphone 1 is converted by feature extraction unit 2 into a time series of feature parameters characteristic of that speech. Preselection unit 3 uses this feature parameter series to match the input against preselection dictionary 4, which is built in advance from the large-vocabulary word dictionary, and so narrows down the candidate words. Recognition processing unit 5 then matches only the words narrowed down by preselection unit 3 against the word standard patterns in dictionary storage unit 6, and outputs the name of the pattern closest to the input pattern to recognition result output unit 7 as the recognition result.
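The data flow of FIG. 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. Features are reduced to one scalar per frame. The preselection unit is modelled as any function that returns candidate words. Recognition unit 5 is assumed to use dynamic time warping (DTW), which the patent does not specify. The names `recognize`, `dtw_distance`, `PATTERNS` and `preselect_by_length`, and the example words, are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical simplification: one scalar feature per frame. Feature
# extraction unit 2 produces a time series of feature parameters; a real
# system would use a vector per frame.
Features = Sequence[float]


def dtw_distance(a: Features, b: Features) -> float:
    """Dynamic time warping distance between two feature sequences
    (an assumed matching method; the patent does not name one)."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j]: best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(
    features: Features,
    preselect: Callable[[Features], List[str]],
    standard_patterns: Dict[str, Features],
) -> str:
    """Preselection unit 3 narrows the vocabulary; recognition unit 5 then
    matches the input only against the surviving words' standard patterns
    (storage unit 6) and returns the closest pattern name (output unit 7)."""
    candidates = preselect(features)
    return min(candidates, key=lambda w: dtw_distance(features, standard_patterns[w]))


# Made-up data: preselect by sequence length (a crude duration check).
PATTERNS = {"aka": [1.0, 3.0, 1.0], "ao": [2.0, 2.0], "kuro": [5.0] * 6}


def preselect_by_length(features: Features) -> List[str]:
    return [w for w, p in PATTERNS.items() if abs(len(p) - len(features)) <= 1]
```

With the input `[1.0, 3.0, 1.2]`, the preselector keeps only "aka" and "ao", so the pattern for "kuro" is never passed to `dtw_distance`, and `recognize` returns "aka". Note that `min` raises `ValueError` if preselection returns no candidates. The patent does not discuss this case.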

FIG. 2 is a flowchart showing the processing performed in preselection unit 3 of FIG. 1. In the figure, 31 is the processing start terminal; 32, 34 and 35 are preselection method units implementing M mutually different preselection methods; 33 is a judgment processing unit that decides whether processing should end; and 36 is the processing end terminal. In the preselection process, preselection method unit 32 first selects candidate words according to some criterion. It does so by extracting a characteristic parameter from the unknown input speech and selecting the standard patterns that have the same or a similar feature. Let K be the number of candidate patterns (words) selected at this point.
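A single preselection method of the kind performed by unit 32 can be sketched using one of the criteria the patent lists later, the number of silent intervals. The following assumptions are all hypothetical:

- The input is given as a per-frame power contour.
- A silent interval is a run of frames inside the word whose power falls below a fixed threshold.
- Preselection dictionary 4 stores one precomputed silent-interval count per word.

The length of the returned list is K.

```python
from typing import Dict, List, Sequence


def count_silent_intervals(power: Sequence[float], threshold: float) -> int:
    """Count runs of frames with power below `threshold`, ignoring leading
    and trailing silence (a hypothetical definition of a silent interval)."""
    # Trim silence at both ends so that only word-internal pauses count.
    start = 0
    while start < len(power) and power[start] < threshold:
        start += 1
    end = len(power)
    while end > start and power[end - 1] < threshold:
        end -= 1
    count = 0
    in_silence = False
    for p in power[start:end]:
        if p < threshold and not in_silence:
            count += 1  # entering a new silent run
        in_silence = p < threshold
    return count


def preselect_by_silences(
    power: Sequence[float], dictionary: Dict[str, int], threshold: float
) -> List[str]:
    """Keep the dictionary words whose stored silent-interval count equals
    the count measured in the unknown input."""
    k = count_silent_intervals(power, threshold)
    return [word for word, n in dictionary.items() if n == k]
```

Take the power contour `[0.1, 5.0, 6.0, 0.2, 0.1, 7.0, 0.3, 8.0, 0.0]` with threshold `1.0`. The two internal pauses give a count of 2, so `preselect_by_silences(power, {"a": 0, "b": 2, "c": 2, "d": 1}, 1.0)` returns `["b", "c"]`, and K = 2.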

Separately, the number of candidate patterns to which the candidates should finally be narrowed is set in advance to N. The values of K and N naturally depend on the number of words to be recognized and on the performance of the preselection methods. N in particular is often determined empirically to match the desired performance of the recognition system.

If the processing of preselection method unit 32 brings the number of candidates K down to N or below, the preselection process ends. Usually, however, a single preselection method rarely satisfies the judgment condition of judgment processing unit 33 (K ≤ N). This is especially so when there are many recognition targets, or when N is small compared with the number of recognition targets. In such cases, a second preselection method unit 34, which differs from preselection method unit 32, must narrow down the candidate words further.

Various methods can serve as the different preselection methods of units 32, 34 and 35, for example:

- comparing the speech duration, the number of silent intervals, and similar quantities;
- classifying the phonemes at the beginning or in the middle of the word;
- checking for the presence or absence of power dips;
- using patterns extracted from the speech pattern by other special processing.

These methods may be applied in any order. By following the processing flow of FIG. 2, preselection can therefore end as soon as the number of candidates K to be sent to recognition processing unit 5 falls to N or below. This suppresses a drawback of the prior art: variation in the number of candidates to be recognized, caused by differences in the nature of the unknown input speech. It also avoids unnecessary preselection processing.

Effects

As is clear from the above description, the present invention applies different preselection methods in series until a predetermined number of candidate patterns is reached, and makes a judgment at the end of each method. The increase in the time required for recognition processing can therefore be kept within a fixed bound, which enables efficient preselection and recognition processing.
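The bound on recognition time claimed above can be made explicit with a rough cost model. The model is an assumption, not part of the patent. Preselection method $i$ is taken to cost $t_i$, and matching the input against one standard pattern is taken to cost a fixed $t_{\mathrm{match}}$. The cascade is assumed to stop after $m$ of the $M$ methods, with $K \le N$ candidates.

$$
T \;=\; \sum_{i=1}^{m} t_i \;+\; K\,t_{\mathrm{match}}
\;\le\; \sum_{i=1}^{M} t_i \;+\; N\,t_{\mathrm{match}},
\qquad 1 \le m \le M,\; K \le N
$$

The right-hand side does not depend on the input utterance, whereas matching every word of a $V$-word vocabulary would cost $V\,t_{\mathrm{match}}$. The bound fails only if all $M$ methods are exhausted while $K$ is still above $N$.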

[Brief Description of the Drawings]

FIG. 1 is an electrical block diagram illustrating one embodiment of the present invention, and FIG. 2 is a flowchart illustrating the processing performed by preselection unit 3 shown in FIG. 1.

Description of reference numerals: 1 ... microphone, 2 ... feature extraction unit, 3 ... preselection unit, 4 ... preselection dictionary storage unit, 5 ... recognition processing unit, 6 ... standard pattern (dictionary) storage unit, 7 ... recognition result output unit, 32, 34, 35 ... preselection method units, 33 ... judgment unit.

Claims


A word speech preselection method for a system comprising a microphone for inputting speech; a feature analysis unit for obtaining a characteristic time series from the speech; a preselection unit for selecting word candidates before large-vocabulary word speech is recognized; a preselection dictionary consulted when performing preselection; a recognition processing unit that recognizes the candidate words narrowed down by the preselection unit; a word standard pattern storage unit referred to during recognition processing; and a recognition result output terminal that outputs the recognition result. The method is characterized in that preselection is performed with a plurality of different preselection methods until the number of candidate words becomes equal to or less than a predetermined fixed value.