JPS59184398A - Voice recognition equipment

Info

Publication number: JPS59184398A
Application number: JP58058871A
Authority: JP
Inventors: 板橋　功
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1983-04-04
Filing date: 1983-04-04
Publication date: 1984-10-19

Abstract

(57) [Abstract] This publication contains application data filed before electronic filing was introduced, so no abstract data is recorded.

Description

[Detailed Description of the Invention] The present invention relates to a speech recognition device.

In recent years, speech recognition devices have entered full-scale practical use as input devices for computers and various control equipment. It is already well known that a speech recognition device that recognizes human speech as spoken has many advantages: it requires no special training to use, and it leaves the user's eyes, hands and feet free. The so-called speaker-dependent word recognition devices now in full-scale practical use share these advantages, but because they perform recognition in units of words or word strings, they cannot be applied as-is to the recognition of fixed-form descriptive sentences.

In response, monosyllable recognition devices have appeared in Japan that exploit the fact that the Japanese language system is built on phonetic characters. With such a device, any descriptive sentence can be composed by uttering a sequence of monosyllables. At present, however, the recognition rate of monosyllable recognition devices, that is, the proportion of uttered speech that is recognized correctly, falls slightly short of that of word recognition devices. This approach also has the drawback that it applies only to languages, such as Japanese, whose language system is built on phonetic characters. On the other hand, from a practical standpoint the descriptive sentences that occur frequently are fairly limited, and most of them are covered by assuming a few typical fixed sentence forms. This reasoning holds not only for Japanese but for other languages as well, and it means that most fixed-form descriptive sentences can be composed from combinations of a limited number of words, without relying on a monosyllable recognition device.

An object of the present invention is to provide a speech recognition device that recognizes fixed-form descriptive sentences in accordance with the grammar of an arbitrary language.

According to the present invention, a speech recognition device is obtained that successively predicts and selects the words to be recognized in accordance with grammar rules, so that grammatically impossible sentence forms do not arise, and that can therefore recognize fixed-form descriptive sentences with high accuracy.

The present invention is described in detail below with reference to the drawing of one embodiment. The drawing shows one embodiment of the device of the present invention. In the drawing, 100 is a pickup unit that converts human speech into an electrical signal; 200 is a frequency analysis unit that analyzes the frequency content of the electrical signal obtained by pickup unit 100; 300 is a sampling/quantization unit that samples and quantizes the output of frequency analysis unit 200; 400 is a determination unit that compares the sampled and quantized speech with pre-stored speech patterns, that is, standard patterns, and judges the result; 500 is a fixed sentence-form storage unit that stores in advance fixed sentence forms conforming to the grammar rules of the language concerned; and 600 is a standard pattern storage unit.
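As a rough illustration of the signal path through units 200 and 300, the following Python sketch computes per-band power for one frame of speech and then quantizes the band powers to integer levels. It is a digital stand-in only: in the embodiment the frequency analysis is performed on the analog signal before sampling, whereas this sketch applies a naive DFT to samples that are assumed to be digitized already. The function names, frame length, band count and number of quantization levels are all assumptions introduced for illustration and do not appear in the patent.

```python
import math


def band_powers(frame, num_bands):
    """Split the power spectrum of one frame into num_bands equal-width bands.

    Digital stand-in for frequency analysis unit 200 (assumed, not from the patent).
    """
    n = len(frame)
    half = n // 2
    power = []
    # Naive DFT power for bins 1..half; bin 0 (the DC term) is skipped.
    for k in range(1, half + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        im = sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        power.append((re * re + im * im) / n)
    width = len(power) // num_bands
    return [sum(power[b * width:(b + 1) * width]) for b in range(num_bands)]


def quantize(values, levels, max_value):
    """Map each value onto an integer in 0..levels-1, clipping out-of-range values.

    Stand-in for the quantization half of unit 300 (assumed, not from the patent).
    """
    out = []
    for v in values:
        q = int(v / max_value * levels)
        out.append(min(max(q, 0), levels - 1))
    return out


# Example frame: a pure tone that falls in DFT bin 2 of a 16-sample frame.
frame = [math.sin(2 * math.pi * 2 * i / 16) for i in range(16)]
features = quantize(band_powers(frame, 4), levels=8, max_value=4.0)
```

The resulting list of small integers plays the role of the digitized feature vector that the determination unit compares against the standard patterns.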

In operation, the speaker's voice is converted into an electrical signal by pickup unit 100 and input to frequency analysis unit 200, where it is divided into power spectra for each frequency band. At this stage the electrical signal is still an analog quantity; normally, the subsequent sampling/quantization unit 300 samples and quantizes these power spectra into digital quantities. Determination unit 400 first refers to fixed sentence-form storage unit 500 to determine which kinds of words can validly be recognized in the next utterance, then reads from standard pattern storage unit 600 the standard patterns to be compared with the output of sampling/quantization unit 300, and performs the comparison and determination. Limiting the standard patterns that are recognition candidates at any one time in this way yields both higher recognition speed and a higher recognition rate. Determination unit 400 also has the function of outputting the recognition results thus obtained to an arbitrary device. The grammar rules to be stored in advance in fixed sentence-form storage unit 500 could be rules expanded at the word level; in English, for example, the basic sentence pattern S (subject) + V (predicate) + O (object).
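The grammar-constrained matching performed by unit 400 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patent's implementation: the sentence forms, the vocabulary, the four-element feature vectors and the squared-distance measure are all hypothetical, since the patent specifies neither the pattern format nor the matching algorithm.

```python
# Hypothetical contents of storage unit 500: each fixed sentence form is a
# sequence of word classes (S = subject, V = predicate, O = object).
SENTENCE_FORMS = [
    ("S", "V", "O"),
    ("S", "V"),
]

# Hypothetical contents of storage unit 600: word class -> {word: standard pattern}.
STANDARD_PATTERNS = {
    "S": {"I": [7, 0, 0, 0], "you": [0, 7, 0, 0]},
    "V": {"open": [0, 0, 7, 0], "close": [0, 0, 0, 7]},
    "O": {"door": [7, 7, 0, 0], "file": [0, 0, 7, 7]},
}


def allowed_classes(prefix_classes):
    """Return the word classes that some sentence form permits next."""
    pos = len(prefix_classes)
    return {
        form[pos]
        for form in SENTENCE_FORMS
        if len(form) > pos and form[:pos] == tuple(prefix_classes)
    }


def distance(a, b):
    """Squared Euclidean distance (an assumed matching measure)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def recognize(utterances):
    """Recognize one word per utterance, matching only grammar-permitted words."""
    classes, words = [], []
    for features in utterances:
        candidates = allowed_classes(classes)
        if not candidates:
            raise ValueError("no sentence form allows another word here")
        # Compare only against the standard patterns of the predicted classes.
        _, cls, word = min(
            (distance(features, pattern), cls, word)
            for cls in sorted(candidates)
            for word, pattern in STANDARD_PATTERNS[cls].items()
        )
        classes.append(cls)
        words.append(word)
    return words


result = recognize([[6, 1, 0, 0], [0, 0, 6, 1], [6, 6, 1, 0]])
```

In this sketch the third utterance is compared only with the two object patterns, so a feature vector that happens to resemble a subject word still cannot produce a grammatically impossible sentence. The same restriction shrinks the number of comparisons per utterance, which corresponds to the speed benefit the description mentions.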

This description has explained that frequency analysis unit 200 and sampling/quantization unit 300 digitize the speech before the recognition operation is performed. Depending on the matching algorithm, however, this method need not be used, and in such cases the intent of the present invention is not impaired in the slightest. This description has also explained that pickup unit 100 is used to capture the speech; besides a microphone, a device such as a tape recorder can be used for this purpose.

As explained above, by using the grammar rules specific to an arbitrary language, the present invention is extremely effective for recognizing fixed-form descriptive sentences with high accuracy and without grammatical errors.

[Brief Description of the Drawings]

The drawing is a block diagram showing one embodiment of the present invention. In the drawing: 100, pickup unit; 200, frequency analysis unit; 300, sampling/quantization unit; 400, determination unit; 500, fixed sentence-form storage unit; 600, standard pattern storage unit.

Claims

[Claims]

1. A speech recognition device that recognizes a string of uttered input words, wherein the speech recognition device recognizes fixed-form descriptive sentences by sequentially predicting and selecting words to be recognized according to a predetermined grammar.