JPS6266299A - Voice recognition equipment - Google Patents

Voice recognition equipment

Info

Publication number
JPS6266299A
Authority
JP
Japan
Prior art keywords
section
voice
power
distance
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP20713285A
Other languages
Japanese (ja)
Inventor
潤一郎 藤本
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ricoh Co Ltd
Original Assignee
Ricoh Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ricoh Co Ltd
Priority to JP20713285A
Publication of JPS6266299A

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

Technical Field

The present invention relates to a speech recognition device.

Prior Art

The present applicant previously proposed BTSP (Binary TSP), a method in which speech uttered word by word is binarized to obtain a feature pattern, and the binarized feature pattern is recognized by linear matching against dictionary patterns.

However, in the BTSP method vowels carry a large weight in recognition and differences between consonants hardly show up, so misrecognition tends to occur, for example confusing "koku" (濃く) with "oku" (億).

Object

The present invention was made in view of the circumstances described above, and in particular aims to improve the recognition accuracy of consonant portions in a speech recognition device.

Configuration

To achieve the above object, the present invention is a speech recognition device that has a voice collection section, a feature conversion section, a voice interval detection section, and a standard pattern storage section. The device converts the speech obtained by the voice collection section into features in the feature conversion section, extracts only the portion belonging to the voice interval, determines its similarity (distance) to pre-registered standard patterns, and outputs the pattern with the highest similarity (lowest distance) as the recognition result. The invention is characterized in that the power of the speech is determined and the similarity is calculated with weights that are inversely (directly) related to the increase or decrease in power.

The present invention is described below on the basis of embodiments.

FIG. 1 is an electrical block diagram for explaining an example of a speech recognition device to which the present invention is applied. In the figure, 1 is a microphone, 2 is a voice interval detection section, 3 is a feature extraction section, 4 is a pattern storage section, 5 is a changeover switch, 6 is a matching section, 7 is a result output section, 8 is a DP matching section, 9 is an inter-frame distance section, and 10 is a frame power multiplication section. The distinguishing feature is that the matching section 6 is configured as shown in FIG. 2.

The present invention was made by focusing on the fact that consonants have lower power than vowels. It is a speech recognition device that has a voice collection section, a feature conversion section, a voice interval detection section, and a standard pattern storage section, converts the speech obtained by the voice collection section into features by feature conversion, extracts only the portion belonging to the voice interval, determines its similarity (distance) to pre-registered standard patterns, and outputs the pattern with the highest similarity (lowest distance) as the recognition result. In this device, the power of the speech is determined and the similarity is calculated with weights that are inversely (directly) related to the increase or decrease in power.

As described above, in the present invention the matching section 6 of a general speech recognition device such as that shown in FIG. 1 is configured as shown in FIG. 2. First, the voice interval is cut out of the microphone signal and features are extracted. The voice interval may be obtained by a method such as taking the interval from when the speech power exceeds a fixed value until it falls below that value, and the feature extraction section may use frequency analysis by a bank of bandpass filters or the like. When standard patterns are created, the outputs of these bandpass filters are sampled every 10 ms at about 12 to 16 bits and stored. At recognition time, the unknown input pattern obtained by feature extraction is matched against the registered standard patterns, and the most similar one is output as the recognition result. The matching may use a known method, such as dynamic programming based on inter-frame distances (DP matching). With n bandpass filters, the input pattern denoted ai, and the standard pattern denoted bi, the inter-frame distance is: [expression not recorded]
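The processing chain described above (power-threshold voice interval detection, inter-frame distance, DP matching, selection of the closest standard pattern) can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation. Because the text does not record the inter-frame distance expression, the sketch assumes a city-block distance over the n filter channels. It also assumes that frame power is the sum of the filter outputs and that the DP path uses the simplest symmetric step pattern. All function names are hypothetical.

```python
def frame_power(frame):
    """Frame power, assumed here to be the sum of the filter-bank outputs."""
    return sum(frame)


def detect_voice_interval(frames, threshold):
    """Return (start, end) of the first run of frames whose power exceeds threshold.

    `end` is exclusive. Returns None if no frame exceeds the threshold.
    """
    start = None
    for t, frame in enumerate(frames):
        if frame_power(frame) > threshold:
            if start is None:
                start = t
        elif start is not None:
            return start, t
    return (start, len(frames)) if start is not None else None


def frame_distance(a, b):
    """Assumed inter-frame distance: city-block distance over the n channels."""
    return sum(abs(x - y) for x, y in zip(a, b))


def dp_match(inp, ref, dist=frame_distance):
    """Accumulated distance along the best DP alignment of two frame sequences."""
    inf = float("inf")
    rows, cols = len(inp), len(ref)
    g = [[inf] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            d = dist(inp[i], ref[j])
            if i == 0 and j == 0:
                g[i][j] = d
                continue
            # Best predecessor among vertical, horizontal and diagonal moves.
            prev = min(
                g[i - 1][j] if i > 0 else inf,
                g[i][j - 1] if j > 0 else inf,
                g[i - 1][j - 1] if i > 0 and j > 0 else inf,
            )
            g[i][j] = d + prev
    return g[-1][-1]


def recognize(inp, dictionary):
    """Return the label of the standard pattern with the lowest DP distance."""
    return min(dictionary, key=lambda label: dp_match(inp, dictionary[label]))
```

Passing a different `dist` function to `dp_match` is one way a power-dependent weighting could be plugged in without changing the alignment code.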

During DP matching, the inter-frame distance is determined and multiplied by the frame power. This yields a distance proportional to the frame power, so that the lower-power portions carry the greater weight, and recognition that focuses on the low-power portions becomes possible.
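As a rough illustration of the frame power multiplication step, the sketch below scales an inter-frame distance by the power of the input frame. The city-block distance, the definition of frame power as the sum of the filter outputs, and the choice of the input frame (rather than the standard-pattern frame) as the source of the power are all assumptions made for this example. The function names are hypothetical.

```python
def frame_power(frame):
    """Frame power, assumed here to be the sum of the filter-bank outputs."""
    return sum(frame)


def frame_distance(a, b):
    """Assumed inter-frame distance: city-block distance over the channels."""
    return sum(abs(x - y) for x, y in zip(a, b))


def power_weighted_distance(a, b):
    """Inter-frame distance multiplied by the power of input frame a."""
    return frame_power(a) * frame_distance(a, b)
```

Note that under these assumptions a frame with zero output always yields a weighted distance of zero, whatever the standard-pattern frame contains.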

Distance is used here, but when similarity is used it suffices to divide at the point where the frame power is multiplied. In addition, multiplying by the power makes the distance zero wherever the output is 0, which causes a problem. To avoid this, the power may be subtracted from a value α larger than the maximum power, and the inter-frame distance divided by the result.

FIG. 3 is a configuration diagram of the main part showing an embodiment for the case described above. In the figure, 11 is a power calculation section, 12 is an α - power section (subtraction section), and 13 is a division section. In this embodiment, as illustrated, the subtraction section 12 subtracts the power from α, which is larger than the maximum power, and the division section 13 divides the inter-frame distance by the subtracted value as described above.
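The three blocks of FIG. 3 (power calculation, α - power subtraction, division) might be sketched as below. This is only an illustration under stated assumptions: frame power is taken as the sum of the filter outputs of the input frame, the inter-frame distance is a city-block distance, and α is supplied by the caller. None of these choices is specified in the text, and the function names are hypothetical.

```python
def frame_power(frame):
    """Power calculation (block 11): assumed sum of the filter-bank outputs."""
    return sum(frame)


def frame_distance(a, b):
    """Assumed inter-frame distance: city-block distance over the channels."""
    return sum(abs(x - y) for x, y in zip(a, b))


def alpha_weighted_distance(a, b, alpha):
    """Divide the inter-frame distance by (alpha - power of input frame a).

    alpha must exceed the frame power so that the divisor stays positive.
    """
    divisor = alpha - frame_power(a)  # subtraction (block 12)
    if divisor <= 0:
        raise ValueError("alpha must be larger than the maximum frame power")
    return frame_distance(a, b) / divisor  # division (block 13)
```

With this form a frame whose output is 0 no longer yields a zero distance: for `alpha = 10`, the frames `[0, 0]` and `[4, 4]` give 8 / 10.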

Effect

As is clear from the above description, the present invention places a large weight on consonant portions and makes accurate speech recognition possible.

[Brief Description of the Drawings]

FIG. 1 is a diagram showing an example of a speech recognition device to which the present invention is applied, FIG. 2 is a detailed diagram of the matching section shown in FIG. 1, and FIG. 3 is a configuration diagram of the main part showing another embodiment of the present invention.

1: microphone, 2: voice interval detection section, 3: feature extraction section, 4: pattern storage section, 5: changeover switch, 6: matching section, 7: result output section, 8: DP matching section, 9: inter-frame distance section, 10: frame power multiplication section, 11: power calculation section, 12: α - power section, 13: division section.

Claims (1)

[Claims]

A speech recognition device having a voice collection section, a feature conversion section, a voice interval detection section, and a standard pattern storage section, which converts the speech obtained by the voice collection section into features in the feature conversion section, extracts only the portion belonging to the voice interval, determines its similarity (distance) to pre-registered standard patterns, and outputs the pattern with the highest similarity (lowest distance) as the recognition result, characterized in that the power of the speech is determined and the similarity is calculated with weights that are inversely (directly) related to the increase or decrease in power.
JP20713285A 1985-09-19 1985-09-19 Voice recognition equipment Pending JPS6266299A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP20713285A JPS6266299A (en) 1985-09-19 1985-09-19 Voice recognition equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP20713285A JPS6266299A (en) 1985-09-19 1985-09-19 Voice recognition equipment

Publications (1)

Publication Number Publication Date
JPS6266299A true JPS6266299A (en) 1987-03-25

Family

ID=16534720

Family Applications (1)

Application Number Title Priority Date Filing Date
JP20713285A Pending JPS6266299A (en) 1985-09-19 1985-09-19 Voice recognition equipment

Country Status (1)

Country Link
JP (1) JPS6266299A (en)
