JPS63221397A

JPS63221397A - Monosyllable voice recognition equipment

Info

Publication number: JPS63221397A
Application number: JP62057621A
Authority: JP
Inventors: 室井　哲也
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1987-03-11
Filing date: 1987-03-11
Publication date: 1988-09-14

Abstract

(57) [Summary] This publication contains application data that predates electronic filing, so no abstract data is recorded.

Description

[Detailed description of the invention]

[Technical field]

The present invention relates to monosyllable speech recognition.

[Prior art]

Conventional monosyllable speech recognition identifies the vowel and then matches the input against every consonant, which has the drawback of requiring a large amount of computation.

[Purpose]

The present invention was made in view of the circumstances described above. Specifically, its purpose is to reduce the amount of matching processing in monosyllable speech recognition by matching only against voiced consonants when the beginning of the speech section has low speech energy and a high proportion of low-frequency components.

[Constitution]

To achieve the above object, the present invention provides a monosyllable speech recognition device comprising feature sequence conversion means, which frequency-analyzes an input signal and converts it into a time series of feature vectors (x_1, x_2, ..., x_r), and speech section detection means, which cuts out the speech section from the input signal. The device is characterized in that, when frames in which the speech energy is lower than a first threshold and the proportion of low-frequency components in the speech energy is higher than a second threshold continue from the beginning of the speech section for longer than a third threshold, matching is performed only against the moraic nasal /N/ and monosyllables having a voiced consonant (/b/, /d/, /g/, /m/, /n/, /z/, /r/).

The invention is described below on the basis of an embodiment.

FIG. 1 is a block diagram for explaining one embodiment of the present invention. In the figure, 1 is a microphone, 2 is a feature sequence conversion unit, 3 is a speech section detection unit, 4 is a determination unit, and 5 is a recognition unit. First, the speech signal input from the microphone is frequency-analyzed and converted into a time series of feature vectors.

There are various methods of frequency-analyzing an input signal; this embodiment uses a bank of 15 bandpass filter channels. The filter channels may be placed at 1/3-octave intervals with center frequencies ranging from 250 to 6300 Hz.
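As a quick check of the channel layout, the following minimal sketch generates center frequencies at exact 1/3-octave steps starting from 250 Hz. The function name and the exact-power-of-two construction are assumptions made for illustration; the description gives only the range and the spacing.

```python
# Sketch: exact 1/3-octave spacing starting at 250 Hz (assumed construction,
# not taken from the patent text).
def third_octave_centers(f_start=250.0, n_channels=15):
    """Return n_channels center frequencies in Hz, spaced 1/3 octave apart."""
    return [f_start * 2.0 ** (k / 3.0) for k in range(n_channels)]

centers = third_octave_centers()
for ch, fc in enumerate(centers, start=1):
    print(f"channel {ch:2d}: {fc:7.1f} Hz")
```

Fifteen steps of 1/3 octave from 250 Hz end slightly above 6300 Hz, which corresponds to the nominal 6300 Hz band, so the stated channel count and frequency range agree with each other.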

The speech section detection means is not directly related to the present invention, so a detailed description is omitted; various methods are known.

The determination unit determines whether the beginning of the speech section is a buzz-bar portion. A buzz bar appears at the front of a voiced plosive because the vocal cords begin vibrating before the moment of release. The determination uses two parameters, described below: the normalized speech energy P_i and the low-frequency component ratio L_i.

Here, x_i,j is the output of channel j in frame i.

That is, P_i is the speech energy of frame i (Σ_j=1 ...).

Th1 is the first threshold, Th2 is the second threshold, Th3 is the third threshold, i is the frame number, P_i is the normalized speech energy, and L_i is the low-frequency component ratio. m is the number of frames judged to be a buzz-bar portion (P_i < Th1 and L_i > Th2), and the set of recognition targets changes depending on whether m exceeds the threshold Th3 (A) or not (B). Note that even when m > Th3, that is, in case (A), matching is performed not only against /b/, /d/, /g/, and /z/ but also against /r/, /m/, /n/, and /N/. This is because the liquid /r/ and the nasals /m/, /n/, and /N/ can show features very similar to a buzz bar (that is, P_i < Th1 and L_i > Th2) at the beginning of the speech section.
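The per-frame test and the frame count m can be sketched as follows. The defining equations for P_i and L_i are not reproduced in this text, so both formulas here are assumptions: P_i is taken as the frame energy divided by the largest frame energy in the section, and L_i as the share of frame energy in the lowest `n_low` channels. The parameter `n_low`, the function names, and the toy 4-channel data are hypothetical; the default thresholds are the values the description suggests.

```python
# Sketch only: the formulas for P and L below are ASSUMED, because the
# patent's own equations are missing from this text.
def frame_parameters(frames, n_low=3):
    """frames[i][j] is the output of channel j in frame i (non-negative).

    Returns (P, L):
      P[i] = frame energy / largest frame energy   (assumed normalization)
      L[i] = energy in the lowest n_low channels / frame energy
             (n_low is a hypothetical parameter)
    """
    energies = [sum(x) for x in frames]
    peak = max(energies) if energies else 0.0
    P = [e / peak if peak > 0 else 0.0 for e in energies]
    L = [sum(x[:n_low]) / e if e > 0 else 0.0 for x, e in zip(frames, energies)]
    return P, L

def count_leading_buzz_frames(P, L, th1=0.75, th2=0.9):
    """Count consecutive frames from the start with P_i < Th1 and L_i > Th2."""
    m = 0
    for p, l in zip(P, L):
        if p < th1 and l > th2:
            m += 1
        else:
            break  # the run must start at the beginning of the section
    return m

# Toy data: two quiet, low-frequency-dominated frames, then louder frames.
frames = [
    [5.0, 4.6, 0.2, 0.2],
    [6.0, 5.5, 0.3, 0.2],
    [10.0, 10.0, 10.0, 10.0],
    [2.0, 2.0, 8.0, 8.0],
]
P, L = frame_parameters(frames, n_low=2)
m = count_leading_buzz_frames(P, L)
print(m)
```

Counting only the unbroken run at the start of the section follows the statement that the qualifying frames must continue from the beginning of the speech section.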

In addition, some voiced plosives have no buzz-bar portion, so when m does not exceed Th3, that is, in case (B), matching is performed against all consonants.
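The choice between cases (A) and (B) amounts to filtering the template set before matching. The sketch below illustrates that step; the `(label, onset)` template layout, the function name, and the toy syllable list are hypothetical, and the default `th3=4` is the value the description suggests.

```python
# Onsets kept in case (A); "N" stands for the moraic nasal /N/.
RESTRICTED_ONSETS = {"b", "d", "g", "m", "n", "z", "r", "N"}

def select_candidates(m, templates, th3=4):
    """Return the template labels to match against.

    m         : number of leading buzz-bar-like frames
    templates : list of (label, onset) pairs (hypothetical layout)
    th3       : third threshold, in frames
    """
    if m > th3:
        # Case (A): match only /N/ and syllables with a voiced consonant.
        return [label for label, onset in templates if onset in RESTRICTED_ONSETS]
    # Case (B): match against every template.
    return [label for label, _ in templates]

templates = [("ka", "k"), ("ga", "g"), ("sa", "s"), ("za", "z"),
             ("ma", "m"), ("a", ""), ("N", "N")]
print(select_candidates(5, templates))  # case (A)
print(select_candidates(2, templates))  # case (B)
```

Because case (B) falls back to the full set, a voiced plosive without a buzz bar is still matched, only without the saving in computation.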

As described above, the present invention makes it possible to reduce the amount of matching computation without degrading recognition performance. The thresholds may be set to Th1 = 0.75, Th2 = 0.9, and Th3 = 4 (with a frame period of 5 msec, that is, about 20 msec).

[Effect]

As is clear from the above description, the present invention reduces the amount of computation required for consonant identification without degrading recognition performance.

[Brief explanation of drawings]

FIG. 1 is a block diagram for explaining one embodiment of the present invention, and FIG. 2 is a flowchart for explaining its operation.

1: microphone, 2: feature sequence conversion unit, 3: speech section detection unit, 4: determination unit, 5: recognition unit.

Claims

[Claims]

A monosyllable speech recognition device comprising feature sequence conversion means for frequency-analyzing an input signal and converting it into a time series of feature vectors (X_1, X_2, ..., X_r), and speech section detection means for cutting out a speech section from the input signal, characterized in that, when frames in which the speech energy is lower than a first threshold and the proportion of low-frequency components in the speech energy is higher than a second threshold continue from the beginning of the speech section for longer than a third threshold, matching is performed only against the moraic nasal /N/ and monosyllables having a voiced consonant (/b/, /d/, /g/, /m/, /n/, /z/, /r/).