JPH0466040B2 - Voice pattern generator - Google Patents

Info

Publication number
JPH0466040B2
Authority
JP
Japan
Prior art keywords
threshold
band
value
pattern
section
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired
Application number
JP17893583A
Other languages
Japanese (ja)
Other versions
JPS6069699A (en)
Inventor
Masahide Yoneyama
Junichiro Fujimoto
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ricoh Co Ltd
Original Assignee
Ricoh Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ricoh Co Ltd filed Critical Ricoh Co Ltd
Priority to JP17893583A priority Critical patent/JPS6069699A/en
Publication of JPS6069699A publication Critical patent/JPS6069699A/en
Publication of JPH0466040B2 publication Critical patent/JPH0466040B2/ja
Granted legal-status Critical Current

Description

DETAILED DESCRIPTION OF THE INVENTION

Technical Field

The present invention relates to the creation of speech patterns in a speech recognition apparatus.

Prior Art

For word speech recognition, matching based on the DP method (dynamic programming) is currently in general use, but it has the drawback of requiring a large amount of computation. A method has therefore been proposed in which the speech is represented as a time-frequency pattern (TSP) and matched.

FIG. 1 is a block diagram showing an example of the above TSP method. In the figure, 1 is a microphone, 2 is a switch, 3a and 3b are amplifiers, 4a and 4b are filter banks, 5a and 5b are binarization circuits, 6 is a dictionary section, 7 is a similarity matching section, 8 is a comparison section, and 9 is a result output section. In this method, a pattern obtained by binarizing a word TSP at a certain threshold is registered as a standard pattern; an unknown input pattern is binarized at a different threshold and superimposed on the registered pattern, and the similarity is computed from the degree of overlap. Because a wide pattern binarized at a low threshold is overlaid on a narrow pattern binarized at a high threshold, this matching absorbs fluctuations in the frequency direction, which makes it well suited to speaker-independent recognition.
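
As a rough illustration of this prior-art matching step, the following Python sketch binarizes a time-frequency pattern at two thresholds and scores similarity by counting overlapping cells. The array shapes, threshold values, and the normalized-overlap score are assumptions made for the example; the patent does not define the exact similarity measure.

```python
import numpy as np

def binarize(tsp, threshold):
    """Binarize a time-frequency pattern (frames x bands) at a fixed threshold."""
    return (tsp >= threshold).astype(np.uint8)

def overlap_similarity(reference_bin, input_bin):
    """Score similarity as the fraction of reference '1' cells covered by the input.

    Both patterns are assumed to already have the same shape (e.g. after
    time normalization); this normalized overlap count is only one plausible score.
    """
    overlap = np.logical_and(reference_bin, input_bin).sum()
    return overlap / max(reference_bin.sum(), 1)

# Example: register a word at a low threshold, then match an unknown input
# binarized at a higher threshold (hypothetical values).
rng = np.random.default_rng(0)
tsp_reference = rng.random((20, 16))   # 20 frames x 16 filter bands
tsp_unknown = tsp_reference + 0.05 * rng.standard_normal((20, 16))

reference_pattern = binarize(tsp_reference, threshold=0.4)   # wide pattern
unknown_pattern = binarize(tsp_unknown, threshold=0.6)       # narrow pattern
print(overlap_similarity(reference_pattern, unknown_pattern))
```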

FIG. 2 shows one time sample of the TSP of a certain word, with frequency on the horizontal axis and level on the vertical axis. Because the vocal-cord (sound source) characteristic falls off as frequency rises, the level in the high-frequency range of the frequency pattern of an uttered word also drops. When this sample is binarized into "1" and "0" with thresholds L1 and L2, the regions that become "1" differ even for the same pattern, as shown in FIGS. 2b and 2c. One conceivable remedy is to give the threshold a slope close to the sound source characteristic. This reduces the differences between binarized patterns of the kind shown in FIG. 2, but it still has the drawback that the patterns differ when, as shown in FIG. 3, the peaks of F2 and F3 are small compared with that of F1.

Purpose

The present invention has been made in view of the circumstances described above, and its particular object is to prevent peaks on the TSP from being lost during binarization.

Configuration

The configuration of the present invention will be described below on the basis of embodiments.

The present invention detects each peak (convex portion) or dip (concave portion) on the pattern and determines the binarization threshold by multiplying the peak or dip value by a constant, or by adding or subtracting a constant.
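
Put concretely, the threshold attached to a detected peak (or dip) can be derived in either of two ways. The small sketch below shows both; the factor k and offset c are purely illustrative values, not taken from the patent.

```python
def threshold_from_peak(peak_value, k=None, c=0.0):
    """Binarization threshold derived from a detected peak value:
    either a constant fraction of the peak (multiplicative, k) or the peak
    lowered by a constant (additive, c). For a dip, the threshold would
    instead be raised: k * dip_value or dip_value + c."""
    if k is not None:
        return k * peak_value   # e.g. k = 0.8
    return peak_value - c       # e.g. c = 3.0 (dB)
```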

FIG. 4 is a configuration diagram for explaining one embodiment of the present invention. In the figure, 10 is a microphone, 11 is a filter bank, 12 is a speech-interval extraction section, 13 is a peak detection section (local-maximum detection section), 14 is a register, 15 is a binarization section, 16 is a threshold determination section, 17 is a dictionary section, 18 is a similarity matching section, and 19 is a result display section. First, the word speech input through the microphone 10 is converted into frequency feature parameters by the band-pass filter bank 11, and only the portion corresponding to speech is extracted. The peak detection section 13 then detects peaks in the frequency direction for each time sample of this signal. A peak can be detected, for example, as the point at which the sign of the difference between adjacent filter outputs reverses. The detected peak values are sent to the threshold determination section 16, where the threshold is set to a value (solid line in FIG. 5) lower by a fixed amount than the straight line connecting adjacent peaks (broken line in FIG. 5). If the binarization threshold is to differ between dictionary creation and recognition, the fixed amount need only be changed accordingly. After the dictionary has been registered in this way, an unknown input is binarized, the matching section computes its similarity to each word registered in the dictionary, and the word with the maximum similarity is output as the recognition result. The similarity is obtained by superimposing the binarized time-frequency pattern of the unknown speech on the corresponding dictionary pattern and measuring the degree of overlap. If the patterns to be superimposed differ in duration, their lengths may be equalized, for example by linear expansion or contraction; when the dictionary pattern is the sum of binary patterns from several utterances of the same word, equalizing the lengths may not be necessary. It is also possible to replace the peak detection of FIG. 4 with dip detection and to set the threshold to a value higher than the dips by a fixed amount, and the same effect can be obtained in this way as well.
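
A minimal sketch of the peak-based threshold rule just described, for a single spectral slice from the filter bank: detect local maxima as sign reversals of the adjacent-band differences, interpolate a straight line between adjacent peaks, and lower it by a fixed offset to obtain the per-band threshold. The function names, the treatment of the end bands as anchor points, and the offset value are assumptions made for illustration.

```python
import numpy as np

def detect_peaks(spectrum):
    """Band indices where the adjacent-band difference changes sign from + to -,
    i.e. local maxima of one spectral slice."""
    diff_sign = np.sign(np.diff(spectrum))
    peaks = [i for i in range(1, len(spectrum) - 1)
             if diff_sign[i - 1] > 0 and diff_sign[i] < 0]
    # Assumption: treat the first and last band as anchor points so that the
    # connecting line spans the whole spectrum.
    return [0] + peaks + [len(spectrum) - 1]

def adaptive_threshold(spectrum, offset):
    """Straight line through adjacent peaks (broken line in FIG. 5),
    lowered by a fixed offset (solid line in FIG. 5)."""
    anchors = detect_peaks(spectrum)
    bands = np.arange(len(spectrum))
    peak_envelope = np.interp(bands, anchors, spectrum[anchors])
    return peak_envelope - offset

def binarize_slice(spectrum, offset=3.0):
    """Binarize one time sample of the TSP with the peak-based threshold."""
    spectrum = np.asarray(spectrum, dtype=float)
    return (spectrum >= adaptive_threshold(spectrum, offset)).astype(np.uint8)

# Hypothetical 16-band spectral slice in dB.
slice_db = [30, 42, 55, 48, 36, 33, 40, 47, 41, 35, 30, 34, 38, 33, 28, 25]
print(binarize_slice(slice_db))
```

Because the threshold always lies a fixed amount below the line through the detected peaks, every peak stays above its own threshold, so no peak is lost in the binarized pattern.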

FIG. 6 is a configuration diagram for explaining another embodiment of the present invention. This embodiment is basically the same as the embodiment shown in FIG. 4, but, as illustrated, a peak detection section 13 and a dip detection section (local-minimum detection section) 20 are provided after the speech-interval extraction; the peaks and dips of the feature pattern are found, and the binarization threshold is determined from the difference between the peak values and the dip values.
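
The patent does not spell out the exact formula for this embodiment. One plausible reading, placing the per-band threshold between the peak envelope and the dip envelope, is sketched below; the midpoint-style rule and the ratio value are assumptions.

```python
import numpy as np

def threshold_from_peaks_and_dips(spectrum, ratio=0.5):
    """Per-band threshold placed between the peak envelope and the dip envelope
    of one spectral slice. The exact use of the peak/dip difference is not given
    in the patent; this rule (ratio = 0.5 -> midpoint) is only one interpretation."""
    spectrum = np.asarray(spectrum, dtype=float)
    bands = np.arange(len(spectrum))
    s = np.sign(np.diff(spectrum))
    maxima = [0] + [i for i in range(1, len(spectrum) - 1)
                    if s[i - 1] > 0 and s[i] < 0] + [len(spectrum) - 1]
    minima = [0] + [i for i in range(1, len(spectrum) - 1)
                    if s[i - 1] < 0 and s[i] > 0] + [len(spectrum) - 1]
    peak_env = np.interp(bands, maxima, spectrum[maxima])
    dip_env = np.interp(bands, minima, spectrum[minima])
    return peak_env - ratio * (peak_env - dip_env)
```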

FIG. 7 is a configuration diagram for explaining still another embodiment of the present invention; section A in the figure corresponds to section A in FIG. 4 or FIG. 6. In this embodiment, as illustrated, a tilt correction section 21 is provided after the speech-interval extraction section, that is, before the peak and dip detection sections. It subtracts the slope in the frequency direction of the kind shown in FIG. 3 so that the peak levels become approximately constant, and the operations described above are then performed. After the peak levels have been corrected in this way, the signal need only be passed to the portion corresponding to section A in FIG. 4 or FIG. 6. The tilt correction can be achieved, for example, by a least-squares line (Miwa, Kido: Speech Study Group Material S79-24).
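
A least-squares tilt correction of this kind can be sketched as follows: fit a straight line to the spectral levels over the band index and subtract it before peak or dip detection. Using the band index as the regression variable, and the example values, are assumptions for illustration.

```python
import numpy as np

def remove_spectral_tilt(spectrum):
    """Fit a least-squares line over the band index and subtract it,
    flattening the overall spectral slope before peak/dip detection."""
    spectrum = np.asarray(spectrum, dtype=float)
    bands = np.arange(len(spectrum))
    slope, intercept = np.polyfit(bands, spectrum, deg=1)
    return spectrum - (slope * bands + intercept)

# Hypothetical sloping spectrum (dB): level falls roughly 1.5 dB per band.
tilted = [50, 49, 52, 46, 44, 47, 41, 39, 42, 36, 34, 37, 31, 29, 32, 26]
print(np.round(remove_spectral_tilt(tilted), 1))
```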

Effects

As is clear from the above description, the present invention makes it possible to create accurate speech patterns without losing the characteristic peaks on the time-frequency pattern of the speech.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a configuration diagram showing an example of the TSP matching method; FIGS. 2 and 3 are diagrams each showing one time sample of a TSP; FIG. 4 is a configuration diagram for explaining one embodiment of the present invention; FIG. 5 is a waveform diagram for explaining its operation; and FIGS. 6 and 7 are configuration diagrams for explaining other embodiments of the present invention.

10: microphone, 11: filter bank, 12: speech-interval extraction section, 13: peak detection section (local-maximum detection section), 14: register, 15: binarization section, 16: threshold determination section, 17: dictionary section, 18: similarity matching section, 19: result display section, 20: dip detection section (local-minimum detection section), 21: tilt correction section.

Claims (1)

[Scope of Claims]

1. A speech pattern creation device comprising band division means for dividing a speech signal into a plurality of frequency bands and binarization means for binarizing each of the band signals divided by the band division means, the device further comprising: detection means for detecting local maximum points of the band signals with respect to frequency; and threshold setting means for obtaining a straight line determined by two adjacent points among the local maximum points detected by the detection means and setting as a threshold a value lower by a predetermined amount than the value on the straight line; wherein each of the band signals is binarized with the threshold set by the threshold setting means.

2. A speech pattern creation device comprising band division means for dividing a speech signal into a plurality of frequency bands and binarization means for binarizing each of the band signals divided by the band division means, the device further comprising: detection means for detecting local minimum points of the band signals with respect to frequency; and threshold setting means for obtaining a straight line determined by two adjacent points among the local minimum points detected by the detection means and setting as a threshold a value higher by a predetermined amount than the value on the straight line; wherein each of the band signals is binarized with the threshold set by the threshold setting means.
JP17893583A 1983-09-26 1983-09-26 Voice pattern generator Granted JPS6069699A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP17893583A JPS6069699A (en) 1983-09-26 1983-09-26 Voice pattern generator

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP17893583A JPS6069699A (en) 1983-09-26 1983-09-26 Voice pattern generator

Publications (2)

Publication Number Publication Date
JPS6069699A JPS6069699A (en) 1985-04-20
JPH0466040B2 true JPH0466040B2 (en) 1992-10-21

Family

ID=16057212

Family Applications (1)

Application Number Title Priority Date Filing Date
JP17893583A Granted JPS6069699A (en) 1983-09-26 1983-09-26 Voice pattern generator

Country Status (1)

Country Link
JP (1) JPS6069699A (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0446037B1 (en) * 1990-03-09 1997-10-08 AT&T Corp. Hybrid perceptual audio coding
IT1249940B (en) * 1991-06-28 1995-03-30 Sip IMPROVEMENTS TO VOICE CODERS BASED ON SYNTHESIS ANALYSIS TECHNIQUES.

Also Published As

Publication number Publication date
JPS6069699A (en) 1985-04-20

Similar Documents

Publication Publication Date Title
US4401849A (en) Speech detecting method
US4516215A (en) Recognition of speech or speech-like sounds
US4513436A (en) Speech recognition system
EP0240329A2 (en) Noise compensation in speech recognition
JPH0466040B2 (en)
KR930011738B1 (en) Last point error amendment method of speech recognition system
JPS584198A (en) Standard pattern registration system for voice recognition unit
KR20070049831A (en) Method for dividing initial state by dividing into a syllables and a phoneme, system for implementing the same
JP2997007B2 (en) Voice pattern matching method
JPH0585917B2 (en)
JP2666296B2 (en) Voice recognition device
JP2594028B2 (en) Voice recognition device
JPH05210397A (en) Voice recognizing device
JP2844592B2 (en) Discrete word speech recognition device
JPS59205680A (en) Pattern comparator
JPH0342480B2 (en)
JP2712586B2 (en) Pattern matching method for word speech recognition device
JP2547541B2 (en) Monosyllabic speech recognizer
JPS5886598A (en) Voice recognition equipment
JPS60175098A (en) Voice recognition equipment
JPH0469959B2 (en)
JPH02108936A (en) Method for recognizing voice
JPH071437B2 (en) Voice recognizer
JPS59195296A (en) Voice recognition equipment
JPH044600B2 (en)