JPS6129519B2

Info

Publication number: JPS6129519B2
Application number: JP53053967A
Authority: JP
Inventors: Hiroya Fujisaki; Hitoshi Shibagaki; Hiroshi Yamada; Hidekazu Shiratori; Yasuo Sato
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1978-05-06
Filing date: 1978-05-06
Publication date: 1986-07-07
Also published as: JPS54145409A

Abstract

PURPOSE: To increase processing speed by extracting a sample point that corresponds to the vowel part of a monosyllable, selecting candidate monosyllables by matching against previously registered reference vowel feature quantities, and then matching only those candidates against the unknown speech. CONSTITUTION: Using the stationarity of the extracted feature quantities, the vowel stationary part extraction circuit 6 extracts a sample point that corresponds to the vowel part of a monosyllable. The vowel feature quantity at this sample point is matched (unit 10) against the reference vowel feature quantities registered in advance for each monosyllable. The candidate monosyllables are thereby determined and are then matched again (unit 13) against the unknown speech. Consequently, the processing speed can be increased.

Description

[Detailed description of the invention]

The present invention relates to a monosyllabic speech recognition device, and in particular to a speech recognition device that extracts feature quantities from the results of frequency analysis of a speech signal and performs recognition on them. The device uses the stationarity of the feature quantities of the input speech to extract a sample point corresponding to the vowel part of a monosyllable, selects candidate monosyllables for recognition by means of the vowel feature quantity at that sample point, and performs secondary matching on those candidates, thereby improving processing speed.

In a speech recognition device, the results of frequency analysis of the speech signal are used to extract parameters that effectively represent the characteristics of each phoneme, and the extracted parameters are matched against pre-registered phoneme parameters corresponding to the registered words in order to recognize the unknown input speech. For example, the first and second formant frequencies are sampled and used as these parameters. However, if the number of sampling points is increased to make the matching more precise, the time required for the matching process increases.

For this reason, noting that the parameters have intervals in which they change rapidly over time and intervals in which they change gradually, a method has been considered that samples densely in the former and coarsely in the latter, that is, samples at non-uniform sampling points, and thereby raises the recognition rate with fewer samples (Japanese Patent Application No. Sho 52-43972, filed in Showa 52, i.e. 1977).

The present invention is not limited to this non-uniform sampling method. With that method, however, the points at which the parameters change gradually over time, that is, the points that exhibit stationarity, correspond to the vowel part of the syllable. The invention uses this fact to narrow down the recognition candidates in advance and then perform a more detailed matching process, with the aim of improving processing speed. A further aim is to shorten the time required for that more detailed matching itself.

To this end, the monosyllabic speech recognition device of the present invention extracts feature quantities of a speech signal from the results of its frequency analysis and recognizes unknown input speech on the basis of those feature quantities, and comprises at least: a vowel stationary part extraction function unit that uses the stationarity of the extracted feature quantities to extract a sample point corresponding to the vowel part of a monosyllable; a vowel stationary part parameter registration unit in which a vowel reference feature quantity, namely the vowel feature quantity at that sample point, is registered in advance for each monosyllable; and a vowel stationary part matching and candidate selection function unit that determines candidate monosyllables for recognition by matching the vowel feature quantity of the unknown input speech, obtained at the sample point extracted by the vowel stationary part extraction function unit, against the vowel reference feature quantities registered in the vowel stationary part parameter registration unit. The device is configured to perform secondary matching between the determined candidate monosyllables and the unknown input speech. The vowel stationary part extraction function unit successively computes and accumulates the cumulative variation AV(tn) of the input speech, and a non-uniform sampling point is determined each time the cumulative variation reaches a predetermined threshold. Each non-uniform sampling point carries a weight corresponding to the number of accumulations needed to determine it, and the non-uniform sampling point with the largest weight is extracted as the sample point. The invention is described below with reference to the drawings.

FIG. 1 shows the configuration of one embodiment of the present invention, FIG. 2 shows the configuration of one embodiment of the vowel stationary part extraction circuit shown in FIG. 1, and FIG. 3 shows the configuration of one embodiment of the portion enclosed by the dash-dot line in FIG. 1.

In the figures, 1 is the extracted feature quantity; 2 is a bank of band-pass filters that decomposes the input speech into N channels (for example 15) of frequency signals P₁(t), P₂(t), ..., P_N(t); 3 is a parameter extraction unit that extracts the feature quantities (parameters) of the input speech, i.e. the monosyllabic speech, that are useful for monosyllable matching in the parameter time-series matching and decision unit described later, for example the moment M₁ corresponding to the first formant frequency, the moment M₂ corresponding to the second formant frequency, and also the low-band power and high-band power; and 4 is a sampling time determination circuit that determines the non-uniform sampling points T₀, T₁, ... shown in relation to feature quantity 1 in the figure. Further, 5 is a non-uniform sampling circuit that samples the feature quantities at the sampling points T₀, T₁, ..., Tα, ... shown on feature quantity 1 to obtain time-series information; 6 is a vowel stationary part extraction circuit that extracts the sample point corresponding to the timing Tα shown on feature quantity 1 together with the vowel feature quantity at that point; 7 and 8 are switching circuits that switch between a registration mode, in which information for registered speech is registered, and a recognition mode, in which unknown input speech is recognized; 9 is a vowel stationary part parameter registration unit that stores the vowel reference feature quantity for each registered monosyllable; and 10 is a vowel stationary part matching and candidate selection unit that, in recognition mode, selects candidate monosyllables on the basis of the vowel feature quantity. In addition, 11 is a parameter time-series registration unit that stores, for each registered monosyllable, the feature quantities (parameters) at the sampling points T₀, T₁, ..., Tα, ... as a time series; 12 is a candidate selection circuit that passes the reference feature (parameter) time-series information of the candidate monosyllables (there may be several) chosen by the candidate selection unit 10 to the parameter time-series matching and decision unit 13; 13 is the parameter time-series matching and decision unit, which in recognition mode matches the feature (parameter) time-series information of the unknown input speech against that of the candidate monosyllables; 14 is an output circuit that outputs the recognized category name; and 15 is a control unit that controls the whole device.

As is well known, each time a clock pulse of fixed period occurs, the parameter extraction circuit 3 computes the feature quantities corresponding to the first formant frequency, the second formant frequency and so on according to equation (1), and stores the result in a register (not shown). In equation (1), Pi(tn) is the output of the i-th filter sampled at time tn, for example every 10 msec, Wij is its weight, and Fi is its center frequency. The weights Wij may be regarded as determined experimentally so that the quantities M₁ and M₂ coincide with the first and second formant frequencies. The parameter extraction circuit 3 also extracts feature quantities other than the first and second formant frequencies, but to keep the explanation simple the first and second formant frequencies are used below as representative feature quantities.

The sampling time determination circuit 4 computes the cumulative variation AV(tn) defined by equation (2), with the same period as the computation of the feature quantities M₁ and M₂ of equation (1), and determines the non-uniform sampling points tnk. That is, it monitors whether the cumulative variation AV(tn) has exceeded a predetermined threshold, and takes the time tnk at which the threshold is exceeded as the k-th non-uniform sampling point.

In equation (2), V(tn) is the variation in the filter outputs, which is defined by a further equation.

It follows that the cumulative variation AV(tn) is the accumulation of the parameter changes since the preceding non-uniform sampling point. When AV(tn) exceeds the threshold, the next non-uniform sampling point is determined and the cumulative variation is reset at that moment. As a result, the non-uniform sampling points T₀, T₁, ..., Tα, ... appear densely in intervals where the parameters change rapidly and sparsely in intervals where the change is steady.
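The accumulate, compare and reset behaviour described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function name `nonuniform_sampling_points`, the per-frame variation values and the threshold are hypothetical, the per-frame variation V(tn) is taken as already computed, and the comparison is strict ("exceeds"), following the wording of the description.

```python
from typing import List, Tuple


def nonuniform_sampling_points(variation: List[float],
                               threshold: float) -> List[Tuple[int, int]]:
    """Return (frame_index, weight) pairs for each non-uniform sampling point.

    variation[n] stands for the per-frame variation V(tn) (assumed given).
    The weight of a point is the number of frames accumulated before the
    cumulative variation exceeded the threshold.
    """
    points = []
    accumulated = 0.0  # cumulative variation AV(tn) since the last point
    count = 0          # number of accumulations since the last point
    for n, v in enumerate(variation):
        accumulated += v
        count += 1
        if accumulated > threshold:
            points.append((n, count))
            accumulated = 0.0  # reset once a sampling point is determined
            count = 0
    return points


# Hypothetical input: fast change, then a steady stretch, then fast change.
example = nonuniform_sampling_points([5, 5, 1, 1, 1, 1, 1, 5], threshold=4)
```

With this hypothetical input the points in the rapidly changing frames carry weight 1, while the point that closes the steady stretch carries weight 5, which is the property the vowel stationary part extraction relies on.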

The non-uniform sampling circuit 5 samples the extracted feature quantities M₁ and M₂ at each non-uniform sampling point. In registration mode it stores them in the parameter time-series registration unit 11 shown in FIG. 1, and in recognition mode it inputs them to the parameter time-series matching and decision unit 13.

The vowel stationary part extraction circuit 6 (i) counts the number of accumulations while each of the non-uniform sampling points T₀, T₁, ... is being determined, and (ii) taking this count as the weight W, determines the non-uniform sampling point with the largest weight W (sampling point Tα in the case of feature quantity 1). This point is the sample point. The vowel feature quantity obtained at the sample point is supplied to the vowel stationary part parameter registration unit 9 in registration mode and to the vowel stationary part matching and candidate selection unit 10 in recognition mode.

In recognition mode, in which monosyllabic speech given as unknown input is recognized, processing proceeds as follows. At this time the feature time-series information for the registered monosyllables is stored in the registration unit 11 of FIG. 1, and the vowel feature quantities for the registered monosyllables are stored in the registration unit 9 of FIG. 1.

(1) For the unknown input speech, the feature quantities at the non-uniform sampling points T₀, T₁, ... are supplied to the parameter time-series matching and decision unit 13, as described above, via the band-pass filter bank 2, the parameter extraction circuit 3, the sampling time determination circuit 4 and the non-uniform sampling circuit 5.

(2) Meanwhile, the vowel stationary part extraction circuit 6 supplies the vowel feature quantity at the extracted sample point Tα, as described above, to the vowel stationary part matching and candidate selection unit 10.

(3) At this time, under the control of the control unit 15, the vowel reference feature quantities of the registered monosyllables are read out, one registered monosyllable at a time, from the registration unit 9 to the matching and candidate selection unit 10.

(4) The matching and candidate selection unit 10 successively matches the vowel reference feature quantities read out against the vowel feature quantity supplied from the vowel stationary part extraction circuit 6. It selects the monosyllables that match (there may be several) as candidate monosyllables and notifies the candidate selection circuit 12.

(5) Next, the detailed, i.e. secondary, matching is carried out on the candidates thus narrowed down. The control unit 15 issues a read to the parameter time-series registration unit 11. Time-series information is output from the registration unit 11 for every registered monosyllable, but the candidate selection circuit 12 passes on to the parameter time-series matching and decision unit 13 only the reference time-series information of the candidate monosyllables selected in step (4).

(6) The decision unit 13 has already received the feature time-series information of the unknown input speech from the non-uniform sampling circuit 5. It matches this information against the reference time-series information, for example by the well-known dynamic programming technique.

(7) The unknown input speech is judged to belong to the registered monosyllable that gives the best match, and that monosyllable is set in the output circuit 14 as the recognition result.

FIG. 2 shows the configuration of one embodiment of the vowel stationary part extraction circuit 6 shown in FIG. 1. In the figure, 16 is a comparison circuit, 17 is a maximum-weight register, and 18 is a maximum-weight parameter register.

The configuration shown in FIG. 2 operates as follows.

(8) Assume that the weights and parameters for the non-uniform sampling points T₀, T₁, ... have been obtained from the non-uniform sampling circuit 5 shown in FIG. 1.

(9) In this state, the weights and parameters for the non-uniform sampling points T₀, T₁, ... are input in order. Initially the maximum-weight register 17 is cleared to zero.

(10) When the weight W₀ for the non-uniform sampling point T₀ is input to the comparison circuit 16, it is compared with the content Rmax of register 17. Because the weight for T₀ is larger than the content of register 17, that weight is set in register 17 and the parameter P₀ for T₀ is set in register 18.

(11) Next, when the weight W₁ for the non-uniform sampling point T₁ is input, the condition W₁ ≧ Rmax is checked. If the condition is satisfied, the weight W₁ is set in register 17 and the parameter P₁ is set in register 18. If it is not satisfied, the contents of registers 17 and 18 are left unchanged.

(12) In this way the weights and parameters for the non-uniform sampling points are input in turn, and finally the weight Wα for the non-uniform sampling point Tα with the largest weight is set in register 17 and the parameter Pα is set in register 18.

(13) The parameter Pα for the maximum-weight non-uniform sampling point Tα is then supplied via the switching circuit 8 to the registration unit 9 or the selection unit 10 of FIG. 1.
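Steps (8) to (13) amount to a running maximum over (weight, parameter) pairs. The following Python sketch mirrors that behaviour under stated assumptions: the function name `select_max_weight` and the sample data are hypothetical, and parameters are represented as opaque values. The two local variables play the roles of registers 17 and 18, and the comparison uses ≧ as in step (11), so a later point wins a tie.

```python
from typing import Any, Iterable, Optional, Tuple


def select_max_weight(samples: Iterable[Tuple[int, Any]]) -> Tuple[int, Optional[Any]]:
    """Return (largest weight, its parameter) from (weight, parameter) pairs."""
    r_max = 0     # role of register 17, cleared to zero at the start
    p_max = None  # role of register 18
    for weight, parameter in samples:
        # Comparison circuit 16: update both registers when W >= Rmax.
        if weight >= r_max:
            r_max = weight
            p_max = parameter
    return r_max, p_max


# Hypothetical weights and parameter labels for T0..T3.
best = select_max_weight([(1, "P0"), (5, "P1"), (3, "P2"), (5, "P3")])
```

Because of the ≧ comparison, the hypothetical tie between the second and fourth points resolves to the later one.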

FIG. 3 shows the configuration of one embodiment of the portion enclosed by the dash-dot line in FIG. 1. Reference numerals 9 and 10 correspond to those of FIG. 1; 19 and 20 are address counters, 21 is a matching processing unit, 22 is a comparison circuit, 23 is a candidate vowel parameter register, 24 is address information for the parameter time-series registration unit, 25 is a selection circuit, and 26 is a candidate registration unit address information store.

In recognition mode, the vowel reference feature quantities of the registered monosyllables are stored in the vowel stationary part parameter registration unit 9. The illustrated configuration operates as follows.

(14) In this state, when the vowel feature quantity of the unknown input speech is supplied through the switching circuit 8, counter 19 causes the vowel reference feature quantities of the registered monosyllables to be read out in order from the registration unit 9. The distance between each of them and the vowel feature quantity of the input speech is measured; that is, they are matched one after another in the matching processing unit 21.

(15) When a comparatively good match is obtained, that vowel reference feature quantity is supplied to the comparison circuit 22. Counter 20 then starts counting, and the candidate vowel feature quantities already stored are read out in order from the candidate vowel parameter register 23 and supplied to the comparison circuit 22.

(16) The comparison circuit 22 compares the vowel feature quantity supplied from the matching processing unit 21 with the candidate vowel feature quantities from register 23. If none of them is identical, the vowel feature quantity is stored in register 23 as a candidate vowel feature quantity. If an identical one is already a candidate, it is not stored in register 23.

(17) When a new candidate is stored in register 23 in step (16), the comparison circuit 22 issues an instruction to the selection circuit 25. The selection circuit 25 then sets, in the candidate registration unit address information store 26, the address information 24 of the registration unit (registration unit 11 of FIG. 1) in which the feature time series of the registered monosyllable corresponding to the new candidate vowel feature quantity is stored. The store 26 thus holds the storage addresses of the feature time-series information of the registered monosyllables corresponding to the candidate vowel feature quantities held in register 23. In the secondary matching, this address information is used to supply selectively the reference feature quantities of the candidate monosyllables to the matching and decision unit 13 of FIG. 1.
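The primary matching of steps (14) to (17) can be approximated in software as a distance test over the registered vowel reference features. The sketch below is a simplification under stated assumptions and not the circuit itself: the Euclidean distance, the `tolerance` parameter, the function name `select_candidates`, the syllable labels and the (M₁, M₂) values are all hypothetical, and the duplicate check against register 23 is omitted. The list index stands in for the address information kept in store 26.

```python
import math
from typing import List, Sequence, Tuple


def select_candidates(unknown_vowel: Sequence[float],
                      references: List[Tuple[str, Sequence[float]]],
                      tolerance: float) -> List[int]:
    """Return the addresses (list indices) of registered monosyllables whose
    vowel reference feature lies within `tolerance` of the unknown feature."""
    candidates = []
    for address, (_label, ref_vowel) in enumerate(references):
        # "Comparatively good match" is modelled as a distance threshold.
        if math.dist(unknown_vowel, ref_vowel) <= tolerance:
            candidates.append(address)
    return candidates


# Hypothetical registered entries: (label, (M1, M2)).
registered = [
    ("ka", (700.0, 1200.0)),
    ("ki", (300.0, 2300.0)),
    ("sa", (720.0, 1250.0)),
    ("su", (350.0, 1300.0)),
]
candidate_addresses = select_candidates((710.0, 1220.0), registered, tolerance=100.0)
```

Only the entries at the returned addresses would then be passed to the secondary matching, which is where the saving in processing time comes from.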

As described above, the candidate monosyllables are selected, and secondary matching is then performed only on the monosyllables narrowed down as candidates. In this matching, needless to say, the feature time-series information of the unknown input speech and the reference feature time-series information of a registered monosyllable are matched while following the time series. The matching time in this case is roughly proportional to the square of the number N of items in the time-series information.

In the present invention, the secondary matching is carried out as follows in order to shorten the matching time further. The matching proceeds from the first item of the feature time-series information, but only the time-series information up to the sample point Tα is used. This shortens the matching time (for example, if the number of items is halved, the time is reduced to one quarter). Moreover, the feature quantities after the sample point have low power and generally vary widely. Matching on such widely varying feature quantities has been one cause of recognition errors, and this point is improved as well. The feature quantities used in the secondary matching may of course be chosen freely, but the primary matching, with which the present invention is directly concerned, is confined to matching on the vowel.
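The effect of truncating the secondary matching at the sample point can be illustrated with a small dynamic-programming sketch. This is an assumed illustration, not the matching procedure of the device: it uses a textbook dynamic time warping recurrence on scalar features with an absolute-difference cost, and the function names `dtw_distance` and `truncated_dtw` are hypothetical. The cell counter makes the roughly quadratic cost visible.

```python
from typing import Sequence, Tuple


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Return (DTW distance, number of DP cells evaluated) for two sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    cells = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
            cells += 1
    return d[n][m], cells


def truncated_dtw(unknown: Sequence[float], unknown_sample_index: int,
                  reference: Sequence[float], reference_sample_index: int
                  ) -> Tuple[float, int]:
    """Match only the items from the start up to each sequence's sample point."""
    return dtw_distance(unknown[:unknown_sample_index + 1],
                        reference[:reference_sample_index + 1])


series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
full_distance, full_cells = dtw_distance(series, series)
half_distance, half_cells = truncated_dtw(series, 3, series, 3)
```

For these hypothetical eight-item sequences, cutting both at the fourth item reduces the number of evaluated cells from 64 to 16, matching the one-quarter figure given above for a halved item count.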

As explained above, the present invention greatly shortens the processing time of speech recognition. In particular, when sampling at non-uniform sampling points is adopted, the sample point can be extracted easily.

[Brief description of the drawings]

FIG. 1 shows the configuration of one embodiment of the present invention, FIG. 2 shows the configuration of one embodiment of the vowel stationary part extraction circuit shown in FIG. 1, and FIG. 3 shows the configuration of one embodiment of the portion enclosed by the dash-dot line in FIG. 1. In the figures, 1 is the extracted feature quantity, 5 is the non-uniform sampling circuit, 6 is the vowel stationary part extraction circuit, 9 is the vowel stationary part parameter registration unit, 10 is the vowel stationary part matching and candidate selection unit, 11 is the parameter time-series registration unit, 12 is the candidate selection circuit, and 13 is the parameter time-series matching and decision unit.

Claims

[Scope of Claims]

1. A monosyllabic speech recognition device that extracts feature quantities of a speech signal from the results of frequency analysis of the speech signal and recognizes unknown input speech on the basis of the feature quantities, the device comprising at least: a vowel stationary part extraction function unit that uses the stationarity of the extracted feature quantities to extract a sample point corresponding to the vowel part of a monosyllable; a vowel stationary part parameter registration unit in which a vowel reference feature quantity, being the vowel feature quantity at the sample point, is registered in advance for each monosyllable; and a vowel stationary part matching and candidate selection function unit that determines candidate monosyllables for recognition by matching the vowel feature quantity of the unknown input speech, obtained at the sample point extracted by the vowel stationary part extraction function unit, against the vowel reference feature quantities registered in the vowel stationary part parameter registration unit; the device being configured to perform secondary matching between the determined candidate monosyllables and the unknown input speech; wherein the vowel stationary part extraction function unit successively computes and accumulates the cumulative variation AV(tn) of the input speech, a non-uniform sampling point is determined each time the cumulative variation reaches a predetermined threshold, and the non-uniform sampling point having the largest weight, the weight of each non-uniform sampling point corresponding to the number of accumulations needed to determine it, is extracted as the sample point.

2. The monosyllabic speech recognition device according to claim 1, wherein the secondary matching compares the reference feature time-series information of the candidate monosyllable, from its beginning up to the sample point at which the vowel reference feature quantity was determined, with the feature time-series information of the unknown input speech, from its beginning up to the sample point at which the vowel feature quantity was determined.