JP2572753B2

JP2572753B2 - Unspecified speaker consonant identification device

Info

Publication number: JP2572753B2
Application number: JP61222397A
Authority: JP
Inventors: 晋太木村
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1986-09-19
Filing date: 1986-09-19
Publication date: 1997-01-16
Anticipated expiration: 2012-01-16
Also published as: JPS6377098A

Description

DETAILED DESCRIPTION OF THE INVENTION

[Overview] Consider a device that registers syllable patterns from the speech of many speakers in a dictionary in advance, analyzes the input speech, compares the result with each syllable pattern in the dictionary, and selects the pattern with the smallest distance. When such a device is used to recognize the speech of unspecified speakers, the spectrum representing the articulation position of the consonant itself is unstable. In addition, the spectrum of the transition part (the spectrum between the consonant and the vowel that make up a syllable), which plays an important role in consonant identification, changes with the following vowel, and the vowel part pattern differs from speaker to speaker even for the same type of vowel. It has therefore been difficult to realize a device with a high recognition rate. To solve this problem, the present invention discloses a technique that extracts the vowel from the input speech and identifies its type, selects from the dictionary, as candidates, syllables that have the same type of vowel and a similar vowel spectrum, and compares these candidates with the analysis result of the input speech, thereby obtaining a high consonant identification rate.

[Industrial Field of Application] The present invention relates to speech recognition devices for unspecified speakers, and in particular to an unspecified speaker consonant identification device capable of high recognition efficiency.

[Prior Art] Speech recognition devices are useful for creating documents, or for operating equipment that requires many commands, by voice. Because voices differ from speaker to speaker, recognizing the speech of unspecified speakers, and consonants in particular, is difficult. For this reason, recognition devices that register a very large amount of speech for each speaker have been put into practical use, but this registration burden has hindered the spread of speech recognition devices.

FIG. 2 is a block diagram showing an example of the configuration of a conventional speaker-independent speech recognition device, in which 50 denotes an analysis unit, 51 a collation unit, and 52 a syllable pattern dictionary (multi-template).

In FIG. 2, the analysis unit 50 converts the input speech into a digital signal, performs short-time frequency analysis every few tens of milliseconds, and outputs the resulting frequency spectrum. The collation unit 51 compares this spectrum with all of the syllable patterns stored in the syllable pattern dictionary 52 and outputs the syllable with the smallest distance as the identification result.
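The conventional matching of FIG. 2 amounts to a nearest-template search over the whole dictionary. The following is a minimal sketch under stated assumptions: each pattern is a fixed-length list of numbers and the distance is Euclidean (the source specifies neither). The function names and the toy data are hypothetical.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_syllable(input_pattern, dictionary):
    """Compare the input with every template and return the closest syllable name.

    `dictionary` is a list of (syllable_name, pattern) pairs. Several entries may
    share a name because templates from many speakers are registered.
    """
    best_name, _ = min(dictionary, key=lambda entry: distance(input_pattern, entry[1]))
    return best_name

# Toy multi-template dictionary: two speakers per syllable (values are made up).
templates = [
    ("ka", [1.0, 2.0, 3.0]),
    ("ka", [1.2, 2.1, 2.9]),
    ("ta", [3.0, 1.0, 0.5]),
    ("ta", [2.8, 1.1, 0.7]),
]
result = nearest_syllable([1.1, 2.0, 3.1], templates)
```

Every template is visited for every input, so the search cost grows with the number of registered speakers.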

[Problems to be Solved by the Invention] FIG. 3 shows examples of syllable patterns. The vertical axis represents formant frequency and the horizontal axis represents time. 53 and 54 are the patterns of two different syllables; in each, the solid lines show the speech input of speaker A and the dotted lines show that of speaker B.

As can be seen in FIG. 3, the features of a consonant within a syllable lie both in the spectrum representing the articulation position of the consonant itself (hereinafter called the articulation position spectrum; for a plosive, for example, this is the spectrum of the burst) and in the spectrum of the transition from the consonant to the following vowel. The articulation position spectrum is generally unstable and cannot always be observed, especially in continuous speech. It is therefore desirable to use the transition part spectrum. However, the transition part spectrum changes markedly with the following vowel, so even for a specific speaker the identification method must be changed for each following vowel. For unspecified speakers, the vowel pattern differs between speakers even for the same type of vowel, so the transition part spectrum also changes, which seriously affects consonant identification.

Under these conditions, as described above, a conventional speaker-independent speech recognition device must register a very large amount of speech, and in addition the consonants of unspecified speakers cannot always be identified clearly. It has therefore been difficult to realize a device with a high recognition rate.

In view of these problems, an object of the present invention is to provide an unspecified speaker consonant identification device that achieves a high recognition rate by keeping speaker differences in the vowel part from affecting identification of the consonant part.

[Means for Solving the Problems] According to the present invention, the above object is achieved by the means described in the claims.

That is, the present invention is an unspecified speaker consonant identification device comprising: an analysis unit that converts input speech into a digital signal and performs short-time frequency analysis; a vowel extraction unit that extracts the vowel part pattern of the input speech from the output of the analysis unit; a vowel identification unit that identifies which vowel the extracted vowel part pattern corresponds to; a syllable pattern dictionary that stores syllable patterns of many speakers prepared in advance; a first dictionary selection unit that retrieves from the syllable pattern dictionary the syllable patterns having a vowel part pattern corresponding to the vowel identified by the vowel identification unit; a second dictionary selection unit that compares the vowel part pattern of each syllable pattern selected by the first dictionary selection unit with the vowel part pattern of the input speech extracted by the vowel extraction unit, and selects candidate syllable patterns from among those selected by the first dictionary selection unit; and a collation unit that compares the output of the analysis unit with the candidate syllable patterns selected by the second dictionary selection unit.

[Operation] The unspecified speaker consonant identification device according to the present invention takes into account not only the type of vowel that follows the consonant but also the speaker characteristics of the vowel part, thereby removing the effect of speaker differences in the transition part and the vowel part. Specifically, from a syllable pattern dictionary built from many speakers, only those entries whose vowel part spectrum closely resembles that of the input syllable are used for identification. This yields a high consonant identification rate for the speech of unspecified speakers.

[Embodiment] FIG. 1 is a block diagram of an embodiment of the present invention, in which 1 denotes an analysis unit, 2 a vowel extraction unit, 3 a vowel identification unit, 4 a syllable pattern dictionary, 5 a first dictionary selection unit, 6 a second dictionary selection unit, and 7 a collation unit.

The input speech is digitized by the analysis unit 1 and then frequency-analyzed (spectrum-analyzed) by FFT or a similar method every few tens of milliseconds.
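The short-time analysis step can be sketched as follows. This is an illustration only: a naive DFT stands in for the FFT so that the example needs no external library, and the Hann window, frame length, and hop size are assumptions not stated in the source. The names `power_spectrum` and `analyze` are hypothetical.

```python
import cmath
import math

def power_spectrum(frame):
    """Power spectrum of one frame via a naive DFT (a stand-in for an FFT)."""
    n = len(frame)
    # Hann window to reduce leakage (an assumption; the source names no window).
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, s in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[i] * cmath.exp(-2j * math.pi * k * i / n)
                  for i in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum

def analyze(samples, frame_len, hop):
    """Take one frame every `hop` samples and compute its power spectrum."""
    return [power_spectrum(samples[start:start + frame_len])
            for start in range(0, len(samples) - frame_len + 1, hop)]

# Toy input: a sinusoid with exactly 8 cycles per 64-sample frame.
signal = [math.sin(2 * math.pi * 8 * i / 64) for i in range(256)]
frames = analyze(signal, frame_len=64, hop=32)
peak_bin = max(range(len(frames[0])), key=lambda k: frames[0][k])
```

In a real device the hop would correspond to the analysis interval of a few tens of milliseconds at the chosen sampling rate.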

The vowel extraction unit 2 extracts, as the vowel part, the portion of the input speech where the power is large and the spectrum changes little, and sends the spectrum of the vowel part to the vowel identification unit 3 and the second dictionary selection unit 6.
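One way to read "large power and little spectral change" is as a pair of thresholds applied frame by frame. The sketch below assumes thresholds relative to the loudest frame and a summed absolute difference as the change measure; both choices, and the function names, are hypothetical and not taken from the source.

```python
def extract_vowel_frames(spectra, power_ratio=0.5, change_ratio=0.2):
    """Return indices of frames with high power and little spectral change.

    Both thresholds are illustrative assumptions, expressed relative to the
    power of the loudest frame in the utterance.
    """
    powers = [sum(s) for s in spectra]
    max_power = max(powers)
    selected = []
    for i in range(1, len(spectra)):
        # Spectral change: summed absolute difference from the previous frame.
        change = sum(abs(a - b) for a, b in zip(spectra[i], spectra[i - 1]))
        if powers[i] >= power_ratio * max_power and change <= change_ratio * max_power:
            selected.append(i)
    return selected

def vowel_pattern(spectra, indices):
    """Average the selected frames into a single vowel part spectrum."""
    n = len(indices)
    return [sum(spectra[i][k] for i in indices) / n for k in range(len(spectra[0]))]

# Toy utterance: a weak consonant frame, a changing transition, then a steady vowel.
spectra = [
    [0.1, 0.2, 0.1],
    [0.5, 1.0, 0.3],
    [2.0, 4.0, 1.0],
    [2.1, 4.0, 1.0],
    [2.0, 4.1, 1.0],
]
vowel_idx = extract_vowel_frames(spectra)
pattern = vowel_pattern(spectra, vowel_idx)
```

Frame 2 is loud but is rejected because it still differs strongly from the preceding transition frame; only the steady frames that follow are kept.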

The vowel identification unit 3 identifies the vowel using the vowel part spectrum of the input speech. For Japanese, it distinguishes among the five vowels (a, i, u, e, o).
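The source does not say how the vowel is classified. As one possible illustration, the sketch below picks the vowel whose reference spectrum is nearest in squared Euclidean distance; the reference values and the name `identify_vowel` are made up for the example.

```python
def identify_vowel(vowel_spectrum, references):
    """Return the vowel label whose reference spectrum is closest (assumed method)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(references, key=lambda v: sq_dist(vowel_spectrum, references[v]))

# Made-up 3-band reference spectra for the five Japanese vowels.
references = {
    "a": [4.0, 3.0, 1.0],
    "i": [1.0, 0.5, 4.0],
    "u": [2.0, 1.0, 1.0],
    "e": [2.0, 3.0, 3.0],
    "o": [3.0, 4.0, 0.5],
}
vowel = identify_vowel([3.8, 3.1, 1.2], references)
```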

The syllable pattern dictionary 4 stores syllable names, vowel names, syllable patterns, vowel patterns, and so on, registered from many speakers, in the format shown in Table 1.
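Table 1 is not reproduced in this text, so its exact layout is unknown. A dictionary entry holding the four named fields might be sketched as follows; the class name, field types, and sample values are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DictionaryEntry:
    syllable_name: str                    # e.g. "ka"
    vowel_name: str                       # one of "a", "i", "u", "e", "o"
    syllable_pattern: List[List[float]]   # sequence of frame spectra
    vowel_pattern: List[float]            # spectrum of the vowel part

# Several entries may share a syllable name, one per registered speaker.
dictionary = [
    DictionaryEntry("ka", "a", [[0.1, 0.2], [1.0, 2.0], [2.0, 4.0]], [2.0, 4.0]),
    DictionaryEntry("ka", "a", [[0.2, 0.1], [1.1, 1.9], [2.2, 3.8]], [2.2, 3.8]),
    DictionaryEntry("ki", "i", [[0.1, 0.3], [0.5, 2.5], [1.0, 5.0]], [1.0, 5.0]),
]
```

Storing the vowel pattern alongside the full syllable pattern lets the later selection stages compare vowel parts without re-extracting them.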

The first dictionary selection unit 5 selects the matching entries from the syllable pattern dictionary 4 according to the vowel identification result and sends them to the second dictionary selection unit 6.
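The first selection stage is essentially a filter on the registered vowel name. A minimal sketch, assuming entries are plain dicts with hypothetical keys `syllable_name` and `vowel_name`:

```python
def select_by_vowel(dictionary, vowel_name):
    """Keep only the entries whose registered vowel name matches."""
    return [entry for entry in dictionary if entry["vowel_name"] == vowel_name]

# Toy dictionary; the dict keys are hypothetical field names.
dictionary = [
    {"syllable_name": "ka", "vowel_name": "a"},
    {"syllable_name": "ki", "vowel_name": "i"},
    {"syllable_name": "ta", "vowel_name": "a"},
]
selected = select_by_vowel(dictionary, "a")
names = [entry["syllable_name"] for entry in selected]
```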

The second dictionary selection unit 6 calculates the distance (similarity) between the vowel pattern of each entry selected by the first dictionary selection unit 5 and the vowel pattern of the input speech obtained by the vowel extraction unit 2. For each syllable name, it sends several entries to the collation unit 7, starting with those of smallest distance (greatest similarity).
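The per-syllable selection described here can be sketched as a grouped top-k search. The distance measure (squared Euclidean), the number kept per syllable, the dict keys, and the `speaker` field are all assumptions added for illustration.

```python
from collections import defaultdict

def select_candidates(entries, input_vowel_pattern, per_syllable=2):
    """For each syllable name, keep the `per_syllable` entries whose vowel
    pattern is closest to the input vowel pattern."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Group entries by syllable name so each name is ranked separately.
    by_name = defaultdict(list)
    for entry in entries:
        by_name[entry["syllable_name"]].append(entry)

    candidates = []
    for group in by_name.values():
        ranked = sorted(group,
                        key=lambda e: sq_dist(e["vowel_pattern"], input_vowel_pattern))
        candidates.extend(ranked[:per_syllable])
    return candidates

# Toy entries that already share the identified vowel; "speaker" is illustrative.
entries = [
    {"syllable_name": "ka", "speaker": "A", "vowel_pattern": [4.0, 3.0]},
    {"syllable_name": "ka", "speaker": "B", "vowel_pattern": [3.0, 4.0]},
    {"syllable_name": "ka", "speaker": "C", "vowel_pattern": [4.1, 2.9]},
    {"syllable_name": "ta", "speaker": "A", "vowel_pattern": [4.0, 3.0]},
    {"syllable_name": "ta", "speaker": "B", "vowel_pattern": [3.0, 4.0]},
    {"syllable_name": "ta", "speaker": "C", "vowel_pattern": [4.1, 2.9]},
]
chosen = select_candidates(entries, [4.05, 3.0])
chosen_ids = [(e["syllable_name"], e["speaker"]) for e in chosen]
```

Ranking within each syllable name, rather than globally, keeps every syllable in the candidate set while discarding templates from speakers whose vowels differ from the input.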

The collation unit 7 calculates the distance (similarity) between the syllable transition part pattern of each entry selected by the second dictionary selection unit 6 and the transition part pattern of the input syllable. It outputs as the identification result either the single entry with the smallest distance (greatest similarity) or several entries in order of increasing distance (decreasing similarity).
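The final collation step can be sketched as a ranking of the remaining candidates by transition part distance. The sketch assumes each transition pattern has already been reduced to a fixed-length vector; the source does not say how patterns of different duration are aligned, and the key `transition_pattern` and function name are hypothetical.

```python
def collate(candidates, input_transition, top_n=1):
    """Rank candidates by transition part distance and return the best names."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    ranked = sorted(candidates,
                    key=lambda e: sq_dist(e["transition_pattern"], input_transition))
    return [e["syllable_name"] for e in ranked[:top_n]]

# Toy candidates as they might arrive from the second dictionary selection unit.
candidates = [
    {"syllable_name": "ka", "transition_pattern": [1.0, 2.0, 3.0]},
    {"syllable_name": "ka", "transition_pattern": [1.2, 2.2, 3.1]},
    {"syllable_name": "ta", "transition_pattern": [3.0, 2.0, 1.0]},
    {"syllable_name": "ta", "transition_pattern": [2.9, 2.1, 1.2]},
]
best = collate(candidates, [1.1, 2.1, 3.0])
top3 = collate(candidates, [1.1, 2.1, 3.0], top_n=3)
```

Setting `top_n` above 1 corresponds to outputting several results in order of increasing distance.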

Here, the articulation position spectrum of the syllable's consonant may also be included in the syllable transition part pattern.

[Effects of the Invention] As described above, according to the present invention, the information in the syllable transition part can be used for consonant identification while taking into account speaker differences in the vowel spectrum, so a high recognition rate can be achieved.

[Brief description of the drawings]

FIG. 1 is a block diagram of an embodiment of the present invention, FIG. 2 is a block diagram showing the configuration of a conventional speaker-independent speech recognition device, and FIG. 3 is a diagram showing examples of syllable patterns.
1: analysis unit, 2: vowel extraction unit, 3: vowel identification unit, 4: syllable pattern dictionary, 5: first dictionary selection unit, 6: second dictionary selection unit, 7: collation unit

Claims

(57) [Claims]

An unspecified speaker consonant identification device comprising: an analysis unit (1) that converts input speech into a digital signal and performs short-time frequency analysis; a vowel extraction unit (2) that extracts the vowel part pattern of the input speech from the output of the analysis unit (1); a vowel identification unit (3) that identifies which vowel the extracted vowel part pattern corresponds to; a syllable pattern dictionary (4) that stores syllable patterns of many speakers prepared in advance; a first dictionary selection unit (5) that retrieves from the syllable pattern dictionary (4) the syllable patterns having a vowel part pattern corresponding to the vowel identified by the vowel identification unit (3); a second dictionary selection unit (6) that compares the vowel part pattern of each syllable pattern selected by the first dictionary selection unit (5) with the vowel part pattern of the input speech extracted by the vowel extraction unit (2), and selects candidate syllable patterns from among the syllable patterns selected by the first dictionary selection unit (5); and a collation unit (7) that compares the output of the analysis unit (1) with the candidate syllable patterns selected by the second dictionary selection unit (6).