JPS6377098A

JPS6377098A - Speaker-independent consonant identification device

Info

Publication number: JPS6377098A
Application number: JP61222397A
Authority: JP
Inventors: 晋太木村
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1986-09-19
Filing date: 1986-09-19
Publication date: 1988-04-07
Anticipated expiration: 2012-01-16
Also published as: JP2572753B2

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention] [Summary] Some devices recognize speech by registering syllable patterns from the speech of many speakers in a dictionary in advance, analyzing the input speech, comparing it with each syllable pattern in the dictionary, and selecting the pattern with the smallest distance. Using such a device to recognize the speech of unspecified speakers raises two difficulties. First, the spectrum representing the articulatory position of the consonant itself is unstable. Second, the transition spectrum (the spectrum between the consonant and the vowel that make up a syllable) plays an important role in identifying consonants, but it changes with the following vowel. Because the vowel part pattern differs between speakers even for the same type of vowel, the transition spectrum also varies from speaker to speaker. It has therefore been difficult to realize a device with a high recognition rate. To solve these problems, the present invention discloses the following technique. It extracts the vowel from the input speech and identifies its type. It then selects as candidates those dictionary syllables that contain a vowel of the same type with a similar vowel spectrum. Finally, it compares these candidates with the analysis result of the input speech, which yields a high consonant identification rate.

[Field of Industrial Application] The present invention relates to a speech recognition device for unspecified speakers. In particular, it relates to a speaker-independent consonant identification device capable of obtaining a high identification rate.

[Prior Art] Speech recognition devices are useful for creating documents by voice and for operating, by voice, equipment that requires a large number of commands. Because speech differs from speaker to speaker, recognizing the speech of unspecified speakers is difficult, and consonant recognition is especially so. For this reason, the recognition devices put into practical use require each speaker to register a huge amount of speech, and the burden of this registration has hindered the widespread adoption of speech recognition devices.

FIG. 2 is a block diagram showing an example configuration of a conventional speaker-independent speech recognition device. In it, 50 denotes an analysis section, 51 a collation section, and 52 a syllable pattern dictionary (multi-template).

In FIG. 2, the analysis section 50 converts the input speech into a digital signal, performs short-time frequency analysis every several tens of milliseconds, and outputs the resulting frequency spectra. The collation section 51 compares these spectra with all of the syllable patterns stored in the syllable pattern dictionary 52 and outputs the closest syllable as the identification result.
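The exhaustive search performed by the collation section 51 can be sketched as a nearest-template lookup. This is a minimal illustration under stated assumptions, not the circuitry of FIG. 2. It assumes that every pattern is a list of spectral frames already time-normalized to a common frame count, and it uses a summed frame-wise Euclidean distance. The text specifies neither point, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]      # one short-time spectrum
Pattern = Sequence[Frame]    # a time sequence of spectra

def pattern_distance(a: Pattern, b: Pattern) -> float:
    """Sum of frame-wise Euclidean distances; assumes equal frame counts."""
    if len(a) != len(b):
        raise ValueError("patterns must be time-normalized to the same length")
    return sum(math.dist(fa, fb) for fa, fb in zip(a, b))

def recognize(inp: Pattern, templates: List[Tuple[str, Pattern]]) -> str:
    """Return the syllable name of the closest template.

    The list may hold several patterns per syllable name (one per
    registered speaker), as in a multi-template dictionary.
    """
    name, _ = min(((n, pattern_distance(inp, p)) for n, p in templates),
                  key=lambda item: item[1])
    return name

if __name__ == "__main__":
    templates = [
        ("ka", [[1.0, 0.0], [0.5, 0.5]]),   # speaker 1
        ("ka", [[0.9, 0.1], [0.4, 0.6]]),   # speaker 2
        ("ta", [[0.0, 1.0], [0.5, 0.5]]),
    ]
    print(recognize([[0.95, 0.05], [0.5, 0.5]], templates))  # ka
```

Every input is compared against every template. That exhaustive comparison is why the dictionary and the registration burden grow with the number of speakers.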

[Problems to Be Solved by the Invention] FIG. 3 is a diagram showing examples of syllable patterns. The vertical axis represents formant frequency and the horizontal axis represents time. 53 and 54 are patterns of different syllables. In each, the solid line represents speech input from speaker A and the dotted line represents speech input from speaker B.

As seen in FIG. 3, the features of a consonant within a syllable lie in two spectra. The first is the spectrum representing the articulatory position of the consonant itself (hereinafter, the articulatory position spectrum; for a plosive, for example, this is the burst spectrum). The second is the spectrum of the transition from the consonant to the following vowel. The articulatory position spectrum is generally unstable and cannot always be observed, particularly in continuous speech, so it is desirable to make use of the transition spectrum. However, the transition spectrum changes markedly depending on the following vowel, so even for a specific speaker the identification method must be changed for each following vowel. For unspecified speakers, the vowel pattern differs between speakers even for the same type of vowel. The transition spectrum therefore changes as well, which seriously affects consonant identification.

For these reasons, as described above, conventional speaker-independent speech recognition devices must register a huge amount of speech. Even so, they cannot always clearly identify the consonants of unspecified speakers. It has therefore been difficult to realize a device with a high recognition rate.

In view of these conventional problems, an object of the present invention is to provide a speaker-independent consonant identification device that obtains a high recognition rate. It does so by preventing speaker-dependent differences in the vowel part from affecting the identification of the consonant part.

[Means for Solving the Problems] According to the present invention, the above object is achieved, as set forth in the claims, by a speaker-independent consonant identification device comprising: an analysis section that converts input speech into a digital signal and performs short-time frequency analysis every several tens of milliseconds; a vowel extraction section that extracts the vowel part pattern of the input speech from the output of the analysis section; a vowel identification section that identifies the type of vowel in the extracted vowel part; a syllable pattern dictionary that stores syllable patterns of a large number of speakers, prepared in advance, with each syllable divided into a vowel part pattern and a transition part pattern; a first dictionary selection section that retrieves from the syllable pattern dictionary the syllable patterns whose vowel part corresponds to the identification result of the vowel identification section; a second dictionary selection section that calculates the distance between the vowel part pattern of each syllable pattern selected by the first dictionary selection section and the vowel part pattern of the input speech extracted by the vowel extraction section, and selects, for each syllable name, a plurality of syllables as candidates in order of increasing distance; and a collation section that determines the identification result by calculating the distance between the output of the analysis section and the candidate syllable patterns selected by the second dictionary selection section.

[Function] The speaker-independent consonant identification device according to the present invention naturally takes into account the type of vowel that follows the consonant. It also takes into account the speaker characteristics of the vowel part, and this removes the influence of speaker differences in the transition part and the vowel part.

Specifically, the syllable pattern dictionary is created from a large number of speakers, and only those entries whose vowel part spectrum closely resembles the vowel part spectrum of the input syllable are used for identification. As a result, a high consonant identification rate can be obtained for the speech of unspecified speakers.

[Embodiment] FIG. 1 is a block diagram of an embodiment of the present invention. In it, 1 denotes an analysis section, 2 a vowel extraction section, 3 a vowel identification section, 4 a syllable pattern dictionary, 5 a first dictionary selection section, 6 a second dictionary selection section, and 7 a collation section.

The analysis section 1 digitizes the input speech and then performs frequency analysis (spectral analysis) on it every several tens of milliseconds, using an FFT or a similar method.
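The short-time analysis performed by the analysis section 1 can be sketched as follows. This is an illustration only. The frame length, the hop, and the Hamming window are assumed values that the text does not give. A direct DFT stands in for the FFT; it produces the same magnitudes, only more slowly. The function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence

def frame_signal(samples: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split the signal into frames of frame_len samples, one every hop samples."""
    return [list(samples[i:i + frame_len])
            for i in range(0, len(samples) - frame_len + 1, hop)]

def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """Magnitudes of DFT bins 0..N/2 of a Hamming-windowed frame.

    A direct O(N^2) DFT is used for clarity; an FFT gives the same values faster.
    """
    n = len(frame)
    window = [0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)) for t in range(n)]
    x = [s * w for s, w in zip(frame, window)]
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

def short_time_spectra(samples: Sequence[float], sample_rate: int,
                       frame_ms: float = 20.0, hop_ms: float = 20.0) -> List[List[float]]:
    """One magnitude spectrum every hop_ms milliseconds (defaults are illustrative)."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    return [magnitude_spectrum(f) for f in frame_signal(samples, frame_len, hop)]

if __name__ == "__main__":
    fs = 8000
    tone = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(fs // 5)]  # 0.2 s, 1 kHz
    spectra = short_time_spectra(tone, fs)
    print(len(spectra), max(range(len(spectra[0])), key=lambda k: spectra[0][k]))  # 10 20
```

At 8 kHz, a 20 ms frame holds 160 samples, so each bin spans 50 Hz and a 1 kHz tone peaks in bin 20.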

The vowel extraction section 2 extracts, as the vowel part, the portion of the input speech that has high power and little spectral change. It sends the spectrum of this vowel part to the vowel identification section 3 and the second dictionary selection section 6.
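The extraction rule above (high power, little spectral change) can be sketched as follows, under several assumptions that the text does not specify. Power is the sum of squared spectral magnitudes. Spectral change is the smallest Euclidean distance from a frame to an adjacent frame. The longest qualifying run of frames is taken as the vowel part. The selected frames are averaged into a single vowel part pattern. The thresholds and function names are hypothetical.

```python
import math
from typing import List, Sequence

def spectral_change(spectra: Sequence[Sequence[float]], i: int) -> float:
    """Smallest Euclidean distance from frame i to an adjacent frame (0 if none)."""
    neighbours = [spectra[j] for j in (i - 1, i + 1) if 0 <= j < len(spectra)]
    return min((math.dist(spectra[i], nb) for nb in neighbours), default=0.0)

def extract_vowel_frames(spectra: Sequence[Sequence[float]],
                         power_thresh: float, change_thresh: float) -> List[int]:
    """Indices of the longest run of frames that are loud and spectrally steady."""
    flags = [sum(v * v for v in frame) >= power_thresh
             and spectral_change(spectra, i) <= change_thresh
             for i, frame in enumerate(spectra)]
    best_start, best_len, run_start = 0, 0, None
    for i, ok in enumerate(flags + [False]):      # trailing False closes the last run
        if ok and run_start is None:
            run_start = i
        elif not ok and run_start is not None:
            if i - run_start > best_len:
                best_start, best_len = run_start, i - run_start
            run_start = None
    return list(range(best_start, best_start + best_len))

def mean_spectrum(spectra: Sequence[Sequence[float]], indices: List[int]) -> List[float]:
    """Average the selected frames into a single vowel part pattern."""
    if not indices:
        raise ValueError("no vowel frames found")
    return [sum(spectra[i][k] for i in indices) / len(indices)
            for k in range(len(spectra[indices[0]]))]

if __name__ == "__main__":
    frames = [[0.1, 0.1], [5.0, 1.0], [5.0, 1.1], [5.1, 1.0], [1.0, 5.0], [0.1, 0.1]]
    print(extract_vowel_frames(frames, power_thresh=10.0, change_thresh=0.5))  # [1, 2, 3]
```

In this example, frame 4 is loud but differs sharply from both of its neighbours, so it is excluded as part of a transition rather than a steady vowel.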

The vowel identification section 3 identifies the vowel from the vowel part spectrum of the input speech. For Japanese, it distinguishes the five vowels (a, i, u, e, o).
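The text does not say how the vowel identification section 3 classifies the vowel. One plausible sketch is a nearest-reference classifier in which each vowel holds reference spectra from several speakers. This is an assumption made for illustration. The reference values below are toy vectors, not measured data, and the function name is hypothetical.

```python
import math
from typing import Dict, List, Sequence

def identify_vowel(vowel_pattern: Sequence[float],
                   references: Dict[str, List[Sequence[float]]]) -> str:
    """Return the vowel whose nearest reference spectrum is closest to the input.

    references maps each vowel name (for Japanese: a, i, u, e, o) to
    reference spectra collected from several speakers.
    """
    def nearest(vowel: str) -> float:
        return min(math.dist(vowel_pattern, ref) for ref in references[vowel])
    return min(references, key=nearest)

if __name__ == "__main__":
    refs = {                                       # toy 3-bin "spectra"
        "a": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
        "i": [[0.0, 1.0, 0.0]],
        "u": [[0.0, 0.0, 1.0]],
        "e": [[0.5, 0.5, 0.0]],
        "o": [[0.5, 0.0, 0.5]],
    }
    print(identify_vowel([0.45, 0.05, 0.5], refs))  # o
```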

The syllable pattern dictionary 4 stores the syllable names, vowel names, syllable patterns, vowel patterns, and related data registered from many speakers, in the format shown in Table 1.

The first dictionary selection section 5 selects the corresponding entries from the syllable pattern dictionary 4 according to the vowel identification result and sends them to the second dictionary selection section 6.
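Table 1 itself is not reproduced here, so the field set below is inferred from the surrounding text: syllable name, vowel name, syllable (transition part) pattern, and vowel pattern. The `speaker` field is a hypothetical addition. With that record layout, the first dictionary selection reduces to a filter on the vowel name. The sketch below shows both pieces; all names are illustrative.

```python
from dataclasses import dataclass
from typing import List, Sequence

@dataclass
class DictEntry:
    """One dictionary row; fields inferred from the text, not from Table 1 itself."""
    syllable: str                              # syllable name, e.g. "ka"
    vowel: str                                 # vowel name, e.g. "a"
    transition: Sequence[Sequence[float]]      # transition part (syllable) pattern
    vowel_pattern: Sequence[float]             # vowel part pattern
    speaker: str = ""                          # registering speaker (hypothetical field)

def first_selection(dictionary: List[DictEntry], vowel: str) -> List[DictEntry]:
    """Keep only entries whose vowel name equals the identified vowel."""
    return [entry for entry in dictionary if entry.vowel == vowel]

if __name__ == "__main__":
    dictionary = [
        DictEntry("ka", "a", [[1.0]], [1.0, 0.0], "s1"),
        DictEntry("ki", "i", [[0.5]], [0.0, 1.0], "s1"),
        DictEntry("ta", "a", [[0.2]], [0.9, 0.1], "s2"),
    ]
    print([e.syllable for e in first_selection(dictionary, "a")])  # ['ka', 'ta']
```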

Table 1

The second dictionary selection section 6 calculates the distance (similarity) between the vowel pattern of each entry selected by the first dictionary selection section 5 and the vowel pattern of the input speech obtained by the vowel extraction section 2. For each syllable name, it then sends a plurality of entries, taken in order of increasing distance (decreasing similarity), to the collation section 7.
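The second dictionary selection can be sketched as grouping the entries by syllable name and keeping the N entries whose vowel pattern lies closest to the input vowel pattern. This is a minimal sketch. It assumes Euclidean distance on single-vector vowel patterns, and both the value of N (`n_per_syllable`) and the record type are hypothetical, since the text says only "a plurality".

```python
import math
from collections import defaultdict
from typing import Dict, List, NamedTuple, Sequence

class Entry(NamedTuple):
    syllable: str                              # syllable name
    vowel_pattern: Sequence[float]             # vowel part pattern of this entry
    transition: Sequence[Sequence[float]]      # transition part pattern

def second_selection(candidates: List[Entry], input_vowel: Sequence[float],
                     n_per_syllable: int = 3) -> List[Entry]:
    """For each syllable name, keep the entries whose vowel pattern is closest."""
    groups: Dict[str, List[Entry]] = defaultdict(list)
    for entry in candidates:
        groups[entry.syllable].append(entry)
    selected: List[Entry] = []
    for entries in groups.values():
        entries.sort(key=lambda e: math.dist(e.vowel_pattern, input_vowel))
        selected.extend(entries[:n_per_syllable])
    return selected

if __name__ == "__main__":
    cands = [
        Entry("ka", [1.0, 0.0], [[1.0]]),
        Entry("ka", [0.0, 1.0], [[2.0]]),
        Entry("ta", [0.5, 0.5], [[4.0]]),
        Entry("ta", [0.95, 0.0], [[5.0]]),
    ]
    sel = second_selection(cands, [1.0, 0.0], n_per_syllable=1)
    print([(e.syllable, e.vowel_pattern) for e in sel])  # [('ka', [1.0, 0.0]), ('ta', [0.95, 0.0])]
```

Selection happens per syllable name, so every syllable reaches the collation section 7 represented by the speakers whose vowels most resemble the input speaker's vowel.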

The collation section 7 calculates the distance (similarity) between the syllable transition part pattern of each entry selected by the second dictionary selection section 6 and the transition part pattern of the input syllable. It outputs as the identification result either the single entry with the smallest distance (highest similarity) or several entries in order of increasing distance (decreasing similarity).
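The collation step can be sketched as ranking the surviving candidates by transition-pattern distance. This sketch makes several assumptions. The transition patterns are taken to be time-normalized to equal frame counts, and distances are summed frame by frame. When several results are requested, each syllable name is reported once at its best distance; that deduplication is a design choice the text does not state. The names are hypothetical.

```python
import math
from typing import List, NamedTuple, Sequence, Tuple

class Candidate(NamedTuple):
    syllable: str                              # syllable name
    transition: Sequence[Sequence[float]]      # transition part pattern (frames)

def transition_distance(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> float:
    """Sum of frame-wise Euclidean distances; assumes equal frame counts."""
    if len(a) != len(b):
        raise ValueError("transition patterns must have the same number of frames")
    return sum(math.dist(fa, fb) for fa, fb in zip(a, b))

def collate(input_transition: Sequence[Sequence[float]],
            candidates: List[Candidate], top_k: int = 1) -> List[Tuple[str, float]]:
    """Return up to top_k (syllable, distance) pairs, best first, one per syllable name."""
    scored = sorted(((c.syllable, transition_distance(input_transition, c.transition))
                     for c in candidates), key=lambda item: item[1])
    results: List[Tuple[str, float]] = []
    seen = set()
    for name, dist in scored:
        if name not in seen:
            seen.add(name)
            results.append((name, dist))
        if len(results) == top_k:
            break
    return results

if __name__ == "__main__":
    cands = [
        Candidate("ka", [[1.0, 0.0], [0.0, 1.0]]),
        Candidate("ka", [[0.8, 0.2], [0.1, 0.9]]),
        Candidate("ta", [[0.0, 1.0], [1.0, 0.0]]),
    ]
    inp = [[1.0, 0.0], [0.0, 1.0]]
    print([name for name, _ in collate(inp, cands, top_k=2)])  # ['ka', 'ta']
```

Where the transition part pattern also carries the consonant's articulatory position spectrum, those spectra can simply be stored as additional frames of `transition`.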

The syllable transition part pattern may also include the articulatory position spectrum of the syllable's consonant.

[Effects of the Invention] As explained above, the present invention uses information from the syllable transition part for consonant identification while taking into account speaker-dependent differences in the vowel spectrum. A high recognition rate can therefore be achieved.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of an embodiment of the present invention. FIG. 2 is a block diagram showing the configuration of a conventional speaker-independent speech recognition device. FIG. 3 is a diagram showing examples of syllable patterns.

Claims

[Claims]
1. A speaker-independent consonant identification device comprising: an analysis section (1) that converts input speech into a digital signal and performs short-time frequency analysis every several tens of milliseconds; a vowel extraction section (2) that extracts a vowel part pattern of the input speech from the output of the analysis section (1); a vowel identification section (3) that identifies the type of vowel in the extracted vowel part; a syllable pattern dictionary (4) that stores syllable patterns of a large number of speakers, prepared in advance, with each syllable divided into a vowel part pattern and a transition part pattern; a first dictionary selection section (5) that retrieves from the syllable pattern dictionary (4) the syllable patterns having a vowel part corresponding to the identification result of the vowel identification section (3); a second dictionary selection section (6) that calculates the distance between the vowel part pattern of each syllable pattern selected by the first dictionary selection section (5) and the vowel part pattern of the input speech extracted by the vowel extraction section (2), and selects, for each syllable name, a plurality of syllables as candidates in order of increasing distance; and a collation section (7) that determines the identification result by calculating the distance between the output of the analysis section (1) and the candidate syllable patterns selected by the second dictionary selection section (6).