JPS61118800A

JPS61118800A - Voice analyzer

Info

Publication number: JPS61118800A
Application number: JP59240115A
Authority: JP
Inventors: 小林　敦仁
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1984-11-14
Filing date: 1984-11-14
Publication date: 1986-06-06

Abstract

(57) [Summary] This publication contains application data that predates electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention] [Industrial Application Field] The present invention relates to monosyllabic speech recognition devices, and in particular to a speech analysis device used to detect the burst point of a voiceless plosive so that voiceless plosives, among the monosyllables, can be recognized accurately.

Speech recognition devices are classified by their unit of recognition into monosyllabic speech recognition devices, which recognize syllables as units; word speech recognition devices, which recognize words as units; continuous speech recognition devices, which recognize continuous speech; and so on.

Among these, a monosyllabic speech recognition device receives the speech to be recognized one syllable at a time and, for example by the following procedure, assigns it to one of the syllable types, of which there are about 60 in the case of Japanese.

i) The speech uttered by the speaker, one syllable at a time, is converted into an electrical signal by a microphone.

ii) The speech signal converted into an electrical signal is sampled at, for example, 10 kHz and AD-converted.

iii) The AD-converted speech signal is analyzed with an analysis window of, for example, 20 ms length at a period of 10 ms, features such as the power spectrum are extracted, and a feature time series covering, for example, 40 windows is created.

iv) The resulting feature time series is compared with standard feature time series prepared in advance for each type of monosyllable, and the degree of similarity is obtained.

v) The type of the syllable to be recognized is determined from the obtained similarity.
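Steps ii) and iii) can be sketched as follows. This is a minimal illustration under stated assumptions, not the device's actual implementation: the function names are hypothetical, the constants are the "for example" values from the text (10 kHz sampling, 20 ms window, 10 ms period, 40 windows), and a naive DFT stands in for whatever spectral analysis a real device would use.

```python
import cmath

WINDOW_LEN = 200    # 20 ms at the 10 kHz sampling rate given as an example
HOP_LEN = 100       # 10 ms analysis period
NUM_WINDOWS = 40    # "for example, 40 windows"


def split_into_windows(samples, window_len=WINDOW_LEN, hop_len=HOP_LEN,
                       num_windows=NUM_WINDOWS):
    """Return up to num_windows overlapping windows of window_len samples."""
    windows = []
    start = 0
    while len(windows) < num_windows and start + window_len <= len(samples):
        windows.append(samples[start:start + window_len])
        start += hop_len
    return windows


def power_spectrum(window):
    """Naive DFT power spectrum |X(k)|^2 for k = 0 .. N/2 (assumed feature)."""
    n = len(window)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(window[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def feature_time_series(samples):
    """Step iii): one power spectrum per analysis window."""
    return [power_spectrum(w) for w in split_into_windows(samples)]
```

Note that the first window here simply starts at sample 0 of the buffer; nothing ties the window grid to any event in the speech signal.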

FIG. 2 shows an example of the speech signal for the voiceless plosive "pa". Region I is the region that characterizes the consonant p.

The figure also marks the region that characterizes the vowel a and the region in which the consonant p changes into the vowel a, that is, the transition. Compared with the vowel region, the consonant region I and the transition region, which characterize a voiceless plosive most strongly, are very fast phenomena.

Therefore, in recognizing voiceless plosives, it is very important to extract the features of the consonant region and the transition accurately.

[Conventional technology]

FIG. 3 is a block diagram showing the configuration of a conventional monosyllabic speech recognition device. 1 is a microphone that converts the monosyllabic speech uttered by the speaker into an electrical signal; 2 is an AD converter that samples the speech signal converted by microphone 1 at, for example, 10 kHz and AD-converts it; 3 is a buffer that temporarily stores the output of AD converter 2; 4 is a feature extraction unit that analyzes the speech signal stored in buffer 3 with a 20 ms analysis window at a period of 10 ms, extracts features such as the power spectrum, and creates a feature time series covering 40 windows; 5 is a feature dictionary in which the standard feature time series prepared in advance for each type of monosyllable are registered; 6 is a matching unit that obtains the similarity between the feature time series obtained by feature extraction unit 4 and the standard feature time series registered in feature dictionary 5; and 7 is a decision unit that determines the type of the syllable to be recognized from the similarity obtained by matching unit 6.

[Problem that the invention seeks to solve]

In the above configuration, feature extraction unit 4 creates the feature time series by moving an analysis window of, for example, 20 ms length at a period of, for example, 10 ms, as shown in FIG. 2, and extracting the features in each section. However, no constraint was placed on the starting point of the analysis window.

As a result, the position of the analysis window relative to the consonant region I is irregular, and the features obtained differ between, for example, two of the window positions shown in FIG. 2. The features of the consonant region I therefore cannot be extracted stably, and hence cannot be extracted accurately.

[Means for solving problems]

The speech analysis device according to the present invention is a means for solving the above problem. It comprises a first arithmetic circuit that generates a time series of speech power from a speech signal, a second arithmetic circuit that generates a time series of the number of zero crossings per unit time from the speech signal, and a detection circuit that detects the burst point of a voiceless plosive in the speech signal from the output of the first arithmetic circuit and the output of the second arithmetic circuit.

[Operation]

FIG. 4 shows the relationship between the speech signal A of the voiceless plosive "pa", the change B in its logarithmic speech power, and the change C in its zero-crossing count. Speech signal A appears from around point P, and speech power B increases along with it.

In particular, the speech power changes greatly from the burst point to the onset of the vowel. The zero-crossing count C, for its part, shows a fairly sharp decreasing trend at the burst point, and its value there is small compared with its values in the silent interval and in the vowel portion.

The present invention was made with attention to these properties: it detects the burst point of a voiceless plosive as the intersection Q of the change B in speech power and the change C in zero-crossing count.

[Example]

The gist of the present invention is explained concretely below with reference to the embodiment shown in FIG. 1.

FIG. 1(a) is a block diagram showing the configuration of one embodiment of the present invention. 1' is a microphone that converts the monosyllabic speech uttered by the speaker into an electrical signal; 2' is an AD converter that samples the speech signal converted by microphone 1' at, for example, 10 kHz and AD-converts it; and 3' is a buffer that temporarily stores the output of AD converter 2'.

8 is the first arithmetic circuit, which generates a time series of speech power from the speech signal. The monosyllabic speech signal is converted into an electrical signal by microphone 1', sampled at 10 kHz and AD-converted, and stored in buffer 3' as the time series {x(n)}. Using an analysis window of 200 samples, that is, 20 ms, at a period of 10 ms, circuit 8 performs the operation

$$P(\ell) = \log\Bigl(\sum_{n=1}^{200} x(n)^2\Bigr) \tag{1}$$

(where the sum runs over the samples n = 1 to 200 of the analysis window), outputs the logarithmic power time series {P(ℓ)}, and stores that output in buffer 9.
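Equation (1) can be sketched in a few lines. This is only an illustration under assumptions: the function name is hypothetical, the source does not state the base of the logarithm (base 10 is assumed here), and the small floor added before taking the logarithm is an addition to keep silent windows from producing log(0).

```python
import math


def log_power_series(x, window_len=200, hop_len=100):
    """Logarithmic power per analysis window, following equation (1).

    window_len=200 and hop_len=100 correspond to 20 ms and 10 ms at 10 kHz.
    """
    series = []
    for start in range(0, len(x) - window_len + 1, hop_len):
        window = x[start:start + window_len]
        energy = sum(s * s for s in window)        # sum of squared samples
        series.append(math.log10(energy + 1e-12))  # base and floor are assumptions
    return series
```

For a buffer of 300 samples this yields two values, one for the window starting at sample 0 and one for the window starting at sample 100.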

10 is the second arithmetic circuit, which generates a time series of the number of zero crossings per unit time from the speech signal. Using an analysis window of 200 samples, it obtains the zero-crossing count Z(ℓ) of the monosyllabic speech signal x(n) stored in buffer 3' every 10 ms, outputs the zero-crossing time series {Z(ℓ)}, and stores that output in buffer 11.
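A minimal sketch of the zero-crossing count follows. The function name is hypothetical, and the source does not say how a sample that is exactly zero should be treated; here a crossing is counted whenever two adjacent samples differ in sign, with zero treated as non-negative.

```python
def zero_crossing_series(x, window_len=200, hop_len=100):
    """Number of sign changes per analysis window (200 samples every 10 ms)."""
    series = []
    for start in range(0, len(x) - window_len + 1, hop_len):
        window = x[start:start + window_len]
        count = sum(1 for a, b in zip(window, window[1:])
                    if (a >= 0) != (b >= 0))   # zero counts as non-negative
        series.append(count)
    return series
```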

12 and 13 are normalization circuits. They multiply the logarithmic speech power time series {P(ℓ)} stored in buffer 9 and the zero-crossing time series {Z(ℓ)} stored in buffer 11 by the normalization coefficients K1 and K2, respectively, chosen so that the maximum value of each series becomes 1:

$$P(\ell)' = K_1 \times P(\ell) \tag{2}$$

$$Z(\ell)' = K_2 \times Z(\ell) \tag{3}$$

The circuits output the normalized logarithmic speech power time series {P(ℓ)'} and the normalized zero-crossing time series {Z(ℓ)'}, respectively.
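The normalization in equations (2) and (3) amounts to scaling each series by the reciprocal of its maximum. The sketch below uses a hypothetical function name and assumes the maximum is positive; the source does not discuss series whose maximum is zero or negative, so that case is rejected here.

```python
def normalize_to_unit_max(series):
    """Scale a series by K = 1 / max(series) so that its maximum becomes 1."""
    peak = max(series)
    if peak <= 0:
        # Not covered by the source; rejecting it is an assumption of this sketch.
        raise ValueError("normalization assumes a positive maximum")
    k = 1.0 / peak          # plays the role of K1 or K2
    return [k * v for v in series]
```

Applying it to both series puts the power curve and the zero-crossing curve on a common 0-to-1 scale, which is what makes an intersection between them meaningful.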

14 is the detection circuit, which detects the burst point of a voiceless plosive in the monosyllabic speech signal from the output of the first arithmetic circuit 8 and the output of the second arithmetic circuit 10. It detects the burst point as the intersection (point Q in FIG. 4) of the normalized logarithmic speech power time series {P(ℓ)'} output from normalization circuit 12 and the normalized zero-crossing time series {Z(ℓ)'} output from normalization circuit 13.

FIG. 1(b) shows a concrete example of the circuit configuration of the burst point detection circuit, in which 15 is a subtraction circuit, 16 is a two-stage shift register, and 17 is a sign identification circuit.

Subtraction circuit 15 obtains the difference between the normalized zero-crossing count Z(ℓ)' output from normalization circuit 13 and the normalized logarithmic power P(ℓ)' output from normalization circuit 12, and its output is sent to shift register 16.

Sign identification circuit 17 obtains the product of the contents stored in the two stages of shift register 16 and generates an output signal when the sign of that product changes from positive to negative. A negative product means that two consecutive differences have opposite signs, that is, the two time series have crossed.
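The subtraction circuit, two-stage shift register, and sign identification circuit can be mimicked in software as follows. This is a sketch under assumptions, not the circuit itself: the function name is hypothetical, the reported index is the window at which the difference first shows its new sign, and a negative product on the very first comparison (where no earlier product exists) is also reported, which the source does not address.

```python
def detect_crossings(p_norm, z_norm):
    """Return window indices where normalized power and zero-crossing series cross."""
    stage_old = None      # older stage of the two-stage shift register
    stage_new = None      # newer stage
    prev_product = None
    hits = []
    for index, (p, z) in enumerate(zip(p_norm, z_norm)):
        diff = z - p                              # subtraction circuit 15
        stage_old, stage_new = stage_new, diff    # shift register 16
        if stage_old is None:
            continue                              # register not yet full
        product = stage_old * stage_new           # sign identification circuit 17
        # Output when the product goes from positive to negative
        # (first comparison with no history: assumption of this sketch).
        if product < 0 and (prev_product is None or prev_product > 0):
            hits.append(index)
        prev_product = product
    return hits
```

With p_norm = [0.1, 0.2, 0.3, 0.8, 0.9] and z_norm = [0.9, 0.8, 0.7, 0.4, 0.3], the difference changes sign between windows 2 and 3, so the function returns [3].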

[Effect of the invention]

As explained above, the present invention makes it possible to detect the burst point of a voiceless plosive stably. By synchronizing the analysis window used for feature extraction in a monosyllabic speech recognition device with the detected burst point, the features of the consonant region of a voiceless plosive can be extracted stably, and hence accurately, so the recognition accuracy of the monosyllabic speech recognition device can be improved.
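One way to picture this synchronization is to shift the window grid so that one analysis window starts exactly at the detected burst point. The source does not specify how the window is aligned to the burst point, so the alignment rule below, the function name, and the use of a sample index for the burst point are all assumptions of this sketch.

```python
def burst_aligned_starts(burst_sample, total_len, window_len=200, hop_len=100):
    """Start indices of analysis windows on a grid anchored at burst_sample.

    The grid keeps the 10 ms period (hop_len) but is offset so that
    burst_sample itself is one of the window start positions.
    """
    first = burst_sample % hop_len   # earliest start on the shifted grid
    return list(range(first, total_len - window_len + 1, hop_len))
```

For a burst detected at sample 1234 in a 2000-sample buffer, the windows start at 34, 134, and so on, and one of them starts at 1234, so the consonant region always falls at the same position within its window.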

[Brief explanation of drawings]

FIG. 1(a) is a block diagram of one embodiment of the present invention. FIG. 1(b) shows an example of the configuration of part of that embodiment. FIG. 2 is an explanatory diagram relating to the industrial field of application. FIG. 3 is a block diagram of the prior art. FIG. 4 is an explanatory diagram relating to the operation. In the figures, 8 is the first arithmetic circuit and 10 is the second arithmetic circuit.

Claims

[Claims]

A speech analysis device comprising: a first arithmetic circuit that generates a time series of speech power from a speech signal; a second arithmetic circuit that generates a time series of the number of zero crossings per unit time from the speech signal; and a detection circuit that detects the burst point of a voiceless plosive in the speech signal from the output of the first arithmetic circuit and the output of the second arithmetic circuit.