JPH0558559B2 - Google Patents

Info

Publication number
JPH0558559B2
JPH0558559B2 JP63130784A JP13078488A
Authority
JP
Japan
Prior art keywords
pattern
vector
axis
spatiotemporal
grid point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
JP63130784A
Other languages
Japanese (ja)
Other versions
JPH01158496A (en)
Inventor
Ryuichi Oka
Hiroshi Matsumura
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National Institute of Advanced Industrial Science and Technology AIST
Sanyo Electric Co Ltd
Sanyo Denki Co Ltd
Original Assignee
Agency of Industrial Science and Technology
Sanyo Electric Co Ltd
Sanyo Denki Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Agency of Industrial Science and Technology, Sanyo Electric Co Ltd, Sanyo Denki Co Ltd filed Critical Agency of Industrial Science and Technology
Priority to JP63130784A priority Critical patent/JPH01158496A/en
Publication of JPH01158496A publication Critical patent/JPH01158496A/en
Publication of JPH0558559B2 publication Critical patent/JPH0558559B2/ja
Granted legal-status Critical Current

Abstract

PURPOSE: To obtain a high recognition rate by using a vector field pattern, applying direction-specific defocusing (blurring) processing, and using the result for speech recognition.

CONSTITUTION: A spatiotemporal pattern is converted by spatial differentiation into a vector field pattern having a magnitude and a direction at each grid point in the space, and for the vectors of the vector field pattern the direction parameter is quantized into N values (N: an integer). The vectors are separated according to their quantized values to generate N two-dimensional patterns in which the magnitude of each vector is the value at each grid point, and blurring processing is applied to each direction-specific two-dimensional pattern, direction by direction, with respect to the time axis and/or the spatial axis to extract a pattern as the speech feature. Consequently, high recognition rates are obtained even in speech recognition for large vocabularies and for unspecified speakers.

Description

[Detailed Description of the Invention]

[Field of Industrial Application]

The present invention relates to a speech feature extraction method for use in speech recognition and the like. More specifically, it provides a novel method that uses vector field patterns and applies direction-specific blurring (also called defocusing) processing, thereby achieving a high recognition rate when used for speech recognition.

[Prior Art]

In speech recognition, generally, a standard speech pattern obtained by extracting features from each word to be recognized is prepared for every word; a feature pattern extracted in the same way from the speech input as the recognition target is matched against the plurality of standard patterns; the standard pattern with the highest similarity is found; and the word associated with that standard pattern is judged to have been input. Conventionally, the feature pattern used was the spatiotemporal pattern itself of a scalar field, obtained by analyzing the speech signal, with the time axis as the horizontal axis and a spatial axis as the vertical axis. A typical spatiotemporal pattern of such a scalar field is the spectrum, with frequency as the spatial axis; in addition, various other spatiotemporal patterns have been used, such as the cepstrum with quefrency as the spatial axis, PARCOR coefficients, LSP coefficients, and the vocal tract cross-sectional area function.

One of the problems to be solved in the field of speech recognition is handling multiple or unspecified speakers; this has been addressed by preparing many standard patterns for a single word in order to improve the recognition rate. Furthermore, even the same speaker may speak at different speeds, and the DP (dynamic programming) matching method, which can absorb time-axis fluctuations, was developed to cope with such cases.

[Problems to be Solved by the Invention]

Conventional methods that use the spatiotemporal pattern of a scalar field itself as the feature do not necessarily achieve a sufficient recognition rate for large vocabularies or unspecified speakers; even preparing many standard patterns for a single word as described above, or using the DP matching method, did not provide a fundamental solution.

Consequently, the practical application of speech recognition systems for large vocabularies or unspecified speakers has stalled. One of the present inventors therefore proposed, in Japanese Patent Application Laid-Open No. 60-59394 and in "On the Comparative Effectiveness of the Spectral Vector Field and the Spectrum in Speech Recognition," Transactions of the IECE of Japan (D), Vol. J69-D, No. 1, p. 1704 (1986), a method of spatially differentiating the spectrum of a scalar field, which is a time-frequency spatiotemporal pattern, to obtain a spectral vector field pattern, and of using this pattern as the speech feature.

Earlier research using the partial derivatives of the spectrum at space-time points as features was carried out by T. B. Martin and is disclosed in "Practical applications of voice input to machines," Proc. IEEE, 64-4 (1976). However, T. B. Martin computed ∂f(t,x)/∂t and ∂f(t,x)/∂x from the spatiotemporal pattern f(t,x), constructed from them functions that discriminate 32 phonetic categories for each frame, and used the results, expressed as 32 binary values, for word-level linear matching; this differs from the above method of creating a spectral vector field from a spectral scalar field.

The main object of the present invention is to provide a speech feature extraction method that advances the above approach one step further from an engineering standpoint and improves it for practical use.

Another object of the present invention is to provide a speech feature extraction method that achieves a high recognition rate even in speech recognition for large vocabularies and for unspecified speakers.

[Means for Solving the Problems]

The basic feature of the present invention resides in a speech feature extraction method in which a speech signal is analyzed to obtain a spatiotemporal pattern of a scalar field defined by a time axis and a spatial axis, and speech features are extracted using the spatiotemporal pattern, wherein: the spatiotemporal pattern is converted by spatial differentiation into a vector field pattern having a magnitude and a direction at each grid point in the space; for the vectors of the vector field pattern, the direction parameter is quantized into N values (N: an integer); the vectors are separated according to their quantized values, and N direction-specific two-dimensional patterns are created in which the magnitude of each vector is the value at each grid point; and a pattern obtained by applying blurring processing to the direction-specific two-dimensional patterns, direction by direction, with respect to the time axis only or with respect to both the time axis and the spatial axis, is extracted as the speech feature.
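To make this flow concrete, the following is a minimal sketch in Python/NumPy; the function name, the choice of N = 8 direction bins, and the use of a simple gradient as the spatial differentiation are illustrative assumptions, not the patent's exact implementation.

```python
import numpy as np

def directional_patterns(f, n_dirs=8):
    """Convert a scalar spatiotemporal pattern f[t, x] into N direction-specific
    2-D patterns (illustrative sketch, not the patent's exact procedure).

    f      : 2-D array, rows = time frames t, columns = spatial channels x.
    n_dirs : N, the number of quantization levels for the vector direction.
    Returns: array of shape (N, T, L); patterns[k][t, x] holds the vector
             magnitude where the quantized direction equals k, and 0 elsewhere.
    """
    # Differentiation along the time axis and the space axis gives a vector
    # (df/dt, df/dx) at every grid point of the pattern.
    dfdt = np.gradient(f, axis=0)
    dfdx = np.gradient(f, axis=1)

    magnitude = np.hypot(dfdt, dfdx)    # |vector| at each grid point
    direction = np.arctan2(dfdx, dfdt)  # direction parameter in (-pi, pi]

    # Quantize the direction parameter into N values.
    bins = np.floor((direction + np.pi) / (2 * np.pi) * n_dirs).astype(int)
    bins = np.clip(bins, 0, n_dirs - 1)

    # Separate by quantized direction: N two-dimensional patterns whose grid
    # values are the vector magnitudes.
    patterns = np.zeros((n_dirs,) + f.shape)
    for k in range(n_dirs):
        patterns[k][bins == k] = magnitude[bins == k]
    return patterns
```

Keeping the N directional patterns separate is what allows the subsequent blurring to accumulate features per direction without mixing vectors of different orientations.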

When extracting the features of voices of only one sex, male or female, this blurring processing need be performed only with respect to the time axis.

When extracting the features of voices of both sexes, blurring processing is also performed with respect to the spatial axis, but the blurring along the time axis is emphasized more strongly than the blurring along the spatial axis.

Furthermore, these blurring processes are carried out by a mask operation using a mask pattern having predetermined weight values.
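As one hedged illustration of such a mask operation, the sketch below convolves each direction-specific pattern with a rectangular mask extending two grid points in each time direction and one grid point in each space direction; the weight of 1 along the time axis and the smaller spatial weight of 0.5 are assumed values chosen to emphasize time-axis blurring, not weights specified by the patent.

```python
import numpy as np
from scipy.ndimage import convolve

# Mask pattern: 5 grid points along time (center +/- 2 frames), 3 along space
# (center +/- 1 channel). Weights of "1" along the time axis and a smaller
# weight along the spatial axis emphasize the time-axis blurring.
mask = np.array([[0.5, 1.0, 0.5],
                 [0.5, 1.0, 0.5],
                 [0.5, 1.0, 0.5],   # rows = time offsets,
                 [0.5, 1.0, 0.5],   # columns = space offsets
                 [0.5, 1.0, 0.5]])

def blur_directional_patterns(patterns, mask=mask):
    """Apply the mask operation independently to each direction-specific
    2-D pattern (sketch; edge handling via 'nearest' is an assumption)."""
    return np.stack([convolve(p, mask, mode='nearest') for p in patterns])
```

For single-sex feature extraction the spatial column weights would simply be set to zero, reducing the mask to a one-dimensional blur along the time axis.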

[Operation]

From the spatiotemporal pattern of a scalar field defined by the time axis and the spatial axis, the input speech signal has its vector direction parameters quantized and is converted into a plurality of direction-specific two-dimensional patterns, separated by quantized direction. These direction-specific two-dimensional patterns are then subjected to blurring processing, and the directional pattern features are accumulated. This yields emphasis and stabilization of the speech features.

This accumulation performs a kind of structuring of the space-time point (t, x). That is, when the N directional patterns are considered together, this structuring attaches up to N vectors to each space-time point (t, x) (see Fig. 6). Its effect on speech recognition lies in the formation of features that better represent phonetic character and in their stable representation; the phonetic features are assumed to correspond to spectral changes over certain spatiotemporal intervals.

These features are first extracted microscopically in the spectral vector field; then, vectors lying in different direction intervals are regarded as independent features, and they are accumulated independently at each space-time point. When the integration is performed within the blurring mask pattern independently for each direction, more macroscopic features (speech features formed by a wide spatiotemporal region) are captured while the structural nature of the features is preserved. Moreover, since this accumulation of features is performed at every space-time point (t, x), the speech features are not macroscopic features formed only at particular spatiotemporal points; rather, they are formed stably, with slight variations, over a wide region (especially in time).

Therefore, owing to the emphasis and stabilization provided by this blurring processing, phoneme discrimination and speaker normalization can be performed with higher accuracy than before.

[Embodiments]

The present invention will be described in detail below with reference to the drawings showing embodiments thereof.

Fig. 1 is a block diagram showing the configuration of an apparatus for implementing the method of the present invention.

In this embodiment, an analysis unit 2 performs spectral analysis of the speech signal, and a spectrum whose spatial axis is the frequency axis is used as the spatiotemporal pattern of the scalar field.

Speech input for creating standard patterns, or speech input to be recognized, is performed by a speech input unit 1 consisting of a speech detector such as a microphone and an A/D converter. The resulting speech signal is fed to the analysis unit 2, which is formed by connecting in parallel a plurality of channels (for example, 10 to 30) of band-pass filters, each with a different pass band. The analysis yields a spatiotemporal pattern, which is segmented into words, the recognition units, by a word interval extraction unit 3 and supplied to a feature extraction unit 4. A conventionally known device may be used as the word interval extraction unit 3.

In the following description, the band-pass filter bank described above is used as the analysis unit 2 that divides the speech signal into frequency bands, but a fast Fourier transform unit may be used instead.
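In the spirit of that FFT alternative, the analysis stage could be approximated as below: a short-time power spectrum pooled into a small number of bands, standing in for the 10-30 filter channels. The sampling rate, frame length, channel count, and log compression are illustrative assumptions, not parameters given in the patent.

```python
import numpy as np

def band_spectrogram(signal, fs=16000, n_channels=20,
                     frame_len=256, hop=128):
    """Approximate the band-pass filter bank of analysis unit 2 with a
    short-time FFT whose power spectrum is pooled into n_channels bands
    (illustrative parameters; the patent specifies 10-30 band-pass filters)."""
    window = np.hanning(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        spec = np.abs(np.fft.rfft(signal[start:start + frame_len] * window)) ** 2
        # Pool the FFT bins into n_channels equal-width frequency bands.
        bands = np.array_split(spec[1:], n_channels)  # drop the DC bin
        frames.append([band.mean() for band in bands])
    return np.log(np.asarray(frames) + 1e-10)  # f[t, x]: time x channel
```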

The method of the present invention is characterized by the feature extraction unit described next. The input pattern to the feature extraction unit 4 is a spatiotemporal pattern whose horizontal axis is time and whose vertical axis is frequency; the spatiotemporal pattern shown in Fig. 2, extracted by the word interval extraction unit 3, is expressed as f(t, x) (where t is a number indicating the sampling time, and x is the channel number of a band-pass filter, i.e., a number identifying a frequency band; 1 ≤ t ≤ T, 1 ≤ x ≤ L, where T and L are the maximum values of t and x, respectively).

The output of the word interval extraction unit 3 is input to a normalization unit 41 of the feature extraction unit 4, which performs linear normalization of the time axis. This is done to absorb, to some extent, differences in word length, input speech duration, and so on; the time axis is reduced from T frames to M frames (for example, about 16 to 32 frames). Specifically, when M ≤ T, the normalized spatiotemporal pattern F(t, x) is obtained by equation (1) below.

F(t, x) = (T/M) … (1)
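Equation (1) is truncated in the available text. As a hedged stand-in, a linear time normalization from T frames to M frames could look like the sketch below; averaging each output frame over its mapped input span is an assumption, not necessarily the patent's equation (1).

```python
import numpy as np

def normalize_time(f, M=24):
    """Linearly normalize f[t, x] from T frames to M frames (sketch; the
    patent's exact equation (1) is truncated in the source text)."""
    T = f.shape[0]
    out = np.zeros((M, f.shape[1]))
    for m in range(M):
        lo = int(np.floor(m * T / M))
        hi = max(lo + 1, int(np.floor((m + 1) * T / M)))
        out[m] = f[lo:hi].mean(axis=0)  # average the input frames mapped to m
    return out
```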

Claims (1)

1. A speech feature extraction method in which a speech signal is analyzed to obtain a spatiotemporal pattern of a scalar field defined by a time axis and a spatial axis and speech features are extracted using the spatiotemporal pattern, wherein, for extracting the features of voices of only one sex, male or female: the spatiotemporal pattern is converted by spatial differentiation into a vector field pattern having a magnitude and a direction at each grid point in the space; for the vectors of the vector field pattern, the direction parameter is quantized into N values (N: an integer); the vectors are separated according to their quantized values, and N direction-specific two-dimensional patterns are created in which the magnitude of each vector is the value at each grid point; and a pattern obtained by applying blurring processing to the direction-specific two-dimensional patterns, direction by direction, with respect to the time axis only is extracted as the speech feature.

2. A speech feature extraction method in which a speech signal is analyzed to obtain a spatiotemporal pattern of a scalar field defined by a time axis and a spatial axis and speech features are extracted using the spatiotemporal pattern, wherein: the spatiotemporal pattern is converted by spatial differentiation into a vector field pattern having a magnitude and a direction at each grid point in the space; for the vectors of the vector field pattern, the direction parameter is quantized into N values (N: an integer); the vectors are separated according to their quantized values, and N direction-specific two-dimensional patterns are created in which the magnitude of each vector is the value at each grid point; a pattern obtained by applying blurring processing to the direction-specific two-dimensional patterns, direction by direction, with respect to the time axis and the spatial axis is extracted as the speech feature; and the blurring processing emphasizes the blurring with respect to the time axis more strongly than the blurring with respect to the spatial axis.

3. A speech feature extraction method in which a speech signal is analyzed to obtain a spatiotemporal pattern of a scalar field defined by a time axis and a spatial axis and speech features are extracted using the spatiotemporal pattern, wherein: the spatiotemporal pattern is converted by spatial differentiation into a vector field pattern having a magnitude and a direction at each grid point in the space; for the vectors of the vector field pattern, the direction parameter is quantized into N values (N: an integer); the vectors are separated according to their quantized values, and N direction-specific two-dimensional patterns are created in which the magnitude of each vector is the value at each grid point; a pattern obtained by applying blurring processing to the direction-specific two-dimensional patterns, direction by direction, with respect to the time axis and the spatial axis is extracted as the speech feature; the blurring processing emphasizes the blurring with respect to the time axis more strongly than the blurring with respect to the spatial axis; and the blurring with respect to the spatial axis when extracting the features of voices of both sexes is emphasized more strongly than the blurring with respect to the spatial axis when extracting the features of voices of only one sex.

4. A speech feature extraction method in which a speech signal is analyzed to obtain a spatiotemporal pattern of a scalar field defined by a time axis and a spatial axis and speech features are extracted using the spatiotemporal pattern, wherein: the spatiotemporal pattern is converted by spatial differentiation into a vector field pattern having a magnitude and a direction at each grid point in the space; for the vectors of the vector field pattern, the direction parameter is quantized into N values (N: an integer); the vectors are separated according to their quantized values, and N direction-specific two-dimensional patterns are created in which the magnitude of each vector is the value at each grid point; a pattern obtained by applying blurring processing to the direction-specific two-dimensional patterns, direction by direction, with respect to at least the time axis is extracted as the speech feature; and the blurring processing is a process of performing a mask operation, for each grid point of each direction-specific two-dimensional pattern, with a mask pattern that has a center point corresponding to that grid point, that extends at least two grid points from the center point in each direction along the time axis, and that has predetermined weight values.

5. The speech feature extraction method according to claim 4, wherein the weight values are all "1".

6. A speech feature extraction method in which a speech signal is analyzed to obtain a spatiotemporal pattern of a scalar field defined by a time axis and a spatial axis and speech features are extracted using the spatiotemporal pattern, wherein: the spatiotemporal pattern is converted by spatial differentiation into a vector field pattern having a magnitude and a direction at each grid point in the space; for the vectors of the vector field pattern, the direction parameter is quantized into N values (N: an integer); the vectors are separated according to their quantized values, and N direction-specific two-dimensional patterns are created in which the magnitude of each vector is the value at each grid point; a pattern obtained by applying blurring processing to the direction-specific two-dimensional patterns, direction by direction, with respect to the time axis and/or the spatial axis is extracted as the speech feature; and the blurring processing is a process of performing a mask operation, for each grid point of each direction-specific two-dimensional pattern, with a mask pattern that has a center point corresponding to that grid point, that extends at least two grid points from the center point in each direction along the time axis and at least one grid point from the center point in each direction along the spatial axis, and that has predetermined weight values.

7. The speech feature extraction method according to claim 6, wherein the spread of the mask pattern in the time-axis direction is larger than its spread in the spatial-axis direction.

8. The speech feature extraction method according to claim 6, wherein the weight values of the mask pattern at the center point and in the time-axis direction are all "1", and the weight values in the spatial-axis direction are smaller than "1".
JP63130784A 1987-09-30 1988-05-27 System for extracting characteristic of voice Granted JPH01158496A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP63130784A JPH01158496A (en) 1987-09-30 1988-05-27 System for extracting characteristic of voice

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
JP62-248915 1987-09-30
JP24891587 1987-09-30
JP63130784A JPH01158496A (en) 1987-09-30 1988-05-27 System for extracting characteristic of voice

Publications (2)

Publication Number Publication Date
JPH01158496A JPH01158496A (en) 1989-06-21
JPH0558559B2 true JPH0558559B2 (en) 1993-08-26

Family

ID=17185316

Family Applications (1)

Application Number Title Priority Date Filing Date
JP63130784A Granted JPH01158496A (en) 1987-09-30 1988-05-27 System for extracting characteristic of voice

Country Status (1)

Country Link
JP (1) JPH01158496A (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2613108B2 (en) * 1989-09-27 1997-05-21 Director-General of the Agency of Industrial Science and Technology Voice recognition method
DE19729671A1 (en) * 1997-07-11 1999-01-14 Alsthom Cge Alcatel Electrical circuit arrangement arranged in a housing
JP4930608B2 (en) * 2010-02-05 2012-05-16 JVC Kenwood Corp Acoustic signal analysis apparatus, acoustic signal analysis method, and acoustic signal analysis program

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0330159A (en) * 1989-06-27 1991-02-08 Alps Electric Co Ltd Magnetic disk device

Also Published As

Publication number Publication date
JPH01158496A (en) 1989-06-21

Similar Documents

Publication Publication Date Title
Hermansky et al. Multi-resolution RASTA filtering for TANDEM-based ASR
EP1041540B1 (en) Hierarchial subband linear predictive cepstral features for HMM-based speech recognition
JPH036517B2 (en)
Abdollahi et al. Speaker-independent isolated digit recognition using an aer silicon cochlea
CN113160852A (en) Voice emotion recognition method, device, equipment and storage medium
Khan et al. Speaker separation using visually-derived binary masks
Riazati Seresht et al. Spectro-temporal power spectrum features for noise robust ASR
Biswas et al. Hindi vowel classification using GFCC and formant analysis in sensor mismatch condition
Kulkarni et al. A review of speech signal enhancement techniques
EP0292929B1 (en) Method of feature extraction and recognition of voice and recognition apparatus
JP2003005790A (en) Method and device for voice separation of compound voice data, method and device for specifying speaker, computer program, and recording medium
Chavan et al. Speech recognition in noisy environment, issues and challenges: A review
Deiv et al. Automatic gender identification for hindi speech recognition
Singh et al. Novel feature extraction algorithm using DWT and temporal statistical techniques for word dependent speaker’s recognition
JPH0558559B2 (en)
Mertins et al. Vocal tract length invariant features for automatic speech recognition
Ruinskiy et al. Spectral and textural feature-based system for automatic detection of fricatives and affricates
Wang et al. Speech enhancement based on noise classification and deep neural network
JPH0330159B2 (en)
Khan et al. Speaker separation using visual speech features and single-channel audio.
Lee et al. Exploiting principal component analysis in modulation spectrum enhancement for robust speech recognition
Chandrasekaram New Feature Vector based on GFCC for Language Recognition
Muhsina et al. Signal enhancement of source separation techniques
Shahrul Azmi et al. Noise robustness of Spectrum Delta (SpD) features in Malay vowel recognition
Guntur Feature extraction algorithms for speaker recognition system and fuzzy logic

Legal Events

Date Code Title Description
R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250

EXPY Cancellation because of completion of term
FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20080826

Year of fee payment: 15