JP5051746B2

JP5051746B2 - Feature extraction apparatus and method, and program

Info

Publication number: JP5051746B2
Application number: JP2006298263A
Authority: JP
Inventors: 聖一郎大; 展之大津; 宏明児島
Original assignee: National Institute of Advanced Industrial Science and Technology AIST
Current assignee: National Institute of Advanced Industrial Science and Technology AIST
Priority date: 2006-11-01
Filing date: 2006-11-01
Publication date: 2012-10-17
Anticipated expiration: 2026-11-01
Also published as: JP2008116588A

Description

The present invention relates to a feature extraction apparatus, method, and program for speech and acoustic signals, and more generally for one-dimensional time-series signals. Examples of applications are speech recognition devices, emotion recognition devices, speech search systems, music search systems, confirmation of commercial (CM) broadcasts, detection of equipment abnormalities by acoustic diagnosis, home security based on acoustic anomaly detection, and assistive devices that alert hearing-impaired people to abnormal sounds.

Research on feature extraction from image data using higher-order local autocorrelation (HLAC) features has advanced, beginning with Patent Document 1. As Patent Document 2 shows, the technique has also been applied to moving images and extended to uses such as detecting abnormal motion. Non-Patent Document 1 discloses a time-varying power spectrum as a transformation method. It also gives an example that combines HLAC with existing techniques through Fisher weighting. For acoustic anomaly detection based on other features, methods have been developed for individual purposes, such as detecting abnormalities in nuclear reactors (see, for example, Patent Document 3).

Patent Document 1: Japanese Patent No. 2982814
Patent Document 2: JP 2006-079272 A
Patent Document 3: Japanese Examined Patent Publication No. H4-032356
Non-Patent Document 1: Shun Kato, Tetsuya Takiguchi, Yasuo Ariki, "Phoneme recognition using higher-order local autocorrelation features with a Fisher weight map," Proceedings of the 2005 Autumn Meeting of the Acoustical Society of Japan, 1-P-10, pp. 171-172, September 2005.

Until now, feature extraction with higher-order local autocorrelation (HLAC) features has been studied mainly for image data, and no method for acoustic signals has been established. Methods based on other features require the target-dependent features and decision rules to be investigated and set anew for each application.
An object of the present invention is to establish a technique that brings the advantages of HLAC to acoustic signals. HLAC is a general-purpose feature extraction method with low dependence on the target.

The feature extraction method of the present invention first analyzes a one-dimensional time-series signal by non-stationary chaos analysis. It then extracts features by calculating higher-order local autocorrelation coefficients from the two-dimensional image that this analysis produces. The coefficients are calculated from binary image information. This information is obtained by binarizing the two-dimensional image with a threshold derived from the image's histogram. The one-dimensional time-series signal is a speech signal or an acoustic signal, and the non-stationary chaos analysis is performed by the recurrence plot method.

The feature extraction apparatus of the present invention has three parts:
- a signal analysis unit that analyzes a one-dimensional time-series signal by non-stationary chaos analysis;
- a binarization processing unit that converts the resulting two-dimensional image into binary information, using a threshold obtained from the image's histogram;
- a parameter generation unit that calculates higher-order local autocorrelation coefficients from the resulting binary image information.

The feature extraction program of the present invention causes a computer to execute three steps:
- analyzing a one-dimensional time-series signal by non-stationary chaos analysis;
- converting the resulting two-dimensional image into binary information, using a threshold obtained from the image's histogram;
- calculating higher-order local autocorrelation coefficients from the resulting binary image information.

According to the present invention, the extracted HLAC features reflect the non-stationary dynamical characteristics of the target system. The resulting feature extraction method suits targets with dynamic characteristics, such as consonants and abnormal sounds, and non-linguistic information such as emotion. It improves accuracy in speech recognition and abnormal-sound detection.

The present invention is described below by way of example. FIG. 1 shows the schematic configuration of the feature extraction apparatus of the present invention. The invention starts from a one-dimensional signal, such as an acoustic or speech signal, and transforms it nonlinearly into a two-dimensional representation. It then extracts features by applying higher-order local autocorrelation to that representation. Here, the nonlinear transformation is the phase-space reconstruction procedure described in 1) below. The feature extraction method can be used in the following:
- speech recognition devices and emotion recognition devices;
- speech search systems and music search systems;
- confirmation of commercial broadcasts;
- detection of equipment abnormalities by acoustic diagnosis;
- home security based on acoustic anomaly detection;
- assistive devices that alert hearing-impaired people to abnormal sounds.

The recurrence plot (RP) is one method of non-stationary chaos analysis. As shown in FIG. 1, the signal analysis unit analyzes a one-dimensional time-series signal, such as a speech signal, by non-stationary chaos analysis such as a recurrence plot, and generates a two-dimensional image. The main reason for focusing on chaos is that a dynamical system can be reconstructed from observed time-series data. The main reason for focusing on non-stationarity is that many biological signals fluctuate because of noise and temporal variation of the waveform. Analyzing that fluctuation yields broader information (features) than classical spectrum-based analysis can provide.

The non-stationary chaos methodology combines these two motivations. RP was chosen for two main reasons. It is effective for acoustic phenomena. Also, past studies show that phase-space trajectories reconstructed from sound tend to be high-dimensional, and RP is an effective tool for visualizing their essential information. RP visualizes chaotic dynamical properties as patterns or pictures rather than as numbers or equations. HLAC, a feature extraction technique developed in the quite different field of pattern recognition, then patterns and classifies these pictures. Together they yield an equivalent dynamical structure without formulating the complex high-order dynamical system itself, which is useful for acoustic signal processing. For anomaly detection in one-dimensional signals other than sound (such as electroencephalograms or electromyograms), other non-stationary chaos tools can be used instead of RP. Examples are the coupled map lattice (CML) from spatiotemporal chaos theory, the deterministic versus stochastic (DVS) method, and the coefficient-family reconstruction method.

The binarization processing unit calculates a histogram of the two-dimensional image generated by the signal analysis unit. It then converts the image into binary information using a threshold selected from the neighborhood of the histogram's mode.
The parameter generation unit calculates higher-order local autocorrelation (HLAC) features from the binary image information generated by the binarization processing unit, and thereby extracts the features.
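The text does not specify which HLAC masks the parameter generation unit uses. The sketch below assumes the standard binary HLAC configuration: displacements within a 3×3 neighborhood and order 0 to 2, which gives 25 translation-distinct masks. Each feature is the sum over interior pixels of the product of the pixels selected by one mask. The function names `hlac_masks` and `hlac_features` are hypothetical.

```python
import itertools
import numpy as np

def hlac_masks():
    """Binary HLAC masks: the centre pixel plus 0, 1 or 2 neighbours in a 3x3
    window, with translation-equivalent patterns removed (25 masks)."""
    neighbours = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                  if (dy, dx) != (0, 0)]
    seen, masks = set(), []
    for order in range(3):  # number of extra displacements N = 0, 1, 2
        for extra in itertools.combinations(neighbours, order):
            pts = [(0, 0), *extra]
            # Canonical key: shift the point set so its top-left corner is (0, 0).
            my = min(y for y, _ in pts)
            mx = min(x for _, x in pts)
            key = tuple(sorted((y - my, x - mx) for y, x in pts))
            if key not in seen:
                seen.add(key)
                masks.append(pts)
    return masks

def hlac_features(binary_img):
    """25-dimensional HLAC vector of a 2-D binary image (interior pixels only)."""
    b = (np.asarray(binary_img) > 0).astype(np.int64)
    h, w = b.shape
    feats = []
    for pts in hlac_masks():
        prod = np.ones((h - 2, w - 2), dtype=np.int64)
        for dy, dx in pts:
            # Pixels displaced by (dy, dx) from every interior reference pixel.
            prod *= b[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        feats.append(int(prod.sum()))
    return np.array(feats)
```

Because HLAC sums over all positions, the features are shift-invariant and additive over image regions. This is one reason the method needs little target-specific tuning.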

In this way, the present invention uses signal processing to convert an acoustic signal into a representation on two-dimensional coordinates. It then applies the conventional HLAC technique for two-dimensional image data. The extracted HLAC features therefore reflect the non-stationary dynamical characteristics of the target system. This enables feature extraction suited to targets with dynamic characteristics, such as consonants and abnormal sounds, and to non-linguistic information such as emotion. It improves accuracy in speech recognition and abnormal-sound detection.

The method is described further below, following the more detailed flowchart of FIG. 2. RP analysis ends when the "RP analysis" block of FIG. 2 outputs a multi-valued image; the steps up to this point correspond to the signal analysis unit. Image processing is then applied to the image generated by the signal analysis unit. The binarization processing unit, which performs mode-threshold-selection binarization, is part of this image processing.

I. Signal analysis unit
The signal analysis unit analyzes a one-dimensional time-series signal, such as a speech signal, by non-stationary chaos analysis such as a recurrence plot, and generates a two-dimensional image.

1) Phase-space reconstruction procedure
A dynamical system is a system whose state at a given time is determined by differential equations or similar rules. Even when several state variables can be observed as time-series data, it is difficult to map each observed variable directly onto a variable of the dynamical system. The speech data used in this demonstration consist of a single time series of sound-pressure amplitude. If these data come from a dynamical system, they can be converted into multidimensional coordinates according to a fixed rule; this conversion is called reconstruction. Reconstruction makes it possible to grasp the behavior of the dynamical system in a higher-dimensional space. The space into which the observed time series is reconstructed is called the phase space or the delay-coordinate system.

Consider reconstructing a trajectory in a d-dimensional phase space from the time series of one state variable of a dynamical system with m degrees of freedom. To do so, a d-dimensional vector is formed from d values of that variable spaced by a suitable time delay τ. Denoting the time series by x(n), where n is time, the delay coordinates are defined by equation (1).
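The image of equation (1) is missing from this text. The sketch below assumes the common forward form X(n) = (x(n), x(n+τ), ..., x(n+(d-1)τ)); some authors use backward delays instead. The function name `delay_embed` is hypothetical.

```python
import numpy as np

def delay_embed(x, d, tau):
    """Return the delay vectors X(n) as rows of an (N, d) array.

    Assumes the forward form X(n) = (x(n), x(n + tau), ..., x(n + (d - 1) tau)).
    """
    x = np.asarray(x, dtype=float)
    n_vec = len(x) - (d - 1) * tau  # number of complete delay vectors
    if n_vec <= 0:
        raise ValueError("series too short for this (d, tau)")
    # Column k holds x(n + k * tau) for n = 0 .. n_vec - 1.
    return np.stack([x[k * tau:k * tau + n_vec] for k in range(d)], axis=1)
```

For example, `delay_embed(range(6), 3, 2)` yields the two vectors (0, 2, 4) and (1, 3, 5).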

The dimension d and the time delay τ are called the embedding parameters. Suppose the fractal dimension of the trajectory is m. Takens' embedding theorem then guarantees that an attractor reconstructed in a phase space of dimension greater than 2m+1 is an embedding of the original attractor. When the reconstructed trajectory is an embedding of the original trajectory, the features of the original state variable x(n) are topologically preserved in the attractor of the vectors X(n) in the reconstructed phase space. The two are then said to be diffeomorphic.

2) Estimation of the embedding parameters
To construct the delay coordinates defined by equation (1), the time delay τ and the embedding dimension d must be estimated. The method used in this demonstration to estimate the embedding dimension is described in 2-a), and the method used to estimate the time delay in 2-b).

2-a) Estimation of the embedding dimension by the false nearest neighbor (FNN) method
Some points are neighbors in a space reconstructed at one dimension but move apart when the dimension is changed. The FNN method calls such points false nearest neighbors (FNNs), and takes as the embedding dimension the dimension at which their number approaches zero. Two criteria are used to decide whether points are neighbors.

(First criterion)
As in equation (1), the time series x(n) is reconstructed in an arbitrary d-dimensional phase space. The Euclidean distance from an arbitrary point X(n) on the trajectory to its r-th nearest neighbor X^(r)(n) is denoted R_d(n, r); this gives equation (2).
Let R_{d+1}(n, r) be the same distance after the embedding is extended to dimension d+1; this gives equation (3).

The first criterion compares, for every point on the trajectory, the relative change in this distance when the embedding dimension grows from d to d+1 against a threshold R_tol [equation (4)].

(Second criterion)
Let R_A denote the attractor size. An r-th nearest neighbor that satisfies equation (5) is defined as a false nearest neighbor under the second criterion.
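The images of equations (2) to (5) did not survive in this text. For reference, the criteria described above appear to follow the widely used formulation of Kennel et al., sketched below. Three parts of this sketch are assumptions rather than content of this patent: the forward delay form, the standard-deviation form of the attractor size R_A, and the second tolerance A_tol, which is commonly set to 2.

```latex
\begin{align}
R_d^2(n,r) &= \sum_{k=0}^{d-1}\bigl[x(n+k\tau) - x^{(r)}(n+k\tau)\bigr]^2 \\
R_{d+1}^2(n,r) &= R_d^2(n,r) + \bigl[x(n+d\tau) - x^{(r)}(n+d\tau)\bigr]^2 \\
\frac{\lvert x(n+d\tau) - x^{(r)}(n+d\tau)\rvert}{R_d(n,r)} &> R_{\mathrm{tol}} \\
R_A^2 = \frac{1}{N}\sum_{n=1}^{N}\bigl[x(n) - \bar{x}\bigr]^2, &\qquad \frac{R_{d+1}(n,r)}{R_A} > A_{\mathrm{tol}}
\end{align}
```

Here x^{(r)}(n+kτ) denotes the k-th coordinate of the r-th nearest neighbor of X(n). The left-hand side of the third line equals the square root of (R_{d+1}^2 - R_d^2)/R_d^2.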

If the system is deterministic, the false nearest neighbor ratio (FNNR) obtained with these two criteria decreases as the embedding dimension d increases. The embedding dimension at which the FNNR falls to zero is therefore taken as the appropriate embedding dimension. Kennel showed through numerical experiments that a threshold of R_tol ≥ 10 lets the number of FNNs be evaluated quantitatively. The embedding dimension can then be estimated objectively.

2-b) Estimation of the time delay from mutual information
The choice of time delay matters when constructing delay coordinates. If the delay is too short, successive coordinates are strongly correlated and the attractor collapses in the delay-coordinate system. If it is too long, the correlation is lost and the trajectory appears random. In either case, many nonlinear statistics derived from the attractor become unreliable, and judging chaoticity from them is meaningless. Finding an appropriate time delay is therefore very important. It is commonly estimated from the mutual information, a quantity closely tied to the delay. Let P(x(n)) be the probability of occurrence of the time-series value x(n). The mutual information M of the attractor in the delay-coordinate system is then defined by equation (6).

The mutual information M depends on the statistics of the state variable. A random signal such as white noise gives M = 0, meaning no correlation; conversely, M = ∞ corresponds to a completely correlated signal. The preferred time delay is the one at which the mutual information, plotted against the delay, reaches its first local minimum. In cases where the mutual information decreased clearly and monotonically with respect to the dimension, the delay at which its slope became 1/e was used instead.
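The image of equation (6) is missing. The sketch below assumes the usual lagged form M(τ) = Σ P(x(n), x(n+τ)) log2 [P(x(n), x(n+τ)) / (P(x(n)) P(x(n+τ)))]. It estimates the probabilities with a 16-bin 2-D histogram; the bin count is an arbitrary choice. It then picks the first local minimum over τ = 1 .. max_tau. The 1/e fallback for monotone curves is not implemented. The names `mutual_information` and `first_minimum_delay` are hypothetical.

```python
import numpy as np

def mutual_information(x, tau, bins=16):
    """Histogram estimate (in bits) of the mutual information between x(n) and x(n + tau)."""
    x = np.asarray(x, dtype=float)
    joint, _, _ = np.histogram2d(x[:-tau], x[tau:], bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)      # marginal of x(n), shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)      # marginal of x(n + tau), shape (1, bins)
    nz = pxy > 0                             # skip empty cells (0 * log 0 = 0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

def first_minimum_delay(x, max_tau=50, bins=16):
    """Return (tau at the first local minimum of M(tau), list of M values)."""
    mi = [mutual_information(x, t, bins) for t in range(1, max_tau + 1)]
    for k in range(1, len(mi) - 1):
        if mi[k] < mi[k - 1] and mi[k] <= mi[k + 1]:
            return k + 1, mi                 # list index k corresponds to tau = k + 1
    return None, mi
```

For a sinusoid, the first minimum falls near a quarter of the period, where x(n) and x(n+τ) are least redundant.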

2-c) Joint estimation procedure for the embedding parameters
The embedding dimension d and the time delay τ depend on each other. The following calculations are therefore repeated until both parameters converge, yielding an equilibrium solution. The procedure is as follows.

[τ → d]
For each vowel, q suitable time delays are given; here the upper limit of q is set to 10. FNN analysis is performed for each of the q delays in the range 1 ≤ τ ≤ q, which yields one embedding dimension per τ. This set of q embedding dimensions is called the dimension sequence.

[d → τ]
Next, the time delay is estimated from the mutual information of the time series for each of r embedding dimensions. Like q above, r denotes the number of embedding-dimension values considered. The mutual information is computed for each embedding dimension in the range 1 ≤ d ≤ r. The delay at its first local minimum is taken as the time delay, which gives r delay values. This set of delays is called the delay sequence.

[d ⇔ τ]
Assume that the embedding parameters are uniquely determined if the system is deterministic.
The dimension sequence and the delay sequence then intersect on the d-τ plane, and the intersection gives the optimal values.

When there are several intersections, the one closest to the origin of the d-τ plane is selected, so that both d and τ are small. This choice reflects two facts: the FNN method estimates the minimum embedding dimension of the system, and the delay is taken from the first local minimum of the mutual information as the dimension is varied.

Several caveats apply. The mutual information often shows several local minima, and identifying them can be difficult; in such cases, the plausible delay values span a wide range. Moreover, the first local minimum is not necessarily the best choice for preserving the nonlinear correlation of the phase-space trajectory. For example, the first and second minima may take the same value at widely separated values of τ. The two candidate delays then differ greatly, and this difference can change the judgment of chaoticity. The target system may also be inherently stochastic and unsuited to chaos analysis, so the delay must be computed with care. Nevertheless, common practice is to use the first local minimum, which gives the smallest delay.
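The intersection rule can be sketched under the assumption that both sequences are lists of integer points on the d-τ plane. The dimension sequence gives (d(τ), τ) and the delay sequence gives (d, τ(d)). An "intersection" is taken here to mean a point that appears in both sequences. Among the intersections, the one with the smallest Euclidean distance to the origin is chosen. The outer loop that repeats the estimation until convergence is not shown. The name `select_embedding` is hypothetical.

```python
import math

def select_embedding(dim_seq, delay_seq):
    """dim_seq: {tau: d} from FNN analysis; delay_seq: {d: tau} from mutual information.

    Returns the (d, tau) intersection closest to the origin, or None if the
    two sequences share no point.
    """
    curve_a = {(d, tau) for tau, d in dim_seq.items()}    # dimension sequence
    curve_b = {(d, tau) for d, tau in delay_seq.items()}  # delay sequence
    common = curve_a & curve_b
    if not common:
        return None
    # Smallest distance from the origin; ties broken by (d, tau) order.
    return min(common, key=lambda p: (math.hypot(*p), p))
```

For example, with the dimension sequence {1: 6, 2: 5, 3: 4, 4: 3} and the delay sequence {3: 4, 5: 2, 6: 1}, the common points are (6, 1), (5, 2), and (3, 4). The function returns (3, 4).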

3) RP analysis
The RP method takes the Euclidean norm (the length of a vector in several dimensions, hereafter simply the norm) between arbitrary position vectors on the trajectory. It uses this norm directly as the pixel intensity of a two-dimensional image. RP can therefore be applied regardless of how high-dimensional the trajectory is. The choice of position vectors is arbitrary in general; in this demonstration, the vectors are indexed by time. RP analysis visualizes this temporal change as a spatial arrangement in the image, so one can see at a glance where on the trajectory a change occurs.

Let N be the number of time-series points reconstructed in the phase space. FIG. 3 shows an N × N RP plane. Each grid point (i, j) on this plane is colored according to the norm (distance) between the displacement vectors v_i and v_j, which make up the phase-space trajectory shown on the left of the figure. Here i and j are element indices on the RP plane; they are the start-point and end-point indices of the vectors whose norm is computed.

The computation proceeds as follows. A start index i is fixed on the RP plane (i = o in FIG. 3). The norm is then computed for each end index from j = 1 to j = N, the direction in which time advances. When the start point moves to i = o + 1, the start index shifts by one column in the i direction, and the computation along j is repeated. The time evolution n of the trajectory thus advances stepwise on the RP plane: from front to back in the j direction, and from left to right in the i direction. In this demonstration, all pairs were compared exhaustively (round-robin), excluding each point paired with itself (i = j).

The pixel value D(i, j) is determined from the computed norm. The range between the minimum and maximum norms of the trajectory is divided into classes of fixed width, and norms belonging to class k are assigned the color r(k) [equation (7)]. In equation (7), i ≤ j, 2 ≤ k < r, and δ_k = 0 when k = 0.
Here k is the color index and r(k) is the color assigned to class k. δ_k is called the resolution parameter; it is the class boundary for color index k and is given by equation (8).
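The images of equations (7) and (8) are missing. The sketch below therefore assumes uniform class boundaries δ_k = min + k·(max - min)/r. Under that assumption, each off-diagonal pixel receives the class index floor((|v_i - v_j| - min)/width), clipped to 0 .. r-1, where width = (max - min)/r. Index 0 (black) then corresponds to the minimum norm and r-1 (white) to the maximum. The excluded diagonal i = j is marked with -1. The name `recurrence_image` is hypothetical.

```python
import numpy as np

def recurrence_image(X, r=256):
    """Multi-valued RP image: class index of |v_i - v_j| for every pair (i, j).

    X is an (N, d) array of delay vectors. Uniform class widths are assumed.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean norms between all trajectory points.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    off = ~np.eye(len(X), dtype=bool)        # exclude i == j, as in the text
    lo, hi = dist[off].min(), dist[off].max()
    width = (hi - lo) / r if hi > lo else 1.0
    k = np.clip(np.floor((dist - lo) / width).astype(int), 0, r - 1)
    k[~off] = -1                             # mark the excluded diagonal
    return k
```

For the one-dimensional points 0, 1, and 3 with r = 4, the pairwise norms 1, 2, and 3 map to classes 0, 2, and 3.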

3-a) Constraint on the resolution parameter δ_k
The minimum norm min|v_i - v_j| is shown in black and the maximum norm max|v_i - v_j| in white, with intermediate norms rendered as a gradation. The closer a point's color is to white, the larger the norm between the corresponding displacement vectors on the trajectory; the closer to black, the smaller. Switching from a grayscale to an RGB scale, depending on the display purpose, widens the range of visualization. Image processing applied to the resulting image also enables more detailed feature extraction.

A very small δ_k lets the image capture slight changes in the displacement vectors along the trajectory. However, δ_k has a lower bound: it cannot be made smaller than 1/2^B, where B is the number of quantization bits. In theory, δ_k could be made infinitesimally small for a trajectory that is continuous in time. The data in this demonstration, however, are discrete, so the phase-space trajectory is also a set of discretized displacement vectors. The class width used to divide the norms they span therefore cannot go below 1/2^B. The RGB scale offers 3 × 8 = 24 bits, while the speech data are quantized to 16 bits. The color resolution available for δ is therefore finer than the quantization of the data.

3-b) Multiresolution analysis
Multiresolution analysis varies the class width δ continuously so that norms are classified from small to large scales. In equation (7), a frequency distribution is computed for k up to an upper limit r on the number of colors. With r = 256, for example, one image quantized into 256 levels is produced, and one color-frequency distribution (histogram) is computed from it. For convenience, an image with a particular resolution is called a "level." If the upper limit is changed from r = 256 to r = 128, the generated image is quantized into 128 levels and one histogram for r = 128 is computed. The two levels r = 256 and r = 128 thus yield two frequency distributions, and w levels yield w histograms. Performing RP analysis repeatedly in this way, while changing the resolution of the histogram h over w levels, is called RP multiresolution analysis. Dividing the histogram h into w levels is written f^(w) = h(k), where k is the band parameter.

For example, f^(w=2) is a binary distribution with resolution r = 2, f^(w=3) is a ternary distribution with r = 3, and f^(w=256) is a 256-level distribution. RP analysis can thus compute pixel-intensity distributions from RP images at various resolutions. It can then track how the most frequent value (mode) of each distribution changes across levels; this change is hereafter called the mode transition.
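The multiresolution step can be sketched by re-quantizing the same off-diagonal norms at several upper limits r and recording each histogram's mode. The level set (2, 4, ..., 256) and the uniform class widths are assumptions, not values from the text. The name `mode_transition` is hypothetical.

```python
import numpy as np

def mode_transition(dist, levels=(2, 4, 8, 16, 32, 64, 128, 256)):
    """For each level r, return (histogram h(k), mode) of the off-diagonal norms.

    dist is an N x N matrix of pairwise norms |v_i - v_j|.
    """
    dist = np.asarray(dist, dtype=float)
    off = ~np.eye(len(dist), dtype=bool)     # exclude i == j
    vals = dist[off]
    lo, hi = vals.min(), vals.max()
    out = {}
    for r in levels:
        width = (hi - lo) / r if hi > lo else 1.0
        k = np.clip(np.floor((vals - lo) / width).astype(int), 0, r - 1)
        h = np.bincount(k, minlength=r)      # histogram f^(w) = h(k) at this level
        out[r] = (h, int(np.argmax(h)))      # most frequent class (mode)
    return out
```

Plotting the mode (for example, normalized by r) against the level shows where the mode transition settles.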

II. Binarization processing unit (image processing)
The binarization processing unit calculates a histogram of the two-dimensional image generated by the signal analysis unit. It then converts the image into binary information using a threshold selected from the neighborhood of the histogram's mode.

Mode threshold selection binarization method
RP multi-resolution decomposition generates w images, and these yield w histograms. When the dynamics of the original time-series data are deterministic, the histograms are empirically convex upward. When determinism is weak and stochastic randomness is mixed in, the histograms tend to be flat. Deterministic chaos analysis exploits this convex shape and searches for the mode, that is, the peak value of the histogram. Because the histogram counts pixel intensities, the peak corresponds to the color number that occurs most often in the image. The peak never consists of a single color number r alone, so the color numbers just before and after it can easily be selected within a plus/minus range, with the peak as the boundary. For example, if the peak is at r and the selection range is 3, the color numbers are narrowed down to r ± 3. The colors in the minus range (r-1, r-2, r-3) are then replaced with white, and the colors in the plus range (r+1, r+2, r+3) with black, converting the image into a binary image. This step is based on observing how the multi-level images produced by multi-resolution decomposition behave. Convergence of the mode transition across the histograms indicates saturation: beyond a certain resolution, further increasing the number of colors changes nothing, because the distance relations of the attractor (its dynamical characteristics) are already sufficiently expressed by the number of colors at that point.
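The binarization rule above can be sketched as follows. The text specifies only the two ranges around the peak. It does not say what happens to pixels at the mode itself or to pixels outside r ± d, so this sketch assigns them the value `other`, which is an assumption. Encoding white as 0 and black as 1 is also an assumption, and the function name is hypothetical.

```python
def mode_threshold_binarize(image, histogram, d=3, other=0):
    """Binarize an r-level image around the mode of its histogram.

    Levels mode-d .. mode-1 -> 0 (white); levels mode+1 .. mode+d -> 1 (black).
    Assumption: the mode level itself and levels outside mode +/- d, which the
    text does not cover, are set to `other`.
    Returns (binary_image, mode).
    """
    peak = max(range(len(histogram)), key=lambda k: histogram[k])

    def to_bit(c):
        if peak - d <= c < peak:
            return 0  # minus range -> white
        if peak < c <= peak + d:
            return 1  # plus range -> black
        return other  # unspecified in the text

    return [[to_bit(c) for c in row] for row in image], peak
```

For example, with a 10-level histogram peaking at level 4 and d = 3, levels 1 to 3 become white and levels 5 to 7 become black.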

At this point, one histogram is taken out. The colors just before and after its mode correspond to the norm class in which the distance between two points scattered in the phase space occurs most frequently. In other words, the two-point distances in the phase space indicated by the values around the mode form the class that contributes most to characterizing the geometric structure of the attractor. This operation extracts only the essential information from the norm class best suited to characterizing an attractor that carries a great deal of complex information, and it rebuilds a binary image from that information. The information held by this image can therefore be the most promising candidate for characterizing the dynamical system of the speech.
The audio signal can then finally be discriminated and recognized by applying pattern recognition to this image.

III Parameter generation unit
The parameter generation unit calculates higher-order local autocorrelation coefficients from the binary image information generated by the binarization processing unit and uses them to extract features.

Higher-order Local Auto-Correlation (HLAC)
HLAC extends the conventional translation-invariant autocorrelation function to higher orders. It is given by equation (9) and is called the N-th order autocorrelation function, where N is the order of the correlation.
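Equation (9) is not reproduced in this text. For reference, the N-th order autocorrelation function as it is usually written in the HLAC literature is shown below. The patent's equation (9) is presumably of this form, but that is an assumption. Here $\mathbf{a}_1, \dots, \mathbf{a}_N$ are displacement vectors, and for a digital image the integral becomes a sum over pixel positions $\mathbf{r}$:

$$
x^{N}(\mathbf{a}_1, \dots, \mathbf{a}_N) = \int f(\mathbf{r})\, f(\mathbf{r} + \mathbf{a}_1) \cdots f(\mathbf{r} + \mathbf{a}_N)\, d\mathbf{r}
\;\approx\; \sum_{\mathbf{r}} f(\mathbf{r})\, f(\mathbf{r} + \mathbf{a}_1) \cdots f(\mathbf{r} + \mathbf{a}_N)
$$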
Here, f(r) is taken to be not the one-dimensional speech time series but the image obtained from the RP image, and autocorrelation features are extracted from that image. The calculation is restricted to local 3 × 3 pixel regions of the image and uses higher-order local autocorrelation up to order N = 2. The arrangement of pixels within the local region is called a mask pattern. A binary image has 25 mask patterns and a 256-level grayscale image has 35. For each mask pattern, the products of the corresponding pixels are summed over the entire image. FIG. 4 shows the 35 mask patterns for grayscale images up to correlation order N = 2; the numbers in the grid cells indicate the power to which each pixel value is raised.
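The mask-pattern counts quoted above can be reproduced with a short sketch. Each pattern is the point set {0, a1, ..., aN} with every displacement inside the 3 × 3 neighborhood and N ≤ 2, deduplicated up to translation. For a binary image, f² = f, so repeated points collapse (set semantics) and 25 patterns remain. For a grayscale image, repeated points mean powers of the pixel value (multiset semantics) and 35 remain. The feature for each pattern is the sum over the image of the product of the pixels under it. The boundary handling (summing only over positions where the pattern fits inside the image) and all function names are assumptions of this sketch.

```python
from itertools import product

# All displacements within the 3x3 neighborhood of the reference pixel.
NEIGHBORHOOD = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]

def canonical(points):
    """Translate a pattern so its minimum row and column are 0, then sort it."""
    r0 = min(p[0] for p in points)
    c0 = min(p[1] for p in points)
    return tuple(sorted((r - r0, c - c0) for r, c in points))

def mask_patterns(max_order=2, binary=True):
    """Distinct mask patterns {0, a1, ..., aN}, N <= max_order, up to translation."""
    patterns = set()
    for n in range(max_order + 1):
        for disp in product(NEIGHBORHOOD, repeat=n):
            pts = [(0, 0), *disp]
            if binary:
                pts = list(set(pts))  # f**k == f for a binary image
            patterns.add(canonical(pts))
    return sorted(patterns, key=lambda p: (len(p), p))

def hlac_features(image, patterns):
    """For each pattern, sum over the image of the product of the pixels under it.

    Assumption: only positions where the whole pattern fits inside the image are used.
    """
    h, w = len(image), len(image[0])
    feats = []
    for pat in patterns:
        max_r = max(p[0] for p in pat)
        max_c = max(p[1] for p in pat)
        total = 0
        for i in range(h - max_r):
            for j in range(w - max_c):
                prod = 1
                for dr, dc in pat:
                    prod *= image[i + dr][j + dc]  # repeated points act as powers
                total += prod
        feats.append(total)
    return feats
```

`len(mask_patterns(binary=True))` gives 25 and `len(mask_patterns(binary=False))` gives 35, matching the counts in the text. Applying `hlac_features` to the binarized RP image yields a 25-dimensional feature vector.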

A diagram showing the schematic configuration of the feature extraction apparatus of the present invention.
A flowchart showing the feature extraction procedure of the present invention in more concrete form.
A diagram explaining RP analysis.
A diagram illustrating the 35 mask patterns for grayscale images.

Claims

A feature extraction method comprising: reconstructing a one-dimensional time-series signal into a phase space of arbitrary dimension; analyzing the data in the phase space based on non-stationary chaos analysis and generating a two-dimensional image by replacing the norms between the displacement vectors constituting the trajectory in the phase space with pixel intensities of the two-dimensional image; converting the two-dimensional image information into binary information using a threshold obtained by calculating a histogram of the two-dimensional image; and extracting features by calculating higher-order local autocorrelation coefficients through the application of mask patterns to the binary image information generated by the conversion.

The feature extraction method according to claim 1, wherein the one-dimensional time-series signal is an audio signal or an acoustic signal, and the non-stationary chaos analysis is performed by a recurrence plot method.

A feature extraction apparatus comprising:
a signal analysis unit that reconstructs a one-dimensional time-series signal into a phase space of arbitrary dimension, analyzes the data in the phase space based on non-stationary chaos analysis, and generates a two-dimensional image by replacing the norms between the displacement vectors constituting the trajectory in the phase space with pixel intensities of the two-dimensional image;
a binarization processing unit that converts the two-dimensional image information into binary information using a threshold obtained by calculating a histogram of the two-dimensional image; and
a parameter generation unit that calculates higher-order local autocorrelation coefficients by applying mask patterns to the binary image information generated by the conversion, and extracts features.

The feature extraction apparatus according to claim 3, wherein the one-dimensional time-series signal is an audio signal or an acoustic signal, and the non-stationary chaos analysis is performed by a recurrence plot method.

A feature extraction program that causes a computer to execute:
a procedure of reconstructing a one-dimensional time-series signal into a phase space of arbitrary dimension, analyzing the data in the phase space based on non-stationary chaos analysis, and generating a two-dimensional image by replacing the norms between the displacement vectors constituting the trajectory in the phase space with pixel intensities of the two-dimensional image;
a procedure of converting the two-dimensional image information thus generated into binary information using a threshold obtained by calculating a histogram of the two-dimensional image; and
a procedure of calculating higher-order local autocorrelation coefficients by applying mask patterns to the binary image information generated by the conversion, and extracting features.

The feature extraction program according to claim 5, wherein the one-dimensional time-series signal is an audio signal or an acoustic signal, and the non-stationary chaos analysis is performed by a recurrence plot method.