CN103761965A

CN103761965A - Method for classifying musical instrument signals

Info

Publication number: CN103761965A
Application number: CN201410008533.4A
Authority: CN
Inventors: 郭一娜; 王志社; 郅逍遥; 王晓梅; 李临生
Original assignee: Taiyuan University of Science and Technology
Current assignee: Taiyuan University of Science and Technology
Priority date: 2014-01-09
Filing date: 2014-01-09
Publication date: 2014-04-30
Anticipated expiration: 2034-01-09
Also published as: CN103761965B

Abstract

The invention discloses a method for classifying musical instrument signals and belongs to the technical field of electronic information. The method uses a phase-space reconstruction module, a principal component analysis module, a feature extraction module and a flexible neural tree module. It is characterized by the following steps. First, phase-space reconstruction is performed on the time series produced by different musical instrument sample signals. Second, redundant information is removed through principal component analysis to reduce dimensionality. Third, the features of each instrument are analyzed, and the differences between instruments in phase space are characterized by the probability density function. Finally, a flexible neural tree model is used as the classifier. The method effectively addresses the strong dependence of artificial neural networks on a predefined network structure, and the classification accuracy for a single instrument reaches up to 98.7 percent.

Description

Method for classifying musical instrument signals

Technical field

The invention belongs to the field of electronic information technology and specifically relates to a method for classifying musical instrument signals.

Background art

Music is an indispensable part of daily life, yet the only way to take in musical information is to listen in real time and pick out the content of interest. As science and technology develop, the amount of information available to people is growing rapidly, so finding music of interest by real-time listening within massive amounts of data has become difficult. In recent years, music data analysis and retrieval have become a research hotspot both in China and abroad.

Most music is expressed through instrumental performance, and the same piece played on different instruments gives listeners a different auditory experience. In the field of computer recognition, as signal analysis and data mining have developed, many interrelated techniques such as time-frequency analysis and neural networks have gradually been applied to musical instrument classification.

Effective feature extraction and classifier selection have always been the key steps in musical instrument classification. Mel-frequency cepstral coefficient (MFCC) features are conventionally used for speech signals, but they have also given excellent results in classifying music genres and instrument signals. MPEG-7, a standardized framework for describing and extracting audio features, is essential for feature extraction from music signals and works particularly well for similar music signals. Besides the many acoustic features, classifiers have also been studied extensively in audio classification, such as neural networks, Gaussian mixture models, hidden Markov models, Bayesian decision classifiers and support vector machines. Some researchers have used single-channel speech separation with Gaussian modeling to make it possible to separate two or more instruments. Others have analyzed and identified instrument signals with amplitude-modulation/frequency-modulation (AM/FM) features, reaching an accuracy of 70%. Still others have extracted features with MFCCs, the constant-Q transform and the autocorrelation function and, combined with a Bayesian decision classifier, achieved recognition accuracies of 79%-84%. However, in existing work the classification accuracy for some sound signals is generally low, which indicates that the above acoustic features do not capture the differences between all signals well; moreover, parameter selection and operation for most classifiers are complicated and tedious.

Summary of the invention

The object of the invention is to provide a method for classifying musical instrument signals that effectively overcomes the shortcomings of the prior art.

The invention is realized as follows: a phase-space reconstruction module, a principal component analysis (PCA) module, a feature extraction module and a flexible neural tree module are used to extract features from various instrument signals, which are then classified by a classifier. The implementation steps are:

Step 1: perform phase-space reconstruction on the collected instrument sample signals, mainly by determining the delay time and the embedding dimension of each sample signal, as follows:

Let the one-dimensional time series of an instrument signal sample be x = (x_1, x_2, ..., x_K)^T and let the delay time be τ. An m-dimensional embedding space Y is constructed and x is mapped into it; the reconstructed phase-space vectors are:

Y = (y_1, y_2, \ldots, y_n) = \begin{pmatrix} x(1) & x(2) & \cdots & x(n) \\ x(1+\tau) & x(2+\tau) & \cdots & x(n+\tau) \\ \vdots & \vdots & \ddots & \vdots \\ x(1+(m-1)\tau) & x(2+(m-1)\tau) & \cdots & x(n+(m-1)\tau) \end{pmatrix} \quad (1)

where K is the length of the audio time series x, and the number of phase vectors in the phase space is n = K - (m - 1)τ.
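The delay embedding of Eq. (1) can be sketched in Python with NumPy as below. This is an illustrative sketch, not the authors' Matlab code; the function name `delay_embed` is hypothetical, and indices are 0-based, so column j holds x(j), x(j+τ), ..., x(j+(m-1)τ).

```python
import numpy as np

def delay_embed(x, m, tau):
    """Build the m x n phase-space matrix of Eq. (1).

    Column j (0-based) is [x(j), x(j+tau), ..., x(j+(m-1)tau)],
    and n = K - (m - 1) * tau where K = len(x).
    """
    x = np.asarray(x, dtype=float)
    K = x.shape[0]
    n = K - (m - 1) * tau
    if n <= 0:
        raise ValueError("series too short for this (m, tau)")
    # Row r holds the series shifted by r * tau, truncated to n samples.
    return np.stack([x[r * tau : r * tau + n] for r in range(m)])

if __name__ == "__main__":
    A = delay_embed(np.arange(10), m=3, tau=2)
    print(A.shape)   # (3, 6)
    print(A[:, 0])   # [0. 2. 4.]
```

Each column is one phase point y_j, so the matrix can be passed directly to the PCA step, which works on the m × n matrix A.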

(1) Determining the optimal delay time τ by the average mutual information method

The average mutual information method is used to determine the optimal delay time τ:

I(\tau) = \sum_{n=1}^{N} P(x_n, x_{n+\tau}) \log_2 \frac{P(x_n, x_{n+\tau})}{P(x_n)\, P(x_{n+\tau})} \quad (2)

where P(x_n, x_{n+τ}) is the joint probability distribution in the reconstructed plot, and P(x_n) and P(x_{n+τ}) are the marginal probability distributions. The τ at the first local minimum of I(τ) is chosen as the optimal delay time.
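A minimal sketch of Eq. (2) and the first-local-minimum rule follows. The text does not say how the probabilities are estimated, so the 2-D histogram estimator, the `bins` value, the search range `max_tau` and both function names are assumptions.

```python
import numpy as np

def average_mutual_information(x, tau, bins=16):
    """Histogram estimate of I(tau) in Eq. (2), in bits (tau >= 1)."""
    x = np.asarray(x, dtype=float)
    a, b = x[:-tau], x[tau:]                  # pairs (x_n, x_{n+tau})
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab = joint / joint.sum()                # joint probability P(x_n, x_{n+tau})
    p_a = p_ab.sum(axis=1)                    # marginal P(x_n)
    p_b = p_ab.sum(axis=0)                    # marginal P(x_{n+tau})
    nz = p_ab > 0                             # empty cells contribute 0 * log 0 = 0
    ratio = p_ab[nz] / np.outer(p_a, p_b)[nz]
    return float(np.sum(p_ab[nz] * np.log2(ratio)))

def first_local_minimum_delay(x, max_tau=50, bins=16):
    """Return the first tau at which I(tau) has a local minimum."""
    ami = [average_mutual_information(x, t, bins) for t in range(1, max_tau + 1)]
    for k in range(1, len(ami) - 1):
        if ami[k] < ami[k - 1] and ami[k] <= ami[k + 1]:
            return k + 1                      # list index k corresponds to tau = k + 1
    return int(np.argmin(ami)) + 1            # fallback: global minimum in the range
```

For example, `first_local_minimum_delay(signal)` would return the candidate τ for one 2000-point sample; the result depends on the chosen bin count.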

(2) Determining the embedding dimension m by the false nearest neighbors method

In the m-dimensional phase space, every phase vector y_n(i) = {x(i), x(i+τ), ..., x[i+(m-1)τ]} has a nearest neighbor y_n^{NN}(i) at some distance, and the distance between the two points is:

D_m(i) = \left\| y_n(i) - y_n^{NN}(i) \right\| \quad (3)

D_m(i) is a small quantity. When the embedding dimension of the phase space increases from m to m + 1, the distance between these two phase points changes. If the neighbor distance in this case is D_{m+1}(i), then:

D_{m+1}^2(i) = D_m^2(i) + \left| x(i+\tau m) - x^{NN}(i+\tau m) \right|^2 \quad (4)

A threshold is used to decide which points are false nearest neighbors. Let

\Delta D_m(i, m) = \sqrt{\frac{D_{m+1}^2(i) - D_m^2(i)}{D_m^2(i)}} = \frac{\left| x(i+\tau m) - x^{NN}(i+\tau m) \right|}{D_m(i)} \quad (5)

where ΔD_m(i, m) is the relative change in the distance to the nearest neighbor in m dimensions when the dimension increases to m + 1.
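The second equality in Eq. (5) follows from Eq. (4) once the added term is read as the squared difference of the new coordinate. Going from m to m + 1 dimensions appends only x(i + τm) to each phase vector, and the same neighbor is kept:

```latex
\begin{aligned}
D_{m+1}^2(i) - D_m^2(i) &= \left| x(i+\tau m) - x^{NN}(i+\tau m) \right|^2 \\
\frac{D_{m+1}^2(i) - D_m^2(i)}{D_m^2(i)} &= \frac{\left| x(i+\tau m) - x^{NN}(i+\tau m) \right|^2}{D_m^2(i)} \\
\Delta D_m(i, m) &= \frac{\left| x(i+\tau m) - x^{NN}(i+\tau m) \right|}{D_m(i)}
\end{aligned}
```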

If ΔD_m(i, m) ≥ 10%, the neighbor is false. Each phase vector in the m-dimensional space is checked in turn to see whether its nearest neighbor is false, and the proportion of false nearest neighbors among all phase vectors is computed. When this proportion falls below 5%, the phase-space trajectory is considered fully unfolded, and the current m is the optimal embedding dimension.
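A brute-force NumPy sketch of the false nearest neighbors test described above follows. The 10% threshold and the 5% ratio limit come from the text. The function names, the O(n²) neighbor search, skipping exact duplicate points, and testing only the phase vectors that still exist in m + 1 dimensions (so that x(i + τm) is defined) are all assumptions.

```python
import numpy as np

def false_neighbor_ratio(x, m, tau, threshold=0.10):
    """Fraction of phase vectors whose nearest neighbour in m dims is false.

    A neighbour is false when |x(i+tau*m) - x_NN(i+tau*m)| / D_m(i) >= threshold,
    which is the criterion of Eq. (5).
    """
    x = np.asarray(x, dtype=float)
    n = len(x) - m * tau                        # vectors that still exist in m+1 dims
    Y = np.stack([x[r * tau : r * tau + n] for r in range(m)], axis=1)  # n x m
    false = 0
    for i in range(n):
        d = np.linalg.norm(Y - Y[i], axis=1)    # Euclidean distances to all vectors
        d[i] = np.inf                           # exclude the point itself
        j = int(np.argmin(d))                   # nearest neighbour y^NN(i)
        if d[j] == 0:
            continue                            # duplicate point: ratio undefined, skipped
        extra = abs(x[i + m * tau] - x[j + m * tau])
        if extra / d[j] >= threshold:
            false += 1
    return false / n

def embedding_dimension(x, tau, max_m=10, threshold=0.10, ratio_limit=0.05):
    """Smallest m whose false-neighbour ratio is below ratio_limit."""
    for m in range(1, max_m + 1):
        if false_neighbor_ratio(x, m, tau, threshold) < ratio_limit:
            return m
    return max_m
```

For a 2000-point sample the quadratic search is still fast; a k-d tree would be the usual replacement for longer signals.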

(3) Phase-space reconstruction

After the delay time τ and the embedding dimension m are determined, the original one-dimensional time series x = (x_1, x_2, ..., x_K)^T is delay-reconstructed into embedding vectors x_m = (x_t, x_{t-τ}, ..., x_{t-(m-1)τ})^T. Each such vector determines a point in the m-dimensional embedding space Y, and taking the delays at successive time points produces a series of such points in the m-dimensional space. Together they form the m × n phase-space reconstruction matrix A, which reflects the information of the original system.

Step 2: perform principal component analysis on the reconstructed instrument sample signals to eliminate redundant information and reduce dimensionality:

R = E(AA^T) and RV = VΛ, where A is the m × n matrix obtained from phase-space reconstruction; R is the autocorrelation matrix of the m components; V is the m × m eigenvector matrix of R, whose columns are the orthonormalized eigenvectors of R; and Λ is the diagonal eigenvalue matrix of R, with λ_i (i = 1, 2, ..., m) the i-th diagonal element. Construct m uncorrelated new variables Y = V^T A, Y = {y_1, y_2, ..., y_m}^T. Sort λ_i (i = 1, 2, ..., m) in descending order and take the eigenvectors corresponding to the p largest eigenvalues, which gives the p × n matrix B after dimensionality reduction, where p ≥ 3.
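A NumPy sketch of Step 2 follows. It keeps the text's use of the autocorrelation R = E(AA^T) without mean removal. Estimating the expectation as AA^T/n is an assumption, and `pca_reduce` is a hypothetical name.

```python
import numpy as np

def pca_reduce(A, p=3):
    """Project the m x n matrix A onto its p leading principal directions.

    R = E(A A^T) is estimated as A @ A.T / n, and RV = V Lambda is solved
    with numpy.linalg.eigh because R is symmetric.
    """
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    R = A @ A.T / n                        # m x m autocorrelation matrix
    eigvals, V = np.linalg.eigh(R)         # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]      # sort lambda_i in descending order
    V_p = V[:, order[:p]]                  # eigenvectors of the p largest eigenvalues
    B = V_p.T @ A                          # p x n reduced matrix (leading rows of Y = V^T A)
    return B, eigvals[order]
```

With the embodiment's 5-dimensional reconstruction, `pca_reduce(A, p=3)` would return a 3 × n matrix. Eigenvector signs are arbitrary, so B is defined only up to the sign of each row.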

Step 3: perform feature extraction on the matrix B obtained after PCA; the main quantities extracted are the feature values of the probability density function:

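Suppose the region G of the sample data is the smallest hypercube centered at x with side length h. A kernel function is first defined:

k(u) = \begin{cases} 1, & |u_j| \le \frac{1}{2},\ j = 1, 2, \ldots, m \\ 0, & \text{otherwise} \end{cases} \quad (6)

The required probability density function is: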

p(x) = \frac{1}{N} \sum_{n=1}^{N} \frac{1}{h^m}\, k\left(\frac{x - x_n}{h}\right) \quad (7)

where N is the number of sample points, n = 1, 2, ..., N, m is the data dimension, and k(u) is the kernel function defined above. This yields an m × n matrix, which is sampled row by row to form the m × c feature set H, where c < n.
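Eq. (7) can be sketched in NumPy as follows, assuming the unit-hypercube kernel implied by the cube of side h described above. Each column of the reduced matrix B is one sample vector x_n, so `samples` would be `B.T`. The text does not spell out how the m × n density matrix is assembled and then "sampled row by row", so `downsample_columns` is a hypothetical helper that keeps c evenly spaced columns of such a matrix. All function names are illustrative.

```python
import numpy as np

def hypercube_kernel(u):
    """k(u) = 1 if every |u_j| <= 1/2, else 0 (unit-hypercube window, assumed)."""
    return np.all(np.abs(u) <= 0.5, axis=-1).astype(float)

def parzen_pdf(points, samples, h):
    """Eq. (7): p(x) = (1/N) sum_n h^-m k((x - x_n) / h) at each row of points.

    points:  q x m array of evaluation locations
    samples: N x m array of observed vectors x_n
    """
    points = np.atleast_2d(np.asarray(points, dtype=float))
    samples = np.atleast_2d(np.asarray(samples, dtype=float))
    N, m = samples.shape
    u = (points[:, None, :] - samples[None, :, :]) / h   # q x N x m scaled offsets
    return hypercube_kernel(u).sum(axis=1) / (N * h ** m)

def downsample_columns(F, c):
    """Hypothetical reading of 'sampled row by row': keep c evenly spaced columns."""
    F = np.asarray(F)
    idx = np.linspace(0, F.shape[1] - 1, c).round().astype(int)
    return F[:, idx]
```

The broadcasted q × N × m offset array is manageable for a few thousand points (for example N = 1991, m = 3, h = 0.2 as in the embodiment); larger inputs would need chunking.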

Step 4: use a flexible neural tree model as the classifier of the instrument recognition module and train the network:

The flexible neural tree model is computed as follows: first, the initial structure of the flexible neural tree is set; then a fuzzy classification model is optimized and trained from the obtained feature set H and the preset initial structure. The model contains two types of instructions: flexible neuron instructions and terminal instructions. Flexible neuron instructions are the non-leaf nodes of the tree and connect their subtrees; terminal instructions are the individual input features. The function instruction set and the terminal instruction set are expressed as:

S = F \cup T = \{+_2, +_3, \ldots, +_N\} \cup \{z_1, z_2, \ldots, z_n\} \quad (8)

where +_i (i = 2, 3, ..., N) denotes a non-leaf node instruction with i inputs, that is, i input variables, and z_1, z_2, ..., z_n denote leaf node instructions, which take no input variables.

A Gaussian function is used as the flexible activation function:

f(j_i, b_i, z) = e^{-\left(\frac{z - j_i}{b_i}\right)^2} \quad (9)

where j_i and b_i are adjustable parameters generated randomly while the flexible neural tree is constructed. Besides the instruction set, the fitness function must also be determined in advance; the root-mean-square error is selected as the fitness function.
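The instruction set (8), the activation (9) and the RMSE fitness can be illustrated with a small Python sketch. It only evaluates a given tree and does not implement the search that optimizes the tree structure and parameters. Following the common flexible neural tree formulation, it assumes each flexible neuron applies Eq. (9) to a weighted sum of its children's outputs. The weights, the class names and the `evaluate`/`rmse` helpers are illustrative assumptions that the text does not state.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Union

@dataclass
class Leaf:
    """Terminal instruction z_k: returns input feature k (0-based), no inputs."""
    k: int

@dataclass
class FlexNeuron:
    """Flexible neuron instruction +_i, with i = len(children) inputs.

    Assumption: the neuron forms a weighted sum of its children's outputs
    before applying the Gaussian activation of Eq. (9).
    """
    children: List["Node"]
    weights: List[float]
    j: float    # centre parameter j_i of Eq. (9)
    b: float    # width parameter b_i of Eq. (9), must be non-zero

Node = Union[Leaf, FlexNeuron]

def evaluate(node: Node, z: Sequence[float]) -> float:
    """Recursively compute the tree output for one feature vector z."""
    if isinstance(node, Leaf):
        return z[node.k]
    net = sum(w * evaluate(c, z) for w, c in zip(node.weights, node.children))
    return math.exp(-((net - node.j) / node.b) ** 2)   # Gaussian activation, Eq. (9)

def rmse(tree: Node, X: Sequence[Sequence[float]], targets: Sequence[float]) -> float:
    """Fitness function: root-mean-square error over a data set."""
    errs = [(evaluate(tree, x) - t) ** 2 for x, t in zip(X, targets)]
    return math.sqrt(sum(errs) / len(errs))
```

A tree such as `FlexNeuron([Leaf(0), Leaf(1)], [1.0, 1.0], j=0.0, b=1.0)` is a +_2 node over z_1 and z_2. A training procedure would mutate such trees and their parameters and keep those with lower `rmse`.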

Advantages and beneficial effects of the invention: the invention classifies instrument signals with a new instrument classification method. It performs phase-space reconstruction on the time series produced by different instruments, which reveals the particular attributes of each instrument. It removes redundant information through PCA to reduce dimensionality. It analyzes the characteristics of each instrument and characterizes their differences in phase space with the probability density function. Finally, it uses a flexible neural tree model as the classifier, which effectively addresses the strong dependence of artificial neural networks on a predefined structure. Parameter selection and operation are simple. In experiments, the classification accuracy for a single instrument reaches up to 98.7%, and the minimum root-mean-square error of the classifier is 0.045570.

Description of drawings

Fig. 1 is the flow chart of the implementation of the invention;

Fig. 2 and Fig. 3 are time-domain waveforms of the experimental signals: Fig. 2 is the waveform of French horn sample signal x_{1-1}, and Fig. 3 is the waveform of piano sample signal x_{2-1};

Fig. 4 and Fig. 5 show the delay times τ of French horn sample x_{1-1} and piano sample x_{2-1} determined by the mutual information method;

Fig. 6 and Fig. 7 show the embedding dimensions m of French horn sample x_{1-1} and piano sample x_{2-1} determined by the false nearest neighbors method;

Fig. 8 and Fig. 9 show the phase-space reconstructions of French horn sample x_{1-1} and piano sample x_{2-1}, respectively, in three dimensions;

Fig. 10 and Fig. 11 show the probability density function feature values of French horn sample x_{1-1} and piano sample x_{2-1}, respectively;

Fig. 12 is the final trained flexible neural tree model;

Fig. 13 and Fig. 14 compare the predicted outputs with the target outputs on the training and test samples for French horn sample x_{1-1} and piano sample x_{2-1}, respectively.

In the figures: amplitude denotes amplitude, N denotes the number of sample points, I(τ) denotes the average mutual information, τ denotes the delay time, m denotes the embedding dimension, ΔD_m denotes the false nearest neighbor rate, PDF denotes the probability density function value, time denotes time, Value of forecast and target denotes the predicted and target outputs, Train data denotes the training samples, Test data denotes the test samples, forecast denotes the predicted value, and target denotes the target value.

Embodiment

Matlab 7.0 and Visual Studio 2008 are used as the software platforms for programming the design of the invention. Two common instrument signals, French horn and piano, are chosen as experimental subjects. For each instrument, 10 groups of measured sample data are selected, each 2000 points long; the first 8 groups are used as training samples and the remaining 2 groups as test samples.

Taking one sample of each of the two instrument signals as an example, the specific steps are as follows:

The sample signal first undergoes phase-space reconstruction. PCA is then applied to reduce redundancy and extract the principal components, and feature extraction follows. Finally, the instrument is recognized by training a flexible neural tree.

Step 1: in this embodiment, phase-space reconstruction proceeds as follows:

(1) Determining the optimal delay time τ

The average mutual information method is used to determine the optimal delay time τ:

I(\tau) = \sum_{n=1}^{N} P(x_n, x_{n+\tau}) \log_2 \frac{P(x_n, x_{n+\tau})}{P(x_n)\, P(x_{n+\tau})} \quad (2)

The τ at the first local minimum of the average mutual information I(τ) is chosen as the optimal delay time. Figs. 4 and 5 show the delay times of French horn sample x_{1-1} and piano sample x_{2-1} determined by this method: τ_{1-1} = 3 and τ_{2-1} = 6.

(2) Determining the optimal embedding dimension m

The false nearest neighbors method is used to determine the embedding dimension m; whether a point is a false nearest neighbor is judged by:

\Delta D_m(i, m) = \sqrt{\frac{D_{m+1}^2(i) - D_m^2(i)}{D_m^2(i)}} = \frac{\left| x(i+\tau m) - x^{NN}(i+\tau m) \right|}{D_m(i)} \quad (5)

If ΔD_m(i, m) ≥ 10%, the neighbor is false; here τ_{1-1} = 3 and τ_{2-1} = 6. The proportions of false nearest neighbors among all phase vectors are computed for x_{1-1} and x_{2-1}. When the proportion falls below 5%, the phase-space trajectory is considered fully unfolded, and the current m is the optimal embedding dimension. Figs. 6 and 7 show the embedding dimensions of French horn sample x_{1-1} and piano sample x_{2-1} determined by this method: m_{1-1} = m_{2-1} = 5.

(3) Phase-space reconstruction

With the delay times τ_{1-1} = 3 and τ_{2-1} = 6 and the embedding dimensions m_{1-1} = m_{2-1} = 5 of x_{1-1} and x_{2-1} determined, each time series is delay-reconstructed into the embedding vectors x_5 = (x_t, x_{t-3}, x_{t-6}, x_{t-9}, x_{t-12})^T and x_5 = (x_t, x_{t-6}, x_{t-12}, x_{t-18}, x_{t-24})^T, respectively. Each vector determines a point in its own 5-dimensional embedding space, and the delays at successive time points produce a series of such points, forming two 5-dimensional phase spaces that reflect the information of the original systems. To visualize the reconstruction, three of the dimensions are plotted; Figs. 8 and 9 show the phase-space reconstructions of x_{1-1} and x_{2-1} in three dimensions.

Step 2: in this embodiment, PCA proceeds as follows:

R = E(AA^T) and RV = VΛ, where A is the 5 × 1991 matrix obtained from the phase-space reconstruction of x_{1-1}; R is the autocorrelation matrix of the 5 reconstructed variables; V is the 5 × 5 eigenvector matrix of R, whose columns are the orthonormalized eigenvectors of R; and Λ is the diagonal eigenvalue matrix of R, with λ_i (i = 1, 2, ..., 5) the i-th diagonal element. Construct 5 uncorrelated new variables Y = V^T A, Y = {y_1, y_2, ..., y_5}^T. After sorting the eigenvalues in descending order, take the eigenvectors corresponding to the 3 largest eigenvalues, which gives the 3 × 1991 matrix B_1 after dimensionality reduction;

Likewise, the reduced matrix B_2 of x_{2-1} is obtained by PCA.

Step 3: in this embodiment, feature extraction proceeds as follows:

For the matrices B_1 and B_2 obtained by PCA dimensionality reduction, the feature values of their probability density functions are extracted:

p(x) = \frac{1}{N} \sum_{n=1}^{N} \frac{1}{h^m}\, k\left(\frac{x - x_n}{h}\right) \quad (7)

where N = 1991, m = 3 and h = 0.2. After sampling, two 3 × 10 feature sets H_{1-1} and H_{2-1} are formed. Figs. 10 and 11 show the probability density function feature values of French horn sample x_{1-1} and piano sample x_{2-1}.

The feature sets H_{1-i} and H_{2-i} (i = 2, ..., 10) of the remaining 9 groups of French horn and piano sample data are obtained in the same way.

Step 4: in this embodiment, a flexible neural tree is used as the classifier to identify the instruments. It is implemented mainly on the Visual Studio 2008 platform, as follows:

The feature sets of the French horn training data H_{1-i} (i = 1, ..., 8) are written in turn to the file train.txt, and the remaining two groups are written to the file text.txt. The initial structure of the flexible neural tree is set first, and a fuzzy classification model is then optimized and trained from the feature sets in train.txt and the preset initial structure. The test sample signals are then fed into the trained flexible neural tree model to verify its performance. The piano data H_{2-i} (i = 1, ..., 8) are trained and tested in the same way.

Fig. 12 shows the final trained flexible neural tree model. Its function instruction set is F = {+_2, +_3, +_4}, and its terminal instruction set is T = {x_1, x_2, x_3, x_4, x_5, x_6, x_7}. A root-mean-square error of RMSE = 0.1 is used as the threshold to judge whether a classification result is correct: when the output RMSE is greater than 0.1, the sample is regarded as identified as another instrument; otherwise the identification is considered correct. The recognition rate of the i-th instrument is:

R_i = \frac{n}{N} \times 100\%

where i is the sample class, n is the number of samples identified as the i-th instrument, and N is the total number of samples of the i-th instrument. In this embodiment, the recognition rates of French horn sample x_{1-1} and piano sample x_{2-1} are 84.9% and 98.7%, and their average root-mean-square errors are 0.088685 and 0.045570, respectively. Figs. 13 and 14 compare the predicted and target outputs on the training and test samples for French horn sample x_{1-1} and piano sample x_{2-1}. The predicted outputs of the test samples are essentially consistent with the target outputs; some points deviate, but overall they are very close. The experiments show that the invention is very effective for classifying instrument signals.
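The decision rule and the recognition rate R_i can be sketched in Python as below. The 0.1 threshold and the formula come from the text. Treating `pred` and `target` as the output sequence of one test sample, and the function names, are hypothetical choices.

```python
import math
from typing import Sequence

def is_correct(pred: Sequence[float], target: Sequence[float], threshold: float = 0.1) -> bool:
    """Correct when the output RMSE is at most the threshold, otherwise 'another instrument'."""
    err = math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target))
    return err <= threshold

def recognition_rate(decisions: Sequence[bool]) -> float:
    """R_i = n / N * 100 %, with n correctly identified out of N samples of class i."""
    return 100.0 * sum(decisions) / len(decisions)
```

For example, `recognition_rate([True, True, False, True])` gives 75.0.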

Claims

1. A method for classifying musical instrument signals, in which a phase-space reconstruction module, a principal component analysis module, a feature extraction module and a flexible neural tree module are used to extract features from various instrument signals, which are then classified by a classifier, characterized in that the implementation steps are:

Y = (y_1, y_2, \ldots, y_n) = \begin{pmatrix} x(1) & x(2) & \cdots & x(n) \\ x(1+\tau) & x(2+\tau) & \cdots & x(n+\tau) \\ \vdots & \vdots & \ddots & \vdots \\ x(1+(m-1)\tau) & x(2+(m-1)\tau) & \cdots & x(n+(m-1)\tau) \end{pmatrix} \quad (1)

where K is the length of the audio time series x, and the number of phase vectors in the phase space is n = K - (m - 1)τ;

(1) Determining the optimal delay time τ by the average mutual information method

The average mutual information method is used to determine the optimal delay time τ:

I(\tau) = \sum_{n=1}^{N} P(x_n, x_{n+\tau}) \log_2 \frac{P(x_n, x_{n+\tau})}{P(x_n)\, P(x_{n+\tau})} \quad (2)

where P(x_n, x_{n+τ}) is the joint probability distribution in the reconstructed plot, and P(x_n) and P(x_{n+τ}) are the marginal probability distributions. The τ at the first local minimum of I(τ) is chosen as the optimal delay time;

(2) Determining the embedding dimension m by the false nearest neighbors method

D_m(i) = \left\| y_n(i) - y_n^{NN}(i) \right\| \quad (3)

D_m(i) is a small quantity. When the embedding dimension of the phase space increases from m to m + 1, the distance between these two phase points changes. If the neighbor distance in this case is D_{m+1}(i), then:

D_{m+1}^2(i) = D_m^2(i) + \left| x(i+\tau m) - x^{NN}(i+\tau m) \right|^2 \quad (4)

\Delta D_m(i, m) = \sqrt{\frac{D_{m+1}^2(i) - D_m^2(i)}{D_m^2(i)}} = \frac{\left| x(i+\tau m) - x^{NN}(i+\tau m) \right|}{D_m(i)} \quad (5)

where ΔD_m(i, m) is the relative change in the distance to the nearest neighbor in m dimensions when the dimension increases to m + 1;

if ΔD_m(i, m) ≥ 10%, the neighbor is false; each phase vector in the m-dimensional space is checked in turn to see whether its nearest neighbor is false, and the proportion of false nearest neighbors among all phase vectors is computed; when this proportion falls below 5%, the phase-space trajectory is considered fully unfolded, and the current m is the optimal embedding dimension;

(3) Phase-space reconstruction

R = E(AA^T) and RV = VΛ, where A is the m × n matrix obtained from phase-space reconstruction; R is the autocorrelation matrix of the m components; V is the m × m eigenvector matrix of R, whose columns are the orthonormalized eigenvectors of R; Λ is the diagonal eigenvalue matrix of R, with λ_i (i = 1, 2, ..., m) the i-th diagonal element; m uncorrelated new variables Y = V^T A, Y = {y_1, y_2, ..., y_m}^T, are constructed; λ_i (i = 1, 2, ..., m) are sorted in descending order, and the eigenvectors corresponding to the p largest eigenvalues are taken, which gives the p × n matrix B after dimensionality reduction, where p ≥ 3;

The required probability density function is:

p(x) = \frac{1}{N} \sum_{n=1}^{N} \frac{1}{h^m}\, k\left(\frac{x - x_n}{h}\right) \quad (7)

where N is the number of sample points, n = 1, 2, ..., N, m is the data dimension, and k(u) is the defined kernel function; this yields an m × n matrix, which is sampled row by row to form the m × c feature set H, where c < n;

The flexible neural tree model is computed as follows: first, the initial structure of the flexible neural tree is set; then a fuzzy classification model is optimized and trained from the obtained feature set H and the preset initial structure. The model contains two types of instructions, flexible neuron instructions and terminal instructions: flexible neuron instructions are the non-leaf nodes of the tree and connect their subtrees, and terminal instructions are the individual input features. The function instruction set and the terminal instruction set are expressed as:

S = F \cup T = \{+_2, +_3, \ldots, +_N\} \cup \{z_1, z_2, \ldots, z_n\} \quad (8)

where +_i (i = 2, 3, ..., N) denotes a non-leaf node instruction with i inputs, that is, i input variables, and z_1, z_2, ..., z_n denote leaf node instructions, which take no input variables;

a Gaussian function is used as the flexible activation function:

f(j_i, b_i, z) = e^{-\left(\frac{z - j_i}{b_i}\right)^2} \quad (9)

where j_i and b_i are adjustable parameters generated randomly while the flexible neural tree is constructed; besides the instruction set, the fitness function must also be determined in advance, and the root-mean-square error is selected as the fitness function.