CN103761965B - Classification method of musical instrument signals - Google Patents

Classification method of musical instrument signals

Info

Publication number
CN103761965B
CN103761965B (application CN201410008533.4A)
Authority
CN
China
Prior art keywords
tau
matrix
phase space
musical instrument
dimension
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201410008533.4A
Other languages
Chinese (zh)
Other versions
CN103761965A (en)
Inventor
郭一娜
王志社
郅逍遥
王晓梅
李临生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Taiyuan University of Science and Technology
Original Assignee
Taiyuan University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Taiyuan University of Science and Technology filed Critical Taiyuan University of Science and Technology
Priority to CN201410008533.4A priority Critical patent/CN103761965B/en
Publication of CN103761965A publication Critical patent/CN103761965A/en
Application granted granted Critical
Publication of CN103761965B publication Critical patent/CN103761965B/en

Landscapes

  • Complex Calculations (AREA)

Abstract

A method for classifying musical instrument signals, belonging to the field of electronic information technology, comprising a phase space reconstruction module, a principal component analysis module, a feature extraction module and a flexible neural tree module. The implementation steps are: the time series produced by different instrument sample signals are subjected to phase space reconstruction; redundancy is removed by principal component analysis to reduce the dimensionality; the characteristics of the various instruments are analyzed and a probability density function is used to characterize the differences between instruments in phase space; finally, a flexible neural tree model is used as the classifier, which effectively addresses the strong dependence of an artificial neural network on its structure and raises the classification accuracy for a single instrument to as high as 98.7%.

Description

Classification method of musical instrument signals
Technical Field
The invention belongs to the technical field of electronic information, and particularly relates to a method for classifying musical instrument signals.
Background
Music is an indispensable part of people's lives, yet listening in real time has traditionally been the only way to find content of interest in it. With the development of science and technology, the amount of information people encounter is growing rapidly, so finding music of interest in massive data by real-time listening is no longer practical. In recent years, music data analysis and retrieval have therefore become a research hotspot both in China and abroad.
Most music is expressed through the performance of musical instruments, and the same piece played on different instruments offers different listening experiences. In the field of computer recognition, with the continuous development of signal analysis and data mining, interdisciplinary techniques such as time-frequency analysis and neural networks are gradually being applied to instrument classification.
Efficient feature extraction and classifier selection are the key steps in instrument classification. Mel cepstral coefficient features are widely used for speech signals, but they also amplify heteroscedasticity when classifying music genres and instrument signals. MPEG-7, a standardized framework for audio feature extraction and description, is indispensable for feature extraction from music signals and is particularly effective for similar music signals. Beyond these audio features, a wide range of classifiers has been studied for audio classification, such as neural networks, Gaussian mixture models, hidden Markov models, Bayesian decision classifiers, and support vector machines. Some researchers use a single-channel voice separation method with Gaussian modeling to separate two or more instruments; others analyze and identify instrument signals with amplitude-modulation/frequency-modulation features (with accuracy up to about 70%); still others extract features using Mel cepstral coefficients, the constant-Q transform and autocorrelation functions, combined with a Bayesian decision classifier, reaching 79%-84% recognition accuracy. However, in existing work the classification accuracy for some audio signals remains low, indicating that these audio features do not fully capture the differences between all signals, and the parameter selection and training procedures of most classifiers are complex and cumbersome.
Disclosure of Invention
The invention aims to provide a method for classifying musical instrument signals, which can effectively overcome the defects in the prior art.
The invention is realized by performing feature extraction on various musical instrument signals with a phase space reconstruction module, a principal component analysis module, a feature extraction module and a flexible neural tree module, and then classifying the signals with a classifier, and is characterized by comprising the following implementation steps:
First, phase space reconstruction is carried out on the collected instrument sample signals, which mainly means determining the delay time and the embedding dimension of the sample signals, as follows:
Let a one-dimensional time series of an instrument signal sample be x = (x_1, x_2, ..., x_K)^T. With the delay time set to τ, an m-dimensional embedding space Y is established and x is mapped into it; the reconstructed phase-space vectors are:

$$Y = (y_1, y_2, \ldots, y_N) = \begin{pmatrix} x(1) & x(2) & \cdots & x(N) \\ x(1+\tau) & x(2+\tau) & \cdots & x(N+\tau) \\ \vdots & \vdots & & \vdots \\ x(1+(m-1)\tau) & x(2+(m-1)\tau) & \cdots & x(N+(m-1)\tau) \end{pmatrix} \qquad (1)$$

where K is the length of the audio time series x and the number of phase vectors in phase space is N = K − (m − 1)τ, with the phase vectors indexed n = 1, 2, ..., N;
(1) Determining the optimal delay time τ by the average mutual information method
The average mutual information method is used to determine the optimal delay time τ:
$$I(\tau) = \sum_{n=1}^{N} P(x_n, x_{n+\tau}) \log_2 \frac{P(x_n, x_{n+\tau})}{P(x_n)\, P(x_{n+\tau})} \qquad (2)$$
where P(x_n, x_{n+τ}) is the joint distribution probability in the reconstruction and P(x_n), P(x_{n+τ}) are the marginal distribution probabilities; the τ at the first local minimum of I(τ) is selected as the optimal delay time;
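The average-mutual-information search can be summarized in a short sketch. Below is a minimal NumPy illustration, assuming a histogram estimate of the joint and marginal probabilities; the bin count (32) and the search range max_tau are assumptions, not values stated in the patent.

```python
import numpy as np

def average_mutual_information(x, tau, bins=32):
    """Estimate I(tau) between x(n) and x(n+tau) from a 2-D histogram of pairs."""
    a, b = x[:-tau], x[tau:]
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p_joint = joint / joint.sum()
    p_a = p_joint.sum(axis=1, keepdims=True)   # marginal P(x_n)
    p_b = p_joint.sum(axis=0, keepdims=True)   # marginal P(x_{n+tau})
    mask = p_joint > 0
    return np.sum(p_joint[mask] * np.log2(p_joint[mask] / (p_a @ p_b)[mask]))

def first_local_minimum(x, max_tau=50):
    """Return the first tau at which I(tau) stops decreasing (first local minimum)."""
    ami = [average_mutual_information(x, t) for t in range(1, max_tau + 1)]
    for t in range(1, len(ami)):
        if ami[t] > ami[t - 1]:
            return t            # taus are 1-based: ami[t-1] is I(t)
    return len(ami)
```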
(2) determining embedding dimension m by false neighbor method
For any phase vector y_n(i) = {x(i), x(i+τ), ..., x[i+(m−1)τ]} in the m-dimensional phase space there is a nearest neighbor y_n^{NN}(i) within some distance, and the distance between the two points is:
$$D_m(i) = \left\| y_n(i) - y_n^{NN}(i) \right\| \qquad (3)$$
D_m(i) is small. When the embedding dimension of the phase space increases from m to m+1, the distance between these two phase points changes; denoting the new neighbor distance by D_{m+1}(i), we have:
$$D_{m+1}^2(i) = D_m^2(i) + \left| x(i + m\tau) - x^{NN}(i + m\tau) \right|^2 \qquad (4)$$
A threshold is set to decide which points are false neighbors; let
$$\Delta D_m(i, m) = \sqrt{\frac{D_{m+1}^2(i) - D_m^2(i)}{D_m^2(i)}} = \frac{\left| x(i + m\tau) - x^{NN}(i + m\tau) \right|}{D_m(i)} \qquad (5)$$
where ΔD_m(i, m) represents the relative change in the nearest-neighbor distance when the dimension increases from m to m+1.
If ΔD_m(i, m) ≥ 10%, the neighbor is false. Each phase vector in the m-dimensional space is checked in turn for false neighbors and the proportion of false neighbors among all phase vectors is calculated; when this proportion falls below 5%, the phase-space trajectory is considered fully unfolded and the current m is the optimal embedding dimension;
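As a rough illustration of the false-nearest-neighbor test described above (10% per-point threshold, 5% overall proportion), the sketch below uses a brute-force neighbor search; the helper names, the search range max_m and the handling of the last m·τ points are assumptions, not the patent's implementation.

```python
import numpy as np

def embed(x, m, tau):
    """Delay-embed a 1-D series into an (m, n) matrix, n = K - (m-1)*tau."""
    K = len(x)
    n = K - (m - 1) * tau
    return np.array([x[j * tau: j * tau + n] for j in range(m)])

def false_neighbor_fraction(x, m, tau, rtol=0.10):
    """Fraction of points whose nearest neighbor in m dimensions is 'false'."""
    Y = embed(x, m, tau).T              # rows are phase vectors y(i)
    n_next = len(x) - m * tau           # points that still exist in m+1 dimensions
    false = 0
    for i in range(n_next):
        d = np.linalg.norm(Y[:n_next] - Y[i], axis=1)
        d[i] = np.inf                   # exclude the point itself
        j = np.argmin(d)                # index of the nearest neighbor
        extra = abs(x[i + m * tau] - x[j + m * tau])   # extra coordinate in m+1 dims
        if d[j] > 0 and extra / d[j] >= rtol:
            false += 1
    return false / n_next

def optimal_embedding_dimension(x, tau, max_m=10, ratio=0.05):
    """Smallest m whose false-neighbor fraction drops below `ratio`."""
    for m in range(1, max_m + 1):
        if false_neighbor_fraction(x, m, tau) < ratio:
            return m
    return max_m
```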
(3) phase space reconstruction
After the delay time τ and the embedding dimension m are determined, the original one-dimensional time series x = (x_1, x_2, ..., x_K)^T is reconstructed by delays to obtain embedded vectors x_m = (x_t, x_{t−τ}, ..., x_{t−(m−1)τ})^T. Each such vector defines a point in the m-dimensional embedding space Y, and the delayed copies at different time points produce a series of such points in the m-dimensional space, forming an m × n phase space reconstruction matrix A that reflects the original system information;
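A minimal sketch of building the m × n reconstruction matrix A of equation (1); the random stand-in signal and the concrete m and τ values are illustrative only.

```python
import numpy as np

def phase_space_matrix(x, m, tau):
    """Build the m x n reconstruction matrix A of equation (1), n = K - (m-1)*tau."""
    K = len(x)
    n = K - (m - 1) * tau
    return np.array([x[j * tau: j * tau + n] for j in range(m)])

x = np.random.randn(2000)          # stand-in for one instrument sample
A = phase_space_matrix(x, m=5, tau=3)
print(A.shape)                     # (m, K - (m-1)*tau)
```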
Second, principal component analysis is performed on the reconstructed instrument sample signal, and redundant information is eliminated to reduce the dimensionality:
Let R = E(AA^T) and RV = VΛ, where A is the m × n matrix obtained after phase space reconstruction, R is the autocorrelation matrix of its m components, V is the m × m eigenvector matrix of R whose column vectors are the orthonormal eigenvectors of R, Λ is the diagonal eigenvalue matrix of R, and λ_i (i = 1, 2, ..., m) is the i-th diagonal element. m new uncorrelated variables are constructed as Y = V^T A, Y = {y_1, y_2, ..., y_m}^T. After the λ_i (i = 1, 2, ..., m) are arranged in descending order, the eigenvectors corresponding to the first p largest eigenvalues are taken to obtain the reduced p × n vector matrix B, where p ≥ 3;
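A minimal sketch of the eigendecomposition-based reduction described above, assuming the expectation E(AA^T) is estimated by the sample average AA^T/n; the function and variable names are hypothetical.

```python
import numpy as np

def pca_reduce(A, p=3):
    """Reduce the m x n reconstruction matrix A to p x n via R = A A^T / n."""
    m, n = A.shape
    R = (A @ A.T) / n                      # autocorrelation matrix of the m components
    eigvals, V = np.linalg.eigh(R)         # columns of V are orthonormal eigenvectors
    order = np.argsort(eigvals)[::-1]      # eigenvalues in descending order
    Vp = V[:, order[:p]]                   # eigenvectors of the p largest eigenvalues
    return Vp.T @ A                        # the p x n matrix B

B = pca_reduce(np.random.randn(5, 1988), p=3)
print(B.shape)   # (3, 1988)
```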
Third, features are extracted from the matrix B obtained after principal component analysis; the probability density function feature values are extracted as follows:
Assume the sample data region G is a small hypercube centered at x with side length h, and first define a kernel function k(u);
the probability density function is then found to be:
$$p(x) = \frac{1}{N} \sum_{n=1}^{N} \frac{1}{h^m}\, k\!\left( \frac{x - x_n}{h} \right) \qquad (7)$$
where N is the number of sample points (n = 1, 2, ..., N), m is the data dimension, and k(u) is the kernel function defined above. This yields an m × N matrix, which is sampled row by row to form an m × c feature set H, where c < N;
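Since the patent's own kernel definition (its equation (6)) is not reproduced above, the sketch below assumes a hypercube Parzen window and a uniform row-wise downsampling rule; both are illustrative stand-ins rather than the patented choices.

```python
import numpy as np

def parzen_pdf(B, h=0.2):
    """Density estimate of equation (7) at every column of the p x N matrix B."""
    m, N = B.shape
    dens = np.empty(N)
    for i in range(N):
        u = (B[:, [i]] - B) / h                 # (x - x_n)/h for all n, shape (m, N)
        k = np.all(np.abs(u) <= 0.5, axis=0)    # assumed hypercube kernel k(u)
        dens[i] = k.sum() / (N * h ** m)
    return dens

def downsample(values, c=10):
    """Uniformly sample c feature values from a length-N curve (assumed sampling rule)."""
    idx = np.linspace(0, len(values) - 1, c).astype(int)
    return values[idx]

B = np.random.randn(3, 1991)                       # stand-in reduced matrix
H_row = downsample(parzen_pdf(B, h=0.2), c=10)     # one row of the feature set H
```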
Fourth, a flexible neural tree model is adopted as the classifier of the instrument recognition module, and the neural network is trained:
The calculation flow of the flexible neural tree model is as follows. First an initial structure of the flexible neural tree is set; a fuzzy classification model is then optimized and trained from the obtained feature set H and the set initial structure. The model uses two types of instructions: flexible neuron instructions, which connect non-leaf nodes of the tree structure to subtrees, and terminal instructions, which are the individual input features. The function instruction set and terminal instruction set are expressed as:
$$S = F \cup T = \{ +_2, +_3, \ldots, +_N \} \cup \{ z_1, z_2, \ldots, z_n \} \qquad (8)$$
where +_i (i = 2, 3, ..., N) denotes a non-leaf node instruction with i inputs, i.e., i input variables; z_1, z_2, ..., z_n denote leaf node instructions, and a leaf node has no input variables;
A Gaussian function is adopted as the flexible activation function:
$$f(j_i, b_i, z) = e^{-\left( \frac{z - j_i}{b_i} \right)^2} \qquad (9)$$
where j_i and b_i are adjustable parameters generated randomly when the flexible neural tree is created. Besides the instruction set, the fitness function must also be determined in advance; the root mean square error is selected as the fitness function.
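A minimal sketch of a single flexible neuron with the Gaussian activation of equation (9) and the RMSE fitness; the connection weights and the recursive tree evaluation follow the usual flexible-neural-tree formulation and are assumptions here, while the structure search and parameter optimization steps are omitted.

```python
import numpy as np
from dataclasses import dataclass, field

@dataclass
class FlexibleNeuron:
    """One +_i node: weighted sum of its i children passed through the Gaussian of eq. (9)."""
    j: float                      # centre parameter j_i (randomly initialised)
    b: float                      # width parameter b_i
    weights: np.ndarray           # one connection weight per child (assumed, as in FNT papers)
    children: list = field(default_factory=list)   # child nodes or feature indices

    def output(self, features):
        vals = []
        for child in self.children:
            if isinstance(child, FlexibleNeuron):
                vals.append(child.output(features))   # recurse into a subtree
            else:
                vals.append(features[child])          # leaf z_k: the k-th input feature
        z = float(np.dot(self.weights, vals))
        return np.exp(-((z - self.j) / self.b) ** 2)  # flexible activation, equation (9)

def rmse(predicted, target):
    """Root mean square error used as the fitness function."""
    predicted, target = np.asarray(predicted), np.asarray(target)
    return float(np.sqrt(np.mean((predicted - target) ** 2)))

# Tiny usage sketch: a +_3 root fed by three feature values.
root = FlexibleNeuron(j=0.3, b=1.2, weights=np.array([0.5, -0.8, 1.1]), children=[0, 1, 2])
print(root.output(np.array([0.2, 0.7, 0.1])), rmse([0.9, 0.1], [1.0, 0.0]))
```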
The advantages and positive effects of the invention are as follows. A new instrument classification method is used to classify instrument signals: phase space reconstruction of the time series produced by different instruments reveals the specific attributes of each instrument; principal component analysis removes redundant information and reduces the dimensionality; after analyzing the characteristics of the various instruments, a probability density function describes their differences in phase space; finally, a flexible neural tree model is used as the classifier, which effectively resolves the strong dependence of an artificial neural network on its structure. Parameter selection and the training procedure are simple; experiments show that the highest classification accuracy for a single instrument reaches 98.7% and the lowest root mean square error of the classifier reaches 0.045570.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention;
FIGS. 2 and 3 are time-domain waveforms of the signals used in the experiment: FIG. 2 is the time-domain waveform of a French horn sample signal x_{1-1} and FIG. 3 that of a piano sample signal x_{2-1};
FIGS. 4 and 5 show the delay times τ of the French horn sample x_{1-1} and the piano sample x_{2-1}, respectively, determined by the mutual information method;
FIGS. 6 and 7 show the embedding dimensions m of the French horn sample x_{1-1} and the piano sample x_{2-1}, respectively, determined by the false nearest neighbor method;
FIGS. 8 and 9 are the phase space reconstruction images of the French horn sample x_{1-1} and the piano sample x_{2-1}, respectively, plotted in three-dimensional space;
FIGS. 10 and 11 are the probability density function feature values of the French horn sample x_{1-1} and the piano sample x_{2-1}, respectively;
FIG. 12 is the final trained flexible neural tree model;
FIGS. 13 and 14 show the predicted outputs and target outputs of the training and test samples for the French horn sample x_{1-1} and the piano sample x_{2-1}, respectively.
In the figure: amplitude, N the number of sample points, I (τ) the average mutual information content, τ the delay time, m the embedding dimension, Δ DmThe method comprises the steps of representing a false neighbor rate, PDF representing a probability density function value, time representing time, value of forecast and target output, Traindata being a training sample, Testdata representing a test sample, forecast representing a predicted value, and target representing a target value.
Detailed Description
MATLAB 7.0 and Visual Studio 2008 are selected as the software platforms, and the scheme is implemented by programming. Two common instrument signals, French horn and piano, are chosen as the subjects of the experiment. For each instrument, 10 sets of measured sample data were selected, each 2000 points long; the first 8 groups were used as training samples and the other 2 as test samples.
Taking one sample of each of the two instrument signals as an example, the specific operation steps are as follows:
The sample signal first undergoes phase space reconstruction, then principal component analysis to remove redundancy and extract the principal components, followed by feature extraction, and finally instrument recognition, in which a flexible neural tree is trained as the neural network to perform the identification.
First, the specific processing steps of phase space reconstruction in this embodiment are as follows:
(1) Determination of the optimal delay time τ
The average mutual information method is used to determine the optimal delay time τ:
$$I(\tau) = \sum_{n=1}^{N} P(x_n, x_{n+\tau}) \log_2 \frac{P(x_n, x_{n+\tau})}{P(x_n)\, P(x_{n+\tau})} \qquad (10)$$
The optimal delay time is the τ at the first local minimum of the average mutual information I(τ). FIGS. 4 and 5 show the delay times determined by this method for the French horn sample x_{1-1} and the piano sample x_{2-1}: τ_{1-1} = 3 and τ_{2-1} = 6.
(2) Determination of the optimal embedding dimension m
The false nearest neighbor method is used to determine the embedding dimension m; which points are false neighbors is judged by the following formula:
$$\Delta D_m(i, m) = \sqrt{\frac{D_{m+1}^2(i) - D_m^2(i)}{D_m^2(i)}} = \frac{\left| x(i + m\tau) - x^{NN}(i + m\tau) \right|}{D_m(i)} \qquad (11)$$
If ΔD_m(i, m) ≥ 10%, the neighbor is false; here τ_{1-1} = 3 and τ_{2-1} = 6. The proportion of false neighbors is calculated separately for x_{1-1} and x_{2-1}; when this proportion falls below 5%, the phase-space trajectory is considered fully unfolded and the current m is the optimal embedding dimension. FIGS. 6 and 7 show the embedding dimensions determined by this method for the French horn sample x_{1-1} and the piano sample x_{2-1}: m_{1-1} = m_{2-1} = 5.
(3) Phase space reconstruction
With the delay times τ_{1-1} = 3, τ_{2-1} = 6 and the embedding dimension m_{1-1} = m_{2-1} = 5 determined, the respective time series of x_{1-1} and x_{2-1} are reconstructed by delays to obtain the embedded vectors x_5 = (x_t, x_{t−3}, x_{t−6}, x_{t−9}, x_{t−12})^T and x_5 = (x_t, x_{t−6}, x_{t−12}, x_{t−18}, x_{t−24})^T. Each of these vectors defines a point in its 5-dimensional embedding space, and the delayed copies at different time points produce a series of such points in the 5-dimensional space, forming two 5-dimensional phase spaces that reflect the original system information. To visualize the reconstruction, three dimensions are selected for plotting; FIGS. 8 and 9 show the phase space reconstruction images of x_{1-1} and x_{2-1} in three-dimensional space.
Second, the specific processing steps of principal component analysis in this embodiment are as follows:
Let R = E(AA^T) and RV = VΛ, where A is the 5 × 1991 matrix obtained after phase space reconstruction of x_{1-1}, R is the autocorrelation matrix of the 5 reconstruction variables, V is the 5 × 5 eigenvector matrix of R whose column vectors are the orthonormal eigenvectors of R, Λ is the diagonal eigenvalue matrix of R, and λ_i (i = 1, 2, ..., 5) is the i-th diagonal element. Five new uncorrelated variables are constructed as Y = V^T A, Y = {y_1, y_2, ..., y_5}^T. After the eigenvalues are arranged in descending order, the eigenvectors corresponding to the first 3 largest eigenvalues are taken to obtain the reduced 3 × 1991 vector matrix B_1.
Similarly, x_{2-1} yields the reduced vector matrix B_2 after principal component analysis.
Step three, the specific processing steps of the feature extraction part in this embodiment are as follows:
The probability density function feature values are extracted from the reduced matrices B_1 and B_2 obtained after principal component analysis:
$$p(x) = \frac{1}{N} \sum_{n=1}^{N} \frac{1}{h^m}\, k\!\left( \frac{x - x_n}{h} \right) \qquad (12)$$
With N = 1991, m = 3 and h = 0.2, two 3 × 10 feature sets H_{1-1} and H_{2-1} are formed by sampling. FIGS. 10 and 11 show the probability density function feature values of the French horn sample x_{1-1} and the piano sample x_{2-1}, respectively.
The feature sets H_{1-i} and H_{2-i} (i = 2, ..., 10) of the remaining 9 groups of French horn and piano sample data are obtained by the same method.
Fourth, a flexible neural tree is used as the classifier to identify the instruments; this is implemented mainly on the Visual Studio 2008 software platform. The specific steps are as follows:
circle number data H to be used as training sample1-iThe feature sets of (i ═ 1,. and 8) are placed in sequence in the file of text. Firstly, setting an initial structure of a flexible neural tree, and then optimizing and training a fuzzy classification model according to a feature set in a train. And then inputting the test sample signal into the trained flexible neural tree model for performance verification. Similarly, for piano data H2-i(i 1.., 8) were trained and tested.
FIG. 12 shows the finally trained flexible neural tree model, in which the function instruction set is F = {+_2, +_3, +_4} and the terminal instruction set is T = {x_1, x_2, x_3, x_4, x_5, x_6, x_7}. A root mean square error (RMSE) of 0.1 is selected as the threshold for judging whether a classification result is correct: when the output RMSE is greater than 0.1, the sample is identified as another instrument; otherwise the identification is considered correct. The recognition rate of the i-th instrument is:
$$R_i = \frac{n}{N} \times 100\%$$
where i is the sample category, n is the number of samples correctly identified as instrument type i, and N is the total number of samples of instrument type i. In this embodiment the recognition rates of the French horn sample x_{1-1} and the piano sample x_{2-1} are 84.9% and 98.7%, respectively, with mean root mean square errors of 0.088685 and 0.045570. FIGS. 13 and 14 show the predicted and target outputs of the training and test samples for the French horn sample x_{1-1} and the piano sample x_{2-1}; the predicted output of the test samples is essentially consistent with the target output, deviating in places but generally very close. The experiments show that the invention is highly effective for the classification of instrument signals.
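A short sketch of the decision rule above: a sample counts as correctly recognized when its output RMSE is below the 0.1 threshold, and the recognition rate R_i is the fraction of such samples; the example RMSE values are hypothetical.

```python
import numpy as np

def recognition_rate(rmse_values, threshold=0.1):
    """Per-instrument recognition rate: percentage of samples with RMSE below the threshold."""
    rmse_values = np.asarray(rmse_values)
    correct = int(np.sum(rmse_values < threshold))     # samples recognised as this instrument
    return 100.0 * correct / len(rmse_values)

# Hypothetical per-sample RMSE values for one instrument's test set.
print(recognition_rate([0.045, 0.089, 0.12, 0.07]))    # -> 75.0
```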

Claims (1)

1. A method for classifying musical instrument signals, in which feature extraction is performed on various musical instrument signals using a phase space reconstruction module, a principal component analysis module, a feature extraction module and a flexible neural tree module, after which the signals are classified by a classifier, characterized by comprising the following implementation steps:

First, phase space reconstruction is carried out on the collected instrument sample signals, which mainly means determining the delay time and the embedding dimension of the sample signals, as follows:

Let a one-dimensional time series of an instrument signal sample be x = (x_1, x_2, ..., x_K)^T. With the delay time set to τ, an m-dimensional embedding space Y is established and x is mapped into it; the reconstructed phase-space vectors are:

$$Y = (y_1, y_2, \ldots, y_N) = \begin{pmatrix} x(1) & x(2) & \cdots & x(N) \\ x(1+\tau) & x(2+\tau) & \cdots & x(N+\tau) \\ \vdots & \vdots & & \vdots \\ x(1+(m-1)\tau) & x(2+(m-1)\tau) & \cdots & x(N+(m-1)\tau) \end{pmatrix} \qquad (1)$$

where K is the length of the audio time series x and the number of phase vectors in phase space is N = K − (m − 1)τ, with the phase vectors indexed n = 1, 2, ..., N;
(1) Determining the optimal delay time τ by the average mutual information method
The average mutual information method is used to determine the optimal delay time τ:

$$I(\tau) = \sum_{n=1}^{N} P(x_n, x_{n+\tau}) \log_2 \frac{P(x_n, x_{n+\tau})}{P(x_n)\, P(x_{n+\tau})} \qquad (2)$$

where P(x_n, x_{n+τ}) is the joint distribution probability in the reconstruction and P(x_n), P(x_{n+τ}) are the marginal distribution probabilities; the τ at the first local minimum of I(τ) is selected as the optimal delay time;
(2) determining embedding dimension m by false neighbor method
For any phase vector y_n(i) = {x(i), x(i+τ), ..., x[i+(m−1)τ]} in the m-dimensional phase space there is a nearest neighbor y_n^{NN}(i) within some distance, and the distance between the two points is:

$$D_m(i) = \left\| y_n(i) - y_n^{NN}(i) \right\| \qquad (3)$$

D_m(i) is small. When the embedding dimension of the phase space increases from m to m+1, the distance between these two phase points changes; denoting the new neighbor distance by D_{m+1}(i), we have:

$$D_{m+1}^2(i) = D_m^2(i) + \left| x(i + m\tau) - x^{NN}(i + m\tau) \right|^2 \qquad (4)$$

A threshold is set to decide which points are false neighbors; let

$$\Delta D_m(i, m) = \sqrt{\frac{D_{m+1}^2(i) - D_m^2(i)}{D_m^2(i)}} = \frac{\left| x(i + m\tau) - x^{NN}(i + m\tau) \right|}{D_m(i)} \qquad (5)$$

where ΔD_m(i, m) represents the relative change in the nearest-neighbor distance when the dimension increases from m to m+1.

If ΔD_m(i, m) ≥ 10%, the neighbor is false. Each phase vector in the m-dimensional space is checked in turn for false neighbors and the proportion of false neighbors among all phase vectors is calculated; when this proportion falls below 5%, the phase-space trajectory is considered fully unfolded and the current m is the optimal embedding dimension;
(3) phase space reconstruction
After the delay time τ and the embedding dimension m are determined, the original one-dimensional time series x = (x_1, x_2, ..., x_K)^T is reconstructed by delays to obtain embedded vectors x_m = (x_t, x_{t−τ}, ..., x_{t−(m−1)τ})^T. Each such vector defines a point in the m-dimensional embedding space Y, and the delayed copies at different time points produce a series of such points in the m-dimensional space, forming an m × n phase space reconstruction matrix A that reflects the original system information;
Second, principal component analysis is performed on the reconstructed instrument sample signal, and redundant information is eliminated to reduce the dimensionality:

Let R = E(AA^T) and RV = VΛ, where A is the m × n matrix obtained after phase space reconstruction, R is the autocorrelation matrix of its m components, V is the m × m eigenvector matrix of R whose column vectors are the orthonormal eigenvectors of R, Λ is the diagonal eigenvalue matrix of R, and λ_i (i = 1, 2, ..., m) is the i-th diagonal element. m new uncorrelated variables are constructed as Y = V^T A, Y = {y_1, y_2, ..., y_m}^T. After the λ_i (i = 1, 2, ..., m) are arranged in descending order, the eigenvectors corresponding to the first p largest eigenvalues are taken to obtain the reduced p × n vector matrix B, where p ≥ 3;
Third, features are extracted from the matrix B obtained after principal component analysis; the probability density function feature values are extracted as follows:

Assume the sample data region G is a small hypercube centered at x with side length h, and first define a kernel function k(u);

the probability density function is then found to be:

$$p(x) = \frac{1}{N} \sum_{n=1}^{N} \frac{1}{h^m}\, k\!\left( \frac{x - x_n}{h} \right) \qquad (7)$$

where N is the number of sample points (n = 1, 2, ..., N), m is the data dimension, and k(u) is the kernel function defined above. This yields an m × N matrix, which is sampled row by row to form an m × c feature set H, where c < N;
Fourth, a flexible neural tree model is adopted as the classifier of the instrument recognition module, and the neural network is trained:

The calculation flow of the flexible neural tree model is as follows. First an initial structure of the flexible neural tree is set; a fuzzy classification model is then optimized and trained from the obtained feature set H and the set initial structure. The model uses two types of instructions: flexible neuron instructions, which connect non-leaf nodes of the tree structure to subtrees, and terminal instructions, which are the individual input features. The function instruction set and terminal instruction set are expressed as:

$$S = F \cup T = \{ +_2, +_3, \ldots, +_N \} \cup \{ z_1, z_2, \ldots, z_n \} \qquad (8)$$

where +_i (i = 2, 3, ..., N) denotes a non-leaf node instruction with i inputs, i.e., i input variables; z_1, z_2, ..., z_n denote leaf node instructions, and a leaf node has no input variables;

A Gaussian function is adopted as the flexible activation function:

$$f(j_i, b_i, z) = e^{-\left( \frac{z - j_i}{b_i} \right)^2} \qquad (9)$$

where j_i and b_i are adjustable parameters generated randomly when the flexible neural tree is created; besides the instruction set, the fitness function must also be determined in advance, and the root mean square error is selected as the fitness function.
CN201410008533.4A 2014-01-09 2014-01-09 A kind of sorting technique of instrument signal Expired - Fee Related CN103761965B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410008533.4A CN103761965B (en) 2014-01-09 2014-01-09 A kind of sorting technique of instrument signal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410008533.4A CN103761965B (en) 2014-01-09 2014-01-09 A kind of sorting technique of instrument signal

Publications (2)

Publication Number Publication Date
CN103761965A CN103761965A (en) 2014-04-30
CN103761965B true CN103761965B (en) 2016-05-25

Family

ID=50529193

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410008533.4A Expired - Fee Related CN103761965B (en) 2014-01-09 2014-01-09 A kind of sorting technique of instrument signal

Country Status (1)

Country Link
CN (1) CN103761965B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105096987B (en) * 2015-06-01 2019-01-15 努比亚技术有限公司 A kind of processing method and terminal of audio data
CN105070301B (en) * 2015-07-14 2018-11-27 福州大学 A variety of particular instrument idetified separation methods in the separation of single channel music voice
CN108962279A (en) * 2018-07-05 2018-12-07 平安科技(深圳)有限公司 New Method for Instrument Recognition and device, electronic equipment, the storage medium of audio data
CN110310666B (en) * 2019-06-27 2021-07-23 成都潜在人工智能科技有限公司 Musical instrument identification method and system based on SE convolutional network
CN110657882B (en) * 2019-09-23 2021-07-27 暨南大学 Bridge real-time safety state monitoring method utilizing single-measuring-point response
CN111461201B (en) * 2020-03-30 2023-09-19 重庆大学 Sensor data classification method based on phase space reconstruction
CN111681674B (en) * 2020-06-01 2024-03-08 中国人民大学 Musical instrument type identification method and system based on naive Bayesian model
CN114724533A (en) * 2022-02-25 2022-07-08 杭州小伴熊科技有限公司 Musical composition extraction method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101092228B1 (en) * 2009-12-21 2011-12-12 세종대학교산학협력단 System and method for recognizing instrument to classify signal source

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
A study on traditional Malay musical instruments sounds classification system; Norhalina Senan et al.; Proceedings of the 11th International Conference on Information Integration and Web-based Applications & Services; 2009-12-31; pp. 729-733 *
Instrument Identification in Monophonic Music Using Spectral Information; Mizuki Ihara et al.; 2007 IEEE International Symposium on Signal Processing and Information Technology; 2007-12-31; pp. 595-599 *
Research on musical instrument recognition based on acoustic features; Lin Yuzhi (林玉志); China Master's Theses Full-text Database, Information Science and Technology; 2013-01-15 (No. 1); full text *
A survey of musical instrument recognition based on acoustic features; Deng Jianguang (邓见光) et al.; Journal of Dongguan University of Technology (东莞理工学院学报); 2012-06-30; Vol. 19, No. 3, pp. 58-64 *
Research on methods for musical instrument recognition in audio-based digital media; Hang Xiaobing (杭小兵); China Master's Theses Full-text Database, Information Science and Technology; 2011-12-15 (No. S1); full text *

Also Published As

Publication number Publication date
CN103761965A (en) 2014-04-30

Similar Documents

Publication Publication Date Title
CN103761965B (en) A kind of sorting technique of instrument signal
Lidy et al. CQT-based Convolutional Neural Networks for Audio Scene Classification.
CN111754988B (en) Sound scene classification method based on attention mechanism and double-path depth residual error network
CN112562741B (en) Singing voice detection method based on dot product self-attention convolution neural network
KR102198273B1 (en) Machine learning based voice data analysis method, device and program
CN109408660B (en) Music automatic classification method based on audio features
Massoudi et al. Urban sound classification using CNN
CN103544963A (en) Voice emotion recognition method based on core semi-supervised discrimination and analysis
CN110633725A (en) Method and device for training classification model and classification method and device
CN111400540B (en) Singing voice detection method based on extrusion and excitation residual error network
CN112750442B (en) Crested mill population ecological system monitoring system with wavelet transformation and method thereof
CN114237046B (en) Partial discharge pattern recognition method based on SIFT data feature extraction algorithm and BP neural network model
Ozkok et al. Convolutional neural network analysis of recurrence plots for high resolution melting classification
Benamer et al. Database for arabic speech commands recognition
Naranjo-Alcazar et al. On the performance of residual block design alternatives in convolutional neural networks for end-to-end audio classification
CN112735442B (en) Wetland ecology monitoring system with audio separation voiceprint recognition function and audio separation method thereof
CN105006231A (en) Distributed large population speaker recognition method based on fuzzy clustering decision tree
CN117976006A (en) Audio processing method, device, computer equipment and storage medium
CN114048770B (en) Automatic detection method and system for digital audio deletion and insertion tampering operation
CN115472179A (en) Automatic detection method and system for digital audio deletion and insertion tampering operation
CN112687280B (en) Biodiversity monitoring system with frequency spectrum-time space interface
JP2002062892A (en) Acoustic classifying device
Hebbar et al. A Comparison of Audio Preprocessing Techniques and Deep Learning Algorithms for Raga Recognition
CN107492384B (en) Voice emotion recognition method based on fuzzy nearest neighbor algorithm
DURDAG et al. A New Genre Classification with the Colors of Music

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160525

Termination date: 20190109