CN106782583A

CN106782583A - Robust scale contour feature extraction algorithm based on nuclear norm

Info

Publication number: CN106782583A
Application number: CN201611132721.3A
Authority: CN
Inventors: 李锵; 王蒙蒙; 关欣
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2016-12-09
Filing date: 2016-12-09
Publication date: 2017-05-31
Anticipated expiration: 2036-12-09
Also published as: CN106782583B

Abstract

The invention discloses a robust scale contour feature extraction algorithm based on the nuclear norm. Step 1: the input music signal is converted to a standard format. Step 2: the music signal is windowed and Fourier transformed to obtain its time-frequency matrix, and the initial beat points are determined. Step 3: a nuclear norm constraint is applied to the rank of the time-frequency matrix to make the spectrum low-rank, while a 1-norm constrains the noise points in the matrix; the signal spectrum is thus made low-rank and denoised by solving a convex optimization problem. Step 4: during the iterative constraint process, the low-rank property of the spectrum is used to implement a threshold adaptive adjustment algorithm. Step 5: effective dimensionality reduction is applied to the time-frequency matrix to obtain a 12-dimensional chord feature. Compared with the prior art, the invention extracts robust chord features, significantly reduces the running time of the algorithm, and can accurately recover the scale contour features of music signals of different types and styles.

Description

Robust scale contour feature extraction algorithm based on nuclear norm

Technical field

The invention belongs to the field of audio signal analysis in computer audio systems, and more particularly relates to a scale contour feature extraction algorithm.

The method provided by the present invention can accurately recover the scale contour features of music signals of different types and styles.

Background art

The harmonic components of music are an important element of music and an important topic in music information retrieval. The fundamental frequencies of an audio signal and their harmonics are the components that constitute chords and shape the timbre of the music. Moreover, the evolution of these frequency components over time is the key factor constituting a chord progression. Intuitively, within the duration of a chord the music exhibits a certain structure in the frequency domain, namely a low-rank property. Chord feature extraction is part of audio signal analysis in computer audio systems, a field mainly concerned with processing the various kinds of information separated from sound signals. The chord feature of music is also the basis for extracting higher-level music information.

Mid-level features of music are features extracted from the audio signal that represent the information in that signal and can ultimately serve as components of high-level features. In recent years, many researchers have proposed mid-level features for characterizing music, of which the most widely used is the pitch class profile (Pitch Class Profiles, PCP). However, because the original music signal contains vocals, drum beats, plosives and Gaussian noise, the quality of the PCP feature depends strongly on the type of music signal being analyzed. Many improvements to PCP have been proposed, for example HPCP (Harmonic PCP) by Gomez and EPCP (Enhanced PCP) by Lee. These schemes all start from changing which components are extracted in the frequency domain, and thereby obtain better performance for features of specific music types.

In addition, from the standpoint of chord progression, each chord lasts for a certain duration, and the stability of the PCP feature over that duration determines the accuracy of chord recognition. Many researchers have proposed improvements to the PCP-based representation of chord progression, the chromagram. Fujishima assumed that a chord lasts for several frames and applied a sliding-window mean filter, reducing the influence of noise and avoiding frequent chord changes; Geoffroy Peeters used a sliding-window median filter to avoid frequent chord changes; Bello assumed that the chord does not change within a beat and used beat synchronization to avoid frequent chord changes.
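The sliding-window filters and beat synchronization mentioned above can be illustrated with a short NumPy sketch. It assumes the chromagram is a 12 x T array and that beat positions are given as frame indices; the function names `smooth_chroma` and `beat_sync` and the window width are illustrative only and are not taken from the cited works.

```python
import numpy as np

def smooth_chroma(chroma, width=9, mode="mean"):
    """Smooth a 12 x T chromagram along time with a sliding window."""
    if width % 2 == 0:
        raise ValueError("width must be odd")
    half = width // 2
    # Edge-pad the time axis so the output keeps T frames.
    padded = np.pad(chroma, ((0, 0), (half, half)), mode="edge")
    # windows has shape (12, T, width): one window per frame.
    windows = np.lib.stride_tricks.sliding_window_view(padded, width, axis=1)
    if mode == "mean":
        return windows.mean(axis=-1)
    if mode == "median":
        return np.median(windows, axis=-1)
    raise ValueError("mode must be 'mean' or 'median'")

def beat_sync(chroma, beat_frames):
    """Average the frames between consecutive beat positions (frame indices)."""
    bounds = [0, *beat_frames, chroma.shape[1]]
    segments = [chroma[:, a:b].mean(axis=1)
                for a, b in zip(bounds[:-1], bounds[1:]) if b > a]
    return np.stack(segments, axis=1)
```

The median filter removes isolated single-frame spikes entirely, whereas the mean filter spreads them over neighbouring frames; beat synchronization replaces each inter-beat segment with a single averaged column.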

Most beat-tracking models consist of two parts: note onset detection and periodicity extraction from the onset strength curve. Whatever the model, the basic goal of onset detection is to select the valid peaks of the onset curve, which is essentially a clustering problem of deciding whether each extreme point is a beat point.

It can be seen that most chord feature extraction schemes do not consider the structure that music signals exhibit in the spectrum; instead, they rely on known assumptions and optimize the chord feature with relatively simple processing methods.

Summary of the invention

In view of the prior art, the present invention proposes a robust scale contour feature extraction algorithm based on the nuclear norm. The chord feature extraction problem is converted into a convex optimization problem with a nuclear norm constraint and a 1-norm constraint; in addition, the low-rank property exhibited by the chord spectrum is used to implement a threshold adaptive algorithm.

The robust scale contour feature extraction algorithm based on the nuclear norm according to the invention comprises the following steps:

Step 1: the input music signal is converted to standard audio with a sample rate of 22050 Hz, 16-bit depth and a single channel, which serves as the reference audio signal x(n), where n is the number of data points contained in the converted audio signal;

Step 2: the music signal x(n) is windowed with a window function W(k), where k is the window width, yielding the time-domain signal matrix $X_{k \times m}$, where $X_{\cdot,m} = x(km/2 : km/2 + m)\,W(k)$ and m is the number of frames obtained after framing; a Fourier transform is then applied to obtain the time-frequency matrix of the music signal, $D = FX$, where F is the Fourier transform matrix;

Step 3: the harmonic components and the noise contained in the audio signal spectrum are assumed to be mutually independent, i.e. $D = A + E$, where the matrix A is formed by the harmonic components contained in the spectral matrix and E is formed by the noise components contained in the spectral matrix; under this assumption, recovering the harmonic matrix A reduces to the following convex optimization problem:

$$\min_{A,E} \|A\|_* + \lambda \|E\|_1$$

$$\text{s.t. } A + E = D$$

where $\|\cdot\|_*$ denotes the nuclear norm of a matrix, i.e. the sum of its singular values, and $\|\cdot\|_1$ denotes the 1-norm of a matrix, i.e. the sum of the absolute values of all its nonzero elements; the separated matrix A is the low-rank spectrum, the matrix E contains the sparse large-magnitude noise and other non-harmonic components, and D is the spectrum of the original music signal;

Step 4: during the iterative constraint process, the low-rank property of the spectrum is used to implement a threshold adaptive adjustment algorithm. The specific steps are as follows. Initialize the singular value truncation threshold parameter $\mu$, the parameter $\lambda$, the iteration index $k = 0$, the temporary matrix $Y_0 = D$, and $E_0$ as the all-zero matrix. Perform a singular value decomposition to obtain the singular value matrix $\Sigma$. Then choose 20 equally spaced data points, indexed by $1 \le i \le 20$, from $\mu_k$ to $1.5\mu_k$, and for each of them perform the inverse singular value decomposition operation to reconstruct a candidate matrix. Because the harmonic components are distributed over only a few frequency points, compute the variance of a given column of each candidate matrix, select the index i for which the variance is largest, and take the corresponding data point as the threshold parameter; this completes the threshold adaptive selection algorithm. Compute the matrix $A_{k+1}$ obtained in this step, update $E_{k+1}$ and $Y_{k+1} = Y_k + \mu_k (D - A_{k+1} - E_{k+1})$, and set $k = k + 1$; repeat until convergence.

Step 5: effective dimensionality reduction is applied to the time-frequency matrix to obtain a 12-dimensional chord feature. By convention, the frequency of the note $A_0$, 440 Hz, is taken as the reference frequency, and the frequencies of the other notes are obtained from it, where b is the interval between a note and $A_0$. Each frequency component of the harmonic matrix A is then mapped by a mapping equation to obtain the robust scale contour feature vector, where x is the frequency value corresponding to each row of the matrix A and $f_{ref}$ is the reference frequency used in the mapping.
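As an illustration of Step 5, the following sketch uses the standard twelve-tone equal temperament relation and the pitch-class mapping commonly used for PCP features. The formulas of the original text are not reproduced here, so both expressions, the function names, and the choice of C as $f_{ref}$ are assumptions rather than the patent's own definitions.

```python
import numpy as np

A_REF = 440.0  # reference frequency of the note called A0 in the text, in Hz

def note_frequency(b):
    """Frequency of the note b semitones away from the 440 Hz reference
    (assumed equal-temperament relation f = 440 * 2**(b / 12))."""
    return A_REF * 2.0 ** (b / 12.0)

def pitch_class(x, f_ref):
    """Map frequency x (Hz) to one of 12 pitch classes relative to f_ref
    (assumed mapping: round(12 * log2(x / f_ref)) mod 12)."""
    return int(np.round(12.0 * np.log2(x / f_ref))) % 12

# Assumption: f_ref is the C nine semitones below the reference A.
f_ref = note_frequency(-9)
```

With this choice, `pitch_class(440.0, f_ref)` and `pitch_class(880.0, f_ref)` fall in the same class, which is the octave invariance that reduces the spectrum to 12 dimensions.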

Compared with the prior art, the present invention effectively removes the damage done to chord structure by vocals and other noise without destroying the original frequency-domain structure of the music, and thus extracts robust chord features; it significantly reduces the running time of the algorithm; and it can accurately recover the scale contour features of music signals of different types and styles.

Brief description of the drawings

Fig. 1 is the overall flow chart of the present invention;

Fig. 2 shows chord progressions of different types;

Fig. 3 is a schematic comparison of the results of the present invention and other algorithms: (1) the original ALM algorithm; (2) the ASP-ALM algorithm; (3) the ASP algorithm.

Detailed description of the embodiments

The present invention is described in further detail below in conjunction with the accompanying drawings.

Step 1, music signal conversion: the input music signal is converted to standard audio with a sample rate of 22050 Hz, 16-bit depth and a single channel, which serves as the reference audio signal.
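The conversion of Step 1, together with the windowing and Fourier transform that produce the time-frequency matrix D, might be sketched as follows. This is a minimal NumPy illustration under stated assumptions: linear interpolation stands in for a proper resampler, the window is a Hann window with a hop of half the window width, and the magnitude spectrum is used as D. The function names and the default width are hypothetical.

```python
import numpy as np

TARGET_SR = 22050  # sample rate named in Step 1

def to_standard_audio(samples, sr):
    """Mix to mono, resample to 22050 Hz, quantise to 16-bit integers.

    `samples` is a float array in [-1, 1], shape (n,) or (n, channels).
    Linear interpolation is a simplification: a production resampler
    would low-pass filter first to avoid aliasing.
    """
    x = np.asarray(samples, dtype=np.float64)
    if x.ndim == 2:
        x = x.mean(axis=1)                      # single channel
    n_out = int(round(len(x) * TARGET_SR / sr))
    t_in = np.arange(len(x)) / sr
    t_out = np.arange(n_out) / TARGET_SR
    x = np.interp(t_out, t_in, x)               # resample
    return np.round(np.clip(x, -1.0, 1.0) * 32767).astype(np.int16)

def time_frequency_matrix(x, k=4096):
    """Frame x with a Hann window of width k and hop k/2, then apply the DFT.

    Returns the magnitude spectrum, shape (k // 2 + 1, number_of_frames).
    """
    x = np.asarray(x, dtype=np.float64)
    hop = k // 2
    n_frames = 1 + (len(x) - k) // hop
    if n_frames < 1:
        raise ValueError("signal shorter than one window")
    window = np.hanning(k)
    frames = np.stack([x[m * hop: m * hop + k] * window
                       for m in range(n_frames)], axis=1)   # k x m matrix X
    return np.abs(np.fft.rfft(frames, axis=0))               # D = |F X|
```

Each column of the result is the spectrum of one frame, so a sustained note appears as a row of large values, which is the low-rank structure the later steps exploit.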

Step 3, spectrum low-rank approximation and noise removal: as can be seen from the spectrum, a music signal mainly contains two kinds of components, harmonic components and sparse large-magnitude noise. The harmonic components exhibit an obvious low-rank structure, while the sparse large-magnitude noise is mainly characterized by sparsity. Therefore, the signal spectrum is made low-rank and denoised by solving the following convex optimization problem:

$$\min_{A,E} \|A\|_* + \lambda \|E\|_1$$

$$\text{s.t. } A + E = D$$

where $\|\cdot\|_*$ denotes the nuclear norm of a matrix, i.e. the sum of its singular values, and $\|\cdot\|_1$ denotes the 1-norm of a matrix, i.e. the sum of the absolute values of all its nonzero elements.

The separated matrix A is the low-rank spectrum, the matrix E contains the sparse large-magnitude noise and some other non-harmonic components, and D is the spectrum of the original music signal.
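To make the objective concrete, the sketch below evaluates $\|A\|_* + \lambda\|E\|_1$ for two candidate splits of a toy matrix. The toy data and the value $\lambda = 1/\sqrt{\max(m, n)}$ (a common default in the robust PCA literature) are assumptions chosen for illustration; the source does not state a value for $\lambda$.

```python
import numpy as np

def nuclear_norm(M):
    """Sum of the singular values of M."""
    return np.linalg.svd(M, compute_uv=False).sum()

def l1_norm(M):
    """Sum of the absolute values of all entries of M."""
    return np.abs(M).sum()

def rpca_objective(A, E, lam):
    """Value of ||A||_* + lam * ||E||_1 for a candidate split D = A + E."""
    return nuclear_norm(A) + lam * l1_norm(E)

# Toy spectrum: a rank-1 "harmonic" part plus one large sparse outlier.
A_true = np.outer([1.0, 0.0, 2.0, 0.0], np.ones(6))
E_true = np.zeros_like(A_true)
E_true[1, 3] = 5.0
D = A_true + E_true
lam = 1.0 / np.sqrt(max(D.shape))   # assumed default, not from the source

split_cost = rpca_objective(A_true, E_true, lam)          # structured split
trivial_cost = rpca_objective(D, np.zeros_like(D), lam)   # everything in A
```

Both splits satisfy the constraint A + E = D, but the structured split has the lower cost, which is why minimizing this objective favours moving isolated outliers into E.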

Step 4, PCP feature extraction:

(4-1) A mapping matrix P from the spectrum to the PCP feature is defined. In this matrix, $2\pi\omega_j$, $0 \le j \le N-1$, denotes the frequency value represented by each frequency component of the spectrum, and N is the number of frequency components in the spectrum; $f_i$, $1 \le i \le 12$, denotes the frequency value corresponding to each of the 12 scale degrees. The mapping function is derived from twelve-tone equal temperament and is therefore universal;

(4-2) The chord progression feature under the low-rank constraint, i.e. the RPCP feature, is obtained as $C = PA$.
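A mapping matrix of this kind and the product C = PA can be sketched as follows. The binary nearest-pitch-class construction, the frequency limits, the value of `f_ref` and the function names are assumptions made for illustration; the matrix form in the original text is not reproduced here.

```python
import numpy as np

def pcp_mapping_matrix(n_fft, sr=22050, f_ref=261.63, f_min=55.0, f_max=5000.0):
    """Build a 12 x N binary matrix P mapping rfft bins to pitch classes.

    P[i, j] = 1 when bin j (centre frequency j * sr / n_fft) falls in pitch
    class i under twelve-tone equal temperament. Bins outside
    [f_min, f_max] are ignored; those limits are illustrative.
    """
    freqs = np.arange(n_fft // 2 + 1) * sr / n_fft
    P = np.zeros((12, freqs.size))
    valid = (freqs >= f_min) & (freqs <= f_max)
    classes = np.round(12.0 * np.log2(freqs[valid] / f_ref)).astype(int) % 12
    P[classes, np.flatnonzero(valid)] = 1.0
    return P

def rpcp(A, P):
    """Project the (low-rank) magnitude spectrum A onto 12 pitch classes: C = P A."""
    return P @ np.abs(A)
```

For a spectrum A of shape (N, T), `rpcp(A, pcp_mapping_matrix(n_fft))` returns a 12 x T matrix whose columns are the per-frame feature vectors.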

The present invention uses the test database (Practice Data) for automatic chord recognition from the Music Information Retrieval Evaluation eXchange (MIREX), comprising 20 music clips of different styles and rhythms; for each clip, 39 or 40 experts manually annotated the chord types.

To verify the effectiveness of the proposed algorithm, the influence of the proposed nuclear-norm-based robust scale contour feature algorithm on chord progression was compared with that of existing popular algorithms. The smoothness of the main scale degree during chord progression was used to describe the different algorithms quantitatively and thereby judge their influence on chord progression. The results are shown in Fig. 2. The experimental results show that, compared with the other algorithms, the proposed algorithm smooths the chord progression on the main scale degree more effectively, so that chords remain stable over a period of time and change less frequently, which provides guidance for chord recognition over the entire song.

In addition, to verify the noise suppression of the algorithm and its influence on chord recognition accuracy, a template matching algorithm was used, with the widely used Harmonic PCP feature as the baseline for comparison. The experimental results are shown in Table 1. They show that the chord recognition accuracy obtained with the proposed algorithm is 9% higher than that of Harmonic PCP.
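The source does not describe the template matching algorithm in detail. A common form, sketched here as an assumption, compares each 12-dimensional feature vector with binary templates for the 12 major and 12 minor triads by cosine similarity; the names `triad_templates` and `match_chords` are hypothetical.

```python
import numpy as np

NOTE_NAMES = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]

def triad_templates():
    """Binary 12-dimensional templates for 12 major and 12 minor triads."""
    names, rows = [], []
    for root in range(12):
        for suffix, third in (("", 4), ("m", 3)):   # major / minor third
            t = np.zeros(12)
            t[[root, (root + third) % 12, (root + 7) % 12]] = 1.0
            names.append(NOTE_NAMES[root] + suffix)
            rows.append(t / np.linalg.norm(t))
    return names, np.array(rows)

def match_chords(C):
    """Label each column of the 12 x T feature matrix C with the best triad."""
    names, T = triad_templates()
    norms = np.linalg.norm(C, axis=0, keepdims=True)
    scores = T @ (C / np.where(norms > 0, norms, 1.0))   # cosine similarity
    return [names[i] for i in scores.argmax(axis=0)]
```

Pitch class 0 is taken to be C here, so the feature matrix passed in must use the same ordering.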

Table 1. Comparison of average chord recognition rates based on robust PCP (RPCP) and HPCP

Major triad	Ab	A	Bb	B	C	Db	D	Eb	E	F	Gb	G
RPCP (%)	76.1	80	76.6	69.0	76.1	71.8	80.4	72.9	79.6	77.6	73.3	63.6
HPCP (%)	73	78	63.8	66.7	71.7	69.2	78.4	64.6	71.4	61.2	68.9	63.6
Minor triad	Abm	Am	Bbm	Bm	Cm	Dbm	Dm	Ebm	Em	Fm	Gbm	Gm
RPCP (%)	84.8	74	69	63.4	88.2	87	75	43.6	65.2	80.4	76.3	66.7
HPCP (%)	72.7	73.7	67.9	58.5	74.5	85.7	73.5	41	65.2	67.9	63.2	56.4

The nuclear-norm-constrained low-rank convex optimization problem is usually solved with the augmented Lagrange multiplier (ALM) method, which is widely applied when the input is a sparse matrix. However, as the matrix dimension increases, the running time grows considerably.

Based on the particular characteristics of chord features, the present invention proposes a threshold adaptive adjustment algorithm, the ASP-ALM (Adaptive Selecting Parameter Augmented Lagrange Multiplier) algorithm. The algorithm proceeds as follows. Initialize the singular value truncation threshold parameter $\mu$, the parameter $\lambda$, the iteration index $k = 0$, the temporary matrix $Y_0 = D$, and $E_0$ as the all-zero matrix. Perform a singular value decomposition to obtain the singular value matrix $\Sigma$. Then choose 20 equally spaced data points, indexed by $1 \le i \le 20$, from $\mu_k$ to $1.5\mu_k$, and for each of them perform the inverse singular value decomposition operation to reconstruct a candidate matrix. Because the harmonic components are distributed over only a few frequency points, compute the variance of a given column of each candidate matrix, select the index i for which the variance is largest, and take the corresponding data point as the threshold parameter; this completes the threshold adaptive selection algorithm. Compute the matrix $A_{k+1}$ obtained in this step, update $E_{k+1}$ and $Y_{k+1} = Y_k + \mu_k (D - A_{k+1} - E_{k+1})$, and set $k = k + 1$; repeat until convergence.
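The loop described above might look as follows in NumPy. Because several formulas of the original are not reproduced in the text, this sketch fills them with the standard inexact ALM updates for the problem $\min \|A\|_* + \lambda\|E\|_1$ s.t. $A + E = D$: singular values are soft-thresholded at $1/\mu$, and E is soft-thresholded at $\lambda/\mu$. Those updates, the default values of `lam` and `mu`, the variance score, and the names `shrink` and `asp_alm` are all assumptions, not the patent's confirmed procedure.

```python
import numpy as np

def shrink(M, tau):
    """Soft-thresholding: move every entry of M toward zero by tau."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def asp_alm(D, lam=None, mu=None, col=None, n_candidates=20,
            tol=1e-6, max_iter=200):
    """Sketch of ALM for D = A + E with an adaptively selected parameter mu.

    D must be a nonzero real matrix. `col` selects the column whose
    variance scores a candidate; None averages over all columns.
    """
    D = np.asarray(D, dtype=np.float64)
    if lam is None:
        lam = 1.0 / np.sqrt(max(D.shape))          # assumed default
    if mu is None:
        mu = 1.25 / np.linalg.norm(D, 2)           # assumed default
    Y = D.copy()                                   # Y_0 = D, as in the text
    E = np.zeros_like(D)                           # E_0 = all-zero matrix
    A = np.zeros_like(D)
    d_norm = np.linalg.norm(D)
    for k in range(max_iter):
        U, s, Vt = np.linalg.svd(D - E + Y / mu, full_matrices=False)
        best_var, best_A, best_mu = -1.0, A, mu
        # 20 equally spaced candidates between mu_k and 1.5 * mu_k.
        for c in np.linspace(mu, 1.5 * mu, n_candidates):
            # "Inverse SVD": rebuild from thresholded singular values.
            A_c = (U * np.maximum(s - 1.0 / c, 0.0)) @ Vt
            cols = A_c if col is None else A_c[:, [col]]
            var = cols.var(axis=0).mean()
            if var >= best_var:                    # ties go to the larger candidate
                best_var, best_A, best_mu = var, A_c, c
        A, mu = best_A, best_mu
        E = shrink(D - A + Y / mu, lam / mu)
        residual = D - A - E
        Y = Y + mu * residual
        if np.linalg.norm(residual) <= tol * d_norm:
            break
    return A, E, k + 1
```

With `col=None` the score is non-decreasing in the candidate value, so the largest candidate is always selected and $\mu$ grows by a factor of 1.5 per iteration; passing a specific column index makes the selection depend on the data, which is closer to the "given column" wording of the text.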

The flow of the adaptive algorithm is shown in Fig. 1, where $\mu$ represents the degree of matrix recovery in the ALM algorithm. The ASP-ALM algorithm greatly reduces the time consumed by the ALM algorithm during chord feature extraction.

The test results are compared in Fig. 3, from which it can be clearly seen that the time consumption is greatly reduced.

Claims

1. A robust scale contour feature extraction algorithm based on the nuclear norm, characterized in that the algorithm comprises the following steps:

Step (1): the input music signal is converted to standard audio with a sample rate of 22050 Hz, 16-bit depth and a single channel, which serves as the reference audio signal x(n), where n is the number of data points contained in the converted audio signal;

Step (2): the music signal x(n) is windowed with a window function W(k), where k is the window width, yielding the time-domain signal matrix $X_{k \times m}$, where $X_{\cdot,m} = x(km/2 : km/2 + m)\,W(k)$ and m is the number of frames obtained after framing; a Fourier transform is then applied to obtain the time-frequency matrix of the music signal, $D = FX$, where F is the Fourier transform matrix;

Step (3): the harmonic components and the noise contained in the audio signal spectrum are assumed to be mutually independent, i.e. $D = A + E$, where the matrix A is formed by the harmonic components contained in the spectral matrix and E is formed by the noise components contained in the spectral matrix; under this assumption, recovering the harmonic matrix A reduces to the following convex optimization problem:

$$\min_{A,E} \|A\|_* + \lambda \|E\|_1$$

$$\text{s.t. } A + E = D$$

where $\|\cdot\|_*$ denotes the nuclear norm of a matrix, i.e. the sum of its singular values, and $\|\cdot\|_1$ denotes the 1-norm of a matrix, i.e. the sum of the absolute values of all its nonzero elements; the separated matrix A is the low-rank spectrum, the matrix E contains the sparse large-magnitude noise and other non-harmonic components, and D is the spectrum of the original music signal;

Step (4): during the iterative constraint process, the low-rank property of the spectrum is used to implement a threshold adaptive adjustment algorithm. The specific steps are as follows. Initialize the singular value truncation threshold parameter $\mu$, the parameter $\lambda$, the iteration index $k = 0$, the temporary matrix $Y_0 = D$, and $E_0$ as the all-zero matrix. Perform a singular value decomposition to obtain the singular value matrix $\Sigma$. Then choose 20 equally spaced data points, indexed by $1 \le i \le 20$, from $\mu_k$ to $1.5\mu_k$, and for each of them perform the inverse singular value decomposition operation to reconstruct a candidate matrix. Because the harmonic components are distributed over only a few frequency points, compute the variance of a given column of each candidate matrix, select the index i for which the variance is largest, and take the corresponding data point as the threshold parameter; this completes the threshold adaptive selection algorithm. Compute the matrix $A_{k+1}$ obtained in this step, update $E_{k+1}$ and $Y_{k+1} = Y_k + \mu_k (D - A_{k+1} - E_{k+1})$, and set $k = k + 1$; repeat until convergence.

Step (5): effective dimensionality reduction is applied to the time-frequency matrix to obtain a 12-dimensional chord feature. By convention, the frequency of the note $A_0$, 440 Hz, is taken as the reference frequency, and the frequencies of the other notes are obtained from it, where b is the interval between a note and $A_0$. Each frequency component of the harmonic matrix A is then mapped by a mapping equation to obtain the robust scale contour feature vector, where x is the frequency value corresponding to each row of the matrix A and $f_{ref}$ is the reference frequency used in the mapping.

2. The robust scale contour feature extraction algorithm based on the nuclear norm according to claim 1, characterized in that the threshold adaptive adjustment algorithm comprises the following steps:

Initialize the singular value truncation threshold parameter $\mu$, the parameter $\lambda$, the iteration index $k = 0$, the temporary matrix $Y_0 = D$, and $E_0$ as the all-zero matrix. Perform a singular value decomposition to obtain the singular value matrix $\Sigma$. Then choose 20 equally spaced data points, indexed by $1 \le i \le 20$, from $\mu_k$ to $1.5\mu_k$, and for each of them perform the inverse singular value decomposition operation to reconstruct a candidate matrix. Because the harmonic components are distributed over only a few frequency points, compute the variance of a given column of each candidate matrix, select the index i for which the variance is largest, and take the corresponding data point as the threshold parameter; this completes the threshold adaptive selection algorithm. Compute the matrix $A_{k+1}$ obtained in this step, update $E_{k+1}$ and $Y_{k+1} = Y_k + \mu_k (D - A_{k+1} - E_{k+1})$, and set $k = k + 1$; repeat until convergence.