A song clustering method based on an iterative k-means algorithm
Technical field
The present invention relates to the technical field of data processing, and specifically to a song clustering method based on an iterative k-means algorithm.
Background art
In the current Internet era, large music portal websites possess libraries of massive amounts of data. Users often wish to search for songs that are similar to, or belong to the same category as, songs they already like. Traditional search engines are suitable only when the user can clearly specify information about the target song; they cannot search according to the similarity of the emotions that songs evoke in the listener. The traditional approach of classifying music by manually assigned labels is inefficient, cannot cope with massive amounts of data, and manual labels have low accuracy. The music portal websites of the existing environment therefore lack an efficient and accurate method for categorizing massive numbers of songs, so users cannot quickly and conveniently find songs of potential interest based on their similarity to songs they already like.
The Chinese invention patent with Authorization Notice No. CN104077598B, entitled "An emotion recognition method based on voice fuzzy clustering", discloses an emotion recognition method based on fuzzy clustering of speech, comprising: pre-processing the input speech signal; extracting characteristic information from the processed speech signal (the characteristic information including mel cepstrum coefficients (MFCC), fundamental tone, formants, and short-time energy); grouping multiple classes of emotion, and choosing the corresponding characteristic information according to each grouped emotion type; performing classification separately according to the characteristic information chosen for each emotion-class combination; and performing speech emotion recognition according to the classification output of each emotion-class combination. The beneficial effect of that invention is that, by choosing different features for different emotions, its improved adaptive fuzzy k-means clustering method achieves a much better recognition effect and a higher recognition rate than the traditional FCM method, which uses the same features for all emotions.
Mel-frequency cepstrum coefficients (MFCC) reflect the pitch-perception characteristics of the human ear and are a characteristic parameter widely used in the field of speech processing. Combining the MFCC with its first-order and second-order differences captures both the static and the dynamic characteristics of a music signal, improving recognition performance on audio.
The k-means algorithm is an indirect clustering method based on a similarity measure between samples and belongs to the class of unsupervised learning methods. The k-means algorithm receives an input quantity k and divides n data objects into k clusters such that the similarity between objects within the same cluster is high, while the similarity between objects in different clusters is low. Cluster similarity is computed using a "center object" (center of attraction, or cluster center) obtained as the mean of the objects in each cluster. The k-means algorithm works as follows: first, k objects are arbitrarily selected from the n data objects as initial cluster centers; each remaining object is then assigned, according to its Euclidean distance (similarity) to these cluster centers, to the cluster represented by the center most similar to it; next, the center of each newly formed cluster is recomputed as the mean of all objects in the cluster; this process is repeated until the criterion function converges. The mean squared error is generally used as the criterion function. The resulting k clusters have the following characteristics: each cluster is as compact as possible, and the clusters are separated from one another as far as possible.
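The standard k-means procedure described above can be sketched as follows. This is a minimal background illustration in Python with NumPy, not the implementation of the invention; all function and variable names are ours:

```python
import numpy as np

def kmeans(data, k, max_iter=100, seed=0):
    """Standard k-means: assign each point to the nearest center,
    then recompute each center as the mean of its cluster."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(max_iter):
        # Euclidean distance from every point to every center
        dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            data[labels == i].mean(axis=0) if np.any(labels == i) else centers[i]
            for i in range(k)
        ])
        if np.allclose(new_centers, centers):  # centers stopped moving
            break
        centers = new_centers
    return labels, centers
```

The loop terminates either when the centers converge or when the iteration limit is reached, matching the stopping rule used throughout this document.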
Summary of the invention
The purpose of the present invention is to provide a song clustering method based on an iterative k-means algorithm. Based on the iterative k-means algorithm, MFCCs are obtained and used as acoustic features to cluster songs efficiently and accurately by emotional category, realizing the function of automatically classifying massive numbers of songs in a song library by emotional category.
The present invention is achieved through the following technical solutions: a song clustering method based on an iterative k-means algorithm, in which the MFCC vectors of the songs in the song library are extracted as acoustic features, and the iterative k-means algorithm is used to perform emotion recognition and emotion categorization on the songs.
In order to better realize the present invention, further, emotion recognition and emotion categorization are performed on the songs with two passes of the iterative k-means algorithm.
A song clustering method based on an iterative k-means algorithm specifically includes the following steps:
Step S1: extract the mel-frequency cepstrum coefficients of each song in the song library in segments, obtaining the MFCC vector of each frame of each segment of each song;
Step S2: mark all the MFCC vectors from step S1 with song IDs, and assemble the song-membership information of each MFCC vector into a data set;
Step S3: perform a first k-means clustering with the data set from step S2 as input data, obtaining the cluster to which each MFCC vector belongs, i.e. the clustering result of the MFCC vectors;
Step S4: from the proportion of each song's MFCC vectors falling in each cluster, generate a K-dimensional label vector corresponding to that song;
Step S5: perform a second k-means clustering with the K-dimensional label vectors of all songs in the song library as input data, obtaining the song classification result.
In order to better realize the present invention, further, the specific steps of step S1 include:
Step S1-1: pre-process all songs in the song library by extracting each song's opening period, climax period, and coda period to generate three WAV-format files as the representative parts of the song, and mark each song's representative parts with its song ID;
Step S1-2: read and process all the WAV-format audio files with Python's scipy library, obtaining the signal data and the sampling frequency;
Step S1-3: successively apply pre-emphasis, framing, windowing, discrete Fourier transform, Mel filtering, taking the logarithm, discrete cosine transform, and first-order differencing to the signal data, and combine the MFCC coefficients of each frame with their first-order difference to obtain the MFCC vector of that frame.
In order to better realize the present invention, further, step S1-3 specifically includes the following steps:
Step S1-3-1: apply pre-emphasis to the original signal extracted in step S1-2 to boost the high-frequency part of the signal, making the spectrum flatter while emphasizing the high-frequency formants;
Step S1-3-2: divide the signal into frames of 256 sampling points each, with an overlapping region between adjacent frames;
Step S1-3-3: multiply each frame by a Hamming window to increase the continuity at the left and right ends of the frame;
Step S1-3-4: apply the discrete Fourier transform to convert the signal from the time domain to the frequency domain, obtaining the amplitude spectrum for subsequent analysis;
Step S1-3-5: filter the amplitude spectrum with a Mel-scale filter bank, multiplying and accumulating within each band to obtain the energy of the frame in the frequency band of each filter, i.e. filtering and simplifying the frequency-domain amplitudes;
Step S1-3-6: take the natural logarithm of the energy values, converting them to match the nonlinear relationship of human auditory perception of sound;
Step S1-3-7: feed the logarithmic energies into the discrete cosine transform to separate out and remove redundant data, yielding the MFCC coefficients;
Step S1-3-8: perform dynamic difference parameter extraction on the MFCC coefficients obtained in step S1-3-7 to obtain their first-order difference, and splice the static and dynamic characteristics together to obtain the final MFCC vector.
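The extraction pipeline of steps S1-3-1 to S1-3-8 can be sketched end to end as follows. This is a simplified NumPy illustration under common MFCC conventions (triangular Mel filters, 13 static coefficients plus their 13 first-order differences, giving a 26-dimensional vector per frame); the function names, the filter-bank construction details, and the padding of the difference at the edges are our assumptions, not specified by the invention:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_vectors(signal, rate, frame_len=256, overlap=128,
                 n_filters=26, n_ceps=13, beta=0.95):
    """Steps S1-3-1..S1-3-8 in order: pre-emphasis, framing, Hamming
    window, DFT, Mel filtering, log, DCT, first-order difference."""
    # S1-3-1: pre-emphasis  S'(n) = S(n) - beta * S(n-1)
    emph = np.append(signal[0], signal[1:] - beta * signal[:-1])
    # S1-3-2: frames of 256 samples with an overlapping region
    step = frame_len - overlap
    n_frames = 1 + (len(emph) - frame_len) // step
    frames = np.stack([emph[i*step:i*step+frame_len] for i in range(n_frames)])
    # S1-3-3: Hamming window
    frames = frames * np.hamming(frame_len)
    # S1-3-4: amplitude spectrum via the DFT
    mag = np.abs(np.fft.rfft(frames, frame_len))
    # S1-3-5: triangular Mel-scale filter bank, one energy per band
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(rate / 2.0), n_filters + 2)
    bins = np.floor((frame_len + 1) * mel_to_hz(mel_pts) / rate).astype(int)
    fbank = np.zeros((n_filters, frame_len // 2 + 1))
    for i in range(n_filters):
        lo, ce, hi = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, lo:ce] = (np.arange(lo, ce) - lo) / max(ce - lo, 1)
        fbank[i, ce:hi] = (hi - np.arange(ce, hi)) / max(hi - ce, 1)
    energies = np.maximum(mag @ fbank.T, 1e-10)
    # S1-3-6: natural logarithm of the band energies
    log_e = np.log(energies)
    # S1-3-7: DCT  C(n) = sum_m S(m) cos(pi*n*(m-0.5)/M), n = 1..13
    m = np.arange(1, n_filters + 1)
    basis = np.cos(np.pi * np.arange(1, n_ceps + 1)[:, None]
                   * (m[None, :] - 0.5) / n_filters)
    ceps = log_e @ basis.T
    # S1-3-8: first-order difference, spliced onto the static part
    delta = np.gradient(ceps, axis=0)
    return np.hstack([ceps, delta])  # one 26-dim vector per frame
```

The document reads the WAV files with scipy; here the signal is assumed to be already loaded as a NumPy array, so only the per-frame processing is shown.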
In order to better realize the present invention, further, in step S1-1 the opening period of a song refers to seconds 0-15 of the song, the climax period refers to seconds 0-20 from the start of the climax, and the coda period refers to the first 15 seconds of the song's last 20 seconds.
In order to better realize the present invention, further, step S3 specifically includes the following steps:
Step S3-1: from all the MFCC vectors of the song library, choose K MFCC vectors that are mutually distant in Euclidean distance as the center vectors of the initial clusters;
Step S3-2: compute the Euclidean distance between each remaining MFCC vector in the song library (excluding the K chosen in step S3-1) and the center vector of each cluster, and assign each MFCC vector to the cluster whose center vector is nearest to it;
Step S3-3: compute the center vector of each of the K new clusters with the standard k-means step, i.e. take the average value of each dimension;
Step S3-4: repeat step S3-3 until the center vectors of the K clusters no longer change or the number of iterations is reached, obtaining the final clustering result.
Choosing K mutually distant MFCC vectors in step S3-1 specifically means selecting K MFCC vectors that are as far apart as possible. The detailed selection process is as follows: first randomly select one MFCC vector as the center vector of the first initial cluster; then select the MFCC vector farthest from the center vector of the first initial cluster as the center vector of the second initial cluster; then select as the center vector of the third initial cluster the MFCC vector whose minimum distance to the first two center vectors (the center vectors of the first and second initial clusters) is largest; and so on, until the center vectors of K initial clusters have been selected.
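The farthest-point selection just described can be sketched as follows. This is an illustrative NumPy implementation under our own naming; the fixed random seed for the first pick is our assumption:

```python
import numpy as np

def farthest_point_init(vectors, k, seed=0):
    """Choose k mutually distant vectors as initial cluster centers:
    pick one at random, then repeatedly pick the vector whose minimum
    distance to the already-chosen centers is largest."""
    rng = np.random.default_rng(seed)
    centers = [vectors[rng.integers(len(vectors))]]
    while len(centers) < k:
        # distance from every vector to its nearest already-chosen center
        d = np.min(
            [np.linalg.norm(vectors - c, axis=1) for c in centers], axis=0
        )
        centers.append(vectors[int(np.argmax(d))])
    return np.stack(centers)
```

This greedy spread-out initialization avoids the degenerate case where several randomly chosen initial centers fall into the same dense region.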
In order to better realize the present invention, further, step S4 specifically refers to calculating, according to the final clustering result of step S3, the proportion of the N MFCC vectors of each song in the song library that belongs to each of the K clusters, and generating a K-dimensional label vector from the calculated result;
wherein: N is the total number of MFCC vectors of a song, and K is the total number of clusters of the first k-means clustering.
In order to better realize the present invention, further, the number of clusters of the second k-means clustering in step S5 is not specified manually, but is determined automatically, based on the Canopy idea, from the number of songs in the song library. The specific implementation steps are as follows:
Step S5-1: set two distance thresholds D1 and D2, with D1 > D2;
Step S5-2: put the K-dimensional label vectors of all songs in the song library into a container, and establish a Canopy label vector set C waiting to be filled;
Step S5-3: randomly choose from the container a label vector p that does not belong to the Canopy label vector set C, and compute the Euclidean distance between p and each vector in C:
if C is empty, add p to C and proceed to the next step;
if the distance between p and every vector in C is greater than D1, add p to C as a new Canopy label vector;
if the distance between p and some vector in C is less than D1, assign p to the canopy centered on that vector;
Step S5-4: if the distance between p and some vector in C is less than D2, delete p from the container and proceed to the next step;
Step S5-5: repeat steps S5-3 and S5-4 until the container is empty; the K-dimensional label vectors then contained in the Canopy label vector set C are the K2 initial class center vectors of the second k-means clustering;
Step S5-6: start the second k-means clustering: compute the Euclidean distance between every label vector of the songs in the song library, other than the K2 class center vectors, and each of the K2 class center vectors, and assign each label vector to the cluster of the class center vector nearest to it;
Step S5-7: compute the class center vectors of the K2 new clusters with the standard k-means step of averaging each dimension;
Step S5-8: repeat steps S5-6 and S5-7 until the K2 class center vectors no longer change or the number of iterations is reached.
K2 is both the number of clusters of the second k-means clustering and the total number of categories of the final song classification. After the two k-means clusterings, the clustering result for all songs in the song library is obtained, grouping songs that are similar in auditory perception together.
Compared with the prior art, the present invention has the following advantages and beneficial effects:
(1) Based on the iterative k-means algorithm, the present invention obtains MFCC vectors and uses them as acoustic features to cluster songs efficiently and accurately by emotional category, realizing the function of automatically classifying massive numbers of songs in a song library by emotional category.
(2) The present invention clusters the songs in the song library with two passes of iterative k-means, grouping songs that are similar in auditory perception together, and solving the problem that manual classification is laborious and its accuracy is low.
Description of the drawings
Fig. 1 is the general flow chart of the song clustering method based on an iterative k-means algorithm of the present invention.
Fig. 2 is the MFCC vector extraction flow chart of the song clustering method based on an iterative k-means algorithm of the present invention.
Fig. 3 is the flow chart of the first k-means clustering in the song clustering method based on an iterative k-means algorithm of the present invention.
Fig. 4 is the flow chart of the second k-means clustering in the song clustering method based on an iterative k-means algorithm of the present invention.
Specific embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the specific operating workflow of the present invention is described in more detail below with reference to the embodiments. The exemplary embodiments of the invention and their explanations are only used to explain the present invention and are not intended as limitations of the invention.
Embodiment 1:
As shown in Figs. 1-4, a song clustering method based on an iterative k-means algorithm of the present embodiment specifically includes the following steps:
Step S1: extract the mel-frequency cepstrum coefficients of each song in the song library in segments, obtaining the MFCC vector of each frame of each segment of each song;
Step S2: mark all the MFCC vectors from step S1 with song IDs, and assemble the song-membership information of each MFCC vector into a data set;
Step S3: perform a first k-means clustering with the data set from step S2 as input data, obtaining the cluster to which each MFCC vector belongs, i.e. the clustering result of the MFCC vectors;
Step S4: from the proportion of each song's MFCC vectors falling in each cluster, generate a K-dimensional label vector corresponding to that song; each song has one and only one corresponding K-dimensional label vector;
Step S5: perform a second k-means clustering with the K-dimensional label vectors of all songs in the song library as input data, obtaining the song classification result.
The present invention uses MFCC coefficients as acoustic features and clusters the songs in the song library with two passes of iterative k-means. The first clustering operates on all MFCC vectors in the song library and extracts labels for the song library, i.e. it serves as the traditional feature-vector extraction process. The second clustering clusters the songs in the song library according to the processed label vectors, grouping songs of similar temperament in auditory perception together.
Embodiment 2:
Based on embodiment 1, the present embodiment specifies that in step S1 the mel-frequency cepstrum coefficients of the songs in the song library are extracted in segments to obtain a 26-dimensional MFCC vector for each frame of each segment of each song; i.e. the feature vectors of the first k-means clustering are 26-dimensional MFCC vectors. Meanwhile, in step S3, the total number of clusters K of the first k-means clustering is set to 50; i.e. the first k-means clustering extracts 50 representative vectors with symbolic characteristics from the MFCC vectors of the song library.
Based on the above specifications, a song clustering method based on an iterative k-means algorithm of this embodiment specifically includes steps S1 to S5.
Step S1: extract the mel-frequency cepstrum coefficients of each song in the song library in segments, obtaining the MFCC vector of each frame of each segment of each song.
Step S1 specifically refers to:
Step S1-1: pre-process all songs in the song library by extracting each song's opening period, climax period, and coda period to generate three WAV-format files as the representative parts of the song, and mark each song's representative parts with its song ID;
Step S1-2: read and process all the WAV-format audio files with Python's scipy library, obtaining the signal data and the sampling frequency;
Step S1-3: successively apply pre-emphasis, framing, windowing, discrete Fourier transform, Mel filtering, taking the logarithm, discrete cosine transform, and first-order differencing to the signal data, and combine the MFCC coefficients of each frame with their first-order difference to obtain the 26-dimensional MFCC vector of that frame.
Step S1-3 specifically includes steps S1-3-1 to S1-3-8.
Step S1-3-1: pre-emphasis. Specifically: apply pre-emphasis to the original signal data extracted in step S1-2 to boost the high-frequency part of the signal, making the spectrum flatter while emphasizing the high-frequency formants.
The pre-emphasis processing is realized by the following formula:
S'(n) = S(n) - β·S(n-1);
wherein, β is the pre-emphasis coefficient, and β is set to 0.95;
S'(n) is the pre-emphasized speech digital signal;
S(n) is the sampled speech digital signal.
With β set to 0.95, pre-emphasis subtracts 0.95 of the previous sample from each sample, acting as a first-order high-pass filter that attenuates the low frequencies and boosts the high frequencies.
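The pre-emphasis formula can be sketched in one vectorized NumPy operation. This is an illustrative implementation under our own naming; keeping the first sample unchanged is our assumption for the boundary:

```python
import numpy as np

def pre_emphasis(signal, beta=0.95):
    """First-order high-pass filter: S'(n) = S(n) - beta * S(n-1).
    The first sample is kept unchanged."""
    signal = np.asarray(signal, dtype=float)
    return np.append(signal[0], signal[1:] - beta * signal[:-1])
```

Applied to a constant (purely low-frequency) signal, the output after the first sample is close to zero, which is exactly the intended suppression of low frequencies.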
Step S1-3-2: framing. Specifically: divide the signal into frames of 256 sampling points each. To avoid excessive variation between adjacent frames, the two adjacent frames are allowed to share an overlapping region.
Step S1-3-3: windowing. Specifically: multiply each frame by a Hamming window to increase the continuity at the left and right ends of the frame.
To process a speech signal, it must be windowed; that is, only the data inside the window is processed at a time. Because an actual speech signal is very long, it cannot all be processed at once in practice. The sensible solution is to take one block of data at a time, analyze it, then take the next block and analyze that. In the present invention, a block of data is taken by constructing a Hamming window function; the Hamming window is a function shaped like a window, and such functions are called window functions.
The windowing is realized by the following formula:
W(n) = (1 - a) - a·cos(2πn/(N - 1)), n = 0, 1, ..., N - 1;
wherein, a is the window coefficient, and a takes 0.46;
N is the size of the frame.
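The window formula above, with a = 0.46, can be sketched as follows. This is an illustrative implementation with our own naming; NumPy's built-in `np.hamming` computes the same curve:

```python
import numpy as np

def hamming_window(frames, a=0.46):
    """Multiply each frame by the Hamming window
    W(n) = (1 - a) - a*cos(2*pi*n/(N - 1)), n = 0..N-1."""
    n_frames, N = frames.shape
    n = np.arange(N)
    w = (1 - a) - a * np.cos(2 * np.pi * n / (N - 1))
    return frames * w
```

The window tapers each frame toward 0.08 at both ends and leaves the middle near 1, which is what smooths the discontinuity at the frame boundaries.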
Step S1-3-4: discrete Fourier transform (implemented by FFT). Specifically: apply the discrete Fourier transform to convert the time-domain signal to the frequency domain for subsequent analysis.
Step S1-3-5: Mel filtering. Specifically: filter the amplitude spectrum obtained in S1-3-4 with each of the 26 filters, multiplying and accumulating within each band to obtain the energy of the frame in the frequency band of each filter; that is, filter with the Mel-scale filter bank to simplify the frequency-domain amplitudes.
Step S1-3-6: taking the logarithm. Specifically: take the natural logarithm of the energy values, converting them to match the nonlinear relationship of human auditory perception of sound.
Step S1-3-7: discrete cosine transform (DCT). Specifically: feed the logarithmic energies into the discrete cosine transform to separate out and remove redundant data, yielding the MFCC coefficients.
The discrete cosine transform is realized by the following formula:
C(n) = Σ_{m=1}^{M} S(m)·cos(πn(m - 0.5)/M), n = 1, 2, ..., L;
wherein, L is the MFCC coefficient order, and L takes 13;
M is the number of Mel filters;
C(n) is the MFCC coefficient;
m is the filter index;
S(m) is the logarithmic energy output by the filter bank.
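The DCT formula can be sketched directly as follows. This is an illustrative NumPy implementation of the equation as reconstructed above; the function name is ours:

```python
import numpy as np

def mfcc_from_log_energies(log_energies, L=13):
    """C(n) = sum_{m=1}^{M} S(m) * cos(pi*n*(m-0.5)/M), n = 1..L,
    where S(m) are the M log filter-bank energies of one frame."""
    log_energies = np.asarray(log_energies, dtype=float)
    M = len(log_energies)
    m = np.arange(1, M + 1)
    return np.array([
        np.sum(log_energies * np.cos(np.pi * n * (m - 0.5) / M))
        for n in range(1, L + 1)
    ])
```

A useful sanity check: a constant log-energy vector carries no spectral-shape information, and indeed every coefficient C(1)..C(13) it produces is zero.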
Step S1-3-8: compute the first-order difference and combine the MFCC coefficients with it. Specifically: perform dynamic difference parameter extraction on the MFCC coefficients obtained in S1-3-7 to obtain the first-order difference of the MFCC, and splice the static and dynamic characteristics together to obtain the final 26-dimensional MFCC vector.
The implementation formula is as follows:
d_t = (Σ_{k=1}^{K} k·(c_{t+k} - c_{t-k})) / (2·Σ_{k=1}^{K} k²);
wherein: d_t denotes the t-th first-order difference;
c_t denotes the t-th cepstrum coefficient;
c_{t+1} and c_{t-1} denote the (t+1)-th and (t-1)-th cepstrum coefficients;
c_{t+k} and c_{t-k} denote the (t+k)-th and (t-k)-th cepstrum coefficients;
Q denotes the order of the cepstrum coefficients;
k denotes the traversal index from 1 to K;
K denotes the time window of the first-order derivative, which can take the value 1 or 2.
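The difference formula can be sketched as follows. This is an illustrative NumPy implementation with K = 2; padding the sequence by repeating the edge frames is our assumption for the boundary frames:

```python
import numpy as np

def first_order_delta(ceps, K=2):
    """d_t = sum_{k=1}^{K} k*(c_{t+k} - c_{t-k}) / (2*sum_{k=1}^{K} k^2),
    computed per coefficient along the frame axis; edge frames are
    handled by repeating the first and last frame."""
    denom = 2 * sum(k * k for k in range((1), K + 1))
    padded = np.pad(ceps, ((K, K), (0, 0)), mode='edge')
    T = ceps.shape[0]
    delta = np.zeros_like(ceps, dtype=float)
    for k in range(1, K + 1):
        delta += k * (padded[K + k:K + k + T] - padded[K - k:K - k + T])
    return delta / denom
```

On a linearly increasing sequence of coefficients the formula returns a constant slope of 1 for every interior frame, which is the expected behavior of a first-order derivative estimate.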
Step S2: mark all the MFCC vectors from step S1 with song IDs, and assemble the song-membership information of each MFCC vector into a data set. Each frame's MFCC vector is marked with a song ID indicating which song in the song library it belongs to, and all the MFCC vectors in the song library are taken as the data set for the convenience of subsequent operations.
Step S3: perform the first k-means clustering with the data set from step S2 as input data, obtaining the cluster to which each MFCC vector belongs, i.e. the clustering result of the MFCC vectors. For the first k-means clustering, the value of K is set to 50, with the intention of extracting from the MFCC vectors of the song library 50 representative vectors with symbolic characteristics.
Step S3 specifically includes steps S3-1 to S3-4.
Step S3-1: from all the MFCC vectors of the song library, choose 50 mutually distant (in Euclidean distance) MFCC vectors as the center vectors of the initial clusters;
Step S3-2: compute the Euclidean distance between each remaining MFCC vector in the song library and the center vector of each cluster, and assign each MFCC vector to the cluster whose center vector is nearest to it;
Step S3-3: compute the center vectors of the 50 new clusters with the standard k-means step of averaging each dimension;
Step S3-4: repeat step S3-3 until the 50 cluster center vectors no longer change or the number of iterations is reached;
Step S4: according to the final clustering result of step S3, calculate the proportion of the N MFCC vectors of each song in the song library that belongs to each of the 50 clusters;
a 50-dimensional label vector with components n_i/N (i = 1, 2, ..., 50) is generated according to the calculated result.
Wherein: n_i is the number of the song's MFCC vectors belonging to the i-th cluster, i = 1, 2, ..., K;
N is the total number of MFCC vectors of the song;
K is the total number of clusters of the first k-means clustering.
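The construction of the label vector can be sketched as follows. This is an illustrative implementation; it assumes the first clustering has already assigned every frame of the song a cluster index in 0..K-1 (the function name is ours):

```python
import numpy as np

def song_label_vector(frame_clusters, k=50):
    """Given the cluster index of each of a song's MFCC frame vectors,
    return the K-dimensional label vector n_i / N: the proportion of
    the song's frames that fell into each of the K clusters."""
    frame_clusters = np.asarray(frame_clusters)
    counts = np.bincount(frame_clusters, minlength=k).astype(float)
    return counts / len(frame_clusters)
```

The components of the result sum to 1, so each song is summarized as a distribution over the 50 representative MFCC clusters, regardless of how many frames the song has.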
Step S5: perform the second k-means clustering with the 50-dimensional label vector of each song in the song library as the feature vector, determining the final number of song categories automatically from the number of songs in the song library based on the Canopy idea, and finally obtaining the song classification result.
Traditional clustering algorithms are feasible for general small-data applications, but as the amount of data grows geometrically, they become increasingly difficult to apply. The growth of data volume here refers to: 1. the data set contains many entries, i.e. many sample data vectors; 2. each sample data vector in 1. has a very high dimensionality, i.e. contains many attributes; 3. there are many center vectors to be clustered.
The Canopy idea in this application is used for the coarse clustering before k-means. When using k-means, the size of K must be determined in advance, yet for many data sets the value of K cannot be determined beforehand; if K is chosen unreasonably, k-means yields a large error. In other words, k-means has poor resistance to noise interference.
The concrete idea of the Canopy method in the present invention is as follows: a simple distance calculation is used to generate a certain number of mutually overlapping subsets. A coarse distance calculation divides the data into different overlapping subsets; distances are then computed only between sample data vectors within the same overlapping subset, reducing the number of distance computations required, while the number of subsets obtained serves as the cluster number K of the k-means algorithm.
Step S5 specifically includes steps S5-1 to S5-8.
Step S5-1: set two distance thresholds D1 and D2, with D1 > D2;
Step S5-2: put the K-dimensional label vectors of all songs in the song library into a container, and establish a Canopy label vector set C waiting to be filled; when the Canopy label vector set C is first established, it is empty;
Step S5-3: randomly choose from the container a label vector p that does not belong to the Canopy label vector set C, and compute the Euclidean distance between p and each vector in C:
if C is empty, add p to C and proceed to the next step;
if the distance between p and every vector in C is greater than D1, add p to C as a new Canopy label vector;
if the distance between p and some vector in C is less than D1, assign p to the canopy centered on that vector;
Step S5-4: if the distance between p and some vector in C is less than D2, delete p from the container and proceed to the next step;
Step S5-5: repeat steps S5-3 and S5-4 until the container is empty; the K-dimensional label vectors then contained in the Canopy label vector set C are the K2 initial class center vectors of the second k-means clustering;
Step S5-6: start the second k-means clustering: compute the Euclidean distance between every label vector of the songs in the song library, other than the K2 class center vectors, and each of the K2 class center vectors, and assign each label vector to the cluster of the class center vector nearest to it;
Step S5-7: compute the class center vectors of the K2 new clusters with the standard k-means step of averaging each dimension;
Step S5-8: repeat steps S5-6 and S5-7 until the K2 class center vectors no longer change or the number of iterations is reached.
K and K2 are the cluster numbers of the two k-means passes: K is the cluster number of the first k-means clustering, while K2 is the cluster number of the second k-means clustering and also the total number of categories of the final song classification. After the two k-means clusterings, the clustering result for all songs in the song library is obtained, grouping songs that are similar in auditory perception together.
A canopy in the present invention is equivalent to a container: if in step S5-3 the distance between label vector p and a certain vector X in the Canopy label vector set C is less than D1, then p is added to the canopy centered on X. That is, there is a vector X in the Canopy label vector set C; if in step S5-3 the distance between p and X is less than D1, then p belongs to and is assigned to the canopy centered on X.
The present invention thereby provides a song clustering method based on two iterations of k-means. According to the MFCC vectors of the song library, songs that are similar in auditory perception can be grouped together, solving the difficult problem of clustering massive numbers of songs by emotional category and enabling users to find more conveniently songs they may like but do not know specific information about.
Embodiment 3:
The present embodiment further optimizes on the basis of embodiment 1 and embodiment 2. The opening period of a song refers to seconds 0-15 of the song, the climax period refers to seconds 0-20 from the start of the climax, and the coda period refers to the first 15 seconds of the song's last 20 seconds.
That is, in step S1-1 all songs in the song library are pre-processed by extracting the opening (the first 15 seconds), the climax (the 20 seconds from the start of the climax), and the coda (the first 15 seconds of the last 20 seconds) of each song to generate three WAV-format files as the representative parts of the song, which are then marked with its song ID.
The other parts of the present embodiment are the same as embodiment 1 or embodiment 2 and will not be repeated here.
The above are only preferred embodiments of the present invention and do not limit the present invention in any form. Any simple modification or equivalent variation of the above embodiments made according to the technical spirit of the present invention falls within the protection scope of the present invention.