CN109065071A - Song clustering method based on an iterative k-means algorithm - Google Patents

Song clustering method based on an iterative k-means algorithm

Info

Publication number
CN109065071A
CN109065071A (application CN201811010257.XA)
Authority
CN
China
Prior art keywords
song
vector
cluster
mfcc
label vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811010257.XA
Other languages
Chinese (zh)
Other versions
CN109065071B (en)
Inventor
Jiang Chunhua (江春华)
Dai Xinxuan (戴鑫铉)
Gong Chao (龚超)
Xu Ruohang (徐若航)
Liu Yaofang (刘耀方)
Wang Jie (王杰)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN201811010257.XA
Publication of CN109065071A
Application granted
Publication of CN109065071B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 - ... characterised by the type of extracted parameters
    • G10L25/24 - ... the extracted parameters being the cepstrum
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/23 - Clustering techniques
    • G06F18/232 - Non-hierarchical techniques
    • G06F18/2321 - Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 - ... with fixed number of clusters, e.g. K-means clustering
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 - ... specially adapted for particular use
    • G10L25/51 - ... for comparison or discrimination
    • G10L25/63 - ... for estimating an emotional state
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H - ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2240/00 - Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
    • G10H2240/075 - Musical metadata derived from musical analysis or for use in electrophonic musical instruments
    • G10H2240/081 - Genre classification, i.e. descriptive metadata for classification or selection of musical pieces according to style

Abstract

Disclosed is a song clustering method based on an iterative k-means algorithm. The MFCC vectors of the songs in a song library are extracted as acoustic features, and emotion recognition and emotion-based categorization are performed on the songs with an iterative k-means algorithm. The method specifically includes the following steps: extracting the MFCCs of each song in the library to obtain the MFCC vectors of every song; labeling all MFCC vectors with their song IDs and assembling them into a data set; running a first k-means clustering on the data set to obtain a cluster assignment for each MFCC vector; generating a K-dimensional label vector for each song from the proportion of its MFCC vectors that falls into each cluster; and running a second k-means clustering on all K-dimensional label vectors to obtain the final song clusters. The present invention can classify the massive songs of a library according to the similarity of their perceived emotion, so that songs of similar emotional character can be recommended to users more accurately and effectively.

Description

Song clustering method based on an iterative k-means algorithm
Technical field
The present invention relates to the technical field of data processing, and specifically to a song clustering method based on an iterative k-means algorithm.
Background technique
In the current Internet era, large music portal websites maintain libraries of massive scale. Users often want to find songs that are similar to, or belong to the same category as, songs they already like. Traditional search engines are only suitable when the user can state the target song's information explicitly; they cannot search by the similarity of perceived emotion between songs. The traditional approach of classifying music with manually assigned labels is inefficient, unsuited to massive data, and of low accuracy. Music portals therefore currently lack an efficient and accurate method of categorizing massive numbers of songs, so users cannot quickly and conveniently find songs they might be interested in based on their similarity to songs they already like.
The Chinese invention patent with grant publication number CN104077598B, entitled "Emotion recognition method based on speech fuzzy clustering", discloses an emotion recognition method comprising: pre-processing an input speech signal; extracting feature information from the processed speech signal (the feature information includes mel-frequency cepstrum coefficients (MFCC), pitch, formants, and short-time energy); grouping multiple emotion classes and selecting the corresponding feature information for each group; classifying each group of emotion classes according to the selected feature information; and recognizing speech emotion from the combined classification outputs of the groups. The stated benefit is that, by choosing different features for different emotions, the improved adaptive fuzzy k-means clustering method achieves a much better recognition effect and a higher recognition rate than the traditional FCM method, which uses the same features for all emotions.
Mel-frequency cepstrum coefficients (MFCC) reflect the pitch perception characteristics of the human ear and are a feature parameter widely used in the speech processing field. Combining the MFCCs with their first-order and second-order differences captures both the static and the dynamic characteristics of a music signal, improving audio recognition performance.
The k-means algorithm is a partitional clustering method based on a similarity measure between samples and belongs to unsupervised learning. It takes an input k and divides n data objects into k clusters such that similarity within a cluster is high while similarity between clusters is low. Cluster similarity is computed with respect to a "center object" (the centroid, or cluster center) obtained as the mean of the objects in each cluster.
The k-means algorithm works as follows: first, k objects are arbitrarily selected from the n data objects as initial cluster centers; each remaining object is then assigned, by Euclidean distance, to the cluster whose center it is most similar to; next, the center of each cluster is recomputed as the mean of all objects in that cluster; this process is repeated until the criterion function converges. The mean squared error is generally used as the criterion function. The resulting k clusters have the property that each cluster is as compact as possible and the clusters are as well separated as possible.
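For concreteness, the standard loop just described can be written in a few lines of Python. This is a generic illustration with names of our choosing, not code from the patent:

```python
import numpy as np

def kmeans(data, k, max_iter=100, seed=0):
    """Standard k-means on an (n, d) array: assign to nearest center, re-average, repeat."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), size=k, replace=False)]  # arbitrary initial centers
    labels = np.zeros(len(data), dtype=int)
    for _ in range(max_iter):
        # assignment step: nearest center by Euclidean distance
        dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # update step: each center becomes the mean of its cluster
        new_centers = np.array([data[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):  # converged: centers stopped moving
            break
        centers = new_centers
    return labels, centers
```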
Summary of the invention
The purpose of the present invention is to provide a song clustering method based on an iterative k-means algorithm which, using MFCCs as acoustic features, clusters songs by emotional category efficiently and accurately, thereby automatically classifying the massive songs of a library by emotional category.
The present invention is achieved through the following technical solution: a song clustering method based on an iterative k-means algorithm extracts the MFCC vectors of the songs in a song library as acoustic features, and performs emotion recognition and emotion categorization on the songs with an iterative k-means algorithm.
To better realize the present invention, further, emotion recognition and emotion categorization are performed on the songs with two passes of the iterative k-means algorithm.
A song clustering method based on an iterative k-means algorithm specifically includes the following steps:
Step S1: extract the mel-frequency cepstrum coefficients of each song in the library segment by segment, obtaining the MFCC vector of each frame of each segment of every song;
Step S2: label all MFCC vectors from step S1 with their song IDs and assemble the song-membership information of each MFCC vector into a data set;
Step S3: run a first k-means clustering with the data set from step S2 as input, obtaining the cluster membership of each MFCC vector, i.e. the MFCC clustering result;
Step S4: for each song, generate a K-dimensional label vector from the proportion of its MFCC vectors that falls in each cluster;
Step S5: run a second k-means clustering with the K-dimensional label vectors of all songs in the library as input, obtaining the song classification result.
To better realize the present invention, further, step S1 specifically includes:
Step S1-1: pre-process all songs in the library, extracting the opening period, the chorus period and the ending period of each song to generate three WAV files as that song's representative part, and label each song's representative part with its song ID;
Step S1-2: read and process all the WAV audio files with Python's scipy library, obtaining the signal data and the sampling frequency;
Step S1-3: apply, in order, pre-emphasis, framing, windowing, discrete Fourier transform, Mel filtering, logarithm, discrete cosine transform and first-order differencing to the signal data, and combine the MFCC coefficients of each frame with their first-order differences to obtain that frame's MFCC vector.
To better realize the present invention, further, step S1-3 specifically includes the following steps:
Step S1-3-1: apply pre-emphasis to the original signal extracted in step S1-2, boosting its high-frequency part so that the signal becomes flatter while the high-frequency formants are emphasized;
Step S1-3-2: frame the signal with 256 sampling points per frame, with an overlap region between adjacent frames;
Step S1-3-3: multiply each frame by a Hamming window to increase the continuity at the two ends of the frame;
Step S1-3-4: apply a discrete Fourier transform to move the signal from the time domain to the frequency domain, obtaining the amplitude spectrum for subsequent analysis;
Step S1-3-5: filter and accumulate the amplitude spectrum with a Mel-scale filter bank, obtaining the energy of the frame in the frequency band of each filter, i.e. a filtered and simplified frequency-domain amplitude;
Step S1-3-6: take the natural logarithm of the energy values, converting them to the human ear's nonlinear perception of sound;
Step S1-3-7: feed the logarithmic energies into a discrete cosine transform, discarding redundant data and producing the MFCC coefficients;
Step S1-3-8: extract dynamic difference parameters from the MFCC coefficients of step S1-3-7 to obtain the first-order difference of the MFCCs, and splice the static and dynamic features together into the final MFCC vector.
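As a rough sketch of steps S1-2 and S1-3, the code below reads a WAV file with scipy (as the description specifies) and computes 13 MFCCs plus their first-order differences per frame, giving a 26-dimensional vector per frame. The python_speech_features package and the parameter values are our assumptions; the patent names only scipy for file I/O.

```python
import numpy as np
from scipy.io import wavfile
from python_speech_features import mfcc, delta  # assumed helper library, not named in the patent

def song_mfcc_vectors(wav_path):
    """Return an (n_frames, 26) array: 13 MFCCs plus 13 first-order deltas per frame."""
    rate, signal = wavfile.read(wav_path)   # step S1-2: sampling frequency and signal data
    if signal.ndim > 1:
        signal = signal.mean(axis=1)        # mix stereo down to mono
    coeffs = mfcc(signal, samplerate=rate, numcep=13, nfilt=26, preemph=0.95)  # steps S1-3-1..7
    deltas = delta(coeffs, 2)               # step S1-3-8: first-order difference
    return np.hstack([coeffs, deltas])      # splice static and dynamic features
```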
To better realize the present invention, further, in step S1-1 the opening period of a song is seconds 0-15 of the song, the chorus period is seconds 0-20 from the start of the chorus, and the ending period is the first 15 seconds of the song's last 20 seconds.
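Under these definitions the pre-processing of step S1-1 can be sketched as follows; the chorus start time is assumed to be known from elsewhere (e.g. annotations), and all names are illustrative:

```python
from scipy.io import wavfile

def representative_segments(wav_path, chorus_start_s):
    """Cut the opening (first 15 s), the chorus (20 s from its start) and the ending
    (first 15 s of the last 20 s) out of a song."""
    rate, signal = wavfile.read(wav_path)
    opening = signal[: 15 * rate]
    start = int(chorus_start_s * rate)
    chorus = signal[start: start + 20 * rate]
    ending = signal[-20 * rate: -5 * rate]   # first 15 s of the final 20 s
    return opening, chorus, ending
```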
To better realize the present invention, further, step S3 specifically includes the following steps:
Step S3-1: choose, from all MFCC vectors in the library, K vectors that are mutually distant in Euclidean distance as the initial cluster centers;
Step S3-2: compute the Euclidean distance between every remaining MFCC vector in the library (excluding the K chosen in step S3-1) and the center vector of each cluster, assigning each MFCC vector to the cluster whose center is nearest to it;
Step S3-3: recompute the center vector of each of the K new clusters with the standard k-means update, taking the mean of each dimension;
Step S3-4: repeat step S3-3 until the K cluster centers no longer change or the iteration limit is reached, obtaining the final clustering result.
Choosing K mutually distant MFCC vectors in step S3-1 means selecting K MFCC vectors that are as far apart as possible. The detailed procedure is as follows: first, randomly select one MFCC vector as the center of the first initial cluster; then select the MFCC vector farthest from the center of the first initial cluster as the center of the second initial cluster; then select, as the center of the third initial cluster, the MFCC vector whose minimum distance to the first two centers (the centers of the first and second initial clusters) is largest; and so on, until the centers of K initial clusters have been selected.
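This initialization is a farthest-first traversal; a minimal numpy sketch of the procedure just described (the function name and random seed are ours):

```python
import numpy as np

def farthest_first_centers(data, k, seed=0):
    """Pick k mutually distant rows of data: each new center maximizes the distance
    to its nearest already-chosen center."""
    rng = np.random.default_rng(seed)
    centers = [data[rng.integers(len(data))]]       # first center: a random vector
    for _ in range(k - 1):
        d = np.min(np.linalg.norm(data[:, None, :] - np.array(centers)[None, :, :], axis=2),
                   axis=1)                          # distance to the nearest chosen center
        centers.append(data[d.argmax()])            # farthest such point becomes the next center
    return np.array(centers)
```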
To better realize the present invention, further, step S4 specifically means: according to the final clustering result of step S3, compute for each song in the library the proportion of its N MFCC vectors that belongs to each of the K clusters, and generate that song's K-dimensional label vector from the result;
where N is the total number of MFCC vectors of the song, and K is the cluster count of the first k-means clustering.
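Step S4 thus amounts to a per-song histogram of first-pass cluster assignments, normalized by the song's frame count; a sketch (names illustrative):

```python
import numpy as np

def song_label_vector(frame_clusters, k):
    """frame_clusters holds the cluster index of each of a song's N MFCC vectors;
    returns the K-dimensional vector of ratios n_i / N."""
    counts = np.bincount(frame_clusters, minlength=k)  # n_i: this song's frames in cluster i
    return counts / counts.sum()
```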
To better realize the present invention, further, the cluster count of the second k-means clustering in step S5 is not specified manually but determined automatically from the number of songs in the library using the "canopy" idea. The specific steps are as follows:
Step S5-1: set two distance thresholds D1 and D2, with D1 > D2;
Step S5-2: put the K-dimensional label vectors of all songs in the library into a container, and create an initially empty canopy center set C to be filled;
Step S5-3: randomly select from the container a label vector p that does not belong to the canopy center set C, and compute the Euclidean distance between p and every vector in C:
if C is empty, add p to C and proceed to the next step;
if the distance between p and every vector in C is greater than D1, add p to C as a new canopy center;
if the distance between p and some vector in C is less than D1, add p to the canopy centered on that vector;
Step S5-4: if the distance between p and some vector in C is less than D2, delete p from the container and proceed to the next step;
Step S5-5: repeat steps S5-3 and S5-4 until the container is empty; the K-dimensional label vectors in the canopy center set C are then the K2 initial class centers of the second k-means clustering;
Step S5-6: start the second k-means clustering: compute the Euclidean distance between every song's label vector (other than the K2 class centers) and each of the K2 class centers, and assign each label vector to the cluster of its nearest class center;
Step S5-7: recompute the class centers of the K2 new clusters with the standard k-means update, taking the mean of each dimension;
Step S5-8: repeat steps S5-6 and S5-7 until the K2 class centers no longer change or the iteration limit is reached.
K2 is both the cluster count of the second k-means and the total number of classes of the final song classification. After the two k-means passes, the clustering result of all songs in the library is obtained, grouping songs of similar auditory perception together.
Compared with the prior art, the present invention has the following advantages and beneficial effects:
(1) Based on an iterative k-means algorithm, the present invention obtains MFCC vectors and uses them as acoustic features to cluster songs by emotional category efficiently and accurately, automatically classifying the massive songs of a library by emotional category.
(2) The present invention clusters the songs of a library with two passes of k-means, grouping perceptually similar songs together and solving the problems that manual classification is laborious and of low accuracy.
Detailed description of the invention
Fig. 1 is the overall flow chart of the song clustering method based on an iterative k-means algorithm of the present invention.
Fig. 2 is the MFCC vector extraction flow chart of the song clustering method based on an iterative k-means algorithm of the present invention.
Fig. 3 is the flow chart of the first k-means clustering in the song clustering method based on an iterative k-means algorithm of the present invention.
Fig. 4 is the flow chart of the second k-means clustering in the song clustering method based on an iterative k-means algorithm of the present invention.
Specific embodiment
To make the objectives, technical solutions and advantages of the present invention clearer, the specific operation workflow of the invention is described in more detail below with reference to the embodiments. The exemplary embodiments and their explanations are only used to explain the present invention and are not intended as limitations of the invention.
Embodiment 1:
A song clustering method based on an iterative k-means algorithm of this embodiment, as shown in Figs. 1-4, specifically includes the following steps:
Step S1: extract the mel-frequency cepstrum coefficients of each song in the library segment by segment, obtaining the MFCC vector of each frame of each segment of every song;
Step S2: label all MFCC vectors from step S1 with their song IDs and assemble the song-membership information of each MFCC vector into a data set;
Step S3: run a first k-means clustering with the data set from step S2 as input, obtaining the cluster membership of each MFCC vector, i.e. the MFCC clustering result;
Step S4: for each song, generate a K-dimensional label vector from the proportion of its MFCC vectors that falls in each cluster; each song has one and only one corresponding K-dimensional label vector;
Step S5: run a second k-means clustering with the K-dimensional label vectors of all songs in the library as input, obtaining the song classification result.
The present invention uses MFCC coefficients as acoustic features and clusters the songs of a library with two passes of k-means. The first clustering runs over all MFCC vectors in the library and extracts labels, i.e. a traditional feature-vector extraction process. The second clustering groups the songs of the library according to the processed label vectors, bringing songs of similar perceived temperament together.
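Tying the two passes together, a high-level driver might look like the following sketch. It composes the illustrative helpers sketched elsewhere in this document (song_mfcc_vectors, kmeans, song_label_vector, and the canopy_centers routine sketched in Embodiment 2), all of which are assumptions of this sketch rather than code from the patent; the threshold values are likewise placeholders.

```python
import numpy as np

def cluster_song_library(wav_paths, k1=50, d1=0.5, d2=0.2):
    """Two-pass clustering: frame-level k-means, then song-level k-means."""
    # Pass 1: pool the per-frame MFCC vectors of every song, tagged with song IDs
    song_ids, frames = [], []
    for sid, path in enumerate(wav_paths):
        vecs = song_mfcc_vectors(path)              # (n_frames, 26), sketched earlier
        frames.append(vecs)
        song_ids.extend([sid] * len(vecs))
    data, song_ids = np.vstack(frames), np.array(song_ids)
    frame_labels, _ = kmeans(data, k1)              # first k-means over all frames
    # Pass 2: one K-dimensional ratio vector per song, clustered by a second k-means
    song_vecs = np.array([song_label_vector(frame_labels[song_ids == sid], k1)
                          for sid in range(len(wav_paths))])
    k2 = len(canopy_centers(song_vecs, d1, d2))     # canopy decides the cluster count K2
    song_labels, _ = kmeans(song_vecs, k2)
    return song_labels
```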
Embodiment 2:
This embodiment, building on Embodiment 1, specifies that the mel-frequency cepstrum coefficients extracted segment by segment in step S1 yield a 26-dimensional MFCC vector for each frame of each segment of every song, i.e. the feature vector of the first k-means clustering is a 26-dimensional MFCC vector. Meanwhile, the cluster count K of the first k-means clustering in step S3 is set to 50, i.e. the first k-means clustering extracts 50 representative vectors with symbolic character from the library's MFCC vectors.
Based on the above, the song clustering method based on an iterative k-means algorithm of this embodiment specifically includes steps S1-S5.
Step S1: extract the mel-frequency cepstrum coefficients of each song in the library segment by segment, obtaining the MFCC vector of each frame of each segment of every song.
Step S1 specifically refers to:
Step S1-1: pre-process all songs in the library, extracting the opening period, the chorus period and the ending period of each song to generate three WAV files as that song's representative part, and label each song's representative part with its song ID;
Step S1-2: read and process all the WAV audio files with Python's scipy library, obtaining the signal data and the sampling frequency;
Step S1-3: apply, in order, pre-emphasis, framing, windowing, discrete Fourier transform, Mel filtering, logarithm, discrete cosine transform and first-order differencing to the signal data, and combine the MFCC coefficients of each frame with their first-order differences to obtain that frame's 26-dimensional MFCC vector.
Step S1-3 specifically includes steps S1-3-1 to S1-3-8:
Step S1-3-1: pre-emphasis. Specifically: apply pre-emphasis to the original signal data extracted in step S1-2, boosting its high-frequency part so that the signal becomes flatter while the high-frequency formants are emphasized.
The pre-emphasis is realized by the following formula:
S'(n) = S(n) - β · S(n-1);
where β is the pre-emphasis coefficient, set to 0.95;
S'(n) is the pre-emphasized digital audio signal;
S(n) is the sampled digital audio signal.
With the pre-emphasis coefficient β = 0.95, the formula becomes S'(n) = S(n) - 0.95 · S(n-1).
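In numpy this amounts to a single vectorized line; a minimal sketch (the function name is ours):

```python
import numpy as np

def preemphasize(s, beta=0.95):
    """Apply S'(n) = S(n) - beta * S(n-1); the first sample is passed through unchanged."""
    return np.append(s[0], s[1:] - beta * s[:-1])
```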
Step S1-3-2: framing. Specifically: frame the signal with 256 sampling points per frame. To avoid excessive change between adjacent frames, the two adjacent frames share an overlap region.
Step S1-3-3: windowing. Specifically: multiply each frame by a Hamming window to increase the continuity at the two ends of the frame.
To process a speech signal it must be windowed, i.e. only the data inside the window is processed at a time. A real speech signal is very long and cannot all be processed at once in practice; the sensible approach is to take a block of data, analyze it, then take the next block and analyze that.
In the present invention the blocks are taken by constructing a Hamming window. The Hamming window is a function shaped like a window; such functions are called window functions.
The windowing is realized by the following formula:
W(n) = (1 - a) - a · cos(2πn / (N - 1)), n = 0, 1, ..., N-1;
where a is the window coefficient, taken as 0.46;
N is the frame size.
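A sketch of steps S1-3-2 and S1-3-3 together, using numpy's built-in Hamming window (which uses the same a = 0.46 coefficient). The 50% frame step is our assumption; the patent only requires some overlap between adjacent frames:

```python
import numpy as np

def frame_and_window(signal, frame_len=256, step=128):
    """Split the signal into overlapping 256-sample frames and apply a Hamming window
    to each. Assumes len(signal) >= frame_len."""
    n_frames = (len(signal) - frame_len) // step + 1
    frames = np.stack([signal[i * step: i * step + frame_len] for i in range(n_frames)])
    return frames * np.hamming(frame_len)  # w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1))
```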
Step S1-3-4: discrete Fourier transform (FFT). Specifically: apply a discrete Fourier transform to convert the time-domain signal to the frequency domain for subsequent analysis.
Step S1-3-5: Mel filtering. Specifically: filter and accumulate the amplitude spectrum obtained in S1-3-4 with each of 26 filters, obtaining the energy of the frame in the frequency band of each filter; filtering with a Mel-scale filter bank simplifies the frequency-domain amplitude.
Step S1-3-6: take the logarithm. Specifically: apply the natural logarithm to the energy values, converting them to the human ear's nonlinear perception of sound.
Step S1-3-7: discrete cosine transform (DCT). Specifically: feed the logarithmic energies into a discrete cosine transform, discarding redundant data and producing the MFCC coefficients.
The discrete cosine transform is realized by the following formula:
C(n) = Σ_{m=1}^{M} S(m) · cos(πn(m - 0.5) / M), n = 1, 2, ..., L;
where L is the MFCC order, taken as 13;
M is the number of Mel filters;
C(n) is the n-th MFCC coefficient;
m is the filter index;
S(m) is the logarithmic energy output by the filter bank.
Step S1-3-8: compute the first-order difference and combine the MFCC coefficients with it. Specifically: extract dynamic difference parameters from the MFCC coefficients of S1-3-7 to obtain the first-order difference of the MFCCs, and splice the static and dynamic features together into the final 26-dimensional MFCC vector.
The implementation formula is as follows:
d_t = ( Σ_{k=1}^{K} k · (c_{t+k} - c_{t-k}) ) / ( 2 · Σ_{k=1}^{K} k² );
where d_t denotes the t-th first-order difference;
c_t denotes the t-th cepstral coefficient (c_{t+k} and c_{t-k} the coefficients k frames later and earlier);
Q denotes the order of the cepstral coefficients;
K denotes the time span of the first-order difference, which can take the value 1 or 2.
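A direct transcription of the difference formula, with the edge frames replicated so that c_{t±k} is defined everywhere (a sketch; the padding strategy is our choice):

```python
import numpy as np

def first_order_delta(coeffs, K=2):
    """d_t = sum_{k=1..K} k*(c_{t+k} - c_{t-k}) / (2 * sum_{k=1..K} k^2), per frame.

    coeffs is a (T, n_coeffs) array of static MFCCs.
    """
    T = len(coeffs)
    padded = np.pad(coeffs, ((K, K), (0, 0)), mode='edge')  # replicate first/last frames
    denom = 2 * sum(k * k for k in range(1, K + 1))
    return sum(k * (padded[K + k: T + K + k] - padded[K - k: T + K - k])
               for k in range(1, K + 1)) / denom
```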
Step S2: label all MFCC vectors from step S1 with their song IDs and assemble the song-membership information of each MFCC vector into a data set. Each frame's MFCC vector is labeled with the song ID indicating which song in the library it belongs to, and all MFCC vectors of the library are used as the data set for convenience of subsequent operations.
Step S3: run the first k-means clustering with the data set from step S2 as input, obtaining the cluster membership of each MFCC vector, i.e. the MFCC clustering result. For this first k-means clustering, K is set to 50, with the aim of extracting from the library's MFCC vectors 50 representative vectors with symbolic character.
Step S3 specifically includes steps S3-1 to S3-4.
Step S3-1: choose, from all MFCC vectors in the library, 50 mutually distant vectors (in Euclidean distance) as the centers of the initial clusters;
Step S3-2: compute the Euclidean distance between every remaining MFCC vector in the library and the center vector of each cluster, assigning each MFCC vector to the cluster whose center is nearest to it;
Step S3-3: recompute the center vectors of the 50 new clusters with the standard k-means update, taking the mean of each dimension;
Step S3-4: repeat step S3-3 until the 50 cluster centers no longer change or the iteration limit is reached;
Step S4: according to the final clustering result of step S3, compute for every song in the library the proportion of its N MFCC vectors that belongs to each of the 50 clusters;
a 50-dimensional label vector (n_1/N, n_2/N, ..., n_50/N) is generated from the result,
where n_i is the number of the song's MFCC vectors that belong to the i-th cluster, i = 1, 2, ..., K;
N is the total number of MFCC vectors of the song;
K is the cluster count of the first k-means clustering.
Step S5: run the second k-means clustering with the 50-dimensional label vectors of the songs in the library as feature vectors, determining the final number of song classes automatically from the number of songs in the library using the canopy idea, and finally obtain the song classification result.
Traditional clustering algorithms are all feasible for applications with small data volumes, but as the data volume grows geometrically, so does the difficulty of applying them. Growth of data volume here means: 1. there are many data entries, i.e. the data set contains many sample vectors; 2. each sample vector has high dimensionality, i.e. many attributes; 3. there are many center vectors to cluster.
The canopy idea is used here as a coarse clustering before k-means. K-means requires the value of K to be fixed in advance, yet for many data sets K cannot be determined beforehand; an unreasonable choice of K introduces a large error, that is, k-means has poor resistance to noise.
The concrete canopy idea in the present invention is: use a simple distance computation to generate a certain number of overlapping subsets. A coarse distance measure divides the data into overlapping subsets; distances are then computed only between sample vectors inside the same subset, reducing the number of distance computations, while the number of subsets obtained becomes the cluster count K2 of the k-means algorithm.
Step S5 specifically includes steps S5-1 to S5-8.
Step S5-1: set two distance thresholds D1 and D2, with D1 > D2;
Step S5-2: put the K-dimensional label vectors of all songs in the library into a container, and create a canopy center set C to be filled; when first created, the canopy center set C is empty;
Step S5-3: randomly select from the container a label vector p that does not belong to the canopy center set C, and compute the Euclidean distance between p and every vector in C:
if C is empty, add p to C and proceed to the next step;
if the distance between p and every vector in C is greater than D1, add p to C as a new canopy center;
if the distance between p and some vector in C is less than D1, add p to the canopy centered on that vector;
Step S5-4: if the distance between p and some vector in C is less than D2, delete p from the container and proceed to the next step;
Step S5-5: repeat steps S5-3 and S5-4 until the container is empty; the K-dimensional label vectors in the canopy center set C are then the K2 initial class centers of the second k-means clustering;
Step S5-6: start the second k-means clustering: compute the Euclidean distance between every song's label vector (other than the K2 class centers) and each of the K2 class centers, and assign each label vector to the cluster of its nearest class center;
Step S5-7: recompute the class centers of the K2 new clusters with the standard k-means update, taking the mean of each dimension;
Step S5-8: repeat steps S5-6 and S5-7 until the K2 class centers no longer change or the iteration limit is reached.
K and K2 are the cluster counts of the two k-means passes: K is the cluster count of the first k-means, and K2 is both the cluster count of the second k-means and the total number of classes of the final song classification. After the two k-means passes, the clustering result of all songs in the library is obtained, grouping songs of similar auditory perception together.
In the present invention a canopy acts as a container: if in step S5-3 the distance between the label vector p and some vector X in the canopy center set C is less than D1, then p is added to the canopy centered on X. That is, if C contains a vector X and the distance between p and X in step S5-3 is less than D1, p belongs to the canopy centered on X.
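A sketch of this canopy pre-clustering (a simplified variant in which each examined vector is consumed once; the function name and the fixed random seed are our choices):

```python
import numpy as np

def canopy_centers(vectors, d1, d2):
    """Return the canopy centers; their count becomes the cluster count K2
    of the second k-means."""
    assert d1 > d2, "canopy requires D1 > D2"
    pool = [np.asarray(v, dtype=float) for v in vectors]
    centers = []
    rng = np.random.default_rng(0)
    while pool:
        p = pool.pop(rng.integers(len(pool)))  # random label vector still in the container
        if not centers or min(np.linalg.norm(p - c) for c in centers) > d1:
            centers.append(p)                  # p becomes a new canopy center
            # remove everything within d2 of the new center from further consideration
            pool = [q for q in pool if np.linalg.norm(q - p) > d2]
        # otherwise p merely joins an existing canopy; only the centers matter here
    return np.array(centers)
```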
The present invention thus provides a song clustering method based on two passes of iterative k-means. According to the library's MFCC vectors it can group similar songs together by emotional category, from the standpoint of auditory perception, solving the difficulty of clustering massive numbers of songs and enabling users to find more conveniently songs that they may like but cannot specify.
Embodiment 3:
This embodiment further optimizes on the basis of Embodiments 1 and 2: the opening period of a song is seconds 0-15 of the song, the chorus period is seconds 0-20 from the start of the chorus, and the ending period is the first 15 seconds of the song's last 20 seconds.
That is, in step S1-1 all songs in the library are pre-processed, and the opening (first 15 seconds), chorus (a 20-second section from the chorus start) and ending (first 15 seconds of the final 20 seconds) of each song are extracted to generate three WAV files as that song's representative part, which is labeled with its song ID.
The other parts of this embodiment are identical to Embodiment 1 or Embodiment 2 and are not repeated here.
The above are only preferred embodiments of the present invention and do not limit the invention in any form; any simple modification or equivalent variation of the above embodiments made according to the technical essence of the invention falls within the protection scope of the invention.

Claims (9)

1. A song clustering method based on an iterative k-means algorithm, characterized in that: the MFCC vectors of the songs in a song library are extracted as acoustic features, and emotion recognition and emotion categorization are performed on the songs with an iterative k-means algorithm.
2. The song clustering method based on an iterative k-means algorithm according to claim 1, characterized in that: emotion recognition and emotion categorization are performed on the songs with two passes of the iterative k-means algorithm.
3. The song clustering method based on an iterative k-means algorithm according to claim 2, characterized in that it specifically comprises the following steps:
Step S1: extract the mel-frequency cepstrum coefficients of each song in the library segment by segment, obtaining the MFCC vector of each frame of each segment of every song;
Step S2: label all MFCC vectors from step S1 with their song IDs and assemble the song-membership information of each MFCC vector into a data set;
Step S3: run a first k-means clustering with the data set from step S2 as input, obtaining the cluster membership of each MFCC vector, i.e. the MFCC clustering result;
Step S4: for each song, generate a K-dimensional label vector from the proportion of its MFCC vectors that falls in each cluster;
Step S5: run a second k-means clustering with the K-dimensional label vectors of all songs in the library as input, obtaining the song classification result.
4. The song clustering method based on an iterative k-means algorithm according to claim 3, characterized in that step S1 specifically comprises:
Step S1-1: pre-process all songs in the library, extracting the opening period, the chorus period and the ending period of each song to generate three WAV files as that song's representative part, and label each song's representative part with its song ID;
Step S1-2: read and process all the WAV audio files with Python's scipy library, obtaining the signal data and the sampling frequency;
Step S1-3: apply, in order, pre-emphasis, framing, windowing, discrete Fourier transform, Mel filtering, logarithm, discrete cosine transform and first-order differencing to the signal data, and combine the MFCC coefficients of each frame with their first-order differences to obtain that frame's MFCC vector.
5. The song clustering method based on an iterative k-means algorithm according to claim 4, characterized in that step S1-3 specifically comprises the following steps:
Step S1-3-1: apply pre-emphasis to the original signal extracted in step S1-2, boosting its high-frequency part so that the signal becomes flatter while the high-frequency formants are emphasized;
Step S1-3-2: frame the signal with 256 sampling points per frame, with an overlap region between adjacent frames;
Step S1-3-3: multiply each frame by a Hamming window to increase the continuity at the two ends of the frame;
Step S1-3-4: apply a discrete Fourier transform to move the signal from the time domain to the frequency domain, obtaining the amplitude spectrum for subsequent analysis;
Step S1-3-5: filter and accumulate the amplitude spectrum with a Mel-scale filter bank, obtaining the energy of the frame in the frequency band of each filter, i.e. a filtered and simplified frequency-domain amplitude;
Step S1-3-6: take the natural logarithm of the energy values, converting them to the human ear's nonlinear perception of sound;
Step S1-3-7: feed the logarithmic energies into a discrete cosine transform, discarding redundant data and producing the MFCC coefficients;
Step S1-3-8: extract dynamic difference parameters from the MFCC coefficients of step S1-3-7 to obtain the first-order difference of the MFCCs, and splice the static and dynamic features together into the final MFCC vector.
6. The song clustering method based on an iterative k-means algorithm according to claim 4, characterized in that in step S1-1 the opening period of a song is seconds 0-15 of the song, the chorus period is seconds 0-20 from the start of the chorus, and the ending period is the first 15 seconds of the song's last 20 seconds.
7. The song clustering method based on an iterative k-means algorithm according to claim 3, characterized in that step S3 specifically comprises the following steps:
Step S3-1: choose, from all MFCC vectors in the library, K mutually distant vectors (in Euclidean distance) as the centers of the initial clusters;
Step S3-2: compute the Euclidean distance between every remaining MFCC vector in the library (excluding the K chosen in step S3-1) and the center vector of each cluster, assigning each MFCC vector to the cluster whose center is nearest to it;
Step S3-3: recompute the center vectors of the K new clusters with the standard k-means update, taking the mean of each dimension;
Step S3-4: repeat step S3-3 until the K cluster centers no longer change or the iteration limit is reached, obtaining the final clustering result.
8. The song clustering method based on an iterative k-means algorithm according to claim 3, characterized in that step S4 specifically means: according to the final clustering result of step S3, compute for each song in the library the proportion of its N MFCC vectors that belongs to each of the K clusters, and generate that song's K-dimensional label vector from the result;
where N is the total number of MFCC vectors of the song, and K is the cluster count of the first k-means clustering.
9. The song clustering method based on an iterative k-means algorithm according to claim 3, characterized in that the cluster count of the second k-means clustering in step S5 is not specified manually but determined automatically from the number of songs in the library using the "canopy" idea, with the following specific steps:
Step S5-1: set two distance thresholds D1 and D2, with D1 > D2;
Step S5-2: put the K-dimensional label vectors of all songs in the library into a container, and create an initially empty canopy center set C to be filled;
Step S5-3: randomly select from the container a label vector p that does not belong to the canopy center set C, and compute the Euclidean distance between p and every vector in C:
if C is empty, add p to C and proceed to the next step;
if the distance between p and every vector in C is greater than D1, add p to C as a new canopy center;
if the distance between p and some vector in C is less than D1, add p to the canopy centered on that vector;
Step S5-4: if the distance between p and some vector in C is less than D2, delete p from the container and proceed to the next step;
Step S5-5: repeat steps S5-3 and S5-4 until the container is empty; the K-dimensional label vectors in the canopy center set C are then the K2 initial class centers of the second k-means clustering;
Step S5-6: start the second k-means clustering: compute the Euclidean distance between every song's label vector (other than the K2 class centers) and each of the K2 class centers, and assign each label vector to the cluster of its nearest class center;
Step S5-7: recompute the class centers of the K2 new clusters with the standard k-means update, taking the mean of each dimension;
Step S5-8: repeat steps S5-6 and S5-7 until the K2 class centers no longer change or the iteration limit is reached; the clustering result of all songs in the library is finally obtained, grouping songs of similar auditory perception together;
K2 is both the cluster count of the second k-means and the total number of classes of the final song classification.
CN201811010257.XA 2018-08-31 2018-08-31 Song clustering method based on iterative k-means algorithm Active CN109065071B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811010257.XA CN109065071B (en) 2018-08-31 2018-08-31 Song clustering method based on iterative k-means algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811010257.XA CN109065071B (en) 2018-08-31 2018-08-31 Song clustering method based on iterative k-means algorithm

Publications (2)

Publication Number Publication Date
CN109065071A (en) 2018-12-21
CN109065071B CN109065071B (en) 2021-05-14

Family

ID=64758096

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811010257.XA Active CN109065071B (en) 2018-08-31 2018-08-31 Song clustering method based on iterative k-means algorithm

Country Status (1)

Country Link
CN (1) CN109065071B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109767756A (en) * 2019-01-29 2019-05-17 大连海事大学 A kind of speech feature extraction algorithm based on dynamic partition inverse discrete cosine transform cepstrum coefficient
CN109979481A (en) * 2019-03-11 2019-07-05 大连海事大学 A kind of speech feature extraction algorithm of the dynamic partition inverse discrete cosine transform cepstrum coefficient based on related coefficient
CN110867180A (en) * 2019-10-15 2020-03-06 北京雷石天地电子技术有限公司 System and method for generating word-by-word lyric file based on K-means clustering algorithm
CN112444868A (en) * 2019-08-30 2021-03-05 中国石油化工股份有限公司 Seismic facies analysis method based on improved K-means algorithm
CN113421585A (en) * 2021-05-10 2021-09-21 云境商务智能研究院南京有限公司 Audio fingerprint database generation method and device

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030205124A1 (en) * 2002-05-01 2003-11-06 Foote Jonathan T. Method and system for retrieving and sequencing music by rhythmic similarity
US20040167767A1 (en) * 2003-02-25 2004-08-26 Ziyou Xiong Method and system for extracting sports highlights from audio signals
CN101226743A (en) * 2007-12-05 2008-07-23 浙江大学 Method for recognizing speaker based on conversion of neutral and affection sound-groove model
CN101599271A (en) * 2009-07-07 2009-12-09 华中科技大学 A kind of recognition methods of digital music emotion
CN102663432A (en) * 2012-04-18 2012-09-12 电子科技大学 Kernel fuzzy c-means speech emotion identification method combined with secondary identification of support vector machine
US8595005B2 (en) * 2010-05-31 2013-11-26 Simple Emotion, Inc. System and method for recognizing emotional state from a speech signal
CN103440873A (en) * 2013-08-27 2013-12-11 大连理工大学 Music recommendation method based on similarities
CN104077598A (en) * 2014-06-27 2014-10-01 电子科技大学 Emotion recognition method based on speech fuzzy clustering
US8867891B2 (en) * 2011-10-10 2014-10-21 Intellectual Ventures Fund 83 Llc Video concept classification using audio-visual grouplets
US9141604B2 (en) * 2013-02-22 2015-09-22 Riaex Inc Human emotion assessment reporting technology—system and method
US20160027452A1 (en) * 2014-07-28 2016-01-28 Sone Computer Entertainment Inc. Emotional speech processing
CN107195312A (en) * 2017-05-05 2017-09-22 深圳信息职业技术学院 Determination method, device, terminal device and the storage medium of emotional disclosure pattern
CN107273841A (en) * 2017-06-09 2017-10-20 北京工业大学 A kind of electric sensibility classification method of the brain based on EMD and gaussian kernel function SVM
CN107591162A (en) * 2017-07-28 2018-01-16 南京邮电大学 Sob recognition methods and intelligent safeguard system based on pattern match
CN107895027A (en) * 2017-11-17 2018-04-10 合肥工业大学 Individual feelings and emotions knowledge mapping method for building up and device
CN108197282A (en) * 2018-01-10 2018-06-22 腾讯科技(深圳)有限公司 Sorting technique, device and the terminal of file data, server, storage medium

Patent Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030205124A1 (en) * 2002-05-01 2003-11-06 Foote Jonathan T. Method and system for retrieving and sequencing music by rhythmic similarity
US20040167767A1 (en) * 2003-02-25 2004-08-26 Ziyou Xiong Method and system for extracting sports highlights from audio signals
CN101226743A (en) * 2007-12-05 2008-07-23 浙江大学 Method for recognizing speaker based on conversion of neutral and affection sound-groove model
CN101599271A (en) * 2009-07-07 2009-12-09 华中科技大学 A kind of recognition methods of digital music emotion
US8595005B2 (en) * 2010-05-31 2013-11-26 Simple Emotion, Inc. System and method for recognizing emotional state from a speech signal
US8867891B2 (en) * 2011-10-10 2014-10-21 Intellectual Ventures Fund 83 Llc Video concept classification using audio-visual grouplets
CN102663432A (en) * 2012-04-18 2012-09-12 电子科技大学 Kernel fuzzy c-means speech emotion identification method combined with secondary identification of support vector machine
US9141604B2 (en) * 2013-02-22 2015-09-22 Riaex Inc Human emotion assessment reporting technology—system and method
CN103440873A (en) * 2013-08-27 2013-12-11 大连理工大学 Music recommendation method based on similarities
CN104077598A (en) * 2014-06-27 2014-10-01 电子科技大学 Emotion recognition method based on speech fuzzy clustering
US20160027452A1 (en) * 2014-07-28 2016-01-28 Sone Computer Entertainment Inc. Emotional speech processing
CN105575388A (en) * 2014-07-28 2016-05-11 索尼电脑娱乐公司 Emotional speech processing
CN107195312A (en) * 2017-05-05 2017-09-22 深圳信息职业技术学院 Determination method, device, terminal device and the storage medium of emotional disclosure pattern
CN107273841A (en) * 2017-06-09 2017-10-20 北京工业大学 A kind of electric sensibility classification method of the brain based on EMD and gaussian kernel function SVM
CN107591162A (en) * 2017-07-28 2018-01-16 南京邮电大学 Sob recognition methods and intelligent safeguard system based on pattern match
CN107895027A (en) * 2017-11-17 2018-04-10 合肥工业大学 Individual feelings and emotions knowledge mapping method for building up and device
CN108197282A (en) * 2018-01-10 2018-06-22 腾讯科技(深圳)有限公司 Sorting technique, device and the terminal of file data, server, storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
S. Sultana et al., "A non-hierarchical approach of speech emotion recognition based on enhanced wavelet coefficients and K-means clustering", 2014 IEEE International Conference on Informatics, Electronics & Vision (ICIEV) *
Tan Fazeng, "Research on Fuzzy Recognition of Speech Emotion States", China Master's Theses Full-text Database (Electronic Journal) *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109767756A (en) * 2019-01-29 2019-05-17 大连海事大学 A kind of speech feature extraction algorithm based on dynamic partition inverse discrete cosine transform cepstrum coefficient
CN109979481A (en) * 2019-03-11 2019-07-05 大连海事大学 A kind of speech feature extraction algorithm of the dynamic partition inverse discrete cosine transform cepstrum coefficient based on related coefficient
CN112444868A (en) * 2019-08-30 2021-03-05 中国石油化工股份有限公司 Seismic facies analysis method based on improved K-means algorithm
CN112444868B (en) * 2019-08-30 2024-04-09 中国石油化工股份有限公司 Seismic phase analysis method based on improved K-means algorithm
CN110867180A (en) * 2019-10-15 2020-03-06 北京雷石天地电子技术有限公司 System and method for generating word-by-word lyric file based on K-means clustering algorithm
CN110867180B (en) * 2019-10-15 2022-03-29 北京雷石天地电子技术有限公司 System and method for generating word-by-word lyric file based on K-means clustering algorithm
CN113421585A (en) * 2021-05-10 2021-09-21 云境商务智能研究院南京有限公司 Audio fingerprint database generation method and device

Also Published As

Publication number Publication date
CN109065071B (en) 2021-05-14

Similar Documents

Publication Publication Date Title
CN109065071A (en) A kind of song clusters method based on Iterative k-means Algorithm
CN101599271B (en) Recognition method of digital music emotion
Umapathy et al. Audio signal feature extraction and classification using local discriminant bases
CN109493881B (en) Method and device for labeling audio and computing equipment
CN103177722B (en) A kind of song retrieval method based on tone color similarity
CN109767785A (en) Ambient noise method for identifying and classifying based on convolutional neural networks
CN105872855A (en) Labeling method and device for video files
CN108280164B (en) Short text filtering and classifying method based on category related words
CN113327626A (en) Voice noise reduction method, device, equipment and storage medium
CN103761965A (en) Method for classifying musical instrument signals
Cai et al. Music genre classification based on auditory image, spectral and acoustic features
CN110399522A (en) A kind of music singing search method and device based on LSTM and layering and matching
Foucard et al. Multi-scale temporal fusion by boosting for music classification.
Ghosal et al. Speech/music classification using empirical mode decomposition
Yu Research on music emotion classification based on CNN-LSTM network
Nagavi et al. Content based audio retrieval with MFCC feature extraction, clustering and sort-merge techniques
Qin et al. A bag-of-tones model with MFCC features for musical genre classification
Chen et al. Cross-cultural music emotion recognition by adversarial discriminative domain adaptation
Mangalam et al. Emotion Recognition from Mizo Speech: A Signal Processing Approach
Kumar et al. Hilbert Spectrum based features for speech/music classification
Yue et al. Speaker age recognition based on isolated words by using SVM
Thiruvengatanadhan Music genre classification using mfcc and aann
Fu Application of an Isolated Word Speech Recognition System in the Field of Mental Health Consultation: Development and Usability Study
Mohammed et al. Arabic Speaker Identification System Using Multi Features
Ghosal et al. Speech/music discrimination using perceptual feature

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Dai Xinxuan

Inventor after: Jiang Chunhua

Inventor after: Gong Chao

Inventor after: Xu Ruohang

Inventor after: Liu Yaofang

Inventor after: Wang Jie

Inventor before: Jiang Chunhua

Inventor before: Dai Xinxuan

Inventor before: Gong Chao

Inventor before: Xu Ruohang

Inventor before: Liu Yaofang

Inventor before: Wang Jie

GR01 Patent grant