CN108648767A

CN108648767A - A kind of popular song emotion is comprehensive and sorting technique

Info

Publication number: CN108648767A
Application number: CN201810305399.2A
Authority: CN
Inventors: 孙书韬; 王永滨; 曹轶臻; 王�琦; 赵庄言
Original assignee: Communication University of China
Current assignee: Communication University of China
Priority date: 2018-04-08
Filing date: 2018-04-08
Publication date: 2018-10-12
Anticipated expiration: 2038-04-08
Also published as: CN108648767B

Abstract

A kind of popular song emotion synthesis is related to audio-frequency information process field with sorting technique.First, music refrain is carried out to a piece of music and pattern discrimination occurs, determine different pop music patterns；Secondly, a song is divided by N number of segment using a kind of flexible segmentation method, its pleasure degree and fierceness degree is predicted to each segment；According to the pattern of a first pop music and the pleasant degree of N number of snatch of music with fierceness degree as a result, the different grader of selection, carries out whole song emotion and integrate, obtain the affective tag of whole song.It the advantage is that and the extraction of V/A emotion evolution Features is carried out using flexible partition technology, be respectively processed, keep the training to different structure popular song emotion classifiers more targeted；It carries out carrying out song emotional semantic classification using popular song structure and emotion evolution Feature, carries out comprehensive method with the simple statistical property based on whole first song, can more reflect emotion cognition process and feature of the mankind to music.

Description

A kind of popular song emotion is comprehensive and sorting technique

Technical field

The present invention relates to a kind of automatic pop music sensibility classification method towards full song of audio-frequency information process field

Background technology

The research object of the current method for song emotional semantic classification is mostly the segment handled in song, and one basic Thinking be to partition clips into the frame of fixed length, frame is directly carried out emotional semantic classification then count snatch of song in occupy an leading position Affective style as snatch of song affective style label.Also have using frame bag^[2]Mode model, then based on frame bag carry out What whole section of song was classified, but these methods are not accounted for when appreciating song in human emotion's response in feature.It is practical On, people are influenced the emotion perception of entire song by emotion behavior in song different location, also by emotion table The influence of existing evolution, traditional frame bag feature have ignored these factors.People also proposed use refrain for representative section into The scheme of row song emotional semantic classification^[3], but do not provide the method for carrying out emotion synthesis according to different paragraphs.The present invention is based on songs The observation and analysis of bent structure law performance and audience's music emotion identification process design a kind of two-stage emotion synthesis and classification Method differentiates the affective tag of entire song.

The song emotion integrated approach design Main Basiss of the present invention are observed as follows：One, song emotion behavior is in a timing Between be stable in section；Two, the different paragraphs of song are different the emotional expression contribution degree of song entirety, and emotion develops Emotion cognition on whole song is influential；Three, the structure of most of song be defer to certain rule, that is, prelude, Tail plays, refrain, appearing in for main song etc. defer to certain rule in the relative position of song, although may have exception and be not ten Divide stringent.

Invention content

The present invention provides the technical solution that a kind of pop music carries out automatic song emotion synthesis and classification.Song emotion Comprehensive and classification in two stages, first, carries out music refrain to a piece of music and pattern discrimination occurs, determine different pop musics Pattern；Secondly a song is divided by (time that the size of N occurs with song refrain of N number of segment using a kind of flexible segmentation method Number is related), its pleasure degree and fierceness degree are predicted to each segment；Secondly, the pattern according to a first pop music and N number of music The pleasant degree of segment is with fierceness degree as a result, the different grader of selection, carries out whole song emotion synthesis, obtain whole song Affective tag.

Chant music emotion synthesis is divided into two stages by the present invention.First is the fierce degree to a first snatch of music sung It is predicted with pleasant degree, forms song emotion evolutionary series.

The emotion evolutionary series of song are built upon on the basis of song segmentation.In order to complete to divide a piece of music There is situation by refrain to a first popular song and classify firstly the need of progress popular song structural analysis in section, the present invention.

The typical structure of popular song is that prelude, main song 1, refrain, main song 2, refrain, main song 3, refrain, tail are played.It is not institute Some popular songs all strictly defer to this format, some songs have certain variation, can have bridge section etc. between major-minor song.

The present invention carries out refrain identification using refrain recognizer, and after refrain identification, a song will be presented other sections, pair Song, Qi Taduan, the pattern that refrain is alternately present, wherein other sections include prelude, main song, bridge section or combinations thereof.The present invention according to Popular song is divided into k classes by the repeat pattern that refrain occurs, and is divided into no refrain structure, and 2 refrains occur, and 3 refrains go out There is structure in existing ..., k refrain, and k is generally taken to be not more than 5.If song mode discriminator identifies that refrain occurrence number is big In 5, enable k=5, be classified as with the song of k=5 one kind, and omit in subsequent processing the 6th refrain occur and behind Music content.For the ease of processing, the present invention omits last time refrain and subsequent song content occurs.

After the completion of popular music song mode detection, if detecting refrain, the beginning and ending time of each section of refrain can be obtained. Then the present invention is segmented song using a kind of flexible partition strategy, and a complete song is divided into N number of segment.In order to Make the emotion behavior substantial equalization in a snatch of song, the duration of each segment that should be not more than 10s.In order to segment in song In position have preferable discrimination, it is related that N wants sufficiently large and with song refrain feature occur.

Convenient for processing, the flexible segmentation scheme that the present invention designs is as follows：

The first kind is without repetition refrain structure.For without refrain structure is repeated, song is divided into N=N₁=40 pieces Section.Current invention assumes that the length of popular song is generally no greater than 400s. if it is greater than 400s, discrete sampling will be carried out, equidistantly Take out N₁The segment of a 10s.For song length L<The song of 400s, fragment length Lc=L/N.

Second class is secondary repetitive structure.For secondary repetitive structure OCOC, (C represents refrain segment, and O represents other classifications Segment), the present invention quantity such as carries out with refrain section by other sections and is segmented.Each other sections of O and each refrain section C are divided into M Small fragment, each small pieces segment length are not more than 10s, if it is greater than 10s, carry out the sampled equidistant of 10s segments.Song is always divided into For N=N₂=4M segment, wherein M are positive integer, it is proposed that take 10.

Third class is repetitive structure three times.For repetitive structure OCOCOC three times, the present invention by other sections with refrain section into The segmentation of the quantity such as row.Each other sections of O and each refrain section C are divided into M small fragment, and each small pieces segment length is not more than 10s, If it is greater than 10s, the sampled equidistant of 10s segments is carried out.Song is always divided into N=N₃=6M segment, wherein M are just whole Number, it is proposed that take 7.

4th class is four repetitive structures, and the 5th class is 5 times and the above repetitive structure.For 4 repetitive structures For OCOCOCOC and 5 with last time repetitive structure, segmentation method is similar with the repetitive structure of front, is accordingly divided into N=N₄=8M and N =N₅=10M sections, M suggests taking 5 and 4 respectively.

In order to identify the emotion of snatch of music, the present invention is based on the training of the affection data of the orderly segment of music to practice musical film Section emotion fallout predictor.Valence-Arousal (V-A) model of Thayer is used in the prediction of snatch of music emotion^[1]To indicate Emotion is divided into two dimensions of pleasant degree (valence) and fierceness degree (arousal).Pleasant degree indicates the positive and negative of emotion Attribute, the intensity (intensity) of fierce degree instruction emotion.Music emotion is expressed as fierce degree and pleasure degree index<v,a >, the real number of the value range of v, a between [- 1 ,+1].Emotion prediction model towards snatch of music is to stablize emotion by having The trained gained of snatch of music of expression, the present invention are referred to as V/A fallout predictors, are by snatch of music acoustic feature to V/A values One mapping is typicallyed represent such as formula 1,2, different with specific reference to the grader selected when implementing.

V=f_V(x₁,x₂,…,x_i,…,x_n) (1)

A=f_A(x₁,x₂,…,x_i,…,x_n) (2)

Wherein x_i(i=1 ..., n) is i-th of acoustic feature value of snatch of music, and n is to carry out the selection of V/A value prediction types Muscial acoustics feature quantity.

One first complete popular song is needed to identify the emotional semantic classification of entire song according to whole song feelings Sense performance carries out compressive classification.In order to accurately integrate the emotion of a song, the present invention identifies the different structure mould of song first Formula trains different emotion classifiers to carry out song emotion synthesis and classification for the song of different structure.It is presently believed that structure Similar song, the role that the identical snatch of song of relative position serves as in song emotion behavior have certain similitude. For each song, is predicted by fragment emotion, N number of fierceness degree index and N number of pleasure degree index can be obtained, this two class index A sequence E=＜ a can be combined into₁,v₁,a₂,v₂,...,a_N, v_NThe input feature vector that ＞ is integrated as emotion.It is integrated in emotion Stage, the present invention predict the affective tag of a song using this sequence, this feature not only reflects the feelings of entire song Feel statistical property, also reflects the temporal characteristics of song emotion behavior and the emotion behavior of different snatch of music.

In order to complete whole first song emotion synthesis, the present invention needs training for the grader of song emotion synthesis.It is inputted For a certain class formation song emotion evolutionary series E, export as song affective tag.Song emotion integrated classifier will be directed to difference The song that pattern occurs in refrain is respectively trained, and obtains 5 song emotion integrated classifiers, sorts out with aforementioned song pattern opposite It answers.The acquisition of song emotion evolutionary series E is dependent on aforementioned song pattern-recognition and song segmentation and A/V fallout predictors.Song The grader f of emotion synthesis_j, general type such as formula 3, concrete functional form is different according to the grader for implementing selection.

f_jFor the corresponding emotion compressive classification function of jth class formation.L_jTo use f_jThe tag along sort of gained, N_jMeaning is Five kinds of structure songs carry out the correspondence the piece number of fragment, f_jInput be counter structure song emotion evolutionary series.

The system framework of method proposed by the invention such as attached drawing 1 includes mainly V/A fallout predictors training module, emotion point Class device training module and song emotion compressive classification module.Song emotion emotion compressive classification module realizes in two stages, One stage carried out song pattern-recognition segmentation and is generated with emotion evolutionary series, and second stage carries out entire song using grader Emotion integrates and classification.

The present invention gives the emotion behaviors of a kind of consideration music different location and paragraph to entire song affective tag shadow Loud emotion integrated approach.It the advantage is that (1) method of pattern occurs using pop music refrain of presorting, it is special according to structure Sign right pop song is sorted out in advance, carries out the extraction of V/A emotion evolution Features using flexible partition technology, is respectively processed, Keep the training to different structure popular song emotion classifiers more targeted；(2) popular song structure and emotion is used to develop Feature carries out carrying out song emotional semantic classification, carries out comprehensive method with the simple statistical property based on whole first song, can more reflect Emotion cognition process and feature of the mankind to music.

Description of the drawings

A kind of pop music emotion synthesis of Fig. 1 and sorting technique system architecture diagram

Fig. 2 refrain detecting steps

Fig. 3 tonality feature matrixes example (450 beats, 12 tones)

The example of self similarity matrixes of the Fig. 4 mono- based on tonality feature

A kind of pop music emotion synthesis of Fig. 5 and sorting technique embodiment system architecture diagram

Specific implementation mode：

V/A fallout predictor training modules complete the training of popular song V/A fallout predictors, include mainly snatch of music feature extraction With two submodules of training.Feature extraction submodule is responsible for extracting the acoustic features such as tone color, tone, the beat of segment.Then with Corresponding A/V mark values input A/V fallout predictor training modules and are trained together.

Emotion classifiers training module includes feature extraction, song pattern-recognition, song segmentation, V/A fallout predictors, emotion point Class device trains submodule.Feature extraction submodule is responsible for the extraction of song acoustic feature, and song pattern recognition module identifies prevalence Song pattern and each section of split position, song segmentation module are completed according to song pattern, each section of split position and song length Flexible partition forms the snatch of song no longer than 10s, emotion Evolution Series is generated through V/A fallout predictors, with song affective tag one It plays input emotion classifiers training submodule and carries out emotion classifiers training.

Song emotion compressive classification module includes mainly feature extraction, song pattern-recognition, song segmentation, V/A predictions, feelings The several submodules of sense classification.After the emotion evolutionary series generated by V/A fallout predictors enter emotion classifiers, emotion classifiers according to The result of song pattern-recognition selects corresponding prediction model to carry out emotion synthesis and the classification of a song, exports most possible Affective tag or emotion ranking results.

In order to implement the present invention, a certain number of pop music materials marked, including popular music segments V/A are needed Value mark and whole first pop music affective tag mark.V/A values mark using section numerical value, such as pleasure degree V take [- 1 ,+1] it Between real number, -1 represents extreme negative emotions, and+1 represents extreme positive mood；Mobility takes the numerical value between [- 1 ,+1], -1 generation Table is very gentle, and+1 deputy activity degree is very fierce.Affective tag is generally divided into impassioned, glad, happy, light, tranquil, sad Wound, leads off, is nervous, boring etc. at indignation, affective tag be not limited to it is above-mentioned several, it is related to applying.

It can be, but not limited to extract the muscial acoustics feature such as table 1 in the embodiment of the present invention for training V/A fallout predictors.V/ A fallout predictors use multiple linear regression to predict in the present embodiment.Input data is the acoustic feature and mark of popular music segments V/A values are noted, are exported as predictor parameter.Returning for pleasant degree V and fierceness degree A can be respectively trained in the V/A fallout predictors of the present embodiment Return fallout predictor.By taking pleasant degree V regressive predictors as an example, anticipation function such as formula 4, loss function J such as formula 5.

V=h_θ(x₀,x₂,...,x_n)=θ^TX=θ₀x₀+θ₁x₁+θ₂x₂+…+θ_nx_n (4)

Wherein h_θFor pleasant degree regression forecasting function, θ=(θ₀,...,θ_n) it is model parameter, x=(x₀..., x_n),x₀= 1, x₁..., x_nFor the muscial acoustics characteristic value of extraction.

Wherein m is training use-case quantity, v⁽ⁱ⁾For the pleasant degree V mark values of i-th of training use-case, x⁽ⁱ⁾It is trained for i-th The acoustic feature vector of use-case.Training V fallout predictors are carried out using the method that gradient declines.

The model of A value fallout predictors is similar with V value fallout predictors with training program.

Another step that the present invention is implemented is to carry out popular song mode detection.The identification of popular song pattern of the present invention Embodiment use the refrain detection method based on self similarity matrix.Specific steps such as Fig. 2

The present invention implements the time series for detecting the rhythm point in music signal using having algorithm first.Extracting sound After happy cadence time sequence, framing and adding window are carried out according to the cadence time point extracted, then extracts each frame of song Tone (Chroma) feature, Chroma features are a 12 dimensional vector p=(p₁,...,p₁₂), corresponding 12 pitch classification C, The Chroma characteristic values of all frames in one beat are averaged, as this by C#, D, D#, E, F, F#, G, G#, A, A#, B The Chroma features of a beat.One song Chroma eigenmatrix examples are as shown in Figure 3.

After carrying out feature extraction, the tonality feature vector and other beats of each beat are calculated using following formula The distance between tonality feature vector：

Wherein, S is self similarity matrix, and S (i, j) is the element of matrix S, and d is distance function, the present embodiment using Euclidean away from From pⁱAnd p^jIt is the tonality feature vector of ith and jth beat respectively, m is Music Day umber of beats.Fig. 4 is a self similarity matrix Example.It can see the line segment wherein containing some and main diagonal parallel from self similarity matrix, these line segments illustrate The repetition paragraph of song.

After calculating new self similarity matrix S, the embodiment of the present invention is by detecting the diagonal bars in self similarity matrix S Line detects the repeated fragment in song.In specific implementation, it according to existing achievement in research, generally takes apart from shortest 2% Point is 1, and other points are 0 progress binaryzation, and the similar matrix after the binaryzation of gained contains the piece of original similar matrix substantially Section analog information.Then refrain detection is carried out on binaryzation distance matrix.Due to the influence of noise, in two values matrix, numerical value More disperse for 1 point, it is therefore desirable to diagonally enhance two values matrix B.In the diagonal directions, if two Time gap ＜=1 second of a value between 1 point, it is for those time spans ＜ that point therebetween, which is set 1. another processing, =2 seconds stripeds, are directly set to 0 because too short repetition striped be refrain possibility it is little.

After handling in this way, having some stripeds its snatch of music represented has overlapping, will be into for such striped Row merges, and combined criterion is just to be merged, with one if the snatch of music that two stripeds represent has 80% or more coincidence New striped after item merges represents, and can be further reduced candidate striped quantity again in this way.Then choose longest 30 stripe Carry out subsequent processing.

Remaining line segment represents the snatch of song repeated, if obtaining A segments and B segment weights according to the segment detected Multiple, B segments are repeated with C segments, it can be said that A, B, C segment are repeated three times.The present invention selects number of repetition maximum and length Snatch of music of the degree more than 10 seconds is refrain.Such song will be divided into the other sections of forms being alternately present with refrain, Pattern classification can be carried out to it.

Using above-mentioned music pattern arbiter and V/A fallout predictors, sound can be carried out to the music for being labelled with emotional category The extraction of happy pattern discrimination and emotion evolutionary series E.After obtaining emotion evolutionary series, so that it may to carry out the instruction of emotion classifiers Practice.

The embodiment of the present invention selects support vector machines (SVM) grader, the training to one mode song emotion classifiers Input is its emotion evolutionary series and affective tag, is exported as SVM model parameters.

The svm classifier model that training obtains may be used for carrying out the emotional semantic classification of new song.

1 optional muscial acoustics feature of subordinate list

[1]R.E.Thayer,The Biopsychology of Mood and Arousal.Oxford,U.K.: Oxford Univ.Press,1989.

[2]J.-C.Wang,H.-S.Lee,H.-M.Wang,and S.-K.Jeng,“Learning the similarity of audio music in bag-of-frames representation from tagged music data,”in Proc.Int.Society for Music Information Retrieval Conference,2011, pp.85–90.

[3] Chia-Hung Yeh, Yu-Dun Lin, Ming-Sui Lee2and Wen-Yu Tseng, Popular Music Analysis:Chorus and Emotion Detection,Proceedings of the Second APSIPA Annual Summit and Conference,pages 907–910,Biopolis,Singapore,14-17December 2010

Claims

1. a kind of popular song emotion synthesis and sorting technique, it is characterised in that in two stages, first, sound is carried out to a piece of music There is pattern discrimination in happy refrain, determines different pop music patterns；Secondly using a kind of flexible segmentation method by a song It is divided into N number of segment, its pleasure degree and fierceness degree is predicted to each segment；Secondly, according to the pattern of a first pop music and N number of The pleasant degree of snatch of music as a result, the different grader of selection, carries out whole song emotion synthesis, obtains whole first sound with fierceness degree Happy affective tag.

2. according to the method described in claim 1, it is characterized in that, fierce degree and pleasure degree to the snatch of music of a first song are pre- It surveys, forms song emotion evolutionary series.

Refrain identification is carried out using refrain recognizer, after refrain identification, a song will be presented other sections, refrain, Qi Taduan, The pattern that refrain is alternately present, wherein other sections include prelude, main song, bridge section or combinations thereof.The repetition mould occurred according to refrain Popular song is divided into k classes by formula, is divided into no refrain structure, and 2 refrains occur, and 3 refrains occur ..., k refrain is tied Structure enables k=5, is classified as and k=5 if k is taken to identify that refrain occurrence number is more than 5 no more than 5. song mode discriminators Song it is a kind of, and omit the 6th refrain in subsequent processing and occur and its subsequent music content.In popular music song After the completion of mode detection, if detecting refrain, the beginning and ending time of each section of refrain can be obtained.Then using a kind of flexible segmentation plan Slightly song is segmented, a complete song is divided into N number of segment.The duration of each segment should be not more than 10s.Design Flexible segmentation scheme is as follows：

The first kind is without repetition refrain structure.For without refrain structure is repeated, song is divided into N=N₁=40 segments.It is assumed that The length of popular song is generally no greater than 400s. if it is greater than 400s, will carry out discrete sampling, equidistantly takes out N₁A 10s's Segment.For song length L<The song of 400s, fragment length Lc=L/N.

Second class is secondary repetitive structure.Refrain segment is represented for secondary repetitive structure OCOC, wherein C, O represents other classifications Segment the quantity such as carries out with refrain section by other sections and is segmented.Each other sections of O and each refrain section C are divided into M small fragment, Each small pieces segment length is not more than 10s, if it is greater than 10s, carries out the sampled equidistant of 10s segments.Song is always divided into N=N₂ =4M segment, wherein M are positive integer, it is proposed that take 10.

Third class is repetitive structure three times.For repetitive structure OCOCOC three times, other sections and refrain section such as are subjected at the quantity point Section.Each other sections of O and each refrain section C are divided into M small fragment, and each small pieces segment length is not more than 10s, if it is greater than 10s carries out the sampled equidistant of 10s segments.Song is always divided into N=N₃=6M segment, wherein M are positive integer, it is proposed that are taken 7。

4th class is four repetitive structures, and the 5th class is 5 times and the above repetitive structure.For 4 repetitive structure OCOCOCOC, With 5 with last time repetitive structure, segmentation method is similar with the repetitive structure of front, is accordingly divided into N=N₄=8M and N=N₅=10M Section, M suggest taking 5 and 4 respectively.

3. according to the method described in claim 1, the pleasure it is characterized in that, the affection data training based on the orderly segment of music cultivates the voice Segment emotion fallout predictor.It is indicated using the Valence-Arousal of Thayer (V-A) models in the prediction of snatch of music emotion Emotion is divided into two dimensions of pleasant degree (valence) and fierceness degree (arousal).Pleasant degree indicates the positive and negative of emotion Attribute, the intensity (intensity) of fierce degree instruction emotion.Music emotion is expressed as fierce degree and pleasure degree index<v,a >, the real number of the value range of v, a between [- 1 ,+1].Emotion prediction model towards snatch of music is to stablize emotion by having The trained gained of snatch of music of expression, referred to as V/A fallout predictors are reflected by one of snatch of music acoustic feature to V/A values It penetrates, typicallys represent such as formula 1,2, it is different with specific reference to the grader selected when implementing.

V=f_V(x₁,x₂,…,x_i,…,x_n) (1)

A=f_A(x₁,x₂,…,x_i,…,x_n) (2)

Wherein x_i(i=1 ..., n) is i-th of acoustic feature value of snatch of music, and n be the music for carrying out the selection of V/A value prediction types Acoustic feature quantity.

For each song, is predicted by fragment emotion, N number of fierceness degree index and N number of pleasure degree index can be obtained, this two groups Index can be combined into a sequence E=＜ a₁,v₁,a₂,v₂,...,a_N, v_NThe input feature vector that ＞ is integrated as emotion.In emotion Synthesis phase predicts the affective tag of a song using this sequence,

Grader of the training for song emotion synthesis, input are a certain class formation song emotion evolutionary series E, are exported as song Bent affective tag.The song for pattern occur for different refrains is respectively trained song emotion integrated classifier, obtains 5 songs Emotion integrated classifier, it is corresponding with aforementioned song pattern classification.The acquisition of song emotion evolutionary series E depends on aforementioned song Pattern-recognition and song segmentation and A/V fallout predictors.The grader f of song emotion synthesis_j, form such as formula 3.

f_jFor the corresponding emotion compressive classification function of jth class formation.L_jTo use f_jThe tag along sort of gained, N_jMeaning is five kinds of knots Structure song carries out the correspondence the piece number of fragment, f_jInput be counter structure song emotion evolutionary series.

4. according to the method described in claim 1, it is characterized in that, V/A fallout predictors are predicted using multiple linear regression.Input number According to the acoustic feature and mark V/A values for popular music segments, export as predictor parameter.Pleasure is respectively trained in V/A fallout predictors Spend the regressive predictor of V and fierceness degree A.By taking pleasant degree V regressive predictors as an example, anticipation function such as formula 4, loss function J such as formula 5.

V=h_θ(x₀,x₂,...,x_n)=θ^TX=θ₀x₀+θ₁x₁+θ₂x₂+…+θ_nx_n (4)

Wherein h_θFor pleasant degree regression forecasting function, θ=(θ₀,...,θ_n) it is model parameter, x=(x₀..., x_n),x₀=1, x₁..., x_nFor the muscial acoustics characteristic value of extraction.

Wherein m is training use-case quantity, v⁽ⁱ⁾For the pleasant degree V mark values of i-th of training use-case, x⁽ⁱ⁾For i-th of training use-case Acoustic feature vector.Training V fallout predictors are carried out using the method that gradient declines.

5. according to the method described in claim 1, it is characterized in that, first using the rhythm having in algorithm detection music signal The time series of point.After extracting the cadence time sequence of music, framing is carried out simultaneously according to the cadence time point extracted Then adding window extracts the tone Chroma features of each frame of song, after carrying out feature extraction, calculated using following formula The distance between the tonality feature vector of the tonality feature vector and other beats of each beat：

Wherein, S is self similarity matrix, and S (i, j) is the element of matrix S, and d is distance function, using Euclidean distance, pⁱAnd p^jRespectively It is the tonality feature vector of ith and jth beat, m is Music Day umber of beats.

After calculating new self similarity matrix S, detected in song by detecting the diagonal stripes in self similarity matrix S Repeated fragment.It is 1 to take the point apart from shortest 2%, and other points are 0 progress binaryzation, and it is right to obtain the similar matrix after binaryzation Refrain detection is carried out on binaryzation distance matrix afterwards.Two values matrix B is diagonally enhanced.In diagonal On, if time gap ＜=1 second of two values between 1 point, it is for those that point therebetween, which is set 1. another processing, The striped of time span ＜=2 second, is directly set to 0.

After handling in this way, have some stripeds its represent snatch of music have overlapping, such striped will be closed And combined criterion is just to be merged if the snatch of music that two stripeds represent has 80% or more coincidence, is closed with one New striped after and represents, and can be further reduced candidate striped quantity again in this way.Then choose longest 30 stripe to carry out Subsequent processing.

Remaining line segment represents the snatch of song repeated, is repeated with B segments if obtaining A segments according to the segment detected, B pieces Section is repeated with C segments, it can be said that A, B, C segment are repeated three times.It selects number of repetition maximum and length is more than 10 seconds Snatch of music is refrain.Such song will be divided into the other sections of forms being alternately present with refrain, then be carried out to it Pattern is sorted out.

Using above-mentioned music pattern arbiter and V/A fallout predictors, the music to being labelled with emotional category carries out music pattern and sentences Other and emotion evolutionary series E extraction.After obtaining emotion evolutionary series, the training of emotion classifiers is carried out.

Support vector machines (SVM) grader is selected, the training input to one mode song emotion classifiers is that its emotion develops Sequence and affective tag export as SVM model parameters.

The svm classifier model that training obtains is used to carry out the emotional semantic classification of new song.