CN108648767A - Popular song emotion synthesis and classification method - Google Patents
Popular song emotion synthesis and classification method
- Publication number
- CN108648767A CN108648767A CN201810305399.2A CN201810305399A CN108648767A CN 108648767 A CN108648767 A CN 108648767A CN 201810305399 A CN201810305399 A CN 201810305399A CN 108648767 A CN108648767 A CN 108648767A
- Authority
- CN
- China
- Prior art keywords
- song
- emotion
- music
- refrain
- degree
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/63—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/031—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/031—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
- G10H2210/056—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for extraction or identification of individual instrumental parts, e.g. melody, chords, bass; Identification or separation of instrumental parts by their characteristic voices or timbres
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/031—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
- G10H2210/061—Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for extraction of musical phrases, isolation of musically relevant segments, e.g. musical thumbnail generation, or for temporal structure analysis of a musical piece, e.g. determination of the movement sequence of a musical work
Abstract
A popular song emotion synthesis and classification method, relating to the field of audio information processing. First, chorus detection and occurrence-pattern discrimination are performed on a piece of music to determine its pop-music pattern. Second, the song is divided into N segments using a flexible segmentation method, and the valence and arousal of each segment are predicted. Finally, according to the pattern of the pop song and the valence/arousal results of its N segments, a corresponding classifier is selected to synthesize the emotion of the whole song and obtain its emotion label. The advantages are that V/A emotion evolution features are extracted with a flexible segmentation technique and processed separately per structure class, making the training of emotion classifiers for pop songs of different structures more targeted; and that classifying song emotion from pop-song structure and emotion evolution features, compared with methods that aggregate simple statistical properties over the whole song, better reflects the human process of emotion cognition of music.
Description
Technical field
The present invention relates to the field of audio information processing, and in particular to an automatic emotion classification method for whole pop songs.
Background technology
Most current song emotion classification methods take segments extracted from a song as their object of study. A basic approach is to partition a clip into fixed-length frames, classify the emotion of each frame directly, and then take the emotion type that dominates among the frames as the emotion label of the clip. Others model the song with a bag-of-frames representation [2] and classify whole sections on that basis. These methods, however, do not account for the characteristics of human emotional response when listening to a song. In practice, a listener's emotion perception of an entire song is influenced both by where in the song emotions are expressed and by how that expression evolves; traditional bag-of-frames features ignore these factors. A scheme that uses the chorus as a representative section for song emotion classification has also been proposed [3], but it provides no method for synthesizing emotion across different sections. Based on observation and analysis of the structural regularities of songs and of the listener's music emotion recognition process, the present invention designs a two-stage emotion synthesis and classification method that determines the emotion label of an entire song.
The design of the song emotion synthesis method of the present invention rests on the following observations: (1) the emotional expression of a song is stable within a certain time interval; (2) different sections of a song contribute differently to the emotional expression of the song as a whole, and the evolution of emotion influences emotion cognition of the whole song; (3) the structure of most songs follows certain rules, i.e., the relative positions at which the intro, outro, chorus, verse, etc. appear obey certain regularities, although there may be exceptions and the rules are not strict.
Summary of the invention
The present invention provides a technical solution for automatic song emotion synthesis and classification of pop music. Song emotion synthesis and classification proceed in two stages. First, chorus detection and occurrence-pattern discrimination are performed on a piece of music to determine its pop-music pattern. Second, the song is divided into N segments using a flexible segmentation method (the size of N depends on the number of chorus occurrences), and the valence and arousal of each segment are predicted. Then, according to the pattern of the pop song and the valence/arousal results of the N music segments, a corresponding classifier is selected to synthesize the emotion of the whole song and obtain the emotion label of the whole song.
The present invention divides pop song emotion synthesis into two stages. The first stage predicts the arousal and valence of each music segment of a song, forming the song's emotion evolution sequence.
The emotion evolution sequence of a song is built on the basis of song segmentation. To segment a piece of music, the present invention first performs pop-song structure analysis, classifying a pop song by its chorus occurrence pattern. The typical structure of a pop song is intro, verse 1, chorus, verse 2, chorus, verse 3, chorus, outro. Not all pop songs strictly follow this format; some vary, for example with a bridge between verse and chorus. The present invention performs chorus identification with a chorus recognition algorithm; after chorus identification, a song exhibits a pattern of other sections alternating with chorus sections, where the other sections include the intro, verses, bridges, or combinations thereof. According to the repetition pattern in which the chorus occurs, the present invention divides pop songs into k classes: the no-chorus structure, 2 chorus occurrences, 3 chorus occurrences, ..., k chorus occurrences, with k generally taken to be no more than 5. If the song pattern discriminator finds that the chorus occurs more than 5 times, k = 5 is used, the song is grouped with the k = 5 class, and the sixth and subsequent chorus occurrences and the music content after them are omitted in subsequent processing. For ease of processing, the present invention also omits the song content after the last chorus occurrence.
After pop-song pattern detection is complete, if a chorus is detected, the start and end times of each chorus section are obtained. The present invention then segments the song with a flexible segmentation strategy, dividing a complete song into N segments. So that the emotional expression within a song segment is roughly uniform, the duration of each segment should not exceed 10 s. So that segments are well discriminated by their position within the song, N must be sufficiently large and related to the chorus occurrence pattern of the song.
For ease of processing, the flexible segmentation scheme designed by the present invention is as follows:
The first class is the structure without a repeated chorus. A song without a repeated chorus is divided into N = N1 = 40 segments. The present invention assumes that the length of a pop song generally does not exceed 400 s; if it exceeds 400 s, discrete sampling is performed, extracting N1 segments of 10 s at equal intervals. For a song of length L < 400 s, the segment length is Lc = L/N.
The second class is the two-repetition structure. For the two-repetition structure OCOC (C denotes a chorus segment, O denotes a segment of any other category), the present invention segments the other sections and the chorus sections into equal numbers of pieces. Each other section O and each chorus section C is divided into M small fragments, each no longer than 10 s; if a fragment would exceed 10 s, equidistant sampling of 10 s fragments is performed. The song is divided into N = N2 = 4M segments in total, where M is a positive integer; M = 10 is suggested.
The third class is the three-repetition structure. For the three-repetition structure OCOCOC, the present invention segments the other sections and the chorus sections into equal numbers of pieces. Each other section O and each chorus section C is divided into M small fragments, each no longer than 10 s; if a fragment would exceed 10 s, equidistant sampling of 10 s fragments is performed. The song is divided into N = N3 = 6M segments in total, where M is a positive integer; M = 7 is suggested.
The fourth class is the four-repetition structure, and the fifth class is the five-or-more-repetition structure. For the four-repetition structure OCOCOCOC and the five-or-more-repetition structure, the segmentation method is analogous to the foregoing, dividing the song into N = N4 = 8M and N = N5 = 10M segments respectively, with M = 5 and M = 4 suggested. The following sketch illustrates the whole scheme.
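By way of illustration, the following Python sketch computes segment boundaries under this scheme. It is a minimal sketch, assuming chorus boundaries are supplied as (start, end) times in seconds; the function name flexible_segments and all parameter names are illustrative, not defined by the patent.

```python
# Minimal sketch of the flexible segmentation scheme. All names are
# illustrative; chorus boundaries are (start, end) times in seconds.
def flexible_segments(length_s, chorus_bounds=None):
    """Return a list of (start, end) segment boundaries, each <= 10 s."""
    def split_span(start, end, m):
        # Divide [start, end) into m pieces; if a piece would exceed 10 s,
        # take m equidistant 10 s fragments instead (discrete sampling).
        span = end - start
        if span / m <= 10.0:
            step = span / m
            return [(start + i * step, start + (i + 1) * step) for i in range(m)]
        step = (span - 10.0) / (m - 1) if m > 1 else 0.0
        return [(start + i * step, start + i * step + 10.0) for i in range(m)]

    if not chorus_bounds:                        # class 1: no repeated chorus
        return split_span(0.0, min(length_s, 400.0), 40)

    reps = min(len(chorus_bounds), 5)            # classes 2..5, k capped at 5
    m = {2: 10, 3: 7, 4: 5, 5: 4}[max(reps, 2)]  # suggested M per class
    # (a single chorus occurrence is treated like the two-repetition class)
    sections, cursor = [], 0.0
    for cs, ce in chorus_bounds[:reps]:          # alternate O and C sections;
        sections += [(cursor, cs), (cs, ce)]     # content after the last
        cursor = ce                              # chorus is omitted
    return [seg for sec in sections for seg in split_span(*sec, m)]
```

For example, for a 240 s song with choruses at (60, 90) and (150, 180), flexible_segments(240.0, [(60, 90), (150, 180)]) yields the 4M = 40 segments of the two-repetition class, and the material after the last chorus is dropped, as described above.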
To recognize the emotion of a music segment, the present invention trains a music segment emotion predictor on the emotion data of ordered music segments. The prediction of music segment emotion uses Thayer's Valence-Arousal (V-A) model [1], which represents emotion along two dimensions: valence and arousal. Valence indicates the positive or negative attribute of an emotion; arousal indicates its intensity. The emotion of music is expressed as an arousal and valence index pair <v, a>, where v and a are real numbers in [-1, +1]. The emotion prediction model for music segments is trained on music segments with stable emotional expression; the present invention refers to it as the V/A predictor. It is a mapping from the acoustic features of a music segment to V/A values, typically represented as in formulas (1) and (2); the concrete form depends on the predictor selected in a given implementation.
V = f_V(x_1, x_2, …, x_i, …, x_n)   (1)
A = f_A(x_1, x_2, …, x_i, …, x_n)   (2)
where x_i (i = 1, …, n) is the i-th acoustic feature value of the music segment and n is the number of music acoustic features selected for V/A value prediction.
The emotion classification of a complete pop song requires comprehensive classification based on the emotional expression of the entire song. To synthesize the emotion of a song accurately, the present invention first identifies the song's structural pattern and trains a different emotion classifier for each structure class, for song emotion synthesis and classification of songs of that structure. It is assumed that in songs of similar structure, song segments at the same relative position play similar roles in the song's emotional expression. For each song, segment emotion prediction yields N arousal indices and N valence indices; these two classes of indices are combined into a sequence E = <a_1, v_1, a_2, v_2, ..., a_N, v_N> used as the input feature for emotion synthesis. In the emotion synthesis stage, the present invention predicts the emotion label of a song from this sequence, which reflects not only the emotional statistics of the entire song but also the temporal characteristics of the song's emotional expression and the emotional expression of the individual music segments.
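For concreteness, the sequence E can be assembled as in the following sketch, where extract_features and predict_va stand in for the feature extraction and V/A prediction steps described above (both names are illustrative assumptions, not part of the patent):

```python
# Illustrative sketch: build the emotion evolution sequence E from a
# trained V/A predictor over the song's flexible segments.
import numpy as np

def emotion_evolution_sequence(audio, segments, extract_features, predict_va):
    """Return E = [a1, v1, a2, v2, ..., aN, vN] for the given segments."""
    E = []
    for (start, end) in segments:
        x = extract_features(audio, start, end)  # n acoustic feature values
        v, a = predict_va(x)                     # valence, arousal in [-1, +1]
        E += [a, v]                              # interleaved, as in the text
    return np.asarray(E)
```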
To complete whole-song emotion synthesis, the present invention trains classifiers for song emotion synthesis. The input is the emotion evolution sequence E of a song of a given structure class; the output is the song's emotion label. Song emotion synthesis classifiers are trained separately for songs of each chorus occurrence pattern, yielding 5 song emotion synthesis classifiers corresponding to the song pattern classes above. Obtaining the emotion evolution sequence E depends on the aforementioned song pattern recognition, song segmentation, and V/A predictors. The classifier f_j for song emotion synthesis has the general form of formula (3); the concrete functional form depends on the classifier selected in a given implementation.
L_j = f_j(a_1, v_1, …, a_{N_j}, v_{N_j})   (3)
where f_j is the comprehensive emotion classification function for the j-th structure class, L_j is the classification label obtained with f_j, N_j is the number of segments produced for songs of the j-th of the five structure classes, and the input to f_j is the emotion evolution sequence of a song of the corresponding structure.
The system framework of the proposed method, shown in Fig. 1, mainly comprises a V/A predictor training module, an emotion classifier training module, and a song emotion comprehensive classification module. The song emotion comprehensive classification module operates in two stages: the first stage performs song pattern recognition, segmentation, and emotion evolution sequence generation; the second stage uses a classifier to synthesize and classify the emotion of the entire song.
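Putting the stages together, a minimal sketch of the comprehensive classification module might look as follows, reusing the flexible_segments and emotion_evolution_sequence sketches above; detect_chorus and the classifiers dictionary are assumed inputs for illustration, not APIs defined by the patent.

```python
# Sketch of the two-stage song emotion classification pipeline.
def classify_song_emotion(audio, length_s, detect_chorus, extract_features,
                          predict_va, classifiers):
    """classifiers: dict mapping structure class j (1..5) to a trained f_j."""
    chorus_bounds = detect_chorus(audio)          # stage 1: pattern recognition
    reps = len(chorus_bounds)
    j = 1 if reps == 0 else min(max(reps, 2), 5)  # structure class index
    segments = flexible_segments(length_s, chorus_bounds)  # stage 1: segmentation
    E = emotion_evolution_sequence(audio, segments,
                                   extract_features, predict_va)
    return classifiers[j].predict(E.reshape(1, -1))[0]     # stage 2: f_j(E)
```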
The present invention provides an emotion synthesis method that considers the influence of emotional expression at different positions and in different sections of the music on the emotion label of the entire song. Its advantages are: (1) by pre-classifying pop music according to the chorus occurrence pattern, pop songs are sorted into structure classes in advance, V/A emotion evolution features are extracted with the flexible segmentation technique, and each class is processed separately, making the training of emotion classifiers for pop songs of different structures more targeted; (2) classifying song emotion using pop-song structure and emotion evolution features, compared with methods that synthesize emotion from simple statistical properties of the whole song, better reflects the process and characteristics of human emotion cognition of music.
Description of the drawings
Fig. 1 System architecture of the pop music emotion synthesis and classification method
Fig. 2 Chorus detection steps
Fig. 3 Example chroma feature matrix (450 beats, 12 pitch classes)
Fig. 4 Example of a self-similarity matrix based on chroma features
Fig. 5 System architecture of an embodiment of the pop music emotion synthesis and classification method
Specific embodiments:
The V/A predictor training module completes the training of the pop-song V/A predictors and mainly comprises two submodules: music segment feature extraction and training. The feature extraction submodule extracts acoustic features of a segment such as timbre, pitch, and beat; these are then input, together with the corresponding V/A annotation values, to the V/A predictor training submodule for training.
The emotion classifier training module comprises feature extraction, song pattern recognition, song segmentation, V/A prediction, and emotion classifier training submodules. The feature extraction submodule extracts the song's acoustic features; the song pattern recognition submodule identifies the pop-song pattern and the split positions of the sections; the song segmentation submodule completes the flexible segmentation according to the song pattern, the section split positions, and the song length, producing song segments no longer than 10 s; the segments pass through the V/A predictors to generate the emotion evolution sequence, which is input together with the song's emotion label to the emotion classifier training submodule for classifier training.
The song emotion comprehensive classification module mainly comprises feature extraction, song pattern recognition, song segmentation, V/A prediction, and emotion classification submodules. After the emotion evolution sequence generated by the V/A predictors enters the emotion classifier, the classifier corresponding to the result of song pattern recognition is selected to synthesize and classify the emotion of the song, outputting the most probable emotion label or an emotion ranking result.
To implement the present invention, a certain quantity of annotated pop music material is required, including V/A value annotations of pop music segments and emotion label annotations of whole pop songs. V/A values are annotated with interval values; for example, valence V takes a real number in [-1, +1], where -1 represents an extremely negative emotion and +1 an extremely positive one, and arousal takes a value in [-1, +1], where -1 represents very calm and +1 very intense activity. Emotion labels are typically divided into impassioned, happy, joyful, relaxed, calm, sad, angry, nervous, bored, and so on; the labels are not limited to these and depend on the application.
In an embodiment of the present invention, the music acoustic features of Table 1 can be, but are not limited to, extracted for training the V/A predictors. In this embodiment the V/A predictors use multiple linear regression. The input data are the acoustic features of pop music segments and their annotated V/A values; the output is the predictor parameters. The V/A predictors of this embodiment train regression predictors for valence V and arousal A separately. Taking the valence V regression predictor as an example, the prediction function is formula (4) and the loss function J is formula (5).
V = h_θ(x_0, x_1, …, x_n) = θ^T x = θ_0 x_0 + θ_1 x_1 + θ_2 x_2 + … + θ_n x_n   (4)
where h_θ is the valence regression prediction function, θ = (θ_0, …, θ_n) are the model parameters, x = (x_0, …, x_n), x_0 = 1, and x_1, …, x_n are the extracted music acoustic feature values.
J(θ) = (1/2m) Σ_{i=1}^{m} (h_θ(x^(i)) − v^(i))²   (5)
where m is the number of training examples, v^(i) is the annotated valence V value of the i-th training example, and x^(i) is the acoustic feature vector of the i-th training example. The V predictor is trained using the gradient descent method.
The model and training scheme of the A value predictor are similar to those of the V value predictor.
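A minimal sketch of this training procedure, assuming plain batch gradient descent on the loss of formula (5) (learning rate and iteration count are illustrative):

```python
import numpy as np

def train_valence_predictor(X, v, lr=0.01, iters=5000):
    """X: m x n acoustic feature matrix; v: m annotated valence values."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend x0 = 1
    theta = np.zeros(Xb.shape[1])
    m = Xb.shape[0]
    for _ in range(iters):
        grad = Xb.T @ (Xb @ theta - v) / m         # gradient of J(theta)
        theta -= lr * grad
    return theta                                   # predict: theta @ [1, *x]

# The arousal (A) predictor is trained identically on the annotated a values.
```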
Another step in implementing the present invention is pop-song pattern detection. The embodiment of pop-song pattern recognition in the present invention uses a chorus detection method based on a self-similarity matrix; the specific steps are shown in Fig. 2.
The embodiment first detects the time sequence of beat points in the music signal using an existing algorithm. After the beat time sequence of the music is extracted, framing and windowing are performed according to the extracted beat time points, and the chroma feature of each frame of the song is extracted. The chroma feature is a 12-dimensional vector p = (p_1, ..., p_12) corresponding to the 12 pitch classes C, C#, D, D#, E, F, F#, G, G#, A, A#, B. The chroma feature values of all frames within one beat are averaged and taken as the chroma feature of that beat. An example chroma feature matrix of a song is shown in Fig. 3.
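As one possible realization (an assumption, since the text only requires an existing beat tracker and per-beat chroma averaging, not any particular library), beat-synchronous chroma features can be computed with librosa:

```python
import numpy as np
import librosa

y, sr = librosa.load("song.mp3")                    # hypothetical input file
tempo, beats = librosa.beat.beat_track(y=y, sr=sr)  # beat frame indices
chroma = librosa.feature.chroma_stft(y=y, sr=sr)    # 12 x n_frames matrix
# Average per-frame chroma within each beat -> 12 x (number of beats) matrix
beat_chroma = librosa.util.sync(chroma, beats, aggregate=np.mean)
```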
After feature extraction, the distance between the chroma feature vector of each beat and the chroma feature vectors of the other beats is computed using the following formula:
S(i, j) = d(p_i, p_j),  i, j = 1, …, m
where S is the self-similarity matrix, S(i, j) is an element of S, and d is a distance function; this embodiment uses the Euclidean distance. p_i and p_j are the chroma feature vectors of the i-th and j-th beats respectively, and m is the number of beats in the piece. Fig. 4 shows an example self-similarity matrix. Line segments parallel to the main diagonal can be seen in it; these line segments indicate repeated passages of the song.
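Continuing the sketch above, the matrix S(i, j) = d(p_i, p_j) can be computed in one call (scipy is an assumed implementation choice):

```python
from scipy.spatial.distance import cdist

P = beat_chroma.T                     # m x 12: one chroma vector per beat
S = cdist(P, P, metric="euclidean")   # S[i, j] = d(p_i, p_j), an m x m matrix
```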
After the self-similarity matrix S is computed, the embodiment of the present invention detects the repeated passages in the song by detecting the diagonal stripes in S. In the specific implementation, following existing research results, binarization is generally performed by setting the 2% of points with the smallest distances to 1 and all other points to 0; the binarized similarity matrix largely retains the segment similarity information of the original matrix. Chorus detection is then carried out on the binarized distance matrix. Owing to noise, the 1-valued points in the binary matrix are scattered, so the binary matrix B must be enhanced along the diagonal direction: on a diagonal, if the time gap between two 1-valued points is <= 1 second, the points between them are set to 1. A further processing step directly sets to 0 those stripes whose time span is <= 2 seconds, because a repetition that short is unlikely to be a chorus.
After this processing, some stripes represent music passages that overlap; such stripes are merged. The merging criterion is: if the music passages represented by two stripes coincide by 80% or more, they are merged and represented by the single merged stripe, which further reduces the number of candidate stripes. The 30 longest stripes are then selected for subsequent processing.
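A sketch of the binarization and diagonal enhancement steps follows. The thresholds (2%, 1 s gap, 2 s minimum span) come from the text, while sec_per_beat (the duration of one beat in seconds) and all function names are assumptions for illustration; the stripe merging and top-30 selection steps are omitted for brevity.

```python
import numpy as np

def enhance_diagonal(line, max_gap, min_len):
    """Bridge gaps <= max_gap between 1s, then zero runs of length <= min_len."""
    ones = np.flatnonzero(line)
    for a, b in zip(ones[:-1], ones[1:]):
        if b - a <= max_gap:
            line[a:b + 1] = 1                  # fill gaps of <= 1 s
    run = 0
    for i in range(len(line) + 1):
        if i < len(line) and line[i]:
            run += 1
            continue
        if 0 < run <= min_len:                 # stripes spanning <= 2 s: noise
            line[i - run:i] = 0
        run = 0
    return line

def binarize_and_enhance(S, sec_per_beat):
    B = (S <= np.quantile(S, 0.02)).astype(np.uint8)  # smallest 2% -> 1
    max_gap = int(round(1.0 / sec_per_beat))
    min_len = int(round(2.0 / sec_per_beat))
    n = B.shape[0]
    for k in range(1, n):                             # each sub-diagonal
        d = enhance_diagonal(np.diagonal(B, -k).copy(), max_gap, min_len)
        B[np.arange(k, n), np.arange(n - k)] = d
    return B
```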
The remaining line segments represent repeated song passages. If, according to the detected segments, passage A repeats with passage B and passage B repeats with passage C, then passages A, B, and C can be said to repeat three times. The present invention selects as the chorus the music passage whose repetition count is largest and whose length exceeds 10 seconds. Such a song is thereby divided into the form of other sections alternating with chorus sections, and its pattern can then be classified.
Using the above music pattern discriminator and V/A predictors, music pattern discrimination and extraction of the emotion evolution sequence E can be performed on music annotated with emotion categories. Once the emotion evolution sequences are obtained, the emotion classifiers can be trained.
The embodiment of the present invention selects a support vector machine (SVM) classifier. The training input for the emotion classifier of one pattern class is the emotion evolution sequences and emotion labels of its songs; the output is the SVM model parameters.
The SVM classification models obtained by training can then be used to classify the emotion of new songs.
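As an illustrative sketch (scikit-learn is an assumed implementation choice; the text specifies only that an SVM classifier is used), training one classifier per structure class might look like:

```python
from sklearn.svm import SVC

def train_pattern_classifiers(data_by_pattern):
    """data_by_pattern: dict j -> (E_matrix, labels); rows of E_matrix are
    emotion evolution sequences of songs in structure class j."""
    classifiers = {}
    for j, (E_matrix, labels) in data_by_pattern.items():
        clf = SVC(kernel="rbf")        # kernel choice is illustrative
        clf.fit(E_matrix, labels)
        classifiers[j] = clf
    return classifiers
```

Within one structure class all sequences have the same length 2N_j, so they form a fixed-dimensional feature matrix for the SVM.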
Table 1. Optional music acoustic features
[1] R. E. Thayer, The Biopsychology of Mood and Arousal. Oxford, U.K.: Oxford Univ. Press, 1989.
[2] J.-C. Wang, H.-S. Lee, H.-M. Wang, and S.-K. Jeng, "Learning the similarity of audio music in bag-of-frames representation from tagged music data," in Proc. Int. Society for Music Information Retrieval Conference, 2011, pp. 85–90.
[3] Chia-Hung Yeh, Yu-Dun Lin, Ming-Sui Lee, and Wen-Yu Tseng, "Popular Music Analysis: Chorus and Emotion Detection," in Proc. Second APSIPA Annual Summit and Conference, Biopolis, Singapore, 14–17 December 2010, pp. 907–910.
Claims (5)
1. A popular song emotion synthesis and classification method, characterized in that it proceeds in two stages: first, chorus detection and occurrence-pattern discrimination are performed on a piece of music to determine its pop-music pattern; second, the song is divided into N segments using a flexible segmentation method, and the valence and arousal of each segment are predicted; then, according to the pattern of the pop song and the valence and arousal results of the N music segments, a corresponding classifier is selected to perform whole-song emotion synthesis and obtain the emotion label of the whole piece of music.
2. The method according to claim 1, characterized in that the arousal and valence of the music segments of a song are predicted, forming the song's emotion evolution sequence.
Chorus identification is performed with a chorus recognition algorithm; after chorus identification, a song exhibits a pattern of other sections alternating with chorus sections, where the other sections include the intro, verses, bridges, or combinations thereof. According to the repetition pattern in which the chorus occurs, pop songs are divided into k classes: the no-chorus structure, 2 chorus occurrences, 3 chorus occurrences, ..., k chorus occurrences, with k taken to be no more than 5. If the song pattern discriminator finds that the chorus occurs more than 5 times, k = 5 is used, the song is grouped with the k = 5 class, and the sixth and subsequent chorus occurrences and the music content after them are omitted in subsequent processing. After pop-song pattern detection is complete, if a chorus is detected, the start and end times of each chorus section are obtained. The song is then segmented with a flexible segmentation strategy, dividing a complete song into N segments; the duration of each segment should not exceed 10 s. The designed flexible segmentation scheme is as follows:
The first class is the structure without a repeated chorus: the song is divided into N = N1 = 40 segments. It is assumed that the length of a pop song generally does not exceed 400 s; if it exceeds 400 s, discrete sampling is performed, extracting N1 segments of 10 s at equal intervals; for a song of length L < 400 s, the segment length is Lc = L/N.
The second class is the two-repetition structure: for the two-repetition structure OCOC, where C denotes a chorus segment and O denotes a segment of any other category, the other sections and the chorus sections are segmented into equal numbers of pieces; each other section O and each chorus section C is divided into M small fragments, each no longer than 10 s; if a fragment would exceed 10 s, equidistant sampling of 10 s fragments is performed; the song is divided into N = N2 = 4M segments in total, where M is a positive integer, with M = 10 suggested.
The third class is the three-repetition structure: for the three-repetition structure OCOCOC, the other sections and the chorus sections are segmented into equal numbers of pieces; each other section O and each chorus section C is divided into M small fragments, each no longer than 10 s; if a fragment would exceed 10 s, equidistant sampling of 10 s fragments is performed; the song is divided into N = N3 = 6M segments in total, where M is a positive integer, with M = 7 suggested.
The fourth class is the four-repetition structure, and the fifth class is the five-or-more-repetition structure: for the four-repetition structure OCOCOCOC and the five-or-more-repetition structure, the segmentation method is analogous to the foregoing, dividing the song into N = N4 = 8M and N = N5 = 10M segments respectively, with M = 5 and M = 4 suggested.
3. The method according to claim 1, characterized in that a music segment emotion predictor is trained on the emotion data of ordered music segments. The prediction of music segment emotion uses Thayer's Valence-Arousal (V-A) model, which represents emotion along two dimensions: valence and arousal. Valence indicates the positive or negative attribute of an emotion; arousal indicates its intensity. The emotion of music is expressed as an arousal and valence index pair <v, a>, where v and a are real numbers in [-1, +1]. The emotion prediction model for music segments, referred to as the V/A predictor, is trained on music segments with stable emotional expression; it is a mapping from the acoustic features of a music segment to V/A values, typically represented as in formulas (1) and (2), the concrete form depending on the predictor selected in a given implementation.
V = f_V(x_1, x_2, …, x_i, …, x_n)   (1)
A = f_A(x_1, x_2, …, x_i, …, x_n)   (2)
where x_i (i = 1, …, n) is the i-th acoustic feature value of the music segment and n is the number of music acoustic features selected for V/A value prediction.
For each song, segment emotion prediction yields N arousal indices and N valence indices; these two groups of indices are combined into a sequence E = <a_1, v_1, a_2, v_2, ..., a_N, v_N> used as the input feature for emotion synthesis. In the emotion synthesis stage, the emotion label of a song is predicted from this sequence.
A classifier for song emotion synthesis is trained whose input is the emotion evolution sequence E of a song of a given structure class and whose output is the song's emotion label. Song emotion synthesis classifiers are trained separately for songs of each chorus occurrence pattern, yielding 5 song emotion synthesis classifiers corresponding to the aforementioned song pattern classes. Obtaining the emotion evolution sequence E depends on the aforementioned song pattern recognition, song segmentation, and V/A predictors. The classifier f_j for song emotion synthesis has the form of formula (3):
L_j = f_j(a_1, v_1, …, a_{N_j}, v_{N_j})   (3)
where f_j is the comprehensive emotion classification function for the j-th structure class, L_j is the classification label obtained with f_j, N_j is the number of segments produced for songs of the j-th of the five structure classes, and the input to f_j is the emotion evolution sequence of a song of the corresponding structure.
4. The method according to claim 1, characterized in that the V/A predictors use multiple linear regression. The input data are the acoustic features of pop music segments and their annotated V/A values; the output is the predictor parameters. The V/A predictors train regression predictors for valence V and arousal A separately. Taking the valence V regression predictor as an example, the prediction function is formula (4) and the loss function J is formula (5).
V = h_θ(x_0, x_1, …, x_n) = θ^T x = θ_0 x_0 + θ_1 x_1 + θ_2 x_2 + … + θ_n x_n   (4)
where h_θ is the valence regression prediction function, θ = (θ_0, …, θ_n) are the model parameters, x = (x_0, …, x_n), x_0 = 1, and x_1, …, x_n are the extracted music acoustic feature values.
J(θ) = (1/2m) Σ_{i=1}^{m} (h_θ(x^(i)) − v^(i))²   (5)
where m is the number of training examples, v^(i) is the annotated valence V value of the i-th training example, and x^(i) is the acoustic feature vector of the i-th training example. The V predictor is trained using the gradient descent method.
5. The method according to claim 1, characterized in that the time sequence of beat points in the music signal is first detected using an existing algorithm. After the beat time sequence of the music is extracted, framing and windowing are performed according to the extracted beat time points, and the chroma feature of each frame of the song is extracted. After feature extraction, the distance between the chroma feature vector of each beat and the chroma feature vectors of the other beats is computed using the following formula:
S(i, j) = d(p_i, p_j),  i, j = 1, …, m
where S is the self-similarity matrix, S(i, j) is an element of S, d is a distance function using the Euclidean distance, p_i and p_j are the chroma feature vectors of the i-th and j-th beats respectively, and m is the number of beats in the piece.
After the self-similarity matrix S is computed, the repeated passages in the song are detected by detecting the diagonal stripes in S. The 2% of points with the smallest distances are set to 1 and all other points to 0 for binarization, yielding the binarized similarity matrix; chorus detection is then carried out on the binarized distance matrix. The binary matrix B is enhanced along the diagonal direction: on a diagonal, if the time gap between two 1-valued points is <= 1 second, the points between them are set to 1; stripes whose time span is <= 2 seconds are directly set to 0.
After this processing, some stripes represent music passages that overlap; such stripes are merged. The merging criterion is: if the music passages represented by two stripes coincide by 80% or more, they are merged and represented by the single merged stripe, which further reduces the number of candidate stripes. The 30 longest stripes are then selected for subsequent processing.
The remaining line segments represent repeated song passages. If, according to the detected segments, passage A repeats with passage B and passage B repeats with passage C, passages A, B, and C are said to repeat three times. The music passage whose repetition count is largest and whose length exceeds 10 seconds is selected as the chorus. Such a song is thereby divided into the form of other sections alternating with chorus sections, and its pattern is then classified.
Using the above music pattern discriminator and V/A predictors, music pattern discrimination and extraction of the emotion evolution sequence E are performed on music annotated with emotion categories. Once the emotion evolution sequences are obtained, the emotion classifiers are trained.
A support vector machine (SVM) classifier is selected; the training input for the emotion classifier of one pattern class is its emotion evolution sequences and emotion labels; the output is the SVM model parameters.
The SVM classification models obtained by training are used to classify the emotion of new songs.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810305399.2A CN108648767B (en) | 2018-04-08 | 2018-04-08 | Popular song emotion synthesis and classification method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108648767A true CN108648767A (en) | 2018-10-12 |
CN108648767B CN108648767B (en) | 2021-11-05 |
Family
ID=63745734
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810305399.2A Expired - Fee Related CN108648767B (en) | 2018-04-08 | 2018-04-08 | Popular song emotion synthesis and classification method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108648767B (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20080019031A (en) * | 2005-06-01 | 2008-02-29 | 코닌클리케 필립스 일렉트로닉스 엔.브이. | Method and electronic device for determining a characteristic of a content item |
CN101937678A (en) * | 2010-07-19 | 2011-01-05 | 东南大学 | Judgment-deniable automatic speech emotion recognition method for fidget |
KR20120021174A (en) * | 2010-08-31 | 2012-03-08 | 한국전자통신연구원 | Apparatus and method for music search using emotion model |
CN102930865A (en) * | 2012-09-21 | 2013-02-13 | 重庆大学 | Coarse emotion soft cutting and classification method for waveform music |
CN102930865B (en) * | 2012-09-21 | 2014-04-09 | 重庆大学 | Coarse emotion soft cutting and classification method for waveform music |
CN105931625A (en) * | 2016-04-22 | 2016-09-07 | 成都涂鸦科技有限公司 | Rap music automatic generation method based on character input |
Non-Patent Citations (1)
Title |
---|
SUN Xiangkun, "Research on song emotion classification methods combining music content and lyrics," Master's thesis (in Chinese) *
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109299312A (en) * | 2018-10-18 | 2019-02-01 | 湖南城市学院 | Music rhythm analysis method based on big data |
CN109299312B (en) * | 2018-10-18 | 2021-11-30 | 湖南城市学院 | Music rhythm analysis method based on big data |
CN111583890A (en) * | 2019-02-15 | 2020-08-25 | 阿里巴巴集团控股有限公司 | Audio classification method and device |
CN109829067A (en) * | 2019-03-05 | 2019-05-31 | 北京达佳互联信息技术有限公司 | Audio data processing method, device, electronic equipment and storage medium |
US11645532B2 (en) | 2019-04-03 | 2023-05-09 | Emotional Perception AI Limited | Method of training a neural network to reflect emotional perception and related system and method for categorizing and finding associated content |
GB2583455A (en) * | 2019-04-03 | 2020-11-04 | Mashtraxx Ltd | Method of training a neural network to reflect emotional perception and related system and method for categorizing and finding associated content |
GB2584598A (en) * | 2019-04-03 | 2020-12-16 | Mashtraxx Ltd | Method of training a neural network to reflect emotional perception and related system and method for categorizing and finding associated content |
GB2584598B (en) * | 2019-04-03 | 2024-02-14 | Emotional Perception Ai Ltd | Method of training a neural network to reflect emotional perception and related system and method for categorizing and finding associated content |
US11494652B2 (en) | 2019-04-03 | 2022-11-08 | Emotional Perception AI Limited | Method of training a neural network to reflect emotional perception and related system and method for categorizing and finding associated content |
US11068782B2 (en) | 2019-04-03 | 2021-07-20 | Mashtraxx Limited | Method of training a neural network to reflect emotional perception and related system and method for categorizing and finding associated content |
US11080601B2 (en) | 2019-04-03 | 2021-08-03 | Mashtraxx Limited | Method of training a neural network to reflect emotional perception and related system and method for categorizing and finding associated content |
CN110134823B (en) * | 2019-04-08 | 2021-10-22 | 华南理工大学 | MIDI music genre classification method based on normalized note display Markov model |
CN110134823A (en) * | 2019-04-08 | 2019-08-16 | 华南理工大学 | The MIDI musical genre classification method of Markov model is shown based on normalization note |
CN110377786A (en) * | 2019-07-24 | 2019-10-25 | 中国传媒大学 | Music emotion classification method |
CN110808065A (en) * | 2019-10-28 | 2020-02-18 | 北京达佳互联信息技术有限公司 | Method and device for detecting refrain, electronic equipment and storage medium |
CN112989105B (en) * | 2019-12-16 | 2024-04-26 | 黑盒子科技(北京)有限公司 | Music structure analysis method and system |
CN112989105A (en) * | 2019-12-16 | 2021-06-18 | 黑盒子科技(北京)有限公司 | Music structure analysis method and system |
CN111462774A (en) * | 2020-03-19 | 2020-07-28 | 河海大学 | Music emotion credible classification method based on deep learning |
CN111601433A (en) * | 2020-05-08 | 2020-08-28 | 中国传媒大学 | Method and device for predicting stage lighting effect control strategy |
US11544565B2 (en) | 2020-10-02 | 2023-01-03 | Emotional Perception AI Limited | Processing system for generating a playlist from candidate files and method for generating a playlist |
CN112614511A (en) * | 2020-12-10 | 2021-04-06 | 央视国际网络无锡有限公司 | Song emotion detection method |
CN113129871A (en) * | 2021-03-26 | 2021-07-16 | 广东工业大学 | Music emotion recognition method and system based on audio signal and lyrics |
CN114446323A (en) * | 2022-01-25 | 2022-05-06 | 电子科技大学 | Dynamic multi-dimensional music emotion analysis method and system |
Also Published As
Publication number | Publication date |
---|---|
CN108648767B (en) | 2021-11-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108648767A (en) | Popular song emotion synthesis and classification method | |
Cramer et al. | Look, listen, and learn more: Design choices for deep audio embeddings | |
Kong et al. | High-resolution piano transcription with pedals by regressing onset and offset times | |
Lee et al. | Multi-level and multi-scale feature aggregation using pretrained convolutional neural networks for music auto-tagging | |
Vogl et al. | Drum Transcription via Joint Beat and Drum Modeling Using Convolutional Recurrent Neural Networks. | |
Gururani et al. | An attention mechanism for musical instrument recognition | |
Oikarinen et al. | Deep convolutional network for animal sound classification and source attribution using dual audio recordings | |
Stowell | Computational bioacoustic scene analysis | |
de Benito-Gorron et al. | Exploring convolutional, recurrent, and hybrid deep neural networks for speech and music detection in a large audio dataset | |
Anglade et al. | Improving music genre classification using automatically induced harmony rules | |
US20200075019A1 (en) | System and method for neural network orchestration | |
Parekh et al. | Weakly supervised representation learning for audio-visual scene analysis | |
CN106302987A (en) | A kind of audio frequency recommends method and apparatus | |
US20200066278A1 (en) | System and method for neural network orchestration | |
Hou et al. | Transfer learning for improving singing-voice detection in polyphonic instrumental music | |
Morfi et al. | Data-efficient weakly supervised learning for low-resource audio event detection using deep learning | |
Hernandez-Olivan et al. | Music boundary detection using convolutional neural networks: A comparative analysis of combined input features | |
Jeong et al. | Audio tagging system using densely connected convolutional networks. | |
Kalinli et al. | Saliency-driven unstructured acoustic scene classification using latent perceptual indexing | |
US20200058307A1 (en) | System and method for neural network orchestration | |
Cumming et al. | Using corpus studies to find the origins of the madrigal | |
Pons Puig | Deep neural networks for music and audio tagging | |
Zhong et al. | Gender recognition of speech based on decision tree model | |
O’Brien | Musical Structure Segmentation with Convolutional Neural Networks | |
Singh et al. | Deep multi-view features from raw audio for acoustic scene classification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
Granted publication date: 20211105 |