CN108648767A - A popular song emotion synthesis and classification method - Google Patents

A popular song emotion synthesis and classification method (Download PDF)

Info

Publication number
CN108648767A
CN108648767A, CN201810305399.2A, CN201810305399A, CN 108648767 A
Authority
CN
China
Prior art keywords
song
emotion
music
refrain
degree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810305399.2A
Other languages
Chinese (zh)
Other versions
CN108648767B (en)
Inventor
孙书韬
王永滨
曹轶臻
王琦
赵庄言
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Communication University of China
Original Assignee
Communication University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Communication University of China
Priority to CN201810305399.2A
Publication of CN108648767A
Application granted
Publication of CN108648767B
Legal status: Expired - Fee Related
Anticipated expiration

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 — Pattern recognition
    • G06F18/20 — Analysing
    • G06F18/24 — Classification techniques
    • G06F18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 — Classification techniques relating to the classification model based on the proximity to a decision surface, e.g. support vector machines
    • G10H — ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 — Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031 — Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/056 — Musical analysis for extraction or identification of individual instrumental parts, e.g. melody, chords, bass; Identification or separation of instrumental parts by their characteristic voices or timbres
    • G10H2210/061 — Musical analysis for extraction of musical phrases, isolation of musically relevant segments, e.g. musical thumbnail generation, or for temporal structure analysis of a musical piece, e.g. determination of the movement sequence of a musical work

Abstract

A popular song emotion synthesis and classification method, relating to the field of audio information processing. First, refrain detection and occurrence-pattern discrimination are performed on a piece of music to determine its pop-music pattern. Second, the song is divided into N segments using a flexible segmentation method, and valence (pleasantness) and arousal (intensity) are predicted for each segment. Finally, according to the pattern of the song and the valence and arousal results of the N music segments, a corresponding classifier is selected to synthesize the emotion of the whole song and obtain its emotion label. The advantages are that the flexible segmentation technique is used to extract valence/arousal (V/A) emotion-evolution features and to process songs of different structures separately, which makes the training of emotion classifiers for popular songs of different structures more targeted; and that classifying song emotion from the song structure and the emotion-evolution features, rather than synthesizing it from simple statistics over the whole song, better reflects how humans perceive and recognize emotion in music.

Description

A popular song emotion synthesis and classification method
Technical field
The present invention relates to the field of audio information processing, and in particular to an automatic emotion classification method for complete popular songs.
Background technology
Most current song emotion classification methods operate on segments of a song. A basic approach is to divide a segment into fixed-length frames, classify the emotion of each frame directly, and then use the emotion type that dominates among the frames as the emotion label of the segment. Other work models a song with a bag of frames[2] and classifies the whole song from the bag-of-frames representation, but these methods do not take into account how humans respond emotionally while listening to a song. In fact, a listener's perception of the emotion of an entire song is influenced both by where in the song emotions are expressed and by how the emotional expression evolves; traditional bag-of-frames features ignore these factors. A scheme that uses the refrain as a representative section for song emotion classification has also been proposed[3], but it does not provide a method for synthesizing emotion from different sections. Based on observations and analysis of the structural regularities of songs and of the listener's music-emotion recognition process, the present invention designs a two-stage emotion synthesis and classification method that determines the emotion label of an entire song.
The design of the song emotion synthesis method of the present invention is based on the following observations: (1) the emotional expression of a song is stable within a certain time interval; (2) different sections of a song contribute differently to the emotional expression of the song as a whole, and the evolution of the emotional expression affects the emotion perceived for the whole song; (3) the structure of most songs follows certain rules, i.e., the relative positions at which the intro, outro, refrain (chorus), verse, etc. appear follow certain conventions, although exceptions exist and the rules are not strict.
Summary of the invention
The present invention provides a technical solution for automatic song emotion synthesis and classification of popular music. Song emotion synthesis and classification proceed in two stages. First, refrain detection and occurrence-pattern discrimination are performed on a piece of music to determine its pop-music pattern. Second, the song is divided into N segments using a flexible segmentation method (the size of N depends on the number of refrain occurrences), and valence and arousal are predicted for each segment. Then, according to the pattern of the song and the valence and arousal results of the N music segments, the corresponding classifier is selected to synthesize the emotion of the whole song and obtain its emotion label.
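As a concrete illustration of the two stages, the following Python sketch strings the steps together; the callables detect_refrains, flexible_segments, predict_va and the dict pattern_classifiers are hypothetical stand-ins for the modules described below, not names from the patent itself.

```python
def classify_song_emotion(audio_path, detect_refrains, flexible_segments,
                          predict_va, pattern_classifiers):
    """Sketch of the two-stage method under the assumptions stated above."""
    refrains = detect_refrains(audio_path)                  # stage 1: refrain occurrences
    pattern = min(len(refrains), 5)                         # structure class, capped at 5
    segments = flexible_segments(audio_path, refrains[:5])  # N segments of <= 10 s
    E = []                                                  # emotion evolution sequence
    for seg in segments:
        a, v = predict_va(seg)                              # per-segment arousal/valence
        E += [a, v]
    return pattern_classifiers[pattern].predict([E])[0]     # stage 2: whole-song label
```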
The present invention divides song emotion synthesis into two stages. The first stage predicts the arousal and valence of each segment of a song, forming the song's emotion evolution sequence.
The emotion evolution sequence of a song is built on top of song segmentation. In order to segment a piece of music, the present invention first performs a structural analysis of the popular song and classifies the song according to how its refrain occurs.
The typical structure of a popular song is intro, verse 1, refrain, verse 2, refrain, verse 3, refrain, outro. Not all popular songs strictly follow this format; some songs vary to a degree, for example with a bridge section between verse and refrain.
The present invention performs refrain identification using a refrain recognition algorithm. After refrain identification, a song presents a pattern in which other sections and refrain sections alternate, where the other sections include the intro, verses, bridges, or combinations thereof. The present invention classifies popular songs into k classes according to the repetition pattern of the refrain: no refrain, 2 refrain occurrences, 3 refrain occurrences, ..., k refrain occurrences, where k is generally taken to be no greater than 5. If the song pattern discriminator finds more than 5 refrain occurrences, k is set to 5, the song is classified together with the k = 5 class, and the sixth refrain occurrence and all music content after it are omitted in subsequent processing. For ease of processing, the present invention also omits the song content that follows the last refrain occurrence.
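A minimal sketch of this pattern classification step, assuming the refrain detector returns a list of (start, end) occurrence times; the function name and representation are illustrative only:

```python
def refrain_pattern_class(refrains):
    """Map detected refrain occurrences to one of the structure classes
    (no refrain, 2, 3, 4, or 5-and-more occurrences), as described above."""
    if not refrains:
        return 0, []                 # no repeated refrain
    k = min(len(refrains), 5)        # more than 5 occurrences are treated as k = 5
    return k, refrains[:5]           # the 6th refrain and later content are omitted
```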
After pop-song pattern detection is completed, if a refrain is detected, the start and end times of each refrain section can be obtained. The present invention then segments the song using a flexible segmentation strategy, dividing a complete song into N segments. To keep the emotional expression within a segment roughly homogeneous, the duration of each segment should not exceed 10 s. To give segments good discriminability with respect to their position in the song, N should be sufficiently large and related to how the refrain occurs.
For convenience of processing, the flexible segmentation scheme designed by the present invention is as follows (a code sketch follows the five cases below):
The first class has no repeated refrain. For songs without a repeated refrain, the song is divided into N = N1 = 40 segments. The present invention assumes that the length of a popular song generally does not exceed 400 s; if it does exceed 400 s, discrete sampling is performed and N1 segments of 10 s each are taken at equal intervals. For a song of length L < 400 s, the segment length is Lc = L / N.
The second class is the twofold repetition structure. For a twofold repetition structure OCOC (C denotes a refrain section, O denotes a section of any other category), the present invention divides the other sections and the refrain sections into equal numbers of pieces. Each other section O and each refrain section C is divided into M small segments, each no longer than 10 s; if a small segment would exceed 10 s, equidistant sampling of 10-s pieces is performed. The song is divided into N = N2 = 4M segments in total, where M is a positive integer; M = 10 is suggested.
The third class is the threefold repetition structure. For a threefold repetition structure OCOCOC, the present invention divides the other sections and the refrain sections into equal numbers of pieces. Each other section O and each refrain section C is divided into M small segments, each no longer than 10 s; if a small segment would exceed 10 s, equidistant sampling of 10-s pieces is performed. The song is divided into N = N3 = 6M segments in total, where M is a positive integer; M = 7 is suggested.
The fourth class is the fourfold repetition structure, and the fifth class covers five or more repetitions. For the fourfold repetition structure OCOCOCOC and for five or more repetitions, the segmentation method is analogous to the previous cases, yielding N = N4 = 8M and N = N5 = 10M segments respectively; M = 5 and M = 4 are suggested.
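The five segmentation cases can be captured in one routine; the sketch below assumes refrain occurrences are given as (start, end) times in seconds and follows the segment counts and 10-s cap stated above (function names are illustrative):

```python
def flexible_segments(length, refrains, max_seg=10.0):
    """Sketch of the flexible segmentation scheme above.
    length: song duration in seconds; refrains: (start, end) times of the
    (at most 5) refrain occurrences. Returns a list of (start, end) segments."""
    def split(start, end, m):
        # Divide [start, end] into m pieces; if a piece would exceed max_seg seconds,
        # take m equidistant pieces of max_seg seconds instead (equidistant sampling).
        dur = (end - start) / m
        if dur <= max_seg:
            return [(start + i * dur, start + (i + 1) * dur) for i in range(m)]
        step = (end - start - max_seg) / (m - 1) if m > 1 else 0.0
        return [(start + i * step, start + i * step + max_seg) for i in range(m)]

    if not refrains:
        # Class 1: no repeated refrain -> 40 segments of L/40 s each
        # (for L > 400 s this reduces to 40 equidistant 10-s samples).
        return split(0.0, length, 40)
    m_by_count = {2: 10, 3: 7, 4: 5}          # suggested M per refrain-occurrence count
    M = m_by_count.get(len(refrains), 4)       # 5 or more occurrences -> M = 4
    segments, pos = [], 0.0
    for c_start, c_end in refrains:
        segments += split(pos, c_start, M)     # preceding 'other' section -> M pieces
        segments += split(c_start, c_end, M)   # refrain section -> M pieces
        pos = c_end
    return segments                            # content after the last refrain is omitted
```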
To identify the emotion of a music segment, the present invention trains a segment emotion predictor on the emotion data of ordered music segments. Segment emotion prediction uses Thayer's Valence-Arousal (V-A) model[1], which represents emotion along two dimensions: valence (pleasantness) and arousal (intensity). Valence indicates the positive/negative attribute of the emotion, and arousal indicates its intensity. The emotion of a music segment is expressed as the pair of arousal and valence indices <v, a>, where v and a are real numbers in the range [-1, +1]. The emotion prediction model for music segments is trained on music segments with stable emotional expression; the present invention calls it the V/A predictor. It is a mapping from the acoustic features of a music segment to V/A values, typically expressed as in formulas (1) and (2); the specific form depends on the classifier chosen in the implementation.
V = f_V(x_1, x_2, …, x_i, …, x_n)  (1)
A = f_A(x_1, x_2, …, x_i, …, x_n)  (2)
where x_i (i = 1, …, n) is the i-th acoustic feature value of the music segment and n is the number of musical acoustic features selected for V/A value prediction.
To identify the emotion classification of an entire popular song, the classification must be synthesized from the emotional expression of the whole song. To synthesize the emotion of a song accurately, the present invention first identifies the structural pattern of the song and trains a different emotion classifier for the songs of each structure to carry out song emotion synthesis and classification. The assumption is that in songs with similar structure, segments at the same relative position play similar roles in the emotional expression of the song. For each song, segment emotion prediction yields N arousal indices and N valence indices; these two groups of indices are combined into a sequence E = <a_1, v_1, a_2, v_2, …, a_N, v_N>, which serves as the input feature for emotion synthesis. In the emotion synthesis stage, the present invention predicts the emotion label of a song from this sequence. This feature reflects not only the emotion statistics of the entire song but also the temporal characteristics of the song's emotional expression and the emotional expression of the individual music segments.
To complete whole-song emotion synthesis, the present invention trains classifiers for song emotion synthesis. The input of such a classifier is the emotion evolution sequence E of a song of a given structural class, and the output is the song's emotion label. A separate song emotion synthesis classifier is trained for each refrain-occurrence pattern, yielding 5 classifiers corresponding to the song pattern classes described above. Obtaining the emotion evolution sequence E relies on the song pattern recognition, song segmentation and V/A predictors described above. The classifier f_j for song emotion synthesis has the general form of formula (3); its concrete functional form depends on the classifier chosen in the implementation.
L_j = f_j(a_1, v_1, a_2, v_2, …, a_{N_j}, v_{N_j})  (3)
where f_j is the emotion synthesis classification function of the j-th structural class, L_j is the classification label obtained with f_j, N_j is the number of segments produced for songs of the j-th of the five structures, and the input of f_j is the emotion evolution sequence of a song of the corresponding structure.
The system framework of the proposed method is shown in Fig. 1. It mainly comprises a V/A predictor training module, an emotion classifier training module, and a song emotion synthesis and classification module. The song emotion synthesis and classification module operates in two stages: the first stage performs song pattern recognition, segmentation and emotion evolution sequence generation; the second stage uses the classifiers to synthesize and classify the emotion of the entire song.
The present invention provides an emotion synthesis method that takes into account how the emotional expression at different positions and in different sections of a piece of music affects the emotion label of the entire song. Its advantages are: (1) songs are pre-classified by their refrain occurrence pattern, i.e., grouped in advance according to structural features, and V/A emotion-evolution features are extracted with the flexible segmentation technique and processed separately, which makes the training of emotion classifiers for popular songs of different structures more targeted; (2) classifying song emotion from the song structure and the emotion-evolution features, compared with methods that synthesize emotion from simple statistics over the whole song, better reflects the process and characteristics of human emotional cognition of music.
Description of the drawings
Fig. 1 System architecture of a popular music emotion synthesis and classification method
Fig. 2 Refrain detection steps
Fig. 3 Example of a chroma (tonality) feature matrix (450 beats, 12 pitch classes)
Fig. 4 Example of a self-similarity matrix based on chroma features
Fig. 5 System architecture of an embodiment of the popular music emotion synthesis and classification method
Specific embodiments:
The V/A predictor training module completes the training of the V/A predictors for popular songs and mainly includes two submodules: music segment feature extraction and training. The feature extraction submodule extracts acoustic features of a segment such as timbre, pitch and beat. These features are then fed, together with the corresponding V/A annotation values, into the V/A predictor training submodule for training.
The emotion classifier training module includes feature extraction, song pattern recognition, song segmentation, V/A prediction and emotion classifier training submodules. The feature extraction submodule is responsible for extracting the acoustic features of the song; the song pattern recognition module identifies the pattern of the popular song and the split positions of its sections; the song segmentation module performs flexible segmentation according to the song pattern, the section split positions and the song length, producing song segments no longer than 10 s. The V/A predictors turn these segments into an emotion evolution sequence, which is fed, together with the song's emotion label, into the emotion classifier training submodule for emotion classifier training.
The song emotion synthesis and classification module mainly includes feature extraction, song pattern recognition, song segmentation, V/A prediction and emotion classification submodules. After the emotion evolution sequence generated by the V/A predictors enters the emotion classifier, the classifier selects the prediction model corresponding to the result of song pattern recognition, performs emotion synthesis and classification of the song, and outputs the most likely emotion label or an emotion ranking.
To implement the present invention, a certain amount of annotated popular music material is needed, including V/A value annotations of pop music segments and emotion labels of whole popular songs. V/A values are annotated as interval values; e.g., valence V takes a real value in [-1, +1], where -1 represents an extremely negative emotion and +1 an extremely positive one, and arousal takes a value in [-1, +1], where -1 represents very calm and +1 very intense. Emotion labels are typically categories such as passionate, happy, cheerful, relaxed, calm, sad, angry, heroic, tense, boring, etc.; the labels are not limited to these and depend on the application.
In an embodiment of the present invention, the musical acoustic features of Table 1 (among others) can be extracted for training the V/A predictors. In this embodiment, the V/A predictors use multiple linear regression. The input data are the acoustic features and annotated V/A values of pop music segments; the output is the predictor parameters. The V/A predictors of this embodiment train separate regression predictors for valence V and arousal A. Taking the valence V regression predictor as an example, the prediction function is formula (4) and the loss function J is formula (5).
V = h_θ(x_0, x_1, …, x_n) = θ^T x = θ_0 x_0 + θ_1 x_1 + θ_2 x_2 + … + θ_n x_n  (4)
where h_θ is the valence regression prediction function, θ = (θ_0, …, θ_n) are the model parameters, x = (x_0, …, x_n) with x_0 = 1, and x_1, …, x_n are the extracted musical acoustic feature values.
J(θ) = (1/(2m)) Σ_{i=1..m} (h_θ(x^(i)) − v^(i))²  (5)
where m is the number of training examples, v^(i) is the annotated valence value of the i-th training example, and x^(i) is the acoustic feature vector of the i-th training example. The V predictor is trained by gradient descent.
The model and training procedure of the A (arousal) value predictor are the same as those of the V value predictor.
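A minimal numpy sketch of the valence regressor of formulas (4) and (5), trained by batch gradient descent as described; the learning rate and epoch count are illustrative choices, not values from the patent:

```python
import numpy as np

def train_valence_predictor(X, v, lr=0.01, epochs=2000):
    """Multiple linear regression V predictor (formulas 4-5), batch gradient descent.
    X: (m, n) acoustic feature matrix, v: (m,) annotated valence values."""
    m, n = X.shape
    Xb = np.hstack([np.ones((m, 1)), X])   # prepend x0 = 1 to every example
    theta = np.zeros(n + 1)                # parameters theta_0 ... theta_n
    for _ in range(epochs):
        pred = Xb @ theta                  # h_theta(x) for all training examples
        grad = Xb.T @ (pred - v) / m       # gradient of J(theta)
        theta -= lr * grad
    return theta

def predict_valence(theta, x):
    """Apply formula (4) to a single n-dimensional acoustic feature vector x."""
    return theta[0] + float(np.dot(theta[1:], x))
```

The arousal predictor is trained the same way on the arousal annotations.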
Another step in implementing the present invention is popular song pattern detection. The embodiment of popular song pattern identification in the present invention uses a refrain detection method based on a self-similarity matrix. The specific steps are shown in Fig. 2.
This embodiment first uses an existing algorithm to detect the time series of beat (rhythm) points in the music signal. After the beat time sequence of the music has been extracted, the signal is divided into frames and windowed according to the extracted beat times, and the chroma (pitch-class) feature of each frame is then extracted. A chroma feature is a 12-dimensional vector p = (p_1, …, p_12) corresponding to the 12 pitch classes C, C#, D, D#, E, F, F#, G, G#, A, A#, B. The chroma feature values of all frames within one beat are averaged to obtain the chroma feature of that beat. Fig. 3 shows an example chroma feature matrix of a song.
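For illustration, beat-synchronous chroma features of this kind can be computed with an off-the-shelf library; the sketch below uses librosa as the assumed "existing algorithm" for beat detection, which the patent does not specify:

```python
import numpy as np
import librosa

def beat_chroma(path):
    """Beat tracking followed by beat-averaged 12-dimensional chroma vectors."""
    y, sr = librosa.load(path, sr=22050)
    _, beat_frames = librosa.beat.beat_track(y=y, sr=sr)   # beat (rhythm) time points
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)       # 12 x n_frames chroma matrix
    # Average the chroma of all frames inside each beat -> one 12-d vector per beat
    sync = librosa.util.sync(chroma, beat_frames, aggregate=np.mean)
    beat_times = librosa.frames_to_time(beat_frames, sr=sr)
    return sync.T, beat_times                               # per-beat chroma, beat times
```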
After feature extraction, the distance between the chroma feature vector of each beat and the chroma feature vectors of the other beats is computed using the following formula:
S(i, j) = d(p_i, p_j), i, j = 1, …, m
where S is the self-similarity matrix, S(i, j) is an element of S, d is a distance function (this embodiment uses the Euclidean distance), p_i and p_j are the chroma feature vectors of the i-th and j-th beats respectively, and m is the number of beats in the piece. Fig. 4 shows an example self-similarity matrix. Line segments parallel to the main diagonal can be seen in the matrix; these line segments indicate repeated sections of the song.
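A direct sketch of this self-similarity computation over the beat-level chroma vectors, using the Euclidean distance as in this embodiment:

```python
from scipy.spatial.distance import cdist

def self_similarity(beat_chroma_vectors):
    """S(i, j) = d(p_i, p_j) over beat-level chroma vectors (Euclidean distance)."""
    return cdist(beat_chroma_vectors, beat_chroma_vectors, metric="euclidean")
```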
After the self-similarity matrix S has been computed, the embodiment of the present invention detects repeated sections in the song by detecting diagonal stripes in S. In the specific implementation, following existing research results, the matrix is binarized by setting the points with the smallest 2% of distances to 1 and all other points to 0; the binarized similarity matrix largely retains the segment-similarity information of the original matrix. Refrain detection is then performed on the binarized distance matrix. Because of noise, the 1-valued points in the binary matrix are scattered, so the binary matrix B is enhanced along the diagonal direction: along a diagonal, if two 1-valued points are no more than 1 second apart in time, the points between them are set to 1. A further processing step directly sets stripes whose time span is no more than 2 seconds to 0, because a repetition that short is unlikely to be a refrain.
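The binarization and diagonal enhancement just described can be sketched as follows; the 2% quantile, 1-s gap and 2-s minimum-stripe thresholds follow the text, while the per-diagonal scanning details are an assumption of this sketch:

```python
import numpy as np

def binarize_and_enhance(S, beat_times, gap_sec=1.0, min_len_sec=2.0, q=0.02):
    """Binarize the distance matrix S and enhance it along the diagonals.
    beat_times: time in seconds of each beat (row/column index of S)."""
    thr = np.quantile(S, q)
    B = (S <= thr).astype(np.uint8)                # smallest 2% of distances -> 1
    m = B.shape[0]
    for k in range(1, m):                          # process each super-diagonal
        idx = np.arange(m - k)
        line = B[idx, idx + k]
        ones = np.flatnonzero(line)
        for a, b in zip(ones[:-1], ones[1:]):      # fill gaps of <= 1 s between 1-points
            if beat_times[b] - beat_times[a] <= gap_sec:
                line[a:b + 1] = 1
        start = None                               # zero out stripes spanning <= 2 s
        for i in range(len(line) + 1):
            if i < len(line) and line[i]:
                start = i if start is None else start
            elif start is not None:
                if beat_times[i - 1] - beat_times[start] <= min_len_sec:
                    line[start:i] = 0
                start = None
        B[idx, idx + k] = line
    return B
```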
After this processing, some stripes represent overlapping music segments; such stripes are merged. The merging criterion is that if the music segments represented by two stripes overlap by 80% or more, they are merged and represented by a single new stripe, which further reduces the number of candidate stripes. The longest 30 stripes are then selected for subsequent processing.
The remaining line segments represent repeated song sections. If the detected segments show that segment A repeats with segment B and segment B repeats with segment C, then segments A, B and C are said to repeat three times. The present invention selects the music segment with the largest repetition count and a length of more than 10 seconds as the refrain. Such a song is thereby divided into a form in which other sections alternate with the refrain, and its pattern can then be classified.
Using the music pattern discriminator and the V/A predictors described above, music pattern discrimination and extraction of the emotion evolution sequence E can be performed on music annotated with emotion categories. Once the emotion evolution sequences have been obtained, the emotion classifiers can be trained.
The embodiment of the present invention uses a support vector machine (SVM) classifier. The training input for the emotion classifier of one song pattern is the emotion evolution sequences and emotion labels of songs of that pattern; the output is the SVM model parameters.
The trained SVM classification models can then be used to classify the emotion of new songs.
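A minimal scikit-learn sketch of one per-pattern emotion classifier; the RBF kernel and other hyperparameters are assumptions, as the patent only specifies that an SVM is used:

```python
from sklearn.svm import SVC

def train_pattern_classifier(E_sequences, labels):
    """One SVM emotion classifier per refrain-occurrence pattern.
    E_sequences: list of emotion evolution sequences [a1, v1, ..., aN, vN] for songs
    of one pattern (same N); labels: their emotion tags."""
    clf = SVC(kernel="rbf")
    clf.fit(E_sequences, labels)
    return clf

# Usage sketch: label = train_pattern_classifier(train_E, train_y).predict([new_E])[0]
```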
Table 1 Optional musical acoustic features
[1] R. E. Thayer, The Biopsychology of Mood and Arousal. Oxford, U.K.: Oxford Univ. Press, 1989.
[2] J.-C. Wang, H.-S. Lee, H.-M. Wang, and S.-K. Jeng, "Learning the similarity of audio music in bag-of-frames representation from tagged music data," in Proc. Int. Society for Music Information Retrieval Conference, 2011, pp. 85-90.
[3] Chia-Hung Yeh, Yu-Dun Lin, Ming-Sui Lee, and Wen-Yu Tseng, "Popular Music Analysis: Chorus and Emotion Detection," Proceedings of the Second APSIPA Annual Summit and Conference, pp. 907-910, Biopolis, Singapore, 14-17 December 2010.

Claims (5)

1. A popular song emotion synthesis and classification method, characterized in that it proceeds in two stages: first, refrain detection and occurrence-pattern discrimination are performed on a piece of music to determine its pop-music pattern; second, the song is divided into N segments using a flexible segmentation method, and valence and arousal are predicted for each segment; then, according to the pattern of the song and the valence and arousal results of the N music segments, a corresponding classifier is selected to synthesize the emotion of the whole song and obtain the emotion label of the whole piece of music.
2. The method according to claim 1, characterized in that the arousal and valence of the music segments of a song are predicted, forming the song's emotion evolution sequence.
Refrain identification is performed with a refrain recognition algorithm. After refrain identification, a song presents a pattern in which other sections and refrain sections alternate, where the other sections include the intro, verses, bridges, or combinations thereof. Popular songs are classified into k classes according to the repetition pattern of the refrain: no refrain, 2 refrain occurrences, 3 refrain occurrences, ..., k refrain occurrences, with k no greater than 5. If the song pattern discriminator finds more than 5 refrain occurrences, k is set to 5, the song is classified together with the k = 5 class, and the sixth refrain occurrence and the music content after it are omitted in subsequent processing. After pop-song pattern detection is completed, if a refrain is detected, the start and end times of each refrain section are obtained. The song is then segmented with a flexible segmentation strategy, dividing a complete song into N segments, the duration of each segment being no more than 10 s. The designed flexible segmentation scheme is as follows:
The first class has no repeated refrain. For songs without a repeated refrain, the song is divided into N = N1 = 40 segments. It is assumed that the length of a popular song generally does not exceed 400 s; if it does, discrete sampling is performed and N1 segments of 10 s each are taken at equal intervals. For a song of length L < 400 s, the segment length is Lc = L / N.
The second class is the twofold repetition structure. For a twofold repetition structure OCOC, where C denotes a refrain segment and O denotes a segment of any other category, the other sections and the refrain sections are divided into equal numbers of pieces: each other section O and each refrain section C is divided into M small segments, each no longer than 10 s; if a small segment would exceed 10 s, equidistant sampling of 10-s pieces is performed. The song is divided into N = N2 = 4M segments in total, where M is a positive integer; M = 10 is suggested.
The third class is the threefold repetition structure. For a threefold repetition structure OCOCOC, the other sections and the refrain sections are divided into equal numbers of pieces: each other section O and each refrain section C is divided into M small segments, each no longer than 10 s; if a small segment would exceed 10 s, equidistant sampling of 10-s pieces is performed. The song is divided into N = N3 = 6M segments in total, where M is a positive integer; M = 7 is suggested.
The fourth class is the fourfold repetition structure, and the fifth class covers five or more repetitions. For the fourfold repetition structure OCOCOCOC and for five or more repetitions, the segmentation method is analogous to the previous cases, yielding N = N4 = 8M and N = N5 = 10M segments respectively; M = 5 and M = 4 are suggested.
3. The method according to claim 1, characterized in that a music segment emotion predictor is trained on the emotion data of ordered music segments. Segment emotion prediction uses Thayer's Valence-Arousal (V-A) model, which represents emotion along two dimensions: valence (pleasantness) and arousal (intensity). Valence indicates the positive/negative attribute of the emotion; arousal indicates its intensity. The emotion of a music segment is expressed as the pair of arousal and valence indices <v, a>, where v and a are real numbers in the range [-1, +1]. The segment emotion prediction model, called the V/A predictor, is trained on music segments with stable emotional expression and is a mapping from the acoustic features of a music segment to V/A values, typically expressed as formulas (1) and (2); the specific form depends on the classifier chosen in the implementation.
V = f_V(x_1, x_2, …, x_i, …, x_n)  (1)
A = f_A(x_1, x_2, …, x_i, …, x_n)  (2)
where x_i (i = 1, …, n) is the i-th acoustic feature value of the music segment and n is the number of musical acoustic features selected for V/A value prediction.
For each song, segment emotion prediction yields N arousal indices and N valence indices; these two groups of indices are combined into a sequence E = <a_1, v_1, a_2, v_2, …, a_N, v_N>, which serves as the input feature for emotion synthesis. In the emotion synthesis stage, the emotion label of the song is predicted from this sequence.
A classifier for song emotion synthesis is trained whose input is the emotion evolution sequence E of songs of a given structural class and whose output is the song emotion label. Song emotion synthesis classifiers are trained separately for songs with different refrain occurrence patterns, yielding 5 classifiers corresponding to the song pattern classes described above. Obtaining the emotion evolution sequence E relies on the song pattern recognition, song segmentation and V/A predictors described above. The classifier f_j for song emotion synthesis has the form of formula (3):
L_j = f_j(a_1, v_1, a_2, v_2, …, a_{N_j}, v_{N_j})  (3)
where f_j is the emotion synthesis classification function of the j-th structural class, L_j is the classification label obtained with f_j, N_j is the number of segments produced for songs of the j-th of the five structures, and the input of f_j is the emotion evolution sequence of a song of the corresponding structure.
4. The method according to claim 1, characterized in that the V/A predictors use multiple linear regression. The input data are the acoustic features and annotated V/A values of pop music segments, and the output is the predictor parameters. The V/A predictors train separate regression predictors for valence V and arousal A. Taking the valence V regression predictor as an example, the prediction function is formula (4) and the loss function J is formula (5).
V = h_θ(x_0, x_1, …, x_n) = θ^T x = θ_0 x_0 + θ_1 x_1 + θ_2 x_2 + … + θ_n x_n  (4)
where h_θ is the valence regression prediction function, θ = (θ_0, …, θ_n) are the model parameters, x = (x_0, …, x_n) with x_0 = 1, and x_1, …, x_n are the extracted musical acoustic feature values.
J(θ) = (1/(2m)) Σ_{i=1..m} (h_θ(x^(i)) − v^(i))²  (5)
where m is the number of training examples, v^(i) is the annotated valence value of the i-th training example, and x^(i) is the acoustic feature vector of the i-th training example. The V predictor is trained by gradient descent.
5. The method according to claim 1, characterized in that an existing algorithm is first used to detect the time series of beat points in the music signal. After the beat time sequence of the music has been extracted, the signal is divided into frames and windowed according to the extracted beat times, and the chroma feature of each frame is extracted. After feature extraction, the distance between the chroma feature vector of each beat and the chroma feature vectors of the other beats is computed using the following formula:
S(i, j) = d(p_i, p_j), i, j = 1, …, m
where S is the self-similarity matrix, S(i, j) is an element of S, d is a distance function (the Euclidean distance is used), p_i and p_j are the chroma feature vectors of the i-th and j-th beats respectively, and m is the number of beats in the piece.
After the self-similarity matrix S has been computed, repeated sections of the song are detected by detecting diagonal stripes in S. The matrix is binarized by setting the points with the smallest 2% of distances to 1 and all other points to 0, and refrain detection is then performed on the binarized distance matrix. The binary matrix B is enhanced along the diagonal direction: along a diagonal, if two 1-valued points are no more than 1 second apart in time, the points between them are set to 1; stripes whose time span is no more than 2 seconds are directly set to 0.
After this processing, some stripes represent overlapping music segments; such stripes are merged. The merging criterion is that if the music segments represented by two stripes overlap by 80% or more, they are merged into a single new stripe, which further reduces the number of candidate stripes. The longest 30 stripes are then selected for subsequent processing.
The remaining line segments represent repeated song sections. If the detected segments show that segment A repeats with segment B and segment B repeats with segment C, segments A, B and C are said to repeat three times. The music segment with the largest repetition count and a length of more than 10 seconds is selected as the refrain. The song is thereby divided into a form in which other sections alternate with the refrain, and its pattern is then classified.
Using the music pattern discriminator and the V/A predictors described above, music pattern discrimination and extraction of the emotion evolution sequence E are performed on music annotated with emotion categories. After the emotion evolution sequences have been obtained, the emotion classifiers are trained.
A support vector machine (SVM) classifier is used; the training input for the emotion classifier of one song pattern is its emotion evolution sequences and emotion labels, and the output is the SVM model parameters.
The trained SVM classification models are used to classify the emotion of new songs.
CN201810305399.2A 2018-04-08 2018-04-08 Popular song emotion synthesis and classification method Expired - Fee Related CN108648767B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810305399.2A CN108648767B (en) 2018-04-08 2018-04-08 Popular song emotion synthesis and classification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810305399.2A CN108648767B (en) 2018-04-08 2018-04-08 Popular song emotion synthesis and classification method

Publications (2)

Publication Number Publication Date
CN108648767A true CN108648767A (en) 2018-10-12
CN108648767B CN108648767B (en) 2021-11-05

Family

ID=63745734

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810305399.2A Expired - Fee Related CN108648767B (en) 2018-04-08 2018-04-08 Popular song emotion synthesis and classification method

Country Status (1)

Country Link
CN (1) CN108648767B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20080019031A (en) * 2005-06-01 2008-02-29 Koninklijke Philips Electronics N.V. Method and electronic device for determining a characteristic of a content item
CN101937678A (en) * 2010-07-19 2011-01-05 东南大学 Judgment-deniable automatic speech emotion recognition method for fidget
KR20120021174A (en) * 2010-08-31 2012-03-08 한국전자통신연구원 Apparatus and method for music search using emotion model
CN102930865A (en) * 2012-09-21 2013-02-13 重庆大学 Coarse emotion soft cutting and classification method for waveform music
CN102930865B (en) * 2012-09-21 2014-04-09 重庆大学 Coarse emotion soft cutting and classification method for waveform music
CN105931625A (en) * 2016-04-22 2016-09-07 成都涂鸦科技有限公司 Rap music automatic generation method based on character input

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
孙向琨 (Sun Xiangkun): "Research on a song emotion classification method combining music content and lyrics", Master's thesis *

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109299312A (en) * 2018-10-18 2019-02-01 湖南城市学院 Music rhythm analysis method based on big data
CN109299312B (en) * 2018-10-18 2021-11-30 湖南城市学院 Music rhythm analysis method based on big data
CN111583890A (en) * 2019-02-15 2020-08-25 阿里巴巴集团控股有限公司 Audio classification method and device
CN109829067A (en) * 2019-03-05 2019-05-31 北京达佳互联信息技术有限公司 Audio data processing method, device, electronic equipment and storage medium
US11645532B2 (en) 2019-04-03 2023-05-09 Emotional Perception AI Limited Method of training a neural network to reflect emotional perception and related system and method for categorizing and finding associated content
GB2583455A (en) * 2019-04-03 2020-11-04 Mashtraxx Ltd Method of training a neural network to reflect emotional perception and related system and method for categorizing and finding associated content
GB2584598A (en) * 2019-04-03 2020-12-16 Mashtraxx Ltd Method of training a neural network to reflect emotional perception and related system and method for categorizing and finding associated content
GB2584598B (en) * 2019-04-03 2024-02-14 Emotional Perception Ai Ltd Method of training a neural network to reflect emotional perception and related system and method for categorizing and finding associated content
US11494652B2 (en) 2019-04-03 2022-11-08 Emotional Perception AI Limited Method of training a neural network to reflect emotional perception and related system and method for categorizing and finding associated content
US11068782B2 (en) 2019-04-03 2021-07-20 Mashtraxx Limited Method of training a neural network to reflect emotional perception and related system and method for categorizing and finding associated content
US11080601B2 (en) 2019-04-03 2021-08-03 Mashtraxx Limited Method of training a neural network to reflect emotional perception and related system and method for categorizing and finding associated content
CN110134823B (en) * 2019-04-08 2021-10-22 华南理工大学 MIDI music genre classification method based on normalized note display Markov model
CN110134823A (en) * 2019-04-08 2019-08-16 华南理工大学 The MIDI musical genre classification method of Markov model is shown based on normalization note
CN110377786A (en) * 2019-07-24 2019-10-25 中国传媒大学 Music emotion classification method
CN110808065A (en) * 2019-10-28 2020-02-18 北京达佳互联信息技术有限公司 Method and device for detecting refrain, electronic equipment and storage medium
CN112989105B (en) * 2019-12-16 2024-04-26 黑盒子科技(北京)有限公司 Music structure analysis method and system
CN112989105A (en) * 2019-12-16 2021-06-18 黑盒子科技(北京)有限公司 Music structure analysis method and system
CN111462774A (en) * 2020-03-19 2020-07-28 河海大学 Music emotion credible classification method based on deep learning
CN111601433A (en) * 2020-05-08 2020-08-28 中国传媒大学 Method and device for predicting stage lighting effect control strategy
US11544565B2 (en) 2020-10-02 2023-01-03 Emotional Perception AI Limited Processing system for generating a playlist from candidate files and method for generating a playlist
CN112614511A (en) * 2020-12-10 2021-04-06 央视国际网络无锡有限公司 Song emotion detection method
CN113129871A (en) * 2021-03-26 2021-07-16 广东工业大学 Music emotion recognition method and system based on audio signal and lyrics
CN114446323A (en) * 2022-01-25 2022-05-06 电子科技大学 Dynamic multi-dimensional music emotion analysis method and system

Also Published As

Publication number Publication date
CN108648767B (en) 2021-11-05

Similar Documents

Publication Publication Date Title
CN108648767A (en) A kind of popular song emotion is comprehensive and sorting technique
Cramer et al. Look, listen, and learn more: Design choices for deep audio embeddings
Kong et al. High-resolution piano transcription with pedals by regressing onset and offset times
Lee et al. Multi-level and multi-scale feature aggregation using pretrained convolutional neural networks for music auto-tagging
Vogl et al. Drum Transcription via Joint Beat and Drum Modeling Using Convolutional Recurrent Neural Networks.
Gururani et al. An attention mechanism for musical instrument recognition
Oikarinen et al. Deep convolutional network for animal sound classification and source attribution using dual audio recordings
Stowell Computational bioacoustic scene analysis
de Benito-Gorron et al. Exploring convolutional, recurrent, and hybrid deep neural networks for speech and music detection in a large audio dataset
Anglade et al. Improving music genre classification using automatically induced harmony rules
US20200075019A1 (en) System and method for neural network orchestration
Parekh et al. Weakly supervised representation learning for audio-visual scene analysis
CN106302987A (en) A kind of audio frequency recommends method and apparatus
US20200066278A1 (en) System and method for neural network orchestration
Hou et al. Transfer learning for improving singing-voice detection in polyphonic instrumental music
Morfi et al. Data-efficient weakly supervised learning for low-resource audio event detection using deep learning
Hernandez-Olivan et al. Music boundary detection using convolutional neural networks: A comparative analysis of combined input features
Jeong et al. Audio tagging system using densely connected convolutional networks.
Kalinli et al. Saliency-driven unstructured acoustic scene classification using latent perceptual indexing
US20200058307A1 (en) System and method for neural network orchestration
Cumming et al. Using corpus studies to find the origins of the madrigal
Pons Puig Deep neural networks for music and audio tagging
Zhong et al. Gender recognition of speech based on decision tree model
O’Brien Musical Structure Segmentation with Convolutional Neural Networks
Singh et al. Deep multi-view features from raw audio for acoustic scene classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20211105