CN108648767B - Popular song emotion synthesis and classification method - Google Patents

Popular song emotion synthesis and classification method

Info

Publication number
CN108648767B
Authority
CN
China
Prior art keywords
emotion
song
music
refrain
segments
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201810305399.2A
Other languages
Chinese (zh)
Other versions
CN108648767A (en)
Inventor
孙书韬
王永滨
曹轶臻
王琦
赵庄言
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Communication University of China
Original Assignee
Communication University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Communication University of China filed Critical Communication University of China
Priority to CN201810305399.2A priority Critical patent/CN108648767B/en
Publication of CN108648767A publication Critical patent/CN108648767A/en
Application granted granted Critical
Publication of CN108648767B publication Critical patent/CN108648767B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10HELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10HELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/056Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for extraction or identification of individual instrumental parts, e.g. melody, chords, bass; Identification or separation of instrumental parts by their characteristic voices or timbres
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10HELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/061Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for extraction of musical phrases, isolation of musically relevant segments, e.g. musical thumbnail generation, or for temporal structure analysis of a musical piece, e.g. determination of the movement sequence of a musical work

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Child & Adolescent Psychology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method for synthesizing and classifying the emotion of popular songs, relating to the field of audio information processing. First, the occurrence pattern of the refrain within a piece of music is detected to determine which of several popular-music structural patterns the song belongs to; second, a flexible segmentation method divides the song into N segments, and the valence (pleasure degree) and arousal (intensity) of each segment are predicted; finally, according to the structural pattern of the song and the valence and arousal results of the N music segments, a corresponding classifier is selected to synthesize the emotion of the whole piece and obtain its emotion label. The advantages of the method are that the flexible segmentation technique extracts V/A emotion evolution features and processes songs of different structures separately, so that the training of the emotion classifiers for popular songs with different structures is more targeted; and that classifying song emotion from the popular song structure and its emotion evolution features, compared with simple statistical features computed over the whole song, better reflects the process and characteristics of human emotional cognition of music.

Description

Popular song emotion synthesis and classification method
Technical Field
The invention relates to an automatic emotion classification method for whole pieces of popular music, in the field of audio information processing.
Background
Current methods for classifying song emotion mainly process a segment of a song: the segment is divided into fixed-length frames, each frame is classified by emotion directly, and the dominant emotion category among the frames is taken as the emotion label of the segment. Bag-of-frames methods [2] have also been used to model and classify the whole song, but such methods do not consider the intrinsic characteristics of human emotional response when listening to a song. In fact, the emotional perception of a whole song is influenced by where emotional expressions appear in the song and how they develop over time, factors that traditional bag-of-frames features ignore. A scheme has also been proposed that uses the refrain as a representative segment for song emotion classification [3], but it does not give a method for synthesizing emotion across different sections. Based on observation and analysis of the structural regularities of songs and of the listener's process of recognizing musical emotion, the invention designs a two-stage emotion synthesis and classification method to determine the emotion label of a whole song.
The song emotion synthesis method is mainly designed according to the following observations: first, the emotional expression of a song is stable within a certain time span; second, different sections of a song contribute differently to the emotional expression of the whole song, and the emotional evolution of the song influences the emotional cognition of the whole piece; third, the structure of most songs follows certain rules, i.e., the intro, outro, refrain, verse and so on appear at characteristic relative positions in the song, although these rules are not strict and exceptions exist.
Disclosure of Invention
The invention provides a technical solution for automatically synthesizing and classifying the emotion of popular songs. Song emotion synthesis and classification is divided into two stages: first, the refrain occurrence pattern of a piece of music is discriminated to determine its popular-music structural pattern; second, a flexible segmentation method divides the song into N segments (N is related to the number of refrain occurrences in the song), and the valence and arousal of each segment are predicted; finally, according to the structural pattern of the song and the valence and arousal results of the N music segments, a corresponding classifier is selected to synthesize the emotion of the whole piece and obtain its emotion label.
The invention performs song emotion synthesis in two stages. The first stage predicts the arousal and valence of each music segment of a song to form the song's emotion evolution sequence.
The emotion evolution sequence of a song is built on top of song segmentation. To segment a piece of music, the invention first performs structural analysis of the popular song and classifies it according to the occurrences of its refrain.
The typical structure of a popular song is intro, verse 1, refrain, verse 2, refrain, verse 3, refrain, outro. Not all popular songs strictly follow this format; some songs have variations, and there may be bridges between verse and refrain, and so on.
The invention uses a refrain identification algorithm to identify the refrain. After the refrain is identified, a song presents an alternating pattern of other segments and refrains, where the other segments comprise intro, verse, bridge, or combinations thereof. The invention divides popular songs into k classes according to the repetition pattern of the refrain: a no-repeat refrain structure, 2 refrain occurrences, 3 refrain occurrences, and so on up to k occurrences, with k not greater than 5. If the song pattern recognizer finds that the refrain occurs more than 5 times, k is set to 5, the song is classified into the k = 5 class, and the sixth refrain and the music content after it are skipped in subsequent processing. For ease of processing, the invention also omits the content of the song following the last occurrence of the refrain.
After popular-song pattern detection is completed, if refrains are detected, the start and end times of each refrain are obtained. The invention then segments the song with a flexible segmentation strategy, dividing a complete song into N segments. So that the emotional expression within a segment is essentially stable, the duration of each segment should not exceed 10 s. So that the position of a segment within the song is well discriminated, N is sufficiently large and is related to the refrain occurrence characteristics of the song.
The flexible segmentation scheme designed by the invention is as follows:
the first type is the no repeat refrain structure. For the no-repeat refrain structure, equally dividing the song into N-N140 fragments. The invention assumes that the length of popular songs is generally no greater than 400s, if greater than 400s, discrete sampling will be performed, with N taken at equal intervals110s fragment. For song length L<For a 400s song, the segment length Lc equals L/N.
The second type is the twice-repeated structure. For the twice-repeated structure OCOC (C denotes a refrain segment and O denotes an other-category segment), the invention divides the other segments and the refrain segments into equal numbers of pieces. Each other segment O and each refrain segment C is divided into M small segments, each of length not greater than 10 s; if a piece is longer than 10 s, equidistant sampling of a 10 s segment is performed. In total the song is divided into N = N2 = 4M segments, where M is a positive integer, with M = 10 suggested.
The third type is the three-times-repeated structure. For the three-times-repeated structure OCOCOC, the invention divides the other segments and the refrain segments in the same way. Each other segment O and each refrain segment C is divided into M small segments, each of length not greater than 10 s; if a piece is longer than 10 s, equidistant sampling of a 10 s segment is performed. In total the song is divided into N = N3 = 6M segments, where M is a positive integer, with M = 7 suggested.
The fourth type is the four-times-repeated structure, and the fifth type is a structure repeated 5 or more times. For the four-times-repeated structure OCOCOCOC and for structures repeated 5 or more times, the segmentation method is analogous to the preceding repeated structures, giving N = N4 = 8M and N = N5 = 10M segments respectively, with M = 5 and M = 4 suggested.
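For illustration only, the following Python sketch (not part of the original disclosure; the function name, inputs and the (start, end) refrain-interval format are assumptions) shows one way the flexible segmentation rules above could be realized, given the song length in seconds and the detected refrain intervals truncated after the fifth occurrence.

```python
import numpy as np

def flexible_segmentation(song_len, refrains):
    """song_len: song length in seconds; refrains: list of (start, end) refrain times."""
    M_BY_K = {2: 10, 3: 7, 4: 5, 5: 4}          # suggested M per refrain count k
    k = min(len(refrains), 5)

    if k <= 1:
        # No-repeat refrain structure: 40 equal segments, or 40 sampled 10 s windows if L > 400 s.
        n1 = 40
        if song_len > 400:
            starts = np.linspace(0, song_len - 10, n1)
            return [(s, s + 10.0) for s in starts]
        seg = song_len / n1
        return [(i * seg, (i + 1) * seg) for i in range(n1)]

    # Repeated-refrain structures: alternate other (O) and refrain (C) sections,
    # drop content after the last refrain, then split every section into M pieces.
    M = M_BY_K[k]
    sections, prev_end = [], 0.0
    for c_start, c_end in refrains[:k]:
        sections.append((prev_end, c_start))     # O section before this refrain
        sections.append((c_start, c_end))        # C section (the refrain itself)
        prev_end = c_end

    segments = []
    for s, e in sections:
        piece = (e - s) / M
        for i in range(M):
            a, b = s + i * piece, s + (i + 1) * piece
            if b - a > 10:                       # equidistant 10 s sampling of long pieces
                mid = (a + b) / 2
                a, b = mid - 5, mid + 5
            segments.append((a, b))
    return segments                              # N = 4M, 6M, 8M or 10M segments
```

With the suggested values of M, every structural class yields roughly 40 segments, matching the N1 = 40 of the no-repeat case.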
To identify the emotion of a music segment, a music segment emotion predictor is trained on an emotion data set of annotated music segments. The prediction of music segment emotion adopts Thayer's Valence-Arousal (V-A) model [1] to express emotion, with two dimensions: valence (pleasure degree) and arousal (excitement degree). Valence represents the positive or negative quality of the emotion, and arousal indicates its intensity. The emotion of a music segment is expressed as a valence-arousal pair <v, a>, where v and a are real numbers in [-1, +1]. The emotion prediction model for music segments is trained on segments with stable emotional expression; it is called the V/A predictor in this invention and is a mapping from the acoustic features of a music segment to V/A values. Its general form is given by equations (1) and (2); the specific form depends on the predictor chosen in the implementation.
V = fV(x1, x2, …, xi, …, xn)  (1)
A = fA(x1, x2, …, xi, …, xn)  (2)
where xi (i = 1, …, n) is the i-th acoustic feature value of the music segment and n is the number of music acoustic features used by the V/A value prediction formulas.
For a complete popular song, identifying the emotion class of the whole song requires a synthesis over its emotional expression throughout. To synthesize song emotion accurately, the different structural patterns of songs are first identified, and separate emotion classifiers are trained for songs of different structures to synthesize and classify their emotion. The invention holds that songs with similar structures show a certain similarity in the role that segments at the same relative positions play in the song's emotional expression. For each song, segment emotion prediction yields N arousal values and N valence values, and the two are combined into a sequence E = <a1, v1, a2, v2, …, aN, vN> used as the input feature for emotion synthesis. In the emotion synthesis stage this sequence is used to predict the emotion label of the song; such features reflect not only the emotion statistics of the whole song but also the temporal characteristics of its emotional expression and the emotional expression of its different music segments.
To complete the emotion synthesis of a whole song, the invention trains a classifier for song emotion synthesis. Its input is the emotion evolution sequence E of a song with a given structure, and its output is the song's emotion label. The song emotion synthesis classifiers are trained separately for songs with different refrain occurrence patterns, yielding 5 song emotion synthesis classifiers corresponding to the song pattern classes. Obtaining the song emotion evolution sequence E depends on song pattern recognition, song segmentation and the V/A predictor. The general form of the song emotion synthesis classifier fj is given by equation (3); the specific functional form depends on the classifier chosen in the implementation.
Lj = fj(Ej) = fj(a1, v1, a2, v2, …, aNj, vNj)  (3)
where fj is the emotion synthesis classification function corresponding to the j-th structure, Lj is the classification label obtained with fj, Nj is the number of segments into which songs of the j-th of the five structures are divided, and Ej is the emotion evolution sequence of a song with the corresponding structure.
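As an illustrative sketch only (the function names and the scikit-learn-style predict() interface are assumptions, not the patent's implementation), the synthesis step of equation (3) can be pictured as building E from the per-segment predictions and dispatching it to the classifier trained for the detected pattern j:

```python
def synthesize_emotion(segments_av, pattern_j, classifiers):
    """segments_av: list of (a_i, v_i) pairs from the V/A predictor, in song order.
    pattern_j: detected structural pattern, an integer in 1..5.
    classifiers: dict {j: trained emotion-synthesis classifier f_j with a predict() method}."""
    # E = <a1, v1, a2, v2, ..., aN, vN>, the emotion evolution sequence of the song
    E = [value for (a, v) in segments_av for value in (a, v)]
    f_j = classifiers[pattern_j]
    return f_j.predict([E])[0]      # whole-song emotion label L_j
```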
The system framework of the method provided by the invention is shown in Figure 1 and mainly comprises a V/A predictor training module, an emotion classifier training module and a song emotion synthesis and classification module. The song emotion synthesis and classification module operates in two stages: the first stage performs song pattern recognition, segmentation and emotion evolution sequence generation, and the second stage uses a classifier to synthesize and classify the emotion of the whole song.
The invention provides an emotion synthesis method that considers the influence of the emotional expression at different positions and in different sections of a piece of music on the emotion label of the whole song. Its advantages are that (1) popular songs are pre-classified by their refrain occurrence pattern, i.e. by structural characteristics, and the V/A emotion evolution features extracted by the flexible segmentation technique are processed separately for each class, so that the training of the emotion classifiers for popular songs with different structures is more targeted; and (2) classifying song emotion from the popular song structure and its emotion evolution features, compared with simple statistical features computed over the whole song, better reflects the process and characteristics of human emotional cognition of music.
Drawings
FIG. 1 is a system architecture diagram of the popular music emotion synthesis and classification method
FIG. 2 Steps of refrain detection
FIG. 3 Example of a pitch (Chroma) feature matrix (450 beats, 12 pitch classes)
FIG. 4 Example of a self-similarity matrix based on pitch features
FIG. 5 is a system architecture diagram of an embodiment of the popular music emotion synthesis and classification method
Detailed description of embodiments:
the V/A predictor training module completes the training of the popular song V/A predictor and mainly comprises two sub-modules of music segment feature extraction and training. The feature extraction submodule is responsible for extracting acoustic features such as tone, tone and beat of the fragments. And then the A/V label value and the corresponding A/V label value are input into an A/V predictor training module for training.
The emotion classifier training module comprises a feature extraction sub-module, a song pattern recognition sub-module, a song segmentation sub-module, a V/A predictor sub-module and an emotion classifier training sub-module. The feature extraction sub-module is responsible for extracting the song's acoustic features; the song pattern recognition sub-module identifies the pattern of a popular song and the boundaries of its sections; the song segmentation sub-module completes the flexible segmentation according to the song pattern, the section boundaries and the song length, producing song segments no longer than 10 s; the V/A predictor then generates the emotion evolution sequence, which is input together with the song's emotion label to the emotion classifier training sub-module to train the emotion classifier.
The song emotion synthesis and classification module mainly comprises sub-modules for feature extraction, song pattern recognition, song segmentation, V/A prediction and emotion classification. After the emotion evolution sequence generated by the V/A predictor enters the emotion classifier, the classifier selects the prediction model corresponding to the song pattern recognition result to synthesize and classify the song's emotion, and outputs the most likely emotion label or an emotion ranking result.
Implementing the invention requires a certain amount of labelled popular-music material, including V/A value annotations for popular-music segments and emotion label annotations for whole popular songs. The V/A values are annotated as numbers in an interval: for example, the valence V is a real number in [-1, +1], where -1 represents extremely negative emotion and +1 extremely positive emotion; the arousal likewise takes a value in [-1, +1], with -1 for very mild and +1 for very intense emotion. Emotion labels generally include anger, happiness, pleasure, relaxation, calm, sadness, annoyance, tension, boredom and so on; the emotion labels are not limited to the above but depend on the application.
The music acoustic features shown in Table 1 can be extracted for training the V/A predictor, although the features are not limited to those used in this embodiment. In this embodiment the V/A predictor uses multiple linear regression. The input data are the acoustic features and annotated V/A values of popular-music segments, and the output is the predictor parameters. The V/A predictor of this embodiment trains separate regression predictors for the valence V and the arousal A. Taking the valence V regression predictor as an example, the prediction function is given by equation (4) and the loss function J by equation (5).
V = hθ(x0, x1, …, xn) = θT x = θ0 x0 + θ1 x1 + θ2 x2 + … + θn xn  (4)
where hθ is the valence regression prediction function, θ = (θ0, …, θn) are the model parameters, x = (x0, …, xn) with x0 = 1, and x1, …, xn are the extracted acoustic feature values of the music segment.
J(θ) = (1/(2m)) · Σi=1..m [hθ(x(i)) − v(i)]²  (5)
where m is the number of training cases, v(i) is the annotated valence value V of the i-th training case, and x(i) is the acoustic feature vector of the i-th training case. The V predictor is trained by gradient descent.
The model and training scheme of the A-value predictor are similar to those of the V-value predictor.
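A minimal NumPy sketch of this training scheme, assuming a feature matrix X of m labelled segments by n acoustic features and a vector v of annotated valence values (the learning rate and iteration count are illustrative choices, not values from the patent):

```python
import numpy as np

def train_v_predictor(X, v, lr=0.01, iters=5000):
    """Batch gradient descent on the loss of equation (5) for the linear model of equation (4)."""
    m, n = X.shape
    Xb = np.hstack([np.ones((m, 1)), X])    # prepend x0 = 1
    theta = np.zeros(n + 1)
    for _ in range(iters):
        pred = Xb @ theta                   # h_theta(x) = theta^T x
        grad = (Xb.T @ (pred - v)) / m      # gradient of J(theta)
        theta -= lr * grad
    return theta

def predict_v(theta, x):
    """Valence prediction of equation (4) for one feature vector x of length n."""
    return float(theta @ np.hstack([1.0, x]))
```

The arousal predictor would be trained in the same way, with the arousal labels in place of v.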
Another step of the invention is popular-song pattern detection. This embodiment identifies the popular song pattern with a refrain detection method based on a self-similarity matrix. The specific steps are shown in Figure 2.
The implementation first uses an existing algorithm to detect the time series of beat points in the music signal. After the beat time series of the music is extracted, framing and windowing are performed according to the extracted beat time points, and a pitch (Chroma) feature is extracted for each frame of the song. The Chroma feature is a 12-dimensional vector p = (p1, …, p12) over the 12 pitch classes C, C#, D, D#, E, F, F#, G, G#, A, A#, B. The Chroma feature values of all frames within a beat are averaged to obtain the Chroma feature of that beat. An example of the Chroma feature matrix of a song is shown in Figure 3.
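For illustration, a sketch of the beat-synchronous Chroma extraction using librosa; the patent does not name a toolkit, and the file name and sampling rate below are assumptions:

```python
import numpy as np
import librosa

# Load the audio and detect the beat time series.
y, sr = librosa.load("song.mp3", sr=22050)
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)

# Frame-level 12-dimensional Chroma features (pitch classes C, C#, ..., B).
chroma = librosa.feature.chroma_stft(y=y, sr=sr)

# Average the Chroma vectors of all frames within each beat -> one 12-D vector per beat.
beat_chroma = librosa.util.sync(chroma, beat_frames, aggregate=np.mean)
print(beat_chroma.shape)   # (12, number of beat intervals)
```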
After feature extraction, the distance between the pitch feature vector of each beat and the pitch feature vectors of the other beats is computed with the following formula:
S(i, j) = d(pi, pj),  i, j = 1, …, m
where S is the self-similarity matrix, S(i, j) is an element of S, d is a distance function (the Euclidean distance is used in this embodiment), pi and pj are the pitch feature vectors of the i-th and j-th beats respectively, and m is the number of musical beats. Figure 4 shows an example of a self-similarity matrix. It can be seen that the matrix contains line segments parallel to the main diagonal; these represent repeated segments of the song.
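A minimal NumPy sketch of the self-similarity computation with the Euclidean distance, taking the beat-synchronous Chroma matrix (12 × m) from the previous step as input (the function name is an assumption):

```python
import numpy as np

def self_similarity(beat_chroma):
    """beat_chroma: 12 x m matrix of beat-level Chroma vectors; returns the m x m matrix S."""
    P = beat_chroma.T                          # m beats x 12 pitch classes
    diff = P[:, None, :] - P[None, :, :]       # pairwise differences between beat vectors
    return np.sqrt((diff ** 2).sum(axis=-1))   # S(i, j) = d(p_i, p_j), Euclidean distance
```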
After the self-similarity matrix S is computed, the embodiment detects repeated segments in the song by detecting diagonal stripes in S. In the concrete implementation, following existing research results, the 2% of points with the smallest distances are set to 1 and the remaining points to 0; the resulting binarized similarity matrix essentially retains the segment-similarity information of the original matrix. Refrain detection is then performed on the binarized distance matrix. Because of noise, the points with value 1 in the binary matrix are relatively scattered, so the binary matrix B is enhanced along the diagonal direction: along a diagonal, if two points with value 1 lie within 1 second of each other, the points between them are also set to 1. A further step sets stripes shorter than 2 seconds directly to 0, since repetitions that are too short are unlikely to be refrains.
After this processing, some stripes represent overlapping music segments, and such stripes are merged. The merging criterion is: if the music segments represented by two stripes overlap by more than 80%, the two stripes are merged and represented by a single new stripe; this further reduces the number of candidate stripes. The longest 30 stripes are then selected for subsequent processing.
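For illustration, a rough NumPy sketch of the binarization and diagonal enhancement described above; the 2% threshold, the roughly 1 s gap bridging and the 2 s minimum stripe length follow the text, while the function name and the beat-duration input are assumptions:

```python
import numpy as np

def binarize_and_enhance(S, beat_dur, gap_sec=1.0, min_len_sec=2.0):
    """S: beat-level self-similarity matrix; beat_dur: average beat duration in seconds."""
    thresh = np.percentile(S, 2)                   # 2% closest pairs -> 1, the rest -> 0
    B = (S <= thresh).astype(np.uint8)

    gap = max(1, int(round(gap_sec / beat_dur)))   # allowed gap (~1 s) along a diagonal
    min_len = max(1, int(round(min_len_sec / beat_dur)))
    n = B.shape[0]
    for d in range(1, n):                          # each diagonal above the main one
        diag = np.diagonal(B, offset=d).copy()
        ones = np.flatnonzero(diag)
        for a, b in zip(ones[:-1], ones[1:]):      # bridge short gaps between 1-points
            if b - a <= gap:
                diag[a:b + 1] = 1
        run_start = None                           # remove stripes shorter than ~2 s
        for i in range(len(diag) + 1):
            if i < len(diag) and diag[i]:
                if run_start is None:
                    run_start = i
            elif run_start is not None:
                if i - run_start < min_len:
                    diag[run_start:i] = 0
                run_start = None
        rows = np.arange(len(diag))
        B[rows, rows + d] = diag                   # write the enhanced diagonal back
    return B
```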
The remaining stripes represent repeated song segments. Based on the detected segments, if segment A repeats segment B and segment B repeats segment C, then segments A, B and C can be said to repeat three times. The invention selects the music segment with the largest number of repetitions and a length greater than 10 seconds as the refrain. The song is thereby divided into other segments alternating with refrains, and its pattern can be classified.
Using the music pattern discriminator and the V/A predictor, pattern discrimination and extraction of the emotion evolution sequence E can be performed on music annotated with emotion classes. Once the emotion evolution sequences are obtained, the emotion classifier can be trained.
This embodiment selects a Support Vector Machine (SVM) classifier. The training inputs of the emotion classifier for each song pattern are the emotion evolution sequences and the emotion labels, and the training output is the SVM model parameters.
The SVM classification model obtained by training can be used for emotion classification of new songs.
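For illustration, a minimal sketch of training one per-pattern emotion synthesis classifier with scikit-learn's SVC; the library and kernel choice are assumptions, as the patent only specifies an SVM classifier:

```python
from sklearn.svm import SVC

def train_pattern_classifier(E_sequences, labels):
    """E_sequences: emotion evolution sequences of songs sharing one structural pattern j
    (each of length 2*Nj); labels: the corresponding whole-song emotion labels."""
    clf = SVC(kernel="rbf", C=1.0)
    clf.fit(E_sequences, labels)      # learns the synthesis function f_j of equation (3)
    return clf

# Classifying a new song of the same pattern:
# label = train_pattern_classifier(E_train, y_train).predict([E_new])[0]
```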
Supplementary Table 1. Optional music acoustic features (table supplied as an image in the original publication).
[1] R. E. Thayer, The Biopsychology of Mood and Arousal. Oxford, U.K.: Oxford Univ. Press, 1989.
[2] J.-C. Wang, H.-S. Lee, H.-M. Wang, and S.-K. Jeng, "Learning the similarity of audio music in bag-of-frames representation from tagged music data," in Proc. Int. Society for Music Information Retrieval Conference, 2011, pp. 85–90.
[3] Chia-Hung Yeh, Yu-Dun Lin, Ming-Sui Lee and Wen-Yu Tseng, "Popular Music Analysis: Chorus and Emotion Detection," in Proceedings of the Second APSIPA Annual Summit and Conference, pp. 907–910, Biopolis, Singapore, 14–17 December 2010.

Claims (2)

1. A popular song emotion synthesis and classification method, characterized in that the method comprises two stages: first, discriminating the refrain occurrence pattern of a piece of music to determine its popular-music structural pattern; second, dividing the song into N segments by a flexible segmentation method and predicting the valence and arousal of each segment; then, according to the structural pattern of the popular song and the valence and arousal results of the N music segments, selecting a corresponding classifier to synthesize the emotion of the whole piece of music and obtain the emotion label of the whole piece;
predicting the arousal and valence of the music segments of a song to form the song's emotion evolution sequence;
identifying the refrain with a refrain identification algorithm, wherein after the refrain is identified the song presents an alternating pattern of other segments and refrains, the other segments comprising intro, verse or bridge; dividing popular songs into k classes according to the repetition pattern of the refrain, namely a no-repeat refrain structure, 2 refrain occurrences, 3 refrain occurrences and so on, with k not greater than 5; if the refrain occurs more than 5 times, setting k to 5, classifying the song into the k = 5 class, and omitting the sixth refrain and the subsequent music content in later processing; after the pattern detection of the popular song is completed, if refrains are detected, obtaining the start and end times of each refrain; then segmenting the song with a flexible segmentation strategy, dividing a complete song into N segments, the duration of each segment being no greater than 10 s; the flexible segmentation scheme being as follows:
the first type being the no-repeat refrain structure, for which the song is divided equally into N = N1 = 40 segments; assuming the length of a popular song is not greater than 400 s, if it is greater than 400 s discrete sampling is performed and N1 segments of 10 s are taken at equal intervals; for a song of length L < 400 s, the segment length Lc = L/N;
the second type being the twice-repeated structure; for the twice-repeated structure OCOC, where C denotes a refrain segment and O denotes an other-category segment, the other segments and the refrain segments are divided into equal numbers of pieces; each other segment O and each refrain segment C is divided into M small segments, each of length not greater than 10 s, and if a piece is longer than 10 s, equidistant sampling of a 10 s segment is performed; in total the song is divided into N = N2 = 4M segments, where M = 10;
the third type being the three-times-repeated structure; for the three-times-repeated structure OCOCOC, the other segments and the refrain segments are divided into equal numbers of pieces; each other segment O and each refrain segment C is divided into M small segments, each of length not greater than 10 s, and if a piece is longer than 10 s, equidistant sampling of a 10 s segment is performed; in total the song is divided into N = N3 = 6M segments, where M = 7;
the fourth type being the four-times-repeated structure, and the fifth type a structure repeated 5 or more times; for the four-times-repeated structure OCOCOCOC and for structures repeated 5 or more times, the segmentation method is the same as for the preceding repeated structures, giving N = N4 = 8M and N = N5 = 10M segments respectively, with M = 5 and M = 4.
2. The method of claim 1, characterized in that a music segment emotion predictor is trained on an emotion data set of annotated music segments; the prediction of music segment emotion adopts Thayer's Valence-Arousal (V-A) model to express emotion, with the two dimensions valence (pleasure degree) and arousal (excitement degree); valence represents the positive or negative quality of the emotion, and arousal indicates its intensity; the emotion of a music segment is expressed as a valence-arousal pair <v, a>, where v and a are real numbers in [-1, +1]; the emotion prediction model for music segments is trained on music segments with stable emotional expression, is called the V/A predictor, and is a mapping from the acoustic features of a music segment to V/A values, represented by formulas (1) and (2), the specific form depending on the predictor chosen in the implementation;
V = fV(x1, x2, …, xi, …, xn)  (1)
A = fA(x1, x2, …, xi, …, xn)  (2)
where xi (i = 1, …, n) is the i-th acoustic feature value of the music segment and n is the number of music acoustic features used by the V/A value prediction formulas;
for each song, segment emotion prediction yields N arousal values and N valence values, and the two groups of values are combined into a sequence E = <a1, v1, a2, v2, …, aN, vN> used as the input feature for emotion synthesis; in the emotion synthesis stage this sequence is used to predict the emotion label of the song;
training a classifier for song emotion synthesis, whose input is the emotion evolution sequence E of a song with a given structure and whose output is the song emotion label; the song emotion synthesis classifiers are trained separately for songs with different refrain occurrence patterns, yielding 5 song emotion synthesis classifiers corresponding to the song pattern classes; obtaining the song emotion evolution sequence E depends on song pattern recognition, song segmentation and the V/A predictor; fj, the emotion synthesis classification function corresponding to the j-th structure, has the form of formula (3):
Lj = fj(Ej) = fj(a1, v1, a2, v2, …, aNj, vNj)  (3)
where Lj is the classification label obtained with fj, Nj is the number of segments into which songs of the j-th of the five structures are divided, and Ej is the emotion evolution sequence of a song with the corresponding structure.
CN201810305399.2A 2018-04-08 2018-04-08 Popular song emotion synthesis and classification method Expired - Fee Related CN108648767B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810305399.2A CN108648767B (en) 2018-04-08 2018-04-08 Popular song emotion synthesis and classification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810305399.2A CN108648767B (en) 2018-04-08 2018-04-08 Popular song emotion synthesis and classification method

Publications (2)

Publication Number Publication Date
CN108648767A CN108648767A (en) 2018-10-12
CN108648767B true CN108648767B (en) 2021-11-05

Family

ID=63745734

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810305399.2A Expired - Fee Related CN108648767B (en) 2018-04-08 2018-04-08 Popular song emotion synthesis and classification method

Country Status (1)

Country Link
CN (1) CN108648767B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109299312B (en) * 2018-10-18 2021-11-30 湖南城市学院 Music rhythm analysis method based on big data
CN111583890A (en) * 2019-02-15 2020-08-25 阿里巴巴集团控股有限公司 Audio classification method and device
CN109829067B (en) * 2019-03-05 2020-12-29 北京达佳互联信息技术有限公司 Audio data processing method and device, electronic equipment and storage medium
GB2584598B (en) * 2019-04-03 2024-02-14 Emotional Perception Ai Ltd Method of training a neural network to reflect emotional perception and related system and method for categorizing and finding associated content
US11080601B2 (en) 2019-04-03 2021-08-03 Mashtraxx Limited Method of training a neural network to reflect emotional perception and related system and method for categorizing and finding associated content
GB2583455A (en) * 2019-04-03 2020-11-04 Mashtraxx Ltd Method of training a neural network to reflect emotional perception and related system and method for categorizing and finding associated content
CN110134823B (en) * 2019-04-08 2021-10-22 华南理工大学 MIDI music genre classification method based on normalized note display Markov model
CN110377786A (en) * 2019-07-24 2019-10-25 中国传媒大学 Music emotion classification method
CN110808065A (en) * 2019-10-28 2020-02-18 北京达佳互联信息技术有限公司 Method and device for detecting refrain, electronic equipment and storage medium
CN112989105B (en) * 2019-12-16 2024-04-26 黑盒子科技(北京)有限公司 Music structure analysis method and system
CN111462774B (en) * 2020-03-19 2023-02-24 河海大学 Music emotion credible classification method based on deep learning
CN111601433B (en) * 2020-05-08 2022-10-18 中国传媒大学 Method and device for predicting stage lighting effect control strategy
GB2599441B (en) 2020-10-02 2024-02-28 Emotional Perception Ai Ltd System and method for recommending semantically relevant content
CN112614511A (en) * 2020-12-10 2021-04-06 央视国际网络无锡有限公司 Song emotion detection method
CN113129871A (en) * 2021-03-26 2021-07-16 广东工业大学 Music emotion recognition method and system based on audio signal and lyrics
CN114446323B (en) * 2022-01-25 2023-03-10 电子科技大学 Dynamic multi-dimensional music emotion analysis method and system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20080019031A (en) * 2005-06-01 2008-02-29 코닌클리케 필립스 일렉트로닉스 엔.브이. Method and electronic device for determining a characteristic of a content item
CN101937678A (en) * 2010-07-19 2011-01-05 东南大学 Judgment-deniable automatic speech emotion recognition method for fidget
KR20120021174A (en) * 2010-08-31 2012-03-08 한국전자통신연구원 Apparatus and method for music search using emotion model
CN102930865A (en) * 2012-09-21 2013-02-13 重庆大学 Coarse emotion soft cutting and classification method for waveform music
CN105931625A (en) * 2016-04-22 2016-09-07 成都涂鸦科技有限公司 Rap music automatic generation method based on character input

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20080019031A (en) * 2005-06-01 2008-02-29 코닌클리케 필립스 일렉트로닉스 엔.브이. Method and electronic device for determining a characteristic of a content item
CN101937678A (en) * 2010-07-19 2011-01-05 东南大学 Judgment-deniable automatic speech emotion recognition method for fidget
KR20120021174A (en) * 2010-08-31 2012-03-08 한국전자통신연구원 Apparatus and method for music search using emotion model
CN102930865A (en) * 2012-09-21 2013-02-13 重庆大学 Coarse emotion soft cutting and classification method for waveform music
CN102930865B (en) * 2012-09-21 2014-04-09 重庆大学 Coarse emotion soft cutting and classification method for waveform music
CN105931625A (en) * 2016-04-22 2016-09-07 成都涂鸦科技有限公司 Rap music automatic generation method based on character input

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
音乐内容和歌词相结合的歌曲情感分类方法研究 (Research on a song emotion classification method combining music content and lyrics); 孙向琨; Master's thesis; 2013-12-31; full text *

Also Published As

Publication number Publication date
CN108648767A (en) 2018-10-12

Similar Documents

Publication Publication Date Title
CN108648767B (en) Popular song emotion synthesis and classification method
Lehner et al. A low-latency, real-time-capable singing voice detection method with LSTM recurrent neural networks
Stowell Computational bioacoustic scene analysis
Cakir et al. Multi-label vs. combined single-label sound event detection with deep neural networks
CN111161715B (en) Specific sound event retrieval and positioning method based on sequence classification
US20200075019A1 (en) System and method for neural network orchestration
CN101398825B (en) Rapid music assorting and searching method and device
CN110992988B (en) Speech emotion recognition method and device based on domain confrontation
US20200066278A1 (en) System and method for neural network orchestration
CN111400540A (en) Singing voice detection method based on extrusion and excitation residual error network
Shah et al. Raga recognition in indian classical music using deep learning
Xia et al. Confidence based acoustic event detection
Hou et al. Transfer learning for improving singing-voice detection in polyphonic instrumental music
Mounika et al. Music genre classification using deep learning
Yasmin et al. A rough set theory and deep learning-based predictive system for gender recognition using audio speech
Kalinli et al. Saliency-driven unstructured acoustic scene classification using latent perceptual indexing
Foucard et al. Multi-scale temporal fusion by boosting for music classification.
Theodorou et al. Automatic sound recognition of urban environment events
CN105006231A (en) Distributed large population speaker recognition method based on fuzzy clustering decision tree
Wadhwa et al. Music genre classification using multi-modal deep learning based fusion
Thiruvengatanadhan Music genre classification using gmm
KR100869643B1 (en) Mp3-based popular song summarization installation and method using music structures, storage medium storing program for realizing the method
Viloria et al. Segmentation process and spectral characteristics in the determination of musical genres
Lee et al. Automatic melody extraction algorithm using a convolutional neural network
CN113673561B (en) Multi-mode-based automatic music tag classification method, device and medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20211105

CF01 Termination of patent right due to non-payment of annual fee