CN106128479A - Singing emotion recognition method and device - Google Patents

Singing emotion recognition method and device

Info

Publication number
CN106128479A
CN106128479A (application CN201610517375.4A)
Authority
CN
China
Prior art keywords
training
described
coordinate
matrix
audio frequency
Prior art date
Application number
CN201610517375.4A
Other languages
Chinese (zh)
Other versions
CN106128479B (en)
Inventor
蔡智力
Original Assignee
福建星网视易信息系统有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to CN201610506767 priority Critical
Priority to CN2016105067670 priority
Application filed by 福建星网视易信息系统有限公司
Publication of CN106128479A
Application granted
Publication of CN106128479B

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L25/48: Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques specially adapted for comparison or discrimination
    • G10L25/63: Speech or voice analysis techniques specially adapted for estimating an emotional state

Abstract

This application discloses a singing emotion recognition method and device. The method extracts emotional features from training singing audio and trains an emotion recognition model on them; the emotional features include acoustic signal features and musical score features. The method then extracts the emotional features of the singing audio to be recognized and inputs them into the emotion recognition model to identify the emotion of that audio. Compared with existing speech emotion recognition and music emotion recognition, the emotion recognition model obtained from emotional features that combine acoustic signal features and musical score features can identify the singing emotion of the individual singer from the musical score features and acoustic signal features: for the same song, it can recognize a different emotion for each singer's performance, identifying the singer's emotion more precisely.

Description

Singing emotion recognition method and device

Technical field

The present application belongs to the field of emotion recognition and, more specifically, relates to a singing emotion recognition method and device.

Background art

Current audio emotion recognition falls mainly into two areas, speech emotion recognition and music emotion recognition; recognizing emotion from singing, however, has not been attempted and remains a difficult problem in audio emotion recognition. Singing differs from both. First, speech emotion recognition can judge emotion from pitch and speaking rate, but in singing both pitch and tempo follow what the song prescribes, so a method that identifies the emotion of a performance from pitch and speaking rate is infeasible. The patent publication "Speech recognition analysis system and service method" (application No. 200510046169.1, filed 2005-03-31) extracts the frequency of the human voice in person-to-person communication and, taking vocal emotion degree and vocal affinity degree as its technical basis, performs speech recognition and analysis grounded in the sensory sciences. Vocal emotion degree infers a speaker's character and current mental state from the pitch and intonation of the voice; vocal affinity degree analyzes the low-frequency sound driven directly by the lungs to reveal the speaker's true emotion. In a singing scenario, however, pitch and tempo are dictated by the song, so recognizing the singer's emotion from pitch and intonation as in that publication is infeasible. Second, music emotion recognition judges emotion mainly from audio features and musical score features, so the judged emotion is fixed for a given song; yet every singer interprets a song in his or her own way, and for the same song different singers convey different emotions, so music emotion recognition cannot accurately recognize the emotion of a particular performance from how the singer actually sings.

In summary, singing emotion recognition is a new field quite distinct from speech emotion recognition and music emotion recognition, and the prior art offers no solution for identifying a singer's emotion from the singing itself.

Summary of the invention

In view of this, the technical problem this application solves is to provide a singing emotion recognition method and device that can identify a singer's emotion from the singing itself.

To solve the above technical problem, this application discloses a singing emotion recognition method, comprising:

extracting the emotional features of the training singing audio, and training an emotion recognition model on them, the emotional features including acoustic signal features and musical score features;

extracting the emotional features of the singing audio to be recognized;

inputting the emotional features of the singing audio to be recognized into the emotion recognition model, and identifying the emotion of the singing audio to be recognized.

To solve the above technical problem, this application also discloses a singing emotion recognition device, comprising:

a training module for extracting the emotional features of the training singing audio and training an emotion recognition model, the emotional features including acoustic signal features and musical score features;

an extraction module for extracting the emotional features of the singing audio to be recognized;

a recognition module for inputting the emotional features of the singing audio to be recognized into the emotion recognition model and identifying the emotion of the singing audio to be recognized.

To solve the above technical problem, this application also discloses a singing emotion recognition method, comprising:

obtaining the user's singing audio;

when the emotion type recognized from the user's singing audio matches the preset emotion of the music, outputting the corresponding performance-result control instruction.

To solve the above technical problem, this application also discloses a singing emotion recognition device, comprising:

an acquisition module for obtaining the user's singing audio;

a recognition module for outputting the corresponding performance-result control instruction when the emotion type recognized from the user's singing audio matches the preset emotion of the music.
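The acquisition/recognition flow above reduces to a simple comparison. A minimal sketch follows; the function name and the control-instruction string are illustrative assumptions, not terms from the patent:

```python
def singing_feedback(recognized_emotion, preset_emotion, instruction="show_applause"):
    """Emit the preset control instruction only when the emotion recognized
    from the user's singing audio matches the emotion preset for the song.
    Returns None when they do not match."""
    if recognized_emotion == preset_emotion:
        return instruction
    return None

cmd = singing_feedback("sad", "sad")   # emotions match, instruction is emitted
```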

Compared with the prior art, the present application can achieve the following technical effects:

The emotional features extracted in the embodiments of the present application differ from those of speech emotion recognition and music emotion recognition: speech emotion recognition extracts only audio features such as pitch and speaking rate and does not involve musical score features; music emotion recognition extracts both audio features and musical score features, but does not extract spectrogram features (included here among the acoustic signal features) and the like. Therefore, compared with existing speech emotion recognition and music emotion recognition, an emotion recognition model trained on emotional features that combine acoustic signal features and musical score features can identify the singer's emotion more precisely from the musical score features and acoustic signal features. Specifically, the present embodiment extracts the emotional features of the training singing audio and trains an emotion recognition model, the emotional features including acoustic signal features and musical score features; extracts the emotional features of the singing audio to be recognized; and inputs those features into the emotion recognition model to identify the emotion of the singing audio to be recognized. The embodiments of the present application can thus recognize the singer's emotion type from the singer's own performance audio, identifying the singer's emotion from the singing itself.

Of course, a product implementing the present application need not achieve all of the above technical effects at once.

Brief description of the drawings

The drawings described here provide a further understanding of the present application and constitute a part of it; the schematic embodiments of the application and their descriptions serve to explain the application and do not unduly limit it. In the drawings:

Figure 1A is a flow chart of a singing emotion recognition method provided by some embodiments of the present application;

Figure 1B is a flow chart of an emotion recognition model building method provided by some embodiments of the present application;

Fig. 2A is a flow chart of another singing emotion recognition method provided by some embodiments of the present application;

Fig. 2B is a flow chart of a singing emotion recognition method, based on Fig. 2A, provided by some embodiments of the present application;

Fig. 3 is a flow chart of yet another singing emotion recognition method provided by some embodiments of the present application;

Fig. 4 is a flow chart of another emotion recognition model building method provided by some embodiments of the present application;

Fig. 5A is the plane rectangular coordinate system formed by the pressure factor and the capacity factor, provided by some embodiments of the present application;

Fig. 5B is a partial flow chart of an emotion recognition model building method provided by some embodiments of the present application;

Fig. 6A is a flow chart of a singing emotion recognition method provided by some embodiments of the present application;

Fig. 6B is a partial flow chart of a singing emotion recognition method provided by some embodiments of the present application;

Fig. 6C is another partial flow chart of a singing emotion recognition method provided by some embodiments of the present application;

Fig. 7 is a flow chart of a singing recognition method provided by some embodiments of the present application;

Fig. 8 is a structural diagram of a singing emotion recognition device provided by some embodiments of the present application;

Fig. 9 is a structural diagram of a singing recognition device provided by some embodiments of the present application;

Figure 10 is a structural diagram of an electronic terminal provided by some embodiments of the present application.

Detailed description of the invention

The embodiments of the present application are described in detail below with reference to the drawings and examples, so that how the application applies technical means to solve its technical problem and achieve its technical effect can be fully understood and implemented.

Embodiment one

Referring to Figure 1A, which shows a flow chart of a singing emotion recognition method provided by an embodiment of the present application. The application may run on a terminal device, or on an emotion-recognition-model building device implemented in software, hardware, or a combination of both and typically integrated in a terminal device. The following description takes a terminal device as the executing entity; the method shown in Figure 1A may be implemented as follows.

Step 100: extract the emotional features of the training singing audio, and train an emotion recognition model; the emotional features include acoustic signal features and musical score features. Optionally, and unlike the features extracted in speech emotion recognition and music emotion recognition, the acoustic signal features extracted in this embodiment include at least one of: average energy, energy standard deviation, average fundamental frequency, fundamental-frequency standard deviation, the number of frequency values exceeding the average fundamental frequency, average spectral centroid, spectral-centroid standard deviation, MFCC features, and spectrogram features. The musical score features include at least one of: beats per minute, key type, mode, average pitch, pitch standard deviation, and the average duration of each note.
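Several of the listed acoustic statistics are straightforward to compute once a waveform and a fundamental-frequency track are available. The NumPy-only sketch below is illustrative: the frame sizes and the synthetic F0 track are assumptions, not values from the patent, and MFCC or spectrogram features would require a dedicated audio library.

```python
import numpy as np

def frame_energies(y, frame_len=1024, hop=512):
    """Short-time energy per frame of a mono waveform y."""
    n = 1 + max(0, (len(y) - frame_len) // hop)
    return np.array([np.sum(y[i * hop:i * hop + frame_len] ** 2) for i in range(n)])

def acoustic_stats(y, f0):
    """A few of the listed statistics: mean/std of frame energy, mean/std of F0,
    and the number of F0 samples strictly above the mean F0."""
    e = frame_energies(y)
    f0 = f0[f0 > 0]                       # keep voiced frames only
    return {
        "mean_energy": float(np.mean(e)),
        "std_energy": float(np.std(e)),
        "mean_f0": float(np.mean(f0)),
        "std_f0": float(np.std(f0)),
        "n_above_mean_f0": int(np.sum(f0 > np.mean(f0))),
    }

# Toy example: one second of a 440 Hz tone with a constant synthetic F0 track.
sr = 16000
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440 * t)
f0 = np.full(100, 440.0)
stats = acoustic_stats(y, f0)
```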

Optionally, as shown in Figure 1B, the emotion recognition model of this embodiment is trained as follows.

Step 1011: determine the training coordinate values of the emotional features of the training singing audio on a first coordinate axis and on a second coordinate axis, obtaining a first training coordinate value and a second training coordinate value. The first coordinate axis and the second coordinate axis form a plane rectangular coordinate system, and the quadrants of this coordinate system correspond one-to-one with singing emotion types.

Step 1012: build a first training matrix from the first training coordinate value and the emotional features of the training singing audio, and build a second training matrix from the second training coordinate value and the emotional features of the training singing audio.

Step 1013: normalize the first training matrix into a first training normalization matrix, and normalize the second training matrix into a second training normalization matrix.

Step 1014: substitute the first training normalization matrix and the second training normalization matrix into the SVM algorithm separately, obtaining a first training hyperplane and a second training hyperplane respectively.

Step 1015: substitute the first training hyperplane and the first training normalization matrix into the SVM algorithm to obtain a first emotion recognition model based on the first coordinate axis; substitute the second training hyperplane and the second training normalization matrix into the SVM algorithm to obtain a second emotion recognition model based on the second coordinate axis. The first emotion recognition model is used to determine the first coordinate value, along the first coordinate axis, of the emotional features of the singing audio to be recognized; the second emotion recognition model is used to determine their second coordinate value along the second coordinate axis.
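Steps 1011 to 1015 amount to normalizing the training matrix and fitting one SVM per coordinate axis. The sketch below uses scikit-learn's `SVC` as a stand-in for the SVM algorithm described, with synthetic stand-in data; the min-max normalization, the random features, and the sign-of-coordinate labels are assumptions for illustration, not the patent's data.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Stand-ins: L training songs, each with n_feat acoustic + score features,
# plus hand-annotated (x, y) coordinates in the emotion plane.
L, n_feat = 200, 10
X = rng.normal(size=(L, n_feat))
coords = X[:, :2] + 0.1 * rng.normal(size=(L, 2))   # x tied to feature 0, y to feature 1

# Steps 1013-1015 collapsed: normalize the training matrix, then fit one SVM
# per axis, predicting the sign of that axis's training coordinate.
scaler = MinMaxScaler().fit(X)
Xn = scaler.transform(X)
svm_x = SVC(kernel="rbf").fit(Xn, (coords[:, 0] > 0).astype(int))
svm_y = SVC(kernel="rbf").fit(Xn, (coords[:, 1] > 0).astype(int))

acc_x = svm_x.score(Xn, (coords[:, 0] > 0).astype(int))  # training accuracy
```

A real implementation would regress the coordinate value itself (for example with `SVR`) rather than only its sign; the sign version is the minimum needed to pick a quadrant.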

Step 102: extract the emotional features of the singing audio to be recognized. As in step 101, the emotional features extracted in step 102 include acoustic signal features and musical score features. Optionally, and unlike the features extracted in speech emotion recognition and music emotion recognition, the acoustic signal features extracted in this embodiment include at least one of: average energy, energy standard deviation, average fundamental frequency, fundamental-frequency standard deviation, the number of frequency values exceeding the average fundamental frequency, average spectral centroid, spectral-centroid standard deviation, MFCC features, and spectrogram features. The musical score features include at least one of: beats per minute, key type, mode, average pitch, pitch standard deviation, and the average duration of each note.

Step 103: input the emotional features of the singing audio to be recognized into the emotion recognition model, and identify the emotion of the singing audio to be recognized. Specifically, step 103 includes:

inputting the emotional features of the singing audio to be recognized into the first emotion recognition model and the second emotion recognition model respectively, and determining the first coordinate value of the emotional features based on the first coordinate axis and their second coordinate value based on the second coordinate axis;

determining, from the first coordinate value and the second coordinate value, the quadrant in which the emotional features fall, so as to determine the singing emotion type corresponding to the emotional features.

The emotional features extracted in this embodiment differ from those of speech emotion recognition and music emotion recognition: speech emotion recognition extracts only audio features such as pitch and speaking rate and does not involve musical score features; music emotion recognition extracts both audio features and musical score features, but does not extract spectrogram features (included here among the acoustic signal features) and the like. Therefore, compared with existing speech emotion recognition and music emotion recognition, the emotion recognition model of this embodiment, trained on emotional features that combine acoustic signal features and musical score features, can identify the singing emotion of the individual singer from the musical score features and acoustic signal features: for the same song, it can recognize a different emotion for each singer's performance, identifying the singer's emotion more precisely. Specifically, this embodiment extracts the emotional features of the training singing audio and trains an emotion recognition model, the emotional features including acoustic signal features and musical score features; extracts the emotional features of the singing audio to be recognized; and inputs those features into the emotion recognition model to identify the emotion of the singing audio to be recognized. The embodiment can thus recognize the singer's emotion type from the singer's own performance audio, identifying the singer's emotion from the singing itself.

Embodiment two

With reference to Figures 1A to 2B, an embodiment of the present application provides a singing emotion recognition method as one feasible implementation of embodiment one, realized in the following manner. Here, the first coordinate axis may be the X axis and the second coordinate axis may be the Y axis.

Step 100: extract the emotional features of the training singing audio, and train an emotion recognition model; the emotional features include acoustic signal features and musical score features. Optionally, and unlike the features extracted in speech emotion recognition and music emotion recognition, the acoustic signal features extracted in this embodiment include at least one of: average energy, energy standard deviation, average fundamental frequency, fundamental-frequency standard deviation, the number of frequency values exceeding the average fundamental frequency, average spectral centroid, spectral-centroid standard deviation, MFCC features, and spectrogram features. The musical score features include at least one of: beats per minute, key type, mode, average pitch, pitch standard deviation, and the average duration of each note.

Optionally, as shown in Figure 1B, the emotion recognition model of this embodiment is trained as follows.

Step 1011: determine the training coordinate values of the emotional features of the training singing audio on a first coordinate axis and on a second coordinate axis, obtaining a first training coordinate value and a second training coordinate value. The first coordinate axis and the second coordinate axis form a plane rectangular coordinate system, and the quadrants of this coordinate system correspond one-to-one with singing emotion types.

Optionally, the singing emotion types corresponding to the quadrants of the plane rectangular coordinate system include: tense/anxious, happy/cheerful, sad/depressed, and calm/natural. The correspondence is: the first quadrant corresponds to tense/anxious, the second quadrant to happy/cheerful, the third quadrant to sad/depressed, and the fourth quadrant to calm/natural.
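The quadrant lookup described here can be sketched directly. The label strings are paraphrases of the four emotion types; how points lying exactly on an axis should be classified is not specified in the text, so this sketch lets them fall through to the fourth-quadrant default.

```python
def emotion_quadrant(x, y):
    """Map (x, y) coordinates in the emotion plane to the singing emotion
    type of the corresponding quadrant, per the mapping in the text."""
    if x > 0 and y > 0:
        return "tense/anxious"      # first quadrant
    if x < 0 and y > 0:
        return "happy/cheerful"     # second quadrant
    if x < 0 and y < 0:
        return "sad/depressed"      # third quadrant
    return "calm/natural"           # fourth quadrant (and axis points, by default)
```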

Step 1012: build a first training matrix from the first training coordinate value and the emotional features of the training singing audio, and build a second training matrix from the second training coordinate value and the emotional features of the training singing audio.

Step 1013: normalize the first training matrix into a first training normalization matrix, and normalize the second training matrix into a second training normalization matrix.

Step 1014: substitute the first training normalization matrix and the second training normalization matrix into the SVM algorithm separately, obtaining a first training hyperplane and a second training hyperplane respectively.

Step 1015: substitute the first training hyperplane and the first training normalization matrix into the SVM algorithm to obtain a first emotion recognition model based on the first coordinate axis; substitute the second training hyperplane and the second training normalization matrix into the SVM algorithm to obtain a second emotion recognition model based on the second coordinate axis. The first emotion recognition model is used to determine the first coordinate value, along the first coordinate axis, of the emotional features of the singing audio to be recognized; the second emotion recognition model is used to determine their second coordinate value along the second coordinate axis.

Step 102: extract the emotional features of the singing audio to be recognized. As in step 101, the emotional features extracted in step 102 include acoustic signal features and musical score features. Optionally, and unlike the features extracted in speech emotion recognition and music emotion recognition, the acoustic signal features extracted in this embodiment include at least one of: average energy, energy standard deviation, average fundamental frequency, fundamental-frequency standard deviation, the number of frequency values exceeding the average fundamental frequency, average spectral centroid, spectral-centroid standard deviation, MFCC features, and spectrogram features. The musical score features include at least one of: beats per minute, key type, mode, average pitch, pitch standard deviation, and the average duration of each note.

Step 103: input the emotional features of the singing audio to be recognized into the emotion recognition model, and identify the emotion of the singing audio to be recognized. Specifically, step 103 includes:

inputting the emotional features of the singing audio to be recognized into the first emotion recognition model and the second emotion recognition model respectively, and determining the first coordinate value of the emotional features based on the first coordinate axis and their second coordinate value based on the second coordinate axis;

determining, from the first coordinate value and the second coordinate value, the quadrant in which the emotional features fall, so as to determine the singing emotion type corresponding to the emotional features.

As shown in Figure 2A, in one feasible implementation, step 103 obtains the first coordinate value as follows.

In step 1030, a first feature matrix based on the first coordinate axis is obtained from the acoustic signal features and musical score features of the singing audio to be recognized together with the first training matrix based on the first coordinate axis. Specifically, the row (A_{g,1}, …, A_{g,n}, B_{g,1}, …, B_{g,m}, 0) formed by the acoustic signal features (A_{g,1}, …, A_{g,n}) and the musical score features (B_{g,1}, …, B_{g,m}) is appended as the last row of the first training matrix based on the first coordinate axis, giving the first feature matrix. The first training matrix is determined in advance from the acoustic signal features and musical score features of the training singing audio and from the first training coordinate values, on the first coordinate axis, of the emotional features of the training singing audio. In this application, the subscript g denotes the song to be recognized, n is the number of acoustic signal features, m is the number of musical score features, and L is the number of training songs.

In step 1032, the first feature matrix is normalized to obtain a first normalization matrix, and from it the emotional features of the singing audio to be recognized after normalization against the first training matrix. Specifically, the data in the first feature matrix are normalized column by column (per feature), giving the first normalization matrix; the last row of that matrix is then extracted, yielding the normalized emotional-feature row (a_{gx,1}, …, a_{gx,n}, b_{gx,1}, …, b_{gx,m}) of the singing audio to be recognized.

In step 1034, the normalized emotional-feature row of the singing audio to be recognized, the first training hyperplane, and the first emotion recognition model based on the first coordinate axis are substituted into the SVM algorithm, obtaining the first coordinate value, along the first coordinate axis, of the emotional features of the singing audio to be recognized. Specifically, (a_{gx,1}, …, a_{gx,n}, b_{gx,1}, …, b_{gx,m}), the first training hyperplane, and the first emotion recognition model T_X of the X axis are substituted into the SVM algorithm, yielding the first coordinate value X_g, in the X-axis direction, of the emotional features (A_{g,1}, …, A_{g,n}, B_{g,1}, …, B_{g,m}) of the singing audio to be recognized. The hyperplane is built on the p_i-th features of the training acoustic signal features and the q_i-th features of the training musical score features, with p_1, …, p_i ∈ [1, n] and q_1, …, q_i ∈ [1, m], where n is the number of acoustic signal features, m is the number of musical score features, and L is the number of training songs. The first training hyperplane is determined from the first training normalization matrix obtained by normalizing the first training matrix; the first training matrix is determined from the first training coordinate value and the emotional features of the training singing audio; and the first emotion recognition model based on the first coordinate axis is determined from the first training hyperplane and the first training normalization matrix.

As shown in Figure 2B, in one feasible implementation, step 103 obtains the second coordinate value as follows.

In step 1030′, a second feature matrix based on the second coordinate axis is obtained from the acoustic signal features and musical score features of the singing audio to be recognized together with the second training matrix based on the second coordinate axis. Specifically, the row (A_{g,1}, …, A_{g,n}, B_{g,1}, …, B_{g,m}, 0) formed by the acoustic signal features (A_{g,1}, …, A_{g,n}) and the musical score features (B_{g,1}, …, B_{g,m}) is appended as the last row of the second training matrix based on the Y axis, giving the second feature matrix. The second training matrix is determined in advance from the acoustic signal features and musical score features of the training singing audio and from the second training coordinate values, on the second coordinate axis, of the emotional features of the training singing audio.

In step 1032′, the second feature matrix is normalized to obtain a second normalization matrix, and from it the emotional features of the singing audio to be recognized after normalization against the second training matrix. Specifically, the data in the second feature matrix are normalized column by column (per feature), giving the second normalization matrix; the last row of that matrix is then extracted, yielding the normalized emotional-feature row (a_{gy,1}, …, a_{gy,n}, b_{gy,1}, …, b_{gy,m}) of the singing audio to be recognized.

In step 1034′, the normalized emotional-feature row of the singing audio to be recognized, the second training hyperplane, and the second emotion recognition model based on the second coordinate axis are substituted into the SVM algorithm, obtaining the second coordinate value, along the second coordinate axis, of the emotional features of the singing audio to be recognized. Specifically, (a_{gy,1}, …, a_{gy,n}, b_{gy,1}, …, b_{gy,m}), the second training hyperplane, and the second emotion recognition model T_Y of the Y axis are substituted into the SVM algorithm, yielding the second coordinate value Y_g, in the Y-axis direction, of the emotional features (A_{g,1}, …, A_{g,n}, B_{g,1}, …, B_{g,m}) of the singing audio to be recognized. The hyperplane is built on the r_i-th features of the training acoustic signal features and the s_i-th features of the training musical score features, with r_1, …, r_i ∈ [1, n] and s_1, …, s_i ∈ [1, m], where n is the number of acoustic signal features, m is the number of musical score features, and L is the number of training songs. The second training hyperplane is determined from the second training normalization matrix obtained by normalizing the second training matrix; the second training matrix is determined from the second training coordinate value and the emotional features of the training singing audio; and the second emotion recognition model based on the second coordinate axis is determined from the second training hyperplane and the second training normalization matrix.

It should be appreciated that steps 1030 and 1030' need not be executed in any particular order and may be performed in parallel. The same holds for steps 1032 and 1032', and for steps 1034 and 1034'.

In the embodiments of the present application, the first coordinate axis and the second coordinate axis form a planar rectangular coordinate system whose quadrants are in one-to-one correspondence with the singing emotion types. The embodiments determine, from the acoustic signal features and musical score features of the singing audio to be identified, the coordinate values of its emotion feature on the first and second coordinate axes respectively, and then determine from the first and second coordinate values the singing emotion type corresponding to that emotion feature. The singing emotion type can thus be recognized from a singer's own singing audio, i.e. the singer's emotion is identified from the singing itself.

In addition, the emotion features extracted in this embodiment differ, in terms of feature extraction, from those used in speech emotion recognition and music emotion recognition: speech emotion recognition only extracts audio features such as tone and speech rate and does not involve musical score features; music emotion recognition extracts both audio and musical score features, but does not involve spectrogram features (included here among the acoustic signal features). Compared with existing speech emotion recognition and music emotion recognition, this embodiment, using an emotion recognition model trained on emotion features that include both acoustic signal features and musical score features, can recognize the singing emotion of the corresponding singer from those features; for the same song it can recognize a different singing emotion for each different singer, identifying the singer's emotion more precisely.

Embodiment three

Referring to Fig. 3, the embodiment of the present application provides a singing emotion recognition method. This embodiment is largely the same as embodiments one and two, and specifically describes how the first emotion recognition model based on the first coordinate axis and the second emotion recognition model based on the second coordinate axis are established, which may be accomplished as follows.

In step 301, the acoustic signal features and musical score features of the singing audio to be trained are extracted. Specifically, the acoustic signal features Ai,j and musical score features Bi,k of the singing audio to be trained are extracted, where Ai,j denotes the value of the j-th acoustic signal feature of the i-th singing audio to be trained, 1≤j≤n, n being the total number of acoustic signal features, and Bi,k denotes the value of the k-th musical score feature of the i-th singing audio to be trained, 1≤k≤m, m being the total number of musical score features.

In step 302, the first training coordinate value and the second training coordinate value corresponding to the acoustic signal features and musical score features of the singing audio to be trained are determined. Here the first coordinate axis may be the X axis and the second coordinate axis the Y axis. The first training coordinate value Xi is the coordinate on the first coordinate axis annotated by professionals familiar with music for the i-th singing audio to be trained, and the second training coordinate value Yi is the corresponding coordinate on the second coordinate axis; the feature row of the i-th singing audio to be trained is then (Ai,1…Ai,n Bi,1…Bi,m Xi Yi). Xi and Yi may directly use coordinate values annotated in advance by such professionals.

In step 303, a first training matrix based on the first coordinate axis and a second training matrix based on the second coordinate axis are determined from the first and second training coordinate values respectively. After the features of all L songs have been extracted, an L×(n+m+2) matrix is formed whose i-th row is (Ai,1…Ai,n Bi,1…Bi,m Xi Yi). This matrix is split into the first training matrix based on the first coordinate axis, whose rows are (Ai,1…Ai,n Bi,1…Bi,m Xi), and the second training matrix based on the second coordinate axis, whose rows are (Ai,1…Ai,n Bi,1…Bi,m Yi).
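As a minimal sketch of step 303, the L×(n+m+2) matrix and its split into the two per-axis training matrices can be expressed with numpy; the sizes and random feature values below are purely illustrative placeholders, not values from the patent.

```python
import numpy as np

# Hypothetical sizes: L training songs, n acoustic features, m score features.
L, n, m = 4, 3, 2
rng = np.random.default_rng(0)

A = rng.uniform(size=(L, n))          # acoustic signal features A[i, j]
B = rng.uniform(size=(L, m))          # musical score features B[i, k]
X = rng.uniform(-1, 1, size=(L, 1))   # annotated first-axis (X) coordinates
Y = rng.uniform(-1, 1, size=(L, 1))   # annotated second-axis (Y) coordinates

# The L x (n+m+2) matrix with rows (A_i,1...A_i,n B_i,1...B_i,m X_i Y_i).
full = np.hstack([A, B, X, Y])

# Split into the first training matrix (keeps X) and the second (keeps Y).
first_train = np.hstack([A, B, X])    # L x (n+m+1)
second_train = np.hstack([A, B, Y])   # L x (n+m+1)
```

Both per-axis matrices share the same feature columns and differ only in the final coordinate column.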

In step 304, the first training matrix and the second training matrix are normalized respectively to obtain the first training normalization matrix and the second training normalization matrix. Specifically, the data in the first training matrix of the X axis are normalized column by column so that the value range becomes [-1, 1]; the first training normalization matrix after normalization has rows (ai,1…ai,n bi,1…bi,m xi), where:

ai,j∈[-1,1],bi,k∈[-1,1],xi∈ [-1,1], j ∈ [1, n], k ∈ [1, m], i ∈ [1, L].

Similarly, after the second training matrix of the Y axis undergoes the same normalization, the resulting second training normalization matrix has rows (ai,1…ai,n bi,1…bi,m yi), where:

ai,j∈[-1,1],bi,k∈[-1,1],yi∈ [-1,1], j ∈ [1, n], k ∈ [1, m], i ∈ [1, L].
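The column-wise scaling of step 304 can be sketched as a per-feature min-max mapping onto [-1, 1]; the patent does not specify the exact formula, so the linear rescaling below (with a guard for constant columns) is an assumption.

```python
import numpy as np

def normalize_columns(mat):
    """Scale each column of `mat` linearly into [-1, 1] (per-feature min-max)."""
    lo = mat.min(axis=0)
    hi = mat.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns
    return 2.0 * (mat - lo) / span - 1.0

# Toy 3-song, 2-feature matrix, purely illustrative.
M = np.array([[1.0, 10.0],
              [2.0, 30.0],
              [3.0, 20.0]])
norm = normalize_columns(M)
```

Each column of `norm` now spans exactly [-1, 1], matching the constraint ai,j, bi,k, xi ∈ [-1, 1] stated above.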

In step 305, the first training normalization matrix and the second training normalization matrix are substituted into the SVM algorithm respectively, obtaining a first training hyperplane based on the first coordinate axis and a second training hyperplane based on the second coordinate axis. The first training normalization matrix of the X axis is substituted into the SVM algorithm, which seeks a hyperplane in the X direction that separates, as far as possible, the rows with xi greater than 0 from those with xi less than 0. The hyperplane obtained is composed of a subset of the acoustic signal features and musical score features: the hyperplane of the X axis is defined over features api of the acoustic signal features and bqi of the musical score features, where api is the pi-th acoustic signal feature, bqi is the qi-th musical score feature, p1…pi∈[1,n] and q1…qi∈[1,m]. Similarly, the hyperplane of the Y axis is defined over features ari and bsi, where ari is the ri-th acoustic signal feature, bsi is the si-th musical score feature, r1…ri∈[1,n] and s1…si∈[1,m].

In step 306, the first training hyperplane and the first training normalization matrix are substituted into the SVM algorithm to obtain the emotion recognition model based on the first coordinate axis; the second training hyperplane and the second training normalization matrix are substituted into the SVM algorithm to obtain the emotion recognition model based on the second coordinate axis. Substituting the X-axis hyperplane obtained above, with p1…pi∈[1,n] and q1…qi∈[1,m], together with the first training normalization matrix into the SVM algorithm yields the emotion recognition model of the X axis, denoted TX. The emotion recognition model of the Y axis, denoted TY, is obtained in the same way.
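Steps 305-306 can be loosely illustrated with scikit-learn; the patent does not name a concrete SVM library or formulation, so the pairing below — a linear SVM that separates songs with positive and negative X coordinates (step 305), plus a support-vector regressor standing in for the per-axis model TX (step 306) — is an illustrative assumption, and the synthetic features and coordinate rule are invented for the sketch.

```python
import numpy as np
from sklearn.svm import LinearSVC, SVR

rng = np.random.default_rng(1)
feats = rng.uniform(-1, 1, size=(40, 8))          # normalized (acoustic + score) features
x_coord = 0.7 * feats[:, 0] + 0.3 * feats[:, 1]   # toy annotated X-axis coordinates

# Step 305 analogue: a hyperplane separating x_i > 0 from x_i < 0.
labels = np.where(x_coord > 0, 1, -1)
hyperplane = LinearSVC(max_iter=5000).fit(feats, labels)

# Step 306 analogue: a model T_X predicting the coordinate value itself.
T_X = SVR(kernel="rbf").fit(feats, x_coord)
pred = T_X.predict(feats)
```

In the same way a second classifier/regressor pair on the Y column would play the role of the Y-axis hyperplane and TY.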

Embodiment four

With reference to Fig. 1A to Fig. 3, the embodiment of the present application provides a singing emotion recognition method that generally comprises two processes: (1) establishment of the singing emotion recognition model; (2) recognition of the singing emotion.

(1) Establishment of the singing emotion recognition model

This process is mainly used to establish the first emotion recognition model based on the first coordinate axis and the second emotion recognition model based on the second coordinate axis. Establishing the singing emotion recognition model requires collecting in advance a large amount of singing audio covering the various emotions (i.e. singing audio to be trained); the singing audio should be as close to pure voice as possible, and the musical scores of the corresponding songs are collected at the same time.

Professionals familiar with music then classify the emotions of the collected singing audio: the categories of the emotion classification are determined first, and each piece of singing audio is then listened to once by each professional, who annotates its emotion independently. When the majority of the professionals agree that the current piece of singing audio belongs to a certain emotion, it is assigned to the directory of that emotion; otherwise it is discarded. All singing audio is classified in this way. Note that the singing emotion may change within one piece of audio, for example between the prelude and the climax; in that case the professionals should split the audio into several segments of uniform emotion, so that the emotion within each segment is consistent, and the musical score of the corresponding song should likewise be segmented and annotated so as to correspond one-to-one with the audio segments.

After the above process, the singing audio has been classified by emotion, with the same number of audio pieces in each emotion class; the musical scores of the songs have also been classified, corresponding one-to-one with the classified audio.

The acoustic signal features of the singing audio of each emotion class are then extracted by emotion class, together with the musical score features of the corresponding songs. It should be understood that the features extracted here differ from those used in speech emotion recognition and music emotion recognition. The acoustic signal features of the singing audio cover: average energy, energy standard deviation, average fundamental frequency, fundamental frequency standard deviation, the number of frequencies exceeding the average fundamental frequency, average spectral centroid, spectral centroid standard deviation, MFCC features and spectrogram features. The musical score features cover: beats per minute, major/minor key type, mode, average pitch, pitch standard deviation and the average duration of each note. The acoustic signal features and musical score features of a piece of singing audio are both extracted from the same passage of the same song: whichever song and passage was sung in the audio, the features of that passage of music are extracted from the score correspondingly. (Note: speech emotion recognition only extracts audio features such as tone and speech rate and does not involve musical score features; music emotion recognition extracts both audio and musical score features but does not involve spectrogram features and the like; the present feature extraction therefore differs from both.)

After the above preprocessing, the singing emotion recognition model can be established as follows. Here the first coordinate axis and the second coordinate axis form a planar rectangular coordinate system whose quadrants correspond one-to-one with the singing emotion types. The first coordinate axis may be the X axis and the second coordinate axis the Y axis. The singing emotion types corresponding to the quadrants of the coordinate system are: tense-anxious, happy-cheerful, sad-depressed and natural-calm; the correspondence is that the first quadrant corresponds to tense-anxious, the second quadrant to happy-cheerful, the third quadrant to sad-depressed and the fourth quadrant to natural-calm.

Specifically, this embodiment divides singing emotions into four categories, namely tense-anxious, happy-cheerful, sad-depressed and natural-calm, corresponding respectively to the four quadrants of the planar rectangular coordinate system. After the emotion category of a sung song is determined, professionals familiar with music annotate it in coordinate form on the extracted emotion-category feature data (the value ranges of the X and Y directions in the coordinate system are [-1, 1]; the further a value deviates from the X or Y axis, the more pronounced the corresponding emotion, and the closer a value lies to the axes, the fainter that emotion). The training and recognition algorithm of this embodiment is the SVM algorithm: the professionals annotate the coordinate values, within the appropriate quadrant, of the emotion of the user's singing; the singing emotion features and their coordinate values are extracted; and after all features and coordinates have been extracted, the X-axis data and Y-axis data are normalized by scale value and fed separately into SVM training. From the trained data, the SVM derives the optimal hyperplane values of the emotion features of the singing emotion on the X and Y axes, thereby obtaining the emotion recognition models based on the X and Y axes.

Step 100: extract the emotion features of the singing audio to be trained, and train to obtain the emotion recognition models; the emotion features include acoustic signal features and musical score features. With reference to Fig. 1A and Fig. 3, the establishment of the emotion recognition models is specifically shown in the implementation of Fig. 3.

In step 301, the acoustic signal features and musical score features of the singing audio to be trained are extracted. Specifically, the acoustic signal features Ai,j and musical score features Bi,k of the singing audio to be trained are extracted, where Ai,j denotes the value of the j-th acoustic signal feature of the i-th singing audio to be trained, 1≤j≤n, n being the total number of acoustic signal features, and Bi,k denotes the value of the k-th musical score feature of the i-th singing audio to be trained, 1≤k≤m, m being the total number of musical score features.

In step 302, the first training coordinate value and the second training coordinate value corresponding to the acoustic signal features and musical score features of the singing audio to be trained are determined. The first training coordinate value Xi is the coordinate on the first coordinate axis annotated by professionals familiar with music for the i-th singing audio to be trained, and the second training coordinate value Yi is the corresponding coordinate on the second coordinate axis; the feature row of the i-th singing audio to be trained is then (Ai,1…Ai,n Bi,1…Bi,m Xi Yi). Xi and Yi may directly use coordinate values annotated in advance by such professionals.

In step 303, a first training matrix based on the first coordinate axis and a second training matrix based on the second coordinate axis are determined from the first and second training coordinate values respectively. After the features of all L songs have been extracted, an L×(n+m+2) matrix is formed whose i-th row is (Ai,1…Ai,n Bi,1…Bi,m Xi Yi). This matrix is then split into the first training matrix based on the first coordinate axis, whose rows are (Ai,1…Ai,n Bi,1…Bi,m Xi), and the second training matrix based on the second coordinate axis, whose rows are (Ai,1…Ai,n Bi,1…Bi,m Yi).

In step 304, the first training matrix and the second training matrix are normalized respectively to obtain the first training normalization matrix and the second training normalization matrix. Specifically, the data in the first training matrix of the X axis are normalized column by column so that the value range becomes [-1, 1]; the first training normalization matrix after normalization has rows (ai,1…ai,n bi,1…bi,m xi), where:

ai,j∈[-1,1],bi,k∈[-1,1],xi∈ [-1,1], j ∈ [1, n], k ∈ [1, m], i ∈ [1, L].

Similarly, after the second training matrix of the Y axis undergoes the same normalization, the resulting second training normalization matrix has rows (ai,1…ai,n bi,1…bi,m yi), where:

ai,j∈[-1,1],bi,k∈[-1,1],yi∈ [-1,1], j ∈ [1, n], k ∈ [1, m], i ∈ [1, L].

In step 305, the first training normalization matrix and the second training normalization matrix are substituted into the SVM algorithm respectively, obtaining a first training hyperplane based on the first coordinate axis and a second training hyperplane based on the second coordinate axis. The first training normalization matrix of the X axis is substituted into the SVM algorithm, which seeks a hyperplane in the X direction that separates, as far as possible, the rows with xi greater than 0 from those with xi less than 0. The hyperplane obtained is composed of a subset of the acoustic signal features and musical score features: the hyperplane of the X axis is defined over features api of the acoustic signal features and bqi of the musical score features, where api is the pi-th acoustic signal feature, bqi is the qi-th musical score feature, p1…pi∈[1,n] and q1…qi∈[1,m]. Similarly, the hyperplane of the Y axis is defined over features ari and bsi, where ari is the ri-th acoustic signal feature, bsi is the si-th musical score feature, r1…ri∈[1,n] and s1…si∈[1,m].

In step 306, the first training hyperplane and the first training normalization matrix are substituted into the SVM algorithm to obtain the first emotion recognition model based on the first coordinate axis; the second training hyperplane and the second training normalization matrix are substituted into the SVM algorithm to obtain the second emotion recognition model based on the second coordinate axis. Substituting the X-axis hyperplane obtained above, with p1…pi∈[1,n] and q1…qi∈[1,m], together with the first training normalization matrix into the SVM algorithm yields the first emotion recognition model of the X axis, denoted TX; the second emotion recognition model of the Y axis, denoted TY, is obtained in the same way. TX and TY are the established singing emotion recognition models.

The first emotion recognition model is used to determine the first coordinate value, in the direction of the first coordinate axis, of the emotion feature of the singing audio to be identified; the second emotion recognition model is used to determine the corresponding second coordinate value in the direction of the second coordinate axis.

(2) Recognition of the singing emotion

Step 102: extract the emotion features of the singing audio to be identified. The emotion features extracted in step 102 include acoustic signal features and musical score features. Optionally, differing from the features extracted in speech emotion recognition and music emotion recognition, the acoustic signal features extracted in this embodiment include at least one of: average energy, energy standard deviation, average fundamental frequency, fundamental frequency standard deviation, the number of frequencies exceeding the average fundamental frequency, average spectral centroid, spectral centroid standard deviation, MFCC features and spectrogram features; the musical score features include at least one of: beats per minute, major/minor key type, mode, average pitch, pitch standard deviation and the average duration of each note.
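A few of the acoustic signal features named above can be sketched with plain numpy; the framing parameters, the pure-tone test signal and the function name are illustrative assumptions (a real system would additionally compute F0 statistics, MFCCs and spectrogram features).

```python
import numpy as np

def frame_features(signal, sr=16000, frame=512, hop=256):
    """Toy sketch of three features named in the text: average frame
    energy, energy standard deviation, and average spectral centroid."""
    frames = [signal[i:i + frame] for i in range(0, len(signal) - frame + 1, hop)]
    energies = np.array([float(np.mean(f ** 2)) for f in frames])
    centroids = []
    for f in frames:
        spec = np.abs(np.fft.rfft(f))
        freqs = np.fft.rfftfreq(frame, d=1.0 / sr)
        centroids.append(float((freqs * spec).sum() / (spec.sum() + 1e-12)))
    return {
        "average_energy": float(energies.mean()),
        "energy_std": float(energies.std()),
        "average_centroid": float(np.mean(centroids)),
    }

t = np.arange(16000) / 16000.0
tone = np.sin(2 * np.pi * 440.0 * t)   # 1 s of a 440 Hz tone
feats = frame_features(tone)
```

For a pure sine the average energy lands near 0.5 (the mean of sin²) and the centroid near the tone frequency, broadened by spectral leakage.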

Step 103: input the emotion features of the singing audio to be identified into the emotion recognition models, and identify the emotion of the singing audio to be identified. Specifically, step 103 includes:

inputting the emotion features of the singing audio to be identified into the first emotion recognition model and the second emotion recognition model respectively, and determining the first coordinate value of the emotion features based on the first coordinate axis and the second coordinate value based on the second coordinate axis;

determining, from the first coordinate value and the second coordinate value, the quadrant corresponding to the emotion features, so as to determine the singing emotion type corresponding to those emotion features.
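The quadrant-to-emotion mapping described above can be sketched as a small lookup; the quadrant assignments follow the correspondence stated earlier (first quadrant tense-anxious, second happy-cheerful, third sad-depressed, fourth natural-calm), while the handling of points lying exactly on an axis is an assumption not specified by the text.

```python
def emotion_from_coordinates(x, y):
    """Map the (first, second) coordinate values onto the four singing
    emotion types, one per quadrant of the planar rectangular
    coordinate system."""
    if x > 0 and y > 0:
        return "tense-anxious"      # first quadrant
    if x < 0 and y > 0:
        return "happy-cheerful"     # second quadrant
    if x < 0 and y < 0:
        return "sad-depressed"      # third quadrant
    if x > 0 and y < 0:
        return "natural-calm"       # fourth quadrant
    return "undetermined"           # on an axis (assumed tie-break)
```

Given the predicted coordinate values Xg and Yg from steps 1034 and 1034', `emotion_from_coordinates(Xg, Yg)` yields the singing emotion type.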

As shown in Fig. 2A, in a feasible embodiment, step 103 obtains the first coordinate value as follows.

In step 1030, a first feature matrix based on the first coordinate axis is obtained from the acoustic signal features and musical score features of the singing audio to be identified and the first training matrix based on the first coordinate axis. Specifically, the row (Ag,1…Ag,n Bg,1…Bg,m 0), formed from the acoustic signal features (Ag,1…Ag,n) and musical score features (Bg,1…Bg,m), is appended as the last row of the first training matrix of the X axis to obtain the first feature matrix. The first training matrix is determined in advance from the acoustic signal features and musical score features of the singing audio to be trained and from the first training coordinate values, on the first coordinate axis, of the emotion features of the singing audio to be trained.

In step 1032, the first feature matrix is normalized to obtain a first normalization matrix, from which the emotion feature of the singing audio to be identified, normalized against the first training matrix, is obtained. Specifically, the data in the first feature matrix are normalized column by column to obtain the first normalization matrix; the last row of this matrix is then extracted, giving the emotion feature of the singing audio to be identified after normalization by the first training matrix, (agx,1…agx,n bgx,1…bgx,m).

In step 1034, the normalized emotion feature of the singing audio to be identified, the first training hyperplane and the first emotion recognition model based on the first coordinate axis are substituted into the SVM algorithm, obtaining the first coordinate value of the emotion feature of the singing audio to be identified in the direction of the first coordinate axis. Specifically, (agx,1…agx,n bgx,1…bgx,m), the first training hyperplane and the first emotion recognition model TX of the X axis are substituted into the SVM algorithm to obtain the first coordinate value Xg, in the X-axis direction, of the emotion feature (Ag,1…Ag,n Bg,1…Bg,m) of the singing audio to be identified. Here api denotes the pi-th acoustic signal feature and bqi the qi-th musical score feature of the training data, p1…pi∈[1,n], q1…qi∈[1,m], where n is the number of acoustic signal features, m is the number of musical score features and L is the number of training songs. The first training hyperplane is determined from the first training normalization matrix obtained by normalizing the first training matrix; the first training matrix is determined from the first training coordinate values and the emotion features of the singing audio to be trained; and the first emotion recognition model based on the first coordinate axis is determined from the first training hyperplane and the first training normalization matrix.
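Steps 1030-1032 — append the new song's feature row with a placeholder coordinate, normalize jointly with the training matrix, and keep only the last row — can be sketched as follows; the tiny training matrix and the per-column min-max scaling are illustrative assumptions.

```python
import numpy as np

def normalize_columns(mat):
    """Per-column linear scaling into [-1, 1], constant columns mapped safely."""
    lo, hi = mat.min(axis=0), mat.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return 2.0 * (mat - lo) / span - 1.0

# Hypothetical first training matrix: rows of (n+m) features plus an X column.
train = np.array([[0.2, 1.5, 3.0, -0.5],
                  [0.8, 2.5, 1.0,  0.5],
                  [0.5, 2.0, 2.0,  0.0]])
# Features of the audio to be identified, with 0 in the coordinate slot.
new_song = np.array([0.5, 2.0, 2.0, 0.0])

# Step 1030: append the new row; step 1032: normalize, keep the last row.
stacked = np.vstack([train, new_song])
norm_last = normalize_columns(stacked)[-1]
feature_vector = norm_last[:-1]   # drop the placeholder coordinate column
```

`feature_vector` plays the role of (agx,1…agx,n bgx,1…bgx,m), which step 1034 would pass to the trained X-axis model to predict Xg.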

As shown in Fig. 2B, in a feasible embodiment, step 103 obtains the second coordinate value as follows.

In step 1030', a second feature matrix based on the second coordinate axis is obtained from the acoustic signal features and musical score features of the singing audio to be identified and the second training matrix based on the second coordinate axis. Specifically, the row (Ag,1…Ag,n Bg,1…Bg,m 0), formed from the acoustic signal features (Ag,1…Ag,n) and musical score features (Bg,1…Bg,m), is appended as the last row of the second training matrix of the Y axis to obtain the second feature matrix. The second training matrix is determined in advance from the acoustic signal features and musical score features of the singing audio to be trained and from the second training coordinate values, on the second coordinate axis, of the emotion features of the singing audio to be trained.

In step 1032', the second feature matrix is normalized to obtain a second normalization matrix, from which the emotion feature of the singing audio to be identified, normalized against the second training matrix, is obtained. Specifically, the data in the second feature matrix are normalized column by column to obtain the second normalization matrix; the last row of this matrix is then extracted, giving the emotion feature of the singing audio to be identified after normalization by the second training matrix, (agy,1…agy,n bgy,1…bgy,m).

In step 1034', the normalized emotion feature of the singing audio to be identified, the second training hyperplane and the second emotion recognition model based on the second coordinate axis are substituted into the SVM algorithm, obtaining the second coordinate value of the emotion feature of the singing audio to be identified in the direction of the second coordinate axis. Specifically, (agy,1…agy,n bgy,1…bgy,m), the second training hyperplane and the second emotion recognition model TY of the Y axis are substituted into the SVM algorithm to obtain the second coordinate value Yg, in the Y-axis direction, of the emotion feature (Ag,1…Ag,n Bg,1…Bg,m) of the singing audio to be identified. Here ari denotes the ri-th acoustic signal feature and bsi the si-th musical score feature of the training data, r1…ri∈[1,n], s1…si∈[1,m], where n is the number of acoustic signal features, m is the number of musical score features and L is the number of training songs. The second training hyperplane is determined from the second training normalization matrix obtained by normalizing the second training matrix; the second training matrix is determined from the second training coordinate values and the emotion features of the singing audio to be trained; and the second emotion recognition model based on the second coordinate axis is determined from the second training hyperplane and the second training normalization matrix.

It should be appreciated that steps 1030 and 1030' need not be executed in any particular order and may be performed in parallel. The same holds for steps 1032 and 1032', and for steps 1034 and 1034'.

In the embodiments of the present application, the first coordinate axis and the second coordinate axis form a planar rectangular coordinate system whose quadrants are in one-to-one correspondence with the singing emotion types. The embodiments determine, from the acoustic signal features and musical score features of the singing audio to be identified, the coordinate values of its emotion feature on the first and second coordinate axes respectively, and then determine from the first and second coordinate values the singing emotion type corresponding to that emotion feature. The singing emotion type can thus be recognized from a singer's own singing audio, i.e. the singer's emotion is identified from the singing itself.

In addition, the emotion features extracted in this embodiment differ, in terms of feature extraction, from those used in speech emotion recognition and music emotion recognition: speech emotion recognition only extracts audio features such as tone and speech rate and does not involve musical score features; music emotion recognition extracts both audio and musical score features, but does not involve spectrogram features (included here among the acoustic signal features). Compared with existing speech emotion recognition and music emotion recognition, this embodiment, using an emotion recognition model trained on emotion features that include both acoustic signal features and musical score features, can recognize the singing emotion of the corresponding singer from those features; for the same song it can recognize a different singing emotion for each different singer, identifying the singer's emotion more precisely.

Embodiment five

Referring to Fig. 4 to Fig. 5B, further optional flow diagrams of the emotion recognition model establishment method of the embodiment of the present application are shown. The present application may be applied to a terminal device, or to an emotion recognition model establishment apparatus, which may typically be placed in the terminal device by way of software, hardware, or a combination of the two. The following description takes the terminal device as the executing agent; with reference to the methods shown in Fig. 4 to Fig. 5B, the implementation may be as follows.

In step 400, singing audio samples to be trained are obtained and classified by emotion according to preset emotion types, determining multiple singing audio subsamples to be trained corresponding to the emotion types; the emotional factors used to determine the emotion types include a stress factor and an energy factor.

In this step, a large number of singing audio recordings containing various emotions are collected as the training singing audio samples. The singing audio should be pure vocals as far as possible, and the musical score of each sung song is collected at the same time. This step may be performed by the terminal device obtaining the collected singing audio directly from local storage, a storage device, or the network.

After the singing audio has been collected, the terminal device can classify it by emotion according to the preset emotion types. Specifically, the collected singing audio can be classified according to the classification standard of music professionals, or music professionals can be asked directly to classify it according to their own standard. The classification standard of the music professionals can be as follows: first, the emotion categories are determined; then each music professional listens to each singing recording once and annotates its emotion. When most of the professionals consider that the current recording belongs to a certain emotion, the recording is assigned to the directory of that emotion; otherwise the recording is discarded. All singing recordings are classified in this way. Note that the emotion within one recording may change, for example the singing emotion of the prelude and of the climax may differ; in that case the music professionals should split the recording into several segments with independent emotions, so that the emotion within each segment is consistent, and the musical score of the corresponding song should likewise be annotated and segmented so that score segments correspond one-to-one to the audio segments.

After the terminal device has performed the above emotion classification, a plurality of training singing audio sub-samples corresponding to the emotion types can be determined. Specifically, the singing audio is classified by emotion with the same number of recordings in each emotion class, and the musical scores of the songs are classified at the same time so that the classified scores correspond one-to-one to the audio.

In step 402, the affective features of each training singing audio sub-sample are extracted, and the affective features of all training singing audio sub-samples are normalized on the pressure dimension and the energy dimension respectively, correspondingly obtaining normalized pressure affective features and normalized energy affective features.

In step 404, SVM training is performed on the normalized pressure affective features and the normalized energy affective features respectively, correspondingly obtaining a pressure index for determining the magnitude of the pressure factor and an energy index for determining the level of the energy factor.

In step 406, SVM training is performed on the normalized pressure affective features and the pressure index to obtain a first emotion recognition model for determining the pressure factor; SVM training is performed on the normalized energy affective features and the energy index to obtain a second emotion recognition model for determining the energy factor.

Those skilled in the art will understand that, in the above methods of the specific embodiments of the present application, the numbering of the steps does not imply an order of execution; the execution order of the steps should be determined by their functions and internal logic, and does not constitute any limitation on the implementation of the specific embodiments of the present application.

In the embodiments of the present application, steps 400-406 yield a first emotion recognition model capable of determining the pressure factor and a second emotion recognition model capable of determining the energy factor, so that a terminal device executing this method, or another terminal device that indirectly invokes it, can determine the pressure factor and the energy factor of singing audio to be recognized according to the first and second emotion recognition models, and in turn determine the singing emotion type of that audio. The embodiments of the present application can thus recognize a singer's singing emotion type from the singer's singing audio, identifying the singer's emotion from the singing itself.

In an optional embodiment, the two principal factors affecting musical emotion are pressure and energy. Since the pressure factor and the energy factor correspond well to acoustic features, music can be divided, according to the strength of the pressure factor (valence), along a range from anxious to happy, and, according to the strength of the energy factor (arousal), along a range from vigorous to calm. Corresponding to the four regions into which a two-dimensional rectangular coordinate system is divided, music can be divided into the following four classes: tense/anxious, happy/cheerful, sad/depressed, natural/calm. As shown in Fig. 5A, the pressure (valence) dimension can be represented by a first coordinate axis and the energy (arousal) dimension by a second coordinate axis, where the first and second coordinate axes form a plane rectangular coordinate system whose quadrants correspond one-to-one to the singing emotion types. The first coordinate axis can be the X-axis and the second coordinate axis the Y-axis. The singing emotion types corresponding to the quadrants of the plane rectangular coordinate system are: tense/anxious, happy/cheerful, sad/depressed, and natural/calm, with the correspondence: first quadrant, tense/anxious; second quadrant, happy/cheerful; third quadrant, sad/depressed; fourth quadrant, natural/calm.
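The quadrant-to-emotion mapping described above can be sketched in a few lines. The function name and the English class labels are illustrative assumptions, and boundary points (coordinate values of exactly 0) are assigned arbitrarily here, since the text does not specify how they are handled:

```python
def emotion_from_coordinates(x: float, y: float) -> str:
    """Map a (pressure/valence, energy/arousal) pair to a singing emotion type.

    x: value on the first (pressure) coordinate axis
    y: value on the second (energy) coordinate axis
    """
    if x >= 0 and y >= 0:      # first quadrant
        return "tense/anxious"
    if x < 0 and y >= 0:       # second quadrant
        return "happy/cheerful"
    if x < 0 and y < 0:        # third quadrant
        return "sad/depressed"
    return "natural/calm"      # fourth quadrant

print(emotion_from_coordinates(0.7, 0.4))   # tense/anxious
```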

Based on the above optional embodiment, as shown in Fig. 5B, step 402 may be implemented as follows.

Step 4021: determine a first training coordinate value and a second training coordinate value corresponding to the acoustic signal features and musical score features of the training singing audio; the affective features can include acoustic signal features and musical score features.

The acoustic signal features of the singing audio of each emotion category are extracted category by category, and the musical score features of the corresponding songs are extracted at the same time. It should be understood that, unlike the features extracted in speech emotion recognition and music emotion recognition, the features extracted here comprise the following. The acoustic signal features of the singing audio cover: average energy, energy standard deviation, average fundamental frequency, fundamental frequency standard deviation, number of frames exceeding the average fundamental frequency, average spectral centroid, spectral centroid standard deviation, MFCC features, and spectrogram features. The musical score features cover: beats per minute, key type, mode, average pitch, pitch standard deviation, and average duration of each note. The acoustic signal features and musical score features of the singing audio are both extracted from the same segment of the same song: whichever song, and whichever segment of it, was sung, the features of that segment of the score are extracted accordingly.
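Of the acoustic features listed above, the energy-based ones need nothing beyond framing and summation. A minimal pure-Python sketch follows; the function names, frame sizes, and toy signal are illustrative assumptions, and fundamental-frequency, MFCC, and spectrogram features would in practice come from a dedicated audio library rather than code like this:

```python
import math

def frame_energies(samples, frame_len=256, hop=128):
    """Short-time energy of each frame of a mono sample sequence."""
    return [sum(s * s for s in samples[i:i + frame_len])
            for i in range(0, max(len(samples) - frame_len + 1, 1), hop)]

def energy_features(samples):
    """Average energy and energy standard deviation, two of the acoustic
    signal features listed above."""
    e = frame_energies(samples)
    mean = sum(e) / len(e)
    var = sum((v - mean) ** 2 for v in e) / len(e)
    return mean, math.sqrt(var)

# toy signal: a quiet half followed by a loud half
sig = [0.1 * math.sin(0.1 * n) for n in range(2048)] + \
      [0.9 * math.sin(0.1 * n) for n in range(2048)]
mean_e, std_e = energy_features(sig)
print(mean_e > 0 and std_e > 0)  # True
```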

Specifically, the acoustic signal features Ai,j and musical score features Bi,k of the training singing audio are extracted, where Ai,j denotes the value of the j-th acoustic signal feature of the i-th training singing recording, 1≤j≤n, n being the total number of acoustic signal features, and Bi,k denotes the value of the k-th musical score feature of the i-th training singing recording, 1≤k≤m, m being the total number of musical score features.

In step 4021, the first and second training coordinate values corresponding to the acoustic signal features and musical score features of the training singing audio are determined. Here, the first training coordinate value Xi denotes the coordinate value on the first coordinate axis annotated by the music professionals for the i-th training singing recording, and the second training coordinate value Yi denotes the coordinate value on the second coordinate axis annotated by the music professionals for the i-th training singing recording; the feature vector of the i-th training singing recording is then (Ai,1 … Ai,n Bi,1 … Bi,m Xi Yi). Xi and Yi can directly take the coordinate values annotated in advance by the music professionals. After the features of all L songs have been extracted, an L×(n+m+2) matrix is formed whose i-th row is (Ai,1 … Ai,n Bi,1 … Bi,m Xi Yi).
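Assembling the per-song feature rows into the L×(n+m+2) matrix can be sketched as follows; the function name and the toy feature values are illustrative, not taken from the patent:

```python
def build_training_matrix(acoustic, score, x_coords, y_coords):
    """Each row: the i-th song's n acoustic features, m score features,
    and the two expert-annotated coordinate values (Xi, Yi).

    acoustic: L rows of n values; score: L rows of m values.
    """
    assert len(acoustic) == len(score) == len(x_coords) == len(y_coords)
    return [a + b + [x, y]
            for a, b, x, y in zip(acoustic, score, x_coords, y_coords)]

M = build_training_matrix(
    acoustic=[[0.2, 0.5], [0.8, 0.1]],   # L=2 songs, n=2 acoustic features
    score=[[120.0], [90.0]],             # m=1 score feature (e.g. BPM)
    x_coords=[0.7, -0.4], y_coords=[0.3, -0.6])
print(len(M), len(M[0]))  # 2 5  -> L rows, n+m+2 columns
```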

Step 4022: determine a first training matrix based on the first coordinate axis according to the affective features and first training coordinate values of all training singing audio sub-samples; determine a second training matrix based on the second coordinate axis according to the affective features and second training coordinate values of all training singing audio sub-samples.

Specifically, the first training matrix based on the X-axis is the L×(n+m+1) matrix whose i-th row is (Ai,1 … Ai,n Bi,1 … Bi,m Xi), and the second training matrix based on the Y-axis is the L×(n+m+1) matrix whose i-th row is (Ai,1 … Ai,n Bi,1 … Bi,m Yi).

Step 4023: normalize the first training matrix and the second training matrix respectively, correspondingly obtaining a first training normalization matrix and a second training normalization matrix. Here, the first training normalization matrix represents the normalized pressure affective features, and the second training normalization matrix represents the normalized energy affective features.

Specifically, the data in the first training matrix based on the X-axis are normalized column by column so that each column spans [-1, 1] (for example, by linearly scaling each column between its minimum and maximum). The first training normalization matrix after normalization is the L×(n+m+1) matrix whose i-th row is (ai,1 … ai,n bi,1 … bi,m xi), where:

ai,j∈[-1,1], bi,k∈[-1,1], xi∈[-1,1], j∈[1,n], k∈[1,m], i∈[1,L].

Similarly, applying the same normalization to the second training matrix based on the Y-axis yields the second training normalization matrix, the L×(n+m+1) matrix whose i-th row is (ai,1 … ai,n bi,1 … bi,m yi), where:

ai,j∈[-1,1], bi,k∈[-1,1], yi∈[-1,1], j∈[1,n], k∈[1,m], i∈[1,L].
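The column-wise scaling of the training matrices to [-1, 1] can be sketched as a min-max normalization — an assumption, since the text only fixes the target range, not the exact formula. Constant columns are mapped to 0 here, an edge case the text does not address:

```python
def normalize_columns(matrix):
    """Scale each column of a row-major matrix linearly to [-1, 1]."""
    cols = list(zip(*matrix))
    scaled = []
    for col in cols:
        lo, hi = min(col), max(col)
        if hi == lo:
            scaled.append([0.0] * len(col))   # constant column: assumed 0
        else:
            scaled.append([2 * (v - lo) / (hi - lo) - 1 for v in col])
    return [list(row) for row in zip(*scaled)]

N = normalize_columns([[1.0, 10.0], [3.0, 20.0], [2.0, 30.0]])
print(N[0])  # [-1.0, -1.0]
```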

Based on the above optional embodiment, step 404 is specifically: substitute the first training normalization matrix and the second training normalization matrix into the SVM algorithm respectively, correspondingly obtaining a first training hyperplane based on the first coordinate axis and a second training hyperplane based on the second coordinate axis; the first training hyperplane is used to determine the magnitude of the pressure factor, and the second training hyperplane is used to determine the level of the energy factor. Specifically, the first training normalization matrix of the X-axis is substituted into the SVM algorithm, which seeks a hyperplane in the X direction that separates, as far as possible, the rows with xi greater than 0 from those with xi less than 0. The hyperplane obtained is composed of a subset of the acoustic signal features and musical score features: the hyperplane of the X-axis involves the features ap1 … api of the acoustic signal features and bq1 … bqi of the musical score features, where api is the pi-th acoustic signal feature, bqi is the qi-th musical score feature, and p1 … pi∈[1,n]; q1 … qi∈[1,m]. Similarly, the hyperplane of the Y-axis involves the features ar1 … ari of the acoustic signal features and bs1 … bsi of the musical score features, where ari is the ri-th acoustic signal feature, bsi is the si-th musical score feature, and r1 … ri∈[1,n]; s1 … si∈[1,m].
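The hyperplane-seeking step can be illustrated with a minimal sub-gradient linear SVM in pure Python. This is a stand-in for the unspecified "SVM algorithm" of the text (a real system would use an SVM library), and the toy data are invented:

```python
import random

def train_linear_svm(rows, labels, lr=0.05, lam=0.01, epochs=200, seed=0):
    """Sub-gradient descent on L2-regularized hinge loss; labels are +/-1.
    Returns the weight vector and bias of the separating hyperplane."""
    random.seed(seed)
    w, b = [0.0] * len(rows[0]), 0.0
    idx = list(range(len(rows)))
    for _ in range(epochs):
        random.shuffle(idx)
        for i in idx:
            margin = labels[i] * (sum(wj * xj for wj, xj in zip(w, rows[i])) + b)
            if margin < 1:  # inside the margin: hinge-loss sub-gradient step
                w = [wj - lr * (lam * wj - labels[i] * xj)
                     for wj, xj in zip(w, rows[i])]
                b += lr * labels[i]
            else:           # outside the margin: regularization only
                w = [wj - lr * lam * wj for wj in w]
    return w, b

# toy set: the label is the sign of feature 0; feature 1 is noise
rows = [[1.0, 0.3], [0.8, -0.2], [-0.9, 0.1], [-1.1, -0.4]]
labels = [1, 1, -1, -1]
w, b = train_linear_svm(rows, labels)
preds = [1 if sum(wj * xj for wj, xj in zip(w, r)) + b >= 0 else -1
         for r in rows]
print(preds)  # [1, 1, -1, -1]
```

The features with the largest |w| components would be the ones "composing" the hyperplane in the sense used above.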

Step 406 is specifically: substitute the first training hyperplane and the first training matrix into the SVM algorithm to obtain a first emotion recognition model for determining a first coordinate recognition value; substitute the second training hyperplane and the second training matrix into the SVM algorithm to obtain a second emotion recognition model for determining a second coordinate recognition value. Here, the first coordinate recognition value represents the pressure factor and the second coordinate recognition value represents the energy factor. According to the X-axis hyperplane obtained, involving the features ap1 … api, bq1 … bqi with p1 … pi∈[1,n] and q1 … qi∈[1,m], the corresponding feature columns of the first training matrix are brought into the SVM algorithm to obtain the training model of the X-axis, denoted TX. The training model of the Y-axis, denoted TY, is obtained in the same way.
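The final step — fitting a model TX that maps a normalized feature row to a coordinate value — can be sketched with a simple gradient-descent linear regressor standing in for the SVM-based training (an assumption; the text does not specify which SVM variant is used, and all names and data are illustrative):

```python
def train_coordinate_model(rows, targets, lr=0.05, epochs=2000):
    """Stochastic gradient descent on squared error: learns w, b such that
    w.x + b approximates the annotated coordinate value for each row."""
    w, b = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(rows, targets):
            pred = sum(wj * xj for wj, xj in zip(w, x)) + b
            err = pred - t
            w = [wj - lr * err * xj for wj, xj in zip(w, x)]
            b -= lr * err
    return lambda x: sum(wj * xj for wj, xj in zip(w, x)) + b

# toy data: the annotated X coordinate happens to equal half of feature 0
rows = [[-1.0, 0.2], [0.0, -0.1], [1.0, 0.4]]
T_X = train_coordinate_model(rows, [-0.5, 0.0, 0.5])
print(round(T_X([1.0, 0.4]), 2))  # 0.5
```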

In the above optional embodiment, the first and second coordinate axes form a plane rectangular coordinate system whose quadrants correspond one-to-one to the singing emotion types. The embodiments of the present application can obtain a first emotion recognition model capable of determining the first coordinate value and a second emotion recognition model capable of determining the second coordinate value, so that a terminal device executing this method, or another terminal device that indirectly invokes it, can determine the singing emotion type corresponding to the singing audio to be recognized according to the first and second coordinate values, recognizing a singer's singing emotion type from the singer's singing audio and identifying the singer's emotion from the singing itself.

The embodiments of the present application can obtain the first emotion recognition model TX for determining the first coordinate value and the second emotion recognition model TY for determining the second coordinate value, so that a terminal device executing this method, or another terminal device that indirectly invokes it, can determine the singing emotion type corresponding to the singing audio to be recognized according to the first and second coordinate values, recognizing a singer's singing emotion type from the singer's singing audio and identifying the singer's emotion from the singing itself.

The affective features extracted in this embodiment differ, in terms of feature extraction, from those used in speech emotion recognition and music emotion recognition: speech emotion recognition only needs to extract audio features such as tone and speech rate and does not involve the extraction of musical score features; music emotion recognition extracts both audio features and musical score features, but does not involve the extraction of spectrogram features (included here among the acoustic signal features). Therefore, compared with existing speech emotion recognition and music emotion recognition, the emotion recognition model of this embodiment, obtained from affective features that include both acoustic signal features and musical score features, can identify the singing emotion of the corresponding singer from the score features and the acoustic signal features; for the same song, it can recognize the singing emotion of different singers, identifying the singer's emotion more accurately.

Embodiment six

Based on the preceding embodiments, Fig. 6A to Fig. 6C show flowcharts of another optional singing emotion recognition method of the embodiments of the present application. The application can be applied to a terminal device, and can also be applied to an emotion recognition model establishing apparatus; the apparatus can be arranged in the terminal device in the form of software, hardware, or a combination of software and hardware. The following description takes a terminal device as the execution subject; the method shown in this embodiment can be implemented as follows.

In this embodiment, the pressure dimension is represented by a first coordinate axis and the energy dimension by a second coordinate axis; the first and second coordinate axes form a plane rectangular coordinate system whose quadrants correspond one-to-one to the emotion types. Specifically, the first coordinate axis can be the X-axis and the second coordinate axis the Y-axis. The singing emotion types corresponding to the quadrants of the plane rectangular coordinate system are: tense/anxious, happy/cheerful, sad/depressed, and natural/calm, with the correspondence: first quadrant, tense/anxious; second quadrant, happy/cheerful; third quadrant, sad/depressed; fourth quadrant, natural/calm.

Step 600: extract the affective features of the singing audio to be recognized, where the affective features can include acoustic signal features and musical score features.

Step 602: determine, according to the affective features and a first emotion recognition model, the pressure factor of the affective features on the pressure dimension; determine, according to the affective features and a second emotion recognition model, the energy factor of the affective features on the energy dimension; the pressure factor and the energy factor are used to determine the emotion type. In this embodiment, the first and second emotion recognition models are established as in the preceding embodiments; see Embodiment five for the specific model establishing process.

Specifically, the coordinate values of the affective features of the singing audio to be recognized on the first and second coordinate axes are determined according to its acoustic signal features and musical score features, obtaining a first coordinate value characterizing the pressure factor and a second coordinate value characterizing the energy factor.

As shown in Fig. 6B, in one feasible embodiment, step 602 obtains the first coordinate value as follows.

Step 6020: obtain a first feature matrix based on the first coordinate axis according to the acoustic signal features and musical score features of the singing audio to be recognized and the first training matrix based on the first coordinate axis. Specifically, the row (Ag,1 … Ag,n Bg,1 … Bg,m 0), formed from the acoustic signal features (Ag,1 … Ag,n) and musical score features (Bg,1 … Bg,m), is appended as the last row of the first training matrix based on the X-axis, yielding the first feature matrix. The first training matrix is determined from the acoustic signal features and musical score features of the preset training singing audio and the first training coordinate values of the affective features of the training singing audio on the first coordinate axis.

In step 6022, the first feature matrix is normalized to obtain a first normalization matrix, from which the matrix of the affective features of the singing audio to be recognized, normalized against the first training matrix, is obtained. Specifically, the data in the first feature matrix are normalized column by column to obtain the first normalization matrix, and the last row of that matrix is then extracted, yielding the normalized affective feature matrix (agx,1 … agx,n bgx,1 … bgx,m) of the singing audio to be recognized.

In step 6024, the normalized affective feature matrix of the singing audio to be recognized, the first training hyperplane based on the first coordinate axis, and the first emotion recognition model are substituted into the SVM algorithm, obtaining the first coordinate value of the affective features of the singing audio to be recognized in the direction of the first coordinate axis. Specifically, (agx,1 … agx,n bgx,1 … bgx,m), the first training hyperplane, and the first emotion recognition model TX of the X-axis are substituted into the SVM algorithm to obtain the first coordinate value Xg of the affective features (Ag,1 … Ag,n Bg,1 … Bg,m) of the singing audio to be recognized in the X direction; here api is the pi-th acoustic signal feature of the training data, bqi the qi-th musical score feature, p1 … pi∈[1,n], q1 … qi∈[1,m], n is the number of acoustic signal features, m the number of musical score features, and L the number of training songs. The first training hyperplane is determined from the first training normalization matrix obtained by normalizing the first training matrix; the first training matrix is determined from the first training coordinate values and the affective features of the training singing audio; and the training model based on the first coordinate axis is determined from the first training hyperplane and the first training normalization matrix.
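Steps 6020-6024 for the X-axis amount to: append the unknown song's feature row (with a placeholder in the coordinate column) to the first training matrix, normalize column by column, and take the last row as the input to the trained model. A sketch under those assumptions — the min-max scaling and all names are illustrative:

```python
def normalized_test_row(training_matrix, test_features):
    """Append (test_features, 0) to the training matrix, scale each column
    of the joint matrix to [-1, 1], and return the normalized test row
    without the placeholder coordinate column."""
    joint = [row[:] for row in training_matrix] + [test_features + [0.0]]
    cols = list(zip(*joint))
    out = []
    for col in cols:
        lo, hi = min(col), max(col)
        out.append([0.0] * len(col) if hi == lo
                   else [2 * (v - lo) / (hi - lo) - 1 for v in col])
    last = [c[-1] for c in out]   # the appended test row, normalized
    return last[:-1]              # drop the placeholder coordinate column

train = [[0.2, 120.0, 0.7], [0.8, 90.0, -0.4]]   # n+m features + X coordinate
row = normalized_test_row(train, [0.5, 100.0])
print(row)  # approximately [0.0, -0.333]
```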

As shown in Fig. 6C, in one feasible embodiment, step 602 obtains the second coordinate value as follows.

Step 6020': obtain a second feature matrix based on the second coordinate axis according to the acoustic signal features and musical score features of the singing audio to be recognized and the second training matrix based on the second coordinate axis. Specifically, the row (Ag,1 … Ag,n Bg,1 … Bg,m 0), formed from the acoustic signal features (Ag,1 … Ag,n) and musical score features (Bg,1 … Bg,m), is appended as the last row of the second training matrix based on the Y-axis, yielding the second feature matrix. The second training matrix is determined from the acoustic signal features and musical score features of the preset training singing audio and the second training coordinate values of the affective features of the training singing audio on the second coordinate axis.

In step 6022', the second feature matrix is normalized to obtain a second normalization matrix, from which the matrix of the affective features of the singing audio to be recognized, normalized against the second training matrix, is obtained. Specifically, the data in the second feature matrix are normalized column by column to obtain the second normalization matrix, and the last row of that matrix is then extracted, yielding the normalized affective feature matrix (agy,1 … agy,n bgy,1 … bgy,m) of the singing audio to be recognized.

In step 6024', the normalized affective feature matrix of the singing audio to be recognized, the second training hyperplane based on the second coordinate axis, and the second emotion recognition model are substituted into the SVM algorithm, obtaining the second coordinate value of the affective features of the singing audio to be recognized in the direction of the second coordinate axis. Specifically, (agy,1 … agy,n bgy,1 … bgy,m), the second training hyperplane, and the second emotion recognition model TY of the Y-axis are substituted into the SVM algorithm to obtain the second coordinate value Yg of the affective features (Ag,1 … Ag,n Bg,1 … Bg,m) of the singing audio to be recognized in the Y direction; here ari is the ri-th acoustic signal feature of the training data, bsi the si-th musical score feature, r1 … ri∈[1,n], s1 … si∈[1,m], n is the number of acoustic signal features, m the number of musical score features, and L the number of training songs. The second training hyperplane is determined from the second training normalization matrix obtained by normalizing the second training matrix; the second training matrix is determined from the second training coordinate values and the affective features of the training singing audio; and the training model based on the second coordinate axis is determined from the second training hyperplane and the second training normalization matrix.

It should be understood that steps 6020 and 6020' have no fixed execution order and can be performed in parallel; likewise for steps 6022 and 6022', and for steps 6024 and 6024'.

Step 604: determine, according to the pressure factor and the energy factor, the singing emotion type corresponding to the singing audio to be recognized. Specifically, the singing emotion type corresponding to the affective features of the singing audio to be recognized is determined according to the first and second coordinate values.

In the embodiments of the present application, a terminal device executing this method, or another terminal device that indirectly invokes it, can determine the pressure factor and the energy factor of the singing audio to be recognized according to the first and second emotion recognition models, and in turn determine the singing emotion type corresponding to that audio. The embodiments of the present application can thus recognize a singer's singing emotion type from the singer's singing audio, identifying the singer's emotion from the singing itself.

In the embodiments of the present application, the first and second coordinate axes form a plane rectangular coordinate system whose quadrants correspond one-to-one to the singing emotion types. The embodiments of the present application determine, from the acoustic signal features and musical score features of the singing audio to be recognized, the coordinate values of its affective features on the first and second coordinate axes, and determine the singing emotion type corresponding to those affective features according to the first and second coordinate values, recognizing a singer's singing emotion type from the singer's singing audio and identifying the singer's emotion from the singing itself.

Embodiment seven

Referring to Fig. 7, which shows a singing emotion recognition method provided by the embodiments of the present application: the application can be applied to a terminal device, and can also be applied to an emotion recognition model establishing apparatus; the apparatus can be arranged in the terminal device in the form of software, hardware, or a combination of software and hardware. The following description takes a terminal device as the execution subject; the method shown in Fig. 7 can be implemented as follows.

Step 700: obtain the user's singing audio. In this step, the terminal device may obtain the singing audio sung by the user directly from local storage, a storage device, or the network.

Step 702: when the emotion type corresponding to the user's singing audio is recognized to be consistent with a preset music emotion, output a corresponding singing output control instruction.

In this step, the emotion type of the user's singing audio can be recognized by speech emotion recognition and music emotion recognition, or by the method of any one of the preceding Embodiments one to six.

In an optional embodiment, the singing output control instruction includes at least one of: a singing bonus-point control instruction and a lighting control instruction. For example, when a user is singing at a KTV and the KTV equipment recognizes that the emotion type corresponding to the user's singing audio is consistent with the preset music emotion (assuming the preset music emotion is happy/cheerful), a singing bonus-point control instruction is output so that bonus points are added to the singing score displayed by the KTV equipment. As another example, when the KTV equipment recognizes that the emotion type corresponding to the user's singing audio is consistent with the preset music emotion (assuming the preset music emotion is sad/depressed), a lighting control instruction is output to perform lighting control of the lighting devices connected to the KTV equipment; specifically, the connected lighting devices can be controlled to output blue light, to embody a sad, dejected scene.
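The control logic of this step can be sketched as follows; the instruction encodings and function name are hypothetical, with only the bonus-point and blue-light behaviors taken from the examples above:

```python
def output_control(recognized: str, preset: str):
    """Return a control instruction when the recognized emotion matches the
    preset music emotion, else None. Instruction dicts are illustrative."""
    if recognized != preset:
        return None
    if preset == "happy/cheerful":
        return {"type": "score_bonus"}                    # bonus-point control
    if preset == "sad/depressed":
        return {"type": "light_control", "color": "blue"} # blue light, per text
    return {"type": "light_control", "color": "default"}  # assumed fallback

print(output_control("sad/depressed", "sad/depressed"))
# {'type': 'light_control', 'color': 'blue'}
```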

Embodiment eight

Referring to Fig. 8, this embodiment provides a singing emotion recognition apparatus, including:

a training module 800, configured to extract the affective features of training singing audio and train an emotion recognition model; the affective features include acoustic signal features and musical score features;

an extraction module 801, configured to extract the affective features of singing audio to be recognized;

a recognition module 802, configured to input the affective features of the singing audio to be recognized into the emotion recognition model and recognize the emotion of the singing audio to be recognized.

Optionally, the acoustic signal features include at least one of the following: average energy, energy standard deviation, average fundamental frequency, fundamental frequency standard deviation, number of frequencies exceeding the average fundamental frequency, average spectral centroid, spectral centroid standard deviation, MFCC features, and spectrogram features. The musical score features include at least one of the following: beats per minute, major/minor key type, mode, average pitch, pitch standard deviation, and average duration of each note.
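A few of the listed acoustic signal features can be sketched with plain NumPy. The frame length, the synthetic test tone, and the exact formulas here are assumptions for illustration; a real system would add fundamental-frequency, MFCC, and musical score features on top of this:

```python
import numpy as np

# Minimal sketch of three of the acoustic features named above: average
# energy, energy standard deviation, and average spectral centroid, computed
# over non-overlapping frames of a mono signal.

def frame_features(x, sr, frame_len=1024):
    frames = [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, frame_len)]
    energies = [np.mean(f ** 2) for f in frames]
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    centroids = []
    for f in frames:
        mag = np.abs(np.fft.rfft(f))
        centroids.append(np.sum(freqs * mag) / (np.sum(mag) + 1e-12))
    return {
        "avg_energy": float(np.mean(energies)),
        "energy_std": float(np.std(energies)),
        "avg_centroid": float(np.mean(centroids)),
    }

sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 437.5 * t)  # 437.5 Hz is exactly bin-centred for 1024-sample frames
feats = frame_features(tone, sr)
print(feats)  # avg_centroid sits at 437.5 Hz, avg_energy at 0.5 for this pure tone
```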

Optionally, the training module includes:

a training coordinate value determining unit for determining the training coordinate values of the affective features of the singing audio to be trained on a first coordinate axis and a second coordinate axis, obtaining a first training coordinate value and a second training coordinate value, where the first coordinate axis and the second coordinate axis form a plane rectangular coordinate system whose quadrants correspond one-to-one to singing emotion types;

a training matrix determining unit for building a first training matrix from the first training coordinate value and the affective features of the singing audio to be trained, and building a second training matrix from the second training coordinate value and the affective features of the singing audio to be trained;

a training normalization matrix determining unit for normalizing the first training matrix into a first training normalization matrix and normalizing the second training matrix into a second training normalization matrix;

a training hyperplane determining unit for substituting the first training normalization matrix and the second training normalization matrix into an SVM algorithm, obtaining a first training hyperplane and a second training hyperplane respectively; and

an emotion recognition model determining unit for substituting the first training hyperplane and the first training normalization matrix into the SVM algorithm to obtain a first emotion recognition model based on the first coordinate axis, and substituting the second training hyperplane and the second training normalization matrix into the SVM algorithm to obtain a second emotion recognition model based on the second coordinate axis.
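The per-axis training idea above (one max-margin linear separator per coordinate axis, so that each model's decision value gives the sign of one coordinate) can be sketched as follows. A tiny hinge-loss sub-gradient routine stands in for a full SVM solver here; this substitution, the synthetic data, and all parameter values are assumptions for illustration, not the patent's exact algorithm:

```python
import numpy as np

# One linear classifier is trained per coordinate axis on the (normalized)
# feature matrix. The sign of each decision value then supplies the sign of
# the corresponding coordinate.

def train_axis(X, y, epochs=300, lr=0.05, lam=0.01):
    """Hinge-loss sub-gradient descent; stands in for an SVM solver."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) < 1.0:        # inside the margin: push out
                w += lr * (yi * xi - lam * w)
                b += lr * yi
            else:                               # only shrink (regularization)
                w -= lr * lam * w
    return w, b

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 2))     # 80 synthetic training songs, 2 features
y_x = np.sign(X[:, 0])           # label for the "X-axis" model
y_y = np.sign(X[:, 1])           # label for the "Y-axis" model
wx, bx = train_axis(X, y_x)
wy, by = train_axis(X, y_y)

sample = np.array([0.8, -0.9])
coords = (np.sign(sample @ wx + bx), np.sign(sample @ wy + by))
print(coords)  # the signs of the two decision values pick the quadrant
```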

Optionally, the identification module includes:

an input unit for inputting the affective features of the singing audio to be identified into the first emotion recognition model and the second emotion recognition model respectively, and determining a first coordinate value of the affective features based on the first coordinate axis and a second coordinate value based on the second coordinate axis; and

a determining unit for determining, from the first coordinate value and the second coordinate value, the quadrant to which the affective features correspond, so as to determine the singing emotion type corresponding to the affective features.
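The quadrant lookup performed by the determining unit reduces to a sign test on the two coordinate values. The sketch below hard-codes the quadrant-to-emotion mapping stated later in this embodiment (first quadrant tense/anxious, second happy/cheerful, third sad/dejected, fourth natural/calm); the label strings are illustrative:

```python
# Map the pair of coordinate values to a quadrant, and the quadrant to a
# singing emotion type, one-to-one as described in this embodiment.

def quadrant_emotion(x, y):
    if x > 0 and y > 0:
        return "tense-anxious"      # first quadrant
    if x < 0 and y > 0:
        return "happy-cheerful"     # second quadrant
    if x < 0 and y < 0:
        return "sad-dejected"       # third quadrant
    return "natural-calm"           # fourth quadrant

print(quadrant_emotion(-0.4, 0.7))  # → happy-cheerful
```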

Optionally, the determining unit is specifically configured to:

build a first feature matrix from the first training matrix and the affective features of the singing audio to be identified; normalize the first feature matrix to obtain a first normalization matrix, thereby obtaining the affective features of the singing audio to be identified as normalized against the first training matrix; and substitute the normalized affective features, the first training hyperplane, and the first emotion recognition model into the SVM algorithm to obtain the first coordinate value of the affective features in the direction of the first coordinate axis.

Optionally, the first coordinate axis is the X axis, and the determining unit is specifically configured to:

append the row vector (Ag,1 … Ag,n Bg,1 … Bg,m 0), formed from the acoustic signal features (Ag,1 … Ag,n) and the musical score features (Bg,1 … Bg,m), as the last row of the X-axis-based first training matrix, obtaining the first feature matrix;

normalize the data of the first feature matrix column by column to obtain the first normalization matrix, then extract the last row of that matrix, obtaining the affective features of the singing audio to be identified as normalized against the first training matrix, (agx,1 … agx,n bgx,1 … bgx,m); and

substitute (agx,1 … agx,n bgx,1 … bgx,m), the first training hyperplane, and the first emotion recognition model T_X of the X axis into the SVM algorithm, obtaining the first coordinate value X_g of the affective features (Ag,1 … Ag,n Bg,1 … Bg,m) in the X-axis direction. The first training matrix has one row per training song, L rows in total, each row containing the p_i-th acoustic signal features and the q_i-th musical score features of a training song, with p_1 … p_i ∈ [1, n] and q_1 … q_i ∈ [1, m], where n is the number of acoustic signal features and m is the number of musical score features.
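The append-normalize-extract step described above can be sketched as follows. Column-wise min-max scaling is an assumption here, since the text does not fix the normalization formula:

```python
import numpy as np

# The feature row of the audio to be identified is appended as the last row
# of the training matrix, every column is normalized, and the last row is
# read back as the normalized feature vector.

def normalize_against_training(train_matrix, feature_row):
    stacked = np.vstack([train_matrix, feature_row])   # append as last row
    lo, hi = stacked.min(axis=0), stacked.max(axis=0)
    normed = (stacked - lo) / np.where(hi > lo, hi - lo, 1.0)
    return normed[-1]                                  # normalized last row

train = np.array([[0.0, 10.0], [4.0, 30.0], [8.0, 50.0]])
row = np.array([2.0, 40.0])
print(normalize_against_training(train, row))  # → [0.25 0.75]
```

Normalizing the query row jointly with the training matrix, rather than against stored statistics, matches the description above: the training data define the scale of every column, and the appended row is rescaled onto it.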

Optionally, the determining unit is further specifically configured to:

build a second feature matrix from the second training matrix and the affective features of the singing audio to be identified; normalize the second feature matrix to obtain a second normalization matrix, thereby obtaining the affective features of the singing audio to be identified as normalized against the second training matrix; and substitute the normalized affective features, the second training hyperplane, and the second emotion recognition model into the SVM algorithm to obtain the second coordinate value of the affective features in the direction of the second coordinate axis.

Optionally, the second coordinate axis is the Y axis, and the determining unit is specifically configured to:

append the row vector (Ag,1 … Ag,n Bg,1 … Bg,m 0), formed from the acoustic signal features (Ag,1 … Ag,n) and the musical score features (Bg,1 … Bg,m), as the last row of the Y-axis-based second training matrix, obtaining the second feature matrix;

normalize the data of the second feature matrix column by column to obtain the second normalization matrix, then extract the last row of that matrix, obtaining the affective features of the singing audio to be identified as normalized against the second training matrix, (agy,1 … agy,n bgy,1 … bgy,m); and

substitute (agy,1 … agy,n bgy,1 … bgy,m), the second training hyperplane, and the second emotion recognition model T_Y of the Y axis into the SVM algorithm, obtaining the second coordinate value Y_g of the affective features (Ag,1 … Ag,n Bg,1 … Bg,m) in the Y-axis direction. The second training matrix has one row per training song, L rows in total, each row containing the r_i-th acoustic signal features and the s_i-th musical score features of a training song, with r_1 … r_i ∈ [1, n] and s_1 … s_i ∈ [1, m], where n is the number of acoustic signal features and m is the number of musical score features.

Optionally, the singing emotion types corresponding to the quadrants of the plane rectangular coordinate system include: tense and anxious, happy and cheerful, sad and dejected, and naturally calm.

Optionally, the correspondence between the quadrants of the plane rectangular coordinate system and the singing emotion types is as follows:

the first quadrant corresponds to tense and anxious, the second quadrant to happy and cheerful, the third quadrant to sad and dejected, and the fourth quadrant to naturally calm.

The features of this device embodiment correspond one-to-one to those of the method in the foregoing embodiments, and the relevant modules/units can carry out the corresponding method flows in the foregoing embodiments; reference may therefore be made to the relevant description of the method flows in the foregoing embodiments, which is not repeated here.

An embodiment of the present application also provides an electronic terminal that includes the singing emotion recognition device provided by the foregoing embodiments. The features of this device embodiment correspond one-to-one to those of the method in the foregoing embodiments, so reference may be made to the relevant description of the method flows in the foregoing embodiments, which is not repeated here.

Embodiment 9

Referring to Fig. 9, this embodiment provides a singing recognition device, comprising:

an acquisition module 901 for acquiring the user's singing audio; and

an identification module 902 for outputting the corresponding singing output control instruction when the affective type corresponding to the user's singing audio is recognized as matching the preset music emotion.

Optionally, the singing output control instruction includes at least one of the following: a singing score-bonus control instruction and a lighting control instruction.

The features of this device embodiment correspond one-to-one to those of the method in the foregoing embodiments, and the relevant modules/units can carry out the corresponding method flows in the foregoing embodiments; reference may therefore be made to the relevant description of the method flows in the foregoing embodiments, which is not repeated here.

Referring to Figure 10, an embodiment of the present application also provides an electronic terminal, including:

a memory 1000;

one or more processors 1003; and

one or more modules 1001, stored in the memory and configured to be controlled by the one or more processors, the one or more modules being configured to perform instructions for the following steps:

extracting the affective features of singing audio to be trained and training an emotion recognition model, where the affective features include acoustic signal features and musical score features;

extracting the affective features of singing audio to be identified; and

inputting the affective features of the singing audio to be identified into the emotion recognition model to identify the emotion of the singing audio to be identified.

In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface, and memory.

The memory may take the form of volatile memory on a computer-readable medium, random access memory (RAM), and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media exclude transitory media such as modulated data signals and carrier waves.

Certain terms are used throughout the description and claims to refer to particular components. Those skilled in the art should understand that hardware manufacturers may refer to the same component by different names. This description and the claims distinguish components not by differences in name but by differences in function. "Comprising", as used throughout the description and claims, is an open-ended term and should be interpreted as "including but not limited to". "Substantially" means within an acceptable error range, within which a person skilled in the art can solve the stated technical problem and basically achieve the stated technical effect. Furthermore, "coupled" here encompasses any means of direct or indirect electrical coupling; thus, if a first device is described as being coupled to a second device, the first device may be directly electrically coupled to the second device or indirectly electrically coupled to it through other devices or coupling means. The subsequent description sets out preferred embodiments for implementing the application; it is given for the purpose of illustrating the general principles of the application and is not intended to limit the scope of the application, whose scope of protection is defined by the appended claims.

It should also be noted that the terms "include", "comprise", and any variants thereof are intended to cover a non-exclusive inclusion, so that a product or system that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a product or system. In the absence of further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the product or system that includes it.

The above illustrates and describes some preferred embodiments of the present invention, but, as stated earlier, the invention is not limited to the forms disclosed herein; these are not to be taken as excluding other embodiments, and the invention may be used in various other combinations, modifications, and environments and may be altered within the scope of the inventive concept described herein, through the above teachings or the skill and knowledge of the related art. All changes and modifications made by those skilled in the art that do not depart from the spirit and scope of the invention shall fall within the protection scope of the appended claims of the invention.

Claims (17)

1. A singing emotion recognition method, characterized by comprising:
extracting the affective features of singing audio to be trained, and training to obtain an emotion recognition model, wherein the affective features include acoustic signal features and musical score features;
extracting the affective features of singing audio to be identified; and
inputting the affective features of the singing audio to be identified into the emotion recognition model to identify the emotion of the singing audio to be identified.
2. The singing emotion recognition method according to claim 1, characterized in that the acoustic signal features include at least one of the following: average energy, energy standard deviation, average fundamental frequency, fundamental frequency standard deviation, number of frequencies exceeding the average fundamental frequency, average spectral centroid, spectral centroid standard deviation, MFCC features, and spectrogram features; and the musical score features include at least one of the following: beats per minute, major/minor key type, mode, average pitch, pitch standard deviation, and average duration of each note.
3. The singing emotion recognition method according to claim 1, characterized in that the training of the emotion recognition model comprises:
determining the training coordinate values of the affective features of the singing audio to be trained on a first coordinate axis and a second coordinate axis, obtaining a first training coordinate value and a second training coordinate value, wherein the first coordinate axis and the second coordinate axis form a plane rectangular coordinate system whose quadrants correspond one-to-one to singing emotion types;
building a first training matrix from the first training coordinate value and the affective features of the singing audio to be trained, and building a second training matrix from the second training coordinate value and the affective features of the singing audio to be trained;
normalizing the first training matrix into a first training normalization matrix, and normalizing the second training matrix into a second training normalization matrix;
substituting the first training normalization matrix and the second training normalization matrix into an SVM algorithm, obtaining a first training hyperplane and a second training hyperplane respectively; and
substituting the first training hyperplane and the first training normalization matrix into the SVM algorithm to obtain a first emotion recognition model based on the first coordinate axis, and substituting the second training hyperplane and the second training normalization matrix into the SVM algorithm to obtain a second emotion recognition model based on the second coordinate axis.
4. The singing emotion recognition method according to claim 3, characterized in that inputting the affective features of the singing audio to be identified into the emotion recognition model and identifying the emotion of the singing audio to be identified comprises:
inputting the affective features of the singing audio to be identified into the first emotion recognition model and the second emotion recognition model respectively, and determining a first coordinate value of the affective features based on the first coordinate axis and a second coordinate value based on the second coordinate axis; and
determining, from the first coordinate value and the second coordinate value, the quadrant to which the affective features correspond, so as to determine the singing emotion type corresponding to the affective features.
5. The singing emotion recognition method according to claim 4, characterized in that inputting the affective features of the singing audio to be identified into the emotion recognition model and determining the first coordinate value of the affective features based on the first coordinate axis comprises:
building a first feature matrix from the first training matrix and the affective features of the singing audio to be identified; normalizing the first feature matrix to obtain a first normalization matrix, thereby obtaining the affective features of the singing audio to be identified as normalized against the first training matrix; and substituting the normalized affective features, the first training hyperplane, and the first emotion recognition model into the SVM algorithm to obtain the first coordinate value of the affective features in the direction of the first coordinate axis.
6. The singing emotion recognition method according to claim 4, characterized in that inputting the affective features of the singing audio to be identified into the emotion recognition model and determining the second coordinate value of the affective features based on the second coordinate axis comprises:
building a second feature matrix from the second training matrix and the affective features of the singing audio to be identified; normalizing the second feature matrix to obtain a second normalization matrix, thereby obtaining the affective features of the singing audio to be identified as normalized against the second training matrix; and substituting the normalized affective features, the second training hyperplane, and the second emotion recognition model into the SVM algorithm to obtain the second coordinate value of the affective features in the direction of the second coordinate axis.
7. The singing emotion recognition method according to claim 3, characterized in that the singing emotion types corresponding to the quadrants of the plane rectangular coordinate system include: tense and anxious, happy and cheerful, sad and dejected, and naturally calm; and the correspondence between the quadrants of the plane rectangular coordinate system and the singing emotion types is: the first quadrant corresponds to tense and anxious, the second quadrant to happy and cheerful, the third quadrant to sad and dejected, and the fourth quadrant to naturally calm.
8. A singing emotion recognition device, characterized by comprising:
a training module for extracting the affective features of singing audio to be trained and training to obtain an emotion recognition model, wherein the affective features include acoustic signal features and musical score features;
an extraction module for extracting the affective features of singing audio to be identified; and
an identification module for inputting the affective features of the singing audio to be identified into the emotion recognition model and identifying the emotion of the singing audio to be identified.
9. The singing emotion recognition device according to claim 8, characterized in that the acoustic signal features include at least one of the following: average energy, energy standard deviation, average fundamental frequency, fundamental frequency standard deviation, number of frequencies exceeding the average fundamental frequency, average spectral centroid, spectral centroid standard deviation, MFCC features, and spectrogram features; and the musical score features include at least one of the following: beats per minute, major/minor key type, mode, average pitch, pitch standard deviation, and average duration of each note.
10. The singing emotion recognition device according to claim 8, characterized in that the training module includes:
a training coordinate value determining unit for determining the training coordinate values of the affective features of the singing audio to be trained on a first coordinate axis and a second coordinate axis, obtaining a first training coordinate value and a second training coordinate value, wherein the first coordinate axis and the second coordinate axis form a plane rectangular coordinate system whose quadrants correspond one-to-one to singing emotion types;
a training matrix determining unit for building a first training matrix from the first training coordinate value and the affective features of the singing audio to be trained, and building a second training matrix from the second training coordinate value and the affective features of the singing audio to be trained;
a training normalization matrix determining unit for normalizing the first training matrix into a first training normalization matrix and normalizing the second training matrix into a second training normalization matrix;
a training hyperplane determining unit for substituting the first training normalization matrix and the second training normalization matrix into an SVM algorithm, obtaining a first training hyperplane and a second training hyperplane respectively; and
an emotion recognition model determining unit for substituting the first training hyperplane and the first training normalization matrix into the SVM algorithm to obtain a first emotion recognition model based on the first coordinate axis, and substituting the second training hyperplane and the second training normalization matrix into the SVM algorithm to obtain a second emotion recognition model based on the second coordinate axis.
11. The singing emotion recognition device according to claim 10, characterized in that the identification module includes:
an input unit for inputting the affective features of the singing audio to be identified into the first emotion recognition model and the second emotion recognition model respectively, and determining a first coordinate value of the affective features based on the first coordinate axis and a second coordinate value based on the second coordinate axis; and
a determining unit for determining, from the first coordinate value and the second coordinate value, the quadrant to which the affective features correspond, so as to determine the singing emotion type corresponding to the affective features.
12. The singing emotion recognition device according to claim 11, characterized in that the determining unit is specifically configured to: build a first feature matrix from the first training matrix and the affective features of the singing audio to be identified; normalize the first feature matrix to obtain a first normalization matrix, thereby obtaining the affective features of the singing audio to be identified as normalized against the first training matrix; and substitute the normalized affective features, the first training hyperplane, and the first emotion recognition model into the SVM algorithm to obtain the first coordinate value of the affective features in the direction of the first coordinate axis.
13. The singing emotion recognition device according to claim 11, characterized in that the determining unit is specifically configured to: build a second feature matrix from the second training matrix and the affective features of the singing audio to be identified; normalize the second feature matrix to obtain a second normalization matrix, thereby obtaining the affective features of the singing audio to be identified as normalized against the second training matrix; and substitute the normalized affective features, the second training hyperplane, and the second emotion recognition model into the SVM algorithm to obtain the second coordinate value of the affective features in the direction of the second coordinate axis.
14. The singing emotion recognition device according to claim 10, characterized in that the singing emotion types corresponding to the quadrants of the plane rectangular coordinate system include: tense and anxious, happy and cheerful, sad and dejected, and naturally calm; and the correspondence between the quadrants of the plane rectangular coordinate system and the singing emotion types is: the first quadrant corresponds to tense and anxious, the second quadrant to happy and cheerful, the third quadrant to sad and dejected, and the fourth quadrant to naturally calm.
15. A singing recognition method, characterized by comprising:
acquiring a user's singing audio; and
outputting a corresponding singing output control instruction when the affective type corresponding to the user's singing audio is recognized as matching a preset music emotion.
16. The singing recognition method according to claim 15, characterized in that the singing output control instruction includes at least one of the following: a singing score-bonus control instruction and a lighting control instruction.
17. A singing recognition device, characterized by comprising:
an acquisition module for acquiring a user's singing audio; and
an identification module for outputting a corresponding singing output control instruction when the affective type corresponding to the user's singing audio is recognized as matching a preset music emotion, wherein the singing output control instruction includes at least one of the following: a singing score-bonus control instruction and a lighting control instruction.
CN201610517375.4A 2016-06-30 2016-07-02 A kind of performance emotion identification method and device CN106128479B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201610506767 2016-06-30
CN2016105067670 2016-06-30

Publications (2)

Publication Number Publication Date
CN106128479A true CN106128479A (en) 2016-11-16
CN106128479B CN106128479B (en) 2019-09-06

Family

ID=57468267

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610517375.4A CN106128479B (en) 2016-06-30 2016-07-02 A kind of performance emotion identification method and device

Country Status (1)

Country Link
CN (1) CN106128479B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108174498A (en) * 2017-12-28 2018-06-15 福建海媚数码科技有限公司 A kind of control method and system of the scene lamp based on intelligent Matching
CN108899046A (en) * 2018-07-12 2018-11-27 东北大学 A kind of speech-emotion recognition method and system based on Multistage Support Vector Machine classification
CN108986843A (en) * 2018-08-10 2018-12-11 杭州网易云音乐科技有限公司 Audio data processing method and device, medium and calculating equipment
CN109120992A (en) * 2018-09-13 2019-01-01 北京金山安全软件有限公司 Video generation method and its device, electronic equipment, storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH10187178A (en) * 1996-10-28 1998-07-14 Omron Corp Feeling analysis device for singing and grading device
CN1975856A (en) * 2006-10-30 2007-06-06 邹采荣 Speech emotion identifying method based on supporting vector machine
CN101599271A (en) * 2009-07-07 2009-12-09 华中科技大学 A kind of recognition methods of digital music emotion
US8283549B2 (en) * 2006-09-08 2012-10-09 Panasonic Corporation Information processing terminal and music information generating method and program
US20140172431A1 (en) * 2012-12-13 2014-06-19 National Chiao Tung University Music playing system and music playing method based on speech emotion recognition
CN106132040A (en) * 2016-06-20 2016-11-16 科大讯飞股份有限公司 Sing lamp light control method and the device of environment

Also Published As

Publication number Publication date
CN106128479B (en) 2019-09-06

Similar Documents

Publication Publication Date Title
Anagnostopoulos et al. Features and classifiers for emotion recognition from speech: a survey from 2000 to 2011
Giannoulis et al. A database and challenge for acoustic scene classification and event detection
Alías et al. A review of physical and perceptual feature extraction techniques for speech, music and environmental sounds
Tzanetakis et al. Marsyas: A framework for audio analysis
CN102881284B (en) Speaker-independent speech and emotion recognition method and system
Zhang et al. Speech emotion recognition using deep convolutional neural network and discriminant temporal pyramid matching
Schmidt et al. Feature selection for content-based, time-varying musical emotion regression
Barchiesi et al. Acoustic scene classification: Classifying environments from the sounds they produce
Downie The music information retrieval evaluation exchange (2005–2007): A window into music information retrieval research
CN107610707B (en) Voiceprint recognition method and device
Li et al. Separation of singing voice from music accompaniment for monaural recordings
CN104732978B (en) Text-dependent speaker recognition method based on combined deep learning
Aucouturier et al. The bag-of-frames approach to audio pattern recognition: A sufficient model for urban soundscapes but not for polyphonic music
Kotti et al. Speaker-independent emotion recognition exploiting a psychologically-inspired binary cascade classification schema
Miller et al. Perceptual space for musical structures
Pons et al. Timbre analysis of music audio signals with convolutional neural networks
Zhang et al. Hierarchical classification of audio data for archiving and retrieving
CN102799899B (en) Hierarchical and generalized recognition method for special audio events based on SVM (Support Vector Machine) and GMM (Gaussian Mixture Model)
TWI297486B (en) Intelligent classification of sound signals with application and method
Koolagudi et al. IITKGP-SESC: speech database for emotion analysis
CN104167208B (en) Speaker recognition method and device
Bergstra et al. Aggregate features and AdaBoost for music classification
Livingstone et al. Changing musical emotion: A computational rule system for modifying score and performance
CN101261832B (en) Extraction and modeling method for Chinese speech emotion information
Nwe et al. Exploring vibrato-motivated acoustic features for singer identification

Legal Events

Date Code Title Description
PB01 Publication
C06 Publication
SE01 Entry into force of request for substantive examination
C10 Entry into substantive examination
CB03 Change of inventor or designer information

Inventor after: Cai Zhili

Inventor after: Li Hongfu

Inventor before: Cai Zhili

GR01 Patent grant