CN106128479A - Singing emotion recognition method and device - Google Patents

Singing emotion recognition method and device

Info

Publication number
CN106128479A
CN106128479A (application CN201610517375.4A)
Authority
CN
China
Prior art keywords
training
described
coordinate
matrix
audio frequency
Prior art date
Application number
CN201610517375.4A
Other languages
Chinese (zh)
Other versions
CN106128479B (en)
Inventor
蔡智力
Original Assignee
福建星网视易信息系统有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to CN201610506767 priority Critical
Priority to CN2016105067670 priority
Application filed by 福建星网视易信息系统有限公司
Publication of CN106128479A
Application granted
Publication of CN106128479B

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L25/48: Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques specially adapted for comparison or discrimination
    • G10L25/63: Speech or voice analysis techniques specially adapted for estimating an emotional state

Abstract

This application discloses a singing emotion recognition method and device. The method extracts emotional features from training singing audio and trains an emotion recognition model on them; the emotional features include acoustic signal features and musical score features. The method then extracts the emotional features of the singing audio to be recognized and inputs them into the emotion recognition model to identify the emotion of that audio. Compared with existing speech emotion recognition and music emotion recognition, the emotion recognition model obtained from emotional features that combine acoustic signal features and musical score features can identify the singing emotion of the individual singer from the musical score features and acoustic signal features: for the same song, it can recognize a different emotion for each singer's performance, identifying the singer's emotion more precisely.

Description

Singing emotion recognition method and device

Technical field

The present application belongs to the field of emotion recognition and, more specifically, relates to a singing emotion recognition method and device.

Background art

Current audio emotion recognition falls mainly into two areas, speech emotion recognition and music emotion recognition; recognizing emotion from singing, however, has not been attempted and remains a difficult problem in audio emotion recognition. Singing differs from both. First, speech emotion recognition can judge emotion from pitch and speaking rate, but in singing both pitch and tempo follow what the song prescribes, so a method that identifies the emotion of a performance from pitch and speaking rate is infeasible. The patent publication "Speech recognition analysis system and service method" (application No. 200510046169.1, filed 2005-03-31) extracts the frequency of the human voice in person-to-person communication and, taking vocal emotion degree and vocal affinity degree as its technical basis, performs speech recognition and analysis grounded in the sensory sciences. Vocal emotion degree infers a speaker's character and current mental state from the pitch and intonation of the voice; vocal affinity degree analyzes the low-frequency sound driven directly by the lungs to reveal the speaker's true emotion. In a singing scenario, however, pitch and tempo are dictated by the song, so recognizing the singer's emotion from pitch and intonation as in that publication is infeasible. Second, music emotion recognition judges emotion mainly from audio features and musical score features, so the judged emotion is fixed for a given song; yet every singer interprets a song in his or her own way, and for the same song different singers convey different emotions, so music emotion recognition cannot accurately recognize the emotion of a particular performance from how the singer actually sings.

In summary, singing emotion recognition is a new field quite distinct from speech emotion recognition and music emotion recognition, and the prior art offers no solution for identifying a singer's emotion from the singing itself.

Summary of the invention

In view of this, the technical problem this application solves is to provide a singing emotion recognition method and device that can identify a singer's emotion from the singing itself.

To solve the above technical problem, this application discloses a singing emotion recognition method, comprising:

extracting the emotional features of the training singing audio, and training an emotion recognition model on them, the emotional features including acoustic signal features and musical score features;

extracting the emotional features of the singing audio to be recognized;

inputting the emotional features of the singing audio to be recognized into the emotion recognition model, and identifying the emotion of the singing audio to be recognized.

To solve the above technical problem, this application also discloses a singing emotion recognition device, comprising:

a training module for extracting the emotional features of the training singing audio and training an emotion recognition model, the emotional features including acoustic signal features and musical score features;

an extraction module for extracting the emotional features of the singing audio to be recognized;

a recognition module for inputting the emotional features of the singing audio to be recognized into the emotion recognition model and identifying the emotion of the singing audio to be recognized.

To solve the above technical problem, this application also discloses a singing emotion recognition method, comprising:

obtaining the user's singing audio;

when the emotion type recognized from the user's singing audio matches the preset emotion of the music, outputting the corresponding performance-result control instruction.

To solve the above technical problem, this application also discloses a singing emotion recognition device, comprising:

an acquisition module for obtaining the user's singing audio;

a recognition module for outputting the corresponding performance-result control instruction when the emotion type recognized from the user's singing audio matches the preset emotion of the music.
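The acquisition/recognition flow above reduces to a simple comparison. A minimal sketch follows; the function name and the control-instruction string are illustrative assumptions, not terms from the patent:

```python
def singing_feedback(recognized_emotion, preset_emotion, instruction="show_applause"):
    """Emit the preset control instruction only when the emotion recognized
    from the user's singing audio matches the emotion preset for the song.
    Returns None when they do not match."""
    if recognized_emotion == preset_emotion:
        return instruction
    return None

cmd = singing_feedback("sad", "sad")   # emotions match, instruction is emitted
```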

Compared with the prior art, the present application can achieve the following technical effects:

The emotional features extracted in the embodiments of the present application differ from those of speech emotion recognition and music emotion recognition: speech emotion recognition extracts only audio features such as pitch and speaking rate and does not involve musical score features; music emotion recognition extracts both audio features and musical score features, but does not extract spectrogram features (included here among the acoustic signal features) and the like. Therefore, compared with existing speech emotion recognition and music emotion recognition, an emotion recognition model trained on emotional features that combine acoustic signal features and musical score features can identify the singer's emotion more precisely from the musical score features and acoustic signal features. Specifically, the present embodiment extracts the emotional features of the training singing audio and trains an emotion recognition model, the emotional features including acoustic signal features and musical score features; extracts the emotional features of the singing audio to be recognized; and inputs those features into the emotion recognition model to identify the emotion of the singing audio to be recognized. The embodiments of the present application can thus recognize the singer's emotion type from the singer's own performance audio, identifying the singer's emotion from the singing itself.

Of course, a product implementing the present application need not achieve all of the above technical effects at once.

Brief description of the drawings

The drawings described here provide a further understanding of the present application and constitute a part of it; the schematic embodiments of the application and their descriptions serve to explain the application and do not unduly limit it. In the drawings:

Figure 1A is a flow chart of a singing emotion recognition method provided by some embodiments of the present application;

Figure 1B is a flow chart of an emotion recognition model building method provided by some embodiments of the present application;

Fig. 2A is a flow chart of another singing emotion recognition method provided by some embodiments of the present application;

Fig. 2B is a flow chart of a singing emotion recognition method, based on Fig. 2A, provided by some embodiments of the present application;

Fig. 3 is a flow chart of yet another singing emotion recognition method provided by some embodiments of the present application;

Fig. 4 is a flow chart of another emotion recognition model building method provided by some embodiments of the present application;

Fig. 5A is the plane rectangular coordinate system formed by the pressure factor and the capacity factor, provided by some embodiments of the present application;

Fig. 5B is a partial flow chart of an emotion recognition model building method provided by some embodiments of the present application;

Fig. 6A is a flow chart of a singing emotion recognition method provided by some embodiments of the present application;

Fig. 6B is a partial flow chart of a singing emotion recognition method provided by some embodiments of the present application;

Fig. 6C is another partial flow chart of a singing emotion recognition method provided by some embodiments of the present application;

Fig. 7 is a flow chart of a singing recognition method provided by some embodiments of the present application;

Fig. 8 is a structural diagram of a singing emotion recognition device provided by some embodiments of the present application;

Fig. 9 is a structural diagram of a singing recognition device provided by some embodiments of the present application;

Figure 10 is a structural diagram of an electronic terminal provided by some embodiments of the present application.

Detailed description of the invention

The embodiments of the present application are described in detail below with reference to the drawings and examples, so that how the application applies technical means to solve its technical problem and achieve its technical effect can be fully understood and implemented.

Embodiment one

Referring to Figure 1A, which shows a flow chart of a singing emotion recognition method provided by an embodiment of the present application. The application may run on a terminal device, or on an emotion-recognition-model building device implemented in software, hardware, or a combination of both and typically integrated in a terminal device. The following description takes a terminal device as the executing entity; the method shown in Figure 1A may be implemented as follows.

Step 100: extract the emotional features of the training singing audio, and train an emotion recognition model; the emotional features include acoustic signal features and musical score features. Optionally, and unlike the features extracted in speech emotion recognition and music emotion recognition, the acoustic signal features extracted in this embodiment include at least one of: average energy, energy standard deviation, average fundamental frequency, fundamental-frequency standard deviation, the number of frequency values exceeding the average fundamental frequency, average spectral centroid, spectral-centroid standard deviation, MFCC features, and spectrogram features. The musical score features include at least one of: beats per minute, key type, mode, average pitch, pitch standard deviation, and the average duration of each note.
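Several of the listed acoustic statistics are straightforward to compute once a waveform and a fundamental-frequency track are available. The NumPy-only sketch below is illustrative: the frame sizes and the synthetic F0 track are assumptions, not values from the patent, and MFCC or spectrogram features would require a dedicated audio library.

```python
import numpy as np

def frame_energies(y, frame_len=1024, hop=512):
    """Short-time energy per frame of a mono waveform y."""
    n = 1 + max(0, (len(y) - frame_len) // hop)
    return np.array([np.sum(y[i * hop:i * hop + frame_len] ** 2) for i in range(n)])

def acoustic_stats(y, f0):
    """A few of the listed statistics: mean/std of frame energy, mean/std of F0,
    and the number of F0 samples strictly above the mean F0."""
    e = frame_energies(y)
    f0 = f0[f0 > 0]                       # keep voiced frames only
    return {
        "mean_energy": float(np.mean(e)),
        "std_energy": float(np.std(e)),
        "mean_f0": float(np.mean(f0)),
        "std_f0": float(np.std(f0)),
        "n_above_mean_f0": int(np.sum(f0 > np.mean(f0))),
    }

# Toy example: one second of a 440 Hz tone with a constant synthetic F0 track.
sr = 16000
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440 * t)
f0 = np.full(100, 440.0)
stats = acoustic_stats(y, f0)
```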

Optionally, as shown in Figure 1B, the emotion recognition model of this embodiment is trained as follows.

Step 1011: determine the training coordinate values of the emotional features of the training singing audio on a first coordinate axis and on a second coordinate axis, obtaining a first training coordinate value and a second training coordinate value. The first coordinate axis and the second coordinate axis form a plane rectangular coordinate system, and the quadrants of this coordinate system correspond one-to-one with singing emotion types.

Step 1012: build a first training matrix from the first training coordinate value and the emotional features of the training singing audio, and build a second training matrix from the second training coordinate value and the emotional features of the training singing audio.

Step 1013: normalize the first training matrix into a first training normalization matrix, and normalize the second training matrix into a second training normalization matrix.

Step 1014: substitute the first training normalization matrix and the second training normalization matrix into the SVM algorithm separately, obtaining a first training hyperplane and a second training hyperplane respectively.

Step 1015: substitute the first training hyperplane and the first training normalization matrix into the SVM algorithm to obtain a first emotion recognition model based on the first coordinate axis; substitute the second training hyperplane and the second training normalization matrix into the SVM algorithm to obtain a second emotion recognition model based on the second coordinate axis. The first emotion recognition model is used to determine the first coordinate value, along the first coordinate axis, of the emotional features of the singing audio to be recognized; the second emotion recognition model is used to determine their second coordinate value along the second coordinate axis.
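Steps 1011 to 1015 amount to normalizing the training matrix and fitting one SVM per coordinate axis. The sketch below uses scikit-learn's `SVC` as a stand-in for the SVM algorithm described, with synthetic stand-in data; the min-max normalization, the random features, and the sign-of-coordinate labels are assumptions for illustration, not the patent's data.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Stand-ins: L training songs, each with n_feat acoustic + score features,
# plus hand-annotated (x, y) coordinates in the emotion plane.
L, n_feat = 200, 10
X = rng.normal(size=(L, n_feat))
coords = X[:, :2] + 0.1 * rng.normal(size=(L, 2))   # x tied to feature 0, y to feature 1

# Steps 1013-1015 collapsed: normalize the training matrix, then fit one SVM
# per axis, predicting the sign of that axis's training coordinate.
scaler = MinMaxScaler().fit(X)
Xn = scaler.transform(X)
svm_x = SVC(kernel="rbf").fit(Xn, (coords[:, 0] > 0).astype(int))
svm_y = SVC(kernel="rbf").fit(Xn, (coords[:, 1] > 0).astype(int))

acc_x = svm_x.score(Xn, (coords[:, 0] > 0).astype(int))  # training accuracy
```

A real implementation would regress the coordinate value itself (for example with `SVR`) rather than only its sign; the sign version is the minimum needed to pick a quadrant.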

Step 102: extract the emotional features of the singing audio to be recognized. As in step 101, the emotional features extracted in step 102 include acoustic signal features and musical score features. Optionally, and unlike the features extracted in speech emotion recognition and music emotion recognition, the acoustic signal features extracted in this embodiment include at least one of: average energy, energy standard deviation, average fundamental frequency, fundamental-frequency standard deviation, the number of frequency values exceeding the average fundamental frequency, average spectral centroid, spectral-centroid standard deviation, MFCC features, and spectrogram features. The musical score features include at least one of: beats per minute, key type, mode, average pitch, pitch standard deviation, and the average duration of each note.

Step 103: input the emotional features of the singing audio to be recognized into the emotion recognition model, and identify the emotion of the singing audio to be recognized. Specifically, step 103 includes:

inputting the emotional features of the singing audio to be recognized into the first emotion recognition model and the second emotion recognition model respectively, and determining the first coordinate value of the emotional features based on the first coordinate axis and their second coordinate value based on the second coordinate axis;

determining, from the first coordinate value and the second coordinate value, the quadrant in which the emotional features fall, so as to determine the singing emotion type corresponding to the emotional features.

The emotional features extracted in this embodiment differ from those of speech emotion recognition and music emotion recognition: speech emotion recognition extracts only audio features such as pitch and speaking rate and does not involve musical score features; music emotion recognition extracts both audio features and musical score features, but does not extract spectrogram features (included here among the acoustic signal features) and the like. Therefore, compared with existing speech emotion recognition and music emotion recognition, the emotion recognition model of this embodiment, trained on emotional features that combine acoustic signal features and musical score features, can identify the singing emotion of the individual singer from the musical score features and acoustic signal features: for the same song, it can recognize a different emotion for each singer's performance, identifying the singer's emotion more precisely. Specifically, this embodiment extracts the emotional features of the training singing audio and trains an emotion recognition model, the emotional features including acoustic signal features and musical score features; extracts the emotional features of the singing audio to be recognized; and inputs those features into the emotion recognition model to identify the emotion of the singing audio to be recognized. The embodiment can thus recognize the singer's emotion type from the singer's own performance audio, identifying the singer's emotion from the singing itself.

Embodiment two

With reference to Figures 1A to 2B, an embodiment of the present application provides a singing emotion recognition method as one feasible implementation of embodiment one, realized in the following manner. Here, the first coordinate axis may be the X axis and the second coordinate axis may be the Y axis.

Step 100: extract the emotional features of the training singing audio, and train an emotion recognition model; the emotional features include acoustic signal features and musical score features. Optionally, and unlike the features extracted in speech emotion recognition and music emotion recognition, the acoustic signal features extracted in this embodiment include at least one of: average energy, energy standard deviation, average fundamental frequency, fundamental-frequency standard deviation, the number of frequency values exceeding the average fundamental frequency, average spectral centroid, spectral-centroid standard deviation, MFCC features, and spectrogram features. The musical score features include at least one of: beats per minute, key type, mode, average pitch, pitch standard deviation, and the average duration of each note.

Optionally, as shown in Figure 1B, the emotion recognition model of this embodiment is trained as follows.

Step 1011: determine the training coordinate values of the emotional features of the training singing audio on a first coordinate axis and on a second coordinate axis, obtaining a first training coordinate value and a second training coordinate value. The first coordinate axis and the second coordinate axis form a plane rectangular coordinate system, and the quadrants of this coordinate system correspond one-to-one with singing emotion types.

Optionally, the singing emotion types corresponding to the quadrants of the plane rectangular coordinate system include: tense/anxious, happy/cheerful, sad/depressed, and calm/natural. The correspondence is: the first quadrant corresponds to tense/anxious, the second quadrant to happy/cheerful, the third quadrant to sad/depressed, and the fourth quadrant to calm/natural.
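The quadrant lookup described here can be sketched directly. The label strings are paraphrases of the four emotion types; how points lying exactly on an axis should be classified is not specified in the text, so this sketch lets them fall through to the fourth-quadrant default.

```python
def emotion_quadrant(x, y):
    """Map (x, y) coordinates in the emotion plane to the singing emotion
    type of the corresponding quadrant, per the mapping in the text."""
    if x > 0 and y > 0:
        return "tense/anxious"      # first quadrant
    if x < 0 and y > 0:
        return "happy/cheerful"     # second quadrant
    if x < 0 and y < 0:
        return "sad/depressed"      # third quadrant
    return "calm/natural"           # fourth quadrant (and axis points, by default)
```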

Step 1012: build a first training matrix from the first training coordinate value and the emotional features of the training singing audio, and build a second training matrix from the second training coordinate value and the emotional features of the training singing audio.

Step 1013: normalize the first training matrix into a first training normalization matrix, and normalize the second training matrix into a second training normalization matrix.

Step 1014: substitute the first training normalization matrix and the second training normalization matrix into the SVM algorithm separately, obtaining a first training hyperplane and a second training hyperplane respectively.

Step 1015: substitute the first training hyperplane and the first training normalization matrix into the SVM algorithm to obtain a first emotion recognition model based on the first coordinate axis; substitute the second training hyperplane and the second training normalization matrix into the SVM algorithm to obtain a second emotion recognition model based on the second coordinate axis. The first emotion recognition model is used to determine the first coordinate value, along the first coordinate axis, of the emotional features of the singing audio to be recognized; the second emotion recognition model is used to determine their second coordinate value along the second coordinate axis.

Step 102: extract the emotional features of the singing audio to be recognized. As in step 101, the emotional features extracted in step 102 include acoustic signal features and musical score features. Optionally, and unlike the features extracted in speech emotion recognition and music emotion recognition, the acoustic signal features extracted in this embodiment include at least one of: average energy, energy standard deviation, average fundamental frequency, fundamental-frequency standard deviation, the number of frequency values exceeding the average fundamental frequency, average spectral centroid, spectral-centroid standard deviation, MFCC features, and spectrogram features. The musical score features include at least one of: beats per minute, key type, mode, average pitch, pitch standard deviation, and the average duration of each note.

Step 103: input the emotional features of the singing audio to be recognized into the emotion recognition model, and identify the emotion of the singing audio to be recognized. Specifically, step 103 includes:

inputting the emotional features of the singing audio to be recognized into the first emotion recognition model and the second emotion recognition model respectively, and determining the first coordinate value of the emotional features based on the first coordinate axis and their second coordinate value based on the second coordinate axis;

determining, from the first coordinate value and the second coordinate value, the quadrant in which the emotional features fall, so as to determine the singing emotion type corresponding to the emotional features.

As shown in Figure 2A, in one feasible implementation, step 103 obtains the first coordinate value as follows.

In step 1030, a first feature matrix based on the first coordinate axis is obtained from the acoustic signal features and musical score features of the singing audio to be recognized together with the first training matrix based on the first coordinate axis. Specifically, the row (A_{g,1}, …, A_{g,n}, B_{g,1}, …, B_{g,m}, 0) formed by the acoustic signal features (A_{g,1}, …, A_{g,n}) and the musical score features (B_{g,1}, …, B_{g,m}) is appended as the last row of the first training matrix based on the first coordinate axis, giving the first feature matrix. The first training matrix is determined in advance from the acoustic signal features and musical score features of the training singing audio and from the first training coordinate values, on the first coordinate axis, of the emotional features of the training singing audio. In this application, the subscript g denotes the song to be recognized, n is the number of acoustic signal features, m is the number of musical score features, and L is the number of training songs.

In step 1032, the first feature matrix is normalized to obtain a first normalization matrix, and from it the emotional features of the singing audio to be recognized after normalization against the first training matrix. Specifically, the data in the first feature matrix are normalized column by column (per feature), giving the first normalization matrix; the last row of that matrix is then extracted, yielding the normalized emotional-feature row (a_{gx,1}, …, a_{gx,n}, b_{gx,1}, …, b_{gx,m}) of the singing audio to be recognized.

In step 1034, the normalized emotional-feature row of the singing audio to be recognized, the first training hyperplane, and the first emotion recognition model based on the first coordinate axis are substituted into the SVM algorithm, obtaining the first coordinate value, along the first coordinate axis, of the emotional features of the singing audio to be recognized. Specifically, (a_{gx,1}, …, a_{gx,n}, b_{gx,1}, …, b_{gx,m}), the first training hyperplane, and the first emotion recognition model T_X of the X axis are substituted into the SVM algorithm, yielding the first coordinate value X_g, in the X-axis direction, of the emotional features (A_{g,1}, …, A_{g,n}, B_{g,1}, …, B_{g,m}) of the singing audio to be recognized. The hyperplane is built on the p_i-th features of the training acoustic signal features and the q_i-th features of the training musical score features, with p_1, …, p_i ∈ [1, n] and q_1, …, q_i ∈ [1, m], where n is the number of acoustic signal features, m is the number of musical score features, and L is the number of training songs. The first training hyperplane is determined from the first training normalization matrix obtained by normalizing the first training matrix; the first training matrix is determined from the first training coordinate value and the emotional features of the training singing audio; and the first emotion recognition model based on the first coordinate axis is determined from the first training hyperplane and the first training normalization matrix.

As shown in Figure 2B, in one feasible implementation, step 103 obtains the second coordinate value as follows.

In step 1030′, a second feature matrix based on the second coordinate axis is obtained from the acoustic signal features and musical score features of the singing audio to be recognized together with the second training matrix based on the second coordinate axis. Specifically, the row (A_{g,1}, …, A_{g,n}, B_{g,1}, …, B_{g,m}, 0) formed by the acoustic signal features (A_{g,1}, …, A_{g,n}) and the musical score features (B_{g,1}, …, B_{g,m}) is appended as the last row of the second training matrix based on the Y axis, giving the second feature matrix. The second training matrix is determined in advance from the acoustic signal features and musical score features of the training singing audio and from the second training coordinate values, on the second coordinate axis, of the emotional features of the training singing audio.

In step 1032′, the second feature matrix is normalized to obtain a second normalization matrix, and from it the emotional features of the singing audio to be recognized after normalization against the second training matrix. Specifically, the data in the second feature matrix are normalized column by column (per feature), giving the second normalization matrix; the last row of that matrix is then extracted, yielding the normalized emotional-feature row (a_{gy,1}, …, a_{gy,n}, b_{gy,1}, …, b_{gy,m}) of the singing audio to be recognized.

In step 1034′, the normalized emotional-feature row of the singing audio to be recognized, the second training hyperplane, and the second emotion recognition model based on the second coordinate axis are substituted into the SVM algorithm, obtaining the second coordinate value, along the second coordinate axis, of the emotional features of the singing audio to be recognized. Specifically, (a_{gy,1}, …, a_{gy,n}, b_{gy,1}, …, b_{gy,m}), the second training hyperplane, and the second emotion recognition model T_Y of the Y axis are substituted into the SVM algorithm, yielding the second coordinate value Y_g, in the Y-axis direction, of the emotional features (A_{g,1}, …, A_{g,n}, B_{g,1}, …, B_{g,m}) of the singing audio to be recognized. The hyperplane is built on the r_i-th features of the training acoustic signal features and the s_i-th features of the training musical score features, with r_1, …, r_i ∈ [1, n] and s_1, …, s_i ∈ [1, m], where n is the number of acoustic signal features, m is the number of musical score features, and L is the number of training songs. The second training hyperplane is determined from the second training normalization matrix obtained by normalizing the second training matrix; the second training matrix is determined from the second training coordinate value and the emotional features of the training singing audio; and the second emotion recognition model based on the second coordinate axis is determined from the second training hyperplane and the second training normalization matrix.

It should be appreciated that steps 1030 and 1030' need not be executed in any particular order and may be performed in parallel. The same holds for steps 1032 and 1032', and for steps 1034 and 1034'.

In the embodiments of the present application, the first coordinate axis and the second coordinate axis form a planar rectangular coordinate system whose quadrants are in one-to-one correspondence with the singing emotion types. The embodiments determine, from the acoustic signal features and musical score features of the singing audio to be identified, the coordinate values of its emotion feature on the first and second coordinate axes respectively, and then determine from the first and second coordinate values the singing emotion type corresponding to that emotion feature. The singing emotion type can thus be recognized from a singer's own singing audio, i.e. the singer's emotion is identified from the singing itself.

In addition, the emotion features extracted in this embodiment differ, in terms of feature extraction, from those used in speech emotion recognition and music emotion recognition: speech emotion recognition only extracts audio features such as tone and speech rate and does not involve musical score features; music emotion recognition extracts both audio and musical score features, but does not involve spectrogram features (included here among the acoustic signal features). Compared with existing speech emotion recognition and music emotion recognition, this embodiment, using an emotion recognition model trained on emotion features that include both acoustic signal features and musical score features, can recognize the singing emotion of the corresponding singer from those features; for the same song it can recognize a different singing emotion for each different singer, identifying the singer's emotion more precisely.

Embodiment three

Referring to Fig. 3, the embodiment of the present application provides a singing emotion recognition method. This embodiment is largely the same as embodiments one and two, and specifically describes how the first emotion recognition model based on the first coordinate axis and the second emotion recognition model based on the second coordinate axis are established, which may be accomplished as follows.

In step 301, the acoustic signal features and musical score features of the singing audio to be trained are extracted. Specifically, the acoustic signal features Ai,j and musical score features Bi,k of the singing audio to be trained are extracted, where Ai,j denotes the value of the j-th acoustic signal feature of the i-th singing audio to be trained, 1≤j≤n, n being the total number of acoustic signal features, and Bi,k denotes the value of the k-th musical score feature of the i-th singing audio to be trained, 1≤k≤m, m being the total number of musical score features.

In step 302, the first training coordinate value and the second training coordinate value corresponding to the acoustic signal features and musical score features of the singing audio to be trained are determined. Here the first coordinate axis may be the X axis and the second coordinate axis the Y axis. The first training coordinate value Xi is the coordinate on the first coordinate axis annotated by professionals familiar with music for the i-th singing audio to be trained, and the second training coordinate value Yi is the corresponding coordinate on the second coordinate axis; the feature row of the i-th singing audio to be trained is then (Ai,1…Ai,n Bi,1…Bi,m Xi Yi). Xi and Yi may directly use coordinate values annotated in advance by such professionals.

In step 303, a first training matrix based on the first coordinate axis and a second training matrix based on the second coordinate axis are determined from the first and second training coordinate values respectively. After the features of all L songs have been extracted, an L×(n+m+2) matrix is formed whose i-th row is (Ai,1…Ai,n Bi,1…Bi,m Xi Yi). This matrix is split into the first training matrix based on the first coordinate axis, whose rows are (Ai,1…Ai,n Bi,1…Bi,m Xi), and the second training matrix based on the second coordinate axis, whose rows are (Ai,1…Ai,n Bi,1…Bi,m Yi).
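As a minimal sketch of step 303, the L×(n+m+2) matrix and its split into the two per-axis training matrices can be expressed with numpy; the sizes and random feature values below are purely illustrative placeholders, not values from the patent.

```python
import numpy as np

# Hypothetical sizes: L training songs, n acoustic features, m score features.
L, n, m = 4, 3, 2
rng = np.random.default_rng(0)

A = rng.uniform(size=(L, n))          # acoustic signal features A[i, j]
B = rng.uniform(size=(L, m))          # musical score features B[i, k]
X = rng.uniform(-1, 1, size=(L, 1))   # annotated first-axis (X) coordinates
Y = rng.uniform(-1, 1, size=(L, 1))   # annotated second-axis (Y) coordinates

# The L x (n+m+2) matrix with rows (A_i,1...A_i,n B_i,1...B_i,m X_i Y_i).
full = np.hstack([A, B, X, Y])

# Split into the first training matrix (keeps X) and the second (keeps Y).
first_train = np.hstack([A, B, X])    # L x (n+m+1)
second_train = np.hstack([A, B, Y])   # L x (n+m+1)
```

Both per-axis matrices share the same feature columns and differ only in the final coordinate column.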

In step 304, the first training matrix and the second training matrix are normalized respectively to obtain the first training normalization matrix and the second training normalization matrix. Specifically, the data in the first training matrix of the X axis are normalized column by column so that the value range becomes [-1, 1]; the first training normalization matrix after normalization has rows (ai,1…ai,n bi,1…bi,m xi), where:

ai,j∈[-1,1],bi,k∈[-1,1],xi∈ [-1,1], j ∈ [1, n], k ∈ [1, m], i ∈ [1, L].

Similarly, after the second training matrix of the Y axis undergoes the same normalization, the resulting second training normalization matrix has rows (ai,1…ai,n bi,1…bi,m yi), where:

ai,j∈[-1,1],bi,k∈[-1,1],yi∈ [-1,1], j ∈ [1, n], k ∈ [1, m], i ∈ [1, L].
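The column-wise scaling of step 304 can be sketched as a per-feature min-max mapping onto [-1, 1]; the patent does not specify the exact formula, so the linear rescaling below (with a guard for constant columns) is an assumption.

```python
import numpy as np

def normalize_columns(mat):
    """Scale each column of `mat` linearly into [-1, 1] (per-feature min-max)."""
    lo = mat.min(axis=0)
    hi = mat.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns
    return 2.0 * (mat - lo) / span - 1.0

# Toy 3-song, 2-feature matrix, purely illustrative.
M = np.array([[1.0, 10.0],
              [2.0, 30.0],
              [3.0, 20.0]])
norm = normalize_columns(M)
```

Each column of `norm` now spans exactly [-1, 1], matching the constraint ai,j, bi,k, xi ∈ [-1, 1] stated above.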

In step 305, the first training normalization matrix and the second training normalization matrix are substituted into the SVM algorithm respectively, obtaining a first training hyperplane based on the first coordinate axis and a second training hyperplane based on the second coordinate axis. The first training normalization matrix of the X axis is substituted into the SVM algorithm, which seeks a hyperplane in the X direction that separates, as far as possible, the rows with xi greater than 0 from those with xi less than 0. The hyperplane obtained is composed of a subset of the acoustic signal features and musical score features: the hyperplane of the X axis is defined over features api of the acoustic signal features and bqi of the musical score features, where api is the pi-th acoustic signal feature, bqi is the qi-th musical score feature, p1…pi∈[1,n] and q1…qi∈[1,m]. Similarly, the hyperplane of the Y axis is defined over features ari and bsi, where ari is the ri-th acoustic signal feature, bsi is the si-th musical score feature, r1…ri∈[1,n] and s1…si∈[1,m].

In step 306, the first training hyperplane and the first training normalization matrix are substituted into the SVM algorithm to obtain the emotion recognition model based on the first coordinate axis; the second training hyperplane and the second training normalization matrix are substituted into the SVM algorithm to obtain the emotion recognition model based on the second coordinate axis. Substituting the X-axis hyperplane obtained above, with p1…pi∈[1,n] and q1…qi∈[1,m], together with the first training normalization matrix into the SVM algorithm yields the emotion recognition model of the X axis, denoted TX. The emotion recognition model of the Y axis, denoted TY, is obtained in the same way.
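Steps 305-306 can be loosely illustrated with scikit-learn; the patent does not name a concrete SVM library or formulation, so the pairing below — a linear SVM that separates songs with positive and negative X coordinates (step 305), plus a support-vector regressor standing in for the per-axis model TX (step 306) — is an illustrative assumption, and the synthetic features and coordinate rule are invented for the sketch.

```python
import numpy as np
from sklearn.svm import LinearSVC, SVR

rng = np.random.default_rng(1)
feats = rng.uniform(-1, 1, size=(40, 8))          # normalized (acoustic + score) features
x_coord = 0.7 * feats[:, 0] + 0.3 * feats[:, 1]   # toy annotated X-axis coordinates

# Step 305 analogue: a hyperplane separating x_i > 0 from x_i < 0.
labels = np.where(x_coord > 0, 1, -1)
hyperplane = LinearSVC(max_iter=5000).fit(feats, labels)

# Step 306 analogue: a model T_X predicting the coordinate value itself.
T_X = SVR(kernel="rbf").fit(feats, x_coord)
pred = T_X.predict(feats)
```

In the same way a second classifier/regressor pair on the Y column would play the role of the Y-axis hyperplane and TY.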

Embodiment four

With reference to Fig. 1A to Fig. 3, the embodiment of the present application provides a singing emotion recognition method that generally comprises two processes: (1) establishment of the singing emotion recognition model; (2) recognition of the singing emotion.

(1) Establishment of the singing emotion recognition model

This process is mainly used to establish the first emotion recognition model based on the first coordinate axis and the second emotion recognition model based on the second coordinate axis. Establishing the singing emotion recognition model requires collecting in advance a large amount of singing audio covering the various emotions (i.e. singing audio to be trained); the singing audio should be as close to pure voice as possible, and the musical scores of the corresponding songs are collected at the same time.

Professionals familiar with music then classify the emotions of the collected singing audio: the categories of the emotion classification are determined first, and each piece of singing audio is then listened to once by each professional, who annotates its emotion independently. When the majority of the professionals agree that the current piece of singing audio belongs to a certain emotion, it is assigned to the directory of that emotion; otherwise it is discarded. All singing audio is classified in this way. Note that the singing emotion may change within one piece of audio, for example between the prelude and the climax; in that case the professionals should split the audio into several segments of uniform emotion, so that the emotion within each segment is consistent, and the musical score of the corresponding song should likewise be segmented and annotated so as to correspond one-to-one with the audio segments.

After the above process, the singing audio has been classified by emotion, with the same number of audio pieces in each emotion class; the musical scores of the songs have also been classified, corresponding one-to-one with the classified audio.

The acoustic signal features of the singing audio of each emotion class are then extracted by emotion class, together with the musical score features of the corresponding songs. It should be understood that the features extracted here differ from those used in speech emotion recognition and music emotion recognition. The acoustic signal features of the singing audio cover: average energy, energy standard deviation, average fundamental frequency, fundamental frequency standard deviation, the number of frequencies exceeding the average fundamental frequency, average spectral centroid, spectral centroid standard deviation, MFCC features and spectrogram features. The musical score features cover: beats per minute, major/minor key type, mode, average pitch, pitch standard deviation and the average duration of each note. The acoustic signal features and musical score features of a piece of singing audio are both extracted from the same passage of the same song: whichever song and passage was sung in the audio, the features of that passage of music are extracted from the score correspondingly. (Note: speech emotion recognition only extracts audio features such as tone and speech rate and does not involve musical score features; music emotion recognition extracts both audio and musical score features but does not involve spectrogram features and the like; the present feature extraction therefore differs from both.)

After the above preprocessing, the singing emotion recognition model can be established as follows. Here the first coordinate axis and the second coordinate axis form a planar rectangular coordinate system whose quadrants correspond one-to-one with the singing emotion types. The first coordinate axis may be the X axis and the second coordinate axis the Y axis. The singing emotion types corresponding to the quadrants of the coordinate system are: tense-anxious, happy-cheerful, sad-depressed and natural-calm; the correspondence is that the first quadrant corresponds to tense-anxious, the second quadrant to happy-cheerful, the third quadrant to sad-depressed and the fourth quadrant to natural-calm.

Specifically, this embodiment divides singing emotions into four categories, namely tense-anxious, happy-cheerful, sad-depressed and natural-calm, corresponding respectively to the four quadrants of the planar rectangular coordinate system. After the emotion category of a sung song is determined, professionals familiar with music annotate it in coordinate form on the extracted emotion-category feature data (the value ranges of the X and Y directions in the coordinate system are [-1, 1]; the further a value deviates from the X or Y axis, the more pronounced the corresponding emotion, and the closer a value lies to the axes, the fainter that emotion). The training and recognition algorithm of this embodiment is the SVM algorithm: the professionals annotate the coordinate values, within the appropriate quadrant, of the emotion of the user's singing; the singing emotion features and their coordinate values are extracted; and after all features and coordinates have been extracted, the X-axis data and Y-axis data are normalized by scale value and fed separately into SVM training. From the trained data, the SVM derives the optimal hyperplane values of the emotion features of the singing emotion on the X and Y axes, thereby obtaining the emotion recognition models based on the X and Y axes.

Step 100: extract the emotion features of the singing audio to be trained, and train to obtain the emotion recognition models; the emotion features include acoustic signal features and musical score features. With reference to Fig. 1A and Fig. 3, the establishment of the emotion recognition models is specifically shown in the implementation of Fig. 3.

In step 301, the acoustic signal features and musical score features of the singing audio to be trained are extracted. Specifically, the acoustic signal features Ai,j and musical score features Bi,k of the singing audio to be trained are extracted, where Ai,j denotes the value of the j-th acoustic signal feature of the i-th singing audio to be trained, 1≤j≤n, n being the total number of acoustic signal features, and Bi,k denotes the value of the k-th musical score feature of the i-th singing audio to be trained, 1≤k≤m, m being the total number of musical score features.

In step 302, the first training coordinate value and the second training coordinate value corresponding to the acoustic signal features and musical score features of the singing audio to be trained are determined. The first training coordinate value Xi is the coordinate on the first coordinate axis annotated by professionals familiar with music for the i-th singing audio to be trained, and the second training coordinate value Yi is the corresponding coordinate on the second coordinate axis; the feature row of the i-th singing audio to be trained is then (Ai,1…Ai,n Bi,1…Bi,m Xi Yi). Xi and Yi may directly use coordinate values annotated in advance by such professionals.

In step 303, a first training matrix based on the first coordinate axis and a second training matrix based on the second coordinate axis are determined from the first and second training coordinate values respectively. After the features of all L songs have been extracted, an L×(n+m+2) matrix is formed whose i-th row is (Ai,1…Ai,n Bi,1…Bi,m Xi Yi). This matrix is then split into the first training matrix based on the first coordinate axis, whose rows are (Ai,1…Ai,n Bi,1…Bi,m Xi), and the second training matrix based on the second coordinate axis, whose rows are (Ai,1…Ai,n Bi,1…Bi,m Yi).

In step 304, the first training matrix and the second training matrix are normalized respectively to obtain the first training normalization matrix and the second training normalization matrix. Specifically, the data in the first training matrix of the X axis are normalized column by column so that the value range becomes [-1, 1]; the first training normalization matrix after normalization has rows (ai,1…ai,n bi,1…bi,m xi), where:

ai,j∈[-1,1],bi,k∈[-1,1],xi∈ [-1,1], j ∈ [1, n], k ∈ [1, m], i ∈ [1, L].

Similarly, after the second training matrix of the Y axis undergoes the same normalization, the resulting second training normalization matrix has rows (ai,1…ai,n bi,1…bi,m yi), where:

ai,j∈[-1,1],bi,k∈[-1,1],yi∈ [-1,1], j ∈ [1, n], k ∈ [1, m], i ∈ [1, L].

In step 305, the first training normalization matrix and the second training normalization matrix are substituted into the SVM algorithm respectively, obtaining a first training hyperplane based on the first coordinate axis and a second training hyperplane based on the second coordinate axis. The first training normalization matrix of the X axis is substituted into the SVM algorithm, which seeks a hyperplane in the X direction that separates, as far as possible, the rows with xi greater than 0 from those with xi less than 0. The hyperplane obtained is composed of a subset of the acoustic signal features and musical score features: the hyperplane of the X axis is defined over features api of the acoustic signal features and bqi of the musical score features, where api is the pi-th acoustic signal feature, bqi is the qi-th musical score feature, p1…pi∈[1,n] and q1…qi∈[1,m]. Similarly, the hyperplane of the Y axis is defined over features ari and bsi, where ari is the ri-th acoustic signal feature, bsi is the si-th musical score feature, r1…ri∈[1,n] and s1…si∈[1,m].

In step 306, the first training hyperplane and the first training normalization matrix are substituted into the SVM algorithm to obtain the first emotion recognition model based on the first coordinate axis; the second training hyperplane and the second training normalization matrix are substituted into the SVM algorithm to obtain the second emotion recognition model based on the second coordinate axis. Substituting the X-axis hyperplane obtained above, with p1…pi∈[1,n] and q1…qi∈[1,m], together with the first training normalization matrix into the SVM algorithm yields the first emotion recognition model of the X axis, denoted TX; the second emotion recognition model of the Y axis, denoted TY, is obtained in the same way. TX and TY are the established singing emotion recognition models.

The first emotion recognition model is used to determine the first coordinate value, in the direction of the first coordinate axis, of the emotion feature of the singing audio to be identified; the second emotion recognition model is used to determine the corresponding second coordinate value in the direction of the second coordinate axis.

(2) Recognition of the singing emotion

Step 102: extract the emotion features of the singing audio to be identified. The emotion features extracted in step 102 include acoustic signal features and musical score features. Optionally, differing from the features extracted in speech emotion recognition and music emotion recognition, the acoustic signal features extracted in this embodiment include at least one of: average energy, energy standard deviation, average fundamental frequency, fundamental frequency standard deviation, the number of frequencies exceeding the average fundamental frequency, average spectral centroid, spectral centroid standard deviation, MFCC features and spectrogram features; the musical score features include at least one of: beats per minute, major/minor key type, mode, average pitch, pitch standard deviation and the average duration of each note.
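A few of the acoustic signal features named above can be sketched with plain numpy; the framing parameters, the pure-tone test signal and the function name are illustrative assumptions (a real system would additionally compute F0 statistics, MFCCs and spectrogram features).

```python
import numpy as np

def frame_features(signal, sr=16000, frame=512, hop=256):
    """Toy sketch of three features named in the text: average frame
    energy, energy standard deviation, and average spectral centroid."""
    frames = [signal[i:i + frame] for i in range(0, len(signal) - frame + 1, hop)]
    energies = np.array([float(np.mean(f ** 2)) for f in frames])
    centroids = []
    for f in frames:
        spec = np.abs(np.fft.rfft(f))
        freqs = np.fft.rfftfreq(frame, d=1.0 / sr)
        centroids.append(float((freqs * spec).sum() / (spec.sum() + 1e-12)))
    return {
        "average_energy": float(energies.mean()),
        "energy_std": float(energies.std()),
        "average_centroid": float(np.mean(centroids)),
    }

t = np.arange(16000) / 16000.0
tone = np.sin(2 * np.pi * 440.0 * t)   # 1 s of a 440 Hz tone
feats = frame_features(tone)
```

For a pure sine the average energy lands near 0.5 (the mean of sin²) and the centroid near the tone frequency, broadened by spectral leakage.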

Step 103: input the emotion features of the singing audio to be identified into the emotion recognition models, and identify the emotion of the singing audio to be identified. Specifically, step 103 includes:

inputting the emotion features of the singing audio to be identified into the first emotion recognition model and the second emotion recognition model respectively, and determining the first coordinate value of the emotion features based on the first coordinate axis and the second coordinate value based on the second coordinate axis;

determining, from the first coordinate value and the second coordinate value, the quadrant corresponding to the emotion features, so as to determine the singing emotion type corresponding to those emotion features.
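The quadrant-to-emotion mapping described above can be sketched as a small lookup; the quadrant assignments follow the correspondence stated earlier (first quadrant tense-anxious, second happy-cheerful, third sad-depressed, fourth natural-calm), while the handling of points lying exactly on an axis is an assumption not specified by the text.

```python
def emotion_from_coordinates(x, y):
    """Map the (first, second) coordinate values onto the four singing
    emotion types, one per quadrant of the planar rectangular
    coordinate system."""
    if x > 0 and y > 0:
        return "tense-anxious"      # first quadrant
    if x < 0 and y > 0:
        return "happy-cheerful"     # second quadrant
    if x < 0 and y < 0:
        return "sad-depressed"      # third quadrant
    if x > 0 and y < 0:
        return "natural-calm"       # fourth quadrant
    return "undetermined"           # on an axis (assumed tie-break)
```

Given the predicted coordinate values Xg and Yg from steps 1034 and 1034', `emotion_from_coordinates(Xg, Yg)` yields the singing emotion type.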

As shown in Fig. 2A, in a feasible embodiment, step 103 obtains the first coordinate value as follows.

In step 1030, a first feature matrix based on the first coordinate axis is obtained from the acoustic signal features and musical score features of the singing audio to be identified and the first training matrix based on the first coordinate axis. Specifically, the row (Ag,1…Ag,n Bg,1…Bg,m 0), formed from the acoustic signal features (Ag,1…Ag,n) and musical score features (Bg,1…Bg,m), is appended as the last row of the first training matrix of the X axis to obtain the first feature matrix. The first training matrix is determined in advance from the acoustic signal features and musical score features of the singing audio to be trained and from the first training coordinate values, on the first coordinate axis, of the emotion features of the singing audio to be trained.

In step 1032, the first feature matrix is normalized to obtain a first normalization matrix, from which the emotion feature of the singing audio to be identified, normalized against the first training matrix, is obtained. Specifically, the data in the first feature matrix are normalized column by column to obtain the first normalization matrix; the last row of this matrix is then extracted, giving the emotion feature of the singing audio to be identified after normalization by the first training matrix, (agx,1…agx,n bgx,1…bgx,m).

In step 1034, the normalized emotion feature of the singing audio to be identified, the first training hyperplane and the first emotion recognition model based on the first coordinate axis are substituted into the SVM algorithm, obtaining the first coordinate value of the emotion feature of the singing audio to be identified in the direction of the first coordinate axis. Specifically, (agx,1…agx,n bgx,1…bgx,m), the first training hyperplane and the first emotion recognition model TX of the X axis are substituted into the SVM algorithm to obtain the first coordinate value Xg, in the X-axis direction, of the emotion feature (Ag,1…Ag,n Bg,1…Bg,m) of the singing audio to be identified. Here api denotes the pi-th acoustic signal feature and bqi the qi-th musical score feature of the training data, p1…pi∈[1,n], q1…qi∈[1,m], where n is the number of acoustic signal features, m is the number of musical score features and L is the number of training songs. The first training hyperplane is determined from the first training normalization matrix obtained by normalizing the first training matrix; the first training matrix is determined from the first training coordinate values and the emotion features of the singing audio to be trained; and the first emotion recognition model based on the first coordinate axis is determined from the first training hyperplane and the first training normalization matrix.
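Steps 1030-1032 — append the new song's feature row with a placeholder coordinate, normalize jointly with the training matrix, and keep only the last row — can be sketched as follows; the tiny training matrix and the per-column min-max scaling are illustrative assumptions.

```python
import numpy as np

def normalize_columns(mat):
    """Per-column linear scaling into [-1, 1], constant columns mapped safely."""
    lo, hi = mat.min(axis=0), mat.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return 2.0 * (mat - lo) / span - 1.0

# Hypothetical first training matrix: rows of (n+m) features plus an X column.
train = np.array([[0.2, 1.5, 3.0, -0.5],
                  [0.8, 2.5, 1.0,  0.5],
                  [0.5, 2.0, 2.0,  0.0]])
# Features of the audio to be identified, with 0 in the coordinate slot.
new_song = np.array([0.5, 2.0, 2.0, 0.0])

# Step 1030: append the new row; step 1032: normalize, keep the last row.
stacked = np.vstack([train, new_song])
norm_last = normalize_columns(stacked)[-1]
feature_vector = norm_last[:-1]   # drop the placeholder coordinate column
```

`feature_vector` plays the role of (agx,1…agx,n bgx,1…bgx,m), which step 1034 would pass to the trained X-axis model to predict Xg.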

As shown in Fig. 2B, in a feasible embodiment, step 103 obtains the second coordinate value as follows.

In step 1030', a second feature matrix based on the second coordinate axis is obtained from the acoustic signal features and musical score features of the singing audio to be identified and the second training matrix based on the second coordinate axis. Specifically, the row (Ag,1…Ag,n Bg,1…Bg,m 0), formed from the acoustic signal features (Ag,1…Ag,n) and musical score features (Bg,1…Bg,m), is appended as the last row of the second training matrix of the Y axis to obtain the second feature matrix. The second training matrix is determined in advance from the acoustic signal features and musical score features of the singing audio to be trained and from the second training coordinate values, on the second coordinate axis, of the emotion features of the singing audio to be trained.

In step 1032', the second feature matrix is normalized to obtain a second normalization matrix, from which the emotion feature of the singing audio to be identified, normalized against the second training matrix, is obtained. Specifically, the data in the second feature matrix are normalized column by column to obtain the second normalization matrix; the last row of this matrix is then extracted, giving the emotion feature of the singing audio to be identified after normalization by the second training matrix, (agy,1…agy,n bgy,1…bgy,m).

In step 1034', the normalized emotion feature of the singing audio to be identified, the second training hyperplane and the second emotion recognition model based on the second coordinate axis are substituted into the SVM algorithm, obtaining the second coordinate value of the emotion feature of the singing audio to be identified in the direction of the second coordinate axis. Specifically, (agy,1…agy,n bgy,1…bgy,m), the second training hyperplane and the second emotion recognition model TY of the Y axis are substituted into the SVM algorithm to obtain the second coordinate value Yg, in the Y-axis direction, of the emotion feature (Ag,1…Ag,n Bg,1…Bg,m) of the singing audio to be identified. Here ari denotes the ri-th acoustic signal feature and bsi the si-th musical score feature of the training data, r1…ri∈[1,n], s1…si∈[1,m], where n is the number of acoustic signal features, m is the number of musical score features and L is the number of training songs. The second training hyperplane is determined from the second training normalization matrix obtained by normalizing the second training matrix; the second training matrix is determined from the second training coordinate values and the emotion features of the singing audio to be trained; and the second emotion recognition model based on the second coordinate axis is determined from the second training hyperplane and the second training normalization matrix.

It should be appreciated that steps 1030 and 1030' need not be executed in any particular order and may be performed in parallel. The same holds for steps 1032 and 1032', and for steps 1034 and 1034'.

In the embodiments of the present application, the first coordinate axis and the second coordinate axis form a planar rectangular coordinate system whose quadrants are in one-to-one correspondence with the singing emotion types. The embodiments determine, from the acoustic signal features and musical score features of the singing audio to be identified, the coordinate values of its emotion feature on the first and second coordinate axes respectively, and then determine from the first and second coordinate values the singing emotion type corresponding to that emotion feature. The singing emotion type can thus be recognized from a singer's own singing audio, i.e. the singer's emotion is identified from the singing itself.

In addition, the emotion features extracted in this embodiment differ, in terms of feature extraction, from those used in speech emotion recognition and music emotion recognition: speech emotion recognition only extracts audio features such as tone and speech rate and does not involve musical score features; music emotion recognition extracts both audio and musical score features, but does not involve spectrogram features (included here among the acoustic signal features). Compared with existing speech emotion recognition and music emotion recognition, this embodiment, using an emotion recognition model trained on emotion features that include both acoustic signal features and musical score features, can recognize the singing emotion of the corresponding singer from those features; for the same song it can recognize a different singing emotion for each different singer, identifying the singer's emotion more precisely.

Embodiment five

Referring to Fig. 4 to Fig. 5B, further optional flow diagrams of the emotion recognition model establishment method of the embodiment of the present application are shown. The present application may be applied to a terminal device, or to an emotion recognition model establishment apparatus, which may typically be placed in the terminal device by way of software, hardware, or a combination of the two. The following description takes the terminal device as the executing agent; with reference to the methods shown in Fig. 4 to Fig. 5B, the implementation may be as follows.

In step 400, singing audio samples to be trained are obtained and classified by emotion according to preset emotion types, determining multiple singing audio subsamples to be trained corresponding to the emotion types; the emotional factors used to determine the emotion types include a stress factor and an energy factor.

In this step, a large number of singing audio recordings containing various emotions are collected as the training singing audio samples. The singing audio should be pure vocals as far as possible, and the musical score of each sung song is collected at the same time. This step may be performed by the terminal device obtaining the collected singing audio directly from local storage, a storage device, or the network.

After the singing audio has been collected, the terminal device can classify it by emotion according to the preset emotion types. Specifically, the collected singing audio can be classified according to the classification standard of music professionals, or music professionals can be asked directly to classify it according to their own standard. The classification standard of the music professionals can be as follows: first, the emotion categories are determined; then each music professional listens to each singing recording once and annotates its emotion. When most of the professionals consider that the current recording belongs to a certain emotion, the recording is assigned to the directory of that emotion; otherwise the recording is discarded. All singing recordings are classified in this way. Note that the emotion within one recording may change, for example the singing emotion of the prelude and of the climax may differ; in that case the music professionals should split the recording into several segments with independent emotions, so that the emotion within each segment is consistent, and the musical score of the corresponding song should likewise be annotated and segmented so that score segments correspond one-to-one to the audio segments.

After the terminal device has performed the above emotion classification, a plurality of training singing audio sub-samples corresponding to the emotion types can be determined. Specifically, the singing audio is classified by emotion with the same number of recordings in each emotion class, and the musical scores of the songs are classified at the same time so that the classified scores correspond one-to-one to the audio.

In step 402, the affective features of each training singing audio sub-sample are extracted, and the affective features of all training singing audio sub-samples are normalized on the pressure dimension and the energy dimension respectively, correspondingly obtaining normalized pressure affective features and normalized energy affective features.

In step 404, SVM training is performed on the normalized pressure affective features and the normalized energy affective features respectively, correspondingly obtaining a pressure index for determining the magnitude of the pressure factor and an energy index for determining the level of the energy factor.

In step 406, SVM training is performed on the normalized pressure affective features and the pressure index to obtain a first emotion recognition model for determining the pressure factor; SVM training is performed on the normalized energy affective features and the energy index to obtain a second emotion recognition model for determining the energy factor.

Those skilled in the art will understand that, in the above methods of the specific embodiments of the present application, the numbering of the steps does not imply an order of execution; the execution order of the steps should be determined by their functions and internal logic, and does not constitute any limitation on the implementation of the specific embodiments of the present application.

In the embodiments of the present application, steps 400-406 yield a first emotion recognition model capable of determining the pressure factor and a second emotion recognition model capable of determining the energy factor, so that a terminal device executing this method, or another terminal device that indirectly invokes it, can determine the pressure factor and the energy factor of singing audio to be recognized according to the first and second emotion recognition models, and in turn determine the singing emotion type of that audio. The embodiments of the present application can thus recognize a singer's singing emotion type from the singer's singing audio, identifying the singer's emotion from the singing itself.

In an optional embodiment, the two principal factors affecting musical emotion are pressure and energy. Since the pressure factor and the energy factor correspond well to acoustic features, music can be divided, according to the strength of the pressure factor (valence), along a range from anxious to happy, and, according to the strength of the energy factor (arousal), along a range from vigorous to calm. Corresponding to the four regions into which a two-dimensional rectangular coordinate system is divided, music can be divided into the following four classes: tense/anxious, happy/cheerful, sad/depressed, natural/calm. As shown in Fig. 5A, the pressure (valence) dimension can be represented by a first coordinate axis and the energy (arousal) dimension by a second coordinate axis, where the first and second coordinate axes form a plane rectangular coordinate system whose quadrants correspond one-to-one to the singing emotion types. The first coordinate axis can be the X-axis and the second coordinate axis the Y-axis. The singing emotion types corresponding to the quadrants of the plane rectangular coordinate system are: tense/anxious, happy/cheerful, sad/depressed, and natural/calm, with the correspondence: first quadrant, tense/anxious; second quadrant, happy/cheerful; third quadrant, sad/depressed; fourth quadrant, natural/calm.
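The quadrant-to-emotion mapping described above can be sketched in a few lines. The function name and the English class labels are illustrative assumptions, and boundary points (coordinate values of exactly 0) are assigned arbitrarily here, since the text does not specify how they are handled:

```python
def emotion_from_coordinates(x: float, y: float) -> str:
    """Map a (pressure/valence, energy/arousal) pair to a singing emotion type.

    x: value on the first (pressure) coordinate axis
    y: value on the second (energy) coordinate axis
    """
    if x >= 0 and y >= 0:      # first quadrant
        return "tense/anxious"
    if x < 0 and y >= 0:       # second quadrant
        return "happy/cheerful"
    if x < 0 and y < 0:        # third quadrant
        return "sad/depressed"
    return "natural/calm"      # fourth quadrant

print(emotion_from_coordinates(0.7, 0.4))   # tense/anxious
```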

Based on the above optional embodiment, as shown in Fig. 5B, step 402 may be implemented as follows.

Step 4021: determine a first training coordinate value and a second training coordinate value corresponding to the acoustic signal features and musical score features of the training singing audio; the affective features can include acoustic signal features and musical score features.

The acoustic signal features of the singing audio of each emotion category are extracted category by category, and the musical score features of the corresponding songs are extracted at the same time. It should be understood that, unlike the features extracted in speech emotion recognition and music emotion recognition, the features extracted here comprise the following. The acoustic signal features of the singing audio cover: average energy, energy standard deviation, average fundamental frequency, fundamental frequency standard deviation, number of frames exceeding the average fundamental frequency, average spectral centroid, spectral centroid standard deviation, MFCC features, and spectrogram features. The musical score features cover: beats per minute, key type, mode, average pitch, pitch standard deviation, and average duration of each note. The acoustic signal features and musical score features of the singing audio are both extracted from the same segment of the same song: whichever song, and whichever segment of it, was sung, the features of that segment of the score are extracted accordingly.
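Of the acoustic features listed above, the energy-based ones need nothing beyond framing and summation. A minimal pure-Python sketch follows; the function names, frame sizes, and toy signal are illustrative assumptions, and fundamental-frequency, MFCC, and spectrogram features would in practice come from a dedicated audio library rather than code like this:

```python
import math

def frame_energies(samples, frame_len=256, hop=128):
    """Short-time energy of each frame of a mono sample sequence."""
    return [sum(s * s for s in samples[i:i + frame_len])
            for i in range(0, max(len(samples) - frame_len + 1, 1), hop)]

def energy_features(samples):
    """Average energy and energy standard deviation, two of the acoustic
    signal features listed above."""
    e = frame_energies(samples)
    mean = sum(e) / len(e)
    var = sum((v - mean) ** 2 for v in e) / len(e)
    return mean, math.sqrt(var)

# toy signal: a quiet half followed by a loud half
sig = [0.1 * math.sin(0.1 * n) for n in range(2048)] + \
      [0.9 * math.sin(0.1 * n) for n in range(2048)]
mean_e, std_e = energy_features(sig)
print(mean_e > 0 and std_e > 0)  # True
```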

Specifically, the acoustic signal features Ai,j and musical score features Bi,k of the training singing audio are extracted, where Ai,j denotes the value of the j-th acoustic signal feature of the i-th training singing recording, 1≤j≤n, n being the total number of acoustic signal features, and Bi,k denotes the value of the k-th musical score feature of the i-th training singing recording, 1≤k≤m, m being the total number of musical score features.

In step 4021, the first and second training coordinate values corresponding to the acoustic signal features and musical score features of the training singing audio are determined. Here, the first training coordinate value Xi denotes the coordinate value on the first coordinate axis annotated by the music professionals for the i-th training singing recording, and the second training coordinate value Yi denotes the coordinate value on the second coordinate axis annotated by the music professionals for the i-th training singing recording; the feature vector of the i-th training singing recording is then (Ai,1 … Ai,n Bi,1 … Bi,m Xi Yi). Xi and Yi can directly take the coordinate values annotated in advance by the music professionals. After the features of all L songs have been extracted, an L×(n+m+2) matrix is formed whose i-th row is (Ai,1 … Ai,n Bi,1 … Bi,m Xi Yi).
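Assembling the per-song feature rows into the L×(n+m+2) matrix can be sketched as follows; the function name and the toy feature values are illustrative, not taken from the patent:

```python
def build_training_matrix(acoustic, score, x_coords, y_coords):
    """Each row: the i-th song's n acoustic features, m score features,
    and the two expert-annotated coordinate values (Xi, Yi).

    acoustic: L rows of n values; score: L rows of m values.
    """
    assert len(acoustic) == len(score) == len(x_coords) == len(y_coords)
    return [a + b + [x, y]
            for a, b, x, y in zip(acoustic, score, x_coords, y_coords)]

M = build_training_matrix(
    acoustic=[[0.2, 0.5], [0.8, 0.1]],   # L=2 songs, n=2 acoustic features
    score=[[120.0], [90.0]],             # m=1 score feature (e.g. BPM)
    x_coords=[0.7, -0.4], y_coords=[0.3, -0.6])
print(len(M), len(M[0]))  # 2 5  -> L rows, n+m+2 columns
```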

Step 4022: determine a first training matrix based on the first coordinate axis according to the affective features and first training coordinate values of all training singing audio sub-samples; determine a second training matrix based on the second coordinate axis according to the affective features and second training coordinate values of all training singing audio sub-samples.

Specifically, the first training matrix based on the X-axis is the L×(n+m+1) matrix whose i-th row is (Ai,1 … Ai,n Bi,1 … Bi,m Xi), and the second training matrix based on the Y-axis is the L×(n+m+1) matrix whose i-th row is (Ai,1 … Ai,n Bi,1 … Bi,m Yi).

Step 4023: normalize the first training matrix and the second training matrix respectively, correspondingly obtaining a first training normalization matrix and a second training normalization matrix. Here, the first training normalization matrix represents the normalized pressure affective features, and the second training normalization matrix represents the normalized energy affective features.

Specifically, the data in the first training matrix based on the X-axis are normalized column by column so that each column spans [-1, 1] (for example, by linearly scaling each column between its minimum and maximum). The first training normalization matrix after normalization is the L×(n+m+1) matrix whose i-th row is (ai,1 … ai,n bi,1 … bi,m xi), where:

ai,j∈[-1,1], bi,k∈[-1,1], xi∈[-1,1], j∈[1,n], k∈[1,m], i∈[1,L].

Similarly, applying the same normalization to the second training matrix based on the Y-axis yields the second training normalization matrix, the L×(n+m+1) matrix whose i-th row is (ai,1 … ai,n bi,1 … bi,m yi), where:

ai,j∈[-1,1], bi,k∈[-1,1], yi∈[-1,1], j∈[1,n], k∈[1,m], i∈[1,L].
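The column-wise scaling of the training matrices to [-1, 1] can be sketched as a min-max normalization — an assumption, since the text only fixes the target range, not the exact formula. Constant columns are mapped to 0 here, an edge case the text does not address:

```python
def normalize_columns(matrix):
    """Scale each column of a row-major matrix linearly to [-1, 1]."""
    cols = list(zip(*matrix))
    scaled = []
    for col in cols:
        lo, hi = min(col), max(col)
        if hi == lo:
            scaled.append([0.0] * len(col))   # constant column: assumed 0
        else:
            scaled.append([2 * (v - lo) / (hi - lo) - 1 for v in col])
    return [list(row) for row in zip(*scaled)]

N = normalize_columns([[1.0, 10.0], [3.0, 20.0], [2.0, 30.0]])
print(N[0])  # [-1.0, -1.0]
```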

Based on the above optional embodiment, step 404 is specifically: substitute the first training normalization matrix and the second training normalization matrix into the SVM algorithm respectively, correspondingly obtaining a first training hyperplane based on the first coordinate axis and a second training hyperplane based on the second coordinate axis; the first training hyperplane is used to determine the magnitude of the pressure factor, and the second training hyperplane is used to determine the level of the energy factor. Specifically, the first training normalization matrix of the X-axis is substituted into the SVM algorithm, which seeks a hyperplane in the X direction that separates, as far as possible, the rows with xi greater than 0 from those with xi less than 0. The hyperplane obtained is composed of a subset of the acoustic signal features and musical score features: the hyperplane of the X-axis involves the features ap1 … api of the acoustic signal features and bq1 … bqi of the musical score features, where api is the pi-th acoustic signal feature, bqi is the qi-th musical score feature, and p1 … pi∈[1,n]; q1 … qi∈[1,m]. Similarly, the hyperplane of the Y-axis involves the features ar1 … ari of the acoustic signal features and bs1 … bsi of the musical score features, where ari is the ri-th acoustic signal feature, bsi is the si-th musical score feature, and r1 … ri∈[1,n]; s1 … si∈[1,m].
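The hyperplane-seeking step can be illustrated with a minimal sub-gradient linear SVM in pure Python. This is a stand-in for the unspecified "SVM algorithm" of the text (a real system would use an SVM library), and the toy data are invented:

```python
import random

def train_linear_svm(rows, labels, lr=0.05, lam=0.01, epochs=200, seed=0):
    """Sub-gradient descent on L2-regularized hinge loss; labels are +/-1.
    Returns the weight vector and bias of the separating hyperplane."""
    random.seed(seed)
    w, b = [0.0] * len(rows[0]), 0.0
    idx = list(range(len(rows)))
    for _ in range(epochs):
        random.shuffle(idx)
        for i in idx:
            margin = labels[i] * (sum(wj * xj for wj, xj in zip(w, rows[i])) + b)
            if margin < 1:  # inside the margin: hinge-loss sub-gradient step
                w = [wj - lr * (lam * wj - labels[i] * xj)
                     for wj, xj in zip(w, rows[i])]
                b += lr * labels[i]
            else:           # outside the margin: regularization only
                w = [wj - lr * lam * wj for wj in w]
    return w, b

# toy set: the label is the sign of feature 0; feature 1 is noise
rows = [[1.0, 0.3], [0.8, -0.2], [-0.9, 0.1], [-1.1, -0.4]]
labels = [1, 1, -1, -1]
w, b = train_linear_svm(rows, labels)
preds = [1 if sum(wj * xj for wj, xj in zip(w, r)) + b >= 0 else -1
         for r in rows]
print(preds)  # [1, 1, -1, -1]
```

The features with the largest |w| components would be the ones "composing" the hyperplane in the sense used above.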

Step 406 is specifically: substitute the first training hyperplane and the first training matrix into the SVM algorithm to obtain a first emotion recognition model for determining a first coordinate recognition value; substitute the second training hyperplane and the second training matrix into the SVM algorithm to obtain a second emotion recognition model for determining a second coordinate recognition value. Here, the first coordinate recognition value represents the pressure factor and the second coordinate recognition value represents the energy factor. According to the X-axis hyperplane obtained, involving the features ap1 … api, bq1 … bqi with p1 … pi∈[1,n] and q1 … qi∈[1,m], the corresponding feature columns of the first training matrix are brought into the SVM algorithm to obtain the training model of the X-axis, denoted TX. The training model of the Y-axis, denoted TY, is obtained in the same way.
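The final step — fitting a model TX that maps a normalized feature row to a coordinate value — can be sketched with a simple gradient-descent linear regressor standing in for the SVM-based training (an assumption; the text does not specify which SVM variant is used, and all names and data are illustrative):

```python
def train_coordinate_model(rows, targets, lr=0.05, epochs=2000):
    """Stochastic gradient descent on squared error: learns w, b such that
    w.x + b approximates the annotated coordinate value for each row."""
    w, b = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(rows, targets):
            pred = sum(wj * xj for wj, xj in zip(w, x)) + b
            err = pred - t
            w = [wj - lr * err * xj for wj, xj in zip(w, x)]
            b -= lr * err
    return lambda x: sum(wj * xj for wj, xj in zip(w, x)) + b

# toy data: the annotated X coordinate happens to equal half of feature 0
rows = [[-1.0, 0.2], [0.0, -0.1], [1.0, 0.4]]
T_X = train_coordinate_model(rows, [-0.5, 0.0, 0.5])
print(round(T_X([1.0, 0.4]), 2))  # 0.5
```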

In the above optional embodiment, the first and second coordinate axes form a plane rectangular coordinate system whose quadrants correspond one-to-one to the singing emotion types. The embodiments of the present application can obtain a first emotion recognition model capable of determining the first coordinate value and a second emotion recognition model capable of determining the second coordinate value, so that a terminal device executing this method, or another terminal device that indirectly invokes it, can determine the singing emotion type corresponding to the singing audio to be recognized according to the first and second coordinate values, recognizing a singer's singing emotion type from the singer's singing audio and identifying the singer's emotion from the singing itself.

The embodiments of the present application can obtain the first emotion recognition model TX for determining the first coordinate value and the second emotion recognition model TY for determining the second coordinate value, so that a terminal device executing this method, or another terminal device that indirectly invokes it, can determine the singing emotion type corresponding to the singing audio to be recognized according to the first and second coordinate values, recognizing a singer's singing emotion type from the singer's singing audio and identifying the singer's emotion from the singing itself.

The affective features extracted in this embodiment differ, in terms of feature extraction, from those used in speech emotion recognition and music emotion recognition: speech emotion recognition only needs to extract audio features such as tone and speech rate and does not involve the extraction of musical score features; music emotion recognition extracts both audio features and musical score features, but does not involve the extraction of spectrogram features (included here among the acoustic signal features). Therefore, compared with existing speech emotion recognition and music emotion recognition, the emotion recognition model of this embodiment, obtained from affective features that include both acoustic signal features and musical score features, can identify the singing emotion of the corresponding singer from the score features and the acoustic signal features; for the same song, it can recognize the singing emotion of different singers, identifying the singer's emotion more accurately.

Embodiment six

Based on the preceding embodiments, Fig. 6A to Fig. 6C show flowcharts of another optional singing emotion recognition method of the embodiments of the present application. The application can be applied to a terminal device, and can also be applied to an emotion recognition model establishing apparatus; the apparatus can be arranged in the terminal device in the form of software, hardware, or a combination of software and hardware. The following description takes a terminal device as the execution subject; the method shown in this embodiment can be implemented as follows.

In this embodiment, the pressure dimension is represented by a first coordinate axis and the energy dimension by a second coordinate axis; the first and second coordinate axes form a plane rectangular coordinate system whose quadrants correspond one-to-one to the emotion types. Specifically, the first coordinate axis can be the X-axis and the second coordinate axis the Y-axis. The singing emotion types corresponding to the quadrants of the plane rectangular coordinate system are: tense/anxious, happy/cheerful, sad/depressed, and natural/calm, with the correspondence: first quadrant, tense/anxious; second quadrant, happy/cheerful; third quadrant, sad/depressed; fourth quadrant, natural/calm.

Step 600: extract the affective features of the singing audio to be recognized, where the affective features can include acoustic signal features and musical score features.

Step 602: determine, according to the affective features and a first emotion recognition model, the pressure factor of the affective features on the pressure dimension; determine, according to the affective features and a second emotion recognition model, the energy factor of the affective features on the energy dimension; the pressure factor and the energy factor are used to determine the emotion type. In this embodiment, the first and second emotion recognition models are established as in the preceding embodiments; see Embodiment five for the specific model establishing process.

Specifically, the coordinate values of the affective features of the singing audio to be recognized on the first and second coordinate axes are determined according to its acoustic signal features and musical score features, obtaining a first coordinate value characterizing the pressure factor and a second coordinate value characterizing the energy factor.

As shown in Fig. 6B, in one feasible embodiment, step 602 obtains the first coordinate value as follows.

Step 6020: obtain a first feature matrix based on the first coordinate axis according to the acoustic signal features and musical score features of the singing audio to be recognized and the first training matrix based on the first coordinate axis. Specifically, the row (Ag,1 … Ag,n Bg,1 … Bg,m 0), formed from the acoustic signal features (Ag,1 … Ag,n) and musical score features (Bg,1 … Bg,m), is appended as the last row of the first training matrix based on the X-axis, yielding the first feature matrix. The first training matrix is determined from the acoustic signal features and musical score features of the preset training singing audio and the first training coordinate values of the affective features of the training singing audio on the first coordinate axis.

In step 6022, the first feature matrix is normalized to obtain a first normalization matrix, from which the matrix of the affective features of the singing audio to be recognized, normalized against the first training matrix, is obtained. Specifically, the data in the first feature matrix are normalized column by column to obtain the first normalization matrix, and the last row of that matrix is then extracted, yielding the normalized affective feature matrix (agx,1 … agx,n bgx,1 … bgx,m) of the singing audio to be recognized.

In step 6024, the normalized affective feature matrix of the singing audio to be recognized, the first training hyperplane based on the first coordinate axis, and the first emotion recognition model are substituted into the SVM algorithm, obtaining the first coordinate value of the affective features of the singing audio to be recognized in the direction of the first coordinate axis. Specifically, (agx,1 … agx,n bgx,1 … bgx,m), the first training hyperplane, and the first emotion recognition model TX of the X-axis are substituted into the SVM algorithm to obtain the first coordinate value Xg of the affective features (Ag,1 … Ag,n Bg,1 … Bg,m) of the singing audio to be recognized in the X direction; here api is the pi-th acoustic signal feature of the training data, bqi the qi-th musical score feature, p1 … pi∈[1,n], q1 … qi∈[1,m], n is the number of acoustic signal features, m the number of musical score features, and L the number of training songs. The first training hyperplane is determined from the first training normalization matrix obtained by normalizing the first training matrix; the first training matrix is determined from the first training coordinate values and the affective features of the training singing audio; and the training model based on the first coordinate axis is determined from the first training hyperplane and the first training normalization matrix.
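Steps 6020-6024 for the X-axis amount to: append the unknown song's feature row (with a placeholder in the coordinate column) to the first training matrix, normalize column by column, and take the last row as the input to the trained model. A sketch under those assumptions — the min-max scaling and all names are illustrative:

```python
def normalized_test_row(training_matrix, test_features):
    """Append (test_features, 0) to the training matrix, scale each column
    of the joint matrix to [-1, 1], and return the normalized test row
    without the placeholder coordinate column."""
    joint = [row[:] for row in training_matrix] + [test_features + [0.0]]
    cols = list(zip(*joint))
    out = []
    for col in cols:
        lo, hi = min(col), max(col)
        out.append([0.0] * len(col) if hi == lo
                   else [2 * (v - lo) / (hi - lo) - 1 for v in col])
    last = [c[-1] for c in out]   # the appended test row, normalized
    return last[:-1]              # drop the placeholder coordinate column

train = [[0.2, 120.0, 0.7], [0.8, 90.0, -0.4]]   # n+m features + X coordinate
row = normalized_test_row(train, [0.5, 100.0])
print(row)  # approximately [0.0, -0.333]
```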

As shown in Fig. 6C, in one feasible embodiment, step 602 obtains the second coordinate value as follows.

Step 6020': obtain a second feature matrix based on the second coordinate axis according to the acoustic signal features and musical score features of the singing audio to be recognized and the second training matrix based on the second coordinate axis. Specifically, the row (Ag,1 … Ag,n Bg,1 … Bg,m 0), formed from the acoustic signal features (Ag,1 … Ag,n) and musical score features (Bg,1 … Bg,m), is appended as the last row of the second training matrix based on the Y-axis, yielding the second feature matrix. The second training matrix is determined from the acoustic signal features and musical score features of the preset training singing audio and the second training coordinate values of the affective features of the training singing audio on the second coordinate axis.

In step 6022', the second feature matrix is normalized to obtain a second normalization matrix, from which the matrix of the affective features of the singing audio to be recognized, normalized against the second training matrix, is obtained. Specifically, the data in the second feature matrix are normalized column by column to obtain the second normalization matrix, and the last row of that matrix is then extracted, yielding the normalized affective feature matrix (agy,1 … agy,n bgy,1 … bgy,m) of the singing audio to be recognized.

In step 6024', the normalized affective feature matrix of the singing audio to be recognized, the second training hyperplane based on the second coordinate axis, and the second emotion recognition model are substituted into the SVM algorithm, obtaining the second coordinate value of the affective features of the singing audio to be recognized in the direction of the second coordinate axis. Specifically, (agy,1 … agy,n bgy,1 … bgy,m), the second training hyperplane, and the second emotion recognition model TY of the Y-axis are substituted into the SVM algorithm to obtain the second coordinate value Yg of the affective features (Ag,1 … Ag,n Bg,1 … Bg,m) of the singing audio to be recognized in the Y direction; here ari is the ri-th acoustic signal feature of the training data, bsi the si-th musical score feature, r1 … ri∈[1,n], s1 … si∈[1,m], n is the number of acoustic signal features, m the number of musical score features, and L the number of training songs. The second training hyperplane is determined from the second training normalization matrix obtained by normalizing the second training matrix; the second training matrix is determined from the second training coordinate values and the affective features of the training singing audio; and the training model based on the second coordinate axis is determined from the second training hyperplane and the second training normalization matrix.

It should be understood that steps 6020 and 6020' have no fixed execution order and can be performed in parallel; likewise for steps 6022 and 6022', and for steps 6024 and 6024'.

Step 604: determine, according to the pressure factor and the energy factor, the singing emotion type corresponding to the singing audio to be recognized. Specifically, the singing emotion type corresponding to the affective features of the singing audio to be recognized is determined according to the first and second coordinate values.

In the embodiments of the present application, a terminal device executing this method, or another terminal device that indirectly invokes it, can determine the pressure factor and the energy factor of the singing audio to be recognized according to the first and second emotion recognition models, and in turn determine the singing emotion type corresponding to that audio. The embodiments of the present application can thus recognize a singer's singing emotion type from the singer's singing audio, identifying the singer's emotion from the singing itself.

In the embodiments of the present application, the first and second coordinate axes form a plane rectangular coordinate system whose quadrants correspond one-to-one to the singing emotion types. The embodiments of the present application determine, from the acoustic signal features and musical score features of the singing audio to be recognized, the coordinate values of its affective features on the first and second coordinate axes, and determine the singing emotion type corresponding to those affective features according to the first and second coordinate values, recognizing a singer's singing emotion type from the singer's singing audio and identifying the singer's emotion from the singing itself.

Embodiment seven

Referring to Fig. 7, which shows a singing emotion recognition method provided by the embodiments of the present application: the application can be applied to a terminal device, and can also be applied to an emotion recognition model establishing apparatus; the apparatus can be arranged in the terminal device in the form of software, hardware, or a combination of software and hardware. The following description takes a terminal device as the execution subject; the method shown in Fig. 7 can be implemented as follows.

Step 700: obtain the user's singing audio. In this step, the terminal device may obtain the singing audio sung by the user directly from local storage, a storage device, or the network.

Step 702: when the emotion type corresponding to the user's singing audio is recognized to be consistent with a preset music emotion, output a corresponding singing output control instruction.

In this step, the emotion type of the user's singing audio can be recognized by speech emotion recognition and music emotion recognition, or by the method of any one of the preceding Embodiments one to six.

In an optional embodiment, the singing output control instruction includes at least one of: a singing bonus-point control instruction and a lighting control instruction. For example, when a user is singing at a KTV and the KTV equipment recognizes that the emotion type corresponding to the user's singing audio is consistent with the preset music emotion (assuming the preset music emotion is happy/cheerful), a singing bonus-point control instruction is output so that bonus points are added to the singing score displayed by the KTV equipment. As another example, when the KTV equipment recognizes that the emotion type corresponding to the user's singing audio is consistent with the preset music emotion (assuming the preset music emotion is sad/depressed), a lighting control instruction is output to perform lighting control of the lighting devices connected to the KTV equipment; specifically, the connected lighting devices can be controlled to output blue light, to embody a sad, dejected scene.
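The control logic of this step can be sketched as follows; the instruction encodings and function name are hypothetical, with only the bonus-point and blue-light behaviors taken from the examples above:

```python
def output_control(recognized: str, preset: str):
    """Return a control instruction when the recognized emotion matches the
    preset music emotion, else None. Instruction dicts are illustrative."""
    if recognized != preset:
        return None
    if preset == "happy/cheerful":
        return {"type": "score_bonus"}                    # bonus-point control
    if preset == "sad/depressed":
        return {"type": "light_control", "color": "blue"} # blue light, per text
    return {"type": "light_control", "color": "default"}  # assumed fallback

print(output_control("sad/depressed", "sad/depressed"))
# {'type': 'light_control', 'color': 'blue'}
```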

Embodiment eight

Referring to Fig. 8, this embodiment provides a singing emotion recognition apparatus, including:

a training module 800, configured to extract the affective features of training singing audio and train an emotion recognition model; the affective features include acoustic signal features and musical score features;

an extraction module 801, configured to extract the affective features of singing audio to be recognized;

a recognition module 802, configured to input the affective features of the singing audio to be recognized into the emotion recognition model and recognize the emotion of the singing audio to be recognized.

Optionally, the acoustic signal features include at least one of the following: average energy, energy standard deviation, average fundamental frequency, fundamental frequency standard deviation, number of frequencies exceeding the average fundamental frequency, average spectral centroid, spectral centroid standard deviation, MFCC features, and spectrogram features. The musical score features include at least one of the following: beats per minute, major/minor key type, mode, average pitch, pitch standard deviation, and average duration of each note.
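A few of the listed acoustic signal features can be sketched with plain NumPy. The frame length, the synthetic test tone, and the exact formulas here are assumptions for illustration; a real system would add fundamental-frequency, MFCC, and musical score features on top of this:

```python
import numpy as np

# Minimal sketch of three of the acoustic features named above: average
# energy, energy standard deviation, and average spectral centroid, computed
# over non-overlapping frames of a mono signal.

def frame_features(x, sr, frame_len=1024):
    frames = [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, frame_len)]
    energies = [np.mean(f ** 2) for f in frames]
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    centroids = []
    for f in frames:
        mag = np.abs(np.fft.rfft(f))
        centroids.append(np.sum(freqs * mag) / (np.sum(mag) + 1e-12))
    return {
        "avg_energy": float(np.mean(energies)),
        "energy_std": float(np.std(energies)),
        "avg_centroid": float(np.mean(centroids)),
    }

sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 437.5 * t)  # 437.5 Hz is exactly bin-centred for 1024-sample frames
feats = frame_features(tone, sr)
print(feats)  # avg_centroid sits at 437.5 Hz, avg_energy at 0.5 for this pure tone
```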

Optionally, the training module includes:

a training coordinate value determining unit for determining the training coordinate values of the affective features of the singing audio to be trained on a first coordinate axis and a second coordinate axis, obtaining a first training coordinate value and a second training coordinate value, where the first coordinate axis and the second coordinate axis form a plane rectangular coordinate system whose quadrants correspond one-to-one to singing emotion types;

a training matrix determining unit for building a first training matrix from the first training coordinate value and the affective features of the singing audio to be trained, and building a second training matrix from the second training coordinate value and the affective features of the singing audio to be trained;

a training normalization matrix determining unit for normalizing the first training matrix into a first training normalization matrix and normalizing the second training matrix into a second training normalization matrix;

a training hyperplane determining unit for substituting the first training normalization matrix and the second training normalization matrix into an SVM algorithm, obtaining a first training hyperplane and a second training hyperplane respectively; and

an emotion recognition model determining unit for substituting the first training hyperplane and the first training normalization matrix into the SVM algorithm to obtain a first emotion recognition model based on the first coordinate axis, and substituting the second training hyperplane and the second training normalization matrix into the SVM algorithm to obtain a second emotion recognition model based on the second coordinate axis.
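The per-axis training idea above (one max-margin linear separator per coordinate axis, so that each model's decision value gives the sign of one coordinate) can be sketched as follows. A tiny hinge-loss sub-gradient routine stands in for a full SVM solver here; this substitution, the synthetic data, and all parameter values are assumptions for illustration, not the patent's exact algorithm:

```python
import numpy as np

# One linear classifier is trained per coordinate axis on the (normalized)
# feature matrix. The sign of each decision value then supplies the sign of
# the corresponding coordinate.

def train_axis(X, y, epochs=300, lr=0.05, lam=0.01):
    """Hinge-loss sub-gradient descent; stands in for an SVM solver."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) < 1.0:        # inside the margin: push out
                w += lr * (yi * xi - lam * w)
                b += lr * yi
            else:                               # only shrink (regularization)
                w -= lr * lam * w
    return w, b

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 2))     # 80 synthetic training songs, 2 features
y_x = np.sign(X[:, 0])           # label for the "X-axis" model
y_y = np.sign(X[:, 1])           # label for the "Y-axis" model
wx, bx = train_axis(X, y_x)
wy, by = train_axis(X, y_y)

sample = np.array([0.8, -0.9])
coords = (np.sign(sample @ wx + bx), np.sign(sample @ wy + by))
print(coords)  # the signs of the two decision values pick the quadrant
```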

Optionally, the identification module includes:

an input unit for inputting the affective features of the singing audio to be identified into the first emotion recognition model and the second emotion recognition model respectively, and determining a first coordinate value of the affective features based on the first coordinate axis and a second coordinate value based on the second coordinate axis; and

a determining unit for determining, from the first coordinate value and the second coordinate value, the quadrant to which the affective features correspond, so as to determine the singing emotion type corresponding to the affective features.
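The quadrant lookup performed by the determining unit reduces to a sign test on the two coordinate values. The sketch below hard-codes the quadrant-to-emotion mapping stated later in this embodiment (first quadrant tense/anxious, second happy/cheerful, third sad/dejected, fourth natural/calm); the label strings are illustrative:

```python
# Map the pair of coordinate values to a quadrant, and the quadrant to a
# singing emotion type, one-to-one as described in this embodiment.

def quadrant_emotion(x, y):
    if x > 0 and y > 0:
        return "tense-anxious"      # first quadrant
    if x < 0 and y > 0:
        return "happy-cheerful"     # second quadrant
    if x < 0 and y < 0:
        return "sad-dejected"       # third quadrant
    return "natural-calm"           # fourth quadrant

print(quadrant_emotion(-0.4, 0.7))  # → happy-cheerful
```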

Optionally, the determining unit is specifically configured to:

build a first feature matrix from the first training matrix and the affective features of the singing audio to be identified; normalize the first feature matrix to obtain a first normalization matrix, thereby obtaining the affective features of the singing audio to be identified as normalized against the first training matrix; and substitute the normalized affective features, the first training hyperplane, and the first emotion recognition model into the SVM algorithm to obtain the first coordinate value of the affective features in the direction of the first coordinate axis.

Optionally, the first coordinate axis is the X axis, and the determining unit is specifically configured to:

append the row vector (Ag,1 … Ag,n Bg,1 … Bg,m 0), formed from the acoustic signal features (Ag,1 … Ag,n) and the musical score features (Bg,1 … Bg,m), as the last row of the X-axis-based first training matrix, obtaining the first feature matrix;

normalize the data of the first feature matrix column by column to obtain the first normalization matrix, then extract the last row of that matrix, obtaining the affective features of the singing audio to be identified as normalized against the first training matrix, (agx,1 … agx,n bgx,1 … bgx,m); and

substitute (agx,1 … agx,n bgx,1 … bgx,m), the first training hyperplane, and the first emotion recognition model T_X of the X axis into the SVM algorithm, obtaining the first coordinate value X_g of the affective features (Ag,1 … Ag,n Bg,1 … Bg,m) in the X-axis direction. The first training matrix has one row per training song, L rows in total, each row containing the p_i-th acoustic signal features and the q_i-th musical score features of a training song, with p_1 … p_i ∈ [1, n] and q_1 … q_i ∈ [1, m], where n is the number of acoustic signal features and m is the number of musical score features.
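The append-normalize-extract step described above can be sketched as follows. Column-wise min-max scaling is an assumption here, since the text does not fix the normalization formula:

```python
import numpy as np

# The feature row of the audio to be identified is appended as the last row
# of the training matrix, every column is normalized, and the last row is
# read back as the normalized feature vector.

def normalize_against_training(train_matrix, feature_row):
    stacked = np.vstack([train_matrix, feature_row])   # append as last row
    lo, hi = stacked.min(axis=0), stacked.max(axis=0)
    normed = (stacked - lo) / np.where(hi > lo, hi - lo, 1.0)
    return normed[-1]                                  # normalized last row

train = np.array([[0.0, 10.0], [4.0, 30.0], [8.0, 50.0]])
row = np.array([2.0, 40.0])
print(normalize_against_training(train, row))  # → [0.25 0.75]
```

Normalizing the query row jointly with the training matrix, rather than against stored statistics, matches the description above: the training data define the scale of every column, and the appended row is rescaled onto it.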

Optionally, the determining unit is further specifically configured to:

build a second feature matrix from the second training matrix and the affective features of the singing audio to be identified; normalize the second feature matrix to obtain a second normalization matrix, thereby obtaining the affective features of the singing audio to be identified as normalized against the second training matrix; and substitute the normalized affective features, the second training hyperplane, and the second emotion recognition model into the SVM algorithm to obtain the second coordinate value of the affective features in the direction of the second coordinate axis.

Optionally, the second coordinate axis is the Y axis, and the determining unit is specifically configured to:

append the row vector (Ag,1 … Ag,n Bg,1 … Bg,m 0), formed from the acoustic signal features (Ag,1 … Ag,n) and the musical score features (Bg,1 … Bg,m), as the last row of the Y-axis-based second training matrix, obtaining the second feature matrix;

normalize the data of the second feature matrix column by column to obtain the second normalization matrix, then extract the last row of that matrix, obtaining the affective features of the singing audio to be identified as normalized against the second training matrix, (agy,1 … agy,n bgy,1 … bgy,m); and

substitute (agy,1 … agy,n bgy,1 … bgy,m), the second training hyperplane, and the second emotion recognition model T_Y of the Y axis into the SVM algorithm, obtaining the second coordinate value Y_g of the affective features (Ag,1 … Ag,n Bg,1 … Bg,m) in the Y-axis direction. The second training matrix has one row per training song, L rows in total, each row containing the r_i-th acoustic signal features and the s_i-th musical score features of a training song, with r_1 … r_i ∈ [1, n] and s_1 … s_i ∈ [1, m], where n is the number of acoustic signal features and m is the number of musical score features.

Optionally, the singing emotion types corresponding to the quadrants of the plane rectangular coordinate system include: tense and anxious, happy and cheerful, sad and dejected, and naturally calm.

Optionally, the correspondence between the quadrants of the plane rectangular coordinate system and the singing emotion types is as follows:

the first quadrant corresponds to tense and anxious, the second quadrant to happy and cheerful, the third quadrant to sad and dejected, and the fourth quadrant to naturally calm.

The features of this device embodiment correspond one-to-one to those of the method in the foregoing embodiments, and the relevant modules/units can carry out the corresponding method flows in the foregoing embodiments; reference may therefore be made to the relevant description of the method flows in the foregoing embodiments, which is not repeated here.

An embodiment of the present application also provides an electronic terminal that includes the singing emotion recognition device provided by the foregoing embodiments. The features of this device embodiment correspond one-to-one to those of the method in the foregoing embodiments, so reference may be made to the relevant description of the method flows in the foregoing embodiments, which is not repeated here.

Embodiment 9

Referring to Fig. 9, this embodiment provides a singing recognition device, comprising:

an acquisition module 901 for acquiring the user's singing audio; and

an identification module 902 for outputting the corresponding singing output control instruction when the affective type corresponding to the user's singing audio is recognized as matching the preset music emotion.

Optionally, the singing output control instruction includes at least one of the following: a singing score-bonus control instruction and a lighting control instruction.

The features of this device embodiment correspond one-to-one to those of the method in the foregoing embodiments, and the relevant modules/units can carry out the corresponding method flows in the foregoing embodiments; reference may therefore be made to the relevant description of the method flows in the foregoing embodiments, which is not repeated here.

Referring to Figure 10, an embodiment of the present application also provides an electronic terminal, including:

a memory 1000;

one or more processors 1003; and

one or more modules 1001, stored in the memory and configured to be controlled by the one or more processors, the one or more modules being configured to perform instructions for the following steps:

extracting the affective features of singing audio to be trained and training an emotion recognition model, where the affective features include acoustic signal features and musical score features;

extracting the affective features of singing audio to be identified; and

inputting the affective features of the singing audio to be identified into the emotion recognition model to identify the emotion of the singing audio to be identified.

In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface, and memory.

The memory may take the form of volatile memory on a computer-readable medium, random access memory (RAM), and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media exclude transitory media such as modulated data signals and carrier waves.

Certain terms are used throughout the description and claims to refer to particular components. Those skilled in the art should understand that hardware manufacturers may refer to the same component by different names. This description and the claims distinguish components not by differences in name but by differences in function. "Comprising", as used throughout the description and claims, is an open-ended term and should be interpreted as "including but not limited to". "Substantially" means within an acceptable error range, within which a person skilled in the art can solve the stated technical problem and basically achieve the stated technical effect. Furthermore, "coupled" here encompasses any means of direct or indirect electrical coupling; thus, if a first device is described as being coupled to a second device, the first device may be directly electrically coupled to the second device or indirectly electrically coupled to it through other devices or coupling means. The subsequent description sets out preferred embodiments for implementing the application; it is given for the purpose of illustrating the general principles of the application and is not intended to limit the scope of the application, whose scope of protection is defined by the appended claims.

It should also be noted that the terms "include", "comprise", and any variants thereof are intended to cover a non-exclusive inclusion, so that a product or system that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a product or system. In the absence of further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the product or system that includes it.

The above illustrates and describes some preferred embodiments of the present invention, but, as stated earlier, the invention is not limited to the forms disclosed herein; these are not to be taken as excluding other embodiments, and the invention may be used in various other combinations, modifications, and environments and may be altered within the scope of the inventive concept described herein, through the above teachings or the skill and knowledge of the related art. All changes and modifications made by those skilled in the art that do not depart from the spirit and scope of the invention shall fall within the protection scope of the appended claims of the invention.

Claims (17)

1. A singing emotion recognition method, characterized by comprising:
extracting the affective features of singing audio to be trained, and training to obtain an emotion recognition model, wherein the affective features include acoustic signal features and musical score features;
extracting the affective features of singing audio to be identified; and
inputting the affective features of the singing audio to be identified into the emotion recognition model to identify the emotion of the singing audio to be identified.
2. The singing emotion recognition method according to claim 1, characterized in that the acoustic signal features include at least one of the following: average energy, energy standard deviation, average fundamental frequency, fundamental frequency standard deviation, number of frequencies exceeding the average fundamental frequency, average spectral centroid, spectral centroid standard deviation, MFCC features, and spectrogram features; and the musical score features include at least one of the following: beats per minute, major/minor key type, mode, average pitch, pitch standard deviation, and average duration of each note.
3. The singing emotion recognition method according to claim 1, characterized in that the training of the emotion recognition model comprises:
determining the training coordinate values of the affective features of the singing audio to be trained on a first coordinate axis and a second coordinate axis, obtaining a first training coordinate value and a second training coordinate value, wherein the first coordinate axis and the second coordinate axis form a plane rectangular coordinate system whose quadrants correspond one-to-one to singing emotion types;
building a first training matrix from the first training coordinate value and the affective features of the singing audio to be trained, and building a second training matrix from the second training coordinate value and the affective features of the singing audio to be trained;
normalizing the first training matrix into a first training normalization matrix, and normalizing the second training matrix into a second training normalization matrix;
substituting the first training normalization matrix and the second training normalization matrix into an SVM algorithm, obtaining a first training hyperplane and a second training hyperplane respectively; and
substituting the first training hyperplane and the first training normalization matrix into the SVM algorithm to obtain a first emotion recognition model based on the first coordinate axis, and substituting the second training hyperplane and the second training normalization matrix into the SVM algorithm to obtain a second emotion recognition model based on the second coordinate axis.
4. The singing emotion recognition method according to claim 3, characterized in that inputting the affective features of the singing audio to be identified into the emotion recognition model and identifying the emotion of the singing audio to be identified comprises:
inputting the affective features of the singing audio to be identified into the first emotion recognition model and the second emotion recognition model respectively, and determining a first coordinate value of the affective features based on the first coordinate axis and a second coordinate value based on the second coordinate axis; and
determining, from the first coordinate value and the second coordinate value, the quadrant to which the affective features correspond, so as to determine the singing emotion type corresponding to the affective features.
5. The singing emotion recognition method according to claim 4, characterized in that inputting the affective features of the singing audio to be identified into the emotion recognition model and determining the first coordinate value of the affective features based on the first coordinate axis comprises:
building a first feature matrix from the first training matrix and the affective features of the singing audio to be identified; normalizing the first feature matrix to obtain a first normalization matrix, thereby obtaining the affective features of the singing audio to be identified as normalized against the first training matrix; and substituting the normalized affective features, the first training hyperplane, and the first emotion recognition model into the SVM algorithm to obtain the first coordinate value of the affective features in the direction of the first coordinate axis.
6. The singing emotion recognition method according to claim 4, characterized in that inputting the affective features of the singing audio to be identified into the emotion recognition model and determining the second coordinate value of the affective features based on the second coordinate axis comprises:
building a second feature matrix from the second training matrix and the affective features of the singing audio to be identified; normalizing the second feature matrix to obtain a second normalization matrix, thereby obtaining the affective features of the singing audio to be identified as normalized against the second training matrix; and substituting the normalized affective features, the second training hyperplane, and the second emotion recognition model into the SVM algorithm to obtain the second coordinate value of the affective features in the direction of the second coordinate axis.
7. The singing emotion recognition method according to claim 3, characterized in that the singing emotion types corresponding to the quadrants of the plane rectangular coordinate system include: tense and anxious, happy and cheerful, sad and dejected, and naturally calm; and the correspondence between the quadrants of the plane rectangular coordinate system and the singing emotion types is: the first quadrant corresponds to tense and anxious, the second quadrant to happy and cheerful, the third quadrant to sad and dejected, and the fourth quadrant to naturally calm.
8. A singing emotion recognition device, characterized by comprising:
a training module for extracting the affective features of singing audio to be trained and training to obtain an emotion recognition model, wherein the affective features include acoustic signal features and musical score features;
an extraction module for extracting the affective features of singing audio to be identified; and
an identification module for inputting the affective features of the singing audio to be identified into the emotion recognition model and identifying the emotion of the singing audio to be identified.
9. The singing emotion recognition device according to claim 8, characterized in that the acoustic signal features include at least one of the following: average energy, energy standard deviation, average fundamental frequency, fundamental frequency standard deviation, number of frequencies exceeding the average fundamental frequency, average spectral centroid, spectral centroid standard deviation, MFCC features, and spectrogram features; and the musical score features include at least one of the following: beats per minute, major/minor key type, mode, average pitch, pitch standard deviation, and average duration of each note.
10. The singing emotion recognition device according to claim 8, characterized in that the training module includes:
a training coordinate value determining unit for determining the training coordinate values of the affective features of the singing audio to be trained on a first coordinate axis and a second coordinate axis, obtaining a first training coordinate value and a second training coordinate value, wherein the first coordinate axis and the second coordinate axis form a plane rectangular coordinate system whose quadrants correspond one-to-one to singing emotion types;
a training matrix determining unit for building a first training matrix from the first training coordinate value and the affective features of the singing audio to be trained, and building a second training matrix from the second training coordinate value and the affective features of the singing audio to be trained;
a training normalization matrix determining unit for normalizing the first training matrix into a first training normalization matrix and normalizing the second training matrix into a second training normalization matrix;
a training hyperplane determining unit for substituting the first training normalization matrix and the second training normalization matrix into an SVM algorithm, obtaining a first training hyperplane and a second training hyperplane respectively; and
an emotion recognition model determining unit for substituting the first training hyperplane and the first training normalization matrix into the SVM algorithm to obtain a first emotion recognition model based on the first coordinate axis, and substituting the second training hyperplane and the second training normalization matrix into the SVM algorithm to obtain a second emotion recognition model based on the second coordinate axis.
11. The singing emotion recognition device according to claim 10, characterized in that the identification module includes:
an input unit for inputting the affective features of the singing audio to be identified into the first emotion recognition model and the second emotion recognition model respectively, and determining a first coordinate value of the affective features based on the first coordinate axis and a second coordinate value based on the second coordinate axis; and
a determining unit for determining, from the first coordinate value and the second coordinate value, the quadrant to which the affective features correspond, so as to determine the singing emotion type corresponding to the affective features.
12. The singing emotion recognition device according to claim 11, characterized in that the determining unit is specifically configured to: build a first feature matrix from the first training matrix and the affective features of the singing audio to be identified; normalize the first feature matrix to obtain a first normalization matrix, thereby obtaining the affective features of the singing audio to be identified as normalized against the first training matrix; and substitute the normalized affective features, the first training hyperplane, and the first emotion recognition model into the SVM algorithm to obtain the first coordinate value of the affective features in the direction of the first coordinate axis.
13. The singing emotion recognition device according to claim 11, characterized in that the determining unit is specifically configured to: build a second feature matrix from the second training matrix and the affective features of the singing audio to be identified; normalize the second feature matrix to obtain a second normalization matrix, thereby obtaining the affective features of the singing audio to be identified as normalized against the second training matrix; and substitute the normalized affective features, the second training hyperplane, and the second emotion recognition model into the SVM algorithm to obtain the second coordinate value of the affective features in the direction of the second coordinate axis.
14. The singing emotion recognition device according to claim 10, characterized in that the singing emotion types corresponding to the quadrants of the plane rectangular coordinate system include: tense and anxious, happy and cheerful, sad and dejected, and naturally calm; and the correspondence between the quadrants of the plane rectangular coordinate system and the singing emotion types is: the first quadrant corresponds to tense and anxious, the second quadrant to happy and cheerful, the third quadrant to sad and dejected, and the fourth quadrant to naturally calm.
15. A singing recognition method, characterized by comprising:
acquiring a user's singing audio; and
outputting a corresponding singing output control instruction when the affective type corresponding to the user's singing audio is recognized as matching a preset music emotion.
16. The singing recognition method according to claim 15, characterized in that the singing output control instruction includes at least one of the following: a singing score-bonus control instruction and a lighting control instruction.
17. A singing recognition device, characterized by comprising:
an acquisition module for acquiring a user's singing audio; and
an identification module for outputting a corresponding singing output control instruction when the affective type corresponding to the user's singing audio is recognized as matching a preset music emotion, wherein the singing output control instruction includes at least one of the following: a singing score-bonus control instruction and a lighting control instruction.
CN201610517375.4A 2016-06-30 2016-07-02 A kind of performance emotion identification method and device CN106128479B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201610506767 2016-06-30
CN2016105067670 2016-06-30

Publications (2)

Publication Number Publication Date
CN106128479A true CN106128479A (en) 2016-11-16
CN106128479B CN106128479B (en) 2019-09-06

Family

ID=57468267

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610517375.4A CN106128479B (en) 2016-06-30 2016-07-02 A kind of performance emotion identification method and device

Country Status (1)

Country Link
CN (1) CN106128479B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108174498A (en) * 2017-12-28 2018-06-15 福建海媚数码科技有限公司 A kind of control method and system of the scene lamp based on intelligent Matching
CN108899046A (en) * 2018-07-12 2018-11-27 东北大学 A kind of speech-emotion recognition method and system based on Multistage Support Vector Machine classification
CN108986843A (en) * 2018-08-10 2018-12-11 杭州网易云音乐科技有限公司 Audio data processing method and device, medium and calculating equipment
CN109120992A (en) * 2018-09-13 2019-01-01 北京金山安全软件有限公司 Video generation method and its device, electronic equipment, storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH10187178A (en) * 1996-10-28 1998-07-14 Omron Corp Feeling analysis device for singing and grading device
CN1975856A (en) * 2006-10-30 2007-06-06 邹采荣 Speech emotion identifying method based on supporting vector machine
CN101599271A (en) * 2009-07-07 2009-12-09 华中科技大学 A kind of recognition methods of digital music emotion
US8283549B2 (en) * 2006-09-08 2012-10-09 Panasonic Corporation Information processing terminal and music information generating method and program
US20140172431A1 (en) * 2012-12-13 2014-06-19 National Chiao Tung University Music playing system and music playing method based on speech emotion recognition
CN106132040A (en) * 2016-06-20 2016-11-16 科大讯飞股份有限公司 Sing lamp light control method and the device of environment

Also Published As

Publication number Publication date
CN106128479B (en) 2019-09-06

Similar Documents

Publication Publication Date Title
Anagnostopoulos et al. Features and classifiers for emotion recognition from speech: a survey from 2000 to 2011
Giannoulis et al. A database and challenge for acoustic scene classification and event detection
Alías et al. A review of physical and perceptual feature extraction techniques for speech, music and environmental sounds
Tzanetakis et al. Marsyas: A framework for audio analysis
CN102881284B (en) Speaker-independent speech and emotion recognition method and system
Zhang et al. Speech emotion recognition using deep convolutional neural network and discriminant temporal pyramid matching
Schmidt et al. Feature selection for content-based, time-varying musical emotion regression
Barchiesi et al. Acoustic scene classification: Classifying environments from the sounds they produce
Downie The music information retrieval evaluation exchange (2005–2007): A window into music information retrieval research
CN107610707B (en) Voiceprint recognition method and device
Li et al. Separation of singing voice from music accompaniment for monaural recordings
CN104732978B (en) Text-dependent speaker recognition method based on combined deep learning
Aucouturier et al. The bag-of-frames approach to audio pattern recognition: A sufficient model for urban soundscapes but not for polyphonic music
Kotti et al. Speaker-independent emotion recognition exploiting a psychologically-inspired binary cascade classification schema
Miller et al. Perceptual space for musical structures
Pons et al. Timbre analysis of music audio signals with convolutional neural networks
Zhang et al. Hierarchical classification of audio data for archiving and retrieving
CN102799899B (en) Hierarchical and generalized recognition method for special audio events based on SVM (Support Vector Machine) and GMM (Gaussian Mixture Model)
TWI297486B (en) Intelligent classification of sound signals with application and method
Koolagudi et al. IITKGP-SESC: speech database for emotion analysis
CN104167208B (en) Speaker recognition method and device
Bergstra et al. Aggregate features and AdaBoost for music classification
Livingstone et al. Changing musical emotion: A computational rule system for modifying score and performance
CN101261832B (en) Extraction and modeling method for Chinese speech emotion information
Nwe et al. Exploring vibrato-motivated acoustic features for singer identification

Legal Events

Date Code Title Description
PB01 Publication
C06 Publication
SE01 Entry into force of request for substantive examination
C10 Entry into substantive examination
CB03 Change of inventor or designer information

Inventor after: Cai Zhili

Inventor after: Li Hongfu

Inventor before: Cai Zhili

GR01 Patent grant