CN106128479A - Singing emotion recognition method and device - Google Patents
Singing emotion recognition method and device
- Publication number: CN106128479A
- Application number: CN201610517375.4A
- Authority: CN (China)
- Prior art keywords: training, coordinate, matrix, audio, singing
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/63—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
Abstract
This application discloses a singing emotion recognition method and device. Emotional features of training singing audio are extracted and used to train an emotion recognition model, the emotional features comprising acoustic signal features and score features; the emotional features of singing audio to be identified are then extracted and input into the emotion recognition model to identify the emotion of the singing audio to be identified. Compared with existing speech emotion recognition and music emotion recognition, the emotion recognition model obtained from emotional features comprising both acoustic signal features and score features can identify the singing emotion of the particular singer from the score features and acoustic signal features: for the same song, the emotion of each singer's rendition can be identified, so the singer's emotion is recognized more precisely.
Description
Technical field
This application belongs to the field of emotion recognition, and in particular relates to a singing emotion recognition method and device.
Background art
At present the emotion recognition of audio falls into two branches, speech emotion recognition and music emotion recognition; recognizing emotion from singing has not been addressed by anyone and remains a difficult point of audio emotion recognition. Singing differs from both. First, speech emotion recognition judges emotion from pitch and speech rate, but singing follows the pitch and tempo prescribed by the song, so identifying the emotion in singing from pitch and speech rate is infeasible. The patent application No. 200510046169.1, filed 2005-03-31, "Speech recognition analysis system and service method", extracts the sound frequencies of humans in interpersonal communication and, taking sound emotion degree and sound affinity degree as its technical basis, derives speech recognition and analysis grounded in the field of sensory science. The sound emotion degree reads a speaker's character and grasps the speaker's mental state at the time from the pitch and notes of the voice; the sound affinity analyzes the low-frequency sounds driven directly by the human lungs and thereby reveals the speaker's true emotion. But in a singing scenario, the singing follows the pitch and tempo prescribed by the song, so identifying the singer's emotion from pitch and notes as in that publication is infeasible. Second, music emotion recognition judges emotion mainly from audio features and score features, so the judged emotion is fixed for a song; yet every singer interprets a song in his or her own way, and for the same song the interpretations of different singers differ, so music emotion recognition cannot accurately recognize the emotion of a particular rendition from the singer's actual singing.
In summary, singing emotion recognition is a new field entirely different from speech emotion recognition and music emotion recognition, and the prior art provides no solution for identifying the singer's emotion from the singing.
Summary of the invention
In view of this, the technical problem to be solved by this application is to provide a singing emotion recognition method and device capable of identifying the singer's emotion from the singing.
To solve the above technical problem, this application discloses a singing emotion recognition method, comprising:
extracting the emotional features of training singing audio, and training an emotion recognition model, the emotional features comprising acoustic signal features and score features;
extracting the emotional features of singing audio to be identified;
inputting the emotional features of the singing audio to be identified into the emotion recognition model, and identifying the emotion of the singing audio to be identified.
To solve the above technical problem, this application also discloses a singing emotion recognition device, comprising:
a training module, configured to extract the emotional features of training singing audio and train an emotion recognition model, the emotional features comprising acoustic signal features and score features;
an extraction module, configured to extract the emotional features of singing audio to be identified;
an identification module, configured to input the emotional features of the singing audio to be identified into the emotion recognition model and identify the emotion of the singing audio to be identified.
To solve the above technical problem, this application also discloses a singing emotion recognition method, comprising:
obtaining singing audio of a user;
when the emotion type corresponding to the user's singing audio is identified as consistent with a preset music emotion, outputting a corresponding singing-result control instruction.
To solve the above technical problem, this application also discloses a singing emotion recognition device, comprising:
an acquisition module, configured to obtain singing audio of a user;
an identification module, configured to output a corresponding singing-result control instruction when the emotion type corresponding to the user's singing audio is identified as consistent with a preset music emotion.
Compared with the prior art, this application can achieve the following technical effects.
The emotional features extracted in the embodiments of this application differ, in terms of feature extraction, from those of speech emotion recognition and music emotion recognition: speech emotion recognition only needs to extract audio features, which are merely pitch, speech rate and the like, and involves no extraction of score features; music emotion recognition does extract audio features and score features, but involves no extraction of spectrogram features (included here among the acoustic signal features). Compared with existing speech emotion recognition and music emotion recognition, the emotion recognition model obtained from emotional features comprising both acoustic signal features and score features can therefore identify the singer's emotion more precisely from score features and acoustic signal features. Specifically, the embodiments extract the emotional features of the training singing audio and train the emotion recognition model, the emotional features comprising acoustic signal features and score features; extract the emotional features of the singing audio to be identified; and input them into the emotion recognition model to identify the emotion of the singing audio to be identified. The embodiments of this application thus recognize the singer's singing emotion type from the singer's own singing audio, identifying the singer's emotion from the singing.
Of course, any product implementing this application need not achieve all of the above technical effects at the same time.
Brief description of the drawings
The drawings described here provide a further understanding of this application and constitute a part of it; the schematic embodiments of this application and their description explain the application and do not unduly limit it. In the drawings:
Fig. 1A is a schematic flowchart of a singing emotion recognition method provided by some embodiments of this application;
Fig. 1B is a schematic flowchart of an emotion-recognition-model establishing method provided by some embodiments of this application;
Fig. 2A is a schematic flowchart of another singing emotion recognition method provided by some embodiments of this application;
Fig. 2B is a schematic flowchart of the singing emotion recognition method of Fig. 2A, provided by some embodiments of this application;
Fig. 3 is a schematic flowchart of yet another singing emotion recognition method provided by some embodiments of this application;
Fig. 4 is a schematic flowchart of another emotion-recognition-model establishing method provided by some embodiments of this application;
Fig. 5A shows a planar rectangular coordinate system formed by the pressure factor and the capacity factor, provided by some embodiments of this application;
Fig. 5B is a partial schematic flowchart of an emotion-recognition-model establishing method provided by some embodiments of this application;
Fig. 6A is a schematic flowchart of a singing emotion recognition method provided by some embodiments of this application;
Fig. 6B is a partial schematic flowchart of a singing emotion recognition method provided by some embodiments of this application;
Fig. 6C is another partial schematic flowchart of a singing emotion recognition method provided by some embodiments of this application;
Fig. 7 is a schematic flowchart of a singing recognition method provided by some embodiments of this application;
Fig. 8 is a schematic structural diagram of a singing emotion recognition device provided by some embodiments of this application;
Fig. 9 is a schematic structural diagram of a singing recognition device provided by some embodiments of this application;
Fig. 10 is a schematic structural diagram of an electronic terminal provided by some embodiments of this application.
Detailed description of the embodiments
The embodiments of this application are described in detail below with reference to the drawings and examples, so that the way this application applies technical means to solve technical problems and achieve its technical effect can be fully understood and carried out accordingly.
Embodiment one
Referring to Fig. 1A, a schematic flowchart of a singing emotion recognition method provided by an embodiment of this application is shown. The method may be applied to a terminal device, or to an emotion-recognition-model establishing device, which may be located in the terminal device in the form of software, hardware, or a combination of the two. The description below takes the terminal device as the executing agent. The method shown in Fig. 1A may be implemented as follows.
Step 100: extract the emotional features of the training singing audio, and train an emotion recognition model; the emotional features comprise acoustic signal features and score features. Optionally, differing from the features extracted in speech emotion recognition and music emotion recognition, the acoustic signal features extracted in this embodiment include at least one of the following: average energy, energy standard deviation, average fundamental frequency, fundamental-frequency standard deviation, the count of frequencies exceeding the average fundamental frequency, average spectral centroid, spectral-centroid standard deviation, MFCC features, and spectrogram features. The score features include at least one of the following: beats per minute, key type, mode, average pitch, pitch standard deviation, and the average duration of each note.
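Purely as an illustration of what an acoustic-signal feature extractor of this kind could look like, the following is a minimal sketch assuming the open-source librosa library; the frame parameters and the F0 search range are assumptions, the spectrogram features are omitted, and the score features (beats per minute, key type, mode, pitch statistics, note durations) would be read from the annotated score rather than computed here.

```python
import numpy as np
import librosa

def acoustic_features(path):
    """Sketch of the step-100 acoustic signal features (librosa assumed)."""
    y, sr = librosa.load(path, sr=None)
    energy = librosa.feature.rms(y=y)[0]                         # frame energies
    f0 = librosa.yin(y, fmin=80, fmax=1000, sr=sr)               # F0 track
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)[0]  # spectral centroid
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return np.concatenate([
        [energy.mean(), energy.std(),        # average energy, energy std
         f0.mean(), f0.std(),                # average F0, F0 std
         float((f0 > f0.mean()).sum()),      # count above the average F0
         centroid.mean(), centroid.std()],   # average centroid and its std
        mfcc.mean(axis=1),                   # MFCC features (per-coefficient means)
    ])
```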
Optionally, as shown in Fig. 1B, the emotion recognition model of this embodiment is trained as follows.
Step 1011: determine the training coordinate values of the emotional features of the training singing audio on the first coordinate axis and the second coordinate axis respectively, obtaining first training coordinate values and second training coordinate values. The first and second coordinate axes form a planar rectangular coordinate system, and the quadrants of the planar rectangular coordinate system correspond one-to-one to singing emotion types.
Step 1012: establish a first training matrix from the first training coordinate values and the emotional features of the training singing audio, and establish a second training matrix from the second training coordinate values and the emotional features of the training singing audio.
Step 1013: normalize the first training matrix into a first training normalization matrix, and normalize the second training matrix into a second training normalization matrix.
Step 1014: substitute the first training normalization matrix and the second training normalization matrix into the SVM algorithm respectively, correspondingly obtaining a first training hyperplane and a second training hyperplane.
Step 1015: substitute the first training hyperplane and the first training normalization matrix into the SVM algorithm to obtain a first emotion recognition model based on the first coordinate axis; substitute the second training hyperplane and the second training normalization matrix into the SVM algorithm to obtain a second emotion recognition model based on the second coordinate axis. The first emotion recognition model is used to determine the first coordinate value of the emotional features of the singing audio to be identified in the direction of the first coordinate axis, and the second emotion recognition model is used to determine the second coordinate value in the direction of the second coordinate axis.
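As a concrete reading of steps 1012 to 1015, the following is a minimal sketch assuming the SVM stage is an off-the-shelf support-vector machine (scikit-learn's SVR used as a stand-in); the patent's two-pass hyperplane-then-model construction is folded into a single fit per coordinate axis.

```python
import numpy as np
from sklearn.svm import SVR

def normalize_columns(M):
    """Step 1013: scale each column of a training matrix into [-1, 1]."""
    lo, hi = M.min(axis=0), M.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # guard constant columns
    return 2.0 * (M - lo) / span - 1.0

def train_axis_model(features, coords):
    """Steps 1012-1015 for one axis: stack the L x (n+m) feature matrix
    with the annotated coordinate column, normalize, and fit an SVM
    regressor mapping normalized features to the axis coordinate."""
    M = normalize_columns(np.column_stack([features, coords]))
    return SVR(kernel="linear").fit(M[:, :-1], M[:, -1])

# model_x = train_axis_model(features, xs)   # first emotion recognition model
# model_y = train_axis_model(features, ys)   # second emotion recognition model
```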
Step 102: extract the emotional features of the singing audio to be identified. As in step 100, the emotional features extracted in step 102 comprise acoustic signal features and score features; optionally, they are the same features listed under step 100, which differ from those extracted by speech emotion recognition and music emotion recognition.
Step 103: input the emotional features of the singing audio to be identified into the emotion recognition models, and identify the emotion of the singing audio to be identified. Specifically, step 103 includes: inputting the emotional features of the singing audio to be identified into the first emotion recognition model and the second emotion recognition model respectively, and determining the first coordinate value of the emotional features based on the first coordinate axis and the second coordinate value based on the second coordinate axis; and determining, from the first and second coordinate values, the quadrant corresponding to the emotional features, so as to determine the singing emotion type corresponding to the emotional features.
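Under the assumptions of the training sketch above, step 103 reduces to evaluating the two axis models and reading off the quadrant; a minimal sketch (the per-axis normalization of the unknown feature row, detailed in Embodiment two, is omitted here):

```python
def recognize(feature_row, model_x, model_y):
    """Step 103: the two axis models place the emotional features in the
    plane; the quadrant of (x, y) selects the singing emotion type."""
    x = model_x.predict([feature_row])[0]   # first coordinate value
    y = model_y.predict([feature_row])[0]   # second coordinate value
    return x, y   # quadrant -> emotion type; see Embodiment two below
```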
The emotional features extracted in this embodiment differ, in terms of feature extraction, from those of speech emotion recognition and music emotion recognition: speech emotion recognition only needs to extract audio features, which are merely pitch, speech rate and the like, and involves no extraction of score features; music emotion recognition does extract audio features and score features, but involves no extraction of spectrogram features (included here among the acoustic signal features). Compared with existing speech emotion recognition and music emotion recognition, the emotion recognition model obtained in this embodiment from emotional features comprising both acoustic signal features and score features can therefore identify, from score features and acoustic signal features, the singing emotion of the particular singer: for the same song, the emotion of each singer's rendition can be identified, recognizing the singer's emotion more precisely. Specifically, this embodiment extracts the emotional features of the training singing audio and trains the emotion recognition model, the emotional features comprising acoustic signal features and score features; extracts the emotional features of the singing audio to be identified; and inputs them into the emotion recognition model to identify the emotion of the singing audio to be identified. The embodiment thus recognizes the singer's singing emotion type from the singer's own singing audio, identifying the singer's emotion from the singing.
Embodiment two
With reference to Figs. 1A to 2B, this embodiment of the application provides a singing emotion recognition method as one feasible implementation of Embodiment one, realized specifically in the following way. Here, the first coordinate axis may be the X axis and the second coordinate axis may be the Y axis.
Step 100 and the model-training steps 1011 to 1015 are performed as described in Embodiment one: the emotional features of the training singing audio, comprising acoustic signal features and score features, are extracted, and the training coordinate values of these emotional features on the first coordinate axis and the second coordinate axis are determined; the two axes form a planar rectangular coordinate system whose quadrants correspond one-to-one to singing emotion types.
Optionally, the singing emotion types corresponding to the quadrants of the planar rectangular coordinate system include: tense and anxious, happy and cheerful, sad and depressed, and natural and calm. The correspondence between the quadrants and the singing emotion types is: the first quadrant corresponds to tense and anxious, the second quadrant to happy and cheerful, the third quadrant to sad and depressed, and the fourth quadrant to natural and calm, as the mapping helper below writes out.
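A hypothetical helper that writes this quadrant-to-label correspondence down directly (the label strings are illustrative):

```python
def quadrant_label(x, y):
    """Map the (x, y) coordinate values to the stated quadrant labels."""
    if x >= 0 and y >= 0:
        return "tense and anxious"    # first quadrant
    if x < 0 and y >= 0:
        return "happy and cheerful"   # second quadrant
    if x < 0 and y < 0:
        return "sad and depressed"    # third quadrant
    return "natural and calm"         # fourth quadrant
```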
Steps 1012 to 1015 then proceed as in Embodiment one: the first and second training matrices are established from the training coordinate values and the emotional features of the training singing audio, normalized into the first and second training normalization matrices, and substituted into the SVM algorithm to obtain the first and second training hyperplanes and, from them, the first emotion recognition model based on the first coordinate axis and the second emotion recognition model based on the second coordinate axis. Step 102 extracts the emotional features of the singing audio to be identified, and step 103 inputs them into the first and second emotion recognition models respectively, determines the first coordinate value based on the first coordinate axis and the second coordinate value based on the second coordinate axis, and determines from them the quadrant, and thus the singing emotion type, corresponding to the emotional features.
As shown in Fig. 2A, in one feasible implementation, step 103 obtains the first coordinate value by the following method.
In step 1030, a first feature matrix based on the first coordinate axis is obtained from the acoustic signal features and score features of the singing audio to be identified and the first training matrix based on the first coordinate axis. Specifically, the row $(A_{g,1}\ \cdots\ A_{g,n}\ \ B_{g,1}\ \cdots\ B_{g,m}\ \ 0)$ formed by the acoustic signal features $(A_{g,1},\dots,A_{g,n})$ and the score features $(B_{g,1},\dots,B_{g,m})$ is appended as the last row of the first training matrix

$$M_X=\begin{pmatrix}A_{1,1}&\cdots&A_{1,n}&B_{1,1}&\cdots&B_{1,m}&X_1\\\vdots&&\vdots&\vdots&&\vdots&\vdots\\A_{L,1}&\cdots&A_{L,n}&B_{L,1}&\cdots&B_{L,m}&X_L\end{pmatrix},$$

yielding the first feature matrix of size $(L+1)\times(n+m+1)$. The first training matrix is preset: it is determined from the acoustic signal features and score features of the training singing audio and the first training coordinate values of their emotional features on the first coordinate axis. In this application, the subscript $g$ in the acoustic signal features and score features denotes the song to be identified, $n$ is the number of acoustic signal features, $m$ is the number of score features, and $L$ is the number of training songs.
In step 1032, the first feature matrix is normalized to obtain a first normalization matrix, from which the emotional-feature row of the singing audio to be identified after normalization against the first training matrix is obtained. Specifically, the data in the first feature matrix are normalized column by column, and the last row of the resulting first normalization matrix is extracted, giving $(a_{gx,1}\ \cdots\ a_{gx,n}\ \ b_{gx,1}\ \cdots\ b_{gx,m})$.
In step 1034, this normalized emotional-feature row, the first training hyperplane and the first emotion recognition model based on the first coordinate axis are substituted into the SVM algorithm, which outputs the first coordinate value of the emotional features of the singing audio to be identified in the direction of the first coordinate axis. Specifically, $(a_{gx,1}\ \cdots\ a_{gx,n}\ b_{gx,1}\ \cdots\ b_{gx,m})$, the first training hyperplane $H_X=(a_{p_1}\ \cdots\ a_{p_i}\ \ b_{q_1}\ \cdots\ b_{q_i})$ and the first emotion recognition model $T_X$ of the X axis are substituted into the SVM algorithm, which yields the first coordinate value $X_g$ of the emotional features $(A_{g,1}\ \cdots\ A_{g,n}\ B_{g,1}\ \cdots\ B_{g,m})$ in the X direction. Here $a_{p_i}$ is the $p_i$-th acoustic signal feature and $b_{q_i}$ the $q_i$-th score feature used in training, with $p_1,\dots,p_i\in[1,n]$ and $q_1,\dots,q_i\in[1,m]$, where $n$ is the number of acoustic signal features, $m$ the number of score features and $L$ the number of training songs. The first training hyperplane is determined from the first training normalization matrix obtained by normalizing the first training matrix; the first training matrix is determined from the first training coordinate values and the emotional features of the training singing audio; and the first emotion recognition model based on the first coordinate axis is determined from the first training hyperplane and the first training normalization matrix.
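The append-normalize-extract manipulation of steps 1030 to 1032 is mechanical; a numpy sketch, assuming the training matrix and feature row follow the layouts above:

```python
import numpy as np

def normalized_unknown_row(training_matrix, feature_row):
    """Steps 1030-1032: append the unknown song's features as the last row
    (0 as the coordinate placeholder), normalize every column to [-1, 1]
    with the same minima/maxima as the training rows, and return the last
    row without the placeholder coordinate column."""
    stacked = np.vstack([training_matrix, np.append(feature_row, 0.0)])
    lo, hi = stacked.min(axis=0), stacked.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    normalized = 2.0 * (stacked - lo) / span - 1.0
    return normalized[-1, :-1]
```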
As shown in Fig. 2B, in one feasible implementation, step 103 obtains the second coordinate value by the following method.
In step 1030', a second feature matrix based on the second coordinate axis is obtained from the acoustic signal features and score features of the singing audio to be identified and the second training matrix based on the second coordinate axis. Specifically, the row $(A_{g,1}\ \cdots\ A_{g,n}\ \ B_{g,1}\ \cdots\ B_{g,m}\ \ 0)$ is appended as the last row of the Y-axis second training matrix

$$M_Y=\begin{pmatrix}A_{1,1}&\cdots&A_{1,n}&B_{1,1}&\cdots&B_{1,m}&Y_1\\\vdots&&\vdots&\vdots&&\vdots&\vdots\\A_{L,1}&\cdots&A_{L,n}&B_{L,1}&\cdots&B_{L,m}&Y_L\end{pmatrix},$$

yielding the second feature matrix. The second training matrix is preset: it is determined from the acoustic signal features and score features of the training singing audio and the second training coordinate values of their emotional features on the second coordinate axis.
In step 1032', the second feature matrix is normalized column by column into a second normalization matrix, and its last row is extracted, giving the emotional-feature row of the singing audio to be identified after normalization against the second training matrix, $(a_{gy,1}\ \cdots\ a_{gy,n}\ \ b_{gy,1}\ \cdots\ b_{gy,m})$.
In step 1034', this row, the second training hyperplane $H_Y=(a_{r_1}\ \cdots\ a_{r_i}\ \ b_{s_1}\ \cdots\ b_{s_i})$ and the second emotion recognition model $T_Y$ of the Y axis are substituted into the SVM algorithm, which yields the second coordinate value $Y_g$ of the emotional features $(A_{g,1}\ \cdots\ A_{g,n}\ B_{g,1}\ \cdots\ B_{g,m})$ in the Y direction. Here $a_{r_i}$ is the $r_i$-th acoustic signal feature and $b_{s_i}$ the $s_i$-th score feature used in training, with $r_1,\dots,r_i\in[1,n]$ and $s_1,\dots,s_i\in[1,m]$, where $n$ is the number of acoustic signal features, $m$ the number of score features and $L$ the number of training songs. The second training hyperplane is determined from the second training normalization matrix obtained by normalizing the second training matrix; the second training matrix is determined from the second training coordinate values and the emotional features of the training singing audio; and the second emotion recognition model based on the second coordinate axis is determined from the second training hyperplane and the second training normalization matrix.
It should be noted that steps 1030 and 1030' have no required execution order and may be performed in parallel; likewise for steps 1032 and 1032', and for steps 1034 and 1034'.
In this embodiment of the application, the first and second coordinate axes form a planar rectangular coordinate system whose quadrants correspond one-to-one to singing emotion types. From the acoustic signal features and score features of the singing audio to be identified, the embodiment determines the coordinate values of its emotional features on the first and second coordinate axes, and from the first and second coordinate values determines the corresponding singing emotion type; the singer's singing emotion type is therefore recognized from the singer's own singing audio, identifying the singer's emotion from the singing.
In addition, the emotional features extracted in this embodiment differ, in terms of feature extraction, from those of speech emotion recognition and music emotion recognition: speech emotion recognition only needs to extract audio features, which are merely pitch, speech rate and the like, and involves no extraction of score features; music emotion recognition does extract audio features and score features, but involves no extraction of spectrogram features (included here among the acoustic signal features). Compared with existing speech emotion recognition and music emotion recognition, the emotion recognition model obtained from emotional features comprising both acoustic signal features and score features can therefore identify, from score features and acoustic signal features, the singing emotion of the particular singer: for the same song, the emotion of each singer's rendition can be identified, recognizing the singer's emotion more precisely.
Embodiment three
Referring to Fig. 3, this embodiment of the application provides a singing emotion recognition method. It is largely the same as Embodiments one and two, and specifically describes how the first emotion recognition model based on the first coordinate axis and the second emotion recognition model based on the second coordinate axis are established, which may be achieved in the following way.
In step 301, the acoustic signal features and score features of the training singing audio are extracted. Specifically, the acoustic signal features $A_{i,j}$ and the score features $B_{i,k}$ of the training singing audio are extracted, where $A_{i,j}$ is the value of the $j$-th acoustic signal feature of the $i$-th piece of training singing audio, $1\le j\le n$ with $n$ the total number of acoustic signal features, and $B_{i,k}$ is the value of the $k$-th score feature of the $i$-th piece of training singing audio, $1\le k\le m$ with $m$ the total number of score features.
In step 302, the first and second training coordinate values corresponding to the acoustic signal features and score features of the training singing audio are determined. Here, the first coordinate axis may be the X axis and the second coordinate axis may be the Y axis. The first training coordinate value $X_i$ is the coordinate on the first axis assigned to the $i$-th piece of training singing audio by music professionals, and the second training coordinate value $Y_i$ is its coordinate on the second axis; the feature row of the $i$-th piece of training singing audio is then $(A_{i,1}\ \cdots\ A_{i,n}\ B_{i,1}\ \cdots\ B_{i,m}\ X_i\ Y_i)$. $X_i$ and $Y_i$ may be taken directly from the coordinate values annotated in advance by music professionals.
In step 303, the first training matrix based on the first coordinate axis and the second training matrix based on the second coordinate axis are determined from the first and second training coordinate values respectively. After the features of all $L$ songs have been extracted, an $L\times(n+m+2)$ matrix

$$\begin{pmatrix}A_{1,1}&\cdots&A_{1,n}&B_{1,1}&\cdots&B_{1,m}&X_1&Y_1\\\vdots&&&&&&&\vdots\\A_{L,1}&\cdots&A_{L,n}&B_{L,1}&\cdots&B_{L,m}&X_L&Y_L\end{pmatrix}$$

is formed. This matrix is split into the first training matrix based on the first coordinate axis,

$$M_X=\begin{pmatrix}A_{1,1}&\cdots&A_{1,n}&B_{1,1}&\cdots&B_{1,m}&X_1\\\vdots&&&&&&\vdots\\A_{L,1}&\cdots&A_{L,n}&B_{L,1}&\cdots&B_{L,m}&X_L\end{pmatrix},$$

and the second training matrix based on the second coordinate axis,

$$M_Y=\begin{pmatrix}A_{1,1}&\cdots&A_{1,n}&B_{1,1}&\cdots&B_{1,m}&Y_1\\\vdots&&&&&&\vdots\\A_{L,1}&\cdots&A_{L,n}&B_{L,1}&\cdots&B_{L,m}&Y_L\end{pmatrix}.$$

As the sketch below shows, the split amounts to dropping one coordinate column.
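Expressed in numpy, under the layouts already stated, the step-303 split is just column selection:

```python
import numpy as np

# Step 303 (sketch): features is the L x (n+m) block of A and B values,
# xs and ys are the annotated coordinates of the L training songs.
M = np.column_stack([features, xs, ys])   # L x (n+m+2) matrix
M_X = np.delete(M, -1, axis=1)            # first training matrix (drop Y column)
M_Y = np.delete(M, -2, axis=1)            # second training matrix (drop X column)
```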
In step 304, the first and second training matrices are normalized separately, giving the first and second training normalization matrices. Specifically, the data in the X-axis first training matrix $M_X$ are normalized column by column so that the value range is $[-1,1]$; the first training normalization matrix after normalization is

$$N_X=\begin{pmatrix}a_{1,1}&\cdots&a_{1,n}&b_{1,1}&\cdots&b_{1,m}&x_1\\\vdots&&&&&&\vdots\\a_{L,1}&\cdots&a_{L,n}&b_{L,1}&\cdots&b_{L,m}&x_L\end{pmatrix},$$

where $a_{i,j}\in[-1,1]$, $b_{i,k}\in[-1,1]$, $x_i\in[-1,1]$, $j\in[1,n]$, $k\in[1,m]$, $i\in[1,L]$. In the same way, applying the same normalization to the Y-axis second training matrix gives the second training normalization matrix $N_Y$, with entries $a_{i,j},b_{i,k},y_i\in[-1,1]$.
In step 305, the first and second training normalization matrices are substituted into the SVM algorithm separately, giving the first training hyperplane based on the first coordinate axis and the second training hyperplane based on the second coordinate axis. Substituting the X-axis first training normalization matrix $N_X$ into the SVM algorithm, the algorithm seeks a hyperplane in the X direction that separates, as well as possible, the rows with $x_i>0$ from those with $x_i<0$. The hyperplane obtained is composed of a subset of the acoustic signal features and score features; let the X-axis hyperplane be $H_X=(a_{p_1}\ \cdots\ a_{p_i}\ \ b_{q_1}\ \cdots\ b_{q_i})$, where $a_{p_i}$ is the $p_i$-th acoustic signal feature and $b_{q_i}$ the $q_i$-th score feature, with $p_1,\dots,p_i\in[1,n]$ and $q_1,\dots,q_i\in[1,m]$. In the same way, the Y-axis hyperplane is $H_Y=(a_{r_1}\ \cdots\ a_{r_i}\ \ b_{s_1}\ \cdots\ b_{s_i})$, where $a_{r_i}$ is the $r_i$-th acoustic signal feature and $b_{s_i}$ the $s_i$-th score feature, with $r_1,\dots,r_i\in[1,n]$ and $s_1,\dots,s_i\in[1,m]$.
In step 306, the first training hyperplane and the first training normalization matrix are substituted into the SVM algorithm to obtain the emotion recognition model based on the first coordinate axis; the second training hyperplane and the second training normalization matrix are substituted into the SVM algorithm to obtain the emotion recognition model based on the second coordinate axis. Substituting the X-axis hyperplane $H_X$ (with $p_1,\dots,p_i\in[1,n]$ and $q_1,\dots,q_i\in[1,m]$) together with the first training normalization matrix $N_X$ into the SVM algorithm yields the X-axis emotion recognition model, denoted $T_X$; the Y-axis emotion recognition model, denoted $T_Y$, is obtained in the same way.
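If the SVM stage is realized with a linear SVM library (an assumption; the patent does not name one), the step-305 hyperplane separating the rows with $x_i>0$ from those with $x_i<0$ can be read off a single fit, and its near-zero weights mark the features the hyperplane effectively drops. A sketch with scikit-learn, reusing normalize_columns and M_X from the sketches above; the patent's separate step-306 pass is not reproduced:

```python
import numpy as np
from sklearn.svm import LinearSVC

# Step 305 (sketch): fit a linear separator between rows with x_i > 0
# and rows with x_i < 0 in the normalized X-axis training matrix.
N_X = normalize_columns(M_X)                  # step 304, as sketched above
clf_x = LinearSVC().fit(N_X[:, :-1], N_X[:, -1] > 0)
w, b = clf_x.coef_[0], clf_x.intercept_[0]    # hyperplane: w . f + b = 0
kept = np.flatnonzero(np.abs(w) > 1e-6)       # indices of features p_i / q_i kept
```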
Embodiment four
With reference to Figs. 1A to 3, this embodiment of the application provides a singing emotion recognition method that generally comprises two processes: (1) establishing the singing emotion recognition model; and (2) recognizing the singing emotion.
(1) Establishing the singing emotion recognition model
This process is mainly used to establish the first emotion recognition model based on the first coordinate axis and the second emotion recognition model based on the second coordinate axis. Establishing the singing emotion recognition model requires collecting in advance a large amount of singing audio covering the various emotions (as the training singing audio); the singing audio should be pure vocals as far as possible, and the music scores of the corresponding songs are collected at the same time.
Music professionals are then engaged to classify the emotions of the collected singing audio. First the emotion categories are fixed; then every piece of singing audio is listened to once by each professional, who annotates its emotion independently. When the majority of the professionals agree that the current piece belongs to a certain emotion, it is filed under that emotion's directory; otherwise the piece is discarded. All the singing audio is classified in this way. It should be noted that within one piece of singing audio the singing emotion may change, for instance between the prelude and the climax; in that case the music professionals should split the piece into segments each carrying a single emotion, so that the audio within every segment is emotionally consistent, and the score of the corresponding song should likewise be segmented by audio content and annotated so that it corresponds one-to-one with the audio segments.
After this process, the singing audio has been classified by emotion, with the same number of pieces in every emotion class, and the song scores have been classified so as to correspond one-to-one with the classified audio.
The acoustic signal features of the singing audio in each emotion category are then extracted category by category, together with the score features of the corresponding songs. It should be understood that, unlike the features extracted in speech emotion recognition and music emotion recognition, the features extracted here comprise the following. The acoustic signal features of the singing audio cover: average energy, energy standard deviation, average fundamental frequency, fundamental-frequency standard deviation, the count of frequencies exceeding the average fundamental frequency, average spectral centroid, spectral-centroid standard deviation, MFCC features, and spectrogram features. The score features cover: beats per minute, key type, mode, average pitch, pitch standard deviation, and the average duration of each note. The acoustic signal features and the score features are both extracted from the same passage of the same song: whichever song the singing audio renders, the features of that music are extracted from the corresponding score. (Note: speech emotion recognition only needs to extract audio features, which are merely pitch, speech rate and the like, and involves no extraction of score features; music emotion recognition does extract audio features and score features, but involves no extraction of spectrogram features. Both therefore differ from this method in terms of feature extraction.)
After this preprocessing, the singing emotion recognition model may be established in the following way. Here, the first and second coordinate axes form a planar rectangular coordinate system whose quadrants correspond one-to-one to singing emotion types; the first coordinate axis may be the X axis and the second the Y axis. The singing emotion types corresponding to the quadrants are: tense and anxious, happy and cheerful, sad and depressed, and natural and calm, with the first quadrant corresponding to tense and anxious, the second quadrant to happy and cheerful, the third quadrant to sad and depressed, and the fourth quadrant to natural and calm.
Specifically, this embodiment divides singing emotions into these four categories, corresponding respectively to the four quadrants of the planar rectangular coordinate system. The emotion type of a sung song is annotated in coordinate form by the music professionals within the extracted emotion-category feature data (the value ranges of the X and Y directions are [-1, 1]; the farther a value lies from the X and Y axes, the more pronounced the corresponding emotion, and the closer to the axes, the weaker it is). The training and recognition algorithm of this embodiment is the SVM algorithm: after the music professionals have annotated the coordinate values of the quadrant in which the singing emotion lies, the singing emotional features and their emotion coordinates are extracted; once all features and coordinate values have been extracted, the X-axis data and the Y-axis data are normalized and fed separately into SVM training. From the training data, SVM derives the optimal hyperplane values of the singing emotion's emotional features on the X and Y axes, and thus the emotion recognition models based on the X axis and the Y axis are obtained, as illustrated by the sketch below.
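Pulling the sketches above together, a hypothetical end-to-end run of this training-and-recognition flow might read as follows; training_songs, xs, ys and song are assumed inputs, and score_features is a stand-in helper that would collect the score features from the annotated, segment-aligned scores.

```python
import numpy as np

# Hypothetical end-to-end flow built from the sketches above.
feats = np.array([np.concatenate([acoustic_features(p), score_features(p)])
                  for p in training_songs])          # L x (n+m) feature block
model_x = train_axis_model(feats, xs)                # X-axis model T_X
model_y = train_axis_model(feats, ys)                # Y-axis model T_Y

unknown = np.concatenate([acoustic_features(song), score_features(song)])
row_x = normalized_unknown_row(np.column_stack([feats, xs]), unknown)
row_y = normalized_unknown_row(np.column_stack([feats, ys]), unknown)
print(quadrant_label(model_x.predict([row_x])[0],
                     model_y.predict([row_y])[0]))
```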
Step 100: extract the emotional features of the training singing audio and train the emotion recognition model, the emotional features comprising acoustic signal features and score features. In combination with Figs. 1A and 3, the emotion recognition model is established exactly as in steps 301 to 306 of Embodiment three: the acoustic signal features $A_{i,j}$ and score features $B_{i,k}$ of the $L$ training songs are extracted, together with the coordinate values $X_i$ and $Y_i$ annotated in advance by the music professionals; the $L\times(n+m+2)$ matrix formed from them is split into the X-axis first training matrix and the Y-axis second training matrix; each training matrix is normalized column by column into $[-1,1]$; the two training normalization matrices are substituted into the SVM algorithm to obtain the X-axis and Y-axis training hyperplanes; and substituting each training hyperplane together with its training normalization matrix into the SVM algorithm yields the first emotion recognition model $T_X$ of the X axis and the second emotion recognition model $T_Y$ of the Y axis. $T_X$ and $T_Y$ are the established singing emotion recognition models.
The first emotion recognition model is used to determine the first coordinate value of the emotional features of the singing audio to be identified in the direction of the first coordinate axis, and the second emotion recognition model is used to determine the second coordinate value in the direction of the second coordinate axis.
(2) Recognizing the singing emotion
Step 102: extract the emotional features of the singing audio to be identified. The emotional features extracted in step 102 comprise acoustic signal features and score features; optionally, they are the same features listed under step 100, which differ from those extracted by speech emotion recognition and music emotion recognition.
Step 103: input the emotional features of the singing audio to be identified into the emotion recognition models, and identify the emotion of the singing audio to be identified. Specifically, step 103 includes: inputting the emotional features of the singing audio to be identified into the first emotion recognition model and the second emotion recognition model respectively, and determining the first coordinate value of the emotional features based on the first coordinate axis and the second coordinate value based on the second coordinate axis; and determining, from the first and second coordinate values, the quadrant corresponding to the emotional features, so as to determine the singing emotion type corresponding to the emotional features.
As shown in Fig. 2A, in a feasible embodiment, step 103 obtains the first coordinate value as follows.
In step 1030, a first feature matrix based on the first coordinate axis is obtained from the sound signal features and music score features of the performance audio to be identified and from the first training matrix based on the first coordinate axis. Specifically, the row vector $(A_{g,1}\dots A_{g,n}\ B_{g,1}\dots B_{g,m}\ 0)$ formed by the sound signal features $(A_{g,1}\dots A_{g,n})$ and the music score features $(B_{g,1}\dots B_{g,m})$ is appended as the last row of the first training matrix based on the X-axis, yielding the first feature matrix. The first training matrix is determined in advance from the sound signal features and music score features of the training performance audio and from the first training coordinate values, on the first coordinate axis, of the emotion features of the training performance audio.

In step 1032, the first feature matrix is normalized to obtain a first normalization matrix, from which the matrix of the emotion features of the performance audio to be identified after normalization by the first training matrix is obtained. Specifically, the data in the first feature matrix are normalized column by column (per feature) to obtain the first normalization matrix; the last row of this matrix is then extracted, giving the normalized emotion feature vector $(a_{gx,1}\dots a_{gx,n}\ b_{gx,1}\dots b_{gx,m})$ of the performance audio to be identified.
In step 1034, the normalized emotion feature vector of the performance audio to be identified, the first training hyperplane, and the first emotion recognition model based on the first coordinate axis are substituted into the SVM algorithm to obtain the first coordinate value, in the direction of the first coordinate axis, of the emotion features of the performance audio to be identified. Specifically, $(a_{gx,1}\dots a_{gx,n}\ b_{gx,1}\dots b_{gx,m})$, the first training hyperplane $F_X(a_{p_1},\dots,a_{p_i},b_{q_1},\dots,b_{q_i})=0$, and the first emotion recognition model $T_X$ of the X-axis are substituted into the SVM algorithm to obtain the first coordinate value $X_g$, in the X direction, of the emotion features $(A_{g,1}\dots A_{g,n}\ B_{g,1}\dots B_{g,m})$ of the performance audio to be identified. Here $a_{p_i}$ is the $p_i$-th sound signal feature, $b_{q_i}$ is the $q_i$-th music score feature, $p_1\dots p_i\in[1,n]$, $q_1\dots q_i\in[1,m]$, $n$ is the number of sound signal features, $m$ is the number of music score features, and $L$ is the number of training songs. The first training hyperplane is determined from the first training normalization matrix obtained by normalizing the first training matrix; the first training matrix is determined from the first training coordinate values and the emotion features of the training performance audio; and the first emotion recognition model based on the first coordinate axis is determined from the first training hyperplane and the first training normalization matrix.
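A minimal sketch of steps 1030 to 1034 under the assumptions above; reading the signed distance to the hyperplane (SVC's decision_function) as the coordinate value $X_g$ is an interpretation of "substituting into the SVM algorithm", not something the patent states.

```python
# Sketch of steps 1030-1034: append the test feature row to the stored
# training matrix, renormalize column-wise to [-1, 1], take the last row,
# and score it with the trained X-axis model T_X. Names are illustrative.
import numpy as np

def first_coordinate(train_matrix, test_features, t_x):
    # train_matrix: L x (n+m+1) raw matrix (features plus the X_i column);
    # test_features: length-(n+m) raw features of the audio to identify.
    stacked = np.vstack([train_matrix,
                         np.append(test_features, 0.0)])          # step 1030
    lo, hi = stacked.min(axis=0), stacked.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)                        # avoid /0
    normed = 2.0 * (stacked - lo) / span - 1.0                    # step 1032
    test_row = normed[-1, :-1]            # drop the placeholder coordinate
    return t_x.decision_function([test_row])[0]                   # step 1034
```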
As shown in Fig. 2B, in a feasible embodiment, step 103 obtains the second coordinate value as follows.

In step 1030', a second feature matrix based on the second coordinate axis is obtained from the sound signal features and music score features of the performance audio to be identified and from the second training matrix based on the second coordinate axis. Specifically, the row vector $(A_{g,1}\dots A_{g,n}\ B_{g,1}\dots B_{g,m}\ 0)$ formed by the sound signal features $(A_{g,1}\dots A_{g,n})$ and the music score features $(B_{g,1}\dots B_{g,m})$ is appended as the last row of the second training matrix based on the Y-axis, yielding the second feature matrix. The second training matrix is determined in advance from the sound signal features and music score features of the training performance audio and from the second training coordinate values, on the second coordinate axis, of the emotion features of the training performance audio.

In step 1032', the second feature matrix is normalized to obtain a second normalization matrix, from which the matrix of the emotion features of the performance audio to be identified after normalization by the second training matrix is obtained. Specifically, the data in the second feature matrix are normalized column by column to obtain the second normalization matrix; the last row of this matrix is then extracted, giving the normalized emotion feature vector $(a_{gy,1}\dots a_{gy,n}\ b_{gy,1}\dots b_{gy,m})$ of the performance audio to be identified.

In step 1034', the normalized emotion feature vector of the performance audio to be identified, the second training hyperplane, and the second emotion recognition model based on the second coordinate axis are substituted into the SVM algorithm to obtain the second coordinate value, in the direction of the second coordinate axis, of the emotion features of the performance audio to be identified. Specifically, $(a_{gy,1}\dots a_{gy,n}\ b_{gy,1}\dots b_{gy,m})$, the second training hyperplane $F_Y(a_{r_1},\dots,a_{r_i},b_{s_1},\dots,b_{s_i})=0$, and the second emotion recognition model $T_Y$ of the Y-axis are substituted into the SVM algorithm to obtain the second coordinate value $Y_g$, in the Y direction, of the emotion features $(A_{g,1}\dots A_{g,n}\ B_{g,1}\dots B_{g,m})$. Here $a_{r_i}$ is the $r_i$-th sound signal feature, $b_{s_i}$ is the $s_i$-th music score feature, $r_1\dots r_i\in[1,n]$, $s_1\dots s_i\in[1,m]$, $n$ is the number of sound signal features, $m$ is the number of music score features, and $L$ is the number of training songs. The second training hyperplane is determined from the second training normalization matrix obtained by normalizing the second training matrix; the second training matrix is determined from the second training coordinate values and the emotion features of the training performance audio; and the second emotion recognition model based on the second coordinate axis is determined from the second training hyperplane and the second training normalization matrix.
It should be understood that steps 1030 and 1030' may be executed in either order or in parallel; likewise steps 1032 and 1032', and steps 1034 and 1034'.
In the embodiment of the present application, the first coordinate axis and the second coordinate axis form a plane rectangular coordinate system whose quadrants correspond one-to-one to performance emotion types. The embodiment determines, from the sound signal features and music score features of the performance audio to be identified, the coordinate values of its emotion features on the first coordinate axis and the second coordinate axis, and determines the corresponding performance emotion type from the first coordinate value and the second coordinate value. The performance emotion type of a singer can thus be identified from the singer's performance audio; the singer's emotion can be recognized from the singing itself.
In addition, the emotion features extracted in this embodiment differ, in terms of feature extraction, from those used in speech emotion recognition and music emotion recognition: speech emotion recognition extracts only audio features such as tone and speech rate and does not involve the extraction of music score features; music emotion recognition extracts both audio features and music score features, but does not involve features such as the spectrogram features included here among the sound signal features. Compared with existing speech emotion recognition and music emotion recognition, the emotion recognition model of this embodiment, trained on emotion features that include both sound signal features and music score features, can identify the performance emotion of the individual singer from the music score features and sound signal features: for the same song, the emotions of different singers' performances can be recognized, so the singer's emotion is identified more precisely.
Embodiment five
Refer to Fig. 4 to Fig. 5B, which show flow diagrams of another optional emotion recognition model establishing method of the embodiment of the present application. The method may be applied to a terminal device, or to an emotion recognition model establishing apparatus; the apparatus may be implemented in software, hardware, or a combination of both, and is typically provided in a terminal device. The following description takes a terminal device as the executing entity; the method shown in Fig. 4 to Fig. 5B may be implemented as follows.
In step 400, training performance audio samples are obtained, and the training performance audio samples are emotion-classified according to preset emotion types to determine a plurality of training performance audio sub-samples corresponding to the emotion types; the emotion factors used to determine the emotion types include a pressure factor and an energy factor.

In this step, a large number of performance audio recordings covering various emotions are collected as the training performance audio samples. The performance audio should be as close to pure vocals as possible, and the music score of the song corresponding to each recording is collected at the same time. The terminal device may obtain the collected performance audio directly from local storage, a storage device, or the network.
After the performance audio has been collected, the terminal device may emotion-classify it according to the preset emotion types. Specifically, the collected performance audio may be classified according to the criteria of music professionals, or music professionals may be asked directly to classify it according to their own criteria. The professionals' classification may proceed as follows: first the emotion categories are determined; each piece of performance audio is then listened to once by each music professional, who annotates its emotion; when the majority of the professionals agree that the current performance audio belongs to a certain emotion, the audio is assigned to the directory of that emotion, otherwise it is discarded; all performance audio is classified in this way. Note that the emotion may change within a single piece of performance audio; for example, the performance emotion of the prelude and of the climax may differ. In that case the music professionals should split the performance audio into several segments, each with a single consistent emotion, and the music score of the corresponding song should likewise be segmented and labeled so that it corresponds one-to-one with the segmented audio.
After the above emotion classification, the terminal device can determine a plurality of training performance audio sub-samples corresponding to the emotion types. Specifically, the performance audio is grouped by emotion such that each emotion class contains the same number of recordings, and the music scores of the songs are classified in the same way so that scores and audio correspond one-to-one.
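The majority-vote rule described above can be sketched as follows; the strict-majority threshold and the data shapes are assumptions of this sketch.

```python
# Sketch of the majority-vote labelling rule: keep a clip only when most
# annotators agree on its emotion, otherwise discard it.
from collections import Counter

def assign_label(annotations):
    """annotations: list of emotion labels given by the professionals."""
    label, votes = Counter(annotations).most_common(1)[0]
    if votes > len(annotations) / 2:    # "majority of professionals"
        return label                    # file the clip under this emotion
    return None                         # discard the clip
```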
Step 402: extract the emotion features of each training performance audio sub-sample, and normalize the emotion features of all training performance audio sub-samples along the pressure dimension and the energy dimension respectively, obtaining normalized pressure emotion features and normalized energy emotion features.

Step 404: perform SVM training on the normalized pressure emotion features and the normalized energy emotion features respectively, obtaining a pressure index for determining the magnitude of the pressure factor and an energy index for determining the level of the energy factor.

Step 406: perform SVM training on the normalized pressure emotion features and the pressure index to obtain a first emotion recognition model for determining the pressure factor; perform SVM training on the normalized energy emotion features and the energy index to obtain a second emotion recognition model for determining the energy factor.
Those skilled in the art will understand that, in the above methods of the specific embodiments of the application, the numbering of the steps does not imply an execution order; the execution order and logical combination of the steps should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the specific embodiments of the application.
In the embodiment of the present application, steps 400-406 yield the first emotion recognition model for determining the pressure factor and the second emotion recognition model for determining the energy factor, so that a terminal device executing this method, or another terminal device that can invoke it indirectly, can determine the pressure factor and energy factor of the performance audio to be identified from the first and second emotion recognition models, and thereby determine the performance emotion type corresponding to the performance audio to be identified. Through the embodiment of the present application, the performance emotion type of a singer can be identified from the singer's performance audio; the singer's emotion can be recognized from the singing itself.
In an optional embodiment, the two principal factors affecting music emotion are pressure and energy. Since the pressure and energy factors correspond well to acoustic features, the emotion features of music can be divided, according to the strength of the pressure factor (valence), along an axis from anxious to happy, and, according to the strength of the energy factor (arousal), along an axis from vigorous to calm. Corresponding to the four regions into which a two-dimensional plane rectangular coordinate system is divided, music can be grouped into four broad classes: tense/fearful, cheerful, content, and dejected. As shown in Fig. 5A, the pressure (valence) dimension may be represented by the first coordinate axis and the energy (arousal) dimension by the second coordinate axis, where the first coordinate axis and the second coordinate axis form a plane rectangular coordinate system whose quadrants correspond one-to-one to performance emotion types. The first coordinate axis may be the X-axis and the second coordinate axis the Y-axis. The performance emotion types corresponding to the quadrants of the plane rectangular coordinate system include: tense/anxious, happy/cheerful, sad/dejected, and naturally calm. The correspondence between the quadrants and the performance emotion types is: the first quadrant corresponds to tense/anxious, the second quadrant to happy/cheerful, the third quadrant to sad/dejected, and the fourth quadrant to naturally calm.
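Once the coordinate values $X_g$ and $Y_g$ are available, the quadrant lookup is mechanical. A minimal sketch follows; the handling of points lying exactly on an axis (the >= boundaries) is an assumption, since the patent does not specify it.

```python
# Sketch of the quadrant -> performance emotion type mapping described
# above: X is the pressure (valence) axis, Y the energy (arousal) axis.
def emotion_type(x_g: float, y_g: float) -> str:
    if x_g >= 0 and y_g >= 0:
        return "tense/anxious"      # first quadrant
    if x_g < 0 and y_g >= 0:
        return "happy/cheerful"     # second quadrant
    if x_g < 0 and y_g < 0:
        return "sad/dejected"       # third quadrant
    return "naturally calm"         # fourth quadrant
```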
Based on the above optional embodiment, as shown in Fig. 5B, step 402 may be implemented as follows.
Step 4021: determine the first training coordinate value and the second training coordinate value corresponding to the sound signal features and music score features of the training performance audio. Here, the emotion features may include sound signal features and music score features.

The sound signal features of the performance audio are extracted category by category according to the emotion classes, and the music score features of the corresponding songs are extracted at the same time. It should be understood that the features extracted here differ from those extracted in speech emotion recognition and music emotion recognition, and comprise the following. The sound signal features of the performance audio cover: average energy, energy standard deviation, average fundamental frequency, fundamental frequency standard deviation, number of frequencies exceeding the average fundamental frequency, average spectral centroid, spectral centroid standard deviation, MFCC features, and spectrogram features. The music score features cover: beats per minute, major/minor key type, mode, average pitch, pitch standard deviation, and average duration of each note. The sound signal features and music score features of a piece of performance audio are both extracted from the same passage of the same song: whichever songs are sung in the performance audio, the features of those songs' scores are extracted correspondingly.
Specifically, the sound signal features $A_{i,j}$ and music score features $B_{i,k}$ of the training performance audio are extracted, where $A_{i,j}$ denotes the value of the $j$-th sound signal feature of the $i$-th training performance audio, $1 \le j \le n$, $n$ being the total number of sound signal features, and $B_{i,k}$ denotes the value of the $k$-th music score feature of the $i$-th training performance audio, $1 \le k \le m$, $m$ being the total number of music score features.
In step 4021, the first training coordinate value and the second training coordinate value corresponding to the sound signal features and music score features of the training performance audio are determined. Here, the first training coordinate value $X_i$ denotes the coordinate on the first coordinate axis annotated by the music professionals for the $i$-th training performance audio, and the second training coordinate value $Y_i$ denotes the coordinate on the second coordinate axis annotated by the music professionals for the $i$-th training performance audio; the feature vector of the $i$-th training performance audio is then $(A_{i,1}\dots A_{i,n}\ B_{i,1}\dots B_{i,m}\ X_i\ Y_i)$. $X_i$ and $Y_i$ may directly use the coordinate values annotated in advance by the music professionals. Once the features of all $L$ songs have been extracted, an $L \times (n+m+2)$ matrix is formed.
Step 4022: determine the first training matrix based on the first coordinate axis from the emotion features and first training coordinate values of all the training performance audio sub-samples, and determine the second training matrix based on the second coordinate axis from the emotion features and second training coordinate values of all the training performance audio sub-samples.

Specifically, the first training matrix based on the X-axis is

$$M_X = \begin{pmatrix} A_{1,1} & \cdots & A_{1,n} & B_{1,1} & \cdots & B_{1,m} & X_1 \\ \vdots & & \vdots & \vdots & & \vdots & \vdots \\ A_{L,1} & \cdots & A_{L,n} & B_{L,1} & \cdots & B_{L,m} & X_L \end{pmatrix},$$

and the second training matrix based on the Y-axis is

$$M_Y = \begin{pmatrix} A_{1,1} & \cdots & A_{1,n} & B_{1,1} & \cdots & B_{1,m} & Y_1 \\ \vdots & & \vdots & \vdots & & \vdots & \vdots \\ A_{L,1} & \cdots & A_{L,n} & B_{L,1} & \cdots & B_{L,m} & Y_L \end{pmatrix}.$$
Step 4023: normalize the first training matrix and the second training matrix respectively, obtaining the first training normalization matrix and the second training normalization matrix. Here, the first training normalization matrix represents the normalized pressure emotion features and the second training normalization matrix represents the normalized energy emotion features.

Specifically, the data in the first training matrix $M_X$ of the X-axis are normalized column by column to the range $[-1, 1]$, giving the first training normalization matrix

$$\widetilde{M}_X = \begin{pmatrix} a_{1,1} & \cdots & a_{1,n} & b_{1,1} & \cdots & b_{1,m} & x_1 \\ \vdots & & \vdots & \vdots & & \vdots & \vdots \\ a_{L,1} & \cdots & a_{L,n} & b_{L,1} & \cdots & b_{L,m} & x_L \end{pmatrix},$$

where $a_{i,j} \in [-1,1]$, $b_{i,k} \in [-1,1]$, $x_i \in [-1,1]$, $j \in [1,n]$, $k \in [1,m]$, $i \in [1,L]$.

Similarly, applying the same normalization to the second training matrix of the Y-axis gives the second training normalization matrix

$$\widetilde{M}_Y = \begin{pmatrix} a_{1,1} & \cdots & a_{1,n} & b_{1,1} & \cdots & b_{1,m} & y_1 \\ \vdots & & \vdots & \vdots & & \vdots & \vdots \\ a_{L,1} & \cdots & a_{L,n} & b_{L,1} & \cdots & b_{L,m} & y_L \end{pmatrix},$$

where $a_{i,j} \in [-1,1]$, $b_{i,k} \in [-1,1]$, $y_i \in [-1,1]$, $j \in [1,n]$, $k \in [1,m]$, $i \in [1,L]$.
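A minimal sketch of this column-wise normalization, assuming scikit-learn's MinMaxScaler; keeping the fitted scaler is a convenience of this sketch, whereas the patent itself re-normalizes at recognition time with the test row appended (see the earlier sketch).

```python
# Sketch of the column-wise [-1, 1] normalization of step 4023.
from sklearn.preprocessing import MinMaxScaler

def normalize_training_matrix(m):
    scaler = MinMaxScaler(feature_range=(-1, 1)).fit(m)
    return scaler.transform(m), scaler  # normalized matrix + fitted scaler
```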
Based on the above optional embodiment, step 404 is specifically: substitute the first training normalization matrix and the second training normalization matrix into the SVM algorithm respectively, obtaining the first training hyperplane based on the first coordinate axis and the second training hyperplane based on the second coordinate axis; the first training hyperplane is used to determine the magnitude of the pressure factor and the second training hyperplane the level of the energy factor. Specifically, the first training normalization matrix $\widetilde{M}_X$ of the X-axis is substituted into the SVM algorithm, which solves for a hyperplane in the X direction that separates, as well as possible, the samples with $x_i > 0$ from those with $x_i < 0$. The resulting hyperplane is composed of a subset of the sound signal features and music score features; denote the X-axis hyperplane as $F_X(a_{p_1},\dots,a_{p_i},b_{q_1},\dots,b_{q_i})=0$, where $a_{p_i}$ is the $p_i$-th sound signal feature, $b_{q_i}$ is the $q_i$-th music score feature, and $p_1\dots p_i\in[1,n]$, $q_1\dots q_i\in[1,m]$. Similarly, the Y-axis hyperplane is $F_Y(a_{r_1},\dots,a_{r_i},b_{s_1},\dots,b_{s_i})=0$, where $a_{r_i}$ is the $r_i$-th sound signal feature, $b_{s_i}$ is the $s_i$-th music score feature, and $r_1\dots r_i\in[1,n]$, $s_1\dots s_i\in[1,m]$.
Step 406 is specifically: substitute the first training hyperplane and the first training normalization matrix into the SVM algorithm to obtain the first emotion recognition model for determining the first coordinate identification value; substitute the second training hyperplane and the second training normalization matrix into the SVM algorithm to obtain the second emotion recognition model for determining the second coordinate identification value. Here, the first coordinate identification value represents the pressure factor and the second coordinate identification value represents the energy factor. Substituting the X-axis hyperplane $F_X(a_{p_1},\dots,a_{p_i},b_{q_1},\dots,b_{q_i})=0$, with $p_1\dots p_i\in[1,n]$ and $q_1\dots q_i\in[1,m]$, together with the first training normalization matrix into the SVM algorithm yields the training model of the X-axis, denoted $T_X$; the training model of the Y-axis, denoted $T_Y$, is obtained in the same way.
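The statement that the hyperplane "is composed of a subset of the features" can be made concrete under a linear-kernel assumption: the separating hyperplane is $w \cdot z + b = 0$, and the features whose weights are non-negligible play the role of the selected $p_i$/$q_i$ indices. A sketch, with the threshold as an assumption:

```python
# Sketch: with a linear kernel, the separating hyperplane is w.z + b = 0;
# features whose |w| exceeds a small threshold stand in for the selected
# p_i / q_i features of the patent's notation.
import numpy as np
from sklearn.svm import SVC

def hyperplane_feature_subset(normed_features, coords, eps=1e-3):
    svm = SVC(kernel="linear").fit(normed_features, coords > 0)
    w, b = svm.coef_[0], svm.intercept_[0]      # hyperplane parameters
    selected = np.flatnonzero(np.abs(w) > eps)  # indices of used features
    return svm, w, b, selected
```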
In the above optional embodiment, the first coordinate axis and the second coordinate axis form a plane rectangular coordinate system whose quadrants correspond one-to-one to performance emotion types. The embodiment of the present application can obtain the first emotion recognition model $T_X$ for determining the first coordinate value and the second emotion recognition model $T_Y$ for determining the second coordinate value, so that a terminal device executing this method, or another terminal device that can invoke it indirectly, can determine from the first coordinate value and the second coordinate value the performance emotion type corresponding to the performance audio to be identified. The performance emotion type of a singer can thus be identified from the singer's performance audio; the singer's emotion can be recognized from the singing itself.
The emotion features extracted in this embodiment differ, in terms of feature extraction, from those used in speech emotion recognition and music emotion recognition: speech emotion recognition extracts only audio features such as tone and speech rate and does not involve the extraction of music score features; music emotion recognition extracts both audio features and music score features, but does not involve features such as the spectrogram features included here among the sound signal features. Compared with existing speech emotion recognition and music emotion recognition, the emotion recognition model of this embodiment, trained on emotion features that include both sound signal features and music score features, can identify the performance emotion of the individual singer from the music score features and sound signal features: for the same song, the emotions of different singers' performances can be recognized, so the singer's emotion is identified more precisely.
Embodiment six
Based on the foregoing embodiments, Fig. 6A to Fig. 6C show flow diagrams of another optional performance emotion identification method of the embodiment of the present application. The method may be applied to a terminal device, or to an emotion recognition model establishing apparatus; the apparatus may be implemented in software, hardware, or a combination of both, and is typically provided in a terminal device. The following description takes a terminal device as the executing entity; the method shown in this embodiment may be implemented as follows.
In this embodiment, the pressure dimension is represented by the first coordinate axis and the energy dimension by the second coordinate axis; the first coordinate axis and the second coordinate axis form a plane rectangular coordinate system whose quadrants correspond one-to-one to emotion types. Specifically, the first coordinate axis may be the X-axis and the second coordinate axis the Y-axis. The performance emotion types corresponding to the quadrants of the plane rectangular coordinate system include: tense/anxious, happy/cheerful, sad/dejected, and naturally calm. The correspondence between the quadrants and the performance emotion types is: the first quadrant corresponds to tense/anxious, the second quadrant to happy/cheerful, the third quadrant to sad/dejected, and the fourth quadrant to naturally calm.
Step 600: extract the emotion features of the performance audio to be identified, where the emotion features may include sound signal features and music score features.

Step 602: determine, from the emotion features and the first emotion recognition model, the pressure factor of the emotion features based on the pressure dimension; determine, from the emotion features and the second emotion recognition model, the energy factor of the emotion features based on the energy dimension; the pressure factor and the energy factor are used to determine the emotion type. In this embodiment, the first emotion recognition model and the second emotion recognition model are established based on the foregoing embodiments; for the specific model establishing process, see Embodiment five.

Specifically, the coordinate values of the emotion features of the performance audio to be identified on the first coordinate axis and the second coordinate axis are determined from its sound signal features and music score features, yielding a first coordinate value characterizing the pressure factor and a second coordinate value characterizing the energy factor.
As shown in Fig. 6B, in a feasible embodiment, step 602 obtains the first coordinate value as follows.

Step 6020: obtain a first feature matrix based on the first coordinate axis from the sound signal features and music score features of the performance audio to be identified and from the first training matrix based on the first coordinate axis. Specifically, the row vector $(A_{g,1}\dots A_{g,n}\ B_{g,1}\dots B_{g,m}\ 0)$ formed by the sound signal features $(A_{g,1}\dots A_{g,n})$ and the music score features $(B_{g,1}\dots B_{g,m})$ is appended as the last row of the first training matrix based on the X-axis, yielding the first feature matrix. The first training matrix is determined in advance from the sound signal features and music score features of the training performance audio and from the first training coordinate values, on the first coordinate axis, of the emotion features of the training performance audio.

In step 6022, the first feature matrix is normalized to obtain a first normalization matrix, from which the matrix of the emotion features of the performance audio to be identified after normalization by the first training matrix is obtained. Specifically, the data in the first feature matrix are normalized column by column to obtain the first normalization matrix; the last row of this matrix is then extracted, giving the normalized emotion feature vector $(a_{gx,1}\dots a_{gx,n}\ b_{gx,1}\dots b_{gx,m})$ of the performance audio to be identified.

In step 6024, the normalized emotion feature vector of the performance audio to be identified, the first training hyperplane, and the first emotion recognition model based on the first coordinate axis are substituted into the SVM algorithm to obtain the first coordinate value, in the direction of the first coordinate axis, of the emotion features of the performance audio to be identified. Specifically, $(a_{gx,1}\dots a_{gx,n}\ b_{gx,1}\dots b_{gx,m})$, the first training hyperplane $F_X(a_{p_1},\dots,a_{p_i},b_{q_1},\dots,b_{q_i})=0$, and the first emotion recognition model $T_X$ of the X-axis are substituted into the SVM algorithm to obtain the first coordinate value $X_g$, in the X direction, of the emotion features $(A_{g,1}\dots A_{g,n}\ B_{g,1}\dots B_{g,m})$. Here $a_{p_i}$ is the $p_i$-th sound signal feature, $b_{q_i}$ is the $q_i$-th music score feature, $p_1\dots p_i\in[1,n]$, $q_1\dots q_i\in[1,m]$, $n$ is the number of sound signal features, $m$ is the number of music score features, and $L$ is the number of training songs. The first training hyperplane is determined from the first training normalization matrix obtained by normalizing the first training matrix; the first training matrix is determined from the first training coordinate values and the emotion features of the training performance audio; and the training model based on the first coordinate axis is determined from the first training hyperplane and the first training normalization matrix.
As shown in Fig. 6C, in a feasible embodiment, step 602 obtains the second coordinate value as follows.

Step 6020': obtain a second feature matrix based on the second coordinate axis from the sound signal features and music score features of the performance audio to be identified and from the second training matrix based on the second coordinate axis. Specifically, the row vector $(A_{g,1}\dots A_{g,n}\ B_{g,1}\dots B_{g,m}\ 0)$ formed by the sound signal features $(A_{g,1}\dots A_{g,n})$ and the music score features $(B_{g,1}\dots B_{g,m})$ is appended as the last row of the second training matrix based on the Y-axis, yielding the second feature matrix. The second training matrix is determined in advance from the sound signal features and music score features of the training performance audio and from the second training coordinate values, on the second coordinate axis, of the emotion features of the training performance audio.

In step 6022', the second feature matrix is normalized to obtain a second normalization matrix, from which the matrix of the emotion features of the performance audio to be identified after normalization by the second training matrix is obtained. Specifically, the data in the second feature matrix are normalized column by column to obtain the second normalization matrix; the last row of this matrix is then extracted, giving the normalized emotion feature vector $(a_{gy,1}\dots a_{gy,n}\ b_{gy,1}\dots b_{gy,m})$ of the performance audio to be identified.

In step 6024', the normalized emotion feature vector of the performance audio to be identified, the second training hyperplane, and the second emotion recognition model based on the second coordinate axis are substituted into the SVM algorithm to obtain the second coordinate value, in the direction of the second coordinate axis, of the emotion features of the performance audio to be identified. Specifically, $(a_{gy,1}\dots a_{gy,n}\ b_{gy,1}\dots b_{gy,m})$, the second training hyperplane $F_Y(a_{r_1},\dots,a_{r_i},b_{s_1},\dots,b_{s_i})=0$, and the second emotion recognition model $T_Y$ of the Y-axis are substituted into the SVM algorithm to obtain the second coordinate value $Y_g$, in the Y direction, of the emotion features $(A_{g,1}\dots A_{g,n}\ B_{g,1}\dots B_{g,m})$. Here $a_{r_i}$ is the $r_i$-th sound signal feature, $b_{s_i}$ is the $s_i$-th music score feature, $r_1\dots r_i\in[1,n]$, $s_1\dots s_i\in[1,m]$, $n$ is the number of sound signal features, $m$ is the number of music score features, and $L$ is the number of training songs. The second training hyperplane is determined from the second training normalization matrix obtained by normalizing the second training matrix; the second training matrix is determined from the second training coordinate values and the emotion features of the training performance audio; and the training model based on the second coordinate axis is determined from the second training hyperplane and the second training normalization matrix.

It should be understood that steps 6020 and 6020' may be executed in either order or in parallel; likewise steps 6022 and 6022', and steps 6024 and 6024'.
Step 604: determine, from the pressure factor and the energy factor, the performance emotion type corresponding to the performance audio to be identified. Specifically, the performance emotion type corresponding to the emotion features of the performance audio to be identified is determined from the first coordinate value and the second coordinate value.

In the embodiment of the present application, a terminal device executing this method, or another terminal device that can invoke it indirectly, can determine the pressure factor and energy factor of the performance audio to be identified from the first emotion recognition model and the second emotion recognition model, and thereby determine the performance emotion type corresponding to the performance audio to be identified. Through the embodiment of the present application, the performance emotion type of a singer can be identified from the singer's performance audio; the singer's emotion can be recognized from the singing itself.
In the embodiment of the present application, the first coordinate axis and the second coordinate axis form a plane rectangular coordinate system whose quadrants correspond one-to-one to performance emotion types. The embodiment determines, from the sound signal features and music score features of the performance audio to be identified, the coordinate values of its emotion features on the first coordinate axis and the second coordinate axis, and determines the corresponding performance emotion type from the first coordinate value and the second coordinate value; the performance emotion type of a singer can thus be identified from the singer's performance audio, and the singer's emotion can be recognized from the singing itself.
Embodiment seven
Refer to Fig. 7, which shows a performance emotion identification method provided by the embodiment of the present application. The method may be applied to a terminal device, or to an emotion recognition model establishing apparatus; the apparatus may be implemented in software, hardware, or a combination of both, and is typically provided in a terminal device. The following description takes a terminal device as the executing entity; the method shown in Fig. 7 may be implemented as follows.

Step 700: obtain the performance audio of a user. In this step, the terminal device may obtain the performance audio sung by the user directly from local storage, a storage device, or the network.
Step 702: when the emotion type corresponding to the user's performance audio is identified as consistent with a preset music emotion, output a corresponding performance output control instruction.

In this step, the emotion type of the user's performance audio may be identified through speech emotion recognition and music emotion recognition, or through the method of any one of the foregoing Embodiments one to six.
In an optional embodiment, the performance output control instruction includes at least one of: a performance bonus-point control instruction and a lighting control instruction. For example, when a user sings at a KTV venue and the KTV equipment identifies that the emotion type of the user's performance audio is consistent with the preset music emotion (assume the preset music emotion is happy/cheerful), a performance bonus-point control instruction is output so that the performance score displayed by the KTV equipment is increased. As another example, when the KTV equipment identifies that the emotion type of the user's performance audio is consistent with the preset music emotion (assume the preset music emotion is sad/dejected), a lighting control instruction is output to control the lighting devices connected to the KTV equipment; specifically, the connected lighting devices may be controlled to emit blue light to render a sad, dejected scene.
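A minimal sketch of this dispatch logic; the instruction names and the dictionary format are assumptions of the sketch, not an actual KTV equipment API.

```python
# Sketch of step 702: compare the recognized emotion type with the preset
# music emotion and emit the matching control instruction.
def on_performance(recognized: str, preset: str):
    if recognized != preset:
        return None                                          # no instruction
    if preset == "happy/cheerful":
        return {"instruction": "score_bonus"}                # add bonus points
    if preset == "sad/dejected":
        return {"instruction": "lighting", "color": "blue"}  # blue light cue
    return {"instruction": "lighting"}                       # generic cue
```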
Embodiment eight
Referring to Fig. 8, the present embodiment provides a performance emotion recognition apparatus, including:

a training module 800, configured to extract the emotion features of the training performance audio and train an emotion recognition model, the emotion features including sound signal features and music score features;

an extraction module 801, configured to extract the emotion features of the performance audio to be identified;

an identification module 802, configured to input the emotion features of the performance audio to be identified into the emotion recognition model and identify the emotion of the performance audio to be identified.

Optionally, the sound signal features include at least one of: average energy, energy standard deviation, average fundamental frequency, fundamental frequency standard deviation, number of frequencies exceeding the average fundamental frequency, average spectral centroid, spectral centroid standard deviation, MFCC features, and spectrogram features; the music score features include at least one of: beats per minute, major/minor key type, mode, average pitch, pitch standard deviation, and average duration of each note.
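As a minimal sketch, the apparatus of Fig. 8 can be organized as three cooperating objects; the class and method names mirror modules 800 to 802, and everything else is assumed.

```python
# Sketch of the apparatus of Fig. 8 as three cooperating objects.
class TrainingModule:                      # module 800
    def train(self, training_audio):
        """Extract emotion features and fit the recognition models."""

class ExtractionModule:                    # module 801
    def extract(self, audio):
        """Return sound signal + music score features for `audio`."""

class IdentificationModule:                # module 802
    def identify(self, features, models):
        """Feed features to the models and return the emotion type."""
```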
Optionally, the training module includes:

a training coordinate value determining unit, configured to determine the training coordinate values, on the first coordinate axis and the second coordinate axis respectively, of the emotion features of the training performance audio, obtaining the first training coordinate value and the second training coordinate value, where the first coordinate axis and the second coordinate axis form a plane rectangular coordinate system whose quadrants correspond one-to-one to performance emotion types;

a training matrix determining unit, configured to establish the first training matrix from the first training coordinate values and the emotion features of the training performance audio, and to establish the second training matrix from the second training coordinate values and the emotion features of the training performance audio;

a training normalization matrix determining unit, configured to normalize the first training matrix into the first training normalization matrix and the second training matrix into the second training normalization matrix;

a training hyperplane determining unit, configured to substitute the first training normalization matrix and the second training normalization matrix into the SVM algorithm respectively, obtaining the first training hyperplane and the second training hyperplane;

an emotion recognition model determining unit, configured to substitute the first training hyperplane and the first training normalization matrix into the SVM algorithm to obtain the first emotion recognition model based on the first coordinate axis, and to substitute the second training hyperplane and the second training normalization matrix into the SVM algorithm to obtain the second emotion recognition model based on the second coordinate axis.
Optionally, the identification module includes:

an input unit, configured to input the emotion features of the performance audio to be identified into the first emotion recognition model and the second emotion recognition model respectively, and to determine the first coordinate value of the emotion features based on the first coordinate axis and the second coordinate value based on the second coordinate axis;

a determining unit, configured to determine, from the first coordinate value and the second coordinate value, the quadrant corresponding to the emotion features, so as to determine the performance emotion type corresponding to the emotion features.
Optionally, the determining unit is specifically configured to: establish a first feature matrix from the first training matrix and the emotion features of the performance audio to be identified; normalize the first feature matrix to obtain a first normalization matrix, and thereby obtain the matrix of the emotion features of the performance audio to be identified after normalization by the first training matrix; and substitute this normalized matrix, the first training hyperplane, and the first emotion recognition model into the SVM algorithm to obtain the first coordinate value of the emotion features in the direction of the first coordinate axis.
Optionally, the first coordinate axis is the X-axis; in that case the determining unit is specifically configured to:

append the row vector $(A_{g,1}\dots A_{g,n}\ B_{g,1}\dots B_{g,m}\ 0)$, formed by the sound signal features $(A_{g,1}\dots A_{g,n})$ and the music score features $(B_{g,1}\dots B_{g,m})$, as the last row of the first training matrix based on the X-axis, obtaining the first feature matrix;

normalize the data in the first feature matrix column by column to obtain the first normalization matrix, and extract the last row of that matrix, obtaining the matrix $(a_{gx,1}\dots a_{gx,n}\ b_{gx,1}\dots b_{gx,m})$ of the emotion features of the performance audio to be identified after normalization by the first training matrix;

substitute $(a_{gx,1}\dots a_{gx,n}\ b_{gx,1}\dots b_{gx,m})$, the first training hyperplane $F_X(a_{p_1},\dots,a_{p_i},b_{q_1},\dots,b_{q_i})=0$, and the first emotion recognition model $T_X$ of the X-axis into the SVM algorithm, obtaining the first coordinate value $X_g$, in the X direction, of the emotion features $(A_{g,1}\dots A_{g,n}\ B_{g,1}\dots B_{g,m})$ of the performance audio to be identified; here $a_{p_i}$ is the $p_i$-th sound signal feature, $b_{q_i}$ is the $q_i$-th music score feature, $p_1\dots p_i\in[1,n]$, $q_1\dots q_i\in[1,m]$, $n$ is the number of sound signal features, $m$ is the number of music score features, and $L$ is the number of training songs.
Optionally, the determining unit is specifically configured to: establish a second feature matrix from the second training matrix and the emotion features of the performance audio to be identified; normalize the second feature matrix to obtain a second normalization matrix, and thereby obtain the matrix of the emotion features of the performance audio to be identified after normalization by the second training matrix; and substitute this normalized matrix, the second training hyperplane, and the second emotion recognition model into the SVM algorithm to obtain the second coordinate value of the emotion features in the direction of the second coordinate axis.
Optionally, the second coordinate axis is the Y-axis; in that case the determining unit is specifically configured to:

append the row vector $(A_{g,1}\dots A_{g,n}\ B_{g,1}\dots B_{g,m}\ 0)$, formed by the sound signal features $(A_{g,1}\dots A_{g,n})$ and the music score features $(B_{g,1}\dots B_{g,m})$, as the last row of the second training matrix based on the Y-axis, obtaining the second feature matrix;

normalize the data in the second feature matrix column by column to obtain the second normalization matrix, and extract the last row of that matrix, obtaining the matrix $(a_{gy,1}\dots a_{gy,n}\ b_{gy,1}\dots b_{gy,m})$ of the emotion features of the performance audio to be identified after normalization by the second training matrix;

substitute $(a_{gy,1}\dots a_{gy,n}\ b_{gy,1}\dots b_{gy,m})$, the second training hyperplane $F_Y(a_{r_1},\dots,a_{r_i},b_{s_1},\dots,b_{s_i})=0$, and the second emotion recognition model $T_Y$ of the Y-axis into the SVM algorithm, obtaining the second coordinate value $Y_g$, in the Y direction, of the emotion features $(A_{g,1}\dots A_{g,n}\ B_{g,1}\dots B_{g,m})$ of the performance audio to be identified; here $a_{r_i}$ is the $r_i$-th sound signal feature, $b_{s_i}$ is the $s_i$-th music score feature, $r_1\dots r_i\in[1,n]$, $s_1\dots s_i\in[1,m]$, $n$ is the number of sound signal features, $m$ is the number of music score features, and $L$ is the number of training songs.
Optionally, the performance emotion types corresponding to the quadrants of the plane rectangular coordinate system include: tense/anxious, happy/cheerful, sad/dejected, and naturally calm.

Optionally, the correspondence between the quadrants of the plane rectangular coordinate system and the performance emotion types includes: the first quadrant corresponds to tense/anxious, the second quadrant to happy/cheerful, the third quadrant to sad/dejected, and the fourth quadrant to naturally calm.
This apparatus embodiment corresponds one-to-one to the method features of the foregoing embodiments, and the relevant modules/units can correspondingly execute the method flows of the foregoing embodiments; reference may therefore be made to the related description of the method flows in the foregoing embodiments, which is not repeated here.

The embodiment of the present application also provides an electronic terminal, including the performance emotion recognition apparatus provided by the foregoing embodiment. This apparatus embodiment corresponds one-to-one to the method features of the foregoing embodiments; reference may be made to the related description of the method flow parts in the foregoing embodiments, which is not repeated here.
Embodiment nine
Referring to Fig. 9, the present embodiment provides a performance recognition apparatus, including:

an acquisition module 901, configured to obtain the performance audio of a user;

an identification module 902, configured to output a corresponding performance output control instruction when the emotion type corresponding to the user's performance audio is identified as consistent with a preset music emotion.

Optionally, the performance output control instruction includes at least one of: a performance bonus-point control instruction and a lighting control instruction.

This apparatus embodiment corresponds one-to-one to the method features of the foregoing embodiments, and the relevant modules/units can correspondingly execute the method flows of the foregoing embodiments; reference may therefore be made to the related description of the method flows in the foregoing embodiments, which is not repeated here.
Referring to Fig. 10, the embodiment of the present application also provides an electronic terminal, including:

a memory 1000;

one or more processors 1003; and

one or more modules 1001, stored in the memory and configured to be controlled by the one or more processors, the one or more modules being configured to execute instructions for the following steps:

extracting the emotion features of the training performance audio, and training to obtain an emotion recognition model, the emotion features including sound signal features and music score features;

extracting the emotion features of the performance audio to be identified;

inputting the emotion features of the performance audio to be identified into the emotion recognition model, and identifying the emotion of the performance audio to be identified.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
Certain terms are used throughout the description and claims to refer to particular components. Those skilled in the art will appreciate that hardware manufacturers may refer to the same component by different names. This specification and the claims do not distinguish components by differences in name, but by differences in function. "Comprising", as used throughout the description and claims, is an open-ended term and should therefore be interpreted as "including but not limited to". "Substantially" means that, within an acceptable error range, a person skilled in the art can solve the technical problem within the stated error range and basically achieve the stated technical effect. Furthermore, "coupled" here covers any means of direct or indirect electrical coupling; thus, if the text states that a first device is coupled to a second device, the first device may be directly electrically coupled to the second device, or indirectly electrically coupled to the second device through other devices or coupling means. The subsequent description of the specification presents preferred embodiments for implementing the application; the description is given for the purpose of illustrating the general principles of the application and is not intended to limit the scope of the application. The protection scope of the application is defined by the appended claims.

It should also be noted that the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a commodity or system that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a commodity or system. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the commodity or system that includes the element.
The above illustrates and describes certain preferred embodiments of the present invention. However, as stated above, it should be understood that the present invention is not limited to the form disclosed herein, should not be regarded as excluding other embodiments, and may be used in various other combinations, modifications, and environments; within the scope of the inventive concept described herein, it may be modified through the above teachings or through the techniques or knowledge of the related art. All changes and variations made by those skilled in the art that do not depart from the spirit and scope of the present invention shall fall within the protection scope of the appended claims.
Claims (17)
1. A performance emotion recognition method, characterized by comprising:
extracting emotion features of performance audio to be trained, and training to obtain an emotion recognition model, wherein the emotion features include sound signal features and musical score features;
extracting emotion features of performance audio to be recognized;
inputting the emotion features of the performance audio to be recognized into the emotion recognition model, and recognizing the emotion of the performance audio to be recognized.
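To make the claimed flow concrete, here is a minimal sketch, assuming scikit-learn's SVC as the SVM implementation; the function names and the use of a single classifier are illustrative stand-ins, not the patent's specified structure (claims 3 to 6 refine it into two axis-specific models):

```python
# Minimal sketch of claim 1: train on emotion features, then recognize.
# scikit-learn is an assumed toolkit; the patent names only an "SVM algorithm".
import numpy as np
from sklearn.svm import SVC

def train_emotion_model(train_features: np.ndarray, train_labels: np.ndarray) -> SVC:
    """Train an emotion recognition model from features of the training performance audio."""
    model = SVC(kernel="rbf")
    model.fit(train_features, train_labels)
    return model

def recognize_emotion(model: SVC, features: np.ndarray):
    """Input features of the performance audio to be recognized; return its emotion label."""
    return model.predict(features.reshape(1, -1))[0]
```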
2. The performance emotion recognition method according to claim 1, characterized in that the sound signal features include at least one of the following: average energy, energy standard deviation, average fundamental frequency, fundamental frequency standard deviation, the number of times the fundamental frequency exceeds the average fundamental frequency, average spectral centroid, spectral centroid standard deviation, MFCC features, and spectrogram features; and the musical score features include at least one of the following: beats per minute, key type (major or minor), mode, average pitch, pitch standard deviation, and the average duration of each note.
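The claims do not name an extraction toolkit. As a hedged illustration, several of the listed sound signal features can be approximated with librosa; the library choice, the F0 search range, and the 13-coefficient MFCC are assumptions of this sketch, and the musical score features (beats per minute, key, pitch statistics) would come from the score rather than the audio:

```python
# Approximating some of the claimed sound signal features with librosa
# (an assumed toolkit). Spectrogram and musical score features are omitted.
import librosa
import numpy as np

def sound_signal_features(path: str) -> np.ndarray:
    y, sr = librosa.load(path)
    energy = librosa.feature.rms(y=y)[0]                         # per-frame energy
    f0, _, _ = librosa.pyin(y, fmin=65.0, fmax=1047.0, sr=sr)    # fundamental frequency
    f0 = f0[~np.isnan(f0)]                                       # keep voiced frames only
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)[0]  # spectral centroid
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)           # MFCC features
    return np.hstack([
        energy.mean(), energy.std(),      # average energy, energy standard deviation
        f0.mean(), f0.std(),              # average fundamental frequency and its deviation
        (f0 > f0.mean()).sum(),           # count of frames above the average F0
        centroid.mean(), centroid.std(),  # average centroid and centroid deviation
        mfcc.mean(axis=1),                # per-coefficient MFCC means
    ])
```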
3. The performance emotion recognition method according to claim 1, characterized in that the training of the emotion recognition model comprises:
determining training coordinate values of the emotion features of the performance audio to be trained on a first coordinate axis and on a second coordinate axis, respectively, to obtain a first training coordinate value and a second training coordinate value, wherein the first coordinate axis and the second coordinate axis form a plane rectangular coordinate system whose quadrants correspond one-to-one to performance emotion types;
establishing a first training matrix from the first training coordinate value and the emotion features of the performance audio to be trained, and establishing a second training matrix from the second training coordinate value and the emotion features of the performance audio to be trained;
normalizing the first training matrix into a first training normalization matrix, and normalizing the second training matrix into a second training normalization matrix;
substituting the first training normalization matrix and the second training normalization matrix into an SVM algorithm, respectively, to obtain a first training hyperplane and a second training hyperplane;
substituting the first training hyperplane and the first training normalization matrix into the SVM algorithm to obtain a first emotion recognition model based on the first coordinate axis, and substituting the second training hyperplane and the second training normalization matrix into the SVM algorithm to obtain a second emotion recognition model based on the second coordinate axis.
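Claim 3 fixes neither the normalization scheme nor how the SVM yields a coordinate value. A sketch under two explicit assumptions: min-max normalization, and a linear SVM per axis whose training hyperplane separates the positive from the negative side of that axis:

```python
# Per-axis training sketch for claim 3. Assumptions: min-max normalization;
# the "training hyperplane" is the decision boundary of a linear SVM trained
# on which side of the axis each training sample's coordinate value falls.
import numpy as np
from sklearn.svm import SVC

def min_max_normalize(matrix: np.ndarray):
    lo, hi = matrix.min(axis=0), matrix.max(axis=0)
    normalized = (matrix - lo) / (hi - lo + 1e-12)  # training normalization matrix
    return normalized, (lo, hi)                     # keep the stats for test-time reuse

def train_axis_model(features: np.ndarray, axis_coords: np.ndarray):
    normalized, stats = min_max_normalize(features)
    side = (axis_coords >= 0).astype(int)               # side of the coordinate axis
    model = SVC(kernel="linear").fit(normalized, side)  # learns the training hyperplane
    return model, stats
```

Running this once per axis, first with the first training matrix and coordinate values, then with the second, yields the two axis-based models the claim describes.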
4. The performance emotion recognition method according to claim 3, characterized in that inputting the emotion features of the performance audio to be recognized into the emotion recognition model and recognizing the emotion of the performance audio to be recognized comprises:
inputting the emotion features of the performance audio to be recognized into the first emotion recognition model and the second emotion recognition model, respectively, to determine a first coordinate value of the emotion features based on the first coordinate axis and a second coordinate value based on the second coordinate axis;
determining, from the first coordinate value and the second coordinate value, the quadrant to which the emotion features correspond, so as to determine the performance emotion type corresponding to the emotion features.
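One plausible reading of claim 4 is that each model's signed distance to its training hyperplane serves as the coordinate value, after which the pair of signs selects the quadrant; a sketch under that assumption:

```python
# Sketch of claim 4: one coordinate value per axis, then a quadrant from the
# signs. Using decision_function as the coordinate value is an assumption.
def recognize_quadrant(model1, model2, normalized_features):
    x = normalized_features.reshape(1, -1)
    c1 = model1.decision_function(x)[0]  # first coordinate value (first axis)
    c2 = model2.decision_function(x)[0]  # second coordinate value (second axis)
    if c1 >= 0:
        return 1 if c2 >= 0 else 4       # positive first axis: quadrant I or IV
    return 2 if c2 >= 0 else 3           # negative first axis: quadrant II or III
```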
5. The performance emotion recognition method according to claim 4, characterized in that inputting the emotion features of the performance audio to be recognized into the emotion recognition model and determining the first coordinate value of the emotion features based on the first coordinate axis comprises:
establishing a first feature matrix from the first training matrix and the emotion features of the performance audio to be recognized; normalizing the first feature matrix to obtain a first normalization matrix, thereby obtaining the emotion features of the performance audio to be recognized as normalized by the first training matrix; and substituting these normalized emotion features, the first training hyperplane, and the first emotion recognition model into the SVM algorithm to obtain the first coordinate value of the emotion features on the first coordinate axis.
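The substance of claim 5 is that the test features are scaled with statistics taken from the training matrix, not from the test sample itself. A sketch reusing the min-max statistics kept by the training step above (the min-max scheme remains an assumption):

```python
# Normalize features of the audio to be recognized with the training matrix's
# statistics, per claim 5, under the min-max assumption used earlier.
def normalize_with_training_stats(features, stats):
    lo, hi = stats
    return (features - lo) / (hi - lo + 1e-12)
```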
6. The performance emotion recognition method according to claim 4, characterized in that inputting the emotion features of the performance audio to be recognized into the emotion recognition model and determining the second coordinate value of the emotion features based on the second coordinate axis comprises:
establishing a second feature matrix from the second training matrix and the emotion features of the performance audio to be recognized; normalizing the second feature matrix to obtain a second normalization matrix, thereby obtaining the emotion features of the performance audio to be recognized as normalized by the second training matrix; and substituting these normalized emotion features, the second training hyperplane, and the second emotion recognition model into the SVM algorithm to obtain the second coordinate value of the emotion features on the second coordinate axis.
7. The performance emotion recognition method according to claim 3, characterized in that the performance emotion types corresponding to the quadrants of the plane rectangular coordinate system include: tense/anxious, happy/cheerful, sad/dejected, and natural/calm; and the correspondence between the quadrants and the performance emotion types is: the first quadrant corresponds to tense/anxious, the second quadrant to happy/cheerful, the third quadrant to sad/dejected, and the fourth quadrant to natural/calm.
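Since the correspondence is fixed, it reduces to a lookup; a sketch:

```python
# Fixed quadrant-to-emotion correspondence of claim 7.
QUADRANT_EMOTION = {
    1: "tense/anxious",
    2: "happy/cheerful",
    3: "sad/dejected",
    4: "natural/calm",
}
```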
8. A performance emotion recognition device, characterized by comprising:
a training module, configured to extract emotion features of performance audio to be trained and to train an emotion recognition model, wherein the emotion features include sound signal features and musical score features;
an extraction module, configured to extract emotion features of performance audio to be recognized;
a recognition module, configured to input the emotion features of the performance audio to be recognized into the emotion recognition model and to recognize the emotion of the performance audio to be recognized.
9. The performance emotion recognition device according to claim 8, characterized in that the sound signal features include at least one of the following: average energy, energy standard deviation, average fundamental frequency, fundamental frequency standard deviation, the number of times the fundamental frequency exceeds the average fundamental frequency, average spectral centroid, spectral centroid standard deviation, MFCC features, and spectrogram features; and the musical score features include at least one of the following: beats per minute, key type (major or minor), mode, average pitch, pitch standard deviation, and the average duration of each note.
10. The performance emotion recognition device according to claim 8, characterized in that the training module comprises:
a training coordinate value determining unit, configured to determine training coordinate values of the emotion features of the performance audio to be trained on a first coordinate axis and on a second coordinate axis, respectively, to obtain a first training coordinate value and a second training coordinate value, wherein the first coordinate axis and the second coordinate axis form a plane rectangular coordinate system whose quadrants correspond one-to-one to performance emotion types;
a training matrix determining unit, configured to establish a first training matrix from the first training coordinate value and the emotion features of the performance audio to be trained, and to establish a second training matrix from the second training coordinate value and the emotion features of the performance audio to be trained;
a training normalization matrix determining unit, configured to normalize the first training matrix into a first training normalization matrix and to normalize the second training matrix into a second training normalization matrix;
a training hyperplane determining unit, configured to substitute the first training normalization matrix and the second training normalization matrix into an SVM algorithm, respectively, to obtain a first training hyperplane and a second training hyperplane;
an emotion recognition model determining unit, configured to substitute the first training hyperplane and the first training normalization matrix into the SVM algorithm to obtain a first emotion recognition model based on the first coordinate axis, and to substitute the second training hyperplane and the second training normalization matrix into the SVM algorithm to obtain a second emotion recognition model based on the second coordinate axis.
11. The performance emotion recognition device according to claim 10, characterized in that the recognition module comprises:
an input unit, configured to input the emotion features of the performance audio to be recognized into the first emotion recognition model and the second emotion recognition model, respectively, to determine a first coordinate value of the emotion features based on the first coordinate axis and a second coordinate value based on the second coordinate axis;
a determining unit, configured to determine, from the first coordinate value and the second coordinate value, the quadrant to which the emotion features correspond, so as to determine the performance emotion type corresponding to the emotion features.
12. The performance emotion recognition device according to claim 11, characterized in that the determining unit is specifically configured to: establish a first feature matrix from the first training matrix and the emotion features of the performance audio to be recognized; normalize the first feature matrix to obtain a first normalization matrix, thereby obtaining the emotion features of the performance audio to be recognized as normalized by the first training matrix; and substitute these normalized emotion features, the first training hyperplane, and the first emotion recognition model into the SVM algorithm to obtain the first coordinate value of the emotion features on the first coordinate axis.
13. The performance emotion recognition device according to claim 11, characterized in that the determining unit is specifically configured to: establish a second feature matrix from the second training matrix and the emotion features of the performance audio to be recognized; normalize the second feature matrix to obtain a second normalization matrix, thereby obtaining the emotion features of the performance audio to be recognized as normalized by the second training matrix; and substitute these normalized emotion features, the second training hyperplane, and the second emotion recognition model into the SVM algorithm to obtain the second coordinate value of the emotion features on the second coordinate axis.
14. The performance emotion recognition device according to claim 10, characterized in that the performance emotion types corresponding to the quadrants of the plane rectangular coordinate system include: tense/anxious, happy/cheerful, sad/dejected, and natural/calm; and the correspondence between the quadrants and the performance emotion types is: the first quadrant corresponds to tense/anxious, the second quadrant to happy/cheerful, the third quadrant to sad/dejected, and the fourth quadrant to natural/calm.
15. A performance recognition method, characterized by comprising:
obtaining performance audio of a user;
when the performance emotion type corresponding to the user's performance audio is recognized as matching the preset emotion of the song, outputting a corresponding performance output control instruction.
16. performance recognition methodss according to claim 15, it is characterised in that described performance output control instruction include with
Descend at least one: sing bonus point control instruction, signal light control instruction.
17. A performance recognition device, characterized by comprising:
an acquisition module, configured to obtain performance audio of a user;
a recognition module, configured to output a corresponding performance output control instruction when the performance emotion type corresponding to the user's performance audio is recognized as matching the preset emotion of the song, wherein the performance output control instruction includes at least one of the following: a performance bonus-point control instruction and a lighting control instruction.
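A minimal sketch of the control logic of claims 15 to 17; the instruction names are hypothetical placeholders, not values taken from the patent:

```python
# Sketch of claims 15-17: emit output control instructions when the recognized
# performance emotion type matches the song's preset emotion.
def performance_output_control(recognized_emotion: str, preset_emotion: str) -> list:
    if recognized_emotion == preset_emotion:
        return ["performance_bonus_points", "lighting_control"]  # hypothetical names
    return []
```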
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610506767 | 2016-06-30 | ||
CN2016105067670 | 2016-06-30 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106128479A (en) | 2016-11-16
CN106128479B (en) | 2019-09-06
Family
ID=57468267
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610517375.4A Active CN106128479B (en) | 2016-06-30 | 2016-07-02 | A kind of performance emotion identification method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106128479B (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH10187178A (en) * | 1996-10-28 | 1998-07-14 | Omron Corp | Feeling analysis device for singing and grading device |
US8283549B2 (en) * | 2006-09-08 | 2012-10-09 | Panasonic Corporation | Information processing terminal and music information generating method and program |
CN1975856A (en) * | 2006-10-30 | 2007-06-06 | 邹采荣 | Speech emotion identifying method based on supporting vector machine |
CN101599271A (en) * | 2009-07-07 | 2009-12-09 | 华中科技大学 | A kind of recognition methods of digital music emotion |
US20140172431A1 (en) * | 2012-12-13 | 2014-06-19 | National Chiao Tung University | Music playing system and music playing method based on speech emotion recognition |
CN106132040A (en) * | 2016-06-20 | 2016-11-16 | 科大讯飞股份有限公司 | Sing lamp light control method and the device of environment |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108039181A (en) * | 2017-11-02 | 2018-05-15 | 北京捷通华声科技股份有限公司 | The emotion information analysis method and device of a kind of voice signal |
CN108039181B (en) * | 2017-11-02 | 2021-02-12 | 北京捷通华声科技股份有限公司 | Method and device for analyzing emotion information of sound signal |
CN108174498A (en) * | 2017-12-28 | 2018-06-15 | 福建海媚数码科技有限公司 | A kind of control method and system of the scene lamp based on intelligent Matching |
CN108899046A (en) * | 2018-07-12 | 2018-11-27 | 东北大学 | A kind of speech-emotion recognition method and system based on Multistage Support Vector Machine classification |
CN108986843A (en) * | 2018-08-10 | 2018-12-11 | 杭州网易云音乐科技有限公司 | Audio data processing method and device, medium and calculating equipment |
CN109120992A (en) * | 2018-09-13 | 2019-01-01 | 北京金山安全软件有限公司 | Video generation method and device, electronic equipment and storage medium |
CN109273025A (en) * | 2018-11-02 | 2019-01-25 | 中国地质大学(武汉) | A kind of China National Pentatonic emotion identification method and system |
CN110162671A (en) * | 2019-05-09 | 2019-08-23 | 央视国际网络无锡有限公司 | The method for identifying video ads by music emotion |
CN110223712A (en) * | 2019-06-05 | 2019-09-10 | 西安交通大学 | A kind of music emotion recognition method based on two-way convolution loop sparse network |
CN110223712B (en) * | 2019-06-05 | 2021-04-20 | 西安交通大学 | Music emotion recognition method based on bidirectional convolution cyclic sparse network |
CN111601433A (en) * | 2020-05-08 | 2020-08-28 | 中国传媒大学 | Method and device for predicting stage lighting effect control strategy |
CN112614511A (en) * | 2020-12-10 | 2021-04-06 | 央视国际网络无锡有限公司 | Song emotion detection method |
Similar Documents
Publication | Title
---|---
CN106128479A (en) | A kind of performance emotion identification method and device | |
Won et al. | Evaluation of cnn-based automatic music tagging models | |
Lee et al. | Automatic music genre classification based on modulation spectral analysis of spectral and cepstral features | |
CN110457432A (en) | Interview methods of marking, device, equipment and storage medium | |
Lataifeh et al. | Arabic audio clips: Identification and discrimination of authentic cantillations from imitations | |
CN109461073A (en) | Risk management method, device, computer equipment and the storage medium of intelligent recognition | |
CN102723079B (en) | Music and chord automatic identification method based on sparse representation | |
Yu et al. | Predominant instrument recognition based on deep neural network with auxiliary classification | |
CN115083422B (en) | Voice traceability evidence obtaining method and device, equipment and storage medium | |
Biagetti et al. | Speaker identification with short sequences of speech frames | |
Zang et al. | Singfake: Singing voice deepfake detection | |
Tsunoo et al. | Music mood classification by rhythm and bass-line unit pattern analysis | |
CN112052686B (en) | Voice learning resource pushing method for user interactive education | |
Anglade et al. | Characterisation of Harmony With Inductive Logic Programming. | |
CN105895079A (en) | Voice data processing method and device | |
Ramirez et al. | Performance-based interpreter identification in saxophone audio recordings | |
Ma et al. | How to boost anti-spoofing with x-vectors | |
Mahfood et al. | Emotion Recognition from Speech Using Convolutional Neural Networks | |
Won et al. | Visualizing and understanding self-attention based music tagging | |
Kotti et al. | Speaker-independent negative emotion recognition | |
Hama Saeed | Improved speech emotion classification using deep neural network | |
Zhang et al. | Feature learning via deep belief network for Chinese speech emotion recognition | |
Tsai et al. | Bird species identification based on timbre and pitch features | |
Lashkari et al. | NMF-based cepstral features for speech emotion recognition | |
Kamińska et al. | Polish emotional speech recognition based on the committee of classifiers |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
CB03 | Change of inventor or designer information | Inventor after: Cai Zhili; Li Hongfu. Inventor before: Cai Zhili |
GR01 | Patent grant | ||
GR01 | Patent grant |