CN108305642B - Method and apparatus for determining emotion information - Google Patents
Method and apparatus for determining emotion information
- Publication number
- CN108305642B CN108305642B CN201710527116.4A CN201710527116A CN108305642B CN 108305642 B CN108305642 B CN 108305642B CN 201710527116 A CN201710527116 A CN 201710527116A CN 108305642 B CN108305642 B CN 108305642B
- Authority
- CN
- China
- Prior art keywords
- emotion
- text
- information
- feature
- recognition result
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/63—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/12—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being prediction coefficients
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
Abstract
The invention discloses a method and apparatus for determining emotion information. The method comprises: obtaining target audio; recognizing first text information from the target audio, where the target audio has a speech feature and the first text information has a text feature; and determining target emotion information of the target audio based on the text feature of the first text information and the speech feature of the target audio. The invention solves the technical problem in the related art that a speaker's emotion information cannot be accurately recognized.
Description
Technical field
The present invention relates to the Internet field, and in particular to a method and apparatus for determining emotion information.
Background technique
With the growth of multimedia content, there is market demand for summarization technology that lets viewers grasp content in a short time. Content types are also diversifying: films, serials, home videos, news, documentaries, music content, live scenes, web novels, text news, and so on. Accordingly, viewers' requirements are becoming increasingly varied.
Along with this diversification of viewing requirements, technologies are needed that can instantly retrieve and present the content a viewer wants to watch. One example is content summarization: based on the text information contained in the content, the text is analyzed to determine the emotion it carries, such as laughter, anger, or sadness.
Among such analysis methods, audio-based emotion detection can be applied to a speaker's audio. Detecting emotion from audio works relatively well when the speaker expresses emotion clearly. But when the speaker's emotional expression is weak, for example when a very happy event is described in a very flat tone, the audio carries almost no features expressing happiness. In that case audio-based emotion detection performs poorly: it cannot reach an accurate decision from the speech features, and may even return a wrong decision result.
No effective solution has yet been proposed for the technical problem in the related art that a speaker's emotion information cannot be accurately recognized.
Summary of the invention
Embodiments of the present invention provide a method and apparatus for determining emotion information, so as to at least solve the technical problem in the related art that a speaker's emotion information cannot be accurately recognized.
According to one aspect of the embodiments of the present invention, a method for determining emotion information is provided. The method includes: obtaining target audio; recognizing first text information from the target audio, where the target audio has a speech feature and the first text information has a text feature; and determining target emotion information of the target audio based on the text feature of the first text information and the speech feature of the target audio.
According to another aspect of the embodiments of the present invention, an apparatus for determining emotion information is also provided. The apparatus includes: an obtaining unit, configured to obtain target audio; a recognition unit, configured to recognize first text information from the target audio, where the target audio has a speech feature and the first text information has a text feature; and a determination unit, configured to determine target emotion information of the target audio based on the text feature of the first text information and the speech feature of the target audio.
In the embodiments of the present invention, when target audio is obtained, first text information is recognized from it, and the target emotion information of the target audio is then determined from both the text feature of the first text information and the speech feature of the target audio. In other words, when the text information shows obvious emotion, the emotion information can be determined from the text feature; when the target audio shows obvious emotion, it can be determined from the speech feature. This solves the technical problem in the related art that a speaker's emotion information cannot be accurately recognized, and thereby achieves the technical effect of improving the accuracy of recognizing the speaker's emotion information.
Detailed description of the invention
The drawings described herein are provided for a further understanding of the present invention and constitute a part of this application. The illustrative embodiments of the present invention and their descriptions are used to explain the present invention and do not constitute an improper limitation of it. In the drawings:
Fig. 1 is a schematic diagram of the hardware environment of a method for determining emotion information according to an embodiment of the present invention;
Fig. 2 is a flowchart of an optional method for determining emotion information according to an embodiment of the present invention;
Fig. 3 is a flowchart of optionally training a convolutional neural network model according to an embodiment of the present invention;
Fig. 4 is a flowchart of optionally training a deep neural network model according to an embodiment of the present invention;
Fig. 5 is a flowchart of an optional method for determining emotion information according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of an optional apparatus for determining emotion information according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of an optional apparatus for determining emotion information according to an embodiment of the present invention; and
Fig. 8 is a structural block diagram of a terminal according to an embodiment of the present invention.
Specific embodiment
To enable those skilled in the art to better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the scope of protection of the present invention.
It should be noted that the terms "first", "second", and the like in the specification, claims, and the above drawings are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the present invention described herein can be implemented in orders other than those illustrated or described herein. In addition, the terms "include" and "have" and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device containing a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units not expressly listed or inherent to that process, method, product, or device.
Embodiment 1
According to an embodiment of the present invention, a method embodiment of a method for determining emotion information is provided.
Optionally, in this embodiment, the above method for determining emotion information can be applied to the hardware environment shown in Fig. 1, which consists of a server 102 and a terminal 104. As shown in Fig. 1, the server 102 is connected to the terminal 104 through a network, which includes but is not limited to a wide area network, a metropolitan area network, or a local area network; the terminal 104 is not limited to a PC, a mobile phone, a tablet computer, or the like. The method for determining emotion information of the embodiment of the present invention may be executed by the server 102, by the terminal 104, or jointly by the server 102 and the terminal 104. When the terminal 104 executes the method, it may also be executed by a client installed on the terminal.
When the method for determining emotion information of the embodiment of the present invention is executed by the server or the terminal alone, the program code corresponding to the method of this application is executed directly on the server or the terminal.
When the method for determining emotion information of the embodiment of the present invention is executed jointly by the server and the terminal, the terminal initiates the request to recognize the target audio and sends the target speech to be recognized to the server; the server then executes the program code corresponding to the method of this application and feeds the recognition result back to the terminal.
The embodiment of this application is described in detail below, taking as an example the case of executing the program code corresponding to the method of this application on a server or terminal. Fig. 2 is a flowchart of an optional method for determining emotion information according to an embodiment of the present invention. As shown in Fig. 2, the method may include the following steps:
Step S202: obtain target audio.
The terminal may actively obtain the target audio, receive target audio sent by another device, or obtain the target audio when triggered by a target instruction. The target instruction corresponds to an instruction, triggered by a user or the terminal, to recognize the target audio. The purpose of obtaining the target audio is to recognize its emotion information, i.e., the emotion shown when the text information is stated through the target audio (including but not limited to emotion expressed by the wording or tone in the text, or by the tone and timbre in the audio).
The above text information refers to one sentence or a combination of multiple sentences; a text includes but is not limited to a sentence, a paragraph, or a discourse.
Emotion information is information describing the speaker's emotion. For example, when talking about something, the speaker may express an emotion related to happiness (happy, flat, sad); when receiving an apology, the speaker may express an emotion related to forgiveness (forgive, noncommittal, not forgive); and so on.
Step S204: recognize first text information from the target audio, where the target audio has a speech feature and the first text information has a text feature.
Recognizing the first text information from the target audio means recognizing, by means of speech recognition, the first text information expressed by the target audio (the first text information recognized here may differ subtly from the text information actually stated).
For speech recognition, the speech feature includes features of the following kinds: perceptual linear prediction PLP (Perceptual Linear Predictive), Mel-frequency cepstral coefficients MFCC (Mel-Frequency Cepstral Coefficients), FBANK (filter-bank feature), tone PITCH (e.g., high or low pitch), speech energy ENERGY, and I-VECTOR (an important feature reflecting acoustic differences between speakers), among others. One or more of these features can be used in this application; using multiple features is preferred.
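As an illustration of the ENERGY feature and of fusing several of the features listed above, the following is a minimal sketch (not part of the patent; the frame sizes and the log-energy floor are assumptions chosen for the example):

```python
import math

def frame_energy(samples, frame_len=160, hop=80):
    """Short-time log-energy per frame (the ENERGY feature above)."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame)
        feats.append(math.log(energy + 1e-10))  # floor avoids log(0) on silence
    return feats

def fuse(*streams):
    """Frame-level fusion: concatenate the per-frame vectors of each
    feature stream (e.g., MFCC + PITCH + ENERGY) into one vector."""
    n = min(len(s) for s in streams)
    return [[x for s in streams for x in s[i]] for i in range(n)]
```

In practice PLP, MFCC, FBANK, and I-VECTOR would come from a speech front end; the fusion step simply aligns the streams frame by frame and concatenates them.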
For text recognition, the above first text information can be recognized from the target audio by a speech recognition engine. The text feature of the text information includes features such as the emotion type, emotional polarity, and emotional intensity of each phrase or word in the text, and may also include association features between phrases.
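The per-word features just described can be illustrated with a toy sentiment lexicon; the lexicon entries and the (emotion type, polarity, intensity) triple format are hypothetical, for illustration only:

```python
# Hypothetical toy lexicon: word -> (emotion type, polarity, intensity).
LEXICON = {
    "happy": ("joy", +1, 0.9),
    "wages": ("joy", +1, 0.3),
    "sad": ("sadness", -1, 0.8),
}

def text_features(tokens):
    """Emotion type / polarity / intensity per word of a segmented
    sentence; out-of-lexicon words get a neutral placeholder entry."""
    return [LEXICON.get(t, ("none", 0, 0.0)) for t in tokens]
```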
Step S206: determine the target emotion information of the target audio based on the text feature of the first text information and the speech feature of the target audio.
When determining the target emotion information of the target audio, both the text feature of the first text information and the speech feature of the target audio are considered. Compare this with the related art, where only audio-based emotion detection is applied to the speaker's audio: emotion detection using audio alone works relatively well when the speaker expresses emotion clearly, but when the speaker's emotional expression is weak, for example when a very happy event is described in a very flat tone, the audio carries almost no features expressing happiness. In this case, text-based emotion detection can additionally be applied to the text information in the speaker's audio, so that an accurate decision can be made from the text feature. This compensates for the deficiency of detecting emotion from audio alone and achieves the effect of improving the accuracy of the decision result.
Through the above steps S202 to S206, when target audio is obtained, the first text information is recognized from it, and the target emotion information of the target audio is then determined from the text feature of the first text information and the speech feature of the target audio. In other words, when the text information shows obvious emotion, the emotion information can be determined from the text feature; when the target audio shows obvious emotion, it can be determined from the speech feature. This solves the technical problem in the related art that a speaker's emotion information cannot be accurately recognized, and thereby achieves the technical effect of improving the accuracy of recognizing the speaker's emotion information.
Applying only audio-based emotion detection to a speaker's audio works relatively well when the speaker expresses emotion clearly, and applying text-based emotion detection to the text information in the speaker's audio likewise works relatively well when the text expresses emotion clearly. However, it is unknown when (i.e., for which scenes or which kinds of speech) audio-based detection should be used and when text-based detection should be used; it is impossible to predict in advance which method will detect the current audio to be detected with better effect.
The applicant considers that, if a text with obvious emotion is stated in a flat tone (e.g., a happy text stated in a flat tone), text-based emotion detection clearly recognizes it better; if a text is stated in a tone with obvious emotion (e.g., a rather flat text stated in a happy tone), audio-based emotion detection clearly recognizes it better. A text with obvious emotion may be stated in either a flat tone or an emotionally obvious tone, and an emotionally flat text may likewise be stated in either a clearly emotional tone or a flat tone; but a text with an obvious positive emotion will not be stated in a tone of the opposite emotion — for example, a text with a happy emotional color will not be stated in a sad tone.
Therefore, on the basis of the above observation, as long as either the speech or the text carries an obvious emotional color (i.e., emotion information of the first emotion grade), the target speech can be determined to be speech with that emotional color. When determining the target emotion information of the target audio based on the text feature of the first text information and the speech feature of the target audio: obtain a first recognition result determined from the text feature, the first recognition result indicating the emotion information recognized from the text feature; obtain a second recognition result determined from the speech feature, the second recognition result indicating the emotion information recognized from the speech feature; and when the emotion information indicated by at least one of the first recognition result and the second recognition result is emotion information of the first emotion grade, determine the target emotion information of the target audio to be the emotion information of the first emotion grade.
The above first emotion grade is a grade with obvious emotion information, as opposed to information tending toward the flat middle (no obvious emotion). For example, in the emotion group happy/flat/sad, emotion information of the first emotion grade means happy or sad, not flat; other kinds of emotion information are similar and are not repeated here.
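A minimal sketch of this decision rule follows. Note one assumption: when both recognizers report a first-grade emotion, this sketch lets the text result take precedence, since the patent only specifies the "at least one" condition and not a tie-break:

```python
FLAT = "flat"  # the middle, no-obvious-emotion member of an emotion group

def combine(text_result, audio_result):
    """If at least one recognition result is a first-grade (clearly
    colored) emotion, the target audio is judged to carry that emotion;
    otherwise it is judged flat."""
    if text_result != FLAT:
        return text_result
    if audio_result != FLAT:
        return audio_result
    return FLAT
```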
In the recognition technical solution of this application, feature recognition and emotion information recognition are performed using, including but not limited to, common algorithms or machine learning algorithms. To improve accuracy, machine learning algorithms can be used for both feature recognition and emotion information recognition. This is described below:
(1) Text-based CNN training process
Before executing steps S202 to S206 of this application, the algorithm model can first be trained: before obtaining the target audio, a second convolutional neural network model (the original convolutional neural network model) is trained using second text information (training text) and first emotion information, so as to determine the values of the parameters in the second convolutional neural network model, and the second convolutional neural network model with its parameter values determined is set as the first convolutional neural network model, where the first emotion information is the emotion information of the second text information. As shown in Fig. 3:
Step S301: segment the training text into words.
The training sentence is segmented. For example, segmenting the example sentence "wages are paid today, I am very happy" yields: today, pay wages, I, very, happy. The emotion label (actual emotion information) of this training sentence is happy.
Step S302: train the CNN model (i.e., the second convolutional neural network model).
Step S3021: Word2vector (word vectors).
A word vector, as the name suggests, represents a word in the form of a vector. A machine learning task requires its input to be quantified into a numerical representation, so that the computing power of the computer can be fully used to calculate the desired result; this is why word vectors are needed.
According to the number of words in the segmented training sentence, an n*k matrix is formed, where n is the number of words in the training sentence and k is the dimension of the vector. The shape of this matrix can be fixed or dynamic, chosen according to the specific situation.
Word2vector currently has several mature and stable algorithms; this application can choose CBOW or Skip-gram. Both the CBOW and Skip-gram algorithm models can be based on a Huffman tree, in which the intermediate vectors stored at non-leaf nodes are initialized to zero vectors, and the word vectors of the words corresponding to leaf nodes are randomly initialized.
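As a sketch of building the n*k input matrix, the following randomly initializes one k-dimensional vector per word, as described above for the Huffman-tree leaf nodes; training those vectors with CBOW or Skip-gram is omitted, so this is only the initialization step:

```python
import random

class WordVectors:
    """Randomly initialized word vectors (CBOW/Skip-gram training is
    omitted in this sketch)."""
    def __init__(self, k=8, seed=0):
        self.k = k
        self.rng = random.Random(seed)
        self.table = {}

    def vec(self, word):
        # Random initialization, as for the leaf-node word vectors above.
        if word not in self.table:
            self.table[word] = [self.rng.uniform(-0.5, 0.5)
                                for _ in range(self.k)]
        return self.table[word]

    def matrix(self, tokens):
        """The n*k matrix: one k-dimensional row per word of the sentence."""
        return [self.vec(t) for t in tokens]
```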
Step S3022: the convolutional layer performs feature extraction.
The n*k matrix generated in step S3021 passes through the convolutional layer to obtain several matrices with a single column. This layer is similar to a feature extraction layer and performs feature extraction.
Step S3023: the pooling layer performs pooling.
From the single-column matrices generated in step S3022, the largest feature value, or the largest several, can be selected as new features according to the actual situation. After this layer, features of a fixed dimension are formed, which solves the problem of varying sentence length.
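Selecting the single largest value from each single-column convolution output (max-over-time pooling) can be sketched as:

```python
def max_pool_columns(conv_columns):
    """Reduce each variable-length convolution column to its maximum,
    yielding a feature vector of fixed dimension (one value per filter)
    regardless of sentence length."""
    return [max(col) for col in conv_columns]
```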
Step S3024: NN-layer processing.
The new features generated in step S3023 pass through one or more neural network layers according to the actual situation; the last layer is a softmax layer. Through the NN layers, the label or score of an attribute is obtained.
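The softmax output layer mentioned above turns the raw scores of the last NN layer into a probability per emotion label; a standard numerically stable version looks like this:

```python
import math

def softmax(scores):
    """One probability per emotion label; subtracting the maximum score
    before exponentiating keeps exp() from overflowing."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```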
Step S3025: Back-Propagation (BP) processing.
When step S3024 obtains an attribute label or score, the parameters are updated by propagating back the error between the actual emotion label of the training sentence and the recognized attribute. After several rounds of iteration the model reaches an optimum, the training process is completed, and the CNN model (the first convolutional neural network model) is obtained.
(2) Speech-based DNN training process
Before executing steps S202 to S206 of this application, training the algorithm model may also include: before obtaining the target audio, training a second deep neural network model using training audio (or training speech) and second emotion information, so as to determine the values of the parameters in the second deep neural network model, and setting the second deep neural network model with its parameter values determined as the first deep neural network model, where the second emotion information is the emotion information of the training audio. This is described in detail below with reference to Fig. 4:
Step S401: perform feature extraction on the training audio.
Feature extraction is performed on the training speech. Many kinds of features can be extracted, such as PLP, MFCC, FBANK, PITCH, ENERGY, and I-VECTOR; one or more of these various features can be extracted. The feature preferentially used in this application is a fusion of multiple features.
Step S402: train the DNN (the second deep neural network model) using the extracted features.
A DNN model containing one or more neural network layers is selected according to the actual situation; the last layer of the DNN model is a softmax layer (a regression model). The fusion feature obtained in the previous step is expanded with preceding and following frames and fed into the DNN layers of the deep neural network, and then output through the softmax layer.
The DNN model may also include Back-Propagation (a BP stage): the BP stage takes the difference between the label or score output by the softmax layer and the emotion label, processes it with the BP algorithm, and updates the parameters of the DNN. After several rounds of iteration the model reaches an optimum, and the first deep neural network model is obtained; the recognition process does not need this step.
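The multi-feature fusion of step S401 and the frame-context expansion mentioned in step S402 might be sketched as follows; the feature values are placeholders, not real PLP/MFCC/PITCH outputs:

```python
def fuse(frame_feats):
    """Concatenate the different per-frame feature vectors
    (e.g. MFCC + PITCH + ENERGY) into one fusion vector."""
    fused = []
    for f in frame_feats:
        fused.extend(f)
    return fused

def expand_context(frames, left=1, right=1):
    """Splice each frame with its neighbours so the DNN input covers
    preceding and following frames; edges repeat the first/last frame."""
    out = []
    for i in range(len(frames)):
        window = []
        for j in range(i - left, i + right + 1):
            j = min(max(j, 0), len(frames) - 1)  # clamp at the edges
            window.extend(frames[j])
        out.append(window)
    return out

mfcc, pitch, energy = [1.0, 2.0], [0.5], [0.1]     # invented values
frame = fuse([mfcc, pitch, energy])                # one 4-dim fused frame
frames = [frame, [0.0] * 4, [1.0] * 4]
print(len(expand_context(frames)[0]))              # 12 = 3 frames x 4 dims
```

The spliced 12-dimensional vectors are what would be fed to the first DNN layer in this sketch.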
(3) joint training based on voice and text
In (1) and (2) above the two models are trained separately, so recognition does not exploit the internal association between voice and text; text and voice are recognized independently. To exploit the internal association of voice and text, a model can be jointly trained with voice and text:
Before obtaining the target audio, the second deep neural network model is trained with training audio and the second text information, so as to determine the values of the parameters in the second deep neural network model, and the second deep neural network model with the determined parameter values is set as the first deep neural network model.
The training audio has second speech features and the second text information has second text features. Training the second deep neural network model with the training audio and the second text information to determine the parameter values, and setting the second deep neural network model with the determined parameter values as the first deep neural network model, specifically includes:
Step 1: the second speech features and the second text features are taken as input to the second deep neural network model to train it, where training the second deep neural network model includes assigning values to its parameters; the training audio carries the first emotion information.
Step 2: when the second emotion information output by the second deep neural network model matches the first emotion information, the second deep neural network model with the assigned parameters is set as the first deep neural network model, where the first deep neural network model is used to recognize emotion information according to an association relation, and the association relation describes the association between emotion information and the speech features and first text features.
Step 3: when the second emotion information does not match the first emotion information, the values assigned to the parameters of the second deep neural network model are adjusted, so that the second emotion information output by the second deep neural network model after the adjustment matches the first emotion information.
Step 4: when recognition is performed with the trained model (i.e. when the target emotion information of the target audio is determined based on the text features of the first text information and the speech features of the target audio), the first speech features and the first text features are taken as input to the first deep neural network model, and the target emotion information of the target audio, determined by the first deep neural network model from the first speech features and the first text features, is obtained.
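Steps 1 to 4 above can be illustrated with a toy joint model. A one-weight-per-input linear scorer stands in for the second deep neural network model, and the data are invented:

```python
# Toy sketch of the joint-training loop: speech and text features are
# fed to one model together (step 1); on a mismatch the parameters are
# adjusted (step 3) until the output matches the label (step 2).
def predict(weights, speech_feat, text_feat):
    x = speech_feat + text_feat          # joint speech + text input
    s = sum(w * v for w, v in zip(weights, x))
    return 1 if s > 0 else 0             # 1 = first emotion grade

def train(samples, n_dims, epochs=20, lr=0.1):
    weights = [0.0] * n_dims
    for _ in range(epochs):
        for speech, text, label in samples:
            out = predict(weights, speech, text)
            if out != label:             # mismatch: adjust parameters
                x = speech + text
                for i in range(n_dims):
                    weights[i] += lr * (label - out) * x[i]
    return weights                       # matched: keep these parameters

data = [([1.0, 0.2], [0.8], 1), ([-1.0, 0.1], [-0.9], 0)]
w = train(data, 3)
print(predict(w, [1.0, 0.2], [0.8]))    # 1
```

Recognition (step 4) is then just `predict` with the trained weights; in the patent the model is a multi-layer DNN rather than this linear stand-in.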
In the technical solution provided by step S202, the target audio is obtained, for example a segment of audio input by the user at the terminal through an audio input device (such as a microphone).
In the technical solution provided by step S204, the first text information is recognized from the target audio; the target audio has speech features and the first text information has text features.
The extraction and selection of acoustic features is an important link in speech recognition. Acoustic feature extraction is both a process of substantial information compression and a process of signal de-convolution, the aim being to enable the pattern classifier to divide better. Because of the time-varying characteristics of the speech signal, feature extraction must be carried out on a short segment of the signal, i.e. by short-time analysis. Such a segment, regarded as a stable analysis interval, is called a frame; the offset between frames usually takes 1/2 or 1/3 of the frame length. When extracting speech features from the target audio, pre-emphasis is usually applied to the signal to boost high frequencies, and the signal is windowed to avoid edge effects of the short-time speech segment. The above process of obtaining the first text information can be realized by a speech recognition engine.
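The short-time analysis described above (pre-emphasis, framing with a 1/2-frame shift, windowing) can be sketched as follows; the signal values and frame sizes are chosen only for illustration:

```python
import math

def pre_emphasis(signal, alpha=0.97):
    """y[n] = x[n] - alpha * x[n-1], a common high-frequency boost."""
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]

def frame_signal(signal, frame_len, shift):
    """Cut the signal into overlapping frames; shift = frame_len // 2
    gives the 1/2-frame offset mentioned above."""
    frames = []
    start = 0
    while start + frame_len <= len(signal):
        frames.append(signal[start:start + frame_len])
        start += shift
    return frames

def hamming(n):
    """Window applied to each frame to soften its edges."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))
            for i in range(n)]

sig = [math.sin(0.1 * t) for t in range(32)]
frames = frame_signal(pre_emphasis(sig), frame_len=8, shift=4)
print(len(frames), len(frames[0]))  # 7 8
```

Each windowed frame would then go on to PLP/MFCC/etc. computation, which is outside this sketch.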
In the technical solution provided by step S206, the target emotion information of the target audio is determined based on the text features of the first text information and the speech features of the target audio. The technical solution provided by step S206 includes at least the following two implementations:
(1) mode one
When determining the target emotion information of the target audio based on the text features of the first text information and the speech features of the target audio: a first recognition result determined from the text features is obtained, the first recognition result indicating the emotion information recognized from the text features; a second recognition result determined from the speech features is obtained, the second recognition result indicating the emotion information recognized from the speech features; and when the emotion information indicated by at least one of the first and second recognition results is emotion information of the first emotion grade, the target emotion information of the target audio is determined to be emotion information of the first emotion grade. For example, for the emotion group happy, flat, sad: as long as either the first or the second recognition result is happy or sad, the final result (the target emotion information) is happy or sad, ignoring the influence of the emotion information "flat", which has no obvious emotional tendency.
The first and second recognition results above can directly be the recognized emotion information, or they can be other information indicating the recognized emotion information (such as an emotion score or emotion type).
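The mode-one decision rule can be summarized in a small sketch. The label names follow the happy/flat/sad example above; giving the text result precedence when both results are first-grade is an assumption, since the text does not specify a tie-break:

```python
# Mode one: an obviously emotional (first-grade) result from either the
# text side or the audio side wins; "flat" only wins when both agree.
FIRST_GRADE = {"happy", "sad"}

def combine(text_result, audio_result):
    for result in (text_result, audio_result):
        if result in FIRST_GRADE:
            return result
    return "flat"

print(combine("flat", "sad"))    # sad
print(combine("happy", "flat"))  # happy
print(combine("flat", "flat"))   # flat
```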
Optionally, recognition from the text features is realized by the first convolutional neural network model. When obtaining the first recognition result determined from the text features, the first recognition result determined from the text features recognized from the first text information is obtained directly from the first convolutional neural network model.
Obtaining the first recognition result determined by the first convolutional neural network model from the text features recognized from the first text information includes: performing feature extraction on the first text information in multiple feature dimensions through the feature extraction layer of the first convolutional neural network model to obtain multiple text features, one text feature being extracted in each feature dimension; and performing feature recognition on the first text feature among the multiple text features through the classification layer of the first convolutional neural network model to obtain the first recognition result (i.e. the one or several features with the largest feature values are selected), where the text features include the first text feature and second text features, and the feature value of the first text feature is greater than the feature value of any second text feature.
Recognition from the speech features is realized by the first deep neural network model. When obtaining the second recognition result determined from the speech features, the second recognition result determined from the speech features recognized from the target audio is obtained directly from the first deep neural network model.
(2) mode two
Determining the target emotion information of the target audio based on the text features of the first text information and the speech features of the target audio includes: obtaining a first recognition result determined from the text features, the first recognition result including a first emotion parameter indicating the emotion information recognized from the text features; obtaining a second recognition result determined from the speech features, the second recognition result including a second emotion parameter indicating the emotion information recognized from the speech features; setting the third emotion parameter final_score, which indicates the target emotion information, as: the first emotion parameter Score1 multiplied by the weight a set for the first emotion parameter, plus the second emotion parameter Score2 multiplied by the weight (1-a) set for the second emotion parameter; and determining the emotion information of the second emotion grade as the target emotion information, where the second emotion grade is the emotion grade corresponding to the emotion parameter interval in which the third emotion parameter lies, each emotion grade corresponding to one emotion parameter interval.
It should be noted that when obtaining the first recognition result determined from the text features and the second recognition result determined from the speech features, the models used in mode one above may be referred to for the calculation.
Optionally, after determining the target emotion information of the target audio based on the text features of the first text information and the speech features of the target audio, the target audio is played and its target emotion information is displayed; feedback information from the user is received, the feedback information including indication information on whether the recognized target emotion information is correct and, when it is incorrect, also including the actual emotion information the user recognizes from the played target audio.
If the recognized target emotion information is incorrect, this shows that the recognition accuracy of the convolutional neural network model and the deep neural network model still needs improvement, and for such misrecognized audio information in particular the recognition rate is worse. At this point a negative feedback mechanism is used to improve the recognition rate: specifically, the misrecognized audio information is used, in the manner described above, to retrain the convolutional neural network model and the deep neural network model, so as to adjust the parameter values of the two models and improve their recognition accuracy.
Optionally, when determining the target emotion information of the target audio based on the text features of the first text information and the speech features of the target audio, the target audio can be divided into several audio segments, and multiple pieces of first text information are recognized from the multiple audio segments, where each piece of first text information is recognized from one corresponding audio segment. Each audio segment has speech features and each piece of first text information has text features, so that the target emotion information of the multiple audio segments can be determined based on the speech features of the multiple audio segments and the text features of the multiple pieces of first text information.
Determining the target emotion information of the multiple audio segments based on the speech features of the multiple audio segments and the text features of the multiple pieces of first text information includes determining the target emotion information of each audio segment as follows: obtaining the first recognition result determined from the text features of the first text information (i.e. the first recognition result determined by the trained convolutional neural network model from the text features recognized from the first text information), the first recognition result indicating the emotion information recognized from the text features; obtaining the second recognition result determined from the speech features of the audio segment corresponding to the first text information (i.e. the second recognition result determined by the trained deep neural network model from the speech features recognized from the audio segment), where the second recognition result indicates the emotion information recognized from the speech features; and, when the emotion information indicated by at least one of the first and second recognition results is emotion information of the first emotion grade, determining the target emotion information of that audio segment to be emotion information of the first emotion grade.
The first recognition result determined by the convolutional neural network model from the text features recognized from the first text information can be obtained as follows: feature extraction is performed on the first text information in multiple feature dimensions through the feature extraction layer of the convolutional neural network model to obtain multiple text features, where one text feature is extracted in each feature dimension; feature recognition is then performed on the first text feature among the multiple text features through the classification layer of the convolutional neural network model to obtain the first recognition result, where the text features include the first text feature and second text features, and the feature value of the first text feature is greater than the feature value of any second text feature.
The second recognition result determined by the deep neural network model from the speech features recognized from the audio segment is obtained in a manner similar to that of obtaining the first recognition result above, and is not described again here.
In this scheme, the method based on fusing text with voice can make up for the shortcomings of recognition using a single feature. The fusion of the two means that text and audio training are blended; the fusion method can be to sum the text output result and the audio output result to obtain the final result. The summation is not one weighted sum over the whole passage but a segmented summation, because a speaker's emotion cannot remain unchanged over a whole passage: it rises and falls, and within a passage the emotion may be stronger around a few keywords. In this way the emotional characteristics of the speaker at different stages of the whole passage can be recognized.
On the basis of the above understanding, as long as the voice or the text carries an obvious emotional color (i.e. emotion information of the first emotion grade), the target voice can be determined to be voice with emotional color. After the target emotion information of the multiple audio segments has been determined based on the speech features of the multiple audio segments and the text features of the multiple pieces of first text information, the emotion grade of each of the multiple pieces of target emotion information can be obtained; when the multiple pieces of target emotion information include emotion information of the first emotion grade, the emotion information of the target audio is determined to be emotion information of the first emotion grade.
The first emotion grade above is a grade with obvious emotion information, rather than information tending towards the intermediate flat (no obvious emotion) grade. For example, in the emotion group happy, flat, sad, emotion information of the first emotion grade refers to happy or sad rather than flat; other kinds of emotion information are similar and are not described again.
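The segment-level scheme above might be sketched as follows; the per-segment labels are invented, and taking the first first-grade segment label as the whole-audio label is an illustrative simplification:

```python
# Each audio segment gets its own text/audio decision, and the whole
# audio is labelled with a first-grade emotion as soon as any segment
# carries one.
FIRST_GRADE = {"happy", "sad"}

def segment_emotion(text_label, audio_label):
    """Per-segment mode-one rule: an obvious emotion wins over flat."""
    for label in (text_label, audio_label):
        if label in FIRST_GRADE:
            return label
    return "flat"

def whole_audio_emotion(segment_results):
    """segment_results: list of (text_label, audio_label) per segment."""
    labels = [segment_emotion(t, a) for t, a in segment_results]
    for label in labels:
        if label in FIRST_GRADE:
            return label      # one emotional segment colours the whole
    return "flat"

segments = [("flat", "flat"), ("happy", "flat"), ("flat", "flat")]
print(whole_audio_emotion(segments))  # happy
```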
As an optional embodiment, an embodiment of this application is described in detail below with reference to Fig. 5:
Step S501: the speech features (i.e. acoustic features) in the target audio are extracted.
Step S502: speech recognition is performed by the speech recognition engine.
In the training stage of the speech recognition engine, each word in the vocabulary can be spoken in turn, and its feature vector is stored in a template library as a template.
In the stage of performing speech recognition with the speech recognition engine, the acoustic feature vectors of the input voice are compared for similarity with each template in the template library in turn, and the template with the highest similarity is output as the recognition result.
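The template matching described above can be illustrated with a toy sketch; the feature vectors and the similarity measure (negative Euclidean distance) are assumptions for illustration:

```python
# Nearest-template recognition: the input feature vector is compared
# with every stored template and the most similar template's word wins.
import math

def similarity(a, b):
    return -math.dist(a, b)   # closer vectors = higher similarity

def recognize(feature, template_library):
    """template_library: dict mapping word -> stored feature vector."""
    return max(template_library,
               key=lambda w: similarity(feature, template_library[w]))

templates = {"happy": [0.9, 0.1], "sad": [0.1, 0.9]}
print(recognize([0.8, 0.2], templates))  # happy
```

Real engines compare sequences of frame vectors (e.g. with dynamic time warping or statistical models) rather than single vectors, but the highest-similarity principle is the same.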
Step S503: the text recognition result (i.e. the first text information) is obtained.
Step S504: the first text information is segmented into words; for example, segmenting "tomorrow we have a holiday, I am so happy" gives: tomorrow / we / have a holiday / I / so / happy.
Step S505: the multiple words obtained by the above segmentation are taken as input to the CNN model, and the CNN model performs convolution, classification, and recognition processing on them.
Step S506: the first recognition result score1 output by the CNN model is obtained.
Step S507: the speech features of the target audio are processed by the DNN model.
The DNN model performs recognition processing on several of the speech features identified above (perceptual linear prediction PLP, Mel-frequency cepstral coefficients MFCC, FBANK, pitch PITCH, speech energy ENERGY, and I-VECTOR).
Step S508: the second recognition result score2 is obtained.
The convolutional layer of the DNN model performs convolution and classification processing on these fusion features (multiple features) to obtain the final recognition result score2.
Step S509: fusion processing is performed on the recognition results to obtain the final result.
The input target audio undergoes feature extraction, which serves two purposes. One is speech recognition: the audio passes through the speech recognition engine to obtain the speech recognition result, which after word segmentation is sent to the text emotion detection engine, yielding the text emotion score score1. The other is audio-based emotion detection: the features are sent to the audio emotion detection, yielding the audio score score2. The final score final_score is then obtained through a weight factor:
final_score = a*score1 + (1-a)*score2
where a is a weight value obtained by training on a development set, and the final score is a score between 0 and 1.
For example, if the score interval corresponding to sad is [0, 0.3), the interval corresponding to flat is [0.3, 0.7), and the interval corresponding to happy is [0.7, 1], then whether the actual emotion is happy, sad, or flat can be determined from the finally obtained score value.
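The weighted fusion and the score-interval lookup can be put together in a short sketch; the weight a=0.5 and the interval boundaries simply reuse the example values above:

```python
# Mode-two fusion: final_score = a*score1 + (1-a)*score2, then the
# emotion grade is the one whose parameter interval contains the score.
def final_score(score1, score2, a):
    return a * score1 + (1 - a) * score2

def grade_of(score, intervals):
    """intervals: list of (low, high, grade), low inclusive."""
    for low, high, grade in intervals:
        if low <= score < high:
            return grade
    return intervals[-1][2]   # the top boundary belongs to the last grade

grades = [(0.0, 0.3, "sad"), (0.3, 0.7, "flat"), (0.7, 1.0, "happy")]
fs = final_score(0.9, 0.6, a=0.5)   # 0.75
print(grade_of(fs, grades))         # happy
```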
In the embodiments of this application, the method based on fusing text with voice can make up for the shortcomings of either individual method; during the fusion, a weight factor can be added to adjust the weights of the two methods so as to suit different occasions. This application can be divided into two modules, a training module and a recognition module; the training module can be trained separately, choosing different text and audio according to different situations. The three emotional characteristics in this application are happy, normal, and unhappy, and the degree of happiness or unhappiness can be indicated by a score between 0 and 1: the closer the emotion score is to 0, the more negative the mood, and the closer to 1, the more positive. For application, whole sentences can be discriminated.
It should be noted that, for simplicity of description, the various method embodiments described above are stated as a series of action combinations; but those skilled in the art should understand that the present invention is not limited by the sequence of actions described, because according to the present invention some steps may be performed in other sequences or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be realized by means of software plus the necessary general hardware platform, and of course also by hardware, although in many cases the former is the better embodiment. Based on this understanding, the technical solution of the present invention, in essence the part contributing to the existing technology, can be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc), including instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, etc.) to execute the method described in each embodiment of the present invention.
Embodiment 2
According to an embodiment of the present invention, a device for determining emotion information, for implementing the above method of determining emotion information, is additionally provided. Fig. 6 is a schematic diagram of an optional device for determining emotion information according to an embodiment of the present invention. As shown in Fig. 6, the device may include: an acquiring unit 61, a recognition unit 62, and a determination unit 63.
The acquiring unit 61 is used to obtain the target audio.
The terminal may actively obtain the target audio, or receive target audio sent by other equipment, or obtain the target audio under the trigger of a target instruction. The target instruction corresponds to an instruction, triggered by the user or the terminal, for recognizing the target audio. The target audio is obtained in order to recognize its emotion information, which is the emotion information shown when the text information is stated through the target audio (shown, without limitation, through the wording in the text, or through the tone, timbre, etc. in the audio).
The above text information refers to one sentence or a combination of multiple sentences; a text includes but is not limited to one sentence (Sentence), one paragraph (Paragraph), or one discourse (Discourse).
Emotion information is information describing the speaker's emotion. For example, when talking about something, emotions related to happiness are expressed (happy, flat, sad); when receiving someone's apology, emotions related to forgiveness are expressed (forgive, noncommittal, do not forgive); and so on.
The recognition unit 62 is used to recognize the first text information from the target audio; the target audio has speech features and the first text information has text features.
Recognizing the first text information from the target audio refers to recognizing, by way of speech recognition, the first text information expressed by the target audio (the first text information recognized here may differ subtly from the text information actually stated).
For speech recognition, the speech features include the following: perceptual linear prediction PLP (Perceptual Linear Predictive), Mel-frequency cepstral coefficients MFCC (Mel-Frequency Cepstral Coefficients), FBANK (filter-bank features), pitch PITCH (e.g. treble and bass), speech energy ENERGY, I-VECTOR (an important feature reflecting acoustic differences between speakers), etc. The features used in this application can be one or more of the above, preferably multiple.
For text recognition, the above first text information can be recognized from the target audio by the speech recognition engine; the text features of the text information include features such as the emotion type, emotional tendency, and emotional intensity of each phrase or word in the text, and can also be association features between phrases, etc.
The determination unit 63 is used to determine the target emotion information of the target audio based on the text features of the first text information and the speech features of the target audio.
In determining the target emotion information of the target audio, the text features of the first text information and the speech features of the target audio are considered together. Compare this with the related technology, in which only the audio-based emotion detection method is used to detect the speaker's audio: emotion detection using audio works relatively well when the speaker has obvious emotional expression, but when the speaker's emotional expression is not strong, for example when a very happy thing is expressed in a very flat tone, there are hardly any features in the audio that express happiness. In this case the text-based emotion detection method can also be used to detect the text information in the speaker's audio, so that an accurate judgment can be made according to the text features. This makes up for the deficiency of performing emotion detection by audio alone and achieves the effect of improving the accuracy of the judgment result.
It should be noted that the acquiring unit 61 in this embodiment can be used to execute step S202 in Embodiment 1 of this application, the recognition unit 62 in this embodiment can be used to execute step S204 in Embodiment 1, and the determination unit 63 in this embodiment can be used to execute step S206 in Embodiment 1.
It should be noted here that the above modules are the same as the examples and application scenarios realized by the corresponding steps, but are not limited to the disclosure of Embodiment 1 above. It should be noted that the above modules, as part of the device, may operate in the hardware environment shown in Fig. 1, and may be realized by software or by hardware.
Through the above modules, when the target audio is obtained, the first text information is recognized from the target audio, and then the target emotion information of the target audio is determined based on the text features of the first text information and the speech features of the target audio. That is, when the text information shows obvious emotion, the emotion information can be determined through the text features of the text information; when the target audio shows obvious emotion, the emotion information can be determined through the speech features of the target audio. This can solve the technical problem in the related technology that the speaker's emotion information cannot be accurately recognized, and thus achieves the technical effect of improving the accuracy of recognizing the speaker's emotion information.
Using only the audio-based emotion detection method to detect the speaker's audio works relatively well when the speaker has fairly obvious emotional expression, and using the text-based emotion detection method works relatively well when the text information in the speaker's audio has obvious emotional expression. However, when (i.e. in what scene, or for what kind of voice) to detect with the audio-based emotion detection method and when with the text-based emotion detection method is unknown; it is impossible to predict in advance which method gives the better detection effect for the current audio to be detected.
The applicant considers that if a text with obvious emotion is stated in a flat tone (for example, a text whose emotion is happy is stated in a flat tone), the recognition effect of the text-based emotion detection method is clearly better; and if a text with fairly flat emotion is stated in a tone with obvious emotion (for example, a fairly flat text is stated in a happy tone), the recognition effect of the audio-based emotion detection method is clearly better. A text with obvious emotion may be stated in a flat tone or in a tone with obvious emotion, and a text with fairly flat emotion may likewise be stated in a tone with significant emotion or in a flat tone; however, a text with an obviously positive emotion will not be stated in a tone of the opposite emotion, for example a text with a happy emotional color will not be stated in a sad tone.
Therefore, on the basis of the above understanding, as long as the voice or the text carries an obvious emotional color (i.e. emotion information of the first emotion grade), the target voice can be determined to be voice with emotional color. As shown in Fig. 7, the determination unit can realize the above technical solution through the following modules: a first obtaining module 631, used to obtain the first recognition result determined from the text features, where the first recognition result indicates the emotion information recognized from the text features; a second obtaining module 632, used to obtain the second recognition result determined from the speech features, where the second recognition result indicates the emotion information recognized from the speech features; and a first determining module 633, used to determine the target emotion information of the target audio to be emotion information of the first emotion grade when the emotion information indicated by at least one of the first and second recognition results is emotion information of the first emotion grade.
The first emotion grade above is a grade with obvious emotion information, rather than information tending towards the intermediate flat (no obvious emotion) grade. For example, in the emotion group happy, flat, sad, emotion information of the first emotion grade refers to happy or sad rather than flat; other kinds of emotion information are similar and are not described again.
In the above recognition technical solutions of this application, feature recognition and emotion information recognition are performed using, without limitation, common algorithms or machine-learning-related algorithms; to improve accuracy, machine-learning-related algorithms can be used for feature recognition and emotion information recognition.
Optionally, before the acquiring unit obtains the target audio, a first training unit trains the second convolutional neural network model with the second text information and the first emotion information, so as to determine the values of the parameters in the second convolutional neural network model, and sets the second convolutional neural network model with the determined parameter values as the first convolutional neural network model, where the first emotion information is the emotion information of the second text information.
Optionally, before the acquiring unit obtains the target audio, a second training unit trains a second deep neural network model using training audio and second emotion information to determine the values of the parameters in the second deep neural network model, and sets the second deep neural network model whose parameter values have been determined as the first deep neural network model, where the second emotion information is the emotion information of the training audio.
After the recognition models have been trained, the above first obtaining module obtains the first recognition result determined by the first convolutional neural network model according to the text feature recognized from the first text information. The feature extraction layer of the first convolutional neural network model performs feature extraction on the first text information in multiple feature dimensions to obtain multiple text features, where one text feature is extracted in each feature dimension. The classification layer of the first convolutional neural network model then performs feature recognition on a first text feature among the multiple text features to obtain the first recognition result, where the text features include the first text feature and second text features, and the feature value of the first text feature is greater than the feature value of any one of the second text features.
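The selection of the "first text feature" described above (the feature with the largest feature value among those extracted in the different dimensions) can be illustrated with a toy sketch; the dimension names and values below are invented for illustration only.

```python
def select_first_feature(features):
    """Pick the text feature with the largest feature value.

    `features` maps each feature dimension to the single (value, label)
    feature extracted in that dimension; the feature whose value exceeds
    all others is the "first text feature" passed to the classification layer.
    """
    dim, (value, label) = max(features.items(), key=lambda kv: kv[1][0])
    return dim, value, label

# Hypothetical per-dimension features for one sentence:
features = {
    "word":   (0.41, "flat"),
    "phrase": (0.87, "happy"),   # largest feature value
    "clause": (0.12, "flat"),
}
```

Here the phrase-level feature has the largest value, so it alone drives the classification result.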
The second recognition result obtained by the above second obtaining module is the second recognition result determined by the first deep neural network model according to the speech feature recognized from the target audio.
Optionally, the determining unit of this application may further include: a third obtaining module, configured to obtain a first recognition result determined according to the text feature, where the first recognition result includes a first emotion parameter used to indicate the emotion information recognized according to the text feature; a fourth obtaining module, configured to obtain a second recognition result determined according to the speech feature, where the second recognition result includes a second emotion parameter used to indicate the emotion information recognized according to the speech feature; a setting module, configured to set a third emotion parameter used to indicate the target emotion information as: the first emotion parameter multiplied by a weight set for the first emotion parameter, plus the second emotion parameter multiplied by a weight set for the second emotion parameter; and a second determining module, configured to determine the emotion information at a second emotion grade as the target emotion information, where the second emotion grade is the emotion grade corresponding to the emotion parameter interval in which the third emotion parameter falls, and each emotion grade corresponds to one emotion parameter interval.
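The second determining module's interval lookup can be sketched as below. The interval boundaries are assumed values chosen for illustration; the passage only states that each emotion grade corresponds to one emotion parameter interval.

```python
# Hypothetical emotion-parameter intervals, one per emotion grade.
GRADE_INTERVALS = [          # (low, high, grade) — assumed boundaries
    (0.0, 0.35, "unhappy"),
    (0.35, 0.65, "normal"),
    (0.65, 1.0, "happy"),
]

def grade_of(third_emotion_param):
    """Return the emotion grade whose interval contains the fused parameter."""
    for low, high, grade in GRADE_INTERVALS:
        if low <= third_emotion_param <= high:
            return grade
    raise ValueError("parameter outside all intervals")
```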
The input target audio undergoes feature extraction, which is split into two branches. One branch is for speech recognition: the audio passes through a speech recognition engine to obtain a speech recognition result, which is segmented into words and sent to a text emotion detection engine to obtain a text emotion score, score1. The other branch is for audio-based emotion detection: the extracted features are sent to an audio emotion detector to obtain an audio score, score2. A final score, final_score, is then obtained through a weight factor:

final_score = a * score1 + (1 - a) * score2

where a is a weight value obtained by training on a development set, and the final score lies between 0 and 1.
In the embodiments of this application, a method that fuses text and speech can make up for the shortcomings of either method alone, and a weight factor can be added during fusion to adjust the weights of the two methods so as to suit different scenarios. The application can be divided into two modules, a training module and a recognition module; the training module can be trained separately, choosing different text and audio according to different situations. The three emotion categories in this application are happy, normal, and unhappy. The degree of happiness or unhappiness can be indicated by a score between 0 and 1: the closer the emotion score is to 0, the more negative the mood; the closer to 1, the more positive. In application, a whole sentence can be judged at a time.
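The fusion formula above can be written directly in code. The weight value used here is an assumed constant; in the passage it is learned on a development set.

```python
def final_score(score1, score2, a):
    """Fuse the text emotion score and the audio emotion score.

    score1: text emotion score in [0, 1] (closer to 1 = more positive)
    score2: audio emotion score in [0, 1]
    a:      weight factor in [0, 1]; assumed here, trained on a dev set
            in the described system.
    """
    return a * score1 + (1 - a) * score2

# e.g. with an assumed weight a = 0.6, text score 0.9, audio score 0.5,
# the fused score is about 0.74 — a positive mood overall.
```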
It should be noted here that the above modules implement the same examples and application scenarios as the corresponding steps, but are not limited to the disclosure of Embodiment 1 above. It should also be noted that, as part of the apparatus, the above modules may run in the hardware environment shown in Fig. 1 and may be implemented in software or in hardware, where the hardware environment includes a network environment.
Embodiment 3
According to an embodiment of the present invention, a server or terminal (namely, an electronic device) for implementing the above method for determining emotion information is further provided.
Fig. 8 is a structural block diagram of a terminal according to an embodiment of the present invention. As shown in Fig. 8, the terminal may include one or more processors 801 (only one is shown in Fig. 8), a memory 803, and a transmission device 805 (such as the sending device in the above embodiments); as shown in Fig. 8, the terminal may further include an input/output device 807.
The memory 803 may be used to store software programs and modules, such as the program instructions/modules corresponding to the method and apparatus for determining emotion information in the embodiments of the present invention. The processor 801 runs the software programs and modules stored in the memory 803, thereby executing various functional applications and data processing, that is, implementing the above method for determining emotion information. The memory 803 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 803 may further include memory located remotely from the processor 801, and these remote memories may be connected to the terminal through a network. Examples of the above network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The above transmission device 805 is used to receive or send data via a network, and may also be used for data transmission between the processor and the memory. Specific examples of the above network may include wired and wireless networks. In one example, the transmission device 805 includes a network interface controller (NIC), which can be connected to other network devices and routers through a cable so as to communicate with the Internet or a local area network. In another example, the transmission device 805 is a radio frequency (RF) module used to communicate with the Internet wirelessly.
Specifically, the memory 803 is used to store an application program. The processor 801 may call, through the transmission device 805, the application program stored in the memory 803 to execute the following steps: obtaining target audio; recognizing first text information from the target audio, where the target audio has a speech feature and the first text information has a text feature; and determining the target emotion information of the target audio based on the text feature of the first text information and the speech feature of the target audio.
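The three steps just listed (obtain audio, recognize text, determine emotion) can be sketched end to end as below. Every function name is assumed, and the two classifiers are toy stand-ins for the speech recognition engine and the trained models described elsewhere in this document.

```python
def recognize_text(audio):
    """Stand-in for the speech recognition engine."""
    return audio["transcript"]

def text_emotion(text):
    """Toy stand-in for the text emotion model."""
    return "happy" if "great" in text else "neutral"

def audio_emotion(audio):
    """Toy stand-in for the audio emotion model."""
    return audio.get("tone", "neutral")

def determine_emotion(audio):
    """Obtain text from the audio, then fuse the two channels,
    preferring any non-neutral (first-grade) result."""
    text = recognize_text(audio)
    t, s = text_emotion(text), audio_emotion(audio)
    return t if t != "neutral" else s

sample = {"transcript": "this is great", "tone": "neutral"}
```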
The processor 801 is further configured to execute the following steps: obtaining a first recognition result determined according to the text feature, where the first recognition result is used to indicate the emotion information recognized according to the text feature; obtaining a second recognition result determined according to the speech feature, where the second recognition result is used to indicate the emotion information recognized according to the speech feature; and, when the emotion information indicated by at least one of the first recognition result and the second recognition result is the emotion information of the first emotion grade, determining the target emotion information of the target audio as the emotion information of the first emotion grade.
With the embodiments of the present invention, when the target audio is obtained, the first text information is recognized from the target audio, and the target emotion information of the target audio is then determined based on the text feature of the first text information and the speech feature of the target audio. That is, when the text information shows an obvious emotion, the emotion information can be determined by the text feature of the text information; when the target audio shows an obvious emotion, the emotion information can be determined by the speech feature of the target audio. This solves the technical problem in the related art that a speaker's emotion information cannot be accurately recognized, and thereby achieves the technical effect of improving the accuracy of recognizing the speaker's emotion information.
Optionally, for specific examples in this embodiment, reference may be made to the examples described in Embodiment 1 and Embodiment 2 above; details are not repeated here.
Those skilled in the art can understand that the structure shown in Fig. 8 is only illustrative. The terminal can be a smartphone (such as an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile Internet device (MID), a PAD, or another terminal device. Fig. 8 does not limit the structure of the above electronic device. For example, the terminal may include more or fewer components than shown in Fig. 8 (such as a network interface or a display device), or have a configuration different from that shown in Fig. 8.
Those of ordinary skill in the art can understand that all or part of the steps in the various methods of the above embodiments can be completed by a program instructing hardware related to a terminal device. The program can be stored in a computer-readable storage medium, and the storage medium may include a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and the like.
Embodiment 4
An embodiment of the present invention further provides a storage medium. Optionally, in this embodiment, the above storage medium can be used to execute the program code of the method for determining emotion information. Optionally, in this embodiment, the above storage medium may be located on at least one of multiple network devices in the network shown in the above embodiment.
Optionally, in this embodiment, the storage medium is arranged to store program code for executing the following steps:

S11: obtain target audio;

S12: recognize first text information from the target audio, where the target audio has a speech feature and the first text information has a text feature;

S13: determine the target emotion information of the target audio based on the text feature of the first text information and the speech feature of the target audio.
Optionally, the storage medium is also configured to store program code for executing the following steps:

S21: obtain a first recognition result determined according to the text feature, where the first recognition result is used to indicate the emotion information recognized according to the text feature;

S22: obtain a second recognition result determined according to the speech feature, where the second recognition result is used to indicate the emotion information recognized according to the speech feature;

S23: when the emotion information indicated by at least one of the first recognition result and the second recognition result is the emotion information of the first emotion grade, determine the target emotion information of the target audio as the emotion information of the first emotion grade.
Optionally, for specific examples in this embodiment, reference may be made to the examples described in Embodiment 1 and Embodiment 2 above; details are not repeated here.
Optionally, in this embodiment, the above storage medium may include, but is not limited to: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, or any other medium that can store program code.
The serial numbers of the above embodiments of the present invention are for description only and do not represent the advantages or disadvantages of the embodiments.
If the integrated unit in the above embodiments is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in the above computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing one or more computer devices (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present invention.
In the above embodiments of the present invention, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed client can be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division into units is only a division by logical function, and there may be other division manners in actual implementation. For example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units, or modules, and may be electrical or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
In addition, the functional units in the various embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
The above are only preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art can make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.
Claims (10)
1. A method for determining emotion information, comprising:
obtaining target audio;
recognizing first text information from the target audio, wherein the target audio has a speech feature and the first text information has a text feature; and
determining target emotion information of the target audio based on the text feature of the first text information and the speech feature of the target audio, wherein determining the target emotion information of the target audio based on the text feature of the first text information and the speech feature of the target audio comprises: obtaining a first recognition result determined according to the text feature, wherein the first recognition result is used to indicate emotion information recognized according to the text feature; obtaining a second recognition result determined according to the speech feature, wherein the second recognition result is used to indicate emotion information recognized according to the speech feature; and when the emotion information indicated by at least one of the first recognition result and the second recognition result is emotion information of a first emotion grade, determining the target emotion information of the target audio as the emotion information of the first emotion grade, the first emotion grade being a grade with obvious emotion rather than a grade without obvious emotion.
2. The method according to claim 1, wherein:
obtaining the first recognition result determined according to the text feature comprises: obtaining the first recognition result determined by a first convolutional neural network model according to the text feature recognized from the first text information; and
obtaining the second recognition result determined according to the speech feature comprises: obtaining the second recognition result determined by a first deep neural network model according to the speech feature recognized from the target audio.
3. The method according to claim 2, wherein obtaining the first recognition result determined by the first convolutional neural network model according to the text feature recognized from the first text information comprises:
performing, by a feature extraction layer of the first convolutional neural network model, feature extraction on the first text information in multiple feature dimensions to obtain multiple text features, wherein one text feature is extracted in each feature dimension; and
performing, by a classification layer of the first convolutional neural network model, feature recognition on a first text feature among the multiple text features to obtain the first recognition result, wherein the text features comprise the first text feature and second text features, and a feature value of the first text feature is greater than a feature value of any one of the second text features.
4. The method according to claim 2, wherein, before obtaining the target audio, the method further comprises:
training a second convolutional neural network model using second text information and first emotion information to determine values of parameters in the second convolutional neural network model, and setting the second convolutional neural network model whose parameter values have been determined as the first convolutional neural network model, wherein the first emotion information is emotion information of the second text information.
5. The method according to claim 2, wherein, before obtaining the target audio, the method further comprises:
training a second deep neural network model using training audio and second emotion information to determine values of parameters in the second deep neural network model, and setting the second deep neural network model whose parameter values have been determined as the first deep neural network model, wherein the second emotion information is emotion information of the training audio.
6. The method according to claim 1, wherein determining the target emotion information of the target audio based on the text feature of the first text information and the speech feature of the target audio comprises:
obtaining a first recognition result determined according to the text feature, wherein the first recognition result comprises a first emotion parameter used to indicate the emotion information recognized according to the text feature;
obtaining a second recognition result determined according to the speech feature, wherein the second recognition result comprises a second emotion parameter used to indicate the emotion information recognized according to the speech feature;
setting a third emotion parameter used to indicate the target emotion information as: the first emotion parameter multiplied by a weight set for the first emotion parameter, plus the second emotion parameter multiplied by a weight set for the second emotion parameter; and
determining the emotion information at a second emotion grade as the target emotion information, wherein the second emotion grade is the emotion grade corresponding to the emotion parameter interval in which the third emotion parameter falls, and each emotion grade corresponds to one emotion parameter interval.
7. An apparatus for determining emotion information, comprising:
an acquiring unit, configured to obtain target audio;
a recognition unit, configured to recognize first text information from the target audio, wherein the target audio has a speech feature and the first text information has a text feature; and
a determining unit, configured to determine target emotion information of the target audio based on the text feature of the first text information and the speech feature of the target audio, wherein the determining unit comprises: a first obtaining module, configured to obtain a first recognition result determined according to the text feature, wherein the first recognition result is used to indicate emotion information recognized according to the text feature; a second obtaining module, configured to obtain a second recognition result determined according to the speech feature, wherein the second recognition result is used to indicate emotion information recognized according to the speech feature; and a first determining module, configured to, when the emotion information indicated by at least one of the first recognition result and the second recognition result is emotion information of a first emotion grade, determine the target emotion information of the target audio as the emotion information of the first emotion grade, the first emotion grade being a grade with obvious emotion rather than a grade without obvious emotion.
8. The apparatus according to claim 7, wherein the determining unit comprises:
a third obtaining module, configured to obtain a first recognition result determined according to the text feature, wherein the first recognition result comprises a first emotion parameter used to indicate the emotion information recognized according to the text feature;
a fourth obtaining module, configured to obtain a second recognition result determined according to the speech feature, wherein the second recognition result comprises a second emotion parameter used to indicate the emotion information recognized according to the speech feature;
a setting module, configured to set a third emotion parameter used to indicate the target emotion information as: the first emotion parameter multiplied by a weight set for the first emotion parameter, plus the second emotion parameter multiplied by a weight set for the second emotion parameter; and
a second determining module, configured to determine the emotion information at a second emotion grade as the target emotion information, wherein the second emotion grade is the emotion grade corresponding to the emotion parameter interval in which the third emotion parameter falls, and each emotion grade corresponds to one emotion parameter interval.
9. A storage medium, comprising a stored program, wherein the program, when run, executes the method according to any one of claims 1 to 6.
10. An electronic device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the method according to any one of claims 1 to 6 by means of the computer program.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710527116.4A CN108305642B (en) | 2017-06-30 | 2017-06-30 | The determination method and apparatus of emotion information |
PCT/CN2018/093085 WO2019001458A1 (en) | 2017-06-30 | 2018-06-27 | Method and device for determining emotion information |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710527116.4A CN108305642B (en) | 2017-06-30 | 2017-06-30 | The determination method and apparatus of emotion information |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108305642A CN108305642A (en) | 2018-07-20 |
CN108305642B true CN108305642B (en) | 2019-07-19 |
Family
ID=62872598
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710527116.4A Active CN108305642B (en) | 2017-06-30 | 2017-06-30 | The determination method and apparatus of emotion information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108305642B (en) |
Families Citing this family (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109192192A (en) * | 2018-08-10 | 2019-01-11 | 北京猎户星空科技有限公司 | A kind of Language Identification, device, translator, medium and equipment |
CN109741732B (en) | 2018-08-30 | 2022-06-21 | 京东方科技集团股份有限公司 | Named entity recognition method, named entity recognition device, equipment and medium |
CN109243490A (en) * | 2018-10-11 | 2019-01-18 | 平安科技(深圳)有限公司 | Driver's Emotion identification method and terminal device |
CN109740156B (en) * | 2018-12-28 | 2023-08-04 | 北京金山安全软件有限公司 | Feedback information processing method and device, electronic equipment and storage medium |
CN109767765A (en) * | 2019-01-17 | 2019-05-17 | 平安科技(深圳)有限公司 | Talk about art matching process and device, storage medium, computer equipment |
CN111681645B (en) * | 2019-02-25 | 2023-03-31 | 北京嘀嘀无限科技发展有限公司 | Emotion recognition model training method, emotion recognition device and electronic equipment |
CN110070893A (en) * | 2019-03-25 | 2019-07-30 | 成都品果科技有限公司 | A kind of system, method and apparatus carrying out sentiment analysis using vagitus |
CN110097894B (en) * | 2019-05-21 | 2021-06-11 | 焦点科技股份有限公司 | End-to-end speech emotion recognition method and system |
CN110188361A (en) * | 2019-06-10 | 2019-08-30 | 北京智合大方科技有限公司 | Speech intention recognition methods and device in conjunction with text, voice and emotional characteristics |
CN110473571A (en) * | 2019-07-26 | 2019-11-19 | 北京影谱科技股份有限公司 | Emotion identification method and device based on short video speech |
CN110688499A (en) * | 2019-08-13 | 2020-01-14 | 深圳壹账通智能科技有限公司 | Data processing method, data processing device, computer equipment and storage medium |
CN110675859B (en) * | 2019-09-05 | 2021-11-23 | 华南理工大学 | Multi-emotion recognition method, system, medium, and apparatus combining speech and text |
CN110599999A (en) * | 2019-09-17 | 2019-12-20 | 寇晓宇 | Data interaction method and device and robot |
CN110910901B (en) * | 2019-10-08 | 2023-03-28 | 平安科技(深圳)有限公司 | Emotion recognition method and device, electronic equipment and readable storage medium |
CN110825503B (en) * | 2019-10-12 | 2024-03-19 | 平安科技(深圳)有限公司 | Theme switching method and device, storage medium and server |
CN110827799B (en) * | 2019-11-21 | 2022-06-10 | 百度在线网络技术(北京)有限公司 | Method, apparatus, device and medium for processing voice signal |
CN111145786A (en) * | 2019-12-17 | 2020-05-12 | 深圳追一科技有限公司 | Speech emotion recognition method and device, server and computer readable storage medium |
CN111091810A (en) * | 2019-12-19 | 2020-05-01 | 佛山科学技术学院 | VR game character expression control method based on voice information and storage medium |
CN111081279A (en) * | 2019-12-24 | 2020-04-28 | 深圳壹账通智能科技有限公司 | Voice emotion fluctuation analysis method and device |
CN111324207A (en) * | 2020-02-28 | 2020-06-23 | 京东方科技集团股份有限公司 | Drawing display method and device and electronic equipment |
CN111510563A (en) * | 2020-04-16 | 2020-08-07 | 中国银行股份有限公司 | Intelligent outbound method and device, storage medium and electronic equipment |
CN112002348B (en) * | 2020-09-07 | 2021-12-28 | 复旦大学 | Method and system for recognizing speech anger emotion of patient |
CN113192498A (en) * | 2021-05-26 | 2021-07-30 | 北京捷通华声科技股份有限公司 | Audio data processing method and device, processor and nonvolatile storage medium |
CN113645364B (en) * | 2021-06-21 | 2023-08-22 | 国网浙江省电力有限公司金华供电公司 | Intelligent voice outbound method for power dispatching |
CN113241060B (en) * | 2021-07-09 | 2021-12-17 | 明品云(北京)数据科技有限公司 | Security early warning method and system |
CN117409818A (en) * | 2022-07-08 | 2024-01-16 | 顺丰科技有限公司 | Speech emotion recognition method and device |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103456314A (en) * | 2013-09-03 | 2013-12-18 | 广州创维平面显示科技有限公司 | Emotion recognition method and device |
CN104102627A (en) * | 2014-07-11 | 2014-10-15 | 合肥工业大学 | Multi-mode non-contact emotion analyzing and recording system |
CN104200804A (en) * | 2014-09-19 | 2014-12-10 | 合肥工业大学 | Various-information coupling emotion recognition method for human-computer interaction |
CN105427869A (en) * | 2015-11-02 | 2016-03-23 | 北京大学 | Session emotion autoanalysis method based on depth learning |
CN105760852A (en) * | 2016-03-14 | 2016-07-13 | 江苏大学 | Driver emotion real time identification method fusing facial expressions and voices |
CN106297826A (en) * | 2016-08-18 | 2017-01-04 | 竹间智能科技(上海)有限公司 | Speech emotional identification system and method |
CN106503805A (en) * | 2016-11-14 | 2017-03-15 | 合肥工业大学 | A kind of bimodal based on machine learning everybody talk with sentiment analysis system and method |
Also Published As
Publication number | Publication date |
---|---|
CN108305642A (en) | 2018-07-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108305642B (en) | Method and apparatus for determining emotion information | |
CN108305643A (en) | Method and apparatus for determining emotion information | |
CN108305641B (en) | Method and device for determining emotion information | |
US10706873B2 (en) | Real-time speaker state analytics platform | |
US20060080098A1 (en) | Apparatus and method for speech processing using paralinguistic information in vector form | |
Yeh et al. | Segment-based emotion recognition from continuous Mandarin Chinese speech | |
CN112466326B (en) | Speech emotion feature extraction method based on a Transformer model encoder | |
Aloufi et al. | Emotionless: Privacy-preserving speech analysis for voice assistants | |
CN109155132A (en) | Speaker verification method and system | |
Triantafyllopoulos et al. | Deep speaker conditioning for speech emotion recognition | |
WO2020237769A1 (en) | Accompaniment purity evaluation method and related device | |
CN112530408A (en) | Method, apparatus, electronic device, and medium for recognizing speech | |
CN108986798B (en) | Voice data processing method, apparatus and device | |
CN113239147A (en) | Intelligent conversation method, system and medium based on graph neural network | |
CN110111778A (en) | Speech processing method, device, storage medium and electronic equipment | |
WO2019001458A1 (en) | Method and device for determining emotion information | |
CN108829739A (en) | Information pushing method and device | |
Pao et al. | A study on the search of the most discriminative speech features in the speaker dependent speech emotion recognition | |
Lanjewar et al. | Speech emotion recognition: a review | |
CN105895079A (en) | Voice data processing method and device | |
Johar | Paralinguistic profiling using speech recognition | |
CN113327631B (en) | Emotion recognition model training method, emotion recognition method and emotion recognition device | |
WO2020162239A1 (en) | Paralinguistic information estimation model learning device, paralinguistic information estimation device, and program | |
Afshan et al. | Attention-based conditioning methods using variable frame rate for style-robust speaker verification | |
CN110232911B (en) | Singing following recognition method and device, storage medium and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||