CN111354377A - Method and device for recognizing emotion through voice and electronic equipment - Google Patents
- Publication number
- CN111354377A (application number CN201910569691.XA)
- Authority
- CN
- China
- Prior art keywords
- voice
- emotion
- recognition result
- emotion recognition
- features
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/63—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
Abstract
The invention discloses a method and a device for recognizing emotion through voice, and an electronic device. The method comprises the following steps: acquiring a voice signal of a recognition object; processing the voice signal to obtain a voice feature vector; inputting the voice feature vector into an emotion recognition model to obtain a first emotion recognition result; searching an emotion word database according to the voice feature vector to obtain a second emotion recognition result; and obtaining a final emotion recognition result from the first emotion recognition result and the second emotion recognition result. The invention can recognize emotion through voice.
Description
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a method and a device for recognizing emotion through voice, and an electronic device.
Background
Voice carries many distinguishing characteristics: a speaker can be identified by voice, and human beings can infer a speaker's emotion from different vocal characteristics. In the field of education, recognizing students' emotions through voice can help teachers keep track of students' conditions in time, making it convenient for teachers to adjust their teaching methods and improve teaching effectiveness, or to promptly identify students with abnormal emotions and provide positive guidance.
Disclosure of Invention
In view of the above, the present invention provides a method and an apparatus for recognizing emotion through voice, and an electronic device, which are capable of recognizing emotion through voice.
In view of the above object, the present invention provides a method for recognizing emotion by voice, comprising:
acquiring a voice signal of a recognition object;
processing the voice signal to obtain a voice characteristic vector;
inputting the voice feature vector into an emotion recognition model, and recognizing to obtain a first emotion recognition result;
searching an emotion word database according to the voice feature vector to obtain a second emotion recognition result;
and obtaining a final emotion recognition result according to the first emotion recognition result and the second emotion recognition result.
Optionally, the voice feature vector includes a mood feature, a speech rate feature, an intonation feature, a pronunciation frequency feature, an accent feature, and a word-usage feature.
Optionally, the mood feature, the speech rate feature, the intonation feature, and the pronunciation frequency feature are input into the emotion recognition model, and the first emotion recognition result is obtained through recognition.
Optionally, the emotion word database is searched by word usage according to the accent features to obtain the second emotion recognition result.
Optionally, the method further includes:
and searching an identity information database according to the voice feature vector to obtain identity information matched with the recognition object.
An embodiment of the present invention further provides a device for recognizing emotion through voice, including:
the voice acquisition module is used for acquiring a voice signal of the recognition object;
the voice processing module is used for processing the voice signal to obtain a voice characteristic vector;
the first recognition module is used for inputting the voice feature vector into an emotion recognition model and recognizing to obtain a first emotion recognition result;
the second recognition module is used for searching an emotion word database according to the voice feature vector to obtain a second emotion recognition result;
and the recognition result module is used for obtaining a final emotion recognition result according to the first emotion recognition result and the second emotion recognition result.
Optionally, the voice feature vector includes a mood feature, a speech rate feature, an intonation feature, a pronunciation frequency feature, an accent feature, and a word-usage feature.
Optionally, the first recognition module is configured to input the mood feature, the speech rate feature, the intonation feature, and the pronunciation frequency feature into the emotion recognition model and recognize the first emotion recognition result.
Optionally, the second recognition module is configured to search the emotion word database by word usage according to the accent features to obtain the second emotion recognition result.
Optionally, the apparatus further comprises:
and the identity recognition module is used for searching an identity information database according to the voice feature vector to obtain identity information matched with the recognition object.
An embodiment of the present invention further provides an electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the method for recognizing emotion through voice when executing the program.
As can be seen from the above, the method, device, and electronic equipment for recognizing emotion through voice provided by the present invention obtain the voice signal of the recognition object and process it into a voice feature vector. A first emotion recognition result is obtained by applying the emotion recognition model to the voice feature vector, a second emotion recognition result is obtained by searching the emotion word database according to the voice feature vector, and a final emotion recognition result is derived from the first and second emotion recognition results. The invention can thus recognize emotion through voice.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of a method according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of an apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
It should be noted that the expressions "first" and "second" in the embodiments of the present invention are used to distinguish between two entities or parameters that share the same name. "First" and "second" are merely for convenience of description and should not be construed as limiting the embodiments of the present invention; the following embodiments do not repeat this point.
FIG. 1 is a schematic flow chart of a method according to an embodiment of the present invention. As shown in the figure, the method for recognizing emotion through voice according to the embodiment of the present invention includes:
s10: acquiring a voice signal of a recognition object;
in some embodiments, a voice signal of the recognition object may be collected by a voice collecting apparatus.
In a school application scenario, a voice acquisition device can be placed at each student's desk, and during class the voice signal of the corresponding student is collected through that device. The voice signals collected by the voice acquisition devices are transmitted to a server, which obtains the voice signals and performs subsequent voice recognition and analysis on them.
S11: processing a voice signal to obtain a voice characteristic vector;
The voice signal is processed to obtain a voice feature vector, which includes voice features such as a mood feature, a speech rate feature, an intonation feature, a pronunciation frequency feature, an accent feature, and word usage. Voice signal processing methods include frequency-domain signal processing, time-domain signal processing, denoising, and voice enhancement; these belong to the prior art, and the specific processing flow is therefore not described in detail.
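As one illustration of this step, the sketch below derives a tiny feature vector from a raw mono signal with numpy. The feature set (short-time energy, zero-crossing rate, and an autocorrelation pitch estimate) is an assumption for illustration only; the patent does not fix the exact features or algorithms.

```python
import numpy as np

def extract_voice_features(signal, sample_rate):
    """Derive a simple three-element feature vector from a mono voice signal.

    Hypothetical illustration: energy relates to mood/accent strength,
    zero-crossing rate to speaking rate and noisiness, and the
    autocorrelation peak to the fundamental frequency (intonation).
    """
    signal = np.asarray(signal, dtype=np.float64)
    # Mean energy: a rough loudness measure.
    energy = float(np.mean(signal ** 2))
    # Zero-crossing rate: fraction of adjacent samples that change sign.
    zcr = float(np.mean(np.abs(np.diff(np.sign(signal))) > 0))
    # Fundamental frequency via the autocorrelation peak, searched over
    # plausible human pitch lags (50-400 Hz).
    ac = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    lo, hi = int(sample_rate / 400), int(sample_rate / 50)
    peak_lag = lo + int(np.argmax(ac[lo:hi]))
    f0 = sample_rate / peak_lag
    return np.array([energy, zcr, f0])
```

A real system would also perform the denoising and enhancement steps mentioned above, and would typically compute MFCCs frame by frame rather than one global vector.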
S12: inputting the voice feature vector into an emotion recognition model, and recognizing to obtain a first emotion recognition result;
In some embodiments, the emotion recognition model is pre-established as follows: voice signals of a plurality of recognition objects are acquired and processed into multiple groups of voice feature vectors, and these groups of voice feature vectors are input as training samples into a classifier for classification training, yielding the emotion recognition model. MFCC features, obtained by processing the voice signals with the Mel-frequency cepstral coefficient method, can be used as the model's training samples.
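The patent does not name the classifier, so the sketch below stands in with a nearest-centroid classifier trained on labeled feature vectors; a real implementation would likely use a stronger model (an SVM or neural network) on MFCC features.

```python
import numpy as np

class NearestCentroidEmotionModel:
    """Minimal stand-in for the patent's unspecified classifier.

    Training stores the mean feature vector (centroid) per emotion;
    prediction returns the emotion of the nearest centroid.
    """

    def fit(self, feature_vectors, labels):
        X = np.asarray(feature_vectors, dtype=float)
        y = np.asarray(labels)
        self.classes_ = sorted(set(labels))
        self.centroids_ = np.array(
            [X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, feature_vector):
        q = np.asarray(feature_vector, dtype=float)
        distances = np.linalg.norm(self.centroids_ - q, axis=1)
        return self.classes_[int(np.argmin(distances))]
```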
Optionally, the emotion recognition model can recognize one of several emotions, such as happiness, sadness, anger, fear, surprise, or confusion, from the mood, speech rate, intonation, and pronunciation frequency features in the input voice feature vector. For example: if the mood is moderate, the speech rate slow, the intonation falling, and the pronunciation frequency low, the first emotion recognition result output by the model is sadness; if the mood is questioning and the intonation rising, the result is doubt; if the mood is angry, the speech rate fast, the intonation rising, and the pronunciation frequency high, the result is anger. The mood type, speech rate, intonation type, and pronunciation frequency can each be determined against preset thresholds.
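The threshold-based examples above can be sketched as explicit rules. The categorical labels ("moderate", "fast", "rising", and so on) are hypothetical and are assumed to come from comparing the raw features against the preset thresholds the text mentions.

```python
def classify_by_rules(mood, speech_rate, intonation, frequency):
    """Rule-based sketch of the example mappings in the description."""
    # Moderate mood, slow speech, falling intonation, low frequency -> sad.
    if (mood, speech_rate, intonation, frequency) == ("moderate", "slow", "falling", "low"):
        return "sad"
    # Questioning mood with rising intonation -> doubt.
    if mood == "questioning" and intonation == "rising":
        return "doubtful"
    # Angry mood, fast speech, rising intonation, high frequency -> anger.
    if (mood, speech_rate, intonation, frequency) == ("angry", "fast", "rising", "high"):
        return "angry"
    return "neutral"  # fallback when no rule fires (an assumption)
```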
S13: searching an emotion word database according to the voice feature vector to obtain a second emotion recognition result;
In some embodiments, an emotion word database is established in advance, containing accented words corresponding to various emotions; it is searched by word usage according to the accent features in the voice feature vector to obtain the second emotion recognition result. For example, finding words such as "great", "haha", or "wonderful" gives a second emotion recognition result of happiness; finding the word "what" gives doubt or surprise; finding uncivil phrases gives anger; and so on.
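A minimal sketch of the word-database lookup, assuming the database is a mapping from emotion to a set of accented words; the example words are English stand-ins for the patent's original Chinese examples.

```python
# Hypothetical emotion word database, keyed by emotion.
EMOTION_WORDS = {
    "happy": {"great", "haha", "wonderful"},
    "doubtful": {"what", "why", "really"},
    "angry": {"shut up"},  # stand-in for the uncivil phrases the text mentions
}

def lookup_emotion(accented_words):
    """Return every emotion whose word list matches an accented word."""
    found = set(accented_words)
    hits = [emotion for emotion, words in EMOTION_WORDS.items()
            if found & words]
    return hits or ["unmatched"]
```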
S14: and obtaining a final emotion recognition result according to the first emotion recognition result and the second emotion recognition result.
In some embodiments, the first emotion recognition result is obtained by the emotion recognition model from the mood, speech rate, intonation, and pronunciation frequency features; the second emotion recognition result is obtained from the accent features and word usage via the emotion word database; and the final emotion recognition result is derived from the first and second emotion recognition results together. For example, if the first and second emotion recognition results are both happiness, the final result is happiness; if the first result is doubt and the second is doubt or surprise, the final result is doubt; and if the first result is anger while the second finds no match, the final result is anger.
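The description gives examples but no general fusion rule. The sketch below encodes one plausible policy consistent with those examples: agreement wins, an empty lookup falls back to the model, and how outright disagreement is resolved is an assumption.

```python
def fuse_results(first_result, second_candidates):
    """Combine the model's result with the word-lookup candidates.

    Policy (an assumption beyond the patent's examples):
    - the word lookup found nothing -> keep the model's result;
    - the model's result is among the candidates -> keep it;
    - otherwise an unambiguous lookup overrides the model.
    """
    candidates = set(second_candidates) - {"unmatched"}
    if not candidates or first_result in candidates:
        return first_result
    if len(candidates) == 1:
        return next(iter(candidates))
    return first_result
```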
In some embodiments, the system further comprises an identity information database that stores the voice feature vectors of recognition objects. Voice signals of each recognition object are collected in advance and processed into voice feature vectors, and the identity information of each recognition object is stored in the identity information database together with the corresponding voice feature vector. An acquired voice signal is processed into a voice feature vector to be matched, and the identity information database is searched with this vector; if a search result is obtained, it is taken as the matched identity information. In other words, the embodiment of the invention can identify the identity of the recognition object from its voice signal.
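The patent only says the identity database is "searched" with the feature vector. The sketch below assumes a nearest-neighbor search by cosine similarity with an acceptance threshold; both the metric and the threshold value are illustrative assumptions.

```python
import numpy as np

def match_identity(query_vector, identity_db, threshold=0.9):
    """Return the enrolled identity whose stored voice feature vector
    is most similar to the query, or None if nothing clears the threshold.

    `identity_db` is assumed to map identity info (e.g. a name) to its
    enrolled feature vector.
    """
    q = np.asarray(query_vector, dtype=float)
    q = q / np.linalg.norm(q)
    best_name, best_sim = None, -1.0
    for name, vec in identity_db.items():
        v = np.asarray(vec, dtype=float)
        sim = float(q @ (v / np.linalg.norm(v)))  # cosine similarity
        if sim > best_sim:
            best_name, best_sim = name, sim
    return best_name if best_sim >= threshold else None
```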
In a school application scenario, the voice signals of each student, collected by the voice acquisition equipment on each desk, are sent to the server. The server processes each of the obtained channels of voice signals to obtain the corresponding voice feature vector. The identity information database is searched with each group of voice feature vectors to obtain the matched identity information, i.e., the student's identity (name, gender, class, and so on) is recognized from the voice feature vector. For each group of voice feature vectors, the emotion recognition model yields the corresponding first emotion recognition result and the emotion word database yields the second; the final emotion recognition result for each group is then derived from the two and combined with the recognized identity information to obtain each student's emotional state.
Fig. 2 is a schematic structural diagram of an apparatus according to an embodiment of the present invention. As shown in the drawings, an apparatus for recognizing emotion through voice according to an embodiment of the present invention includes:
the voice acquisition module is used for acquiring a voice signal of the recognition object;
the voice processing module is used for processing the voice signals to obtain voice characteristic vectors;
the first recognition module is used for inputting the voice feature vector into the emotion recognition model and recognizing to obtain a first emotion recognition result;
the second recognition module is used for searching the emotion word database according to the voice feature vector to obtain a second emotion recognition result;
and the recognition result module is used for obtaining a final emotion recognition result according to the first emotion recognition result and the second emotion recognition result.
In some embodiments, a voice signal of the recognition object may be collected by a voice collecting apparatus.
In a school application scenario, a voice acquisition device can be placed at each student's desk, and during class the voice signal of the corresponding student is collected through that device. The voice signals collected by the voice acquisition devices are transmitted to the server, and the voice acquisition module of the server obtains the voice signals and performs subsequent voice recognition and analysis on them.
In some embodiments, the voice processing module processes the voice signal to obtain a voice feature vector, which includes voice features such as a mood feature, a speech rate feature, an intonation feature, a pronunciation frequency feature, an accent feature, and word usage. Voice signal processing methods include frequency-domain signal processing, time-domain signal processing, denoising, and voice enhancement; these belong to the prior art, and the specific processing flow is therefore not described in detail.
In some embodiments, the emotion recognition model is pre-established by acquiring voice signals of a plurality of recognition objects, processing the voice signals to obtain a plurality of groups of voice feature vectors, and performing classification training by using the plurality of groups of voice feature vectors as training samples to obtain the emotion recognition model.
Using the emotion recognition model, the first recognition module can recognize one of several emotions, such as happiness, sadness, anger, fear, surprise, or confusion, from the mood, speech rate, intonation, and pronunciation frequency features in the input voice feature vector. For example: if the mood is moderate, the speech rate slow, the intonation falling, and the pronunciation frequency low, the first emotion recognition result output by the model is sadness; if the mood is questioning and the intonation rising, the result is doubt; if the mood is angry, the speech rate fast, the intonation rising, and the pronunciation frequency high, the result is anger.
In some embodiments, an emotion word database is established in advance, containing accented words corresponding to various emotions, and the second recognition module searches it by word usage according to the accent features in the voice feature vector to obtain the second emotion recognition result. For example, finding words such as "great", "haha", or "wonderful" gives a second emotion recognition result of happiness; finding the word "what" gives doubt or surprise; finding uncivil phrases gives anger; and so on.
In some embodiments, the first emotion recognition result is obtained by the emotion recognition model from the mood, speech rate, intonation, and pronunciation frequency features; the second emotion recognition result is obtained from the accent features and word usage via the emotion word database; and the recognition result module derives the final emotion recognition result from the first and second emotion recognition results together. For example, if both results are happiness, the final result is happiness; if the first result is doubt and the second is doubt or surprise, the final result is doubt; and if the first result is anger while the second finds no match, the final result is anger.
The device for recognizing emotion through voice of the embodiment of the present invention further includes:
and the identity recognition module is used for searching the identity information database according to the voice feature vector to obtain the identity information matched with the recognition object.
In some embodiments, the identity recognition module searches the identity information database according to the voice feature vector, and obtains the identity information of the recognition object according to the search result.
The identity information database stores the voice feature vectors of recognition objects. Voice signals of each recognition object are collected in advance and processed into voice feature vectors, and the identity information of each recognition object is stored in the identity information database together with the corresponding voice feature vector. An acquired voice signal is processed into a voice feature vector to be matched, and the identity information database is searched with this vector; if a search result is obtained, it is taken as the matched identity information. In other words, the embodiment of the invention can identify the identity of the recognition object from its voice signal.
In a school application scenario, the voice signals of each student, collected by the voice acquisition equipment on each desk, are sent to the server. The server processes each of the obtained channels of voice signals to obtain the corresponding voice feature vector. The identity information database is searched with each group of voice feature vectors to obtain the matched identity information, i.e., the student's identity (name, gender, class, and so on) is recognized from the voice feature vector. For each group of voice feature vectors, the emotion recognition model yields the corresponding first emotion recognition result and the emotion word database yields the second; the final emotion recognition result for each group is then derived from the two and combined with the recognized identity information to obtain each student's emotional state.
In view of the above object, the embodiment of the present invention further provides an embodiment of an apparatus for performing the method for recognizing emotion through voice. The device comprises:
one or more processors, and a memory.
The apparatus performing the method of recognizing emotion by voice may further include: an input device and an output device.
The processor, memory, input device, and output device may be connected by a bus or other means.
The memory, which is a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as program instructions/modules corresponding to the method of recognizing emotion through voice in the embodiments of the present invention. The processor executes various functional applications of the server and data processing by running the nonvolatile software programs, instructions and modules stored in the memory, that is, implements the method of recognizing emotion by voice of the above-described method embodiments.
The memory may include a storage program area and a storage data area, wherein the storage program area may store an operating system and an application program required for at least one function, and the storage data area may store data created according to use of an apparatus performing the method of recognizing emotion through voice, and the like. Further, the memory may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and these remote memories may be connected to the apparatus performing the method of recognizing emotion through voice via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device may receive input numeric or character information and generate key signal inputs related to user settings and function control of the device performing the method of recognizing emotion by voice. The output device may include a display device such as a display screen.
The one or more modules are stored in the memory and, when executed by the one or more processors, perform a method of recognizing emotion through voice in any of the method embodiments described above. The technical effect of the embodiment of the device for executing the method for recognizing emotion through voice is the same as or similar to that of any method embodiment.
An embodiment of the present invention further provides a non-transitory computer storage medium storing computer-executable instructions that can execute the method of recognizing emotion through voice in any of the above method embodiments. Embodiments of the non-transitory computer storage medium have the same or similar technical effect as any of the method embodiments described above.
Finally, it should be noted that, as will be understood by those skilled in the art, all or part of the processes in the methods of the above embodiments may be implemented by a computer program that can be stored in a computer-readable storage medium and that, when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like. The technical effect of the embodiment of the computer program is the same as or similar to that of any of the method embodiments described above.
Furthermore, the apparatuses, devices, etc. described in the present disclosure may be various electronic terminal devices, such as a mobile phone, a Personal Digital Assistant (PDA), a tablet computer (PAD), a smart television, etc., and may also be large terminal devices, such as a server, etc., and therefore the scope of protection of the present disclosure should not be limited to a specific type of apparatus, device. The client disclosed by the present disclosure may be applied to any one of the above electronic terminal devices in the form of electronic hardware, computer software, or a combination of both.
Furthermore, the method according to the present disclosure may also be implemented as a computer program executed by a CPU, which may be stored in a computer-readable storage medium. The computer program, when executed by the CPU, performs the above-described functions defined in the method of the present disclosure.
Further, the above method steps and system elements may also be implemented using a controller and a computer readable storage medium for storing a computer program for causing the controller to implement the functions of the above steps or elements.
Further, it should be appreciated that the computer-readable storage media (e.g., memory) described herein can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory. By way of example, and not limitation, nonvolatile memory can include Read-Only Memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM), which can act as external cache memory. By way of example and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The storage devices of the disclosed aspects are intended to comprise, without being limited to, these and other suitable types of memory.
The apparatus of the foregoing embodiment is used to implement the corresponding method in the foregoing embodiment, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is merely exemplary and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples; within the spirit of the invention, features of the above embodiments or of different embodiments may be combined, steps may be implemented in any order, and many other variations of the different aspects of the invention exist as described above, which are not provided in detail for the sake of brevity.
In addition, well known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown within the provided figures for simplicity of illustration and discussion, and so as not to obscure the invention. Furthermore, devices may be shown in block diagram form in order to avoid obscuring the invention, and also in view of the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the present invention is to be implemented (i.e., specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the invention, it should be apparent to one skilled in the art that the invention can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative instead of restrictive.
While the present invention has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, the discussed embodiments may be used with other memory architectures (e.g., dynamic RAM (DRAM)).
The embodiments of the invention are intended to embrace all such alternatives, modifications, and variations that fall within the broad scope of the appended claims. Therefore, any omissions, modifications, substitutions, improvements and the like made without departing from the spirit and principles of the invention are intended to be included within the scope of the invention.
Claims (11)
1. A method for recognizing emotion through voice, comprising:
acquiring a voice signal of a recognition object;
processing the voice signal to obtain a voice characteristic vector;
inputting the voice feature vector into an emotion recognition model, and recognizing to obtain a first emotion recognition result;
searching an emotion word database according to the voice feature vector to obtain a second emotion recognition result;
and obtaining a final emotion recognition result according to the first emotion recognition result and the second emotion recognition result.
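The five steps of claim 1 can be sketched in code. The following is a minimal illustrative sketch only, not the patented implementation: the feature extraction, the stand-in model, the toy emotion word database `EMOTION_LEXICON`, and the confidence-based fusion rule are all assumptions, since the claim leaves these details open.

```python
import numpy as np

# Toy "emotion word database" (illustrative assumption).
EMOTION_LEXICON = {"great": "happy", "terrible": "angry", "fine": "neutral"}

def extract_features(signal):
    # Step 2: process the voice signal into a feature vector.
    # Here just toy statistics; a real system would extract mood,
    # speech rate, intonation, pronunciation frequency, and accent features.
    vec = np.array([signal.mean(), signal.std()])
    accented_words = ["great"]  # pretend these came from ASR + accent detection
    return vec, accented_words

def model_recognize(vec):
    # Step 3: stand-in for the trained emotion recognition model.
    return ("happy", 0.7) if vec[1] > 0.5 else ("neutral", 0.6)

def lexicon_recognize(words):
    # Step 4: look the accented words up in the emotion word database.
    hits = [EMOTION_LEXICON[w] for w in words if w in EMOTION_LEXICON]
    return (hits[0], 0.9) if hits else ("neutral", 0.1)

def recognize_emotion(signal):
    vec, words = extract_features(signal)   # steps 1-2
    first = model_recognize(vec)            # step 3: first recognition result
    second = lexicon_recognize(words)       # step 4: second recognition result
    # Step 5: fuse the two results; the claim does not fix the rule,
    # so the higher-confidence result is chosen here as one possibility.
    return max([first, second], key=lambda r: r[1])[0]
```

A system like this would combine an acoustic-model prediction with a lexical lookup, so that either cue can dominate when it is more confident.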
2. The method of claim 1, wherein the voice feature vector comprises mood features, speech rate features, intonation features, pronunciation frequency features, accent features, and vocabulary features.
3. The method according to claim 2, wherein the mood features, the speech rate features, the intonation features, and the pronunciation frequency features are input into the emotion recognition model, and the first emotion recognition result is obtained through recognition.
4. The method according to claim 2, wherein the emotion word database is searched word by word according to the accent features to obtain the second emotion recognition result.
5. The method of claim 1, further comprising:
and searching an identity information database according to the voice feature vector to obtain identity information matched with the recognition object.
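The identity lookup of claim 5 amounts to matching the voice feature vector against enrolled vectors. A minimal sketch, assuming a cosine-similarity match with a fixed threshold; the database contents, the similarity measure, and the threshold value are illustrative assumptions, not details given in the claim:

```python
import numpy as np

# Toy "identity information database": enrolled voice feature vectors.
# Names and vectors are illustrative assumptions.
IDENTITY_DB = {
    "alice": np.array([1.0, 0.0, 0.2]),
    "bob":   np.array([0.1, 0.9, 0.3]),
}

def match_identity(query, threshold=0.8):
    """Return the enrolled identity whose stored vector is most similar
    to the query feature vector, or None if no match clears the threshold."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    best_name, best_vec = max(IDENTITY_DB.items(),
                              key=lambda kv: cos(query, kv[1]))
    return best_name if cos(query, best_vec) >= threshold else None
```

Returning None for low-similarity queries keeps the module from forcing a match when the speaker is not enrolled.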
6. An apparatus for recognizing emotion by voice, comprising:
the voice acquisition module is used for acquiring a voice signal of the recognition object;
the voice processing module is used for processing the voice signal to obtain a voice characteristic vector;
the first recognition module is used for inputting the voice feature vector into an emotion recognition model and recognizing to obtain a first emotion recognition result;
the second recognition module is used for searching an emotion word database according to the voice feature vector to obtain a second emotion recognition result;
and the recognition result module is used for obtaining a final emotion recognition result according to the first emotion recognition result and the second emotion recognition result.
7. The apparatus of claim 6, wherein the voice feature vector comprises mood features, speech rate features, intonation features, pronunciation frequency features, accent features, and vocabulary features.
8. The apparatus of claim 7,
and the first recognition module is used for inputting the mood features, the speech rate features, the intonation features, and the pronunciation frequency features into the emotion recognition model and recognizing to obtain the first emotion recognition result.
9. The apparatus of claim 7,
and the second recognition module is used for searching the emotion word database word by word according to the accent features to obtain the second emotion recognition result.
10. The apparatus of claim 6, further comprising:
and the identity recognition module is used for searching an identity information database according to the voice feature vector to obtain identity information matched with the recognition object.
11. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 5 when executing the program.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910569691.XA CN111354377B (en) | 2019-06-27 | 2019-06-27 | Method and device for recognizing emotion through voice and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910569691.XA CN111354377B (en) | 2019-06-27 | 2019-06-27 | Method and device for recognizing emotion through voice and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111354377A true CN111354377A (en) | 2020-06-30 |
CN111354377B CN111354377B (en) | 2022-11-18 |
Family
ID=71198109
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910569691.XA Active CN111354377B (en) | 2019-06-27 | 2019-06-27 | Method and device for recognizing emotion through voice and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111354377B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112002348A (en) * | 2020-09-07 | 2020-11-27 | 复旦大学 | Method and system for recognizing speech anger emotion of patient |
CN113241096A (en) * | 2021-07-09 | 2021-08-10 | 明品云(北京)数据科技有限公司 | Emotion monitoring device and method |
CN117935865A (en) * | 2024-03-22 | 2024-04-26 | 江苏斑马软件技术有限公司 | User emotion analysis method and system for personalized marketing |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106650633A (en) * | 2016-11-29 | 2017-05-10 | 上海智臻智能网络科技股份有限公司 | Driver emotion recognition method and device |
CN107066514A (en) * | 2017-01-23 | 2017-08-18 | 深圳亲友科技有限公司 | The Emotion identification method and system of the elderly |
CN107818786A (en) * | 2017-10-25 | 2018-03-20 | 维沃移动通信有限公司 | A kind of call voice processing method, mobile terminal |
CN108764010A (en) * | 2018-03-23 | 2018-11-06 | 姜涵予 | Emotional state determines method and device |
CN109033257A (en) * | 2018-07-06 | 2018-12-18 | 中国平安人寿保险股份有限公司 | Talk about art recommended method, device, computer equipment and storage medium |
CN109036405A (en) * | 2018-07-27 | 2018-12-18 | 百度在线网络技术(北京)有限公司 | Voice interactive method, device, equipment and storage medium |
CN109087670A (en) * | 2018-08-30 | 2018-12-25 | 西安闻泰电子科技有限公司 | Mood analysis method, system, server and storage medium |
CN109254669A (en) * | 2017-07-12 | 2019-01-22 | 腾讯科技(深圳)有限公司 | A kind of expression picture input method, device, electronic equipment and system |
CN109410986A (en) * | 2018-11-21 | 2019-03-01 | 咪咕数字传媒有限公司 | A kind of Emotion identification method, apparatus and storage medium |
CN109767765A (en) * | 2019-01-17 | 2019-05-17 | 平安科技(深圳)有限公司 | Talk about art matching process and device, storage medium, computer equipment |
Also Published As
Publication number | Publication date |
---|---|
CN111354377B (en) | 2022-11-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109741732B (en) | Named entity recognition method, named entity recognition device, equipment and medium | |
CN109461437B (en) | Verification content generation method and related device for lip language identification | |
CN110634472B (en) | Speech recognition method, server and computer readable storage medium | |
CN112530408A (en) | Method, apparatus, electronic device, and medium for recognizing speech | |
CN112786007A (en) | Speech synthesis method, device, readable medium and electronic equipment | |
CN109686383A (en) | A kind of speech analysis method, device and storage medium | |
CN111259148A (en) | Information processing method, device and storage medium | |
CN111028845A (en) | Multi-audio recognition method, device, equipment and readable storage medium | |
CN110544470B (en) | Voice recognition method and device, readable storage medium and electronic equipment | |
CN111354377B (en) | Method and device for recognizing emotion through voice and electronic equipment | |
CN112183107A (en) | Audio processing method and device | |
CN110890088A (en) | Voice information feedback method and device, computer equipment and storage medium | |
US11580994B2 (en) | Speech recognition | |
CN110826637A (en) | Emotion recognition method, system and computer-readable storage medium | |
CN111858876A (en) | Knowledge base generation method and text search method and device | |
CN109947971A (en) | Image search method, device, electronic equipment and storage medium | |
CN111179910A (en) | Speed of speech recognition method and apparatus, server, computer readable storage medium | |
KR20210071713A (en) | Speech Skill Feedback System | |
CN111339809A (en) | Classroom behavior analysis method and device and electronic equipment | |
CN107910005B (en) | Target service positioning method and device for interactive text | |
JP2015175859A (en) | Pattern recognition device, pattern recognition method, and pattern recognition program | |
CN111522937B (en) | Speaking recommendation method and device and electronic equipment | |
CN110544472B (en) | Method for improving performance of voice task using CNN network structure | |
Vasquez-Correa et al. | Wavelet-based time-frequency representations for automatic recognition of emotions from speech | |
CN114913859B (en) | Voiceprint recognition method, voiceprint recognition device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||