CN111354377A - Method and device for recognizing emotion through voice and electronic equipment - Google Patents

Method and device for recognizing emotion through voice and electronic equipment

Info

Publication number
CN111354377A
CN111354377A (application CN201910569691.XA)
Authority
CN
China
Prior art keywords
voice
emotion
recognition result
emotion recognition
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910569691.XA
Other languages
Chinese (zh)
Other versions
CN111354377B (en)
Inventor
鲁召选
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Honghe Innovation Information Technology Co Ltd
Original Assignee
Shenzhen Honghe Innovation Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Honghe Innovation Information Technology Co Ltd filed Critical Shenzhen Honghe Innovation Information Technology Co Ltd
Priority to CN201910569691.XA priority Critical patent/CN111354377B/en
Publication of CN111354377A publication Critical patent/CN111354377A/en
Application granted granted Critical
Publication of CN111354377B publication Critical patent/CN111354377B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00–G10L21/00
    • G10L25/48 — Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51 — Speech or voice analysis techniques specially adapted for comparison or discrimination
    • G10L25/63 — Speech or voice analysis techniques for estimating an emotional state
    • G10L25/03 — Speech or voice analysis techniques characterised by the type of extracted parameters

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • General Health & Medical Sciences (AREA)
  • Child & Adolescent Psychology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method, an apparatus, and an electronic device for recognizing emotion through voice. The method comprises the following steps: acquiring a voice signal of a recognition object; processing the voice signal to obtain a voice feature vector; inputting the voice feature vector into an emotion recognition model to obtain a first emotion recognition result; searching an emotion word database according to the voice feature vector to obtain a second emotion recognition result; and deriving a final emotion recognition result from the first and second emotion recognition results. The invention thus enables emotion to be recognized through voice.

Description

Method and device for recognizing emotion through voice and electronic equipment
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a method and a device for recognizing emotion through voice and electronic equipment.
Background
Voice carries many distinguishing characteristics: the category of an individual can be identified through voice, and humans can infer a speaker's emotion from the different characteristics of his or her voice. In the field of education, recognizing students' emotions through voice can help teachers stay informed about their students in a timely manner, making it easier to adjust teaching methods and improve teaching effectiveness, or to promptly identify students in an abnormal emotional state and provide positive guidance.
Disclosure of Invention
In view of the above, the present invention provides a method, an apparatus, and an electronic device capable of recognizing emotion through voice.
In view of the above object, the present invention provides a method for recognizing emotion by voice, comprising:
acquiring a voice signal of a recognition object;
processing the voice signal to obtain a voice characteristic vector;
inputting the voice feature vector into an emotion recognition model, and recognizing to obtain a first emotion recognition result;
searching an emotion word database according to the voice feature vector to obtain a second emotion recognition result;
and obtaining a final emotion recognition result according to the first emotion recognition result and the second emotion recognition result.
Optionally, the voice feature vector includes a mood feature, a speech rate feature, an intonation feature, a pronunciation frequency feature, an accent feature, and word usage.
Optionally, the mood feature, the speech rate feature, the intonation feature, and the pronunciation frequency feature are input into the emotion recognition model to obtain the first emotion recognition result.
Optionally, the emotion word database is searched by word according to the accent features to obtain the second emotion recognition result.
Optionally, the method further includes:
and searching an identity information database according to the voice feature vector to obtain identity information matched with the recognition object.
An embodiment of the present invention further provides a device for recognizing emotion through voice, including:
the voice acquisition module is used for acquiring a voice signal of the recognition object;
the voice processing module is used for processing the voice signal to obtain a voice characteristic vector;
the first recognition module is used for inputting the voice feature vector into an emotion recognition model and recognizing to obtain a first emotion recognition result;
the second recognition module is used for searching an emotion word database according to the voice feature vector to obtain a second emotion recognition result;
and the recognition result module is used for obtaining a final emotion recognition result according to the first emotion recognition result and the second emotion recognition result.
Optionally, the voice feature vector includes a mood feature, a speech rate feature, an intonation feature, a pronunciation frequency feature, an accent feature, and word usage.
Optionally, the first recognition module is configured to input the mood feature, the speech rate feature, the intonation feature, and the pronunciation frequency feature into the emotion recognition model to obtain the first emotion recognition result.
Optionally, the second recognition module is configured to search the emotion word database by using words according to the accent features to obtain the second emotion recognition result.
Optionally, the apparatus further comprises:
and the identity recognition module is used for searching an identity information database according to the voice feature vector to obtain identity information matched with the recognition object.
An embodiment of the present invention further provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the above method of recognizing emotion through voice.
As can be seen from the above, the method, apparatus, and electronic device for recognizing emotion through voice provided by the present invention acquire the voice signal of a recognition object and process it into a voice feature vector; a first emotion recognition result is obtained from the emotion recognition model using that vector, a second emotion recognition result is obtained by searching the emotion word database with that vector, and the final emotion recognition result is derived from the first and second results. The invention can thus recognize emotion through voice.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of a method according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of an apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
It should be noted that the expressions "first" and "second" in the embodiments of the present invention are used only to distinguish between two entities or parameters that share the same name. "First" and "second" are merely for convenience of description and should not be construed as limiting the embodiments; they are not further explained in the following embodiments.
FIG. 1 is a schematic flow chart of a method according to an embodiment of the present invention. As shown in the figure, the method for recognizing emotion through voice according to the embodiment of the present invention includes:
s10: acquiring a voice signal of a recognition object;
In some embodiments, the voice signal of the recognition object may be collected by a voice collection device.
In a school application scenario, a voice collection device may be installed at each student's desk; during class, each device collects the corresponding student's voice signal. The signals collected by the devices are transmitted to a server, which obtains them and performs the subsequent voice recognition and analysis.
S11: processing a voice signal to obtain a voice characteristic vector;
The voice signal is processed to obtain a voice feature vector that includes features such as mood, speech rate, intonation, pronunciation frequency, accent, and word usage. Applicable signal-processing methods include frequency-domain processing, time-domain processing, denoising, and voice enhancement; these belong to the prior art, so their specific flows are not described in detail here.
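As a rough illustration of this step, the sketch below derives a tiny per-frame feature matrix (frame energy and zero-crossing rate) from a waveform in plain NumPy. The actual features named above (mood, speech rate, intonation, pronunciation frequency, accent, word usage) would require a far richer front end; the frame sizes and feature choices here are illustrative assumptions only.

```python
import numpy as np

def extract_features(signal, frame_len=400, hop=160):
    """Toy stand-in for the patent's feature extraction: per-frame
    energy (a crude proxy for mood/accent strength) and zero-crossing
    rate (a crude proxy for pitch/intonation)."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append(signal[start:start + frame_len])
    frames = np.array(frames)
    energy = (frames ** 2).mean(axis=1)
    # zero-crossing rate: fraction of sign changes within each frame
    zcr = (np.abs(np.diff(np.sign(frames), axis=1)) > 0).mean(axis=1)
    return np.stack([energy, zcr], axis=1)  # shape: (n_frames, 2)

# usage on a synthetic 440 Hz tone, 1 s at 16 kHz
t = np.arange(16000) / 16000.0
feats = extract_features(np.sin(2 * np.pi * 440 * t))
```

With a 25 ms frame (400 samples at 16 kHz) and 10 ms hop, one second of audio yields 98 frames of two features each.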
S12: inputting the voice feature vector into an emotion recognition model, and recognizing to obtain a first emotion recognition result;
In some embodiments, the emotion recognition model is established in advance: voice signals of a plurality of recognition objects are acquired and processed into groups of voice feature vectors, which are input into a classifier as training samples; the classification training yields the emotion recognition model. Mel-frequency cepstral coefficient (MFCC) features may be extracted from the voice signals and used as the model's training samples.
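The MFCC extraction mentioned above can be sketched in plain NumPy. The frame sizes, filterbank count, and coefficient count below are common defaults, not values taken from the patent:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512,
         n_mels=26, n_ceps=13):
    # pre-emphasis boosts high frequencies
    sig = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # framing + Hamming window
    n_frames = 1 + (len(sig) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = sig[idx] * np.hamming(frame_len)
    # per-frame power spectrum
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # triangular mel filterbank, equally spaced on the mel scale
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    logmel = np.log(power @ fbank.T + 1e-10)
    # DCT-II decorrelates the log-mel energies into cepstral coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_mels)))
    return logmel @ dct.T  # shape: (n_frames, n_ceps)

# usage: 1 s of a 220 Hz tone at 16 kHz
t = np.arange(16000) / 16000.0
coeffs = mfcc(np.sin(2 * np.pi * 220 * t))
```

The resulting per-frame coefficient matrix is what would be fed to a classifier as training samples; the classifier itself (SVM, neural network, etc.) is not specified by the patent.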
Optionally, the emotion recognition model can recognize one of happy, sad, angry, fearful, surprised, doubtful, and so on from the mood, speech rate, intonation, and pronunciation frequency features in the input voice feature vector. For example, when the mood is moderate, the speech rate slow, the intonation falling, and the pronunciation frequency low, the model outputs a first emotion recognition result of sad; when the mood is questioning and the intonation rising, it outputs doubtful; when the mood is angry, the speech rate fast, the intonation rising, and the pronunciation frequency high, it outputs angry; and so on. The mood type, speech rate, intonation type, and pronunciation frequency may each be determined against preset thresholds.
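The examples above can be written out as an explicit rule table. A trained classifier would replace these hand-written rules in practice; the category labels and the fallback value here are assumptions made for illustration:

```python
def first_emotion(mood, speech_rate, intonation, frequency):
    """Map the four categorical features to a first emotion
    recognition result, following the example rules in the text.
    None in a pattern means 'any value'."""
    rules = [
        (("moderate", "slow", "falling", "low"), "sad"),
        (("questioning", None, "rising", None), "doubtful"),
        (("angry", "fast", "rising", "high"), "angry"),
    ]
    obs = (mood, speech_rate, intonation, frequency)
    for pattern, emotion in rules:
        if all(p is None or p == o for p, o in zip(pattern, obs)):
            return emotion
    return "neutral"  # fallback when no rule fires (assumption)
```

Each categorical input would itself come from thresholding a continuous measurement, as the text notes.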
S13: searching an emotion word database according to the voice feature vector to obtain a second emotion recognition result;
In some embodiments, an emotion word database is established in advance, containing accented words corresponding to various emotions; it is searched by word according to the accent features in the voice feature vector to obtain the second emotion recognition result. For example, words such as "great", "haha", or "wonderful" yield a second emotion recognition result of happy; the word "what" yields doubtful or surprised; impolite phrases yield angry; and so on.
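A minimal sketch of such a lookup, assuming a small hand-built database (the vocabulary and the ambiguous "doubtful_or_surprised" label are illustrative assumptions, not contents of the patent's database):

```python
# hypothetical emotion word database keyed by emotion
EMOTION_WORDS = {
    "happy": {"great", "haha", "wonderful"},
    "doubtful_or_surprised": {"what", "really"},
    "angry": {"impolite-phrase"},
}

def second_emotion(words):
    """Return the second emotion recognition result by looking up
    each recognized accented word in the database; None if no word
    in the utterance matches any entry."""
    for w in words:
        for emotion, vocab in EMOTION_WORDS.items():
            if w in vocab:
                return emotion
    return None
```

A production database would be far larger and would likely weight matches rather than return the first hit.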
S14: and obtaining a final emotion recognition result according to the first emotion recognition result and the second emotion recognition result.
In some embodiments, the first emotion recognition result is produced by the emotion recognition model from the mood, speech rate, intonation, and pronunciation frequency features, and the second emotion recognition result from the accent features and word usage via the emotion word database; the final emotion recognition result is then derived from the two. For example, if both results are happy, the final result is happy; if the first result is doubtful and the second is doubtful or surprised, the final result is doubtful; if the first result is angry and the second finds no match, the final result is angry; and so on.
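The fusion behaviour in those examples can be sketched as follows. The patent only gives the three example cases; the corroboration flag and the policy for genuine conflicts (trust the model) are assumptions:

```python
def final_emotion(first, second):
    """Combine the model result (first) and the word-database result
    (second) into (final_result, corroborated_flag)."""
    if second is None:                       # no word matched: keep model result
        return first, False
    if first == second:                      # agreement confirms the result
        return first, True
    if first in second.split("_or_"):        # narrow an ambiguous lookup
        return first, True
    return first, False                      # conflict: trust the model (assumption)
```

Under this sketch, "angry" with no word match stays "angry", and a "doubtful" model verdict resolves an ambiguous "doubtful or surprised" lookup, matching the examples in the text.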
In some embodiments, the system further comprises an identity information database that stores the voice feature vectors of recognition objects. Voice signals of each recognition object are collected in advance and processed into voice feature vectors, and each object's identity information is stored in the database together with its corresponding vectors. At recognition time, the acquired voice signal is processed into a feature vector to be matched, the identity information database is searched with this vector, and a successful search result is taken as the matched identity information. In this way, the embodiment of the invention can identify a recognition object's identity from its voice signal.
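The identity lookup described above can be sketched as a nearest-neighbour search over stored voiceprint vectors. Cosine similarity and the 0.8 acceptance threshold are assumptions; the patent only specifies searching the database by voice feature vector:

```python
import numpy as np

def match_identity(query, database, threshold=0.8):
    """Return the identity whose stored vector is most similar to the
    query vector, or None if no similarity exceeds the threshold."""
    best_name, best_score = None, threshold
    for name, vec in database.items():
        score = float(np.dot(query, vec) /
                      (np.linalg.norm(query) * np.linalg.norm(vec)))
        if score > best_score:
            best_name, best_score = name, score
    return best_name

# hypothetical pre-enrolled database of per-student feature vectors
db = {
    "student_a": np.array([0.9, 0.1, 0.3]),
    "student_b": np.array([0.1, 0.8, 0.5]),
}
```

Real voiceprint vectors would be much higher-dimensional, and the threshold would be tuned on enrollment data.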
In a school application scenario, the voice signals collected by the device at each student's desk are sent to the server, which processes each of the multiple channels into its corresponding voice feature vector. The identity information database is searched with each group of voice feature vectors to obtain the matched identity information, i.e., the student's identity (name, gender, class, and so on) is recognized from the voice feature vectors; the emotion recognition model produces a first emotion recognition result for each group of voice feature vectors; the emotion word database is searched to produce a second emotion recognition result for each group; and finally, the emotion recognition result for each group is derived from the first and second results, which, combined with the recognized identity information, yields each student's emotional state.
Fig. 2 is a schematic structural diagram of an apparatus according to an embodiment of the present invention. As shown in the drawings, an apparatus for recognizing emotion through voice according to an embodiment of the present invention includes:
the voice acquisition module is used for acquiring a voice signal of the recognition object;
the voice processing module is used for processing the voice signals to obtain voice characteristic vectors;
the first recognition module is used for inputting the voice feature vector into the emotion recognition model and recognizing to obtain a first emotion recognition result;
the second recognition module is used for searching the emotion word database according to the voice feature vector to obtain a second emotion recognition result;
and the recognition result module is used for obtaining a final emotion recognition result according to the first emotion recognition result and the second emotion recognition result.
In some embodiments, the voice signal of the recognition object may be collected by a voice collection device.
In a school application scenario, a voice collection device may be installed at each student's desk; during class, each device collects the corresponding student's voice signal. The signals collected by the devices are transmitted to the server, whose voice acquisition module obtains them and performs the subsequent voice recognition and analysis.
In some embodiments, the voice processing module processes the voice signal to obtain a voice feature vector that includes features such as mood, speech rate, intonation, pronunciation frequency, accent, and word usage. Applicable signal-processing methods include frequency-domain processing, time-domain processing, denoising, and voice enhancement; these belong to the prior art, so their specific flows are not described in detail here.
In some embodiments, the emotion recognition model is pre-established by acquiring voice signals of a plurality of recognition objects, processing the voice signals to obtain a plurality of groups of voice feature vectors, and performing classification training by using the plurality of groups of voice feature vectors as training samples to obtain the emotion recognition model.
The first recognition module can use the emotion recognition model to recognize one of happy, sad, angry, fearful, surprised, doubtful, and so on from the mood, speech rate, intonation, and pronunciation frequency features in the input voice feature vector. For example, when the mood is moderate, the speech rate slow, the intonation falling, and the pronunciation frequency low, the model outputs a first emotion recognition result of sad; when the mood is questioning and the intonation rising, it outputs doubtful; when the mood is angry, the speech rate fast, the intonation rising, and the pronunciation frequency high, it outputs angry; and so on.
In some embodiments, an emotion word database is established in advance, containing accented words corresponding to various emotions; the second recognition module searches it by word according to the accent features in the voice feature vector to obtain the second emotion recognition result. For example, words such as "great", "haha", or "wonderful" yield a second emotion recognition result of happy; the word "what" yields doubtful or surprised; impolite phrases yield angry; and so on.
In some embodiments, the first emotion recognition result is produced by the emotion recognition model from the mood, speech rate, intonation, and pronunciation frequency features, and the second emotion recognition result from the accent features and word usage via the emotion word database; the recognition result module then derives the final emotion recognition result from the two. For example, if both results are happy, the final result is happy; if the first result is doubtful and the second is doubtful or surprised, the final result is doubtful; if the first result is angry and the second finds no match, the final result is angry; and so on.
The device for recognizing emotion through voice of the embodiment of the present invention further includes:
and the identity recognition module is used for searching the identity information database according to the voice feature vector to obtain the identity information matched with the recognition object.
In some embodiments, the identity recognition module searches the identity information database according to the voice feature vector, and obtains the identity information of the recognition object according to the search result.
The identity information database stores the voice feature vectors of recognition objects. Voice signals of each recognition object are collected in advance and processed into voice feature vectors, and each object's identity information is stored in the database together with its corresponding vectors. At recognition time, the acquired voice signal is processed into a feature vector to be matched, the identity information database is searched with this vector, and a successful search result is taken as the matched identity information. In this way, the embodiment of the invention can identify a recognition object's identity from its voice signal.
In a school application scenario, the voice signals collected by the device at each student's desk are sent to the server, which processes each of the multiple channels into its corresponding voice feature vector. The identity information database is searched with each group of voice feature vectors to obtain the matched identity information, i.e., the student's identity (name, gender, class, and so on) is recognized from the voice feature vectors; the emotion recognition model produces a first emotion recognition result for each group of voice feature vectors; the emotion word database is searched to produce a second emotion recognition result for each group; and finally, the emotion recognition result for each group is derived from the first and second results, which, combined with the recognized identity information, yields each student's emotional state.
In view of the above, an embodiment of the present invention further provides an apparatus for performing the method of recognizing emotion through voice. The apparatus comprises:
one or more processors, and a memory.
The apparatus performing the method of recognizing emotion by voice may further include: an input device and an output device.
The processor, memory, input device, and output device may be connected by a bus or other means.
The memory, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the method of recognizing emotion through voice in the embodiments of the present invention. By running the non-volatile software programs, instructions, and modules stored in the memory, the processor executes the server's various functional applications and data processing, thereby implementing the method of recognizing emotion through voice of the above method embodiments.
The memory may include a program storage area and a data storage area, wherein the program storage area may store an operating system and the application programs required for at least one function, and the data storage area may store data created during use of the apparatus performing the method of recognizing emotion through voice, and the like. Further, the memory may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor; such remote memories may be connected over a network to the apparatus performing the method of recognizing emotion through voice. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device may receive input numeric or character information and generate key signal inputs related to user settings and function control of the device performing the method of recognizing emotion by voice. The output device may include a display device such as a display screen.
The one or more modules are stored in the memory and, when executed by the one or more processors, perform a method of recognizing emotion through voice in any of the method embodiments described above. The technical effect of the embodiment of the device for executing the method for recognizing emotion through voice is the same as or similar to that of any method embodiment.
An embodiment of the present invention also provides a non-transitory computer storage medium storing computer-executable instructions that can perform the method of recognizing emotion through voice in any of the method embodiments described above. Embodiments of the non-transitory computer storage medium achieve the same or similar technical effects as any of the method embodiments described above.
Finally, it should be noted that, as will be understood by those skilled in the art, all or part of the processes in the methods of the above embodiments may be implemented by a computer program that can be stored in a computer-readable storage medium and that, when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like. The technical effect of the embodiment of the computer program is the same as or similar to that of any of the method embodiments described above.
Furthermore, the apparatuses, devices, etc. described in the present disclosure may be various electronic terminal devices, such as a mobile phone, a Personal Digital Assistant (PDA), a tablet computer (PAD), a smart television, etc., and may also be large terminal devices, such as a server, etc., and therefore the scope of protection of the present disclosure should not be limited to a specific type of apparatus, device. The client disclosed by the present disclosure may be applied to any one of the above electronic terminal devices in the form of electronic hardware, computer software, or a combination of both.
Furthermore, the method according to the present disclosure may also be implemented as a computer program executed by a CPU, which may be stored in a computer-readable storage medium. The computer program, when executed by the CPU, performs the above-described functions defined in the method of the present disclosure.
Further, the above method steps and system elements may also be implemented using a controller and a computer readable storage medium for storing a computer program for causing the controller to implement the functions of the above steps or elements.
Further, it should be appreciated that the computer-readable storage media (e.g., memory) described herein can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory. By way of example, and not limitation, nonvolatile memory can include Read Only Memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM), which can act as external cache memory. By way of example and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), and direct Rambus RAM (DRRAM). The storage devices of the disclosed aspects are intended to comprise, without being limited to, these and other suitable types of memory.
The apparatus of the foregoing embodiment is used to implement the corresponding method in the foregoing embodiment, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is meant to be exemplary only and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples. Within the spirit of the invention, features of the above embodiments, or of different embodiments, may also be combined, steps may be implemented in any order, and there are many other variations of the different aspects of the invention as described above, which are not provided in detail for the sake of brevity.
In addition, well known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown within the provided figures for simplicity of illustration and discussion, and so as not to obscure the invention. Furthermore, devices may be shown in block diagram form in order to avoid obscuring the invention, and also in view of the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the present invention is to be implemented (i.e., specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the invention, it should be apparent to one skilled in the art that the invention can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative instead of restrictive.
While the present invention has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures (e.g., Dynamic RAM (DRAM)) may use the discussed embodiments.
The embodiments of the invention are intended to embrace all such alternatives, modifications, and variations that fall within the broad scope of the appended claims. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (11)

1. A method for recognizing emotion through voice, comprising:
acquiring a voice signal of a recognition object;
processing the voice signal to obtain a voice feature vector;
inputting the voice feature vector into an emotion recognition model, and recognizing to obtain a first emotion recognition result;
searching an emotion word database according to the voice feature vector to obtain a second emotion recognition result;
and obtaining a final emotion recognition result according to the first emotion recognition result and the second emotion recognition result.
2. The method of claim 1, wherein the voice feature vector comprises mood features, speech rate features, intonation features, pronunciation frequency features, accent features, and vocabulary features.
3. The method according to claim 2, wherein the mood features, the speech rate features, the intonation features, and the pronunciation frequency features are input into the emotion recognition model, and the first emotion recognition result is obtained through recognition.
4. The method according to claim 2, wherein the emotion word database is searched word by word according to the accent features to obtain the second emotion recognition result.
5. The method of claim 1, further comprising:
and searching an identity information database according to the voice feature vector to obtain identity information matched with the recognition object.
6. An apparatus for recognizing emotion by voice, comprising:
the voice acquisition module is used for acquiring a voice signal of the recognition object;
the voice processing module is used for processing the voice signal to obtain a voice feature vector;
the first recognition module is used for inputting the voice feature vector into an emotion recognition model and recognizing to obtain a first emotion recognition result;
the second recognition module is used for searching an emotion word database according to the voice feature vector to obtain a second emotion recognition result;
and the recognition result module is used for obtaining a final emotion recognition result according to the first emotion recognition result and the second emotion recognition result.
7. The apparatus of claim 6, wherein the voice feature vector comprises mood features, speech rate features, intonation features, pronunciation frequency features, accent features, and vocabulary features.
8. The apparatus of claim 7, wherein the first recognition module is used for inputting the mood features, the speech rate features, the intonation features, and the pronunciation frequency features into the emotion recognition model and recognizing to obtain the first emotion recognition result.
9. The apparatus of claim 7, wherein the second recognition module is used for searching the emotion word database word by word according to the accent features to obtain the second emotion recognition result.
10. The apparatus of claim 6, further comprising:
and the identity recognition module is used for searching an identity information database according to the voice feature vector to obtain identity information matched with the recognition object.
11. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 5 when executing the program.
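The method of claims 1 through 5 can be summarized as a two-path pipeline: a trained model scores prosodic features (claim 3) while an emotion word database is searched for accented words (claim 4), and the two results are fused (claim 1). The following is a minimal illustrative sketch of that pipeline only; the feature set, scoring rules, word database, and weighted fusion are all hypothetical stand-ins, since the patent does not specify a model architecture or fusion rule.

```python
# Hypothetical sketch of the claimed two-path emotion recognition pipeline.
# All thresholds, weights, and the EMOTION_WORDS table are illustrative assumptions.
from dataclasses import dataclass

# Stand-in for the emotion word database of claim 4 (hypothetical entries).
EMOTION_WORDS = {"great": "happy", "terrible": "angry", "fine": "neutral"}

@dataclass
class SpeechFeatures:
    """Illustrative voice feature vector per claim 2 (values normalized to [0, 1])."""
    mood: float
    speech_rate: float
    intonation: float
    pronunciation_freq: float
    accented_words: list  # words carrying accent/stress in the utterance

def model_recognize(f: SpeechFeatures) -> dict:
    """Stand-in for the trained emotion recognition model (claim 3).

    Returns a label -> score mapping; a real system would use a classifier.
    """
    score = (0.5 * f.mood + 0.2 * f.speech_rate
             + 0.2 * f.intonation + 0.1 * f.pronunciation_freq)
    label = "happy" if score > 0.6 else "angry" if score < 0.3 else "neutral"
    return {label: 1.0}

def lexicon_recognize(f: SpeechFeatures) -> dict:
    """Search the emotion word database by accented words (claim 4)."""
    votes: dict = {}
    for word in f.accented_words:
        if word in EMOTION_WORDS:
            emotion = EMOTION_WORDS[word]
            votes[emotion] = votes.get(emotion, 0.0) + 1.0
    return votes

def recognize_emotion(f: SpeechFeatures, w_model=0.6, w_lexicon=0.4) -> str:
    """Fuse both recognition results into the final emotion (claim 1).

    A normalized weighted vote is assumed; the patent does not fix a fusion rule.
    """
    combined: dict = {}
    for result, weight in ((model_recognize(f), w_model),
                           (lexicon_recognize(f), w_lexicon)):
        total = sum(result.values()) or 1.0
        for label, s in result.items():
            combined[label] = combined.get(label, 0.0) + weight * s / total
    return max(combined, key=combined.get)
```

For example, high mood/intonation scores together with the accented word "great" would yield "happy" from both paths, and the fused result follows; when the two paths disagree, the weights decide.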

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910569691.XA CN111354377B (en) 2019-06-27 2019-06-27 Method and device for recognizing emotion through voice and electronic equipment

Publications (2)

Publication Number Publication Date
CN111354377A (en) 2020-06-30
CN111354377B CN111354377B (en) 2022-11-18

Family

ID=71198109

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910569691.XA Active CN111354377B (en) 2019-06-27 2019-06-27 Method and device for recognizing emotion through voice and electronic equipment

Country Status (1)

Country Link
CN (1) CN111354377B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106650633A (en) * 2016-11-29 2017-05-10 上海智臻智能网络科技股份有限公司 Driver emotion recognition method and device
CN107066514A (en) * 2017-01-23 2017-08-18 深圳亲友科技有限公司 The Emotion identification method and system of the elderly
CN107818786A (en) * 2017-10-25 2018-03-20 维沃移动通信有限公司 A kind of call voice processing method, mobile terminal
CN108764010A (en) * 2018-03-23 2018-11-06 姜涵予 Emotional state determines method and device
CN109033257A (en) * 2018-07-06 2018-12-18 中国平安人寿保险股份有限公司 Talk about art recommended method, device, computer equipment and storage medium
CN109036405A (en) * 2018-07-27 2018-12-18 百度在线网络技术(北京)有限公司 Voice interactive method, device, equipment and storage medium
CN109087670A (en) * 2018-08-30 2018-12-25 西安闻泰电子科技有限公司 Mood analysis method, system, server and storage medium
CN109254669A (en) * 2017-07-12 2019-01-22 腾讯科技(深圳)有限公司 A kind of expression picture input method, device, electronic equipment and system
CN109410986A (en) * 2018-11-21 2019-03-01 咪咕数字传媒有限公司 A kind of Emotion identification method, apparatus and storage medium
CN109767765A (en) * 2019-01-17 2019-05-17 平安科技(深圳)有限公司 Talk about art matching process and device, storage medium, computer equipment


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112002348A (en) * 2020-09-07 2020-11-27 复旦大学 Method and system for recognizing speech anger emotion of patient
CN112002348B (en) * 2020-09-07 2021-12-28 复旦大学 Method and system for recognizing speech anger emotion of patient
CN113241096A (en) * 2021-07-09 2021-08-10 明品云(北京)数据科技有限公司 Emotion monitoring device and method
CN117935865A (en) * 2024-03-22 2024-04-26 江苏斑马软件技术有限公司 User emotion analysis method and system for personalized marketing

Also Published As

Publication number Publication date
CN111354377B (en) 2022-11-18

Similar Documents

Publication Publication Date Title
CN109741732B (en) Named entity recognition method, named entity recognition device, equipment and medium
CN109461437B (en) Verification content generation method and related device for lip language identification
CN110634472B (en) Speech recognition method, server and computer readable storage medium
CN112530408A (en) Method, apparatus, electronic device, and medium for recognizing speech
CN112786007A (en) Speech synthesis method, device, readable medium and electronic equipment
CN109686383A (en) A kind of speech analysis method, device and storage medium
CN111259148A (en) Information processing method, device and storage medium
CN111028845A (en) Multi-audio recognition method, device, equipment and readable storage medium
CN110544470B (en) Voice recognition method and device, readable storage medium and electronic equipment
CN111354377B (en) Method and device for recognizing emotion through voice and electronic equipment
CN112183107A (en) Audio processing method and device
CN110890088A (en) Voice information feedback method and device, computer equipment and storage medium
US11580994B2 (en) Speech recognition
CN110826637A (en) Emotion recognition method, system and computer-readable storage medium
CN111858876A (en) Knowledge base generation method and text search method and device
CN109947971A (en) Image search method, device, electronic equipment and storage medium
CN111179910A (en) Speed of speech recognition method and apparatus, server, computer readable storage medium
KR20210071713A (en) Speech Skill Feedback System
CN111339809A (en) Classroom behavior analysis method and device and electronic equipment
CN107910005B (en) Target service positioning method and device for interactive text
JP2015175859A (en) Pattern recognition device, pattern recognition method, and pattern recognition program
CN111522937B (en) Speaking recommendation method and device and electronic equipment
CN110544472B (en) Method for improving performance of voice task using CNN network structure
Vasquez-Correa et al. Wavelet-based time-frequency representations for automatic recognition of emotions from speech
CN114913859B (en) Voiceprint recognition method, voiceprint recognition device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant