WO2020135194A1 - Emotion engine technology-based voice interaction method, smart terminal, and storage medium - Google Patents

Emotion engine technology-based voice interaction method, smart terminal, and storage medium

Info

Publication number
WO2020135194A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
emotion
user
voice interaction
voice
Prior art date
Application number
PCT/CN2019/126443
Other languages
French (fr)
Chinese (zh)
Inventor
温馨
Original Assignee
深圳Tcl新技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳Tcl新技术有限公司
Publication of WO2020135194A1 publication Critical patent/WO2020135194A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 - Facial expression recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/332 - Query formulation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 - Feature extraction; Face representation

Definitions

  • the present disclosure relates to the field of Internet interaction technology, and in particular, to a voice interaction method based on emotion engine technology, an intelligent terminal, and a storage medium.
  • freed from the constraints of auxiliary interactive devices such as the keyboard and mouse, the interaction method became more convenient, which also made the reform of mobile devices possible, so that the technology can exist in everyone's pocket.
  • artificial intelligence technology provides a more natural way of interaction, natural language conversation: users can interact with the machine and obtain information through natural language, and with conversational interaction as the core, the combination of voice technology, image technology, face recognition technology, and enhanced display technology enables the technology to exist in ubiquitous devices.
  • Conversational artificial intelligence is a major application of AI technology; it mainly refers to the use of speech recognition, semantic understanding, multi-turn dialogue, natural language understanding, and other technologies to allow users to communicate with robots in natural language.
  • however, the voice interaction between users and robots currently remains mainly a passive, task-style dialogue in which a fixed dialogue management mechanism is used to ask back or answer the user.
  • although this method can meet the user's basic dialogue needs, it cannot respond more intelligently based on the user's current emotions, which makes it inconvenient to use.
  • the technical problem to be solved by the present disclosure is, in view of the above-mentioned defects of the prior art, to provide a voice interaction method, an intelligent terminal, and a storage medium based on emotion engine technology, aiming to solve the problem in the prior art that the dialogue between the user and an intelligent robot follows a fixed response mode, so that the intelligent robot cannot make more intelligent responses based on the user's current emotions.
  • a voice interaction method based on emotion engine technology wherein the method includes:
  • the emotion of the user is calculated through the emotion recognition model, and an anthropomorphic voice interaction strategy is generated based on the emotion of the user, and the voice interaction information is output.
  • the voice interaction method based on the emotion engine technology wherein the step of obtaining voice information input by the user and obtaining face image information of the user specifically includes:
  • the voice interaction method based on the emotion engine technology, wherein the step of extracting emotion recognition features from the voice information and face image information, and inputting the extracted emotion recognition features into a preset emotion recognition model, specifically includes:
  • the other voice signal in the obtained voice information is used to extract the user's audio emotional state through a preset voice emotion sensor;
  • the obtained facial image information is used to extract the user's expression state through a preset expression recognition system
  • the text emotion state, audio emotion state and expression state are input to a preset emotion recognition model.
  • the voice interaction method based on the emotion engine technology, wherein the step of extracting the user's text emotional state from the text information specifically includes:
  • the sentence information and the user's personal information are input into a preset emotional state recognition model to identify the user's text emotional state.
  • the voice interaction method based on the emotion engine technology wherein the step of inputting the sentence information and the user's personal information into a preset emotion recognition model to recognize the user's textual emotion state specifically includes:
  • if the first confidence score is greater than the threshold, the first emotional state is used as the user's text emotional state; if the first confidence score is less than the threshold, the first emotional state and the second emotional state are dynamically sorted, and the user's text emotional state is determined according to the result of the dynamic sorting.
  • the sentence information includes: Chinese word segmentation information of the sentence, part-of-speech tagging information of the sentence after word segmentation processing, sentence sentence information of the sentence, and sentence2vector information of the sentence.
  • the parameters involved in the dynamic ranking include: text length, extracted keywords, text input by the user, and confidence scores of the first/second emotional states.
  • the voice interaction method based on the emotion engine technology, wherein the step of calculating the user's emotion through the emotion recognition model, generating an anthropomorphic voice interaction strategy based on the user's emotion, and outputting the voice interaction information specifically includes:
  • the emotion recognition model performs weighted calculation on the input text emotion state, audio emotion state and expression state to obtain the user's emotion
  • a dialogue generation model is used to generate voice interactive information with emotions, and output voice interactive information.
  • the emotion database includes a variety of emotions and emotion feature information corresponding to each emotion.
  • before the emotion recognition model performs the weighted calculation on the input text emotion state, audio emotion state, and expression state, the method includes:
  • different weights are set in advance for the text emotional state, audio emotional state, and expression state.
  • the voice interaction method based on the emotion engine technology, wherein the step of generating the voice interaction information with emotion through the dialogue generation model specifically includes:
  • the dialogue generation model receives the question information input by the user, and records the user's historical dialogue information, position change information, and mood change information;
  • the voice interaction information is also used to update the dialogue generation model.
  • the dialogue generation module is implemented by a three-layer recurrent neural network RNN architecture, and is based on the backpropagation algorithm.
  • the voice interaction method based on the emotion engine technology, wherein the step of calculating the user's emotion through the emotion recognition model, generating an anthropomorphic voice interaction strategy based on the user's emotion, and outputting it further includes:
  • An intelligent terminal, comprising: a processor and a storage medium communicatively connected to the processor, the storage medium being adapted to store a plurality of instructions; the processor being adapted to call the instructions in the storage medium to execute the steps of implementing the voice interaction method based on the emotion engine technology described in any one of the above.
  • a storage medium on which a plurality of instructions are stored, wherein the instructions are suitable for being loaded and executed by a processor to perform the steps of implementing any of the voice interaction methods based on the emotion engine technology described above.
  • the present disclosure analyzes the user's emotions and adds emotion to voice interaction, thereby shaping an emotional, intelligent voice interaction mode, enabling more interesting voice interaction between the user and the smart terminal, getting rid of the mechanized and passive communication mode of traditional voice interaction systems, and providing convenience for users.
  • FIG. 1 is a flowchart of a preferred embodiment of a voice interaction method based on emotion engine technology of the present disclosure.
  • FIG. 2 is a general control flowchart of the voice interaction method based on the emotion engine technology of the present disclosure.
  • FIG. 3 is a logic flow diagram of an emotion recognition system of a voice interaction method based on emotion engine technology of the present disclosure.
  • FIG. 4 is a functional schematic diagram of the smart terminal of the present disclosure.
  • the voice interaction method based on the emotion engine technology provided by the present disclosure can be applied to a terminal.
  • the terminal may be, but not limited to, various personal computers, notebook computers, mobile phones, tablet computers, in-vehicle computers, and portable wearable devices.
  • the terminal of the present disclosure uses a multi-core processor.
  • the processor of the terminal may be at least one of a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), a video processing unit (Video Processing Unit, VPU), and the like.
  • the present disclosure provides a voice interaction method based on emotion engine technology. Specifically, as shown in FIG. 1, the method includes:
  • Step S100 Acquire voice information input by the user, and acquire face image information of the user.
  • Step S200 Extract emotion recognition features from the voice information and face image information, and input the extracted emotion recognition features to a preset emotion recognition model.
  • Step S300 Calculate the user's emotion through the emotion recognition model, and generate an anthropomorphic voice interaction strategy based on the user's emotion, and output it.
  • this embodiment provides a voice interaction method based on the emotion engine technology, which mainly analyzes the user's emotions and adds emotion to the voice interaction, thereby shaping an emotional, intelligent voice interaction method, getting rid of the mechanized and passive communication mode of traditional voice interaction systems, and providing convenience for users.
  • whether the user performs voice interaction is monitored in real time.
  • when it is detected that the user is performing voice interaction, the voice information input by the user is obtained through a preset remote device or remote-control pickup device; considering that the user's facial expressions also change in different emotional states, and that these changes likewise represent the user's emotional state, this embodiment presets a camera device, and the user's face image information is acquired in real time through the preset camera device while the user performs voice interaction. Combining the voice information with the user's face image information makes it possible to determine the user's current mood more accurately.
  • the acquired voice information includes the language and text content of the user's speech as well as the tone and speaking-rate information; for example, a happy expression in the user's language indicates that the user may currently be in a relatively happy state, while faster speech and a louder voice indicate that the user is in a more excited state.
  • certain words in the user's voice information can also indicate the user's current emotional state; for example, if the user's voice information contains the words "very annoying", this shows that the user is rather anxious. Therefore, in order to better analyze the voice information, as shown in FIG. 2, the obtained voice information is divided into two voice signals: one voice signal is converted into text information through a preset ASR (Automatic Speech Recognition) module, and the user's text emotional state is extracted from the text information; the other voice signal is passed through a preset voice emotion sensor to extract the user's audio emotional state. Since the user's facial expressions change under different emotional states, the user's expression state can be extracted from the obtained face image information through a preset expression recognition system. Finally, the extracted text emotional state, audio emotional state, and expression state are input to a preset emotion recognition module for emotion recognition, which can recognize the user's emotion more accurately.
  • extracting the user's text emotional state from the text information specifically includes the following steps:
  • Step 301 Extract sentence information according to user input.
  • Step 302 Acquire user personal information from the memory map.
  • Step 303 Input sentence information into the rule model, extract keywords, and obtain the user's first emotional state and first confidence score according to the keywords.
  • Step 304 Input sentence information and user information into the deep learning model to obtain the user's second emotional state and second confidence score.
  • Step 305 Determine whether the first confidence score is greater than a preset threshold. If not, perform step 307. If yes, perform step 306.
  • Step 306 Use the first emotional state as the user's text emotional state.
  • Step 307 Dynamically sort the first emotional state and the second emotional state, and make a decision according to the result of the dynamic sorting.
  • the sentence information in the above steps includes: Chinese word segmentation information of the sentence, part-of-speech tagging information after word segmentation, sentence pattern information of the sentence, sentence2vector information of the sentence, etc.; the user's personal information includes: name, gender, birthday, age, constellation, the user's psychological state and physiological state, etc.
  • the parameters involved in the dynamic sorting include: text length, extracted keywords, the text input by the user, the confidence scores of the first/second emotional states, etc. When the above first confidence score is less than the preset threshold, in this embodiment these parameters are used as inputs to the dynamic sorting model, the sorting result is influenced by assigning them different weights, and the user's text emotional state is finally determined according to the sorting result.
  • the parameter selection and weight adjustment of dynamic sequencing will be adjusted according to the performance of the overall model.
  • the method of extracting sentence information includes existing Chinese word segmentation information and part-of-speech tagging information technology, which will not be repeated here.
  • the emotion data of a plurality of users are collected and counted in advance to generate an emotion database.
  • in one implementation, the emotion database includes 22 human emotions, such as joy, anger, sorrow, and happiness, and also includes the emotional feature information corresponding to each emotion; for example, the feature information for the happy emotion in the database includes corresponding expression image data (such as raised mouth corners), corresponding high-frequency text (such as the words "happy" and "glad"), and corresponding tone and intonation information (such as a cheerful tone). Therefore, when the happy emotion is found in the emotion database, the corresponding emotional feature information can be obtained; similarly, the corresponding emotional state can also be found in the emotion database through the emotional feature information.
  • this embodiment inputs the acquired text emotional state, audio emotional state, and expression state into the emotion recognition model, performs a weighted calculation on them through the emotion recognition model, and compares and matches the calculation result with the preset emotion database to obtain the user's emotion.
  • the emotion recognition model is formed by inputting various collected text emotional states, audio emotional states, and expression states into the network model for deep learning and training in advance.
  • different weights may be set in advance for the text emotional state, audio emotional state, and expression state.
  • the text emotional state weight is set to 20%
  • the audio emotional state weight is set to 50%
  • the expression state weight is set to 30%; calculating according to the set weights yields the emotion closest to the user's current emotional state.
  • then, based on the obtained user emotion, it is compared and matched in the emotion database to obtain the emotional feature information corresponding to that emotion.
  • the emotional feature information is used for emotional intention decision-making and user portrait filling, so as to generate voice interaction information with emotions.
  • for example, when the user's emotion is calculated to be happy, the emotional feature information corresponding to happiness includes: frequently appearing words such as "happy" and "glad", expression images with raised mouth corners, and a cheerful intonation.
  • based on this emotional feature information, the user portrait and the user's current specific emotion can be determined, and the smart terminal can make a corresponding emotional intention decision, that is, the emotional feedback that the smart terminal needs to make according to the user's emotion;
  • it then produces response information carrying the corresponding emotion, that is, it likewise outputs response information with a pleasant emotion, so as to achieve a more humanized voice interaction.
  • when interacting through voice information, a dialogue generation module is used to generate the response.
  • the dialogue generation module receives the question information input by the user and records the user's historical dialogue information, position change information, and mood change information, then analyzes the user's personal information and activity status based on this information to obtain user portrait information; voice interaction information is generated based on the question information and the user portrait information (the user portrait information at this point has been filled in based on the emotional feature information corresponding to the user's emotion).
  • in this way, not only can emotional response information be produced according to the user's emotional state, but different voice interaction strategies can also be produced in real time according to changes in the user's emotions, and the emotions carried in the voice interaction strategy will also change in real time.
  • the dialogue generation module in this embodiment is implemented with a three-layer recurrent neural network (RNN) architecture and is based on the backpropagation (BP) algorithm.
  • the method provided in this embodiment further includes: feeding the voice interaction information back into the dialogue generation model, where a mixture of rules, machine learning, and deep learning techniques can be used to learn from and train on the voice interaction information, thereby updating the dialogue generation model so that it can better generate emotional voice response information.
  • the corresponding scene structured data is set according to the character characteristics corresponding to the different scenes.
  • the user's emotions and the obtained emotional intention decision results (that is, the emotional feedback made by the smart terminal according to the user's emotions) are used as the first input of the network model;
  • the custom scene structured data is used as the second input of the network model;
  • through the learning and training of the network model, an emotion engine model that outputs anthropomorphic voice interaction strategies in specific scenarios is obtained.
  • the emotion engine model enables the intelligent terminal to automatically output an anthropomorphic voice interaction strategy according to the specific scenario, achieving more intelligent and humanized voice interaction.
  • the present disclosure also provides an intelligent terminal, and a functional block diagram thereof may be shown in FIG. 4.
  • the intelligent terminal includes a processor, a memory, a network interface, a display screen, and a temperature sensor connected through a system bus.
  • the processor of the intelligent terminal is used to provide computing and control capabilities.
  • the memory of the intelligent terminal includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system and computer programs.
  • the internal memory provides an environment for the operating system and computer programs in the non-volatile storage medium.
  • the network interface of the intelligent terminal is used to communicate with external terminals through a network connection.
  • the computer program is executed by the processor to implement a voice interaction method based on the emotion engine technology.
  • the display screen of the intelligent terminal may be a liquid crystal display screen or an electronic ink display screen.
  • the temperature sensor of the intelligent terminal is set in the interior of the intelligent terminal in advance to detect the current operating temperature of the internal device.
  • FIG. 4 is only a block diagram of a part of the structure related to the disclosed solution, and does not constitute a limitation on the smart terminal to which the disclosed solution is applied.
  • the specific smart terminal may include more or fewer components than shown in the figures, combine certain components, or have a different arrangement of components.
  • an intelligent terminal which includes a memory and a processor, and a computer program is stored in the memory.
  • the processor executes the computer program, at least the following steps may be implemented:
  • the emotion of the user is calculated through the emotion recognition model, and an anthropomorphic voice interaction strategy is generated based on the emotion of the user, and the voice interaction information is output.
  • when the processor executes the computer program, the following can also be implemented: starting a preset monitoring program to monitor whether the user performs voice interaction; when it is detected that the user performs voice interaction, starting the preset remote device or remote-control pickup device to obtain the voice information input by the user, and starting a preset camera to obtain the user's face image information.
  • one voice signal of the acquired voice information is converted into text information through the preset ASR voice recognition module, and the user's text emotional state is extracted from the text information; the other voice signal is passed through the preset voice emotion sensor to extract the user's audio emotional state; the user's expression state is extracted from the obtained facial image information through the preset expression recognition system; and the extracted text emotional state, audio emotional state, and expression state are input to a preset emotion recognition module for emotion recognition.
  • the processor when the processor executes the computer program, it can also be realized: after acquiring text information of the user's voice interaction, extract sentence information according to the voice information input by the user, and obtain the user's personal information from the memory map; Information input rule model, extract keywords, and get the user's first emotional state and first confidence score according to the keywords; input sentence information and user information into the deep learning model to get the user's second emotional state and second confidence score ; Determine the size of the first confidence score and the preset threshold, when the first confidence score is greater than the preset threshold, the first emotional state is used as the user's emotional state; when the first confidence score is less than the preset threshold, will The first emotional state and the second emotional state are dynamically sorted, and decisions are made based on the results of the dynamic sorting.
  • when the processor executes the computer program, the following can also be implemented: collecting the emotion data of multiple users in advance to generate an emotion database; inputting the acquired text emotional state, audio emotional state, and expression state into the emotion recognition model and performing a weighted calculation to obtain the user's emotion; comparing and matching the user's emotion with the preset emotion database to obtain the corresponding emotional feature information; performing emotional intention decision-making and user portrait filling based on the obtained emotional feature information; and generating emotional voice interaction information through the dialogue generation model based on the obtained emotional intention decision result and the user portrait information.
  • the dialogue generation model receives the question information input by the user, records the user's historical dialogue information, position change information, and mood change information, analyzes the user's personal information and activity status, and obtains the user's portrait information; voice interaction information is generated according to the question information and the user portrait information, and this voice interaction information can also be used to update the dialogue generation model.
  • in this way, not only can emotional response information be produced according to the user's emotional state, but different voice interaction strategies can also be produced in real time according to changes in the user's emotions, and the emotions carried in the voice interaction strategy will also change in real time.
  • when the processor executes the computer program, the following can also be implemented: using the user's emotion and the obtained emotional intention decision result as the first input of the network model; using the custom scene structured data as the second input of the network model; and obtaining, through the learning and training of the network model, an emotion engine model that outputs an anthropomorphic voice interaction strategy in a specific scene.
  • the emotion engine model enables the intelligent terminal to automatically output an anthropomorphic voice interaction strategy according to the specific scene, achieving more intelligent and user-friendly voice interaction.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.
  • the present disclosure provides a voice interaction method based on the emotion engine technology.
  • the method includes: obtaining voice information input by a user and obtaining face image information of the user; extracting emotion recognition features from the voice information and the face image information, and inputting the extracted emotion recognition features into a preset emotion recognition model; and calculating the user's emotion through the emotion recognition model, generating an anthropomorphic voice interaction strategy based on the user's emotion, and outputting voice interaction information.
  • the present disclosure analyzes the user's emotions and adds emotions to the voice interaction, thereby shaping an emotional intelligent voice interaction mode, getting rid of the traditional voice interaction system's mechanized and passive communication mode, and providing convenience for users.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

Disclosed are an emotion engine technology-based voice interaction method, a storage medium, and a smart terminal. The method comprises: acquiring voice information input by a user and acquiring face image information of the user; extracting emotion recognition features from the voice information and face image information and inputting the extracted emotion recognition features into a preset emotion recognition model; calculating the emotion of the user by means of the emotion recognition model, generating a personified voice interaction strategy on the basis of the emotion of the user, and outputting voice interaction information. In the present disclosure, a user's emotion is analyzed, and the emotion is added into voice interaction, thereby developing an emotional intelligent voice interaction mode, getting rid of inflexible and passive communication modes of traditional voice interaction systems, and providing convenience for users.

Description

Voice interaction method, intelligent terminal and storage medium based on emotion engine technology
Priority
This PCT patent application claims priority to the Chinese patent application with application number 201811605103.5 and filing date December 26, 2018, and incorporates the technical solutions of that application.
Technical field
The present disclosure relates to the field of Internet interaction technology, and in particular, to a voice interaction method based on emotion engine technology, an intelligent terminal, and a storage medium.
Background
With the continuous innovation of human-computer interaction technology, the ways people interact are constantly changing, from the mouse, keyboard and remote control to the touch screen, and interaction is becoming ever simpler. In the era of the first computing platform, humans could interact with machines only through the keyboard and mouse; the technology of that period existed only in the computer room, and operation was very cumbersome. In the era of the second platform, computers added relatively friendly interactive interface designs; people no longer needed to enter commands at the DOS prompt and could interact with the computer through simple interface operations, so the interactive experience was greatly improved. In the era of the third platform, touch-screen technology rose, and people could complete interactive operations simply by moving their fingers; freed from the constraints of auxiliary interactive devices such as the keyboard and mouse, interaction became more convenient, which also made the reform of mobile devices possible, so that the technology can exist in everyone's pocket. The rise of artificial intelligence technology has made an even more natural way of interaction possible: natural language conversation. Users can interact with the machine and obtain information through natural language, and with conversational interaction as the core, the combination of voice technology, image technology, face recognition technology and enhanced display technology enables the technology to exist in ubiquitous devices.
Conversational artificial intelligence is a major application of AI technology; it mainly refers to the use of speech recognition, semantic understanding, multi-turn dialogue, natural language understanding, and other technologies to allow users to communicate with robots in natural language. However, voice interaction between users and robots currently remains largely a passive, task-style dialogue in which a fixed dialogue management mechanism is used to ask back or answer the user. Although this approach can meet the user's basic dialogue needs, it cannot respond more intelligently based on the user's current emotions, which makes it inconvenient to use.
Therefore, the existing technology still needs to be improved and developed.
Summary of the invention
The technical problem to be solved by the present disclosure is, in view of the above-mentioned defects of the prior art, to provide a voice interaction method, an intelligent terminal and a storage medium based on emotion engine technology, aiming to solve the problem in the prior art that the dialogue between the user and an intelligent robot follows a fixed response mode, so that the intelligent robot cannot make more intelligent responses based on the user's current emotions.
The technical solutions adopted by the present disclosure to solve the technical problems are as follows:
A voice interaction method based on emotion engine technology, wherein the method includes:
Obtain the voice information input by the user, and obtain the user's face image information;
Extract emotion recognition features from the voice information and face image information, and input the extracted emotion recognition features into a preset emotion recognition model;
The emotion of the user is calculated through the emotion recognition model, an anthropomorphic voice interaction strategy is generated based on the emotion of the user, and the voice interaction information is output.
The voice interaction method based on the emotion engine technology, wherein the step of obtaining voice information input by the user and obtaining face image information of the user specifically includes:
Obtain the voice information input by the user through the preset remote device or remote-control pickup device;
Obtain the user's face image information through a preset camera device.
The voice interaction method based on the emotion engine technology, wherein the step of extracting emotion recognition features from the voice information and face image information, and inputting the extracted emotion recognition features into a preset emotion recognition model, specifically includes:
Convert one voice signal in the acquired voice information into text information through the ASR voice recognition module, and extract the user's text emotional state from the text information;
The other voice signal in the obtained voice information is used to extract the user's audio emotional state through a preset voice emotion sensor;
The obtained facial image information is used to extract the user's expression state through a preset expression recognition system;
The text emotional state, audio emotional state and expression state are input to a preset emotion recognition model.
The voice interaction method based on the emotion engine technology, wherein the step of extracting the user's text emotional state from the text information specifically includes:
Perform feature extraction on the text information, extract sentence information, and obtain the user's personal information from a preset memory map according to the sentence information;
The sentence information and the user's personal information are input into a preset emotional state recognition model to identify the user's text emotional state.
The voice interaction method based on the emotion engine technology, wherein the step of inputting the sentence information and the user's personal information into a preset emotion recognition model to recognize the user's textual emotion state specifically includes:
Extract keywords from the sentence information, and obtain the user's first emotional state and first confidence score according to the keywords;
Input the sentence information and the user's personal information into the emotion recognition model to obtain the user's second emotional state and second confidence score;
Compare the first confidence score with a preset threshold;
If the first confidence score is greater than the threshold, the first emotional state is used as the user's text emotional state; if the first confidence score is less than the threshold, the first emotional state and the second emotional state are dynamically sorted, and the user's text emotional state is determined according to the result of the dynamic sorting.
In the voice interaction method based on the emotion engine technology, the sentence information includes: Chinese word segmentation information of the sentence, part-of-speech tagging information of the sentence after word segmentation processing, sentence pattern information of the sentence, and sentence2vector information of the sentence.
In the voice interaction method based on the emotion engine technology, the parameters involved in the dynamic sorting include: text length, extracted keywords, the text input by the user, and the confidence scores of the first/second emotional states.
The voice interaction method based on the emotion engine technology, wherein the step of calculating the user's emotion through the emotion recognition model, generating an anthropomorphic voice interaction strategy based on the user's emotion, and outputting the voice interaction information specifically includes:
The emotion recognition model performs weighted calculation on the input text emotional state, audio emotional state and expression state to obtain the user's emotion;
Compare and match the obtained emotion with the emotion feature information in the preset emotion database to obtain the corresponding emotion feature information;
Based on the obtained emotional feature information, perform emotional intention decision-making and user portrait filling;
Based on the obtained emotional intention decision result and user portrait information, a dialogue generation model is used to generate voice interaction information with emotions, and the voice interaction information is output.
The emotion database includes a variety of emotions and the emotion feature information corresponding to each emotion.
In the voice interaction method based on the emotion engine technology, before the emotion recognition model performs the weighted calculation on the input text emotional state, audio emotional state and expression state, the method includes:
Different weights are set in advance for the text emotional state, audio emotional state, and expression state.
The voice interaction method based on the emotion engine technology, wherein the step of generating the voice interaction information with emotion through the dialogue generation model specifically includes:
The dialogue generation model receives the question information input by the user, and records the user's historical dialogue information, position change information, and mood change information;
Analyze the user's personal information and activity status according to the historical dialogue information, the position change information, and the mood change information, to obtain user portrait information;
Generate voice interaction information according to the question information and the user portrait information; the voice interaction information is also used to update the dialogue generation model.
The voice interaction method based on the emotion engine technology, wherein the dialogue generation module is implemented with a three-layer recurrent neural network (RNN) architecture and is based on the backpropagation algorithm.
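As a rough illustration of the three-layer recurrent architecture and backpropagation-based training recited above, a dialogue generation module could be sketched as follows. This is only an editorial sketch under assumptions: the vocabulary size, dimensions, tokenization, and the way question information and user portrait information are encoded are not specified in this disclosure.

```python
# Rough sketch only: three stacked RNN layers trained with backpropagation.
# All sizes and the input encoding are assumptions, not taken from the disclosure.

import torch
import torch.nn as nn

class DialogueRNN(nn.Module):
    def __init__(self, vocab_size=8000, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Three recurrent layers, matching the "three-layer RNN" wording above.
        self.rnn = nn.RNN(embed_dim, hidden_dim, num_layers=3, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids):
        x = self.embed(token_ids)   # (batch, seq, embed_dim)
        h, _ = self.rnn(x)          # (batch, seq, hidden_dim)
        return self.out(h)          # next-token scores per position

# Training would use ordinary backpropagation, e.g.:
#   logits = model(inputs)                                         # (B, L, V)
#   loss = nn.CrossEntropyLoss()(logits.transpose(1, 2), targets)  # targets: (B, L)
#   loss.backward(); optimizer.step()
```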
The voice interaction method based on the emotion engine technology, wherein the step of calculating the user's emotion through the emotion recognition model, generating an anthropomorphic voice interaction strategy based on the user's emotion, and outputting it further includes:
Use the user's emotion and the obtained emotional intention decision result as the first input of the network model;
Use the customized scene structured data as the second input of the network model;
Through the learning and training of the network model, an emotion engine model that outputs anthropomorphic voice interaction strategies under specific scenarios is obtained.
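The two-input arrangement recited in the preceding steps could be approximated by a small network such as the one below. This is an illustrative sketch only; the feature encodings, dimensions, and the space of output strategies are assumptions, not part of the disclosure.

```python
# Illustrative only: the first input encodes the user's emotion together with the
# emotional intention decision result, the second encodes the custom scene
# structured data; dimensions and the strategy space are assumed values.

import torch
import torch.nn as nn

class EmotionEngineModel(nn.Module):
    def __init__(self, emotion_dim=32, scene_dim=16,
                 hidden_dim=64, num_strategies=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(emotion_dim + scene_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_strategies),
        )

    def forward(self, emotion_and_intent, scene_features):
        # Concatenate the two inputs and score candidate anthropomorphic
        # voice interaction strategies for the given scene.
        x = torch.cat([emotion_and_intent, scene_features], dim=-1)
        return self.net(x)
```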
An intelligent terminal, comprising: a processor and a storage medium communicatively connected to the processor, the storage medium being adapted to store a plurality of instructions, and the processor being adapted to call the instructions in the storage medium to execute the steps of implementing the voice interaction method based on the emotion engine technology described in any one of the above.
A storage medium on which a plurality of instructions are stored, wherein the instructions are suitable for being loaded and executed by a processor to perform the steps of implementing any of the voice interaction methods based on the emotion engine technology described above.
Beneficial effects of the present disclosure: the present disclosure analyzes the user's emotions and adds emotion to voice interaction, thereby shaping an emotional, intelligent voice interaction mode, enabling more interesting voice interaction between the user and the smart terminal, getting rid of the mechanized and passive communication mode of traditional voice interaction systems, and providing convenience for users.
Brief description of the drawings
FIG. 1 is a flowchart of a preferred embodiment of the voice interaction method based on emotion engine technology of the present disclosure.
FIG. 2 is a general control flowchart of the voice interaction method based on the emotion engine technology of the present disclosure.
FIG. 3 is a logic flow diagram of the emotion recognition system of the voice interaction method based on emotion engine technology of the present disclosure.
FIG. 4 is a functional schematic diagram of the smart terminal of the present disclosure.
Detailed description
In order to make the purpose, technical solutions and advantages of the disclosure more clear and unambiguous, the disclosure will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present disclosure and are not intended to limit the present disclosure.
The technical solutions in the embodiments of the present disclosure will be described clearly and completely in conjunction with the drawings in the embodiments of the present disclosure. Obviously, the described embodiments are only a part of the embodiments of the present disclosure, not all of them. The following description of at least one exemplary embodiment is merely illustrative and in no way serves as any limitation on the present disclosure or its application or use. Based on the embodiments in the present disclosure, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present disclosure.
The voice interaction method based on the emotion engine technology provided by the present disclosure can be applied to a terminal. The terminal may be, but is not limited to, various personal computers, notebook computers, mobile phones, tablet computers, in-vehicle computers, and portable wearable devices. The terminal of the present disclosure uses a multi-core processor. The processor of the terminal may be at least one of a central processing unit (CPU), a graphics processing unit (GPU), a video processing unit (VPU), and the like.
The present disclosure provides a voice interaction method based on emotion engine technology. Specifically, as shown in FIG. 1, the method includes:
Step S100: Acquire voice information input by the user, and acquire face image information of the user.
Step S200: Extract emotion recognition features from the voice information and face image information, and input the extracted emotion recognition features to a preset emotion recognition model.
Step S300: Calculate the user's emotion through the emotion recognition model, generate an anthropomorphic voice interaction strategy based on the user's emotion, and output it.
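The three steps above can be pictured as one processing chain. The following minimal Python sketch is provided only as an editorial illustration; the component callables (acquire_voice, acquire_face_image, extract_features, recognize_emotion, generate_strategy, output) are hypothetical placeholders for the modules described in the remainder of this embodiment, not an implementation disclosed here.

```python
# Illustrative only: the callables are hypothetical stand-ins for the acquisition
# devices, feature extraction, emotion recognition model and strategy generation.

def voice_interaction_round(acquire_voice, acquire_face_image,
                            extract_features, recognize_emotion,
                            generate_strategy, output):
    voice = acquire_voice()                      # Step S100: voice input
    face = acquire_face_image()                  # Step S100: face image
    features = extract_features(voice, face)     # Step S200: emotion features
    emotion = recognize_emotion(features)        # Step S300: user's emotion
    reply = generate_strategy(emotion, voice)    # anthropomorphic strategy
    output(reply)                                # output voice interaction info
    return reply
```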
Since current voice interaction still stays at the level of passive, task-style dialogue, in which the user is asked back or answered through a fixed dialogue management mechanism, it is usually dull and uninteresting. To solve the above problems, this embodiment provides a voice interaction method based on the emotion engine technology, which mainly analyzes the user's emotions and adds emotion to the voice interaction, thereby shaping an emotional, intelligent voice interaction method, getting rid of the mechanized and passive communication mode of traditional voice interaction systems, and providing convenience for users.
Specifically, in this embodiment, whether the user is performing voice interaction is monitored in real time. When it is detected that the user is performing voice interaction, the voice information input by the user is obtained through a preset remote device or remote-control pickup device. Considering that the user's facial expressions also change in different emotional states, and that these changes likewise represent the user's emotional state, this embodiment presets a camera device, and the user's face image information is acquired in real time through the preset camera device while the user performs voice interaction. Combining the voice information with the user's face image information makes it possible to determine the user's current mood more accurately.
Further, the acquired voice information includes the language and text content of the user's speech as well as the tone and speaking-rate information; for example, a happy expression in the user's language indicates that the user may currently be in a relatively happy state, while faster speech and a louder voice indicate that the user is in a more excited state. In addition, certain words in the user's voice information can also indicate the user's current emotional state; for example, if the user's voice information contains the words "very annoying", this shows that the user is rather anxious. Therefore, in order to better analyze the voice information, as shown in FIG. 2, in this embodiment the obtained voice information is divided into two voice signals: one voice signal is converted into text information through a preset ASR (Automatic Speech Recognition) module, and the user's text emotional state is extracted from the text information; the other voice signal is passed through a preset voice emotion sensor to extract the user's audio emotional state. Since the user's facial expressions change under different emotional states, the user's expression state can be extracted from the obtained face image information through a preset expression recognition system. Finally, the extracted text emotional state, audio emotional state, and expression state are input to a preset emotion recognition module for emotion recognition, which can recognize the user's emotion more accurately.
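The two-path handling of the voice signal and the separate expression path described above can be sketched as follows. The five callables are assumed placeholders for the preset ASR module, voice emotion sensor, expression recognition system, and emotion recognition model; the disclosure does not provide code for them.

```python
# Minimal sketch, assuming the callables below stand in for the preset ASR module,
# voice emotion sensor, expression recognition system and emotion recognition model.

def extract_emotion_states(voice_signal, face_image,
                           asr_transcribe, text_emotion,
                           audio_emotion, face_expression, emotion_model):
    # Path 1: voice signal -> ASR text -> text emotional state
    text = asr_transcribe(voice_signal)
    text_state = text_emotion(text)
    # Path 2: the same voice signal -> audio emotional state (tone, rate)
    audio_state = audio_emotion(voice_signal)
    # Path 3: face image -> expression state
    expression_state = face_expression(face_image)
    # All three states are fed to the preset emotion recognition model
    return emotion_model(text_state, audio_state, expression_state)
```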
Specifically, as shown in FIG. 3, in this embodiment, extracting the user's text emotional state from the text information specifically includes the following steps:
Step 301: Extract sentence information according to the user's input.
Step 302: Acquire the user's personal information from the memory map.
Step 303: Input the sentence information into the rule model, extract keywords, and obtain the user's first emotional state and first confidence score according to the keywords.
Step 304: Input the sentence information and user information into the deep learning model to obtain the user's second emotional state and second confidence score.
Step 305: Determine whether the first confidence score is greater than a preset threshold. If not, perform step 307; if yes, perform step 306.
Step 306: Use the first emotional state as the user's text emotional state.
Step 307: Dynamically sort the first emotional state and the second emotional state, and make a decision according to the result of the dynamic sorting.
In one implementation, the sentence information in the above steps includes: Chinese word segmentation information of the sentence, part-of-speech tagging information after word segmentation, sentence pattern information of the sentence, sentence2vector information of the sentence, etc.; the user's personal information includes: name, gender, birthday, age, constellation, the user's psychological state and physiological state, etc. The parameters involved in the dynamic sorting include: text length, extracted keywords, the text input by the user, the confidence scores of the first/second emotional states, etc. When the above first confidence score is less than the preset threshold, in this embodiment these parameters are used as inputs to the dynamic sorting model, the sorting result is influenced by assigning them different weights, and the user's text emotional state is finally determined according to the sorting result. The parameter selection and weight adjustment of the dynamic sorting are adjusted according to the performance of the overall model. The methods for extracting sentence information include existing Chinese word segmentation and part-of-speech tagging techniques, which will not be repeated here.
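The decision logic of steps 301 to 307 can be sketched as below. This is a hedged illustration only: sentence_info is assumed to be a dictionary carrying the extracted text and keywords, the rule model and deep learning model are assumed to return an (emotional state, confidence score) pair, and the threshold and ranking terms are placeholder values, since the actual dynamic sorting parameters and weights are tuned against the overall model as noted above.

```python
# Illustrative sketch of steps 301-307; threshold and ranking terms are placeholders.

def decide_text_emotion(sentence_info, user_info, rule_model, dl_model,
                        threshold=0.8):
    first_state, first_score = rule_model(sentence_info)             # step 303
    second_state, second_score = dl_model(sentence_info, user_info)  # step 304

    if first_score > threshold:                                      # steps 305/306
        return first_state

    # Step 307: dynamic sorting of the two candidates. As an example, the
    # rule-model candidate is favoured when many keywords were matched, and the
    # deep-learning candidate when the text is long; the real parameters and
    # weights would be adjusted according to the overall model's performance.
    keyword_hits = len(sentence_info.get("keywords", []))
    text_len = len(sentence_info.get("text", ""))
    rule_rank = first_score + 0.1 * min(keyword_hits, 3)
    dl_rank = second_score + 0.1 * min(text_len / 50.0, 1.0)
    return first_state if rule_rank >= dl_rank else second_state
```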
Further, in this embodiment, emotion data from multiple users is collected in advance to build an emotion database. In one implementation, the emotion database contains 22 emotions covering human feelings such as joy, anger, sorrow, and happiness, and also stores the emotion feature information corresponding to each emotion. For example, the emotion feature information for "happy" in the database includes the corresponding expression image data (such as upturned corners of the mouth), the corresponding high-frequency words (such as "happy" or "glad"), and the corresponding tone and intonation information (such as a cheerful intonation). Therefore, once a happy emotion is found in the emotion database, the corresponding emotion feature information can be obtained; conversely, the corresponding emotional state can also be looked up in the database from the emotion feature information.
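For illustration, one way the emotion database entries described above could be organized is sketched below; the dictionary layout and the sample entries are assumptions, and the actual database is said to cover 22 emotions.

```python
# Assumed structure for the emotion database; only two illustrative entries shown.
EMOTION_DB = {
    "happy": {
        "expression_images": ["mouth_corners_up"],      # expression image data
        "high_frequency_words": ["happy", "glad"],      # high-frequency words
        "intonation": "cheerful",                        # tone / intonation info
    },
    "anxious": {
        "expression_images": ["brow_furrowed"],
        "high_frequency_words": ["annoyed", "worried"],
        "intonation": "tense",
    },
    # ... entries for the remaining emotions
}

def features_for(emotion: str) -> dict:
    """Look up emotion feature information from an emotion."""
    return EMOTION_DB.get(emotion, {})

def emotion_for(word: str) -> str:
    """Reverse lookup: find the emotion whose feature words contain a given word."""
    for emotion, features in EMOTION_DB.items():
        if word in features["high_frequency_words"]:
            return emotion
    return "neutral"

print(features_for("happy"))
print(emotion_for("worried"))  # -> "anxious"
```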
In specific implementations, considering that the user's voice information and facial image information may carry different weights in the final emotional state judgment in different application scenarios, this embodiment inputs the acquired text emotional state, audio emotional state, and expression state into the emotion recognition model, which performs a weighted calculation on them and compares and matches the result against the preset emotion database to obtain the user's emotion. Specifically, the emotion recognition model is obtained in advance by feeding collected text emotional states, audio emotional states, and expression states into a network model for deep learning and training. In this embodiment, different weights can be preset for the text emotional state, the audio emotional state, and the expression state; for example, the text emotional state may be weighted at 20%, the audio emotional state at 50%, and the expression state at 30%, and the calculation based on these weights yields the emotion closest to the user's current emotional state. The resulting user emotion is then compared and matched in the emotion database to obtain the corresponding emotion feature information, which is used for emotion intention decision-making and user portrait filling so as to generate voice interaction information carrying emotion. For example, when the user's emotion is calculated to be happy, the emotion feature information corresponding to "happy" includes frequently occurring words such as "happy" and "glad", expression images with upturned mouth corners, and a cheerful intonation; from this feature information the user portrait and the user's current specific emotion can be determined, the smart terminal can make the corresponding emotion intention decision (that is, the emotional feedback the smart terminal should give in response to the user's emotion), and it can produce response information carrying the corresponding emotion, that is, a reply that also carries a happy emotion, achieving a more humanized voice interaction.
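A minimal sketch of the weighted fusion step, using the example weights from the text (20% text, 50% audio, 30% expression), might look as follows; the per-emotion voting scheme is an assumption, since the disclosure only states that a weighted calculation is performed.

```python
# Example weights taken from the description; the voting scheme is assumed.
WEIGHTS = {"text": 0.2, "audio": 0.5, "expression": 0.3}

def fuse_emotions(text_state: str, audio_state: str, expression_state: str) -> str:
    """Accumulate weighted votes per candidate emotion and pick the strongest."""
    votes: dict[str, float] = {}
    for source, state in (("text", text_state),
                          ("audio", audio_state),
                          ("expression", expression_state)):
        votes[state] = votes.get(state, 0.0) + WEIGHTS[source]
    # The emotion with the highest weighted vote is taken as the user's emotion,
    # which is then matched against the emotion database.
    return max(votes, key=votes.get)

# Example: text and audio both suggest "happy", the face looks "excited"
print(fuse_emotions("happy", "happy", "excited"))  # -> "happy" (weight 0.7)
```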
Further, in this embodiment, a dialogue generation module is used to produce the reply during voice interaction. Specifically, the dialogue generation module receives the question information input by the user, records the user's historical dialogue information, position change information, and emotion change information, and then analyzes the user's personal information and activity state from this information to obtain user portrait information; voice interaction information is then generated from the question information and the user portrait information (the user portrait information at this point having been derived from the emotion feature information corresponding to the user's emotion). It can be seen that this embodiment not only produces response information carrying emotion according to the user's emotional state, but also adjusts the voice interaction strategy in real time as the user's emotion changes, so that the emotion carried by the strategy also changes in real time. In one implementation, the dialogue generation module is implemented with a three-layer recurrent neural network (RNN) architecture trained with the backpropagation (BP) algorithm. In one implementation, the more complete the user information in the dialogue generation model, the more accurate the voice interaction information; therefore, the method provided by this embodiment further includes adding the voice interaction information to the dialogue generation model, where a mixture of rule-based, machine learning, and deep learning techniques can be used to store the voice interaction information and to continue learning and training the dialogue generation model, thereby updating it so that it generates emotional voice response information more effectively.
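As an illustrative sketch only, assuming PyTorch, a three-layer RNN reply generator trained with backpropagation as described above might be set up as follows; the vocabulary size, layer widths, and training data are invented.

```python
# Assumed sketch of a three-layer RNN dialogue model trained with backpropagation.
import torch
import torch.nn as nn

class DialogueRNN(nn.Module):
    def __init__(self, vocab_size: int = 5000, embed_dim: int = 128,
                 hidden_dim: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Three stacked recurrent layers, per the "three-layer RNN" description
        self.rnn = nn.RNN(embed_dim, hidden_dim, num_layers=3, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        x = self.embed(token_ids)
        h, _ = self.rnn(x)
        return self.out(h)          # next-token logits for each position

model = DialogueRNN()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# One illustrative backpropagation step on dummy token sequences
inputs = torch.randint(0, 5000, (2, 10))      # batch of 2 input sequences
targets = torch.randint(0, 5000, (2, 10))     # next-token targets
optimizer.zero_grad()
logits = model(inputs)
loss = loss_fn(logits.reshape(-1, 5000), targets.reshape(-1))
loss.backward()
optimizer.step()
```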
Further, in this embodiment, considering that the character attributes of the interacting party differ between scenarios, corresponding scene structured data is configured according to the character attributes of each scenario. After the user's emotion is obtained, the user's emotion and the resulting emotion intention decision result (that is, the emotional feedback the smart terminal should give in response to the user's emotion) are used as the first input of a network model, and the customized scene structured data is used as the second input; through the learning and training of the network model, an emotion engine model is obtained that outputs an anthropomorphic voice interaction strategy for a specific scenario. This emotion engine model enables the smart terminal to automatically output an anthropomorphic voice interaction strategy according to the specific scenario, achieving more intelligent and humanized voice interaction.
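A hedged sketch of a two-input network of the kind described above is given below, again assuming PyTorch; the feature dimensions, the number of candidate strategies, and the architecture itself are assumptions rather than details taken from the disclosure.

```python
# Assumed sketch of an emotion engine model with two inputs:
# (1) user emotion + emotion intention decision, (2) scene structured data.
import torch
import torch.nn as nn

class EmotionEngine(nn.Module):
    def __init__(self, emotion_dim: int = 24, scene_dim: int = 16,
                 num_strategies: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(emotion_dim + scene_dim, 64),
            nn.ReLU(),
            nn.Linear(64, num_strategies),   # one logit per interaction strategy
        )

    def forward(self, emotion_intent: torch.Tensor,
                scene_data: torch.Tensor) -> torch.Tensor:
        # First input: emotion + intention decision; second input: scene data
        x = torch.cat([emotion_intent, scene_data], dim=-1)
        return self.net(x)

engine = EmotionEngine()
strategy_logits = engine(torch.rand(1, 24), torch.rand(1, 16))
strategy = strategy_logits.argmax(dim=-1)    # index of the chosen strategy
print(strategy)
```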
Based on the above embodiment, the present disclosure further provides a smart terminal, whose functional block diagram may be as shown in FIG. 4. The smart terminal includes a processor, a memory, a network interface, a display screen, and a temperature sensor connected through a system bus. The processor of the smart terminal provides computing and control capabilities. The memory of the smart terminal includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface of the smart terminal communicates with external terminals through a network connection. When the computer program is executed by the processor, it implements a voice interaction method based on emotion engine technology. The display screen of the smart terminal may be a liquid crystal display or an electronic ink display, and the temperature sensor is arranged inside the smart terminal in advance to detect the current operating temperature of the internal devices.
Those skilled in the art will understand that the block diagram shown in FIG. 4 is only a block diagram of part of the structure related to the solution of the present disclosure and does not limit the smart terminal to which the solution is applied; a specific smart terminal may include more or fewer components than shown in the figure, may combine certain components, or may have a different arrangement of components.
In one embodiment, a smart terminal is provided, including a memory and a processor; a computer program is stored in the memory, and when the processor executes the computer program, at least the following steps can be implemented:
acquiring voice information input by a user, and acquiring face image information of the user;
extracting emotion recognition features from the voice information and the face image information, and inputting the extracted emotion recognition features into a preset emotion recognition model;
calculating the user's emotion through the emotion recognition model, generating an anthropomorphic voice interaction strategy based on the user's emotion, and outputting voice interaction information.
In one of the embodiments, when executing the computer program the processor can further implement: starting a preset listening program to monitor whether the user initiates voice interaction; when voice interaction is detected, starting a preset remote device or remote-control pickup device to acquire the voice information input by the user, and starting a preset camera to acquire the user's face information; splitting the acquired voice information into two voice signals, converting one voice signal into text information through the preset ASR speech recognition module and extracting the user's text emotional state from the text information, and extracting the user's audio emotional state from the other voice signal through the preset voice emotion perceptron; extracting the user's expression state from the acquired face image information through the preset expression recognition system; and inputting the extracted text emotional state, audio emotional state, and expression state into the preset emotion recognition module for emotion recognition.
In one of the embodiments, when executing the computer program the processor can further implement: after the text information of the user's voice interaction is obtained, extracting sentence information from the voice information input by the user and acquiring the user's personal information from the memory map; inputting the sentence information into the rule model, extracting keywords, and obtaining the user's first emotional state and first confidence score from the keywords; inputting the sentence information and the user information into the deep learning model to obtain the user's second emotional state and second confidence score; comparing the first confidence score with the preset threshold, and when the first confidence score is greater than the preset threshold, taking the first emotional state as the user's emotional state; and when the first confidence score is less than the preset threshold, dynamically ranking the first emotional state and the second emotional state and making the decision according to the ranking result.
In one of the embodiments, when executing the computer program the processor can further implement: collecting emotion data of multiple users in advance to generate the emotion database; inputting the acquired text emotional state, audio emotional state, and expression state into the emotion recognition model and performing the weighted calculation to obtain the user's emotion; comparing and matching the user's emotion against the preset emotion database to obtain the corresponding emotion feature information; performing emotion intention decision-making and user portrait filling based on the obtained emotion feature information; and generating voice interaction information carrying emotion through the dialogue generation model according to the obtained emotion intention decision result and the user portrait information. In the specific voice interaction process, the dialogue generation model receives the question information input by the user, records the user's historical dialogue information, position change information, and emotion change information, analyzes the user's personal information and activity state, and obtains user portrait information; voice interaction information is generated from the question information and the user portrait information, and this voice interaction information can also be used to update the dialogue generation model. In this embodiment, not only can response information carrying emotion be produced according to the user's emotional state, but different voice interaction strategies can also be produced in real time as the user's emotion changes, and the emotion carried by the voice interaction strategy changes in real time accordingly.
In one of the embodiments, when executing the computer program the processor can further implement: using the user's emotion and the obtained emotion intention decision result as the first input of the network model; using the customized scene structured data as the second input of the network model; and, through the learning and training of the network model, obtaining the emotion engine model that outputs an anthropomorphic voice interaction strategy for a specific scenario, enabling the smart terminal to automatically output an anthropomorphic voice interaction strategy according to the specific scenario and achieving more intelligent and humanized voice interaction.
A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by instructing the relevant hardware through a computer program, and the computer program can be stored in a non-volatile computer-readable storage medium; when executed, the computer program may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database, or other media used in the embodiments provided by the present disclosure may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or an external cache. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
In summary, the present disclosure provides a voice interaction method based on emotion engine technology. The method includes: acquiring voice information input by a user and acquiring face image information of the user; extracting emotion recognition features from the voice information and the face image information, and inputting the extracted emotion recognition features into a preset emotion recognition model; and calculating the user's emotion through the emotion recognition model, generating an anthropomorphic voice interaction strategy based on the user's emotion, and outputting voice interaction information. By analyzing the user's emotion and adding emotion to the voice interaction, the present disclosure shapes an emotional, intelligent voice interaction mode, breaks away from the mechanized, passive communication mode of traditional voice interaction systems, and provides convenience for users.
It should be understood that the application of the present disclosure is not limited to the above examples. A person of ordinary skill in the art can make improvements or modifications according to the above description, and all such improvements and modifications shall fall within the scope of protection of the claims appended to the present disclosure.

Claims (15)

  1. A voice interaction method based on emotion engine technology, wherein the method comprises:
    acquiring voice information input by a user, and acquiring face image information of the user;
    extracting emotion recognition features from the voice information and the face image information, and inputting the extracted emotion recognition features into a preset emotion recognition model; and
    calculating the user's emotion through the emotion recognition model, generating an anthropomorphic voice interaction strategy based on the user's emotion, and outputting voice interaction information.
  2. The voice interaction method based on emotion engine technology according to claim 1, wherein the step of acquiring the voice information input by the user and acquiring the face image information of the user specifically comprises:
    acquiring the voice information input by the user through a preset remote device or a remote-control pickup device; and
    acquiring the face image information of the user through a preset camera device.
  3. The voice interaction method based on emotion engine technology according to claim 1, wherein the step of extracting emotion recognition features from the voice information and the face image information and inputting the extracted emotion recognition features into the preset emotion recognition model specifically comprises:
    converting one voice signal in the acquired voice information into text information through an ASR speech recognition module, and extracting the user's text emotional state from the text information;
    extracting the user's audio emotional state from the other voice signal in the acquired voice information through a preset voice emotion perceptron;
    extracting the user's expression state from the acquired face image information through a preset expression recognition system; and
    inputting the text emotional state, the audio emotional state, and the expression state into the preset emotion recognition model.
  4. The voice interaction method based on emotion engine technology according to claim 3, wherein the step of extracting the user's text emotional state from the text information specifically comprises:
    performing feature extraction on the text information to extract sentence information, and acquiring the user's personal information from a preset memory map according to the sentence information; and
    inputting the sentence information and the user's personal information into the preset emotion recognition model to identify the user's text emotional state.
  5. The voice interaction method based on emotion engine technology according to claim 4, wherein the step of inputting the sentence information and the user's personal information into the preset emotion recognition model to identify the user's text emotional state specifically comprises:
    extracting keywords from the sentence information, and obtaining the user's first emotional state and first confidence score according to the keywords;
    inputting the sentence information and the user's personal information into the emotion recognition model to obtain the user's second emotional state and second confidence score;
    comparing the first confidence score with a preset threshold; and
    if the first confidence score is greater than the threshold, taking the first emotional state as the user's text emotional state; if the first confidence score is less than the threshold, dynamically ranking the first emotional state and the second emotional state and determining the user's text emotional state according to the ranking result.
  6. The voice interaction method based on emotion engine technology according to claim 5, wherein the sentence information comprises: Chinese word segmentation information of the sentence, part-of-speech tagging information of the sentence after word segmentation, sentence pattern information of the sentence, and sentence2vector information of the sentence.
  7. The voice interaction method based on emotion engine technology according to claim 5, wherein the parameters involved in the dynamic ranking comprise: text length, extracted keywords, text input by the user, and the confidence scores of the first and second emotional states.
  8. The voice interaction method based on emotion engine technology according to claim 1, wherein the step of calculating the user's emotion through the emotion recognition model, generating an anthropomorphic voice interaction strategy based on the user's emotion, and outputting voice interaction information specifically comprises:
    performing, by the emotion recognition model, a weighted calculation on the input text emotional state, audio emotional state, and expression state to obtain the user's emotion;
    comparing and matching the obtained emotion with emotion feature information in a preset emotion database to obtain the corresponding emotion feature information;
    performing emotion intention decision-making and user portrait filling based on the obtained emotion feature information to obtain an emotion intention decision result; and
    generating voice interaction information carrying emotion through a dialogue generation model according to the obtained emotion intention decision result and the user portrait information, and outputting the voice interaction information.
  9. The voice interaction method based on emotion engine technology according to claim 8, wherein the emotion database comprises multiple emotions and the emotion feature information corresponding to each emotion.
  10. The voice interaction method based on emotion engine technology according to claim 1, wherein before the emotion recognition model performs the weighted calculation on the input text emotional state, audio emotional state, and expression state, the method comprises:
    presetting different weights for the text emotional state, the audio emotional state, and the expression state.
  11. The voice interaction method based on emotion engine technology according to claim 9, wherein the step of generating voice interaction information carrying emotion through the dialogue generation model specifically comprises:
    receiving, by the dialogue generation model, the question information input by the user, and recording the user's historical dialogue information, position change information, and emotion change information;
    analyzing the user's personal information and activity state according to the historical dialogue information, the position change information, and the emotion change information to obtain user portrait information; and
    generating voice interaction information according to the question information and the user portrait information, wherein the voice interaction information is further used to update the dialogue generation model.
  12. The voice interaction method based on emotion engine technology according to claim 11, wherein the dialogue generation model is implemented with a three-layer recurrent neural network (RNN) architecture and is based on a backpropagation algorithm.
  13. The voice interaction method based on emotion engine technology according to claim 9, wherein the step of calculating the user's emotion through the emotion recognition model, generating an anthropomorphic voice interaction strategy based on the user's emotion, and outputting voice interaction information comprises:
    using the user's emotion and the obtained emotion intention decision result as the first input of a network model;
    using customized scene structured data as the second input of the network model; and
    obtaining, through the learning and training of the network model, an emotion engine model that outputs an anthropomorphic voice interaction strategy in a specific scenario.
  14. A smart terminal, comprising: a processor and a storage medium communicatively connected to the processor, wherein the storage medium is adapted to store a plurality of instructions, and the processor is adapted to call the instructions in the storage medium to perform the steps of the voice interaction method based on emotion engine technology according to any one of claims 1-13.
  15. A storage medium storing a plurality of instructions, wherein the instructions are adapted to be loaded and executed by a processor to perform the steps of the voice interaction method based on emotion engine technology according to any one of claims 1-13.
PCT/CN2019/126443 2018-12-26 2019-12-19 Emotion engine technology-based voice interaction method, smart terminal, and storage medium WO2020135194A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811605103.5A CN111368609B (en) 2018-12-26 2018-12-26 Speech interaction method based on emotion engine technology, intelligent terminal and storage medium
CN201811605103.5 2018-12-26

Publications (1)

Publication Number Publication Date
WO2020135194A1 true WO2020135194A1 (en) 2020-07-02

Family

ID=71128377

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/126443 WO2020135194A1 (en) 2018-12-26 2019-12-19 Emotion engine technology-based voice interaction method, smart terminal, and storage medium

Country Status (2)

Country Link
CN (1) CN111368609B (en)
WO (1) WO2020135194A1 (en)


Families Citing this family (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111881348A (en) * 2020-07-20 2020-11-03 百度在线网络技术(北京)有限公司 Information processing method, device, equipment and storage medium
CN111897434A (en) * 2020-08-05 2020-11-06 上海永骁智能技术有限公司 System, method, and medium for signal control of virtual portrait
CN112183197A (en) * 2020-08-21 2021-01-05 深圳追一科技有限公司 Method and device for determining working state based on digital person and storage medium
CN112235183B (en) * 2020-08-29 2021-11-12 上海量明科技发展有限公司 Communication message processing method and device and instant communication client
CN112148850A (en) * 2020-09-08 2020-12-29 北京百度网讯科技有限公司 Dynamic interaction method, server, electronic device and storage medium
CN112185422B (en) * 2020-09-14 2022-11-08 五邑大学 Prompt message generation method and voice robot thereof
CN112083806B (en) * 2020-09-16 2021-10-26 华南理工大学 Self-learning emotion interaction method based on multi-modal recognition
CN112297023B (en) * 2020-10-22 2022-04-05 新华网股份有限公司 Intelligent accompanying robot system
CN112455370A (en) * 2020-11-24 2021-03-09 一汽奔腾轿车有限公司 Emotion management and interaction system and method based on multidimensional data arbitration mechanism
CN112379780B (en) * 2020-12-01 2021-10-26 宁波大学 Multi-mode emotion interaction method, intelligent device, system, electronic device and medium
CN112633172B (en) * 2020-12-23 2023-11-14 平安银行股份有限公司 Communication optimization method, device, equipment and medium
CN112735440A (en) * 2020-12-30 2021-04-30 北京瞰瞰科技有限公司 Vehicle-mounted intelligent robot interaction method, robot and vehicle
CN114745349B (en) * 2021-01-08 2023-12-26 上海博泰悦臻网络技术服务有限公司 Comment method, electronic equipment and computer readable storage medium
CN113822967A (en) * 2021-02-09 2021-12-21 北京沃东天骏信息技术有限公司 Man-machine interaction method, device, system, electronic equipment and computer medium
CN112967725A (en) * 2021-02-26 2021-06-15 平安科技(深圳)有限公司 Voice conversation data processing method and device, computer equipment and storage medium
CN112990301A (en) * 2021-03-10 2021-06-18 深圳市声扬科技有限公司 Emotion data annotation method and device, computer equipment and storage medium
CN113270087A (en) * 2021-05-26 2021-08-17 深圳传音控股股份有限公司 Processing method, mobile terminal and storage medium
CN113434647B (en) * 2021-06-18 2024-01-12 竹间智能科技(上海)有限公司 Man-machine interaction method, system and storage medium
CN113852524A (en) * 2021-07-16 2021-12-28 天翼智慧家庭科技有限公司 Intelligent household equipment control system and method based on emotional characteristic fusion
CN113380271B (en) * 2021-08-12 2021-12-21 明品云(北京)数据科技有限公司 Emotion recognition method, system, device and medium
CN113687744B (en) * 2021-08-19 2022-01-18 北京智精灵科技有限公司 Man-machine interaction device for emotion adjustment
CN113580166B (en) * 2021-08-20 2023-11-28 安徽淘云科技股份有限公司 Interaction method, device, equipment and storage medium of anthropomorphic robot
CN113707185B (en) * 2021-09-17 2023-09-12 卓尔智联(武汉)研究院有限公司 Emotion recognition method and device and electronic equipment
CN114115533A (en) * 2021-11-11 2022-03-01 北京萌特博智能机器人科技有限公司 Intelligent interaction method and device
CN114237395A (en) * 2021-12-14 2022-03-25 北京百度网讯科技有限公司 Information processing method, information processing device, electronic equipment and storage medium
CN114516341A (en) * 2022-04-13 2022-05-20 北京智科车联科技有限公司 User interaction method and system and vehicle
CN114999534A (en) * 2022-06-10 2022-09-02 中国第一汽车股份有限公司 Method, device and equipment for controlling playing of vehicle-mounted music and storage medium
CN115238111B (en) * 2022-06-15 2023-11-14 荣耀终端有限公司 Picture display method and electronic equipment
CN115204127B (en) * 2022-09-19 2023-01-06 深圳市北科瑞声科技股份有限公司 Form filling method, device, equipment and medium based on remote flow adjustment
CN115334205B (en) * 2022-10-11 2022-12-27 北京资采信息技术有限公司 Voice outbound system and method adopting deep learning
CN115431288B (en) * 2022-11-10 2023-01-31 深圳市神州云海智能科技有限公司 Guide robot for emotion feedback and information interaction based on multi-element fusion information
CN116820250B (en) * 2023-08-29 2023-11-17 小舟科技有限公司 User interaction method and device based on meta universe, terminal and readable storage medium


Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106773923B (en) * 2016-11-30 2020-04-21 北京光年无限科技有限公司 Multi-mode emotion data interaction method and device for robot
CN107220591A (en) * 2017-04-28 2017-09-29 哈尔滨工业大学深圳研究生院 Multi-modal intelligent mood sensing system
CN107301168A (en) * 2017-06-01 2017-10-27 深圳市朗空亿科科技有限公司 Intelligent robot and its mood exchange method, system
CN107243905A (en) * 2017-06-28 2017-10-13 重庆柚瓣科技有限公司 Mood Adaptable System based on endowment robot

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106203344A (en) * 2016-07-12 2016-12-07 北京光年无限科技有限公司 A kind of Emotion identification method and system for intelligent robot
CN106570496A (en) * 2016-11-22 2017-04-19 上海智臻智能网络科技股份有限公司 Emotion recognition method and device and intelligent interaction method and device
CN107944008A (en) * 2017-12-08 2018-04-20 神思电子技术股份有限公司 A kind of method that Emotion identification is carried out for natural language
CN109036405A (en) * 2018-07-27 2018-12-18 百度在线网络技术(北京)有限公司 Voice interactive method, device, equipment and storage medium

Cited By (44)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111696556B (en) * 2020-07-13 2023-05-16 上海茂声智能科技有限公司 Method, system, equipment and storage medium for analyzing user dialogue emotion
CN111696556A (en) * 2020-07-13 2020-09-22 上海茂声智能科技有限公司 Method, system, equipment and storage medium for analyzing user conversation emotion
CN111858892A (en) * 2020-07-24 2020-10-30 中国平安人寿保险股份有限公司 Voice interaction method, device, equipment and medium based on knowledge graph
CN111858892B (en) * 2020-07-24 2023-09-29 中国平安人寿保险股份有限公司 Voice interaction method, device, equipment and medium based on knowledge graph
CN111897933A (en) * 2020-07-27 2020-11-06 腾讯科技(深圳)有限公司 Emotional dialogue generation method and device and emotional dialogue model training method and device
CN111897933B (en) * 2020-07-27 2024-02-06 腾讯科技(深圳)有限公司 Emotion dialogue generation method and device and emotion dialogue model training method and device
CN111883127A (en) * 2020-07-29 2020-11-03 百度在线网络技术(北京)有限公司 Method and apparatus for processing speech
CN112151027B (en) * 2020-08-21 2024-05-03 深圳追一科技有限公司 Method, device and storage medium for querying specific person based on digital person
CN112151027A (en) * 2020-08-21 2020-12-29 深圳追一科技有限公司 Specific person inquiry method, device and storage medium based on digital person
CN112002329A (en) * 2020-09-03 2020-11-27 深圳Tcl新技术有限公司 Physical and mental health monitoring method and device and computer readable storage medium
CN112002329B (en) * 2020-09-03 2024-04-02 深圳Tcl新技术有限公司 Physical and mental health monitoring method, equipment and computer readable storage medium
CN112034989A (en) * 2020-09-04 2020-12-04 华人运通(上海)云计算科技有限公司 Intelligent interaction system
CN112185389A (en) * 2020-09-22 2021-01-05 北京小米松果电子有限公司 Voice generation method and device, storage medium and electronic equipment
CN112100337B (en) * 2020-10-15 2024-03-05 平安科技(深圳)有限公司 Emotion recognition method and device in interactive dialogue
CN112100337A (en) * 2020-10-15 2020-12-18 平安科技(深圳)有限公司 Emotion recognition method and device in interactive conversation
CN112232276A (en) * 2020-11-04 2021-01-15 赵珍 Emotion detection method and device based on voice recognition and image recognition
CN112232276B (en) * 2020-11-04 2023-10-13 上海企创信息科技有限公司 Emotion detection method and device based on voice recognition and image recognition
CN112418034A (en) * 2020-11-12 2021-02-26 元梦人文智能国际有限公司 Multi-modal emotion recognition method and device, electronic equipment and storage medium
CN112687260A (en) * 2020-11-17 2021-04-20 珠海格力电器股份有限公司 Facial-recognition-based expression judgment voice recognition method, server and air conditioner
CN112650399A (en) * 2020-12-22 2021-04-13 科大讯飞股份有限公司 Expression recommendation method and device
CN112650399B (en) * 2020-12-22 2023-12-01 科大讯飞股份有限公司 Expression recommendation method and device
CN112785667A (en) * 2021-01-25 2021-05-11 北京有竹居网络技术有限公司 Video generation method, device, medium and electronic equipment
CN113269406A (en) * 2021-05-06 2021-08-17 京东数字科技控股股份有限公司 Method and device for evaluating online service, computer equipment and storage medium
CN113488024B (en) * 2021-05-31 2023-06-23 杭州摸象大数据科技有限公司 Telephone interrupt recognition method and system based on semantic recognition
CN113488024A (en) * 2021-05-31 2021-10-08 杭州摸象大数据科技有限公司 Semantic recognition-based telephone interruption recognition method and system
CN113645364A (en) * 2021-06-21 2021-11-12 国网浙江省电力有限公司金华供电公司 Intelligent voice outbound method facing power dispatching
CN113645364B (en) * 2021-06-21 2023-08-22 国网浙江省电力有限公司金华供电公司 Intelligent voice outbound method for power dispatching
CN113609851A (en) * 2021-07-09 2021-11-05 浙江连信科技有限公司 Psychological idea cognitive deviation identification method and device and electronic equipment
CN114416934A (en) * 2021-12-24 2022-04-29 北京百度网讯科技有限公司 Multi-modal dialog generation model training method and device and electronic equipment
CN114533063B (en) * 2022-02-23 2023-10-27 金华高等研究院(金华理工学院筹建工作领导小组办公室) Multi-source monitoring combined emotion computing system and method
CN114533063A (en) * 2022-02-23 2022-05-27 金华高等研究院(金华理工学院筹建工作领导小组办公室) Multi-source monitoring combined emotion calculation system and method
CN115830171A (en) * 2023-02-17 2023-03-21 深圳前海深蕾半导体有限公司 Image generation method based on artificial intelligence drawing, display device and storage medium
CN116030811A (en) * 2023-03-22 2023-04-28 广州小鹏汽车科技有限公司 Voice interaction method, vehicle and computer readable storage medium
CN116030811B (en) * 2023-03-22 2023-06-30 广州小鹏汽车科技有限公司 Voice interaction method, vehicle and computer readable storage medium
CN116643675A (en) * 2023-07-27 2023-08-25 苏州创捷传媒展览股份有限公司 Intelligent interaction system based on AI virtual character
CN116643675B (en) * 2023-07-27 2023-10-03 苏州创捷传媒展览股份有限公司 Intelligent interaction system based on AI virtual character
CN116821287B (en) * 2023-08-28 2023-11-17 湖南创星科技股份有限公司 Knowledge graph and large language model-based user psychological portrait system and method
CN116821287A (en) * 2023-08-28 2023-09-29 湖南创星科技股份有限公司 Knowledge graph and large language model-based user psychological portrait system and method
CN116935480B (en) * 2023-09-18 2023-12-29 四川天地宏华导航设备有限公司 Emotion recognition method and device
CN116935480A (en) * 2023-09-18 2023-10-24 四川天地宏华导航设备有限公司 Emotion recognition method and device
CN117153151A (en) * 2023-10-09 2023-12-01 广州易风健康科技股份有限公司 Emotion recognition method based on user intonation
CN117153151B (en) * 2023-10-09 2024-05-07 广州易风健康科技股份有限公司 Emotion recognition method based on user intonation
CN117371338A (en) * 2023-12-07 2024-01-09 浙江宇宙奇点科技有限公司 AI digital person modeling method and system based on user portrait
CN117371338B (en) * 2023-12-07 2024-03-22 浙江宇宙奇点科技有限公司 AI digital person modeling method and system based on user portrait

Also Published As

Publication number Publication date
CN111368609A (en) 2020-07-03
CN111368609B (en) 2023-10-17

Similar Documents

Publication Publication Date Title
WO2020135194A1 (en) Emotion engine technology-based voice interaction method, smart terminal, and storage medium
CN108334583B (en) Emotion interaction method and device, computer readable storage medium and computer equipment
CN108227932B (en) Interaction intention determination method and device, computer equipment and storage medium
CN110688911B (en) Video processing method, device, system, terminal equipment and storage medium
JP7022062B2 (en) VPA with integrated object recognition and facial expression recognition
CN105843381B (en) Data processing method for realizing multi-modal interaction and multi-modal interaction system
WO2020147428A1 (en) Interactive content generation method and apparatus, computer device, and storage medium
US9501743B2 (en) Method and apparatus for tailoring the output of an intelligent automated assistant to a user
WO2021000497A1 (en) Retrieval method and apparatus, and computer device and storage medium
CN110110169A (en) Man-machine interaction method and human-computer interaction device
KR102448382B1 (en) Electronic device for providing image related with text and operation method thereof
US20180129647A1 (en) Systems and methods for dynamically collecting and evaluating potential imprecise characteristics for creating precise characteristics
WO2020211820A1 (en) Method and device for speech emotion recognition
CN106502382B (en) Active interaction method and system for intelligent robot
CN110399837A (en) User emotion recognition methods, device and computer readable storage medium
KR101984283B1 (en) Automated Target Analysis System Using Machine Learning Model, Method, and Computer-Readable Medium Thereof
Hema et al. Emotional speech recognition using cnn and deep learning techniques
KR102222911B1 (en) System for Providing User-Robot Interaction and Computer Program Therefore
Verkholyak et al. Modeling short-term and long-term dependencies of the speech signal for paralinguistic emotion classification
CN114676259B (en) Conversation emotion recognition method based on causal perception interactive network
CN110909218A (en) Information prompting method and system in question-answering scene
CN110931002B (en) Man-machine interaction method, device, computer equipment and storage medium
CN109961152B (en) Personalized interaction method and system of virtual idol, terminal equipment and storage medium
Du et al. Composite Emotion Recognition and Feedback of Social Assistive Robot for Elderly People
US20240038225A1 (en) Gestural prompting based on conversational artificial intelligence

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19904022

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19904022

Country of ref document: EP

Kind code of ref document: A1