WO2020098756A1 - Emotion-based voice interaction method, storage medium and terminal device - Google Patents

Emotion-based voice interaction method, storage medium and terminal device Download PDF

Info

Publication number
WO2020098756A1
Authority
WO
WIPO (PCT)
Prior art keywords
emotion
voice
intention
type
voice information
Prior art date
Application number
PCT/CN2019/118580
Other languages
English (en)
French (fr)
Inventor
马小莉 (Xiaoli Ma)
Original Assignee
深圳Tcl新技术有限公司 (Shenzhen TCL New Technology Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳Tcl新技术有限公司 (Shenzhen TCL New Technology Co., Ltd.)
Priority to US17/261,832 (US11640832B2)
Priority to EP19885273.3A (EP3882910A4)
Publication of WO2020098756A1

Links

Images

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/279: Recognition of textual entities
    • G06F 40/30: Semantic analysis
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/04: Segmentation; Word boundary detection
    • G10L 15/08: Speech classification or search
    • G10L 2015/088: Word spotting
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223: Execution procedure of a spoken command
    • G10L 2015/225: Feedback of the input speech
    • G10L 25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L 25/48: Speech or voice analysis techniques specially adapted for particular use
    • G10L 25/51: Speech or voice analysis techniques specially adapted for comparison or discrimination
    • G10L 25/63: Speech or voice analysis techniques for estimating an emotional state

Definitions

  • the present disclosure relates to the technical field of intelligent terminals, and in particular, to an emotion-based voice interaction method, storage medium, and terminal equipment.
  • the present disclosure aims to provide an emotion-based voice interaction method, storage medium, and terminal device.
  • An emotion-based voice interaction method including:
  • a response voice of the voice information is generated according to the emotion type, and the response voice is played.
  • receiving the voice information to be processed and acquiring the intention type of the voice information specifically includes:
  • when the divided words include an emotional keyword, determining that the intention type of the voice information is emotional intention.
  • receiving the voice information to be processed, dividing the voice information into words to obtain several words, and judging whether the divided words include emotional keywords specifically includes:
  • the preset condition is that the part of speech of the word does not belong to the preset part of speech list.
  • the preset part-of-speech list includes non-keyword parts of speech, where a non-keyword part of speech is one that carries neither emotional nor action meaning.
  • determining that the intention type of the voice information is emotional intention specifically includes:
  • the intention type of the voice information is determined to be emotional intention.
  • the method further includes:
  • the intention type of the voice information is instruction intention.
  • the method further includes:
  • the intention type of the voice information is instruction intention.
  • the method further includes:
  • when the intention type is instruction intention, it is determined whether the instruction content can be determined from the instruction intention
  • when the instruction content cannot be determined from the instruction intention, the user is queried in a domain clarification manner until the instruction content can be determined, and the instruction corresponding to the instruction intention is executed.
  • the method further includes:
  • an encouraging voice is generated and played according to the instruction.
  • determining the emotion type of the voice information is specifically:
  • emotion analysis is performed on the speech information to obtain an emotion type corresponding to the speech information, wherein the emotion analysis includes one or more of vocabulary sentiment analysis, sentence-meaning sentiment analysis, and voice rhythm sentiment analysis.
  • generating a response voice of the voice information according to the emotion type and playing the response voice is specifically:
  • a response voice corresponding to the voice information is generated according to the emotion type, and the response voice is played, wherein the response voice includes an emotional response sentence and a function-oriented sentence.
  • according to the emotion empathy principle and the emotion orientation principle, generating the response voice corresponding to the voice information according to the emotion type, and playing the response voice specifically includes:
  • acquiring the voice characteristics of the voice information, and playing the response voice with the voice characteristics.
  • according to the emotion empathy principle and the emotion orientation principle, generating the response voice corresponding to the voice information according to the emotion type, and playing the response voice specifically includes:
  • an emotional visual image is generated based on the response voice, and the corresponding response voice is performed through the visual image.
  • before receiving the voice information to be processed and acquiring the intention type of the voice information, the method includes:
  • when a voice wake-up instruction is received, the voice listening mode is activated and a preset voice is actively played.
  • after generating a response voice of the voice information according to the emotion type and playing the response voice, the method further includes: recording the number of pieces of voice information whose intention type is emotional intention, and starting a preset active emotion mode when the number reaches a preset threshold.
  • a computer-readable storage medium stores one or more programs, and the one or more programs may be executed by one or more processors to implement the steps in the emotion-based voice interaction method described above.
  • a terminal device includes: a processor and a memory
  • a computer-readable program executable by the processor is stored on the memory
  • the present disclosure provides an emotion-based voice interaction method, storage medium, and terminal device.
  • the method includes: receiving voice information to be processed, and acquiring an intention type of the voice information; when the intention type is emotional intention, determining the emotion type of the voice information; and generating a response voice of the voice information according to the emotion type and playing the response voice.
  • the present disclosure judges the intention type of voice information and, when the intention type is emotional intention, generates a corresponding response voice according to the emotional intention, so that the response voice can match the user's emotional intention, achieving emotional interaction alongside voice interaction and bringing convenience to users.
  • FIG. 1 is a flowchart of an embodiment of an emotion-based voice interaction method provided by the present disclosure.
  • FIG. 2 is a flowchart of step S10 in an embodiment of an emotion-based voice interaction method provided by the present disclosure.
  • FIG. 3 is a flowchart of step S20 in an embodiment of an emotion-based voice interaction method provided by the present disclosure.
  • FIG. 4 is a structural schematic diagram of a terminal device provided by the present disclosure.
  • the present disclosure provides an emotion-based voice interaction method, storage medium, and terminal device.
  • the disclosure will be described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present disclosure and are not intended to limit the present disclosure.
  • This embodiment provides an emotion-based voice interaction method. As shown in FIG. 1, the method includes:
  • the voice information may be voice input by the user in real time and collected by the terminal device through a microphone, or voice sent to the terminal device by an external device through a network.
  • in practical applications, to reduce the load that voice interaction places on the terminal device, the process shown in FIG. 1 may be executed by a cloud server; that is, the terminal device receives the voice information to be processed and sends the voice information to a cloud server.
  • the cloud server can generate a response voice according to the obtained voice information and return the response voice to the terminal device, and the terminal device then plays the response voice to the user.
  • the terminal device is a terminal device with a voice function.
  • the execution subject of the process described in FIG. 1 may also be the terminal device itself.
  • the intention type refers to the type of meaning the voice information is expected to express, and the intention type is used to determine the response manner for the voice information, wherein voice information of different intention types corresponds to different response manners; in this way, the corresponding operation can be performed quickly according to the user's intention, improving the timeliness of the voice response.
  • the intention includes an emotional intention and an instruction intention.
  • the emotional intention means that the voice information is intended to express emotion, that is, the voice information includes an emotional expression
  • the instruction intention means that the voice information is intended to express an operation instruction, that is, the voice information contains only an instruction expression.
  • the intention type may be determined by judging whether the intention type is an emotional intention, and when the intention type is not an emotional intention, the intention type is an instruction intention.
  • the emotional intention may be determined by determining whether the voice information includes emotional keywords.
  • receiving the voice information to be processed and acquiring the intention type of the voice information specifically includes:
  • S11: Receive voice information to be processed, perform word division on the voice information to obtain several words, and determine whether the divided words include emotional keywords;
  • the emotional keywords are words with emotional connotation
  • the emotional keywords may be preset and stored in a keyword database.
  • after the voice information is divided into words, each divided word may be searched for in the keyword database; if a word is found, it is determined that the voice information includes emotional keywords, and if none is found, it is determined that the voice information does not include emotional keywords.
  • before the voice information is divided, it needs to be recognized to convert the voice information into text information; the text information is then divided into words. After word division, part-of-speech screening may be performed on the divided words, and words of non-keyword parts of speech (for example, adverbs and personal pronouns) deleted to improve the search speed for emotional keywords.
  • the non-keyword parts of speech can be stored in a part-of-speech list in advance. After the words are divided, the part of speech of each word can be obtained, and the divided words filtered according to the part-of-speech list to remove words whose part of speech appears in the list.
  • a non-keyword part of speech is one that has neither emotional nor action characteristics, where emotional characteristics refer to emotional connotation and action characteristics refer to action meaning.
  • for example, the text information corresponding to the voice information is "I'm really tired today"; word division yields "today", "really tired", and "ah", and filtering the words yields "today" and "really tired", where "tired" in "really tired" is an emotional keyword, so the intention type of the voice information can be determined to be emotional intention.
  • when the voice information does not include emotional keywords, it can be determined that the intention type of the voice information is instruction intention.
  • when the divided words are looked up in the keyword database, the database may match more than one of the divided words.
  • when multiple emotional keywords are found, it can be determined whether the emotion types corresponding to the multiple emotional keywords are the same; if they are the same, one of the multiple emotional keywords is selected as the emotional keyword of the voice information.
  • when the emotion types corresponding to the multiple emotional keywords differ, the intention type of the voice information may be determined to be instruction intention.
  • when the voice information includes multiple emotional keywords of different types, the mood and intonation corresponding to the voice information may be acquired, and the emotional keyword corresponding to the voice information determined according to the mood and intonation.
  • the emotion type refers to the emotional state of the user; for example, the emotion type may be happy, unhappy, angry, or sad.
  • the emotion type may be directly determined according to the emotion keyword, or may be determined according to the emotion keyword and the overall sentence meaning of the voice information.
  • when the intention type is emotional intention, determining the emotion type of the voice information specifically includes:
  • performing emotion analysis on the speech information, where the emotion analysis is one or more of vocabulary sentiment analysis, sentence-meaning sentiment analysis, and voice rhythm sentiment analysis;
  • vocabulary sentiment analysis performs sentiment analysis on the vocabulary of the speech information, where the emotions of Chinese words include commendatory words, derogatory words, positive and negative tone words, swear words, and so on, and different words have their own emotional representations.
  • sentence-meaning sentiment analysis performs sentiment analysis on the vocabulary and the complete sentence meaning of the speech information through natural language processing; it is mainly based on vocabulary sentiment analysis.
  • voice rhythm sentiment analysis analyzes the sound of the voice information and compares it against historical interaction records and a standard voice emotion rhythm library to judge the voice rhythm and predict the emotion. In this way, the emotion type corresponding to the speech information can be determined through vocabulary sentiment analysis, sentence-meaning sentiment analysis, and/or voice rhythm sentiment analysis.
  • in this embodiment, voice rhythm sentiment analysis is preferably used to determine the emotion type of the voice information, that is, to classify the emotional intention of the voice information, for example, as belonging to the sad type.
  • the voice rhythm sentiment analysis may also analyze phonetic elements to determine the emotion type corresponding to the emotional intention.
  • the phonetic elements may include the pitch, fluctuation, and tone of the voice, and the like. That is, the corresponding emotion type can be determined according to the pitch, fluctuation, and tone of the voice information.
  • pitch or fluctuation alone may be used in the same way; alternatively, ranges may be set for pitch, fluctuation, and tone respectively, and each is then compared against its corresponding range to determine the emotion type associated with the pitch, the fluctuation, and the tone.
  • the emotion type that occurs most often is then selected as the emotion type of the voice information.
  • when the pitch, fluctuation, and tone of the voice each correspond to a different emotion type, the emotion type corresponding to the voice information is determined according to preset preference levels of pitch, fluctuation, and tone, where these preference levels are selected in advance.
  • the response voice is voice information generated according to the emotion type of the voice information, and the response voice is generated based on the emotion empathy principle and the emotion orientation principle.
  • the emotion empathy principle means adopting the same emotion as that carried by the voice information
  • the emotion orientation principle means guiding the direction in which the user releases emotion. Therefore, the emotion empathy principle and the emotion orientation principle comprise two parts: an emotion empathy part and an emotion orientation domain. The emotion empathy part is used to resonate emotionally with the user, and the emotion orientation domain part is used to provide the user with a way to ease the emotion.
  • for example, the voice message is "I'm so tired today"
  • the response voice generated based on the emotion empathy principle and the emotion orientation principle can be "Oh, then relax and take a rest; how about listening to some music?", where "Oh, then relax and take a rest" is the emotion empathy part and "listening to some music" is the emotion orientation domain. This also improves the empathy of the response voice with the user and lets the emotion flow.
  • generating the response voice of the voice information according to the emotion type and playing the response voice is specifically: according to the emotion empathy principle and the emotion orientation principle, generating the response voice corresponding to the voice information according to the emotion type, and playing the response voice, where the response voice includes an emotional response sentence and a function-oriented sentence.
  • an emotional visual image corresponding to the response voice may also be generated, and the response voice and the emotional visual image may be invoked so that the visual image performs the corresponding response voice.
  • the voice characteristics of the voice information can be acquired and used to play the response voice, so that the response voice fits the context of the voice information.
  • the voice characteristics refer to the vocal features of the voice information.
  • the voice characteristics may include volume, pitch, and frequency.
  • the voice characteristics of the response voice are determined according to the volume, pitch, and frequency.
  • the manner of playing the response information is determined according to the voice characteristics and the user's accent, and the response information is played in the determined manner.
  • in order to improve the initiative of emotional voice interaction, after a response voice is generated according to the emotion type, the number of emotional voice interactions can be recorded, and when the number reaches a preset threshold, the preset active emotion mode is started automatically.
  • the method further includes:
  • the active emotion mode is preset; when the active emotion mode is turned on, the terminal device actively plays voice to the user when the terminal device is turned on or the voice function is woken up.
  • the preset threshold is set in advance, for example, 5. That is, when the number of emotional exchanges between the user and the terminal device reaches 5, the terminal device automatically starts the active emotion mode, in which the terminal device actively speaks to the user after being turned on, so as to interact emotionally with the user.
  • after the terminal device automatically starts the active emotion mode, the terminal device immediately and actively utters a voice; that is, at the moment the terminal device enters the active emotion mode, the terminal device speaks.
  • the method further includes:
  • when the intention type is instruction intention, it is determined whether the instruction intention is clear, that is, whether the instruction content can be determined from the instruction intention
  • when the instruction content cannot be determined, the domain clarification method is used to query the user until the instruction content can be determined, and the instruction corresponding to the instruction intention is executed.
  • the instruction intention indicates what instruction the user needs executed. When the instruction content cannot be determined from the user's current intention, a domain clarification inquiry can be made so that the user further clarifies the intention; once the instruction content can be determined, the corresponding operation is performed according to the instruction intention. For example, if the user says "Tian Mi Mi", the smart device uses domain clarification to generate and play the query voice "Do you want to watch the movie or listen to the song?". If the user then says "listen to the song", the smart device determines that the user's instruction intention is to play the song "Tian Mi Mi" and performs that operation.
  • the method further includes: when executing the instruction corresponding to the instruction intention, generating and playing an encouraging voice according to the instruction. For example, after the operation of playing the song "Tian Mi Mi" is performed, a voice such as "Please enjoy" can be generated and played.
  • the present disclosure also provides a computer-readable storage medium that stores one or more programs, and the one or more programs can be executed by one or more processors to implement the steps in the emotion-based voice interaction method described above.
  • the present disclosure also provides a terminal device, as shown in FIG. 4, which includes at least one processor (processor) 20; a display screen 21; and a memory (memory) 22, and may also include a communication interface (Communications Interface) 23 and a bus 24.
  • the processor 20, the display screen 21, the memory 22 and the communication interface 23 can complete communication with each other through the bus 24.
  • the display screen 21 is set to display a user guide interface preset in the initial setting mode.
  • the communication interface 23 can transmit information.
  • the processor 20 may call logical instructions in the memory 22 to execute the method in the above-mentioned embodiment.
  • logic instructions in the aforementioned memory 22 may be implemented in the form of software functional units and sold or used as independent products, and may be stored in a computer-readable storage medium.
  • the memory 22 as a computer-readable storage medium may be configured to store software programs and computer-executable programs, such as program instructions or modules corresponding to the method in the embodiments of the present disclosure.
  • the processor 20 executes functional applications and data processing by running software programs, instructions, or modules stored in the memory 22, that is, implementing the method in the foregoing embodiment.
  • the memory 22 may include a storage program area and a storage data area, wherein the storage program area may store an operating system and application programs required for at least one function; the storage data area may store data created according to the use of a terminal device and the like.
  • the memory 22 may include a high-speed random access memory, and may also include a non-volatile memory.
  • the non-volatile memory may be, for example, a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disc, or another medium that can store program code; it may also be a transitory storage medium.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Child & Adolescent Psychology (AREA)
  • Hospice & Palliative Care (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

An emotion-based voice interaction method, storage medium and terminal device. The method includes: receiving voice information to be processed, and acquiring an intention type of the voice information (S10); when the intention type is emotional intention, determining an emotion type of the voice information (S20); and generating a response voice of the voice information according to the emotion type and playing the response voice (S30). The method can generate a corresponding response voice according to the emotional intention, so that the response voice matches the user's emotional intention, achieving emotional interaction alongside voice interaction and bringing convenience to the user.

Description

Emotion-based voice interaction method, storage medium and terminal device
Priority
This application claims priority to the Chinese patent application filed with the China Patent Office on November 16, 2018, with application number 201811366588.7 and entitled "Emotion-based voice interaction method, storage medium and terminal device", the entire contents of which are incorporated into this application by reference.
Technical Field
The present disclosure relates to the technical field of intelligent terminals, and in particular to an emotion-based voice interaction method, storage medium, and terminal device.
Background
With the continuing development of artificial intelligence, natural language processing has become an important research direction, and in recent years its applications have broadened across fields such as the home, automobiles, and education. Owing to the progress of artificial intelligence in the language domain, natural language interaction has joined the existing remote-control button interaction and touch-screen interaction, opening a new chapter in human-computer interaction.
Although artificial intelligence keeps developing rapidly and the efficiency, accuracy, and intelligence of natural language interaction keep improving, the dialogue of existing products remains fixed and mechanical and lacks emotion: such products can only answer users according to preset fixed strategies. In terms of expressing emotion, they cannot interact with users at the psychological level, and thus cannot meet users' needs for human-computer interaction.
Summary of the Disclosure
In view of the deficiencies of the prior art, the present disclosure aims to provide an emotion-based voice interaction method, storage medium, and terminal device.
The technical solution adopted by the present disclosure is as follows:
An emotion-based voice interaction method, including:
receiving voice information to be processed, and acquiring an intention type of the voice information;
when the intention type is emotional intention, determining an emotion type of the voice information;
generating a response voice of the voice information according to the emotion type, and playing the response voice.
Further, in one embodiment, receiving the voice information to be processed and acquiring the intention type of the voice information specifically includes:
receiving the voice information to be processed, performing word division on the voice information to obtain several words, and determining whether the divided words include an emotional keyword;
when the several words include an emotional keyword, determining that the intention type of the voice information is emotional intention.
Further, in one embodiment, receiving the voice information to be processed, performing word division on the voice information to obtain several words, and determining whether the divided words include an emotional keyword specifically includes:
receiving the voice information to be processed, and converting the voice information into text information;
dividing the text information into several words, and screening the divided words for words satisfying a preset condition;
determining whether the screened words satisfying the preset condition include an emotional keyword.
Further, in one embodiment, the preset condition is that the part of speech of a word does not belong to a preset part-of-speech list.
Further, in one embodiment, the preset part-of-speech list includes non-keyword parts of speech, where a non-keyword part of speech is one that has neither emotional nor action meaning.
Further, in one embodiment, when the several words include emotional keywords, determining that the intention type of the voice information is emotional intention specifically includes:
when the several words include emotional keywords, acquiring the number of emotional keywords included;
when the number equals 1, determining that the intention type of the voice information is emotional intention;
when the number is greater than 1, detecting whether the emotion types corresponding to the emotional keywords are the same, and if they are the same, determining that the intention type of the voice information is emotional intention.
Further, in one embodiment, the method also includes:
if the emotion types corresponding to the emotional keywords are not the same, determining that the intention type of the voice information is instruction intention.
Further, in one embodiment, the method also includes:
when the voice information does not include an emotional keyword, determining that the intention type of the voice information is instruction intention.
Further, in one embodiment, the method also includes:
when the intention type is instruction intention, determining whether the instruction content can be determined from the instruction intention;
when the instruction content cannot be determined from the instruction intention, querying the user in a domain clarification manner until the instruction content can be determined, and executing the instruction corresponding to the instruction intention.
Further, in one embodiment, the method also includes:
when executing the instruction corresponding to the instruction intention, generating and playing an encouraging voice according to the instruction.
Further, in one embodiment, when the intention type is emotional intention, determining the emotion type of the voice information is specifically:
when the intention type is emotional intention, performing emotion analysis on the voice information to obtain the emotion type corresponding to the voice information, where the emotion analysis includes one or more of vocabulary sentiment analysis, sentence-meaning sentiment analysis, and voice rhythm sentiment analysis.
Further, in one embodiment, generating the response voice of the voice information according to the emotion type and playing the response voice is specifically:
according to the emotion empathy principle and the emotion orientation principle, generating a response voice corresponding to the voice information according to the emotion type, and playing the response voice, where the response voice includes an emotional response sentence and a function-oriented sentence.
Further, in one embodiment, according to the emotion empathy principle and the emotion orientation principle, generating the response voice corresponding to the voice information according to the emotion type and playing the response voice specifically includes:
according to the emotion empathy principle and the emotion orientation principle, generating the response voice corresponding to the voice information according to the emotion type;
acquiring voice characteristics of the voice information, and playing the response voice with the voice characteristics.
Further, in one embodiment, according to the emotion empathy principle and the emotion orientation principle, generating the response voice corresponding to the voice information according to the emotion type and playing the response voice specifically includes:
according to the emotion empathy principle and the emotion orientation principle, generating the response voice corresponding to the voice information according to the emotion type;
generating an emotional visual image according to the response voice, and performing the corresponding response voice through the visual image.
Further, in one embodiment, before receiving the voice information to be processed and acquiring the intention type of the voice information, the method includes:
when a voice wake-up instruction is received, starting a voice listening mode and actively playing a preset voice.
Further, in one embodiment, after generating the response voice of the voice information according to the emotion type and playing the response voice, the method also includes:
recording the number of pieces of voice information whose intention type is emotional intention, and starting a preset active emotion mode when the number reaches a preset threshold, where the terminal device actively plays voice in the active emotion mode.
A computer-readable storage medium, storing one or more programs that can be executed by one or more processors to implement the steps in any of the emotion-based voice interaction methods described above.
A terminal device, including: a processor and a memory;
the memory stores a computer-readable program executable by the processor;
the processor, when executing the computer-readable program, implements the steps in any of the emotion-based voice interaction methods described above.
Beneficial effects: Compared with the prior art, the present disclosure provides an emotion-based voice interaction method, storage medium, and terminal device. The method includes: receiving voice information to be processed, and acquiring an intention type of the voice information; when the intention type is emotional intention, determining an emotion type of the voice information; and generating a response voice of the voice information according to the emotion type and playing the response voice. By judging the intention type of the voice information and, when the intention type is emotional intention, generating a corresponding response voice according to the emotional intention, the present disclosure enables the response voice to match the user's emotional intention, achieving emotional interaction alongside voice interaction and bringing convenience to the user.
Brief Description of the Drawings
FIG. 1 is a flowchart of an embodiment of the emotion-based voice interaction method provided by the present disclosure.
FIG. 2 is a flowchart of step S10 in an embodiment of the emotion-based voice interaction method provided by the present disclosure.
FIG. 3 is a flowchart of step S20 in an embodiment of the emotion-based voice interaction method provided by the present disclosure.
FIG. 4 is a structural schematic diagram of a terminal device provided by the present disclosure.
Detailed Description
The present disclosure provides an emotion-based voice interaction method, storage medium, and terminal device. To make the purpose, technical solution, and effects of the present disclosure clearer and more explicit, the present disclosure is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the present disclosure and are not intended to limit it.
Those skilled in the art will understand that, unless specifically stated otherwise, the singular forms "a", "an", "the", and "said" used here may also include the plural forms. It should be further understood that the word "comprising" used in the specification of the present disclosure refers to the presence of the stated features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It should be understood that when an element is said to be "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or intervening elements may also be present. In addition, "connected" or "coupled" as used here may include a wireless connection or wireless coupling. The term "and/or" as used here includes all or any unit of, and all combinations of, one or more of the associated listed items.
Those skilled in the art will understand that, unless otherwise defined, all terms (including technical and scientific terms) used here have the same meaning as commonly understood by those of ordinary skill in the art to which the present disclosure belongs. It should also be understood that terms such as those defined in general dictionaries should be understood as having meanings consistent with their meanings in the context of the prior art and, unless specifically defined as they are here, will not be interpreted with idealized or overly formal meanings.
The disclosure is further described below through the description of embodiments in conjunction with the accompanying drawings.
This embodiment provides an emotion-based voice interaction method. As shown in FIG. 1, the method includes:
S10. Receive voice information to be processed, and acquire an intention type of the voice information.
S20. When the intention type is emotional intention, determine an emotion type of the voice information.
S30. Generate a response voice of the voice information according to the emotion type, and play the response voice.
Specifically, the voice information may be voice input by the user in real time and collected by the terminal device through a sound pickup, or voice sent to the terminal device by an external device through a network, and so on. In practical applications, in order to reduce the load that voice interaction places on the terminal device, the process shown in FIG. 1 may be executed by a cloud server; that is, the terminal device receives the voice information to be processed and sends the voice information to the cloud server, the cloud server generates a response voice according to the received voice information and returns the response voice to the terminal device, and the terminal device then plays the response voice to the user. The terminal device here is a terminal device with a voice function. Of course, it is worth noting that the process shown in FIG. 1 may also be executed by the terminal device itself.
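For illustration only (this is not part of the patent disclosure), a minimal sketch of the cloud-offloaded flow described above might look as follows; the endpoint URL, transport, and audio format are assumptions:

```python
# Hypothetical terminal-side flow: send the pending voice information to a
# cloud server and play back the response voice it returns.
import requests  # assumed HTTP transport; the disclosure does not fix one

CLOUD_URL = "https://example.com/api/emotion-voice"  # placeholder endpoint

def handle_voice(audio_bytes: bytes) -> bytes:
    """Upload recorded audio to the cloud server and return the generated
    response voice for local playback on the terminal device."""
    reply = requests.post(
        CLOUD_URL,
        data=audio_bytes,
        headers={"Content-Type": "audio/wav"},
        timeout=10,
    )
    reply.raise_for_status()
    return reply.content  # response voice audio to play to the user
```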
Further, in step S10, the intention type refers to the type of meaning the voice information is expected to express, and the intention type is used to determine the response manner for the voice information, where voice information of different intention types corresponds to different response manners; in this way, the corresponding operation can be performed quickly according to the user's intention, improving the timeliness of the voice response. The intention includes emotional intention and instruction intention. Emotional intention means that the voice information is intended to express emotion, that is, the voice information contains an emotional expression; instruction intention means that the voice information is intended to express an operation instruction, that is, the voice information contains only an instruction expression. In one possible implementation of this embodiment, the intention type may be determined by judging whether the intention type is emotional intention; when the intention type is not emotional intention, the intention type is instruction intention. The emotional intention may be determined by checking whether the voice information includes an emotional keyword. Accordingly, as shown in FIG. 2, receiving the voice information to be processed and acquiring the intention type of the voice information specifically includes:
S11. Receive voice information to be processed, perform word division on the voice information to obtain several words, and determine whether the divided words include an emotional keyword;
S12. When the several words include an emotional keyword, determine that the intention type of the voice information is emotional intention;
S13. When the several words do not include an emotional keyword, determine that the intention type of the voice information is instruction intention.
Specifically, emotional keywords are words with emotional connotation, and they may be stored in advance in a keyword database. After the voice information is divided into words, each divided word can be looked up in the keyword database: if a word is found, it is determined that the voice information includes an emotional keyword; if none is found, it is determined that the voice information does not include an emotional keyword. In this embodiment, before the voice information is divided, it needs to be recognized to convert the voice information into text information; the text information is then divided into words, and after word division, part-of-speech screening may be performed on the divided words, deleting words of non-keyword parts of speech (for example, adverbs and personal pronouns) to improve the search speed for emotional keywords. The non-keyword parts of speech may be stored in a part-of-speech list in advance; after the words are divided, the part of speech of each word can be obtained and the divided words filtered according to the part-of-speech list, removing words whose part of speech appears in the list. A non-keyword part of speech is one that has neither emotional nor action characteristics, where emotional characteristics refer to emotional connotation and action characteristics refer to action meaning. For example, the text information corresponding to the voice information is "I'm really tired today (今天真累呀)"; word division yields "今天 (today)", "真累 (really tired)", and "呀 (ah)", and filtering the words yields "今天" and "真累", where "累 (tired)" in "真累" is an emotional keyword, so the intention type of the voice information can be determined to be emotional intention. Of course, in practical applications, when the voice information does not include an emotional keyword, the intention type of the voice information can be determined to be instruction intention.
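As a rough, non-authoritative sketch of this keyword-spotting step (the jieba segmenter is a common choice for Chinese word division, while the emotion lexicon and the part-of-speech blacklist below are illustrative assumptions, not the patent's actual data):

```python
# Word division, part-of-speech screening, and keyword database lookup.
import jieba.posseg as pseg

# Hypothetical keyword database: emotional keyword -> emotion type
EMOTION_KEYWORDS = {"累": "tired", "开心": "happy", "生气": "angry"}

# Hypothetical preset part-of-speech list of non-keyword tags
# (d: adverb, r: pronoun, y: modal particle, u: auxiliary)
NON_KEYWORD_POS = {"d", "r", "y", "u"}

def classify_intention(text: str) -> str:
    """Return 'emotional' or 'instruction' for the recognized text."""
    # Word division plus part-of-speech screening
    words = [w for w, flag in pseg.cut(text) if flag not in NON_KEYWORD_POS]
    # Search each remaining word in the keyword database
    found = [w for w in words if any(kw in w for kw in EMOTION_KEYWORDS)]
    return "emotional" if found else "instruction"

print(classify_intention("今天真累呀"))   # -> emotional
print(classify_intention("播放甜蜜蜜"))   # -> instruction
```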
Further, when the divided words are looked up in the keyword database, the database may match more than one of the divided words. When multiple emotional keywords are found, it can be determined whether the emotion types corresponding to the multiple emotional keywords are the same; if they are the same, one of the multiple emotional keywords is selected as the emotional keyword of the voice information. When the emotion types corresponding to the multiple emotional keywords differ, the intention type of the voice information may be determined to be instruction intention. Of course, in practical applications, when the voice information contains multiple emotional keywords of different types, the mood and intonation corresponding to the voice information may be acquired, and the emotional keyword corresponding to the voice information determined according to the mood and intonation.
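The multi-keyword rule above can be sketched as follows (the keyword-to-emotion mapping is again an illustrative assumption; the mood/intonation fallback is omitted):

```python
# Resolve multiple matched emotional keywords into a single keyword,
# or fall back to instruction intention when their emotion types differ.
EMOTION_KEYWORDS = {"累": "tired", "烦": "annoyed", "开心": "happy"}

def resolve_keywords(found: list[str]) -> str | None:
    """Return the emotional keyword of the voice information, or None
    (instruction intention) when the keywords' emotion types conflict."""
    types = {EMOTION_KEYWORDS[kw] for kw in found}
    if len(types) == 1:
        return found[0]   # same emotion type: pick any one keyword
    return None           # differing emotion types: instruction intention

print(resolve_keywords(["累"]))          # -> 累
print(resolve_keywords(["累", "开心"]))  # conflicting types -> None
```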
Further, in step S20, the emotion type refers to the emotional state of the user, for example, happy, unhappy, angry, or sad. The emotion type may be determined directly from the emotional keyword, or from the emotional keyword together with the overall sentence meaning of the voice information. Accordingly, as shown in FIG. 3, when the intention type is emotional intention, determining the emotion type of the voice information specifically includes:
S21. When the intention type is emotional intention, perform emotion analysis on the voice information, where the emotion analysis is one or more of vocabulary sentiment analysis, sentence-meaning sentiment analysis, and voice rhythm sentiment analysis;
S22. Determine the emotion type corresponding to the voice information according to the emotion analysis.
Specifically, vocabulary sentiment analysis performs sentiment analysis on the vocabulary of the voice information, where the emotions of Chinese words include commendatory words, derogatory words, positive and negative tone words, swear words, and so on, and different words have their own emotional representations. Sentence-meaning sentiment analysis performs sentiment analysis on the vocabulary and the complete sentence meaning of the voice information through natural language processing, and is mainly based on vocabulary sentiment analysis. Voice rhythm sentiment analysis analyzes the sound of the voice information and compares it against historical interaction records and a standard voice emotion rhythm library to judge the voice rhythm and predict the emotion. In this way, the emotion type corresponding to the voice information can be determined through vocabulary sentiment analysis, sentence-meaning sentiment analysis, and/or voice rhythm sentiment analysis. In this embodiment, voice rhythm sentiment analysis is preferably used to determine the emotion type of the voice information, that is, to classify the emotional intention of the voice information, for example, as belonging to the sad type. The voice rhythm sentiment analysis may also analyze phonetic elements to determine the emotion type corresponding to the emotional intention. The phonetic elements may include the pitch, fluctuation, and tone of the voice, and so on. That is, the corresponding emotion type can be determined according to the pitch, fluctuation, and tone of the voice information. In practical applications, a tone range may be preset for each emotion type, and the tone corresponding to the voice information matched against these ranges to determine the emotion type to which it belongs.
In addition, in one embodiment of the present disclosure, pitch or fluctuation alone may be used in the same way; alternatively, ranges may be set for pitch, fluctuation, and tone respectively, and each is then compared in turn against its corresponding range to determine the emotion type associated with the pitch, the fluctuation, and the tone. After the emotion types corresponding to the pitch, fluctuation, and tone are determined, the emotion type that occurs most often is selected as the emotion type of the voice information. When the pitch, fluctuation, and tone each correspond to a different emotion type, the emotion type corresponding to the voice information may be determined according to preset preference levels of pitch, fluctuation, and tone, where these preference levels are selected in advance. Of course, when the pitch, fluctuation, and tone each correspond to a different emotion type, an emotion type may also be selected at random as the emotion type of the voice information.
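A sketch of this pitch/fluctuation/tone voting rule is given below; the numeric ranges and the preference order are invented for demonstration and would in practice be tuned against a standard voice emotion rhythm library:

```python
# Majority vote over per-cue emotion ranges, with a preset preference
# level as the tie-breaker when all three cues disagree.
from collections import Counter

RANGES = {  # hypothetical cue -> [(low, high, emotion type), ...]
    "pitch":       [(0, 120, "sad"), (120, 250, "calm"), (250, 999, "happy")],
    "fluctuation": [(0, 10, "sad"), (10, 40, "calm"), (40, 999, "angry")],
    "tone":        [(0, 1, "sad"), (1, 2, "calm"), (2, 9, "happy")],
}
PREFERENCE = ["tone", "fluctuation", "pitch"]  # assumed preference levels

def classify_cue(cue: str, value: float) -> str:
    for low, high, emotion in RANGES[cue]:
        if low <= value < high:
            return emotion
    return "calm"

def emotion_type(pitch: float, fluctuation: float, tone: float) -> str:
    votes = {
        "pitch": classify_cue("pitch", pitch),
        "fluctuation": classify_cue("fluctuation", fluctuation),
        "tone": classify_cue("tone", tone),
    }
    top, count = Counter(votes.values()).most_common(1)[0]
    if count > 1:
        return top                 # the emotion type occurring most often
    return votes[PREFERENCE[0]]    # all three differ: use the preferred cue
```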
Further, in step S30, the response voice is voice information generated according to the emotion type of the voice information, and the response voice is generated based on the emotion empathy principle and the emotion orientation principle. The emotion empathy principle means adopting the same emotion as that carried by the voice information, and the emotion orientation principle means guiding the direction in which the user releases emotion. Accordingly, the emotion empathy principle and the emotion orientation principle comprise two parts: an emotion empathy part and an emotion orientation domain. The emotion empathy part is used to resonate emotionally with the user, and the emotion orientation domain part is used to provide the user with a way to ease the emotion. For example, for the voice information "I'm really tired today", the response voice generated based on the emotion empathy principle and the emotion orientation principle may be "Oh, then relax and take a rest; how about listening to some music?", where "Oh, then relax and take a rest" is the emotion empathy part and "listening to some music" is the emotion orientation domain. This also improves the empathy of the response voice with the user and lets the emotion flow. Accordingly, generating the response voice of the voice information according to the emotion type and playing the response voice is specifically: according to the emotion empathy principle and the emotion orientation principle, generating the response voice corresponding to the voice information according to the emotion type, and playing the response voice, where the response voice includes an emotional response sentence and a function-oriented sentence. Of course, in practical applications, when the response voice is generated according to the emotion type, an emotional visual image corresponding to the response voice may also be generated, and the response voice and the emotional visual image invoked so that the visual image performs the corresponding response voice.
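The empathy-plus-orientation assembly can be illustrated with a simple template scheme; every template below is an invented example, not the patent's wording:

```python
# Concatenate an emotional response sentence (emotion empathy part) with
# a function-oriented sentence (emotion orientation domain).
EMPATHY = {
    "tired": "Oh, then relax and take a rest.",
    "sad":   "I'm sorry to hear that.",
    "happy": "That's wonderful to hear!",
}
ORIENTATION = {
    "tired": "How about listening to some music?",
    "sad":   "Shall I play something cheerful?",
    "happy": "Want me to put on your favorite playlist?",
}

def build_response(emotion_type: str) -> str:
    return f"{EMPATHY[emotion_type]} {ORIENTATION[emotion_type]}"

print(build_response("tired"))
# -> Oh, then relax and take a rest. How about listening to some music?
```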
Further, to improve the resonance between the response sentence and the voice information, when the response voice is generated, the voice characteristics of the voice information can be acquired and used to play the response voice, so that the response voice fits the context of the voice information. The voice characteristics refer to the vocal features of the voice information; for example, they may include volume, pitch, and frequency, according to which the voice characteristics of the response voice are determined, and the response voice is then played with those voice characteristics. Of course, in practical applications, the user identifier corresponding to the voice information may also be acquired and used to determine the user's speaking habits, accent, catchphrases, and so on; a response voice can then be generated according to the speaking habits, emotion type, and catchphrases, the manner of playing the response information determined according to the voice characteristics and the accent, and the response information played in the determined manner.
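As one way to picture this voice-characteristic matching (the feature names follow the text, while the dataclass and the derivation of playback settings are assumptions):

```python
# Derive playback settings from the input's volume, pitch, and frequency
# so the response voice fits the context of the voice information.
from dataclasses import dataclass

@dataclass
class VoiceCharacteristics:
    volume: float      # loudness, e.g. an RMS level of the input audio
    pitch: float       # mean fundamental frequency in Hz
    frequency: float   # overall spectral profile of the input in Hz

def playback_settings(c: VoiceCharacteristics) -> dict:
    """Mirror the user's voice characteristics in the response voice."""
    return {
        "volume": min(c.volume, 1.0),  # never louder than the user spoke
        "pitch_hz": c.pitch,
        "frequency_hz": c.frequency,
    }
```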
Further, in one embodiment of the present disclosure, in order to improve the initiative of emotional voice interaction, after the response voice is generated according to the emotion type, the number of emotional voice interactions can be recorded, and when the number reaches a preset threshold, a preset active emotion mode is started automatically. Accordingly, after generating the response voice of the voice information according to the emotion type and playing the response voice, the method also includes:
recording the number of pieces of voice information whose intention type is emotional intention, and starting a preset active emotion mode when the number reaches a preset threshold, where the terminal device actively plays voice in the active emotion mode.
Specifically, the active emotion mode is preset; when the active emotion mode is turned on, the terminal device actively plays voice to the user when it is powered on or its voice function is woken up. The preset threshold is set in advance, for example, 5. That is, when the number of emotional exchanges between the user and the terminal device reaches 5, the terminal device automatically starts the active emotion mode, in which the terminal device actively speaks to the user after being turned on, so as to interact emotionally with the user. Of course, it is worth noting that after the terminal device automatically starts the active emotion mode, the terminal device immediately and actively utters a voice; that is, at the moment the terminal device enters the active emotion mode, the terminal device speaks.
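A minimal sketch of the interaction counter behind the active emotion mode follows; the class and method names are assumptions, and the threshold of 5 follows the example in the text:

```python
# Count emotional interactions and enter the active emotion mode once a
# preset threshold is reached, speaking immediately upon entry.
class ActiveEmotionTracker:
    def __init__(self, threshold: int = 5):
        self.threshold = threshold
        self.count = 0
        self.active_mode = False

    def record_emotional_interaction(self) -> None:
        """Call once per voice message judged to be emotional intention."""
        self.count += 1
        if not self.active_mode and self.count >= self.threshold:
            self.active_mode = True
            self.speak_proactively()  # speak at the moment the mode starts

    def speak_proactively(self) -> None:
        print("Hi! How are you feeling today?")  # placeholder greeting
```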
Further, in one embodiment of the present disclosure, the method also includes:
when the intention type is instruction intention, determining whether the instruction intention is clear, that is, whether the instruction content can be determined from the instruction intention;
when the instruction content cannot be determined from the instruction intention, querying the user in a domain clarification manner until the instruction content can be determined, and executing the instruction corresponding to the instruction intention.
Specifically, the instruction intention indicates what instruction the user needs executed. When the instruction content cannot be determined from the user's current intention, a domain clarification inquiry can be made so that the user further clarifies the intention; once the instruction content can be determined, the corresponding operation is performed according to the instruction intention. For example, if the user says "Tian Mi Mi (甜蜜蜜)", the smart device uses domain clarification to generate and play the query voice "Do you want to watch the movie or listen to the song?"; if the user then says "listen to the song", the smart device determines that the user's instruction intention is to play the song "Tian Mi Mi" and performs the operation of playing that song. In addition, to make voice interaction more emotional, after the operation corresponding to the instruction intention has been executed, a corresponding encouraging voice can be generated. Accordingly, the method also includes: when executing the instruction corresponding to the instruction intention, generating and playing an encouraging voice according to the instruction. For example, after the operation of playing the song "Tian Mi Mi" is performed, a voice such as "Please enjoy" can be generated and played.
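The domain clarification loop can be sketched as a small dialogue routine; the intent table and prompts are illustrative assumptions built on the example above:

```python
# Ask clarifying questions until the instruction content can be determined.
AMBIGUOUS = {
    "甜蜜蜜": {  # a title shared by a movie and a song
        "prompt": "Do you want to watch the movie or listen to the song?",
        "movie": "play_movie:甜蜜蜜",
        "song": "play_song:甜蜜蜜",
    }
}

def resolve_instruction(utterance: str, ask) -> str:
    """Return an executable instruction, using domain clarification when
    the instruction content cannot be determined from the utterance."""
    entry = AMBIGUOUS.get(utterance)
    if entry is None:
        return f"execute:{utterance}"  # instruction content already clear
    while True:  # domain clarification loop
        answer = ask(entry["prompt"])
        if "movie" in answer or "电影" in answer:
            return entry["movie"]
        if "song" in answer or "听歌" in answer:
            return entry["song"]

# Simulate the dialogue from the example above
print(resolve_instruction("甜蜜蜜", ask=lambda prompt: "听歌"))
# -> play_song:甜蜜蜜
```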
Based on the emotion-based voice interaction method above, the present disclosure also provides a computer-readable storage medium that stores one or more programs, and the one or more programs can be executed by one or more processors to implement the steps in the emotion-based voice interaction method described above.
The present disclosure also provides a terminal device which, as shown in FIG. 4, includes at least one processor (processor) 20, a display screen 21, and a memory (memory) 22, and may also include a communication interface (Communications Interface) 23 and a bus 24. The processor 20, the display screen 21, the memory 22, and the communication interface 23 can communicate with one another through the bus 24. The display screen 21 is configured to display a user guide interface preset in the initial setting mode. The communication interface 23 can transmit information. The processor 20 can call logical instructions in the memory 22 to execute the method in the embodiments above.
In addition, when the logical instructions in the memory 22 above are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium.
As a computer-readable storage medium, the memory 22 may be configured to store software programs and computer-executable programs, such as the program instructions or modules corresponding to the methods in the embodiments of the present disclosure. The processor 20 executes functional applications and data processing, that is, implements the methods in the embodiments above, by running the software programs, instructions, or modules stored in the memory 22.
The memory 22 may include a program storage area and a data storage area, where the program storage area may store an operating system and the application programs required for at least one function, and the data storage area may store data created according to the use of the terminal device, and so on. In addition, the memory 22 may include high-speed random access memory and may also include non-volatile memory, for example, a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disc, or another medium that can store program code; it may also be a transitory storage medium.
In addition, the specific process by which the processor loads and executes the multiple instructions in the storage medium and the mobile terminal above has been described in detail in the method above and is not repeated here.
Finally, it should be noted that the embodiments above are only intended to illustrate the technical solution of the present disclosure, not to limit it. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions recorded in the foregoing embodiments or make equivalent substitutions for some of the technical features therein, and such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present disclosure.

Claims (18)

  1. An emotion-based voice interaction method, comprising:
    receiving voice information to be processed, and acquiring an intention type of the voice information;
    when the intention type is emotional intention, determining an emotion type of the voice information;
    generating a response voice of the voice information according to the emotion type, and playing the response voice.
  2. The emotion-based voice interaction method according to claim 1, wherein receiving the voice information to be processed and acquiring the intention type of the voice information specifically comprises:
    receiving the voice information to be processed, performing word division on the voice information to obtain several words, and determining whether the divided words include an emotional keyword;
    when the several words include an emotional keyword, determining that the intention type of the voice information is emotional intention.
  3. The emotion-based voice interaction method according to claim 2, wherein receiving the voice information to be processed, performing word division on the voice information to obtain several words, and determining whether the divided words include an emotional keyword specifically comprises:
    receiving the voice information to be processed, and converting the voice information into text information;
    dividing the text information into several words, and screening the divided words for words satisfying a preset condition;
    determining whether the screened words satisfying the preset condition include an emotional keyword.
  4. The emotion-based voice interaction method according to claim 3, wherein the preset condition is that the part of speech of a word does not belong to a preset part-of-speech list.
  5. The emotion-based voice interaction method according to claim 4, wherein the preset part-of-speech list includes non-keyword parts of speech, a non-keyword part of speech being one that has neither emotional nor action characteristics.
  6. The emotion-based voice interaction method according to claim 2, wherein when the several words include emotional keywords, determining that the intention type of the voice information is emotional intention specifically comprises:
    when the several words include emotional keywords, acquiring the number of emotional keywords included;
    when the number equals 1, determining that the intention type of the voice information is emotional intention;
    when the number is greater than 1, detecting whether the emotion types corresponding to the emotional keywords are the same, and if they are the same, determining that the intention type of the voice information is emotional intention.
  7. The emotion-based voice interaction method according to claim 6, further comprising:
    if the emotion types corresponding to the emotional keywords are not the same, determining that the intention type of the voice information is instruction intention.
  8. The emotion-based voice interaction method according to claim 1, further comprising:
    when the voice information does not include an emotional keyword, determining that the intention type of the voice information is instruction intention.
  9. The emotion-based voice interaction method according to claim 7 or 8, further comprising:
    when the intention type is instruction intention, determining whether the instruction content can be determined from the instruction intention;
    when the instruction content cannot be determined from the instruction intention, querying the user in a domain clarification manner until the instruction content can be determined, and executing the instruction corresponding to the instruction intention.
  10. The emotion-based voice interaction method according to claim 9, further comprising:
    when executing the instruction corresponding to the instruction intention, generating and playing an encouraging voice according to the instruction.
  11. The emotion-based voice interaction method according to claim 1, wherein when the intention type is emotional intention, determining the emotion type of the voice information is specifically:
    when the intention type is emotional intention, performing emotion analysis on the voice information to obtain the emotion type corresponding to the voice information, wherein the emotion analysis includes one or more of vocabulary sentiment analysis, sentence-meaning sentiment analysis, and voice rhythm sentiment analysis.
  12. The emotion-based voice interaction method according to claim 1, wherein generating the response voice of the voice information according to the emotion type and playing the response voice is specifically:
    according to the emotion empathy principle and the emotion orientation principle, generating a response voice corresponding to the voice information according to the emotion type, and playing the response voice, wherein the response voice includes an emotional response sentence and a function-oriented sentence.
  13. The emotion-based voice interaction method according to claim 12, wherein, according to the emotion empathy principle and the emotion orientation principle, generating the response voice corresponding to the voice information according to the emotion type and playing the response voice specifically comprises:
    according to the emotion empathy principle and the emotion orientation principle, generating the response voice corresponding to the voice information according to the emotion type;
    acquiring voice characteristics of the voice information, and playing the response voice with the voice characteristics.
  14. The emotion-based voice interaction method according to claim 12, wherein, according to the emotion empathy principle and the emotion orientation principle, generating the response voice corresponding to the voice information according to the emotion type and playing the response voice specifically comprises:
    according to the emotion empathy principle and the emotion orientation principle, generating the response voice corresponding to the voice information according to the emotion type;
    generating an emotional visual image according to the response voice, and performing the corresponding response voice through the visual image.
  15. The emotion-based voice interaction method according to claim 1, wherein before receiving the voice information to be processed and acquiring the intention type of the voice information, the method comprises:
    when a voice wake-up instruction is received, starting a voice listening mode and actively playing a preset voice.
  16. The emotion-based voice interaction method according to claim 1, wherein after generating the response voice of the voice information according to the emotion type and playing the response voice, the method further comprises:
    recording the number of pieces of voice information whose intention type is emotional intention, and starting a preset active emotion mode when the number reaches a preset threshold, wherein a terminal device actively plays voice in the active emotion mode.
  17. A computer-readable storage medium, storing one or more programs executable by one or more processors to implement the steps in the emotion-based voice interaction method according to any one of claims 1 to 16.
  18. A terminal device, comprising: a processor and a memory;
    the memory stores a computer-readable program executable by the processor;
    the processor, when executing the computer-readable program, implements the steps in the emotion-based voice interaction method according to any one of claims 1 to 16.
PCT/CN2019/118580 2018-11-16 2019-11-14 Emotion-based voice interaction method, storage medium and terminal device WO2020098756A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US17/261,832 US11640832B2 (en) 2018-11-16 2019-11-14 Emotion-based voice interaction method, storage medium and terminal device using pitch, fluctuation and tone
EP19885273.3A priority patent/EP3882910A4/en Emotion-based voice interaction method, storage medium and terminal device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811366588.7A CN111199732B (zh) 2018-11-16 2018-11-16 Emotion-based voice interaction method, storage medium and terminal device
CN201811366588.7 2018-11-16

Publications (1)

Publication Number Publication Date
WO2020098756A1 (zh)

Family

ID=70731026

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/118580 WO2020098756A1 (zh) Emotion-based voice interaction method, storage medium and terminal device

Country Status (4)

Country Link
US (1) US11640832B2 (zh)
EP (1) EP3882910A4 (zh)
CN (1) CN111199732B (zh)
WO (1) WO2020098756A1 (zh)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111754979A * 2020-07-21 2020-10-09 南京智金科技创新服务中心 Intelligent speech recognition method and device
CN113053388B * 2021-03-09 2023-08-01 北京百度网讯科技有限公司 Voice interaction method, apparatus, device, and storage medium
CN113076407B * 2021-03-22 2023-07-21 联想(北京)有限公司 Information processing method and device
CN113126951B * 2021-04-16 2024-05-17 深圳地平线机器人科技有限公司 Audio playback method and apparatus, computer-readable storage medium, and electronic device
CN114969282B * 2022-05-05 2024-02-06 迈吉客科技(北京)有限公司 Intelligent interaction method based on a rich-media knowledge graph multimodal sentiment analysis model
CN115904075B * 2022-11-28 2024-01-02 中国汽车技术研究中心有限公司 Vehicle configuration improvement method, system, device, and storage medium
CN116030811B * 2023-03-22 2023-06-30 广州小鹏汽车科技有限公司 Voice interaction method, vehicle, and computer-readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160240213A1 (en) * 2015-02-16 2016-08-18 Samsung Electronics Co., Ltd. Method and device for providing information
CN106254186A * 2016-08-05 2016-12-21 易晓阳 Voice interaction recognition control system
CN106599998A * 2016-12-01 2017-04-26 竹间智能科技(上海)有限公司 Method and system for adjusting robot responses based on emotional features
CN108334583A * 2018-01-26 2018-07-27 上海智臻智能网络科技股份有限公司 Emotion interaction method and apparatus, computer-readable storage medium, and computer device

Family Cites Families (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004049304A1 * 2002-11-25 2004-06-10 Matsushita Electric Industrial Co., Ltd. Speech synthesis method and speech synthesis device
US7043435B2 * 2004-09-16 2006-05-09 SBC Knowledge Ventures, L.P. System and method for optimizing prompts for speech-enabled applications
KR20110072847A * 2009-12-23 2011-06-29 삼성전자주식회사 Dialogue management system and method for processing open user intention
US9788777B1 * 2013-08-12 2017-10-17 The Nielsen Company (US), LLC Methods and apparatus to identify a mood of media
US9858039B2 (en) * 2014-01-28 2018-01-02 Oracle International Corporation Voice recognition of commands extracted from user interface screen devices
KR20150123579A * 2014-04-25 2015-11-04 삼성전자주식회사 Method and apparatus for identifying emotion information from a user's voice
CN105654943A 2015-10-26 2016-06-08 乐视致新电子科技(天津)有限公司 Voice wake-up method, device, and system
CN105334743B * 2015-11-18 2018-10-26 深圳创维-Rgb电子有限公司 Smart home control method and system based on emotion recognition
CN106910513A * 2015-12-22 2017-06-30 微软技术许可有限责任公司 Emotionally intelligent chat engine
WO2017130486A1 * 2016-01-28 2017-08-03 ソニー株式会社 Information processing device, information processing method, and program
JP2017199254A * 2016-04-28 2017-11-02 日本電気株式会社 Conversation analysis device, conversation analysis method, and conversation analysis program
US10268769B2 (en) * 2016-08-29 2019-04-23 International Business Machines Corporation Sentiment analysis
US9812151B1 (en) 2016-11-18 2017-11-07 IPsoft Incorporated Generating communicative behaviors for anthropomorphic virtual agents based on user's affect
CN106710590B * 2017-02-24 2023-05-30 广州幻境科技有限公司 Voice interaction system and method with emotional function based on a virtual reality environment
JP6751536B2 * 2017-03-08 2020-09-09 パナソニック株式会社 Device, robot, method, and program
CN109417504A * 2017-04-07 2019-03-01 微软技术许可有限责任公司 Voice forwarding in automated chat
CN107562850A * 2017-08-28 2018-01-09 百度在线网络技术(北京)有限公司 Music recommendation method, apparatus, device, and storage medium
CN107515944A * 2017-08-31 2017-12-26 广东美的制冷设备有限公司 Artificial intelligence-based interaction method, user terminal, and storage medium
CN108197115B * 2018-01-26 2022-04-22 上海智臻智能网络科技股份有限公司 Intelligent interaction method and apparatus, computer device, and computer-readable storage medium
CN108711423A * 2018-03-30 2018-10-26 百度在线网络技术(北京)有限公司 Method, apparatus, computer device, and storage medium for implementing intelligent voice interaction
US10566010B2 (en) * 2018-04-20 2020-02-18 Spotify Ab Systems and methods for enhancing responsiveness to utterances having detectable emotion


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3882910A4

Also Published As

Publication number Publication date
EP3882910A4 (en) 2022-08-10
EP3882910A1 (en) 2021-09-22
US20210304789A1 (en) 2021-09-30
CN111199732A (zh) 2020-05-26
US11640832B2 (en) 2023-05-02
CN111199732B (zh) 2022-11-15

Similar Documents

Publication Publication Date Title
WO2020098756A1 (zh) Emotion-based voice interaction method, storage medium and terminal device
US20230206940A1 (en) Method of and system for real time feedback in an incremental speech input interface
US11720326B2 (en) Audio output control
US11948556B2 (en) Detection and/or enrollment of hot commands to trigger responsive action by automated assistant
US10068573B1 (en) Approaches for voice-activated audio commands
CN108228132B (zh) 语音启用装置及其中执行的方法
US10056078B1 (en) Output of content based on speech-based searching and browsing requests
US11823678B2 (en) Proactive command framework
US11184412B1 (en) Modifying constraint-based communication sessions
US20200126566A1 (en) Method and apparatus for voice interaction
WO2017071182A1 (zh) Voice wake-up method, device, and system
WO2019192250A1 (zh) Voice wake-up method and apparatus
US11276403B2 (en) Natural language speech processing application selection
US11355115B2 (en) Question answering for a voice user interface
US10600419B1 (en) System command processing
US11797629B2 (en) Content generation framework
CN116917984A (zh) Interactive content output
CN113761268A (zh) Playback control method, apparatus, device, and storage medium for audio program content
US11721347B1 (en) Intermediate data for inter-device speech processing
CN108492826B (zh) Audio processing method and apparatus, smart device, and medium
US10957313B1 (en) System command processing
US11632345B1 (en) Message management for communal account
CN113035181A (zh) Voice data processing method, device, and system
US11914923B1 (en) Computer system-based pausing and resuming of natural language conversations
CN116564290A (zh) Multimodal speech pause determination method and apparatus

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 19885273

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2019885273

Country of ref document: EP

Effective date: 20210616