WO2023097745A1 - Deep learning-based intelligent human-computer interaction method and system, and terminal - Google Patents

Deep learning-based intelligent human-computer interaction method and system, and terminal Download PDF

Info

Publication number
WO2023097745A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
voice
deep learning
language
intonation
Prior art date
Application number
PCT/CN2021/136927
Other languages
French (fr)
Chinese (zh)
Inventor
张庆茂
刘培刚
Original Assignee
山东远联信息科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 山东远联信息科技有限公司
Publication of WO2023097745A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 30/00: Commerce
    • G06Q 30/01: Customer relationship services
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/60: Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F 16/63: Querying
    • G06F 16/632: Query formulation
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/005: Language recognition
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00: Speaker identification or verification
    • G10L 17/26: Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/27: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L 25/30: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks

Definitions

  • This application relates to the field of artificial intelligence interaction technology, and in particular to an intelligent interaction method, system and terminal based on deep learning.
  • Artificial intelligence is a branch of computer science that seeks to understand the essence of intelligence and to build intelligent machines that can respond in ways similar to human intelligence. Research in this field includes robotics, speech recognition, image recognition, natural language processing, expert systems, and more. Since the birth of artificial intelligence, its theory and technology have matured steadily and its fields of application have kept expanding; it is conceivable that the technological products it brings will one day serve as "containers" of human wisdom. Artificial intelligence can simulate the information processes of human consciousness and thinking. It is not human intelligence, but it can think as humans do and may even surpass human intelligence.
  • In particular, speech recognition and natural language processing are widely used in smart terminals and online customer service in the service industry, for example by operators such as China Mobile, China Unicom and China Telecom, as well as in government service hotlines.
  • Artificial-intelligence dialogue in traditional systems generally relies on a fixed dialogue template: when a user connects, the intelligent customer service uses guiding prompts to steer the user into phrasing the request in templated language, and after identifying the request it gives the corresponding response.
  • Although traditional intelligent customer service can perform basic speech recognition, if the user asks in a dialect or does not phrase the inquiry in the templated language, the intelligent customer service falls into an endless loop, repeatedly asking for the user's needs, which lowers user satisfaction.
  • In a first aspect, an embodiment of the present application provides a deep learning-based intelligent interaction method, including: acquiring voice feature information of an accessing user; inputting the voice feature information into a trained deep learning neural network to determine a response strategy; and answering the user according to the response strategy.
  • With this implementation, the intelligent customer service abandons the traditional templated language during a session and lets the user state the request first. The utterance describing the request is then analyzed to obtain a response strategy, ensuring that the reply addresses the user's request; the user's needs need not be asked for repeatedly, which improves user satisfaction.
  • Acquiring the voice feature information of the accessing user includes: matching the language of the user's voice to determine language information; and determining the semantics and intonation meaning of the voice according to the language information and the corresponding language database.
  • Determining the semantics and intonation meaning of the voice according to the language information and the corresponding language database includes: determining each word in the user's sentence according to the language database and the voiceprint information of the voice; combining the determined words and then performing part-of-speech division to determine the semantics of the user's voice; and determining the user's intonation meaning by combining the voice's intonation with the intonation feature information of the current language.
  • Inputting the speech feature information into the trained deep learning neural network to obtain a response strategy includes: the deep learning neural network determines the user's emotional characteristics according to the intonation meaning; if the emotional characteristics indicate that the user is emotionally stable, a corresponding response utterance is selected from the response database according to the semantics of the user's voice; or, if the emotional characteristics indicate that the user is emotionally anxious, the call is transferred to manual service.
  • If the agents are busy when transferring to a manual agent, a relay intelligent customer service is temporarily established; the relay intelligent customer service imitates the state of a manual-agent connection, and when a manual agent becomes idle the call is switched directly to that agent.
  • In a second aspect, an embodiment of the present application provides a deep learning-based intelligent interaction system, including: an acquisition module for acquiring voice feature information of an accessing user; a determination module for inputting the voice feature information into a trained deep learning neural network to determine a response strategy; and a response module for answering the user according to the response strategy.
  • The acquisition module includes: a first determining unit, configured to match the language of the user's voice and determine language information; and a second determining unit, configured to determine the semantics and intonation meaning of the voice according to the language information and the corresponding language database.
  • The second determining unit includes: a first determining subunit, configured to determine each word in the user's sentence according to the language database and voiceprint information; a second determining subunit, configured to combine the determined words and then perform part-of-speech division to determine the semantics of the user's voice; and a third determining subunit, configured to determine the user's intonation meaning by combining the voice's intonation with the intonation feature information of the current language.
  • The determination module includes: a third determining unit, used by the deep learning neural network to determine the user's emotional characteristics according to the intonation meaning; and a processing unit, configured to select a corresponding response utterance from the response database according to the semantics of the user's voice if the emotional characteristics indicate that the user is emotionally stable, or to transfer the call to manual service if the emotional characteristics indicate that the user is emotionally anxious.
  • In a third aspect, an embodiment of the present application provides a terminal, including: a processor; and a memory for storing computer-executable instructions; when the processor executes the computer-executable instructions, the processor performs the method of the first aspect or any possible implementation of the first aspect, realizing intelligent voice interaction.
  • FIG. 1 is a schematic flow diagram of a deep learning-based intelligent interaction method provided by an embodiment of the present application
  • FIG. 2 is a schematic diagram of an intelligent interactive system based on deep learning provided by an embodiment of the present application
  • FIG. 3 is a schematic diagram of a terminal provided by an embodiment of the present application.
  • Fig. 1 is a schematic flowchart of a deep learning-based intelligent interaction method provided by the embodiment of the present application.
  • the deep learning-based intelligent interaction method provided by the embodiment of the present application includes:
  • Intelligent voice interaction in traditional systems generally supports communication only in fixed language types, for example mobile-operator hotlines or public convenience service hotlines.
  • Accessing users are generally required to state their requests in Mandarin, and the intelligent customer service determines the response content based on its analysis of the user's voice.
  • However, if the user speaks non-Mandarin or a language outside the fixed set, the intelligent customer service cannot answer.
  • For the above reasons, after the user's voice is received in this embodiment, the language of the user's voice is first matched to determine the language information; this requires access to databases of multiple languages and to pronunciation databases of local dialects.
  • Once the corresponding language information is matched, the semantics and intonation meaning of the user's voice are determined in combination with the corresponding language database.
  • Evidently, the semantics of the voice capture what the user means, while the intonation meaning captures the tone and mood with which the user speaks.
  • the intonation meaning of the user is determined by combining the intonation of the voice and the intonation feature information of the current language.
  • Determining the intonation meaning of the user's voice is particularly important, because the intonation meaning reveals the user's current emotional characteristics. Taking Mandarin as an example, a user who is emotionally excited or anxious will typically show intonation features such as fast or loud speech. For some languages, however, fast speech and a loud voice are normal intonation features, so the emotion must be judged from other cues.
  • After the semantics and intonation meaning of the user's speech are determined in S101, they are input into the trained deep learning neural network, which first determines the user's emotional characteristics from the intonation meaning. If the emotional characteristics indicate that the user is emotionally stable, a corresponding response utterance is selected from the response database according to the semantics of the user's voice. If the emotional characteristics indicate that the user is emotionally anxious, however, the call is transferred to manual service: at that point, continuing the interaction through the intelligent customer service may fail to resolve the user's request and may even cause dissatisfaction.
  • The corresponding response sentence is retrieved from the response database according to the semantics of the user's voice to answer the user; if a transfer to a human agent is needed, the service is handled manually.
  • If the human agents are busy at this time, a relay intelligent customer service is temporarily set up.
  • The relay intelligent customer service imitates the state of a human-agent connection, and when a human agent becomes idle, the call is switched directly to that agent.
  • The present application also provides an embodiment of a deep learning-based intelligent interaction system.
  • The deep learning-based intelligent interaction system 20 includes: an acquisition module 201, a determination module 202 and a response module 203.
  • the acquiring module 201 is configured to acquire voice feature information of an access user.
  • the determining module 202 is configured to input the speech feature information into the trained deep learning neural network to determine the response strategy.
  • The response module 203 is configured to answer the user according to the response strategy.
  • the acquiring module 201 includes: a first determining unit and a second determining unit.
  • the first determination unit is configured to match the language of the user's voice and determine the language information;
  • the second determination unit is configured to determine the semantics and intonation meaning of the voice according to the language information and the corresponding language library.
  • the second determination unit includes: a first determination subunit, a second determination subunit and a third determination subunit.
  • the first determining subunit is configured to determine each word in the user sentence according to the language library and voiceprint information.
  • the second determining subunit is used to combine the determined words and then perform part-of-speech division to determine the semantics of the user's voice.
  • The third determining subunit is configured to determine the user's intonation meaning by combining the voice's intonation with the intonation feature information of the current language.
  • the determining module 202 includes: a third determining unit and a processing unit.
  • the third determination unit is used for the deep learning neural network to determine the user's emotional characteristics according to the meaning of the intonation.
  • The processing unit is configured to select a corresponding response utterance from the response database according to the semantics of the user's voice if the emotional feature indicates that the user is emotionally stable, or to transfer the call to manual service if the emotional feature indicates that the user is emotionally anxious.
  • The terminal 30 includes: a processor 301, a memory 302 and a communication interface 303.
  • The processor 301, the memory 302 and the communication interface 303 can be connected to each other through a bus; the bus can be divided into an address bus, a data bus, a control bus, and the like.
  • For ease of illustration, only one thick line is used in FIG. 3, but this does not mean that there is only one bus or only one type of bus.
  • The processor 301 usually controls the overall functions of the terminal 30, such as starting the terminal 30 and, after startup, acquiring the voice feature information of the accessing user, inputting the voice feature information into the trained deep learning neural network to determine the response policy, and answering the user according to the response policy.
  • The processor 301 may be a general-purpose processor, for example a central processing unit (CPU), a network processor (NP), or a combination of a CPU and an NP.
  • The processor may also be a microcontroller unit (MCU).
  • The processor may also include a hardware chip.
  • the aforementioned hardware chip may be an Application Specific Integrated Circuit (ASIC), a Programmable Logic Device (PLD) or a combination thereof.
  • The above-mentioned PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), or the like.
  • The memory 302 is configured to store computer-executable instructions to support the operation of the terminal 30.
  • The memory 302 can be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random-access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or a magnetic or optical disk.
  • When the processor 301 and the memory 302 are powered on, the processor 301 reads and executes the computer-executable instructions stored in the memory 302 to complete all or part of the steps of the above embodiments of the deep learning-based intelligent interaction method.
  • the communication interface 303 is used for the terminal 30 to transmit data, such as realizing communication with network devices and servers.
  • the communication interface 303 includes a wired communication interface, and may also include a wireless communication interface.
  • the wired communication interface includes a USB interface, a Micro USB interface, and may also include an Ethernet interface.
  • the wireless communication interface may be a WLAN interface, a cellular network communication interface or a combination thereof.
  • the terminal 30 provided in the embodiment of the present application further includes a power supply component, which provides power for various components of the terminal 30 .
  • Power components may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to terminal 30 .
  • The terminal 30 may further include a communication component configured to facilitate wired or wireless communication between the terminal 30 and other devices.
  • the terminal 30 can access a wireless network based on communication standards, such as WiFi, 4G or 5G, or a combination thereof.
  • the communication component receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel.
  • the communication component also includes a Near Field Communication (NFC) module to facilitate short-range communication.
  • the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
  • In an exemplary embodiment, the terminal 30 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), or other electronic components.
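By way of illustration, the intonation-based emotion screening described in the embodiments above (fast or loud speech signals anxiety only relative to the detected language's own baseline) could be sketched as follows. The baseline table, threshold multipliers, and feature values here are invented assumptions for the sketch, not values taken from the application.

```python
# Hypothetical sketch of intonation-based emotion screening.
# Fast speech or high loudness signals anxiety only relative to the
# baseline of the detected language/dialect (all values illustrative).

BASELINES = {
    # language: (typical syllables per second, typical loudness on a 0..1 scale)
    "mandarin": (4.0, 0.5),
    "cantonese": (5.5, 0.7),  # a faster, louder delivery is normal here
}

def emotion_from_intonation(language, syllables_per_sec, loudness):
    """Return 'anxious' or 'stable' by comparing against the language baseline."""
    base_rate, base_loud = BASELINES.get(language, (4.0, 0.5))
    if syllables_per_sec > 1.5 * base_rate or loudness > base_loud + 0.3:
        return "anxious"
    return "stable"
```

Note how the same raw features (fast, fairly loud speech) classify as anxious for one language but stable for another, which is the point the description makes about language-specific intonation features.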

Abstract

A deep learning-based intelligent human-computer interaction method and system, and a terminal. The method comprises: acquiring voice feature information of a calling user (S101); inputting the voice feature information into a trained deep learning neural network, and determining a response policy (S102); and responding to the user according to the response policy (S103). In a session with a user, intelligent customer service does not use the traditional template language, and prioritizes letting the user describe their problem. Then, the words of the description of the problem are analyzed and a response policy is obtained, thereby ensuring that the response addresses the user's problem, avoiding repeatedly inquiring about the user's needs, and improving user satisfaction.

Description

An intelligent interaction method, system and terminal based on deep learning

Technical Field
This application relates to the field of artificial intelligence interaction technology, and in particular to an intelligent interaction method, system and terminal based on deep learning.
Background Art
Artificial intelligence is a branch of computer science that seeks to understand the essence of intelligence and to build intelligent machines that can respond in ways similar to human intelligence. Research in this field includes robotics, speech recognition, image recognition, natural language processing, expert systems, and more. Since the birth of artificial intelligence, its theory and technology have matured steadily and its fields of application have kept expanding; it is conceivable that the technological products it brings will one day serve as "containers" of human wisdom. Artificial intelligence can simulate the information processes of human consciousness and thinking. It is not human intelligence, but it can think as humans do and may even surpass human intelligence.
In particular, speech recognition and natural language processing are widely used in smart terminals and online customer service in the service industry, for example by operators such as China Mobile, China Unicom and China Telecom, as well as in government service hotlines. Artificial-intelligence dialogue in traditional systems generally relies on a fixed dialogue template: when a user connects, the intelligent customer service uses guiding prompts to steer the user into phrasing the request in templated language, and after identifying the request it gives the corresponding response.
Although traditional intelligent customer service can perform basic speech recognition, if the user asks in a dialect or does not phrase the inquiry in the templated language, the intelligent customer service falls into an endless loop, repeatedly asking for the user's needs, which lowers user satisfaction.
Summary of the Invention
To solve the above technical problems, this application proposes the following technical solutions:
In a first aspect, an embodiment of the present application provides a deep learning-based intelligent interaction method, including: acquiring voice feature information of an accessing user; inputting the voice feature information into a trained deep learning neural network to determine a response strategy; and answering the user according to the response strategy.
With this implementation, the intelligent customer service abandons the traditional templated language during a session and lets the user state the request first. The utterance describing the request is then analyzed to obtain a response strategy, ensuring that the reply addresses the user's request; the user's needs need not be asked for repeatedly, which improves user satisfaction.
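The three-step loop described above (acquire voice features, let a trained network pick a strategy, answer accordingly) can be outlined as below. The application does not specify the speech front end, network, or response database, so `extract_features`, `choose_strategy`, and `respond` are placeholder stand-ins invented for this sketch.

```python
# Illustrative skeleton of the claimed three-step interaction loop.
# All three components are toy stand-ins for unspecified parts of the
# system (speech front end, trained network, response database).

def extract_features(audio):
    """Stand-in for the speech front end: yields semantics plus intonation."""
    return {"semantics": audio["text"], "intonation": audio["tone"]}

def choose_strategy(features):
    """Stand-in for the trained network: routes on the intonation meaning."""
    if features["intonation"] == "anxious":
        return ("transfer_to_human", None)
    return ("auto_reply", features["semantics"])

def respond(strategy, response_db):
    """Answer per the strategy: canned reply, or hand off to a human agent."""
    action, key = strategy
    if action == "transfer_to_human":
        return "Transferring you to a human agent."
    return response_db.get(key, "Could you describe your request?")

def interact(audio, response_db):
    # S101 -> S102 -> S103 in sequence
    return respond(choose_strategy(extract_features(audio)), response_db)
```

The key property of the claimed method survives even in this toy form: the user speaks first, and the strategy is derived from that utterance rather than from a guided template.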
With reference to the first aspect, in a first possible implementation of the first aspect, acquiring the voice feature information of the accessing user includes: matching the language of the user's voice to determine language information; and determining the semantics and intonation meaning of the voice according to the language information and the corresponding language database.
With reference to the first possible implementation of the first aspect, in a second possible implementation of the first aspect, determining the semantics and intonation meaning of the voice according to the language information and the corresponding language database includes: determining each word in the user's sentence according to the language database and the voiceprint information of the voice; combining the determined words and then performing part-of-speech division to determine the semantics of the user's voice; and determining the user's intonation meaning by combining the voice's intonation with the intonation feature information of the current language.
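The word-combination and part-of-speech step above could look like the following greedy dictionary-based sketch. The tiny lexicon and the longest-match strategy are invented for illustration; the application does not specify how words are combined or tagged.

```python
# Illustrative greedy longest-match combination of recognized characters
# into words, with a toy part-of-speech lexicon (invented entries).

LEXICON = {"话费": "noun", "查询": "verb", "我": "pronoun", "想": "verb"}

def combine_words(chars):
    """Merge per-character recognition output into the longest lexicon words,
    tagging each resulting word with its part of speech."""
    words, i = [], 0
    while i < len(chars):
        # try the longest candidate first, fall back to a single character
        for j in range(len(chars), i, -1):
            cand = "".join(chars[i:j])
            if cand in LEXICON or j == i + 1:
                words.append((cand, LEXICON.get(cand, "unknown")))
                i = j
                break
    return words
```

For the utterance 我想查询话费 ("I want to check my phone bill"), the sketch merges 查 and 询 into the verb 查询 and 话 and 费 into the noun 话费, which is the "combine then divide by part of speech" behavior the implementation describes.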
With reference to the second possible implementation of the first aspect, in a third possible implementation of the first aspect, inputting the speech feature information into the trained deep learning neural network to obtain a response strategy includes: the deep learning neural network determines the user's emotional characteristics according to the intonation meaning; if the emotional characteristics indicate that the user is emotionally stable, a corresponding response utterance is selected from the response database according to the semantics of the user's voice; or, if the emotional characteristics indicate that the user is emotionally anxious, the call is transferred to manual service.
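The "select a response utterance by semantics" branch might be modeled as a keyword-overlap lookup against the response database, as sketched below. The sample entries and the overlap scoring are toy assumptions; the application leaves the selection mechanism unspecified.

```python
# Illustrative response selection: pick the database entry whose keywords
# best overlap the user's recognized semantics (entries are invented).

RESPONSES = {
    frozenset({"bill", "query"}): "Your current balance is shown in the app.",
    frozenset({"network", "slow"}): "Let me run a line check for you.",
}

def select_response(semantic_tokens):
    """Return the canned reply with the largest keyword overlap,
    or a clarifying question when nothing matches at all."""
    tokens = set(semantic_tokens)
    best = max(RESPONSES, key=lambda keywords: len(keywords & tokens))
    if not (best & tokens):
        return "Could you describe your request in more detail?"
    return RESPONSES[best]
```

The zero-overlap fallback matters: rather than looping on "please restate your need", a real system would want a single clarifying prompt, which is the satisfaction problem the background section describes.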
With reference to the third possible implementation of the first aspect, in a fourth possible implementation of the first aspect, if all agents are busy when transferring to a manual agent, a relay intelligent customer service is temporarily established; the relay intelligent customer service imitates the state of a manual-agent connection, and when a manual agent becomes idle the call is switched directly to that agent.
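The busy-agent fallback above, where a temporary relay bot holds the caller's place and hands the call to the first agent that frees up, might be modeled as follows. The class and method names are invented for the sketch; the application does not describe the queueing mechanics.

```python
# Illustrative model of the relay ("中转") customer service: when every
# human agent is busy, a placeholder bot holds the session and hands it
# to the first agent that becomes free.
from collections import deque

class Switchboard:
    def __init__(self, agents):
        self.free_agents = deque(agents)
        self.waiting = deque()  # sessions currently held by the relay bot

    def transfer(self, session):
        """Route a session to a free agent, or park it with the relay bot."""
        if self.free_agents:
            return ("human", self.free_agents.popleft())
        self.waiting.append(session)  # relay bot takes over the session
        return ("relay_bot", session)

    def agent_freed(self, agent):
        """A human agent became idle: hand over the oldest held session."""
        if self.waiting:
            return ("human", agent, self.waiting.popleft())
        self.free_agents.append(agent)
        return None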
In a second aspect, an embodiment of the present application provides a deep learning-based intelligent interaction system, including: an acquisition module for acquiring voice feature information of an accessing user; a determination module for inputting the voice feature information into a trained deep learning neural network to determine a response strategy; and a response module for answering the user according to the response strategy.
With reference to the second aspect, in a first possible implementation of the second aspect, the acquisition module includes: a first determining unit, configured to match the language of the user's voice and determine language information; and a second determining unit, configured to determine the semantics and intonation meaning of the voice according to the language information and the corresponding language database.
With reference to the first possible implementation of the second aspect, in a second possible implementation of the second aspect, the second determining unit includes: a first determining subunit, configured to determine each word in the user's sentence according to the language database and voiceprint information; a second determining subunit, configured to combine the determined words and then perform part-of-speech division to determine the semantics of the user's voice; and a third determining subunit, configured to determine the user's intonation meaning by combining the voice's intonation with the intonation feature information of the current language.
With reference to the second possible implementation of the second aspect, in a third possible implementation of the second aspect, the determination module includes: a third determining unit, used by the deep learning neural network to determine the user's emotional characteristics according to the intonation meaning; and a processing unit, configured to select a corresponding response utterance from the response database according to the semantics of the user's voice if the emotional characteristics indicate that the user is emotionally stable, or to transfer the call to manual service if the emotional characteristics indicate that the user is emotionally anxious.
In a third aspect, an embodiment of the present application provides a terminal, including: a processor; and a memory for storing computer-executable instructions; when the processor executes the computer-executable instructions, the processor performs the method of the first aspect or any possible implementation of the first aspect, realizing intelligent voice interaction.
附图说明Description of drawings
图1为本申请实施例提供的一种基于深度学习的智能交互方法的流程示意图;FIG. 1 is a schematic flow diagram of a deep learning-based intelligent interaction method provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of a deep learning-based intelligent interaction system provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of a terminal provided by an embodiment of the present application.
Detailed Description
The solution is described below with reference to the accompanying drawings and specific implementations.
FIG. 1 is a schematic flowchart of a deep learning-based intelligent interaction method provided by an embodiment of the present application. Referring to FIG. 1, the deep learning-based intelligent interaction method provided by the embodiment of the present application includes:
S101: Acquire voice feature information of an accessing user.
Intelligent voice interaction in conventional systems generally supports communication only in a fixed set of languages, for example mobile-operator hotlines or public-service hotlines. Accessing users are typically required to state their requests in Mandarin, and the intelligent customer service determines the response content based on analysis of the user's speech. However, if the user speaks non-Mandarin or a language outside the fixed set, the intelligent customer service cannot respond.
For the above reasons, in the embodiments of the present application, after the user's speech is received, the language of the speech is first matched to determine the language information. To realize this function, databases of multiple languages and pronunciation databases of regional dialects need to be accessed. Once the corresponding language information is matched, the semantics and the intonation meaning of the user's speech are determined in combination with the corresponding language library. Clearly, the semantics capture what the user means, while the intonation meaning captures the tone and mood with which the user speaks.
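The language-matching step above can be sketched as follows. This is an illustrative assumption only: the scoring function, feature sets, and library names are invented for the sketch, since the patent does not specify a concrete matching algorithm.

```python
# Hypothetical sketch of matching a user's speech against per-language
# pronunciation libraries. The overlap score stands in for whatever
# acoustic matching the embodiment actually uses.

def match_language(utterance_features, language_libraries):
    """Return the language/dialect label whose library best matches
    the observed pronunciation features of the utterance."""
    def score(lib):
        # toy score: fraction of the library's features seen in the utterance
        return len(utterance_features & lib) / max(len(lib), 1)
    return max(language_libraries, key=lambda lang: score(language_libraries[lang]))

# illustrative pronunciation libraries
libraries = {
    "mandarin": {"ma", "ni", "hao", "cha"},
    "sichuan_dialect": {"ma", "nyi", "hao", "za"},
}

print(match_language({"ni", "hao", "cha"}, libraries))  # → mandarin
```

In practice the matching would operate on acoustic features rather than symbolic tokens, but the control flow — score every library, pick the best, then hand the chosen library to the semantic stage — is what the paragraph describes.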
In this embodiment, to determine the semantics and intonation meaning of the user's speech, each character in the user's utterance is first identified according to the language library and the voiceprint information; the identified characters are then combined and segmented by part of speech to determine the semantics of the user's speech. When determining the semantics, the characters must be segmented accurately according to the characteristics of the corresponding language, so that the semantics match the meaning the user intends to express. After the semantics of the user's speech are determined, the user's intonation meaning is determined by combining the speech intonation with the intonation feature information of the current language. In this embodiment, determining the meaning of the user's intonation is particularly important, because the intonation meaning reveals the user's current emotional state. Taking Mandarin as an example, if the user is agitated or anxious, the speech will exhibit intonation features such as a fast speaking rate or a loud voice. In some languages, however, fast speech and a loud voice are normal intonation features of the language itself, and the emotional state must then be determined from other cues.
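The two sub-steps above — combining recognized characters into words, then judging intonation against the current language's own baseline — can be sketched as below. The greedy segmenter, the thresholds, and the baseline values are assumptions for illustration, not details from the patent.

```python
# Sketch of the semantic/intonation stage: characters are combined into
# words using the language library, and prosody is compared with the
# language's normal intonation (a fast, loud delivery is normal in some
# languages, as the text notes). All data is illustrative.

def segment(chars, word_library):
    """Greedy longest-match combination of recognized characters into words."""
    words, i = [], 0
    while i < len(chars):
        for j in range(len(chars), i, -1):  # try the longest candidate first
            cand = "".join(chars[i:j])
            if cand in word_library or j == i + 1:
                words.append(cand)
                i = j
                break
    return words

def intonation_meaning(rate, volume, baseline_rate, baseline_volume):
    """Judge tone relative to the current language's baseline prosody."""
    if rate > 1.5 * baseline_rate or volume > 1.3 * baseline_volume:
        return "agitated"
    return "calm"

print(segment(list("查询话费"), {"查询", "话费"}))  # → ['查询', '话费']
print(intonation_meaning(6.5, 60, 4.0, 65))        # fast vs. the baseline
```

The key design point mirrored from the text is that `intonation_meaning` takes per-language baselines as inputs rather than using fixed global thresholds.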
S102: Input the voice feature information into a trained deep learning neural network to determine a response strategy.
After the semantics and intonation meaning of the user's speech are determined in S101, they are input into the trained deep learning neural network, which first determines the user's emotional characteristics from the intonation meaning. If the emotional characteristics indicate that the user is emotionally stable, a corresponding response utterance is selected from the response database according to the semantics of the user's speech. If the emotional characteristics indicate that the user is anxious, however, the call is transferred to a human agent; in that case, interacting with the user through the intelligent customer service might fail to resolve the user's request and could even cause dissatisfaction.
For example, a user raising a request may be in a hurry, as is often the case with complaints. If an artificial-intelligence customer service repeatedly asks "What is your complaint about?", as current systems do, the user will become dissatisfied. If such users are instead transferred directly to a human agent, the agent can provide targeted, personalized service and resolve the user's request to the greatest extent possible.
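The branching in S102 can be sketched as a small decision function. The response database entries and the fallback utterance are invented for illustration; in the embodiment, the emotion label would come from the trained network rather than being passed in directly.

```python
# Minimal sketch of the S102 response strategy, with a stand-in emotion
# label in place of the patent's trained deep learning network.

RESPONSE_DB = {"查询话费": "您本月话费为 50 元。"}  # illustrative entries

def decide_strategy(emotion, semantics):
    """Stable users get an automated reply; anxious users are escalated."""
    if emotion == "stable":
        reply = RESPONSE_DB.get(semantics, "抱歉，请您再描述一下问题。")
        return ("auto_reply", reply)
    return ("transfer_to_human", None)

print(decide_strategy("stable", "查询话费"))
print(decide_strategy("anxious", "投诉"))  # complaint + anxiety → human agent
```

This makes the asymmetry explicit: semantics only matter on the automated path, while the anxious path escalates unconditionally, which is the behavior the complaint example argues for.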
S103: Respond to the user according to the response strategy.
According to the response strategy determined in S102, if the intelligent customer service is used, a corresponding response utterance is retrieved from the database according to the semantics of the user's speech to answer the user. If a transfer is required, the service is provided by a human agent.
It should be noted that if all human agents are busy when the call is transferred, a temporary relay intelligent customer service is set up. The relay imitates the state of a connected human agent, and as soon as a human agent becomes free, the call is switched directly to that agent.
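The relay behavior can be sketched with a simple queue model. The class, its method names, and the FIFO hand-over policy are assumptions made for the sketch; the patent only specifies that the relay holds the call and hands it over when an agent frees up.

```python
# Hedged sketch of the relay described above: when no agent is free, a
# stand-in bot parks the call and hands it over once an agent frees up.

from collections import deque

class CallCenter:
    def __init__(self, free_agents):
        self.free_agents = deque(free_agents)
        self.held_by_relay = deque()  # calls parked with the relay bot

    def transfer(self, call_id):
        """Route a call to a free agent, or park it with the relay."""
        if self.free_agents:
            return ("human", self.free_agents.popleft(), call_id)
        self.held_by_relay.append(call_id)
        return ("relay", None, call_id)

    def agent_freed(self, agent):
        """Give the longest-waiting relayed call to the freed agent first."""
        if self.held_by_relay:
            return ("human", agent, self.held_by_relay.popleft())
        self.free_agents.append(agent)
        return None

cc = CallCenter(free_agents=["agent-1"])
print(cc.transfer("call-A"))      # agent available → human
print(cc.transfer("call-B"))      # agents busy → relay holds the call
print(cc.agent_freed("agent-1"))  # freed agent takes the parked call
```

From the caller's perspective the relay stage is invisible: the call is "connected" immediately, and the later switch to the freed agent happens without re-queuing.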
Corresponding to the deep learning-based intelligent interaction method provided in the above embodiments, the present application further provides an embodiment of a deep learning-based intelligent interaction system. Referring to FIG. 2, the deep learning-based intelligent interaction system 20 includes an acquisition module 201, a determination module 202, and a response module 203.
The acquisition module 201 is configured to acquire voice feature information of an accessing user. The determination module 202 is configured to input the voice feature information into a trained deep learning neural network to determine a response strategy. The response module 203 is configured to respond to the user according to the response strategy.
In this embodiment, the acquisition module 201 includes a first determination unit and a second determination unit. The first determination unit is configured to match the language of the user's speech and determine the language information; the second determination unit is configured to determine the semantics and intonation meaning of the speech according to the language information and the corresponding language library.
Further, the second determination unit includes a first determination subunit, a second determination subunit, and a third determination subunit. The first determination subunit is configured to determine each character in the user's utterance according to the language library and the voiceprint information. The second determination subunit is configured to combine the determined characters and then perform part-of-speech segmentation to determine the semantics of the user's speech. The third determination subunit is configured to determine the user's intonation meaning by combining the speech intonation with the intonation feature information of the current language.
The determination module 202 includes a third determination unit and a processing unit. The third determination unit is configured for the deep learning neural network to determine the user's emotional characteristics according to the intonation meaning. The processing unit is configured to select a corresponding response utterance from the response database according to the semantics of the user's speech if the emotional characteristics indicate that the user is emotionally stable, or to transfer the call to a human agent if the emotional characteristics indicate that the user is anxious.
The present application further provides an embodiment of a terminal. Referring to FIG. 3, the terminal 30 includes a processor 301, a memory 302, and a communication interface 303.
In FIG. 3, the processor 301, the memory 302, and the communication interface 303 may be connected to one another through a bus; the bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is drawn in FIG. 3, but this does not mean that there is only one bus or one type of bus.
The processor 301 generally controls the overall functions of the terminal 30, for example starting the terminal 30 and, after startup, acquiring the voice feature information of the accessing user, inputting the voice feature information into the trained deep learning neural network to determine the response strategy, and responding to the user according to the response strategy.
The processor 301 may be a general-purpose processor, for example a central processing unit (CPU), a network processor (NP), or a combination of a CPU and an NP. The processor may also be a microcontroller unit (MCU). The processor may further include a hardware chip, which may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), or the like.
The memory 302 is configured to store computer-executable instructions to support the operation of the terminal 30. The memory 302 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random-access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or a magnetic or optical disk.
After the terminal 30 is started, the processor 301 and the memory 302 are powered on, and the processor 301 reads and executes the computer-executable instructions stored in the memory 302 to complete all or part of the steps of the deep learning-based intelligent interaction method embodiments described above.
The communication interface 303 is used by the terminal 30 to transmit data, for example to communicate with network devices and servers. The communication interface 303 includes a wired communication interface and may further include a wireless communication interface. The wired communication interface includes a USB interface or a Micro USB interface and may also include an Ethernet interface. The wireless communication interface may be a WLAN interface, a cellular network communication interface, or a combination thereof.
In an exemplary embodiment, the terminal 30 provided by the embodiments of the present application further includes a power supply component, which supplies power to the various components of the terminal 30. The power supply component may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the terminal 30.
The terminal 30 may also include a communication component configured to facilitate wired or wireless communication between the terminal 30 and other devices. The terminal 30 can access a wireless network based on a communication standard, such as WiFi, 4G, or 5G, or a combination thereof. The communication component receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. The communication component further includes a near-field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio-frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, or other technologies.
In an exemplary embodiment, the terminal 30 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), or other electronic components.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.

Claims (10)

  1. A deep learning-based intelligent interaction method, characterized by comprising:
    acquiring voice feature information of an accessing user;
    inputting the voice feature information into a trained deep learning neural network to determine a response strategy; and
    responding to the user according to the response strategy.
  2. The deep learning-based intelligent interaction method according to claim 1, characterized in that acquiring voice feature information of an accessing user comprises:
    matching the language of the user's speech to determine language information; and
    determining the semantics and intonation meaning of the speech according to the language information and a corresponding language library.
  3. The deep learning-based intelligent interaction method according to claim 2, characterized in that determining the semantics and intonation meaning of the speech according to the language information and the corresponding language library comprises:
    determining each character in the user's utterance according to the language library and voiceprint information;
    combining the determined characters and then performing part-of-speech segmentation to determine the semantics of the user's speech; and
    determining the user's intonation meaning by combining the speech intonation with intonation feature information of the current language.
  4. The deep learning-based intelligent interaction method according to claim 3, characterized in that inputting the voice feature information into the trained deep learning neural network to obtain a response strategy comprises:
    determining, by the deep learning neural network, the user's emotional characteristics according to the intonation meaning; and
    if the emotional characteristics indicate that the user is emotionally stable, selecting a corresponding response utterance from a response database according to the semantics of the user's speech;
    or, if the emotional characteristics indicate that the user is anxious, transferring the call to a human agent.
  5. The deep learning-based intelligent interaction method according to claim 4, characterized in that if all human agents are busy when the call is transferred, a temporary relay intelligent customer service is established, wherein the relay imitates the state of a connected human agent, and when a human agent becomes free, the call is switched directly to the human agent.
  6. A deep learning-based intelligent interaction system, characterized by comprising:
    an acquisition module, configured to acquire voice feature information of an accessing user;
    a determination module, configured to input the voice feature information into a trained deep learning neural network to determine a response strategy; and
    a response module, configured to respond to the user according to the response strategy.
  7. The deep learning-based intelligent interaction system according to claim 6, characterized in that the acquisition module comprises:
    a first determination unit, configured to match the language of the user's speech and determine language information; and
    a second determination unit, configured to determine the semantics and intonation meaning of the speech according to the language information and a corresponding language library.
  8. The deep learning-based intelligent interaction system according to claim 7, characterized in that the second determination unit comprises:
    a first determination subunit, configured to determine each character in the user's utterance according to the language library and voiceprint information;
    a second determination subunit, configured to combine the determined characters and then perform part-of-speech segmentation to determine the semantics of the user's speech; and
    a third determination subunit, configured to determine the user's intonation meaning by combining the speech intonation with intonation feature information of the current language.
  9. The deep learning-based intelligent interaction system according to claim 8, characterized in that the determination module comprises:
    a third determination unit, configured for the deep learning neural network to determine the user's emotional characteristics according to the intonation meaning; and
    a processing unit, configured to select a corresponding response utterance from a response database according to the semantics of the user's speech if the emotional characteristics indicate that the user is emotionally stable,
    or to transfer the call to a human agent if the emotional characteristics indicate that the user is anxious.
  10. A terminal, characterized by comprising:
    a processor; and
    a memory configured to store computer-executable instructions,
    wherein, when the processor executes the computer-executable instructions, the processor performs the method according to any one of claims 1 to 5 to realize intelligent voice interaction.
PCT/CN2021/136927 2021-12-03 2021-12-10 Deep learning-based intelligent human-computer interaction method and system, and terminal WO2023097745A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111464680.9A CN114240454A (en) 2021-12-03 2021-12-03 Intelligent interaction method, system and terminal based on deep learning
CN202111464680.9 2021-12-03

Publications (1)

Publication Number Publication Date
WO2023097745A1

Family

ID=80752869

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/136927 WO2023097745A1 (en) 2021-12-03 2021-12-10 Deep learning-based intelligent human-computer interaction method and system, and terminal

Country Status (2)

Country Link
CN (1) CN114240454A (en)
WO (1) WO2023097745A1 (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106202301A (en) * 2016-07-01 2016-12-07 武汉泰迪智慧科技有限公司 A kind of intelligent response system based on degree of depth study
CN108090218A (en) * 2017-12-29 2018-05-29 北京百度网讯科技有限公司 Conversational system generation method and device based on deeply study
US20180308487A1 (en) * 2017-04-21 2018-10-25 Go-Vivace Inc. Dialogue System Incorporating Unique Speech to Text Conversion Method for Meaningful Dialogue Response
CN110149450A (en) * 2019-05-22 2019-08-20 欧冶云商股份有限公司 Intelligent customer service answer method and system
CN110427472A (en) * 2019-08-02 2019-11-08 深圳追一科技有限公司 The matched method, apparatus of intelligent customer service, terminal device and storage medium
CN111739516A (en) * 2020-06-19 2020-10-02 中国—东盟信息港股份有限公司 Speech recognition system for intelligent customer service call
CN112148849A (en) * 2020-09-08 2020-12-29 北京百度网讯科技有限公司 Dynamic interaction method, server, electronic device and storage medium
CN112148850A (en) * 2020-09-08 2020-12-29 北京百度网讯科技有限公司 Dynamic interaction method, server, electronic device and storage medium
JP2021022928A (en) * 2019-07-24 2021-02-18 ネイバー コーポレーションNAVER Corporation Artificial intelligence-based automatic response method and system

Also Published As

Publication number Publication date
CN114240454A (en) 2022-03-25


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21966170

Country of ref document: EP

Kind code of ref document: A1