WO2021082133A1 - Method for switching human-machine dialogue mode - Google Patents

Method for switching human-machine dialogue mode

Info

Publication number
WO2021082133A1
Authority
WO
WIPO (PCT)
Prior art keywords
dialogue
user
sentence
current user
current
Prior art date
Application number
PCT/CN2019/120617
Other languages
English (en)
French (fr)
Inventor
宋洪博
石韡斯
朱成亚
樊帅
Original Assignee
苏州思必驰信息科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州思必驰信息科技有限公司 filed Critical 苏州思必驰信息科技有限公司
Priority to US17/770,206 priority Critical patent/US20220399020A1/en
Priority to JP2022524252A priority patent/JP7413521B2/ja
Priority to EP19950263.4A priority patent/EP4054111A4/en
Publication of WO2021082133A1 publication Critical patent/WO2021082133A1/zh

Classifications

    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/222 Barge in, i.e. overridable guidance for interrupting prompts
    • G10L 17/00 Speaker identification or verification techniques
    • G10L 17/22 Interactive procedures; Man-machine interfaces
    • G10L 2015/223 Execution procedure of a spoken command
    • G10L 2015/225 Feedback of the input speech
    • G10L 2015/226 Procedures used during a speech recognition process, e.g. man-machine dialogue, using non-speech characteristics
    • G10L 2015/228 Procedures using non-speech characteristics of application context
    • H04L 5/16 Half-duplex systems; Simplex/duplex switching; Transmission of break signals non-automatically inverting the direction of transmission
    • H04Q 5/24 Selecting arrangements wherein two or more subscriber stations are connected by the same line to the exchange, for two-party-line systems
    • H04Q 2213/13175 Graphical user interface [GUI], WWW interface, visual indication

Definitions

  • This application relates to the technical field of man-machine dialogue, and in particular to a method for switching between man-machine dialogue modes.
  • In current man-machine dialogue (for example, a dialogue between a user and a smart speaker), either a full-duplex dialogue mode or a half-duplex dialogue mode is mostly adopted.
  • the full-duplex dialogue mode keeps the recording channel always open: recording stays on throughout the interaction, TTS broadcast and recording can proceed simultaneously, and upstream and downstream data are transmitted in both directions at the same time.
  • its advantage is that the interaction feels natural and no audio is lost; its disadvantage is that, with the recording always open, immature AEC (Acoustic Echo Cancellation) may let the TTS broadcast be recorded back in, causing misrecognition, falsely triggering dialogue state changes, and disrupting the dialogue flow.
  • the half-duplex dialogue mode performs no recording while voice is being broadcast, and upstream and downstream data are transmitted alternately.
  • its advantage is that, since nothing is recorded during TTS broadcast, false triggering of the dialogue flow by noise is prevented; its disadvantage is that the user must wait for the broadcast to finish before starting the next round of dialogue, so the interaction feels unnatural.
  • the embodiments of the present application provide a method and system for man-machine dialogue mode switching, which are used to solve at least one of the above technical problems.
  • an embodiment of the present application provides a method for switching a man-machine dialogue mode, including:
  • an embodiment of the present application provides a human-machine dialogue mode switching system, including:
  • a voice receiving module, used to receive the current user sentence spoken by the current user;
  • a dialogue domain determination module, used to determine whether the dialogue domain to which the current user sentence belongs is a preset dialogue domain;
  • a dialogue mode switching module, used to switch the current dialogue mode to the full-duplex dialogue mode when the dialogue domain to which the current user sentence belongs is determined to be the preset dialogue domain, and to switch the current dialogue mode to the half-duplex dialogue mode when it is determined not to be the preset dialogue domain.
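  The three modules above can be sketched as a minimal Python fragment. This is an illustrative reading of the claimed rule, not the patent's implementation: the domain names and the keyword-based domain detector are assumptions for the demo.

```python
# Hypothetical preset (full-duplex) dialogue domains for the demo.
FULL_DUPLEX_DOMAINS = {"navigation", "idiom_chain"}

def detect_domain(sentence):
    """Stand-in for the NLU step that maps a sentence to a dialogue domain."""
    if "navigate" in sentence:
        return "navigation"
    if "weather" in sentence:
        return "weather_query"
    return "chitchat"

def switch_mode(sentence):
    """Preset domain -> full-duplex mode; any other domain -> half-duplex mode."""
    domain = detect_domain(sentence)
    return "full-duplex" if domain in FULL_DUPLEX_DOMAINS else "half-duplex"
```

  If the current mode already matches the target mode, a real client would simply keep it, as the embodiments below describe.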
  • an embodiment of the present application provides a storage medium in which one or more programs including execution instructions are stored.
  • the execution instructions can be read and executed by an electronic device (including but not limited to a computer, a server, or a network device) to implement any of the above-mentioned man-machine dialogue mode switching methods of this application.
  • an electronic device, which includes: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor so that the at least one processor can perform any of the above-mentioned man-machine dialogue mode switching methods of this application.
  • the embodiments of the present application also provide a computer program product, the computer program product includes a computer program stored on a storage medium, the computer program includes program instructions, when the program instructions are executed by a computer, The computer executes any one of the above-mentioned man-machine dialogue mode switching methods.
  • the beneficial effect of the embodiments of the present application is that the dialogue mode is switched by judging whether the dialogue domain to which the current user sentence belongs is the preset dialogue domain, so that the dialogue mode can be automatically switched and adjusted as the dialogue domain changes, keeping the man-machine dialogue in the most suitable mode at all times and allowing it to proceed smoothly.
  • FIG. 1 is a flowchart of an embodiment of a method for switching between man-machine dialogue modes of the present application
  • FIG. 2 is a flowchart of another embodiment of a method for switching a man-machine dialogue mode according to the present application
  • FIG. 3 is a flowchart of another embodiment of the method for switching between man-machine dialogue modes of the present application
  • FIG. 4 is a functional block diagram of an embodiment of the human-machine dialogue mode switching system of this application.
  • FIG. 5 is a schematic structural diagram of an embodiment of the electronic device of this application.
  • This application may be described in the general context of computer-executable instructions executed by a computer, such as a program module.
  • program modules include routines, programs, objects, elements, data structures, etc. that perform specific tasks or implement specific abstract data types.
  • This application can also be practiced in distributed computing environments. In these distributed computing environments, tasks are performed by remote processing devices connected through a communication network.
  • program modules can be located in local and remote computer storage media including storage devices.
  • "module" refers to an entity applied to a computer, such as hardware, a combination of hardware and software, software, or software in execution.
  • an element may be, but is not limited to, a process running on a processor, a processor, an object, an executable, an execution thread, a program, and/or a computer.
  • both an application program or script running on a server and the server itself can be components.
  • one or more elements may reside within a process and/or thread of execution; an element may be localized on one computer and/or distributed between two or more computers, and may run from various computer-readable media.
  • a component may also communicate by way of local and/or remote processes based on a signal having one or more data packets, for example a signal from data interacting with another component in a local system or a distributed system, or interacting with other systems through a signal over a network such as the Internet.
  • the embodiment of the present application provides a method for switching between man-machine dialogue modes.
  • the method can be applied to an electronic device equipped with a man-machine dialogue system.
  • the electronic device may be a smart speaker, a smart phone, a smart robot, etc. This application does not limit this.
  • the following uses a smart speaker as an example to illustrate the method for switching between man-machine dialogue modes of the present application, and the method includes:
  • the default dialogue mode, or the dialogue mode suited to the current user, is activated, the user's voice signal is monitored, and when the current user sentence spoken by the current user is detected, the current user sentence is recognized.
  • S12 Determine whether the dialog domain to which the current user sentence belongs is a preset dialog domain.
  • the smart speaker obtains the text content corresponding to the current user sentence, determines the dialogue domain to which the current user sentence belongs according to the text content, and further determines whether the dialogue domain to which it belongs is a preset dialogue domain.
  • different dialogue domains correspond to different dialogue scenarios; alternatively, dialogue domains correspond to skills in the smart speaker, each skill belonging to one dialogue domain. For example, an idiom-chain skill, a navigation skill, a weather query skill, and a ticket booking skill belong to different dialogue domains.
  • if the dialogue domain is the preset dialogue domain and the current dialogue mode of the smart speaker is already the full-duplex dialogue mode, the current dialogue mode can be maintained; if the current dialogue mode of the smart speaker is the half-duplex dialogue mode, it is switched to the full-duplex dialogue mode.
  • conversely, if the dialogue domain is not the preset dialogue domain and the current dialogue mode of the smart speaker is already the half-duplex dialogue mode, the current dialogue mode can be maintained; if the current dialogue mode of the smart speaker is the full-duplex dialogue mode, it is switched to the half-duplex dialogue mode.
  • the dialogue mode is switched by judging whether the dialogue domain to which the current user sentence belongs is the preset dialogue domain, so that the dialogue mode can be automatically switched and adjusted as the dialogue domain changes, keeping the man-machine dialogue in the most suitable mode at all times and allowing it to proceed smoothly.
  • when the dialogue domain to which the current user sentence belongs is determined to be a half-duplex dialogue domain, the current dialogue mode is switched to the half-duplex dialogue mode; otherwise the current dialogue mode is maintained (the current dialogue mode may be either the half-duplex dialogue mode or the full-duplex dialogue mode).
  • the half-duplex dialog field is a pre-configured designated dialog field.
  • a flowchart of another embodiment of a method for switching a human-machine dialogue mode of this application includes the following steps:
  • the client keeps recording while TTS is broadcasting, that is, it starts the full-duplex dialogue mode.
  • when the client broadcasts a particularly important TTS message and the user should hear it in full without abnormal interruption, certain dialogue domains can be designated as half-duplex mode through cloud configuration.
  • the cloud sends a message to the client, the client adaptively changes to the half-duplex mode, and recording stops during the TTS broadcast, so that noise input cannot disturb the conversation state.
  • for example, when the user is on a screenless audio device and has not finished listening to the TTS broadcast, the user does not know what command can be spoken next. In that case the TTS broadcast carries very important information: if it is interrupted before the user has finished listening, the user will not know what to say afterwards.
  • the following example is a dialogue between user U and machine M:
  • the method when in the full-duplex dialogue mode, the method further includes:
  • the held response corresponds to the reply content of the previous user sentence.
  • this embodiment adaptively restricts jumps between dialogue domains according to the dialogue context. Restricting domain jumps avoids the interaction interference caused by domain switching: in a task-oriented multi-round dialogue scenario, switching the dialogue domain clears the previous dialogue context.
  • the dialogue between user U and machine M is as follows:
  • the detected "call" may be a misrecognition caused by surrounding noise.
  • if the system responds to that input, it switches the dialogue domain, interrupting the navigation task and entering the phone domain.
  • the previously entered contextual information is then cleared, and returning to the navigation domain requires re-entering the navigation information, which degrades the interactive experience.
  • with the human-machine dialogue mode switching method of this embodiment, whether to respond to a new user sentence can be decided according to whether the dialogue domain of the new user sentence is the same as that of the current multi-round dialogue, so that misrecognition caused by surrounding noise is avoided and the current multi-round dialogue task can be completed successfully.
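  The domain-jump restriction above can be read as: while a multi-round task is active, a sentence from a different domain is not answered immediately, and its reply is held instead. A minimal sketch under that assumption (class and method names are illustrative):

```python
class DialogueManager:
    """Restricts domain jumps while a multi-round task is active: replies to
    off-domain sentences are held rather than spoken, so a misrecognized
    noise input cannot clear the task context."""

    def __init__(self, active_domain):
        self.active_domain = active_domain
        self.held_replies = []  # replies withheld during the active task

    def handle(self, sentence_domain, reply):
        if sentence_domain == self.active_domain:
            return reply                 # same domain: respond normally
        self.held_replies.append(reply)  # off-domain: hold the response
        return None                      # keep the current task running
```

  In the navigation example, a noise-triggered "call" input would be held, and the navigation context would survive intact.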
  • a flowchart of another embodiment of a method for switching a man-machine dialogue mode of this application includes the following steps:
  • the input hits a certain semantic domain (that is, the dialogue domain of the current multi-round dialogue);
  • the human-machine dialogue mode switching method of the present application in effect implements adaptive dialogue mode switching: domain jumps can be restricted according to the client state (for example, during a TTS broadcast), and after the client state changes (the TTS broadcast completes) the domain jump restriction is lifted. This reduces misrecognition caused by noise during the TTS broadcast.
  • the client uploads its state to the server in real time, and the server adaptively switches the dialogue state according to the client state and the dialogue context, which effectively rejects noisy input.
  • this application adaptively changes the dialogue mode according to the dialogue scene and the state of the client, enabling the appropriate dialogue mode in each scene to meet the needs of different scenes.
  • the human-machine dialogue mode switching method of the present application further includes:
  • the stored reply content of the previous user sentence is acquired and presented to the user.
  • this embodiment considers that, although a new user sentence belongs to a different dialogue domain from the previous user sentence in the multi-round dialogue, the current user may indeed urgently want to end the current multi-round conversation and open a dialogue in another domain.
  • when the new user sentence entered by the current user is filtered out by the system the first time and receives no response, the user usually tries entering it a second time. This embodiment takes that actual scenario into account, ensuring that the current user's real needs are met and improving the user experience.
  • because the reply content corresponding to the previous user sentence is saved, when the user wants to retrieve that previous reply, the result can be presented directly, without the user having to repeat the earlier rounds of dialogue to obtain it.
  • the user voice instruction includes returning to the previous task, the previous user sentence, or an answer to the last question of the previous round of dialogue.
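  The repeat-to-confirm behaviour above can be sketched as follows. The function below is a hypothetical reading: the first occurrence of an off-domain sentence is filtered, and a repetition releases the stored reply.

```python
def respond(history, sentence, held_reply):
    """First occurrence of an off-domain sentence is held; if the user
    repeats it, the request is treated as genuine and the stored reply
    (saved when the sentence was first held) is presented."""
    if sentence in history:
        return held_reply     # second attempt: release the held reply
    history.append(sentence)
    return None               # first attempt: filtered out, no response
```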
  • the preset dialogue domain is a dialogue domain that the current user has used more than a set threshold number of times; in this embodiment, the man-machine dialogue mode switching method further includes:
  • on the client (i.e., the smart speaker),
  • according to user behavior statistics, it is found that the user often uses a certain process after entering a certain domain.
  • when that process is broadcast by TTS, the full-duplex mode is turned on automatically.
  • the user can then speak in advance and move to the next step, without having to wait for the TTS broadcast to complete in each round of dialogue.
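  A usage-statistics rule like the one above might be sketched as follows; the threshold value is an assumption for the demo, not specified by the patent.

```python
USAGE_THRESHOLD = 3  # hypothetical threshold; the patent leaves this configurable

def preset_domains(usage_counts):
    """Domains the user has used more than the threshold number of times
    become that user's preset (full-duplex) domains."""
    return {domain for domain, n in usage_counts.items() if n > USAGE_THRESHOLD}
```

  A frequently used navigation skill would then run full-duplex, while a rarely used one stays half-duplex.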
  • determining whether the dialog domain to which the current user sentence belongs is a preset dialog domain includes:
  • the user characteristic information of the current user is acquired from the current user sentence; for example, the user characteristic information is the user's voiceprint information.
  • the searched preset dialogue field includes at least one specific dialogue field.
  • the same electronic device (for example, a smart speaker) may be used by multiple users;
  • for instance, a smart speaker used at home will be used by several family members,
  • and different users have different usage habits and different degrees of familiarity with different dialogue domains.
  • the smart speaker therefore needs to adopt different dialogue modes to better realize man-machine dialogue.
  • the user's voiceprint information is used to identify different users and determine the corresponding preset dialogue domains, so that whether the dialogue domain of the current user sentence is a preset dialogue domain can be judged accurately and a suitable dialogue mode is finally selected for the man-machine dialogue.
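  The per-user lookup described above can be sketched like this. The voiceprint IDs and table are illustrative assumptions; a real system would derive the ID from speaker-verification embeddings.

```python
# Hypothetical per-user preset-domain table keyed by a voiceprint ID.
PRESETS_BY_VOICEPRINT = {
    "vp_parent": {"navigation", "weather_query"},
    "vp_child": {"story_telling"},
}

def mode_for(voiceprint_id, domain):
    """Look up the speaker's own preset domains before deciding the mode;
    unknown speakers fall back to the safer half-duplex mode."""
    presets = PRESETS_BY_VOICEPRINT.get(voiceprint_id, set())
    return "full-duplex" if domain in presets else "half-duplex"
```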
  • before receiving the current user sentence spoken by the current user, the method further includes: detecting a wake-up word;
  • the half-duplex dialogue mode is turned on, and the user characteristic information is stored in the user characteristic information database.
  • This embodiment realizes the adaptive selection of the initial dialogue mode after the system is awakened.
  • the inventor found that a smart speaker or story machine always broadcasts preset introduction content, or guidance on how to use the device, when it is turned on. This is indeed very practical for new users, but for users already familiar with the smart speaker or story machine it feels redundant or even annoying.
  • the user characteristic information (for example, voiceprint information) is matched against the user characteristic information database.
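  The wake-up behaviour above suggests the following initial-mode selection; this is a hedged sketch of one plausible reading (a recognized user skips the intro and starts full-duplex, an unknown user gets the intro in half-duplex and is enrolled).

```python
def initial_mode(voiceprint_id, known_users):
    """On wake-up: a recognized returning user starts in full-duplex with no
    intro broadcast; an unknown user hears the intro in half-duplex mode and
    is then stored in the user-characteristic database."""
    if voiceprint_id in known_users:
        return "full-duplex", False   # (mode, play_intro_broadcast)
    known_users.add(voiceprint_id)    # enroll the new user's characteristics
    return "half-duplex", True
```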
  • the present application also provides an electronic device, which includes: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor,
  • the instructions being executed by the at least one processor to enable the at least one processor to execute:
  • the at least one processor is further configured to: when in a full-duplex dialogue mode,
  • the held response corresponds to the reply content of the previous user sentence.
  • the at least one processor is further configured to:
  • the stored reply content of the previous user sentence is acquired and presented to the user.
  • the preset dialogue domain is a dialogue domain that the current user has used more than a set threshold number of times;
  • the at least one processor is also configured to:
  • determining whether the dialog domain to which the current user sentence belongs is a preset dialog domain includes:
  • the at least one processor is further configured to: before the receiving the current user sentence spoken by the current user,
  • the half-duplex dialogue mode is turned on, and the user characteristic information is stored in the user characteristic information database.
  • an embodiment of the present application further provides a system 400 for switching a man-machine dialogue mode, including:
  • the voice receiving module 410 is used to receive the current user sentence spoken by the current user;
  • the dialog field determining module 420 is configured to determine whether the dialog field to which the current user sentence belongs is a preset dialog field
  • the dialogue mode switching module 430 is configured to switch the current dialogue mode to the full-duplex dialogue mode when the dialogue domain to which the current user sentence belongs is determined to be a preset dialogue domain, and to switch the current dialogue mode to the half-duplex dialogue mode when it is determined not to be the preset dialogue domain.
  • the man-machine dialogue mode switching system when in the full-duplex dialogue mode, is further configured to:
  • the held response corresponds to the reply content of the previous user sentence.
  • the human-machine dialogue mode switching system is further configured to:
  • the stored reply content of the previous user sentence is acquired and presented to the user.
  • the preset dialogue domain is a dialogue domain that the current user has used more than a set threshold number of times; the man-machine dialogue mode switching system is further configured to:
  • determining whether the dialog domain to which the current user sentence belongs is a preset dialog domain includes:
  • the man-machine dialogue mode switching system is further configured to perform the following steps before receiving the current user sentence spoken by the current user:
  • the half-duplex dialogue mode is turned on, and the user characteristic information is stored in the user characteristic information database.
  • the user characteristic information is voiceprint information of the user.
  • the embodiments of the present application provide a non-volatile computer-readable storage medium storing one or more programs including execution instructions, which can be read and executed by an electronic device (including but not limited to a computer, server, or network device) to implement any of the above-mentioned man-machine dialogue mode switching methods of this application.
  • the embodiments of the present application also provide a computer program product.
  • the computer program product includes a computer program stored on a non-volatile computer-readable storage medium.
  • the computer program includes program instructions. When the program instructions are executed by a computer, the computer executes any one of the above-mentioned man-machine dialogue mode switching methods.
  • the embodiments of the present application further provide an electronic device, which includes: at least one processor, and a memory communicatively connected with the at least one processor, wherein the memory stores instructions executable by the at least one processor,
  • the instructions being executed by the at least one processor so that the at least one processor can execute the man-machine dialogue mode switching method.
  • the embodiments of the present application further provide a storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, a method for switching between man-machine dialogue modes is implemented.
  • the above-mentioned man-machine dialogue mode switching system of the embodiments of the present application can be used to implement the man-machine dialogue mode switching method of the embodiments, and correspondingly achieves the technical effects of that method; details are not repeated here.
  • a hardware processor (hardware processor) may be used to implement related functional modules.
  • FIG. 5 is a schematic diagram of the hardware structure of an electronic device for performing a method for switching between man-machine dialogue modes according to another embodiment of the present application. As shown in FIG. 5, the device includes:
  • One or more processors 510 and a memory 520 are taken as an example in FIG. 5.
  • the device for performing the human-machine dialogue mode switching method may further include: an input device 530 and an output device 540.
  • the processor 510, the memory 520, the input device 530, and the output device 540 may be connected by a bus or in other ways. In FIG. 5, the connection by a bus is taken as an example.
  • the memory 520 can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, as in the man-machine dialogue mode switching method in the embodiment of the present application.
  • the processor 510 executes various functional applications and data processing of the server by running non-volatile software programs, instructions, and modules stored in the memory 520, that is, implements the man-machine dialogue mode switching method in the foregoing method embodiment.
  • the memory 520 may include a program storage area and a data storage area.
  • the program storage area may store an operating system and an application program required by at least one function; the data storage area may store data created according to the use of the man-machine dialogue mode switching device, etc.
  • the memory 520 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage devices.
  • the memory 520 may optionally include a memory remotely provided with respect to the processor 510, and these remote memories may be connected to the man-machine dialogue mode switching device via a network. Examples of the aforementioned networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
  • the input device 530 can receive inputted digital or character information, and generate signals related to user settings and function control of the man-machine dialogue mode switching device.
  • the output device 540 may include a display device such as a display screen.
  • the one or more modules are stored in the memory 520, and when executed by the one or more processors 510, the man-machine dialogue mode switching method in any of the foregoing method embodiments is executed.
  • the electronic devices of the embodiments of the present application exist in various forms, including but not limited to:
  • Mobile communication devices: characterized by mobile communication functions, with voice and data communication as the main goal. Such terminals include smart phones (such as the iPhone), multimedia phones, feature phones, and low-end phones.
  • Ultra-mobile personal computer devices: belonging to the category of personal computers, with computing and processing functions and generally also mobile Internet access. Such terminals include PDA, MID, and UMPC devices, such as the iPad.
  • Portable entertainment devices: capable of displaying and playing multimedia content. Such devices include audio and video players (such as the iPod), smart speakers, story machines, robots, handheld game consoles, e-book readers, smart toys, and portable car navigation devices.
  • Servers: devices that provide computing services. A server's structure includes a processor, hard disk, memory, system bus, and so on, similar to a general-purpose computer architecture, but because it must provide highly reliable services, the requirements on processing capacity, stability, reliability, security, scalability, and manageability are higher.
  • the device embodiments described above are merely illustrative.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • each implementation manner can be implemented by means of software plus a general hardware platform, and of course, it can also be implemented by hardware.
  • the above technical solution essentially or the part that contributes to the related technology can be embodied in the form of a software product, and the computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk , CD-ROM, etc., including a number of instructions to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute the methods described in each embodiment or some parts of the embodiment.


Abstract

The present application discloses a man-machine dialogue mode switching method applied to an electronic device, the method comprising: receiving a current user sentence spoken by a current user; determining whether the dialogue domain to which the current user sentence belongs is a preset dialogue domain; if so, switching the current dialogue mode to a full-duplex dialogue mode; if not, switching the current dialogue mode to a half-duplex dialogue mode. By judging whether the dialogue domain to which the current user sentence belongs is a preset dialogue domain, the present application switches the dialogue mode, so that the dialogue mode can be automatically adjusted according to the dialogue domain, keeping the man-machine dialogue in the most suitable mode at all times and enabling smooth man-machine dialogue.

Description

Man-machine dialogue mode switching method
Technical field
The present application relates to the technical field of man-machine dialogue, and in particular to a man-machine dialogue mode switching method.
Background
In current man-machine dialogue (for example, dialogue between a user and a smart speaker), either a full-duplex dialogue mode or a half-duplex dialogue mode is mostly used.
In the full-duplex dialogue mode, recording stays on throughout the interaction, so TTS broadcasting and recording can proceed simultaneously, with the upstream and downstream streams transmitted in both directions at the same time. Its advantage is a natural interaction mode with no missed audio; its disadvantage is that, with recording always on, if the current AEC (acoustic echo cancellation) technology is not mature enough, the TTS broadcast sound will be recorded and misrecognized, falsely triggering dialogue-state changes and disrupting the dialogue flow.
In the half-duplex dialogue mode, no recording is made during voice broadcasting, and the upstream and downstream data are transmitted alternately. Its advantage is that, since no recording is made during TTS broadcasting, false triggering of the dialogue flow by noise is prevented; its disadvantage is that, because recording stops during broadcasting, the user must wait for the broadcast to finish before the next round of dialogue, making the interaction unnatural.
Summary
Embodiments of the present application provide a man-machine dialogue mode switching method and system, intended to solve at least one of the above technical problems.
In a first aspect, an embodiment of the present application provides a man-machine dialogue mode switching method, comprising:
receiving a current user sentence spoken by a current user;
determining whether the dialogue domain to which the current user sentence belongs is a preset dialogue domain;
if so, switching the current dialogue mode to a full-duplex dialogue mode;
if not, switching the current dialogue mode to a half-duplex dialogue mode.
In a second aspect, an embodiment of the present application provides a man-machine dialogue mode switching system, comprising:
a speech receiving module configured to receive a current user sentence spoken by a current user;
a dialogue domain determining module configured to determine whether the dialogue domain to which the current user sentence belongs is a preset dialogue domain;
a dialogue mode switching module configured to switch the current dialogue mode to the full-duplex dialogue mode when it is determined that the dialogue domain to which the current user sentence belongs is a preset dialogue domain, and to switch the current dialogue mode to the half-duplex dialogue mode when it is determined that the dialogue domain to which the current user sentence belongs is not a preset dialogue domain.
In a third aspect, an embodiment of the present application provides a storage medium storing one or more programs including execution instructions that can be read and executed by an electronic device (including but not limited to a computer, a server, or a network device) to perform any of the above man-machine dialogue mode switching methods.
In a fourth aspect, an electronic device is provided, comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform any of the above man-machine dialogue mode switching methods.
In a fifth aspect, an embodiment of the present application further provides a computer program product comprising a computer program stored on a storage medium, the computer program comprising program instructions that, when executed by a computer, cause the computer to perform any of the above man-machine dialogue mode switching methods.
The beneficial effect of the embodiments of the present application is that the dialogue mode is switched by judging whether the dialogue domain to which the current user sentence belongs is a preset dialogue domain, so that the dialogue mode can be automatically adjusted according to the dialogue domain, keeping the man-machine dialogue in the most suitable mode at all times and enabling smooth man-machine dialogue.
Brief description of the drawings
To describe the technical solutions of the embodiments of the present application more clearly, the drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings described below illustrate some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a flowchart of an embodiment of the man-machine dialogue mode switching method of the present application;
Fig. 2 is a flowchart of another embodiment of the man-machine dialogue mode switching method of the present application;
Fig. 3 is a flowchart of yet another embodiment of the man-machine dialogue mode switching method of the present application;
Fig. 4 is a schematic block diagram of an embodiment of the man-machine dialogue mode switching system of the present application;
Fig. 5 is a schematic structural diagram of an embodiment of the electronic device of the present application.
Detailed description
To make the purposes, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments herein without creative effort fall within the protection scope of the present application.
It should be noted that the embodiments of the present application and the features therein may be combined with each other when there is no conflict.
The present application may be described in the general context of computer-executable instructions, such as program modules, executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, and so on that perform specific tasks or implement specific abstract data types. The present application may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in local and remote computer storage media, including storage devices.
In the present application, "module", "device", "system", and the like refer to relevant entities applied to a computer, such as hardware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer. An application or script running on a server, or the server itself, may also be a component. One or more components may reside within a process and/or thread of execution, and a component may be localized on one computer and/or distributed between two or more computers, and may be run from various computer-readable media. Components may also communicate by local and/or remote processes based on a signal having one or more data packets, for example, a signal from data interacting with another component in a local system or a distributed system, and/or interacting with other systems through signals over a network such as the Internet.
Finally, it should also be noted that, herein, relational terms such as first and second are only used to distinguish one entity or operation from another and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise" and "include" cover not only the listed elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
As shown in Fig. 1, an embodiment of the present application provides a man-machine dialogue mode switching method, which can be applied to an electronic device equipped with a man-machine dialogue system. The electronic device may be a smart speaker, a smart phone, a smart robot, etc., which is not limited by the present application.
Taking a smart speaker as an example, the man-machine dialogue mode switching method of the present application is exemplarily described below. The method includes:
S11: receiving a current user sentence spoken by a current user.
Exemplarily, after being woken up by the current user, the smart speaker starts a default dialogue mode or a dialogue mode suited to the current user and detects the user's voice signal; when the current user sentence spoken by the current user is detected, the current user sentence is recognized.
S12: determining whether the dialogue domain to which the current user sentence belongs is a preset dialogue domain.
Exemplarily, the smart speaker obtains the text content corresponding to the current user sentence, determines the dialogue domain to which the current user sentence belongs according to the text content, and further judges whether that dialogue domain is a preset dialogue domain. Exemplarily, different dialogue domains correspond to different dialogue scenarios, or dialogue domains correspond to skills in the smart speaker, with each skill belonging to one dialogue domain; for example, an idiom chain skill, a navigation skill, a weather query skill, and a ticket booking skill belong to different dialogue domains.
S13: when it is determined that the dialogue domain to which the current user sentence belongs is a preset dialogue domain, switching the current dialogue mode to the full-duplex dialogue mode.
Exemplarily, if the current dialogue mode of the smart speaker is already the full-duplex dialogue mode, the current dialogue mode is kept; if the current dialogue mode of the smart speaker is the half-duplex dialogue mode, it is switched to the full-duplex dialogue mode.
S14: when it is determined that the dialogue domain to which the current user sentence belongs is not a preset dialogue domain, switching the current dialogue mode to the half-duplex dialogue mode.
Exemplarily, if the current dialogue mode of the smart speaker is already the half-duplex dialogue mode, the current dialogue mode is kept; if the current dialogue mode of the smart speaker is the full-duplex dialogue mode, it is switched to the half-duplex dialogue mode.
In this embodiment, the dialogue mode is switched by judging whether the dialogue domain to which the current user sentence belongs is a preset dialogue domain, so the dialogue mode can be automatically adjusted according to the dialogue domain, keeping the man-machine dialogue in the most suitable mode at all times and enabling smooth man-machine dialogue.
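The S11-S14 flow above can be sketched as a small decision function. This is a minimal illustration, not the patent's disclosed implementation: `PRESET_DOMAINS` and the keyword-based `classify_domain` are hypothetical stand-ins for the skill configuration and the NLU module that the embodiment leaves unspecified.

```python
# Minimal sketch of steps S11-S14, assuming a domain classifier already exists.
# PRESET_DOMAINS and classify_domain are illustrative placeholders.

PRESET_DOMAINS = {"idiom_chain", "navigation"}  # domains configured for full duplex

def classify_domain(sentence: str) -> str:
    # Toy keyword classifier standing in for a real NLU module (S12, first half).
    if "navigate" in sentence:
        return "navigation"
    if "idiom" in sentence:
        return "idiom_chain"
    return "chitchat"

def select_mode(sentence: str) -> str:
    """S12-S14: pick the dialogue mode from the sentence's dialogue domain."""
    domain = classify_domain(sentence)      # S12: which domain was hit?
    if domain in PRESET_DOMAINS:
        return "full-duplex"                # S13: preset domain -> full duplex
    return "half-duplex"                    # S14: otherwise -> half duplex

print(select_mode("navigate to the railway station"))  # full-duplex
print(select_mode("tell me a joke"))                   # half-duplex
```

In a real device the classifier would run server-side on the ASR transcript, and the returned mode would be pushed to the client as a command.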
In some embodiments, when it is determined that the dialogue domain to which the current user sentence belongs is a half-duplex dialogue domain, the current dialogue mode is switched to the half-duplex dialogue mode; otherwise the current dialogue mode is kept (the current dialogue mode may be either the half-duplex dialogue mode or the full-duplex dialogue mode). Exemplarily, the half-duplex dialogue domain is a pre-configured, designated dialogue domain.
As shown in Fig. 2, a flowchart of another embodiment of the man-machine dialogue mode switching method of the present application includes the following steps:
user input;
judging whether the dialogue domain currently hit by the user input is a half-duplex domain;
if so, issuing a command to the client to enable half duplex, i.e., enabling the half-duplex dialogue mode, so that the client turns off recording during TTS broadcasting;
if not, the client keeps recording on during TTS broadcasting, i.e., the full-duplex dialogue mode is enabled;
judging whether the dialogue has ended; if not, repeating the above steps.
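The Fig. 2 loop reduces to one client-side decision per turn: record during TTS, or not. The sketch below assumes a cloud-configured set of half-duplex domains (`HALF_DUPLEX_DOMAINS` is an invented name for that configuration):

```python
# Sketch of the Fig. 2 decision: the cloud marks certain domains half-duplex,
# and the client derives whether to keep recording during TTS broadcasting.
# HALF_DUPLEX_DOMAINS is an assumed cloud-side configuration, not a real API.

HALF_DUPLEX_DOMAINS = {"poem_recital"}  # important TTS that must play uninterrupted

def record_during_tts(hit_domain: str) -> bool:
    """True = full duplex (mic stays open); False = half duplex (mic muted)."""
    return hit_domain not in HALF_DUPLEX_DOMAINS

print(record_during_tts("music"))         # True: barge-in allowed
print(record_during_tts("poem_recital"))  # False: intro/prompt plays in full
```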
Exemplarily, when the client broadcasts a relatively important TTS message that the user is expected to hear in full without abnormal interruption, certain dialogue domains can be designated as half-duplex mode through cloud-side configuration. When the user's utterance hits such a domain, the cloud sends a message to the client, which then adaptively switches to half-duplex mode and stops recording during TTS broadcasting, thereby preventing noise from being recorded and disturbing the dialogue state.
For example, when the user uses an audio device with no screen, the user does not know what command to say next without hearing the TTS broadcast to the end. The TTS broadcast is then very important information; if it is interrupted and the user does not hear it in full, the user will not know what to say afterwards. The following is an example dialogue between user U and machine M:
U: I want to recite ancient poems.
M: Tang dynasty, Li Bai, "Quiet Night Thoughts". If you forget, you can say "give me a hint"; to end the task, you can say "exit".
U: Give me a hint.
In some embodiments, when in the full-duplex dialogue mode, the method further includes:
during a multi-round dialogue, determining the dialogue domain to which a newly received user sentence belongs;
if the new user sentence and the previous user sentence in the multi-round dialogue belong to different dialogue domains, continuing to respond with the reply content corresponding to the previous user sentence.
Exemplarily, this embodiment adaptively restricts dialogue-domain jumps according to the dialogue context. Restricting domain jumps avoids the interaction interference caused by domain switching: in a task-oriented multi-round dialogue scenario, switching the dialogue domain clears the previous dialogue context.
For example, the dialogue between user U and machine M proceeds as follows:
U: I want to navigate to the railway station.
M: The following places were found. Which one do you mean?
U: The first one.
M: Planning routes for you: shortest distance, avoiding congestion, avoiding highways. Which one do you choose?
U: The second one.
M: Route planned successfully. Start navigation?
U: Make a phone call.
M: Who do you want to call?
The detected "make a phone call" may be a misrecognition caused by surrounding noise. If the system responded to this input, the dialogue domain would be switched, the navigation task would be interrupted, and the phone domain would be entered; the previously entered context would be cleared, and returning to the navigation domain would require re-entering the navigation information, harming the interaction experience.
With the man-machine dialogue mode switching method of this embodiment, whether to respond to a new user sentence is decided by judging whether the dialogue domain of the new user sentence is the same as that of the current multi-round dialogue, thereby avoiding misrecognition caused by surrounding noise and allowing the current multi-round dialogue task to be completed smoothly.
As shown in Fig. 3, a flowchart of yet another embodiment of the man-machine dialogue mode switching method of the present application includes the following steps:
user input;
judging, from the uploaded client state, whether the semantic domain should be restricted; for example, when the client is currently in the TTS-broadcasting state, the semantic domain is restricted; when the client is not broadcasting TTS, there is no need to restrict the semantic domain;
if not, the input hits a semantic domain (i.e., the dialogue domain of the current multi-round dialogue);
if so, judging whether the user input hits the same semantic domain as the previous round of dialogue;
if so, the input hits that semantic domain (i.e., the dialogue domain of the current multi-round dialogue);
if not, semantic matching fails and the user input is filtered out;
performing the dialogue output and uploading the client state in real time;
ending the dialogue.
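The filtering rule in the steps above can be sketched as a single predicate. The domain labels and the boolean client state are illustrative simplifications of the reported client state and NLU result:

```python
# Sketch of the Fig. 3 filtering rule: while the client reports that it is
# broadcasting TTS, only input staying in the previous round's semantic domain
# is accepted; off-domain input is filtered as probable noise.

def accept_input(input_domain: str, prev_domain: str, client_is_broadcasting: bool) -> bool:
    if not client_is_broadcasting:
        return True                      # no restriction outside TTS broadcast
    return input_domain == prev_domain   # restricted: same-domain input only

# During TTS in a navigation task, a stray "phone" hit is rejected:
print(accept_input("phone", "navigation", True))       # False
print(accept_input("navigation", "navigation", True))  # True
print(accept_input("phone", "navigation", False))      # True (broadcast over)
```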
The man-machine dialogue mode switching method of the present application in effect implements an adaptive dialogue mode switching method: it can restrict domain jumps according to the client state (for example, during TTS broadcasting) and lift the restriction after the client state changes (TTS broadcasting completed). This reduces misrecognition caused by noise during TTS broadcasting.
The client uploads its state to the server in real time, and the server adaptively switches the dialogue state according to the client state combined with the dialogue context, which can effectively reject noise input. The present application adaptively changes the dialogue mode according to the dialogue scenario and the client state, enabling the appropriate dialogue mode in different scenarios and thereby meeting the needs of each scenario.
In some embodiments, the man-machine dialogue mode switching method of the present application further includes:
if the new user sentence is received again, saving the reply content corresponding to the previous user sentence;
obtaining the reply content corresponding to the new dialogue sentence and presenting it to the user;
when a user voice instruction to re-obtain the reply content of the previous user sentence is received, obtaining the stored reply content of the previous user sentence and presenting it to the user.
This embodiment takes into account that, although the new user sentence belongs to a different dialogue domain from the previous user sentence in the multi-round dialogue, the current user may indeed want to urgently end the current multi-round dialogue and start a dialogue in another domain. In this case, although the current user's first input of the new user sentence is filtered out by the system and receives no response, the user will usually try entering it a second time. This embodiment accommodates this practical scenario, ensuring that the current user's real needs are met and improving the user experience.
In addition, since the reply content corresponding to the previous user sentence is saved, when the user wants to re-obtain the previous reply content, the result can be presented to the user directly, without the user repeating the multiple rounds of dialogue already completed.
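The behaviour above (filter the first off-domain sentence, switch on the second after caching the pending reply, restore the cache on request) can be sketched with a small state object. All names are illustrative; the reply generation is mocked:

```python
# Sketch of the saved-reply behaviour: first off-domain sentence is filtered,
# a repeated one switches domains after caching the pending reply, and a
# "return to previous task" style instruction restores the cached reply.

class DialogueManager:
    def __init__(self, domain: str, reply: str):
        self.domain = domain        # domain of the ongoing multi-round task
        self.reply = reply          # reply content of the previous sentence
        self.saved_reply = None     # cache for the interrupted task's reply
        self.pending = None         # off-domain sentence seen exactly once

    def handle(self, sentence: str, domain: str) -> str:
        if domain == self.domain:
            return self.reply                 # stays in the current task
        if self.pending != sentence:
            self.pending = sentence           # first occurrence: filter it
            return self.reply
        self.saved_reply = self.reply         # second occurrence: switch
        self.domain = domain
        self.reply = f"reply for {sentence}"  # mocked reply generation
        return self.reply

    def restore(self) -> str:
        return self.saved_reply               # e.g. on "return to previous task"

dm = DialogueManager("navigation", "Route planned. Start navigation?")
print(dm.handle("make a phone call", "phone"))  # filtered: navigation reply again
print(dm.handle("make a phone call", "phone"))  # repeated: switches to phone
print(dm.restore())                             # cached navigation reply
```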
In some embodiments, the user voice instruction includes "return to the previous task", or the previous user sentence itself, or an answer sentence to the last question of the previous round of dialogue.
Exemplarily, taking the railway-station dialogue between user U and machine M as an example, after finishing the phone call, the user can say "return to the previous task", or "the second one", or "start navigation" to make the machine present again the navigation route obtained in the earlier multi-round dialogue.
In some embodiments, the preset dialogue domain is a dialogue domain used by the current user more than a set threshold number of times. In this embodiment, the man-machine dialogue mode switching method further includes:
determining whether the number of times the dialogue domain to which the current user sentence belongs has been mentioned by the current user exceeds the set threshold;
if so, marking the dialogue domain to which the current user speech belongs as a preset dialogue domain corresponding to the current user.
Exemplarily, the client (i.e., the smart speaker) reports the user's daily operations to the server through events. When the user enters a certain domain and user behavior statistics show that the user often uses that flow, the full-duplex mode is adaptively enabled for the flow's TTS broadcasting; during the TTS broadcast, the user can then speak in advance and enter the next round of dialogue without waiting for the broadcast to finish.
For example, a dialogue between user U and machine M:
U: Hello Xiaochi (wake-up word)
M: Hello, master. What would you like to do? You can say make a phone call, play music, navigate, or settings.
U: Navigate to the railway station. (The user can say "navigate to the railway station" as soon as the machine starts broadcasting "Hello, master", without waiting for the whole TTS broadcast to finish.)
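The usage-statistics rule behind the example above can be sketched as a counter with a threshold. `THRESHOLD` and the event-reporting interface are assumptions; the embodiment does not fix a concrete value or transport:

```python
# Sketch of the usage-statistics rule: once a user has hit a domain more than
# THRESHOLD times, it is marked as a preset (full-duplex) domain for that user.
# The client-to-server event plumbing is assumed away.

from collections import defaultdict

THRESHOLD = 3  # illustrative value; the patent only speaks of a set threshold

class UsageTracker:
    def __init__(self):
        self.counts = defaultdict(int)
        self.preset_domains = set()

    def report(self, user: str, domain: str) -> None:
        """Record one use; promote the domain once the threshold is exceeded."""
        self.counts[(user, domain)] += 1
        if self.counts[(user, domain)] > THRESHOLD:
            self.preset_domains.add((user, domain))

    def is_preset(self, user: str, domain: str) -> bool:
        return (user, domain) in self.preset_domains

tracker = UsageTracker()
for _ in range(4):
    tracker.report("alice", "navigation")
print(tracker.is_preset("alice", "navigation"))  # True: used more than 3 times
print(tracker.is_preset("alice", "weather"))     # False
```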
In some embodiments, determining whether the dialogue domain to which the current user sentence belongs is a preset dialogue domain includes:
determining the dialogue domain to which the current user sentence belongs;
obtaining user feature information of the current user from the current user sentence; exemplarily, the user feature information is the user's voiceprint information;
querying, according to the user feature information, the preset dialogue domains corresponding to the current user; the queried preset dialogue domains include at least one specific dialogue domain;
determining whether the dialogue domain to which the sentence belongs is among the preset dialogue domains.
In implementing the present application, the inventors found that the same electronic device (for example, a smart speaker) may be used by multiple different users (for example, a smart speaker used at home is used by several family members). Different users differ in their usage habits, frequency of use, and familiarity with different dialogue domains, so the smart speaker needs to adjust to different dialogue modes to better realize man-machine dialogue.
In the method of this embodiment, different users have different preset dialogue domains; the different users are identified by their voiceprint information and the corresponding preset dialogue domains are determined, so it can be accurately judged whether the dialogue domain of the current user's current sentence is a preset dialogue domain, and the appropriate dialogue mode is ultimately chosen for the man-machine dialogue.
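The per-user lookup in the steps above amounts to a two-level membership test. `USER_PRESETS` and the user IDs are invented placeholders for the result of voiceprint identification plus the stored per-user configuration:

```python
# Sketch of the per-user preset-domain check: the voiceprint module is assumed
# to have already resolved the speaker to a user ID; USER_PRESETS stands in for
# the stored per-user preset dialogue domains.

USER_PRESETS = {
    "alice": {"navigation", "music"},
    "bob": {"idiom_chain"},
}

def is_preset_for_user(domain: str, user_id: str) -> bool:
    presets = USER_PRESETS.get(user_id, set())  # unknown user: no presets
    return domain in presets

print(is_preset_for_user("navigation", "alice"))  # True: full duplex for alice
print(is_preset_for_user("navigation", "bob"))    # False: half duplex for bob
```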
In some embodiments, before the receiving of the current user sentence spoken by the current user, the method further includes: detecting a wake-up word;
determining the user feature information of the current user according to the detected wake-up word speech;
querying whether the user feature information of the current user exists in a user feature information library;
if so, enabling the full-duplex dialogue mode;
if not, enabling the half-duplex dialogue mode and storing the user feature information in the user feature information library.
This embodiment realizes adaptive selection of the initial dialogue mode after the system is woken up. In implementing the present application, the inventors found that a smart speaker or story machine, when turned on, always broadcasts preset introductory content or guidance on how to use it; this is indeed very useful for new users, but somewhat redundant, even annoying, for users who already know the device well.
With the method of this embodiment, when the smart speaker or story machine is woken up, the user feature information (for example, voiceprint information) extracted from the current user's wake-up speech is compared with a locally stored voiceprint library to judge whether the current user is a new user; if not, the system is initialized to the full-duplex dialogue mode, so that the user can input voice commands at any time to control the smart speaker or story machine.
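The wake-up-time choice can be sketched as follows. Voiceprints are modelled as plain strings here; real voiceprint matching is a similarity search, which this illustration deliberately elides:

```python
# Sketch of the initial-mode selection at wake-up: a known voiceprint starts in
# full duplex; an unknown one starts in half duplex (so the intro TTS plays
# uninterrupted) and is enrolled into the library.

known_voiceprints = set()  # stands in for the user feature information library

def initial_mode(voiceprint: str) -> str:
    if voiceprint in known_voiceprints:
        return "full-duplex"           # returning user: allow barge-in at once
    known_voiceprints.add(voiceprint)  # enroll the new user
    return "half-duplex"               # new user: play the intro in full

print(initial_mode("vp:alice"))  # half-duplex (first visit, gets enrolled)
print(initial_mode("vp:alice"))  # full-duplex (now a known user)
```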
Exemplarily, the present application further provides an electronic device comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to:
receive a current user sentence spoken by a current user;
determine whether the dialogue domain to which the current user sentence belongs is a preset dialogue domain;
if so, switch the current dialogue mode to the full-duplex dialogue mode;
if not, switch the current dialogue mode to the half-duplex dialogue mode.
In some embodiments, the at least one processor is further configured to, when in the full-duplex dialogue mode:
during a multi-round dialogue, determine the dialogue domain to which a newly received user sentence belongs;
if the new user sentence and the previous user sentence in the multi-round dialogue belong to different dialogue domains, continue to respond with the reply content corresponding to the previous user sentence.
In some embodiments, the at least one processor is further configured to:
if the new user sentence is received again, save the reply content corresponding to the previous user sentence;
obtain the reply content corresponding to the new dialogue sentence and present it to the user;
when a user voice instruction to re-obtain the reply content of the previous user sentence is received, obtain the stored reply content of the previous user sentence and present it to the user.
In some embodiments, the preset dialogue domain is a dialogue domain used by the current user more than a set threshold number of times;
the at least one processor is further configured to:
determine whether the number of times the dialogue domain to which the current user sentence belongs has been mentioned by the current user exceeds the set threshold;
if so, mark the dialogue domain to which the current user speech belongs as a preset dialogue domain corresponding to the current user.
In some embodiments, determining whether the dialogue domain to which the current user sentence belongs is a preset dialogue domain includes:
determining the dialogue domain to which the current user sentence belongs;
obtaining user feature information of the current user from the current user sentence;
querying, according to the user feature information, the preset dialogue domains corresponding to the current user;
determining whether the dialogue domain to which the sentence belongs is among the preset dialogue domains.
In some embodiments, the at least one processor is further configured to, before the receiving of the current user sentence spoken by the current user:
detect a wake-up word;
determine the user feature information of the current user according to the detected wake-up word speech;
query whether the user feature information of the current user exists in a user feature information library;
if so, enable the full-duplex dialogue mode;
if not, enable the half-duplex dialogue mode and store the user feature information in the user feature information library.
It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of combined actions; however, those skilled in the art should know that the present application is not limited by the described order of actions, because according to the present application, some steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present application. In the above embodiments, the description of each embodiment has its own emphasis; for parts not detailed in one embodiment, refer to the relevant descriptions of other embodiments.
As shown in Fig. 4, an embodiment of the present application further provides a man-machine dialogue mode switching system 400, comprising:
a speech receiving module 410 configured to receive a current user sentence spoken by a current user;
a dialogue domain determining module 420 configured to determine whether the dialogue domain to which the current user sentence belongs is a preset dialogue domain;
a dialogue mode switching module 430 configured to switch the current dialogue mode to the full-duplex dialogue mode when it is determined that the dialogue domain to which the current user sentence belongs is a preset dialogue domain, and to switch the current dialogue mode to the half-duplex dialogue mode when it is determined that the dialogue domain to which the current user sentence belongs is not a preset dialogue domain.
In some embodiments, when in the full-duplex dialogue mode, the man-machine dialogue mode switching system is further configured to:
during a multi-round dialogue, determine the dialogue domain to which a newly received user sentence belongs;
if the new user sentence and the previous user sentence in the multi-round dialogue belong to different dialogue domains, continue to respond with the reply content corresponding to the previous user sentence.
In some embodiments, the man-machine dialogue mode switching system is further configured to:
if the new user sentence is received again, save the reply content corresponding to the previous user sentence;
obtain the reply content corresponding to the new dialogue sentence and present it to the user;
when a user voice instruction to re-obtain the reply content of the previous user sentence is received, obtain the stored reply content of the previous user sentence and present it to the user.
In some embodiments, the preset dialogue domain is a dialogue domain used by the current user more than a set threshold number of times; the man-machine dialogue mode switching system is further configured to:
determine whether the number of times the dialogue domain to which the current user sentence belongs has been mentioned by the current user exceeds the set threshold;
if so, mark the dialogue domain to which the current user speech belongs as a preset dialogue domain corresponding to the current user.
In some embodiments, determining whether the dialogue domain to which the current user sentence belongs is a preset dialogue domain includes:
determining the dialogue domain to which the current user sentence belongs;
obtaining user feature information of the current user from the current user sentence;
querying, according to the user feature information, the preset dialogue domains corresponding to the current user;
determining whether the dialogue domain to which the sentence belongs is among the preset dialogue domains.
In some embodiments, the man-machine dialogue mode switching system is further configured to perform the following steps before the receiving of the current user sentence spoken by the current user:
detecting a wake-up word;
determining the user feature information of the current user according to the detected wake-up word speech;
querying whether the user feature information of the current user exists in a user feature information library;
if so, enabling the full-duplex dialogue mode;
if not, enabling the half-duplex dialogue mode and storing the user feature information in the user feature information library.
In some embodiments, the user feature information is the user's voiceprint information.
In some embodiments, an embodiment of the present application provides a non-volatile computer-readable storage medium storing one or more programs including execution instructions that can be read and executed by an electronic device (including but not limited to a computer, a server, or a network device) to perform any of the above man-machine dialogue mode switching methods of the present application.
In some embodiments, an embodiment of the present application further provides a computer program product comprising a computer program stored on a non-volatile computer-readable storage medium, the computer program comprising program instructions that, when executed by a computer, cause the computer to perform any of the above man-machine dialogue mode switching methods.
In some embodiments, an embodiment of the present application further provides an electronic device comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the man-machine dialogue mode switching method.
In some embodiments, an embodiment of the present application further provides a storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the man-machine dialogue mode switching method.
The man-machine dialogue mode switching system of the above embodiments of the present application can be used to perform the man-machine dialogue mode switching method of the embodiments of the present application, and accordingly achieves the technical effects achieved by the above embodiments implementing the man-machine dialogue mode switching method, which will not be repeated here. In the embodiments of the present application, the relevant functional modules may be implemented by a hardware processor.
Fig. 5 is a schematic diagram of the hardware structure of an electronic device for performing the man-machine dialogue mode switching method provided by another embodiment of the present application. As shown in Fig. 5, the device includes:
one or more processors 510 and a memory 520, with one processor 510 taken as an example in Fig. 5.
The device for performing the man-machine dialogue mode switching method may further include an input means 530 and an output means 540.
The processor 510, the memory 520, the input means 530, and the output means 540 may be connected by a bus or in other ways; connection by a bus is taken as an example in Fig. 5.
As a non-volatile computer-readable storage medium, the memory 520 can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the man-machine dialogue mode switching method in the embodiments of the present application. By running the non-volatile software programs, instructions, and modules stored in the memory 520, the processor 510 executes the various functional applications and data processing of the server, i.e., implements the man-machine dialogue mode switching method of the above method embodiments.
The memory 520 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application required by at least one function, and the data storage area may store data created according to the use of the man-machine dialogue mode switching apparatus, and the like. In addition, the memory 520 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 520 optionally includes memories remotely located relative to the processor 510, and these remote memories may be connected to the man-machine dialogue mode switching apparatus through a network. Examples of the network include but are not limited to the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The input means 530 may receive input numeric or character information and generate signals related to user settings and function control of the man-machine dialogue mode switching apparatus. The output means 540 may include a display device such as a display screen.
The one or more modules are stored in the memory 520 and, when executed by the one or more processors 510, perform the man-machine dialogue mode switching method of any of the above method embodiments.
The above product can perform the method provided by the embodiments of the present application and has the corresponding functional modules and beneficial effects of performing the method. For technical details not described in detail in this embodiment, refer to the method provided by the embodiments of the present application.
The electronic devices of the embodiments of the present application exist in various forms, including but not limited to:
(1) Mobile communication devices: characterized by mobile communication functions, with providing voice and data communication as the main goal. Such terminals include smart phones (e.g., the iPhone), multimedia phones, feature phones, and low-end phones.
(2) Ultra-mobile personal computer devices: belonging to the category of personal computers, with computing and processing functions, and generally also mobile Internet access. Such terminals include PDA, MID, and UMPC devices, e.g., the iPad.
(3) Portable entertainment devices: able to display and play multimedia content. Such devices include audio and video players (e.g., the iPod), smart speakers, story machines, robots, handheld game consoles, e-book readers, smart toys, and portable car navigation devices.
(4) Servers: devices that provide computing services. A server comprises a processor, hard disk, memory, system bus, etc.; its architecture is similar to that of a general-purpose computer, but because it must provide highly reliable services, it has higher requirements for processing capacity, stability, reliability, security, scalability, and manageability.
(5) Other electronic apparatuses with data interaction functions.
The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
Through the description of the above implementations, those skilled in the art can clearly understand that each implementation can be realized by means of software plus a general-purpose hardware platform, or of course by hardware. Based on this understanding, the above technical solution, in essence or in the part contributing to the related art, can be embodied in the form of a software product; the computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes a number of instructions to cause a computer device (which may be a personal computer, a server, or a network device) to execute the methods described in the embodiments or parts thereof.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions described in the foregoing embodiments or make equivalent substitutions for some of the technical features; such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (10)

  1. A man-machine dialogue mode switching method, applied to an electronic device, the method comprising:
    receiving a current user sentence spoken by a current user;
    determining whether the dialogue domain to which the current user sentence belongs is a preset dialogue domain;
    if so, switching the current dialogue mode to a full-duplex dialogue mode;
    if not, switching the current dialogue mode to a half-duplex dialogue mode.
  2. The method according to claim 1, wherein, when in the full-duplex dialogue mode, the method further comprises:
    during a multi-round dialogue, determining the dialogue domain to which a newly received user sentence belongs;
    if the new user sentence and the previous user sentence in the multi-round dialogue belong to different dialogue domains, continuing to respond with the reply content corresponding to the previous user sentence.
  3. The method according to claim 2, wherein the method further comprises:
    if the new user sentence is received again, saving the reply content corresponding to the previous user sentence;
    obtaining the reply content corresponding to the new dialogue sentence and presenting it to the user;
    when a user voice instruction to re-obtain the reply content of the previous user sentence is received, obtaining the stored reply content of the previous user sentence and presenting it to the user.
  4. The method according to claim 1, wherein determining whether the dialogue domain to which the current user sentence belongs is a preset dialogue domain comprises:
    determining the dialogue domain to which the current user sentence belongs;
    obtaining user feature information of the current user from the current user sentence;
    querying, according to the user feature information, the preset dialogue domains corresponding to the current user;
    determining whether the dialogue domain to which the sentence belongs is among the preset dialogue domains.
  5. The method according to claim 1, wherein the preset dialogue domain is a dialogue domain used by the current user more than a set threshold number of times;
    the method further comprising:
    determining whether the number of times the dialogue domain to which the current user sentence belongs has been mentioned by the current user exceeds the set threshold;
    if so, marking the dialogue domain to which the current user speech belongs as a preset dialogue domain corresponding to the current user.
  6. An electronic device, comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to:
    receive a current user sentence spoken by a current user;
    determine whether the dialogue domain to which the current user sentence belongs is a preset dialogue domain;
    if so, switch the current dialogue mode to a full-duplex dialogue mode;
    if not, switch the current dialogue mode to a half-duplex dialogue mode.
  7. The electronic device according to claim 6, wherein the at least one processor is further configured to, when in the full-duplex dialogue mode:
    during a multi-round dialogue, determine the dialogue domain to which a newly received user sentence belongs;
    if the new user sentence and the previous user sentence in the multi-round dialogue belong to different dialogue domains, continue to respond with the reply content corresponding to the previous user sentence.
  8. The electronic device according to claim 7, wherein the at least one processor is further configured to:
    if the new user sentence is received again, save the reply content corresponding to the previous user sentence;
    obtain the reply content corresponding to the new dialogue sentence and present it to the user;
    when a user voice instruction to re-obtain the reply content of the previous user sentence is received, obtain the stored reply content of the previous user sentence and present it to the user.
  9. The electronic device according to claim 6, wherein determining whether the dialogue domain to which the current user sentence belongs is a preset dialogue domain comprises:
    determining the dialogue domain to which the current user sentence belongs;
    obtaining user feature information of the current user from the current user sentence;
    querying, according to the user feature information, the preset dialogue domains corresponding to the current user;
    determining whether the dialogue domain to which the sentence belongs is among the preset dialogue domains.
  10. The electronic device according to claim 6, wherein the preset dialogue domain is a dialogue domain used by the current user more than a set threshold number of times;
    the at least one processor being further configured to:
    determine whether the number of times the dialogue domain to which the current user sentence belongs has been mentioned by the current user exceeds the set threshold;
    if so, mark the dialogue domain to which the current user speech belongs as a preset dialogue domain corresponding to the current user.
PCT/CN2019/120617 2019-10-28 2019-11-25 Man-machine dialogue mode switching method WO2021082133A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US17/770,206 US20220399020A1 (en) 2019-10-28 2019-11-25 Man-machine dialogue mode switching method
JP2022524252A JP7413521B2 (ja) 2019-10-28 2019-11-25 Method for switching human-machine dialogue modes
EP19950263.4A EP4054111A4 (en) 2019-10-28 2019-11-25 METHOD OF SWITCHING BETWEEN HUMAN-MACHINE DIALOGUE MODES

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911028778.2A CN112735398B (zh) 2019-10-28 2019-10-28 人机对话模式切换方法及系统
CN201911028778.2 2019-10-28

Publications (1)

Publication Number Publication Date
WO2021082133A1 true WO2021082133A1 (zh) 2021-05-06

Family

ID=75588779

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/120617 WO2021082133A1 (zh) 2019-10-28 2019-11-25 Man-machine dialogue mode switching method

Country Status (5)

Country Link
US (1) US20220399020A1 (zh)
EP (1) EP4054111A4 (zh)
JP (1) JP7413521B2 (zh)
CN (1) CN112735398B (zh)
WO (1) WO2021082133A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112002315B * 2020-07-28 2023-12-29 珠海格力节能环保制冷技术研究中心有限公司 Voice control method and device, electrical equipment, storage medium and processor
CN112820290A (zh) * 2020-12-31 2021-05-18 广东美的制冷设备有限公司 Household appliance and voice control method therefor, voice device, and computer storage medium
CN113744743B (zh) * 2021-08-27 2022-11-08 海信冰箱有限公司 Voice interaction method and device for washing machine
CN117496973B (zh) * 2024-01-02 2024-03-19 四川蜀天信息技术有限公司 Method, apparatus, device and medium for improving the man-machine dialogue interaction experience

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7912070B1 (en) * 2006-07-12 2011-03-22 Nextel Communications Inc. System and method for seamlessly switching a half-duplex session to a full-duplex session
CN105679314A (zh) * 2015-12-28 2016-06-15 百度在线网络技术(北京)有限公司 Speech recognition method and device
CN105812573A (zh) * 2016-04-28 2016-07-27 努比亚技术有限公司 Voice processing method and mobile terminal
CN105931638A (zh) * 2016-04-26 2016-09-07 北京光年无限科技有限公司 Dialogue system data processing method and device for intelligent robot
CN108108340A (zh) * 2017-11-28 2018-06-01 北京光年无限科技有限公司 Dialogue interaction method and system for intelligent robot
CN109657091A (zh) * 2019-01-02 2019-04-19 百度在线网络技术(北京)有限公司 State presentation method, apparatus, device and storage medium for voice interaction device
CN110300435A (zh) * 2019-05-21 2019-10-01 努比亚技术有限公司 Communication mode switching method, terminal and computer-readable storage medium

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004037721A (ja) * 2002-07-02 2004-02-05 Pioneer Electronic Corp Voice response system, voice response program, and storage medium therefor
US8369251B2 (en) * 2008-06-20 2013-02-05 Microsoft Corporation Timestamp quality assessment for assuring acoustic echo canceller operability
US8817641B2 (en) * 2011-02-16 2014-08-26 Intel Mobile Communications GmbH Communication terminal, communication device and methods thereof for detecting and avoiding in-device interference
JP6705589B2 (ja) * 2015-10-07 2020-06-03 Necソリューションイノベータ株式会社 Speech recognition system, method and program
US10311875B2 (en) * 2016-12-22 2019-06-04 Soundhound, Inc. Full-duplex utterance processing in a natural language virtual assistant
JP2018185362A (ja) 2017-04-24 2018-11-22 富士ソフト株式会社 Robot and control method thereof
US20180364798A1 (en) * 2017-06-16 2018-12-20 Lenovo (Singapore) Pte. Ltd. Interactive sessions
CN107507612B (zh) 2017-06-30 2020-08-28 百度在线网络技术(北京)有限公司 Voiceprint recognition method and device
JP7111488B2 (ja) 2018-03-29 2022-08-02 旭化成ホームズ株式会社 Utterance amount accumulating device, customer service support device, and program
US11979360B2 (en) * 2018-10-25 2024-05-07 Microsoft Technology Licensing, Llc Multi-phrase responding in full duplex voice conversation


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4054111A4 *

Also Published As

Publication number Publication date
EP4054111A4 (en) 2022-12-07
CN112735398B (zh) 2022-09-06
US20220399020A1 (en) 2022-12-15
CN112735398A (zh) 2021-04-30
EP4054111A1 (en) 2022-09-07
JP2022554219A (ja) 2022-12-28
JP7413521B2 (ja) 2024-01-15

Similar Documents

Publication Publication Date Title
WO2021082133A1 (zh) Man-machine dialogue mode switching method
JP2019117623A (ja) Voice interaction method, apparatus, device, and storage medium
CN111049996B (zh) Multi-scenario speech recognition method and device, and intelligent customer service system applying same
CN111540349B (zh) Speech interruption method and device
JP7353497B2 (ja) Server-side processing method and server for actively initiating a dialogue, and voice interaction system capable of actively initiating a dialogue
CN108962262A (zh) Voice data processing method and device
CN103337242A (zh) Voice control method and control device
WO2021208392A1 (zh) Voice skill jumping method for man-machine dialogue, electronic device, and storage medium
CN109671429B (zh) Voice interaction method and device
CN111462726B (zh) Outbound call answering method, apparatus, device, and medium
CN110619878A (zh) Voice interaction method and device for office system
EP4047489A1 (en) Human-machine conversation processing method
CN109686372B (zh) Resource playback control method and device
CN112700767B (zh) Man-machine dialogue interruption method and device
WO2021042584A1 (zh) Full-duplex voice dialogue method
CN111128166B (zh) Optimization method and device for continuous wake-up recognition function
US20200211552A1 (en) Voice interaction control method and apparatus
CN111161734A (zh) Voice interaction method and device based on specified scenario
CN113488047A (zh) Man-machine dialogue interruption method, electronic device, and computer-readable storage medium
CN112786031B (zh) Man-machine dialogue method and system
CN111047923B (zh) Story machine control method, story playback system, and storage medium
CN111312244B (zh) Voice interaction system and method for sand table
CN113658585B (zh) Training method for voice interaction model, and voice interaction method and device
CN112328765A (zh) Dialogue state exit method, terminal device, and storage medium
CN113643691A (zh) Far-field voice message interaction method and system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19950263

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022524252

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2019950263

Country of ref document: EP

Effective date: 20220530