WO2021212388A1 - Method and device for implementing interactive communication, and storage medium - Google Patents

Method and device for implementing interactive communication, and storage medium

Info

Publication number
WO2021212388A1
Authority
WO
WIPO (PCT)
Prior art keywords
interactive
interaction
interactive object
candidate
state
Prior art date
Application number
PCT/CN2020/086222
Other languages
English (en)
Chinese (zh)
Inventor
马海滨
Original Assignee
南京阿凡达机器人科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 南京阿凡达机器人科技有限公司 filed Critical 南京阿凡达机器人科技有限公司
Priority to PCT/CN2020/086222 priority Critical patent/WO2021212388A1/fr
Priority to CN202080004243.6A priority patent/CN112739507B/zh
Publication of WO2021212388A1 publication Critical patent/WO2021212388A1/fr

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J11/00Manipulators not otherwise provided for
    • B25J11/0005Manipulators having means for high-level communication with users, e.g. speech generator, face recognition means
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223Execution procedure of a spoken command
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/70Reducing energy consumption in communication networks in wireless communication networks

Definitions

  • The invention relates to the technical field of human-computer interaction, and in particular to a method, device, and storage medium for implementing interactive communication.
  • Trigger operations such as speaking a "wake-up word" or performing a touch input are currently the main ways to trigger a robot or smart device to begin human-computer interaction.
  • The problem with using the above method in a multi-person scenario is that, to switch to a new interactive object midway, each person participating in the interaction must perform the trigger operation while the robot or smart device is in the awake state. As a result, every user must understand and master the trigger operations of different robots or smart devices.
  • Such an interaction process is not only mechanical but also disrupts the rhythm of multi-person turn-taking; in multi-user interaction scenarios it cannot communicate with multiple users in real time, intelligently, and effectively.
  • The purpose of the present invention is to provide a method, device, and storage medium for implementing interactive communication, so that interactive objects can be switched naturally, flexibly, and intelligently in multi-user interaction scenarios, achieving timely and efficient interactive communication with multiple objects in a humanized manner.
  • To this end, the present invention provides a method for implementing interactive communication, which includes the following steps (a control-flow sketch follows this list):
  • when it is detected that the current interactive object stops interacting in the awake state, a candidate object participating in the interaction is determined as the new interactive object by collecting image data and voice signals;
  • when the current interactive object has not stopped interacting, detection continues while the device responds to the service type required by the current interactive object;
  • when the duration for which there is no interactive object reaches a first preset duration, the device enters the dormant state;
  • if a wake-up signal is received while dormant, the device switches from the dormant state to the awake state, and the target object that triggered the wake-up is determined to be the current interactive object.
  • Determining a candidate object participating in the interaction as the new interactive object by collecting image data and voice signals includes the following steps: when the duration for which the current interactive object has stopped interacting reaches a second preset duration, candidate objects participating in the interaction are searched for through image recognition and/or sound source localization; if there is one candidate object, it is determined as the new interactive object; if there are at least two candidate objects, one of them is determined as the new interactive object according to the image recognition result and/or the sound source localization result.
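  • For illustration only, a minimal Python sketch of this control flow; the `device` helpers (receive_wake_signal, detect_stop, find_candidates, pick_new_object, respond) and the preset durations are assumptions, not part of the disclosure:

```python
# Illustrative sketch only; the helper methods on `device` are assumed.
import time

FIRST_PRESET_S = 30.0    # no interactive object for this long -> dormant
SECOND_PRESET_S = 5.0    # current object silent for this long -> search

def run(device):
    awake = False
    current = None          # current interactive object
    stopped_at = None       # when the current object stopped interacting
    while True:
        if not awake:
            current = device.receive_wake_signal()   # blocks until woken
            awake, stopped_at = True, None
            continue
        if device.detect_stop(current):
            if stopped_at is None:
                stopped_at = time.monotonic()
            idle = time.monotonic() - stopped_at
            if idle >= SECOND_PRESET_S:
                # search via image recognition and/or sound localization
                candidates = device.find_candidates()
                if candidates:
                    current = device.pick_new_object(candidates)
                    stopped_at = None
            if idle >= FIRST_PRESET_S:
                awake, current = False, None         # enter dormant state
        else:
            stopped_at = None
            device.respond(current)                  # serve required type
        time.sleep(0.1)
```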
  • Correspondingly, the present invention also provides an interactive communication implementation device, including:
  • an image collection module, used to collect face images;
  • an audio collection module, used to collect voice signals;
  • a detection module, used to detect whether the current interactive object stops interacting;
  • a processing module, configured to determine a candidate object participating in the interaction as the new interactive object by collecting image data and voice signals when the current interactive object stops interacting in the awake state;
  • an execution module, configured to respond to the service type required by the current interactive object, while detection continues, when the current interactive object has not stopped interacting in the awake state.
  • The processing module is also configured to enter the dormant state when the duration for which there is no interactive object reaches the first preset duration in the awake state.
  • The detection module is also used to judge whether a wake-up signal is received when the device is in the dormant state.
  • The processing module is further configured to switch from the dormant state to the awake state if a wake-up signal is received, and to determine the target object that triggered the wake-up as the current interactive object.
  • The processing module includes:
  • a searching unit, which searches for candidate objects participating in the interaction through image recognition and/or sound source localization when the duration for which the current interactive object has stopped interacting reaches the second preset duration;
  • an object switching unit, configured to determine the candidate object as the new interactive object if there is one candidate object and, if there are at least two candidate objects, to determine one of them as the new interactive object according to the image recognition result and/or the sound source localization result.
  • The present invention also provides a storage medium in which at least one instruction is stored; the instruction is loaded and executed by a processor to implement the operations of the above interactive communication implementation method.
  • In this way, interactive objects can be switched naturally, flexibly, and intelligently in multi-user interaction scenarios, achieving the goal of timely and efficient interactive communication with multiple objects in a humanized manner.
  • FIG. 1 is a flowchart of an embodiment of a method for implementing interactive communication of the present invention.
  • FIG. 2 is a flowchart of another embodiment of a method for implementing interactive communication of the present invention.
  • FIG. 3 is a flowchart of another embodiment of a method for implementing interactive communication of the present invention.
  • FIG. 4 is a flowchart of another embodiment of a method for implementing interactive communication of the present invention.
  • FIG. 5 is a flowchart of another embodiment of a method for implementing interactive communication of the present invention.
  • FIG. 6 is a schematic diagram of the interaction of the emotional companion robot Robot of the present invention in a multi-user interaction scenario.
  • FIG. 7 is a schematic diagram of the human-computer interaction process when the robot of the present invention faces multiple people.
  • FIG. 8 is a schematic structural diagram of an embodiment of an interactive communication implementation device of the present invention.
  • The terminal for implementing object switching includes, but is not limited to, robots such as personal virtual assistants, housework robots (such as sweeping robots), children's educational robots, elderly care robots, emotional companion robots, airport service robots, and shopping service robots. It also includes smart devices such as smartphones, smart speakers, and smart voice elevators, which are usually used in public places such as shopping malls, subway stations, and railway stations.
  • a method for implementing interactive communication includes:
  • The robot or smart device can collect image data (including but not limited to face images and gesture images) within its field of view through image collection modules such as cameras or camera arrays, and can also obtain input voice signals within the effective acquisition range through audio collection modules such as microphones or microphone arrays.
  • the types of interaction between the robot or smart device and the current interactive object include, but are not limited to, voice dialogue interaction and gesture dialogue interaction.
  • The robot or smart device can judge whether the current interactive object has input a voice signal according to the image data and/or the voice signal, and can likewise determine whether the current interactive object has input a gesture based on the image data.
  • Since the processor of the robot or smart device executes the tasks it receives, it can also inspect its own processes to determine whether there is a voice interaction task obtained through voice recognition or a gesture interaction task obtained through image recognition, and thereby judge whether the current interactive object has stopped interacting.
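  • As an illustration of this check, a sketch (names assumed, not from the disclosure) that treats a pending voice- or gesture-interaction task, or a recent recognized input, as evidence that the current object is still interacting:

```python
from dataclasses import dataclass

@dataclass
class Task:
    kind: str   # "voice" (from speech recognition) or "gesture" (from image recognition)

def has_stopped_interacting(pending_tasks, last_input_time, now, max_gap_s=2.0):
    # A pending interaction task means the current object is still engaged.
    if any(t.kind in ("voice", "gesture") for t in pending_tasks):
        return False
    # Otherwise, a long enough gap since the last recognized input
    # is treated as "stopped interacting".
    return (now - last_input_time) > max_gap_s
```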
  • the microphone array in the embodiment of the present invention may be an array formed by a group of acoustic sensors located at different positions in space and regularly arranged according to a certain shape, and is a device for spatially sampling voice signals propagating in space.
  • the voice signal processing method of the embodiment of the present invention does not specifically limit the specific form of the microphone array used.
  • the camera array in the embodiment of the present invention may be an array in which a group of image sensors located at different positions in space are regularly arranged according to a certain shape to collect image data from multiple viewing angles.
  • The microphone array or camera array may be a horizontal array, a T-shaped array, an L-shaped array, a polyhedral array, a spherical array, and so on.
  • a candidate object participating in the interaction is determined as a new interactive object by collecting image data and voice signals.
  • The robot or smart device can determine, based on image data and/or voice signals, whether the currently tracked interactive object (which may be a person, another smart device, or another robot) has stopped interacting with it while in the awake state. If the current interactive object stops interacting with the robot or smart device in the awake state, the robot or smart device collects face images and voice signals to replace the current interactive object with one of the candidate objects participating in the interaction (candidate objects include other people, other smart devices, or other robots).
  • For example, suppose robot A is the detection subject and user A is the current interactive object. If robot A detects, by collecting image data and/or voice signals, that user B is participating in the interaction, then user B is determined as the new interactive object according to the image data and the voice signal.
  • a method for implementing interactive communication includes:
  • a candidate object participating in the interaction is determined as a new interactive object by collecting image data and voice signals;
  • When the robot or smart device is in the awake state and detects that the current interactive object has not stopped interacting, it continues to detect in real time whether the current interactive object has stopped interacting. At the same time, during the detection process it obtains the voice signal (or gesture signal) of the current interactive object and performs voice recognition (or gesture recognition) to obtain the corresponding required service type, then performs the corresponding operations according to that service type to respond to the current interactive object. Performing voice recognition (or gesture recognition) on the voice signal (or gesture signal) to obtain the required service type is existing technology and will not be repeated here.
  • For example, a robot or smart device is the detection subject and user A is the current interactive object. If the robot or smart device, by performing voice recognition on the voice signal input by user A, obtains the result "play nursery rhymes", it queries its music library and plays nursery rhymes.
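  • A sketch of this dispatch from a recognized service type to a response; the service names and device helpers are illustrative assumptions, not part of the disclosure:

```python
def respond(device, service_type):
    # Map each recognized service type to an action; unknown types get
    # a spoken fallback. "play_nursery_rhymes" mirrors the example above.
    handlers = {
        "play_nursery_rhymes": lambda: device.play(
            device.music_library.search("nursery rhymes")),
        "tell_time": lambda: device.speak(device.clock.now_text()),
    }
    action = handlers.get(service_type)
    if action is not None:
        action()
    else:
        device.speak("Sorry, I cannot help with that yet.")
```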
  • a method for implementing interactive communication includes:
  • S310: judge whether a wake-up signal is received when in the dormant state;
  • The wake-up mechanism includes, but is not limited to: triggering the wake-up signal by voice input of a wake-up word; generating a wake-up signal through the press or touch of a mechanical or touch button preset on the robot or smart device; or generating a wake-up signal upon receiving an input gesture that matches a preset wake-up gesture. Other ways of generating the wake-up signal are also within the protection scope of the present invention.
  • S320: if a wake-up signal is received, switch from the dormant state to the awake state, and determine the target object that triggered the wake-up as the current interactive object.
  • When the robot or smart device receives a wake-up signal in the dormant state, it automatically switches from the dormant state to the awake state and determines the target object that triggered the wake-up as the initial current interactive object for the current awake session. The target object can be a person with normal language ability, or a person who uses a TTS (Text To Speech) device to send out voice signals.
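  • A polling-style sketch of this wake-up mechanism, returning the target object that triggered the wake-up; all device helpers are assumptions:

```python
def receive_wake_signal(device):
    # Per the text above: a wake word, a preset mechanical/touch button,
    # or a preset wake gesture can each produce the wake-up signal.
    while True:
        if device.heard_wake_word():
            return device.locate_speaker()      # sound source of the wake word
        if device.button_pressed():
            return device.person_in_front()
        gesture = device.read_gesture()
        if gesture is not None and device.matches_wake_gesture(gesture):
            return device.person_in_front()
```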
  • a candidate object participating in the interaction is determined as a new interactive object by collecting image data and voice signals;
  • a method for implementing interactive communication includes:
  • S410: detect whether the current interactive object stops interacting;
  • a candidate object participating in the interaction is determined as a new interactive object by collecting image data and voice signals;
  • S440: enter the dormant state when the duration for which there is no interactive object reaches the first preset duration in the awake state;
  • When the robot or smart device is in the awake state, if the current interactive object stops interacting with it and no new interactive object is detected for the first preset duration, this indicates that during that period there is no interactive object interacting with the robot or smart device.
  • Similarly, when there are no interactive objects within the effective acquisition ranges of the audio collection module and the image collection module for the first preset duration while in the awake state, this also indicates that during that period there is no interactive object interacting with the robot or smart device.
  • In this case the robot or smart device automatically enters the dormant state, which prevents it from staying awake for a long time, saves power consumption, and increases its standby time.
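  • A minimal idle-timer sketch for this first preset duration (the value is an example; a monotonic clock is used so system clock changes cannot postpone sleep):

```python
import time

class IdleTimer:
    def __init__(self, first_preset_s=30.0):
        self.first_preset_s = first_preset_s
        self.last_seen = time.monotonic()

    def object_seen(self):
        # Call whenever any interactive object is detected in range.
        self.last_seen = time.monotonic()

    def should_sleep(self):
        return time.monotonic() - self.last_seen >= self.first_preset_s
```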
  • S450: judge whether a wake-up signal is received when in the dormant state;
  • Once the robot or smart device switches from the dormant state to the awake state, there is no need during the subsequent awake period to repeatedly voice-input wake-up words, as in the prior art, in order to switch to new interactive objects midway, nor must every user understand and master the trigger operations of different robots or smart devices. New interactive objects are switched to in real time and intelligently in multi-user interaction scenarios based on the collected image data and voice signals. This is not only more in line with daily communication patterns, but also helps achieve effective communication and increases the personification of human-machine communication, thereby achieving effective interactive communication between robots or smart devices and multiple objects.
  • a method for implementing interactive communication includes:
  • S510: detect whether the current interactive object stops interacting;
  • the second preset duration is less than the first preset duration.
  • When the robot or smart device meets the trigger condition for searching for and switching to a new interactive object, each search determines exactly one candidate object as the new interactive object.
  • The robot or smart device can perform sound collection through the audio collection module to realize its auditory function.
  • After the voice signal is collected, it is processed by framing and windowing, and this audio processing is used to determine the number of sound sources; the number of candidate objects is then determined from the number of sound sources. Sound source localization is prior art and will not be repeated here. If the number of candidate objects determined in this way is one, that candidate object is directly determined as the new interactive object. If the number of candidate objects is at least two, the candidate user corresponding to the earliest acquired voice signal, according to the time sequence of the acquired voice signals, is determined as the new interactive object for this switch.
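  • For illustration, the framing-and-windowing preprocessing named above can be sketched as follows (frame length, hop, and window choice are assumed values, not from the disclosure):

```python
import numpy as np

def frame_and_window(signal, frame_len=512, hop=256):
    # Split the signal into overlapping frames and apply a Hamming window;
    # the windowed frames would then feed source counting/localization.
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len]
                       for i in range(n_frames)])
    return frames * np.hamming(frame_len)
```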
  • For example, the robot or smart device collects voice signals in real time through the audio collection module, obtains the number of sound sources through sound source localization, and determines the candidate user corresponding to the earliest voice signal as the new interactive object for this switch.
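  • A sketch of this earliest-speaker rule; the candidate record shape is an assumption about what a localization front end would produce:

```python
def pick_by_voice(candidates):
    # candidates: e.g. [{"id": "user_b", "azimuth_deg": 40.0,
    #                    "first_voice_t": 12.37}, ...]
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0]["id"]
    # At least two sources: the earliest acquired voice signal wins.
    return min(candidates, key=lambda c: c["first_voice_t"])["id"]
```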
  • The robot or smart device can also collect image data through the image collection module to realize its vision function.
  • The number of candidate objects can be determined from the image recognition result. If the number of candidate objects is one, that candidate object is directly determined as the new interactive object. If the number of candidate objects is at least two, the candidate user who was earliest to participate in the interaction, according to the time sequence in which each candidate object began participating as obtained by image recognition, is determined as the new interactive object for this switch.
  • For example, the robot captures image data in real time through the image collection module and performs face recognition on the acquired image data; the candidate user who performed the mouth-opening action first is determined as the new interactive object for this switch.
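  • A sketch of this earliest-mouth-opening rule, assuming a face-landmark detector that yields lip and mouth-corner coordinates; the threshold and field names are illustrative:

```python
def mouth_open_ratio(lm):
    # Lip gap relative to mouth width; a ratio above ~0.3 is treated
    # as "mouth open" in this sketch.
    gap = abs(lm["upper_lip_y"] - lm["lower_lip_y"])
    width = abs(lm["mouth_right_x"] - lm["mouth_left_x"])
    return gap / width if width else 0.0

def pick_by_mouth(first_open_time):
    # first_open_time: {object_id: first time its ratio crossed the threshold}
    return min(first_open_time, key=first_open_time.get) if first_open_time else None
```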
  • The robot or smart device can also collect image data through the image collection module while the audio collection module collects sound.
  • In this case, image recognition and sound source localization can be combined to determine the number of candidate objects. If the number of candidate objects is one, that candidate object is directly determined as the new interactive object. If the number is at least two, the mouth-opening actions and voice signals of the candidate objects are analyzed comprehensively according to the image recognition result and/or the sound source localization result to find, among the candidate objects participating in the interaction, the candidate user who was earliest to participate; that candidate user is determined as the new interactive object for this switch.
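  • A sketch of combining the two cues: a candidate counts as participating only when a mouth-opening event and a localized voice roughly agree in direction, and the earliest agreeing candidate wins (the event shapes and angle tolerance are assumptions):

```python
def pick_by_fusion(visual_events, audio_events, max_angle_diff_deg=15.0):
    # visual_events: [(object_id, azimuth_deg, t_mouth_open), ...]
    # audio_events:  [(azimuth_deg, t_voice), ...]
    agreeing = []
    for obj_id, v_az, v_t in visual_events:
        for a_az, a_t in audio_events:
            if abs(v_az - a_az) <= max_angle_diff_deg:
                agreeing.append((min(v_t, a_t), obj_id))
                break
    # Earliest agreeing candidate becomes the new interactive object.
    return min(agreeing)[1] if agreeing else None
```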
  • S560: enter the dormant state when the duration for which there is no interactive object reaches the first preset duration in the awake state;
  • S570: judge whether a wake-up signal is received when in the dormant state;
  • The present invention preferably uses both image data and voice signals as judging factors to detect candidate objects and determine one of them as the new interactive object, so as to avoid mistakenly identifying as a new interactive object a candidate that emits meaningless voice signals within the effective collection range of the audio and image collection modules (such as a baby) or a user with no intention to interact.
  • Combining image recognition and sound source localization achieves precise positioning of the direction and location of candidate objects, improving the accuracy of searching for and determining new interactive objects.
  • The robot or smart device automatically switches to a new interactive object to continue the interaction while awake, which improves the efficiency of switching between the robot or smart device and multiple interactive objects and shortens the time it takes to turn to the next interactive object.
  • As shown in FIG. 6, the usage scene of the emotional companion robot Robot includes Robot, User1, User2, and User3. User1, User2, and User3 are not specific persons; the labels merely distinguish different users.
  • User1 approaches Robot and wakes it up with the wake-up word; Robot then turns to User1 and interacts with it. During the interaction, Robot must determine in real time whether User1 is still interacting with it. When Robot judges through sound source localization and facial feature recognition that User1 has stopped interacting with it, Robot automatically turns to User2, who is speaking. The same strategy applies when there are more than two users.
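  • The "turn to the speaking user" action can be sketched as rotating the base toward the azimuth reported by sound source localization; the motor API is an assumption:

```python
def turn_to_speaker(base_motor, heading_deg, speaker_azimuth_deg):
    # Rotate by the shortest signed angle in (-180, 180].
    delta = (speaker_azimuth_deg - heading_deg + 180.0) % 360.0 - 180.0
    base_motor.rotate(delta)
    return heading_deg + delta
```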
  • The process of human-computer interaction when the robot faces multiple people is shown in FIG. 7 and includes the following steps:
  • Step 0: initial state: one Robot (in the dormant state) and two or more users who can interact with it.
  • Step 1: User1 approaches Robot and wakes it up; Robot switches from the dormant state to the awake state; go to step 2.
  • Step 2: Robot turns to User1 and interacts with it; go to step 3.
  • Step 3: while interacting with User1, Robot judges through sound source localization and facial feature recognition whether User1 is still interacting with it. The judgment results are divided into the following four types:
  • An embodiment of the present invention provides an interactive communication implementation device which, as shown in FIG. 8, includes:
  • the image collection module 10, used to collect face images;
  • the audio collection module 20, used to collect voice signals;
  • the detection module 30, used to detect whether the current interactive object stops interacting;
  • the processing module 40, configured to determine a candidate object participating in the interaction as the new interactive object by collecting image data and voice signals when the current interactive object stops interacting in the awake state.
  • this embodiment is a device embodiment corresponding to the foregoing method embodiment, and for specific effects, refer to the foregoing method embodiment, which will not be repeated here.
  • the detection module 30 is also used for judging whether a wake-up signal is received when it is in a dormant state
  • the processing module 40 is further configured to switch from the dormant state to the awakened state if a wake-up signal is received, and determine that the target object that triggers the awakening of itself is the current interactive object.
  • this embodiment is a device embodiment corresponding to the foregoing method embodiment, and for specific effects, refer to the foregoing method embodiment, which will not be repeated here.
  • the execution module, configured to respond to the service type required by the current interactive object, while detection continues, when the current interactive object has not stopped interacting in the awake state;
  • the processing module 40 is also configured to enter the dormant state when the duration for which there is no interactive object reaches the first preset duration in the awake state.
  • this embodiment is a device embodiment corresponding to the foregoing method embodiment, and for specific effects, refer to the foregoing method embodiment, which will not be repeated here.
  • the processing module 40 includes:
  • the searching unit, which searches for candidate objects participating in the interaction through image recognition and/or sound source localization when the duration for which the current interactive object has stopped interacting reaches the second preset duration;
  • the object switching unit, used to determine the candidate object as the new interactive object if there is one candidate object and, if there are at least two candidate objects, to determine one of them as the new interactive object according to the image recognition result and/or the sound source localization result.
  • this embodiment is a device embodiment corresponding to the above method embodiment.
  • A smart device includes a processor and a memory, where the memory is used to store a computer program; the processor is used to execute the computer program stored in the memory to implement the interactive communication implementation method in the above method embodiment.
  • the smart device may be a desktop computer, a notebook, a palmtop computer, a tablet computer, a mobile phone, a human-computer interaction screen and other devices.
  • the smart device may include, but is not limited to, a processor and a memory.
  • Smart devices may also include input/output interfaces, display devices, network access devices, communication buses, communication interfaces, and so on.
  • The smart device may also include an input/output interface, where the processor, the memory, the input/output interface, and the communication interface communicate with one another through the communication bus.
  • the memory stores a computer program, and the processor is used to execute the computer program stored on the memory to implement the interactive communication implementation method in the foregoing method embodiment.
  • The processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the memory may be an internal storage unit of the smart device, such as a hard disk or memory of the smart device.
  • The memory may also be an external storage device of the smart device, for example, a plug-in hard disk equipped on the smart device, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, etc.
  • the memory may also include both an internal storage unit of the smart device and an external storage device.
  • the memory is used to store the computer program and other programs and data required by the smart device.
  • the memory can also be used to temporarily store data that has been output or will be output.
  • the communication bus is a circuit that connects the described elements and realizes transmission between these elements.
  • the processor receives commands from other elements through the communication bus, decrypts the received commands, and performs calculations or data processing according to the decrypted commands.
  • the memory may include program modules, such as a kernel (kernel), middleware (middleware), application programming interface (Application Programming Interface, API), and applications.
  • the program module can be composed of software, firmware or hardware, or at least two of them.
  • The input/output interface forwards commands or data input by the user through input/output devices (such as a sensor, a keyboard, or a touch screen).
  • the communication interface connects the smart device with other network devices, user equipment, and the network.
  • the communication interface may be wired or wirelessly connected to the network to connect to other external network equipment or user equipment.
  • The wireless communication may include at least one of the following: wireless fidelity (WiFi), Bluetooth (BT), near-field communication (NFC), the global positioning system (GPS), cellular communication, and so on.
  • Wired communication may include at least one of the following: universal serial bus (USB), high-definition multimedia interface (HDMI), the RS-232 serial interface, and so on.
  • The network can be a telecommunication network or a communication network.
  • The communication network can be a computer network, the Internet, the Internet of Things, or a telephone network. The smart device can be connected to the network through the communication interface, and the protocol used by the smart device to communicate with other network devices can be supported by at least one of the application, the application programming interface (API), the middleware, the kernel, and the communication interface.
  • An embodiment of the present invention is a storage medium in which at least one instruction is stored, and the instruction is loaded and executed by a processor to implement the operations performed by the corresponding embodiment of the foregoing interactive communication implementation method.
  • the computer-readable storage medium may be read-only memory (ROM), random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
  • the disclosed device/smart device and method may be implemented in other ways.
  • the device/smart device embodiments described above are only illustrative.
  • The division of the modules or units is only a logical function division; in actual implementation there may be other division methods.
  • For example, multiple units or components can be combined or integrated into another system, or some features can be omitted or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • If the integrated module/unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • The present invention may implement all or part of the processes in the above embodiment methods by instructing the related hardware through a computer program.
  • the computer program can be stored in a computer-readable storage medium. When the program is executed by the processor, it can implement the steps of the foregoing method embodiments.
  • the computer program includes: computer program code, and the computer program code may be in the form of source code, object code, executable file, or some intermediate forms.
  • The computer-readable storage medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and so on.
  • The content contained in the computer-readable storage medium can be appropriately increased or decreased in accordance with the requirements of legislation and patent practice in the jurisdiction. For example, in some jurisdictions, in accordance with legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunication signals.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention relates to a method and device for implementing interactive communication, and a storage medium. The method comprises the steps of: detecting whether the current interactive object stops interacting (S110); and, when the current interactive object stops interacting in the awake state, determining, by means of collected image data and voice signals, a candidate object participating in the interaction as the new interactive object (S120). In this way, interactive objects can be switched naturally, flexibly and intelligently in a multi-user interaction scenario, so as to achieve, in a humanized manner, the goal of timely and efficient interactive communication with a plurality of objects.
PCT/CN2020/086222 2020-04-22 2020-04-22 Method and device for implementing interactive communication, and storage medium WO2021212388A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2020/086222 WO2021212388A1 (fr) 2020-04-22 2020-04-22 Method and device for implementing interactive communication, and storage medium
CN202080004243.6A CN112739507B (zh) 2020-04-22 2020-04-22 Interactive communication implementation method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/086222 WO2021212388A1 (fr) 2020-04-22 2020-04-22 Method and device for implementing interactive communication, and storage medium

Publications (1)

Publication Number Publication Date
WO2021212388A1 true WO2021212388A1 (fr) 2021-10-28

Family

ID=75609496

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/086222 WO2021212388A1 (fr) 2020-04-22 2020-04-22 Method and device for implementing interactive communication, and storage medium

Country Status (2)

Country Link
CN (1) CN112739507B (fr)
WO (1) WO2021212388A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114193477A (zh) * 2021-12-24 2022-03-18 上海擎朗智能科技有限公司 Position guiding method and apparatus, robot, and storage medium
CN116363566A (zh) * 2023-06-02 2023-06-30 华东交通大学 Target interaction relationship recognition method based on a relational knowledge graph

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116978372A (zh) * 2022-04-22 2023-10-31 华为技术有限公司 Voice interaction method, electronic device, and storage medium
CN114715175A (zh) * 2022-05-06 2022-07-08 Oppo广东移动通信有限公司 Target object determination method and apparatus, electronic device, and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105881548A (zh) * 2016-04-29 2016-08-24 北京快乐智慧科技有限责任公司 Method for waking up an intelligent interactive robot, and intelligent interactive robot
CN108733420A (zh) * 2018-03-21 2018-11-02 北京猎户星空科技有限公司 Wake-up method and apparatus for a smart device, smart device, and storage medium
CN109683610A (zh) * 2018-12-14 2019-04-26 北京猎户星空科技有限公司 Smart device control method, apparatus, and storage medium
CN110111789A (zh) * 2019-05-07 2019-08-09 百度国际科技(深圳)有限公司 Voice interaction method and apparatus, computing device, and computer-readable medium
US20190371342A1 (en) * 2018-06-05 2019-12-05 Samsung Electronics Co., Ltd. Methods and systems for passive wakeup of a user interaction device
CN110730115A (zh) * 2019-09-11 2020-01-24 北京小米移动软件有限公司 Voice control method and apparatus, terminal, and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106354255A (zh) * 2016-08-26 2017-01-25 北京光年无限科技有限公司 Human-computer interaction method and apparatus for robot products
CN110290096B (zh) * 2018-03-19 2022-06-24 阿里巴巴集团控股有限公司 Human-computer interaction method and terminal
CN109166575A (zh) * 2018-07-27 2019-01-08 百度在线网络技术(北京)有限公司 Interaction method and apparatus for a smart device, smart device, and storage medium
CN109461448A (zh) * 2018-12-11 2019-03-12 百度在线网络技术(北京)有限公司 Voice interaction method and apparatus
CN110689889B (zh) * 2019-10-11 2021-08-17 深圳追一科技有限公司 Human-computer interaction method and apparatus, electronic device, and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105881548A (zh) * 2016-04-29 2016-08-24 北京快乐智慧科技有限责任公司 Method for waking up an intelligent interactive robot, and intelligent interactive robot
CN108733420A (zh) * 2018-03-21 2018-11-02 北京猎户星空科技有限公司 Wake-up method and apparatus for a smart device, smart device, and storage medium
US20190371342A1 (en) * 2018-06-05 2019-12-05 Samsung Electronics Co., Ltd. Methods and systems for passive wakeup of a user interaction device
CN109683610A (zh) * 2018-12-14 2019-04-26 北京猎户星空科技有限公司 Smart device control method, apparatus, and storage medium
CN110111789A (zh) * 2019-05-07 2019-08-09 百度国际科技(深圳)有限公司 Voice interaction method and apparatus, computing device, and computer-readable medium
CN110730115A (zh) * 2019-09-11 2020-01-24 北京小米移动软件有限公司 Voice control method and apparatus, terminal, and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114193477A (zh) * 2021-12-24 2022-03-18 上海擎朗智能科技有限公司 Position guiding method and apparatus, robot, and storage medium
CN116363566A (zh) * 2023-06-02 2023-06-30 华东交通大学 Target interaction relationship recognition method based on a relational knowledge graph
CN116363566B (zh) * 2023-06-02 2023-10-17 华东交通大学 Target interaction relationship recognition method based on a relational knowledge graph

Also Published As

Publication number Publication date
CN112739507A (zh) 2021-04-30
CN112739507B (zh) 2023-05-09

Similar Documents

Publication Publication Date Title
WO2021212388A1 (fr) Method and device for implementing interactive communication, and storage medium
US11620984B2 Human-computer interaction method, and electronic device and storage medium thereof
CN108735209B (zh) Wake-up word binding method, smart device and storage medium
KR101726945B1 (ko) Reducing the need for manual start/end-pointing and trigger phrases
WO2021036714A1 (fr) Voice-controlled split-screen display method and electronic device
CN110263131B (zh) Reply message generation method, apparatus and storage medium
EP4184506A1 (fr) Audio processing
CN108766438A (zh) Human-computer interaction method and apparatus, storage medium and smart terminal
CN112860169B (zh) Interaction method and apparatus, computer-readable medium and electronic device
CN111063354B (zh) Human-computer interaction method and apparatus
CN109032554B (zh) Audio processing method and electronic device
WO2022042274A1 (fr) Voice interaction method and electronic device
CN112634895A (zh) Wake-free voice interaction method and apparatus
US20230048330A1 In-Vehicle Speech Interaction Method and Device
WO2024103926A1 (fr) Voice control methods and apparatuses, storage medium and electronic device
CN111370004A (zh) Human-computer interaction method, voice processing method and device
CN112233676A (zh) Smart device wake-up method and apparatus, electronic device and storage medium
CN117253478A (zh) Voice interaction method and related apparatus
WO2022227507A1 (fr) Wake-up degree recognition model training method and voice wake-up degree acquisition method
CN106683668A (zh) Wake-up control method and system for a smart device
WO2023006033A1 (fr) Voice interaction method, electronic device and medium
US11929081B2 Electronic apparatus and controlling method thereof
WO2024103893A1 (fr) Application program wake-up method and electronic device
CN110989963B (zh) Wake-up word recommendation method and apparatus, and storage medium
WO2024055831A1 (fr) Voice interaction method and apparatus, and terminal

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20931847

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20931847

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 04.05.2023)
