WO2021212388A1 - Interactive communication implementation method and device, and storage medium - Google Patents
Interactive communication implementation method and device, and storage medium
- Publication number
- WO2021212388A1 (PCT/CN2020/086222)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- interactive
- interaction
- interactive object
- candidate
- state
- Prior art date
Classifications
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J11/00—Manipulators not otherwise provided for
- B25J11/0005—Manipulators having means for high-level communication with users, e.g. speech generator, face recognition means
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Definitions
- The present invention relates to the technical field of human-computer interaction, and in particular to a method, device, and storage medium for realizing interactive communication.
- Trigger operations such as speaking a "wake-up word" or performing a touch input are currently the main ways to trigger a robot or smart device to begin human-computer interaction.
- The problem with using the above methods in a multi-person scenario is that, to switch to a new interactive object midway while the robot or smart device is in the awakened state, each person participating in the interaction must perform the above trigger operation, so every user must understand and master the trigger operations of different robots or smart devices.
- Such an interaction process is not only mechanical but also disrupts the rhythm of switching between multiple participants; in multi-user interaction scenarios, the device cannot communicate with multiple users in real time, intelligently, and effectively.
- The purpose of the present invention is to provide a method, device, and storage medium for realizing interactive communication, so that interactive objects can be switched naturally, flexibly, and intelligently in multi-user interaction scenarios, enabling timely, efficient, and humanized interactive communication with multiple objects.
- To this end, the present invention provides a method for realizing interactive communication, which includes the following steps:
- when the current interactive object stops interacting in the awakened state, a candidate object participating in the interaction is determined as the new interactive object by collecting image data and voice signals;
- when the current interactive object does not stop interacting, detection continues while the required service type of the current interactive object is responded to;
- when the duration for which there is no interactive object reaches the first preset duration, the device enters the dormant state;
- if a wake-up signal is received, the device switches from the dormant state to the awakened state, and the target object that triggered the awakening is determined as the current interactive object;
- determining a candidate object participating in the interaction as the new interactive object by collecting image data and voice signals includes the following steps:
- when there are at least two candidate objects, one candidate object is determined as the new interactive object according to the image recognition result and/or the sound source localization result.
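The claimed control flow above can be sketched as a small state machine. Everything below is illustrative: the class, method, and field names are not taken from the patent, and the sensing itself (image recognition, sound source localization) is assumed to happen elsewhere and feed in events.

```python
from enum import Enum, auto

class State(Enum):
    DORMANT = auto()
    AWAKE = auto()

class InteractionManager:
    """Minimal sketch of the claimed control flow (names are assumptions)."""

    def __init__(self):
        self.state = State.DORMANT
        self.current_object = None

    def on_wake_signal(self, target):
        # The target object that triggered the awakening becomes
        # the current interactive object.
        self.state = State.AWAKE
        self.current_object = target

    def on_interaction_stopped(self, candidates):
        # candidates: list of (start_time, name) pairs found via image
        # recognition and/or sound source localization. One candidate is
        # chosen as the new interactive object; with several candidates,
        # the earliest participant wins.
        if self.state is not State.AWAKE or not candidates:
            return
        self.current_object = min(candidates)[1]

    def on_idle_timeout(self):
        # No interactive object for the first preset duration: sleep.
        self.state = State.DORMANT
        self.current_object = None
```

A wake-up event makes the triggering object current; once awake, switching needs no further wake-up words, only the sensed candidate events.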
- The present invention also provides an interactive communication realization device, including:
- an image collection module, used to collect face images;
- an audio collection module, used to collect voice signals;
- a detection module, used to detect whether the current interactive object stops interacting;
- a processing module, configured to determine a candidate object participating in the interaction as the new interactive object by collecting image data and voice signals when the current interactive object stops interacting in the awakened state;
- an execution module, configured to respond to the required service type of the current interactive object while continuing detection when the current interactive object does not stop interacting in the awakened state;
- the processing module is also configured to enter the dormant state when the duration for which there is no interactive object reaches the first preset duration in the awakened state;
- the detection module is also used to determine whether a wake-up signal is received when the device is in the dormant state;
- the processing module is further configured to switch from the dormant state to the awakened state if a wake-up signal is received, and to determine the target object that triggered the awakening as the current interactive object.
- The processing module includes:
- a searching unit, which searches for candidate objects participating in the interaction through image recognition and/or sound source localization when the duration for which the current interactive object has stopped interacting reaches the second preset duration;
- an object switching unit, configured to determine the candidate object as the new interactive object if there is one candidate object, and to determine one candidate object as the new interactive object according to the image recognition result and/or the sound source localization result if there are at least two candidate objects.
- the present invention also provides a storage medium in which at least one instruction is stored, and the instruction is loaded and executed by a processor to implement the operations performed by the interactive communication implementation method.
- With the present invention, interactive objects can be switched naturally, flexibly, and intelligently in multi-user interaction scenarios, achieving the goal of timely, efficient, and humanized interactive communication with multiple objects.
- FIG. 1 is a flowchart of an embodiment of a method for implementing interactive communication of the present invention
- FIG. 2 is a flowchart of another embodiment of a method for implementing interactive communication of the present invention.
- FIG. 3 is a flowchart of another embodiment of a method for implementing interactive communication of the present invention.
- FIG. 4 is a flowchart of another embodiment of a method for implementing interactive communication of the present invention.
- FIG. 5 is a flowchart of another embodiment of a method for implementing interactive communication of the present invention.
- FIG. 6 is a schematic diagram of the interaction of the emotional companion robot (Robot) of the present invention in a multi-user interaction scenario.
- FIG. 7 is a schematic diagram of the human-computer interaction process when the robot of the present invention faces multiple people.
- FIG. 8 is a schematic structural diagram of an embodiment of an interactive communication realization device of the present invention.
- In the present invention, the terminal that implements object switching includes, but is not limited to, personal virtual assistants and robots such as housework robots (e.g., sweeping robots), children's educational robots, elderly care robots, emotional companion robots, airport service robots, and shopping service robots. It also includes smart devices such as smartphones, smart speakers, and smart voice elevators, which are usually used in public places such as shopping malls, subway stations, and railway stations.
- a method for implementing interactive communication includes:
- The robot or smart device can collect image data (including but not limited to face images and gesture images) within its field of view through an image collection module such as a camera or camera array, and can obtain input voice signals within the effective acquisition range through an audio collection module such as a microphone or microphone array.
- the types of interaction between the robot or smart device and the current interactive object include, but are not limited to, voice dialogue interaction and gesture dialogue interaction.
- The robot or smart device can judge whether the current interactive object has input a voice signal according to the collected image data and/or voice signal, and can likewise determine whether the current interactive object has input a gesture based on the image data.
- Since the processor of the robot or smart device executes the tasks it receives, it can also inspect its own processes to determine whether there is a voice interaction task obtained by voice recognition or a gesture interaction task obtained by image recognition, and judge from this whether the current interactive object has stopped interacting.
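The task-queue check described above might look like the following sketch; the parameter names and the timeout value are illustrative assumptions, not details from the patent.

```python
def interaction_stopped(pending_tasks, last_task_time, now, stop_timeout=3.0):
    """Judge whether the current object has stopped interacting.

    pending_tasks: task kinds currently in the device's own process queue,
    e.g. "voice" (from voice recognition) or "gesture" (from image
    recognition). If no interaction task is pending and none has arrived
    for `stop_timeout` seconds (an assumed value), the current interactive
    object is judged to have stopped interacting.
    """
    if any(t in ("voice", "gesture") for t in pending_tasks):
        return False
    return now - last_task_time >= stop_timeout
```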
- the microphone array in the embodiment of the present invention may be an array formed by a group of acoustic sensors located at different positions in space and regularly arranged according to a certain shape, and is a device for spatially sampling voice signals propagating in space.
- the voice signal processing method of the embodiment of the present invention does not specifically limit the specific form of the microphone array used.
- the camera array in the embodiment of the present invention may be an array in which a group of image sensors located at different positions in space are regularly arranged according to a certain shape to collect image data from multiple viewing angles.
- The microphone array or camera array may be a horizontal array, a T-shaped array, an L-shaped array, a polyhedral array, a spherical array, and so on.
- a candidate object participating in the interaction is determined as a new interactive object by collecting image data and voice signals.
- The robot or smart device can determine, based on image data and/or voice signals, whether the currently tracked interactive object (which may be a person, another smart device, or another robot) has stopped interacting with it while in the awakened state. If the current interactive object has stopped interacting, the robot or smart device collects face images and voice signals to replace it with one of the candidate objects participating in the interaction (candidate objects include other people, other smart devices, or other robots) as the new current interactive object.
- For example, suppose robot A is the detection subject and user A is the current interactive object. If robot A detects, by collecting image data and/or voice signals, that user B is participating in the interaction, it determines user B as the new interactive object according to that image data and voice signal.
- a method for implementing interactive communication includes:
- a candidate object participating in the interaction is determined as a new interactive object by collecting image data and voice signals;
- When the robot or smart device is in the awakened state and detects that the current interactive object has not stopped interacting, it continues to detect in real time whether the current interactive object has stopped interacting. At the same time, during detection it performs voice recognition (or gesture recognition) on the voice signal (or gesture signal) obtained from the current interactive object to determine the required service type, and performs the corresponding operation to respond to the current interactive object. Performing voice recognition (or gesture recognition) on a voice signal (or gesture signal) to obtain the required service type is existing technology and will not be repeated here.
- For example, a robot or smart device is the detection subject and user A is the current interactive object. If the robot or smart device obtains the result "play nursery rhymes" by performing voice recognition on the voice signal input by user A, it will query its music library and play nursery rhymes.
- TTS: Text To Speech. Some TTS-enabled devices provide the TTS function and no other services.
- a method for implementing interactive communication includes:
- S310: when in the dormant state, judge whether a wake-up signal is received.
- The wake-up mechanism includes, but is not limited to, triggering the wake-up signal by speaking a wake-up word; a mechanical button or touch button can also be preset on the robot or smart device so that a wake-up signal is generated by touch or press; a wake-up signal can also be generated after receiving an input gesture that matches a preset wake-up gesture. Other ways in which the wake-up mechanism generates the wake-up signal also fall within the protection scope of the present invention.
- S320: if a wake-up signal is received, switch from the dormant state to the awakened state, and determine the target object that triggered the awakening as the current interactive object.
- When the robot or smart device receives a wake-up signal in the dormant state, it automatically switches from the dormant state to the awakened state and determines the target object that triggered the wake-up as the initial current interactive object for this awakened period. The target object can be a person with normal language ability, or a person who uses a TTS device to send out voice signals.
- a candidate object participating in the interaction is determined as a new interactive object by collecting image data and voice signals;
- a method for implementing interactive communication includes:
- S410: detect whether the current interactive object stops interacting.
- a candidate object participating in the interaction is determined as a new interactive object by collecting image data and voice signals;
- S440: enter the dormant state when the duration for which there is no interactive object reaches the first preset duration in the awakened state.
- When the robot or smart device is in the awakened state, if the current interactive object stops interacting with it and no new interactive object is detected for the first preset duration, this indicates that during that period no interactive object has interacted with the robot or smart device.
- Similarly, when there is no interactive object within the effective acquisition range of the audio and image collection modules of the robot or smart device and the duration reaches the first preset duration, this also indicates that during that period no interactive object has interacted with the robot or smart device.
- In this case, the robot or smart device automatically enters the dormant state, which prevents it from remaining awake for a long time, saves power consumption, and increases its standby time.
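The first-preset-duration timeout described above can be sketched as a small idle timer. The 30-second default and the injectable clock are assumptions made for testability, not values from the patent.

```python
import time

class IdleTimer:
    """Tracks how long no interactive object has been observed; once the
    elapsed time reaches the first preset duration, the device should
    enter the dormant state to save power."""

    def __init__(self, first_preset=30.0, clock=time.monotonic):
        self.first_preset = first_preset  # assumed default of 30 s
        self.clock = clock
        self.last_seen = clock()

    def object_detected(self):
        # Any detected interaction resets the idle clock.
        self.last_seen = self.clock()

    def should_sleep(self):
        return self.clock() - self.last_seen >= self.first_preset
```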
- S450: when in the dormant state, judge whether a wake-up signal is received.
- Once the robot or smart device switches from the dormant state to the awakened state, there is no need during the subsequent awake period to frequently voice-input wake-up words, as in the prior art, to switch to a new interactive object midway, and users are not forced to understand and master the trigger operations of different robots or smart devices. New interactive objects are switched in real time and intelligently in multi-user interaction scenarios based only on the collected image data and voice signals. This is not only more in line with daily communication habits, but also helps achieve effective communication and increases the personification of human-machine communication, thereby achieving effective interactive communication between the robot or smart device and multiple objects.
- a method for implementing interactive communication includes:
- S510: detect whether the current interactive object stops interacting.
- the second preset duration is less than the first preset duration.
- When the robot or smart device meets the trigger condition for searching for and switching to a new interactive object, only one candidate object is determined as the newly found interactive object after each search.
- the robot or smart device can be responsible for sound collection through the audio collection module to realize the auditory function of the robot or smart device.
- After the voice signal is collected, it is processed by framing and windowing, and audio processing is used to determine the number of sound sources; the number of candidate objects is then determined from the number of sound sources. Sound source localization is prior art and will not be repeated here. If the number of candidate objects determined this way is one, that candidate object is directly determined as the new interactive object. If the number of candidate objects is at least two, the candidate user corresponding to the earliest acquired voice signal is determined as the new interactive object for this switch, according to the time sequence of the acquired voice signals.
- For example, the robot or smart device collects voice signals in real time through the audio collection module, obtains the number of sound sources through sound source localization, and determines the candidate user with the earliest voice signal as the new interactive object for this switch.
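The framing-and-windowing preprocessing mentioned above is standard audio practice. A minimal sketch follows; the 400-sample frame and 160-sample hop (25 ms / 10 ms at 16 kHz) and the Hann window are typical choices, not values from the patent.

```python
import math

def frame_signal(samples, frame_len=400, hop=160):
    """Split a voice signal into overlapping frames and apply a Hann
    window, the usual preprocessing before estimating the number and
    position of sound sources."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append([samples[start + n] * window[n]
                       for n in range(frame_len)])
    return frames
```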
- the robot or smart device can also be responsible for the collection of image data through the image acquisition module to realize the vision function of the robot or smart device.
- Alternatively, the number of candidate objects can be determined from the image recognition result. If the number of candidate objects is one, that candidate object is directly determined as the new interactive object. If the number of candidate objects is at least two, the candidate user who participated in the interaction earliest, according to the time sequence of each candidate object's participation obtained by image recognition, is determined as the new interactive object for this switch.
- For example, the robot captures image data in real time through the image collection module and performs face recognition on it; the candidate user A who performed the mouth-opening action first is determined as the new interactive object for this switch.
- The robot or smart device can also collect image data through the image collection module while the audio collection module collects sound.
- In this case, image recognition and sound source localization can be combined to determine the number of candidate objects. If the number of candidate objects is one, that candidate object is directly determined as the new interactive object. If the number of candidate objects is at least two, the mouth-opening actions and voice signals of the candidate objects are analyzed comprehensively according to the image recognition result and/or the sound source localization result to find, among the candidates participating in the interaction, the candidate user who participated earliest, and that candidate user is determined as the new interactive object for this switch.
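The combined image-and-audio selection rule can be sketched as a simple fusion of timestamps from the two modalities. This particular rule (earliest observation across either modality wins) is one illustrative reading of the text, not the patent's definitive algorithm, and the dictionary field names are assumptions.

```python
def fuse_candidates(mouth_events, voice_events):
    """mouth_events / voice_events: {candidate_id: first_observed_time}
    from image recognition (mouth-opening actions) and sound source
    localization respectively. A candidate's interaction start is taken
    as the earliest time either modality observed them; the earliest
    starter becomes the new interactive object. Returns None when no
    candidate was observed at all."""
    starts = {}
    for events in (mouth_events, voice_events):
        for cid, t in events.items():
            starts[cid] = min(t, starts.get(cid, float("inf")))
    if not starts:
        return None
    return min(starts, key=starts.get)
```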
- S560: enter the dormant state when the duration for which there is no interactive object reaches the first preset duration in the awakened state.
- S570: when in the dormant state, judge whether a wake-up signal is received.
- The present invention preferably uses both image data and voice signals as judgment factors to detect candidate objects and determine one of them as the new interactive object, so as to avoid mistakenly identifying as new interactive objects candidates who emit meaningless voice signals within the effective collection range of the audio and image collection modules (such as babies) or users who have no intention of interacting.
- Moreover, combining image recognition and sound source localization achieves precise localization of the direction and position of candidate objects and improves the accuracy of searching for and determining new interactive objects.
- The robot or smart device automatically switches to a new interactive object to continue the interaction while awake, which improves the efficiency of switching between the robot or smart device and multiple interactive objects and shortens the time it takes to turn to the next interactive object.
- As shown in FIG. 6, the use scenario of the emotional companion robot includes Robot, User1, User2, and User3. User1, User2, and User3 in the figure are not specific persons, but are only used to distinguish different users.
- User1 comes to Robot and wakes it up with the wake-up word; Robot then turns to User1 and interacts with it. During the interaction, Robot must determine in real time whether User1 is still interacting with it. When Robot judges through sound source localization and facial feature recognition that User1 has stopped interacting, Robot automatically turns to User2, who is speaking. The same strategy applies when there are more than two users.
- the process of human-computer interaction when the robot faces multiple people is shown in Figure 7 and includes the following steps:
- Step 0: initial state: one Robot (in the dormant state) and two or more users who can interact with the Robot.
- Step 1: User1 approaches Robot and wakes it up; Robot is awakened from the dormant state and switches to the awakened state; go to Step 2.
- Step 2: Robot turns to User1 and interacts with User1; go to Step 3.
- Step 3: during the interaction between Robot and User1, Robot judges through sound source localization and facial feature recognition whether User1 is still interacting with it. The judgment results are divided into the following four types:
- An embodiment of the present invention, an interactive communication realization device, as shown in FIG. 8, includes:
- the image collection module 10 is used to collect face images;
- the audio collection module 20 is used to collect voice signals;
- the detection module 30 is used to detect whether the current interactive object stops interacting;
- the processing module 40 is configured to determine a candidate object participating in the interaction as the new interactive object by collecting image data and voice signals when the current interactive object stops interacting in the awakened state.
- this embodiment is a device embodiment corresponding to the foregoing method embodiment, and for specific effects, refer to the foregoing method embodiment, which will not be repeated here.
- the detection module 30 is also used for judging whether a wake-up signal is received when it is in a dormant state
- the processing module 40 is further configured to switch from the dormant state to the awakened state if a wake-up signal is received, and determine that the target object that triggers the awakening of itself is the current interactive object.
- this embodiment is a device embodiment corresponding to the foregoing method embodiment, and for specific effects, refer to the foregoing method embodiment, which will not be repeated here.
- the execution module is used to respond to the required service type of the current interactive object while continuing to detect when the current interactive object does not stop interacting in the awakened state;
- the processing module 40 is also configured to enter the dormant state when the duration for which there is no interactive object reaches the first preset duration in the awake state.
- this embodiment is a device embodiment corresponding to the foregoing method embodiment, and for specific effects, refer to the foregoing method embodiment, which will not be repeated here.
- the processing module 40 includes:
- the searching unit searches for candidate objects participating in the interaction through image recognition and/or sound source localization when the duration for which the current interactive object has stopped interacting reaches the second preset duration;
- the object switching unit is used to determine the candidate object as the new interactive object if there is one candidate object, and to determine one candidate object as the new interactive object according to the image recognition result and/or the sound source localization result if there are at least two candidate objects.
- this embodiment is a device embodiment corresponding to the above method embodiment.
- A smart device includes a processor and a memory, where the memory is used to store a computer program and the processor is used to execute the computer program stored in the memory to implement the interactive communication implementation method in the above method embodiment.
- the smart device may be a desktop computer, a notebook, a palmtop computer, a tablet computer, a mobile phone, a human-computer interaction screen and other devices.
- the smart device may include, but is not limited to, a processor and a memory.
- Smart devices may also include input/output interfaces, display devices, network access devices, communication buses, communication interfaces, and so on.
- The smart device may also include an input/output interface, a communication interface, and a communication bus, where the processor, the memory, the input/output interface, and the communication interface communicate with each other through the communication bus.
- the memory stores a computer program, and the processor is used to execute the computer program stored on the memory to implement the interactive communication implementation method in the foregoing method embodiment.
- The processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
- the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
- the memory may be an internal storage unit of the smart device, such as a hard disk or memory of the smart device.
- the memory may also be an external storage device of the smart device, for example: a plug-in hard disk equipped on the smart device, a smart media card (SMC), a secure digital (SD) card, Flash Card, etc.
- the memory may also include both an internal storage unit of the smart device and an external storage device.
- the memory is used to store the computer program and other programs and data required by the smart device.
- the memory can also be used to temporarily store data that has been output or will be output.
- the communication bus is a circuit that connects the described elements and carries transmission between these elements.
- the processor receives commands from other elements through the communication bus, decrypts the received commands, and performs calculations or data processing according to the decrypted commands.
- the memory may include program modules, such as a kernel (kernel), middleware (middleware), application programming interface (Application Programming Interface, API), and applications.
- the program module can be composed of software, firmware, hardware, or a combination of at least two of them.
- the input/output interface forwards commands or data entered by the user through an input device (such as a sensor, a keyboard, or a touch screen).
- the communication interface connects the smart device with other network devices, user equipment, and the network.
- the communication interface may connect to the network by wire or wirelessly in order to reach other external network equipment or user equipment.
- the wireless communication may include at least one of the following: wireless fidelity (WiFi), Bluetooth (BT), near-field communication (NFC), the Global Positioning System (GPS), cellular communication, and so on.
- wired communication may include at least one of the following: universal serial bus (USB), high-definition multimedia interface (HDMI), the RS-232 serial interface standard, and so on.
- the network can be a telecommunication network or a communication network; the communication network can be a computer network, the Internet, the Internet of Things, or a telephone network.
- smart devices can connect to the network through the communication interface, and the protocol used by the smart device to communicate with other network devices can be supported by at least one of the application, the application programming interface (API), the middleware, the kernel, and the communication interface.
- An embodiment of the present invention is a storage medium in which at least one instruction is stored, and the instruction is loaded and executed by a processor to implement the operations performed by the corresponding embodiment of the foregoing interactive communication implementation method.
- the computer-readable storage medium may be read-only memory (ROM), random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
- the disclosed device/smart device and method may be implemented in other ways.
- the device/smart device embodiments described above are only illustrative.
- the division of the modules or units is only a logical function division; in actual implementation there may be other division methods.
- multiple units or components can be combined or integrated into another system, or some features can be omitted or not implemented.
- the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
- the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
- the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
- the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
- if the integrated module/unit is implemented in the form of a software functional unit and is sold or used as an independent product, it can be stored in a computer-readable storage medium.
- all or part of the processes in the above-mentioned embodiment methods of the present invention may also be implemented by a computer program instructing the relevant hardware.
- the computer program can be stored in a computer-readable storage medium. When the program is executed by the processor, it can implement the steps of the foregoing method embodiments.
- the computer program includes computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form.
- the computer-readable storage medium may include any entity or device capable of carrying the computer program code, such as a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, or a software distribution medium.
- the content contained in the computer-readable storage medium can be appropriately increased or decreased in accordance with the requirements of legislation and patent practice in the jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunication signals.
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- General Health & Medical Sciences (AREA)
- Robotics (AREA)
- Mechanical Engineering (AREA)
- Computational Linguistics (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
Description
Claims (10)
- 1. A method for implementing interactive communication, characterized in that it comprises the steps of: detecting whether the current interactive object has stopped interacting; and, when the current interactive object stops interacting in the awake state, determining one candidate object participating in the interaction as the new interactive object based on collected image data and voice signals.
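The handover decision in claim 1 can be sketched as a single step. This is an illustrative Python sketch, not code from the patent; the function name, the boolean flags, and the list-based candidate representation are all assumptions:

```python
# Illustrative sketch of claim 1's handover decision. All names
# (handover, candidates, etc.) are invented for illustration and are
# not taken from the patent.

def handover(current_stopped, awake, candidates):
    """Return the new interactive object for one detection cycle, or None.

    `candidates` stands in for the objects found to be participating in
    the interaction via the collected image data and voice signals.
    """
    if not (awake and current_stopped):
        # Not in the awake state, or the current object is still
        # interacting: no handover takes place.
        return None
    if not candidates:
        # Nobody else is participating in the interaction.
        return None
    # Promote one candidate to be the new interactive object.
    return candidates[0]
```

The point of the sketch is that a handover is gated on two conditions, the awake state and the current object having stopped, before any candidate is considered.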
- 2. The method for implementing interactive communication according to claim 1, characterized in that it further comprises the step of: when the current interactive object has not stopped interacting in the awake state, continuing the detection while responding to the required service type of the current interactive object.
- 3. The method for implementing interactive communication according to claim 1, characterized in that it further comprises the step of: entering the dormant state when, in the awake state, the duration for which no interactive object is present reaches a first preset duration.
- 4. The method for implementing interactive communication according to claim 1, characterized in that it further comprises the steps of: judging, while in the dormant state, whether a wake-up signal is received; and, if a wake-up signal is received, switching from the dormant state to the awake state and determining the target object that triggered the wake-up as the current interactive object.
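Claims 3 and 4 together describe a two-state machine: a wake-up signal moves the device from dormant to awake and fixes the current interactive object, and an idle period reaching the "first preset duration" moves it back to dormant. A minimal sketch, assuming an explicit clock value is passed in; the class and method names are invented for illustration:

```python
# Illustrative state machine for the dormant/awake behaviour in claims 3
# and 4. The class, method names, and explicit clock argument are
# assumptions for the sketch, not details from the patent.

class InteractionStateMachine:
    SLEEP, AWAKE = "sleep", "awake"

    def __init__(self, idle_timeout):
        self.state = self.SLEEP
        self.idle_timeout = idle_timeout  # the "first preset duration"
        self.current_object = None
        self._idle_since = None

    def on_wake_signal(self, target_object):
        # Claim 4: on a wake-up signal while dormant, switch to the awake
        # state; the object that triggered the wake-up becomes the
        # current interactive object.
        if self.state == self.SLEEP:
            self.state = self.AWAKE
            self.current_object = target_object
            self._idle_since = None

    def tick(self, has_interactive_object, now):
        # Claim 3: in the awake state, enter the dormant state once no
        # interactive object has been present for the first preset duration.
        if self.state != self.AWAKE:
            return
        if has_interactive_object:
            self._idle_since = None
        elif self._idle_since is None:
            self._idle_since = now
        elif now - self._idle_since >= self.idle_timeout:
            self.state = self.SLEEP
            self.current_object = None
```

Passing the clock in as `now` rather than reading it inside the class keeps the sketch deterministic and easy to exercise.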
- 5. The method for implementing interactive communication according to any one of claims 1 to 4, characterized in that determining one candidate object participating in the interaction as the new interactive object based on collected image data and voice signals when the current interactive object stops interacting in the awake state comprises the steps of: when the duration for which the current interactive object has stopped interacting reaches a second preset duration, searching for candidate objects participating in the interaction through image recognition and/or sound source localization; if there is one candidate object, determining that candidate object as the new interactive object; and, if there are at least two candidate objects, determining one candidate object as the new interactive object according to the image recognition result and/or the sound source localization result.
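The candidate-selection step in claim 5 can be sketched as follows. The claim does not specify how the image recognition and sound source localization results are combined when several candidates exist, so the scoring function passed in below is an assumption, as are all the names:

```python
# Illustrative sketch of the candidate-selection step in claim 5. The
# combined scoring of image recognition / sound source localization
# results is an assumption; the claim only says the results are used.

def pick_new_object(stop_duration, second_preset, candidates, score):
    """Return the new interactive object, or None if none is chosen yet."""
    if stop_duration < second_preset:
        # The current object has not stopped for long enough; keep waiting.
        return None
    if not candidates:
        return None
    if len(candidates) == 1:
        # A single candidate is promoted directly.
        return candidates[0]
    # At least two candidates: rank them by the combined image
    # recognition / sound source localization score.
    return max(candidates, key=score)
```

For example, one might score each candidate as the sum of a face-recognition confidence and a sound-direction match, both hypothetical quantities here.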
- 6. A device for implementing interactive communication, characterized in that it comprises: an image collection module, used to collect face images; an audio collection module, used to collect voice signals; a detection module, used to detect whether the current interactive object has stopped interacting; and a processing module, used to determine one candidate object participating in the interaction as the new interactive object based on collected image data and voice signals when the current interactive object stops interacting in the awake state.
- 7. The device for implementing interactive communication according to claim 6, characterized in that it further comprises: an execution module, used to continue the detection while responding to the required service type of the current interactive object when the current interactive object has not stopped interacting in the awake state; the processing module is further configured to enter the dormant state when, in the awake state, the duration for which no interactive object is present reaches the first preset duration.
- 8. The device for implementing interactive communication according to claim 6, characterized in that: the detection module is further configured to judge, while the device is in the dormant state, whether a wake-up signal is received; and the processing module is further configured to switch from the dormant state to the awake state if a wake-up signal is received, and to determine the target object that triggered the wake-up as the current interactive object.
- 9. The device for implementing interactive communication according to any one of claims 6 to 8, characterized in that the processing module comprises: a searching unit, used to search for candidate objects participating in the interaction through image recognition and/or sound source localization when the duration for which the current interactive object has stopped interacting reaches the second preset duration; and an object switching unit, used to determine a candidate object as the new interactive object if there is one candidate object, and to determine one candidate object as the new interactive object according to the image recognition result and/or the sound source localization result if there are at least two candidate objects.
- 10. A storage medium, characterized in that at least one instruction is stored in the storage medium, and the instruction is loaded and executed by a processor to implement the operations performed by the method for implementing interactive communication according to any one of claims 1 to 5.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2020/086222 WO2021212388A1 (en) | 2020-04-22 | 2020-04-22 | Interactive communication implementation method and device, and storage medium |
CN202080004243.6A CN112739507B (en) | 2020-04-22 | 2020-04-22 | Interactive communication realization method, device and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2020/086222 WO2021212388A1 (en) | 2020-04-22 | 2020-04-22 | Interactive communication implementation method and device, and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021212388A1 true WO2021212388A1 (en) | 2021-10-28 |
Family
ID=75609496
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/086222 WO2021212388A1 (en) | 2020-04-22 | 2020-04-22 | Interactive communication implementation method and device, and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112739507B (en) |
WO (1) | WO2021212388A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114193477A (en) * | 2021-12-24 | 2022-03-18 | 上海擎朗智能科技有限公司 | Position leading method, device, robot and storage medium |
CN116363566A (en) * | 2023-06-02 | 2023-06-30 | 华东交通大学 | Target interaction relation recognition method based on relation knowledge graph |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116978372A (en) * | 2022-04-22 | 2023-10-31 | 华为技术有限公司 | Voice interaction method, electronic equipment and storage medium |
CN114715175A (en) * | 2022-05-06 | 2022-07-08 | Oppo广东移动通信有限公司 | Target object determination method and device, electronic equipment and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105881548A (en) * | 2016-04-29 | 2016-08-24 | 北京快乐智慧科技有限责任公司 | Method for waking up intelligent interactive robot and intelligent interactive robot |
CN108733420A (en) * | 2018-03-21 | 2018-11-02 | 北京猎户星空科技有限公司 | Awakening method, device, smart machine and the storage medium of smart machine |
CN109683610A (en) * | 2018-12-14 | 2019-04-26 | 北京猎户星空科技有限公司 | Smart machine control method, device and storage medium |
CN110111789A (en) * | 2019-05-07 | 2019-08-09 | 百度国际科技(深圳)有限公司 | Voice interactive method, calculates equipment and computer-readable medium at device |
US20190371342A1 (en) * | 2018-06-05 | 2019-12-05 | Samsung Electronics Co., Ltd. | Methods and systems for passive wakeup of a user interaction device |
CN110730115A (en) * | 2019-09-11 | 2020-01-24 | 北京小米移动软件有限公司 | Voice control method and device, terminal and storage medium |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106354255A (en) * | 2016-08-26 | 2017-01-25 | 北京光年无限科技有限公司 | Man-machine interactive method and equipment facing robot product |
CN110290096B (en) * | 2018-03-19 | 2022-06-24 | 阿里巴巴集团控股有限公司 | Man-machine interaction method and terminal |
CN109166575A (en) * | 2018-07-27 | 2019-01-08 | 百度在线网络技术(北京)有限公司 | Exchange method, device, smart machine and the storage medium of smart machine |
CN109461448A (en) * | 2018-12-11 | 2019-03-12 | 百度在线网络技术(北京)有限公司 | Voice interactive method and device |
CN110689889B (en) * | 2019-10-11 | 2021-08-17 | 深圳追一科技有限公司 | Man-machine interaction method and device, electronic equipment and storage medium |
-
2020
- 2020-04-22 WO PCT/CN2020/086222 patent/WO2021212388A1/en active Application Filing
- 2020-04-22 CN CN202080004243.6A patent/CN112739507B/en active Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105881548A (en) * | 2016-04-29 | 2016-08-24 | 北京快乐智慧科技有限责任公司 | Method for waking up intelligent interactive robot and intelligent interactive robot |
CN108733420A (en) * | 2018-03-21 | 2018-11-02 | 北京猎户星空科技有限公司 | Awakening method, device, smart machine and the storage medium of smart machine |
US20190371342A1 (en) * | 2018-06-05 | 2019-12-05 | Samsung Electronics Co., Ltd. | Methods and systems for passive wakeup of a user interaction device |
CN109683610A (en) * | 2018-12-14 | 2019-04-26 | 北京猎户星空科技有限公司 | Smart machine control method, device and storage medium |
CN110111789A (en) * | 2019-05-07 | 2019-08-09 | 百度国际科技(深圳)有限公司 | Voice interactive method, calculates equipment and computer-readable medium at device |
CN110730115A (en) * | 2019-09-11 | 2020-01-24 | 北京小米移动软件有限公司 | Voice control method and device, terminal and storage medium |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114193477A (en) * | 2021-12-24 | 2022-03-18 | 上海擎朗智能科技有限公司 | Position leading method, device, robot and storage medium |
CN116363566A (en) * | 2023-06-02 | 2023-06-30 | 华东交通大学 | Target interaction relation recognition method based on relation knowledge graph |
CN116363566B (en) * | 2023-06-02 | 2023-10-17 | 华东交通大学 | Target interaction relation recognition method based on relation knowledge graph |
Also Published As
Publication number | Publication date |
---|---|
CN112739507B (en) | 2023-05-09 |
CN112739507A (en) | 2021-04-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021212388A1 (en) | Interactive communication implementation method and device, and storage medium | |
US11620984B2 (en) | Human-computer interaction method, and electronic device and storage medium thereof | |
CN108735209B (en) | Wake-up word binding method, intelligent device and storage medium | |
KR101726945B1 (en) | Reducing the need for manual start/end-pointing and trigger phrases | |
WO2021036714A1 (en) | Voice-controlled split-screen display method and electronic device | |
CN110263131B (en) | Reply information generation method, device and storage medium | |
CN112860169B (en) | Interaction method and device, computer readable medium and electronic equipment | |
CN108766438A (en) | Man-machine interaction method, device, storage medium and intelligent terminal | |
EP3933570A1 (en) | Method and apparatus for controlling a voice assistant, and computer-readable storage medium | |
EP4184506A1 (en) | Audio processing | |
CN111063354B (en) | Man-machine interaction method and device | |
CN109032554B (en) | Audio processing method and electronic equipment | |
WO2022042274A1 (en) | Voice interaction method and electronic device | |
CN112634895A (en) | Voice interaction wake-up-free method and device | |
US20230048330A1 (en) | In-Vehicle Speech Interaction Method and Device | |
WO2024103926A1 (en) | Voice control methods and apparatuses, storage medium, and electronic device | |
CN112233676A (en) | Intelligent device awakening method and device, electronic device and storage medium | |
WO2022227507A1 (en) | Wake-up degree recognition model training method and speech wake-up degree acquisition method | |
CN106683668A (en) | Method of awakening control of intelligent device and system | |
WO2023006033A1 (en) | Speech interaction method, electronic device, and medium | |
US11929081B2 (en) | Electronic apparatus and controlling method thereof | |
WO2024103893A1 (en) | Method for waking up application program, and electronic device | |
CN109119075A (en) | Speech recognition scene awakening method and device | |
CN110989963B (en) | Wake-up word recommendation method and device and storage medium | |
WO2024055831A1 (en) | Voice interaction method and apparatus, and terminal |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20931847 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 20931847 Country of ref document: EP Kind code of ref document: A1 |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 04.05.2023) |
|