WO2021212388A1 - Interactive communication implementation method and device, and storage medium - Google Patents

Interactive communication implementation method and device, and storage medium

Info

Publication number
WO2021212388A1
WO2021212388A1 (PCT/CN2020/086222)
Authority
WO
WIPO (PCT)
Prior art keywords
interactive
interaction
interactive object
candidate
state
Prior art date
Application number
PCT/CN2020/086222
Other languages
French (fr)
Chinese (zh)
Inventor
马海滨
Original Assignee
南京阿凡达机器人科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 南京阿凡达机器人科技有限公司 filed Critical 南京阿凡达机器人科技有限公司
Priority to PCT/CN2020/086222 priority Critical patent/WO2021212388A1/en
Priority to CN202080004243.6A priority patent/CN112739507B/en
Publication of WO2021212388A1 publication Critical patent/WO2021212388A1/en

Classifications

    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B25 - HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J - MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J11/00 - Manipulators not otherwise provided for
    • B25J11/0005 - Manipulators having means for high-level communication with users, e.g. speech generator, face recognition means
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 - Execution procedure of a spoken command
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 - Reducing energy consumption in communication networks
    • Y02D30/70 - Reducing energy consumption in communication networks in wireless communication networks

Definitions

  • The invention relates to the technical field of human-computer interaction, and in particular to a method, device and storage medium for realizing interactive communication.
  • Trigger operations such as speaking a "wake-up word" or performing a touch input are currently the main methods of triggering a robot or smart device to begin human-computer interaction.
  • The problem with using such methods in a multi-person scenario is that, to switch to a new interactive object while the robot or smart device is already in the awakened state, each person participating in the interaction must perform the trigger operation, so every user must understand and master the trigger operations of different robots or smart devices.
  • Such an interaction process is not only mechanical but also disrupts the rhythm of switching among multiple participants; it cannot interact with multiple users in real time and intelligently, and therefore cannot communicate effectively in multi-user interaction scenarios.
  • The purpose of the present invention is to provide a method, device and storage medium for realizing interactive communication that switch interactive objects naturally, flexibly and intelligently in a multi-user interaction scenario, so as to realize timely, efficient and humanized interactive communication with multiple objects.
  • The present invention provides a method for realizing interactive communication, which includes the steps:
  • When the current interactive object stops interacting in the awakened state, a candidate object participating in the interaction is determined as the new interactive object by collecting image data and voice signals.
  • When the current interactive object has not stopped interacting, detection continues while the device responds to the service type required by the current interactive object.
  • When the duration for which there is no interactive object reaches a first preset duration, the device enters the dormant state.
  • If a wake-up signal is received, the device switches from the dormant state to the awakened state, and the target object that triggered the awakening is determined to be the current interactive object.
  • Determining a candidate object participating in the interaction as the new interactive object by collecting image data and voice signals includes the steps:
  • If there is one candidate object, that candidate object is the new interactive object; if there are at least two, one candidate object is determined as the new interactive object according to the image recognition result and/or the sound source localization result.
  • The present invention also provides an interactive communication realization device, including:
  • an image collection module, used to collect face images;
  • an audio collection module, used to collect voice signals;
  • a detection module, used to detect whether the current interactive object stops interacting;
  • a processing module, configured to determine a candidate object participating in the interaction as the new interactive object by collecting image data and voice signals when the current interactive object stops interacting in the awakened state;
  • an execution module, configured to respond to the service type required by the current interactive object while continuing detection when the current interactive object has not stopped interacting in the awakened state.
  • The processing module is also configured to enter the dormant state when the duration for which there is no interactive object in the awakened state reaches the first preset duration.
  • The detection module is also used to judge whether a wake-up signal is received when the device is in the dormant state.
  • The processing module is further configured to switch from the dormant state to the awakened state if a wake-up signal is received, and to determine the target object that triggered the awakening as the current interactive object.
  • The processing module includes:
  • a searching unit, which searches for candidate objects participating in the interaction through image recognition and/or sound source localization when the duration for which the current interactive object has stopped interacting reaches a second preset duration;
  • an object switching unit, configured to determine the candidate object as the new interactive object if there is one candidate object, and, if there are at least two candidate objects, to determine one candidate object as the new interactive object according to the image recognition result and/or the sound source localization result.
  • The present invention also provides a storage medium in which at least one instruction is stored, and the instruction is loaded and executed by a processor to implement the operations performed by the interactive communication implementation method.
  • With the above scheme, interactive objects can be switched naturally, flexibly and intelligently in a multi-user interaction scenario, achieving the goal of timely, efficient and humanized interactive communication with multiple objects.
  • FIG. 1 is a flowchart of an embodiment of a method for implementing interactive communication of the present invention
  • FIG. 2 is a flowchart of another embodiment of a method for implementing interactive communication of the present invention.
  • FIG. 3 is a flowchart of another embodiment of a method for implementing interactive communication of the present invention.
  • FIG. 4 is a flowchart of another embodiment of a method for implementing interactive communication of the present invention.
  • FIG. 5 is a flowchart of another embodiment of a method for implementing interactive communication of the present invention.
  • FIG. 6 is a schematic diagram of the interaction of the emotional companion robot Robot of the present invention in a multi-user interaction scenario.
  • FIG. 7 is a schematic diagram of the human-computer interaction process when the robot of the present invention faces multiple people.
  • FIG. 8 is a schematic structural diagram of an embodiment of an interactive communication realization device of the present invention.
  • The terminal for implementing object switching includes, but is not limited to, robots such as personal virtual assistants, housework robots (such as sweeping robots), children's educational robots, elderly care robots, emotional companion robots, airport service robots and shopping service robots, as well as smart devices such as smart phones, smart speakers and smart voice elevators, which are typically used in public places such as shopping malls, subway stations and railway stations.
  • a method for implementing interactive communication includes:
  • The robot or smart device can collect image data (including but not limited to face images and gesture images) within its field of view through image collection modules such as cameras or camera arrays, and can obtain the input voice signal within the effective acquisition range through audio collection modules such as microphones or microphone arrays.
  • the types of interaction between the robot or smart device and the current interactive object include, but are not limited to, voice dialogue interaction and gesture dialogue interaction.
  • Based on the image data and/or voice signal, the robot or smart device can judge whether the current interactive object is inputting a voice signal; it can also judge, based on the image data, whether the current interactive object is inputting a gesture.
  • Since the processor of the robot or smart device executes the tasks it receives, it can also inspect its own processes to determine whether there is a voice interaction task obtained by voice recognition or a gesture interaction task obtained by image recognition, and then judge from these results whether the current interactive object has stopped interacting.
  • the microphone array in the embodiment of the present invention may be an array formed by a group of acoustic sensors located at different positions in space and regularly arranged according to a certain shape, and is a device for spatially sampling voice signals propagating in space.
  • the voice signal processing method of the embodiment of the present invention does not specifically limit the specific form of the microphone array used.
  • the camera array in the embodiment of the present invention may be an array in which a group of image sensors located at different positions in space are regularly arranged according to a certain shape to collect image data from multiple viewing angles.
  • The microphone array or camera array may be a horizontal array, a T-shaped array, an L-shaped array, a polyhedral array, a spherical array, and so on.
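  • The patent does not specify the localization algorithm itself; as one illustrative sketch (all function names, the sample delay, and the microphone spacing are assumptions, not taken from the patent), a two-microphone horizontal array can estimate a sound source's direction from the time difference of arrival (TDOA) between the microphones:

```python
import numpy as np

def estimate_tdoa(sig_a, sig_b, fs):
    """Estimate the delay (in seconds) of sig_b relative to sig_a by
    locating the peak of their cross-correlation."""
    corr = np.correlate(sig_b, sig_a, mode="full")
    lag = int(np.argmax(corr)) - (len(sig_a) - 1)  # in samples; > 0 means sig_b lags
    return lag / fs

def doa_from_tdoa(tdoa, mic_distance, speed_of_sound=343.0):
    """Convert a TDOA into an angle of arrival (degrees) for a
    two-microphone horizontal array with the given spacing in metres."""
    # clip to the physically possible range before taking arcsin
    s = np.clip(tdoa * speed_of_sound / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arcsin(s)))

# synthetic check: the same noise burst reaches mic B three samples later
fs = 16000
burst = np.random.default_rng(0).standard_normal(512)
mic_a = np.concatenate([burst, np.zeros(16)])
mic_b = np.concatenate([np.zeros(3), burst, np.zeros(13)])
tdoa = estimate_tdoa(mic_a, mic_b, fs)
angle = doa_from_tdoa(tdoa, mic_distance=0.1)
```

A production system would use a multi-element array and a robust estimator (e.g. GCC-PHAT) rather than raw cross-correlation, but the peak-lag principle is the same.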
  • a candidate object participating in the interaction is determined as a new interactive object by collecting image data and voice signals.
  • Based on image data and/or voice signals, the robot or smart device can determine whether the currently tracked interactive object (which may be a person, another smart device or another robot) has stopped interacting with it in the awakened state. If the current interactive object stops interacting while the robot or smart device is awake, the robot or smart device collects face images and voice signals and replaces it with one of the candidate objects participating in the interaction (the candidate objects likewise include other people, other smart devices or other robots) as the new current interactive object.
  • For example, if robot A is the detection subject and user A is the current interactive object, and robot A detects through the collected image data and/or voice signals that user B is participating in the interaction, then user B is determined as the new interactive object according to that image data and voice signal.
  • a method for implementing interactive communication includes:
  • a candidate object participating in the interaction is determined as a new interactive object by collecting image data and voice signals;
  • When the robot or smart device is in the awakened state and detects that the current interactive object has not stopped interacting, it continues to detect in real time whether the current interactive object has stopped interacting; at the same time, during detection it performs voice recognition (or gesture recognition) on the voice signal (or gesture signal) obtained from the current interactive object to obtain the corresponding required service type, and performs the corresponding operation according to that service type to respond to the current interactive object. Performing voice recognition (or gesture recognition) on a voice signal (or gesture signal) to obtain the required service type is existing technology and will not be repeated here.
  • a robot or smart device is used as the detection subject, and user A is the current interaction object.
  • If the robot or smart device performs voice recognition on the voice signal input by user A and obtains the instruction "Play nursery rhymes", it queries its music library and plays nursery rhymes.
  • a method for implementing interactive communication includes:
  • S310: judge whether a wake-up signal is received while in the dormant state.
  • The wake-up mechanism includes, but is not limited to, triggering the wake-up signal by voice input of a wake-up word; a mechanical button or touch button can also be preset on the robot or smart device to generate the wake-up signal when pressed or touched, or the wake-up signal can be generated after receiving an input gesture that matches a preset wake-up gesture. Other ways of generating the wake-up signal also fall within the protection scope of the present invention.
  • S320: if the wake-up signal is received, switch from the dormant state to the awakened state, and determine the target object that triggered the awakening as the current interactive object.
  • When the robot or smart device receives the wake-up signal in the dormant state, it automatically switches from the dormant state to the awakened state, and the target object that triggered the wake-up is determined as the initial current interactive object in the current awakened state; the target object can be a person with normal language ability, or a person who uses TTS (Text To Speech) equipment to send out voice signals.
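  • The several wake-up mechanisms above (wake word, button or touch press, preset gesture) can be funnelled into a single check; the sketch below is illustrative only, and the wake word and gesture names are hypothetical, not taken from the patent:

```python
WAKE_WORD = "hello robot"   # hypothetical wake word
WAKE_GESTURE = "wave"       # hypothetical preset wake gesture

def received_wake_signal(voice_text=None, button_pressed=False, gesture=None):
    """Return True if any configured wake-up mechanism fired: a spoken
    wake word, a mechanical/touch button press, or an input gesture
    matching the preset wake gesture."""
    if voice_text is not None and WAKE_WORD in voice_text.lower():
        return True
    if button_pressed:
        return True
    if gesture is not None and gesture == WAKE_GESTURE:
        return True
    return False
```

Whichever mechanism fires, the device then switches to the awakened state and treats the triggering object as the initial current interactive object.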
  • a candidate object participating in the interaction is determined as a new interactive object by collecting image data and voice signals;
  • a method for implementing interactive communication includes:
  • S410: detect whether the current interactive object stops interacting.
  • a candidate object participating in the interaction is determined as a new interactive object by collecting image data and voice signals;
  • S440: enter the dormant state when the duration for which there is no interactive object in the awakened state reaches the first preset duration.
  • When the robot or smart device is in the awakened state, if the current interactive object stops interacting with it and no new interactive object is detected for the first preset duration, this indicates that for a period lasting the first preset duration no object has interacted with the robot or smart device.
  • Likewise, when there is no interactive object within the effective acquisition range of the audio and image collection modules of the robot or smart device and this state lasts for the first preset duration, it also indicates that no object has interacted with the robot or smart device during that period.
  • In this case, the robot or smart device automatically enters the dormant state to prevent itself from remaining awake for a long time, saving power consumption and increasing standby time.
  • S450: judge whether a wake-up signal is received while in the dormant state.
  • As long as the robot or smart device has switched from the dormant state to the awakened state, there is no need during the subsequent awakened period to repeatedly voice-input wake-up words, as in the prior art, in order to switch to a new interactive object midway, nor to require every user to understand and master the trigger operations of different robots or smart devices; new interactive objects are switched in real time and intelligently in multi-user interaction scenarios based only on the collected image data and voice signals. This is not only more in line with daily communication patterns but also helps to achieve effective communication and increase the personification of human-machine communication, so that robots or smart devices can communicate effectively with multiple objects.
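  • The first-preset-duration dormancy rule of S440 amounts to a monotonic idle timer; a minimal sketch follows (the 30-second duration and all names are illustrative, not specified by the patent):

```python
import time

class IdleTracker:
    """Track how long no interactive object has been present and decide
    when to enter the dormant state (the first preset duration)."""
    def __init__(self, first_preset_duration_s, clock=time.monotonic):
        self.timeout = first_preset_duration_s
        self.clock = clock
        self.last_interaction = clock()

    def mark_interaction(self):
        """Call whenever any interactive object is detected."""
        self.last_interaction = self.clock()

    def should_sleep(self):
        """True once the no-object duration reaches the preset duration."""
        return self.clock() - self.last_interaction >= self.timeout

# deterministic demo using a fake clock instead of real elapsed time
now = [0.0]
tracker = IdleTracker(first_preset_duration_s=30.0, clock=lambda: now[0])
tracker.mark_interaction()              # object seen at t=0
now[0] = 29.0
awake_at_29 = tracker.should_sleep()    # still within the preset duration
now[0] = 30.0
asleep_at_30 = tracker.should_sleep()   # duration reached: go dormant
```

Injecting the clock keeps the timeout logic testable without real waiting; on a device, `time.monotonic` is used directly so clock adjustments cannot skew the idle measurement.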
  • a method for implementing interactive communication includes:
  • S510: detect whether the current interactive object stops interacting.
  • Among them, the second preset duration is less than the first preset duration.
  • When the robot or smart device meets the trigger condition for searching for and switching to a new interactive object, each search determines only one candidate object as the new interactive object found for that switch.
  • The robot or smart device can be responsible for sound collection through the audio collection module to realize its auditory function.
  • After the voice signal is collected, it is processed by framing and windowing, and the number of sound sources is determined through audio processing of the voice signal; the number of candidate objects is then determined according to the number of sound sources. Sound source localization and recognition is prior art and will not be repeated here. If the number of candidate objects determined in this way is one, that candidate object is directly determined as the new interactive object. If the number of candidate objects is at least two, the candidate user corresponding to the earliest acquired voice signal is determined, according to the time order of the acquired voice signals, as the new interactive object found for this switch.
  • For example, the robot or smart device collects voice signals in real time through the audio collection module, obtains the number of sound sources through sound source localization and recognition, and determines the candidate user corresponding to the earliest voice signal as the new interactive object found for this switch.
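  • The framing-and-windowing preprocessing mentioned above can be sketched as follows; the 25 ms frame length and 10 ms hop at 16 kHz are typical speech-processing values, not values specified by the patent:

```python
import numpy as np

def frame_and_window(signal, frame_len=400, hop=160):
    """Split a 1-D voice signal into overlapping frames and apply a
    Hamming window to each frame (400 samples = 25 ms, 160-sample hop
    = 10 ms at 16 kHz); trailing samples that do not fill a complete
    frame are dropped."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    return frames  # shape: (n_frames, frame_len)

fs = 16000
t = np.arange(fs) / fs                # 1 second of audio
sig = np.sin(2 * np.pi * 440 * t)     # 440 Hz test tone
frames = frame_and_window(sig)
```

Downstream stages (per-frame energy, spectra, source counting) then operate on these windowed frames rather than on the raw stream.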
  • The robot or smart device can also be responsible for collecting image data through the image collection module to realize its vision function.
  • The number of candidate objects can be determined from the image recognition result. If the number of candidate objects is one, that candidate object is directly determined as the new interactive object. If the number of candidate objects is at least two, the candidate user who participated in the interaction earliest, according to the time order in which each candidate object is observed to participate by image recognition, is determined as the new interactive object found for this switch.
  • For example, the robot captures image data in real time through the image collection module and performs face recognition on the acquired image data.
  • The candidate user A who performed the mouth-opening action first is determined as the new interactive object found for this switch.
  • The robot or smart device can also use the image collection module for image data collection and the audio collection module for sound collection at the same time.
  • In this case, image recognition technology and sound source localization technology can be combined to determine the number of candidate objects. If the number of candidate objects is one, that candidate object is directly determined as the new interactive object. If the number of candidate objects is at least two, the mouth-opening actions and voice signals of the candidate objects are comprehensively analyzed according to the image recognition result and/or the sound source localization result, the candidate user who participated in the interaction earliest is found among the candidate objects participating in the interaction, and that candidate user is determined as the new interactive object found for this switch.
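  • The switching rule above (a single candidate is taken directly; with several candidates, the earliest participant by voice and/or mouth-opening evidence wins) can be sketched as follows; the dictionary field names are illustrative, not taken from the patent:

```python
def pick_new_interactive_object(candidates):
    """candidates: list of dicts with a 'name' and at least one of the
    timestamps 'first_voice_t' / 'first_mouth_open_t' (seconds).
    Returns the sole candidate's name, the earliest participant's name
    when there are at least two candidates, or None if there are none."""
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0]["name"]

    def earliest_evidence(c):
        # earliest moment either modality observed the candidate participating
        times = [c[k] for k in ("first_voice_t", "first_mouth_open_t") if k in c]
        return min(times)

    return min(candidates, key=earliest_evidence)["name"]

chosen = pick_new_interactive_object([
    {"name": "UserB", "first_voice_t": 2.4, "first_mouth_open_t": 2.2},
    {"name": "UserC", "first_voice_t": 1.9},
])
```

Taking the minimum over both modalities reflects the "comprehensive analysis" in the text: whichever signal (sound or mouth movement) first shows a candidate participating determines that candidate's entry time.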
  • S560: enter the dormant state when the duration for which there is no interactive object in the awakened state reaches the first preset duration.
  • S570: judge whether a wake-up signal is received while in the dormant state.
  • The present invention preferably uses both image data and voice signals as judgment factors to detect candidate objects and determine one of them as the new interactive object, so as to avoid mistakenly identifying as new interactive objects candidates within the effective collection range of the audio and image collection modules who emit meaningless voice signals (such as babies) or users who have no intention to interact.
  • Combining image recognition technology and sound source localization technology achieves precise positioning of the direction and position of candidate objects and improves the accuracy of searching for and determining new interactive objects.
  • The robot or smart device automatically switches to a new interactive object to continue the interaction while awake, which improves the efficiency of switching among multiple interactive objects and shortens the time it takes the robot or smart device to turn to the next interactive object.
  • As shown in FIG. 6, the use scenario of the emotional companion robot Robot includes Robot, User1, User2 and User3. The labels User1, User2 and User3 in the figure are not specific persons but are only used to distinguish different users.
  • User1 comes to Robot and wakes it up with the wake-up word; Robot then turns to User1 and interacts with User1. During the interaction, Robot must determine in real time whether User1 is still interacting with it. When Robot judges, through sound source localization and facial feature recognition, that User1 has stopped interacting with it, Robot automatically turns to User2, who is talking. The same strategy applies when there are more than two users.
  • the process of human-computer interaction when the robot faces multiple people is shown in Figure 7 and includes the following steps:
  • Step 0: Initial state; one Robot (in the dormant state) and two or more users who can interact with it.
  • Step 1: User1 approaches Robot and wakes it up. Robot is awakened from the dormant state and switches to the awakened state; go to step 2.
  • Step 2: Robot turns to User1 and interacts with User1; go to step 3.
  • Step 3: While interacting with User1, Robot judges through sound source localization and facial feature recognition whether User1 is still interacting with it. The judgment results are divided into the following four types:
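  • Steps 0 to 3 above amount to a small two-state machine; the sketch below is a simplified simulation (method and event names are hypothetical, and it omits the first and second preset durations that would gate sleeping and switching in the actual method):

```python
DORMANT, AWAKE = "dormant", "awake"

class Robot:
    """Minimal simulation of the FIG. 7 flow: wake on a wake signal,
    track one current interactive object, switch to whoever is still
    interacting, and go dormant when nobody interacts."""
    def __init__(self):
        self.state = DORMANT
        self.current = None

    def on_wake(self, user):
        # Step 1: waking switches dormant -> awake; the waker becomes
        # the initial current interactive object
        if self.state == DORMANT:
            self.state = AWAKE
            self.current = user

    def on_observation(self, interacting_users):
        # Step 3: upstream sound source localization + face recognition
        # report who is currently interacting; switch or sleep accordingly
        if self.state != AWAKE:
            return
        if self.current in interacting_users:
            return                              # keep the current object
        if interacting_users:
            self.current = interacting_users[0]  # switch to a new object
        else:
            self.state = DORMANT                 # nobody left: go dormant
            self.current = None

r = Robot()
r.on_wake("User1")              # Step 1
r.on_observation(["User1"])     # Steps 2-3: User1 still interacting
r.on_observation(["User2"])     # User1 stopped; User2 is talking
switched_to = r.current
r.on_observation([])            # no interactive object -> dormant
final_state = r.state
```

In the full method, the switch would only occur after the current object has been silent for the second preset duration, and dormancy only after the first preset duration with no object at all.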
  • An embodiment of the present invention, an interactive communication realization device, as shown in FIG. 8, includes:
  • the image collection module 10 is used to collect face images
  • the audio collection module 20 is used to collect voice signals
  • the detection module 30 is used to detect whether the current interactive object stops interacting
  • the processing module 40 is configured to determine a candidate object participating in the interaction as a new interactive object by collecting image data and voice signals when the current interactive object stops interacting in the awake state.
  • this embodiment is a device embodiment corresponding to the foregoing method embodiment, and for specific effects, refer to the foregoing method embodiment, which will not be repeated here.
  • the detection module 30 is also used for judging whether a wake-up signal is received when it is in a dormant state
  • the processing module 40 is further configured to switch from the dormant state to the awakened state if a wake-up signal is received, and determine that the target object that triggers the awakening of itself is the current interactive object.
  • this embodiment is a device embodiment corresponding to the foregoing method embodiment, and for specific effects, refer to the foregoing method embodiment, which will not be repeated here.
  • the execution module is used to respond to the required service type of the current interactive object while continuing to detect when the current interactive object does not stop interacting in the awakened state;
  • the processing module 40 is also configured to enter the dormant state when the duration for which there is no interactive object reaches the first preset duration in the awake state.
  • this embodiment is a device embodiment corresponding to the foregoing method embodiment, and for specific effects, refer to the foregoing method embodiment, which will not be repeated here.
  • the processing module 40 includes:
  • a searching unit, which searches for candidate objects participating in the interaction through image recognition and/or sound source localization when the duration for which the current interactive object has stopped interacting reaches the second preset duration;
  • an object switching unit, used to determine the candidate object as the new interactive object if there is one candidate object, and, if there are at least two candidate objects, to determine one candidate object as the new interactive object according to the image recognition result and/or the sound source localization result.
  • this embodiment is a device embodiment corresponding to the above method embodiment.
  • A smart device includes a processor and a memory, where the memory is used to store a computer program and the processor is used to execute the computer program stored in the memory to implement the interactive communication implementation method in the above method embodiment.
  • the smart device may be a desktop computer, a notebook, a palmtop computer, a tablet computer, a mobile phone, a human-computer interaction screen and other devices.
  • the smart device may include, but is not limited to, a processor and a memory.
  • Smart devices may also include input/output interfaces, display devices, network access devices, communication buses, communication interfaces, and so on.
  • The smart device may also include an input/output interface, where the processor, the memory, the input/output interface and the communication interface communicate with each other through the communication bus.
  • the memory stores a computer program, and the processor is used to execute the computer program stored on the memory to implement the interactive communication implementation method in the foregoing method embodiment.
  • The processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the memory may be an internal storage unit of the smart device, such as a hard disk or memory of the smart device.
  • The memory may also be an external storage device of the smart device, for example a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card or a flash card equipped on the smart device.
  • the memory may also include both an internal storage unit of the smart device and an external storage device.
  • the memory is used to store the computer program and other programs and data required by the smart device.
  • the memory can also be used to temporarily store data that has been output or will be output.
  • the communication bus is a circuit that connects the described elements and realizes transmission between these elements.
  • the processor receives commands from other elements through the communication bus, decrypts the received commands, and performs calculations or data processing according to the decrypted commands.
  • the memory may include program modules, such as a kernel (kernel), middleware (middleware), application programming interface (Application Programming Interface, API), and applications.
  • the program module can be composed of software, firmware or hardware, or at least two of them.
  • The input/output interface forwards commands or data entered by the user through input devices (such as a sensor, a keyboard or a touch screen).
  • the communication interface connects the smart device with other network devices, user equipment, and the network.
  • the communication interface may be wired or wirelessly connected to the network to connect to other external network equipment or user equipment.
  • The wireless communication may include at least one of the following: wireless fidelity (WiFi), Bluetooth (BT), near-field communication (NFC), the Global Positioning System (GPS), cellular communication, and so on.
  • The wired communication may include at least one of the following: universal serial bus (USB), high-definition multimedia interface (HDMI), the RS-232 serial standard, and so on.
  • the network can be a telecommunication network or a communication network.
  • the communication network can be a computer network, the Internet, the Internet of Things, or a telephone network. The smart device can be connected to the network through the communication interface, and the protocol used by the smart device to communicate with other network devices can be supported by at least one of the application, the application programming interface (API), the middleware, the kernel, and the communication interface.
  • An embodiment of the present invention is a storage medium in which at least one instruction is stored, and the instruction is loaded and executed by a processor to implement the operations performed by the corresponding embodiment of the foregoing interactive communication implementation method.
  • the computer-readable storage medium may be read-only memory (ROM), random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
  • the disclosed device/smart device and method may be implemented in other ways.
  • the device/smart device embodiments described above are only illustrative.
  • the division of the modules or units is only a logical function division.
  • in actual implementation there may be other division methods; for example, multiple units or components can be combined or integrated into another system, or some features can be omitted or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • if the integrated module/unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • when the present invention implements all or part of the processes in the methods of the above embodiments, this can also be accomplished by instructing the relevant hardware through a computer program.
  • the computer program can be stored in a computer-readable storage medium; when the program is executed by the processor, the steps of the foregoing method embodiments can be implemented.
  • the computer program includes computer program code, and the computer program code may be in the form of source code, object code, an executable file, or some intermediate form.
  • the computer-readable storage medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a mobile hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, etc.
  • the content contained in the computer-readable storage medium can be appropriately increased or decreased in accordance with the requirements of legislation and patent practice in the jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunication signals.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

Disclosed are an interactive communication implementation method and device, and a storage medium. The method comprises: detecting whether the current interactive object has stopped interacting (S110); and, when the current interactive object stops interacting in the awake state, determining one candidate object participating in the interaction as the new interactive object by means of collected image data and voice signals (S120). In this way, interactive objects can be switched naturally, flexibly, and intelligently in multi-user interaction scenarios, achieving timely and efficient interactive communication with multiple objects in a humanized manner.

Description

Method, Device and Storage Medium for Implementing Interactive Communication
Technical Field
The present invention relates to the technical field of human-computer interaction, and in particular to a method, device, and storage medium for implementing interactive communication.
Background
In recent years, "artificial intelligence" has become one of the most frequently used terms in the Internet industry. At the same time, service robots have developed rapidly, and robots and smart devices such as personal virtual assistants and household robots (for example, sweeping robots) have put artificial intelligence into practical application. At present, many scenarios require robots or smart devices to be capable of interaction, and good interactive service has become one of the most competitive factors in artificial intelligence services.
Most existing interaction methods recognize voice content on the basis of wake-up words: trigger operations such as "wake-up words" or touch input are the main ways of triggering a robot or smart device to perform human-computer interaction. However, the problem with this approach in a multi-person scenario is that each person participating in the interaction must perform such an operation, even while the robot or smart device is already awake, in order to switch to a new interactive object midway, so every user has to learn the trigger operations of different robots or smart devices. Furthermore, the trigger operation has to be executed every time a new user starts interacting with the robot or smart device. Such an interaction flow is not only mechanical but also disrupts the rhythm of multi-person interaction, and it cannot communicate with multiple users effectively, intelligently, and in real time in multi-user interaction scenarios.
Summary of the Invention
The purpose of the present invention is to provide a method, device, and storage medium for implementing interactive communication, so that interactive objects can be switched naturally, flexibly, and intelligently in multi-user interaction scenarios, thereby achieving timely and efficient interactive communication with multiple objects in a humanized manner.
The technical solutions provided by the present invention are as follows:
The present invention provides a method for implementing interactive communication, comprising the steps of:
detecting whether the current interactive object has stopped interacting;
when the current interactive object stops interacting while the device is in the awake state, determining one candidate object participating in the interaction as the new interactive object on the basis of collected image data and voice signals.
Further, the method also comprises the step of:
when the current interactive object has not stopped interacting in the awake state, continuing detection while responding to the required service type of the current interactive object.
Further, the method also comprises the step of:
entering the dormant state when, in the awake state, the duration for which no interactive object exists reaches a first preset duration.
Further, the method also comprises the steps of:
judging whether a wake-up signal is received while the device is in the dormant state;
if a wake-up signal is received, switching from the dormant state to the awake state, and determining the target object that triggered the wake-up as the current interactive object.
Further, determining one candidate object participating in the interaction as the new interactive object on the basis of collected image data and voice signals when the current interactive object stops interacting in the awake state comprises the steps of:
when the duration for which the current interactive object has stopped interacting reaches a second preset duration, searching for candidate objects participating in the interaction through image recognition and/or sound source localization;
if there is one candidate object, determining that candidate object as the new interactive object;
if there are at least two candidate objects, determining one candidate object as the new interactive object according to the image recognition result and/or the sound source localization result.
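The candidate-selection rule in the steps above can be sketched as follows. This is a minimal illustration under assumed inputs, not the patent's actual implementation: each candidate is represented as an `(object_id, score)` pair, where the score is assumed to summarize the image recognition and/or sound source localization evidence.

```python
def select_new_interactive_object(candidates):
    """Return the id of the new interactive object, or None if no
    candidate is participating in the interaction.

    candidates: list of (object_id, score) pairs; the score is a
    hypothetical combined confidence from image recognition and/or
    sound source localization."""
    if not candidates:
        return None                      # no object is participating
    if len(candidates) == 1:
        return candidates[0][0]          # a single candidate is chosen directly
    # at least two candidates: choose the one best supported by the
    # image recognition and/or sound source localization results
    return max(candidates, key=lambda c: c[1])[0]
```

The single-candidate shortcut mirrors the second step above; the multi-candidate case corresponds to the third step, with the scoring left open, as it is in the claims.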
The present invention also provides a device for implementing interactive communication, comprising:
an image acquisition module, configured to collect face images;
an audio acquisition module, configured to collect voice signals;
a detection module, configured to detect whether the current interactive object has stopped interacting;
a processing module, configured to determine one candidate object participating in the interaction as the new interactive object on the basis of collected image data and voice signals when the current interactive object stops interacting in the awake state.
Further, the device also comprises:
an execution module, configured to continue detection while responding to the required service type of the current interactive object when the current interactive object has not stopped interacting in the awake state;
the processing module is also configured to enter the dormant state when, in the awake state, the duration for which no interactive object exists reaches the first preset duration.
Further, the detection module is also configured to judge whether a wake-up signal is received while the device is in the dormant state;
the processing module is also configured to switch from the dormant state to the awake state if a wake-up signal is received, and to determine the target object that triggered the wake-up as the current interactive object.
Further, the processing module comprises:
a searching unit, configured to search for candidate objects participating in the interaction through image recognition and/or sound source localization when the duration for which the current interactive object has stopped interacting reaches the second preset duration;
an object switching unit, configured to determine, if there is one candidate object, that candidate object as the new interactive object, and, if there are at least two candidate objects, to determine one candidate object as the new interactive object according to the image recognition result and/or the sound source localization result.
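The division of the processing module into a searching unit and an object switching unit can be sketched like this; the recognizer and localizer callables, and the way their scores are combined, are assumptions for illustration rather than the patent's implementation:

```python
class SearchUnit:
    """Finds candidate objects via image recognition and/or sound source
    localization (both passed in as callables returning {id: confidence})."""

    def __init__(self, recognize_faces, localize_sources):
        self.recognize_faces = recognize_faces
        self.localize_sources = localize_sources

    def find_candidates(self, frame, audio):
        faces = self.recognize_faces(frame)
        sources = self.localize_sources(audio)
        ids = set(faces) | set(sources)
        # combine both kinds of evidence into one score per candidate
        return [(i, faces.get(i, 0.0) + sources.get(i, 0.0)) for i in ids]


class ObjectSwitchUnit:
    """Chooses the new interactive object from the found candidates."""

    def choose(self, candidates):
        if not candidates:
            return None
        return max(candidates, key=lambda c: c[1])[0]
```

Injecting the recognizers as callables reflects the claim's "image recognition and/or sound source localization": either source of evidence may be absent without changing the switching logic.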
The present invention also provides a storage medium storing at least one instruction, the instruction being loaded and executed by a processor to implement the operations performed by the interactive communication implementation method described above.
Through the interactive communication implementation method, device, and storage medium provided by the present invention, interactive objects can be switched naturally, flexibly, and intelligently in multi-user interaction scenarios, thereby achieving timely and efficient interactive communication with multiple objects in a humanized manner.
Brief Description of the Drawings
In the following, the preferred embodiments are described in conjunction with the accompanying drawings in a clear and easy-to-understand manner, to further explain the above characteristics, technical features, advantages, and implementations of the interactive communication implementation method, device, and storage medium.
Fig. 1 is a flowchart of an embodiment of the interactive communication implementation method of the present invention;
Fig. 2 is a flowchart of another embodiment of the interactive communication implementation method of the present invention;
Fig. 3 is a flowchart of another embodiment of the interactive communication implementation method of the present invention;
Fig. 4 is a flowchart of another embodiment of the interactive communication implementation method of the present invention;
Fig. 5 is a flowchart of another embodiment of the interactive communication implementation method of the present invention;
Fig. 6 is a schematic diagram of the interaction of the emotional companion robot Robot of the present invention in a multi-user interaction scenario;
Fig. 7 is a schematic diagram of the human-computer interaction process when the robot of the present invention faces multiple people;
Fig. 8 is a schematic structural diagram of an embodiment of the interactive communication implementation device of the present invention.
Detailed Description of the Embodiments
In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, specific implementations of the present invention are described below with reference to the accompanying drawings. Obviously, the drawings in the following description show only some embodiments of the present invention; for those of ordinary skill in the art, other drawings and other implementations can be obtained from these drawings without creative work.
To keep the drawings concise, each drawing schematically shows only the parts related to the present invention, and they do not represent the actual structure of the product. In addition, to keep the drawings concise and easy to understand, where several components in a drawing have the same structure or function, only one of them is schematically drawn or labeled. Herein, "a" or "one" not only means "only this one" but can also mean "more than one".
In the embodiments of the present invention, the terminal that implements object switching includes, but is not limited to, robots such as personal virtual assistants, household robots (for example, sweeping robots), children's education robots, elderly care robots, emotional companion robots, airport service robots, and shopping service robots, as well as smart devices such as smartphones, smart speakers, and smart voice elevators; such terminals are usually deployed in social places such as shopping malls, subway stations, and railway stations.
In an embodiment of the present invention, as shown in Fig. 1, a method for implementing interactive communication includes:
S110: detecting whether the current interactive object has stopped interacting.
Specifically, a robot or smart device can acquire image data (including but not limited to face images and gesture images) within its field of view through an image acquisition module such as a camera or a camera array, and can acquire voice signals input within the effective collection range through an audio acquisition module such as a microphone or a microphone array. The types of interaction between the robot or smart device and the current interactive object include, but are not limited to, voice dialogue interaction and gesture dialogue interaction. The robot or smart device can judge, from the image data and/or the voice signals, whether the current interactive object has input a voice signal, and can also judge, from the image data, whether the current interactive object has input a gesture. In addition, since the processor of the robot or smart device executes the tasks it receives, it can also inspect its own processes to determine whether there is a voice interaction task obtained through voice recognition or a gesture interaction task obtained through image recognition, and can thereby detect whether the current interactive object has stopped interacting.
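A minimal sketch of this stop-detection logic might look as follows; the silence threshold, the injectable clock, and the pending-task counter are illustrative assumptions, not the patent's implementation:

```python
import time


class InteractionDetector:
    """Judges whether the current interactive object has stopped
    interacting, based on the most recent voice/gesture activity and on
    whether any recognition task is still in flight."""

    def __init__(self, silence_threshold_s=5.0, clock=time.monotonic):
        self.silence_threshold_s = silence_threshold_s
        self.clock = clock
        self.last_activity = clock()
        self.pending_tasks = 0

    def on_voice_or_gesture(self):
        self.last_activity = self.clock()   # speech or a gesture was observed

    def on_task_started(self):
        self.pending_tasks += 1             # a voice/gesture interaction task started

    def on_task_finished(self):
        self.pending_tasks -= 1

    def has_stopped(self):
        # the object is considered stopped only when no recognition task
        # remains and no activity was seen within the silence threshold
        idle = self.clock() - self.last_activity
        return self.pending_tasks == 0 and idle >= self.silence_threshold_s
```

The clock is injected so the detector can be driven from real sensors or tested deterministically; checking `pending_tasks` mirrors the paragraph's point that the device can inspect its own processes for outstanding interaction tasks.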
The microphone array in the embodiments of the present invention may be an array formed by a group of acoustic sensors located at different positions in space and arranged according to a certain geometric rule; it is a device for spatially sampling voice signals propagating through space. The voice signal processing of the embodiments of the present invention does not specifically limit the form of the microphone array used.
The camera array in the embodiments of the present invention may be an array in which a group of image sensors located at different positions in space are arranged according to a certain geometric rule to collect image data from multiple viewing angles. As an example, the microphone array or camera array may be a horizontal array, a T-shaped array, an L-shaped array, a polyhedral array, a spherical array, and so on.
S120: when the current interactive object stops interacting in the awake state, determining one candidate object participating in the interaction as the new interactive object on the basis of collected image data and voice signals.
Specifically, in the awake state the robot or smart device can judge, on the basis of image data and/or voice signals, whether the tracked current interactive object (which may be a person, another smart device, or another robot) has stopped interacting with it. If the current interactive object stops interacting with the robot or smart device while it is awake, the robot or smart device replaces the current interactive object with one of the candidate objects participating in the interaction (candidate objects include other people, other smart devices, or other robots), on the basis of collected face images and voice signals.
For example, suppose robot A is the detecting subject and user A is the current interactive object. When user A stops interacting with robot A, if robot A detects through collected image data and/or voice signals that user B is participating in the interaction, user B is determined as the new interactive object according to the image data and the voice signals.
In this embodiment, while the robot or smart device is in the awake state, there is no need for frequent voice input of wake-up words to switch to a new interactive object midway, as in the prior art, nor are all users forced to learn the trigger operations of different robots or smart devices. On the basis of the collected image data and voice signals alone, new interactive objects can be switched in real time and intelligently in multi-user interaction scenarios, achieving timely and natural switching of interactive communication with multiple objects in an effective and humanized manner.
In an embodiment of the present invention, as shown in Fig. 2, a method for implementing interactive communication includes:
S210: detecting whether the current interactive object has stopped interacting;
S220: when the current interactive object stops interacting in the awake state, determining one candidate object participating in the interaction as the new interactive object on the basis of collected image data and voice signals;
S230: when the current interactive object has not stopped interacting in the awake state, continuing detection while responding to the required service type of the current interactive object.
Specifically, for the parts of this embodiment that are the same as the above embodiment, reference is made to the above embodiment, and they are not repeated here. While the robot or smart device is in the awake state, if it is detected that the current interactive object has not stopped interacting, the robot or smart device continues to detect in real time whether the current interactive object has stopped interacting, and at the same time performs voice recognition (or gesture recognition) on the voice signal (or gesture signal) of the current interactive object acquired during detection to obtain the corresponding required service type, so as to perform the corresponding operation according to the required service type and respond to the current interactive object. Performing voice recognition (or gesture recognition) on a voice signal (or gesture signal) to obtain the required service type is prior art and is not described in detail here.
For example, with the robot or smart device as the detecting subject and user A as the current interactive object, when user A has not stopped interacting with the robot or smart device and the robot or smart device obtains "play nursery rhymes" by performing voice recognition on the voice signal input by user A, the robot or smart device queries its song library and plays nursery rhymes. Voice signals can also be input through TTS (an abbreviation of "Text To Speech"), which is suitable for deaf or mute users: through a device with a TTS function (hereinafter referred to as a TTS device; in this scenario the TTS device only provides the TTS function and no other services), the user manually enters "play nursery rhymes", the TTS device broadcasts the voice signal "play nursery rhymes", and the robot or smart device recognizes the speech and queries its song library to play nursery rhymes.
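The response flow in the "play nursery rhymes" example can be sketched as a simple dispatch from a recognized utterance to a required service type. The keyword table below stands in for a real speech-recognition and intent-classification stage, and all names are hypothetical:

```python
def respond_to_service_request(utterance, handlers):
    """Map a recognized utterance to its required service type and invoke
    the corresponding handler; returns the handler's response text."""
    # stand-in for a real voice-recognition / intent-classification stage
    service_types = {
        "play nursery rhymes": "music",
        "what's the weather": "weather",
    }
    service_type = service_types.get(utterance.strip().lower())
    if service_type is None or service_type not in handlers:
        return "sorry, I did not understand the request"
    return handlers[service_type](utterance)
```

Because the dispatcher only sees recognized text, it works identically whether the voice signal came from a speaking user or was broadcast by a TTS device, as in the accessibility scenario above.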
In this embodiment, while the robot or smart device is in the awake state, there is no need for frequent voice input of wake-up words to switch to a new interactive object midway, as in the prior art, nor are all users forced to learn the trigger operations of different robots or smart devices. On the basis of the collected image data and voice signals alone, new interactive objects can be switched in real time and intelligently in multi-user interaction scenarios, achieving timely and natural switching of interactive communication with multiple objects in an effective and humanized manner.
In an embodiment of the present invention, as shown in Fig. 3, a method for implementing interactive communication includes:
S310: judging whether a wake-up signal is received while the device is in the dormant state.
Specifically, while a robot or smart device is in the dormant state, it continuously monitors whether a wake-up signal is received. The wake-up mechanism includes, but is not limited to, triggering a wake-up signal through voice input of a wake-up word; a mechanical or touch button preset on the robot or smart device can also generate a wake-up signal when pressed, or a wake-up signal can be generated after an input gesture matching a preset wake-up gesture is received. Other wake-up mechanisms for generating the wake-up signal also fall within the protection scope of the present invention.
S320: if a wake-up signal is received, switching from the dormant state to the awake state, and determining the target object that triggered the wake-up as the current interactive object.
Specifically, once the robot or smart device receives a wake-up signal in the dormant state, it automatically switches from the dormant state to the awake state and determines the target object that triggered the wake-up as the initial current interactive object in the present awake state. The target object here may be a person with normal language ability, or a person who sends out voice signals with the help of a TTS device.
S330: detecting whether the current interactive object has stopped interacting;
S340: when the current interactive object stops interacting in the awake state, determining one candidate object participating in the interaction as the new interactive object on the basis of collected image data and voice signals;
S350: when the current interactive object has not stopped interacting in the awake state, continuing detection while responding to the required service type of the current interactive object.
Specifically, for the parts of this embodiment that are the same as the above embodiments, reference is made to the above embodiments, and they are not repeated here. In this embodiment, only when the robot or smart device switches from the dormant state to the awake state does it need to determine the current interactive object from the target object that triggered the wake-up signal. Once the robot or smart device has switched from the dormant state to the awake state, throughout the subsequent awake state there is no need for frequent voice input of wake-up words to switch to a new interactive object midway, as in the prior art, nor are all users forced to learn the trigger operations of different robots or smart devices; on the basis of the collected image data and voice signals alone, new interactive objects can be switched in real time and intelligently in multi-user interaction scenarios, achieving timely and natural switching of interactive communication with multiple objects in an effective and humanized manner.
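Steps S310-S320 amount to a small state machine: a wake-up signal moves the device from dormant to awake and makes the triggering object the current interactive object, while later object switches need no further wake-up. A sketch under assumed names could be:

```python
DORMANT, AWAKE = "dormant", "awake"


class WakeStateMachine:
    """Tracks the dormant/awake state and the current interactive object.

    A wake-up signal may come from a wake word, a button press, or a
    preset gesture; whichever object triggered it becomes the current
    interactive object (S320)."""

    def __init__(self):
        self.state = DORMANT
        self.current_object = None

    def on_wake_signal(self, target_object):
        if self.state == DORMANT:
            self.state = AWAKE
            self.current_object = target_object

    def on_new_interactive_object(self, obj):
        # while awake, switching objects needs no further wake-up signal
        if self.state == AWAKE:
            self.current_object = obj
```

The asymmetry between the two methods is the point of this embodiment: the wake-up trigger is needed exactly once per dormant-to-awake transition.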
In an embodiment of the present invention, as shown in Fig. 4, a method for implementing interactive communication includes:
S410: detecting whether the current interactive object has stopped interacting;
S420: when the current interactive object stops interacting in the awake state, determining one candidate object participating in the interaction as the new interactive object on the basis of collected image data and voice signals;
S430: when the current interactive object has not stopped interacting in the awake state, continuing detection while responding to the required service type of the current interactive object;
S440: entering the dormant state when, in the awake state, the duration for which no interactive object exists reaches the first preset duration.
Specifically, while the robot or smart device is in the awake state, if the current interactive object stops interacting with it and no new interactive object is detected interacting with it for a duration reaching the first preset duration, this indicates that no interactive object has interacted with the robot or smart device throughout that period. Likewise, when the device is awake and there is no interactive object within the effective collection range of its audio acquisition module and image acquisition module for a duration reaching the first preset duration, this also indicates that no interactive object has interacted with the robot or smart device throughout that period. Once it is determined that the device is in the awake state and the duration for which no interactive object exists has reached the first preset duration, the robot or smart device automatically enters the dormant state. This prevents the robot or smart device from remaining awake for a long time, saves power, and extends its standby time.
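The first-preset-duration timeout of S440 can be sketched as an idle timer; the 60-second default below is purely illustrative, since the patent leaves the preset duration open:

```python
class IdleSleepTimer:
    """Decides when an awake device should enter the dormant state because
    no interactive object has been present for the first preset duration."""

    def __init__(self, first_preset_duration_s=60.0):
        self.limit = first_preset_duration_s
        self.idle_since = None           # time at which the idle period began

    def update(self, now, object_present):
        """Returns True when the device should enter the dormant state."""
        if object_present:
            self.idle_since = None       # an interactive object exists: reset
            return False
        if self.idle_since is None:
            self.idle_since = now        # idle period starts now
        return (now - self.idle_since) >= self.limit
```

Any appearance of an interactive object, whether the old one resuming or a new one detected within sensor range, resets the timer, matching the paragraph's two equivalent idle conditions.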
S450: When the device itself is in the dormant state, determine whether a wake-up signal is received;
S460: If a wake-up signal is received, switch from the dormant state to the awakened state, and determine the target object that triggered the wake-up as the current interactive object.
Specifically, for the parts of this embodiment that are the same as those of the foregoing embodiments, refer to the foregoing embodiments; they are not repeated here. This embodiment and the foregoing embodiments show that, regardless of when the robot or smart device enters the dormant state, it only needs to determine the current interactive object from the target object that triggered the wake-up signal at the moment it switches from the dormant state to the awakened state. After that switch, throughout the subsequent awakened state, there is no need, as in the prior art, for users to repeatedly speak a wake-up word to switch to a new interactive object midway, nor must every user learn the trigger operations of different robots or smart devices. Based solely on the collected image data and voice signals, the device can switch to a new interactive object in real time and intelligently in a multi-user interaction scenario. This better matches everyday communication patterns, helps achieve effective communication, and increases the anthropomorphic effect of human-machine communication, thereby achieving the purpose of effective interactive communication between the robot or smart device and multiple objects.
In an embodiment of the present invention, as shown in FIG. 5, a method for implementing interactive communication includes:
S510: Detect whether the current interactive object has stopped interacting;
S520: When the duration for which the current interactive object has stopped interacting reaches a second preset duration, search for candidate objects participating in the interaction through image recognition and/or sound source localization;
Specifically, the second preset duration is shorter than the first preset duration. The trigger condition for searching for and switching to a new interactive object is met as follows: while interacting with the current interactive object, each time the robot or smart device finishes executing the previous service type requested by the current interactive object, it waits for the second preset duration. If no interaction information from the current interactive object is received within that waiting period, the robot or smart device assumes by default that the current interactive object is no longer participating in the interaction. At this point, the device searches for all candidate objects participating in the interaction through image recognition and/or sound source localization, so that a new interactive object can be selected to continue the interaction.
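The trigger condition just described can be written as a predicate. This is a simplified sketch under stated assumptions (the function name, arguments, and the 5-second default are invented for illustration):

```python
def should_search_candidates(service_done_at, last_input_at, now, second_preset=5.0):
    """Return True when the current interactive object has been silent for the
    second preset duration since the last requested service finished, i.e.
    when the device should search for candidate objects to switch to."""
    # Any input from the current object after the service completed resets the trigger.
    if last_input_at is not None and last_input_at > service_done_at:
        return False
    return now - service_done_at >= second_preset
```

In a real system the wait would restart from each new input rather than from the service completion alone; the single-shot check here only illustrates the timeout comparison.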
S530: If there is one candidate object, determine that candidate object as the new interactive object;
S540: If there are at least two candidate objects, determine one candidate object as the new interactive object according to the image recognition result and/or the sound source localization result.
Specifically, when the trigger condition for searching for and switching to a new interactive object is met, the robot or smart device determines only one candidate object as the new interactive object in each search. The robot or smart device can use its audio collection module to collect sound, implementing its auditory function. After a voice signal is collected, the signal is framed and windowed, and audio processing of the voice signal is used to determine the number of sound sources; the number of candidate objects is then determined from the number of sound sources. Sound source localization and recognition are prior art and are not described in detail here. If the number of candidate objects determined in this way is one, that candidate object is directly determined as the new interactive object. If the number of candidate objects is at least two, the candidate user corresponding to the earliest collected voice signal, according to the chronological order in which the voice signals were received, is determined as the new interactive object found in this switch.
As an example, in a scenario where a robot or smart device interacts with multiple people, the device collects voice signals in real time through its audio collection module, determines the number of sound sources from the collected voice signals using sound source localization and recognition technology, and determines the candidate user who spoke earliest as the new interactive object found in this switch.
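The earliest-speaker rule in this example can be sketched as follows (the function name and the shape of the candidate data are hypothetical):

```python
def pick_new_object(candidates):
    """candidates: list of (object_id, first_voice_time) pairs, one per sound
    source found by localization. With a single candidate it is chosen
    directly; with several, the earliest speaker wins."""
    if not candidates:
        return None
    return min(candidates, key=lambda c: c[1])[0]
```

The same function covers both S530 (one candidate) and the audio branch of S540 (two or more candidates).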
Of course, the robot or smart device can also use its image collection module to collect image data, implementing its visual function. After image data is collected, the number of candidate objects is determined from the image recognition result produced by image recognition technology. If the number of candidate objects is one, that candidate object is directly determined as the new interactive object. If the number of candidate objects is at least two, the candidate user who was earliest to participate in the interaction, according to the chronological order in which the candidate objects began participating as obtained through image recognition, is determined as the new interactive object found in this switch.
As an example, in a scenario where multiple people interact with the robot, the robot captures image data in real time through its image collection module and performs face recognition on the captured data. When a face is recognized, mouth-opening recognition is then performed. When the recognition result is a mouth-opening action, the number of persons performing the action is obtained, and candidate user A, who performed the mouth-opening action earliest, is determined as the new interactive object found in this switch.
Of course, the robot or smart device can also use its image collection module to collect image data and its audio collection module to collect sound. After the image data and voice signals are collected, image recognition technology and sound source localization technology are combined to determine the number of candidate objects. If the number of candidate objects is one, that candidate object is directly determined as the new interactive object. If the number of candidate objects is at least two, the mouth-opening actions and voice signals of the candidate objects are comprehensively analyzed according to the image recognition result and/or the sound source localization result, the candidate user who was earliest to participate in the interaction is found among the participating candidate objects, and that candidate user is determined as the new interactive object found in this switch.
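One way to realize the combined analysis is to keep only candidates confirmed by both modalities before applying the earliest-participant rule. This is a sketch, not the patent's implementation; the data shapes, the time-alignment tolerance, and all names are assumptions:

```python
def pick_by_fusion(mouth_open_times, voice_onset_times, tolerance=0.5):
    """mouth_open_times: {object_id: time a mouth-opening was seen};
    voice_onset_times: {object_id: time a voice signal was localized}.
    Only candidates confirmed by both modalities (within `tolerance` seconds)
    are kept, filtering out e.g. a sound with no visible speaker; among
    those, the earliest participant is chosen."""
    confirmed = {
        oid: min(mouth_open_times[oid], voice_onset_times[oid])
        for oid in mouth_open_times.keys() & voice_onset_times.keys()
        if abs(mouth_open_times[oid] - voice_onset_times[oid]) <= tolerance
    }
    if not confirmed:
        return None
    return min(confirmed, key=confirmed.get)
```

Requiring agreement between the two modalities is what lets the device ignore, for instance, an infant babbling off-camera, as the following paragraphs discuss.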
S550: In the awakened state, when the current interactive object has not stopped interacting, continue detecting while responding to the service type requested by the current interactive object;
S560: In the awakened state, enter the dormant state when the duration for which no interactive object exists reaches the first preset duration;
S570: When the device itself is in the dormant state, determine whether a wake-up signal is received;
S580: If a wake-up signal is received, switch from the dormant state to the awakened state, and determine the target object that triggered the wake-up as the current interactive object.
For the parts of this embodiment that are the same as those of the foregoing embodiments, refer to the foregoing embodiments; they are not repeated here. The present invention preferably uses image data and voice signals together as judgment factors to detect candidate objects and determine one of them as the new interactive object. This avoids mistakenly determining, as a new interactive object, a candidate that emits meaningless voice signals within the effective collection range of the audio and image collection modules (for example, an infant), or a user with no intention to interact. Combining image recognition technology with sound source localization technology achieves precise localization of the direction and position of the candidate objects and improves the accuracy of finding and determining the new interactive object.
In this embodiment, the robot or smart device automatically switches to a new interactive object in the awakened state and continues interacting. This improves the efficiency with which the device switches among multiple interactive objects and shortens the time it takes to turn to the next interactive object, thereby greatly reducing the reaction time of switching, improving the efficiency of switching communication with multiple interactive objects, making the interaction process more natural and flexible, and greatly improving the interaction capability of the robot or smart device.
As an example, as shown in FIG. 6, a usage scenario of an emotional companion robot includes Robot, User1, User2, and User3. User1, User2, and User3 in the figure are not specific persons and are used only to distinguish different users. User1 comes in front of Robot and wakes it with a wake-up word; Robot then turns to User1 and interacts with User1. During the interaction, Robot must determine in real time whether User1 is still interacting with it. When Robot determines, through sound source localization and facial feature recognition, that User1 has stopped interacting with it, Robot automatically turns to User2, who is speaking. The same strategy applies when there are more than two users. The human-machine interaction process when the robot faces multiple people, as shown in FIG. 7, includes the following steps:
Step 0: Initial state. One Robot (in the dormant state) and two or more Users who can interact with the Robot.
Step 1: User1 approaches Robot and wakes it up; Robot switches from the dormant state to the awakened state. Go to Step 2.
Step 2: Robot turns to User1 and interacts with User1. Go to Step 3.
Step 3: While interacting with User1, Robot uses sound source localization and facial feature recognition to determine whether User1 is still interacting with it. The judgment falls into the following four results:
(1) The judgment is Result 1: Robot determines that User1 is still interacting with it, so Robot keeps facing User1. Go to Step 3.
(2) The judgment is Result 2: Robot determines that User1 has stopped interacting with it, and Robot hears User2 speaking. Go to Step 2, where User2 now takes the place of User1.
(3) The judgment is Result 3: Robot determines that User1 has stopped interacting with it, and Robot does not hear User2 speaking. Robot enters the sleep-countdown state. If Robot hears User2 speaking before the countdown ends, go to Step 2, where User2 now takes the place of User1.
(4) The judgment is Result 4: Robot determines that User1 has stopped interacting with it, and Robot does not hear User2 speaking. Robot enters the sleep-countdown state. If Robot still has not heard User2 speaking when the countdown ends, go to Step 0.
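The four outcomes of Step 3 above can be condensed into a single decision function. The encoding below (boolean inputs, step labels as return values) is a hypothetical illustration of the flow:

```python
def step3_decision(user1_interacting, user2_heard_now, user2_heard_in_countdown):
    """Encodes Results 1-4: which step the Robot transitions to after judging,
    via sound source localization and facial feature recognition, whether
    User1 is still interacting with it."""
    if user1_interacting:
        return "step3"   # Result 1: keep facing User1 and keep judging
    if user2_heard_now:
        return "step2"   # Result 2: turn to User2 immediately
    if user2_heard_in_countdown:
        return "step2"   # Result 3: User2 spoke before the sleep countdown ended
    return "step0"       # Result 4: countdown expired, back to the dormant state
```

Whenever `"step2"` is returned, the newly heard user replaces User1 as the current interactive object, exactly as Results 2 and 3 describe.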
In an embodiment of the present invention, a device for implementing interactive communication, as shown in FIG. 8, includes:
an image collection module 10, configured to collect face images;
an audio collection module 20, configured to collect voice signals;
a detection module 30, configured to detect whether the current interactive object has stopped interacting; and
a processing module 40, configured to determine, when the current interactive object stops interacting in the awakened state, one candidate object participating in the interaction as the new interactive object based on the collected image data and voice signals.
Specifically, this embodiment is a device embodiment corresponding to the foregoing method embodiments; for its specific effects, refer to the foregoing method embodiments, which are not repeated here.
Based on the foregoing embodiments, the device further includes:
the detection module 30, further configured to determine, when the device itself is in the dormant state, whether a wake-up signal is received; and
the processing module 40, further configured to switch from the dormant state to the awakened state if a wake-up signal is received, and to determine the target object that triggered the wake-up as the current interactive object.
Specifically, this embodiment is a device embodiment corresponding to the foregoing method embodiments; for its specific effects, refer to the foregoing method embodiments, which are not repeated here.
Based on the foregoing embodiments, the device further includes:
an execution module, configured to continue detecting while responding to the service type requested by the current interactive object when the current interactive object has not stopped interacting in the awakened state; and
the processing module 40, further configured to enter the dormant state when, in the awakened state, the duration for which no interactive object exists reaches the first preset duration.
Specifically, this embodiment is a device embodiment corresponding to the foregoing method embodiments; for its specific effects, refer to the foregoing method embodiments, which are not repeated here.
Based on the foregoing embodiments, the processing module 40 includes:
a searching unit, configured to search, when the duration for which the current interactive object has stopped interacting reaches the second preset duration, for candidate objects participating in the interaction through image recognition and/or sound source localization; and
an object switching unit, configured to determine, if there is one candidate object, that candidate object as the new interactive object, and to determine, if there are at least two candidate objects, one candidate object as the new interactive object according to the image recognition result and/or the sound source localization result.
Specifically, this embodiment is a device embodiment corresponding to the foregoing method embodiments; for its specific effects, refer to the foregoing method embodiments, which are not repeated here.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the division into the program modules described above is merely illustrative. In practical applications, the above functions can be allocated to different program modules as needed; that is, the internal structure of the device can be divided into different program units or modules to complete all or part of the functions described above. The program modules in the embodiments may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one processing unit. The integrated unit may be implemented in the form of hardware or in the form of a software program unit. In addition, the specific names of the program modules are only for convenience of distinguishing them from one another and are not intended to limit the protection scope of the present application.
In an embodiment of the present invention, a smart device includes a processor and a memory, where the memory is configured to store a computer program, and the processor is configured to execute the computer program stored in the memory to implement the method for implementing interactive communication in the foregoing method embodiments.
The smart device may be a desktop computer, a notebook computer, a palmtop computer, a tablet computer, a mobile phone, a human-machine interaction screen, or a similar device. The smart device may include, but is not limited to, a processor and a memory. Those skilled in the art will understand that the above is merely an example of a smart device and does not constitute a limitation on the smart device, which may include more or fewer components than shown, a combination of certain components, or different components. For example, the smart device may further include an input/output interface, a display device, a network access device, a communication bus, a communication interface, and the like. The processor, the memory, the input/output interface, and the communication interface communicate with one another through the communication bus. The memory stores a computer program, and the processor executes the computer program stored in the memory to implement the method for implementing interactive communication in the foregoing method embodiments.
The processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory may be an internal storage unit of the smart device, such as a hard disk or internal memory of the smart device. The memory may also be an external storage device of the smart device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the smart device. Further, the memory may include both an internal storage unit of the smart device and an external storage device. The memory is configured to store the computer program and the other programs and data required by the smart device. The memory may also be used to temporarily store data that has been output or is to be output.
The communication bus is a circuit that connects the described elements and implements transmission among them. For example, the processor receives commands from other elements through the communication bus, decrypts the received commands, and performs calculation or data processing according to the decrypted commands. The memory may include program modules, such as a kernel, middleware, an application programming interface (API), and applications. A program module may be composed of software, firmware, or hardware, or at least two of them. The input/output interface forwards commands or data entered by a user through an input/output device (for example, a sensor, a keyboard, or a touch screen). The communication interface connects the smart device with other network devices, user equipment, and networks. For example, the communication interface may be connected to a network by wire or wirelessly to connect to other external network devices or user equipment. Wireless communication may include at least one of the following: wireless fidelity (WiFi), Bluetooth (BT), near-field communication (NFC), the Global Positioning System (GPS), cellular communication, and the like. Wired communication may include at least one of the following: universal serial bus (USB), high-definition multimedia interface (HDMI), the asynchronous transmission standard interface (RS-232), and the like. The network may be a telecommunication network or a communication network. The communication network may be a computer network, the Internet, the Internet of Things, or a telephone network. The smart device may be connected to the network through the communication interface, and the protocol used by the smart device to communicate with other network devices may be supported by at least one of the application, the application programming interface (API), the middleware, the kernel, and the communication interface.
In an embodiment of the present invention, a storage medium stores at least one instruction, and the instruction is loaded and executed by a processor to implement the operations performed by the embodiments corresponding to the foregoing method for implementing interactive communication. For example, the computer-readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a compact disc read-only memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, or the like.
They may be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; alternatively, they may be made into individual integrated circuit modules, or multiple modules or steps among them may be made into a single integrated circuit module. In this way, the present invention is not limited to any specific combination of hardware and software.
In the foregoing embodiments, the description of each embodiment has its own emphasis. For parts not described or recorded in detail in one embodiment, refer to the related descriptions of the other embodiments.
A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each particular application, but such implementation should not be considered beyond the scope of this application.
In the embodiments provided in this application, it should be understood that the disclosed device/smart device and method may be implemented in other ways. For example, the device/smart device embodiments described above are merely illustrative. For example, the division into modules or units is merely a division by logical function; in actual implementation there may be other divisions. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be implemented through some interfaces, and indirect couplings or communication connections between devices or units may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separated, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated module/unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the present invention implements all or part of the processes in the methods of the foregoing embodiments, which may also be completed by a computer program instructing related hardware. The computer program may be stored in a computer-readable storage medium, and when the program is executed by a processor, the steps of the foregoing method embodiments can be implemented. The computer program includes computer program code, which may be in source-code form, object-code form, an executable file, some intermediate form, or the like. The computer-readable storage medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable storage medium may be appropriately increased or decreased in accordance with the requirements of legislation and patent practice in a given jurisdiction. For example, in some jurisdictions, in accordance with legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunication signals.
It should be noted that the above embodiments can be freely combined as required. The above are only preferred embodiments of the present invention. It should be pointed out that those of ordinary skill in the art can make several improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (10)

  1. An interactive communication implementation method, characterized in that it comprises the steps of:
    detecting whether a current interactive object has stopped interacting;
    when the current interactive object stops interacting in the awake state, determining one candidate object participating in the interaction as a new interactive object based on collected image data and voice signals.
  2. The interactive communication implementation method according to claim 1, characterized in that it further comprises the step of:
    when the current interactive object has not stopped interacting in the awake state, responding to the required service type of the current interactive object while continuing the detection.
  3. The interactive communication implementation method according to claim 1, characterized in that it further comprises the step of:
    entering a dormant state when, in the awake state, the duration for which no interactive object is present reaches a first preset duration.
  4. The interactive communication implementation method according to claim 1, characterized in that it further comprises the steps of:
    determining, while in the dormant state, whether a wake-up signal has been received;
    if a wake-up signal is received, switching from the dormant state to the awake state, and determining the target object that triggered the wake-up as the current interactive object.
  5. The interactive communication implementation method according to any one of claims 1 to 4, characterized in that determining one candidate object participating in the interaction as a new interactive object based on collected image data and voice signals when the current interactive object stops interacting in the awake state comprises the steps of:
    when the duration for which the current interactive object has stopped interacting reaches a second preset duration, searching for candidate objects participating in the interaction through image recognition and/or sound source localization;
    if there is one candidate object, determining that candidate object as the new interactive object;
    if there are at least two candidate objects, determining one candidate object as the new interactive object according to the image recognition result and/or the sound source localization result.
  6. An interactive communication implementation device, characterized in that it comprises:
    an image collection module, configured to collect face images;
    an audio collection module, configured to collect voice signals;
    a detection module, configured to detect whether a current interactive object has stopped interacting;
    a processing module, configured to determine, when the current interactive object stops interacting in the awake state, one candidate object participating in the interaction as a new interactive object based on collected image data and voice signals.
  7. The interactive communication implementation device according to claim 6, characterized in that it further comprises:
    an execution module, configured to respond to the required service type of the current interactive object while continuing the detection when the current interactive object has not stopped interacting in the awake state;
    wherein the processing module is further configured to enter a dormant state when, in the awake state, the duration for which no interactive object is present reaches a first preset duration.
  8. The interactive communication implementation device according to claim 6, characterized in that:
    the detection module is further configured to determine, while the device is in the dormant state, whether a wake-up signal has been received;
    the processing module is further configured to switch from the dormant state to the awake state if a wake-up signal is received, and to determine the target object that triggered the wake-up as the current interactive object.
  9. The interactive communication implementation device according to any one of claims 6 to 8, characterized in that the processing module comprises:
    a searching unit, configured to search for candidate objects participating in the interaction through image recognition and/or sound source localization when the duration for which the current interactive object has stopped interacting reaches a second preset duration;
    an object switching unit, configured to determine, if there is one candidate object, that candidate object as the new interactive object; and, if there are at least two candidate objects, to determine one candidate object as the new interactive object according to the image recognition result and/or the sound source localization result.
  10. A storage medium, characterized in that at least one instruction is stored in the storage medium, and the instruction is loaded and executed by a processor to perform the operations of the interactive communication implementation method according to any one of claims 1 to 5.
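For illustration only, the wake/sleep and object-switching flow recited in claims 1 to 5 can be sketched as a small state machine. The class name, method names, timeout values, and the single numeric score standing in for the image-recognition and sound-source-localization results below are hypothetical assumptions for the sketch, not the patent's actual implementation:

```python
class InteractionDevice:
    """Illustrative sketch of the claimed interaction flow (hypothetical API)."""

    FIRST_PRESET_DURATION = 30.0   # claim 3: seconds with no interactive object before sleeping
    SECOND_PRESET_DURATION = 5.0   # claim 5: seconds of stopped interaction before re-selection

    def __init__(self):
        self.state = "dormant"
        self.current_object = None

    def on_wake_signal(self, target_object):
        # Claim 4: a wake-up signal switches the device from the dormant
        # state to the awake state, and the object that triggered the
        # wake-up becomes the current interactive object.
        self.state = "awake"
        self.current_object = target_object

    def on_no_interaction(self, elapsed_seconds):
        # Claim 3: enter the dormant state once no interactive object has
        # been present for the first preset duration.
        if self.state == "awake" and elapsed_seconds >= self.FIRST_PRESET_DURATION:
            self.state = "dormant"
            self.current_object = None

    def select_new_object(self, candidates):
        # Claim 5: a single candidate is taken directly; among several,
        # one is chosen from the image-recognition and/or sound-source-
        # localization results (modelled here as one numeric score per
        # (name, score) pair).
        if not candidates:
            return None
        if len(candidates) == 1:
            self.current_object = candidates[0][0]
        else:
            self.current_object = max(candidates, key=lambda c: c[1])[0]
        return self.current_object
```

In this sketch, calling `on_wake_signal("speaker-A")` makes `speaker-A` the current interactive object, and `select_new_object` implements the one-candidate and multi-candidate branches of claim 5.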
PCT/CN2020/086222 2020-04-22 2020-04-22 Interactive communication implementation method and device, and storage medium WO2021212388A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2020/086222 WO2021212388A1 (en) 2020-04-22 2020-04-22 Interactive communication implementation method and device, and storage medium
CN202080004243.6A CN112739507B (en) 2020-04-22 2020-04-22 Interactive communication realization method, device and storage medium


Publications (1)

Publication Number Publication Date
WO2021212388A1

Family

ID=75609496

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/086222 WO2021212388A1 (en) 2020-04-22 2020-04-22 Interactive communication implementation method and device, and storage medium

Country Status (2)

Country Link
CN (1) CN112739507B (en)
WO (1) WO2021212388A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114193477A (en) * 2021-12-24 2022-03-18 上海擎朗智能科技有限公司 Position leading method, device, robot and storage medium
CN116363566A (en) * 2023-06-02 2023-06-30 华东交通大学 Target interaction relation recognition method based on relation knowledge graph
CN116363566B (en) * 2023-06-02 2023-10-17 华东交通大学 Target interaction relation recognition method based on relation knowledge graph

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116978372A (en) * 2022-04-22 2023-10-31 华为技术有限公司 Voice interaction method, electronic equipment and storage medium
CN114715175A (en) * 2022-05-06 2022-07-08 Oppo广东移动通信有限公司 Target object determination method and device, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105881548A (en) * 2016-04-29 2016-08-24 北京快乐智慧科技有限责任公司 Method for waking up intelligent interactive robot and intelligent interactive robot
CN108733420A (en) * 2018-03-21 2018-11-02 北京猎户星空科技有限公司 Awakening method, device, smart machine and the storage medium of smart machine
CN109683610A (en) * 2018-12-14 2019-04-26 北京猎户星空科技有限公司 Smart machine control method, device and storage medium
CN110111789A (en) * 2019-05-07 2019-08-09 百度国际科技(深圳)有限公司 Voice interactive method, calculates equipment and computer-readable medium at device
US20190371342A1 (en) * 2018-06-05 2019-12-05 Samsung Electronics Co., Ltd. Methods and systems for passive wakeup of a user interaction device
CN110730115A (en) * 2019-09-11 2020-01-24 北京小米移动软件有限公司 Voice control method and device, terminal and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106354255A (en) * 2016-08-26 2017-01-25 北京光年无限科技有限公司 Man-machine interactive method and equipment facing robot product
CN110290096B (en) * 2018-03-19 2022-06-24 阿里巴巴集团控股有限公司 Man-machine interaction method and terminal
CN109166575A (en) * 2018-07-27 2019-01-08 百度在线网络技术(北京)有限公司 Exchange method, device, smart machine and the storage medium of smart machine
CN109461448A (en) * 2018-12-11 2019-03-12 百度在线网络技术(北京)有限公司 Voice interactive method and device
CN110689889B (en) * 2019-10-11 2021-08-17 深圳追一科技有限公司 Man-machine interaction method and device, electronic equipment and storage medium



Also Published As

Publication number Publication date
CN112739507B (en) 2023-05-09
CN112739507A (en) 2021-04-30

Similar Documents

Publication Publication Date Title
WO2021212388A1 (en) Interactive communication implementation method and device, and storage medium
US11620984B2 (en) Human-computer interaction method, and electronic device and storage medium thereof
CN108735209B (en) Wake-up word binding method, intelligent device and storage medium
KR101726945B1 (en) Reducing the need for manual start/end-pointing and trigger phrases
WO2021036714A1 (en) Voice-controlled split-screen display method and electronic device
CN110263131B (en) Reply information generation method, device and storage medium
CN112860169B (en) Interaction method and device, computer readable medium and electronic equipment
CN108766438A (en) Man-machine interaction method, device, storage medium and intelligent terminal
EP3933570A1 (en) Method and apparatus for controlling a voice assistant, and computer-readable storage medium
EP4184506A1 (en) Audio processing
CN111063354B (en) Man-machine interaction method and device
CN109032554B (en) Audio processing method and electronic equipment
WO2022042274A1 (en) Voice interaction method and electronic device
CN112634895A (en) Voice interaction wake-up-free method and device
US20230048330A1 (en) In-Vehicle Speech Interaction Method and Device
WO2024103926A1 (en) Voice control methods and apparatuses, storage medium, and electronic device
CN112233676A (en) Intelligent device awakening method and device, electronic device and storage medium
WO2022227507A1 (en) Wake-up degree recognition model training method and speech wake-up degree acquisition method
CN106683668A (en) Method of awakening control of intelligent device and system
WO2023006033A1 (en) Speech interaction method, electronic device, and medium
US11929081B2 (en) Electronic apparatus and controlling method thereof
WO2024103893A1 (en) Method for waking up application program, and electronic device
CN109119075A (en) Speech recognition scene awakening method and device
CN110989963B (en) Wake-up word recommendation method and device and storage medium
WO2024055831A1 (en) Voice interaction method and apparatus, and terminal

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20931847

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20931847

Country of ref document: EP

Kind code of ref document: A1


32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 04.05.2023)
