WO2017031860A1 - Artificial intelligence-based control method and system for intelligent interaction device - Google Patents

Artificial intelligence-based control method and system for intelligent interaction device

Info

Publication number
WO2017031860A1
WO2017031860A1 (PCT/CN2015/096587)
Authority
WO
WIPO (PCT)
Prior art keywords
user
willingness
interact
interaction
sound source
Prior art date
Application number
PCT/CN2015/096587
Other languages
English (en)
Chinese (zh)
Inventor
葛行飞
李峥
林汉权
Original Assignee
百度在线网络技术(北京)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 百度在线网络技术(北京)有限公司 filed Critical 百度在线网络技术(北京)有限公司
Publication of WO2017031860A1

Links

Images

Classifications

    • G — PHYSICS
    • G05 — CONTROLLING; REGULATING
    • G05B — CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B15/00 — Systems controlled by a computer
    • G05B15/02 — Systems controlled by a computer, electric

Definitions

  • The present invention relates to the field of intelligent terminal technologies, and in particular to an artificial intelligence (AI) based intelligent interaction device control method, control system, and intelligent interaction device.
  • Existing intelligent interaction devices interact with humans in a single, limited way, and the interaction experience is poor. This is because remote-control operation offers only limited functions, and the device cannot perform actions beyond those functions. Likewise, a device that operates according to a pre-set program cannot complete actions other than those programmed, and cannot perform different actions for different user needs. In addition, these interactions occur only after the user operates a remote control or triggers a function button, so the interaction is entirely passive.
  • Although video conferencing tracking systems can turn a camera toward a speaker according to the speaker's voice, they cannot accurately determine whether the speaker has a willingness to interact, nor respond appropriately according to that willingness.
  • the first object of the present invention is to propose an intelligent interactive device control method based on artificial intelligence.
  • the method can improve the interaction experience between the user and the smart interaction device, and improve the intelligence of the smart interaction device.
  • a second object of the present invention is to provide an intelligent interactive device control system based on artificial intelligence.
  • a third object of the present invention is to provide an intelligent interactive device.
  • a fourth object of the invention is to propose an apparatus.
  • a fifth object of the present invention is to provide a non-volatile computer storage medium.
  • An embodiment of the first aspect of the present invention discloses an artificial intelligence-based intelligent interaction device control method, including the steps of: receiving a multi-modal input signal, the multi-modal input signal including an image signal, a sound signal, and/or a distance signal input by a user; performing face detection according to the image signal, and acquiring a face image and face information when a human face is detected; performing lip-region detection according to the face image to determine a lip motion condition; performing sound source localization according to the sound signal to obtain sound source information; determining the user's willingness to interact and the strength of that willingness according to the face information, the lip motion condition, the sound source information, and/or the distance signal; and controlling the smart interaction device to perform a corresponding interactive response according to the user's willingness to interact and its strength.
  • The artificial intelligence-based intelligent interaction device control method can collect the user's sound signal, image signal, and/or distance signal in real time, determine through artificial-intelligence analysis whether the user has a willingness to interact and how strong that willingness is, and then autonomously control the intelligent interaction device to perform corresponding actions, actively interacting with the user and enriching the means of interaction, thereby improving the user experience.
  • An embodiment of the second aspect of the present invention discloses an artificial intelligence-based intelligent interaction device control system, including: a receiving module, configured to receive a multi-modal input signal, where the multi-modal input signal includes an image signal, a sound signal, and/or a distance signal input by a user; a face detection module, configured to perform face detection according to the image signal and acquire a face image and face information when a human face is detected; a lip detection module, configured to perform lip detection according to the face image to determine a lip motion condition; a sound source positioning module, configured to perform sound source localization according to the sound signal to obtain sound source information; a decision module, configured to determine the user's willingness to interact and the strength of that willingness based on the face information, the lip motion condition, the sound source information, and/or the distance signal; and a composite output control module, configured to control the intelligent interaction device to perform a corresponding interactive response according to the user's willingness to interact and its strength.
  • The artificial intelligence-based intelligent interaction device control system can collect the user's sound signal, image signal, and/or distance signal in real time, determine through artificial-intelligence analysis whether the user has a willingness to interact and how strong that willingness is, and then autonomously control the intelligent interaction device to perform corresponding actions, actively interacting with the user and enriching the means of interaction, thereby improving the user experience.
  • An embodiment of the third aspect of the present invention discloses an intelligent interaction device, comprising the artificial intelligence-based intelligent interaction device control system according to the second aspect embodiment.
  • The intelligent interaction device can collect the user's sound signal, image signal, and/or distance signal in real time, determine through artificial-intelligence analysis whether the user has a willingness to interact and how strong that willingness is, and then autonomously control itself to perform corresponding actions, actively interacting with the user and enriching the means of interaction, thereby improving the user experience.
  • A fourth aspect of the present invention provides an apparatus comprising: one or more processors; a memory; and one or more programs stored in the memory which, when executed by the one or more processors, perform the artificial intelligence-based intelligent interaction device control method of the first aspect of the present invention.
  • A fifth aspect of the present invention provides a non-volatile computer storage medium storing one or more programs which, when executed by a device, cause the device to perform the artificial intelligence-based intelligent interaction device control method of the first aspect of the present invention.
  • FIG. 1 is a flowchart of an artificial intelligence based intelligent interactive device control method according to an embodiment of the present invention;
  • FIG. 2 is a structural block diagram of an artificial intelligence based intelligent interactive device control system according to an embodiment of the present invention;
  • FIG. 3 is a schematic diagram of an artificial intelligence based intelligent interactive device control system in accordance with one embodiment of the present invention.
  • The present invention realizes an artificial intelligence-based intelligent interaction device control method, control system, and intelligent interaction device with high intelligence and a good human interaction experience.
  • Artificial Intelligence (AI) is a new technical science that studies and develops theories, methods, techniques, and application systems for simulating, extending, and expanding human intelligence.
  • Artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that responds in a manner similar to human intelligence. Research in this area includes robotics, speech recognition, image recognition, natural language processing, and expert systems.
  • Artificial intelligence is a simulation of the information processes of human consciousness and thinking. Artificial intelligence is not human intelligence, but it can think like humans and may eventually exceed human intelligence. Artificial intelligence is a very broad science made up of different fields, such as machine learning and computer vision. In general, one of the main goals of artificial intelligence research is to enable machines to perform complex tasks that typically require human intelligence.
  • FIG. 1 is a flow chart of an artificial intelligence based intelligent interactive device control method according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:
  • S101 Receive a multi-modal input signal, where the multi-modal input signal includes an image signal, a sound signal, and/or a distance signal input by a user.
  • In an embodiment, the sound signal may be input by the user through a microphone, the image signal may be collected by a camera, and the distance signal may be acquired by an infrared distance sensor. A sketch of a container for one such multi-modal sample follows.
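  • As a minimal Python sketch only (the patent prescribes neither a language nor a data layout), one multi-modal sample from S101 can be carried in a small container; the field names below are illustrative assumptions, not terms from the patent.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np

@dataclass
class MultiModalInput:
    """One multi-modal sample: any field may be absent ("and/or" in S101)."""
    image: Optional[np.ndarray] = None      # camera frame, H x W x 3 BGR
    audio: Optional[np.ndarray] = None      # microphone samples, (n,) or (mics, n)
    distance_m: Optional[float] = None      # infrared distance reading, meters
```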
  • S102 Perform face detection according to the image signal, and acquire a face image and face information when a human face is detected.
  • The face information includes, but is not limited to, face area information and the degree to which the face is facing the device.
  • Specifically, a face detection means may be used to detect whether there is a face in the image, the area occupied by the face in the image, whether the face is facing the smart interaction device, and the like.
  • After detecting the presence of a face in the image, the face image can be cropped from the image and the face information saved (see the sketch below).
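  • The patent does not prescribe a particular detector; purely as an illustration, a face-detection and cropping step could use OpenCV's stock Haar cascade. The face information returned here (area ratio and bounding box) is a simplified stand-in for the face area and facing-degree information described above.

```python
import cv2

# Illustrative sketch: detect a face, crop it, and derive simple face info.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_face(frame):
    """Return (face_image, face_info), or (None, None) if no face is found."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None, None
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])   # keep the largest face
    face_image = frame[y:y + h, x:x + w]                 # crop the face region
    face_info = {
        "bbox": (int(x), int(y), int(w), int(h)),
        # fraction of the frame occupied by the face (proxy for "face area")
        "area_ratio": (w * h) / float(frame.shape[0] * frame.shape[1]),
    }
    return face_image, face_info
```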
  • S103 Perform lip detection according to the face image to determine the movement of the lip region.
  • The lip motion condition may be detected from the cropped face image by a lip-region detecting means.
  • The detection result indicates either that the lip region is moving or that it is not.
  • In an embodiment, the lip motion condition can be determined based on the lip-shape difference between multi-frame face images. For example, if the lip area of the previous frame's face image shows the upper and lower lips closed, and the lip area of the next frame's face image shows them open, it can be determined that the user's lip region is moving and the user may be starting to speak.
  • However, the upper and lower lips may move at a certain moment for reasons unrelated to speech, such as yawning; in such a case the user's lip region should not be considered to show speech-related motion. Therefore, to avoid misjudgment, the lip regions of consecutive multi-frame images can be compared to determine whether the upper and lower lips are genuinely moving, that is, whether the user is speaking. In this way, the recognition function is implemented. A sketch of this multi-frame comparison follows.
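  • A minimal sketch of the multi-frame comparison just described, assuming equal-sized grayscale crops of the lip region are already available; the thresholds are illustrative assumptions, not values from the patent.

```python
import numpy as np

def lips_moving(lip_crops, diff_threshold=12.0, min_active_pairs=3):
    """lip_crops: equal-sized grayscale lip-region arrays, oldest frame first."""
    active = 0
    for prev, cur in zip(lip_crops, lip_crops[1:]):
        # mean absolute pixel change between consecutive lip regions
        diff = np.mean(np.abs(cur.astype(np.float32) - prev.astype(np.float32)))
        if diff > diff_threshold:
            active += 1
    # Requiring motion across several consecutive frame pairs avoids
    # misjudging a one-off movement (e.g., a yawn) as speech, as noted above.
    return active >= min_active_pairs
```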
  • S104 Perform sound source localization according to the sound signal to obtain sound source information.
  • The speaker's voice is contained in the sound signal. The sound source information includes, but is not limited to, sound source orientation information and sound intensity information. Accordingly, a sound source localization means can perform sound source localization on the sound signal to determine the sound source orientation information (i.e., the sound source angle) and the sound intensity information; an illustrative localization sketch follows.
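  • The patent names no localization algorithm. As one hedged example, a two-microphone rig could estimate the source angle from the time difference of arrival via GCC-PHAT, and the sound intensity from signal energy; the sampling rate, microphone spacing, and reference level below are assumptions.

```python
import numpy as np

def source_angle_gcc_phat(sig, ref, fs=16000, mic_spacing=0.1, c=343.0):
    """Estimate the arrival angle (degrees) between two microphone channels."""
    n = len(sig) + len(ref)
    R = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    cc = np.fft.irfft(R / (np.abs(R) + 1e-12), n=n)     # PHAT weighting
    max_shift = int(fs * mic_spacing / c)               # physically possible lags
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    tau = (np.argmax(np.abs(cc)) - max_shift) / fs      # time delay of arrival
    return float(np.degrees(np.arcsin(np.clip(tau * c / mic_spacing, -1.0, 1.0))))

def sound_intensity_db(sig, ref_level=1.0):
    """Rough intensity in dB relative to an assumed full-scale reference."""
    rms = np.sqrt(np.mean(np.square(sig, dtype=np.float64))) + 1e-12
    return float(20.0 * np.log10(rms / ref_level))
```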
  • In practice, a sound signal contains a plurality of sounds, such as the user's voice and other noises. Therefore, in order to accurately locate the speaker's voice, before sound source localization is performed, the sound signal can be denoised to filter out interference and improve the positioning accuracy of sound source localization. Specifically, it is determined whether the sound signal contains the user's voice when speaking; if so, the user's voice in the sound signal is retained and other interference noise is filtered out.
  • The voice can be recognized by the speech recognition function of artificial intelligence: the speech recognition function recognizes the speaker's speech contained in the sound signal, so that other noises can be filtered out, thereby improving the positioning accuracy of sound source localization for the speaker's speech. A simplified stand-in for this check is sketched below.
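  • The text above relies on AI speech recognition to isolate the user's voice; a full recognizer is beyond a sketch, so the following stand-in merely checks whether a frame looks speech-like (sufficient energy concentrated in the speech band) before it is passed to localization. All thresholds are assumptions.

```python
import numpy as np

def looks_like_speech(frame, fs=16000, min_rms=0.01, min_band_ratio=0.5):
    """Crude voice-activity check on one mono frame of float samples in [-1, 1]."""
    rms = np.sqrt(np.mean(np.square(frame, dtype=np.float64)))
    if rms < min_rms:
        return False                                  # too quiet to be speech
    spectrum = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    band = (freqs >= 300.0) & (freqs <= 3400.0)       # typical speech band
    ratio = spectrum[band].sum() / (spectrum.sum() + 1e-12)
    return bool(ratio >= min_band_ratio)
```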
  • S105 Determine the user's willingness to interact and the strength of that willingness according to the face information, the lip motion condition, the sound source information, and/or the distance signal.
  • The user's willingness to interact and its strength may be determined according to any one of the face information, the lip motion condition, the sound source information, and the distance signal, or according to several or all of them together. Compared with judging from only one or a few of these signals, judging from several or all of them yields higher accuracy and reliability.
  • The predetermined intensity can be determined empirically; its purpose is to distinguish a high-intensity sound from a relatively low-intensity one. For example, the predetermined intensity may be expressed in decibels, such as 50 decibels: when the sound intensity is less than 50 decibels the sound source is considered low-intensity, otherwise it is considered high-intensity. The sound intensity can also be replaced by a voice activity index. The preset distance can likewise be determined empirically, for example 1 meter. On this basis, the following example judgments can be made (see the sketch after this list):
  • If the user faces the smart interaction device, the distance is close (such as within 1 meter), the lips are not moving, and there is no high-intensity sound source, it is determined that the user is interested in the smart interaction device and has a weak willingness to interact.
  • If the user faces the smart interaction device, the distance is close, the lips are moving, and there is no high-intensity sound source, the user is determined to have a suspected willingness to interact.
  • If the user faces the smart interaction device, the distance is close, the lips are moving, and there is a high-intensity sound source, it is determined that the user has a strong willingness to interact.
  • If the user's side face is facing the device, the distance is close, and there is a high-intensity sound source, it is determined that the user has a willingness to interact.
  • If there is a high-intensity sound source, the camera cannot detect a face, and the distance is close, it is judged that the user has a strong suspected willingness to interact.
  • If there is a high-intensity sound source, no face can be detected, and the distance is far (such as more than 1 meter), it is judged that the user has a weak suspected willingness to interact.
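  • The six example judgments above transcribe directly into a small decision function; the 50-decibel and 1-meter thresholds are the example values from the text, and the level names are informal labels, not claim language.

```python
def decide_willingness(face_front, face_side, lips_moving, sound_db, distance_m,
                       loud_db=50.0, near_m=1.0):
    """Map the example conditions above to a willingness level."""
    loud = sound_db >= loud_db
    near = distance_m <= near_m
    if face_front and near and not lips_moving and not loud:
        return "weak"               # interested, weak willingness
    if face_front and near and lips_moving and not loud:
        return "suspected"
    if face_front and near and lips_moving and loud:
        return "strong"
    if face_side and near and loud:
        return "willing"
    if not (face_front or face_side) and loud and near:
        return "strong_suspected"   # face not detected, loud, close
    if not (face_front or face_side) and loud and not near:
        return "weak_suspected"     # face not detected, loud, far
    return "none"
```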
  • In this way, a multi-class classifier over the interaction intentions is constructed from the multiple independent input features, and a comprehensive judgment is made on the values of the multi-modal input signal to accurately determine the interaction intention and respond accordingly; a classifier sketch follows.
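  • Where the rule table is replaced by a learned multi-class classifier as this paragraph suggests, a sketch under the assumption of labeled feature rows (face area ratio, facing degree, lip motion flag, sound intensity, source angle, distance) might look like this; the two training rows are purely hypothetical.

```python
from sklearn.ensemble import RandomForestClassifier

# Hypothetical labeled samples:
# [face_area_ratio, facing_degree, lips_moving, sound_db, source_angle, distance_m]
X = [
    [0.20, 0.9, 1, 62.0,   5.0, 0.6],
    [0.02, 0.1, 0, 30.0, -40.0, 2.5],
]
y = ["strong", "none"]                      # willingness labels

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)
print(clf.predict([[0.15, 0.8, 1, 55.0, 10.0, 0.8]]))   # -> ['strong']
```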
  • S106 Control the smart interaction device to perform a corresponding interactive response according to the user's willingness to interact and its strength.
  • The smart interaction device can be controlled to perform a silent response, such as displaying different expressions or performing simple mechanical actions without vocalization.
  • The smart interaction device may be controlled to perform a volume prompt response, such as issuing a prompt asking the user to speak louder.
  • the smart interaction device may be controlled to perform a formal interactive response, that is, formally interact with the user.
  • the smart interaction device may be controlled to perform a voice/chat interaction response, that is, the voice/chat interaction mode is mainly used.
  • the smart interaction device may be controlled to turn to the sound source direction and perform a prompt response, for example, turning the microphone to the sound source direction and prompting the user.
  • Alternatively, the smart interaction device may only be controlled to turn toward the sound source direction, for example turning the microphone toward the sound source without issuing a prompt. A dispatch sketch covering these response types follows.
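  • Taken together, the response types above form a simple dispatch from willingness level to response; the strings and print call stand in for the actuator commands the patent leaves abstract.

```python
RESPONSES = {
    "weak":             "silent response: expressions / simple motion, no sound",
    "suspected":        "volume prompt: ask the user to speak louder",
    "strong":           "formal interactive response",
    "willing":          "voice/chat interaction",
    "strong_suspected": "turn to the sound source and prompt the user",
    "weak_suspected":   "turn to the sound source only",
}

def respond(level):
    """Stand-in for the composite output control of the interaction device."""
    print(RESPONSES.get(level, "no response"))

respond("suspected")   # -> volume prompt: ask the user to speak louder
```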
  • In an embodiment, before the user's willingness to interact and its strength are determined according to the face information, the lip motion condition, the sound source information, and/or the distance signal, it is determined whether the face information, the lip motion condition, the sound source information, and/or the distance signal satisfy a predetermined condition; if the predetermined condition is met, the determination of the user's willingness to interact and its strength is performed.
  • For example, the above condition can be checked with a timer: when it is detected that the user's front face is facing the smart interaction device, the timer is started, and only after the face has been facing the device for more than a specific time (such as 3 seconds) is it determined that the user is indeed facing the device. In this way, misjudgment can be avoided.
  • If the user merely moves his or her head, the face may point toward the smart interaction device for a moment; with the above timing judgment, such momentary facing during head movement is ignored by the device, so the probability of misjudgment can be reduced or even eliminated. A sketch of this timer follows.
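  • A sketch of the timer-based confirmation, assuming the face detector is polled periodically; the 3-second hold is the example value from the text.

```python
import time

class FacingDebouncer:
    """Only report 'facing' after the face has stayed frontal for hold_seconds."""

    def __init__(self, hold_seconds=3.0):
        self.hold_seconds = hold_seconds
        self.facing_since = None

    def update(self, facing_now):
        if not facing_now:
            self.facing_since = None               # reset on any non-facing frame
            return False
        if self.facing_since is None:
            self.facing_since = time.monotonic()   # start the timer
        return time.monotonic() - self.facing_since >= self.hold_seconds
```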
  • The user's willingness to interact and its strength are then determined based on the face information, the lip motion condition, the sound source information, and/or the distance signal.
  • In an embodiment, the face information and the lip motion condition can be quantified. For example: 30% of the front face is facing the smart interaction device, or 50% of the face is facing the smart interaction device. In this way, a unified standard can be provided for judging the user's willingness to interact and its strength, thereby improving the accuracy of the judgment.
  • In an embodiment, the method further includes: adjusting the weights of the face information, the lip motion condition, the sound source information, and/or the distance signal, where the weights are used to influence the judgment of the user's willingness to interact and its strength. Determining the user's willingness to interact and its strength then includes: judging them according to the face information, the lip motion condition, the sound source information, and/or the distance signal together with their weights. The sensitivity (i.e., weight) of each signal can thus be tuned; a sketch follows.
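  • One way to realize the adjustable weights described above is a weighted willingness score; the weight values and normalizations here are illustrative assumptions.

```python
WEIGHTS = {"facing": 0.3, "lips": 0.2, "sound": 0.3, "proximity": 0.2}

def willingness_score(facing_degree, lips_moving, sound_db, distance_m,
                      loud_db=50.0, near_m=1.0, weights=WEIGHTS):
    """Combine the quantified signals into one score in roughly [0, 1]."""
    features = {
        "facing": facing_degree,                    # e.g., 0.3 for "30% frontal"
        "lips": 1.0 if lips_moving else 0.0,
        "sound": min(max(sound_db, 0.0) / loud_db, 1.0),
        "proximity": 1.0 if distance_m <= near_m else near_m / distance_m,
    }
    return sum(weights[name] * value for name, value in features.items())

# Raising a weight makes the judgment more sensitive to that signal:
print(willingness_score(0.5, True, 55.0, 0.8))   # -> 0.85
```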
  • The smart interaction device can be an ordinary household appliance, an information appliance (such as a computer or television), a video conferencing system, or an intelligent robot.
  • The artificial intelligence-based intelligent interaction device control method can collect the user's sound signal, image signal, and/or distance signal in real time, determine through artificial-intelligence analysis whether the user has a willingness to interact and how strong that willingness is, and then autonomously control the intelligent interaction device to perform corresponding actions, actively interacting with the user and enriching the means of interaction, thereby improving the user experience.
  • FIG. 2 is a structural block diagram of an artificial intelligence based intelligent interactive device control system according to an embodiment of the present invention.
  • As shown in FIG. 2, an artificial intelligence-based intelligent interaction device control system 200 includes: a receiving module 210 (such as a camera, an infrared distance sensor, and a microphone array), a face detection module 220, a lip detection module 230, a sound source positioning module 240, a decision module 250, and a composite output control module 260.
  • the receiving module 210 is configured to receive a multi-modal input signal, where the multi-modal input signal includes an image signal, a sound signal, and/or a distance signal input by a user.
  • the face detection module 220 is configured to perform face detection according to the image signal, and acquire the face image and face information when a human face is detected.
  • the lip detection module 230 is configured to perform lip detection based on the facial image to determine a lip motion condition.
  • the sound source positioning module 240 is configured to perform sound source localization according to the sound signal to obtain sound source information.
  • The decision module 250 is configured to determine the user's willingness to interact and the strength of that willingness based on the face information, the lip motion condition, the sound source information, and/or the distance signal.
  • The composite output control module 260 is configured to control the smart interaction device to perform a corresponding interactive response according to the user's willingness to interact and its strength.
  • In an embodiment, the system further includes a voice activity detecting module (not shown in FIG. 2), configured to determine, before the sound source positioning module 240 performs sound source localization according to the sound signal, whether the sound signal contains the user's voice when speaking, and if so, to retain the user's voice in the sound signal and filter out other interference noise.
  • A sound signal includes a plurality of sounds, such as the user's voice and other noises. Therefore, in order to accurately locate the speaker's voice, before sound source localization is performed according to the sound signal to obtain the sound source information, the sound signal can be denoised to filter out interference and improve the positioning accuracy of sound source localization. Specifically, it is determined whether the sound signal contains the user's voice when speaking; if so, the user's voice in the sound signal is retained and other interference noise is filtered out.
  • The voice can be recognized by the speech recognition function of artificial intelligence: the speech recognition function recognizes the speaker's speech contained in the sound signal, so that other noises can be filtered out, thereby improving the positioning accuracy of sound source localization for the speaker's speech.
  • In an embodiment, the decision module 250 is further configured to determine, before judging the user's willingness to interact and its strength according to the face information, the lip motion condition, the sound source information, and/or the distance signal, whether the face information, the lip motion condition, the sound source information, and/or the distance signal meet a predetermined condition; if the predetermined condition is met, the judgment of the user's willingness to interact and its strength is performed.
  • In an embodiment, the decision module 250 is further configured to quantify the face information and the lip motion condition before determining the user's willingness to interact and its strength according to the face information, the lip motion condition, the sound source information, and/or the distance signal.
  • In an embodiment, the decision module 250 is further configured to adjust the weights of the face information, the lip motion condition, the sound source information, and/or the distance signal, where the weights are used to influence the judgment of the user's willingness to interact and its strength; judging the user's willingness to interact and its strength then includes judging them according to the face information, the lip motion condition, the sound source information, and/or the distance signal together with their weights.
  • In an embodiment, the face information includes face area information and the degree to which the face is facing the device.
  • the sound source information includes sound source orientation information and sound intensity information
  • In an embodiment, the decision module 250 is configured to determine that the user has a weak willingness to interact when it is determined that the user is facing the smart interaction device, the user's lips are not moving, there is no high-intensity sound source (the sound intensity is less than the predetermined intensity), and the distance between the user and the smart interaction device is less than the preset distance; the composite output control module 260 is then configured to control the smart interaction device to perform a silent response.
  • In an embodiment, the decision module 250 is configured to determine that the user has a suspected willingness to interact when it is determined that the user is facing the smart interaction device, the user's lips are moving, the user vocalizes with a sound intensity less than the predetermined intensity, and the distance between the user and the smart interaction device is less than the preset distance; the composite output control module 260 is then configured to control the smart interaction device to perform a volume prompt response.
  • In an embodiment, the decision module 250 is configured to determine that the user has a strong willingness to interact when it is determined that the user is facing the smart interaction device, the user's lips are moving, the user vocalizes with a sound intensity greater than the predetermined intensity, and the distance between the user and the smart interaction device is less than the preset distance; the composite output control module 260 is then configured to control the smart interaction device to perform a formal interactive response.
  • In an embodiment, the decision module 250 is configured to determine that the user has a willingness to interact when it is determined that the user's side face is facing the smart interaction device, the user vocalizes with a sound intensity greater than the predetermined intensity, and the distance between the user and the smart interaction device is less than the preset distance; the composite output control module 260 is then configured to control the smart interaction device to perform a voice/chat interaction response.
  • In an embodiment, the decision module 250 is configured to determine that the user has a strong suspected willingness to interact when no face image is detected, the user vocalizes with a sound intensity greater than the predetermined intensity, and the distance between the user and the smart interaction device is less than the preset distance; the composite output control module 260 is then configured to control the smart interaction device to turn toward the sound source direction and issue a prompt response.
  • In an embodiment, the decision module 250 is configured to determine that the user has a weak suspected willingness to interact when no face image is detected, the user vocalizes with a sound intensity greater than the predetermined intensity, and the distance between the user and the smart interaction device is greater than the preset distance; the composite output control module 260 is then configured to control the smart interaction device to merely turn toward the sound source direction.
  • In an embodiment, the lip detection module 230 is configured to determine the lip motion condition according to lip-shape differences between multi-frame face images.
  • The artificial intelligence-based intelligent interaction device control system can collect the user's sound signal, image signal, and/or distance signal in real time, determine through artificial-intelligence analysis whether the user has a willingness to interact and how strong that willingness is, and then autonomously control the intelligent interaction device to perform corresponding actions, actively interacting with the user and enriching the means of interaction, thereby improving the user experience.
  • It should be noted that the specific implementation of the artificial intelligence-based intelligent interaction device control system in the embodiments of the present invention is similar to that of the artificial intelligence-based intelligent interaction device control method, and is therefore not repeated here.
  • an embodiment of the present invention discloses an intelligent interaction device, including: an artificial intelligence-based intelligent interaction device control system according to any one of the above embodiments.
  • The smart interaction device can collect the user's sound signal, image signal, and/or distance signal in real time, determine through artificial-intelligence analysis whether the user has a willingness to interact and how strong that willingness is, and then autonomously control itself to perform corresponding actions, actively interacting with the user and enriching the means of interaction, thereby improving the user experience.
  • The terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated.
  • Thus, features defined by "first" or "second" may explicitly or implicitly include at least one of those features.
  • the meaning of "a plurality” is at least two, such as two, three, etc., unless specifically defined otherwise.
  • a "computer-readable medium” can be any apparatus that can contain, store, communicate, propagate, or transport a program for use in an instruction execution system, apparatus, or device, or in conjunction with the instruction execution system, apparatus, or device.
  • More specific examples (a non-exhaustive list) of computer-readable media include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a fiber optic device, and a portable compact disc read-only memory (CDROM).
  • The computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
  • portions of the invention may be implemented in hardware, software, firmware or a combination thereof.
  • For example, multiple steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they can be implemented by any one or a combination of the following techniques well known in the art: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit with suitable combinational logic gates, a programmable gate array (PGA), a field programmable gate array (FPGA), and the like.
  • each functional unit in each embodiment of the present invention may be integrated into one processing module, or each unit may exist physically separately, or two or more units may be integrated into one module.
  • the above integrated modules can be implemented in the form of hardware or in the form of software functional modules.
  • the integrated modules, if implemented in the form of software functional modules and sold or used as stand-alone products, may also be stored in a computer readable storage medium.
  • the above mentioned storage medium may be a read only memory, a magnetic disk or an optical disk or the like.

Abstract

The invention concerns an artificial intelligence-based control method and system for an intelligent interaction device, and an intelligent interaction device. The method comprises: receiving multi-modal input signals, which include an image signal, a sound signal, and/or a distance signal input by a user (S101); performing human face detection according to the image signal, and acquiring, when a human face is detected, the face image and face information (S102); performing lip-region detection according to the face image to determine the lip-region motion condition (S103); performing sound source localization according to the sound signal to obtain sound source information (S104); determining the user's willingness to interact and the strength of that willingness according to the face information, the lip-region motion condition, the sound source information, and/or the distance signal (S105); and controlling, according to the user's willingness to interact and its strength, the intelligent interaction device to produce a corresponding interactive response (S106). With this method, the user's interaction experience when interacting with an intelligent interaction device is improved, as is the intelligence of the device.
PCT/CN2015/096587 2015-08-24 2015-12-07 Artificial intelligence-based control method and system for intelligent interaction device WO2017031860A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510523179.3 2015-08-24
CN201510523179.3A CN105159111B (zh) 2015-08-24 2015-08-24 基于人工智能的智能交互设备控制方法及系统

Publications (1)

Publication Number Publication Date
WO2017031860A1 true WO2017031860A1 (fr) 2017-03-02

Family

ID=54799999

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/096587 WO2017031860A1 (fr) 2015-12-07 Artificial intelligence-based control method and system for intelligent interaction device

Country Status (2)

Country Link
CN (1) CN105159111B (fr)
WO (1) WO2017031860A1 (fr)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107657852A (zh) * 2017-11-14 2018-02-02 翟奕雲 基于人脸识别的幼儿教学机器人、教学系统、存储介质
CN111124109A (zh) * 2019-11-25 2020-05-08 北京明略软件系统有限公司 一种交互方式的选择方法、智能终端、设备及存储介质
CN111694433A (zh) * 2020-06-11 2020-09-22 北京百度网讯科技有限公司 语音交互的方法、装置、电子设备及存储介质
CN111880854A (zh) * 2020-07-29 2020-11-03 百度在线网络技术(北京)有限公司 用于处理语音的方法和装置
CN114329654A (zh) * 2022-03-15 2022-04-12 深圳英鸿骏智能科技有限公司 一种基于智慧镜面的交互显示方法和系统

Families Citing this family (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105912128B (zh) * 2016-04-29 2019-05-24 北京光年无限科技有限公司 面向智能机器人的多模态交互数据处理方法及装置
CN106055105A (zh) * 2016-06-02 2016-10-26 上海慧模智能科技有限公司 机器人和人机交互系统
CN107643509B (zh) * 2016-07-22 2019-01-11 腾讯科技(深圳)有限公司 定位方法、定位系统及终端设备
CN106231234B (zh) * 2016-08-05 2019-07-05 广州小百合信息技术有限公司 视频会议的拍摄方法和系统
CN107273944A (zh) * 2017-05-16 2017-10-20 北京元视觉科技有限公司 自主社交的智能设备、自主交互方法及存储介质
CN107404682B (zh) * 2017-08-10 2019-11-05 京东方科技集团股份有限公司 一种智能耳机
CN109767774A (zh) * 2017-11-08 2019-05-17 阿里巴巴集团控股有限公司 一种交互方法和设备
CN109087636A (zh) * 2017-12-15 2018-12-25 蔚来汽车有限公司 交互设备
CN108388594A (zh) * 2018-01-31 2018-08-10 上海乐愚智能科技有限公司 穿衣提示方法及智能家电
CN108388138A (zh) * 2018-02-02 2018-08-10 宁夏玲杰科技有限公司 设备控制方法、装置及系统
CN108461084A (zh) * 2018-03-01 2018-08-28 广东美的制冷设备有限公司 语音识别系统控制方法、控制装置及计算机可读存储介质
CN108957392A (zh) * 2018-04-16 2018-12-07 深圳市沃特沃德股份有限公司 声源方向估计方法和装置
CN110634486A (zh) * 2018-06-21 2019-12-31 阿里巴巴集团控股有限公司 一种语音处理方法及设备
CN109035968B (zh) * 2018-07-12 2020-10-30 杜蘅轩 钢琴学习辅助系统和钢琴
CN109166575A (zh) * 2018-07-27 2019-01-08 百度在线网络技术(北京)有限公司 智能设备的交互方法、装置、智能设备和存储介质
CN110875060A (zh) * 2018-08-31 2020-03-10 阿里巴巴集团控股有限公司 语音信号处理方法、装置、系统、设备和存储介质
CN111230891B (zh) * 2018-11-29 2021-07-27 深圳市优必选科技有限公司 一种机器人及其语音交互系统
CN109803013B (zh) * 2019-01-21 2020-10-23 浙江大学 一种基于人工智能的弱交互系统及其控制方法
CN111724772A (zh) * 2019-03-20 2020-09-29 阿里巴巴集团控股有限公司 一种智能设备的交互方法、装置和智能设备
CN110187766A (zh) * 2019-05-31 2019-08-30 北京猎户星空科技有限公司 一种智能设备的控制方法、装置、设备及介质
CN110309799B (zh) * 2019-07-05 2022-02-08 四川长虹电器股份有限公司 基于摄像头的说话判断方法
CN110335603A (zh) * 2019-07-12 2019-10-15 四川长虹电器股份有限公司 应用于电视场景的多模态交互方法
CN111091823A (zh) * 2019-11-28 2020-05-01 广州赛特智能科技有限公司 基于语音及人脸动作的机器人控制系统、方法及电子设备
CN112102546A (zh) * 2020-08-07 2020-12-18 浙江大华技术股份有限公司 一种人机交互控制方法、对讲呼叫方法及相关装置
CN113608449B (zh) * 2021-08-18 2023-09-15 四川启睿克科技有限公司 一种智慧家庭场景下语音设备定位系统及自动定位方法

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008126329A (ja) * 2006-11-17 2008-06-05 Toyota Motor Corp 音声認識ロボットおよび音声認識ロボットの制御方法
JP2008152125A (ja) * 2006-12-19 2008-07-03 Toyota Central R&D Labs Inc 発話検出装置及び発話検出方法
CN102298443A (zh) * 2011-06-24 2011-12-28 华南理工大学 结合视频通道的智能家居语音控制系统及其控制方法
CN102360187A (zh) * 2011-05-25 2012-02-22 吉林大学 语谱图互相关的驾驶员汉语语音控制系统及方法
CN103745723A (zh) * 2014-01-13 2014-04-23 苏州思必驰信息科技有限公司 一种音频信号识别方法及装置

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6964023B2 (en) * 2001-02-05 2005-11-08 International Business Machines Corporation System and method for multi-modal focus detection, referential ambiguity resolution and mood classification using multi-modal input
WO2010126321A2 (fr) * 2009-04-30 2010-11-04 삼성전자주식회사 Appareil et procédé pour inférence d'intention utilisateur au moyen d'informations multimodes
KR101568347B1 (ko) * 2011-04-12 2015-11-12 한국전자통신연구원 지능형 로봇 특성을 갖는 휴대형 컴퓨터 장치 및 그 동작 방법
AU2014236686B2 (en) * 2013-03-15 2017-06-15 Ntt Disruption Us, Inc. Apparatus and methods for providing a persistent companion device
CN104777910A (zh) * 2015-04-23 2015-07-15 福州大学 一种表情识别应用于显示器的方法及系统

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008126329A (ja) * 2006-11-17 2008-06-05 Toyota Motor Corp 音声認識ロボットおよび音声認識ロボットの制御方法
JP2008152125A (ja) * 2006-12-19 2008-07-03 Toyota Central R&D Labs Inc 発話検出装置及び発話検出方法
CN102360187A (zh) * 2011-05-25 2012-02-22 吉林大学 语谱图互相关的驾驶员汉语语音控制系统及方法
CN102298443A (zh) * 2011-06-24 2011-12-28 华南理工大学 结合视频通道的智能家居语音控制系统及其控制方法
CN103745723A (zh) * 2014-01-13 2014-04-23 苏州思必驰信息科技有限公司 一种音频信号识别方法及装置

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107657852A (zh) * 2017-11-14 2018-02-02 翟奕雲 基于人脸识别的幼儿教学机器人、教学系统、存储介质
CN107657852B (zh) * 2017-11-14 2023-09-22 翟奕雲 基于人脸识别的幼儿教学机器人、教学系统、存储介质
CN111124109A (zh) * 2019-11-25 2020-05-08 北京明略软件系统有限公司 一种交互方式的选择方法、智能终端、设备及存储介质
CN111694433A (zh) * 2020-06-11 2020-09-22 北京百度网讯科技有限公司 语音交互的方法、装置、电子设备及存储介质
CN111880854A (zh) * 2020-07-29 2020-11-03 百度在线网络技术(北京)有限公司 用于处理语音的方法和装置
CN111880854B (zh) * 2020-07-29 2024-04-30 百度在线网络技术(北京)有限公司 用于处理语音的方法和装置
CN114329654A (zh) * 2022-03-15 2022-04-12 深圳英鸿骏智能科技有限公司 一种基于智慧镜面的交互显示方法和系统
CN114329654B (zh) * 2022-03-15 2022-05-20 深圳英鸿骏智能科技有限公司 一种基于智慧镜面的交互显示方法和系统

Also Published As

Publication number Publication date
CN105159111A (zh) 2015-12-16
CN105159111B (zh) 2019-01-25

Similar Documents

Publication Publication Date Title
WO2017031860A1 (fr) Artificial intelligence-based control method and system for intelligent interaction device
CN110291489B (zh) 计算上高效的人类标识智能助理计算机
CN107077847B (zh) 关键短语用户识别的增强
US9899025B2 (en) Speech recognition system adaptation based on non-acoustic attributes and face selection based on mouth motion using pixel intensities
US10019992B2 (en) Speech-controlled actions based on keywords and context thereof
CN112074901A (zh) 语音识别登入
US11699442B2 (en) Methods and systems for speech detection
KR102230667B1 (ko) 오디오-비주얼 데이터에 기반한 화자 분리 방법 및 장치
CN110808048A (zh) 语音处理方法、装置、系统及存储介质
WO2014209262A1 (fr) Détection de la parole sur la base de mouvements du visage
JP6562790B2 (ja) 対話装置および対話プログラム
US10325600B2 (en) Locating individuals using microphone arrays and voice pattern matching
KR20200085696A (ko) 사람의 감성 상태를 결정하기 위하여 영상을 처리하는 감성인식 방법
JP6891601B2 (ja) ロボットの制御プログラム、ロボット装置、及びロボットの制御方法
US20220335937A1 (en) Acoustic zoning with distributed microphones
CN115461811A (zh) 用于多方交互的多模态波束成形和注意力过滤
US11743588B1 (en) Object selection in computer vision
CN114449320A (zh) 一种播放控制方法、装置、存储介质及电子设备
EP3839719B1 (fr) Dispositif de calcul et son procédé de fonctionnement
JP2018051648A (ja) ロボット制御装置、ロボット、ロボット制御方法、及びプログラム
US20210392427A1 (en) Systems and Methods for Live Conversation Using Hearing Devices
JP2022147989A (ja) 発話制御装置、発話制御方法及び発話制御プログラム
CN117116250A (zh) 语音交互的拒识方法及装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15902129

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15902129

Country of ref document: EP

Kind code of ref document: A1