CN108052079B - Device control method, device control apparatus, and storage medium


Publication number: CN108052079B (other versions: CN108052079A)
Authority: CN (China)
Application number: CN201711315879.9A
Prior art keywords: operation instruction, recognition result, action, target object, acquiring
Inventors: 林形省, 冯智勇
Assignee (current and original): Beijing Xiaomi Mobile Software Co Ltd
Legal status: Active (granted)

Classifications

    • G05B 19/4183 - Total factory control characterised by data acquisition, e.g. workpiece identification
    • G05B 15/02 - Systems controlled by a computer, electric
    • G06V 40/172 - Human faces: classification, e.g. identification
    • G06V 40/174 - Facial expression recognition
    • G06V 40/18 - Eye characteristics, e.g. of the iris
    • G06V 40/28 - Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G05B 2219/2642 - Domotique, domestic, home control, automation, smart house
    • Y02P 90/02 - Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Abstract

The disclosure provides a device control method, a device control apparatus, and a storage medium, and belongs to the technical field of smart home. The method includes the following steps: when a target object is detected to be facing the smart device, collecting a facial action (an action of the facial features, such as a lip movement) or a limb action of the target object; acquiring, according to the collected action, an operation instruction corresponding to the action; and executing, in response to the operation instruction, the device control process corresponding to the operation instruction. By collecting the user's facial or limb actions and acquiring and responding to the corresponding operation instruction according to those actions, an instruction issued by the user can be received accurately even in a noisy environment, which improves the success rate of the device control method.

Description

Device control method, device control apparatus, and storage medium
Technical Field
The present disclosure relates to the field of smart home technologies, and in particular, to a device control method, a device control apparatus, and a storage medium.
Background
With the development of smart home technology, new smart devices are emerging constantly and can realize more and more functions. In many scenarios, a user needs to input relevant instructions to control a smart device.
At present, the user usually issues instructions by voice: when the smart device picks up the user's voice, it performs speech recognition on the sound, determines from the recognition result which voice instruction the sound corresponds to, and then executes the corresponding voice instruction, thereby controlling the smart device.
Disclosure of Invention
The present disclosure provides a device control method, a device control apparatus, and a storage medium, which can solve the problem in the related art that a user's instructions cannot be received reliably, for example in noisy environments, so that the success rate of device control is low.
According to a first aspect of the embodiments of the present disclosure, there is provided an apparatus control method applied to an intelligent apparatus, including:
when the target object is detected to be facing the smart device, collecting a facial action or a limb action of the target object;
acquiring an operation instruction corresponding to the action according to the acquired action;
and responding to the operation instruction, and executing the equipment control process corresponding to the operation instruction.
In one possible implementation manner of the first aspect, the method further includes:
acquiring images in real time, and determining that the target object faces the smart device when facial features are detected in an acquired image.
In one possible implementation manner of the first aspect, the method further includes:
carrying out image acquisition in real time, carrying out face detection on the acquired image, and positioning the eyeball center and the pupil center of a target object;
determining the direction of a connecting line of the eyeball center and the pupil center as the sight line direction of a target object;
when it is detected that the sight-line direction of the target object passes through the smart device, determining that the target object faces the smart device.
In one possible implementation manner of the first aspect, the method further includes:
when the target object is detected to face the intelligent equipment, starting a timing function, and acquiring images in real time in the timing process;
and when it is detected, before the timing reaches the preset duration, that the target object no longer faces the smart device, ignoring the operation instruction.
In a possible implementation manner of the first aspect, the obtaining, according to the collected action, an operation instruction corresponding to the action includes:
when the collected action is a lip action and/or a gesture action, performing lip language recognition on the lip action and/or performing gesture recognition on the gesture action to obtain a lip language recognition result and/or a gesture recognition result;
and acquiring an operation instruction corresponding to the action according to the lip language recognition result and/or the gesture recognition result.
In a possible implementation manner of the first aspect, the obtaining, according to the lip language recognition result and/or the gesture recognition result, an operation instruction corresponding to the action includes:
when the collected action is a lip action, acquiring an operation instruction matched with the lip language identification result;
when the collected action is a gesture action, acquiring an operation instruction matched with the gesture recognition result;
and when the collected actions are lip actions and gesture actions, respectively calculating weights corresponding to the recognition results, and acquiring the operation instructions corresponding to the actions based on the operation instructions and the weights corresponding to the recognition results.
In a possible implementation manner of the first aspect, the obtaining, according to the collected action, an operation instruction corresponding to the action further includes:
carrying out image acquisition in real time, and carrying out face recognition on the acquired image when facial features are detected in the acquired image to obtain a facial expression recognition result of the target object;
respectively calculating weights corresponding to the facial expression recognition result and the action recognition result;
and acquiring the operation instruction corresponding to the action based on the operation instruction corresponding to each recognition result and the weight.
In one possible implementation manner of the first aspect, the calculating weights corresponding to the recognition results, and acquiring the operation instruction corresponding to the action based on the operation instruction corresponding to each recognition result and the weight respectively includes:
respectively matching each recognition result with a preset operation instruction corresponding to the type of each recognition result to obtain at least one operation instruction corresponding to each recognition result and the probability corresponding to each operation instruction;
determining the operation instruction corresponding to the probability maximum value in the probabilities of the identification results as the operation instruction corresponding to the identification results;
determining the weight corresponding to each recognition result according to the corresponding relation between the preset probability maximum value and the weight;
based on the weight corresponding to each recognition result, performing weighted calculation on the operation instruction corresponding to each recognition result to obtain a comprehensive weight corresponding to each operation instruction;
and acquiring the operation instruction with the maximum comprehensive weight.
In one possible implementation manner of the first aspect, the method further includes:
performing voice acquisition in real time, and performing voice recognition on the acquired voice to obtain a voice recognition result of the target object;
matching the voice recognition result with a preset operation instruction corresponding to the voice to obtain at least one operation instruction corresponding to the voice recognition result and the probability corresponding to each operation instruction;
when the probability maximum value in the probabilities of the voice recognition results is smaller than a preset threshold value, respectively calculating the weight corresponding to each recognition result;
and executing the step of acquiring the operation instruction corresponding to the action based on the operation instruction and the weight corresponding to each recognition result.
In one possible implementation manner of the first aspect, the method further includes:
when the probability maximum value in the probabilities of the voice recognition results is larger than or equal to a preset threshold value, acquiring an operation instruction corresponding to the probability maximum value in the probabilities of the voice recognition results;
and executing the step of responding to the operating instruction and executing the equipment control process corresponding to the operating instruction.
According to a second aspect of the embodiments of the present disclosure, there is provided an apparatus control device applied to an intelligent apparatus, including:
the acquisition module is used for collecting a facial action or a limb action of the target object when the target object is detected to be facing the smart device;
the acquisition module is used for acquiring an operation instruction corresponding to the action according to the acquired action;
and the processing module is used for responding to the operation instruction and executing the equipment control process corresponding to the operation instruction.
In one possible implementation manner of the second aspect, the apparatus further includes:
the first detection module is used for acquiring images in real time, and for determining that the target object faces the smart device when facial features are detected in an acquired image.
In one possible implementation manner of the second aspect, the apparatus further includes:
the positioning module is used for acquiring images in real time, detecting faces of the acquired images and positioning the eyeball center and the pupil center of the target object;
the first determination module is used for determining the direction of a connecting line of the eyeball center and the pupil center as the sight line direction of the target object;
and the second detection module is used for determining that the target object faces the intelligent equipment when the fact that the sight line direction of the target object passes through the intelligent equipment is detected.
In one possible implementation manner of the second aspect, the apparatus further includes:
the timing module is used for starting a timing function when the target object is detected to face the intelligent equipment, and acquiring images in real time in the timing process;
the processing module is further configured to ignore the operation instruction when the timing does not reach the preset duration and the target object is detected to no longer face the intelligent device.
In one possible implementation manner of the second aspect, the apparatus further includes:
the action recognition module is used for carrying out lip language recognition on the lip action and/or carrying out gesture recognition on the gesture action when the collected action is the lip action and/or the gesture action to obtain a lip language recognition result and/or a gesture recognition result;
and the acquisition module is used for acquiring the operation instruction corresponding to the action according to the lip language recognition result and/or the gesture recognition result.
In one possible implementation manner of the second aspect, the obtaining module is configured to:
when the collected action is a lip action, acquiring an operation instruction matched with the lip language identification result; or the like, or, alternatively,
when the collected action is a gesture action, acquiring an operation instruction matched with the gesture recognition result; or the like, or, alternatively,
and when the collected actions are lip actions and gesture actions, respectively calculating weights corresponding to the recognition results, and acquiring the operation instructions corresponding to the actions based on the operation instructions and the weights corresponding to the recognition results.
In one possible implementation manner of the second aspect, the apparatus further includes:
the face recognition module is used for acquiring images in real time, and when facial features are detected in the acquired images, the face recognition module is used for carrying out face recognition on the acquired images to obtain a facial expression recognition result of the target object;
the first calculation module is used for respectively calculating the weights corresponding to the facial expression recognition result and the action recognition result;
the obtaining module is further configured to obtain an operation instruction corresponding to the action based on the operation instruction and the weight corresponding to each recognition result.
In one possible implementation manner of the second aspect, the apparatus further includes:
the first matching module is used for respectively matching each recognition result with a preset operation instruction corresponding to the type of each recognition result to obtain at least one operation instruction corresponding to each recognition result and the probability corresponding to each operation instruction;
the second determining module is used for determining the operation instruction corresponding to the probability maximum value in the probabilities of the identification results as the operation instruction corresponding to the identification results;
the second determining module is further configured to determine a weight corresponding to each recognition result according to a preset correspondence between the maximum probability value and the weight;
the second calculation module is used for performing weighted calculation on the operation instructions corresponding to the identification results based on the weights corresponding to the identification results to obtain a comprehensive weight corresponding to each operation instruction;
the obtaining module is further configured to obtain the operation instruction with the largest comprehensive weight.
In one possible implementation manner of the second aspect, the apparatus further includes:
the voice recognition module is used for carrying out voice collection in real time and carrying out voice recognition on the collected voice to obtain a voice recognition result of the target object;
the second matching module is used for matching the voice recognition result with a preset operation instruction corresponding to the voice to obtain at least one operation instruction corresponding to the voice recognition result and the probability corresponding to each operation instruction;
the third calculation module is used for calculating the weight corresponding to each recognition result when the probability maximum value in the probabilities of the voice recognition results is smaller than a preset threshold value;
the obtaining module is further configured to execute the step of obtaining the operation instruction corresponding to the action based on the operation instruction and the weight corresponding to each recognition result.
In a possible implementation manner of the second aspect, the obtaining module is further configured to obtain, when a maximum probability value in the probabilities of the speech recognition results is greater than or equal to a preset threshold, an operation instruction corresponding to the maximum probability value in the probabilities of the speech recognition results;
and the processing module is used for executing the step of responding to the operating instruction and executing the equipment control process corresponding to the operating instruction.
According to a third aspect of the embodiments of the present disclosure, there is provided an apparatus control device including: a processor; a memory for storing processor-executable instructions;
wherein the processor is configured to: when the target object is detected to be facing the smart device, collect a facial action or a limb action of the target object; acquire, according to the collected action, an operation instruction corresponding to the action; and execute, in response to the operation instruction, the device control process corresponding to the operation instruction.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium having stored therein a computer program which, when executed by a processor, performs the method steps of any one of the first aspects.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
the embodiment of the disclosure collects the five sense organ actions or the limb actions of the user, acquires and responds to the corresponding operation instruction according to the actions of the user, and can accurately receive the instruction sent by the user even in a noisy environment, so that the success rate of the device control method can be improved. It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a flowchart illustrating a device control method according to an exemplary embodiment.
Fig. 2 is a flowchart illustrating a device control method according to an exemplary embodiment.
Fig. 3 is a flowchart illustrating a device control method according to an exemplary embodiment.
Fig. 4 is a block diagram illustrating an apparatus control device according to an exemplary embodiment.
Fig. 5 is a block diagram illustrating an apparatus control device according to an exemplary embodiment.
Fig. 6 is a block diagram illustrating an apparatus control device according to an exemplary embodiment.
Fig. 7 is a block diagram illustrating an apparatus control device according to an exemplary embodiment.
Fig. 8 is a block diagram illustrating an apparatus control device according to an exemplary embodiment.
Fig. 9 is a block diagram illustrating an apparatus control device according to an exemplary embodiment.
Fig. 10 is a block diagram illustrating an apparatus control device according to an exemplary embodiment.
Fig. 11 is a block diagram illustrating an apparatus control device according to an exemplary embodiment.
Fig. 12 is a block diagram illustrating an appliance control apparatus 1200 according to an example embodiment.
Detailed Description
To make the objects, technical solutions and advantages of the present disclosure more apparent, embodiments of the present disclosure will be described in detail with reference to the accompanying drawings.
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
Fig. 1 is a flowchart illustrating a device control method according to an exemplary embodiment. The method is used in a smart device and, as shown in Fig. 1, includes the following steps.
In step 101, when it is detected that the target object is facing the smart device, the smart device collects a facial action or a limb action of the target object.
In step 102, according to the collected action, the smart device obtains an operation instruction corresponding to the action.
In step 103, the smart device responds to the operation instruction, and executes a device control process corresponding to the operation instruction.
According to the method, the user's facial or limb actions are collected, and the corresponding operation instruction is acquired and responded to according to the user's action; an instruction issued by the user can thus be received accurately even in a noisy environment, which improves the success rate of the device control method.
In one possible implementation, the method further comprises: acquiring images in real time, and determining that the target object faces the smart device when facial features are detected in an acquired image.
In one possible implementation, the method further comprises:
carrying out image acquisition in real time, carrying out face detection on the acquired image, and positioning the eyeball center and the pupil center of a target object;
determining the direction of a connecting line of the eyeball center and the pupil center as the sight line direction of the target object;
when it is detected that the sight-line direction of the target object passes through the smart device, it is determined that the target object faces the smart device.
In one possible implementation, the method further comprises:
when the target object is detected to face the intelligent equipment, starting a timing function, and acquiring images in real time in the timing process;
and when it is detected, before the timing reaches the preset duration, that the target object no longer faces the smart device, ignoring the operation instruction.
In a possible implementation manner, the obtaining, according to the collected action, an operation instruction corresponding to the action includes:
when the collected action is a lip action and/or a gesture action, performing lip language recognition on the lip action and/or performing gesture recognition on the gesture action to obtain a lip language recognition result and/or a gesture recognition result;
and acquiring an operation instruction corresponding to the action according to the lip language recognition result and/or the gesture recognition result.
In a possible implementation manner, the obtaining, according to the lip language recognition result and/or the gesture recognition result, an operation instruction corresponding to the action includes:
when the collected action is a lip action, acquiring an operation instruction matched with the lip language identification result;
when the collected action is a gesture action, acquiring an operation instruction matched with the gesture recognition result;
and when the collected actions are lip actions and gesture actions, respectively calculating weights corresponding to the recognition results, and acquiring the operation instruction corresponding to the action based on the operation instruction and the weights corresponding to the recognition results.
In a possible implementation manner, the obtaining, according to the collected action, an operation instruction corresponding to the action further includes:
carrying out image acquisition in real time, and carrying out face recognition on the acquired image when facial features are detected in the acquired image to obtain a facial expression recognition result of the target object;
respectively calculating the weights corresponding to the facial expression recognition result and the action recognition result;
and acquiring the operation instruction corresponding to the action based on the operation instruction corresponding to each recognition result and the weight.
In one possible implementation manner, the calculating weights corresponding to the recognition results respectively, and acquiring the operation instruction corresponding to the action based on the operation instruction and the weights corresponding to the recognition results respectively includes:
respectively matching each recognition result with a preset operation instruction corresponding to the type of each recognition result to obtain at least one operation instruction corresponding to each recognition result and the probability corresponding to each operation instruction;
determining the operation instruction corresponding to the probability maximum value in the probabilities of the identification results as the operation instruction corresponding to the identification results;
determining the weight corresponding to each recognition result according to the corresponding relation between the preset probability maximum value and the weight;
based on the weight corresponding to each recognition result, performing weighted calculation on the operation instruction corresponding to each recognition result to obtain a comprehensive weight corresponding to each operation instruction;
and acquiring the operation instruction with the maximum comprehensive weight.
In one possible implementation, the method further comprises:
performing voice acquisition in real time, and performing voice recognition on the acquired voice to obtain a voice recognition result of the target object;
matching the voice recognition result with a preset operation instruction corresponding to the voice to obtain at least one operation instruction corresponding to the voice recognition result and the probability corresponding to each operation instruction;
when the probability maximum value in the probabilities of the voice recognition results is smaller than a preset threshold value, respectively calculating the weight corresponding to each recognition result;
and executing the step of acquiring the operation instruction corresponding to the action based on the operation instruction corresponding to each recognition result and the weight.
In one possible implementation, the method further comprises:
when the probability maximum value in the probabilities of the voice recognition results is larger than or equal to a preset threshold value, acquiring an operation instruction corresponding to the probability maximum value in the probabilities of the voice recognition results;
and executing the step of responding to the operation instruction and executing the equipment control process corresponding to the operation instruction.
Fig. 2 is a flowchart illustrating a device control method applied to an intelligent device according to an exemplary embodiment, and referring to fig. 2, the device control method includes the steps of:
In step 201, when it is detected that the target object is facing the smart device, the smart device collects a facial action or a limb action of the target object.
An action is a behavior that can be recognized from images, and therefore an action-based control function can be provided on the smart device. The action-based control function may be as follows: when the smart device detects, through its camera, that the target object is facing the smart device, it can recognize the target object's action in order to determine and execute the operation instruction corresponding to that action; the action of the target object may be a facial action or a limb action. When the smart device determines that the target object is facing it, the user is likely to be exercising operational control over it; the actions of the target object can therefore be collected to determine whether the target object is conveying relevant control information to the smart device.
The detection by the smart device that the target object faces the smart device may include, but is not limited to, the following two implementation manners:
In the first implementation manner, the smart device acquires images in real time, and when facial features are detected in an acquired image, it is determined that the target object faces the smart device.
In this first implementation manner, the fact that the smart device can detect facial features indicates that the face is oriented roughly toward the smart device. Further, the smart device can also locate the pupil centers of the two eyes of the face, calculate the distance between the two pupil centers, and determine that the target object faces the smart device when this distance falls within a preset range, where the preset range corresponds to the interpupillary distance of a typical user.
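As a rough illustration of this first implementation, the following Python sketch (not from the patent; the distance bounds and the upstream pupil locator are assumptions) checks whether the two located pupil centers are a plausible interpupillary distance apart:

    import math

    # Preset interpupillary-distance range, in pixels at the expected viewing
    # distance; these bounds are illustrative assumptions, not patent values.
    MIN_PUPIL_DIST, MAX_PUPIL_DIST = 40.0, 120.0

    def faces_device(left_pupil: tuple[float, float],
                     right_pupil: tuple[float, float]) -> bool:
        """Return True when the distance between the two located pupil centers
        falls within the preset range, i.e. the target object is taken to be
        facing the smart device."""
        return MIN_PUPIL_DIST <= math.dist(left_pupil, right_pupil) <= MAX_PUPIL_DIST

    # Pupil centers as located by an upstream face detector (hypothetical values).
    print(faces_device((312.0, 240.5), (388.0, 242.0)))  # True: ~76 px apart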
In a second implementation manner, the smart device acquires images in real time, performs face detection on the acquired images, and locates the eyeball center and the pupil center of the target object; the direction of the line connecting the eyeball center and the pupil center is determined as the sight-line direction of the target object; and when it is detected that the sight-line direction of the target object passes through the smart device, it is determined that the target object faces the smart device.
In this second implementation manner, the direction of the line connecting the eyeball center and the pupil center generally coincides with a person's sight-line direction; when the sight-line direction of the target object is determined to pass through the smart device, the target object can be taken to be looking at, and therefore facing, the smart device.
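A minimal sketch of this second implementation, assuming 3D eyeball-center, pupil-center, and device positions are already available from upstream face detection (the tolerance value is an illustrative assumption):

    import numpy as np

    def sight_line(eyeball_center: np.ndarray, pupil_center: np.ndarray) -> np.ndarray:
        """The direction of the line from the eyeball center through the pupil
        center is taken as the target object's sight-line direction."""
        v = pupil_center - eyeball_center
        return v / np.linalg.norm(v)

    def looks_at_device(eyeball_center: np.ndarray, pupil_center: np.ndarray,
                        device_center: np.ndarray, tolerance: float = 0.05) -> bool:
        """True when the sight line passes within `tolerance` (metres, assumed)
        of the device center, i.e. the gaze passes through the smart device."""
        d = sight_line(eyeball_center, pupil_center)
        to_device = device_center - eyeball_center
        closest = eyeball_center + d * np.dot(to_device, d)  # nearest point on the gaze ray
        return bool(np.linalg.norm(device_center - closest) <= tolerance)

    print(looks_at_device(np.array([0.0, 0.0, 0.0]), np.array([0.0, 0.0, 0.01]),
                          np.array([0.0, 0.02, 1.0])))  # True: gaze nearly hits the device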
After the smart device has performed the above detection, the specific process of collecting a facial action or a limb action of the target object is as follows:
Taking a lip action as an example of a facial action, when the smart device acquires an image, it can detect the lip region of the target object in the image and determine whether the target object is performing a lip action. Specifically, the smart device may compare the lip contour of the target object across multiple frames and determine whether the lip contour changes.
Similarly, the smart device may also detect a limb region of the target object in the image. The limb motion of the target object may be a static motion (e.g., a held gesture) or a dynamic motion (e.g., a hand-waving motion), so the smart device may locate the limb of the target object in each frame of the image in order to recognize the limb motion.
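The frame-to-frame comparison described above might look like the following sketch, assuming an upstream landmarker that returns the lip contour of each frame as an (N, 2) array (the displacement threshold is an assumption):

    import numpy as np

    def lip_moving(contours: list[np.ndarray], threshold: float = 2.0) -> bool:
        """Compare the target object's lip contour across consecutive frames;
        if the mean point-wise displacement between any two adjacent frames
        exceeds the threshold (in pixels), the lip contour is taken to have
        changed, i.e. a lip action is detected."""
        return any(float(np.abs(curr - prev).mean()) > threshold
                   for prev, curr in zip(contours, contours[1:]))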
Furthermore, the smart device can also be controlled by voice; however, in a noisy environment, or when the user speaks quietly, the success rate of the smart device's speech recognition is low, so the smart device can be controlled by combining a voice-based control function with an action-based control function.
In step 202, when the collected motion is a lip motion and/or a gesture motion, the smart device performs lip language recognition on the lip motion and/or performs gesture recognition on the gesture motion to obtain a lip language recognition result and/or a gesture recognition result.
Because recognition of lip actions and gesture actions is comparatively accurate, and lip and gesture actions are convenient for users to perform consistently, the smart device is typically configured to collect the user's lip actions and/or gesture actions. Of course, the smart device can also collect other facial or limb actions, such as eye actions or leg actions. The embodiments of the present disclosure take lip actions and gesture actions as examples.
When collecting the action of the target object, the smart device executes the recognition step corresponding to the kind of action collected. In practical applications, step 202 may cover the following three cases:
in the first situation, when the collected action is a lip action, the intelligent device conducts lip language recognition on the lip action to obtain a lip language recognition result.
In this first case, the smart device performs lip-contour detection on the acquired image and extracts the lip feature of the target object; the lip feature is then compared with preset lip-language identification information, which can be stored in advance in a database of the smart device, to obtain the lip-language recognition result corresponding to that feature.
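The comparison against the stored database can be pictured as a nearest-match lookup, as in this sketch (the phrases, feature vectors, and distance metric are illustrative assumptions):

    import numpy as np

    # Preset lip-language identification information, stored in advance in the
    # smart device's database (entries are hypothetical).
    LIP_DATABASE = {
        "Hi, XX": np.array([0.9, 0.1, 0.3]),
        "pause":  np.array([0.2, 0.8, 0.5]),
    }

    def recognise_lips(lip_feature: np.ndarray) -> str:
        """Compare the extracted lip feature with each preset entry and return
        the closest match as the lip-language recognition result."""
        return min(LIP_DATABASE,
                   key=lambda phrase: float(np.linalg.norm(LIP_DATABASE[phrase] - lip_feature)))

    print(recognise_lips(np.array([0.85, 0.15, 0.35])))  # -> "Hi, XX"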
In practical implementation, the smart device may perform model training based on a large number of images including the target object and a known lip language recognition result to establish a lip language recognition model, and when the smart device detects that the target object faces the smart device, the smart device may input the acquired image of the target object into the lip language recognition model, or input the image of the lip region of the target object into the lip language recognition model, and the lip language recognition process is implemented by the lip language recognition model, which is not described herein in detail.
And in the second situation, when the collected action is a gesture action, the intelligent equipment performs gesture recognition on the gesture action to obtain a gesture recognition result.
In the second case, the smart device may locate the hand of the target object in the collected multi-frame images and match the located gesture of the target object against gesture templates to determine which one or more of the templates the gesture may correspond to. When the gesture-location results of the target object are the same across the multiple frames, the gesture collected by the smart device is a static gesture. When the gesture-location results differ across the frames, the collected gesture is a dynamic gesture. For a dynamic gesture, in addition to the template matching above, the smart device can analyze the motion trajectory of the gesture and determine the gesture action corresponding to that trajectory. Of course, the smart device may also recognize gestures through other gesture recognition algorithms, such as edge-contour extraction or multi-feature fusion methods (e.g., combining centroid and finger features), which the present disclosure does not specifically limit.
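The static/dynamic distinction drawn above can be sketched as follows, assuming the hand has already been located in each frame (the position tolerance is an assumption):

    import numpy as np

    def classify_gesture(hand_positions: list[np.ndarray], eps: float = 5.0) -> str:
        """If the located hand position is (nearly) identical across all frames,
        the collected gesture is static and is matched against pose templates;
        otherwise it is dynamic and its motion trajectory is analysed as well."""
        ref = hand_positions[0]
        if all(float(np.linalg.norm(p - ref)) <= eps for p in hand_positions[1:]):
            return "static"   # match the pose against gesture templates
        return "dynamic"      # additionally analyse the motion trajectory

    print(classify_gesture([np.array([100.0, 80.0]),
                            np.array([140.0, 60.0]),
                            np.array([180.0, 85.0])]))  # -> "dynamic"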
In practical implementation, the smart device may perform model training based on a large number of images including gesture motions of the target object and a known gesture recognition result to establish a gesture recognition model, and when the smart device detects that the target object faces the smart device, the acquired image of the target object may be input into the gesture recognition model, and the gesture recognition process is implemented by the gesture recognition model, which is not repeated herein.
And in the third situation, when the collected actions are lip actions and gesture actions, the intelligent equipment performs lip language recognition on the lip actions and performs gesture recognition on the gesture actions to obtain lip language recognition results and gesture recognition results.
In the third case, when it is detected that the user faces the intelligent device, the intelligent device collects both the lip motion and the gesture motion, so that the intelligent device can respectively recognize the two motions to obtain recognition results of the two motions, and the specific recognition processes of the two motions are the same as those in the first case and the second case, which is not repeated herein.
In step 203, according to the lip language recognition result and/or the gesture recognition result, the smart device obtains an operation instruction corresponding to the action.
The smart device may store a plurality of preset operation instructions, and may also store correspondences between recognition results of different types and the preset operation instructions; for example, it may store correspondences between lip-language recognition results and preset operation instructions, as well as correspondences between individual lip-language recognition results and particular preset operation instructions. Of course, the user may also customize, on the smart device, which recognition result corresponds to which preset operation instruction according to the user's own habits, which the present disclosure does not limit.
Different collected actions yield different recognition results, and the smart device acquires the corresponding operation instruction according to the recognition result. Corresponding to the three cases in step 202, step 203 likewise covers three cases:
in the first case, when the collected action is a lip action, the intelligent device acquires an operation instruction matched with the lip language recognition result.
And in the second situation, when the collected action is a gesture action, the intelligent equipment acquires an operation instruction matched with the gesture recognition result.
In both the first case and the second case a single action is collected, and the smart device obtains the matching operation instruction in the same way in each; the lip action is therefore taken as the example here. The specific process of obtaining the operation instruction matched with the lip-language recognition result may include the following three steps:
(1) the intelligent device matches the lip language recognition result with the preset operation instruction corresponding to the lip action to obtain at least one preset operation instruction corresponding to the lip language recognition result, and calculates the probability when the lip language recognition result corresponds to each preset operation instruction.
(2) And the intelligent equipment determines the operation instruction corresponding to the maximum probability value in the probabilities of the lip language identification results as the operation instruction matched with the lip language identification results.
(3) And the intelligent equipment acquires the operation instruction matched with the lip language identification result.
For example, the smart device may be a smart speaker. The smart speaker performs lip-language recognition on the collected images to obtain a lip-language recognition result, and matches that result against the preset operation instructions, finding that it may correspond to a wake-up instruction, a play instruction, or a pause instruction, with probabilities of 85%, 10%, and 5% respectively. The smart speaker then determines the wake-up instruction, whose probability of 85% is the maximum, as the operation instruction matching the lip-language recognition result, and acquires it.
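Steps (1) to (3) above amount to an argmax over the matching probabilities; a minimal sketch using the numbers from this example:

    # Probabilities obtained in step (1) by matching the lip-language recognition
    # result against the preset operation instructions (values from the example).
    matches = {"wake_up": 0.85, "play": 0.10, "pause": 0.05}

    # Steps (2)-(3): take the instruction with the maximum probability as the
    # operation instruction matching the lip-language recognition result.
    instruction = max(matches, key=matches.get)
    print(instruction)  # -> "wake_up"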
And in the third situation, when the collected actions are lip actions and gesture actions, the intelligent device respectively calculates weights corresponding to the recognition results, and obtains the operation instructions corresponding to the actions based on the operation instructions and the weights corresponding to the recognition results.
In the third situation, the intelligent device collects both the lip action and the gesture action, so that the operation instruction corresponding to the action can be determined by combining the recognition results of the lip action and the gesture action, the action of the target object can be recognized in a multi-dimensional manner, and the success rate of controlling the intelligent device is improved.
In the third case, the specific process of calculating the weight and obtaining the operation instruction corresponding to the action of the target object based on the weight may be implemented by the following steps (1) to (5):
(1) and the intelligent equipment respectively matches each recognition result with a preset operation instruction corresponding to the type of each recognition result to obtain at least one operation instruction corresponding to each recognition result and the probability corresponding to each operation instruction.
(2) And the intelligent equipment determines the operation instruction corresponding to the maximum probability value in the probabilities of the identification results as the operation instruction corresponding to the identification results.
(3) And the intelligent equipment determines the weight corresponding to each recognition result according to the corresponding relation between the preset probability maximum value and the weight.
(4) And the intelligent equipment performs weighted calculation on the operation instructions corresponding to the identification results based on the weights corresponding to the identification results to obtain the comprehensive weight corresponding to each operation instruction.
(5) And the intelligent equipment acquires the operation instruction with the maximum comprehensive weight.
For example, the smart device may be a smart speaker that has collected both a lip action and a gesture action. Through matching, the operation instruction corresponding to the lip-language recognition result is determined to be a wake-up instruction with probability 85%, and the gesture recognition result (a hand wave, say) is determined to correspond to a play instruction with probability 50%. The smart speaker determines that the weight corresponding to the lip-language result's 85% is 0.85 and that the weight corresponding to the gesture result's 50% is 0.5; weighted calculation then gives the wake-up instruction a composite weight of 0.85 and the play instruction a composite weight of 0.5, so the smart speaker acquires the wake-up instruction, whose composite weight of 0.85 is the largest. Alternatively, with two types of recognition results, relative weights may be computed: the weight of the lip-language result is 0.85/(0.85+0.5), that is, 0.63, and the weight of the gesture result is 0.5/(0.85+0.5), that is, 0.37; the smart speaker again acquires the wake-up instruction, which has the larger composite weight.
It should be noted that the correspondence between the maximum probability value and the weight is merely an exemplary description in the embodiment of the present disclosure, and when the number of the types of the collected recognition results changes, the adjustment rule of the correspondence between the maximum probability value and the weight may be determined by a person skilled in the art through an experimental result, so as to improve the success rate of the intelligent device in controlling the device in combination with multiple recognition results.
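A compact sketch of steps (1) to (5), under the simplifying assumption made in the worked example that each recognition result's weight equals its maximum matching probability (the exact probability-to-weight correspondence is, as noted above, left to experimental tuning):

    def fuse_instructions(results: dict[str, dict[str, float]]) -> str:
        """`results` maps each recognition-result type (e.g. "lip", "gesture")
        to its {instruction: probability} matches from step (1); returns the
        operation instruction with the largest composite weight."""
        composite: dict[str, float] = {}
        for matches in results.values():
            instruction = max(matches, key=matches.get)  # step (2): argmax instruction
            weight = matches[instruction]                # step (3): assumed prob-to-weight map
            composite[instruction] = composite.get(instruction, 0.0) + weight  # step (4)
        return max(composite, key=composite.get)         # step (5): largest composite weight

    # The worked example: lip result -> wake-up at 85%, gesture result -> play at 50%.
    print(fuse_instructions({
        "lip":     {"wake_up": 0.85, "play": 0.10, "pause": 0.05},
        "gesture": {"play": 0.50, "wake_up": 0.30, "pause": 0.20},
    }))  # -> "wake_up" (composite weight 0.85 vs 0.5)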
The above steps 202 and 203 are processes of acquiring an operation instruction corresponding to the collected action, and the intelligent device may collect various actions and implement operation control on the intelligent device based on the identification result of the action.
In step 204, the smart device responds to the operation instruction, and executes a device control process corresponding to the operation instruction.
For example, the intelligent device obtains an operation instruction, the operation instruction is to wake up the intelligent device, and when the intelligent device responds to the operation instruction, the intelligent device is switched from a sleep state to a working state.
In a possible implementation manner, the smart device may further judge, from how long the target object faces it, whether the target object intends to control it: if the target object faces the smart device only briefly, the target object may simply be going about daily activities with no intention of controlling the device. Specifically, when it is detected that the target object is facing the smart device, the smart device starts a timing function and acquires images in real time while timing; if, before the timing reaches a preset duration, it detects that the target object no longer faces it, the smart device may ignore the operation instruction. In practical applications, the preset duration can be adjusted to the user's actual needs. Setting the preset duration prevents the device from responding to the user's everyday behavior and improves the accuracy of the device control method.
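A sketch of the timing check, with an assumed preset duration and capture interval (the patent leaves both to the user's needs):

    import time
    from typing import Callable

    PRESET_DURATION = 2.0   # seconds; illustrative, adjustable to the user's needs
    CHECK_INTERVAL = 0.1    # image-capture interval while timing; assumed

    def orientation_held(still_facing: Callable[[], bool]) -> bool:
        """Start timing when the target object is first detected facing the
        device; keep capturing images while timing, and report False (ignore
        the operation instruction) if the target object stops facing the
        device before the preset duration is reached."""
        start = time.monotonic()
        while time.monotonic() - start < PRESET_DURATION:
            if not still_facing():
                return False          # ignore the operation instruction
            time.sleep(CHECK_INTERVAL)
        return True                   # orientation held long enough; proceed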
Steps 201 to 203 describe the process of collecting the target object's actions, analyzing them, and determining the corresponding operation instruction, after which step 204 can be executed to carry out the device control process. In practical applications, besides executing steps 201 to 203, the smart device can collect other characteristics of the target object and determine the operation instruction by combining other types of recognition results, recognizing the user's behavior in multiple dimensions; this improves recognition accuracy and thus the success rate of the device control method. For example, the other types of recognition results may be facial expression recognition results and/or speech recognition results. The combination of the action recognition result with the facial expression recognition result and/or the speech recognition result is described below.
The intelligent device combines the action recognition result and the facial expression recognition result, and the process of obtaining the operation instruction can be as follows: the intelligent equipment acquires images in real time, and when facial features are detected in the acquired images, the intelligent equipment performs face recognition on the acquired images to obtain facial expression recognition results of the target object; the intelligent equipment respectively calculates the weights corresponding to the facial expression recognition result and the action recognition result; and based on the operation instruction and the weight corresponding to each recognition result, the intelligent equipment acquires the operation instruction corresponding to the action. The specific process of calculating the weight of each type of recognition result and obtaining the operation instruction is the same as the third case in step 203, and is not repeated here.
When the smart device combines the action recognition result with the facial expression recognition result, there is another possible scenario: the smart device detects that the target object faces the smart device, and although no action of the target object is acquired, the smart device obtains a facial expression recognition result of the target object through the acquired image. In such a scenario, the target object may just happen to face the smart device, but does not want to perform operation control on the smart device, and therefore, the smart device may take the operation instruction corresponding to the facial expression recognition result as an invalid instruction.
The process by which the smart device obtains the operation instruction by combining the action recognition result with the speech recognition result, or with both the speech recognition result and the facial expression recognition result, is the same as the process of combining with the facial expression recognition result described above. In this design, the smart device may collect the user's actions through steps 201 to 202 to obtain an action recognition result, may collect the target object's facial expression to obtain a facial expression recognition result, and may also adopt the following steps (1) to (4) to collect the target object's voice and combine the multiple types of recognition results to acquire and respond to the corresponding operation instruction.
(1) The intelligent device carries out voice acquisition in real time, carries out voice recognition on the acquired voice and obtains a voice recognition result of the target object.
(2) The intelligent device matches the voice recognition result with a preset operation instruction corresponding to the voice to obtain at least one operation instruction corresponding to the voice recognition result and the probability corresponding to each operation instruction.
(3) And when the maximum probability value in the probabilities of the voice recognition results is smaller than a preset threshold value, the intelligent equipment respectively calculates the weight corresponding to each recognition result.
(4) And based on the operation instruction and the weight corresponding to each recognition result, the intelligent equipment executes the step of acquiring the operation instruction corresponding to the action.
In step (3), the maximum probability among those of the speech recognition result being smaller than the preset threshold indicates that, from the speech recognition result alone, the smart device cannot determine with certainty which operation instruction the target object actually intends; it therefore performs the weighted calculation over all the recognition results to obtain a more reliable outcome.
As shown in fig. 3, in another possible implementation, when the maximum probability value among the probabilities of the voice recognition result is greater than or equal to the preset threshold, the smart device may obtain the operation instruction corresponding to that maximum probability value and, in response to it, execute the corresponding device control process. That is, the voice recognition result alone already clearly indicates which operation instruction the user wants the smart device to execute, so the user's requirement can be met accurately without other types of recognition results.
For example, the preset threshold may be 90%, and the smart device may be a smart speaker. The user looks at the smart speaker and says "Hi, XX" to it; the smart speaker collects the user's voice and performs voice recognition to obtain a voice recognition result, and at the same time it collects the user's lip action and obtains a lip language recognition result. When the surrounding environment is quiet or the user's voice is loud, the smart speaker performs the matching process and finds, for example, that the probability that the voice recognition result corresponds to the wake-up operation instruction is 92% and to the play operation instruction 8%, while the probability that the lip language recognition result corresponds to the wake-up operation instruction is 95%, to the play operation instruction 2%, and to the pause operation instruction 3%. Since 92% is greater than the preset threshold of 90%, the smart speaker can directly take the wake-up operation instruction corresponding to the 92% probability of the voice recognition result as the operation instruction the user wants to input, and obtain and respond to that operation instruction. If, however, the surrounding environment is very noisy or the user's voice is relatively quiet, the smart speaker may find that the probability that the voice recognition result corresponds to the wake-up operation instruction is only 60% and to the play operation instruction 40%. Since 60% is smaller than the preset threshold of 90%, the smart speaker combines the lip language recognition result: by calculating the weights of the lip language recognition result and the voice recognition result, it finds that the comprehensive weight of the wake-up operation instruction is greater than that of the play operation instruction, and therefore obtains and executes the wake-up operation instruction.
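Feeding the example's figures into the decide sketch above reproduces both branches; only the probabilities come from the example, the rest is illustrative:

```python
# Quiet environment: the 92% voice match clears the 90% threshold,
# so the wake-up instruction is taken directly.
print(decide({"wake_up": 0.92, "play": 0.08},
             {"lip": {"wake_up": 0.95, "play": 0.02, "pause": 0.03}}))
# -> "wake_up"

# Noisy environment: 60% is below the threshold, so the lip language
# result is fused in and wake-up still wins on comprehensive weight.
print(decide({"wake_up": 0.60, "play": 0.40},
             {"lip": {"wake_up": 0.95, "play": 0.02, "pause": 0.03}}))
# -> "wake_up"
```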
Of course, the smart device may also be configured with options for its multiple recognition functions, so that the user can select which recognition result or results to adopt according to actual requirements.
Furthermore, the smart device can also identify the object corresponding to the collected action, facial expression, or voice, so that it responds only to preset instructions issued by the target object. Specifically, the smart device is preset with information about at least one target object, which may include the target object's voice features, facial features, limb features, and the like. When the smart device acquires images in real time, it can detect whether the information of the object in the image matches the preset target object information, so as to determine whether the object in the image is a preset target object and hence whether to respond to instructions issued by that object. Of course, when the smart device collects the voice of an object, it may likewise recognize that voice to determine whether the object is a preset target object and thus whether the object has the right to control the smart device. When the object is a target object, the smart device performs the above steps of obtaining the operation instruction according to the recognition result of the target object's action, facial expression, or voice and responding to that instruction; when the object is not a target object, the smart device may ignore the object's actions and skip the subsequent operations of obtaining operation instructions from that object's recognition results. By presetting usage permissions in this way and identifying the object behind each collected action, facial expression, or voice, both the intelligence of the device control method and the security of device control are improved.
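The permission check described above can be sketched as follows; this is a simplified illustration only, since a real system would compare embeddings produced by face recognition and voiceprint models, and the TargetProfile structure, opaque feature strings, and exact-match comparison are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TargetProfile:
    name: str
    face_feature: str   # stand-in for a stored face feature/embedding
    voice_feature: str  # stand-in for a stored voiceprint

# Preset target objects with control authority over the device.
AUTHORIZED = [TargetProfile("alice", "face#a1", "voice#a1")]

def is_authorized(face_feature: Optional[str] = None,
                  voice_feature: Optional[str] = None) -> bool:
    """Return True only if an observed feature matches a preset target
    object; otherwise the device ignores the action, expression, or voice."""
    for profile in AUTHORIZED:
        if face_feature is not None and face_feature == profile.face_feature:
            return True
        if voice_feature is not None and voice_feature == profile.voice_feature:
            return True
    return False

if is_authorized(face_feature="face#a1"):
    print("respond to the recognized instruction")
else:
    print("ignore: object has no control permission")
```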
According to the method and device of the present disclosure, the facial or limb actions of the user are collected and the corresponding operation instruction is obtained and responded to according to those actions, so the instruction issued by the user can be received accurately even in a noisy environment, improving the success rate of the device control method. Furthermore, the embodiment of the disclosure times how long the user faces the smart device and ignores the operation instruction when that duration is less than the preset duration, avoiding responses to the user's everyday behavior and improving the accuracy of the device control method. Furthermore, by collecting the user's actions and facial expressions simultaneously, the embodiment of the disclosure can recognize the user's behavior in multiple dimensions, improving recognition accuracy and the success rate of the device control method. Likewise, by simultaneously collecting the user's actions, facial expressions, and voice, the behavior can be recognized in still more dimensions, with the same benefits. Further, by setting priorities for different types of recognition results, the embodiment of the present disclosure also provides a flexible device control method.
All the above optional technical solutions may be combined arbitrarily to form the optional embodiments of the present disclosure, and are not described herein again.
Fig. 4 is a block diagram illustrating a device control apparatus according to an exemplary embodiment. The device control apparatus is applied to a smart device, and referring to fig. 4, the apparatus includes an acquisition module 401, an obtaining module 402, and a processing module 403.
The acquisition module 401 is configured to collect the facial action or limb action of the target object when the target object is detected to be facing the smart device;
the obtaining module 402 is configured to obtain an operation instruction corresponding to the collected action;
the processing module 403 is configured to respond to the operation instruction and execute the device control process corresponding to the operation instruction.
In one possible implementation, as shown in fig. 5, the apparatus further includes:
the first detection module 404 is configured to perform image acquisition in real time, and when a human face five sense organ is detected in the acquired image, it is determined that the target object is detected to face the smart device.
In one possible implementation, as shown in fig. 6, the apparatus further includes:
the positioning module 405 is configured to perform image acquisition in real time, perform face detection on the acquired image, and locate the eyeball center and pupil center of the target object;
the first determination module 406 is configured to determine the direction of the line connecting the eyeball center and the pupil center as the gaze direction of the target object;
the second detection module 407 is configured to determine that the target object is facing the smart device when the gaze direction of the target object is detected to pass through the smart device.
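As a purely geometric illustration (not taken from the patent text), the following sketch checks whether the gaze ray running from the eyeball center through the pupil center passes close enough to the device's position; the 3D coordinates, camera frame, and tolerance are assumed values.

```python
import numpy as np

def faces_device(eyeball_center, pupil_center, device_pos, tolerance=0.1):
    """Return True if the ray (eyeball center -> pupil center) passes within
    `tolerance` of device_pos; all points are 3D, in the camera's frame."""
    origin = np.asarray(eyeball_center, dtype=float)
    direction = np.asarray(pupil_center, dtype=float) - origin
    direction /= np.linalg.norm(direction)          # unit gaze direction
    to_device = np.asarray(device_pos, dtype=float) - origin
    t = float(np.dot(to_device, direction))         # projection onto the ray
    if t <= 0:
        return False                                # device is behind the gaze
    closest = origin + t * direction                # closest point on the ray
    return float(np.linalg.norm(to_device - t * direction)) <= tolerance

# Gaze straight ahead along +z; device slightly off-axis, half a meter away.
print(faces_device([0, 0, 0], [0, 0, 0.012], [0.02, 0.0, 0.5], tolerance=0.05))
# -> True
```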
In one possible implementation, as shown in fig. 7, the apparatus further includes:
the timing module 408 is configured to start a timing function when the target object is detected to be facing the smart device, and to perform image acquisition in real time during timing;
the processing module 403 is further configured to ignore the operation instruction when it is detected, before the timing reaches the preset duration, that the target object no longer faces the smart device.
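A minimal sketch of this timing behavior, assuming a hypothetical per-frame still_facing detector and an arbitrary PRESET_DURATION:

```python
import time

PRESET_DURATION = 2.0  # seconds; an assumed value

def confirm_facing(still_facing, poll_interval=0.1) -> bool:
    """Poll the facing detector while the timer runs; abort (and let the
    caller ignore the instruction) as soon as the target looks away."""
    start = time.monotonic()
    while time.monotonic() - start < PRESET_DURATION:
        if not still_facing():
            return False   # stopped facing early: ignore the instruction
        time.sleep(poll_interval)
    return True            # faced the device long enough: handle it
```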
In one possible implementation, as shown in fig. 8, the apparatus further includes:
the action recognition module 409 is configured to, when the collected action is a lip action and/or a gesture action, perform lip language recognition on the lip action and/or perform gesture recognition on the gesture action to obtain a lip language recognition result and/or a gesture recognition result;
the obtaining module 402 is configured to obtain an operation instruction corresponding to the action according to the lip language recognition result and/or the gesture recognition result.
In one possible implementation, the obtaining module 402 is configured to:
when the collected action is a lip action, acquiring an operation instruction matched with the lip language recognition result; or,
when the collected action is a gesture action, acquiring an operation instruction matched with the gesture recognition result; or,
when the collected actions are a lip action and a gesture action, calculating the weight corresponding to each recognition result respectively, and acquiring the operation instruction corresponding to the action based on the operation instruction and the weight corresponding to each recognition result.
In one possible implementation, as shown in fig. 9, the apparatus further includes:
the face recognition module 410 is configured to perform image acquisition in real time, and when facial features are detected in the acquired image, perform face recognition on the acquired image to obtain a facial expression recognition result of the target object;
the first calculating module 411 is configured to calculate weights corresponding to the facial expression recognition result and the motion recognition result respectively;
the obtaining module 402 is further configured to obtain an operation instruction corresponding to the action based on the operation instruction and the weight corresponding to each recognition result.
In one possible implementation, as shown in fig. 10, the apparatus further includes:
the first matching module 412 is configured to match each recognition result with a preset operation instruction corresponding to the type of each recognition result, so as to obtain at least one operation instruction corresponding to each recognition result and a probability corresponding to each operation instruction;
the second determining module 413 is configured to determine the operation instruction corresponding to the maximum probability value among the probabilities of each recognition result as the operation instruction corresponding to that recognition result;
the second determining module 413 is further configured to determine the weight corresponding to each recognition result according to a preset corresponding relationship between the maximum probability value and the weight;
the second calculating module 414 is configured to perform weighted calculation on the operation instructions corresponding to the respective recognition results based on the weights corresponding to the respective recognition results, so as to obtain a comprehensive weight corresponding to each operation instruction;
the obtaining module 402 is further configured to obtain the operation instruction with the largest comprehensive weight.
In one possible implementation, as shown in fig. 11, the apparatus further includes:
the voice recognition module 415 is configured to perform voice collection in real time, perform voice recognition on the collected voice, and obtain a voice recognition result of the target object;
the second matching module 416 is configured to match the voice recognition result with a preset operation instruction corresponding to the voice, so as to obtain at least one operation instruction corresponding to the voice recognition result and a probability corresponding to each operation instruction;
the third calculating module 417 is configured to calculate the weight corresponding to each recognition result when the maximum probability value among the probabilities of the voice recognition result is smaller than a preset threshold;
the obtaining module 402 is further configured to perform the step of obtaining the operation instruction corresponding to the action based on the operation instruction corresponding to each recognition result and the weight.
In a possible implementation manner, the obtaining module 402 is further configured to obtain, when the maximum probability value among the probabilities of the voice recognition result is greater than or equal to a preset threshold, the operation instruction corresponding to that maximum probability value;
the processing module 403 is configured to perform the step of responding to the operation instruction and executing the device control process corresponding to the operation instruction.
The apparatus provided by the embodiment of the disclosure collects the facial or limb actions of the user and obtains and responds to the corresponding operation instruction according to those actions, so it can accurately receive the instruction issued by the user even in a noisy environment, improving the success rate of the device control method.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 12 is a block diagram illustrating a device control apparatus 1200 according to an example embodiment. The device control apparatus may be provided as a smart device; for example, the device control apparatus 1200 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, or the like.
Referring to fig. 12, the device control apparatus 1200 may include one or more of the following components: processing component 1202, memory 1204, power component 1206, multimedia component 1208, audio component 1210, input/output (I/O) interface 1212, sensor component 1214, and communications component 1216.
The processing component 1202 generally controls overall operation of the device control apparatus 1200, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 1202 may include one or more processors 1220 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 1202 can include one or more modules that facilitate interaction between the processing component 1202 and other components. For example, the processing component 1202 can include a multimedia module to facilitate interaction between the multimedia component 1208 and the processing component 1202.
The memory 1204 is configured to store various types of data to support the operation at the device control apparatus 1200. Examples of such data include instructions for any application or method operating on the device control apparatus 1200, contact data, phonebook data, messages, pictures, videos, and the like. The memory 1204 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 1206 provides power to the various components of the device control apparatus 1200. The power components 1206 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the device control apparatus 1200.
The multimedia component 1208 includes a screen providing an output interface between the device control apparatus 1200 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touches, slides, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 1208 includes a front-facing camera and/or a rear-facing camera. When the device control apparatus 1200 is in an operation mode, such as a photographing mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focus and optical zoom capabilities.
Audio component 1210 is configured to output and/or input audio signals. For example, the audio assembly 1210 includes a Microphone (MIC) configured to receive an external audio signal when the device control apparatus 1200 is in an operation mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 1204 or transmitted via the communication component 1216. In some embodiments, audio assembly 1210 further includes a speaker for outputting audio signals.
The I/O interface 1212 provides an interface between the processing component 1202 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 1214 includes one or more sensors for providing status evaluations of various aspects of the device control apparatus 1200. For example, the sensor assembly 1214 may detect the open/closed state of the device control apparatus 1200 and the relative positioning of components such as its display and keypad; it may also detect a change in position of the device control apparatus 1200 or one of its components, the presence or absence of user contact with the apparatus, the orientation or acceleration/deceleration of the apparatus, and a change in its temperature. The sensor assembly 1214 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 1214 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 1214 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 1216 is configured to facilitate wired or wireless communication between the device control apparatus 1200 and other devices. The device control apparatus 1200 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 1216 receives a broadcast signal or broadcast-associated information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 1216 also includes a Near Field Communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra-Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the device control apparatus 1200 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors, or other electronic elements for performing the above-described device control methods.
In an exemplary embodiment, there is also provided a computer-readable storage medium, such as a memory, storing a computer program which, when executed by a processor, implements the device control method illustrated in fig. 1, fig. 2, or fig. 3 of the above embodiments. For example, the computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a Compact Disc Read-Only Memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, or the like.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (18)

1. A device control method, applied to an intelligent device, the method comprising:
when a target object is detected to be facing the intelligent device, collecting a facial action or limb action of the target object, wherein the target object is a preset object with control authority over the intelligent device;
obtaining, according to the collected action, an operation instruction corresponding to the action, wherein the obtaining comprises: performing image acquisition in real time, and when facial features are detected in an acquired image, performing face recognition on the image to obtain a facial expression recognition result of the target object; calculating weights corresponding to the facial expression recognition result and the action recognition result respectively; and obtaining the operation instruction corresponding to the action based on the operation instruction and the weight corresponding to each recognition result;
wherein calculating the weights corresponding to the facial expression recognition result and the action recognition result respectively, and obtaining the operation instruction corresponding to the action based on the operation instruction and the weight corresponding to each recognition result, comprises: matching each recognition result with the preset operation instructions corresponding to the type of that recognition result to obtain at least one operation instruction corresponding to each recognition result and the probability corresponding to each operation instruction; determining the operation instruction corresponding to the maximum probability value among the probabilities of each recognition result as the operation instruction corresponding to that recognition result; determining the weight corresponding to each recognition result according to a preset correspondence between the maximum probability value and the weight; performing a weighted calculation on the operation instructions corresponding to the recognition results based on the weights corresponding to the recognition results to obtain a comprehensive weight corresponding to each operation instruction; and acquiring the operation instruction with the largest comprehensive weight; and
responding to the operation instruction, and executing the device control process corresponding to the operation instruction.
2. The method of claim 1, further comprising:
and acquiring images in real time, and determining that the target object faces the intelligent equipment when the facial features are detected in the acquired images.
3. The method of claim 1, further comprising:
carrying out image acquisition in real time, carrying out face detection on the acquired image, and positioning the eyeball center and the pupil center of a target object;
determining the direction of a connecting line of the eyeball center and the pupil center as the sight line direction of a target object;
when it is detected that the sight line direction of the target object passes through the intelligent device, determining that the target object faces the intelligent device.
4. The method of claim 1, further comprising:
when the target object is detected to face the intelligent equipment, starting a timing function, and acquiring images in real time in the timing process;
and when the timing does not reach the preset duration, ignoring the operation instruction when detecting that the target object does not face the intelligent equipment any more.
5. The method according to claim 1, wherein the obtaining, according to the collected action, an operation instruction corresponding to the action comprises:
when the collected action is a lip action and/or a gesture action, performing lip language recognition on the lip action and/or performing gesture recognition on the gesture action to obtain a lip language recognition result and/or a gesture recognition result;
and acquiring an operation instruction corresponding to the action according to the lip language recognition result and/or the gesture recognition result.
6. The method according to claim 5, wherein the obtaining of the operation instruction corresponding to the action according to the lip language recognition result and/or the gesture recognition result includes:
when the collected action is a lip action, acquiring an operation instruction matched with the lip language identification result;
when the collected action is a gesture action, acquiring an operation instruction matched with the gesture recognition result;
and when the collected actions are lip actions and gesture actions, respectively calculating weights corresponding to the recognition results, and acquiring the operation instructions corresponding to the actions based on the operation instructions and the weights corresponding to the recognition results.
7. The method according to any one of claims 1-6, further comprising:
performing voice acquisition in real time, and performing voice recognition on the acquired voice to obtain a voice recognition result of the target object;
matching the voice recognition result with a preset operation instruction corresponding to the voice to obtain at least one operation instruction corresponding to the voice recognition result and the probability corresponding to each operation instruction;
when the maximum probability value among the probabilities of the voice recognition result is smaller than a preset threshold, calculating the weight corresponding to each recognition result respectively;
and executing the step of acquiring the operation instruction corresponding to the action based on the operation instruction and the weight corresponding to each recognition result.
8. The method of claim 7, further comprising:
when the maximum probability value among the probabilities of the voice recognition result is greater than or equal to a preset threshold, acquiring the operation instruction corresponding to that maximum probability value;
and executing the step of responding to the operating instruction and executing the equipment control process corresponding to the operating instruction.
9. A device control apparatus, applied to an intelligent device, the apparatus comprising:
the acquisition module is used for collecting a facial action or limb action of a target object when the target object is detected to be facing the intelligent device, wherein the target object is a preset object with control authority over the intelligent device;
the obtaining module is used for obtaining an operation instruction corresponding to the action according to the collected action;
the processing module is used for responding to the operation instruction and executing the equipment control process corresponding to the operation instruction;
the device further comprises:
the face recognition module is used for acquiring images in real time, and when facial features are detected in the acquired images, the face recognition module is used for carrying out face recognition on the acquired images to obtain a facial expression recognition result of the target object;
the first calculation module is used for respectively calculating the weights corresponding to the facial expression recognition result and the action recognition result;
the obtaining module is further configured to obtain an operation instruction corresponding to the action based on the operation instruction and the weight corresponding to each recognition result;
the first matching module is used for respectively matching each recognition result with a preset operation instruction corresponding to the type of each recognition result to obtain at least one operation instruction corresponding to each recognition result and the probability corresponding to each operation instruction;
the second determining module is used for determining the operation instruction corresponding to the maximum probability value among the probabilities of each recognition result as the operation instruction corresponding to that recognition result;
the second determining module is further configured to determine a weight corresponding to each recognition result according to a preset correspondence between the maximum probability value and the weight;
the second calculation module is used for performing weighted calculation on the operation instructions corresponding to the identification results based on the weights corresponding to the identification results to obtain a comprehensive weight corresponding to each operation instruction;
the obtaining module is further configured to obtain the operation instruction with the largest comprehensive weight.
10. The apparatus of claim 9, further comprising:
the first detection module is used for acquiring images in real time and, when facial features are detected in an acquired image, determining that the target object is facing the intelligent device.
11. The apparatus of claim 9, further comprising:
the positioning module is used for acquiring images in real time, detecting faces of the acquired images and positioning the eyeball center and the pupil center of the target object;
the first determination module is used for determining the direction of a connecting line of the eyeball center and the pupil center as the sight line direction of the target object;
and the second detection module is used for determining that the target object faces the intelligent device when it is detected that the sight line direction of the target object passes through the intelligent device.
12. The apparatus of claim 9, further comprising:
the timing module is used for starting a timing function when the target object is detected to face the intelligent equipment, and acquiring images in real time in the timing process;
the processing module is further configured to ignore the operation instruction when the timing does not reach the preset duration and the target object is detected to no longer face the intelligent device.
13. The apparatus of claim 9, further comprising:
the action recognition module is used for carrying out lip language recognition on the lip action and/or carrying out gesture recognition on the gesture action when the collected action is the lip action and/or the gesture action to obtain a lip language recognition result and/or a gesture recognition result;
and the acquisition module is used for acquiring the operation instruction corresponding to the action according to the lip language recognition result and/or the gesture recognition result.
14. The apparatus of claim 13, wherein the obtaining module is configured to:
when the collected action is a lip action, acquiring an operation instruction matched with the lip language recognition result; or,
when the collected action is a gesture action, acquiring an operation instruction matched with the gesture recognition result; or,
when the collected actions are a lip action and a gesture action, calculating the weight corresponding to each recognition result respectively, and acquiring the operation instruction corresponding to the action based on the operation instruction and the weight corresponding to each recognition result.
15. The apparatus according to any one of claims 9-14, further comprising:
the voice recognition module is used for carrying out voice collection in real time and carrying out voice recognition on the collected voice to obtain a voice recognition result of the target object;
the second matching module is used for matching the voice recognition result with a preset operation instruction corresponding to the voice to obtain at least one operation instruction corresponding to the voice recognition result and the probability corresponding to each operation instruction;
the third calculation module is used for calculating the weight corresponding to each recognition result when the maximum probability value among the probabilities of the voice recognition result is smaller than a preset threshold;
the obtaining module is further configured to execute the step of obtaining the operation instruction corresponding to the action based on the operation instruction and the weight corresponding to each recognition result.
16. The apparatus of claim 15,
the obtaining module is further configured to obtain an operation instruction corresponding to a maximum probability value in the probabilities of the voice recognition results when the maximum probability value in the probabilities of the voice recognition results is greater than or equal to a preset threshold;
and the processing module is used for executing the step of responding to the operating instruction and executing the equipment control process corresponding to the operating instruction.
17. An apparatus control device, characterized by comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to: when a target object is detected to be facing the intelligent device, collect a facial action or limb action of the target object; obtain an operation instruction corresponding to the action according to the collected action; and respond to the operation instruction and execute the device control process corresponding to the operation instruction, wherein the target object is a preset object with control authority over the intelligent device;
wherein obtaining the operation instruction corresponding to the action according to the collected action comprises: performing image acquisition in real time, and when facial features are detected in an acquired image, performing face recognition on the image to obtain a facial expression recognition result of the target object; calculating weights corresponding to the facial expression recognition result and the action recognition result respectively; and obtaining the operation instruction corresponding to the action based on the operation instruction and the weight corresponding to each recognition result;
wherein calculating the weights corresponding to the facial expression recognition result and the action recognition result respectively, and obtaining the operation instruction corresponding to the action based on the operation instruction and the weight corresponding to each recognition result, comprises: matching each recognition result with the preset operation instructions corresponding to the type of that recognition result to obtain at least one operation instruction corresponding to each recognition result and the probability corresponding to each operation instruction; determining the operation instruction corresponding to the maximum probability value among the probabilities of each recognition result as the operation instruction corresponding to that recognition result; determining the weight corresponding to each recognition result according to a preset correspondence between the maximum probability value and the weight; performing a weighted calculation on the operation instructions corresponding to the recognition results based on the weights corresponding to the recognition results to obtain a comprehensive weight corresponding to each operation instruction; and acquiring the operation instruction with the largest comprehensive weight.
18. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of claims 1 to 8.
CN201711315879.9A 2017-12-12 2017-12-12 Device control method, device control apparatus, and storage medium Active CN108052079B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711315879.9A CN108052079B (en) 2017-12-12 2017-12-12 Device control method, device control apparatus, and storage medium

Publications (2)

Publication Number Publication Date
CN108052079A CN108052079A (en) 2018-05-18
CN108052079B true CN108052079B (en) 2021-01-15

Family

ID=62124344

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711315879.9A Active CN108052079B (en) 2017-12-12 2017-12-12 Device control method, device control apparatus, and storage medium

Country Status (1)

Country Link
CN (1) CN108052079B (en)

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108766438B (en) * 2018-06-21 2020-12-01 Oppo广东移动通信有限公司 Man-machine interaction method and device, storage medium and intelligent terminal
CN109032345B (en) * 2018-07-04 2022-11-29 百度在线网络技术(北京)有限公司 Equipment control method, device, equipment, server and storage medium
CN111177329A (en) * 2018-11-13 2020-05-19 奇酷互联网络科技(深圳)有限公司 User interaction method of intelligent terminal, intelligent terminal and storage medium
CN111176430B (en) * 2018-11-13 2023-10-13 奇酷互联网络科技(深圳)有限公司 Interaction method of intelligent terminal, intelligent terminal and storage medium
CN111107407A (en) * 2019-01-08 2020-05-05 姜鹏飞 Audio and video playing control method, device and equipment and computer readable storage medium
CN111435422B (en) * 2019-01-11 2024-03-08 商汤集团有限公司 Action recognition method, control method and device, electronic equipment and storage medium
CN109901408A (en) * 2019-03-08 2019-06-18 阿里巴巴集团控股有限公司 A kind of control method of smart machine, device and system
CN110213138A (en) * 2019-04-23 2019-09-06 深圳康佳电子科技有限公司 Intelligent terminal user authentication method, intelligent terminal and storage medium
CN110197171A (en) * 2019-06-06 2019-09-03 深圳市汇顶科技股份有限公司 Exchange method, device and the electronic equipment of action message based on user
CN112417923A (en) * 2019-08-20 2021-02-26 云丁网络技术(北京)有限公司 System, method and apparatus for controlling smart devices
CN110730115B (en) 2019-09-11 2021-11-09 北京小米移动软件有限公司 Voice control method and device, terminal and storage medium
CN111242029A (en) * 2020-01-13 2020-06-05 湖南世优电气股份有限公司 Device control method, device, computer device and storage medium
CN111625094B (en) * 2020-05-25 2023-07-14 阿波罗智联(北京)科技有限公司 Interaction method and device of intelligent rearview mirror, electronic equipment and storage medium
CN113671846B (en) * 2021-08-06 2024-03-12 深圳市欧瑞博科技股份有限公司 Intelligent device control method and device, wearable device and storage medium
CN113742687B (en) * 2021-08-31 2022-10-21 深圳时空数字科技有限公司 Internet of things control method and system based on artificial intelligence
CN113759748A (en) * 2021-10-20 2021-12-07 深圳市博视系统集成有限公司 Intelligent home control method and system based on Internet of things
CN114028794A (en) * 2021-11-12 2022-02-11 成都拟合未来科技有限公司 Auxiliary fitness method and system with interaction function
CN116434027A (en) * 2023-06-12 2023-07-14 深圳星寻科技有限公司 Artificial intelligent interaction system based on image recognition

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6104707B2 (en) * 2013-05-23 2017-03-29 アルパイン株式会社 Electronic device, operation input method, and operation input program
CN105448292B (en) * 2014-08-19 2019-03-12 北京羽扇智信息科技有限公司 A kind of time Speech Recognition System and method based on scene
CN104951808B (en) * 2015-07-10 2018-04-27 电子科技大学 A kind of 3D direction of visual lines methods of estimation for robot interactive object detection
CN105159460B (en) * 2015-09-10 2018-01-23 哈尔滨理工大学 The control method of the intelligent domestic appliance controller based on eye-tracking
CN105739688A (en) * 2016-01-21 2016-07-06 北京光年无限科技有限公司 Man-machine interaction method and device based on emotion system, and man-machine interaction system
CN107239139B (en) * 2017-05-18 2018-03-16 刘国华 Based on the man-machine interaction method and system faced
CN107330418B (en) * 2017-07-12 2021-06-01 深圳市铂越科技有限公司 Robot system

Also Published As

Publication number Publication date
CN108052079A (en) 2018-05-18

Similar Documents

Publication Publication Date Title
CN108052079B (en) Device control method, device control apparatus, and storage medium
US11580983B2 (en) Sign language information processing method and apparatus, electronic device and readable storage medium
CN108363706B (en) Method and device for man-machine dialogue interaction
CN107945133B (en) Image processing method and device
CN112118380B (en) Camera control method, device, equipment and storage medium
CN105357425B (en) Image capturing method and device
CN107463903B (en) Face key point positioning method and device
CN108712603B (en) Image processing method and mobile terminal
EP3133471A1 (en) Play control method, apparatus, terminal, and recording medium
EP3328062A1 (en) Photo synthesizing method and device
CN108668080A (en) Prompt method and device, the electronic equipment of camera lens degree of fouling
US11816924B2 (en) Method for behaviour recognition based on line-of-sight estimation, electronic equipment, and storage medium
CN112188091B (en) Face information identification method and device, electronic equipment and storage medium
CN111144266B (en) Facial expression recognition method and device
KR102163996B1 (en) Apparatus and Method for improving performance of non-contact type recognition function in a user device
CN106127132B (en) The reminding method and device, electronic equipment of slidingtype typing fingerprint
CN109117819B (en) Target object identification method and device, storage medium and wearable device
CN112114653A (en) Terminal device control method, device, equipment and storage medium
CN111988522B (en) Shooting control method and device, electronic equipment and storage medium
CN112099639A (en) Display attribute adjusting method and device, display equipment and storage medium
CN111698600A (en) Processing execution method and device and readable medium
CN105635573B (en) Camera visual angle regulating method and device
CN111373409B (en) Method and terminal for obtaining color value change
EP3200127B1 (en) Method and device for fingerprint recognition
CN111292743A (en) Voice interaction method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant