CN111103981A - Control instruction generation method and device - Google Patents

Control instruction generation method and device

Info

Publication number
CN111103981A
Authority
CN
China
Prior art keywords
control
gesture
frame
user
key point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911329945.7A
Other languages
Chinese (zh)
Inventor
刘思阳 (Liu Siyang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd filed Critical Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201911329945.7A
Publication of CN111103981A


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011: Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F 3/017: Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/107: Static hand or arm
    • G06V 40/113: Recognition of static hand signs
    • G06V 40/20: Movements or behaviour, e.g. gesture recognition
    • G06V 40/28: Recognition of hand or arm movements, e.g. recognition of deaf sign language

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • User Interface Of Digital Computer (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention provides a control instruction generation method and device, an electronic device and a computer-readable storage medium, and relates to the field of data processing. The method comprises the following steps: acquiring continuous two-dimensional image frames; calculating user pose key point information for each of the continuous two-dimensional image frames, the user pose key point information comprising: three-dimensional coordinates of the user pose key points; identifying the control intention of the user according to the user pose key point information of at least one frame in the continuous two-dimensional image frames; and generating a corresponding control instruction according to the control intention. According to the method and the device, the three-dimensional coordinates of each pose key point of the user in each frame are calculated solely on the basis of the acquired continuous two-dimensional image frames, and the control intention of the user is further determined on the basis of the three-dimensional coordinates of the user pose key points in at least one of those frames, so that the control intention of the user is recognized using two-dimensional image frames alone.

Description

Control instruction generation method and device
Technical Field
The present invention relates to the field of data processing, and in particular, to a method and an apparatus for generating a control instruction, an electronic device, and a computer-readable storage medium.
Background
Obtaining a user's control intention from images and generating the corresponding control instruction helps improve the convenience and diversity of user operation, and is therefore widely applied.
Currently, a depth data camera is generally used to acquire a depth image of a person, and a control intention of the person is identified based on the depth image. For example, the RGBD depth camera may collect a depth image in addition to a normal color image, and identify a control intention of a person to be photographed based on the depth image collected by the RGBD depth camera.
However, in the prior art, the control intention of a person cannot be identified by using a two-dimensional image acquired by a common camera.
Disclosure of Invention
The embodiment of the invention aims to provide a control instruction generation method, a control instruction generation device, electronic equipment and a computer readable storage medium, so as to solve the problem that a control intention of a person cannot be identified by a two-dimensional image acquired by a common camera. The specific technical scheme is as follows:
in a first aspect of the present invention, there is provided a control instruction generating method, including:
acquiring continuous two-dimensional image frames;
calculating user pose keypoint information for each of the successive two-dimensional image frames, the user pose keypoint information comprising: three-dimensional coordinates of the user gesture key points;
identifying the control intention of the user according to the user gesture key point information of at least one frame in the continuous two-dimensional image frames;
and generating a corresponding control instruction according to the control intention.
Optionally, the identifying the control intention of the user according to the user gesture key point information of at least one frame of the continuous two-dimensional image frames includes:
inputting the user gesture key point information of each frame in the continuous two-dimensional image frames and the control gesture key point information corresponding to each preset control gesture into a gesture matching model to obtain at least one frame of corresponding matching control gesture;
and determining the corresponding control intention of the user according to the matching control gesture corresponding to the at least one frame.
Optionally, the inputting the user gesture key point information of each frame in the continuous two-dimensional image frames and the control gesture key point information corresponding to each preset control gesture into the gesture matching model in sequence to obtain at least one frame of corresponding matching control gesture includes:
inputting the user posture key point information of each frame in the continuous two-dimensional image frame and the control posture key point information corresponding to each preset control posture into the posture matching model to obtain the matching confidence coefficient between the user posture key point information of the current frame and the control posture key point information corresponding to the current control posture;
and if the matching confidence exceeds a preset threshold, determining the current control posture as the matching control posture corresponding to the at least one frame.
Optionally, the determining the control intention corresponding to the user according to the matching control gesture corresponding to the at least one frame includes:
and if the matching control posture is a static control posture, determining a control intention corresponding to the matching control posture according to a preset corresponding relation between the static control posture and the control intention.
Optionally, the determining the control intention corresponding to the user according to the matching control gesture corresponding to the at least one frame includes:
if the matched control gesture is a dynamic control gesture, detecting the variable quantity of the control quantity gesture key point in the dynamic control gesture according to the current frame and at least one frame after the current frame time sequence;
and determining a control intention corresponding to the matched control gesture according to the dynamic control gesture and the variation of the key points of the control quantity gesture.
Optionally, the detecting, according to the current frame and at least one frame after the current frame time sequence, a variation of a control amount gesture key point in the dynamic control gesture includes:
and inputting the current frame and at least one frame after the current frame in time sequence into a variable quantity determination model corresponding to the dynamic control posture, and outputting the variable quantity of the control quantity posture key point in the dynamic control posture.
Optionally, the detecting, according to the current frame and at least one frame after the current frame time sequence, a variation of a control amount gesture key point in the dynamic control gesture includes:
inputting at least one frame after the current frame time sequence into the gesture matching model to obtain a matching control gesture corresponding to each frame in the at least one frame after the current frame time sequence;
and detecting the variation of the key point of the control amount posture in the dynamic control posture according to the current frame and a target frame with the same matching control posture as that of the current frame in at least one frame after the current frame time sequence.
Optionally, the gesture matching model includes: a first fully connected network, a second fully connected network, and a third fully connected network; inputting the user posture key point information of each frame in the continuous two-dimensional image frame and the control posture key point information corresponding to each preset control posture into the posture matching model to obtain the matching confidence coefficient between the user posture key point information of the current frame and the control posture key point information corresponding to the current control posture, and the method comprises the following steps:
inputting user gesture key point information of each frame in the continuous two-dimensional image frames into the first fully-connected network, and inputting control gesture key point information corresponding to each preset control gesture into the second fully-connected network;
and adding the output vectors of the first fully-connected network and the second fully-connected network, inputting the sum to a third fully-connected network, and outputting the matching confidence coefficient between the user posture key point information of the current frame and the control posture key point information corresponding to the current control posture through the third fully-connected network.
Optionally, the gesture key point information includes: gesture keypoint information, said calculating user gesture keypoint information for each of said successive two-dimensional image frames further comprising:
detecting human key points of each frame in the continuous two-dimensional image frames to obtain a recognition result of the human key points;
determining the left elbow coordinate and/or the right elbow coordinate in each frame according to the recognition result of the key points of the human body;
determining a gesture detection area in each frame according to the left elbow coordinate and/or the right elbow coordinate; the gesture detection area comprises finger key points and wrist key points of a left hand and/or finger key points and wrist key points of a right hand;
the calculating user gesture keypoint information for each of the successive two-dimensional image frames comprises:
and calculating user posture key point information in a gesture detection area in each frame of the continuous two-dimensional image frames.
Optionally, before the calculating the user gesture key point information of each frame in the consecutive two-dimensional image frames, the method further includes:
performing face recognition on each frame in the continuous two-dimensional image frames, and determining an authorized user in each frame in the continuous two-dimensional image frames;
deleting the gesture key points of the unauthorized user in each frame, and keeping the gesture key points of the authorized user;
the calculating user gesture keypoint information for each of the successive two-dimensional image frames comprises:
and calculating gesture key point information of authorized users reserved in each frame of the continuous two-dimensional image frames.
Optionally, before identifying the control intention of the user according to the user gesture key point information of at least one frame of the continuous two-dimensional image frames, the method further includes:
judging whether a preparation gesture is received or not according to the user gesture key point information of at least one frame of the continuous two-dimensional image frames;
the identifying the control intention of the user according to the user gesture key point information of at least one frame in the continuous two-dimensional image frames comprises:
and under the condition that a preparation gesture is received, identifying the control intention of the user according to the user gesture key point information of at least one frame in the continuous two-dimensional image frames.
Optionally, the method further includes:
sending the control instruction to the controlled equipment; the controlled equipment is used for executing the control instruction.
In a second aspect of the present invention, there is also provided a control instruction generating apparatus, including:
the image acquisition module is used for acquiring continuous two-dimensional image frames;
a pose key point information calculation module for calculating user pose key point information for each of the successive two-dimensional image frames, the user pose key point information comprising: three-dimensional coordinates of the user gesture key points;
the control intention identification module is used for identifying the control intention of the user according to the key point information of the user posture of at least one frame in the continuous two-dimensional image frames;
and the control instruction generating module is used for generating a corresponding control instruction according to the control intention.
Optionally, the control intention identifying module includes:
the matching control posture determining submodule is used for inputting the user posture key point information of each frame in the continuous two-dimensional image frames and the control posture key point information corresponding to each preset control posture into a posture matching model in sequence to obtain at least one frame of corresponding matching control posture;
and the first control intention identification submodule is used for determining the control intention corresponding to the user according to the matching control gesture corresponding to the at least one frame.
Optionally, the matching control gesture determination submodule includes:
a matching confidence determining unit, configured to input, to the gesture matching model, the user gesture key point information of each frame in the continuous two-dimensional image frames and the control gesture key point information corresponding to each preset control gesture in sequence, so as to obtain a matching confidence between the user gesture key point information of the current frame and the control gesture key point information corresponding to the current control gesture;
a matching control gesture determining unit, configured to determine the current control gesture as the matching control gesture corresponding to the at least one frame if the matching confidence exceeds a preset threshold.
Optionally, the first control intention identifying sub-module includes:
and the first control intention identification unit is used for determining the control intention corresponding to the matching control gesture according to the preset corresponding relation between the static control gesture and the control intention if the matching control gesture is the static control gesture.
Optionally, the first control intention identifying sub-module includes:
a variation detecting unit, configured to detect, if the matching control gesture is a dynamic control gesture, a variation of a control amount gesture key point in the dynamic control gesture according to the current frame and at least one frame after a time sequence of the current frame;
and the second control intention identification unit is used for determining the control intention corresponding to the matched control gesture according to the dynamic control gesture and the variation of the key points of the control quantity gesture.
Optionally, the variation detecting unit includes:
and the first variation detection subunit is configured to input the current frame and at least one frame subsequent to the current frame in time sequence to a variation determination model corresponding to the dynamic control gesture, and output a variation of a controlled variable gesture key point in the dynamic control gesture.
Optionally, the variation detecting unit includes:
a pose determining subunit, configured to input the at least one frame after the current frame time sequence into the pose matching model, so as to obtain a matching control pose corresponding to each frame in the at least one frame after the current frame time sequence;
and a second variation detecting subunit, configured to detect a variation of a control amount pose key point in the dynamic control pose according to the current frame and a target frame, in which a matching control pose of the target frame is the same as a matching pose of the current frame, in at least one frame subsequent to a time sequence of the current frame.
Optionally, the gesture matching model includes: a first fully connected network, a second fully connected network, and a third fully connected network; the matching confidence determining unit includes:
an input subunit, configured to input user gesture key point information of each frame in the consecutive two-dimensional image frames into the first fully connected network, and input control gesture key point information corresponding to each preset control gesture into the second fully connected network;
a matching confidence determining subunit, configured to add output vectors of the first fully connected network and the second fully connected network, input the added output vectors to the third fully connected network, and output, through the third fully connected network, a matching confidence between the user posture key point information of the current frame and the control posture key point information corresponding to the current control posture.
Optionally, the gesture key point information includes: gesture keypoint information, the apparatus further comprising:
the human body key point identification module is used for detecting human body key points of each frame in the continuous two-dimensional image frames to obtain an identification result of the human body key points;
the elbow coordinate determination module is used for determining the left elbow coordinate and/or the right elbow coordinate in each frame according to the recognition result of the key points of the human body;
the gesture detection area determining module is used for determining a gesture detection area in each frame according to the left elbow coordinate and/or the right elbow coordinate; the gesture detection area comprises finger key points and wrist key points of a left hand and/or finger key points and wrist key points of a right hand;
the gesture keypoint information calculation module comprises:
and the gesture key point information first calculation submodule is used for calculating the user gesture key point information in the gesture detection area in each frame of the continuous two-dimensional image frames.
Optionally, the apparatus further includes:
the face recognition module is used for carrying out face recognition on each frame in the continuous two-dimensional image frames and determining authorized users in each frame in the continuous two-dimensional image frames;
a deleting module, configured to delete the gesture key points of the unauthorized user in each frame, and keep the gesture key points of the authorized user;
the gesture keypoint information calculation module comprises:
and the second calculation submodule of the gesture key point information is used for calculating the gesture key point information of the authorized user reserved in each frame of the continuous two-dimensional image frames.
Optionally, the apparatus further comprises:
the preparation gesture judging module is used for judging whether a preparation gesture is received or not according to the user gesture key point information of at least one frame of the continuous two-dimensional image frames;
the control intention recognition module includes:
and the second control intention identification submodule is used for identifying the control intention of the user according to the user gesture key point information of at least one frame in the continuous two-dimensional image frames under the condition of receiving the preparation gesture.
Optionally, the apparatus further comprises:
the control instruction sending module is used for sending the control instruction to the controlled equipment; the controlled equipment is used for executing the control instruction.
In another aspect of the present invention, there is also provided an electronic device, including a processor, a communication interface, a memory and a communication bus, where the processor, the communication interface and the memory complete communication with each other through the communication bus; the memory is used for storing a computer program; and the processor is used for implementing any of the above control instruction generation methods when executing the program stored in the memory.
In yet another aspect of the present invention, there is also provided a computer-readable storage medium having instructions stored therein, which when run on a computer, cause the computer to perform any of the above-described control instruction generation methods.
In yet another aspect of the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform any of the control instruction generation methods described above.
The embodiment of the invention provides a control instruction generation method and a control instruction generation device, wherein the method comprises the following steps: acquiring continuous two-dimensional image frames; calculating user pose keypoint information for each of the successive two-dimensional image frames, the user pose keypoint information comprising: three-dimensional coordinates of the user gesture key points; identifying the control intention of the user according to the user gesture key point information of at least one frame in the continuous two-dimensional image frames; and generating a corresponding control instruction according to the control intention, so that the problem that the control intention of a person cannot be identified by a two-dimensional image acquired by a common camera can be solved to a great extent.
In the embodiment of the invention, the three-dimensional coordinates of each gesture key point of the user of each frame in the continuous two-dimensional image frames are calculated and obtained only on the basis of the obtained continuous two-dimensional image frames, and the control intention of the user is further determined on the basis of the three-dimensional coordinates of each gesture key point of the user of at least one frame in the continuous two-dimensional image frames, so that the control intention of the user is identified by adopting the two-dimensional image frames.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
FIG. 1 is a flow chart illustrating steps of a method for generating control commands according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a key point of a human body according to an embodiment of the present invention;
FIG. 3 is a diagram of a left-hand gesture key according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating steps of a method for generating control commands according to an embodiment of the present invention;
FIG. 5 is a flow chart of one step in determining a matching control gesture in an embodiment of the present invention;
FIG. 6 is a schematic diagram of the operation of a gesture matching model according to an embodiment of the present invention;
FIG. 7 is a flowchart of one step of calculating a confidence level of a match in an embodiment of the present invention;
FIG. 8 is a flowchart of one step in determining intent to control in an embodiment of the present invention;
FIG. 9 is a flowchart illustrating steps of a method for generating control commands according to an embodiment of the present invention;
FIG. 10 is a flowchart illustrating steps for determining a gesture detection area according to an embodiment of the present invention;
FIG. 11 is a control instruction generating apparatus according to an embodiment of the present invention;
FIG. 12 is still another control instruction generating apparatus according to an embodiment of the present invention;
FIG. 13 is a control instruction generating apparatus according to still another embodiment of the present invention;
FIG. 14 is a schematic structural diagram of an electronic device in an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention.
Referring to fig. 1, fig. 1 is a flowchart illustrating the steps of a control instruction generating method according to an embodiment of the present invention. The method may be applied to a terminal or to a controller of the terminal; the specific application scenario and the type of the terminal are not limited in the embodiment of the present invention. For example, the terminal may include a video playback device and the like, and the method may be applied, for example, to the control of television program playback. In the embodiment of the invention, the method mainly comprises the following steps:
step 101: successive two-dimensional image frames are acquired.
In the embodiment of the invention, continuous two-dimensional image frames can be acquired with an ordinary (non-depth) camera and the like. The two-dimensional image frames may be in color, black and white, etc., which is not specifically limited in the embodiment of the present invention. The two-dimensional image frames may contain an image of a user. The continuous two-dimensional image frames may be a plurality of two-dimensional image frames that are chronologically consecutive. The number of frames specifically included in the continuous two-dimensional image frames is not specifically limited.
Step 102: calculating user pose keypoint information for each of the successive two-dimensional image frames, the user pose keypoint information comprising: three-dimensional coordinates of user gesture key points.
In the embodiment of the present invention, the pose key points of the user may be body parts and the like capable of embodying the pose of the user. For example, the pose key points may include: human body key points, gesture key points, etc. of the user. The human body key points may specifically include: a head skeleton key point, a neck skeleton key point, a right shoulder skeleton key point, a left shoulder skeleton key point, a right elbow skeleton key point, a left elbow skeleton key point, a right wrist skeleton key point, a left wrist skeleton key point, a right hip skeleton key point, a left hip skeleton key point, a right knee skeleton key point, a left knee skeleton key point, a right ankle skeleton key point, and a left ankle skeleton key point. The gesture key points may be the wrist key points of the left hand and the right hand, and the finger tips, finger roots and finger joints on each finger. Both the left hand and the right hand may include 21 gesture key points. For example, the gesture key points of the left hand may specifically include: a wrist key point, and gesture key points on each finger. The gesture key points on each finger may in turn include: the finger tip, the finger root and two finger joints, i.e. 4 key points per finger.
For example, referring to fig. 2, fig. 2 is a schematic diagram of a human body key point provided in an embodiment of the present invention. The skeletal points numbered 1 through 14 in fig. 2 may be human key points. There may be 14 human key points.
For example, referring to fig. 3, fig. 3 is a schematic diagram of the left-hand gesture key points according to an embodiment of the present invention. The left hand has 21 gesture key points, numbered 0-20 in fig. 3.
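To make the key-point layout concrete, the sketch below enumerates the 14 human body key points and the 21 per-hand gesture key points described above; figs. 2 and 3 fix only the counts, so the exact index order shown here is an assumption.

```python
# Hypothetical index tables for the pose key points described above.
# Figs. 2 and 3 fix only the counts (14 body key points, 21 per hand);
# the concrete ordering below is an assumption for illustration.
BODY_KEYPOINTS = [
    "head", "neck",
    "right_shoulder", "left_shoulder",
    "right_elbow", "left_elbow",
    "right_wrist", "left_wrist",
    "right_hip", "left_hip",
    "right_knee", "left_knee",
    "right_ankle", "left_ankle",
]  # 14 skeleton points, numbered 1-14 in fig. 2

HAND_KEYPOINTS = ["wrist"] + [
    f"{finger}_{part}"
    for finger in ("thumb", "index", "middle", "ring", "little")
    for part in ("root", "joint1", "joint2", "tip")
]  # 21 points per hand (wrist + 4 per finger), numbered 0-20 in fig. 3
```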
In the embodiment of the invention, the three-dimensional coordinates of the pose key points of the user are three-dimensional coordinates capable of reflecting the relative position relationship among the pose key points. In the embodiment of the present invention, each pose key point of the user may be recognized from each frame in the continuous two-dimensional image frames, for example, based on a Visual Geometry Group (VGG) network model and the like; this is not particularly limited in the embodiment of the present invention. The three-dimensional coordinates of each pose key point are then calculated according to the position information of each pose key point in the two-dimensional image frame, the imaging parameters of the camera shooting the two-dimensional image frame, and the like. Alternatively, a three-dimensional coordinate recognition network for the user pose key points is trained in advance, each frame of the continuous two-dimensional image frames is input into this network, and three-dimensional modeling and the like are performed to calculate the three-dimensional coordinates of the user pose key points in each frame of the continuous two-dimensional image frames. This is not particularly limited in the embodiment of the present invention.
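As a rough illustration of this step, the sketch below assumes two pre-trained models, a 2D key-point detector and a 2D-to-3D lifting network; the function names and signatures are hypothetical stand-ins for the recognition and three-dimensional modeling networks mentioned above.

```python
import numpy as np

def compute_pose_keypoints_3d(frame, detect_2d, lift_to_3d):
    """Hedged sketch of step 102 for a single frame.

    detect_2d(frame)      -> (K, 2) array of pixel coordinates (hypothetical model)
    lift_to_3d(points_2d) -> (K, 3) array of relative 3D coordinates (hypothetical model)
    """
    points_2d = np.asarray(detect_2d(frame))
    points_3d = np.asarray(lift_to_3d(points_2d))
    return points_3d  # the "user pose key point information" for this frame

# Usage over consecutive frames acquired from an ordinary camera (assumed):
# keypoints_per_frame = [compute_pose_keypoints_3d(f, detect_2d, lift_to_3d) for f in frames]
```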
Step 103: and identifying the control intention of the user according to the user gesture key point information of at least one frame in the continuous two-dimensional image frames.
In the embodiment of the present invention, the control intention of the user may be which operation the user wants to perform on the terminal, and the like. The corresponding relationship between the user's pose key point information and the user's control intention may be established in advance, and the control intention corresponding to the user's pose key point information of at least one frame of the aforementioned consecutive two-dimensional image frames may be determined based on the above-mentioned corresponding relationship.
Step 104: and generating a corresponding control instruction according to the control intention.
In the embodiment of the present invention, the correspondence relationship between the control intention and the control instruction may be set in advance. After the control intention of the user is acquired, a control instruction corresponding to the control intention of the user may be acquired in the correspondence relationship. In the embodiment of the present invention, this is not particularly limited.
The control instruction can be used for controlling the terminal to execute a corresponding operation. For example, if the determined control intention is to turn off the terminal, the control instruction may be a shutdown instruction, and the terminal is turned off after acquiring the control instruction.
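A minimal sketch of the two correspondence tables implied by steps 103 and 104 follows; the specific gesture names, intentions and commands are illustrative assumptions rather than part of the embodiment.

```python
# Illustrative correspondence tables; the concrete entries are assumptions.
INTENT_BY_GESTURE = {
    "salute": "toggle_playback",   # static control gesture -> control intention
    "palm_push": "power_off",
}
INSTRUCTION_BY_INTENT = {
    "toggle_playback": {"command": "play_pause"},
    "power_off": {"command": "shutdown"},
}

def generate_control_instruction(matched_gesture):
    """Map a matched control gesture to its control intention, then to the instruction."""
    intent = INTENT_BY_GESTURE.get(matched_gesture)
    return INSTRUCTION_BY_INTENT.get(intent) if intent else None
```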
In the embodiment of the invention, the three-dimensional coordinates of each pose key point of the user in each frame of the continuous two-dimensional image frames are calculated solely on the basis of the acquired continuous two-dimensional image frames, and the control intention of the user is further determined on the basis of the three-dimensional coordinates of the user pose key points in at least one of those frames, so that the control intention of the user is identified using two-dimensional image frames alone. The two-dimensional images can be obtained with an ordinary camera, without an expensive depth camera, so that the cost of obtaining the user's intention from image information is reduced; meanwhile, because the control intention of the user is determined based on the three-dimensional coordinates of the user's pose key points, errors in the determined control intention caused by different shooting angles can be avoided, and the accuracy is high.
Referring to fig. 4, fig. 4 is a flowchart of steps of a control instruction generation method according to another embodiment of the present invention, which may also be applied to a terminal or a controller side of the terminal, and refer to the foregoing description specifically. In the embodiment of the invention, the method mainly comprises the following steps:
step 201: successive two-dimensional image frames are acquired.
In the embodiment of the present invention, the step 201 may refer to the step 101 to avoid repetition, which is not described herein again.
Step 202: calculating user pose keypoint information for each of the successive two-dimensional image frames, the user pose keypoint information comprising: three-dimensional coordinates of user gesture key points.
In this embodiment of the present invention, the step 202 may refer to the step 102, and before the step 202, the following steps may be further included:
step S1: and performing face recognition on each frame in the continuous two-dimensional image frames, and determining authorized users in each frame in the continuous two-dimensional image frames.
Step S2: deleting gesture key points of unauthorized users in each frame, and keeping gesture key points of authorized users.
Specifically, a face image of an authorized user and the like may be stored in advance, face recognition may be performed on each of the continuous two-dimensional image frames based on the face image of the authorized user and the like stored in advance, and whether each of the continuous two-dimensional image frames includes the authorized user or not may be recognized first. If each frame in the continuous two-dimensional image frames does not comprise the authorized user, the steps are repeatedly executed until the authorized user is identified in one frame in the continuous two-dimensional image. If the authorized user is identified, the authorized user and the unauthorized user of each frame in the continuous two-dimensional image frames are distinguished by using face images and the like of the authorized user stored in advance. And deleting the gesture key points of the unauthorized user in each frame of the continuous two-dimensional image frames, only keeping the gesture key points of the authorized user, and then subsequently responding only to the authorized user, so that the unauthorized user can be prevented from operating the terminal, the privacy of the authorized user is protected, and the like.
In the embodiment of the present invention, there may be one or more authorized users corresponding to one terminal, which is not particularly limited in the embodiment of the present invention.
In an embodiment of the present invention, optionally, the step 202 may include the following sub-steps: and calculating gesture key point information of authorized users reserved in each frame of the continuous two-dimensional image frames. Specifically, only the pose key points of authorized users are reserved for each frame in the continuous two-dimensional image frames, and only the pose key point information of the authorized users reserved in each frame in the continuous two-dimensional image frames is calculated, but the pose key point information of unauthorized users in each frame in the continuous two-dimensional image frames is not calculated, so that the calculation amount is reduced, and the speed of calculating the pose key point information is high; on the other hand, the control intention of the authorized user is identified only according to the gesture key point information of the authorized user reserved in each frame of the continuous two-dimensional image frames, the control intention of the unauthorized user is ignored, the unauthorized user can be prevented from operating the terminal, the privacy of the authorized user is protected, and the like.
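The following sketch shows one possible realization of steps S1 and S2 above, assuming each detected person comes with a face embedding; the data layout, the cosine-similarity matching and the threshold are illustrative assumptions rather than requirements of the embodiment.

```python
import numpy as np

def keep_authorized_keypoints(persons, authorized_embeddings, threshold=0.6):
    """Hedged sketch of steps S1/S2: keep pose key points only for people whose
    face matches a pre-stored authorized face; discard everyone else.

    persons: list of {"face_embedding": (D,) array, "keypoints": (K, 3) array}
    authorized_embeddings: list of (D,) arrays for authorized users
    (The data layout and the cosine-similarity threshold are assumptions.)
    """
    kept = []
    for person in persons:
        face = np.asarray(person["face_embedding"], dtype=float)
        face = face / np.linalg.norm(face)
        for ref in authorized_embeddings:
            ref = np.asarray(ref, dtype=float)
            ref = ref / np.linalg.norm(ref)
            if float(face @ ref) >= threshold:   # face recognized as an authorized user
                kept.append(person["keypoints"])
                break
    return kept  # only authorized users' key points are passed to later steps
```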
Step 203: and inputting the user gesture key point information of each frame in the continuous two-dimensional image frames and the control gesture key point information corresponding to each preset control gesture into a gesture matching model to obtain at least one frame of corresponding matching control gesture.
Specifically, a gesture matching model may be obtained through training in advance; the gesture matching model is mainly used for outputting the similarity between two pieces of gesture key point information, and determining two pieces of gesture key point information whose similarity exceeds a set similarity as matched gesture key point information. The control gesture key point information corresponding to each control gesture may be preset in advance. The user posture key point information of each frame in the continuous two-dimensional image frames and the control posture key point information corresponding to each preset control posture are input into the posture matching model; the posture matching model calculates, in sequence, the similarity between the user posture key point information of each frame in the continuous two-dimensional image frames and the control posture key point information corresponding to each preset control posture, and, in the case that the similarity between the user posture key point information of the current frame in the continuous two-dimensional image frames and the control posture key point information corresponding to the preset current control posture exceeds the set similarity, the current control posture is determined as the matching control posture corresponding to the current frame in the continuous two-dimensional image frames. By adopting the gesture matching model, the user's intention can be recognized accurately.
In the embodiment of the present invention, optionally, referring to fig. 5, fig. 5 is a flowchart of a step of determining a matching control gesture in the embodiment of the present invention. The step 203 may include the following steps:
step 2031: inputting the user posture key point information of each frame in the continuous two-dimensional image frame and the control posture key point information corresponding to each preset control posture into the posture matching model to obtain the matching confidence coefficient between the user posture key point information of the current frame and the control posture key point information corresponding to the current control posture;
step 2032: and if the matching confidence exceeds a preset threshold, determining the current control posture as the matching control posture corresponding to the at least one frame.
Specifically, the user posture key point information of each frame in the continuous two-dimensional image frames and the control posture key point information corresponding to each preset control posture may be input into the posture matching model, the posture matching model sequentially calculates the matching confidence between the user posture key point information of each frame in the continuous two-dimensional image frames and the control posture key point information corresponding to each preset control posture, and in the case that the matching confidence between the user posture key point information of the current frame in the continuous two-dimensional image frames and the control posture key point information corresponding to the preset current control posture exceeds a preset threshold, the current control posture may be determined as the matching control posture corresponding to at least one frame in the continuous two-dimensional image frames. It should be noted that the preset threshold may be set according to actual needs, and this is not specifically limited in the embodiment of the present invention. And sequentially determining matching confidence degrees of the user posture key point information of each frame in the continuous two-dimensional image frames and the control posture key point information corresponding to each preset control posture, so that the matching control posture corresponding to at least one frame in the continuous two-dimensional image frames can be accurately determined.
In the embodiment of the present invention, if none of the matching confidences corresponding to a frame in the continuous two-dimensional image frames exceeds the preset threshold, that frame may be considered an invalid frame, or it may be considered that no matching control gesture is found for that frame. If only one matching confidence exceeding the preset threshold exists among the matching confidences corresponding to one frame in the continuous two-dimensional image frames, the preset control posture corresponding to that matching confidence is determined as the matching control posture corresponding to the continuous two-dimensional image frames, or is determined as the matching control posture corresponding to that frame in the continuous two-dimensional image frames. If multiple frames in the continuous two-dimensional image frames each yield a matching confidence exceeding the preset threshold, and the preset control postures corresponding to those matching confidences are the same, that preset control posture may be determined as the matching control posture corresponding to the continuous two-dimensional image frames; or, the preset control postures corresponding to the matching confidences exceeding the preset threshold may be determined, respectively, as the matching control postures corresponding to each of those frames in the continuous two-dimensional image frames.
For example, the user posture key point information of the first frame in the continuous two-dimensional image frames and the control posture key point information corresponding to each preset control posture are input into the posture matching model, and the posture matching model sequentially calculates the matching confidence between the user posture key point information of the first frame and the control posture key point information corresponding to each preset control posture. Whether any matching confidence corresponding to the first frame exceeds the preset threshold is then judged; if so, the preset control posture corresponding to the matching confidence exceeding the preset threshold is determined as the matching control posture corresponding to the first frame, and if not, the first frame is considered an invalid frame, or it is considered that no matching control posture is found for the first frame. A similar calculation process is performed for the user posture key point information of the second frame in the continuous two-dimensional image frames, and so on until the last frame is calculated. If, after the calculation is finished, only the matching confidence between the user posture key point information of the fourth frame in the continuous two-dimensional image frames and the control posture key point information corresponding to the second preset control posture exceeds the preset threshold, and the matching confidences between the user posture key point information of the remaining frames and the control posture key point information corresponding to each preset control posture do not exceed the preset threshold, the second preset control posture may be determined as the matching control posture corresponding to the fourth frame in the continuous two-dimensional image frames; or, the second preset control posture may be determined as the matching control posture corresponding to the continuous two-dimensional image frames.
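The per-frame loop illustrated above might be organized roughly as follows; the trained gesture matching model is treated as a black box, and the 0.8 threshold and the rule of keeping the highest-confidence match are illustrative assumptions.

```python
def match_frames_to_control_gestures(frames_keypoints, control_gestures, match_model,
                                     threshold=0.8):
    """Hedged sketch of the loop illustrated above: each frame is compared against
    every preset control gesture, and a gesture is kept only when its matching
    confidence exceeds the preset threshold.

    match_model(user_kp, control_kp) -> confidence in [0, 1]  (the trained model)
    The 0.8 threshold and "keep the highest-confidence match" rule are assumptions.
    """
    matches = {}
    for frame_idx, user_kp in enumerate(frames_keypoints):
        best_gesture, best_conf = None, threshold
        for gesture_name, control_kp in control_gestures.items():
            conf = match_model(user_kp, control_kp)
            if conf > best_conf:
                best_gesture, best_conf = gesture_name, conf
        if best_gesture is not None:
            matches[frame_idx] = best_gesture   # frames without a match are treated as invalid
    return matches
```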
In the embodiment of the present invention, referring to fig. 6, fig. 6 is a schematic diagram illustrating an operation of a posture matching model in the embodiment of the present invention. The gesture matching model may include: a first fully connected network, a second fully connected network, and a third fully connected network. The specific structures of the first, second, and third fully-connected networks are not particularly limited. Referring to fig. 7, fig. 7 is a flowchart illustrating a step of calculating a matching confidence level according to an embodiment of the present invention. The step 2031 may include the following steps:
step 20311: inputting the user gesture key point information of each frame in the continuous two-dimensional image frames into the first fully-connected network, and inputting the control gesture key point information corresponding to each preset control gesture into the second fully-connected network.
Step 20312: and adding the output vectors of the first fully-connected network and the second fully-connected network, inputting the sum to a third fully-connected network, and outputting the matching confidence coefficient between the user posture key point information of the current frame and the control posture key point information corresponding to the current control posture through the third fully-connected network.
Specifically, the user gesture key point information of each frame in the continuous two-dimensional image frames may be input into a first fully-connected network, and the control gesture key point information corresponding to each preset control gesture may be input into a second fully-connected network. The first full-connection network calculates the user gesture key point information of each frame and outputs a first vector. And the second full-connection network calculates the control posture key point information corresponding to each preset control posture and outputs a second vector. And adding the first vector and the second vector output by the first fully-connected network and the second fully-connected network, inputting the sum to a third fully-connected network, and outputting the matching confidence coefficient between the user posture key point information of the current frame and the control posture key point information corresponding to the current control posture through the third fully-connected network.
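Interpreted as a small neural network, the structure of fig. 6 might be sketched as below; the layer widths, activations and the 21-key-point input size are assumptions, since the embodiment does not fix the internal structure of the three fully connected networks.

```python
import torch
import torch.nn as nn

class GestureMatchModel(nn.Module):
    """Hedged sketch of the three fully connected networks in fig. 6.
    The embodiment only fixes the topology: two parallel fully connected
    networks whose output vectors are added element-wise and fed to a third
    fully connected network that outputs the matching confidence."""
    def __init__(self, num_keypoints=21, hidden=128):
        super().__init__()
        in_dim = num_keypoints * 3  # flattened 3D coordinates
        self.user_net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())     # first FC network
        self.control_net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())  # second FC network
        self.head = nn.Sequential(nn.Linear(hidden, 1), nn.Sigmoid())           # third FC network

    def forward(self, user_kp, control_kp):
        # user_kp, control_kp: (batch, num_keypoints * 3) tensors
        fused = self.user_net(user_kp) + self.control_net(control_kp)  # add the two output vectors
        return self.head(fused).squeeze(-1)                            # matching confidence in [0, 1]
```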
Step 204: and determining the corresponding control intention of the user according to the matching control gesture corresponding to the at least one frame.
In the embodiment of the invention, a corresponding relation between the matching control gesture and the control intention can be established in advance, and the control intention corresponding to the user is determined according to the matching control gesture corresponding to at least one frame in the continuous two-dimensional image frames based on the corresponding relation.
In embodiments of the present invention, the preset control gesture may comprise a static control gesture, and then the matching control gesture may also comprise a static control gesture. For example, the salutation gesture may be a static control gesture.
In this embodiment of the present invention, optionally, if the matching control gesture is a static control gesture, the control intention corresponding to the matching control gesture is determined according to a preset correspondence between the static control gesture and the control intention.
Specifically, the correspondence relationship between static control postures and control intentions may be established in advance. If the matching control posture corresponding to at least one frame in the continuous two-dimensional image frames is a static control posture, the control intention corresponding to that static control posture can be determined, according to the preset correspondence between static control postures and control intentions, as the control intention corresponding to the matching control posture.
In embodiments of the present invention, the preset control gestures may comprise dynamic control gestures, and the matching control gesture may then also be a dynamic control gesture. Dynamic control gestures may be made clearly distinguishable from static control gestures, and individual dynamic control gestures may likewise be made clearly distinguishable from one another. For example, forming the letter "C" with the thumb and index finger may be a dynamic control gesture.
In the embodiment of the present invention, optionally, referring to fig. 8, fig. 8 is a flowchart of a step of determining a control intention in the embodiment of the present invention. The step 204 may include the following steps:
step 2041: and if the matched control gesture is a dynamic control gesture, detecting the variable quantity of the control quantity gesture key point in the dynamic control gesture according to the current frame and at least one frame after the current frame time sequence.
Step 2042: and determining a control intention corresponding to the matched control gesture according to the dynamic control gesture and the variation of the key points of the control quantity gesture.
Specifically, a dynamic control gesture may include control quantity gesture key points, which may be the gesture key points that change most readily among the gesture key points in the dynamic control gesture, and the like. The number of control quantity gesture key points in a dynamic control gesture may be one or more, which is not particularly limited in the embodiment of the present invention. For example, forming the letter "C" with the thumb and index finger may be a dynamic control gesture, and the thumb tip and the index finger tip may be the control quantity gesture key points of that dynamic control gesture.
If the matching control gesture corresponding to the current frame in the continuous two-dimensional image frames is a dynamic control gesture, the current frame and at least one frame after the current frame time sequence can be compared to obtain the variation of the control quantity gesture key point in the current frame and at least one frame after the current frame time sequence. The variation may be a position variation of the control quantity pose key point in the current frame and at least one frame subsequent to the current frame in time sequence, and the like. In the embodiment of the present invention, this is not particularly limited. For example, the variation may be any one of a translation amount, a rotation amount, and a zoom amount of the control amount pose key point in the current frame and at least one frame subsequent to the current frame timing.
For example, forming the letter "C" with the thumb and index finger may be a dynamic control gesture, and the thumb tip and the index finger tip may be the control quantity gesture key points of that dynamic control gesture. If the matching control posture corresponding to the current frame in the continuous two-dimensional image frames is this dynamic control posture, the amount of change in the distance between the thumb tip and the index finger tip may be detected in the current frame and at least one frame after the current frame in time sequence. If it is determined, according to the current frame in the continuous two-dimensional image frames and at least one frame after the current frame in time sequence, that the distance between the user's thumb tip and index finger tip gradually increases from 5 cm to 10 cm, then the detected variation of the control quantity posture key points in the dynamic control posture over the current frame and the at least one later frame is: the distance between the thumb tip and the index finger tip gradually increases from 5 cm to 10 cm.
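For this "C"-shaped example, the variation of the control quantity gesture key points reduces to a change of fingertip distance. The sketch below assumes thumb-tip and index-tip indices of 4 and 8 in the 21-point hand layout of fig. 3, which is a common convention but an assumption here.

```python
import numpy as np

THUMB_TIP, INDEX_TIP = 4, 8   # assumed indices in the 21-point hand layout of fig. 3

def fingertip_distance_change(current_hand_kp, later_hand_kp):
    """Hedged sketch: change of the thumb-tip / index-tip distance between the
    current frame and a later frame, i.e. the variation of the control quantity
    gesture key points for the "C"-shaped dynamic control gesture."""
    def span(kp):
        kp = np.asarray(kp, dtype=float)
        return float(np.linalg.norm(kp[THUMB_TIP] - kp[INDEX_TIP]))
    return span(later_hand_kp) - span(current_hand_kp)  # > 0 means the fingers moved apart
```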
In the embodiment of the present invention, the dynamic control posture and the variation amount of the control amount posture key point in the dynamic control posture may be combined, and the correspondence relationship between each combination and the control intention may be set in advance. Based on the correspondence relationship, a dynamic control posture corresponding to the current frame may be determined, and the variation of the control amount posture key point in the dynamic control posture detected by the current frame and at least one frame chronologically after the current frame may be combined with the corresponding control intention.
In an embodiment of the present invention, optionally, step 2041 may include: and inputting the current frame and at least one frame after the current frame in time sequence into a variable quantity determination model corresponding to the dynamic control posture, and outputting the variable quantity of the control quantity posture key point in the dynamic control posture.
Specifically, for each preset dynamic control gesture, a variation determining model corresponding to the dynamic control gesture may be trained in advance, where the variation determining model is configured to receive a current frame in the continuous two-dimensional image frame and at least one frame after the current frame in time sequence, compare the variation corresponding to the dynamic control gesture in the current frame in the continuous two-dimensional image frame and at least one frame after the current frame in time sequence, and output the variation of the control amount gesture key point in the dynamic control gesture in the current frame in the continuous two-dimensional image frame and at least one frame after the current frame in time sequence. The current frame and at least one frame after the current frame in time sequence may be input into the previously trained variation determination model corresponding to the dynamic control gesture, and the variation of the controlled variable gesture key point in the dynamic control gesture may be output. The variation of the control quantity posture key point in the dynamic control posture can be accurately and quickly obtained through the variation determining model corresponding to the dynamic control posture.
In an embodiment of the present invention, optionally, step 2041 may include: inputting the at least one frame that follows the current frame in time sequence into the gesture matching model to obtain a matching control gesture corresponding to each of these frames; and detecting the variation of the control quantity gesture key point in the dynamic control gesture according to the current frame and any target frame, among the subsequent frames, whose matching control gesture is the same as that of the current frame.
Specifically, the at least one frame that follows the current frame in the continuous two-dimensional image frames may be input into the aforementioned gesture matching model to obtain a matching control gesture for each of these frames. A target frame whose matching control gesture is identical to that of the current frame is then selected, and the variation of the control quantity gesture key point in the dynamic control gesture is calculated only between the current frame and such target frames. Because the current frame and its target frames correspond to the same dynamic control gesture, restricting the calculation to these frames avoids detecting spurious variations and helps improve the accuracy of the detected variation.
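A compact sketch of this frame selection is shown below; `match_gesture` and `keypoint_distance` are hypothetical helpers standing in for the gesture matching model and for the measurement between the control quantity gesture key points.

```python
def variation_over_matching_frames(current_frame, later_frames, match_gesture, keypoint_distance):
    """Detect the key point variation using only target frames whose matching control
    gesture equals that of the current frame."""
    current_gesture = match_gesture(current_frame)
    # Keep only frames that still show the same dynamic control gesture as the current frame.
    target_frames = [f for f in later_frames if match_gesture(f) == current_gesture]
    if not target_frames:
        return 0.0  # nothing comparable follows, so no variation is reported
    start = keypoint_distance(current_frame)
    end = keypoint_distance(target_frames[-1])  # latest target frame in time sequence
    return end - start
```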
Step 205: generate a corresponding control instruction according to the control intention.
In the embodiment of the present invention, reference may be made to the related description of step 104 for this step; to avoid repetition, it is not described again here.
Step 206: send the control instruction to the controlled device, where the controlled device is configured to execute the control instruction.
In the embodiment of the present invention, when the execution subject of the method and the controlled device are not the same device, the control instruction may be sent to the controlled device. The controlled device may be any device capable of executing the control instruction, for example a video playing terminal such as a television set. The execution subject of steps 201 to 205 may exchange data with the controlled device in a wired or wireless manner; the control instruction is sent to the controlled device, and the controlled device executes it so as to be controlled accordingly. This allows the controlled device to be operated in more diverse ways and makes operation more convenient for the user.
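The application does not fix a transport for this step. As a rough sketch, assuming the two devices talk over a plain TCP socket, the sending operation could look as follows; the host, port, and instruction strings are invented for the example.

```python
import socket

def send_control_instruction(instruction: str, host: str = "192.168.1.50", port: int = 9000) -> None:
    """Send a control instruction such as 'pause', 'play' or 'volume_up_5dB' to the controlled device."""
    with socket.create_connection((host, port), timeout=2.0) as conn:
        conn.sendall(instruction.encode("utf-8"))

def dispatch(instruction: str, is_controlled_device: bool) -> None:
    """Send the instruction out, or execute it directly when the recognizing device is
    itself the controlled device (the case discussed further below)."""
    if is_controlled_device:
        print(f"executing {instruction!r} on this device")
    else:
        send_control_instruction(instruction)
```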
For example, continuing the earlier example, suppose the matching control gesture identified for at least one frame of the continuous two-dimensional image frames is a salute gesture, and the control intention corresponding to the salute gesture is: control the controlled device to pause or resume playing. The control instruction determined from this control intention may be: pause playing or resume playing. This instruction may then be sent to the controlled device, for example a television set. If the television is currently playing a video normally, it executes the pause instruction after receiving it and stops playing the current video. If the television is currently paused, it executes the resume instruction after receiving it and continues playing the current video. The user can thus control the television through gesture key point information alone, which increases the diversity of ways in which the television can be operated.
As another example, suppose the matching control gesture identified for at least one frame of the continuous two-dimensional image frames is the thumb and index finger forming the letter "C", and it is determined from the current frame and at least one subsequent frame that the distance between the thumb tip and the index finger tip gradually increases from 5 cm to 10 cm. If the control intention corresponding to this gesture and this variation is: control the controlled device to turn the volume up by 5 decibels, then the control instruction determined from the control intention may be: turn the volume up by 5 decibels. This instruction may then be sent to the controlled device, for example a television set. If the television's current playback volume is 20 decibels, it executes the instruction after receiving it and adjusts the playback volume to 25 decibels.
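The arithmetic behind this example can be made explicit with a few lines; the one-decibel-per-centimetre scale is an assumption used only to reproduce the numbers above.

```python
import numpy as np

def fingertip_distance_cm(thumb_tip_xyz, index_tip_xyz) -> float:
    """Distance between the thumb tip and index finger tip from their 3D coordinates (in cm)."""
    return float(np.linalg.norm(np.asarray(thumb_tip_xyz) - np.asarray(index_tip_xyz)))

start = fingertip_distance_cm((0.0, 0.0, 0.0), (5.0, 0.0, 0.0))   # 5 cm in the current frame
end = fingertip_distance_cm((0.0, 0.0, 0.0), (10.0, 0.0, 0.0))    # 10 cm in a later frame
volume_delta_db = round(end - start)  # +5, i.e. "turn the volume up by 5 decibels"
print(volume_delta_db)
```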
In this embodiment of the present invention, optionally, if the execution subject of steps 201 to 205 and the controlled device are the same device, the controlled device may execute the control instruction directly, without performing the sending operation.
In the embodiment of the invention, the three-dimensional coordinates of each gesture key point of the user are calculated for each frame based only on the acquired continuous two-dimensional image frames, and the control intention of the user is then determined based on the three-dimensional coordinates of the user's gesture key points in at least one of those frames; that is, the control intention is recognized from two-dimensional image frames. Since an ordinary camera can capture two-dimensional images, no expensive depth camera is required, which reduces the cost of obtaining the user's intention from image information. At the same time, because the control intention is determined from the three-dimensional coordinates of the gesture key points, errors caused by differing shooting angles are avoided and the accuracy is high.
Referring to fig. 9, fig. 9 is a flowchart of the steps of a control instruction generation method in an embodiment of the present invention. The method may likewise be applied to a terminal or to a controller on the terminal side; reference may be made to the foregoing description for details. In the embodiment of the invention, the method mainly comprises the following steps:
Step 301: acquire continuous two-dimensional image frames.
In the embodiment of the present invention, reference may be made to step 101 for step 301; to avoid repetition, it is not described again here.
Step 302: perform human body key point detection on each frame in the continuous two-dimensional image frames to obtain a recognition result of the human body key points.
In an embodiment of the present invention, the user posture key point information includes hand gesture key point information; that is, it includes the three-dimensional coordinates of the hand gesture key points. The gesture key points may be key points of the left hand and/or key points of the right hand. For the gesture key points, reference may be made to the foregoing description; to avoid repetition, they are not described again here.
In the embodiment of the invention, human body key point detection may be performed on each frame of the continuous two-dimensional image frames to obtain a recognition result of the human body key points in each frame. That is, the individual human body key points are located, and their coordinates identified, in each of the continuous two-dimensional image frames.
Step 303: determine the left elbow coordinate and/or the right elbow coordinate in each frame according to the recognition result of the human body key points.
In the embodiment of the invention, since the left elbow and/or the right elbow are human body key points, the left elbow coordinate and/or the right elbow coordinate in each frame can be determined from the recognition result of the human body key points. The left elbow coordinate and/or the right elbow coordinate may be two-dimensional or three-dimensional coordinates of the elbow in each frame; this is not specifically limited in the embodiment of the present invention.
Step 304: determine a gesture detection area in each frame according to the left elbow coordinate and/or the right elbow coordinate, the gesture detection area containing the finger key points and wrist key point of the left hand and/or the finger key points and wrist key point of the right hand.
In the embodiment of the present invention, the gesture detection area including the finger keypoints and the wrist keypoints of the left hand and/or the finger keypoints and the wrist keypoints of the right hand in each frame may be determined according to the left-elbow coordinates and/or the right-elbow coordinates in the human body keypoints.
In this embodiment of the present invention, optionally, step 304 may include the following steps: determining the left wrist coordinate and/or the right wrist coordinate of the user in each frame; and determining the gesture detection area according to the left wrist coordinate and the left elbow coordinate, and/or according to the right wrist coordinate and the right elbow coordinate.
The left wrist coordinate and/or the right wrist coordinate of the user may be determined in each frame, and may be two-dimensional coordinates or the like. Since the wrist is a gesture key point, the left wrist coordinate and/or the right wrist coordinate can be determined from each frame according to the key point recognition result; this is not specifically limited in the embodiment of the present invention.
In the embodiment of the present invention, the gesture detection area may cover all of the gesture key points of one hand, or part of them, or all or part of the gesture key points of both hands; in each case it should contain as little content as possible beyond those gesture key points.
For example, FIG. 3 may be a gesture detection area determined from a certain frame.
In the embodiment of the present invention, optionally, the gesture detection area may be a square. The left elbow coordinate of the user may include a first abscissa and a first ordinate, and the right elbow coordinate may include a second abscissa and a second ordinate. The wrist key points may include the left wrist and/or the right wrist; the left wrist coordinate of the user may include a third abscissa and a third ordinate, and the right wrist coordinate may include a fourth abscissa and a fourth ordinate.
Referring to fig. 10, fig. 10 is a flowchart illustrating a step of determining a gesture detection area according to an embodiment of the present invention. The method can comprise the following steps:
Step S1: subtract the third abscissa from 5 times the first abscissa to obtain a first difference.
Step S2: divide the first difference by 4 to obtain the target abscissa of the center of the gesture detection area.
Step S3: subtract the third ordinate from 5 times the first ordinate to obtain a second difference.
Step S4: divide the second difference by 4 to obtain the target ordinate of the center of the gesture detection area.
Step S5: subtract the third abscissa from the first abscissa to obtain a third difference.
Step S6: divide the third difference by 2 to obtain the side length of the gesture detection area.
Specifically, the third abscissa of the user's left wrist is subtracted from 5 times the first abscissa of the user's left elbow to obtain a first difference, and the first difference is divided by 4 to obtain the target abscissa of the center of the gesture detection area. The third ordinate of the left wrist is subtracted from 5 times the first ordinate of the left elbow to obtain a second difference, which is divided by 4 to obtain the target ordinate of the center. The third abscissa of the left wrist is subtracted from the first abscissa of the left elbow to obtain a third difference, which is divided by 2 to obtain the side length of the gesture detection area. A gesture detection area calculated from the left elbow coordinate and the left wrist coordinate is mainly used to frame the left hand; similarly, a gesture detection area calculated from the right elbow coordinate and the right wrist coordinate is mainly used to frame the right hand. The resulting gesture detection area frames all of the gesture key points accurately while containing little content other than those key points, so the accuracy is high.
For example, suppose that in a certain frame the coordinates of the user's left elbow are (x1, y1), the coordinates of the user's left wrist are (x3, y3), and the gesture detection area is a square. Then the target abscissa of the center of the gesture detection area is:
x_center = (5·x1 − x3) / 4
the target ordinate of the center of the gesture detection area is:
y_center = (5·y1 − y3) / 4
and the side length of the gesture detection area is:
side = (x1 − x3) / 2
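A small Python sketch of this computation, following the steps exactly as stated above (the absolute value on the side length is an added safeguard, not part of the text):

```python
def left_hand_detection_area(x1: float, y1: float, x3: float, y3: float):
    """(x1, y1): left elbow; (x3, y3): left wrist. Returns the square area's centre and side."""
    center_x = (5 * x1 - x3) / 4      # steps S1-S2
    center_y = (5 * y1 - y3) / 4      # steps S3-S4
    side = abs(x1 - x3) / 2           # steps S5-S6, with abs() added as a safeguard
    return center_x, center_y, side

print(left_hand_detection_area(400.0, 300.0, 360.0, 260.0))  # -> (410.0, 310.0, 20.0)
```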
step 305: calculating user posture key point information in a gesture detection area in each frame of the continuous two-dimensional image frames; the user gesture keypoint information comprises: three-dimensional coordinates of user gesture key points.
In an embodiment of the present invention, only the user gesture key point information within the gesture detection area of each frame needs to be calculated. Because all or part of the gesture key points lie inside the gesture detection area, attention can be restricted to that area, which avoids processing irrelevant image information in each frame, improves the accuracy of the acquired gesture key point information, and improves efficiency.
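As a sketch (continuing the assumptions of the previous example), each frame could simply be cropped to the detection area before the hand key points are calculated:

```python
import numpy as np

def crop_to_detection_area(frame: np.ndarray, center_x: float, center_y: float, side: float) -> np.ndarray:
    """Return the square patch of the frame covered by the gesture detection area,
    clamped to the image bounds; only this patch is passed to the key point estimator."""
    h, w = frame.shape[:2]
    half = side / 2
    x0, x1 = max(0, int(center_x - half)), min(w, int(center_x + half))
    y0, y1 = max(0, int(center_y - half)), min(h, int(center_y + half))
    return frame[y0:y1, x0:x1]
```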
Step 306: judge, according to the user gesture key point information of at least one frame of the continuous two-dimensional image frames, whether a preparation gesture has been received.
In the embodiment of the invention, a preparation gesture may be preset; it should be clearly distinguishable from the preset dynamic control gestures and static control gestures. For example, the preparation gesture may be a clenched fist, or all five fingers fully extended.
When the user gesture key point information acquired in at least one of the continuous two-dimensional image frames matches the preparation gesture, the preparation gesture can be considered received. When the user gesture key point information of every frame fails to match the preparation gesture, it can be considered that no preparation gesture has been received. Receipt of the preparation gesture indicates that the subsequent gesture key point information of the user is intended for recognizing the control intention.
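A minimal sketch of this gating behaviour, assuming hypothetical helpers `classify_gesture` and `recognize_intention` that stand in for the models described above, and a clenched fist as the preset preparation gesture:

```python
def process_frames(frames, classify_gesture, recognize_intention):
    """Yield control intentions, but only after a preparation gesture has been received."""
    armed = False  # becomes True once the preparation gesture is seen
    for frame in frames:
        if not armed:
            armed = classify_gesture(frame) == "fist"  # the assumed preparation gesture
            continue  # frames before the preparation gesture are ignored
        intention = recognize_intention(frame)
        if intention is not None:
            yield intention  # handed on to control-instruction generation (step 308)
```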
Step 307: when the preparation gesture has been received, identify the control intention of the user according to the user gesture key point information of at least one frame in the continuous two-dimensional image frames.
Further, in the embodiment of the present invention, the control intention of the user is identified from the user gesture key point information of at least one frame only after the acquired gesture information of the user has been recognized as the preparation gesture, which avoids false detection of the control intention.
Step 308: generate a corresponding control instruction according to the control intention.
In the embodiment of the invention, the three-dimensional coordinates of each gesture key point of the user are calculated for each frame based only on the acquired continuous two-dimensional image frames, and the control intention of the user is then determined based on the three-dimensional coordinates of the user's gesture key points in at least one of those frames; that is, the control intention is recognized from two-dimensional image frames. Since an ordinary camera can capture two-dimensional images, no expensive depth camera is required, which reduces the cost of obtaining the user's intention from image information. At the same time, because the control intention is determined from the three-dimensional coordinates of the gesture key points, errors caused by differing shooting angles are avoided and the accuracy is high.
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the embodiments are not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the embodiments. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the embodiments of the application.
Fig. 11 is a control instruction generating apparatus according to an embodiment of the present invention, where the apparatus 400 may include:
an image acquisition module 401, configured to acquire consecutive two-dimensional image frames;
a pose keypoint information calculation module 402 for calculating user pose keypoint information for each of said successive two-dimensional image frames, said user pose keypoint information comprising: three-dimensional coordinates of the user gesture key points;
a control intention identifying module 403, configured to identify a control intention of the user according to user gesture key point information of at least one frame in the consecutive two-dimensional image frames;
and a control instruction generating module 404, configured to generate a corresponding control instruction according to the control intention.
Optionally, on the basis of fig. 11, referring to fig. 12, the control intention identifying module 403 may include:
a matching control gesture determining submodule 4031, configured to input, into the gesture matching model, the user gesture key point information of each frame in the continuous two-dimensional image frames together with the control gesture key point information corresponding to each preset control gesture in turn, so as to obtain a matching control gesture corresponding to at least one frame;
a first control intention recognition sub-module 4032, configured to determine a control intention corresponding to the user according to the matching control gesture corresponding to the at least one frame.
Optionally, the matching control gesture determination submodule 4031 may include:
a matching confidence determining unit 40311, configured to input, to the gesture matching model, the user gesture key point information of each frame in the continuous two-dimensional image frames and the control gesture key point information corresponding to each preset control gesture in sequence, so as to obtain a matching confidence between the user gesture key point information of the current frame and the control gesture key point information corresponding to the current control gesture;
a matching control gesture determining unit 40312, configured to determine the current control gesture as the matching control gesture corresponding to the at least one frame if the matching confidence exceeds a preset threshold.
Optionally, the first control intention identifying sub-module 4032 may include:
a first control intention recognition unit 40321, configured to, if the matching control gesture is a static control gesture, determine a control intention corresponding to the matching control gesture according to a preset correspondence between the static control gesture and the control intention.
Optionally, the first control intention identifying sub-module 4032 may include:
a variation detecting unit 40322, configured to detect, if the matching control gesture is a dynamic control gesture, a variation of a control amount gesture key point in the dynamic control gesture according to the current frame and at least one frame subsequent to the current frame in time sequence;
a second control intention recognition unit 40323, configured to determine a control intention corresponding to the matching control gesture according to the dynamic control gesture and the variation amount of the control amount gesture key point.
Optionally, the variation detecting unit 40322 may include:
a first variation detecting subunit 403221, configured to input the current frame and at least one frame after the current frame in time sequence into a variation determining model corresponding to the dynamic control gesture, and output a variation of a control amount gesture key point in the dynamic control gesture.
Optionally, the variation detecting unit 40322 may include:
a pose determining subunit, configured to input the at least one frame after the current frame time sequence into the pose matching model, so as to obtain a matching control pose corresponding to each frame in the at least one frame after the current frame time sequence;
and a second variation detecting subunit, configured to detect a variation of a control amount pose key point in the dynamic control pose according to the current frame and a target frame, in which a matching control pose of the target frame is the same as a matching pose of the current frame, in at least one frame subsequent to a time sequence of the current frame.
Optionally, the gesture matching model includes: a first fully connected network, a second fully connected network, and a third fully connected network; the matching confidence determination unit 40311 may include:
an input subunit, configured to input user gesture key point information of each frame in the consecutive two-dimensional image frames into the first fully connected network, and input control gesture key point information corresponding to each preset control gesture into the second fully connected network;
a matching confidence determining subunit, configured to add output vectors of the first fully connected network and the second fully connected network, input the added output vectors to the third fully connected network, and output, through the third fully connected network, a matching confidence between the user posture key point information of the current frame and the control posture key point information corresponding to the current control posture.
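Purely as an illustration of this structure (layer sizes, the 21-key-point hand model, and the sigmoid output are our assumptions, not details fixed by the application), the three fully connected networks could be sketched in PyTorch as follows.

```python
import torch
import torch.nn as nn

class GestureMatchingModel(nn.Module):
    """Two fully connected branches whose output vectors are added, followed by a third
    fully connected network that outputs a matching confidence."""
    def __init__(self, num_keypoints: int = 21, hidden: int = 128):
        super().__init__()
        in_dim = num_keypoints * 3  # (x, y, z) per key point
        self.user_branch = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())  # first network
        self.ctrl_branch = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())  # second network
        self.head = nn.Sequential(nn.Linear(hidden, 1), nn.Sigmoid())           # third network

    def forward(self, user_kp: torch.Tensor, ctrl_kp: torch.Tensor) -> torch.Tensor:
        fused = self.user_branch(user_kp) + self.ctrl_branch(ctrl_kp)  # add the output vectors
        return self.head(fused)  # matching confidence in [0, 1]

model = GestureMatchingModel()
confidence = model(torch.rand(1, 63), torch.rand(1, 63))
print(float(confidence))  # compared against a preset threshold to decide a match
```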
Alternatively, as shown in fig. 13 on the basis of fig. 11, the posture key point information may include gesture key point information, and the apparatus may further include:
a human body key point identification module 405, configured to perform human body key point detection on each frame in the continuous two-dimensional image frames to obtain an identification result of a human body key point;
an elbow coordinate determining module 406, configured to determine, according to the recognition result of the human body keypoints, a left elbow coordinate and/or a right elbow coordinate in each frame;
a gesture detection area determining module 407, configured to determine a gesture detection area in each frame according to the left elbow coordinate and/or the right elbow coordinate; the gesture detection area comprises finger key points and wrist key points of a left hand and/or finger key points and wrist key points of a right hand;
the gesture keypoint information calculation module 402 may include:
the gesture key point information first calculating submodule 4021 is configured to calculate user gesture key point information in a gesture detection area in each of the consecutive two-dimensional image frames.
Optionally, the apparatus may further include:
the face recognition module is used for carrying out face recognition on each frame in the continuous two-dimensional image frames and determining authorized users in each frame in the continuous two-dimensional image frames;
a deleting module, configured to delete the gesture key points of the unauthorized user in each frame, and keep the gesture key points of the authorized user;
the gesture keypoint information calculation module 402 may include:
and the second calculation submodule of the gesture key point information is used for calculating the gesture key point information of the authorized user reserved in each frame of the continuous two-dimensional image frames.
Optionally, the apparatus may further include:
a preparation gesture determining module 408, configured to determine whether a preparation gesture is received according to user gesture key point information of at least one frame of the consecutive two-dimensional image frames;
the control intention identifying module 403 may include:
a second control intention identifying sub-module 4033, configured to identify a control intention of the user according to user gesture key point information of at least one frame in the consecutive two-dimensional image frames when the preparation gesture is received.
Optionally, on the basis of fig. 11, referring to fig. 12, the apparatus further includes:
a control instruction sending module 409, configured to send the control instruction to a controlled device; the controlled equipment is used for executing the control instruction.
For the device embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
An embodiment of the present invention further provides an electronic device, as shown in fig. 14, including a processor 91, a communication interface 92, a memory 93, and a communication bus 94, where the processor 91, the communication interface 92, and the memory 93 complete mutual communication through the communication bus 94,
a memory 93 for storing a computer program;
the processor 91, when executing the program stored in the memory 93, implements the following steps:
acquiring continuous two-dimensional image frames;
calculating user pose keypoint information for each of the successive two-dimensional image frames, the user pose keypoint information comprising: three-dimensional coordinates of the user gesture key points;
identifying the control intention of the user according to the user gesture key point information of at least one frame in the continuous two-dimensional image frames;
and generating a corresponding control instruction according to the control intention.
The communication bus mentioned in the above terminal may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the terminal and other equipment.
The Memory may include a Random Access Memory (RAM) or a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The Processor may be a general-purpose Processor, and includes a Central Processing Unit (CPU), a Network Processor (NP), and the like; the device can also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, a discrete Gate or transistor logic device, or a discrete hardware component.
In yet another embodiment of the present invention, a computer-readable storage medium is further provided, which stores instructions that, when executed on a computer, cause the computer to execute the control instruction generation method described in any of the above embodiments.
In yet another embodiment, the present invention further provides a computer program product containing instructions, which when run on a computer, causes the computer to execute the control instruction generation method described in any of the above embodiments.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via a wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave, etc.) connection. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (15)

1. A control instruction generation method, the method comprising:
acquiring continuous two-dimensional image frames;
calculating user pose keypoint information for each of the successive two-dimensional image frames, the user pose keypoint information comprising: three-dimensional coordinates of the user gesture key points;
identifying the control intention of the user according to the user gesture key point information of at least one frame in the continuous two-dimensional image frames;
and generating a corresponding control instruction according to the control intention.
2. The method of claim 1, wherein identifying the control intent of the user from the user gesture keypoint information of at least one of the consecutive two-dimensional image frames comprises:
inputting the user gesture key point information of each frame in the continuous two-dimensional image frames and the control gesture key point information corresponding to each preset control gesture into a gesture matching model to obtain at least one frame of corresponding matching control gesture;
and determining the corresponding control intention of the user according to the matching control gesture corresponding to the at least one frame.
3. The method according to claim 2, wherein the inputting the user gesture key point information of each frame in the continuous two-dimensional image frames and the control gesture key point information corresponding to each preset control gesture into the gesture matching model to obtain at least one frame of corresponding matching control gesture comprises:
inputting the user posture key point information of each frame in the continuous two-dimensional image frame and the control posture key point information corresponding to each preset control posture into the posture matching model to obtain the matching confidence coefficient between the user posture key point information of the current frame and the control posture key point information corresponding to the current control posture;
and if the matching confidence exceeds a preset threshold, determining the current control posture as the matching control posture corresponding to the at least one frame.
4. The method of claim 3, wherein determining the corresponding control intent of the user based on the corresponding matching control gesture of the at least one frame comprises:
and if the matching control posture is a static control posture, determining a control intention corresponding to the matching control posture according to a preset corresponding relation between the static control posture and the control intention.
5. The method of claim 3, wherein determining the corresponding control intent of the user based on the corresponding matching control gesture of the at least one frame comprises:
if the matched control gesture is a dynamic control gesture, detecting the variable quantity of the control quantity gesture key point in the dynamic control gesture according to the current frame and at least one frame after the current frame time sequence;
and determining a control intention corresponding to the matched control gesture according to the dynamic control gesture and the variation of the key points of the control quantity gesture.
6. The method according to claim 5, wherein the detecting the variation of the control quantity pose key point in the dynamic control pose according to the current frame and at least one frame after the current frame time sequence comprises:
and inputting the current frame and at least one frame after the current frame in time sequence into a variable quantity determination model corresponding to the dynamic control posture, and outputting the variable quantity of the control quantity posture key point in the dynamic control posture.
7. The method according to claim 5, wherein the detecting the variation of the control quantity pose key point in the dynamic control pose according to the current frame and at least one frame after the current frame time sequence comprises:
inputting at least one frame after the current frame time sequence into the gesture matching model to obtain a matching control gesture corresponding to each frame in the at least one frame after the current frame time sequence;
and detecting the variation of the key point of the control amount posture in the dynamic control posture according to the current frame and a target frame with the same matching control posture as that of the current frame in at least one frame after the current frame time sequence.
8. The method of claim 3, wherein the gesture matching model comprises: a first fully connected network, a second fully connected network, and a third fully connected network; inputting the user posture key point information of each frame in the continuous two-dimensional image frame and the control posture key point information corresponding to each preset control posture into the posture matching model to obtain the matching confidence coefficient between the user posture key point information of the current frame and the control posture key point information corresponding to the current control posture, and the method comprises the following steps:
inputting user gesture key point information of each frame in the continuous two-dimensional image frames into the first fully-connected network, and inputting control gesture key point information corresponding to each preset control gesture into the second fully-connected network;
and adding the output vectors of the first fully-connected network and the second fully-connected network, inputting the sum to a third fully-connected network, and outputting the matching confidence coefficient between the user posture key point information of the current frame and the control posture key point information corresponding to the current control posture through the third fully-connected network.
9. The method of claim 1, wherein the user pose keypoint information comprises: gesture keypoint information, and said calculating user pose keypoint information for each of said successive two-dimensional image frames further comprises:
detecting human key points of each frame in the continuous two-dimensional image frames to obtain a recognition result of the human key points;
determining the left elbow coordinate and/or the right elbow coordinate in each frame according to the recognition result of the key points of the human body;
determining a gesture detection area in each frame according to the left elbow coordinate and/or the right elbow coordinate; the gesture detection area comprises finger key points and wrist key points of a left hand and/or finger key points and wrist key points of a right hand;
the calculating user gesture keypoint information for each of the successive two-dimensional image frames comprises:
and calculating user posture key point information in a gesture detection area in each frame of the continuous two-dimensional image frames.
10. The method of claim 1, wherein prior to said computing user pose keypoint information for each of said successive two-dimensional image frames, further comprising:
performing face recognition on each frame in the continuous two-dimensional image frames, and determining an authorized user in each frame in the continuous two-dimensional image frames;
deleting the gesture key points of the unauthorized user in each frame, and keeping the gesture key points of the authorized user;
the calculating user gesture keypoint information for each of the successive two-dimensional image frames comprises:
and calculating gesture key point information of authorized users reserved in each frame of the continuous two-dimensional image frames.
11. The method of claim 1, wherein before identifying the control intent of the user based on the user gesture keypoint information of at least one of the consecutive two-dimensional image frames, further comprising:
judging whether a preparation gesture is received or not according to the user gesture key point information of at least one frame of the continuous two-dimensional image frames;
the identifying the control intention of the user according to the user gesture key point information of at least one frame in the continuous two-dimensional image frames comprises:
and under the condition that a preparation gesture is received, identifying the control intention of the user according to the user gesture key point information of at least one frame in the continuous two-dimensional image frames.
12. The method of claim 1, further comprising:
sending the control instruction to the controlled equipment; the controlled equipment is used for executing the control instruction.
13. A control instruction generating apparatus, characterized in that the apparatus comprises:
the image acquisition module is used for acquiring continuous two-dimensional image frames;
a pose key point information calculation module for calculating user pose key point information for each of the successive two-dimensional image frames, the user pose key point information comprising: three-dimensional coordinates of the user gesture key points;
the control intention identification module is used for identifying the control intention of the user according to the key point information of the user posture of at least one frame in the continuous two-dimensional image frames;
and the control instruction generating module is used for generating a corresponding control instruction according to the control intention.
14. An electronic device, characterized by comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any of claims 1-12 when executing a program stored in the memory.
15. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-12.
CN201911329945.7A 2019-12-20 2019-12-20 Control instruction generation method and device Pending CN111103981A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911329945.7A CN111103981A (en) 2019-12-20 2019-12-20 Control instruction generation method and device


Publications (1)

Publication Number Publication Date
CN111103981A true CN111103981A (en) 2020-05-05

Family

ID=70422774

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911329945.7A Pending CN111103981A (en) 2019-12-20 2019-12-20 Control instruction generation method and device

Country Status (1)

Country Link
CN (1) CN111103981A (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140169621A1 (en) * 2012-12-13 2014-06-19 Intel Corporation Gesture pre-processing of video stream to reduce platform power
CN107194361A (en) * 2017-05-27 2017-09-22 成都通甲优博科技有限责任公司 Two-dimentional pose detection method and device
CN110020633A (en) * 2019-04-12 2019-07-16 腾讯科技(深圳)有限公司 Training method, image-recognizing method and the device of gesture recognition model
CN110147767A (en) * 2019-05-22 2019-08-20 深圳市凌云视迅科技有限责任公司 Three-dimension gesture attitude prediction method based on two dimensional image
CN110443154A (en) * 2019-07-15 2019-11-12 北京达佳互联信息技术有限公司 Three-dimensional coordinate localization method, device, electronic equipment and the storage medium of key point

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112947755A (en) * 2021-02-24 2021-06-11 Oppo广东移动通信有限公司 Gesture control method and device, electronic equipment and storage medium
WO2022179376A1 (en) * 2021-02-24 2022-09-01 Oppo广东移动通信有限公司 Gesture control method and apparatus, and electronic device and storage medium
CN113342170A (en) * 2021-06-11 2021-09-03 北京字节跳动网络技术有限公司 Gesture control method, device, terminal and storage medium
CN116935495A (en) * 2023-09-18 2023-10-24 深圳中宝新材科技有限公司 Intelligent key alloy wire cutting process user gesture detection method
CN116935495B (en) * 2023-09-18 2024-01-05 深圳中宝新材科技有限公司 Intelligent key alloy wire cutting process user gesture detection method

Similar Documents

Publication Publication Date Title
CN110991319B (en) Hand key point detection method, gesture recognition method and related device
TWI690842B (en) Method and apparatus of interactive display based on gesture recognition
WO2022166243A1 (en) Method, apparatus and system for detecting and identifying pinching gesture
CN111103981A (en) Control instruction generation method and device
CN112506340B (en) Equipment control method, device, electronic equipment and storage medium
CN108616712B (en) Camera-based interface operation method, device, equipment and storage medium
TW201939260A (en) Method, apparatus, and terminal for simulating mouse operation by using gesture
CN110837792A (en) Three-dimensional gesture recognition method and device
WO2022174594A1 (en) Multi-camera-based bare hand tracking and display method and system, and apparatus
WO2023273372A1 (en) Gesture recognition object determination method and apparatus
CN111273772A (en) Augmented reality interaction method and device based on slam mapping method
KR20230080938A (en) Method and apparatus of gesture recognition and classification using convolutional block attention module
CN116766213B (en) Bionic hand control method, system and equipment based on image processing
CN115565241A (en) Gesture recognition object determination method and device
WO2024012268A1 (en) Virtual operation method and apparatus, electronic device, and readable storage medium
WO2023070933A1 (en) Gesture recognition method and apparatus, device, and medium
US20210326657A1 (en) Image recognition method and device thereof and ai model training method and device thereof
CN114360047A (en) Hand-lifting gesture recognition method and device, electronic equipment and storage medium
CN113282164A (en) Processing method and device
CN111522447A (en) Display switching method, display terminal, electronic device, and storage medium
TWI815593B (en) Method and system for detecting hand gesture, and computer readable storage medium
CN111159682B (en) Man-machine interaction authentication method and device, electronic equipment and storage medium
US20230168746A1 (en) User interface method system
CN108961414A (en) A kind of display control method and device
CN116092110A (en) Gesture semantic recognition method, electronic device, storage medium and program product

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination