CN111367415B - Equipment control method and device, computer equipment and medium - Google Patents

Equipment control method and device, computer equipment and medium

Info

Publication number
CN111367415B
Authority
CN
China
Prior art keywords
image
gesture
target video
frame
determining
Prior art date
Legal status
Active
Application number
CN202010186785.1A
Other languages
Chinese (zh)
Other versions
CN111367415A (en)
Inventor
何吉波
谭北平
谭志鹏
Current Assignee
Tsinghua University
Beijing Mininglamp Software System Co ltd
Original Assignee
Tsinghua University
Beijing Mininglamp Software System Co ltd
Priority date
Filing date
Publication date
Application filed by Tsinghua University and Beijing Mininglamp Software System Co ltd
Priority to CN202010186785.1A
Publication of CN111367415A (2020-07-03)
Application granted
Publication of CN111367415B (2024-01-23)
Legal status: Active
Anticipated expiration


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; output arrangements for transferring data from the processing unit to the output unit, e.g. interface arrangements
    • G06F 3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/017: Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G06F 3/048: Interaction techniques based on graphical user interfaces [GUI]
    • G06F 3/0487: Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20: Movements or behaviour, e.g. gesture recognition
    • G06V 40/28: Recognition of hand or arm movements, e.g. recognition of deaf sign language

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application provides a device control method and apparatus, a computer device and a medium. The method includes: acquiring a target video for controlling a target device; determining a gesture pattern and a gesture motion track appearing in the target video according to multiple frames of images specified in the target video; determining a control instruction corresponding to the target video according to the gesture pattern and the gesture motion track; and controlling the target device according to the control instruction. Because the gesture pattern and gesture motion track are determined from the captured video and the control instruction is derived from them, the user does not need to wear a data glove matched to the target device; this removes cumbersome steps and improves the convenience of controlling the target device.

Description

Equipment control method and device, computer equipment and medium
Technical Field
The present application relates to the field of intelligent recognition, and in particular, to a method and apparatus for controlling a device, a computer device, and a medium.
Background
With the development of science and technology, remote control has gradually come into everyday use. Remote control enables interaction between people and devices: a device can carry out a person's instructions without being touched. It can greatly improve productivity and work efficiency.
Generally, people achieve remote control by wearing a data glove. A data glove is fitted with multiple sensors through which the gestures made by the gloved hand can be recognized, and the corresponding device is remotely controlled according to the recognized gesture. This way of controlling a device requires wearing the designated data glove; the device cannot be controlled without it, which reduces the convenience of control.
Disclosure of Invention
In view of the foregoing, an object of the present application is to provide a device control method, apparatus, computer device and medium for solving the problem in the prior art of how to remotely control a device.
In a first aspect, an embodiment of the present application provides a device control method, including:
acquiring a target video for controlling a target device;
determining a gesture pattern and a gesture motion track appearing in the target video according to multiple frames of images specified in the target video;
determining a control instruction corresponding to the target video according to the gesture pattern and the gesture motion track;
and controlling the target device according to the control instruction.
Optionally, after the target video for controlling the target device is acquired, and before the gesture pattern and gesture motion track appearing in the target video are determined according to the multi-frame images specified in the target video, the method further includes:
for each frame of image in the target video, performing foreground extraction based on the color value of each part of the image, so as to determine the region image in which the hand is located in that frame;
and determining whether the target video is a valid target video according to the area of the region image in which the hand is located in each frame of the target video.
Optionally, determining the gesture pattern and the gesture motion track appearing in the target video according to the multi-frame images specified in the target video includes:
determining, for each frame of image in the target video, the outline of the hand and the position information of a positioning point in that frame;
determining the gesture pattern according to the similarity between the outline of the hand in each frame and the outlines of the candidate standard gestures;
and determining the gesture motion track according to the position information and time information of the positioning point in each frame.
Optionally, the positioning point includes any one or more of the following positions:
thumb tip position, index finger tip position, middle finger tip position, ring finger tip position, little finger tip position, and centroid position.
Optionally, after the control instruction corresponding to the target video is determined according to the gesture pattern and the gesture motion track, and before the target device is controlled according to the control instruction, the method further includes:
sending the control instruction to a user terminal, so that the user terminal prompts the control instruction in a message prompt mode;
and receiving a reply instruction from the user terminal for the prompted control instruction.
Optionally, the message prompt mode includes any one or more of the following modes:
a text prompt mode, an image prompt mode and a voice broadcast prompt mode.
In a second aspect, an embodiment of the present application provides a device control apparatus, including:
an acquisition module, configured to acquire a target video for controlling a target device;
a first determining module, configured to determine a gesture pattern and a gesture motion track appearing in the target video according to multiple frames of images specified in the target video;
a second determining module, configured to determine a control instruction corresponding to the target video according to the gesture pattern and the gesture motion track;
and a control module, configured to control the target device according to the control instruction.
Optionally, the apparatus further includes:
an extraction module, configured to perform, for each frame of image in the target video, foreground extraction based on the color value of each part of the image, so as to determine the region image in which the hand is located in that frame;
and a judging module, configured to determine whether the target video is a valid target video according to the area of the region image in which the hand is located in each frame of the target video.
In a third aspect, embodiments of the present application provide a computer device including a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the above method when executing the computer program.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the above-described method.
An embodiment of the present application provides a device control method: first, a target video for controlling a target device is acquired; second, a gesture pattern and a gesture motion track appearing in the target video are determined according to multiple frames of images specified in the target video; third, a control instruction corresponding to the target video is determined according to the gesture pattern and the gesture motion track; and finally, the target device is controlled according to the control instruction.
In the embodiments of the present application, the gesture pattern and gesture motion track are determined from the acquired video, and the control instruction is determined from them to control the target device. The user therefore does not need to wear a data glove matched to the target device, which removes cumbersome steps and improves the convenience of controlling the target device.
In order to make the above objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be considered limiting of its scope; from these drawings, a person skilled in the art can obtain other related drawings without inventive effort.
Fig. 1 is a schematic flow chart of a control method of a device according to an embodiment of the present application;
fig. 2 is a flow chart of a method for determining a gesture pattern and a gesture motion track according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a control device of an apparatus according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a computer device 400 according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present application. The components of the embodiments, as generally described and illustrated in the figures, may be arranged and designed in a wide variety of different configurations. The following detailed description of the embodiments is therefore not intended to limit the claimed scope of the application but merely represents selected embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without inventive effort fall within the protection scope of the present application.
At present, in man-machine interaction, a user wears a data glove matched with the target device (the device to be controlled); the gestures the user makes while wearing the glove are recognized, and the target device is controlled through the recognized gestures so that it completes the corresponding actions.
Controlling the target device comprises the steps of:
step 1, acquiring gestures made by a hand wearing a data glove;
step 2, determining a control instruction corresponding to the gesture in a database according to the gesture;
and step 3, controlling the target equipment according to the control instruction.
In this way of controlling the target device, the device can only be controlled by wearing a data glove matched with it, and the user's hand gestures must be recognized entirely through the glove. The glove therefore needs many sensors, which makes it expensive, and it cannot fit every user's hand perfectly, which degrades the user experience. Moreover, the user must put on the corresponding data glove every time the target device is controlled, which reduces both the convenience and the efficiency of controlling it.
For the above reasons, an embodiment of the present application provides a device control method, as shown in fig. 1, including the following steps:
s101, acquiring a target video for controlling target equipment;
s102, determining a gesture pattern and a gesture motion track which appear in a target video according to a multi-frame image appointed in the target video;
s103, determining a control instruction corresponding to the target video according to the gesture pattern and the gesture motion track;
s104, controlling the target equipment according to the control instruction.
In step S101, the target device is the device that needs to be controlled, and may include any one or more of the following: a robot, a robotic arm, a smart air conditioner, a smart TV, a smart light, and the like. The target video is a video that can be used to control the target device and must contain hand motion; the device that captures the target video may be a camera, a smart device, and the like.
Specifically, the solution of the present application requires an image capturing apparatus to shoot the target video; the target video is acquired through this apparatus, and the subsequent steps S102 to S104 can be performed only once it has been acquired.
In step S102, the specified multi-frame images may be frames extracted from the target video at a preset time interval; the preset interval may be 1 second, 2 seconds, 3 seconds, and so on, and is not limited here. A gesture pattern is a hand shape formed by the configuration of the fingers, and includes any one of a fist pattern, a one-finger protruding pattern, a two-finger protruding pattern, a three-finger protruding pattern, a four-finger protruding pattern, a five-finger protruding pattern, and the like;
wherein the one-finger protruding pattern includes any one of the following: a thumb protruding pattern, an index finger protruding pattern, a middle finger protruding pattern, a ring finger protruding pattern, a little finger protruding pattern, etc.;
the two-finger protruding pattern includes any one of the following: an index finger and thumb protruding pattern, an index finger and middle finger protruding pattern, a middle finger and ring finger protruding pattern, a thumb and little finger protruding pattern, etc.;
the three-finger protruding pattern includes any one of the following: an index finger, middle finger and ring finger protruding pattern, a middle finger, ring finger and little finger protruding pattern, etc.;
the four-finger protruding pattern includes any one of the following: an index finger, middle finger, ring finger and little finger protruding pattern, etc.;
the five-finger protruding pattern includes the thumb, index finger, middle finger, ring finger and little finger protruding pattern, etc.
The gesture motion track is the path along which the hand moves in the target video while the gesture pattern remains unchanged, and includes any one of an upward translation, a downward translation, a leftward translation, a rightward translation, a horizontal clockwise rotation, a horizontal counterclockwise rotation, a clockwise rotation perpendicular to the horizontal plane, a counterclockwise rotation perpendicular to the horizontal plane, and the like.
Specifically, for each of the specified multi-frame images in the target video, the region image in which the hand is located and the position of the hand in that frame are determined; the gesture pattern is determined from the outline of the region image in which the hand is located, and the positions of the hand in the frames are sorted by time, after which the gesture motion track is determined.
For example, 5 frames are extracted from a target video. In each frame the gesture pattern is an index finger protruding pattern, and the hand is 2 cm from the bottom edge of the frame in the first frame, 3 cm in the second, 4 cm in the third, 5 cm in the fourth and 6 cm in the fifth. From these 5 frames, the gesture pattern in the target video is determined to be the index finger protruding pattern, and the gesture motion track is an upward translation.
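To make the frame-selection step concrete, the following is a minimal sketch of extracting the specified multi-frame images at a preset time interval. It assumes OpenCV (`cv2`) and a default 1-second interval; neither is mandated by the application.

```python
import cv2

def extract_frames(video_path: str, interval_s: float = 1.0):
    """Return (timestamp, frame) pairs sampled every interval_s seconds."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0    # fall back if FPS metadata is missing
    step = max(1, int(round(fps * interval_s)))
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            frames.append((idx / fps, frame))  # keep time information for the track
        idx += 1
    cap.release()
    return frames
```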
In step S103, a control instruction is an instruction that can be used to control the target device; different combinations of gesture pattern and gesture motion track correspond to different control instructions.
Specifically, after the gesture pattern and gesture motion track are determined, the control instruction associated with them is retrieved from a control instruction database, in which gesture patterns, gesture motion tracks and control instructions are stored in association.
For example, in the control instruction database: a thumb protruding pattern with an upward translation represents a move-forward instruction; an index finger protruding pattern with an upward translation represents a move-backward instruction; a middle finger protruding pattern with an upward translation represents a move-left instruction; a ring finger protruding pattern with an upward translation represents a move-right instruction; and a little finger protruding pattern with an upward translation represents a pivot instruction. So when an index finger protruding pattern with an upward translation is detected in the target video, the control instruction represented by the target video can be determined, from the database, to be the move-backward instruction.
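As a hedged illustration of such a database, the example combinations above can be stored as a simple lookup table; the key and instruction names below are illustrative stand-ins, not the patent's actual schema.

```python
# Illustrative control instruction database: (gesture pattern, motion track) -> instruction.
CONTROL_INSTRUCTIONS = {
    ("thumb_protruding",  "translate_up"): "move_forward",
    ("index_protruding",  "translate_up"): "move_backward",
    ("middle_protruding", "translate_up"): "move_left",
    ("ring_protruding",   "translate_up"): "move_right",
    ("little_protruding", "translate_up"): "pivot",
}

def lookup_instruction(pattern: str, track: str):
    # None means the combination has no associated control instruction.
    return CONTROL_INSTRUCTIONS.get((pattern, track))
```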
In step S104, after the control instruction is determined, it needs to be sent to the target device, which then performs the corresponding action.
Through the above four steps, the present scheme determines the gesture pattern and gesture motion track in the acquired video and determines the control instruction from them, so that the target device is controlled without the user wearing a data glove matched to it; this removes cumbersome steps and improves the convenience of controlling the target device.
When a video is shot in a natural environment, many things besides the hand are captured, so the target video may contain the user's hand, the user's arm and other objects in the user's surroundings. These other objects interfere with recognizing the gesture pattern and gesture motion track, so their interference needs to be removed; the following steps may be adopted:
Step 105, for each frame of image in the target video, performing foreground extraction based on the color value of each part of the image, so as to determine the region image in which the hand is located in that frame;
and step 106, determining whether the target video is a valid target video according to the area of the region image in which the hand is located in each frame of the target video.
In step 105, a color value is the color characteristic value of a pixel; it may be expressed in the red-green-blue color space or in the hue-saturation-lightness color space. The color value of each part means the color value of each pixel. Foreground extraction means distinguishing the region image in which the hand is located from the background region image in each frame, and removing the background region image.
Performing foreground extraction on each frame in step 105 removes the background region of the frame and keeps only the region image in which the hand is located, which reduces the background's interference with recognizing the gesture pattern and gesture motion track.
The foreground extraction of each frame of image in the application comprises the following steps:
step 1051, gray scale processing is performed on the frame image;
Step 1052, binarizing the grayed frame image according to a first color threshold;
step 1053, performing noise reduction processing on the frame image after the binarization processing to obtain a binary image;
and step 1054, removing the background in the binary image according to a second color threshold, so as to keep the region image in which the hand is located.
In step 1051, graying adjusts the color of each pixel to a gray level; that is, the red, green and blue color space values of the pixel are set to one uniform value, so that the pixel appears gray.
Specifically, graying the frame image can be implemented by any one of the following algorithms: the component algorithm, the maximum value algorithm, the average value algorithm, the weighted average algorithm, etc. Graying an image with any of these algorithms is a common technique in the art and is not described in detail here.
In step 1052, the first color threshold is the range of color values of all pixels in the region image in which the hand is located. Binarization unifies the color values of the pixels of the hand region image to one value and the color values of the background region image to another value; for example, the color value of the hand region image is set to 0 and that of the background region image to 255.
Specifically, the range of color values of the pixels in the hand region image (i.e., the first color threshold) is determined; the color values of pixels falling within the first color threshold are unified to one value, and those outside it are unified to another value.
For example, when color values are expressed as hue, saturation and lightness, and the pixels of the hand region image fall within the hue range (2, 28) and the saturation range (50, 200), the pixels whose color values fall within this range are set to 0, and those outside it are set to 255.
In step 1053, the noise reduction process refers to smoothing the edges of the hand in the region image.
Specifically, the binarized frame image is first median-filtered to sharpen its edges; then morphological dilation and erosion are applied to eliminate noise regions in the hand region image (regions with the background color value whose area is far smaller than the gesture region image) and noise regions in the background image (regions with the hand color value whose area is far smaller than the gesture region image); finally, Gaussian filtering is applied to smooth the edge of the region image in which the hand is located.
In step 1054, the second color threshold refers to the color value of the pixel of the image of the region where the hand is located (e.g., the color value is 0).
Specifically, according to the color values in the frame image, the region image in which the hand is located can be separated from the background region image, and the image region matching the second color threshold is kept.
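The four sub-steps can be sketched as follows. This is a minimal OpenCV rendering that assumes the example thresholds above (hue (2, 28), saturation (50, 200), lightness unconstrained); it thresholds directly in the hue-saturation-lightness space, so the resulting single-channel mask plays the role of the grayed, binarized frame.

```python
import cv2
import numpy as np

def extract_hand_region(frame_bgr):
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    # Step 1052: binarize by the first color threshold (hand pixels -> 255 here;
    # the 0/255 convention in the text is simply inverted).
    mask = cv2.inRange(hsv, np.array([2, 50, 0]), np.array([28, 200, 255]))
    # Step 1053: noise reduction - median filter, then morphological dilation
    # and erosion (a closing), then Gaussian smoothing of the hand's edge.
    mask = cv2.medianBlur(mask, 5)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    mask = cv2.GaussianBlur(mask, (5, 5), 0)
    # Step 1054: remove the background, keeping only the region image of the hand.
    hand_only = cv2.bitwise_and(frame_bgr, frame_bgr, mask=mask)
    return mask, hand_only
```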
In step 106, a valid target video is a video from which the gesture pattern and gesture motion track can be recognized.
From the determined region image of the hand, its area can be calculated, and whether the target video is a valid target video can be determined from the ratio of that area to the area of the frame image and a preset ratio range. The preset ratio range is set in advance according to the actual situation: if the ratio of the hand region image to the frame image area falls within the preset range in each frame, the current video is determined to be a valid target video; otherwise it is an invalid target video. When the hand region image in an invalid video is too small, the recognized gesture pattern may deviate; when it is too large, the frame may contain only part of the gesture, which then cannot be recognized. Judging whether the target video is valid by the area of the hand region image therefore improves the success rate of gesture pattern recognition.
For example, suppose the area of a frame image is 100 square centimeters, the preset ratio range is 60% to 80%, and five frames G, H, J, K and L are extracted from the target video, in which the hand region images have areas of 70, 80, 75, 70 and 65 square centimeters respectively. The ratios of hand region area to frame area are then 70% for G, 80% for H, 75% for J, 70% for K and 65% for L. Since the ratio in every frame falls within the preset range, the target video is a valid target video.
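Under the stated assumptions (a binary hand mask per specified frame, as in the foreground-extraction sketch, and the 60%-80% range from the example), the validity check reduces to a few lines:

```python
import numpy as np

def is_valid_target_video(hand_masks, lo: float = 0.60, hi: float = 0.80) -> bool:
    """hand_masks: one binary mask (nonzero = hand) per specified frame."""
    for mask in hand_masks:
        ratio = np.count_nonzero(mask) / mask.size   # hand area / frame area
        if not (lo <= ratio <= hi):
            return False   # too small risks misreading; too large cuts the gesture off
    return True
```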
A single video contains both the gesture pattern and the gesture motion track. Determining them in the target video, as shown in fig. 2, includes:
S1031, determining, for each frame of image in the target video, the outline of the hand and the position information of a positioning point in that frame;
S1032, determining the gesture pattern according to the similarity between the outline of the hand in each frame and the outlines of the candidate standard gestures;
S1033, determining the gesture motion track according to the position information and time information of the positioning point in each frame.
In step S1031, the outline of the hand is defined by the boundary of the region image in which the hand is located. A positioning point is a designated point used to determine the position of the hand; specifically, it may be a certain point on the hand and may include any one or more of the following positions: the thumb fingertip position, index finger fingertip position, middle finger fingertip position, ring finger fingertip position, little finger fingertip position, centroid position, etc. The centroid position is the middle point of the region image in which the hand is located; specifically, the smallest rectangle that can contain the region image is determined from the boundary of that region image, and the intersection of the rectangle's two diagonals is taken as the centroid position. The position information of the positioning points may be determined by a positioning point acquisition model trained on a large amount of training data. Training the positioning point acquisition model may include the following steps:
Step 11, acquiring a positioning point training sample set, which includes a plurality of positioning point training samples;
Step 12, for each training sample, taking the unlabeled gesture image as a positive sample of the positioning point acquisition model to be trained and the gesture image labeled with positioning points as a negative sample, and training the model.
In step 11, each positioning point training sample in the sample set includes an unlabeled gesture image and a gesture image labeled with positioning points. A gesture image is a photograph containing the region image in which a hand is located, and the labeled positioning points may be the thumb fingertip position, index finger fingertip position, middle finger fingertip position, ring finger fingertip position, little finger fingertip position and centroid position.
In step 12, the positioning point acquisition model obtained through repeated training can, after a frame image is input, determine the position information of the positioning points in that frame image.
Specifically, in a frame image, the outline of the hand can be determined from the boundary line of the region image in which the hand is located, and the frame image is input into the positioning point acquisition model to obtain the position information of the positioning points in that frame.
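The outline and the centroid positioning point can also be obtained geometrically when only the mask is available; the sketch below is such a fallback (OpenCV assumed, with an axis-aligned bounding rectangle standing in for the "smallest rectangle" of the text), not the patent's trained positioning point acquisition model.

```python
import cv2

def hand_contour_and_centroid(mask):
    """mask: binary hand mask. Returns (outline contour, centroid position)."""
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None, None
    hand = max(contours, key=cv2.contourArea)   # largest blob = hand region image
    x, y, w, h = cv2.boundingRect(hand)         # smallest upright containing rectangle
    return hand, (x + w / 2.0, y + h / 2.0)     # intersection of the two diagonals
```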
In step S1032, the control instruction database holds a plurality of candidate gesture outlines, each representing one gesture pattern. The similarity between the hand outline and each candidate gesture outline is calculated, the candidate outlines meeting the similarity threshold are selected, and among them the one most similar to the hand outline is determined; the gesture pattern represented by that most similar candidate outline is the gesture pattern represented by the hand outline.
For example, let the hand outline be A and the 5 candidate gesture outlines be B, C, D, E and F, with a similarity threshold of 60% to 100%. If the similarities of A to B, C, D, E and F are calculated to be 50%, 70%, 30%, 80% and 20% respectively, then the candidates meeting the threshold are C and E, the candidate with the greatest similarity is E, and the gesture pattern represented by E is the gesture pattern represented by the hand outline.
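One concrete way to score the similarity in S1032 is Hu-moment shape matching; the sketch below assumes `cv2.matchShapes` (which returns a dissimilarity, mapped here onto a 0-1 similarity) and the 60% threshold of the example. The patent does not prescribe this particular metric.

```python
import cv2

def classify_gesture(hand_contour, candidates: dict, threshold: float = 0.60):
    """candidates: {gesture pattern name: standard gesture outline contour}."""
    best_name, best_sim = None, 0.0
    for name, std_contour in candidates.items():
        d = cv2.matchShapes(hand_contour, std_contour, cv2.CONTOURS_MATCH_I1, 0.0)
        sim = 1.0 / (1.0 + d)        # dissimilarity 0 -> similarity 1.0
        if sim > best_sim:
            best_name, best_sim = name, sim
    return best_name if best_sim >= threshold else None
```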
In step S1033, the position information of the positioning point in each frame is sorted by the time information, and the line connecting the sorted positioning point positions is the gesture motion track.
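A minimal sketch of S1033, assuming positioning point positions tagged with timestamps: sort by time, then classify the dominant displacement. Only the four translations are handled; the rotations listed earlier would need an additional curvature or angle test.

```python
def classify_track(points):
    """points: [(t, x, y), ...] positioning point positions with time information."""
    pts = sorted(points)                      # order by the time information
    dx = pts[-1][1] - pts[0][1]
    dy = pts[-1][2] - pts[0][2]
    if abs(dx) >= abs(dy):
        return "translate_right" if dx > 0 else "translate_left"
    # image y grows downward, so a negative dy means an upward translation
    return "translate_up" if dy < 0 else "translate_down"
```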
When the target device is controlled, it executes the corresponding action as soon as it receives a control instruction. But the target device is a machine with no mind of its own, no human judgment and no ability to improvise: if it receives a wrong control instruction while handling high-risk work, workers' lives may be endangered; if it receives a wrong control instruction while handling an expensive item, the item may be damaged. Therefore, before sending a control instruction to the target device, the method further includes:
step 107, sending the control instruction to the user terminal, so that the user terminal prompts the control instruction according to a message prompt mode;
step 108, receiving a reply instruction of the user terminal for the prompted control instruction.
In step 107, the user terminal is a device that can receive the control instruction, such as a mobile phone, a computer or a tablet computer. The message prompt mode is the way in which the control instruction is presented, and may include any one or more of the following modes: a text prompt mode, an image prompt mode and a voice broadcast prompt mode.
Specifically, the control instruction is prompted at the user terminal in the preset message prompt mode, so that the user can review it again and check whether it is correct.
For example, suppose the control instruction is for the target device to move forward. After the instruction is sent to the user terminal, the terminal may pop up a message box displaying "control the target device to move forward", with a confirm button and a cancel button below it.
In step 108, a reply instruction is the reply to the message prompted at the user terminal, and may be confirm or cancel.
Specifically, after seeing the prompt message on the user terminal, the user can reply to it: if the control instruction is correct, the user confirms it and it continues to be executed; if it is wrong, the user cancels it and it is not sent to the target device. This improves the accuracy of control instructions, reduces the threat to life safety from the target device receiving a wrong instruction, and reduces the property loss such an instruction could cause.
Continuing the example above, the user judges according to the content of the prompt message: if the content is correct, clicking the confirm button lets the control instruction continue on to the target device; if it is wrong, clicking the cancel button stops the control instruction from being sent.
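Steps 107 and 108 amount to a confirm-before-dispatch gate. In the sketch below, `prompt_user` and `send_to_device` are hypothetical stand-ins for the terminal's messaging channel and the link to the target device, which the application leaves unspecified.

```python
def confirm_and_dispatch(instruction: str, prompt_user, send_to_device) -> bool:
    reply = prompt_user(f"Control target device: {instruction}. Confirm?")
    if reply == "confirm":
        send_to_device(instruction)   # checked instruction goes on to the target device
        return True
    return False                      # "cancel": the instruction is dropped
```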
In the present application, the gesture pattern and gesture motion track are determined from the target video, and the target device is then controlled according to the control instruction they determine; the target device can thus be controlled without the user wearing a data glove fully matched to it, which improves control efficiency and, by dispensing with the glove, reduces production cost. Because control instructions are determined by combinations of gesture pattern and gesture motion track, many instructions can be defined, making the instruction set richer, letting the user control the target device more comprehensively, and improving both control efficiency and the target device's working efficiency. Foreground extraction by color value removes the background from the video, reducing interference with determining the gesture pattern and gesture motion track and improving their accuracy. By calculating the area of the region image in which the hand is located, whether the target video is valid can be determined; gesture patterns and gesture motion tracks are then recognized only in valid target videos, and invalid ones are skipped, which avoids unnecessary work and improves recognition efficiency.
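Composed from the illustrative helpers sketched above (all of them assumptions, not the patent's actual implementation), the whole flow of steps S101 to S104, plus the validity check and the user confirmation, might read:

```python
def control_from_video(video_path, candidates, prompt_user, send_to_device):
    frames = extract_frames(video_path)                    # S101/S102: specified frames
    masks, anchors, outline = [], [], None
    for t, frame in frames:
        mask, _ = extract_hand_region(frame)               # foreground extraction
        masks.append(mask)
        contour, center = hand_contour_and_centroid(mask)  # S1031: outline + positioning point
        if contour is None:
            continue
        outline = contour
        anchors.append((t, center[0], center[1]))
    if outline is None or not is_valid_target_video(masks):
        return False                                       # invalid target video: stop early
    pattern = classify_gesture(outline, candidates)        # S1032: gesture pattern
    track = classify_track(anchors)                        # S1033: gesture motion track
    instruction = lookup_instruction(pattern, track)       # control instruction lookup
    if instruction is None:
        return False
    return confirm_and_dispatch(instruction, prompt_user, send_to_device)  # steps 107-108, S104
```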
As shown in fig. 3, an embodiment of the present application provides a device control apparatus, including:
an acquisition module 301, configured to acquire a target video for controlling a target device;
a first determining module 302, configured to determine, according to the multi-frame images specified in the target video, a gesture pattern and a gesture motion track appearing in the target video;
a second determining module 303, configured to determine a control instruction corresponding to the target video according to the gesture pattern and the gesture motion track;
and a control module 304, configured to control the target device according to the control instruction.
Optionally, the apparatus further includes:
an extraction module, configured to perform, for each frame of image in the target video, foreground extraction based on the color value of each part of the image, so as to determine the region image in which the hand is located in that frame;
and a judging module, configured to determine whether the target video is a valid target video according to the area of the region image in which the hand is located in each frame of the target video.
Optionally, the first determining module 302 includes:
a first determining unit, configured to determine, for each frame of image in the target video, the outline of the hand and the position information of a positioning point in that frame;
a second determining unit, configured to determine the gesture pattern according to the similarity between the outline of the hand in each frame and the outlines of the candidate standard gestures;
and a third determining unit, configured to determine the gesture motion track according to the position information and time information of the positioning point in each frame.
Optionally, the positioning point includes any one or more of the following positions:
thumb tip position, index finger tip position, middle finger tip position, ring finger tip position, little finger tip position, and centroid position.
Optionally, the apparatus further includes:
a prompt module, configured to send the control instruction to the user terminal so that the user terminal prompts the control instruction in a message prompt mode;
and a reply module, configured to receive a reply instruction from the user terminal for the prompted control instruction.
Optionally, the message prompt mode includes any one or more of the following modes:
a text prompt mode, an image prompt mode and a voice broadcast prompt mode.
Corresponding to the device control method of fig. 1, an embodiment of the present application further provides a computer device 400. As shown in fig. 4, the device includes a memory 401, a processor 402, and a computer program stored in the memory 401 and executable on the processor 402; the processor 402 implements the device control method when executing the computer program.
Specifically, the memory 401 and the processor 402 can be a general-purpose memory and processor, not limited here. When the processor 402 runs the computer program stored in the memory 401, the device control method can be executed, solving the prior-art problem of how to remotely control a device. In the present application, the gesture pattern and gesture motion track are determined from the acquired video, and the control instruction is determined from them to control the target device, so the user does not need to wear a data glove matched to the target device, which removes cumbersome steps and improves the convenience of controlling the target device.
Corresponding to the device control method of fig. 1, an embodiment of the present application further provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, performs the steps of the device control method.
Specifically, the storage medium can be a general-purpose storage medium such as a removable disk or a hard disk. When the computer program on the storage medium is run, the device control method can be executed, solving the prior-art problem of how to remotely control a device: the gesture pattern and gesture motion track are determined from the acquired video, and the control instruction is determined from them to control the target device, so the user does not need to wear a data glove matched to the target device, which removes cumbersome steps and improves the convenience of controlling the target device.
In the embodiments provided in the present application, it should be understood that the disclosed methods and apparatuses may be implemented in other manners. The above-described apparatus embodiments are merely illustrative, for example, the division of the units is merely a logical function division, and there may be other manners of division in actual implementation, and for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some communication interface, device or unit indirect coupling or communication connection, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments provided in the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
It should be noted that like reference numerals and letters denote like items in the following figures; thus, once an item is defined in one figure, it need not be further defined or explained in subsequent figures. Furthermore, the terms "first", "second", "third", etc. are used merely to distinguish one description from another and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that the foregoing examples are merely specific embodiments of the present application, intended to illustrate rather than limit its technical solutions, and the protection scope of the present application is not limited to them. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art will appreciate that anyone familiar with the art may still modify the technical solutions described in the foregoing embodiments, readily conceive of changes, or make equivalent substitutions for some of the technical features within the technical scope disclosed in the present application; such modifications, changes or substitutions do not depart from the spirit and scope of the corresponding technical solutions and are intended to be encompassed within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (7)

1. A control method of an apparatus, characterized by comprising:
acquiring a target video for controlling a target device;
determining, for each frame of image in the target video, the outline of the hand and the position information of a positioning point in that frame; wherein the position information of the positioning points is determined through a positioning point acquisition model, and the positioning point acquisition model is obtained through training on a large amount of training data; training the positioning point acquisition model comprises the following steps: acquiring a positioning point training sample set, the positioning point training sample set comprising a plurality of positioning point training samples; and, for each training sample, taking the unlabeled gesture image as a positive sample of the positioning point acquisition model to be trained, taking the gesture image labeled with positioning points as a negative sample of the positioning point acquisition model to be trained, and training the positioning point acquisition model to be trained;
determining a gesture pattern according to the similarity between the outline of the hand in each frame of image and the outlines of candidate standard gestures; wherein the gesture pattern comprises any one of a fist pattern, a one-finger protruding pattern, a two-finger protruding pattern, a three-finger protruding pattern, a four-finger protruding pattern and a five-finger protruding pattern;
determining a gesture motion track according to the position information and the time information of the positioning point in each frame of image;
determining a control instruction corresponding to the target video according to the gesture pattern and the gesture motion track;
and controlling the target device according to the control instruction;
wherein, after the target video for controlling the target device is acquired and before the gesture pattern and gesture motion track appearing in the target video are determined according to the multi-frame images specified in the target video, the method further comprises:
for each frame of image in the target video, performing foreground extraction based on the color value of each part of the image, so as to determine the region image in which the hand is located in that frame;
determining whether the target video is a valid target video according to the area of the region image in which the hand is located in each frame of the target video;
and if the ratio of the region image in which the hand is located to the area of the frame image falls within a preset ratio range in each frame of the target video, determining that the target video is a valid target video.
2. The method of claim 1, wherein the positioning points comprise any one or more of the following positions:
thumb tip position, index finger tip position, middle finger tip position, ring finger tip position, little finger tip position, and centroid position.
3. The method of claim 1, wherein, after the control instruction corresponding to the target video is determined according to the gesture pattern and the gesture motion track and before the target device is controlled according to the control instruction, the method further comprises:
sending the control instruction to a user terminal, so that the user terminal prompts the control instruction in a message prompt mode;
and receiving a reply instruction from the user terminal for the prompted control instruction.
4. The method according to claim 3, wherein the message prompt mode comprises any one or more of the following modes:
a text prompt mode, an image prompt mode and a voice broadcast prompt mode.
5. A device control apparatus, characterized by comprising:
an acquisition module, used for acquiring a target video for controlling a target device;
a first determining module, used for determining, for each frame of image in the target video, the outline of the hand and the position information of a positioning point in that frame, wherein the position information of the positioning points is determined through a positioning point acquisition model obtained through training on a large amount of training data, and training the positioning point acquisition model comprises: acquiring a positioning point training sample set, the positioning point training sample set comprising a plurality of positioning point training samples; and, for each training sample, taking the unlabeled gesture image as a positive sample of the positioning point acquisition model to be trained, taking the gesture image labeled with positioning points as a negative sample, and training the positioning point acquisition model to be trained; used further for determining a gesture pattern according to the similarity between the outline of the hand in each frame of image and the outlines of candidate standard gestures, the gesture pattern comprising any one of a fist pattern, a one-finger protruding pattern, a two-finger protruding pattern, a three-finger protruding pattern, a four-finger protruding pattern and a five-finger protruding pattern; and for determining a gesture motion track according to the position information and the time information of the positioning point in each frame of image;
a second determining module, used for determining a control instruction corresponding to the target video according to the gesture pattern and the gesture motion track;
and a control module, used for controlling the target device according to the control instruction;
wherein the apparatus further comprises:
an extraction module, used for performing, for each frame of image in the target video, foreground extraction based on the color value of each part of the image, so as to determine the region image in which the hand is located in that frame;
and a judging module, used for determining whether the target video is a valid target video according to the area of the region image in which the hand is located in each frame of the target video, wherein, if the ratio of the region image in which the hand is located to the area of the frame image falls within a preset ratio range in each frame of the target video, the target video is determined to be a valid target video.
6. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 4.
7. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, performs the steps of the method of any one of claims 1 to 4.
CN202010186785.1A 2020-03-17 2020-03-17 Equipment control method and device, computer equipment and medium Active CN111367415B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010186785.1A CN111367415B (en) 2020-03-17 2020-03-17 Equipment control method and device, computer equipment and medium

Publications (2)

Publication Number Publication Date
CN111367415A (en) 2020-07-03
CN111367415B (en) 2024-01-23

Family

ID=71206863

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010186785.1A Active CN111367415B (en) 2020-03-17 2020-03-17 Equipment control method and device, computer equipment and medium

Country Status (1)

Country Link
CN (1) CN111367415B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111857346A (en) * 2020-07-23 2020-10-30 上海纯米电子科技有限公司 Gesture control method and device
CN111913585A (en) * 2020-09-21 2020-11-10 北京百度网讯科技有限公司 Gesture recognition method, device, equipment and storage medium
CN113885710B (en) * 2021-11-02 2023-12-08 珠海格力电器股份有限公司 Control method and control device of intelligent equipment and intelligent system
CN115022549B (en) * 2022-06-27 2024-04-16 影石创新科技股份有限公司 Shooting composition method, shooting composition device, computer equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108921101A (en) * 2018-07-04 2018-11-30 百度在线网络技术(北京)有限公司 Control instruction processing method and device based on gesture recognition, and readable storage medium

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103366188A (en) * 2013-07-08 2013-10-23 中科创达软件股份有限公司 Gesture tracking method adopting fist detection as auxiliary information
CN103530613A (en) * 2013-10-15 2014-01-22 无锡易视腾科技有限公司 Target person hand gesture interaction method based on monocular video sequence
WO2017084319A1 (en) * 2015-11-18 2017-05-26 乐视控股(北京)有限公司 Gesture recognition method and virtual reality display output device
CN107633229A (en) * 2017-09-21 2018-01-26 北京智芯原动科技有限公司 Face detection method and device based on convolutional neural networks
CN107729854A (en) * 2017-10-25 2018-02-23 南京阿凡达机器人科技有限公司 Gesture recognition method and system for a robot, and robot
CN107813310A (en) * 2017-11-22 2018-03-20 浙江优迈德智能装备有限公司 Multi-gesture robot control method based on binocular vision
CN109117766A (en) * 2018-07-30 2019-01-01 上海斐讯数据通信技术有限公司 Dynamic gesture recognition method and system
CN109814717A (en) * 2019-01-29 2019-05-28 珠海格力电器股份有限公司 Home equipment control method and device, control equipment, and readable storage medium

Also Published As

Publication number Publication date
CN111367415A (en) 2020-07-03


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant