WO2022198819A1

WO2022198819A1 - Image recognition-based device control method and apparatus, electronic device, and computer readable storage medium

Info

Publication number: WO2022198819A1
Application number: PCT/CN2021/102478
Authority: WO
Inventors: 孔祥晖
Original assignee: 北京市商汤科技开发有限公司
Priority date: 2021-03-22
Filing date: 2021-06-25
Publication date: 2022-09-29
Also published as: CN113031464A; CN113031464B

Abstract

The present disclosure provides an image recognition-based device control method and apparatus, an electronic device, and a computer readable storage medium. The method comprises: performing hand detection on an obtained first image to be tested, and determining hand detection information of a target hand matching a preset gesture category; on the basis of the hand detection information of the target hand, performing limb tracking detection on a target limb connected to the target hand in an obtained second image to be tested, and determining a gesture recognition result of the target hand in said second image, said second image being an image acquired after said first image; and controlling a target device on the basis of the gesture recognition result.

Description

Device control method and device based on image recognition, electronic device and computer-readable storage medium

CROSS-REFERENCE TO RELATED APPLICATIONS

The present disclosure is based on the Chinese patent application with application number 202110301465.0, filed on March 22, 2021, and entitled "equipment control method, device, electronic equipment and storage medium", and claims priority to that Chinese patent application, the entire contents of which are hereby incorporated by reference into the present disclosure.

TECHNICAL FIELD

The present disclosure relates to the technical field of computer vision, and in particular, to an image recognition-based device control method, apparatus, electronic device, and computer-readable storage medium.

BACKGROUND

With the development of science and technology, people place ever higher demands on the level and quality of human-computer interaction. Gestures have become an important means of human-computer interaction because they are intuitive and natural, and gesture recognition based on computer vision has therefore become a research focus in the field of human-computer interaction.

Generally, the user's gesture category can be determined from an acquired image, and the target device can be controlled by using the determined gesture category. However, when there are multiple users in the human-computer interaction scene, the gestures of different users may interfere with one another, which reduces the accuracy of image recognition of the gesture of the main control user and, in turn, the control accuracy of the target device.

SUMMARY OF THE INVENTION

The embodiments of the present disclosure provide at least an image recognition-based device control method, apparatus, electronic device, and computer-readable storage medium, which can improve the accuracy of image recognition and thereby improve the accuracy of target device control based on the image recognition result.

In a first aspect, the present disclosure provides a device control method based on image recognition, including:

Perform hand detection on the acquired first image to be detected, and determine the hand detection information of the target hand matching the preset gesture category;

Based on the hand detection information of the target hand, perform limb tracking detection on the target limb connected to the target hand in the acquired second to-be-detected image, and determine the gesture recognition result of the target hand in the second to-be-detected image; wherein the second to-be-detected image is an image obtained after the first to-be-detected image;

Based on the gesture recognition result, the target device is controlled.

In the above method, hand detection is performed on the first to-be-detected image to determine the hand detection information of the target hand that matches the preset gesture category. Based on that hand detection information, limb tracking detection is performed on the target limb connected to the target hand in the acquired second to-be-detected image, and the gesture recognition result of the target hand in the second to-be-detected image is determined. In this way, a target hand that is difficult to track directly can be followed by means of limb tracking, and the target device can then be controlled based on the gesture recognition result. Among the hands of multiple users, or between the two hands of the same user, the target hand is locked and, because each hand is uniquely matched to a limb, limb tracking is used to track the target hand; the gesture recognition result of the target hand in the second to-be-detected image is then obtained from the limb tracking result. This effectively reduces the interference caused by other users' hand movements when image recognition is performed on the gesture of the target user, that is, the user whose target hand controls the target device, which improves the accuracy of image recognition and therefore the control accuracy of the target device.

It can be seen that, with the technical solution provided by the present disclosure, the target user who controls the target device can be effectively identified among multiple users and, to a certain extent, when both hands of the target user are moving, one of them can be selected as the target hand so that the target device is controlled accurately. It should be noted that, if some control operations are triggered by the user's two hands each performing a corresponding action, the technical solution provided by the present disclosure can lock the target user and control the target device based on the hand movements of the target user's two hands.
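The three steps of the first aspect can be sketched as below. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: the `TargetHand` record and the callables `detect_hands`, `track_limb_and_recognize` and `control_device` are hypothetical placeholders for the detectors and the device interface, which the text does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class TargetHand:
    hand_id: str  # hypothetical identifier of the locked hand
    limb_id: str  # hypothetical identifier of the limb connected to that hand


def control_step(
    first_image: object,
    second_image: object,
    detect_hands: Callable[[object], Sequence[Tuple[TargetHand, str]]],
    track_limb_and_recognize: Callable[[object, TargetHand], Optional[str]],
    control_device: Callable[[str], None],
    preset_gesture: str,
) -> Optional[str]:
    """Run one pass over a pair of images; return the gesture acted on, if any."""
    # Step 1: hand detection on the first image; lock the first hand whose
    # gesture category matches the preset gesture category.
    target = next(
        (hand for hand, gesture in detect_hands(first_image) if gesture == preset_gesture),
        None,
    )
    if target is None:
        return None
    # Step 2: follow the connected limb into the later image and read the
    # gesture of the target hand there.
    gesture = track_limb_and_recognize(second_image, target)
    if gesture is None:
        return None
    # Step 3: control the target device based on the recognition result.
    control_device(gesture)
    return gesture
```

Passing the detectors in as callables keeps the sketch independent of any particular detection model; only the order of the three steps is taken from the text.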

In a possible implementation manner, before the control of the target device based on the gesture recognition result, the method further includes:

detecting whether the target hand satisfies the cut-off condition;

When it is detected that the gesture recognition result satisfies the cutoff condition, in the second image to be detected, the hand detection information of the target hand matching the preset gesture category is re-determined.

Here, when it is detected that the gesture recognition result satisfies the cut-off condition, which indicates that the target hand of the target user no longer controls the target device, the hand detection information of a target hand matching the preset gesture category can be re-determined, so that at least one user in the second image to be detected can control the target device in real time.

In a possible implementation manner, the target hand satisfying the cut-off condition includes one or more of the following:

in the second image to be detected, the gesture category indicated by the gesture recognition result of the target hand is an invalid gesture category, where the invalid gesture category includes at least one of the following: a gesture category that does not match the preset gesture category, and the target hand not having moved;

in the case where the second image to be detected includes multiple frames, the number of frames in which the gesture category indicated by the gesture recognition result of the target hand is the invalid gesture category is greater than or equal to a number threshold, and/or the duration for which it is the invalid gesture category is greater than or equal to a duration threshold;

in the second image to be detected, the gesture category indicated by the gesture recognition result of the target hand is a valid gesture category that instructs re-determination of the target hand and/or the hand detection information.
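The cut-off conditions listed above can be checked roughly as follows. This is a sketch under assumptions: the `FrameResult` record, the `reset_gesture` name, and the choice to count only the most recent consecutive run of invalid frames are all illustrative; the text does not say whether the invalid frames must be consecutive. A number threshold of 1 reproduces the single-frame case.

```python
from dataclasses import dataclass
from typing import Sequence, Set


@dataclass(frozen=True)
class FrameResult:
    timestamp: float  # acquisition time in seconds (assumed unit)
    gesture: str      # gesture category recognized for the target hand
    moved: bool       # whether the target hand moved in this frame


def is_invalid(result: FrameResult, preset_gestures: Set[str]) -> bool:
    """Invalid: the category does not match a preset one, or the hand did not move."""
    return result.gesture not in preset_gestures or not result.moved


def cutoff_reached(
    results: Sequence[FrameResult],
    preset_gestures: Set[str],
    reset_gesture: str,
    count_threshold: int,
    duration_threshold: float,
) -> bool:
    """Return True when the target hand should be re-determined."""
    if not results:
        return False
    # A valid gesture that explicitly asks for the target hand to be re-determined.
    if results[-1].gesture == reset_gesture:
        return True
    # Collect the trailing run of invalid frames, newest first (assumed policy).
    run = []
    for result in reversed(results):
        if not is_invalid(result, preset_gestures):
            break
        run.append(result)
    if not run:
        return False
    duration = run[0].timestamp - run[-1].timestamp
    return len(run) >= count_threshold or duration >= duration_threshold
```

The count and duration tests are combined with `or` here; an implementation that requires both would use `and`, which the "and/or" wording also permits.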

In a possible implementation manner, performing hand detection on the acquired first image to be detected includes:

performing limb detection on the acquired first image to be detected to obtain limb detection information;

Based on the limb detection information, hand detection is performed on the first image to be detected, and the hand detection information of the target hand associated with the limb is determined.

Since it is difficult to track and detect a hand in an image, whereas tracking and detecting a limb is easier, and the hand is connected to the limb, limb detection can first be performed on the first image to be detected to determine the limb detection information; hand detection is then performed on the first image to be detected based on the limb detection information, so that the hand detection information of the target hand associated with the limb can be determined more accurately.

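In a possible implementation manner, performing hand detection on the acquired first image to be detected includes:

performing limb detection and hand detection respectively on the acquired first image to be detected to obtain limb detection information and the hand detection information;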

determining the distance between the hand and the limb based on the limb detection information and the hand detection information;

Based on the distance, the hand detection information for the target hand associated with the limb is determined.

Here, the hand detection information of the target hand associated with the limb can be determined through the distance between the hand and the limb, and the determination process is simple and easy to implement.

In a possible implementation manner, controlling the target device includes at least one of the following:

adjust the volume of the target device;

Adjust the working mode of the target device, the working mode includes turning off or turning on at least part of the function of the target device;

displaying a movement logo in the display interface of the target device, or adjusting the display position of the movement logo in the display interface;

reduction or enlargement of at least part of the displayed content in the display interface;

Sliding or jumping of the display interface.

Here, based on the gesture recognition result, the volume of the target device can be adjusted, the target device can be turned off, the display position of the movement logo in the display interface of the target device can be adjusted, and so on, so as to realize flexible control of the target device.
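As an illustration of how a gesture recognition result might be mapped to the control operations listed above, consider the sketch below. The `Device` fields, the gesture names, the 0 to 10 volume range and the zoom factor are all hypothetical; the text does not fix which gesture triggers which operation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Device:
    volume: int = 5                    # assumed range 0..10
    screen_on: bool = True             # stands in for a working-mode switch
    logo_pos: Tuple[int, int] = (0, 0) # display position of the movement logo
    zoom: float = 1.0                  # scale of the displayed content


def apply_gesture(device: Device, gesture: str, hand_delta: Tuple[int, int] = (0, 0)) -> bool:
    """Apply one recognized gesture; return False if the gesture maps to nothing."""
    if gesture == "volume_up":
        device.volume = min(device.volume + 1, 10)
    elif gesture == "volume_down":
        device.volume = max(device.volume - 1, 0)
    elif gesture == "toggle_screen":
        device.screen_on = not device.screen_on
    elif gesture == "move_logo":
        # Move the logo by the displacement of the tracked hand.
        x, y = device.logo_pos
        dx, dy = hand_delta
        device.logo_pos = (x + dx, y + dy)
    elif gesture == "zoom_in":
        device.zoom *= 2.0
    elif gesture == "zoom_out":
        device.zoom /= 2.0
    else:
        return False
    return True
```

Returning a flag for unmapped gestures lets the caller treat them as candidates for the invalid gesture category instead of silently ignoring them.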

In a possible implementation manner, in the case that the first image to be detected includes multiple users, before performing, based on the hand detection information of the target hand, the limb tracking detection on the target limb connected to the target hand in the acquired second image to be detected, the method further includes:

determining target joint point position information of each user in the first to-be-detected image;

taking each user in the first to-be-detected image in turn as a target user and, based on the target joint point position information of the target user, determining the horizontal distance between the target joint point of the target user and the target joint point of each of the other users, that is, the users among the multiple users other than the target user;

in the case that it is determined based on the horizontal distance that there is no interfering user among the other users, taking the default gesture category of the target user as the preset gesture category of the target user, where an interfering user is a user whose horizontal distance is smaller than the distance threshold corresponding to the target user.

In a possible implementation manner, the method further includes:

when it is determined based on the horizontal distance that there is an interfering user among the other users, adjusting the default gesture category of the target user and taking the adjusted default gesture category as the preset gesture category of the target user, where adjusting the default gesture category includes at least one of the following operations: increasing the number of default gesture categories, increasing the number of gesture categories used to control at least one function of the target device, and changing movement detection of the gesture category into movement detection of the hand detection frame.

In the above embodiment, when multiple users are included in the first image to be detected, each user can be taken as a target user, and the horizontal distance between the target joint point of the target user and the target joint points of the other users can be determined based on the target joint point position information of the target user and of the other users. When it is determined based on the horizontal distance that there is an interfering user among the other users, the gesture fault-tolerance mechanism corresponding to the target user can be adjusted, that is, the adjusted default gesture category is used as the preset gesture category of the target user, which alleviates the influence of the interfering user on the gesture category detection of the target user.
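The interference check described above can be sketched as follows. The dictionary layout, the function names and the `extra_gestures` set used to widen the default categories are assumptions for illustration; only the rule "horizontal distance smaller than the target user's threshold" comes from the text.

```python
from typing import Dict, List, Set


def interfering_users(
    target_id: str,
    joint_x: Dict[str, float],
    thresholds: Dict[str, float],
) -> List[str]:
    """Users whose target joint is horizontally closer than the target user's threshold."""
    limit = thresholds[target_id]
    target_x = joint_x[target_id]
    return [
        user_id
        for user_id, x in joint_x.items()
        if user_id != target_id and abs(x - target_x) < limit
    ]


def preset_gestures_for(
    target_id: str,
    joint_x: Dict[str, float],
    thresholds: Dict[str, float],
    default_gestures: Set[str],
    extra_gestures: Set[str],
) -> Set[str]:
    """Keep the defaults when nobody interferes; otherwise widen the category set."""
    if interfering_users(target_id, joint_x, thresholds):
        # One of the adjustments named in the text: more gesture categories.
        return default_gestures | extra_gestures
    return set(default_gestures)
```

Because the threshold is looked up per target user, the relation is not symmetric: user A may count B as interfering while B does not count A.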

In a possible implementation manner, the distance threshold corresponding to the target user is determined according to the following steps:

determining the position information of the first joint point and the position information of the second joint point of the target user;

based on the position information of the first joint point and the position information of the second joint point, determining an intermediate distance used to represent the shoulder width of the target user;

Based on the intermediate distance, the distance threshold corresponding to the target user is determined.

With the above method, the intermediate distance representing the shoulder width of the target user can be determined from the position information of the first joint point and of the second joint point, and the distance threshold corresponding to the target user can then be determined from that intermediate distance, so that different users correspond to different distance thresholds. By determining a distance threshold for each target user, it can be judged more accurately whether other users will interfere with the target user.
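A per-user threshold derived from shoulder width might be computed as in the sketch below. It assumes that the first and second joint points are the two shoulder joints, that the intermediate distance is their Euclidean distance in image coordinates, and that the threshold is a multiple `scale` of that distance; the text does not state the exact mapping.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def shoulder_width(first_joint: Point, second_joint: Point) -> float:
    """Intermediate distance: Euclidean distance between the two joint points."""
    return math.dist(first_joint, second_joint)


def distance_threshold(first_joint: Point, second_joint: Point, scale: float = 1.0) -> float:
    """Distance threshold for one user; `scale` is a hypothetical tuning factor."""
    return scale * shoulder_width(first_joint, second_joint)
```

Scaling by shoulder width makes the threshold grow for users who appear larger in the image, which is one plausible reason different users correspond to different thresholds.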

For descriptions of the effects of the following apparatuses, electronic devices, etc., reference may be made to the descriptions of the above-mentioned methods, which will not be repeated here.

In a second aspect, the present disclosure provides an image recognition-based device control apparatus, including:

a first determining module, configured to perform hand detection on the acquired first image to be detected, and determine the hand detection information of the target hand matching the preset gesture category;

a detection module, configured to, based on the hand detection information of the target hand, perform limb tracking detection on the target limb connected to the target hand in the acquired second image to be detected, and determine the gesture recognition result of the target hand in the second image to be detected; wherein the second image to be detected is an image obtained after the first image to be detected;

The control module is configured to control the target device based on the gesture recognition result.

In a possible implementation manner, the apparatus further includes a second determining module, configured to, before the target device is controlled based on the gesture recognition result:

detect whether the target hand satisfies the cut-off condition;

In a possible implementation manner, the first determination module, when performing hand detection on the acquired first image to be detected, is configured to:

In a possible implementation manner, the control module, when controlling the target device, includes at least one of the following:

adjust the volume of the target device;

Sliding or jumping of the display interface.

In a possible implementation manner, the apparatus further includes an adjustment module, configured to, in the case that the first image to be detected includes multiple users and before the limb tracking detection is performed, based on the hand detection information of the target hand, on the target limb connected to the target hand in the acquired second image to be detected:

when it is determined based on the horizontal distance that there is no interfering user among the other users, take the default gesture category of the target user as the preset gesture category of the target user, where an interfering user is a user whose horizontal distance is smaller than the distance threshold corresponding to the target user.

In a possible implementation manner, the adjustment module is further configured to:

In a possible implementation manner, the apparatus further includes a distance threshold determination module, the distance threshold determination module is configured to determine the distance threshold corresponding to the target user according to the following steps:

In a third aspect, the present disclosure provides an electronic device, comprising: a processor, a memory, and a bus, where the memory stores machine-readable instructions executable by the processor; when the electronic device runs, the processor communicates with the memory through the bus, and when the machine-readable instructions are executed by the processor, the image recognition-based device control method according to the first aspect or any one of the implementation manners is executed.

In a fourth aspect, the present disclosure provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is run by a processor, the image recognition-based device control method according to the first aspect or any one of the implementation manners is executed.

In a fifth aspect, the present disclosure provides a computer program, comprising computer-readable code, where, when the computer-readable code runs in an electronic device, a processor in the electronic device executes it to implement the image recognition-based device control method according to the first aspect or any one of the implementation manners.

In order to make the above-mentioned objects, features and advantages of the present disclosure more obvious and easy to understand, the preferred embodiments are exemplified below, and are described in detail as follows in conjunction with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to explain the technical solutions of the embodiments of the present disclosure more clearly, the following briefly introduces the accompanying drawings required in the embodiments, which are incorporated into the specification and constitute a part of the specification. The drawings illustrate embodiments consistent with the present disclosure, and together with the description serve to explain the technical solutions of the present disclosure. It should be understood that the following drawings only show some embodiments of the present disclosure, and therefore should not be regarded as limiting the scope. Other related drawings may also be obtained from these drawings.

FIG. 1 shows a schematic flowchart of an image recognition-based device control method provided by an embodiment of the present disclosure;

FIG. 2 shows a schematic diagram of a limb joint point and a hand detection frame in an image recognition-based device control method provided by an embodiment of the present disclosure;

FIG. 3 shows a schematic structural diagram of an image recognition-based device control apparatus provided by an embodiment of the present disclosure;

FIG. 4 shows a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.

DETAILED DESCRIPTION

In order to make the purposes, technical solutions and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the accompanying drawings in the embodiments of the present disclosure. Obviously, the described embodiments are only some of the embodiments of the present disclosure, but not all of the embodiments. The components of the disclosed embodiments generally described and illustrated in the drawings herein may be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments of the disclosure provided in the accompanying drawings is not intended to limit the scope of the disclosure as claimed, but is merely representative of selected embodiments of the disclosure. Based on the embodiments of the present disclosure, all other embodiments obtained by those skilled in the art without creative work fall within the protection scope of the present disclosure.

Generally, the user's gesture category can be determined from an acquired image, and the target device can be controlled by using the determined gesture category. However, when there are multiple users in the human-computer interaction scene, the gestures of different users may interfere with one another, which degrades the control of the target device through human-computer interaction. In order to solve the above problem and improve the control of the target device based on human-computer interaction, an embodiment of the present disclosure provides a device control scheme based on image recognition.

The defects of the above solutions were all identified by the inventor through practice and careful research. Therefore, the process of discovering the above problems, and the solutions to them proposed hereinafter by the present disclosure, should all be regarded as contributions made by the inventor in the course of the present disclosure.

The technical solutions in the present disclosure will be clearly and completely described below with reference to the accompanying drawings in the present disclosure. Obviously, the described embodiments are only a part of the embodiments of the present disclosure, but not all of the embodiments. The components of the present disclosure generally described and illustrated in the drawings herein may be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments of the disclosure provided in the accompanying drawings is not intended to limit the scope of the disclosure as claimed, but is merely representative of selected embodiments of the disclosure. Based on the embodiments of the present disclosure, all other embodiments obtained by those skilled in the art without creative work fall within the protection scope of the present disclosure.

It should be noted that like numerals and letters refer to like items in the following figures, so once an item is defined in one figure, it does not require further definition and explanation in subsequent figures.

To facilitate understanding of the embodiments of the present disclosure, the image recognition-based device control method disclosed in the embodiments of the present disclosure is first introduced in detail. The execution subject of the image recognition-based device control method provided by the embodiments of the present disclosure is generally a computer device with a certain computing capability, for example user equipment (User Equipment, UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a personal digital assistant (Personal Digital Assistant, PDA), a handheld device, a computing device, an in-vehicle device, a wearable device, etc. In some possible implementations, the image recognition-based device control method may be implemented by a processor calling computer-readable instructions stored in a memory.

Referring to FIG. 1, which is a schematic flowchart of an image recognition-based device control method provided by an embodiment of the present disclosure, the method includes S101-S103, wherein:

S101, performing hand detection on the acquired first image to be detected, and determining hand detection information of a target hand matching a preset gesture category;

S102, based on the hand detection information of the target hand, performing limb tracking detection on the target limb connected to the target hand in the acquired second image to be detected, and determining the gesture recognition result of the target hand in the second image to be detected; wherein the second image to be detected is an image obtained after the first image to be detected;

S103, based on the gesture recognition result, control the target device.

The hand detection information refers to the detected feature information of the target hand matching the preset gesture category in the first image to be detected, which may include hand position information, a gesture category, hand identification information, and the like. Exemplarily, the hand position information may be the coordinate information of the vertices of the hand detection frame corresponding to the target hand in the image coordinate system of the first image to be detected, or the coordinate information of the contour region corresponding to the target hand in the image coordinate system of the first image to be detected, etc. The gesture category may be the category of the gesture action of the target hand in the first image to be detected; for example, the gesture category may be the category of the "ok" gesture action. The hand identification information may be any identification assigned to the target hand and may be composed of numbers, characters, patterns, etc.; for example, the hand identification information may be "left hand a1".
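One way to hold the three pieces of hand detection information described above is a small record type. The sketch below assumes an axis-aligned detection frame stored as `(x_min, y_min, x_max, y_max)` in image coordinates; the field names are illustrative and not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class HandDetection:
    hand_id: str                            # hand identification information, e.g. "a1"
    gesture: str                            # gesture category, e.g. "ok"
    box: Tuple[float, float, float, float]  # assumed layout: x_min, y_min, x_max, y_max

    def vertices(self) -> List[Point]:
        """Four vertices of the detection frame, clockwise from the top-left."""
        x_min, y_min, x_max, y_max = self.box
        return [(x_min, y_min), (x_max, y_min), (x_max, y_max), (x_min, y_max)]

    def center(self) -> Point:
        """Center point of the detection frame."""
        x_min, y_min, x_max, y_max = self.box
        return ((x_min + x_max) / 2, (y_min + y_max) / 2)
```

A contour-based representation, which the text also allows, would replace `box` with a list of contour points.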

The first to-be-detected image and the second to-be-detected image may be two frames of video images that are adjacent in time sequence in the video stream, or two frames of video images that are adjacent in time sequence in the video sequence obtained by sampling the original video stream.

In practical applications, there may be other images between the first image to be detected and the second image to be detected, and the changes of each object across those other images can usually be ignored. For example, when the time difference between the acquisition moments of the first image to be detected and the second image to be detected is relatively small, the differences between the acquired video images can be regarded as small and will not affect the image recognition analysis and processing based on the first image to be detected and the second image to be detected.
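Forming first and second images from a sampled video sequence can be illustrated with a few lines. The fixed sampling interval `step` is an assumption; the text only says that the sequence is obtained by sampling the original stream.

```python
from typing import List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def sampled_pairs(frames: Sequence[Frame], step: int) -> List[Tuple[Frame, Frame]]:
    """Keep every `step`-th frame, then pair each kept frame with the next kept one."""
    if step < 1:
        raise ValueError("step must be at least 1")
    sampled = list(frames[::step])
    # Each pair is (first image to be detected, second image to be detected).
    return list(zip(sampled, sampled[1:]))
```

With `step=1` the pairs are adjacent frames of the original stream; a larger `step` trades temporal resolution for less computation.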

S101-S103 will be described in detail below.

For S101:

Here, the first image to be detected may be the current image of the set target area, and the target area is any scene area set for controlling the target device. In some embodiments, a camera device may be set on the target device, or a camera device may also be set in a surrounding area of the target device, so that the camera device can acquire the first image to be detected of the target area corresponding to the target device. The photographing area corresponding to the imaging device includes a target area, that is, the target area is located within the photographing range of the imaging device.

Hand detection is performed on the first to-be-detected image to obtain the hand detection information of each user included in the first to-be-detected image; then, according to the gesture category indicated by the hand detection information corresponding to each user, the hand detection information of the target hand matching the preset gesture category is determined.

The preset gesture category may be a set gesture action category, where the set gesture action can be used to control the target device; for example, the preset gesture category may be the category of a particular gesture action, etc.

If the gesture categories indicated by the hand detection information of multiple users in the first to-be-detected image are all the same as the preset gesture category, the target user can be determined from among those users according to the position information of each user's limb center point; for example, the user whose limb center point is located in the middle of the first image to be detected is selected as the target user, and the target user's hand is used as the target hand.
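Choosing among several users who all show the preset gesture can be sketched as picking the limb center point nearest the image center. Measuring "the middle of the image" by Euclidean distance to the image center is an assumption; a horizontal-only distance would be an equally plausible reading.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def pick_target_user(limb_centers: Dict[str, Point], image_size: Tuple[int, int]) -> str:
    """Return the user whose limb center point is closest to the image center.

    `limb_centers` holds only users whose gesture already matches the preset
    category; it must not be empty.
    """
    width, height = image_size
    image_center = (width / 2, height / 2)
    return min(limb_centers, key=lambda uid: math.dist(limb_centers[uid], image_center))
```

For a 640 by 480 image, a user centered at (330, 250) would be preferred over users centered at (100, 240) or (600, 200).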

In an optional implementation manner, performing hand detection on the acquired first image to be detected includes:

S1011: Perform limb detection on the acquired first image to be detected to obtain limb detection information.

S1012. Based on the limb detection information, perform hand detection on the first image to be detected, and determine the hand detection information of the target hand associated with the limb.

Here, limb detection may be performed on the first image to be detected to determine the limb detection information of each user included in the first image to be detected. The limb detection information may include position information of a plurality of limb joint points, a limb identification corresponding to the user (the limb identification may be associated with the hand identification information included in the hand detection information), etc.; alternatively, the limb detection information may include the user's limb contour information, which includes position information of multiple limb contour points. The limb detection information may be the limb detection information of the user's half body. The limb joint points may be image key points extracted from the identified limb image of each user when limb detection is performed on the first image to be detected by an image detection method.

If the user's limb identification exists in a historical to-be-detected image prior to the first to-be-detected image, the user's limb identification determined by tracking in the historical to-be-detected image is used as the user's limb identification in the first to-be-detected image; if the user's limb identification does not exist in any historical to-be-detected image before the first to-be-detected image, a corresponding limb identification is generated for the user.
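The reuse-or-generate rule for limb identifications can be illustrated as below. The matching criterion is an assumption: the text does not say how a limb in the current image is matched to one in a historical image, so this sketch greedily matches by nearest limb center within a hypothetical `max_jump` distance and generates a new identification otherwise.

```python
import itertools
import math
from typing import Dict, Iterator, Sequence, Tuple

Point = Tuple[float, float]


def assign_limb_ids(
    previous: Dict[str, Point],
    current_centers: Sequence[Point],
    max_jump: float,
    id_counter: Iterator[int],
) -> Dict[str, Point]:
    """Reuse a historical limb identification when a limb is close enough, else create one."""
    assigned: Dict[str, Point] = {}
    free = dict(previous)  # historical identifications not yet matched
    for center in current_centers:
        best = min(free, key=lambda lid: math.dist(free[lid], center), default=None)
        if best is not None and math.dist(free[best], center) <= max_jump:
            assigned[best] = center  # tracked limb keeps its identification
            del free[best]
        else:
            assigned[f"limb{next(id_counter)}"] = center  # new user, new identification
    return assigned
```

A caller would keep one counter alive across frames, for example `itertools.count(1)`, so that generated identifications stay unique.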

Then, the limb detection information of at least one user can be used to perform hand detection on the first image to be detected and to determine the hand detection information of the target hand associated with the limb. For example, the hand region image of the hand associated with the limb in the first to-be-detected image can be determined according to the limb detection information, and hand detection can be performed on the hand region image to obtain the hand detection information associated with the limb; the target hand matching the preset gesture category is then determined according to the gesture category included in the hand detection information.

In some embodiments, a constructed first neural network may be trained so that the trained first neural network satisfies a first preset condition, for example, that the loss value of the trained first neural network is smaller than a set loss threshold. The trained first neural network is used to perform limb detection on the first image to be detected and to determine the limb detection information of at least one user included in the first image to be detected. The number and positions of the limb joint points included in the limb detection information can be set as required; for example, the number of limb joint points can be 14, 17, etc. A second neural network for detecting hands can likewise be trained so that the trained second neural network satisfies a second preset condition; the trained second neural network can then be used to perform hand detection on the first image to be detected based on the limb detection information and to determine the hand detection information of the target hand associated with the limb.

S1013, performing limb detection and hand detection on the acquired first image to be detected, respectively, to obtain limb detection information and the hand detection information;

S1014, determining the distance between the hand and the limb based on the limb detection information and the hand detection information;

S1015. Based on the distance, determine the hand detection information of the target hand associated with the limb.

Exemplarily, a first neural network may be used to perform limb detection on the first image to be detected to obtain limb detection information of at least one user, and a second neural network may be used to perform hand detection on the first image to be detected to obtain hand detection information corresponding to at least one hand. The target hand is determined according to the gesture category indicated by the hand detection information.

Then, the distance between the hand and the limb is determined according to the position information of the limb center point indicated by the limb detection information and the position information of the hand center point indicated by the hand detection information; the limb with the shortest distance from the target hand is then associated with the target hand, that is, the hand detection information of the target hand associated with the limb is obtained.
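The association in S1014 and S1015 can be sketched as a nearest-center search. This is only an illustrative sketch: the function name `nearest_limb`, the use of Euclidean distance, and the dictionary of limb identifications are assumptions, since the disclosure only states that the limb with the shortest distance from the target hand is selected.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def nearest_limb(hand_center: Point, limb_centers: Dict[str, Point]) -> str:
    """Return the limb identification whose center point is closest to the hand center.

    Euclidean distance between center points is an assumption of this sketch.
    """
    if not limb_centers:
        raise ValueError("no limb detection information available")
    return min(
        limb_centers,
        key=lambda limb_id: math.dist(hand_center, limb_centers[limb_id]),
    )


# Hypothetical center points, in pixel coordinates of the first image to be detected.
limbs = {"user_a": (100.0, 200.0), "user_b": (400.0, 210.0)}
associated = nearest_limb((380.0, 150.0), limbs)
```

Here the hand center lies much closer to the limb center of `user_b`, so the target hand would be associated with that limb.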

Refer to FIG. 2, a schematic diagram of limb joint points and hand detection frames in an image recognition-based device control method. The limb joint point information of the target user in FIG. 2 may include head vertex 5, head center point 4, neck joint point 3, left shoulder joint point 9, right shoulder joint point 6, left elbow joint point 10, right elbow joint point 7, left wrist joint point 11, right wrist joint point 8, half body limb center point 12, crotch joint point 1, crotch joint point 2, and crotch center point 0; the hand detection frames can include the four vertices 13, 15, 16, 17 of the right-hand detection frame and the center point 14 of the right-hand frame, and the four vertices 18, 20, 21, 22 of the left-hand detection frame and the center point 19 of the left-hand frame.

For S102:

The user corresponding to the target hand is taken as the target user who controls the target device. Based on the hand detection information of the target user's target hand, limb tracking detection is performed on the target limb connected to the target hand in the acquired second image to be detected, the limb information of the target user in the second image to be detected is determined, and the gesture recognition result of the target hand in the second image to be detected is determined according to the determined limb information of the target user. The gesture recognition result includes, but is not limited to, gesture category, hand position information, and the like.

The second to-be-detected image is one or more frames of images acquired after the first to-be-detected image.

In an optional implementation manner, before the control of the target device based on the gesture recognition result, the method further includes:

1. Detecting whether the gesture recognition result satisfies the cut-off condition;

2. When it is detected that the gesture recognition result meets the cut-off condition, in the second image to be detected, the hand detection information of the target hand matching the preset gesture category is re-determined. Wherein, the gesture recognition result satisfying the cut-off condition includes one or more of the following:

Condition 1: In the second image to be detected, the gesture category indicated by the gesture recognition result of the target hand is an invalid gesture category, and the invalid gesture category includes at least one of the following: the gesture category does not match the preset gesture category, and the target hand does not move;

Condition 2: In the case where the second image to be detected includes multiple frames, the number of frames in which the gesture category indicated by the gesture recognition result of the target hand is the invalid gesture category is greater than or equal to the number threshold, and/or the duration for which it is the invalid gesture category is greater than or equal to the duration threshold;

Condition 3: In the second to-be-detected image, the gesture category indicated by the gesture recognition result of the target hand is a valid gesture category, and the valid gesture category is used to instruct to re-determine the target hand and/or hand detection information.

During implementation, the gesture recognition result of the target hand can be detected in real time to determine whether the gesture recognition result satisfies the cut-off condition. When it is detected that the gesture recognition result satisfies the cut-off condition, the target hand no longer controls the target device, and the hand detection information of the target hand that matches the preset gesture category can be re-determined, so that at least one user in the second to-be-detected image can control the target device in real time.

When it is detected that the gesture recognition result meets the cutoff condition, in the second image to be detected, the hand detection information of the target hand matching the preset gesture category is re-determined, so as to use the re-determined gesture recognition result of the target hand to control the target device.

The cut-off condition includes but is not limited to one or more of condition 1, condition 2, and condition 3. For example, the cut-off condition may also include: when the hand detection information of the target hand cannot be detected in the second image to be detected, the hand detection information of the target hand matching the preset gesture category is re-determined.

In condition 1, if the gesture category indicated by the gesture recognition result of the target hand in the second image to be detected does not match the preset gesture category, and/or the gesture recognition result of the target hand in the second image to be detected indicates that the target hand has not moved, it is determined that condition 1 is satisfied. Exemplarily, whether the target hand moves may be determined according to the position information of the target hand in multiple frames of the second image to be detected.

In condition 2, when it is detected that the target hand does not move in N consecutive frames of the second image to be detected, and the value of N is greater than or equal to the number threshold, it is determined that condition 2 is satisfied, where N is a positive integer; or, condition 2 is determined to be satisfied when the gesture category of the target hand in N consecutive frames of the second image to be detected does not match the preset gesture category, and the value of N is greater than or equal to the number threshold. The number threshold may be set as required; for example, the number threshold may be 3, 5, 10, and so on. Alternatively, it is determined that condition 2 is satisfied when the duration for which the gesture category indicated by the gesture recognition result of the target hand is an invalid gesture category is greater than or equal to the duration threshold. The duration threshold can be set according to actual needs.

In condition 3, a cut-off gesture category may be preset, and the cut-off gesture category is used to instruct the target hand and/or hand detection information to be re-determined. For example, the cut-off gesture category may be a thumbs-up gesture category. When the gesture category of the target hand is thumbs up, it is determined that the target hand satisfies the third condition.
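A minimal sketch of how the three conditions might be checked together is given below. The names `FrameResult`, `is_invalid`, and `cutoff_reached`, the movement tolerance `move_eps`, and the way the conditions are combined (condition 3 on the latest frame, condition 2 on the trailing run of invalid frames, with invalidity defined as in condition 1) are all assumptions made for illustration; the disclosure allows any one or more of the conditions to be used.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class FrameResult:
    """Gesture recognition result of the target hand in one second image to be detected."""
    gesture: str
    position: Tuple[float, float]


def is_invalid(prev: FrameResult, cur: FrameResult,
               preset_gesture: str, move_eps: float) -> bool:
    """Condition 1: the gesture does not match the preset category, or the hand has not moved."""
    mismatch = cur.gesture != preset_gesture
    dx = cur.position[0] - prev.position[0]
    dy = cur.position[1] - prev.position[1]
    stationary = (dx * dx + dy * dy) ** 0.5 < move_eps
    return mismatch or stationary


def cutoff_reached(frames: Sequence[FrameResult], preset_gesture: str,
                   cutoff_gesture: str = "thumbs_up",
                   count_threshold: int = 3, move_eps: float = 2.0) -> bool:
    """Return True when the target hand should be re-determined."""
    if not frames:
        return False
    # Condition 3: the latest gesture is the preset cut-off gesture category.
    if frames[-1].gesture == cutoff_gesture:
        return True
    # Condition 2: count the trailing run of consecutive invalid frames.
    run = 0
    for prev, cur in zip(frames, frames[1:]):
        run = run + 1 if is_invalid(prev, cur, preset_gesture, move_eps) else 0
    return run >= count_threshold
```

A duration threshold could be handled in the same way by comparing frame timestamps instead of counting frames.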

For S103:

After the gesture recognition result of the target hand in the second image to be detected is determined, the target device can be controlled according to the gesture recognition result. The target device may be a smart TV, a smart display screen, or the like.

In an optional implementation manner, controlling the target device includes at least one of the following: adjusting the volume of the target device; adjusting a working mode of the target device, where the working mode includes turning off or turning on at least part of the functions of the target device; displaying the mobile logo in the display interface of the target device, or adjusting the display position of the mobile logo in the display interface; reducing or enlarging at least part of the displayed content in the display interface; sliding or jumping the display interface.

Here, based on the gesture recognition result, the volume of the target device can be controlled, the target device can be turned off, the display position of the mobile logo in the display interface of the target device can be adjusted, and so on, to realize flexible control of the target device.

An exemplary description is given of adjusting the volume of the target device based on the gesture recognition result. Suppose a first target gesture category is set for volume control; for example, the first target gesture category may be the gesture category of raising the index finger and the middle finger. When the gesture category included in the gesture recognition result is this gesture category, it can be determined that the target hand has triggered the function of adjusting the volume of the target device; the volume can then be increased or decreased according to the moving direction and moving distance of the target hand, and the increased or decreased volume value can be determined. For example, if it is detected that the target hand moves from bottom to top, the volume of the target device is to be increased, and the increased volume value can be determined according to the distance moved from bottom to top and the current volume value; if it is detected that the target hand moves from top to bottom, the volume of the target device is to be decreased, and the decreased volume value can be determined according to the distance moved from top to bottom and the current volume value.
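The volume adjustment described above can be sketched as follows. The linear mapping from movement distance to volume change, the function name `adjust_volume`, and the parameters `frame_height` and `max_volume` are assumptions of this sketch; the disclosure only states that the new volume value depends on the moving direction, the moving distance, and the current volume value.

```python
def adjust_volume(current: int, start_y: float, end_y: float,
                  frame_height: float, max_volume: int = 100) -> int:
    """Return the new volume after the target hand moves from start_y to end_y.

    Image coordinates are assumed to grow downward, so an upward hand
    movement gives end_y < start_y and increases the volume.
    """
    upward_distance = start_y - end_y
    # Assumed mapping: moving across the full frame height spans the full volume range.
    delta = round(upward_distance / frame_height * max_volume)
    return max(0, min(max_volume, current + delta))


# Hand moves up by 100 pixels in a 1000-pixel-high frame.
louder = adjust_volume(current=50, start_y=400.0, end_y=300.0, frame_height=1000.0)
# Hand moves down by the same distance.
quieter = adjust_volume(current=50, start_y=300.0, end_y=400.0, frame_height=1000.0)
```

Clamping keeps the result inside the valid volume range when the hand travels a long distance.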

An exemplary description is given of adjusting the working mode of the target device based on the gesture recognition result. Suppose a second target gesture category is set for shutting down the target device; for example, the second target gesture category may be the OK gesture category. When the gesture category in the gesture recognition result is the OK gesture category, it can be determined that the target hand has triggered the function of shutting down the target device, and the target device can then be shut down in response to the function triggered by the user.

It is also possible to determine the display position of the mobile logo on the target device based on the position information of the target hand indicated by the gesture recognition result, and control the display interface of the target device to display the mobile logo at the display position, where the mobile logo can be a moving cursor, etc.
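One simple way to derive a display position from the hand position is proportional scaling between image coordinates and display coordinates. The sketch below assumes this linear mapping; the function name `to_display_position` and the example resolutions are hypothetical, and the disclosure does not specify how the mapping is performed.

```python
from typing import Tuple


def to_display_position(hand_xy: Tuple[float, float],
                        image_size: Tuple[int, int],
                        display_size: Tuple[int, int]) -> Tuple[int, int]:
    """Scale a hand position in image pixels to a position on the display interface."""
    (x, y), (image_w, image_h), (display_w, display_h) = hand_xy, image_size, display_size
    return (round(x / image_w * display_w), round(y / image_h * display_h))


# A hand at the center of a 640x480 image maps to the center of a 1920x1080 display.
cursor = to_display_position((320.0, 240.0), (640, 480), (1920, 1080))
```

A real system might additionally mirror the horizontal axis or smooth the positions over several frames; those choices are outside this sketch.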

If the gesture category in the gesture recognition result is the same as a third target gesture category corresponding to a click (for example, the third target gesture category may be the gesture category of raising the index finger), it can be determined that the target user has triggered the click function at the target display position of the target device that matches the current position of the target hand, and the target device can then be controlled to display the display content that matches the target display position, or the display interface can be controlled to slide or jump.

When the first image to be detected includes multiple users and the users are relatively close to one another, their gestures may interfere with each other. If it is detected that there is interference between users, the fault tolerance mechanism of preset gesture category detection can be adjusted.

In an optional implementation manner, in the case that the first image to be detected includes multiple users, before the limb tracking detection is performed on the target limb connected to the target hand in the acquired second image to be detected based on the hand detection information of the target hand, the method further includes:

Step 1. Determine the target joint point position information of each user in the first to-be-detected image;

Step 2: Take each user in the first image to be detected in turn as a target user, and, based on the target joint point position information of the target user, determine the horizontal distance between the target joint point of the target user and the target joint points of the other users, among the multiple users, other than the target user;

Step 3: When it is determined based on the horizontal distance that there is no interfering user among the other users, the default gesture category of the target user is taken as the preset gesture category of the target user, where the interfering users include users whose horizontal distance is less than a distance threshold corresponding to the target user.

Step 4: When it is determined based on the horizontal distance that there is an interfering user among the other users, adjust the default gesture category of the target user, and use the adjusted default gesture category as the preset gesture category of the target user, where adjusting the default gesture category includes at least one of the following operations: increasing the types of the default gesture category, increasing the types of gesture categories used to control at least one function of the target device, and adjusting the movement detection of the gesture category to movement detection of the hand detection frame.

For step 1, limb detection can be performed on the first image to be detected, and the limb detection information of each user in the first image to be detected can be determined. The limb detection information can include target joint point position information, so that the target joint point position information of each user is obtained. The target joint point can be selected as required; for example, the target joint point can be the center point of the limb, that is, the center point 12 of the half-body limb in FIG. 2, or the center point 0 of the crotch in FIG. 2.

For step 2, each user in the first to-be-detected image can be taken in turn as a target user, and the horizontal distance between the target joint point of the target user and the target joint points of the other users can be determined based on the target joint point position information. That is, the abscissa values indicated by the target joint point position information of the target user and of each other user can be subtracted to obtain the horizontal distance between the target joint point of the target user and the target joint point of that other user.

Then, based on the horizontal distance between the target user and the other users, it can be determined whether there are interfering users among the other users; if not, go to step 3; if so, go to step 4. When the horizontal distance between another user and the target user is less than the distance threshold corresponding to the target user, that other user is determined to be an interfering user; when the horizontal distance between another user and the target user is greater than or equal to the distance threshold corresponding to the target user, that other user is determined not to be an interfering user.
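The check in steps 2 to 4 can be sketched as below, following the definition in step 3 that an interfering user is one whose horizontal distance to the target user is less than the distance threshold. The function name `interfering_users` and the dictionary of abscissa values keyed by user identification are assumptions made for illustration.

```python
from typing import Dict, List


def interfering_users(target_id: str, joint_x: Dict[str, float],
                      distance_threshold: float) -> List[str]:
    """Return the users whose target joint point is horizontally closer than the threshold.

    joint_x maps each user identification to the abscissa of that user's
    target joint point (for example, the limb center point).
    """
    target_x = joint_x[target_id]
    return [
        user_id
        for user_id, x in joint_x.items()
        if user_id != target_id and abs(x - target_x) < distance_threshold
    ]


# Hypothetical abscissa values in pixels.
joint_x = {"a": 100.0, "b": 130.0, "c": 400.0}
near_a = interfering_users("a", joint_x, distance_threshold=50.0)
near_c = interfering_users("c", joint_x, distance_threshold=50.0)
```

An empty result corresponds to step 3 (keep the default gesture category); a non-empty result corresponds to step 4 (adjust the default gesture category).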

Wherein, the distance threshold corresponding to the target user can be determined according to the following steps A1 to A3:

Step A1, determining the position information of the first joint point and the position information of the second joint point of the target user;

Step A2, based on the position information of the first joint point and the position information of the second joint point, determine the intermediate distance used to represent the shoulder width of the target user;

Step A3: Determine the distance threshold corresponding to the target user based on the intermediate distance corresponding to the target user.

Exemplarily, the first joint point may be the left shoulder joint point 9 in FIG. 2, and the second joint point may be the neck joint point 3 in FIG. 2; or, the first joint point may be the right shoulder joint point 6 in FIG. 2, and the second joint point may be the neck joint point 3 in FIG. 2; or, the first joint point may be the right shoulder joint point 6 in FIG. 2, and the second joint point may be the left shoulder joint point 9 in FIG. 2.

Then, based on the position information of the first joint point and the position information of the second joint point, the intermediate distance used to represent the shoulder width of the target user can be determined. For example, the abscissa values indicated by the position information of the first joint point and of the second joint point can be subtracted to determine the intermediate distance.

Finally, based on the intermediate distance corresponding to the target user, the distance threshold corresponding to the target user is determined. For example, the determined intermediate distance may be used as the distance threshold corresponding to the target user; alternatively, the determined intermediate distance may be reduced or enlarged, and the reduced or enlarged intermediate distance may be used as the distance threshold corresponding to the target user.
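Steps A1 to A3 can be sketched as a small helper. The function name `distance_threshold` and the multiplicative `scale` parameter are assumptions; the disclosure only says that the intermediate distance may be used directly, or reduced or enlarged, without fixing how.

```python
def distance_threshold(first_joint_x: float, second_joint_x: float,
                       scale: float = 1.0) -> float:
    """Return the distance threshold corresponding to a target user.

    The intermediate distance is the absolute difference of the abscissa values
    of the two joint points (for example, the two shoulder joint points).
    A scale below 1 reduces it and a scale above 1 enlarges it.
    """
    intermediate_distance = abs(first_joint_x - second_joint_x)
    return intermediate_distance * scale


# Hypothetical shoulder abscissas in pixels: right shoulder at 220, left shoulder at 300.
same_as_width = distance_threshold(220.0, 300.0)
enlarged = distance_threshold(220.0, 300.0, scale=1.5)
```

Deriving the threshold from each user's own shoulder width makes it scale with how large the user appears in the image.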

In step 3, if the target user has no interfering user, the default gesture category of the target user can be used as the preset gesture category of the target user, and there is no need to adjust the default gesture category. In step 4, if it is determined that the target user has an interfering user, the default gesture category corresponding to the target user may be adjusted, and the adjusted default gesture category is used as the preset gesture category of the target user.

For example, the types of default gesture categories can be added. For example, the default gesture category before adjustment is a dynamic gesture of one-finger circle, and the adjusted default gesture category can include: one-finger circle gesture category, and fist circle gesture category.

For another example, the types of gesture categories used to control at least one function of the target device can be added. For example, before the addition, the first target gesture category for controlling the volume of the target device is the gesture category of raising the index finger and the middle finger; after the addition, the first target gesture category for controlling the volume of the target device may include: the gesture category of raising the index finger and the middle finger, the palm gesture category, the gesture category of raising three fingers, and the like.

Alternatively, the types of cut-off gesture categories may also be added. For example, before the addition, the cut-off gesture category is the thumbs-up gesture category; after the addition, the cut-off gesture categories may include the thumbs-up gesture category, the gesture category of raising the index finger, the gesture category of raising the little finger, etc.

For another example, the movement detection of the gesture category can also be adjusted to movement detection of the hand detection frame. That is, before the adjustment, the real-time movement of the gesture category is detected, and the display position of the mobile logo on the target device is determined based on the detection result of the gesture category. In some embodiments, before the adjustment, the target hand may be detected first and the current gesture category corresponding to the target hand determined; when the current gesture category matches the set movement gesture category, the hand position of the target hand is determined, and the display position of the mobile logo on the target device is determined based on the hand position of the target hand; when the current gesture category does not match the set movement gesture category, the step of determining the hand position of the target hand is not performed, that is, the movement of the mobile logo on the target device cannot be controlled at this time. The hand position of the target hand may be the position of the center point of the hand detection frame corresponding to the target hand, or the position of a hand center point set on the target hand.

After the adjustment, the real-time movement of the hand detection frame can be detected, and the display position of the mobile logo on the target device can be determined based on the detection result of the hand detection frame. In some embodiments, the position information of the hand detection frame of the target hand may be determined, and the display position of the mobile logo on the target device is determined based on the position information of the hand detection frame (for example, the position information of the center point of the hand detection frame); at this time there is no need to detect the current gesture category of the target hand.
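The difference between the behavior before and after the adjustment can be sketched with a single flag. The names `box_center`, `logo_anchor`, and `use_box_only`, and the `(x_min, y_min, x_max, y_max)` frame layout, are assumptions of this sketch rather than terms from the disclosure.

```python
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: x_min, y_min, x_max, y_max


def box_center(box: Box) -> Tuple[float, float]:
    """Center point of a hand detection frame."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def logo_anchor(box: Box, gesture: str, move_gesture: str,
                use_box_only: bool) -> Optional[Tuple[float, float]]:
    """Hand position used to place the mobile logo, or None if it must not move.

    use_box_only=False models the behavior before the adjustment: the hand
    position is only determined when the gesture matches the movement gesture.
    use_box_only=True models the behavior after the adjustment: the hand
    detection frame alone drives the position and the gesture is ignored.
    """
    if not use_box_only and gesture != move_gesture:
        return None
    return box_center(box)


frame = (10.0, 20.0, 50.0, 60.0)
before = logo_anchor(frame, "fist", "one_finger", use_box_only=False)
after = logo_anchor(frame, "fist", "one_finger", use_box_only=True)
```

With interference present, a misclassified gesture no longer freezes the mobile logo, because only the position of the hand detection frame is consulted.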

Those skilled in the art can understand that the writing order of the steps in the above method does not imply a strict execution order or constitute any limitation on the implementation process; the execution order of the steps should be determined by their functions and possible internal logic.

Based on the same concept, an embodiment of the present disclosure also provides an image recognition-based device control apparatus. Referring to FIG. 3, which is a schematic diagram of the architecture of an image recognition-based device control apparatus provided by an embodiment of the present disclosure, the apparatus includes a first determination module 301, a detection module 302, and a control module 303, wherein:

The first determining module 301 is configured to perform hand detection on the acquired first image to be detected, and determine the hand detection information of the target hand matching the preset gesture category;

The detection module 302 is configured to perform limb tracking detection on the target limb connected to the target hand in the acquired second image to be detected based on the hand detection information of the target hand, and determine the target hand The gesture recognition result in the second to-be-detected image; wherein, the second to-be-detected image is an image acquired after the first to-be-detected image;

The control module 303 is configured to control the target device based on the gesture recognition result.

In a possible implementation manner, the apparatus further includes a second determination module 304, which, before the target device is controlled based on the gesture recognition result, is configured to:

detecting whether the target hand satisfies the cut-off condition;

In a possible implementation manner, the first determination module 301, when performing hand detection on the acquired first image to be detected, is configured to:

In a possible implementation manner, the control module 303, when controlling the target device, includes at least one of the following:

adjust the volume of the target device;

Sliding or jumping of the display interface.

In a possible implementation manner, in the case that the first image to be detected includes multiple users, the apparatus further includes an adjustment module 305, which, before the limb tracking detection is performed on the target limb connected to the target hand in the acquired second image to be detected based on the hand detection information of the target hand, is configured to:

In a possible implementation manner, the adjustment module 305 is further configured to:

In a possible implementation manner, the apparatus further includes a distance threshold determination module, and the distance threshold determination module 306 is configured to determine the distance threshold corresponding to the target user according to the following steps:

determining the position information of the first joint point and the position information of the second joint point corresponding to the target user;

Based on the intermediate distance corresponding to the target user, the distance threshold corresponding to the target user is determined.

In some embodiments, the functions or modules included in the apparatus provided in the embodiments of the present disclosure may be used to execute the methods described in the above method embodiments. For implementation, reference may be made to the descriptions in the above method embodiments, which will not be repeated here.

Based on the same technical concept, an embodiment of the present disclosure also provides an electronic device. Referring to FIG. 4, which is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure, the electronic device includes a processor 401, a memory 402, and a bus 403. The memory 402 is configured to store execution instructions and includes an internal memory 4021 and an external memory 4022; the internal memory 4021 is configured to temporarily store operation data in the processor 401 and data exchanged with the external memory 4022 such as a hard disk, and the processor 401 exchanges data with the external memory 4022 through the internal memory 4021. When the electronic device 400 is running, the processor 401 and the memory 402 communicate through the bus 403, so that the processor 401 executes the following instructions:

Based on the gesture recognition result, the target device is controlled.

In addition, embodiments of the present disclosure further provide a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is run by a processor, the image recognition-based device control method described in the foregoing method embodiments is executed.

The computer program product of the image recognition-based device control method provided by the embodiments of the present disclosure includes a computer-readable storage medium storing computer-readable codes, and the instructions included in the computer-readable codes can be used to execute the image recognition-based device control method described in the above method embodiments.

Those skilled in the art can clearly understand that, for the convenience and brevity of description, for the specific working process of the system and device described above, reference may be made to the corresponding process in the foregoing method embodiments, which will not be repeated here. In the several embodiments provided by the present disclosure, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. The apparatus embodiments described above are only illustrative. For example, the division of the units is only a logical function division, and there may be other division methods in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the shown or discussed mutual coupling, direct coupling, or communication connection may be indirect coupling or communication connection of devices or units through some communication interfaces, and may be in electrical, mechanical, or other forms.

The units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.

In addition, each functional unit in each embodiment of the present disclosure may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.

The functions, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a processor-executable non-volatile computer-readable storage medium. Based on such understanding, the technical solutions of the present disclosure in essence, or the parts that contribute to the prior art, or parts of the technical solutions, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions used to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in various embodiments of the present disclosure. The aforementioned storage medium includes: a USB flash drive, a mobile hard disk, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), a magnetic disk, an optical disk, or other media that can store program codes.

The above are only specific embodiments of the present disclosure, but the protection scope of the present disclosure is not limited thereto. Any changes or substitutions that a person skilled in the art can easily think of within the technical scope disclosed herein should be covered within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure should be subject to the protection scope of the claims.

Industrial Applicability

By performing hand detection on the first image to be detected, the hand detection information of the target hand that matches the preset gesture category is determined, and based on the hand detection information of the target hand, limb tracking detection is performed on the target limb connected to the target hand in the acquired second image to be detected, to determine the gesture recognition result of the target hand in the second to-be-detected image. In this way, a target hand that is difficult to track directly can be tracked by means of limb tracking, and the target device can then be controlled based on the gesture recognition result. Among the hands of multiple users, or the two hands of the same user, the target hand is locked, and the unique matching between the limb and the hand is used to perform limb tracking for the purpose of tracking the target hand; the gesture recognition result of the target hand in the second to-be-detected image is then obtained based on the limb tracking result. This effectively reduces the interference caused by other users' hand movements when image recognition is performed on the gesture of the target user who controls the target device with the target hand, and improves the accuracy of image recognition, thereby improving the control accuracy of the target device.

Claims

A device control method based on image recognition, comprising:

Perform hand detection on the acquired first image to be detected, and determine the hand detection information of the target hand matching the preset gesture category;

Based on the hand detection information of the target hand, perform limb tracking detection on the target limb connected to the target hand in the acquired second to-be-detected image, and determine the gesture recognition result of the target hand in the second to-be-detected image; wherein, the second to-be-detected image is an image obtained after the first to-be-detected image;

Based on the gesture recognition result, the target device is controlled.
The method according to claim 1, wherein before the controlling of the target device based on the gesture recognition result, the method further comprises:

detecting whether the gesture recognition result satisfies a cut-off condition;

when it is detected that the gesture recognition result satisfies the cut-off condition, re-determining, in the second image to be detected, the hand detection information of the target hand matching the preset gesture category.
The method according to claim 2, wherein the gesture recognition result satisfying the cut-off condition comprises one or more of the following:

in the second image to be detected, the gesture category indicated by the gesture recognition result of the target hand is an invalid gesture category, the invalid gesture category including at least one of the following: the gesture category does not match the preset gesture category, and the target hand has not moved;

in the case where the second image to be detected includes multiple frames, the number of frames in which the gesture category indicated by the gesture recognition result of the target hand is the invalid gesture category is greater than or equal to a number threshold, and/or the duration for which it is the invalid gesture category is greater than or equal to a duration threshold;

in the second image to be detected, the gesture category indicated by the gesture recognition result of the target hand is a valid gesture category, the valid gesture category being used to instruct re-determination of the target hand and/or the hand detection information.
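
By way of example only, the cut-off check described above might be sketched as follows. The `FrameResult` record, the default threshold values and the `reset_gestures` set are assumptions made for this sketch. The sketch also assumes that invalid frames are counted as a trailing consecutive run, which the text above does not specify.

```python
from dataclasses import dataclass
from typing import Collection, Sequence


@dataclass
class FrameResult:
    gesture: str      # gesture category recognised for the target hand
    moved: bool       # whether the target hand moved in this frame
    timestamp: float  # capture time in seconds


def is_invalid(result: FrameResult, preset_gestures: Collection[str]) -> bool:
    """A frame is invalid if the gesture does not match a preset category or the hand did not move."""
    return result.gesture not in preset_gestures or not result.moved


def cutoff_reached(
    results: Sequence[FrameResult],
    preset_gestures: Collection[str],
    reset_gestures: Collection[str] = frozenset(),
    max_invalid_frames: int = 5,        # assumed number threshold
    max_invalid_seconds: float = 1.0,   # assumed duration threshold
) -> bool:
    """Return True when the target hand should be re-determined."""
    if not results:
        return False
    # A valid gesture that explicitly requests re-determination ends tracking at once.
    if results[-1].gesture in reset_gestures:
        return True
    # Collect the trailing run of consecutive invalid frames (assumption: consecutive).
    run = []
    for result in reversed(results):
        if not is_invalid(result, preset_gestures):
            break
        run.append(result)
    if not run:
        return False
    duration = run[0].timestamp - run[-1].timestamp
    return len(run) >= max_invalid_frames or duration >= max_invalid_seconds
```
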
The method according to any one of claims 1 to 3, wherein the performing hand detection on the acquired first image to be detected comprises:

performing limb detection on the acquired first image to be detected to obtain limb detection information;

based on the limb detection information, performing hand detection on the first image to be detected, and determining the hand detection information of the target hand associated with the limb.
The method according to any one of claims 1 to 3, wherein the performing hand detection on the acquired first image to be detected comprises:

performing limb detection and hand detection separately on the acquired first image to be detected, to obtain limb detection information and the hand detection information;

determining the distance between the hand and the limb based on the limb detection information and the hand detection information;

based on the distance, determining the hand detection information of the target hand associated with the limb.
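
As an illustrative sketch only, the distance-based association of a detected hand with a detected limb might look as follows. It assumes that hand detection yields a bounding box `(x1, y1, x2, y2)` and that limb detection yields one wrist keypoint per limb; the `max_distance` cap and the limb identifiers are hypothetical.

```python
import math
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def box_center(box: Box) -> Point:
    """Centre of a hand detection frame."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def match_hand_to_limb(
    hand_box: Box,
    limb_wrists: Dict[str, Point],
    max_distance: float,
) -> Optional[str]:
    """Return the id of the limb whose wrist keypoint is nearest to the hand centre.

    Limbs farther away than max_distance are ignored, so an isolated hand
    detection is not attached to an unrelated limb.
    """
    cx, cy = box_center(hand_box)
    best_id: Optional[str] = None
    best_dist = max_distance
    for limb_id, (wx, wy) in limb_wrists.items():
        dist = math.hypot(cx - wx, cy - wy)
        if dist <= best_dist:
            best_id, best_dist = limb_id, dist
    return best_id
```
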
The method according to any one of claims 1 to 5, wherein the controlling of the target device includes at least one of the following:

adjusting the volume of the target device;

adjusting the working mode of the target device, the working mode including turning off or turning on at least part of the functions of the target device;

displaying a mobile logo in the display interface of the target device, or adjusting the display position of the mobile logo in the display interface;

reducing or enlarging at least part of the displayed content in the display interface;

sliding or jumping the display interface.
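
Purely for illustration, a mapping from gesture recognition results to some of the control actions listed above might be sketched as follows. The `TargetDevice` class, the gesture names and the step sizes are all hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class TargetDevice:
    volume: int = 50
    functions_on: Dict[str, bool] = field(default_factory=lambda: {"display": True})
    zoom: float = 1.0

    def adjust_volume(self, delta: int) -> None:
        """Change the volume, clamped to the range 0..100."""
        self.volume = max(0, min(100, self.volume + delta))

    def toggle_function(self, name: str) -> None:
        """Turn part of the device's functionality on or off."""
        self.functions_on[name] = not self.functions_on.get(name, False)

    def scale_content(self, factor: float) -> None:
        """Enlarge (factor > 1) or reduce (factor < 1) the displayed content."""
        self.zoom *= factor


def build_dispatch(device: TargetDevice) -> Dict[str, Callable[[], None]]:
    """Hypothetical gesture-to-action table."""
    return {
        "swipe_up": lambda: device.adjust_volume(10),
        "swipe_down": lambda: device.adjust_volume(-10),
        "fist": lambda: device.toggle_function("display"),
        "pinch_out": lambda: device.scale_content(1.25),
    }


def control(device: TargetDevice, gesture: str) -> bool:
    """Apply the action mapped to the gesture; return False if the gesture is unmapped."""
    action = build_dispatch(device).get(gesture)
    if action is None:
        return False
    action()
    return True
```
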
The method according to any one of claims 1 to 6, wherein, in the case where the first image to be detected includes multiple users, before the performing, based on the hand detection information of the target hand, of limb tracking detection on the target limb connected to the target hand in the acquired second image to be detected, the method further comprises:

determining target joint point position information of each user in the first image to be detected;

taking each user in the first image to be detected in turn as a target user, and determining, based on the target joint point position information of the target user, the horizontal distance between the target joint point of the target user and the target joint point of each of the other users among the multiple users;

when it is determined based on the horizontal distance that there is no interfering user among the other users, taking the default gesture category of the target user as the preset gesture category of the target user, wherein an interfering user is a user whose horizontal distance is smaller than the distance threshold corresponding to the target user.
The method according to claim 7, further comprising:

when it is determined based on the horizontal distance that there is an interfering user among the other users, adjusting the default gesture category of the target user, and taking the adjusted default gesture category as the preset gesture category of the target user, wherein adjusting the default gesture category includes at least one of the following operations: increasing the number of categories in the default gesture category, increasing the number of gesture categories used to control at least one function of the target device, and changing the movement detection of the gesture category to movement detection of the hand detection frame.
The method according to claim 7 or 8, wherein the distance threshold corresponding to the target user is determined according to the following steps:

determining the position information of a first joint point and the position information of a second joint point of the target user;

based on the position information of the first joint point and the position information of the second joint point, determining an intermediate distance used to represent the shoulder width of the target user;

based on the intermediate distance, determining the distance threshold corresponding to the target user.
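
As a non-limiting sketch, the interfering-user check and the shoulder-width-based distance threshold might be implemented along the following lines. It assumes that the first and second joint points are the two shoulder keypoints, that the threshold is a simple multiple (`scale`) of the shoulder width, and that adjusting the default gesture category means adding extra required gestures; all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def distance_threshold(first_joint: Point, second_joint: Point, scale: float = 1.0) -> float:
    """Derive the threshold from an intermediate distance representing shoulder width."""
    shoulder_width = math.dist(first_joint, second_joint)
    return scale * shoulder_width


def interfering_users(target_id: str, target_joints: Dict[str, Point], threshold: float) -> List[str]:
    """List the other users whose target joint is horizontally closer than the threshold."""
    target_x = target_joints[target_id][0]
    return [
        user_id
        for user_id, (x, _y) in target_joints.items()
        if user_id != target_id and abs(x - target_x) < threshold
    ]


def preset_gestures_for(
    target_id: str,
    target_joints: Dict[str, Point],
    threshold: float,
    default_gestures: Sequence[str],
    extra_gestures: Sequence[str],
) -> List[str]:
    """Use the default categories when no one interferes; otherwise add extra categories."""
    gestures = list(default_gestures)
    if interfering_users(target_id, target_joints, threshold):
        gestures += [g for g in extra_gestures if g not in gestures]
    return gestures
```
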
A device control apparatus based on image recognition, comprising:

a first determining module, configured to perform hand detection on an acquired first image to be detected, and determine hand detection information of a target hand matching a preset gesture category;

a detection module, configured to, based on the hand detection information of the target hand, perform limb tracking detection on a target limb connected to the target hand in an acquired second image to be detected, and determine a gesture recognition result of the target hand in the second image to be detected; wherein the second image to be detected is an image acquired after the first image to be detected;

a control module, configured to control a target device based on the gesture recognition result.
An electronic device, comprising: a processor, a memory and a bus, wherein the memory stores machine-readable instructions executable by the processor; when the electronic device is running, the processor and the memory communicate through the bus; and when the machine-readable instructions are executed by the processor, the image recognition-based device control method according to any one of claims 1 to 9 is performed.
A computer-readable storage medium on which a computer program is stored, wherein when the computer program is executed by a processor, the image recognition-based device control method according to any one of claims 1 to 9 is performed.
A computer program, comprising computer-readable code, wherein when the computer-readable code runs in an electronic device, a processor in the electronic device executes the image recognition-based device control method according to any one of claims 1 to 9.