WO2023025181A1

WO2023025181A1 - Image recognition method and apparatus, and electronic device

Info

Publication number: WO2023025181A1
Application number: PCT/CN2022/114436
Authority: WO
Inventors: 林高杰; 罗宇轩; 唐堂
Original assignee: 北京字跳网络技术有限公司
Priority date: 2021-08-27
Filing date: 2022-08-24
Publication date: 2023-03-02
Also published as: CN115731570A

Abstract

Embodiments of the present application disclose an image recognition method and apparatus, and an electronic device. A specific implementation of the method comprises: determining the current hand region from a target image frame of an image frame sequence, wherein the current hand region is a single-hand region to be recognized or a two-hand region to be recognized, and images corresponding to two hands in the two-hand region to be recognized have an overlapping region; and when the current hand region is the two-hand region to be recognized, adjusting the preceding hand pose information of a preceding hand region in a preceding image frame to obtain the current hand pose information of the hand region to be recognized, wherein the preceding image frame comprises an image frame before the target image frame in the image frame sequence.

Description

Image recognition method, device and electronic equipment

Cross References to Related Applications

This application claims priority to Chinese patent application No. 202110999935.5, entitled "Image Recognition Method, Device and Electronic Equipment" and filed on August 27, 2021, the entire content of which is incorporated herein by reference.

Technical Field

The present disclosure relates to the field of computer technology, in particular to an image recognition method, device and electronic equipment.

Background

With the development of computer technology, more and more users use terminal devices to implement various functions.

In some application scenarios, the electronic device can recognize the gesture of the user's hand and respond according to the gesture of the user's hand, so that the user can interact with the electronic device.

Summary of the Invention

This Summary is provided to introduce, in simplified form, concepts that are described in detail in the Detailed Description below. This Summary is not intended to identify key features or essential features of the claimed technical solution, nor is it intended to limit the scope of the claimed technical solution.

In a first aspect, an embodiment of the present disclosure provides an image recognition method. The method includes: determining a current hand region from a target image frame of an image frame sequence, wherein the current hand region is a single-hand region to be recognized or a two-hand region to be recognized, and the images corresponding to the two hands in the two-hand region to be recognized have an overlapping area; and, when the current hand region is the two-hand region to be recognized, adjusting the previous hand pose information of a previous hand region in a previous image frame to obtain the current hand pose information of the hand region to be recognized, wherein the previous image frame includes an image frame in the image frame sequence that precedes the target image frame.

In a second aspect, an embodiment of the present disclosure provides an image recognition device, including: a determination unit configured to determine a current hand region from a target image frame of an image frame sequence, wherein the current hand region is a single-hand region to be recognized or a two-hand region to be recognized, and the images corresponding to the two hands in the two-hand region to be recognized have an overlapping area; and an adjustment unit configured to, when the current hand region is the two-hand region to be recognized, adjust the previous hand pose information of a previous hand region in a previous image frame to obtain the current hand pose information of the hand region to be recognized, wherein the previous image frame includes an image frame in the image frame sequence that precedes the target image frame.

In a third aspect, an embodiment of the present disclosure provides an electronic device, including: one or more processors; and a storage device for storing one or more programs, wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the image recognition method described in the first aspect.

In a fourth aspect, an embodiment of the present disclosure provides a computer-readable medium on which a computer program is stored, and when the program is executed by a processor, the steps of the image recognition method as described in the first aspect are implemented.

Brief Description of the Drawings

The above and other features, advantages and aspects of the various embodiments of the present disclosure will become more apparent with reference to the following detailed description in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic and that components and elements are not necessarily drawn to scale.

FIG. 1 is a flowchart of an embodiment of an image recognition method according to the present disclosure;

FIG. 2 is a flowchart of an exemplary implementation of an image recognition method according to the present disclosure;

FIG. 3 is a flowchart of another exemplary implementation of an image recognition method according to the present disclosure;

FIG. 4 is a flowchart of an exemplary implementation of an image recognition method according to the present disclosure;

FIG. 5A is a schematic diagram of an application scenario of the image recognition method of the present disclosure;

FIG. 5B is a schematic diagram of another application scenario of the image recognition method of the present disclosure;

FIG. 6 is a schematic diagram of another application scenario of the image recognition method of the present disclosure;

FIG. 7A is a schematic diagram of another application scenario of the image recognition method of the present disclosure;

FIG. 7B is a schematic diagram of another application scenario of the image recognition method of the present disclosure;

FIG. 8 is a schematic structural diagram of an embodiment of an image recognition device according to the present disclosure;

FIG. 9 is an exemplary system architecture to which an image recognition method according to an embodiment of the present disclosure can be applied;

FIG. 10 is a schematic diagram of a basic structure of an electronic device provided according to an embodiment of the present disclosure.

Detailed Description

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for exemplary purposes only, and are not intended to limit the protection scope of the present disclosure.

It should be understood that the various steps described in the method implementations of the present disclosure may be executed in different orders, and/or executed in parallel. Additionally, method embodiments may include additional steps and/or omit performing illustrated steps. The scope of the present disclosure is not limited in this respect.

As used herein, the term "comprising" and its variants are open-ended, i.e., "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one further embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions of other terms will be given in the description below.

It should be noted that concepts such as "first" and "second" mentioned in this disclosure are only used to distinguish different devices, modules or units, and are not used to limit the order or interdependence of the functions performed by these devices, modules or units.

It should be noted that the modifiers "one" and "multiple" mentioned in the present disclosure are illustrative and not restrictive, and those skilled in the art should understand that, unless the context clearly indicates otherwise, they should be understood as "one or more".

The names of messages or information exchanged between multiple devices in the embodiments of the present disclosure are used for illustrative purposes only, and are not used to limit the scope of these messages or information.

Please refer to FIG. 1, which shows the flow of an embodiment of the image recognition method according to the present disclosure. The image recognition method can be applied to terminal equipment. The image recognition method shown in FIG. 1 includes the following steps:

Step 101, in the target image frame of the image frame sequence, determine the current hand area.

In this embodiment, the executing subject of the image recognition method (such as a terminal device) may determine the current hand area from the target image frame in the sequence of image frames.

Here, the above image frame sequence may include at least two image frames. The target image frame may be any image frame in the sequence of image frames.

In this embodiment, the above-mentioned current hand region may be a single-hand region to be recognized or a two-hand region to be recognized.

Here, an image of a single hand may be included in the single-hand area to be recognized.

Here, the two-hand region to be recognized may include images of two hands, and there is an overlapping area between the images of the two hands.

Step 102, when the current hand region is the two-hand region to be recognized, adjust the previous hand pose information of the previous hand region in the previous image frame to obtain the current hand pose information of the hand region to be recognized.

Here, the above-mentioned previous image frame may include an image frame in the image frame sequence before the target image frame.

Optionally, the number of previous image frames may be one or at least two. The previous image frame and the target image frame may or may not be adjacent.

Here, the hand image area in the previous image frame may be referred to as the previous hand region. The previous hand pose information may indicate the hand pose in the previous hand region in the previous image frame. It can be understood that the previous hand region may include two hands, and the two hands may occlude each other (that is, an overlapping area exists) or may not occlude each other. Correspondingly, the previous hand region may include a two-hand region (containing images of two hands with an overlapping area between them), and may also include a single-hand region.

It can be understood that the hand image in the previous hand region and the hand image in the current hand region may belong to the same person.

In this embodiment, the current hand pose information may indicate the hand pose in the current hand region in the target image frame.

In this embodiment, the previous hand pose information may indicate the hand pose in the previous hand region in the previous image frame.

In some application scenarios, the hand pose information may include, but is not limited to, at least one of the following: 3D rotation information of each joint of the hand, 2D position information of the root node of the hand in the image frame, and size information of the hand in the image frame. It can be understood that the specific items of the hand pose information may be set according to the actual application scenario.

In this embodiment, the way of adjusting the previous hand pose information can be set according to the actual application scenario, which is not limited here.

As an example, one or more items of previous hand posture information may be adjusted to obtain current hand posture information.

It should be noted that the image recognition method provided in this embodiment first determines the current hand region from the target image frame of the image frame sequence. The current hand region may be a single-hand region to be recognized or a two-hand region to be recognized, and the images corresponding to the two hands in the two-hand region have an overlapping area. Then, when the current hand region is the two-hand region to be recognized, the previous hand pose information of the previous hand region in the previous image frame is adjusted to obtain the current hand pose information of the hand region to be recognized. Thus, a new image recognition method can be provided.

It should be noted that this image recognition method adjusts the previous hand pose information of the previous image frame to obtain the current hand pose of the two-hand region in the target image frame. In a scene where the images of the two hands have an overlapping area, the characteristics of the scene make it difficult to directly recognize the pose of an occluded hand image. The image frame sequence, however, can represent the change from the hand pose in the previous image frame to the hand pose in the target image frame. Adjusting the hand pose information of the previous image frame according to this change yields the current hand pose information, which can improve the accuracy of the determined hand pose information in image scenes where the two hands occlude each other.

In some embodiments, the above method may further include: adding hand image special effects on the target image frame according to the current hand posture information.

Optionally, the hand image special effect can be various forms of special effects, for example, a sticker special effect (adding a sticker to the hand image to cover the hand image).

It should be noted that adding the hand image special effect according to the current hand pose information makes the added effect fit the hand image better and look more natural. Specifically, even if the current hand pose information obtained from the previous hand pose information of the previous image frame deviates somewhat from the real hand pose, the method exploits the fact that a hand pose is unlikely to change abruptly. This ensures that the hand image special effect is adapted to the hand pose in the target image frame, so that, in scenes where the hand pose drives the hand image special effect, a stable and natural driving effect is produced even when the two hands are close to each other or overlap.

In some embodiments, the above step 101 may include step 201, step 202 and step 203 shown in FIG. 2.

Step 201, in the target image frame, determine the position of the hand image to obtain at least one undetermined hand region.

Here, the undetermined hand region can be determined in various ways. In this embodiment, the undetermined hand region can be understood as a hand region obtained through preliminary positioning, that is, a preliminarily determined hand region.

It can be understood that the number of undetermined hand regions obtained by determining the position of the hand image in the target image frame may be zero, one, two, or more than two. If it is zero, the target image frame does not include a hand image, and this case need not be processed in this embodiment. If it is one, the hand image in the undetermined hand region may be a single-hand image or a two-hand image. If it is two, the hand image in each undetermined hand region may be a single-hand image or a two-hand image. If more than two undetermined hand regions are identified, the hand images can be divided into multiple groups according to image features, so that the hand images in each group correspond to the undetermined hand regions of the same person.

In other words, when the target image frame contains hand images of multiple people, multiple region groups can be obtained, each including at least one undetermined hand region. For convenience of description, this application takes the at least one undetermined hand region belonging to the same person as an example. Since a person generally has no more than two hands, the number of undetermined hand regions may then be one or two.

In this embodiment, the pending hand area may be indicated by area indication information. As an example, a tracking box may be used to indicate the pending hand region.

Step 202, based on the positional relationship of the hand images in the undetermined hand regions, determine the processing mode of each undetermined hand region.

Here, the above processing manner may include, but is not limited to, at least one of the following: no processing, splitting, and merging.

Here, the above-mentioned positional relationship of the hand images may include the existence of overlapping regions or the absence of overlapping regions.

Optionally, the existence of an overlapping area may mean that the images of the two hands in a two-hand region overlap, or that the images of two hands respectively located in two single-hand regions overlap.

Optionally, there is no overlapping area, which may mean that the images of the two hands respectively located in the two single-handed areas do not overlap, or that the images of the two hands in the two-handed area do not have an overlapping area.
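The disclosure does not fix how the overlap relationship is tested. As a minimal sketch, assuming each hand region is indicated by an axis-aligned tracking box in pixel coordinates, the check could look like the following. The `Box` type and both function names are hypothetical and are not taken from the disclosure; a pixel-level mask comparison would be an equally valid choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned tracking box in pixel coordinates (hypothetical helper)."""
    x0: float
    y0: float
    x1: float
    y1: float


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two boxes, or 0.0 if they are disjoint."""
    width = min(a.x1, b.x1) - max(a.x0, b.x0)
    height = min(a.y1, b.y1) - max(a.y0, b.y0)
    return max(width, 0.0) * max(height, 0.0)


def has_overlap(a: Box, b: Box) -> bool:
    """True when the two boxes share a region of positive area."""
    return overlap_area(a, b) > 0.0
```

In this sketch, boxes that only touch along an edge are treated as not overlapping; whether that boundary case should count as an overlap is a design choice left open here.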

Step 203, adopt the determined processing method to process each pending hand area to obtain the current hand area.

In some application scenarios, the current hand region obtained by the above processing can ensure that two hand images with an overlapping area are placed in a two-hand region, and that a single-hand image that does not overlap with any other hand image is placed in a single-hand region.

It should be noted that, by processing the undetermined hand regions based on the positional relationship of the hand images in them, an accurate single-hand image or two-hand image can be obtained, which avoids recognition errors caused by errors in determining the undetermined hand regions (for example, a region that includes two independent single-hand images, or two regions that overlap).

In some embodiments, the above step 201 may be implemented by calling a hand detection model. The hand detection model can detect the undetermined hand area in the target image, such as a rectangular box containing the appearance of the hand.

In some embodiments, the above step 201 may include: when the previous image frame includes a hand image, adjusting the previous hand region of the previous image frame to obtain the undetermined hand region of the target image frame; and when the previous image frame does not include a hand image, performing hand image recognition on the target image frame to obtain the undetermined hand region.

As an example, the hand tracking model may be invoked to locate the pending hand region of the target image frame near the previous hand region of the previous frame image.

It should be noted that obtaining the undetermined hand region from the position of the previous hand region exploits the limited speed of hand movement and the likely proximity of the target image frame to the previous image frame. This avoids searching the whole image for the hand region and reduces the time and computation consumed in determining the hand region.
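One way to picture this choice between tracking and full detection is the sketch below. It assumes boxes are `(x0, y0, x1, y1)` tuples in pixels and that the tracking and detection models are supplied as callables; `search_window`, `locate_pending_regions`, `track`, `detect` and the `margin` parameter are all hypothetical names introduced for illustration, not part of the disclosure.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


def search_window(prev_box: Box, margin: float, frame_w: int, frame_h: int) -> Box:
    """Grow the previous box by a relative margin and clip it to the frame."""
    x0, y0, x1, y1 = prev_box
    dx = (x1 - x0) * margin
    dy = (y1 - y0) * margin
    return (max(0.0, x0 - dx), max(0.0, y0 - dy),
            min(float(frame_w), x1 + dx), min(float(frame_h), y1 + dy))


def locate_pending_regions(prev_boxes: List[Box],
                           frame_size: Tuple[int, int],
                           track: Callable[[Box], Box],
                           detect: Callable[[], List[Box]],
                           margin: float = 0.5) -> List[Box]:
    """Track near previous regions if any exist, else run full detection."""
    frame_w, frame_h = frame_size
    if not prev_boxes:
        # No hand in the previous frame: search the whole target frame.
        return detect()
    # Otherwise only look near where each hand region was last seen.
    return [track(search_window(box, margin, frame_w, frame_h))
            for box in prev_boxes]
```

The margin value of 0.5 is an arbitrary placeholder; in practice it would be chosen from the expected hand speed and the frame interval.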

In some embodiments, different tracking logics may be adopted according to whether there is an overlapping area of the hands. When the hands are close to each other, the hand tracking model can locate the two hands in a rectangular frame (which can be a two-handed area), and track the two hands as a whole. When the two hands are separated by a certain distance and do not overlap, the left and right hands are regarded as independent individuals and tracked separately (the rectangular frame containing the appearance of one hand can be called a single-handed area). Thus, the accuracy of the tracking effect can be improved.

In some embodiments, when the previous image frame includes a hand image, tracking the previous hand region of the previous image frame to determine the undetermined hand region of the target image frame includes: if the previous hand region is a two-hand region, adjusting that two-hand region to obtain an undetermined hand region that includes the images of both hands.

When the two hands are close to each other but do not overlap, their similar appearance may cause their tracking processes to interfere with each other: the left-hand tracker may drift to the right hand, the right-hand tracker may drift to the left hand, or the tracking results may become completely confused. Using different tracking logic depending on whether the two hands have an overlapping area achieves the following: when the two hands are close to each other, tracking them as a whole avoids the confusion that easily occurs when each hand is tracked alone; when the two hands have no overlapping area, tracking the two hands separately effectively ensures the accuracy of the tracking result.

It should be noted that the tracking method provided in the present application determines the current hand region from the position of the previous hand region in the previous image frame, which can improve the accuracy of the current hand region. Specifically, when the previous image frame already has a previous hand region, the current region is obtained by adjusting that previous hand region, so errors (for example, a region that includes two independent single-hand images, or two regions that overlap) are less likely. Because the accuracy of the previous hand region can be guaranteed, the accuracy of the current hand region can be guaranteed as well.

In some embodiments, referring to FIG. 3, step 202 may include step 2021, step 2022 and step 2023.

Step 2021, determine the number of hands in the pending hand area.

Step 2022, for a pending hand region whose number of hands is not less than 2, determine whether to split the pending hand region.

In general, the number of hands in such an undetermined hand region is 2. For an undetermined hand region with more than 2 hands, the hand images in the region can be grouped in pairs, and each pair can then be processed in the same way as an undetermined hand region with 2 hands.

If the number of hands in the undetermined hand region is 2, it can be judged whether to split the undetermined hand region into two single-hand regions, to ensure the positioning accuracy of the two hand regions. By contrast, if the two hands are far apart but are kept in one two-hand region, the two-hand region will be too large and each hand image too small, which reduces recognition accuracy.

Step 2023, for at least two pending hand regions with a hand quantity of 1, determine whether to merge the pending hand regions with a hand quantity of 1.

In some application scenarios, there may be only one undetermined hand region with a hand number of 1 in the target image frame. This is a one-handed operation scenario, and will not be further discussed here.

If there are two undetermined hand regions, each with a hand count of 1, it can be judged whether to merge the two undetermined hand regions into a two-hand region, to ensure the accuracy of hand image tracking and pose recognition when the two hands have an overlapping area.

It should be noted that different judgment logic is used for undetermined hand regions with a hand count of not less than 2 and for undetermined hand regions with a hand count of 1. This can improve the accuracy of the single-hand image to be recognized or the two-hand image to be recognized in the resulting current hand region.

In some embodiments, the above step 2022 may include: in the undetermined hand region whose number of hands is not less than 2, locating the first hand image to obtain the undetermined first sub-region, and locating the second hand image to obtain the undetermined second sub-region.

Here, if the number of hands is not less than 2, after pairwise grouping, for the two hand images in each group, the positions of the two hand images can be located to obtain two sub-regions. As an example, the left-hand image can be positioned to obtain the first sub-region to be determined, and the right-hand image can be positioned to obtain the second sub-region to be determined. Optionally, it may also be two left hands of two people, or two right hands of two people, which will not be repeated here.

As an example, a pre-trained single-hand localization model can be used to localize single-hand images. The training images for the single-hand localization model may include images in which the images of the two hands overlap, or images in which the distance between the images of the two hands is small (for example, less than a preset threshold). Therefore, when the single-hand localization model is used to obtain the first sub-region and the second sub-region from the undetermined hand region, the positioning accuracy of the hand images is relatively high, and confusion is less likely to occur.

In some embodiments, step 203 may include: in response to determining that there is an overlapping area between the undetermined first sub-region and the undetermined second sub-region, not splitting the undetermined hand region whose number of hands is not less than 2, and determining that undetermined hand region as the two-hand region to be recognized.

If there is an overlapping area between the undetermined first sub-region and the undetermined second sub-region, no additional processing is performed, and the two-hand region including the two hands is still treated as a whole.

Please refer to FIG. 4, which shows various implementation manners of step 203 in combination with the implementation manners of step 2021, step 2022 and step 2023.

Step 203 may include: in response to determining that there is no overlapping area between the undetermined first sub-region and the undetermined second sub-region, splitting the undetermined hand region whose number of hands is not less than 2 to obtain single-hand regions to be recognized.

If there is no overlapping area between the pending first subregion and the pending second subregion, then the pending first subregion can be determined as a single-handed region to be identified, and the pending second subregion can be determined as another single-handed region to be identified.
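The keep-or-split decision described above can be sketched as follows, under the assumption that the region and its two sub-regions are axis-aligned boxes given as `(x0, y0, x1, y1)` tuples. The function names and the `"two_hand"` / `"single_hand"` labels are hypothetical and only serve the illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels
Region = Tuple[str, Box]                 # (kind, box), kind is a hypothetical label


def boxes_overlap(a: Box, b: Box) -> bool:
    """True when the two boxes share a region of positive area."""
    return (min(a[2], b[2]) > max(a[0], b[0])
            and min(a[3], b[3]) > max(a[1], b[1]))


def resolve_two_hand_candidate(region: Box, first_sub: Box, second_sub: Box) -> List[Region]:
    """Keep the region whole if its sub-regions overlap, otherwise split it."""
    if boxes_overlap(first_sub, second_sub):
        # Overlapping hands: keep one two-hand region to be recognized.
        return [("two_hand", region)]
    # Separated hands: each sub-region becomes a single-hand region.
    return [("single_hand", first_sub), ("single_hand", second_sub)]
```

For example, sub-boxes `(0, 0, 60, 50)` and `(40, 0, 100, 50)` overlap, so the enclosing region is kept whole, while `(0, 0, 60, 50)` and `(140, 0, 200, 50)` are returned as two single-hand regions.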

In some embodiments, the above-mentioned step 2023 may include: determining whether any two pending hand regions with a hand quantity of 1 have overlapping regions.

Step 203 may include: merging pending hand regions with overlapped regions to obtain the to-be-recognized hands region.

As an example, in FIG. 5A, there is an overlapping area between the two undetermined hand regions whose number of hands is 1. Therefore, merging the two undetermined hand regions in FIG. 5A results in the single two-hand region in FIG. 5B.

Step 203 may include: if an undetermined hand region whose number of hands is 1 does not overlap with any other undetermined hand region, determining that undetermined hand region as a single-hand region to be recognized.

There is no overlapping area between the two pending hand regions whose number of hands is 1 in FIG. 6 , and these two pending hand regions are determined as independent single-hand regions to be recognized.
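The merge case illustrated by FIG. 5A, FIG. 5B and FIG. 6 can be sketched in the same spirit. The sketch assumes that merging means taking the smallest box enclosing both single-hand boxes; the disclosure does not state how the merged region is computed, so `union_box` and `merge_single_hand_regions` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels
Region = Tuple[str, Box]                 # (kind, box), kind is a hypothetical label


def boxes_overlap(a: Box, b: Box) -> bool:
    """True when the two boxes share a region of positive area."""
    return (min(a[2], b[2]) > max(a[0], b[0])
            and min(a[3], b[3]) > max(a[1], b[1]))


def union_box(a: Box, b: Box) -> Box:
    """Smallest axis-aligned box that encloses both inputs."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def merge_single_hand_regions(a: Box, b: Box) -> List[Region]:
    """Merge two single-hand candidates into one two-hand region if they overlap."""
    if boxes_overlap(a, b):
        return [("two_hand", union_box(a, b))]
    return [("single_hand", a), ("single_hand", b)]
```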

In this embodiment, determining the position of the hand image in the target image frame to obtain at least one undetermined hand region may be implemented in various manners, which are not limited here.

In some embodiments, the hand pose information may include at least one of the following but not limited to: three-dimensional rotation information, root node position information and size information.

Here, the three-dimensional rotation information may indicate the degree of three-dimensional rotation of each joint of the human hand. In some application scenarios, three-dimensional rotation information can be expressed in the form of Euler angles or rotation matrices. As an example, the three-dimensional rotation information represented by Euler angles may include the rotation angles of a certain finger joint around the X axis, the Y axis, and the Z axis.
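The two representations mentioned here are interchangeable once an axis order is fixed. The sketch below converts Euler angles (in radians) into a rotation matrix using only the standard library. It assumes the rotations are applied about X first, then Y, then Z, all about fixed axes; this convention is an assumption of the example, since the disclosure does not specify one.

```python
import math
from typing import List

Matrix = List[List[float]]


def _matmul(a: Matrix, b: Matrix) -> Matrix:
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def euler_to_matrix(rx: float, ry: float, rz: float) -> Matrix:
    """Rotation matrix R = Rz * Ry * Rx for angles in radians (assumed order)."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    rot_x = [[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]]
    rot_y = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]
    rot_z = [[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]]
    return _matmul(rot_z, _matmul(rot_y, rot_x))


def rotate(m: Matrix, v: List[float]) -> List[float]:
    """Apply a 3x3 rotation matrix to a 3D vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]
```

With this convention, a joint rotated by a quarter turn about the Z axis maps the X unit vector onto the Y unit vector, up to floating-point rounding.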

Here, the position information of the root node may indicate a preset position of the root node of the hand in the hand area (for example, a tracking frame). In some application scenarios, the location information of the root node may be represented by two-dimensional pixel coordinates of the root node in the image. The hand root node can be a pre-specified hand location, such as the palm center point.

Here, the size information may refer to the size of the interactive hand image in the image. As an example, size information may be expressed in absolute or relative sizes.

Please refer to FIG. 7A. FIG. 7A shows the relevant parameters of the previous hand pose information in the previous image frame: S' shows the size information, and a' and b' show the two-dimensional pixel coordinates representing the position information of the root node. Three-dimensional rotation information is not shown.

Please refer to FIG. 7B. FIG. 7B shows the relevant parameters of the current hand pose information in the target image frame: S shows the size information, and a and b show the two-dimensional pixel coordinates representing the position information of the root node. Three-dimensional rotation information is not shown.

In some embodiments, when the current hand region is the two-hand region to be recognized, adjusting the previous hand pose information of the previous hand region in the previous image frame includes, but is not limited to, at least one of the following: determining the three-dimensional rotation information in the current hand pose information according to the three-dimensional rotation information in the previous hand pose information; determining the relative position of the hand root node in the two-hand region to be recognized according to the relative position of the hand root node in the previous hand region; and determining the size information of the corresponding hand in the two-hand region to be recognized according to the size information of the hand image in the previous hand region.

It should be noted that representing the hand pose information with three-dimensional rotation information, root node position information and size information allows the hand pose to be restored. Further, the restored hand pose is continuous with the hand pose of the previous image frame, so that, when the restored hand pose is used in a sticker special effect scene, the sticker fits the hand image well and looks natural.
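The disclosure leaves the exact adjustment rule open. As one possible reading, offered only as a sketch, the joint rotations can be carried over unchanged, the root node can keep its relative position inside the hand box, and the size can be scaled by the ratio of box heights. The `HandPose` fields, the `adjust_pose` function and the choice of height as the scale reference are all assumptions of this example.

```python
from dataclasses import dataclass
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


@dataclass(frozen=True)
class HandPose:
    """Hypothetical container for the three items of hand pose information."""
    joint_rotations: Tuple[Tuple[float, float, float], ...]  # Euler angles per joint
    root_xy: Tuple[float, float]                             # root node, pixel coords
    size: float                                              # hand size in pixels


def adjust_pose(prev_pose: HandPose, prev_box: Box, cur_box: Box) -> HandPose:
    """Derive the current pose from the previous pose and the two hand boxes."""
    prev_w, prev_h = prev_box[2] - prev_box[0], prev_box[3] - prev_box[1]
    cur_w, cur_h = cur_box[2] - cur_box[0], cur_box[3] - cur_box[1]
    # Relative position of the root node inside the previous box.
    u = (prev_pose.root_xy[0] - prev_box[0]) / prev_w
    v = (prev_pose.root_xy[1] - prev_box[1]) / prev_h
    # Same relative position inside the current box.
    root = (cur_box[0] + u * cur_w, cur_box[1] + v * cur_h)
    # Assumed rule: size follows the change in box height.
    scale = cur_h / prev_h
    # Rotations are carried over unchanged (assumed: pose does not jump).
    return HandPose(prev_pose.joint_rotations, root, prev_pose.size * scale)
```

For a previous box `(0, 0, 100, 100)` with root node `(25, 50)` and size 40, and a current box `(50, 50, 250, 250)`, this sketch yields root node `(100.0, 150.0)` and size `80.0`, with the rotations unchanged.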

Here, the two-hand region to be recognized includes a first sub-region and a second sub-region. The hand pose information of the hand image in the first sub-region can be referred to as the first sub-pose information, and the hand pose information of the hand image in the second sub-region can be referred to as the second sub-pose information.

Here, the previous hand region includes a third sub-region and a fourth sub-region. The hand image in the third sub-region and the hand image in the first sub-region indicate the same hand, and the hand image in the fourth sub-region and the hand image in the second sub-region indicate the same hand.

Here, the previous hand gesture information includes third sub-pose information and fourth sub-pose information.

In some application scenarios, the first sub-pose information of the first sub-region can be determined according to the third sub-pose information of the hand image in the third sub-region, and the second sub-pose information of the second sub-region can be determined according to the fourth sub-pose information of the hand image in the fourth sub-region.

It should be noted that, for the two hand images in the two-hand region to be recognized, each adjustment is based on the previous hand image corresponding to that hand image (the corresponding hand image in the previous image frame). This avoids confusing the two hands during image recognition and ensures the accuracy of the pose information of each hand.

In contrast, because a two-hand box contains the appearance information of both the left hand and the right hand, the accuracy of the pose predicted by the model would be greatly reduced, making the hand-pose-driven result confused and unnatural. Possible reasons include: first, the left and right hands look very similar, so the model is easily disturbed by the appearance of the other hand when predicting the pose of one hand; second, the left and right hands interact in complex ways, so occlusion between the two hands is often very complex, and one hand may be almost completely occluded by the other. Such extreme scenes lack sufficient appearance information for predicting the hand pose.

In the scene where both hands are in the same box, the poses of the left and right hands are not predicted directly. Instead, the position of the hand box in the current frame is used to correct the hand pose result of the previous frame, thereby obtaining the hand pose result of the current frame. Although such a hand driving result may not match the actual hand pose, it preserves the naturalness of the driving effect to a certain extent and avoids chaotic, unstable driving effects.

In some embodiments, the third sub-pose information includes third sub-3D rotation information, third sub-root node position information, and third sub-size information. As mentioned above, the third sub-pose information may indicate the pose of one hand (e.g., the left hand) in the previous hand region.

Here, the third sub-pose information is taken as an example to illustrate how to correct the third sub-pose information to obtain the first sub-pose information of the first sub-region of the current hand region. The process of obtaining the second sub-pose information from the fourth sub-pose information is similar to the process of obtaining the first sub-pose information, and is not repeated here.

In some embodiments, determining the three-dimensional rotation information in the current hand pose information according to the three-dimensional rotation information in the previous hand pose information may include: determining the third sub-3D rotation information in the third sub-pose information as the first sub-3D rotation information.

It should be noted that the three-dimensional rotation information may be left unchanged here. In some application scenarios, the three-dimensional rotation information has little effect on gesture driving, so it need not be processed, which reduces the amount of calculation while preserving the accuracy of the driving effect.

In some embodiments, determining the relative position of the hand root node in the two-hand region to be recognized according to the relative position of the hand root node in the previous hand region may include: determining a first ratio of the width value in the third sub-root node information to the width value of the third sub-region, and determining the product of the first ratio and the width value of the first sub-region as the width value in the first sub-root node information; and determining a second ratio of the height value in the third sub-root node information to the height value of the third sub-region, and determining the product of the second ratio and the height value of the first sub-region as the height value in the first sub-root node information.

It should be noted that, through the first ratio and the second ratio, the relative position of the hand root node in the first sub-region is the same as the relative position of the hand root node in the third sub-region. This reduces the differences caused by movement or scaling of the hand region (such as the tracking box), so that the position of the hand root node can be determined accurately.
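The ratio-based mapping of the root node described above can be illustrated with a minimal Python sketch. The `Box` type, the function name `map_root_node`, and the assumption that the root node's width and height values are measured from the top-left corner of its sub-region are illustrative choices, not part of the disclosure.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned hand region, e.g. a tracking box (hypothetical type)."""
    x: float       # left edge, in pixels
    y: float       # top edge, in pixels
    width: float
    height: float


def map_root_node(prev_root, prev_box, cur_box):
    """Carry the root node's relative position from prev_box over to cur_box.

    prev_root is (width value, height value), assumed to be measured from
    the top-left corner of prev_box; the result is relative to cur_box.
    """
    first_ratio = prev_root[0] / prev_box.width     # width ratio
    second_ratio = prev_root[1] / prev_box.height   # height ratio
    return first_ratio * cur_box.width, second_ratio * cur_box.height


third_sub_region = Box(x=100, y=50, width=200, height=100)   # previous frame
first_sub_region = Box(x=120, y=60, width=300, height=150)   # target frame
print(map_root_node((50, 25), third_sub_region, first_sub_region))  # (75.0, 37.5)
```

In this example the root node sits a quarter of the way across and down the third sub-region, and it keeps that same relative position in the larger first sub-region.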

In some embodiments, determining the size information of the corresponding hand region in the two-hand region to be recognized according to the size information of the hand image in the previous hand region may include: determining a third ratio of the hand size value in the third sub-size information to the size value of the third sub-region in the previous image frame, and determining the product of the third ratio and the size value of the target image frame as the hand size value of the first sub-region.

Here, the size information of the hand region can be understood as the proportion of the length of the hand region in the image frame. As an example, the size of the hand region may indicate the length of the diagonal of the tracking box.

It should be noted that by determining the size information, the area of the hand image can be effectively determined, and a more accurate hand image can be determined. Furthermore, in the scene of special sticker effects, the accurate determination of the size information can greatly improve the degree of fit between the special sticker effects and the hand image, and improve the naturalness of the special sticker effects.
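The size adjustment can be sketched in the same way. The sketch below assumes that the "size value" on each side is the diagonal length of the tracking box of the corresponding sub-region (the third sub-region in the previous image frame, the first sub-region in the target image frame); this reading, along with the function names, is an assumption made for illustration.

```python
import math


def box_diagonal(width, height):
    """Diagonal length of a tracking box, used here as its size value."""
    return math.hypot(width, height)


def scale_hand_size(prev_hand_size, prev_region_size, cur_region_size):
    """Keep the hand-size-to-region-size ratio constant across frames."""
    third_ratio = prev_hand_size / prev_region_size
    return third_ratio * cur_region_size


prev_region_size = box_diagonal(30, 40)   # 50.0, third sub-region (assumed)
cur_region_size = box_diagonal(60, 80)    # 100.0, first sub-region (assumed)
print(scale_hand_size(25.0, prev_region_size, cur_region_size))  # 50.0
```

Because only a ratio is carried over, the hand size follows the tracking box as it grows or shrinks, without any new pose prediction.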

In some embodiments, the above method may further include: when the current hand region is a single-hand region to be recognized, calling a single-hand pose estimation model to recognize the hand pose in each single-hand region to be recognized, obtaining the fifth hand pose information corresponding to each single-hand region to be recognized.

Here, the fifth hand pose information may indicate the hand pose in the single-hand region to be recognized. If there are two single-hand regions to be recognized, there may also be two pieces of hand pose information.
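The choice between the two branches (running the single-hand pose estimation model, or correcting the pose from the previous image frame) can be summarized as a small dispatch function. This is a sketch only: the string labels, the dictionary-based pose representation, and the two stand-in callables are hypothetical placeholders for whatever model and correction routine an implementation actually uses.

```python
def estimate_current_poses(region_kind, regions, prev_poses,
                           single_hand_model, correct_previous_pose):
    """Return one pose per hand for the target image frame."""
    if region_kind == "two_hand":
        # Overlapping hands: skip the model and adjust each hand's
        # previous-frame pose using its sub-region in the target frame.
        return [correct_previous_pose(prev, region)
                for prev, region in zip(prev_poses, regions)]
    # Separate hands: run the single-hand model on each region.
    return [single_hand_model(region) for region in regions]


# Stand-in callables so the sketch runs; real ones are implementation-specific.
def fake_model(region):
    return {"source": "model", "region": region}


def fake_correct(prev, region):
    return {**prev, "source": "corrected", "region": region}


single = estimate_current_poses("single_hand", ["boxA"], [],
                                fake_model, fake_correct)
double = estimate_current_poses("two_hand", ["box1", "box2"],
                                [{"rotation": "r3"}, {"rotation": "r4"}],
                                fake_model, fake_correct)
print(single)
print(double)
```

In the two-hand branch, each previous sub-pose is paired with the sub-region of the same hand, so the rotation stored for that hand is carried over unchanged.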

Further referring to FIG. 8, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an image recognition device. This device embodiment corresponds to the method embodiment shown in FIG. 1, and the device can be applied to various electronic devices.

As shown in FIG. 8, the image recognition device of this embodiment includes a determination unit 801 and an adjustment unit 802. The determination unit is configured to determine the current hand region from the target image frame of the image frame sequence, wherein the current hand region is a single-hand region to be recognized or a two-hand region to be recognized, and the images corresponding to the two hands in the two-hand region to be recognized have an overlapping area. The adjustment unit is configured to adjust, when the current hand region is the two-hand region to be recognized, the previous hand pose information of the previous hand region in the previous image frame to obtain the current hand pose information of the hand region to be recognized, wherein the previous image frame includes an image frame in the image frame sequence that precedes the target image frame.

In this embodiment, for the specific processing of the determination unit 801 and the adjustment unit 802 of the image recognition device and the technical effects they bring, refer to the relevant descriptions of step 101 and step 102 in the embodiment corresponding to FIG. 1; details are not repeated here.

In some embodiments, the device is further configured to: add hand image special effects on the target image frame according to the current hand posture information.

In some embodiments, determining the current hand region from the target image frame of the image frame sequence includes: determining the position of the hand image in the target image frame to obtain at least one pending hand region; determining a processing mode for each pending hand region based on the positional relationship of the hand images in the pending hand region, wherein the processing mode includes at least one of the following: no processing, splitting, and merging; and processing each pending hand region using the determined processing mode to obtain the current hand region.

In some embodiments, determining the processing mode of the pending hand region based on the positional relationship of the hand images in the pending hand region includes: determining the number of hands in the pending hand region; for a pending hand region in which the number of hands is not less than 2, determining whether to split the pending hand region; and for at least two pending hand regions in each of which the number of hands is 1, determining whether to merge those pending hand regions.

In some embodiments, for a pending hand region in which the number of hands is not less than 2, determining whether to split the pending hand region includes: in that pending hand region, locating the first hand image to obtain a pending first sub-region, and locating the second hand image to obtain a pending second sub-region. Processing each pending hand region using the determined processing mode to obtain the current hand region includes: in response to determining that the pending first sub-region and the pending second sub-region have an overlapping area, not splitting the pending hand region and determining it as the two-hand region to be recognized; and in response to determining that the pending first sub-region and the pending second sub-region have no overlapping area, splitting the pending hand region to obtain single-hand regions to be recognized.

In some embodiments, for at least two pending hand regions in each of which the number of hands is 1, determining whether to merge the pending hand regions includes: determining whether any two pending hand regions in which the number of hands is 1 have an overlapping area. Processing each pending hand region using the determined processing mode to obtain the current hand region includes: merging the pending hand regions that have an overlapping area to obtain the two-hand region to be recognized; and if a pending hand region in which the number of hands is 1 does not overlap any other pending hand region, determining that pending hand region as a single-hand region to be recognized.
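The overlap test that drives the merge decision can be sketched as follows. Boxes are given as `(x_min, y_min, x_max, y_max)` tuples, and a merged two-hand region is taken to be the bounding box of the two overlapping single-hand boxes; both conventions, and all function names, are assumptions made for illustration, since the text does not specify how a merged region is formed.

```python
def boxes_overlap(a, b):
    """True if two (x_min, y_min, x_max, y_max) boxes share some area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def union_box(a, b):
    """Smallest box containing both a and b (assumed merge rule)."""
    return (min(a[0], b[0]), min(a[1], b[1]),
            max(a[2], b[2]), max(a[3], b[3]))


def classify_single_hand_boxes(boxes):
    """Merge overlapping one-hand boxes; keep the rest as single-hand regions."""
    two_hand, single_hand, used = [], [], set()
    for i in range(len(boxes)):
        if i in used:
            continue
        for j in range(i + 1, len(boxes)):
            if j not in used and boxes_overlap(boxes[i], boxes[j]):
                two_hand.append(union_box(boxes[i], boxes[j]))
                used.update((i, j))
                break
        else:
            # No overlapping partner found: this stays a single-hand region.
            single_hand.append(boxes[i])
    return two_hand, single_hand


pending = [(0, 0, 10, 10), (5, 5, 15, 15), (100, 100, 110, 110)]
print(classify_single_hand_boxes(pending))
# ([(0, 0, 15, 15)], [(100, 100, 110, 110)])
```

The same `boxes_overlap` check applies to the split decision: a pending region containing two hands is left unsplit when its two pending sub-regions overlap, and is split otherwise.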

In some embodiments, determining the position of the hand image in the target image frame to obtain at least one pending hand region includes: when the previous image frame includes a hand image, tracking the previous hand region in the previous image frame to determine the pending hand region of the target image frame; and when it is determined that the previous image frame does not include a hand image, performing hand image recognition on the target image frame to obtain the pending hand region.

In some embodiments, the hand pose information includes at least one of the following: three-dimensional rotation information, hand root node information, and size information; and adjusting the previous hand pose information of the previous hand region in the previous image frame includes at least one of the following: determining the three-dimensional rotation information in the current hand pose information according to the three-dimensional rotation information in the previous hand pose information; determining the relative position of the hand root node in the two-hand region to be recognized according to the relative position of the hand root node in the previous hand region; and determining the size information of the corresponding hand region in the two-hand region to be recognized according to the size information of the hand image in the previous hand region.

In some embodiments, the two-hand region to be recognized includes a first sub-region and a second sub-region, and the previous hand region includes a third sub-region and a fourth sub-region; the hand image in the third sub-region and the hand image in the first sub-region indicate the same hand, and the hand image in the fourth sub-region and the hand image in the second sub-region indicate the same hand; the first sub-pose information of the first sub-region is determined according to the third sub-pose information of the hand image in the third sub-region, and the second sub-pose information of the second sub-region is determined according to the fourth sub-pose information of the hand image in the fourth sub-region.

In some embodiments, the third sub-pose information includes third sub-3D rotation information; and determining the three-dimensional rotation information in the current hand pose information according to the three-dimensional rotation information in the previous hand pose information includes: determining the third sub-3D rotation information in the third sub-pose information as the first sub-3D rotation information.

In some embodiments, the third sub-pose information includes third sub-root node position information; and determining the relative position of the hand root node in the two-hand region to be recognized according to the relative position of the hand root node in the previous hand region includes: determining a first ratio of the width value in the third sub-root node information to the width value of the third sub-region, and determining the product of the first ratio and the width value of the first sub-region as the width value in the first sub-root node information; and determining a second ratio of the height value in the third sub-root node information to the height value of the third sub-region, and determining the product of the second ratio and the height value of the first sub-region as the height value in the first sub-root node information.

In some embodiments, the third sub-pose information includes third sub-size information; and determining the size information of the corresponding hand region in the two-hand region to be recognized according to the size information of the hand image in the previous hand region includes: determining a third ratio of the hand size value in the third sub-size information to the size value of the third sub-region in the previous image frame, and determining the product of the third ratio and the size value of the target image frame as the hand size value of the first sub-region.

In some embodiments, the device is further configured to: when the current hand region is a single-hand region to be recognized, invoke a single-hand pose estimation model to recognize the hand pose in each single-hand region to be recognized, obtaining the hand pose information corresponding to each single-hand region to be recognized.

Please refer to FIG. 9 , which shows an exemplary system architecture in which the image recognition method of an embodiment of the present disclosure can be applied.

As shown in FIG. 9, the system architecture may include terminal devices 901, 902, and 903, a network 904, and a server 905. The network 904 is used as a medium for providing communication links between the terminal devices 901, 902, 903 and the server 905. Network 904 may include various connection types, such as wires, wireless communication links, or fiber optic cables, among others.

The terminal devices 901, 902, 903 can interact with the server 905 through the network 904 to receive or send messages and the like. Various client applications, such as web browser applications, search applications, and news information applications, may be installed on the terminal devices 901, 902, and 903. The client applications in the terminal devices 901, 902, and 903 can receive user instructions and complete corresponding functions according to the user instructions, such as adding corresponding information to information according to the user instructions.

Terminal devices 901, 902, and 903 may be hardware or software. When the terminal devices 901, 902, and 903 are hardware, they may be various electronic devices that have display screens and support web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, and the like. When the terminal devices 901, 902, and 903 are software, they can be installed in the electronic devices listed above. They can be implemented as multiple pieces of software or software modules (such as software or software modules for providing distributed services), or as a single piece of software or software module. No specific limitation is made here.

The server 905 may be a server that provides various services, such as receiving information acquisition requests sent by the terminal devices 901, 902, and 903, obtaining display information corresponding to the information acquisition requests in various ways according to the information acquisition requests, and sending the relevant data of the display information to the terminal devices 901, 902, 903.

It should be noted that the image recognition method provided by the embodiment of the present disclosure may be executed by a terminal device, and correspondingly, the image recognition apparatus may be set in the terminal devices 901, 902, and 903. In addition, the image recognition method provided by the embodiment of the present disclosure may also be executed by the server 905, and correspondingly, the image recognition device may be set in the server 905.

It should be understood that the numbers of terminal devices, networks and servers in FIG. 9 are only illustrative. According to the implementation needs, there can be any number of terminal devices, networks and servers.

Referring now to FIG. 10, it shows a schematic structural diagram of an electronic device (such as the terminal device or server in FIG. 9) suitable for implementing the embodiments of the present disclosure. The terminal device in the embodiment of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (such as car navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The electronic device shown in FIG. 10 is only an example, and should not limit the functions and application scope of the embodiments of the present disclosure.

As shown in FIG. 10, an electronic device may include a processing device (such as a central processing unit, a graphics processing unit, etc.) 1001, which may execute various appropriate actions and processing according to a program stored in a read-only memory (ROM) 1002 or a program loaded from a storage device 1008 into a random access memory (RAM) 1003. In the RAM 1003, various programs and data necessary for the operation of the electronic device 1000 are also stored. The processing device 1001, ROM 1002, and RAM 1003 are connected to each other through a bus 1004. An input/output (I/O) interface 1005 is also connected to the bus 1004.

Typically, the following devices can be connected to the I/O interface 1005: an input device 1009 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 1007 including, for example, a liquid crystal display (LCD), speaker, vibrator, etc.; a storage device 1008 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 1008. The communication device 1008 may allow the electronic device to communicate with other devices wirelessly or by wire to exchange data. While FIG. 10 shows an electronic device having various devices, it is to be understood that implementing or having all of the devices shown is not a requirement. More or fewer devices may alternatively be implemented or provided.

In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product, which includes a computer program carried on a non-transitory computer readable medium, where the computer program includes program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via communication means 1008, or from storage means 1008, or from ROM 1002. When the computer program is executed by the processing device 1001, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are executed.

It should be noted that the above-mentioned computer-readable medium in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to, electrical connections with one or more wires, portable computer diskettes, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. In the present disclosure, however, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device. Program code embodied on a computer-readable medium may be transmitted by any appropriate medium, including but not limited to wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.

In some embodiments, the client and the server can communicate using any currently known or future-developed network protocol, such as HTTP (HyperText Transfer Protocol), and can be interconnected with digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include local area networks ("LANs"), wide area networks ("WANs"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed network.

The above-mentioned computer-readable medium may be included in the above-mentioned electronic device, or may exist independently without being incorporated into the electronic device.

The above-mentioned computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, they cause the electronic device to: determine the current hand region from the target image frame of the image frame sequence, wherein the current hand region is a single-hand region to be recognized or a two-hand region to be recognized, and the images corresponding to the two hands in the two-hand region to be recognized have an overlapping area; and when the current hand region is the two-hand region to be recognized, adjust the previous hand pose information of the previous hand region in the previous image frame to obtain the current hand pose information of the hand region to be recognized, wherein the previous image frame includes an image frame in the image frame sequence that precedes the target image frame.

In some embodiments, the electronic device is further configured to: add hand image special effects on the target image frame according to the current hand posture information.

In some embodiments, the electronic device is further configured to: when the current hand region is a single-hand region to be recognized, call a single-hand pose estimation model to recognize the hand pose in each single-hand region to be recognized, obtaining the hand pose information corresponding to each single-hand region to be recognized.

Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, or combinations thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved. It should also be noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or may be implemented by a combination of dedicated hardware and computer instructions.

The units involved in the embodiments described in the present disclosure may be implemented by software or by hardware. Wherein, the name of the unit does not constitute a limitation of the unit itself under certain circumstances, for example, the determination unit may also be described as "a unit for determining the current hand region".

The functions described herein above may be performed at least in part by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and so on.

In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device. A machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media would include one or more wire-based electrical connections, portable computer discs, hard drives, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.

The above description is only a preferred embodiment of the present disclosure and an illustration of the applied technical principles. Those skilled in the art should understand that the scope of disclosure involved in this disclosure is not limited to the technical solutions formed by the specific combination of the above-mentioned technical features, but also covers other technical solutions formed by any combination of the above-mentioned technical features or their equivalent features without departing from the above disclosed concept, for example, a technical solution formed by replacing the above-mentioned features with (but not limited to) technical features with similar functions disclosed in this disclosure.

In addition, while operations are depicted in a particular order, this should not be understood as requiring that the operations be performed in the particular order shown or performed in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while the above discussion contains several specific implementation details, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims.

Claims

An image recognition method, characterized by comprising:

determining a current hand region from a target image frame of an image frame sequence, wherein the current hand region is a single-hand region to be recognized or a two-hand region to be recognized, and images corresponding to two hands in the two-hand region to be recognized have an overlapping area;

when the current hand region is the two-hand region to be recognized, adjusting previous hand pose information of a previous hand region in a previous image frame to obtain current hand pose information of the hand region to be recognized, wherein the previous image frame includes an image frame in the image frame sequence preceding the target image frame.
The method according to claim 1, further comprising:

Adding hand image special effects to the target image frame according to the current hand posture information.
The method according to claim 1, wherein said determining the current hand region from the target image frame of the image frame sequence comprises:

determining, in the target image frame, the position of the hand image to obtain at least one pending hand region;

determining a processing mode for each pending hand region based on the positional relationship of the hand images in the pending hand region, wherein the processing mode includes at least one of the following: no processing, splitting, and merging; and

processing each pending hand region in the determined processing mode to obtain the current hand region.
The method according to claim 3, wherein said determining the processing mode of each pending hand region based on the positional relationship of the hand images in the pending hand region comprises:

determining the number of hands in the pending hand region;

for a pending hand region in which the number of hands is not less than 2, determining whether to split the pending hand region; and

for at least two pending hand regions in each of which the number of hands is 1, determining whether to merge the pending hand regions in which the number of hands is 1.
The method according to claim 4, wherein said determining, for the pending hand region in which the number of hands is not less than 2, whether to split the pending hand region comprises:

in the pending hand region in which the number of hands is not less than 2, locating the first hand image to obtain a pending first sub-region, and locating the second hand image to obtain a pending second sub-region; and

said processing each pending hand region in the determined processing mode to obtain the current hand region comprises:

in response to determining that the pending first sub-region and the pending second sub-region have an overlapping region, not splitting the pending hand region in which the number of hands is not less than 2, and determining that pending hand region as the two-hand region to be recognized; and

in response to determining that the pending first sub-region and the pending second sub-region have no overlapping region, splitting the pending hand region in which the number of hands is not less than 2 to obtain the single-hand region to be recognized.
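
The split decision described in this claim can be illustrated with a minimal sketch. It assumes that regions are axis-aligned rectangles and that splitting yields the two located sub-regions as single-hand regions; the names `Box`, `boxes_overlap`, and `split_or_keep` are hypothetical and do not appear in the disclosure.

```python
from typing import List, NamedTuple, Tuple


class Box(NamedTuple):
    """Axis-aligned rectangle in pixel coordinates (hypothetical representation)."""
    x0: float
    y0: float
    x1: float
    y1: float


def boxes_overlap(a: Box, b: Box) -> bool:
    """Return True when the two rectangles share a region of positive area."""
    return min(a.x1, b.x1) > max(a.x0, b.x0) and min(a.y1, b.y1) > max(a.y0, b.y0)


def split_or_keep(pending: Box, first_sub: Box, second_sub: Box) -> Tuple[str, List[Box]]:
    """Decide whether a pending region containing two hands is split."""
    if boxes_overlap(first_sub, second_sub):
        # Overlapping hands: keep the pending region whole as a two-hand region.
        return "two_hand", [pending]
    # Separate hands. Assumption: each located sub-region becomes a single-hand region.
    return "single_hand", [first_sub, second_sub]
```

Under this sketch, rectangles that only touch along an edge are treated as non-overlapping; whether touching counts as overlap is not stated in the claim.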
The method according to claim 4, wherein said determining, for the at least two pending hand regions in each of which the number of hands is 1, whether to merge the pending hand regions in which the number of hands is 1 comprises:

determining whether any two pending hand regions in which the number of hands is 1 have an overlapping region; and

said processing each pending hand region in the determined processing mode to obtain the current hand region comprises:

merging the pending hand regions that have an overlapping region to obtain the two-hand region to be recognized; and

if a pending hand region in which the number of hands is 1 does not overlap any other pending hand region, determining that pending hand region as the single-hand region to be recognized.
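
The merge step in this claim might look like the following sketch. It assumes rectangular regions and assumes that merging means taking the bounding rectangle of the two overlapping regions; the claim does not specify how the merged region is formed. All names are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Box(NamedTuple):
    """Axis-aligned rectangle in pixel coordinates (hypothetical representation)."""
    x0: float
    y0: float
    x1: float
    y1: float


def boxes_overlap(a: Box, b: Box) -> bool:
    """Return True when the two rectangles share a region of positive area."""
    return min(a.x1, b.x1) > max(a.x0, b.x0) and min(a.y1, b.y1) > max(a.y0, b.y0)


def bounding_union(a: Box, b: Box) -> Box:
    """Smallest rectangle containing both inputs (assumed merge rule)."""
    return Box(min(a.x0, b.x0), min(a.y0, b.y0), max(a.x1, b.x1), max(a.y1, b.y1))


def merge_single_hand_regions(regions: List[Box]) -> Tuple[List[Box], List[Box]]:
    """Return (two_hand_regions, single_hand_regions) for one-hand pending regions."""
    merged = set()
    two_hand: List[Box] = []
    for i in range(len(regions)):
        if i in merged:
            continue
        for j in range(i + 1, len(regions)):
            if j not in merged and boxes_overlap(regions[i], regions[j]):
                # Each region is merged with at most one partner.
                two_hand.append(bounding_union(regions[i], regions[j]))
                merged.update((i, j))
                break
    single = [r for k, r in enumerate(regions) if k not in merged]
    return two_hand, single
```

For example, calling `merge_single_hand_regions` on three one-hand regions of which only the first two overlap yields one merged two-hand region and leaves the third as a single-hand region.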
The method according to claim 3, wherein said determining, in the target image frame, the position of the hand image to obtain at least one pending hand region comprises:

when the previous image frame includes a hand image, determining the pending hand region of the target image frame according to the previous hand region of the previous image frame; and

when it is determined that the previous image frame does not include a hand image, performing hand image recognition on the target image frame to obtain the pending hand region.
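
The choice between reusing the previous hand region and running recognition on the target frame can be sketched as follows. This is only an illustration under assumptions: the previous frame is taken to include a hand image when its region list is non-empty, the previous region is carried over by padding it with a fixed margin, and `detect_hands` stands in for an unspecified hand image recognizer. None of these names or rules come from the disclosure.

```python
from typing import Any, Callable, List, Tuple

Region = Tuple[float, float, float, float]  # (x0, y0, x1, y1), hypothetical layout


def pad_region(region: Region, margin: float, frame_w: float, frame_h: float) -> Region:
    """Enlarge a region by `margin` on every side, clipped to the frame bounds."""
    x0, y0, x1, y1 = region
    return (max(0.0, x0 - margin), max(0.0, y0 - margin),
            min(float(frame_w), x1 + margin), min(float(frame_h), y1 + margin))


def pending_hand_regions(
    prev_regions: List[Region],
    frame: Any,
    frame_size: Tuple[float, float],
    detect_hands: Callable[[Any], List[Region]],
    margin: float = 16.0,
) -> List[Region]:
    """Derive pending regions from the previous frame, or fall back to detection."""
    if prev_regions:
        # Previous frame had hands: reuse its regions (assumed rule: pad and clip).
        w, h = frame_size
        return [pad_region(r, margin, w, h) for r in prev_regions]
    # No hands in the previous frame: run recognition on the target frame.
    return detect_hands(frame)
```

One reason to structure the step this way is that the detector runs only when there is no earlier region to start from.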
The method according to claim 7, wherein said determining, when the previous image frame includes a hand image, the pending hand region of the target image frame according to the previous hand region of the previous image frame comprises:

if the previous hand region is a two-hand region, adjusting the two-hand region to obtain a pending hand region that includes the images of both hands.
The method according to claim 1, wherein the hand posture information includes at least one of the following: three-dimensional rotation information, hand root node information, and size information; and

said adjusting, when the current hand region is the two-hand region to be recognized, the previous hand posture information of the previous hand region in the previous image frame includes at least one of the following:

determining the three-dimensional rotation information in the current hand posture information according to the three-dimensional rotation information in the previous hand posture information;

determining the relative position of the current hand root node in the two-hand region to be recognized according to the relative position of the hand root node in the previous hand region; and

determining the size information of the corresponding hand region in the two-hand region to be recognized according to the size information of the hand image in the previous hand region.
The method according to claim 9, wherein the two-hand region to be recognized includes a first sub-region and a second sub-region, and the previous hand region includes a third sub-region and a fourth sub-region; wherein the hand image in the third sub-region indicates the same hand as the hand image in the first sub-region, and the hand image in the fourth sub-region indicates the same hand as the hand image in the second sub-region;

wherein the first sub-pose information of the first sub-region is determined according to the third sub-pose information of the hand image in the third sub-region; and

the second sub-pose information of the second sub-region is determined according to the fourth sub-pose information of the hand image in the fourth sub-region.
The method according to claim 10, wherein the third sub-pose information includes third sub-3D rotation information; and

said determining the three-dimensional rotation information in the current hand posture information according to the three-dimensional rotation information in the previous hand posture information comprises:

determining the third sub-3D rotation information in the third sub-pose information as the first sub-3D rotation information.
The method according to claim 10, wherein the third sub-pose information includes third sub-root node information; and

said determining the relative position of the current hand root node in the two-hand region to be recognized according to the relative position of the hand root node in the previous hand region comprises:

determining a first ratio of the width value in the third sub-root node information to the width value of the third sub-region, and determining the product of the first ratio and the width value of the first sub-region as the width value in the first sub-root node information; and

determining a second ratio of the height value in the third sub-root node information to the height value of the third sub-region, and determining the product of the second ratio and the height value of the first sub-region as the height value in the first sub-root node information.
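
The two ratio computations in this claim amount to keeping the root node at the same relative position when the region changes size. A minimal sketch follows, assuming the root node and region dimensions are given as (width, height) pairs in pixels; the function name `transfer_root_node` is hypothetical.

```python
from typing import Tuple


def transfer_root_node(
    third_root: Tuple[float, float],
    third_region_size: Tuple[float, float],
    first_region_size: Tuple[float, float],
) -> Tuple[float, float]:
    """Map the root node of the previous sub-region into the current sub-region."""
    third_w, third_h = third_region_size
    if third_w <= 0 or third_h <= 0:
        raise ValueError("third sub-region must have positive width and height")
    # First ratio: root width value relative to the third sub-region width.
    first_ratio = third_root[0] / third_w
    # Second ratio: root height value relative to the third sub-region height.
    second_ratio = third_root[1] / third_h
    # Products with the first sub-region dimensions give the new root node values.
    return (first_ratio * first_region_size[0], second_ratio * first_region_size[1])
```

For instance, a root node at (25, 20) in a 100 x 80 third sub-region maps to (50, 40) in a 200 x 160 first sub-region, since both ratios equal 0.25.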
The method according to claim 10, wherein the third sub-pose information includes third sub-size information; and

said determining the size information of the corresponding hand region in the two-hand region to be recognized according to the size information of the hand image in the previous hand region comprises:

determining a third ratio of the hand size value in the third sub-size information to the size value of the previous image frame in the third sub-region, and determining the product of the third ratio and the size value of the target image frame as the hand size value of the first sub-region.
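
The size transfer in this claim can be sketched in the same way. The sketch assumes that every size value is a single positive scalar (for example, a side length in pixels); the claim does not say how the size values are measured, and the name `transfer_hand_size` is hypothetical.

```python
def transfer_hand_size(
    third_hand_size: float,
    prev_frame_size: float,
    target_frame_size: float,
) -> float:
    """Scale the previous hand size value to the target image frame."""
    if prev_frame_size <= 0:
        raise ValueError("previous size value must be positive")
    # Third ratio: previous hand size relative to the previous size value.
    third_ratio = third_hand_size / prev_frame_size
    # The product with the target size value gives the first sub-region hand size.
    return third_ratio * target_frame_size
```

With a hand size of 64 against a previous size value of 256, the ratio is 0.25, so a target size value of 512 gives a hand size of 128.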
The method according to claim 1, further comprising:

when the current hand region is a single-hand region to be recognized, invoking a single-hand pose estimation model to recognize the hand pose in each single-hand region to be recognized, to obtain the hand posture information corresponding to each single-hand region to be recognized.
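
Taken together with claim 1, this claim describes a two-way dispatch on the type of the current hand region. The sketch below shows only that control flow; `adjust_previous_pose` and `single_hand_model` are hypothetical callables standing in for the adjustment step and the single-hand pose estimation model, and the string labels for the region type are an assumption.

```python
from typing import Any, Callable


def current_hand_pose(
    region_kind: str,
    region: Any,
    previous_pose: Any,
    adjust_previous_pose: Callable[[Any, Any], Any],
    single_hand_model: Callable[[Any], Any],
) -> Any:
    """Choose how the current hand posture information is obtained."""
    if region_kind == "two_hand":
        # Overlapping hands: adjust the pose carried over from the previous frame.
        return adjust_previous_pose(previous_pose, region)
    if region_kind == "single_hand":
        # Separate hand: run the single-hand pose estimation model on the region.
        return single_hand_model(region)
    raise ValueError(f"unknown region kind: {region_kind!r}")
```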
An image recognition device, characterized in that it comprises:

a determining unit, configured to determine a current hand region from a target image frame of an image frame sequence, wherein the current hand region is a single-hand region to be recognized or a two-hand region to be recognized, and the images corresponding to the two hands in the two-hand region to be recognized have an overlapping region; and

an adjustment unit, configured to adjust, when the current hand region is the two-hand region to be recognized, the previous hand posture information of the previous hand region in the previous image frame to obtain the current hand posture information of the hand region to be recognized, wherein the previous image frame includes an image frame preceding the target image frame in the image frame sequence.
An electronic device, characterized in that it comprises:

one or more processors; and

storage means for storing one or more programs,

wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method according to any one of claims 1-14.
A computer-readable medium, on which a computer program is stored, wherein, when the program is executed by a processor, the method according to any one of claims 1-14 is realized.