WO2023025181A1 - Image recognition method and apparatus, and electronic device

Image recognition method and apparatus, and electronic device

Info

Publication number
WO2023025181A1
Authority
WO
WIPO (PCT)
Application number
PCT/CN2022/114436
Other languages
English (en)
Chinese (zh)
Inventor
林高杰
罗宇轩
唐堂
Original Assignee
北京字跳网络技术有限公司
Application filed by 北京字跳网络技术有限公司
Publication of WO2023025181A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands

Definitions

  • the present disclosure relates to the field of computer technology, and in particular to an image recognition method, apparatus, and electronic device.
  • the electronic device can recognize the gesture of the user's hand and respond according to the gesture of the user's hand, so that the user can interact with the electronic device.
  • an embodiment of the present disclosure provides an image recognition method, the method including: determining the current hand region from a target image frame of an image frame sequence, wherein the current hand region is a single-hand region to be recognized or a two-hand region to be recognized, and the images corresponding to the two hands in the two-hand region to be recognized have an overlapping area; and, when the current hand region is the two-hand region to be recognized, adjusting the previous hand pose information of the previous hand region in a previous image frame to obtain the current hand pose information of the two-hand region to be recognized, wherein the previous image frame includes an image frame in the sequence of image frames that is prior to the target image frame.
  • an embodiment of the present disclosure provides an image recognition apparatus, including: a determination unit configured to determine the current hand region from a target image frame of an image frame sequence, wherein the current hand region is a single-hand region to be recognized or a two-hand region to be recognized, and the images corresponding to the two hands in the two-hand region to be recognized have an overlapping area; and an adjustment unit configured to, when the current hand region is the two-hand region to be recognized, adjust the previous hand pose information of the previous hand region in a previous image frame to obtain the current hand pose information of the two-hand region to be recognized, wherein the previous image frame includes an image frame in the sequence of image frames that is prior to the target image frame.
  • an embodiment of the present disclosure provides an electronic device, including: one or more processors; and a storage device storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the image recognition method described in the first aspect.
  • an embodiment of the present disclosure provides a computer-readable medium on which a computer program is stored; when the program is executed by a processor, the steps of the image recognition method described in the first aspect are implemented.
  • FIG. 1 is a flowchart of an embodiment of an image recognition method according to the present disclosure
  • FIG. 2 is a flowchart of an exemplary implementation of an image recognition method according to the present disclosure
  • FIG. 3 is a flowchart of another exemplary implementation of an image recognition method according to the present disclosure.
  • FIG. 4 is a flowchart of an exemplary implementation of an image recognition method according to the present disclosure.
  • FIG. 5A is a schematic diagram of an application scenario of the image recognition method of the present disclosure.
  • FIG. 5B is a schematic diagram of another application scenario of the image recognition method of the present disclosure.
  • FIG. 6 is a schematic diagram of another application scenario of the image recognition method of the present disclosure.
  • FIG. 7A is a schematic diagram of another application scenario of the image recognition method of the present disclosure.
  • FIG. 7B is a schematic diagram of another application scenario of the image recognition method of the present disclosure.
  • FIG. 8 is a schematic structural diagram of an embodiment of an image recognition apparatus according to the present disclosure.
  • FIG. 9 is an exemplary system architecture to which an image recognition method according to an embodiment of the present disclosure can be applied.
  • FIG. 10 is a schematic diagram of a basic structure of an electronic device provided according to an embodiment of the present disclosure.
  • the term "comprising" and its variants are open-ended, i.e., "including but not limited to".
  • the term "based on" means "based at least in part on".
  • the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one further embodiment”; the term “some embodiments” means “at least some embodiments.” Relevant definitions of other terms will be given in the description below.
  • FIG. 1 shows the flow of an embodiment of the image recognition method according to the present disclosure.
  • the image recognition method can be applied to terminal equipment.
  • the image recognition method as shown in Figure 1 comprises the following steps:
  • Step 101: determine the current hand region in the target image frame of the image frame sequence.
  • the execution body of the image recognition method may determine the current hand region from the target image frame in the sequence of image frames.
  • the above image frame sequence may include at least two image frames.
  • the target image frame may be any image frame in the sequence of image frames.
  • the above-mentioned current hand region may be a single-hand region to be recognized or a two-hand region to be recognized.
  • an image of a single hand may be included in the single-hand area to be recognized.
  • the area of both hands to be recognized may include images of two hands, and there is an overlapping area between the images of the two hands.
  • Step 102: when the current hand region is the two-hand region to be recognized, adjust the previous hand pose information of the previous hand region in the previous image frame to obtain the current hand pose information of the two-hand region to be recognized.
  • the above-mentioned previous image frame may include an image frame in the image frame sequence before the target image frame.
  • the number of previous image frames may be one or at least two.
  • the previous image frame and the target image frame may or may not be adjacent.
  • the hand image area in the previous image frame may be referred to as the previous hand area.
  • the previous hand pose information may indicate the hand pose in the previous hand region in the previous image frame.
  • the previous hand region may include two hands, which may be mutually occluded (that is, have an overlapping area) or not.
  • the previous hand region may be a two-hand region (including images of two hands with an overlapping area between them), or a single-hand region.
  • the current hand pose information may indicate the hand pose in the current hand region in the target image frame.
  • the previous hand pose information may indicate the hand pose in the previous hand region in the previous image frame.
  • hand pose information may include, but is not limited to, at least one of the following: three-dimensional rotation information of each joint of the hand, two-dimensional position information of the hand root node in the image frame, and size information of the hand in the image frame. It can be understood that the specific items of hand pose information may be set according to the actual application scenario.
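As a concrete illustration, the three kinds of hand pose information listed above could be held in a simple container such as the following. This is a sketch only; the field names and types are assumptions, since the disclosure does not fix a data layout:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class HandPose:
    """Illustrative container for the hand pose information described above."""
    # 3D rotation of each hand joint, here as (x, y, z) Euler angles in degrees.
    joint_rotations: List[Tuple[float, float, float]] = field(default_factory=list)
    # 2D pixel position of the hand root node (e.g. the palm center) in the frame.
    root_position: Tuple[float, float] = (0.0, 0.0)
    # Size of the hand image, e.g. as a fraction of the frame diagonal.
    size: float = 0.0
```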
  • the way of adjusting the previous hand pose information can be set according to the actual application scenario, and is not limited here.
  • one or more items of previous hand posture information may be adjusted to obtain current hand posture information.
  • the image recognition method provided in this embodiment can first determine the current hand region from the target image frame of the image frame sequence.
  • the current hand region may be a single-hand region to be recognized or a two-hand region to be recognized; in the latter, there is an overlapping area between the images corresponding to the two hands. Then, when the current hand region is the two-hand region to be recognized, the previous hand pose information of the previous hand region in the previous image frame is adjusted to obtain the current hand pose information of the two-hand region to be recognized.
  • a new image recognition method can be provided.
  • this image recognition method can adjust the previous hand pose information of the previous image frame to obtain the current hand pose information of the two-hand region in the target image frame.
  • the video frame sequence can represent the change from the hand pose of the previous image frame to the hand pose of the target image frame. Adjusting the hand pose information of the previous image frame according to this change to obtain the current hand pose information can improve the accuracy of the determined hand pose information in image scenes where the two hands occlude each other.
  • the above method may further include: adding hand image special effects on the target image frame according to the current hand posture information.
  • the hand image special effect can be various forms of special effects, for example, a sticker special effect (adding a sticker to the hand image to cover the hand image).
  • adding the hand image special effect according to the above current hand pose information makes the special effect fit the hand image better, so the result looks more natural.
  • even if the current hand pose information obtained from the previous hand pose information deviates somewhat from the real hand pose, exploiting the fact that the hand pose is unlikely to change abruptly ensures that the hand image special effect matches the hand pose in the target image frame. Thus, in scenes where the hand pose drives a special effect, a stable and natural driving effect is produced even when the two hands are close together or overlapping.
  • the above step 101 may include step 201 , step 202 and step 203 shown in FIG. 2 .
  • Step 201: determine the position of the hand image in the target image frame to obtain at least one undetermined hand region.
  • the pending hand region can be determined in various ways.
  • the undetermined hand area can be understood as the hand area obtained through preliminary positioning.
  • the aforementioned undetermined hand area may be understood as a preliminarily determined hand area.
  • the number of undetermined hand regions obtained by locating hand images in the target image frame may be zero, one, two, or more than two. If zero, the target image frame includes no hand image, and this case can be ignored in this embodiment. If one, the hand image in the undetermined hand region may be a single-hand image or a two-hand image. If two, the hand image in each undetermined hand region may likewise be a single-hand image or a two-hand image. If more than two undetermined hand regions are identified, the hand images can be divided into groups according to image features, such that the hand images in each group correspond to the same person.
  • multiple region groups, each including at least one undetermined hand region, can thereby be obtained.
  • in the following, at least one undetermined hand region belonging to the same person is taken as an example for description.
  • the number of hand regions to be determined may be one or two.
  • the pending hand area may be indicated by area indication information.
  • a tracking box may be used to indicate the pending hand region.
  • Step 202: determine the processing mode of each undetermined hand region based on the positional relationship of the hand images in the undetermined hand regions.
  • the above processing mode may include, but is not limited to, at least one of the following: no processing, splitting, and merging.
  • the above-mentioned positional relationship of the hand images may include the existence of overlapping regions or the absence of overlapping regions.
  • the existence of an overlapping area may mean that the images of the two hands in a two-hand region overlap, or that the images of two hands located in two different single-hand regions overlap.
  • the absence of an overlapping area may mean that the images of the two hands located in two single-hand regions do not overlap, or that the images of the two hands in a two-hand region have no overlapping area.
  • Step 203: process each undetermined hand region using the determined processing mode to obtain the current hand region.
  • processing in this way ensures that two hand images with an overlapping area end up in one two-hand region, while a single-hand image that does not overlap with any other hand image ends up in a single-hand region.
  • in this way, an accurate single-hand or two-hand region can be obtained, avoiding recognition errors caused by mistakes in determining the undetermined hand region (for example, a region containing two independent single-hand images, or two regions overlapping).
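The overlap relationships that drive this processing can be checked with a plain axis-aligned rectangle test. The `Box` representation below is an assumption for illustration; the disclosure does not specify how regions are encoded:

```python
from typing import NamedTuple

class Box(NamedTuple):
    """Axis-aligned hand region: top-left (x1, y1), bottom-right (x2, y2)."""
    x1: float
    y1: float
    x2: float
    y2: float

def boxes_overlap(a: Box, b: Box) -> bool:
    """True if the two hand regions share any overlapping area."""
    return a.x1 < b.x2 and b.x1 < a.x2 and a.y1 < b.y2 and b.y1 < a.y2
```

Regions that merely touch at an edge are treated here as non-overlapping; whether touching counts as overlap is a design choice the disclosure does not address.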
  • the above step 201 may be implemented by calling a hand detection model.
  • the hand detection model can detect the undetermined hand area in the target image, such as a rectangular box containing the appearance of the hand.
  • the above step 201 may include: when the previous image frame includes a hand image, adjusting the previous hand region of the previous image frame to obtain the undetermined hand region of the target image frame; and when the previous image frame does not include a hand image, performing hand image recognition on the target image frame to obtain the undetermined hand region.
  • the hand tracking model may be invoked to locate the pending hand region of the target image frame near the previous hand region of the previous frame image.
  • because hand movement speed is limited, the hand positions in the target image frame and the previous image frame are likely to be close; exploiting this avoids searching the whole image for the hand region, reducing the time and computation needed to determine it.
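One simple way to realize this locality is to search only inside an expanded copy of the previous hand region. The margin value and `(x1, y1, x2, y2)` box encoding below are assumptions; the disclosure describes the idea only abstractly:

```python
def search_window(prev_box, frame_w, frame_h, margin=0.5):
    """Expand the previous hand region by a margin (a fraction of its own
    width/height) and clamp to the frame, giving the area of the target
    frame in which to search for the hand."""
    x1, y1, x2, y2 = prev_box
    dx, dy = (x2 - x1) * margin, (y2 - y1) * margin
    return (max(0, x1 - dx), max(0, y1 - dy),
            min(frame_w, x2 + dx), min(frame_h, y2 + dy))
```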
  • different tracking logics may be adopted according to whether there is an overlapping area of the hands.
  • when the two hands overlap, the hand tracking model can locate both hands in one rectangular box (which can be a two-hand region) and track the two hands as a whole.
  • otherwise, the left and right hands are regarded as independent individuals and tracked separately (a rectangular box containing the appearance of one hand can be called a single-hand region).
  • the accuracy of the tracking effect can be improved.
  • tracking the previous hand region of the previous image frame and determining the undetermined hand region of the target image frame includes: if the previous hand region is a two-hand region, adjusting that two-hand region to obtain an undetermined hand region containing the images of both hands.
  • when the hands are close but do not overlap, tracking the two hands jointly may cause interference because the two hands have a similar appearance: the left-hand tracker may lock onto the right hand, the right-hand tracker onto the left hand, or the tracking results may be confused entirely. By selecting the tracking logic according to whether there is an overlapping area between the hands, the method achieves the following: when the two hands are close to each other, the tracking confusion that easily occurs when each hand is tracked alone is avoided; and when the two hands have no overlapping area, tracking them separately effectively ensures the accuracy of the tracking effect.
  • the tracking method provided in the present application improves the accuracy of the current hand region by determining it from the position of the previous hand region in the previous image frame.
  • because the previous hand region is adjusted, errors (for example, a region containing two independent single-hand images, or two regions overlapping) are less likely; the accuracy of the previous hand region can thus be guaranteed, and in turn the accuracy of the current hand region.
  • step 202 may include: step 2021 , step 2022 and step 2023 .
  • Step 2021: determine the number of hands in the undetermined hand region.
  • Step 2022: for an undetermined hand region whose number of hands is not less than 2, determine whether to split the region.
  • the following description assumes the number of hands in the undetermined hand region is 2; if it is greater than 2, the hand images in the region can be grouped in pairs and each pair processed as in the two-hand case.
  • if split, the undetermined hand region is divided into two single-hand regions to ensure the positioning accuracy of each hand region.
  • if the two hands are far apart but located in one two-hand region, that region will be too large and the hand images too small, reducing recognition accuracy.
  • Step 2023: for at least two undetermined hand regions each containing one hand, determine whether to merge them.
  • when the number of hands in each undetermined hand region is 1, it can be judged whether to merge the two regions into one two-hand region, ensuring the accuracy of hand image tracking and gesture recognition when the two hands have an overlapping area.
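Merging two overlapping single-hand regions into one two-hand region can be sketched as taking the bounding union of the two boxes. The `(x1, y1, x2, y2)` tuple encoding is an assumption for illustration:

```python
def merge_boxes(a, b):
    """Merge two single-hand regions into one two-hand region: the
    smallest axis-aligned box containing both input boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]),
            max(a[2], b[2]), max(a[3], b[3]))
```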
  • the above step 2022 may include: in the undetermined hand region whose number of hands is not less than 2, locating the first hand image to obtain an undetermined first sub-region, and locating the second hand image to obtain an undetermined second sub-region.
  • the positions of the two hand images can be located to obtain two sub-regions.
  • the left-hand image can be positioned to obtain the undetermined first sub-region, and the right-hand image can be positioned to obtain the undetermined second sub-region.
  • the two hands may also be the two left hands of two people, or the two right hands of two people, which will not be repeated here.
  • a pre-trained single-hand localization model can be used to localize single-hand images.
  • the training images for the single-hand positioning model may include images in which the two hands overlap or are close together (for example, closer than a preset threshold). Therefore, when the single-hand positioning model processes the first and second sub-regions obtained from the region to be recognized, the positioning accuracy of the hand image is relatively high, and confusion is unlikely.
  • step 203 may include: in response to determining that there is an overlapping area between the undetermined first sub-region and the undetermined second sub-region, not splitting the undetermined hand region whose number of hands is not less than 2, and determining that region as the two-hand region to be recognized.
  • FIG. 4 shows various implementation manners of step 203 among the implementation manners of step 2021 , step 2022 and step 2023 .
  • Step 203 includes: in response to determining that there is no overlapping area between the undetermined first sub-region and the undetermined second sub-region, splitting the undetermined hand region whose number of hands is not less than 2 to obtain single-hand regions to be recognized.
  • the pending first subregion can be determined as a single-handed region to be identified, and the pending second subregion can be determined as another single-handed region to be identified.
  • the above-mentioned step 2023 may include: determining whether any two pending hand regions with a hand quantity of 1 have overlapping regions.
  • Step 203 may include: merging pending hand regions with overlapped regions to obtain the to-be-recognized hands region.
  • as shown in FIG. 5A, there is an overlapping area between the two undetermined hand regions whose number of hands is 1. Therefore, merging the two undetermined hand regions in FIG. 5A results in one two-hand region in FIG. 5B.
  • Step 203 may include: if an undetermined hand region whose number of hands is 1 does not overlap with any other undetermined hand region, determining it as a single-hand region to be recognized.
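Putting the split/merge rules of steps 2022-2023 together, a minimal sketch of the decision might look as follows. The box encoding and return format are assumptions; the real method may use additional cues:

```python
def classify_regions(sub_a, sub_b):
    """Apply the overlap-based split/merge rule to two located single-hand
    sub-regions given as (x1, y1, x2, y2). Returns either one two-hand
    region to be recognized or two single-hand regions to be recognized."""
    overlap = (sub_a[0] < sub_b[2] and sub_b[0] < sub_a[2]
               and sub_a[1] < sub_b[3] and sub_b[1] < sub_a[3])
    if overlap:
        # Overlapping area exists: keep/merge into one two-hand region.
        merged = (min(sub_a[0], sub_b[0]), min(sub_a[1], sub_b[1]),
                  max(sub_a[2], sub_b[2]), max(sub_a[3], sub_b[3]))
        return ("two_hands", merged)
    # No overlap: split into two single-hand regions.
    return ("single_hands", (sub_a, sub_b))
```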
  • in the target image frame, the position of the hand image is determined to obtain at least one undetermined hand region.
  • determining the position of the hand image may be implemented in various manners, which are not limited here.
  • the hand pose information may include at least one of the following but not limited to: three-dimensional rotation information, root node position information and size information.
  • the three-dimensional rotation information may indicate the degree of three-dimensional rotation of each joint of the human hand.
  • three-dimensional rotation information can be expressed in the form of Euler angles or rotation matrices.
  • the three-dimensional rotation information represented by Euler angles may include the rotation angles of a certain finger joint around the X axis, the Y axis, and the Z axis.
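As a reminder of how the Euler-angle form relates to the rotation-matrix form, the three per-axis angles compose into one 3x3 matrix. The Z-Y-X composition order below is an assumption; the disclosure does not fix a convention:

```python
import math

def euler_to_matrix(rx, ry, rz):
    """Compose rotations about the X, Y and Z axes (angles in radians)
    into a 3x3 rotation matrix R = Rz @ Ry @ Rx, as nested lists."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy,     cy * sx,                cy * cx],
    ]
```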
  • the root node position information may indicate the position of a preset root node of the hand within the hand region (for example, a tracking box).
  • the location information of the root node may be represented by two-dimensional pixel coordinates of the root node in the image.
  • the hand root node can be a pre-specified hand location, such as the palm center point.
  • the size information may refer to the size of the hand image in the image.
  • size information may be expressed in absolute or relative sizes.
  • FIG. 7A shows the relevant parameters of the previous hand pose information in the previous image frame: S' shows the size information, and a' and b' show the two-dimensional pixel coordinates representing the root node position information. Three-dimensional rotation information is not shown.
  • FIG. 7B shows the relevant parameters of the current hand pose information in the target image frame: S shows the size information, and a and b show the two-dimensional pixel coordinates representing the root node position information. Three-dimensional rotation information is not shown.
  • adjusting the previous hand pose information of the previous hand region in the previous image frame may include, but is not limited to, at least one of the following: determining the three-dimensional rotation information in the current hand pose information according to the three-dimensional rotation information in the previous hand pose information; determining the relative position of the hand root node in the two-hand region to be recognized according to the relative position of the hand root node in the previous hand region; and determining the size information of the corresponding hand region in the two-hand region to be recognized according to the size information of the hand image in the previous hand region.
  • representing the hand pose by three-dimensional rotation information, root node position information and size information allows the hand pose to be restored. Further, the restored hand pose is continuous with the hand pose of the previous image frame, which ensures that, in sticker special-effect scenes, the sticker fits the hand image and looks natural.
  • the hands area to be identified includes a first sub-area and a second sub-area
  • the hand pose information of the hand image in the first sub-region can be referred to as the first sub-pose information, and the hand pose information of the hand image in the second sub-region as the second sub-pose information.
  • the previous hand region includes a third sub-region and a fourth sub-region.
  • the hand image in the third sub-region indicates the same hand as the hand image in the first sub-region, and the hand image in the fourth sub-region indicates the same hand as the hand image in the second sub-region.
  • the previous hand gesture information includes third sub-pose information and fourth sub-pose information.
  • the first sub-pose information of the first sub-region can be determined according to the third sub-pose information of the hand image in the third sub-region, and the second sub-pose information of the second sub-region according to the fourth sub-pose information of the hand image in the fourth sub-region.
  • the adjustment is based on the previous hand image corresponding to each hand image (its counterpart in the previous image frame), which avoids confusing the recognition of the two hands and ensures the accuracy of each hand's pose information.
  • because a two-hand box contains the appearance information of both the left and the right hand, the accuracy of a model's predicted pose results can drop sharply, producing confusing and unnatural results when the hand pose drives effects.
  • the reasons may include: first, the left and right hands look very similar, so the model is easily disturbed by the appearance of the other hand when predicting the pose of one hand; second, the left and right hands have complex interaction relationships, so mutual occlusion is often very complex: one hand may be almost completely occluded by the other, and such extreme scenes lack sufficient appearance information to predict the hand pose.
  • the third sub-pose information includes third sub-3D rotation information, third sub-root node position information, and third sub-size information.
  • the third sub-pose information may indicate the pose of a hand (eg, left hand) in the front hand region.
  • the third sub-pose information is taken as an example to illustrate how to adjust it to obtain the first sub-pose information of the first sub-region of the current hand region.
  • the process of obtaining the second sub-pose information from the fourth sub-pose information is similar to that of obtaining the first sub-pose information, and will not be repeated here.
  • determining the 3D rotation information in the current hand pose information according to the 3D rotation information in the previous hand pose information may include: determining the third sub-3D rotation information in the third sub-pose information as the first sub-3D rotation information.
  • the three-dimensional rotation information may not be changed.
  • the three-dimensional rotation information may have little effect on gesture driving. In this case, the three-dimensional rotation information may not be processed, thereby ensuring the accuracy of the driving effect and reducing the amount of calculation.
  • determining the relative position of the hand root node in the two-hand region to be recognized according to its relative position in the previous hand region may include: determining the first ratio of the width value in the third sub-root node information to the width value of the third sub-region, and determining the product of the first ratio and the width value of the first sub-region as the width value in the first sub-root node information; and determining the second ratio of the height value in the third sub-root node information to the height value of the third sub-region, and determining the product of the second ratio and the height value of the first sub-region as the height value in the first sub-root node information.
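The width and height ratio transfer just described can be written out directly. The `(x, y, w, h)` box encoding is an assumption for illustration:

```python
def transfer_root_position(prev_root, prev_region, cur_region):
    """Map the hand root node from the previous (third) sub-region to the
    current (first) sub-region so that its relative position inside the
    region is preserved.

    prev_root: (x, y) pixel position of the root node in the previous frame.
    prev_region / cur_region: (x, y, w, h) boxes of the two sub-regions."""
    px, py, pw, ph = prev_region
    cx, cy, cw, ch = cur_region
    ratio_w = (prev_root[0] - px) / pw   # first ratio: width fraction
    ratio_h = (prev_root[1] - py) / ph   # second ratio: height fraction
    return (cx + ratio_w * cw, cy + ratio_h * ch)
```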
  • in this way, the relative positions of the hand root node in the first sub-region and in the third sub-region are the same; thus, the difference caused by movement or scaling of the hand region (such as the tracking box) can be reduced, and the position of the hand root node can be determined accurately.
  • determining the size information of the corresponding hand region in the two-hand region to be recognized according to the size information of the hand image in the previous hand region may include: determining the third ratio of the hand size value in the third sub-size information to the size value of the previous image frame, and determining the product of the third ratio and the size value of the target image frame as the hand size value of the first sub-region.
  • the size information of the hand region can be understood as the proportion of the length of the hand region in the image frame.
  • the size of the hand region may indicate the length of the diagonal of the tracking box.
  • in this way, the area of the hand image can be determined effectively and more accurately. Furthermore, in sticker special-effect scenes, accurately determined size information greatly improves the fit between the sticker and the hand image and makes the effect look more natural.
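The size transfer described above amounts to keeping the hand's proportion of the frame constant across frames. A minimal sketch, assuming sizes are scalar lengths (for example, tracking-box diagonals):

```python
def transfer_hand_size(prev_hand_size, prev_frame_size, cur_frame_size):
    """Compute the current hand size: the third ratio (previous hand size
    over previous frame size) times the current frame size."""
    third_ratio = prev_hand_size / prev_frame_size
    return third_ratio * cur_frame_size
```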
  • the above method may further include: when the current hand region is a single-hand region to be recognized, calling a single-hand pose estimation model to recognize the hand pose in each single-hand region to be recognized, obtaining the fifth hand pose information corresponding to each single-hand region to be recognized.
  • the fifth hand pose information may indicate the hand pose in the single-hand region to be recognized. If there are two single-hand regions to be recognized, there may likewise be two pieces of hand pose information.
  • the present disclosure provides an embodiment of an image recognition apparatus, which corresponds to the method embodiment shown in FIG. 1, and the apparatus may be applied to various electronic devices.
  • the image recognition device of this embodiment includes: a determination unit 801 and an adjustment unit 802 .
  • the determination unit is configured to determine the current hand region from the target image frame of the image frame sequence, wherein the current hand region is a single-hand region to be recognized or a two-hand region to be recognized, and the images corresponding to the two hands in the two-hand region to be recognized have an overlapping area; the adjustment unit is configured to, when the current hand region is the two-hand region to be recognized, adjust the previous hand pose information of the previous hand region in the previous image frame to obtain the current hand pose information of the two-hand region to be recognized, wherein the previous image frame includes an image frame in the sequence that precedes the target image frame.
• For the specific processing of the determination unit 801 and the adjustment unit 802 of the image recognition device, and the technical effects they bring, reference may be made to the descriptions of step 101 and step 102 in the embodiment corresponding to FIG. 1, which will not be repeated here.
• The device is further configured to: add hand image special effects to the target image frame according to the current hand pose information.
• Determining the current hand region from the target image frame of the image frame sequence includes: determining the position of the hand image in the target image frame to obtain at least one pending hand region; determining, according to the positional relationship of the hand images in the pending hand regions, a processing mode for each pending hand region, where the processing mode includes at least one of the following: no processing, splitting, and merging; and processing each pending hand region with the determined processing mode to obtain the current hand region.
• Determining the processing mode of a pending hand region based on the positional relationship of the hand images in the pending hand region includes: determining the number of hands in the pending hand region; for a pending hand region whose number of hands is not less than 2, determining whether to split the pending hand region; and for at least two pending hand regions whose number of hands is 1, determining whether to merge those pending hand regions.
• Determining whether to split a pending hand region includes: in a pending hand region whose number of hands is not less than 2, locating the first hand image to obtain a pending first sub-region, and locating the second hand image to obtain a pending second sub-region. Processing each pending hand region with the determined processing mode to obtain the current hand region then includes: in response to determining that the pending first sub-region and the pending second sub-region have an overlapping area, not splitting the pending hand region whose number of hands is not less than 2, and determining it as a two-hand region to be recognized; and in response to determining that the pending first sub-region and the pending second sub-region have no overlapping area, splitting the pending hand region whose number of hands is not less than 2 to obtain single-hand regions to be recognized.
• Determining whether to merge the pending hand regions whose number of hands is 1 includes: determining whether any two pending hand regions whose number of hands is 1 have an overlapping area. Processing each pending hand region with the determined processing mode to obtain the current hand region then includes: merging pending hand regions that have an overlapping area to obtain a two-hand region to be recognized; and if a pending hand region whose number of hands is 1 does not overlap with any other pending hand region, determining it as a single-hand region to be recognized.
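The split/merge decisions above both reduce to a bounding-box overlap test. The following sketch illustrates one possible implementation; it is not part of the disclosed embodiment, and the `Box` representation and all function names are assumptions introduced here:

```python
from dataclasses import dataclass

@dataclass
class Box:
    """Axis-aligned bounding box of a hand image, in pixel coordinates."""
    left: float
    top: float
    right: float
    bottom: float

def boxes_overlap(a: Box, b: Box) -> bool:
    """True when the two boxes share an overlapping area."""
    return (a.left < b.right and b.left < a.right and
            a.top < b.bottom and b.top < a.bottom)

def merge_boxes(a: Box, b: Box) -> Box:
    """Smallest box enclosing both single-hand boxes: a pending two-hand region."""
    return Box(min(a.left, b.left), min(a.top, b.top),
               max(a.right, b.right), max(a.bottom, b.bottom))
```

Under this sketch, two single-hand pending regions whose boxes overlap would be merged into a two-hand region to be recognized, while a multi-hand pending region whose per-hand boxes do not overlap would be split into single-hand regions.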
• Determining the position of the hand image in the target image frame to obtain at least one pending hand region includes: when the previous image frame includes a hand image, tracking the previous hand region of the previous image frame to determine the pending hand region of the target image frame; and when it is determined that the previous image frame does not include a hand image, performing hand image recognition on the target image frame to obtain the pending hand region.
• Tracking the previous hand region of the previous image frame to determine the pending hand region of the target image frame includes: if the previous hand region is a two-hand region, adjusting the two-hand region in the previous hand region to obtain a pending hand region that includes the images of both hands.
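A minimal sketch of the track-or-detect fallback described above, assuming hypothetical `track` and `detect` callables (none of these names come from the disclosure):

```python
def get_pending_regions(target_frame, previous_regions, track, detect):
    """Return pending hand regions for the target frame.

    If the previous frame contained hand regions, track each one into the
    target frame; otherwise fall back to full-frame hand detection.
    """
    if previous_regions:
        # Temporal tracking is typically cheaper than re-detection.
        return [track(target_frame, region) for region in previous_regions]
    # No hands in the previous frame: detect from scratch.
    return detect(target_frame)
```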
• The hand pose information includes at least one of the following: three-dimensional rotation information, hand root node information, and size information. Adjusting the previous hand pose information of the previous hand region in the previous image frame includes at least one of the following: determining, according to the three-dimensional rotation information in the previous hand pose information, the three-dimensional rotation information in the current hand pose information; determining, according to the relative position of the hand root node in the previous hand region, the relative position of the hand root node in the two-hand region to be recognized; and determining, according to the size information of the hand image in the previous hand region, the size information of the corresponding hand region in the two-hand region to be recognized.
• The two-hand region to be recognized includes a first sub-region and a second sub-region, and the previous hand region includes a third sub-region and a fourth sub-region, where the hand image in the third sub-region and the hand image in the first sub-region indicate the same hand, and the hand image in the fourth sub-region and the hand image in the second sub-region indicate the same hand. The first sub-pose information of the first sub-region is determined according to the third sub-pose information of the hand image in the third sub-region, and the second sub-pose information of the second sub-region is determined according to the fourth sub-pose information of the hand image in the fourth sub-region.
• The third sub-pose information includes third sub-three-dimensional-rotation information, and determining the three-dimensional rotation information in the previous hand pose information as the three-dimensional rotation information in the current hand pose information includes: determining the third sub-three-dimensional-rotation information in the third sub-pose information as the first sub-three-dimensional-rotation information.
• The third sub-pose information includes third sub-root-node position information, and determining the relative position of the hand root node in the two-hand region to be recognized according to the relative position of the hand root node in the previous hand region includes: determining a first ratio of the width value in the third sub-root-node information to the width value of the third sub-region, and determining the product of the first ratio and the width value of the first sub-region as the width value in the first sub-root-node information; and determining a second ratio of the height value in the third sub-root-node information to the height value of the third sub-region, and determining the product of the second ratio and the height value of the first sub-region as the height value in the first sub-root-node information.
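The width/height ratio mapping above can be sketched as follows (an illustrative sketch only; the function and parameter names are assumptions, not terms from the disclosure):

```python
def map_root_node(root_node, prev_sub_size, new_sub_size):
    """Map a hand root node from the previous (third) sub-region into the
    current (first) sub-region, preserving its relative position."""
    x, y = root_node                  # root-node width/height values
    prev_w, prev_h = prev_sub_size    # width/height of the third sub-region
    new_w, new_h = new_sub_size       # width/height of the first sub-region
    first_ratio = x / prev_w          # ratio along the width
    second_ratio = y / prev_h         # ratio along the height
    return (first_ratio * new_w, second_ratio * new_h)
```

For example, a root node at (50, 30) in a 100x60 sub-region would map to (100.0, 60.0) in a 200x120 sub-region, keeping the same relative position.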
• The third sub-pose information includes third sub-size information, and determining the size information of the corresponding hand region in the two-hand region to be recognized according to the size information of the hand image in the previous hand region includes: determining a third ratio of the hand size value in the third sub-size information to the size value of the previous image frame, and determining the product of the third ratio and the size value of the target image frame as the hand size value of the first sub-region.
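The size scaling above keeps the hand-size-to-frame-size ratio constant across frames. A sketch under that assumption (names are hypothetical, not from the disclosure):

```python
def map_hand_size(prev_hand_size, prev_frame_size, target_frame_size):
    """Scale the hand size value from the previous frame to the target frame
    by holding the hand-size-to-frame-size ratio constant."""
    third_ratio = prev_hand_size / prev_frame_size
    return third_ratio * target_frame_size
```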
• The device is further configured to: when the current hand region is a single-hand region to be recognized, invoke a single-hand pose estimation model to recognize the hand pose in each single-hand region to be recognized, obtaining hand pose information corresponding to each single-hand region to be recognized.
  • FIG. 9 shows an exemplary system architecture in which the image recognition method of an embodiment of the present disclosure can be applied.
  • the system architecture may include terminal devices 901 , 902 , and 903 , a network 904 , and a server 905 .
  • the network 904 is used as a medium for providing communication links between the terminal devices 901 , 902 , 903 and the server 905 .
• The network 904 may include various connection types, such as wired links, wireless communication links, or fiber optic cables.
  • the terminal devices 901, 902, 903 can interact with the server 905 through the network 904 to receive or send messages and the like.
• Client applications, such as web browser applications, search applications, and news information applications, may be installed on the terminal devices 901, 902, and 903.
• The client applications in the terminal devices 901, 902, and 903 can receive user instructions and complete corresponding functions according to the user instructions, such as adding corresponding information to existing information according to the user instructions.
  • Terminal devices 901, 902, and 903 may be hardware or software.
• When the terminal devices 901, 902, and 903 are hardware, they may be various electronic devices that have display screens and support web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop portable computers, desktop computers, and the like.
• When the terminal devices 901, 902, and 903 are software, they may be installed in the electronic devices listed above, and may be implemented as multiple software programs or software modules (for example, software or software modules for providing distributed services) or as a single software program or software module, which is not specifically limited here.
• The server 905 may be a server that provides various services, for example, a server that receives information acquisition requests sent by the terminal devices 901, 902, and 903, obtains, in various ways, the display information corresponding to the information acquisition requests, and sends the relevant data of the display information to the terminal devices 901, 902, and 903.
  • the image recognition method provided by the embodiment of the present disclosure may be executed by a terminal device, and correspondingly, the image recognition apparatus may be set in the terminal devices 901 , 902 , and 903 .
  • the image recognition method provided by the embodiment of the present disclosure may also be executed by the server 905 , and correspondingly, the image recognition device may be set in the server 905 .
• The terminal devices, networks, and servers in FIG. 9 are merely illustrative; any number of terminal devices, networks, and servers may be provided according to implementation needs.
  • FIG. 10 shows a schematic structural diagram of an electronic device (such as the terminal device or server in FIG. 9 ) suitable for implementing the embodiments of the present disclosure.
• The terminal equipment in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (such as car navigation terminals), as well as fixed terminals such as digital TVs and desktop computers.
  • the electronic device shown in FIG. 10 is only an example, and should not limit the functions and application scope of the embodiments of the present disclosure.
• An electronic device may include a processing device (such as a central processing unit, a graphics processing unit, etc.) 1001, which may execute various appropriate actions and processing according to a program stored in a read-only memory (ROM) 1002 or a program loaded from a storage device 1008 into a random access memory (RAM) 1003. The RAM 1003 also stores various programs and data necessary for the operation of the electronic device 1000.
  • the processing device 1001, ROM 1002, and RAM 1003 are connected to each other through a bus 1004.
  • An input/output (I/O) interface 1005 is also connected to the bus 1004 .
• The following devices may be connected to the I/O interface 1005: an input device 1006 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; an output device 1007 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; a storage device 1008 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 1009.
• The communication device 1009 may allow the electronic device to communicate wirelessly or by wire with other devices to exchange data. While FIG. 10 shows an electronic device having various devices, it should be understood that implementing or having all of the devices shown is not required; more or fewer devices may alternatively be implemented or provided.
  • embodiments of the present disclosure include a computer program product, which includes a computer program carried on a non-transitory computer readable medium, where the computer program includes program code for executing the method shown in the flowchart.
• The computer program may be downloaded and installed from a network via the communication device 1009, or installed from the storage device 1008, or installed from the ROM 1002.
• When the computer program is executed by the processing device 1001, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are executed.
  • the above-mentioned computer-readable medium in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination of the above two.
• A computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to, an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
• A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can transmit, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium may be transmitted by any appropriate medium, including but not limited to wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
• The client and the server may communicate using any currently known or future developed network protocol such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and a peer-to-peer network (e.g., an ad hoc peer-to-peer network), as well as any currently known or future developed network.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device, or may exist independently without being incorporated into the electronic device.
• The above-mentioned computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device is caused to: determine the current hand region from the target image frame of the image frame sequence, where the current hand region is a single-hand region to be recognized or a two-hand region to be recognized, and the images corresponding to the two hands in the two-hand region to be recognized have an overlapping area; and when the current hand region is a two-hand region to be recognized, adjust the previous hand pose information of the previous hand region in the previous image frame to obtain the current hand pose information of the two-hand region to be recognized, where the previous image frame includes an image frame in the image frame sequence that precedes the target image frame.
• The electronic device is further configured to: add hand image special effects to the target image frame according to the current hand pose information.
• Determining the current hand region from the target image frame of the image frame sequence includes: determining the position of the hand image in the target image frame to obtain at least one pending hand region; determining, according to the positional relationship of the hand images in the pending hand regions, a processing mode for each pending hand region, where the processing mode includes at least one of the following: no processing, splitting, and merging; and processing each pending hand region with the determined processing mode to obtain the current hand region.
• Determining the processing mode of a pending hand region based on the positional relationship of the hand images in the pending hand region includes: determining the number of hands in the pending hand region; for a pending hand region whose number of hands is not less than 2, determining whether to split the pending hand region; and for at least two pending hand regions whose number of hands is 1, determining whether to merge those pending hand regions.
• Determining whether to split a pending hand region includes: in a pending hand region whose number of hands is not less than 2, locating the first hand image to obtain a pending first sub-region, and locating the second hand image to obtain a pending second sub-region. Processing each pending hand region with the determined processing mode to obtain the current hand region then includes: in response to determining that the pending first sub-region and the pending second sub-region have an overlapping area, not splitting the pending hand region whose number of hands is not less than 2, and determining it as a two-hand region to be recognized; and in response to determining that the pending first sub-region and the pending second sub-region have no overlapping area, splitting the pending hand region whose number of hands is not less than 2 to obtain single-hand regions to be recognized.
• Determining whether to merge the pending hand regions whose number of hands is 1 includes: determining whether any two pending hand regions whose number of hands is 1 have an overlapping area. Processing each pending hand region with the determined processing mode to obtain the current hand region then includes: merging pending hand regions that have an overlapping area to obtain a two-hand region to be recognized; and if a pending hand region whose number of hands is 1 does not overlap with any other pending hand region, determining it as a single-hand region to be recognized.
• Determining the position of the hand image in the target image frame to obtain at least one pending hand region includes: when the previous image frame includes a hand image, tracking the previous hand region of the previous image frame to determine the pending hand region of the target image frame; and when it is determined that the previous image frame does not include a hand image, performing hand image recognition on the target image frame to obtain the pending hand region.
• Tracking the previous hand region of the previous image frame to determine the pending hand region of the target image frame includes: if the previous hand region is a two-hand region, adjusting the two-hand region in the previous hand region to obtain a pending hand region that includes the images of both hands.
• The hand pose information includes at least one of the following: three-dimensional rotation information, hand root node information, and size information. Adjusting the previous hand pose information of the previous hand region in the previous image frame includes at least one of the following: determining, according to the three-dimensional rotation information in the previous hand pose information, the three-dimensional rotation information in the current hand pose information; determining, according to the relative position of the hand root node in the previous hand region, the relative position of the hand root node in the two-hand region to be recognized; and determining, according to the size information of the hand image in the previous hand region, the size information of the corresponding hand region in the two-hand region to be recognized.
• The two-hand region to be recognized includes a first sub-region and a second sub-region, and the previous hand region includes a third sub-region and a fourth sub-region, where the hand image in the third sub-region and the hand image in the first sub-region indicate the same hand, and the hand image in the fourth sub-region and the hand image in the second sub-region indicate the same hand. The first sub-pose information of the first sub-region is determined according to the third sub-pose information of the hand image in the third sub-region, and the second sub-pose information of the second sub-region is determined according to the fourth sub-pose information of the hand image in the fourth sub-region.
• The third sub-pose information includes third sub-three-dimensional-rotation information, and determining the three-dimensional rotation information in the previous hand pose information as the three-dimensional rotation information in the current hand pose information includes: determining the third sub-three-dimensional-rotation information in the third sub-pose information as the first sub-three-dimensional-rotation information.
• The third sub-pose information includes third sub-root-node position information, and determining the relative position of the hand root node in the two-hand region to be recognized according to the relative position of the hand root node in the previous hand region includes: determining a first ratio of the width value in the third sub-root-node information to the width value of the third sub-region, and determining the product of the first ratio and the width value of the first sub-region as the width value in the first sub-root-node information; and determining a second ratio of the height value in the third sub-root-node information to the height value of the third sub-region, and determining the product of the second ratio and the height value of the first sub-region as the height value in the first sub-root-node information.
• The third sub-pose information includes third sub-size information, and determining the size information of the corresponding hand region in the two-hand region to be recognized according to the size information of the hand image in the previous hand region includes: determining a third ratio of the hand size value in the third sub-size information to the size value of the previous image frame, and determining the product of the third ratio and the size value of the target image frame as the hand size value of the first sub-region.
• The electronic device is further configured to: when the current hand region is a single-hand region to be recognized, invoke a single-hand pose estimation model to recognize the hand pose in each single-hand region to be recognized, obtaining hand pose information corresponding to each single-hand region to be recognized.
• Computer program code for carrying out the operations of the present disclosure may be written in one or more programming languages or combinations thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
• The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
• The remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
• Each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
• Each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
• The units involved in the embodiments described in the present disclosure may be implemented by software or by hardware. The name of a unit does not, in some cases, constitute a limitation on the unit itself; for example, the determination unit may also be described as "a unit for determining the current hand region".
• For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chips (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing.
  • machine-readable storage media would include one or more wire-based electrical connections, portable computer discs, hard drives, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present application disclose an image recognition method and apparatus, and an electronic device. A specific implementation of the method comprises: determining a current hand region from a target image frame of an image frame sequence, the current hand region being a single-hand region to be recognized or a two-hand region to be recognized, and images corresponding to two hands in the two-hand region to be recognized having an overlapping region; and when the current hand region is the two-hand region to be recognized, adjusting previous hand pose information of a previous hand region in a previous image frame to obtain current hand pose information of the hand region to be recognized, the previous image frame comprising an image frame preceding the target image frame in the image frame sequence.
PCT/CN2022/114436 2021-08-27 2022-08-24 Procédé et appareil de reconnaissance d'image, et dispositif électronique WO2023025181A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110999935.5 2021-08-27
CN202110999935.5A CN115731570A (zh) 2021-08-27 2021-08-27 图像识别方法、装置和电子设备

Publications (1)

Publication Number Publication Date
WO2023025181A1 true WO2023025181A1 (fr) 2023-03-02

Family

ID=85290588

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/114436 WO2023025181A1 (fr) 2021-08-27 2022-08-24 Procédé et appareil de reconnaissance d'image, et dispositif électronique

Country Status (2)

Country Link
CN (1) CN115731570A (fr)
WO (1) WO2023025181A1 (fr)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6252598B1 (en) * 1997-07-03 2001-06-26 Lucent Technologies Inc. Video hand image computer interface
CN103593680A (zh) * 2013-11-19 2014-02-19 南京大学 一种基于隐马尔科夫模型自增量学习的动态手势识别方法
CN108256421A (zh) * 2017-12-05 2018-07-06 盈盛资讯科技有限公司 一种动态手势序列实时识别方法、系统及装置
CN112733823A (zh) * 2021-03-31 2021-04-30 南昌虚拟现实研究院股份有限公司 手势姿态识别关键帧提取方法、装置及可读存储介质
CN112906646A (zh) * 2021-03-23 2021-06-04 中国联合网络通信集团有限公司 人体姿态的检测方法及装置

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6252598B1 (en) * 1997-07-03 2001-06-26 Lucent Technologies Inc. Video hand image computer interface
CN103593680A (zh) * 2013-11-19 2014-02-19 南京大学 一种基于隐马尔科夫模型自增量学习的动态手势识别方法
CN108256421A (zh) * 2017-12-05 2018-07-06 盈盛资讯科技有限公司 一种动态手势序列实时识别方法、系统及装置
CN112906646A (zh) * 2021-03-23 2021-06-04 中国联合网络通信集团有限公司 人体姿态的检测方法及装置
CN112733823A (zh) * 2021-03-31 2021-04-30 南昌虚拟现实研究院股份有限公司 手势姿态识别关键帧提取方法、装置及可读存储介质

Also Published As

Publication number Publication date
CN115731570A (zh) 2023-03-03

Similar Documents

Publication Publication Date Title
WO2021139408A1 (fr) Procédé et appareil pour afficher un effet spécial, et support d'enregistrement et dispositif électronique
CN109584276B (zh) 关键点检测方法、装置、设备及可读介质
WO2020186935A1 (fr) Procédé et dispositif d'affichage d'objet virtuel, appareil électronique, et support de stockage lisible par ordinateur
JP7181375B2 (ja) 目標対象の動作認識方法、装置及び電子機器
CN112051961A (zh) 虚拟交互方法、装置、电子设备及计算机可读存储介质
WO2022007565A1 (fr) Procédé et appareil de traitement d'image pour réalité augmentée, dispositif électronique et support d'enregistrement
US11863835B2 (en) Interaction method and apparatus, and electronic device
WO2023193642A1 (fr) Procédé et appareil de traitement vidéo, dispositif, et support de stockage
WO2022183887A1 (fr) Procédé et appareil d'édition vidéo, procédé et appareil de lecture vidéo, dispositif et support
CN112270242B (zh) 轨迹的显示方法、装置、可读介质和电子设备
WO2024027820A1 (fr) Procédé et appareil de génération d'animation à base d'images, dispositif, et support de stockage
WO2024037556A1 (fr) Appareil et procédé de traitement d'image, dispositif et support de stockage
CN111833459B (zh) 一种图像处理方法、装置、电子设备及存储介质
WO2024032752A1 (fr) Procédé et appareil pour générer une image d'effet spécial de transition, dispositif, et support de stockage
TW202219822A (zh) 字元檢測方法、電子設備及電腦可讀儲存介質
WO2024016923A1 (fr) Procédé et appareil de génération de graphe à effets spéciaux, dispositif et support de stockage
WO2023193639A1 (fr) Procédé et appareil de rendu d'image, support lisible et dispositif électronique
WO2023138468A1 (fr) Procédé et appareil de génération d'objet virtuel, dispositif, et support de stockage
WO2020155908A1 (fr) Procédé et appareil de génération d'informations
WO2023025181A1 (fr) Procédé et appareil de reconnaissance d'image, et dispositif électronique
CN111368668A (zh) 三维手部识别方法、装置、电子设备及存储介质
CN111027495A (zh) 用于检测人体关键点的方法和装置
WO2022194145A1 (fr) Procédé et appareil de détermination de position de photographie, dispositif et support
WO2021073204A1 (fr) Procédé et appareil d'affichage d'objet, dispositif électronique et support de stockage lisible par ordinateur
WO2022057576A1 (fr) Procédé et appareil d'affichage d'image faciale, dispositif électronique et support de stockage

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE