WO2023025181A1 - Image recognition method and apparatus, and electronic device - Google Patents

Image recognition method and apparatus, and electronic device

Info

Publication number
WO2023025181A1
Authority
WO
WIPO (PCT)
Prior art keywords
hand
region
area
sub
hands
Prior art date
Application number
PCT/CN2022/114436
Other languages
French (fr)
Chinese (zh)
Inventor
林高杰
罗宇轩
唐堂
Original Assignee
北京字跳网络技术有限公司
Priority date
Filing date
Publication date
Application filed by 北京字跳网络技术有限公司
Publication of WO2023025181A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands

Definitions

  • the present disclosure relates to the field of computer technology, in particular to an image recognition method, device and electronic equipment.
  • the electronic device can recognize the gesture of the user's hand and respond according to the gesture of the user's hand, so that the user can interact with the electronic device.
  • an embodiment of the present disclosure provides an image recognition method, the method including: determining a current hand region from a target image frame of an image frame sequence, wherein the current hand region is a single-hand region to be recognized or a two-hand region to be recognized, and the images corresponding to the two hands in the two-hand region to be recognized have an overlapping area; and, when the current hand region is the two-hand region to be recognized, adjusting the previous hand posture information of the previous hand region in a previous image frame to obtain the current hand posture information of the two-hand region to be recognized, wherein the previous image frame includes an image frame in the image frame sequence that precedes the target image frame.
  • an embodiment of the present disclosure provides an image recognition device, including: a determination unit configured to determine a current hand region from a target image frame of an image frame sequence, wherein the current hand region is a single-hand region to be recognized or a two-hand region to be recognized, and the images corresponding to the two hands in the two-hand region to be recognized have an overlapping area; and an adjustment unit configured to, when the current hand region is the two-hand region to be recognized, adjust the previous hand posture information of the previous hand region in a previous image frame to obtain the current hand posture information of the two-hand region to be recognized, wherein the previous image frame includes an image frame in the image frame sequence that precedes the target image frame.
  • an embodiment of the present disclosure provides an electronic device, including: one or more processors; and a storage device for storing one or more programs, wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the image recognition method described in the first aspect.
  • an embodiment of the present disclosure provides a computer-readable medium on which a computer program is stored, and when the program is executed by a processor, the steps of the image recognition method as described in the first aspect are implemented.
  • FIG. 1 is a flowchart of an embodiment of an image recognition method according to the present disclosure
  • FIG. 2 is a flowchart of an exemplary implementation of an image recognition method according to the present disclosure
  • FIG. 3 is a flowchart of another exemplary implementation of an image recognition method according to the present disclosure.
  • FIG. 4 is a flowchart of an exemplary implementation of an image recognition method according to the present disclosure.
  • FIG. 5A is a schematic diagram of another application scenario of the image recognition method of the present disclosure.
  • FIG. 5B is a schematic diagram of another application scenario of the image recognition method of the present disclosure.
  • FIG. 6 is a schematic diagram of another application scenario of the image recognition method of the present disclosure.
  • FIG. 7A is a schematic diagram of another application scenario of the image recognition method of the present disclosure.
  • FIG. 7B is a schematic diagram of another application scenario of the image recognition method of the present disclosure.
  • Fig. 8 is a schematic structural diagram of an embodiment of an image recognition device according to the present disclosure.
  • FIG. 9 is an exemplary system architecture to which an image recognition method according to an embodiment of the present disclosure can be applied.
  • Fig. 10 is a schematic diagram of a basic structure of an electronic device provided according to an embodiment of the present disclosure.
  • the term “comprising” and its variants are open-ended, i.e. “including but not limited to”.
  • the term “based on” means “based at least in part on”.
  • the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one further embodiment”; the term “some embodiments” means “at least some embodiments.” Relevant definitions of other terms will be given in the description below.
  • FIG. 1 shows the flow of an embodiment of the image recognition method according to the present disclosure.
  • the image recognition method can be applied to terminal equipment.
  • the image recognition method as shown in Figure 1 comprises the following steps:
  • Step 101 in the target image frame of the image frame sequence, determine the current hand area.
  • the executing subject of the image recognition method may determine the current hand area from the target image frame in the sequence of image frames.
  • the above image frame sequence may include at least two image frames.
  • the target image frame may be any image frame in the sequence of image frames.
  • the above-mentioned current hand region may be a single-hand region to be recognized or a two-hand region to be recognized.
  • an image of a single hand may be included in the single-hand area to be recognized.
  • the area of both hands to be recognized may include images of two hands, and there is an overlapping area between the images of the two hands.
  • Step 102 when the current hand region is the hands region to be recognized, adjust the previous hand pose information of the previous hand region in the previous image frame to obtain the current hand pose information of the hand region to be recognized.
  • the above-mentioned previous image frame may include an image frame in the image frame sequence before the target image frame.
  • the number of previous image frames may be one or at least two.
  • the previous image frame and the target image frame may or may not be adjacent.
  • the hand image area in the previous image frame may be referred to as the previous hand area.
  • the hand pose in the previous hand region in the previous image frame may be indicated.
  • the previous hand region may include two hands, and the two hands may occlude each other (that is, their images have an overlapping area) or may not occlude each other.
  • the previous hand region may be a two-hand region (containing images of two hands whose images have an overlapping area) or may be a single-hand region.
  • the current hand pose information may indicate the hand pose in the current hand region in the target image frame.
  • the previous hand pose information may indicate the hand pose in the previous hand region in the previous image frame.
  • hand posture information may include, but is not limited to, at least one of the following: 3D rotation information of each joint of the hand, 2D position information of the root node of the hand in the image frame, and size information of the hand in the image frame. It can be understood that the specific items of hand posture information may be set according to actual application scenarios.
  • the way of adjusting the previous hand posture information can be set according to the actual application scenario, which is not limited here.
  • one or more items of previous hand posture information may be adjusted to obtain current hand posture information.
  • the image recognition method provided in this embodiment can first determine the current hand region from the target image frame of the image frame sequence.
  • the current hand region may include a single-hand region to be recognized or a double-hand region to be recognized. There is an overlapping area between the images corresponding to the two hands in the two-hands area. Then, when the current hand region is the hands region to be recognized, the previous hand pose information in the previous hand region in the previous image frame is adjusted to obtain the current hand pose information of the hand region to be recognized.
  • a new image recognition method can be provided.
  • this image recognition method can adjust the previous hand pose information of the previous image frame to obtain the current hand pose of the hands area in the target image frame.
  • the image frame sequence can represent the change from the hand pose in the previous image frame to the hand pose in the target image frame. Adjusting the hand posture information of the previous image frame according to this change to obtain the current hand posture information can improve the accuracy of the determined hand posture information in image scenes where the two hands occlude each other.
  • the above method may further include: adding hand image special effects on the target image frame according to the current hand posture information.
  • the hand image special effect can be various forms of special effects, for example, a sticker special effect (adding a sticker to the hand image to cover the hand image).
  • the hand image special effect added according to the above current hand posture information can make the effect of adding the special effect more adaptable to the hand image, and the adding effect is more natural.
  • even if the current hand posture information obtained from the previous hand posture information of the previous image frame deviates somewhat from the real hand pose, the method exploits the fact that a hand pose is unlikely to change abruptly, so the hand image special effect remains adapted to the hand pose in the target image frame. As a result, in scenes where the hand pose drives a hand image special effect, a stable and natural driving effect is produced even when the hands are close together or overlapping.
  • the above step 101 may include step 201 , step 202 and step 203 shown in FIG. 2 .
  • Step 201 in the target image frame, determine the position of the hand image to obtain at least one undetermined hand region.
  • the pending hand region can be determined in various ways.
  • the undetermined hand area can be understood as the hand area obtained through preliminary positioning.
  • the aforementioned undetermined hand area may be understood as a preliminarily determined hand area.
  • the number of undetermined hand regions obtained by determining the position of the hand image in the target image frame may be zero, one, two, or more than two. If it is zero, the target image frame does not include a hand image, and this case need not be processed in this embodiment. If it is one, the hand image in the undetermined hand region may be a single-hand image or a two-hand image. If it is two, the hand image in each undetermined hand region may be a single-hand image or a two-hand image. If more than two undetermined hand regions are identified, they can be divided into multiple groups according to image features, such that the undetermined hand regions in each group correspond to the same person.
  • multiple sets of region groups including at least one undetermined hand region can be obtained.
  • at least one undetermined hand region belonging to the same person is taken as an example for description.
  • the number of hand regions to be determined may be one or two.
  • the pending hand area may be indicated by area indication information.
  • a tracking box may be used to indicate the pending hand region.
  • Step 202 based on the positional relationship of the hand images in the undetermined hand regions, determine the processing mode of each undetermined hand region.
  • the above processing manner may include but not limited to at least one of the following: no processing, splitting and merging.
  • the above-mentioned positional relationship of the hand images may include the existence of overlapping regions or the absence of overlapping regions.
  • an overlapping area which may be that the images of the two hands in the two-handed area overlap, or that the images of the two hands in the single-handed area have an overlapping area.
  • overlapping area there is no overlapping area, which may mean that the images of the two hands respectively located in the two single-handed areas do not overlap, or that the images of the two hands in the two-handed area do not have an overlapping area.
  • Step 203 adopt the determined processing method to process each pending hand area to obtain the current hand area.
  • the current hand region obtained by the above processing places two hand images that have an overlapping area in one two-hand region, and places a single-hand image that does not overlap any other hand image in a single-hand region.
  • an accurate single-hand image or two-hand image can thus be obtained, avoiding recognition errors caused by errors in determining the undetermined hand region (for example, one region includes two independent single-hand images, or two regions overlap).
  • the above step 201 may be implemented by calling a hand detection model.
  • the hand detection model can detect the undetermined hand area in the target image, such as a rectangular box containing the appearance of the hand.
  • the above step 201 may include: when the previous image frame includes a hand image, adjusting the previous hand region of the previous image frame to obtain the undetermined hand region of the target image frame; when the previous image frame does not include a hand image, performing hand image recognition on the target image frame to obtain the undetermined hand region.
  • the hand tracking model may be invoked to locate the pending hand region of the target image frame near the previous hand region of the previous frame image.
  • the limited speed of hand movement and the likely closeness in time of the target image frame and the previous image frame can be exploited to avoid searching the whole image for the hand region, reducing the time and computation needed to determine the hand region.
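  • As a minimal sketch of restricting the search to the neighbourhood of the previous hand region, the snippet below expands the previous box by a margin and clips it to the frame. The `Box` type, the `search_window` function and the `margin` parameter are illustrative assumptions; the disclosure does not specify how the hand tracking model defines its search area.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned box in pixel coordinates (hypothetical helper type)."""
    x0: float
    y0: float
    x1: float
    y1: float


def search_window(prev: Box, frame_w: float, frame_h: float,
                  margin: float = 0.5) -> Box:
    """Expand the previous hand box by `margin` times its width/height on
    each side and clip the result to the frame bounds.

    The margin value is an assumption chosen for illustration only.
    """
    dx = (prev.x1 - prev.x0) * margin
    dy = (prev.y1 - prev.y0) * margin
    return Box(
        max(0.0, prev.x0 - dx),
        max(0.0, prev.y0 - dy),
        min(frame_w, prev.x1 + dx),
        min(frame_h, prev.y1 + dy),
    )
```

  • A tracker would then look for the hand only inside the returned window rather than across the whole target image frame.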
  • different tracking logics may be adopted according to whether there is an overlapping area of the hands.
  • when the two hands have an overlapping area, the hand tracking model can locate the two hands in one rectangular frame (which can be a two-hand region) and track the two hands as a whole.
  • when the two hands do not have an overlapping area, the left and right hands are regarded as independent individuals and tracked separately (the rectangular frame containing the appearance of one hand can be called a single-hand region).
  • the accuracy of the tracking effect can be improved.
  • tracking the previous hand region of the previous image frame and determining the pending hand region of the target image frame includes: if the previous hand region is a two-hand region, adjusting the two-hand region in the previous hand region to obtain a pending hand region that includes images of both hands.
  • When the two hands are close and do not overlap, since the two hands have a similar appearance, their tracking processes may interfere with each other: the tracker for the left hand may drift to the right hand, the tracker for the right hand may drift to the left hand, or the tracking results may become completely confused. Using different tracking logic according to whether the two hands have an overlapping area achieves the following: when the two hands are close to each other, the tracking confusion that easily occurs when tracking each hand alone is avoided; when the two hands do not have an overlapping area, tracking the two hands separately effectively ensures the accuracy of the tracking.
  • the tracking method provided in the present application can improve the accuracy of the current hand region by determining it from the position of the previous hand region in the previous image frame.
  • because the previous hand region has itself been adjusted, errors (for example, two independent single-hand images are included in one region, or two regions overlap) are less likely, so the accuracy of the previous hand region can be guaranteed, and thus the accuracy of the current hand region can be guaranteed.
  • step 202 may include: step 2021 , step 2022 and step 2023 .
  • Step 2021 determine the number of hands in the pending hand area.
  • Step 2022 for a pending hand region whose number of hands is not less than 2, determine whether to split the pending hand region.
  • the number of hands in the pending hand area is 2.
  • the hand images in the undetermined hand region can be grouped in pairs, and then processed with reference to the number of hands in the undetermined hand region being 2.
  • the undetermined hand area is split into two single-handed areas to ensure the positioning accuracy of the two hand areas.
  • if the distance between the two hands is relatively long and they are located in one two-hand region, the two-hand region will be too large while the hand images will be too small, resulting in a decrease in recognition accuracy.
  • Step 2023 for at least two pending hand regions with a hand quantity of 1, determine whether to merge the pending hand regions with a hand quantity of 1.
  • when the number of hands in each undetermined hand region is 1, it can be determined whether to merge the two undetermined hand regions into one two-hand region, to ensure the accuracy of hand image tracking and gesture recognition when the two hands have an overlapping area.
  • the above step 2022 may include: in the undetermined hand region whose number of hands is not less than 2, locating the first hand image to obtain the undetermined first sub-region, and locating the second hand image to obtain the undetermined second sub-region.
  • the positions of the two hand images can be located to obtain two sub-regions.
  • the left-hand image can be positioned to obtain the first sub-region to be determined
  • the right-hand image can be positioned to obtain the second sub-region to be determined.
  • it may also be two left hands of two people, or two right hands of two people, which will not be repeated here.
  • a pre-trained single-hand localization model can be used to localize single-hand images.
  • the training images for the one-hand positioning model may include images with overlapping images of both hands or images with a small distance between the images of the two hands (for example, less than a preset threshold). Therefore, by using the single-hand positioning model to process the first sub-region and the second sub-region obtained from the hand region to be recognized, the positioning accuracy of the hand image is relatively high, and confusion is less likely to occur.
  • step 203 may include: in response to determining that there is an overlapping area between the pending first sub-region and the pending second sub-region, not splitting the pending hand region whose number of hands is not less than 2, and determining that pending hand region as the two-hand region to be recognized.
  • FIG. 4 shows various implementation manners of step 203 among the implementation manners of step 2021 , step 2022 and step 2023 .
  • Step 203 includes: in response to determining that there is no overlapping area between the pending first sub-region and the pending second sub-region, splitting the pending hand region whose number of hands is not less than 2 to obtain the pending single-hand region.
  • the pending first subregion can be determined as a single-handed region to be identified, and the pending second subregion can be determined as another single-handed region to be identified.
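  • The split decision described above can be sketched as follows, assuming that regions are axis-aligned rectangles and that "overlapping area" means a non-empty rectangle intersection. The `Box` type and the function names are hypothetical, and the two sub-regions are assumed to come from a separate single-hand localization step that is not shown.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """Axis-aligned box in pixel coordinates (hypothetical helper type)."""
    x0: float
    y0: float
    x1: float
    y1: float


def boxes_overlap(a: Box, b: Box) -> bool:
    """True if the two boxes share a region of positive area."""
    return (min(a.x1, b.x1) > max(a.x0, b.x0)
            and min(a.y1, b.y1) > max(a.y0, b.y0))


def split_or_keep(pending: Box, sub1: Box, sub2: Box) -> List[Box]:
    """Decide how to treat a pending region that contains two hands.

    If the two per-hand sub-regions overlap, keep the pending region whole
    (it becomes the two-hand region to be recognized); otherwise split it
    into the two sub-regions (each becomes a single-hand region).
    """
    if boxes_overlap(sub1, sub2):
        return [pending]
    return [sub1, sub2]
```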
  • the above-mentioned step 2023 may include: determining whether any two pending hand regions with a hand quantity of 1 have overlapping regions.
  • Step 203 may include: merging pending hand regions with overlapped regions to obtain the to-be-recognized hands region.
  • In FIG. 5A, there is an overlapping area between the two undetermined hand regions whose number of hands is 1. Therefore, merging the two undetermined hand regions in FIG. 5A yields the single two-hand region in FIG. 5B.
  • Step 203 may include: if the undetermined hand area with the number of hands being 1 does not overlap with any undetermined hand area, determine the undetermined hand area with the number of hands being 1 as the single-hand area to be identified .
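  • A minimal sketch of the merge decision for two single-hand pending regions is shown below. It assumes axis-aligned boxes and, as an illustrative choice not stated in the disclosure, forms the merged two-hand region as the smallest box enclosing both inputs. All names are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Box(NamedTuple):
    """Axis-aligned box in pixel coordinates (hypothetical helper type)."""
    x0: float
    y0: float
    x1: float
    y1: float


def boxes_overlap(a: Box, b: Box) -> bool:
    """True if the two boxes share a region of positive area."""
    return (min(a.x1, b.x1) > max(a.x0, b.x0)
            and min(a.y1, b.y1) > max(a.y0, b.y0))


def enclosing_box(a: Box, b: Box) -> Box:
    """Smallest box containing both inputs (assumed merge rule)."""
    return Box(min(a.x0, b.x0), min(a.y0, b.y0),
               max(a.x1, b.x1), max(a.y1, b.y1))


def merge_or_keep(a: Box, b: Box) -> List[Tuple[str, Box]]:
    """Merge two single-hand pending regions if they overlap.

    Returns labelled regions: one "two_hands" region when the inputs
    overlap, otherwise the two inputs unchanged as "single_hand" regions.
    """
    if boxes_overlap(a, b):
        return [("two_hands", enclosing_box(a, b))]
    return [("single_hand", a), ("single_hand", b)]
```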
  • in the target image frame, determine the position of the hand image to obtain at least one undetermined hand region.
  • determining the position of the hand image may be implemented in various manners, which are not limited here.
  • the hand pose information may include at least one of the following but not limited to: three-dimensional rotation information, root node position information and size information.
  • the three-dimensional rotation information may indicate the degree of three-dimensional rotation of each joint of the human hand.
  • three-dimensional rotation information can be expressed in the form of Euler angles or rotation matrices.
  • the three-dimensional rotation information represented by Euler angles may include the rotation angles of a certain finger joint around the X axis, the Y axis, and the Z axis.
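  • The two representations mentioned above are interchangeable. As a hedged illustration, the sketch below converts per-axis Euler angles into a rotation matrix. The composition order (rotate about X, then Y, then Z, i.e. R = Rz·Ry·Rx) and the use of radians are assumptions; the disclosure does not fix a convention.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def euler_to_matrix(rx: float, ry: float, rz: float) -> Matrix:
    """Rotation matrix for angles (radians) about the X, Y and Z axes.

    Assumed convention: R = Rz @ Ry @ Rx, so the X rotation is applied first.
    """
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    rot_x = [[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]]
    rot_y = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]
    rot_z = [[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]]
    return matmul(rot_z, matmul(rot_y, rot_x))
```

  • For example, a rotation of 90 degrees about the Z axis alone maps the X unit vector onto the Y unit vector, which is the first column of the returned matrix.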
  • the position information of the root node may indicate a preset position of the root node of the hand in the hand area (for example, a tracking frame).
  • the location information of the root node may be represented by two-dimensional pixel coordinates of the root node in the image.
  • the hand root node can be a pre-specified hand location, such as the palm center point.
  • the size information may refer to the size of the interactive hand image in the image.
  • size information may be expressed in absolute or relative sizes.
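  • One possible container for the three items of hand posture information is sketched below. The class name, field names and the choice of Euler angles per joint are illustrative assumptions; the disclosure leaves the concrete representation (including the number of joints) open.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class HandPose:
    """Hypothetical record for one hand's posture information."""
    # Three-dimensional rotation of each joint as (x, y, z) Euler angles.
    joint_rotations: List[Tuple[float, float, float]]
    # Two-dimensional pixel coordinates of the hand root node.
    root_xy: Tuple[float, float]
    # Size of the hand image, absolute or relative depending on the application.
    size: float
```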
  • FIG. 7A shows the relevant parameters of the previous hand pose information in the previous image frame: S' denotes the size information, and a' and b' denote the two-dimensional pixel coordinates representing the position information of the root node. The three-dimensional rotation information is not shown.
  • FIG. 7B shows the relevant parameters of the current hand pose information in the target image frame: S denotes the size information, and a and b denote the two-dimensional pixel coordinates representing the position information of the root node. The three-dimensional rotation information is not shown.
  • adjusting the previous hand pose information of the previous hand region in the previous image frame includes, but is not limited to, at least one of the following: determining the three-dimensional rotation information in the current hand posture information according to the three-dimensional rotation information in the previous hand posture information; determining the relative position of the hand root node in the two-hand region to be recognized according to the relative position of the hand root node in the previous hand region; and determining the size information of the corresponding hand region in the two-hand region to be recognized according to the size information of the hand image in the previous hand region.
  • the hand pose can be restored by using three-dimensional rotation information, root node position information and size information to represent the hand pose information. Further, the restored hand pose is continuous with the hand pose of the previous image frame, so that when the restored hand pose is used in a sticker special effect scene, the sticker fits the hand image closely and looks natural.
  • the two-hand region to be recognized includes a first sub-region and a second sub-region; the hand pose information of the hand image in the first sub-region can be referred to as the first sub-pose information, and the hand pose information of the hand image in the second sub-region can be referred to as the second sub-pose information.
  • the previous hand region includes a third sub-region and a fourth sub-region.
  • the hand image in the third sub-region and the hand image in the first sub-region indicate the same hand, and the hand image in the fourth sub-region and the hand image in the second sub-region indicate the same hand.
  • the previous hand gesture information includes third sub-pose information and fourth sub-pose information.
  • the first sub-pose information of the first sub-region can be determined according to the third sub-pose information of the hand image in the third sub-region; according to the fourth sub-pose information of the hand image in the fourth sub-region , to determine the second sub-pose information of the second sub-region.
  • the adjustment for each hand image is based on its corresponding previous hand image (the corresponding hand image in the previous image frame), which avoids confusing the two hands during image recognition and ensures the accuracy of the pose information of each hand.
  • when the two-hand frame contains the appearance information of both the left hand and the right hand, the accuracy of a model's predicted pose results will be greatly reduced, so that effects driven by the hand poses look confused and unnatural.
  • the reasons may include: first, the left and right hands have a very similar appearance, so the model is easily disturbed by the appearance of the other hand when predicting the pose of one hand; second, the left and right hands have complex interaction relationships, so the occlusion between the two hands is often very complex. One hand may be almost completely occluded by the other, and such extreme scenes lack sufficient appearance information to predict the pose of the hand.
  • the third sub-pose information includes third sub-3D rotation information, third sub-root node position information, and third sub-size information.
  • the third sub-pose information may indicate the pose of a hand (eg, left hand) in the front hand region.
  • the third sub-pose information is taken as an example to illustrate how to correct the third sub-pose to obtain the first sub-pose information in the first sub-region of the current hand region.
  • the process of obtaining the second sub-pose information from the fourth sub-pose information is similar to the process of obtaining the first sub-pose information, and will not be repeated here.
  • determining the 3D rotation information in the current hand posture information according to the 3D rotation information in the previous hand posture information may include: determining the third sub-3D rotation information in the third sub-pose information as the first sub-3D rotation information.
  • the three-dimensional rotation information may not be changed.
  • the three-dimensional rotation information may have little effect on gesture driving. In this case, the three-dimensional rotation information may not be processed, thereby ensuring the accuracy of the driving effect and reducing the amount of calculation.
  • determining the relative position of the hand root node in the two-hand region to be recognized according to the relative position of the hand root node in the previous hand region may include: determining a first ratio of the width value in the third sub-root node information to the width value of the third sub-region, and determining the product of the first ratio and the width value of the first sub-region as the width value in the first sub-root node information; and determining a second ratio of the height value in the third sub-root node information to the height value of the third sub-region, and determining the product of the second ratio and the height value of the first sub-region as the height value in the first sub-root node information.
  • the relative position of the hand root node in the first sub-region is the same as the relative position of the hand root node in the third sub-region. This reduces the difference caused by movement or scaling of the hand region (such as the tracking frame), so the position of the hand root node can be determined accurately.
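  • The ratio-and-product rule above amounts to keeping the root node at the same relative position inside the region. A minimal sketch follows, assuming that the width and height values of the root node are measured from the region's own origin (for example, its top-left corner); the function and parameter names are hypothetical.

```python
from typing import Tuple


def transfer_root(prev_root: Tuple[float, float],
                  prev_region: Tuple[float, float],
                  cur_region: Tuple[float, float]) -> Tuple[float, float]:
    """Carry the root node's relative position from the previous sub-region
    to the current sub-region.

    prev_root:   (width value, height value) of the root node in the
                 third sub-region.
    prev_region: (width, height) of the third sub-region.
    cur_region:  (width, height) of the first sub-region.
    """
    first_ratio = prev_root[0] / prev_region[0]    # width ratio
    second_ratio = prev_root[1] / prev_region[1]   # height ratio
    return (first_ratio * cur_region[0], second_ratio * cur_region[1])
```

  • With a root node at (25, 50) in a 100 × 200 region, both ratios are 0.25, so in a 200 × 100 region the root node lands at (50, 25).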
  • determining the size information of the corresponding hand region in the two-hand region to be recognized according to the size information of the hand image in the previous hand region may include: determining a third ratio of the hand size value in the third sub-size information to the size value of the third sub-region in the previous image frame, and determining the product of the third ratio and the corresponding size value in the target image frame as the hand size value of the first sub-region.
  • the size information of the hand region can be understood as the proportion of the length of the hand region in the image frame.
  • the size of the hand region may indicate the length of the diagonal of the tracking box.
  • the area of the hand image can be effectively determined, and a more accurate hand image can be determined. Furthermore, in the scene of special sticker effects, the accurate determination of the size information can greatly improve the degree of fit between the special sticker effects and the hand image, and improve the naturalness of the special sticker effects.
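  • The size adjustment can be sketched in the same ratio-and-product form. The snippet assumes that a size value is a length such as the diagonal of a tracking box, and that the third ratio is taken against a reference size in the previous image frame and then multiplied by the matching reference size in the target image frame. Which reference size is used is an interpretation, and all names are hypothetical.

```python
import math


def diagonal(width: float, height: float) -> float:
    """Length of the diagonal of a box, used here as its size value."""
    return math.hypot(width, height)


def transfer_size(prev_hand_size: float,
                  prev_reference_size: float,
                  cur_reference_size: float) -> float:
    """Scale the previous hand size to the target image frame.

    third_ratio = previous hand size / previous reference size;
    the result is third_ratio * current reference size.
    """
    third_ratio = prev_hand_size / prev_reference_size
    return third_ratio * cur_reference_size
```

  • For instance, a hand that took up half of the reference size in the previous image frame is assigned half of the reference size in the target image frame.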
  • the above method may further include: when the current hand region is a single-hand region to be recognized, calling a single-hand pose estimation model to recognize the hand pose in each single-hand region to be recognized, to obtain the fifth hand posture information corresponding to each single-hand region to be recognized.
  • the hand posture information may indicate the hand pose in the single-hand region to be recognized. If there are two single-hand regions to be recognized, there may also be two pieces of hand posture information.
  • the present disclosure provides an embodiment of an image recognition device, which corresponds to the method embodiment shown in FIG. 1 , and the device can be used in various electronic devices.
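  • Putting the two branches together, the choice between adjusting the previous pose and calling a single-hand pose estimation model can be sketched as a simple dispatch. The labels and the two callables passed in are placeholders for components the disclosure describes only at a high level; nothing here is a real library API.

```python
from typing import Any, Callable

TWO_HANDS = "two_hands"
SINGLE_HAND = "single_hand"


def current_hand_pose(region_kind: str,
                      region: Any,
                      previous_pose: Any,
                      adjust_previous: Callable[[Any, Any], Any],
                      estimate_single: Callable[[Any], Any]) -> Any:
    """Select how the current hand pose is obtained for one region.

    - Two-hand region (the hand images overlap): adjust the pose from the
      previous image frame instead of predicting it from appearance.
    - Single-hand region: call the single-hand pose estimation model.
    """
    if region_kind == TWO_HANDS:
        return adjust_previous(previous_pose, region)
    return estimate_single(region)
```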
  • the image recognition device of this embodiment includes: a determination unit 801 and an adjustment unit 802 .
  • the determination unit is configured to determine the current hand region from the target image frame of the image frame sequence, wherein the current hand region is a single-hand region to be recognized or a two-hand region to be recognized, and the images corresponding to the two hands in the two-hand region to be recognized have an overlapping area; the adjustment unit is configured to, when the current hand region is the two-hand region to be recognized, adjust the previous hand posture information of the previous hand region in the previous image frame to obtain the current hand pose information of the two-hand region to be recognized, wherein the previous image frame includes an image frame in the image frame sequence that precedes the target image frame.
  • for the specific processing of the determination unit 801 and the adjustment unit 802 of the image recognition device and the technical effects they bring, refer to the relevant descriptions of step 101 and step 102 in the embodiment corresponding to FIG. 1 ; details are not repeated here.
  • the device is further configured to: add a hand image special effect to the target image frame according to the current hand pose information.
  • determining the current hand region from the target image frame of the image frame sequence includes: determining the position of the hand image in the target image frame to obtain at least one pending hand region; determining a processing mode for each pending hand region based on the positional relationship of the hand images in the pending hand regions, wherein the processing mode includes at least one of the following: no processing, splitting, and merging; and processing each pending hand region using the determined processing mode to obtain the current hand region.
  • determining the processing mode of the pending hand region based on the positional relationship of the hand images in the pending hand region includes: determining the number of hands in each pending hand region; for a pending hand region in which the number of hands is not less than 2, determining whether to split the pending hand region; for at least two pending hand regions in each of which the number of hands is 1, determining whether to merge those pending hand regions.
  • determining whether to split the pending hand region includes: in the pending hand region in which the number of hands is not less than 2, locating the first hand image to obtain a pending first sub-region, and locating the second hand image to obtain a pending second sub-region; and processing each pending hand region using the determined processing mode to obtain the current hand region includes: in response to determining that the pending first sub-region and the pending second sub-region overlap, not splitting the pending hand region and determining it as a two-hand region to be recognized; in response to determining that the pending first sub-region and the pending second sub-region do not overlap, splitting the pending hand region to obtain single-hand regions to be recognized.
  • determining whether to merge the pending hand regions in which the number of hands is 1 includes: determining whether any two pending hand regions in which the number of hands is 1 overlap; and processing each pending hand region using the determined processing mode to obtain the current hand region includes: merging pending hand regions that overlap to obtain a two-hand region to be recognized; if a pending hand region in which the number of hands is 1 does not overlap any other pending hand region, determining that pending hand region as a single-hand region to be recognized.
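The split and merge rules above both reduce to one question: do the boxes around two hands overlap? The following is a minimal sketch of that decision, not the disclosure's implementation. It assumes each hand has already been located as an axis-aligned box, and the names `Box`, `overlaps`, `union` and `resolve_regions`, as well as the greedy pairwise merge, are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned box: (x0, y0) is the top-left corner, (x1, y1) the bottom-right."""
    x0: float
    y0: float
    x1: float
    y1: float


def overlaps(a: Box, b: Box) -> bool:
    """Return True when the two boxes share a region of non-zero area."""
    return min(a.x1, b.x1) > max(a.x0, b.x0) and min(a.y1, b.y1) > max(a.y0, b.y0)


def union(a: Box, b: Box) -> Box:
    """Smallest box that contains both inputs."""
    return Box(min(a.x0, b.x0), min(a.y0, b.y0), max(a.x1, b.x1), max(a.y1, b.y1))


def resolve_regions(hand_boxes: List[Box]) -> List[Tuple[str, Box]]:
    """Label each hand box as a single-hand region, or merge an overlapping
    pair into one two-hand region (greedy pairing is an assumption)."""
    used: Set[int] = set()
    result: List[Tuple[str, Box]] = []
    for i, box in enumerate(hand_boxes):
        if i in used:
            continue
        used.add(i)
        partner: Optional[int] = next(
            (j for j in range(i + 1, len(hand_boxes))
             if j not in used and overlaps(box, hand_boxes[j])),
            None,
        )
        if partner is None:
            result.append(("single_hand", box))       # no overlap: keep or split off
        else:
            used.add(partner)
            result.append(("two_hand", union(box, hand_boxes[partner])))  # overlap: keep together or merge
    return result
```

Working from per-hand boxes covers both cases in the text: a pending region holding two non-overlapping hands ends up as two single-hand regions, and two pending single-hand regions that overlap end up as one two-hand region.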
  • determining the position of the hand image in the target image frame to obtain at least one pending hand region includes: when the previous image frame includes a hand image, tracking the previous hand region of the previous image frame to determine the pending hand region of the target image frame; when it is determined that the previous image frame does not include a hand image, performing hand image recognition on the target image frame to obtain the pending hand region.
  • tracking the previous hand region of the previous image frame to determine the pending hand region of the target image frame includes: if the previous hand region is a two-hand region, adjusting the two-hand region in the previous hand region to obtain a pending hand region that includes the images of both hands.
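The choice between tracking and fresh detection described above can be sketched as follows. This is only an illustration under stated assumptions: `track_region` and `detect_hands` are hypothetical callables standing in for whatever tracker and hand detector an implementation uses, and the disclosure does not name them.

```python
from typing import Any, Callable, Iterable, List, Sequence


def pending_hand_regions(
    frame: Any,
    previous_regions: Sequence[Any],
    track_region: Callable[[Any, Any], Any],
    detect_hands: Callable[[Any], Iterable[Any]],
) -> List[Any]:
    """Return the pending hand regions for the target frame."""
    if previous_regions:
        # The previous frame contained hands: follow each previous region.
        return [track_region(frame, region) for region in previous_regions]
    # No hands in the previous frame: run hand image recognition from scratch.
    return list(detect_hands(frame))
```

Tracking is typically cheaper than detection, which is presumably why it is tried first whenever a previous hand region exists.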
  • the hand pose information includes at least one of the following: three-dimensional rotation information, hand root node information, and size information; and adjusting the previous hand pose information of the previous hand region in the previous image frame includes at least one of the following: determining the three-dimensional rotation information in the current hand pose information according to the three-dimensional rotation information in the previous hand pose information; determining the relative position of the hand root node in the two-hand region to be recognized according to the relative position of the hand root node in the previous hand region; determining the size information of the corresponding hand region in the two-hand region to be recognized according to the size information of the hand image in the previous hand region.
  • the two-hand region to be recognized includes a first sub-region and a second sub-region, and the previous hand region includes a third sub-region and a fourth sub-region; the hand image in the third sub-region and the hand image in the first sub-region indicate the same hand, and the hand image in the fourth sub-region and the hand image in the second sub-region indicate the same hand; the first sub-pose information of the first sub-region is determined according to the third sub-pose information of the hand image in the third sub-region, and the second sub-pose information of the second sub-region is determined according to the fourth sub-pose information of the hand image in the fourth sub-region.
  • the third sub-pose information includes third sub-3D rotation information; and determining the 3D rotation information in the previous hand pose information as the 3D rotation information in the current hand pose information includes: determining the third sub-3D rotation information in the third sub-pose information as the first sub-3D rotation information.
  • the third sub-pose information includes third sub-root node position information; and determining the relative position of the hand root node in the two-hand region to be recognized according to the relative position of the hand root node in the previous hand region includes: determining a first ratio of the width value in the third sub-root node information to the width value of the third sub-region, and determining the product of the first ratio and the width value of the first sub-region as the width value in the first sub-root node information; determining a second ratio of the height value in the third sub-root node information to the height value of the third sub-region, and determining the product of the second ratio and the height value of the first sub-region as the height value in the first sub-root node information.
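The root node rule above can be written compactly. The symbols are assumptions introduced here for illustration, not notation from the disclosure: (x_3, y_3) is the root node position in the third sub-region, which has width W_3 and height H_3; (x_1, y_1) is the resulting root node position in the first sub-region, which has width W_1 and height H_1; r_1 and r_2 are the first and second ratios.

```latex
\begin{aligned}
r_1 &= \frac{x_3}{W_3}, & x_1 &= r_1 \, W_1, \\
r_2 &= \frac{y_3}{H_3}, & y_1 &= r_2 \, H_1 .
\end{aligned}
```

For example, with x_3 = 25, W_3 = 100 and W_1 = 200, the first ratio is r_1 = 0.25 and x_1 = 50; with y_3 = 30, H_3 = 60 and H_1 = 120, the second ratio is r_2 = 0.5 and y_1 = 60. The root node keeps the same relative position even though the sub-region has doubled in size.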
  • the third sub-pose information includes third sub-size information; and determining the size information of the corresponding hand region in the two-hand region to be recognized according to the size information of the hand image in the previous hand region includes: determining a third ratio of the hand size value in the third sub-size information to the size value of the third sub-region of the previous image frame, and determining the product of the third ratio and the size value of the target image frame as the hand size value of the first sub-region.
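Taken together, the rotation, root node and size rules amount to carrying one hand's pose from the third sub-region (previous frame) to the first sub-region (target frame). The sketch below is a minimal illustration under loudly stated assumptions: the `SubRegionPose` layout and the `adjust_pose` name are hypothetical, the root node is assumed to be measured from the sub-region's own origin, and the scalar "size value" is assumed to be the sub-region width, which the text does not specify.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SubRegionPose:
    """Pose of one hand inside its sub-region (field layout is an assumption)."""
    rotation: Tuple[float, float, float]  # 3D rotation; the representation is left open
    root: Tuple[float, float]             # root node (x, y), measured from the sub-region origin
    hand_size: float                      # scalar hand size value
    region_size: Tuple[float, float]      # (width, height) of the sub-region


def adjust_pose(prev: SubRegionPose, cur_region_size: Tuple[float, float]) -> SubRegionPose:
    """Derive the pose in the current sub-region from the pose in the previous one."""
    prev_w, prev_h = prev.region_size
    cur_w, cur_h = cur_region_size
    if prev_w <= 0 or prev_h <= 0:
        raise ValueError("previous sub-region must have positive width and height")
    first_ratio = prev.root[0] / prev_w    # root x relative to the region width
    second_ratio = prev.root[1] / prev_h   # root y relative to the region height
    third_ratio = prev.hand_size / prev_w  # assumption: width serves as the size value
    return SubRegionPose(
        rotation=prev.rotation,            # 3D rotation is carried over unchanged
        root=(first_ratio * cur_w, second_ratio * cur_h),
        hand_size=third_ratio * cur_w,
        region_size=(cur_w, cur_h),
    )
```

The same function would be applied once per hand: the third sub-pose yields the first sub-pose, and the fourth sub-pose yields the second.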
  • the device is further configured to: when the current hand region is a single-hand region to be recognized, invoke a single-hand pose estimation model to recognize the hand pose in each single-hand region to be recognized, obtaining the hand pose information corresponding to each single-hand region to be recognized.
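The overall dispatch between the two cases can be sketched as below. This is a hedged illustration only: `single_hand_model` and `adjust_previous` are hypothetical callables, the string labels are invented for the example, and raising an error when no previous pose exists is an assumption, since the text does not say what happens in that case.

```python
from typing import Any, Callable, Optional


def current_pose(
    region_kind: str,
    region: Any,
    previous_pose: Optional[Any],
    single_hand_model: Callable[[Any], Any],
    adjust_previous: Callable[[Any, Any], Any],
) -> Any:
    """Pick the pose source according to the kind of the current hand region."""
    if region_kind == "single_hand":
        # Hands do not overlap: the single-hand pose estimation model is used.
        return single_hand_model(region)
    if region_kind == "two_hand":
        # Hands overlap: reuse and adjust the pose from the previous frame.
        if previous_pose is None:
            raise ValueError("no previous pose to adjust for a two-hand region")
        return adjust_previous(previous_pose, region)
    raise ValueError(f"unknown region kind: {region_kind!r}")
```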
  • FIG. 9 shows an exemplary system architecture in which the image recognition method of an embodiment of the present disclosure can be applied.
  • the system architecture may include terminal devices 901, 902 and 903, a network 904, and a server 905.
  • the network 904 serves as the medium that provides communication links between the terminal devices 901, 902, 903 and the server 905.
  • the network 904 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
  • the terminal devices 901, 902, 903 can interact with the server 905 through the network 904 to receive or send messages and the like.
  • client applications such as web browser applications, search applications, and news information applications, may be installed on the terminal devices 901, 902, and 903.
  • the client applications in the terminal devices 901, 902, and 903 can receive user instructions and complete corresponding functions according to the user instructions, such as adding corresponding information to information according to the user instructions.
  • Terminal devices 901, 902, and 903 may be hardware or software.
  • when the terminal devices 901, 902, and 903 are hardware, they may be various electronic devices that have display screens and support web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers and desktop computers.
  • when the terminal devices 901, 902, and 903 are software, they can be installed in the electronic devices listed above, and can be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.
  • the server 905 may be a server that provides various services, for example receiving information acquisition requests sent by the terminal devices 901, 902, and 903, obtaining the display information corresponding to each request in various ways, and sending the relevant data of the display information to the terminal devices 901, 902, and 903.
  • the image recognition method provided by the embodiment of the present disclosure may be executed by a terminal device, and correspondingly, the image recognition apparatus may be set in the terminal devices 901, 902, and 903.
  • the image recognition method provided by the embodiment of the present disclosure may also be executed by the server 905, and correspondingly, the image recognition apparatus may be set in the server 905.
  • the numbers of terminal devices, networks and servers in FIG. 9 are only illustrative. There can be any number of terminal devices, networks and servers, according to implementation needs.
  • FIG. 10 shows a schematic structural diagram of an electronic device (such as the terminal device or server in FIG. 9 ) suitable for implementing the embodiments of the present disclosure.
  • the terminal device in the embodiment of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players) and vehicle-mounted terminals (such as car navigation terminals), and fixed terminals such as digital TVs and desktop computers.
  • the electronic device shown in FIG. 10 is only an example, and should not limit the functions and application scope of the embodiments of the present disclosure.
  • an electronic device may include a processing device (such as a central processing unit, a graphics processing unit, etc.) 1001, which can execute various appropriate actions and processing according to a program stored in a read-only memory (ROM) 1002 or a program loaded from a storage device 1008 into a random access memory (RAM) 1003. The RAM 1003 also stores various programs and data necessary for the operation of the electronic device 1000.
  • the processing device 1001, ROM 1002, and RAM 1003 are connected to each other through a bus 1004.
  • An input/output (I/O) interface 1005 is also connected to the bus 1004 .
  • the following devices can be connected to the I/O interface 1005: input devices 1009 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 1007 including, for example, a liquid crystal display (LCD), speaker, vibrator, etc.; storage devices 1008 including, for example, a magnetic tape, hard disk, etc.; and a communication device 1008.
  • the communication means 1008 may allow the electronic device to communicate with other devices wirelessly or by wire to exchange data. While FIG. 10 shows an electronic device having various means, it is to be understood that implementing or having all of the means shown is not a requirement. More or fewer means may alternatively be implemented or provided.
  • embodiments of the present disclosure include a computer program product, which includes a computer program carried on a non-transitory computer readable medium, where the computer program includes program code for executing the method shown in the flowchart.
  • the computer program may be downloaded and installed from a network via communication means 1008, or installed from storage means 1008, or installed from ROM 1002.
  • when the computer program is executed by the processing device 1001, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are executed.
  • the above-mentioned computer-readable medium in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination of the above two.
  • a computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to, an electrical connection with one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium may be transmitted by any appropriate medium, including but not limited to wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
  • the client and the server can communicate using any currently known or future-developed network protocol, such as HTTP (HyperText Transfer Protocol), and can be interconnected with any form or medium of digital data communication (e.g., a communication network).
  • Examples of communication networks include local area networks ("LANs"), wide area networks ("WANs"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed network.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device, or may exist independently without being incorporated into the electronic device.
  • the above-mentioned computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, they cause the electronic device to: determine the current hand region from the target image frame of the image frame sequence, wherein the current hand region is a single-hand region to be recognized or a two-hand region to be recognized, and the images corresponding to the two hands in the two-hand region to be recognized have an overlapping region; and when the current hand region is a two-hand region to be recognized, adjust the previous hand pose information of the previous hand region in the previous image frame to obtain the current hand pose information of the hand region to be recognized, wherein the previous image frame includes an image frame that precedes the target image frame in the image frame sequence.
  • the electronic device is further configured to: add a hand image special effect to the target image frame according to the current hand pose information.
  • determining the current hand region from the target image frame of the image frame sequence includes: determining the position of the hand image in the target image frame to obtain at least one pending hand region; determining a processing mode for each pending hand region based on the positional relationship of the hand images in the pending hand regions, wherein the processing mode includes at least one of the following: no processing, splitting, and merging; and processing each pending hand region using the determined processing mode to obtain the current hand region.
  • determining the processing mode of the pending hand region based on the positional relationship of the hand images in the pending hand region includes: determining the number of hands in each pending hand region; for a pending hand region in which the number of hands is not less than 2, determining whether to split the pending hand region; for at least two pending hand regions in each of which the number of hands is 1, determining whether to merge those pending hand regions.
  • determining whether to split the pending hand region includes: in the pending hand region in which the number of hands is not less than 2, locating the first hand image to obtain a pending first sub-region, and locating the second hand image to obtain a pending second sub-region; and processing each pending hand region using the determined processing mode to obtain the current hand region includes: in response to determining that the pending first sub-region and the pending second sub-region overlap, not splitting the pending hand region and determining it as a two-hand region to be recognized; in response to determining that the pending first sub-region and the pending second sub-region do not overlap, splitting the pending hand region to obtain single-hand regions to be recognized.
  • determining whether to merge the pending hand regions in which the number of hands is 1 includes: determining whether any two pending hand regions in which the number of hands is 1 overlap; and processing each pending hand region using the determined processing mode to obtain the current hand region includes: merging pending hand regions that overlap to obtain a two-hand region to be recognized; if a pending hand region in which the number of hands is 1 does not overlap any other pending hand region, determining that pending hand region as a single-hand region to be recognized.
  • determining the position of the hand image in the target image frame to obtain at least one pending hand region includes: when the previous image frame includes a hand image, tracking the previous hand region of the previous image frame to determine the pending hand region of the target image frame; when it is determined that the previous image frame does not include a hand image, performing hand image recognition on the target image frame to obtain the pending hand region.
  • tracking the previous hand region of the previous image frame to determine the pending hand region of the target image frame includes: if the previous hand region is a two-hand region, adjusting the two-hand region in the previous hand region to obtain a pending hand region that includes the images of both hands.
  • the hand pose information includes at least one of the following: three-dimensional rotation information, hand root node information, and size information; and adjusting the previous hand pose information of the previous hand region in the previous image frame includes at least one of the following: determining the three-dimensional rotation information in the current hand pose information according to the three-dimensional rotation information in the previous hand pose information; determining the relative position of the hand root node in the two-hand region to be recognized according to the relative position of the hand root node in the previous hand region; determining the size information of the corresponding hand region in the two-hand region to be recognized according to the size information of the hand image in the previous hand region.
  • the two-hand region to be recognized includes a first sub-region and a second sub-region, and the previous hand region includes a third sub-region and a fourth sub-region; the hand image in the third sub-region and the hand image in the first sub-region indicate the same hand, and the hand image in the fourth sub-region and the hand image in the second sub-region indicate the same hand; the first sub-pose information of the first sub-region is determined according to the third sub-pose information of the hand image in the third sub-region, and the second sub-pose information of the second sub-region is determined according to the fourth sub-pose information of the hand image in the fourth sub-region.
  • the third sub-pose information includes third sub-3D rotation information; and determining the 3D rotation information in the previous hand pose information as the 3D rotation information in the current hand pose information includes: determining the third sub-3D rotation information in the third sub-pose information as the first sub-3D rotation information.
  • the third sub-pose information includes third sub-root node position information; and determining the relative position of the hand root node in the two-hand region to be recognized according to the relative position of the hand root node in the previous hand region includes: determining a first ratio of the width value in the third sub-root node information to the width value of the third sub-region, and determining the product of the first ratio and the width value of the first sub-region as the width value in the first sub-root node information; determining a second ratio of the height value in the third sub-root node information to the height value of the third sub-region, and determining the product of the second ratio and the height value of the first sub-region as the height value in the first sub-root node information.
  • the third sub-pose information includes third sub-size information; and determining the size information of the corresponding hand region in the two-hand region to be recognized according to the size information of the hand image in the previous hand region includes: determining a third ratio of the hand size value in the third sub-size information to the size value of the third sub-region of the previous image frame, and determining the product of the third ratio and the size value of the target image frame as the hand size value of the first sub-region.
  • the electronic device is further configured to: when the current hand region is a single-hand region to be recognized, call a single-hand pose estimation model to recognize the hand pose in each single-hand region to be recognized, obtaining the hand pose information corresponding to each single-hand region to be recognized.
  • Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, or combinations thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments described in the present disclosure may be implemented by software or by hardware. Wherein, the name of the unit does not constitute a limitation of the unit itself under certain circumstances, for example, the determination unit may also be described as "a unit for determining the current hand region".
  • exemplary types of hardware logic components that can be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing.
  • more specific examples of machine-readable storage media include an electrical connection based on one or more wires, a portable computer disc, a hard drive, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, compact disk read-only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.

Abstract

Embodiments of the present application disclose an image recognition method and apparatus, and an electronic device. A specific implementation of the method comprises: determining the current hand region from a target image frame of an image frame sequence, wherein the current hand region is a single-hand region to be recognized or a two-hand region to be recognized, and images corresponding to two hands in the two-hand region to be recognized have an overlapping region; and when the current hand region is the two-hand region to be recognized, adjusting the preceding hand pose information of a preceding hand region in a preceding image frame to obtain the current hand pose information of the hand region to be recognized, wherein the preceding image frame comprises an image frame before the target image frame in the image frame sequence.

Description

Image recognition method, apparatus and electronic device
Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. 202110999935.5, titled "Image Recognition Method, Apparatus and Electronic Device" and filed on August 27, 2021, the entire content of which is incorporated herein by reference.
Technical Field
The present disclosure relates to the field of computer technology, and in particular to an image recognition method, an apparatus and an electronic device.
Background
With the development of computer technology, users increasingly use terminal devices to implement various functions.
In some application scenarios, an electronic device can recognize the user's hand pose and respond according to it, so that the user can interact with the electronic device.
Summary
This Summary is provided to introduce, in simplified form, concepts that are described in detail in the Detailed Description below. It is not intended to identify key features or essential features of the claimed technical solution, nor is it intended to limit the scope of the claimed technical solution.
In a first aspect, an embodiment of the present disclosure provides an image recognition method, including: determining a current hand region from a target image frame of an image frame sequence, wherein the current hand region is a single-hand region to be recognized or a two-hand region to be recognized, and the images corresponding to the two hands in the two-hand region to be recognized have an overlapping region; and when the current hand region is a two-hand region to be recognized, adjusting the preceding hand pose information of a preceding hand region in a preceding image frame to obtain the current hand pose information of the hand region to be recognized, wherein the preceding image frame includes an image frame that precedes the target image frame in the image frame sequence.
In a second aspect, an embodiment of the present disclosure provides an image recognition apparatus, including: a determination unit configured to determine a current hand region from a target image frame of an image frame sequence, wherein the current hand region is a single-hand region to be recognized or a two-hand region to be recognized, and the images corresponding to the two hands in the two-hand region to be recognized have an overlapping region; and an adjustment unit configured to, when the current hand region is a two-hand region to be recognized, adjust the preceding hand pose information of a preceding hand region in a preceding image frame to obtain the current hand pose information of the hand region to be recognized, wherein the preceding image frame includes an image frame that precedes the target image frame in the image frame sequence.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the image recognition method described in the first aspect.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the image recognition method described in the first aspect.
Description of the Drawings
The above and other features, advantages, and aspects of the embodiments of the present disclosure will become more apparent with reference to the following detailed description taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic and that components and elements are not necessarily drawn to scale.
FIG. 1 is a flowchart of an embodiment of the image recognition method according to the present disclosure;
FIG. 2 is a flowchart of an exemplary implementation of the image recognition method according to the present disclosure;
FIG. 3 is a flowchart of another exemplary implementation of the image recognition method according to the present disclosure;
FIG. 4 is a flowchart of an exemplary implementation of the image recognition method according to the present disclosure;
FIG. 5A is a schematic diagram of yet another application scenario of the image recognition method of the present disclosure;
FIG. 5B is a schematic diagram of yet another application scenario of the image recognition method of the present disclosure;
FIG. 6 is a schematic diagram of yet another application scenario of the image recognition method of the present disclosure;
FIG. 7A is a schematic diagram of another application scenario of the image recognition method of the present disclosure;
FIG. 7B is a schematic diagram of another application scenario of the image recognition method of the present disclosure;
FIG. 8 is a schematic structural diagram of an embodiment of the image recognition apparatus according to the present disclosure;
FIG. 9 shows an exemplary system architecture to which the image recognition method of an embodiment of the present disclosure can be applied;
FIG. 10 is a schematic diagram of the basic structure of an electronic device provided according to an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the protection scope of the present disclosure.
It should be understood that the steps described in the method embodiments of the present disclosure may be performed in a different order and/or in parallel. In addition, the method embodiments may include additional steps and/or omit some of the illustrated steps. The scope of the present disclosure is not limited in this respect.
As used herein, the term "include" and its variants are open-ended, that is, "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one further embodiment"; the term "some embodiments" means "at least some embodiments". Definitions of other terms are given in the description below.
It should be noted that concepts such as "first" and "second" mentioned in the present disclosure are used only to distinguish different devices, modules, or units, and are not intended to limit the order of, or the interdependence between, the functions performed by these devices, modules, or units.
It should be noted that the modifiers "one" and "multiple" mentioned in the present disclosure are illustrative rather than restrictive. Those skilled in the art should understand that, unless the context clearly indicates otherwise, they should be read as "one or more".
The names of messages or information exchanged between multiple devices in the embodiments of the present disclosure are used for illustrative purposes only and are not intended to limit the scope of these messages or information.
Please refer to FIG. 1, which shows the flow of an embodiment of the image recognition method according to the present disclosure. The image recognition method can be applied to a terminal device. As shown in FIG. 1, the image recognition method includes the following steps:
Step 101: determine a current hand region in a target image frame of an image frame sequence.
In this embodiment, the executing entity of the image recognition method (for example, a terminal device) may determine the current hand region from the target image frame of the image frame sequence.
Here, the image frame sequence may include at least two image frames. The target image frame may be any image frame in the image frame sequence.
In this embodiment, the current hand region may be a single-hand region to be recognized or a two-hand region to be recognized.
Here, a single-hand region to be recognized may include an image of a single hand.
Here, a two-hand region to be recognized may include images of two hands, and the images of the two hands have an overlapping area.
Step 102: when the current hand region is a two-hand region to be recognized, adjust the previous hand pose information of the previous hand region in the previous image frame to obtain the current hand pose information of the hand region to be recognized.
Here, the previous image frame may include an image frame that precedes the target image frame in the image frame sequence.
Optionally, there may be one previous image frame or at least two. The previous image frame and the target image frame may or may not be adjacent.
Here, the hand image region in the previous image frame may be referred to as the previous hand region. The previous hand pose information may indicate the hand pose in the previous hand region of the previous image frame. It can be understood that the previous hand region may include two hands, and the two hands may occlude each other (that is, an overlapping area exists) or may not occlude each other. Accordingly, the previous hand region may include a two-hand region (which includes images of two hands that have an overlapping area) or a single-hand region.
It can be understood that the hand image in the previous hand region and the hand image in the current hand region may belong to the same person.
In this embodiment, the current hand pose information may indicate the hand pose in the current hand region of the target image frame.
In this embodiment, the previous hand pose information may indicate the hand pose in the previous hand region of the previous image frame.
In some application scenarios, the hand pose information may include, but is not limited to, at least one of the following: three-dimensional rotation information of each joint of the hand, two-dimensional position information of the hand root node in the image frame, and size information of the hand image in the image frame. It can be understood that the specific items of the hand pose information may be set according to the actual application scenario.
In this embodiment, the manner of adjusting the previous hand pose information may be set according to the actual application scenario and is not limited here.
As an example, one or more items of the previous hand pose information may be adjusted to obtain the current hand pose information.
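The items of hand pose information listed above, and the idea of adjusting one or more of them, can be sketched as follows. This is a minimal illustration only: the `HandPose` record, its field names, and the `adjust_pose` helper are hypothetical and do not appear in the disclosure, which leaves the adjustment method open.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, Tuple


@dataclass
class HandPose:
    """Hypothetical container for the hand pose items named in the text."""
    # Per-joint 3D rotation, here as Euler angles (x, y, z); joint names are assumed.
    joint_rotations: Dict[str, Tuple[float, float, float]] = field(default_factory=dict)
    # 2D pixel position of the hand root node in the image frame.
    root_xy: Tuple[float, float] = (0.0, 0.0)
    # Size of the hand image in the image frame (absolute or relative).
    size: float = 0.0


def adjust_pose(previous: HandPose, **changes) -> HandPose:
    """Return a new pose that copies `previous` and overrides only the given items."""
    return replace(previous, **changes)
```

For instance, `adjust_pose(previous, root_xy=(12.0, 21.0))` changes only the root node position and carries the rotation and size items over unchanged.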
It should be noted that the image recognition method provided in this embodiment first determines the current hand region from the target image frame of the image frame sequence. The current hand region may be a single-hand region to be recognized or a two-hand region to be recognized, and the images corresponding to the two hands in a two-hand region to be recognized have an overlapping area. Then, when the current hand region is a two-hand region to be recognized, the previous hand pose information of the previous hand region in the previous image frame is adjusted to obtain the current hand pose information of the hand region to be recognized. A new image recognition method is thereby provided.
It should be noted that this image recognition method adjusts the previous hand pose information of the previous image frame to obtain the current hand pose of the two-hand region in the target image frame. When the images of the two hands overlap, the nature of the scene makes it difficult to recognize the pose of the occluded hand directly. A video frame sequence, however, can represent the change from the hand pose in the previous image frame to the hand pose in the target image frame. Adjusting the previous hand pose information according to this change to obtain the current hand pose information can improve the accuracy of the determined hand pose information in scenes where the two hands occlude each other.
In some embodiments, the above method may further include: adding a hand image special effect to the target image frame according to the current hand pose information.
Optionally, the hand image special effect may take various forms, for example, a sticker effect (a sticker added over the hand image so as to cover it).
It should be noted that a hand image special effect added according to the current hand pose information fits the hand image well and looks natural. Specifically, even if the current hand pose information obtained from the previous hand pose information of the previous image frame deviates somewhat from the real hand pose, it exploits the fact that a hand pose is unlikely to change abruptly, so the hand image special effect can still be kept consistent with the hand pose in the target image frame. As a result, in scenarios where the hand pose drives the hand image special effect, a stable and natural driving effect is produced when the two hands are close to each other or even overlap.
In some embodiments, step 101 may include step 201, step 202, and step 203 shown in FIG. 2.
Step 201: in the target image frame, determine hand image positions to obtain at least one pending hand region.
Here, the pending hand region may be determined in various ways. In some application scenarios, a pending hand region can be understood as a hand region obtained by preliminary localization.
In this embodiment, the pending hand region can be understood as a preliminarily determined hand region.
It can be understood that determining hand image positions in the target image frame may yield zero, one, two, or more than two pending hand regions. If there are zero, the target image frame does not include a hand image, and this case need not be processed by this embodiment. If there is one, the hand image in that pending hand region may be a single-hand image or a two-hand image. If there are two, the hand image in each pending hand region may be a single-hand image or a two-hand image. If more than two pending hand regions are recognized, the hand images may be divided into groups according to image features, with the hand images in each group corresponding to the same person, so that the pending hand regions corresponding to each person are obtained.
In other words, when the target image frame contains hand images of multiple people, multiple region groups can be obtained, each including at least one pending hand region. For ease of description, this application takes at least one pending hand region belonging to the same person as an example. Since a person generally has no more than two hands, the number of pending hand regions may be one or two.
In this embodiment, a pending hand region may be indicated by region indication information. As an example, a tracking box may be used to indicate a pending hand region.
Step 202: based on the positional relationship of the hand images in the pending hand regions, determine how each pending hand region is to be processed.
Here, the processing manner may include, but is not limited to, at least one of the following: no processing, splitting, and merging.
Here, the positional relationship of the hand images may be that an overlapping area exists or that no overlapping area exists.
Optionally, an overlapping area exists when the two hand images in a two-hand-type region overlap, or when two hand images located in single-hand-type regions overlap.
Optionally, no overlapping area exists when the two hand images located in two separate single-hand-type regions do not overlap, or when the two hand images in a two-hand-type region do not overlap.
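Whether two hand images overlap could be approximated by testing whether their tracking boxes intersect. The sketch below assumes axis-aligned rectangular boxes; the `Box` type and the `boxes_overlap` name are illustrative, and the disclosure does not prescribe this particular test.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned tracking box in pixel coordinates (an assumed representation)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def boxes_overlap(a: Box, b: Box) -> bool:
    """True when the two boxes share an area of non-zero size.

    Boxes that merely touch along an edge are treated as not overlapping.
    """
    return (a.x_min < b.x_max and b.x_min < a.x_max
            and a.y_min < b.y_max and b.y_min < a.y_max)
```

A pixel-level test on hand masks would be stricter than this box test; which one is appropriate depends on how the hand regions are represented in practice.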
Step 203: process each pending hand region in the determined manner to obtain the current hand region.
In some application scenarios, the current hand region obtained by the above processing places, as far as possible, two hand images that overlap in one two-hand-type region, and a single-hand image that does not overlap any other hand image in a single-hand-type region.
It should be noted that processing the pending hand regions based on the positional relationship of the hand images in them yields an accurate single-hand image or two-hand image, and avoids recognition errors caused by wrongly determined pending hand regions (for example, a region that includes two independent single-hand images, or two regions that overlap).
In some embodiments, step 201 may be implemented by calling a hand detection model. The hand detection model can detect a pending hand region in the target image frame, for example, a rectangular box containing the appearance of a hand.
In some embodiments, step 201 may include: when the previous image frame includes a hand image, adjusting the previous hand region of the previous image frame to obtain the pending hand region of the target image frame; and when it is determined that the previous image frame does not include a hand image, performing hand image recognition on the target image frame to obtain the pending hand region.
As an example, a hand tracking model may be called to locate the pending hand region of the target image frame near the previous hand region of the previous image frame.
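The choice between tracking and detection described above can be sketched as a small dispatch function. The `track` and `detect` callables stand in for the hand tracking model and the hand detection model; their signatures are assumptions made for this illustration.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed layout


def locate_pending_regions(frame,
                           previous_regions: Sequence[Box],
                           track: Callable[[object, Sequence[Box]], Sequence[Box]],
                           detect: Callable[[object], Sequence[Box]]) -> List[Box]:
    """Return pending hand regions for the target image frame.

    If the previous image frame contained hand regions, adjust them by tracking;
    otherwise fall back to hand detection on the target image frame.
    """
    if previous_regions:
        return list(track(frame, previous_regions))
    return list(detect(frame))
```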
It should be noted that obtaining the pending hand region from the position of the previous hand region exploits the fact that a hand moves at a limited speed, so the hand position in the target image frame is very likely to be close to its position in the previous image frame. This avoids searching the entire image for the hand region and reduces the time and computation needed to determine it.
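One way to restrict the search to the neighborhood of the previous hand region is to enlarge the previous tracking box by a margin and clip it to the image. This is only a sketch under assumptions: the `search_window` name and the `margin_ratio` parameter are hypothetical, and the disclosure does not specify how the neighborhood is chosen.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed layout


def search_window(prev_box: Box, image_w: int, image_h: int,
                  margin_ratio: float = 0.5) -> Box:
    """Enlarge the previous hand box by a margin and clip it to the image bounds.

    margin_ratio is an assumed tuning parameter: the fraction of the box width
    (or height) added on each side to allow for hand motion between frames.
    """
    x0, y0, x1, y1 = prev_box
    mx = (x1 - x0) * margin_ratio
    my = (y1 - y0) * margin_ratio
    return (max(0.0, x0 - mx), max(0.0, y0 - my),
            min(float(image_w), x1 + mx), min(float(image_h), y1 + my))
```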
In some embodiments, different tracking logic may be used depending on whether the two hands have an overlapping area. When the two hands are close together and overlap, the hand tracking model may locate both hands in one rectangular box (which may be called a two-hand-type region) and track the two hands as a whole. When the two hands are some distance apart and do not overlap, the left and right hands are treated as independent objects and tracked separately (a rectangular box containing the appearance of one hand may be called a single-hand-type region). This improves tracking accuracy.
In some embodiments, tracking the previous hand region of the previous image frame to determine the pending hand region of the target image frame, when the previous image frame includes a hand image, includes: if the previous hand region is a two-hand region, adjusting the two-hand region of the previous hand region to obtain a pending hand region that includes a two-hand image.
When the two hands are close to each other but do not overlap, their similar appearance means that their tracking processes may interfere with each other: for example, the tracker for the left hand may jump to the right hand, the tracker for the right hand may jump to the left hand, or the tracking results may become completely confused. Using different tracking logic depending on whether the two hands overlap addresses this as follows: when the two hands are close together and overlap, the tracking confusion that easily arises when each hand is tracked on its own is avoided; when the two hands do not overlap, tracking the two hands separately effectively ensures accurate tracking.
It should be noted that the tracking approach provided in this application determines the current hand region from the position of the previous hand region in the previous image frame, which can improve the accuracy of the current hand region. Specifically, the previous hand region may itself have been adjusted when the previous image frame was processed, so it is unlikely to be erroneous (for example, a region that includes two independent single-hand images, or two regions that overlap). The accuracy of the previous hand region can therefore be ensured, and with it the accuracy of the current hand region.
In some embodiments, referring to FIG. 3, step 202 may include step 2021, step 2022, and step 2023.
Step 2021: determine the number of hands in each pending hand region.
Step 2022: for a pending hand region in which the number of hands is not less than 2, determine whether to split the pending hand region.
In general, the number of hands in such a pending hand region is 2. For a pending hand region with more than 2 hands, the hand images in that region may be grouped in pairs, and each pair is then processed in the same way as a pending hand region with 2 hands.
If the number of hands in a pending hand region is 2, it can be determined whether to split the pending hand region into two single-hand-type regions, so as to ensure the positioning accuracy of the two hand regions. By contrast, if two hands that are far apart were still located in one two-hand-type region, the two-hand-type region would be too large and the hand images too small within it, which would lower recognition accuracy.
Step 2023: for at least two pending hand regions in each of which the number of hands is 1, determine whether to merge these pending hand regions.
In some application scenarios, the target image frame may contain only one pending hand region with one hand. This is a single-hand operation scenario and is not discussed further here.
If the number of hands in each of two pending hand regions is 1, it can be determined whether to merge the two pending hand regions into a two-hand-type region, so as to ensure accurate hand image tracking and pose recognition when the two hands have an overlapping area.
It should be noted that applying different decision logic, according to the number of hands, to pending hand regions with not less than 2 hands and to pending hand regions with 1 hand helps ensure that the single-hand image to be recognized or the two-hand image to be recognized in the resulting current hand region is accurate.
In some embodiments, step 2022 may include: in a pending hand region in which the number of hands is not less than 2, locating a first hand image to obtain a pending first sub-region, and locating a second hand image to obtain a pending second sub-region.
Here, if the number of hands is not less than 2, then after pairwise grouping, the positions of the two hand images in each group can be located to obtain two sub-regions. As an example, the left-hand image may be located to obtain the pending first sub-region, and the right-hand image may be located to obtain the pending second sub-region. Optionally, the two hands may also be the left hands of two people or the right hands of two people, which is not described further here.
As an example, a pre-trained single-hand localization model may be used to locate a single-hand image. The training images of the single-hand localization model may include images in which the two hand images overlap, or images in which the distance between the two hand images is small (for example, less than a preset threshold). As a result, when the single-hand localization model processes the hand region to obtain the first sub-region and the second sub-region, the hand images are located with high accuracy and confusion is unlikely.
In some embodiments, step 203 may include: in response to determining that the pending first sub-region and the pending second sub-region have an overlapping area, not splitting the pending hand region in which the number of hands is not less than 2, and determining that pending hand region as a two-hand region to be recognized.
If the pending first sub-region and the pending second sub-region have an overlapping area, no additional processing is performed, and the two-hand-type region including the two hands is still treated as a whole.
Please refer to FIG. 4, which shows various implementations of step 203 in the implementation that includes step 2021, step 2022, and step 2023.
Step 203 includes: in response to determining that the pending first sub-region and the pending second sub-region have no overlapping area, splitting the pending hand region in which the number of hands is not less than 2 to obtain single-hand regions to be recognized.
If the pending first sub-region and the pending second sub-region have no overlapping area, the pending first sub-region can be determined as one single-hand region to be recognized, and the pending second sub-region as another single-hand region to be recognized.
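The split decision for a pending hand region with two hands can be sketched as follows. The box layout, the `overlaps` helper, and the string labels are assumptions for illustration; in particular, box intersection is used here as a stand-in for "the two sub-regions have an overlapping area".

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed layout


def overlaps(a: Box, b: Box) -> bool:
    """True when the two axis-aligned boxes share an area of non-zero size."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def resolve_two_hand_region(region: Box, first_sub: Box,
                            second_sub: Box) -> List[Tuple[str, Box]]:
    """Keep the region whole if the two sub-regions overlap, otherwise split it.

    Returns (label, box) pairs, where the label is "two_hand" or "single_hand".
    """
    if overlaps(first_sub, second_sub):
        # Overlapping hands: no splitting, the region is a two-hand region to be recognized.
        return [("two_hand", region)]
    # No overlap: each sub-region becomes its own single-hand region to be recognized.
    return [("single_hand", first_sub), ("single_hand", second_sub)]
```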
In some embodiments, step 2023 may include: determining whether any two pending hand regions, in each of which the number of hands is 1, have an overlapping area.
Step 203 may include: merging the pending hand regions that have an overlapping area to obtain the two-hand region to be recognized.
As an example, the two pending hand regions in FIG. 5A, each with one hand, have an overlapping area. Merging the two pending hand regions of FIG. 5A therefore yields the single two-hand-type region of FIG. 5B.
Step 203 may include: if a pending hand region with one hand has no overlapping area with any other pending hand region, determining that pending hand region as a single-hand region to be recognized.
The two pending hand regions in FIG. 6, each with one hand, have no overlapping area, so these two pending hand regions are determined as independent single-hand regions to be recognized.
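The merge decision for pending hand regions that each contain one hand can be sketched as below. It assumes that the regions belong to one person (so at most one pairing is expected) and that the merged two-hand-type region is the smallest box enclosing both inputs; both points are assumptions, since the disclosure does not specify how the merged region is formed.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed layout


def overlaps(a: Box, b: Box) -> bool:
    """True when the two axis-aligned boxes share an area of non-zero size."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def union(a: Box, b: Box) -> Box:
    """Smallest box enclosing both inputs (an assumed way to form the merged region)."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def resolve_single_hand_regions(boxes: Sequence[Box]) -> List[Tuple[str, Box]]:
    """Merge overlapping one-hand regions pairwise; leave the others as single-hand regions."""
    results: List[Tuple[str, Box]] = []
    used = set()
    for i, box in enumerate(boxes):
        if i in used:
            continue
        # Look for a later, still unpaired region that overlaps this one.
        partner = next((j for j in range(i + 1, len(boxes))
                        if j not in used and overlaps(box, boxes[j])), None)
        if partner is None:
            results.append(("single_hand", box))
        else:
            used.add(partner)
            results.append(("two_hand", union(box, boxes[partner])))
    return results
```

With two overlapping boxes, as in the FIG. 5A to FIG. 5B example, the function returns one two-hand entry; with two separate boxes, as in FIG. 6, it returns two single-hand entries.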
In the target image frame, hand image positions are determined to obtain at least one pending hand region.
In this embodiment, determining the hand image positions may be implemented in various ways, which are not limited here.
In some embodiments, the hand pose information may include, but is not limited to, at least one of the following: three-dimensional rotation information, root node position information, and size information.
Here, the three-dimensional rotation information may indicate the degree of three-dimensional rotation of each joint of the human hand. In some application scenarios, the three-dimensional rotation information may be expressed as Euler angles, a rotation matrix, or a similar form. As an example, three-dimensional rotation information expressed as Euler angles may include the rotation angles of a finger joint about the X axis, the Y axis, and the Z axis.
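The relationship between the two representations mentioned above can be illustrated by converting Euler angles to a rotation matrix. The sketch assumes angles in radians and the composition order R = Rz · Ry · Rx (rotation about X applied first, then Y, then Z); the disclosure does not state which convention is used, so this order is an assumption.

```python
import math
from typing import List


def euler_to_matrix(rx: float, ry: float, rz: float) -> List[List[float]]:
    """Build a 3x3 rotation matrix from Euler angles in radians.

    Assumed convention: R = Rz @ Ry @ Rx, i.e. rotate about X, then Y, then Z.
    """
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]
```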
Here, the root node position information may indicate the position of a preset hand root node within the hand region (for example, a tracking box). In some application scenarios, it may be expressed as the two-dimensional pixel coordinates of the root node in the image. The hand root node may be a pre-specified location on the hand, such as the center of the palm.
Here, the size information may indicate the size of the hand image within the image. As an example, the size information may be expressed as an absolute size or as a relative size.
Please refer to FIG. 7A, which shows the parameters of the previous hand pose information in the previous image frame: S' denotes the size information, and a' and b' denote the two-dimensional pixel coordinates that represent the root node position information. The three-dimensional rotation information is not shown.
Please refer to FIG. 7B, which shows the corresponding parameters of the current hand pose information in the target image frame: S denotes the size information, and a and b denote the two-dimensional pixel coordinates that represent the root node position information. The three-dimensional rotation information is not shown.
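The three components of the hand pose information can be grouped into a single record. The sketch below is one possible layout, not the structure used by the disclosure; the class name `HandPose` and its field names are hypothetical, and per-joint Euler angles are assumed for the rotation part.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class HandPose:
    """Hypothetical container for the hand pose information of one hand."""

    # Three-dimensional rotation information: one (rx, ry, rz) triple per joint.
    joint_rotations: List[Tuple[float, float, float]]
    # Root node position information: 2D pixel coordinates, e.g. (a', b').
    root_xy: Tuple[float, float]
    # Size information, e.g. S'.
    size: float


# Illustrative values only; they are not taken from the figures.
prev_pose = HandPose(
    joint_rotations=[(0.0, 0.1, 0.0)],
    root_xy=(42.0, 57.0),
    size=120.0,
)
```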
In some embodiments, adjusting, when the current hand region is a two-hand region to be recognized, the previous hand pose information of the previous hand region in the previous image frame includes, but is not limited to, at least one of the following: determining the three-dimensional rotation information in the current hand pose information according to the three-dimensional rotation information in the previous hand pose information; determining the relative position of the hand root node in the two-hand region to be recognized according to the relative position of the hand root node in the previous hand region; and determining the size information of the corresponding hand region in the two-hand region to be recognized according to the size information of the hand image in the previous hand region.
It should be noted that representing the hand pose information with three-dimensional rotation information, root node position information, and size information allows the hand pose to be reconstructed. Further, the reconstructed hand pose is continuous with the hand pose of the previous image frame, so that in a sticker special-effect scene the sticker fits the hand image closely and looks natural.
Here, the two-hand region to be recognized includes a first sub-region and a second sub-region. The hand pose information of the hand image in the first sub-region may be referred to as first sub-pose information, and that of the hand image in the second sub-region as second sub-pose information.
Here, the previous hand region includes a third sub-region and a fourth sub-region. The hand image in the third sub-region and the hand image in the first sub-region indicate the same hand, and the hand image in the fourth sub-region and the hand image in the second sub-region indicate the same hand.
Here, the previous hand pose information includes third sub-pose information and fourth sub-pose information.
In some application scenarios, the first sub-pose information of the first sub-region may be determined according to the third sub-pose information of the hand image in the third sub-region, and the second sub-pose information of the second sub-region according to the fourth sub-pose information of the hand image in the fourth sub-region.
It should be noted that each of the two hand images in the two-hand region to be recognized is adjusted on the basis of its own previous hand image (the corresponding hand image in the previous image frame). This avoids confusing the two hands during recognition and preserves the accuracy of each hand's pose information.
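The per-hand pairing described above can be sketched as follows. This is a minimal illustration under stated assumptions: hands are keyed by a hypothetical identifier such as "left" or "right", and `adjust_one` stands in for whatever per-hand correction is applied; neither name comes from the disclosure.

```python
from typing import Callable, Dict, TypeVar

Pose = TypeVar("Pose")
Box = TypeVar("Box")


def adjust_each_hand(
    prev_sub_poses: Dict[str, Pose],
    cur_sub_boxes: Dict[str, Box],
    adjust_one: Callable[[Pose, Box], Pose],
) -> Dict[str, Pose]:
    """Adjust each hand from its own previous pose, never from the other hand's.

    prev_sub_poses: previous sub-pose per hand (third/fourth sub-pose information).
    cur_sub_boxes:  current sub-region per hand (first/second sub-region).
    """
    adjusted = {}
    for hand_id, cur_box in cur_sub_boxes.items():
        if hand_id not in prev_sub_poses:
            raise KeyError(f"no previous pose for hand {hand_id!r}")
        adjusted[hand_id] = adjust_one(prev_sub_poses[hand_id], cur_box)
    return adjusted
```

Keying both dictionaries by the same hand identifier is what keeps the third sub-region paired with the first and the fourth with the second.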
By contrast, because a two-hand box contains the appearance of both the left and right hands, predicting poses directly from it greatly reduces the accuracy of the model's predictions, making hand-pose-driven results chaotic and unnatural. Possible reasons include the following. First, the left and right hands look very similar, so when the model predicts the pose of one hand it is easily disturbed by the appearance of the other. Second, the two hands interact in complex ways, so occlusion between them is often complicated; one hand may be almost completely occluded by the other, and such extreme scenes lack sufficient appearance information to predict the hand pose.
In a scene where both hands share one box, we do not predict the poses of the left and right hands directly. Instead, the hand box position in the current frame is used to correct the hand pose result of the previous frame, yielding the hand pose result of the current frame. Although the resulting hand driving may not match the actual pose of the hand, it keeps the driving effect natural to a certain extent and avoids chaotic, unstable results.
In some embodiments, the third sub-pose information includes third sub-three-dimensional rotation information, third sub-root node position information, and third sub-size information. As described above, the third sub-pose information may indicate the pose of one hand (for example, the left hand) in the previous hand region.
Here, the third sub-pose information is used as an example to explain how the third sub-pose is corrected to obtain the first sub-pose information of the first sub-region of the current hand region. The process of obtaining the second sub-pose information from the fourth sub-pose information is similar and is not repeated here.
In some embodiments, determining the three-dimensional rotation information in the current hand pose information according to the three-dimensional rotation information in the previous hand pose information may include: determining the third sub-three-dimensional rotation information in the third sub-pose information as the first sub-three-dimensional rotation information.
It should be noted that the three-dimensional rotation information may be left unchanged here. In some application scenarios, the three-dimensional rotation information may contribute little to gesture driving; in that case it need not be processed, which reduces the amount of computation while preserving the accuracy of the driving effect.
In some embodiments, determining the relative position of the hand root node in the two-hand region to be recognized according to the relative position of the hand root node in the previous hand region may include: determining a first ratio of the width value in the third sub-root node information to the width value of the third sub-region, and determining the product of the first ratio and the width value of the first sub-region as the width value in the first sub-root node information; and determining a second ratio of the height value in the third sub-root node information to the height value of the third sub-region, and determining the product of the second ratio and the height value of the first sub-region as the height value in the first sub-root node information.
It should be noted that these ratios keep the relative position of the hand root node in the first sub-region the same as in the third sub-region. This reduces the discrepancy caused by movement or scaling of the hand region (for example, the tracking box) and allows the position of the hand root node to be determined accurately.
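The root node computation above amounts to preserving a box-relative position. The sketch below assumes the root node coordinates are measured from the origin of the sub-region (tracking box); the function and parameter names are hypothetical.

```python
def rescale_root_node(prev_root, prev_box_size, cur_box_size):
    """Carry the root node's relative position from the previous box to the current one.

    prev_root:     (width value, height value) of the root node in the third sub-region.
    prev_box_size: (width, height) of the third sub-region.
    cur_box_size:  (width, height) of the first sub-region.
    """
    prev_x, prev_y = prev_root
    prev_w, prev_h = prev_box_size
    cur_w, cur_h = cur_box_size
    if prev_w <= 0 or prev_h <= 0:
        raise ValueError("previous sub-region must have positive width and height")
    first_ratio = prev_x / prev_w    # width ratio
    second_ratio = prev_y / prev_h   # height ratio
    return (first_ratio * cur_w, second_ratio * cur_h)


# Root at 20% of the width and 50% of the height stays at those fractions.
new_root = rescale_root_node((20.0, 30.0), (100.0, 60.0), (50.0, 120.0))
```

With the illustrative numbers above, the root node lands at 20% of the new width and 50% of the new height, regardless of how the tracking box has moved or been scaled.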
In some embodiments, determining the size information of the corresponding hand region in the two-hand region to be recognized according to the size information of the hand image in the previous hand region may include: for the third sub-region, determining a third ratio of the hand size value in the third sub-size information to the size value of the previous image frame, and determining the product of the third ratio and the size value of the target image frame as the hand size value of the first sub-region.
Here, the size information of the hand region may be understood as the proportion of the image frame's length occupied by the hand region. As an example, the size of the hand region may indicate the length of the diagonal of the tracking box.
It should be noted that determining the size information makes it possible to determine the area of the hand image effectively and thus to locate the hand image more accurately. Further, in a sticker special-effect scene, accurately determined size information can greatly improve how closely the sticker fits the hand image and how natural it looks.
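The size update can be sketched in the same spirit. The text leaves the exact reference lengths open, so this example makes two explicit assumptions: the hand size is the diagonal of the tracking box, and the "size value" of a frame is a single reference length supplied by the caller. All names are hypothetical.

```python
import math


def box_diagonal(width, height):
    """Length of the diagonal of a tracking box (assumed measure of hand size)."""
    return math.hypot(width, height)


def rescale_hand_size(prev_hand_size, prev_frame_size, cur_frame_size):
    """Keep the hand's share of the reference length constant across frames.

    third_ratio = previous hand size / previous reference length;
    the new hand size is third_ratio times the current reference length.
    """
    if prev_frame_size <= 0:
        raise ValueError("previous reference length must be positive")
    third_ratio = prev_hand_size / prev_frame_size
    return third_ratio * cur_frame_size


prev_size = box_diagonal(30.0, 40.0)
cur_size = rescale_hand_size(prev_size, 1000.0, 500.0)
```

If an implementation instead measured size relative to the sub-region rather than the frame, only the two reference lengths passed to `rescale_hand_size` would change.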
In some embodiments, the above method may further include: when the current hand region is a single-hand region to be recognized, invoking a single-hand pose estimation model to recognize the hand pose in each single-hand region to be recognized, and obtaining fifth hand pose information corresponding to each single-hand region to be recognized.
Here, the hand pose information may indicate the hand pose in a single-hand region to be recognized. When there are two single-hand regions to be recognized, there may accordingly be two pieces of hand pose information.
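Taken together, the method branches on the type of the current hand region. The sketch below shows that branching only; `run_single_hand_model` and `adjust_previous_poses` are hypothetical callables standing in for the single-hand pose estimation model and the previous-pose adjustment, and the string labels are an assumption.

```python
def estimate_hand_poses(kind, regions, run_single_hand_model, adjust_previous_poses):
    """Choose how the poses are obtained based on the current hand region type.

    kind:    "single" when `regions` lists single-hand regions to be recognized,
             "two_hand" when `regions` describes a two-hand region to be recognized.
    """
    if kind == "single":
        # One model call per single-hand region, one pose per region.
        return [run_single_hand_model(region) for region in regions]
    if kind == "two_hand":
        # No direct prediction: correct the previous frame's poses instead.
        return adjust_previous_poses(regions)
    raise ValueError(f"unknown region kind: {kind!r}")
```

Passing the two strategies in as callables keeps the sketch independent of any particular model or tracker.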
With further reference to FIG. 8, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an image recognition device. This device embodiment corresponds to the method embodiment shown in FIG. 1, and the device may be applied in various electronic devices.
As shown in FIG. 8, the image recognition device of this embodiment includes a determination unit 801 and an adjustment unit 802. The determination unit is configured to determine a current hand region from a target image frame of an image frame sequence, where the current hand region is a single-hand region to be recognized or a two-hand region to be recognized, and the images corresponding to the two hands in the two-hand region to be recognized have an overlapping area. The adjustment unit is configured to, when the current hand region is a two-hand region to be recognized, adjust the previous hand pose information of the previous hand region in a previous image frame to obtain the current hand pose information of the hand region to be recognized, where the previous image frame includes an image frame that precedes the target image frame in the image frame sequence.
In this embodiment, for the specific processing performed by the determination unit 801 and the adjustment unit 802 of the image recognition device and the technical effects thereof, reference may be made to the descriptions of step 101 and step 102 in the embodiment corresponding to FIG. 1, which are not repeated here.
In some embodiments, the device is further configured to add a hand image special effect to the target image frame according to the current hand pose information.
In some embodiments, determining the current hand region from the target image frame of the image frame sequence includes: determining the position of the hand image in the target image frame to obtain at least one pending hand region; determining a processing mode for each pending hand region based on the positional relationship of the hand images in the pending hand regions, where the processing mode includes at least one of the following: no processing, splitting, and merging; and processing each pending hand region in the determined processing mode to obtain the current hand region.
In some embodiments, determining the processing mode of a pending hand region based on the positional relationship of the hand images in the pending hand regions includes: determining the number of hands in each pending hand region; for a pending hand region in which the number of hands is not less than 2, determining whether to split that pending hand region; and for at least two pending hand regions in each of which the number of hands is 1, determining whether to merge those pending hand regions.
In some embodiments, determining, for a pending hand region in which the number of hands is not less than 2, whether to split that pending hand region includes: within that pending hand region, locating the first hand image to obtain a pending first sub-region and locating the second hand image to obtain a pending second sub-region. Processing each pending hand region in the determined processing mode to obtain the current hand region then includes: in response to determining that the pending first sub-region and the pending second sub-region overlap, not splitting the pending hand region and determining it as the two-hand region to be recognized; and in response to determining that the pending first sub-region and the pending second sub-region do not overlap, splitting the pending hand region to obtain single-hand regions to be recognized.
In some embodiments, determining, for at least two pending hand regions in each of which the number of hands is 1, whether to merge them includes: determining whether any two pending hand regions in which the number of hands is 1 overlap. Processing each pending hand region in the determined processing mode to obtain the current hand region then includes: merging pending hand regions that overlap to obtain the two-hand region to be recognized; and, if a pending hand region in which the number of hands is 1 does not overlap any other pending hand region, determining it as a single-hand region to be recognized.
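The split and merge decisions above both reduce to an overlap test between boxes. The sketch below assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max), treats boxes that merely touch as non-overlapping, and pairs boxes greedily; all of these choices, and every name, are illustrative assumptions rather than details from the disclosure.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def boxes_overlap(a: Box, b: Box) -> bool:
    """True when the two boxes share a region of positive area."""
    return min(a[2], b[2]) > max(a[0], b[0]) and min(a[3], b[3]) > max(a[1], b[1])


def union_box(a: Box, b: Box) -> Box:
    """Smallest box that contains both inputs."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def split_two_hand_region(region: Box, first_sub: Box, second_sub: Box):
    """Keep a pending region with two hands whole if the hands overlap, else split it."""
    if boxes_overlap(first_sub, second_sub):
        return "two_hand", [region]
    return "single_hand", [first_sub, second_sub]


def merge_single_hand_boxes(boxes: List[Box]):
    """Merge overlapping one-hand pending regions; leave the rest as single-hand regions."""
    used = set()
    two_hand, single_hand = [], []
    for i in range(len(boxes)):
        if i in used:
            continue
        for j in range(i + 1, len(boxes)):
            if j not in used and boxes_overlap(boxes[i], boxes[j]):
                two_hand.append(union_box(boxes[i], boxes[j]))
                used.update((i, j))
                break
        else:
            # No unused partner overlaps this box: it stays a single-hand region.
            single_hand.append(boxes[i])
    return two_hand, single_hand
```

The greedy pairing is enough for the two-hand case discussed here; a scene with more than two mutually overlapping hands would need a more careful grouping rule.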
In some embodiments, determining the position of the hand image in the target image frame to obtain at least one pending hand region includes: when the previous image frame includes a hand image, tracking the previous hand region of the previous image frame to determine the pending hand region of the target image frame; and when the previous image frame is determined not to include a hand image, performing hand image recognition on the target image frame to obtain the pending hand region.
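The choice between tracking and fresh detection can be sketched as a simple fallback. Here `track` and `detect` are hypothetical callables standing in for a tracker and a hand detector, and an empty list of previous regions is assumed to mean that the previous image frame contained no hand image.

```python
def propose_pending_regions(prev_regions, frame, track, detect):
    """Return pending hand regions for the target image frame.

    If the previous frame had hand regions, follow them into the current frame;
    otherwise run hand image recognition on the current frame from scratch.
    """
    if prev_regions:
        return track(prev_regions, frame)
    return detect(frame)
```

Tracking reuses the previous result and is typically cheaper than detection, which is presumably why detection is reserved for frames with no prior hand region.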
In some embodiments, tracking, when the previous image frame includes a hand image, the previous hand region of the previous image frame to determine the pending hand region of the target image frame includes: if the previous hand region is a two-hand region, adjusting that two-hand region to obtain a pending hand region that includes the images of both hands.
In some embodiments, the hand pose information includes at least one of the following: three-dimensional rotation information, hand root node information, and size information; and adjusting, when the current hand region is a two-hand region to be recognized, the previous hand pose information of the previous hand region in the previous image frame includes at least one of the following: determining the three-dimensional rotation information in the current hand pose information according to the three-dimensional rotation information in the previous hand pose information; determining the relative position of the hand root node in the two-hand region to be recognized according to the relative position of the hand root node in the previous hand region; and determining the size information of the corresponding hand region in the two-hand region to be recognized according to the size information of the hand image in the previous hand region.
In some embodiments, the two-hand region to be recognized includes a first sub-region and a second sub-region, and the previous hand region includes a third sub-region and a fourth sub-region. The hand image in the third sub-region and the hand image in the first sub-region indicate the same hand, and the hand image in the fourth sub-region and the hand image in the second sub-region indicate the same hand. The first sub-pose information of the first sub-region is determined according to the third sub-pose information of the hand image in the third sub-region, and the second sub-pose information of the second sub-region is determined according to the fourth sub-pose information of the hand image in the fourth sub-region.
In some embodiments, the third sub-pose information includes third sub-three-dimensional rotation information; and determining the three-dimensional rotation information in the previous hand pose information as the three-dimensional rotation information in the current hand pose information includes: determining the third sub-three-dimensional rotation information in the third sub-pose information as the first sub-three-dimensional rotation information.
In some embodiments, the third sub-pose information includes third sub-root node position information; and determining the relative position of the hand root node in the two-hand region to be recognized according to the relative position of the hand root node in the previous hand region includes: determining a first ratio of the width value in the third sub-root node information to the width value of the third sub-region, and determining the product of the first ratio and the width value of the first sub-region as the width value in the first sub-root node information; and determining a second ratio of the height value in the third sub-root node information to the height value of the third sub-region, and determining the product of the second ratio and the height value of the first sub-region as the height value in the first sub-root node information.
In some embodiments, the third sub-pose information includes third sub-size information; and determining the size information of the corresponding hand region in the two-hand region to be recognized according to the size information of the hand image in the previous hand region includes: for the third sub-region, determining a third ratio of the hand size value in the third sub-size information to the size value of the previous image frame, and determining the product of the third ratio and the size value of the target image frame as the hand size value of the first sub-region.
In some embodiments, the device is further configured to: when the current hand region is a single-hand region to be recognized, invoke a single-hand pose estimation model to recognize the hand pose in each single-hand region to be recognized, and obtain the hand pose information corresponding to each single-hand region to be recognized.
Please refer to FIG. 9, which shows an exemplary system architecture to which the image recognition method of an embodiment of the present disclosure may be applied.
As shown in FIG. 9, the system architecture may include terminal devices 901, 902, and 903, a network 904, and a server 905. The network 904 is the medium that provides communication links between the terminal devices 901, 902, 903 and the server 905. The network 904 may include various connection types, such as wired links, wireless communication links, or fiber optic cables.
The terminal devices 901, 902, 903 may interact with the server 905 through the network 904 to receive or send messages and the like. Various client applications may be installed on the terminal devices 901, 902, 903, such as web browser applications, search applications, and news applications. The client applications on the terminal devices 901, 902, 903 may receive user instructions and perform the corresponding functions, for example adding corresponding information to a piece of information according to a user instruction.
The terminal devices 901, 902, 903 may be hardware or software. When they are hardware, they may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, and desktop computers. When they are software, they may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (for example, software or software modules used to provide distributed services) or as a single piece of software or software module. No specific limitation is made here.
The server 905 may be a server that provides various services, for example one that receives an information acquisition request sent by the terminal devices 901, 902, 903, obtains the display information corresponding to the request in various ways, and sends data related to the display information to the terminal devices 901, 902, 903.
It should be noted that the image recognition method provided by the embodiments of the present disclosure may be executed by a terminal device, in which case the image recognition device may be provided in the terminal devices 901, 902, 903. The image recognition method may also be executed by the server 905, in which case the image recognition device may be provided in the server 905.
It should be understood that the numbers of terminal devices, networks, and servers in FIG. 9 are merely illustrative. There may be any number of terminal devices, networks, and servers, as required by the implementation.
Reference is now made to FIG. 10, which shows a schematic structural diagram of an electronic device (for example, the terminal device or server in FIG. 9) suitable for implementing the embodiments of the present disclosure. The terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (for example, car navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The electronic device shown in FIG. 10 is only an example and should not limit the functions or scope of use of the embodiments of the present disclosure.
As shown in FIG. 10, the electronic device may include a processing device (for example, a central processing unit or a graphics processing unit) 1001, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 1002 or a program loaded from a storage device 1008 into a random access memory (RAM) 1003. The RAM 1003 also stores various programs and data required for the operation of the electronic device 1000. The processing device 1001, the ROM 1002, and the RAM 1003 are connected to one another through a bus 1004. An input/output (I/O) interface 1005 is also connected to the bus 1004.
Generally, the following devices may be connected to the I/O interface 1005: an input device 1009 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, or gyroscope; an output device 1007 including, for example, a liquid crystal display (LCD), speaker, or vibrator; a storage device 1008 including, for example, a magnetic tape or hard disk; and a communication device 1008. The communication device 1008 may allow the electronic device to communicate with other devices wirelessly or by wire to exchange data. Although FIG. 10 shows an electronic device having various devices, it should be understood that not all of the illustrated devices are required to be implemented or present. More or fewer devices may alternatively be implemented or provided.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product that includes a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication device 1008, installed from the storage device 1008, or installed from the ROM 1002. When the computer program is executed by the processing device 1001, the above functions defined in the methods of the embodiments of the present disclosure are performed.
It should be noted that the computer-readable medium described in the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. A computer-readable signal medium, by contrast, may include a data signal propagated in baseband or as part of a carrier wave and carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device. Program code embodied on a computer-readable medium may be transmitted by any appropriate medium, including but not limited to: electric wires, optical cables, RF (radio frequency), or any suitable combination of the above.
In some implementations, the client and the server may communicate using any currently known or future-developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (for example, a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (for example, the Internet), a peer-to-peer network (for example, an ad hoc peer-to-peer network), and any currently known or future-developed network.
The computer-readable medium may be included in the electronic device described above, or it may exist separately without being assembled into the electronic device.
The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: determine a current hand region from a target image frame of an image frame sequence, wherein the current hand region is a single-hand region to be recognized or a two-hand region to be recognized, and the images corresponding to the two hands in the two-hand region to be recognized have an overlapping area; and, when the current hand region is a two-hand region to be recognized, adjust the previous hand pose information of a previous hand region in a previous image frame to obtain current hand pose information of the hand region to be recognized, wherein the previous image frame includes an image frame that precedes the target image frame in the image frame sequence.
In some embodiments, the electronic device is further configured to: add a hand image special effect to the target image frame according to the current hand pose information.
In some embodiments, determining the current hand region from the target image frame of the image frame sequence includes: determining hand image positions in the target image frame to obtain at least one pending hand region; determining a processing mode for each pending hand region based on the positional relationship of the hand images in the pending hand regions, wherein the processing mode includes at least one of the following: no processing, splitting, and merging; and processing each pending hand region using the determined processing mode to obtain the current hand region.
In some embodiments, determining the processing mode of a pending hand region based on the positional relationship of the hand images in the pending hand regions includes: determining the number of hands in each pending hand region; for a pending hand region in which the number of hands is not less than 2, determining whether to split that pending hand region; and for at least two pending hand regions in each of which the number of hands is 1, determining whether to merge those pending hand regions.
In some embodiments, determining whether to split a pending hand region in which the number of hands is not less than 2 includes: in that pending hand region, locating a first hand image to obtain a pending first sub-region, and locating a second hand image to obtain a pending second sub-region. Processing each pending hand region using the determined processing mode to obtain the current hand region then includes: in response to determining that the pending first sub-region and the pending second sub-region have an overlapping area, not splitting the pending hand region in which the number of hands is not less than 2, and determining that pending hand region as the two-hand region to be recognized; and in response to determining that the pending first sub-region and the pending second sub-region have no overlapping area, splitting the pending hand region in which the number of hands is not less than 2 to obtain the single-hand regions to be recognized.
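The split decision described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the `Box` type, the function names, the rule that edge-touching rectangles do not count as overlapping, and the choice to reuse the two located sub-regions as the resulting single-hand regions are all hypothetical, since the text does not specify how regions are represented or how a split region is cropped.

```python
from typing import List, NamedTuple, Tuple


class Box(NamedTuple):
    """Axis-aligned rectangle in pixel coordinates (hypothetical representation)."""
    x1: float
    y1: float
    x2: float
    y2: float


def overlaps(a: Box, b: Box) -> bool:
    """Return True when the rectangles share a region of positive area.

    Assumption: rectangles that only touch along an edge do not overlap.
    """
    return min(a.x2, b.x2) > max(a.x1, b.x1) and min(a.y2, b.y2) > max(a.y1, b.y1)


def split_or_keep(pending: Box, first: Box, second: Box) -> Tuple[List[Box], List[Box]]:
    """Decide how to process a pending region that contains two hands.

    Returns (two_hand_regions, single_hand_regions).
    """
    if overlaps(first, second):
        # The hands overlap: keep the pending region whole as a two-hand region.
        return [pending], []
    # The hands are separate: split into one single-hand region per hand.
    # Assumption: the located sub-regions serve directly as the split result.
    return [], [first, second]
```

Under these assumptions, two sub-regions such as `Box(0, 0, 8, 10)` and `Box(12, 0, 20, 10)` lead to a split, while `Box(0, 0, 12, 10)` and `Box(8, 0, 20, 10)` keep the pending region as a two-hand region.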
In some embodiments, determining whether to merge pending hand regions, for at least two pending hand regions in each of which the number of hands is 1, includes: determining whether any two pending hand regions in which the number of hands is 1 have an overlapping area. Processing each pending hand region using the determined processing mode to obtain the current hand region then includes: merging the pending hand regions that have an overlapping area to obtain the two-hand region to be recognized; and, if a pending hand region in which the number of hands is 1 has no overlapping area with any other pending hand region, determining that pending hand region as a single-hand region to be recognized.
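The merge step can be illustrated with a short Python sketch. The tuple layout `(x1, y1, x2, y2)`, the function names, the use of the bounding-box union as the merged region, and the greedy pairing (each region is merged with at most one partner) are assumptions made for illustration; the text only states that overlapping single-hand pending regions are merged.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical layout


def overlaps(a: Box, b: Box) -> bool:
    """True when the two rectangles share a region of positive area."""
    return min(a[2], b[2]) > max(a[0], b[0]) and min(a[3], b[3]) > max(a[1], b[1])


def union(a: Box, b: Box) -> Box:
    """Smallest rectangle that contains both inputs."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def merge_single_hand_regions(regions: List[Box]) -> Tuple[List[Box], List[Box]]:
    """Pair up overlapping single-hand pending regions.

    Returns (two_hand_regions, single_hand_regions).
    Assumption: overlaps occur pairwise, so greedy pairing is sufficient.
    """
    two_hand: List[Box] = []
    single_hand: List[Box] = []
    used = set()
    for i, region in enumerate(regions):
        if i in used:
            continue
        # Look for a later, still unpaired region that overlaps this one.
        partner = next(
            (j for j in range(i + 1, len(regions))
             if j not in used and overlaps(region, regions[j])),
            None,
        )
        if partner is None:
            single_hand.append(region)
        else:
            used.update((i, partner))
            two_hand.append(union(region, regions[partner]))
    return two_hand, single_hand
```

For example, given the regions `(0, 0, 10, 10)`, `(5, 5, 15, 15)`, and `(100, 100, 110, 110)`, the first two are merged into one two-hand region and the third remains a single-hand region.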
In some embodiments, determining hand image positions in the target image frame to obtain at least one pending hand region includes: when the previous image frame includes a hand image, tracking the previous hand region of the previous image frame to determine the pending hand region of the target image frame; and when it is determined that the previous image frame does not include a hand image, performing hand image recognition on the target image frame to obtain the pending hand region.
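The choice between tracking and fresh recognition can be sketched as a small dispatch function. The names `locate_pending_regions`, `track`, and `detect` are hypothetical placeholders: the text does not name a tracker or a detector, so both are passed in as callables here.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical layout


def locate_pending_regions(
    frame: object,
    previous_regions: Sequence[Box],
    track: Callable[[object, Sequence[Box]], List[Box]],
    detect: Callable[[object], List[Box]],
) -> List[Box]:
    """Return the pending hand regions for the target frame.

    If the previous frame contained hand regions, they are tracked into the
    target frame; otherwise hand recognition runs on the target frame itself.
    """
    if previous_regions:
        return track(frame, previous_regions)
    return detect(frame)
```

Passing the tracker and detector as parameters keeps the sketch independent of any particular model or library.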
In some embodiments, tracking the previous hand region of the previous image frame to determine the pending hand region of the target image frame, when the previous image frame includes a hand image, includes: if the previous hand region is a two-hand region, adjusting that two-hand region to obtain a pending hand region that includes images of both hands.
In some embodiments, the hand pose information includes at least one of the following: three-dimensional rotation information, hand root node information, and size information; and adjusting the previous hand pose information of the previous hand region in the previous image frame, when the current hand region is a two-hand region to be recognized, includes at least one of the following: determining the three-dimensional rotation information in the current hand pose information according to the three-dimensional rotation information in the previous hand pose information; determining the relative position of the hand root node in the two-hand region to be recognized according to the relative position of the hand root node in the previous hand region; and determining the size information of the corresponding hand region in the two-hand region to be recognized according to the size information of the hand image in the previous hand region.
In some embodiments, the two-hand region to be recognized includes a first sub-region and a second sub-region, and the previous hand region includes a third sub-region and a fourth sub-region. The hand image in the third sub-region and the hand image in the first sub-region indicate the same hand, and the hand image in the fourth sub-region and the hand image in the second sub-region indicate the same hand. The first sub-pose information of the first sub-region is determined according to the third sub-pose information of the hand image in the third sub-region, and the second sub-pose information of the second sub-region is determined according to the fourth sub-pose information of the hand image in the fourth sub-region.
In some embodiments, the third sub-pose information includes third sub three-dimensional rotation information; and determining the three-dimensional rotation information in the previous hand pose information as the three-dimensional rotation information in the current hand pose information includes: determining the third sub three-dimensional rotation information in the third sub-pose information as the first sub three-dimensional rotation information.
In some embodiments, the third sub-pose information includes third sub root node position information; and determining the relative position of the hand root node in the two-hand region to be recognized according to the relative position of the hand root node in the previous hand region includes: determining a first ratio of the width value in the third sub root node information to the width value of the third sub-region, and determining the product of the first ratio and the width value of the first sub-region as the width value in the first sub root node information; and determining a second ratio of the height value in the third sub root node information to the height value of the third sub-region, and determining the product of the second ratio and the height value of the first sub-region as the height value in the first sub root node information.
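The root node calculation above amounts to preserving the root node's relative position when the enclosing sub-region changes size. A minimal Python sketch follows; the function and parameter names are hypothetical, and it assumes that the "width value" and "height value" of the root node are its offsets measured from the corner of its own sub-region.

```python
from typing import Tuple


def rescale_root(
    prev_root: Tuple[float, float],
    prev_region_size: Tuple[float, float],
    cur_region_size: Tuple[float, float],
) -> Tuple[float, float]:
    """Carry the root node position from the third sub-region to the first.

    prev_root:        (width value, height value) of the root node in the
                      third sub-region (previous frame).
    prev_region_size: (width, height) of the third sub-region.
    cur_region_size:  (width, height) of the first sub-region (target frame).
    """
    root_w, root_h = prev_root
    prev_w, prev_h = prev_region_size
    cur_w, cur_h = cur_region_size
    if prev_w <= 0 or prev_h <= 0:
        raise ValueError("previous sub-region must have positive width and height")
    first_ratio = root_w / prev_w    # relative horizontal position
    second_ratio = root_h / prev_h   # relative vertical position
    return (first_ratio * cur_w, second_ratio * cur_h)
```

For instance, a root node at (30, 40) in a 60 x 80 sub-region sits at the relative position (0.5, 0.5); in a 120 x 40 sub-region this maps to (60, 20).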
In some embodiments, the third sub-pose information includes third sub-size information; and determining the size information of the corresponding hand region in the two-hand region to be recognized according to the size information of the hand image in the previous hand region includes: determining, for the third sub-region, a third ratio of the hand size value in the third sub-size information to the size value of the previous image frame, and determining the product of the third ratio and the size value of the target image frame as the hand size value of the first sub-region.
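The size calculation follows the same proportional pattern, this time relative to the whole frame. The sketch below is illustrative only: the function name is hypothetical, and it assumes the "size value" of a frame is a single positive scalar (for example, its width in pixels), which the text does not specify.

```python
def rescale_hand_size(
    prev_hand_size: float,
    prev_frame_size: float,
    cur_frame_size: float,
) -> float:
    """Carry the hand size from the previous frame to the target frame.

    The third ratio is the hand size relative to the previous frame size;
    multiplying it by the target frame size gives the new hand size.
    """
    if prev_frame_size <= 0:
        raise ValueError("previous frame size must be positive")
    third_ratio = prev_hand_size / prev_frame_size
    return third_ratio * cur_frame_size
```

When both frames have the same size value, the hand size is carried over unchanged, which matches the common case of a fixed-resolution video stream.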
In some embodiments, the electronic device is further configured to: when the current hand region is a single-hand region to be recognized, invoke a single-hand pose estimation model to recognize the hand pose in each single-hand region to be recognized, obtaining the hand pose information corresponding to each single-hand region to be recognized.
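Taken together with the two-hand case, the per-region handling can be sketched as a dispatch: single-hand regions go to the pose estimation model, while two-hand regions reuse adjusted pose information from the previous frame. All names below (`current_poses`, `single_hand_model`, `adjust_previous_pose`, the `Pose` dictionary) are hypothetical; the model and the adjustment routine are passed in as callables because the text does not define their interfaces.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical layout
Pose = Dict[str, object]                 # e.g. rotation, root node, size (hypothetical)


def current_poses(
    single_hand_regions: Sequence[Box],
    two_hand_regions: Sequence[Box],
    single_hand_model: Callable[[Box], Pose],
    adjust_previous_pose: Callable[[Box], List[Pose]],
) -> List[Pose]:
    """Collect hand pose information for every current hand region."""
    # Single-hand regions: run the single-hand pose estimation model.
    poses: List[Pose] = [single_hand_model(region) for region in single_hand_regions]
    # Two-hand regions: adjust the previous frame's pose information instead
    # (assumed here to yield one pose per hand in the region).
    for region in two_hand_regions:
        poses.extend(adjust_previous_pose(region))
    return poses
```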
Computer program code for carrying out the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented in software or in hardware. In some cases the name of a unit does not limit the unit itself; for example, the determining unit may also be described as "a unit for determining the current hand region".
The functions described above may be performed at least in part by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that may be used include: field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SoCs), complex programmable logic devices (CPLDs), and so on.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. A machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of machine-readable storage media include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The above description covers only preferred embodiments of the present disclosure and the technical principles applied. Those skilled in the art should understand that the scope of disclosure involved in the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above disclosed concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.
In addition, although operations are depicted in a particular order, this should not be understood as requiring that the operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although the above discussion contains several specific implementation details, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment may also be implemented in multiple embodiments separately or in any suitable sub-combination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims.

Claims (17)

  1. An image recognition method, characterized by comprising:
    determining a current hand region from a target image frame of an image frame sequence, wherein the current hand region is a single-hand region to be recognized or a two-hand region to be recognized, and the images corresponding to the two hands in the two-hand region to be recognized have an overlapping area;
    when the current hand region is a two-hand region to be recognized, adjusting the previous hand pose information of a previous hand region in a previous image frame to obtain current hand pose information of the hand region to be recognized, wherein the previous image frame includes an image frame that precedes the target image frame in the image frame sequence.
  2. The method according to claim 1, characterized in that the method further comprises:
    adding a hand image special effect to the target image frame according to the current hand pose information.
  3. The method according to claim 1, characterized in that determining the current hand region from the target image frame of the image frame sequence comprises:
    determining hand image positions in the target image frame to obtain at least one pending hand region;
    determining a processing mode for each pending hand region based on the positional relationship of the hand images in the pending hand regions, wherein the processing mode includes at least one of the following: no processing, splitting, and merging;
    processing each pending hand region using the determined processing mode to obtain the current hand region.
  4. The method according to claim 3, characterized in that determining the processing mode of a pending hand region based on the positional relationship of the hand images in the pending hand regions comprises:
    determining the number of hands in each pending hand region;
    for a pending hand region in which the number of hands is not less than 2, determining whether to split that pending hand region;
    for at least two pending hand regions in each of which the number of hands is 1, determining whether to merge those pending hand regions.
  5. The method according to claim 4, characterized in that determining whether to split a pending hand region in which the number of hands is not less than 2 comprises:
    in the pending hand region in which the number of hands is not less than 2, locating a first hand image to obtain a pending first sub-region, and locating a second hand image to obtain a pending second sub-region; and
    processing each pending hand region using the determined processing mode to obtain the current hand region comprises:
    in response to determining that the pending first sub-region and the pending second sub-region have an overlapping area, not splitting the pending hand region in which the number of hands is not less than 2, and determining that pending hand region as the two-hand region to be recognized;
    in response to determining that the pending first sub-region and the pending second sub-region have no overlapping area, splitting the pending hand region in which the number of hands is not less than 2 to obtain the single-hand regions to be recognized.
  6. The method according to claim 4, characterized in that determining whether to merge pending hand regions, for at least two pending hand regions in each of which the number of hands is 1, comprises:
    determining whether any two pending hand regions in which the number of hands is 1 have an overlapping area;
    processing each pending hand region using the determined processing mode to obtain the current hand region comprises:
    merging the pending hand regions that have an overlapping area to obtain the two-hand region to be recognized;
    if a pending hand region in which the number of hands is 1 has no overlapping area with any other pending hand region, determining that pending hand region as a single-hand region to be recognized.
  7. The method according to claim 3, characterized in that determining hand image positions in the target image frame to obtain at least one pending hand region comprises:
    when the previous image frame includes a hand image, determining the pending hand region of the target image frame according to the previous hand region of the previous image frame;
    when it is determined that the previous image frame does not include a hand image, performing hand image recognition on the target image frame to obtain the pending hand region.
  8. The method according to claim 7, characterized in that determining the pending hand region of the target image frame according to the previous hand region of the previous image frame, when the previous image frame includes a hand image, comprises:
    if the previous hand region is a two-hand region, adjusting that two-hand region to obtain a pending hand region that includes images of both hands.
  9. The method according to claim 1, characterized in that the hand pose information includes at least one of the following: three-dimensional rotation information, hand root node information, and size information; and
    adjusting the previous hand pose information of the hand region in the previous image frame, when the hand region is a two-hand region to be recognized, comprises at least one of the following:
    determining the three-dimensional rotation information in the current hand pose information according to the three-dimensional rotation information in the previous hand pose information;
    determining the relative position of the current hand root node in the two-hand region to be recognized according to the relative position of the hand root node in the previous hand region;
    determining the size information of the corresponding hand region in the two-hand region to be recognized according to the size information of the hand image in the previous hand region.
  10. The method according to claim 9, characterized in that the two-hand region to be recognized includes a first sub-region and a second sub-region, and the previous hand region includes a third sub-region and a fourth sub-region; wherein the hand image in the third sub-region and the hand image in the first sub-region indicate the same hand, and the hand image in the fourth sub-region and the hand image in the second sub-region indicate the same hand;
    wherein the first sub-pose information of the first sub-region is determined according to the third sub-pose information of the hand image in the third sub-region;
    the second sub-pose information of the second sub-region is determined according to the fourth sub-pose information of the hand image in the fourth sub-region.
  11. The method according to claim 10, characterized in that the third sub-pose information includes third sub three-dimensional rotation information; and
    determining the three-dimensional rotation information in the current hand pose information according to the three-dimensional rotation information in the previous hand pose information comprises:
    determining the third sub three-dimensional rotation information in the third sub-pose information as the first sub three-dimensional rotation information.
  12. The method according to claim 10, characterized in that the third sub-pose information includes third sub root node position information; and
    determining the relative position of the current hand root node in the two-hand region to be recognized according to the relative position of the hand root node in the previous hand region comprises:
    determining a first ratio of the width value in the third sub root node information to the width value of the third sub-region, and determining the product of the first ratio and the width value of the first sub-region as the width value in the first sub root node information;
    determining a second ratio of the height value in the third sub root node information to the height value of the third sub-region, and determining the product of the second ratio and the height value of the first sub-region as the height value in the first sub root node information.
  13. The method according to claim 10, characterized in that the third sub-pose information includes third sub-size information; and
    determining the size information of the corresponding hand region in the two-hand region to be recognized according to the size information of the hand image in the previous hand region comprises:
    determining, for the third sub-region, a third ratio of the hand size value in the third sub-size information to the size value of the previous image frame, and determining the product of the third ratio and the size value of the target image frame as the hand size value of the first sub-region.
  14. The method according to claim 1, characterized in that the method further comprises:
    when the current hand region is a single-hand region to be recognized, invoking a single-hand pose estimation model to recognize the hand pose in each single-hand region to be recognized, obtaining the hand pose information corresponding to each single-hand region to be recognized.
  15. An image recognition apparatus, characterized by comprising:
    a determining unit configured to determine a current hand region from a target image frame of an image frame sequence, wherein the current hand region is a single-hand region to be recognized or a two-hand region to be recognized, and the images corresponding to the two hands in the two-hand region to be recognized have an overlapping area;
    an adjusting unit configured to, when the current hand region is a two-hand region to be recognized, adjust the previous hand pose information of a previous hand region in a previous image frame to obtain current hand pose information of the hand region to be recognized, wherein the previous image frame includes an image frame that precedes the target image frame in the image frame sequence.
  16. An electronic device, comprising:
    one or more processors;
    a storage device, configured to store one or more programs,
    wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1-14.
  17. A computer-readable medium storing a computer program, wherein the program, when executed by a processor, implements the method according to any one of claims 1-14.
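
The ratio-based steps recited in claims 12 and 13 and the single-hand branch of claim 14 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. It assumes that the third sub-region and the previous image frame supply the reference values from the earlier frame, and that the first sub-region and the target image frame supply the reference values for the current frame, which this excerpt does not state explicitly. The names `rescale_by_ratio`, `estimate_hand_postures`, `single_hand_model` and `adjust_previous` are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence


def rescale_by_ratio(prev_value: float, prev_reference: float, cur_reference: float) -> float:
    """Carry a value from the previous frame over to the current frame.

    The ratio prev_value / prev_reference is computed first and then
    multiplied by cur_reference, mirroring the "ratio, then product"
    wording of claims 12 and 13.
    """
    if prev_reference <= 0:
        raise ValueError("prev_reference must be positive")
    return (prev_value / prev_reference) * cur_reference


def estimate_hand_postures(
    region_kind: str,
    regions: Sequence[object],
    single_hand_model: Callable[[object], dict],
    adjust_previous: Callable[[Sequence[object]], List[dict]],
) -> List[dict]:
    """Choose a processing path based on the kind of current hand region.

    "single":    run the (hypothetical) single-hand model on every region.
    "two_hands": adjust the previous frame's posture information instead.
    """
    if region_kind == "single":
        return [single_hand_model(region) for region in regions]
    if region_kind == "two_hands":
        return list(adjust_previous(regions))
    raise ValueError(f"unknown region kind: {region_kind!r}")


if __name__ == "__main__":
    # Claim 12 style: root-node height 30 in a previous region of height 120,
    # carried over to a current region of height 160 (illustrative numbers).
    root_height = rescale_by_ratio(30.0, 120.0, 160.0)

    # Claim 13 style: hand size 50 relative to a previous frame size of 640,
    # carried over to a target frame size of 1280 (illustrative numbers).
    hand_size = rescale_by_ratio(50.0, 640.0, 1280.0)

    print(root_height, hand_size)
```

Both ratio steps reduce to the same helper because each claim forms a ratio against a previous-frame reference and multiplies it by the matching current-frame reference; only the quantities plugged in differ.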
PCT/CN2022/114436 2021-08-27 2022-08-24 Image recognition method and apparatus, and electronic device WO2023025181A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110999935.5 2021-08-27
CN202110999935.5A CN115731570A (en) 2021-08-27 2021-08-27 Image recognition method and device and electronic equipment

Publications (1)

Publication Number Publication Date
WO2023025181A1 (en) 2023-03-02

Family

ID=85290588

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/114436 WO2023025181A1 (en) 2021-08-27 2022-08-24 Image recognition method and apparatus, and electronic device

Country Status (2)

Country Link
CN (1) CN115731570A (en)
WO (1) WO2023025181A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6252598B1 (en) * 1997-07-03 2001-06-26 Lucent Technologies Inc. Video hand image computer interface
CN103593680A (en) * 2013-11-19 2014-02-19 南京大学 Dynamic hand gesture recognition method based on self incremental learning of hidden Markov model
CN108256421A (en) * 2017-12-05 2018-07-06 盈盛资讯科技有限公司 A kind of dynamic gesture sequence real-time identification method, system and device
CN112733823A (en) * 2021-03-31 2021-04-30 南昌虚拟现实研究院股份有限公司 Method and device for extracting key frame for gesture recognition and readable storage medium
CN112906646A (en) * 2021-03-23 2021-06-04 中国联合网络通信集团有限公司 Human body posture detection method and device

Also Published As

Publication number Publication date
CN115731570A (en) 2023-03-03

Similar Documents

Publication Publication Date Title
WO2021139408A1 (en) Method and apparatus for displaying special effect, and storage medium and electronic device
CN109584276B (en) Key point detection method, device, equipment and readable medium
WO2020186935A1 (en) Virtual object displaying method and device, electronic apparatus, and computer-readable storage medium
WO2022166872A1 (en) Special-effect display method and apparatus, and device and medium
JP7181375B2 (en) Target object motion recognition method, device and electronic device
CN112051961A (en) Virtual interaction method and device, electronic equipment and computer readable storage medium
WO2022007565A1 (en) Image processing method and apparatus for augmented reality, electronic device and storage medium
US11863835B2 (en) Interaction method and apparatus, and electronic device
WO2023193642A1 (en) Video processing method and apparatus, device and storage medium
WO2022183887A1 (en) Video editing method and apparatus, video playback method and apparatus, device and medium
CN111652675A (en) Display method and device and electronic equipment
WO2024027820A1 (en) Image-based animation generation method and apparatus, device, and storage medium
WO2024037556A1 (en) Image processing method and apparatus, and device and storage medium
CN111833459B (en) Image processing method and device, electronic equipment and storage medium
WO2024032752A1 (en) Method and apparatus for generating transition special effect image, device, and storage medium
TW202219822A (en) Character detection method, electronic equipment and computer-readable storage medium
WO2023193639A1 (en) Image rendering method and apparatus, readable medium and electronic device
WO2023138468A1 (en) Virtual object generation method and apparatus, device, and storage medium
WO2020155908A1 (en) Method and apparatus for generating information
US11935176B2 (en) Face image displaying method and apparatus, electronic device, and storage medium
WO2023025181A1 (en) Image recognition method and apparatus, and electronic device
CN111027495A (en) Method and device for detecting key points of human body
WO2021073204A1 (en) Object display method and apparatus, electronic device, and computer readable storage medium
CN110263743B (en) Method and device for recognizing images
CN113703704A (en) Interface display method, head-mounted display device and computer readable medium