WO2021196718A1 - Method and apparatus for key point detection, electronic device, storage medium and computer program - Google Patents

Method and apparatus for key point detection, electronic device, storage medium and computer program

Info

Publication number
WO2021196718A1
WO2021196718A1 (PCT/CN2020/135394, CN2020135394W)
Authority
WO
WIPO (PCT)
Prior art keywords
target
key point
feature map
point information
key
Prior art date
Application number
PCT/CN2020/135394
Other languages
English (en)
French (fr)
Inventor
金晟
刘文韬
钱晨
Original Assignee
北京市商汤科技开发有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京市商汤科技开发有限公司
Priority to JP2022524649A (published as JP2022553990A)
Publication of WO2021196718A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/40 - Extraction of image or video features
    • G06V 10/46 - Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V 10/462 - Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V 10/464 - Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/25 - Fusion techniques
    • G06F 18/253 - Fusion techniques of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks

Definitions

  • the present disclosure relates to the field of computer vision technology, and in particular, to a method and device for key point detection, electronic equipment, storage media, and computer programs.
  • key point detection has played a vital role in video analysis.
  • it is possible to identify the target object by detecting the key points of the target object’s face in the video or image.
  • at present, in application scenarios such as virtual reality (VR) and augmented reality (AR), it is necessary to detect a variety of key points of the target object to improve the authenticity of the displayed target object; for example, the multiple key points may include body key points, gesture key points, facial key points, and so on.
  • the present disclosure provides a method for key point detection, including:
  • the key position information includes first key point information on the target object, and position point information of a detection frame corresponding to at least one target part of the target object;
  • object key point information of the target object is determined.
  • the present disclosure proposes two-stage key point detection. After the target object is located in the target image, the first key point information of the target object and the detection frame of at least one target part are located through the first key point detection. Then, a more fine-grained second key point detection is performed for the image area of each target part, so that more accurate object key point information of the target object can be obtained.
  • the performing first key point detection on the target image to obtain key position information of the target object includes:
  • performing, based on the position point information of the detection frame corresponding to each target part of the target object, second key point detection on the image area of each target part in the target image to obtain the second key point information corresponding to each target part includes:
  • second key point information corresponding to each target part is determined.
  • the second feature map corresponding to each target part is intercepted from the first feature map obtained after convolution processing on the target image, and the second key point detection is performed on the basis of the second feature map. Compared with intercepting the image corresponding to each target part from the target image, and then processing it, the number of feature processing can be reduced, and the calculation amount of key point detection can be reduced.
  • the target feature map is subjected to convolution processing to determine target key point information according to the following steps, wherein, in the case that the target feature map is the first feature map, the target key point information is the key position information of the target object, and in the case that the target feature map is the second feature map, the target key point information is the second key point information corresponding to the target part:
  • the target key point information is determined.
  • by performing multiple feature processing on the target feature map, multiple intermediate feature maps of different sizes are generated, where intermediate feature maps of different sizes correspond to different receptive fields; the multiple intermediate feature maps are then fused to obtain a fusion feature map, which can include features corresponding to the intermediate feature maps of different sizes; the target key point information is then determined based on the fusion feature map, so that the accuracy of key point detection can be improved.
  • the performing multiple feature processing on the target feature map includes: performing the current feature processing according to the following steps:
  • At least one level of convolution processing is performed respectively to obtain convolution feature maps of different sizes
  • the convolution feature maps of different sizes are subjected to multiple fusion processing to obtain feature maps of different sizes after the current feature processing.
  • at least one level of convolution processing and multiple fusion processing are performed on the feature maps of different sizes to obtain feature maps of different sizes after the current feature processing, wherein the feature maps of different sizes have different receptive fields and include different feature information; that is, the feature maps of different sizes together carry more feature information, which provides more features for the subsequent detection of the first key point information or the second key point information and thereby improves the accuracy of key point detection.
  • the first feature map includes multi-level first feature maps, first feature maps of different levels are obtained through different levels of convolution processing, and intercepting, based on the position point information of the detection frame corresponding to each target part of the target object, the second feature map corresponding to each target part from the first feature map includes:
  • the target object includes a person, and the first key point is at least distributed on the limbs and head of the person;
  • the number of the first key points ranges from 5 to 25.
  • the target part includes at least one of a person's face, feet, and hands;
  • the second key points corresponding to the face are distributed at least in at least one area of the facial contour, eyes, eyebrows, nose, and lips of the face;
  • the second key point corresponding to the foot is at least distributed in at least one area of at least one toe, the sole of the foot, and the heel of the foot;
  • the second key points corresponding to the hand are distributed at least on at least one finger of the hand and at least one area of the palm of the hand.
  • fine-grained key point detection can be performed on different target parts based on the detection requirements in different application scenarios.
  • the number of second key points on the contour of the face ranges from 0 to 25, the number of second key points on each eye ranges from 0 to 10, the number of second key points on each eyebrow ranges from 0 to 10, the number of second key points on the nose ranges from 0 to 15, and the number of second key points on the lips ranges from 0 to 15;
  • the target part includes a foot
  • the foot includes a left foot and/or a right foot
  • the number of second key points of any one of the feet ranges from 1 to 10
  • the hand includes a left hand and/or a right hand; the number of second key points of any one of the hands ranges from 1-25.
  • the method further includes:
  • the key point information of the object can be used to more accurately determine the action category information of the target object or construct a three-dimensional model of the target object.
  • the method further includes:
  • the method further includes:
  • the gesture of the target object and the category corresponding to the gesture are determined.
  • the object key point information can be used to more accurately determine the facial expression category of the target object or determine the gesture and gesture category of the target object.
  • the present disclosure provides a device for detecting key points, including:
  • An image determination module for determining a target image including the target object
  • the first detection module is configured to perform first key point detection on the target image to obtain key position information of the target object;
  • the key position information includes the first key point information on the target object, and the Position point information of the detection frame corresponding to at least one target part of the target object;
  • the second detection module is configured to perform second key point detection on the image area of each target part in the target image based on the position point information of the detection frame corresponding to each target part of the target object , Obtain the second key point information corresponding to each of the target parts;
  • the key point determination module is configured to determine the object key point information of the target object based on the first key point information and the second key point information.
  • when the first detection module performs first key point detection on the target image to obtain key position information of the target object, it is configured to:
  • when the second detection module performs second key point detection on the image area of each target part in the target image based on the position point information of the detection frame corresponding to each target part of the target object, and obtains the second key point information corresponding to each target part, it is configured to:
  • second key point information corresponding to each target part is determined.
  • the first detection module and the second detection module are respectively configured to perform convolution processing on the target feature map according to the following steps to determine target key point information, where, in the case that the target feature map is the first feature map, the target key point information is the key position information of the target object and the first detection module executes the following steps, and in the case that the target feature map is the second feature map, the target key point information is the second key point information corresponding to the target part and the second detection module executes the following steps:
  • the target key point information is determined.
  • the first detection module and the second detection module are respectively used for performing the current feature processing according to the following steps when performing multiple feature processing on the target feature map:
  • At least one level of convolution processing is performed respectively to obtain convolution feature maps of different sizes
  • the convolution feature maps of different sizes are subjected to multiple fusion processing to obtain feature maps of different sizes after the current feature processing.
  • the first feature map includes multi-level first feature maps
  • the first feature maps of different levels are obtained through different levels of convolution processing
  • when the second detection module intercepts, based on the position point information of the detection frame corresponding to each target part of the target object, the second feature map corresponding to each target part from the first feature map, it is configured to:
  • the target object includes a person, and the first key point is at least distributed on the limbs and head of the person;
  • the number of the first key points ranges from 5 to 25.
  • the target part includes at least one of a person's face, feet, and hands;
  • the second key points corresponding to the face are distributed at least in at least one area of the facial contour, eyes, eyebrows, nose, and lips of the face;
  • the second key point corresponding to the foot is at least distributed in at least one area of at least one toe, the sole of the foot, and the heel of the foot;
  • the second key points corresponding to the hand are distributed at least on at least one finger of the hand and at least one area of the palm of the hand.
  • the number of second key points on the contour of the face ranges from 0 to 25, the number of second key points on each eye ranges from 0 to 10, the number of second key points on each eyebrow ranges from 0 to 10, the number of second key points on the nose ranges from 0 to 15, and the number of second key points on the lips ranges from 0 to 15;
  • the target part includes a foot
  • the foot includes a left foot and/or a right foot
  • the number of second key points of any one of the feet ranges from 1 to 10
  • the hand includes a left hand and/or a right hand; the number of second key points of any one of the hands ranges from 1-25.
  • the device further includes:
  • a determining module configured to determine the action category information of the target object based on the determined key point information of the object
  • the construction module is used to construct a three-dimensional model of the target object based on the determined key point information of the object.
  • the device further includes:
  • An expression recognition module configured to determine the facial expression category of the target object based on the determined key point information of the object
  • the device also includes:
  • the gesture recognition module is configured to determine the gesture of the target object and the category corresponding to the gesture based on the determined key point information of the object.
  • the present disclosure provides an electronic device, including a processor, a memory, and a bus.
  • the memory stores machine-readable instructions executable by the processor.
  • when the electronic device is running, the processor and the memory communicate through the bus, and when the machine-readable instructions are executed by the processor, the processor is caused to execute the method for detecting key points as described in the first aspect or any one of the embodiments.
  • the present disclosure provides a computer-readable storage medium with a computer program stored on the computer-readable storage medium.
  • when the computer program is executed by a processor, the processor can execute the key point detection method described in the first aspect or any one of the above embodiments.
  • the present disclosure provides a computer program that, when the computer program is executed by a processor, causes the processor to execute the method for detecting key points as described in the first aspect or any one of the embodiments.
  • FIG. 1 shows a schematic flowchart of a method for key point detection provided by an embodiment of the present disclosure
  • FIG. 2 shows a schematic flowchart of a specific method for determining target key point information in a method for key point detection provided by an embodiment of the present disclosure
  • FIG. 3 shows a schematic flowchart of a specific method for performing multiple feature processing on a target feature map in a method for detecting key points provided by an embodiment of the present disclosure
  • FIG. 4 shows a schematic structural diagram of a key point detection neural network provided by an embodiment of the present disclosure
  • FIG. 5 shows a schematic structural diagram of a key point detection device provided by an embodiment of the present disclosure
  • Fig. 6 shows a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
  • by detecting the key points of the target object, the actions, expressions, and gestures of the target object can be recognized.
  • different convolutional neural networks can be used to detect key points of different parts.
  • for example, the first convolutional neural network can be used to detect the key points of the target object's limbs, the second convolutional neural network can be used to detect the key points of the target object's face, and the key points of the target object's hands can be detected through the third convolutional neural network.
  • a large number of convolutional neural network models are required, which makes the calculation amount in the key point detection process large, which in turn leads to low detection efficiency of the key points.
  • alternatively, the detection of multiple key points of the target object can be achieved with a single key point neural network, through which the key points of the limbs, face, hands, etc. can be obtained; however, the accuracy of the hand and face key points obtained in this way is poor.
  • embodiments of the present disclosure provide a method for key point detection.
  • FIG. 1 is a schematic flowchart of a method for key point detection provided by an embodiment of the present disclosure.
  • the method includes:
  • Step S101 Determine a target image including the target object
  • Step S102 Perform first key point detection on the target image to obtain key position information of the target object;
  • the key position information includes the first key point information on the target object and the position point information of the detection frame corresponding to at least one target part of the target object;
  • Step S103 based on the position point information of the detection frame corresponding to each target part in at least one target part of the target object, perform second key point detection on the image area of each target part in the target image to obtain each target part Corresponding second key point information;
  • Step S104 Determine the object key point information of the target object based on the first key point information and the second key point information.
  • the present disclosure proposes two-stage key point detection. After the target object is located in the target image, the first key point information of the target object and the detection frame of at least one target part are located through the first key point detection. Then, a more fine-grained second key point detection is performed for the image area of each target part, so that more accurate object key point information of the target object can be obtained.
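  • as an illustration only, the following sketch shows one way the two-stage flow of steps S101 to S104 could be organized; the callables first_stage and part_nets, the box format, and the part input size are assumptions for this example rather than details fixed by the disclosure.

```python
import torch.nn.functional as F

def detect_object_keypoints(target_image, first_stage, part_nets, part_input_size=(64, 64)):
    """Minimal sketch of steps S101-S104 (hypothetical network callables)."""
    # S102: first key point detection on the whole target image
    first_keypoints, part_boxes = first_stage(target_image)  # part_boxes: {part: (x1, y1, x2, y2)}

    # S103: fine-grained second key point detection on each target part's image area
    second_keypoints = {}
    for part, (x1, y1, x2, y2) in part_boxes.items():
        region = target_image[..., int(y1):int(y2), int(x1):int(x2)]   # image area of the target part
        region = F.interpolate(region, size=part_input_size,
                               mode="bilinear", align_corners=False)    # resize to a preset size
        second_keypoints[part] = part_nets[part](region)

    # S104: object key point information = first key points + per-part second key points
    return {"body": first_keypoints, **second_keypoints}
```

  • cropping the image region per part, as in this sketch, is the straightforward reading of step S103; the feature-map interception described below reduces repeated feature extraction.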
  • the target object may be a person, an animal, etc., that is, the target image may be an image including a person, an animal, and the like.
  • the initial image may be obtained and determined as the target image, where the initial image may include one or more target objects; alternatively, object detection may be performed on the initial image through an object detection neural network to obtain the detection frame of each target object included in the initial image, and the region image corresponding to each target object is then intercepted from the initial image according to the detection frame of that target object.
  • the area image corresponding to each target object may be used as the target image corresponding to the target object, or the size of the area image corresponding to each target object may also be adjusted to the first preset size, and the adjusted size The area image is used as the target image corresponding to each target object.
  • regarding step S102 and step S103:
  • the target image can be detected by the key point detection neural network to obtain the object key point information corresponding to the target image.
  • the key point detection neural network may include a first key point detection network, at least one second key point detection network, and so on.
  • the first key point detection network in the key point detection neural network can perform first key point detection on the target image to obtain key position information of the target object.
  • the key position information includes the first key point information on the target object and Location point information of the detection frame corresponding to at least one target part of the target object.
  • the first key point information may include, but is not limited to, the coordinate position of each key point of the target object in the image coordinate system; the position point information of the detection frame corresponding to a target part may include, but is not limited to, the coordinate position, in the image coordinate system, of at least one position point of the detection frame corresponding to that target part.
  • the first key point may be distributed at least on the limbs and head of the person.
  • the target part may include at least one of the person's face, feet, and hands; wherein, in the case that the target part includes the face, the second key points corresponding to the face are distributed at least in at least one area of the facial contour, eyes, eyebrows, nose, and lips of the face;
  • the second key point corresponding to the foot is distributed at least in at least one area of at least one toe, the sole of the foot and the heel of the foot;
  • the second key point corresponding to the hand is distributed at least on at least one finger of the hand and at least one area of the palm of the hand.
  • the type and number of the target part can be determined according to the actual situation.
  • for example, the target parts can include the face and the hands, or the face and the feet, or the face, the hands, and the feet.
  • each target part may have a corresponding second key point detection network.
  • the type of the second key point detection network used may be determined according to the situation of the target part.
  • the target part may include the face, feet, and hands of the person
  • the at least one second key point detection network may include the second key point detection network for the face, the second key point detection network for the feet, and the second key point detection network for the hands.
  • furthermore, the image area corresponding to the face can be determined based on the position point information of the detection frame corresponding to the face, and key point detection is then performed on that image area through the face second key point detection network to obtain the second key point information corresponding to the face on the target object.
  • the foot second key point detection network may be a left-foot second key point detection network and/or a right-foot second key point detection network, and the hand second key point detection network may be a left-hand second key point detection network and/or a right-hand second key point detection network.
  • for example, the image area corresponding to the left hand can be determined based on the position point information of the detection frame corresponding to the left hand, and key point detection is then performed on that image area through the left-hand second key point detection network to obtain the second key point information corresponding to the left hand on the target object; for the right hand, the image area corresponding to the right hand is determined based on the position point information of the detection frame corresponding to the right hand, the image area is horizontally flipped, the horizontally flipped image area is input into the left-hand second key point detection network to obtain second key point information corresponding to the flipped image area, and the obtained second key point information is then horizontally flipped back to obtain the second key point information corresponding to the right hand.
  • the process of determining the second key point information of the foot can refer to the process of determining the second key point information of the hand, which will not be repeated here.
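  • a minimal sketch of this flip trick follows, assuming the left-hand network returns (N, K, 2) key point coordinates in pixel units of its input; the names and output format are illustrative only.

```python
import torch

def detect_right_hand_keypoints(right_hand_region, left_hand_net):
    """Reuse a left-hand second key point detection network for the right hand."""
    flipped = torch.flip(right_hand_region, dims=[-1])   # horizontally flip the right-hand image area
    keypoints = left_hand_net(flipped)                   # assumed shape (N, K, 2): (x, y) per key point
    width = right_hand_region.shape[-1]
    keypoints = keypoints.clone()
    keypoints[..., 0] = width - 1 - keypoints[..., 0]    # flip the x-coordinates back
    return keypoints
```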
  • performing the first key point detection on the target image to obtain the key position information of the target object includes: performing the first convolution processing on the target image to obtain the first feature map; based on the first feature map, Determine the key location information of the target object.
  • performing second key point detection on the image area of each target part in the target image to obtain the second key point information corresponding to each target part includes: intercepting, based on the position point information of the detection frame corresponding to each target part of the target object, the second feature map corresponding to each target part from the first feature map; and determining, based on the second feature map, the second key point information corresponding to each target part.
  • the size of the obtained second feature map can be adjusted to a second preset size to obtain an adjusted second feature map; based on the adjusted second feature map, the second key point information corresponding to each target part is determined.
  • the key point detection neural network may also include at least one level of convolutional neural network, and the first convolution process is performed on the target image through the at least one level of convolutional neural network included in the key point detection neural network to obtain the first feature map,
  • the first feature map is input into the first key point detection network to obtain key position information of the target object.
  • for example, in the case that the target parts include the face, the hands, and the feet, the obtained key position information includes the first key point information, the position point information of the detection frame corresponding to the face, the position point information of the detection frame corresponding to the hands, and the position point information of the detection frame corresponding to the feet; the position point information of a detection frame may include the position information of the four vertices of the detection frame and/or the position information of its center point, and the like.
  • for each target part, a second feature map corresponding to the target part can be intercepted from the first feature map, and the second feature map can then be input into the second key point detection network corresponding to that target part for key point detection to obtain the second key point information corresponding to the target part. For example, based on the position point information of the detection frame corresponding to the face, the second feature map corresponding to the face is intercepted from the first feature map and input into the face second key point detection network for face key point detection, to obtain the second key point information corresponding to the face.
  • the second feature map corresponding to each target location may be intercepted from the first feature map based on the position point information of the detection frame corresponding to each target location of the target object and the RoIAlign technology.
  • the RoIAlign technique can be used to determine, on the first feature map, the target position information corresponding to each position point of the detection frame, and the second feature map corresponding to the target part is then intercepted from the first feature map based on the determined target position information.
  • the second feature map corresponding to each target part is intercepted from the first feature map obtained after convolution processing on the target image, and the second key point detection is performed on the basis of the second feature map. Compared with intercepting the image corresponding to each target part from the target image, and then processing it, the number of feature processing can be reduced, and the calculation amount of key point detection can be reduced.
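  • the following sketch shows how such interception could be done with RoIAlign from torchvision; the stride (image-to-feature-map scale), output size, and box format are assumptions for illustration.

```python
import torch
from torchvision.ops import roi_align

def intercept_part_feature_maps(first_feature_map, part_boxes, stride=4, out_size=(32, 32)):
    """Crop one second feature map per target part from the first feature map."""
    # part_boxes: iterable of (x1, y1, x2, y2) in image coordinates; prepend the batch
    # index expected by roi_align and let spatial_scale map boxes onto the feature map.
    rois = torch.tensor([[0.0, *box] for box in part_boxes],
                        dtype=first_feature_map.dtype, device=first_feature_map.device)
    return roi_align(first_feature_map, rois, output_size=out_size,
                     spatial_scale=1.0 / stride, aligned=True)
```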
  • the first feature map may include multiple levels of first feature maps, and the first feature maps of different levels are obtained through different levels of convolution processing.
  • the at least one-level convolutional neural network included in the key point detection neural network can be a three-level convolutional neural network, that is, the first-level convolutional neural network, the second-level convolutional neural network, and the third-level convolutional neural network.
  • the target image can be sequentially input to the first-level convolutional neural network and the second-level convolutional neural network for convolution processing to obtain the first-level first feature map, and then input the first-level first feature map to the third Convolution processing is performed in the first-level convolutional neural network to obtain the second-level first feature map.
  • the number of stages of the at least one level of convolutional neural network included in the key point detection neural network can be set according to actual needs.
  • for example, the at least one level of convolutional neural network included in the key point detection neural network can be a five-level convolutional neural network, a ten-level convolutional neural network, etc.; the number of convolutions used to obtain the first-level first feature map and the number of convolutions used to obtain the second-level first feature map can likewise be set according to actual needs.
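  • purely as an illustration of such staging, the snippet below builds a first-level and a second-level first feature map with three hypothetical convolution stages; the channel counts and strides are arbitrary choices, not values given by the disclosure.

```python
import torch.nn as nn

stage_1_2 = nn.Sequential(                       # stages one and two produce the first-level map F1
    nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU())
stage_3 = nn.Sequential(                         # stage three turns F1 into the second-level map F2
    nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU())

def multi_level_first_feature_maps(target_image):
    f1 = stage_1_2(target_image)   # first-level first feature map
    f2 = stage_3(f1)               # second-level first feature map
    return f1, f2
```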
  • intercepting the second feature map corresponding to each target part from the first feature map may include: intercepting, based on the position point information of the detection frame corresponding to each target part of the target object, the second feature map corresponding to each target part from the first feature maps of different levels in the multi-level first feature map, respectively.
  • the multi-level first feature map includes the first level first feature map and the second level first feature map
  • based on the position point information of the detection frame corresponding to each target part of the target object and the RoIAlign technique, the target location information of that detection frame on the first-level first feature map and on the second-level first feature map can be determined; based on the target location information on the first-level first feature map, the first-level second feature map corresponding to the target part is intercepted from the first-level first feature map, and based on the target location information on the second-level first feature map, the second-level second feature map corresponding to the target part is intercepted from the second-level first feature map.
  • the target feature map can be convolved to determine the target key point information according to the following steps, where, in the case that the target feature map is the first feature map, the target key point information is the key position information of the target object, and in the case that the target feature map is the second feature map, the target key point information is the second key point information corresponding to the target part:
  • Step S201 Perform multiple feature processing on the target feature map to generate multiple intermediate feature maps with different sizes.
  • Step S202 Perform fusion processing on a plurality of intermediate feature maps to obtain a fusion feature map.
  • Step S203 Determine target key point information based on the fusion feature map.
  • the size of the multiple intermediate feature maps can match the preset ratio.
  • for example, if the multiple intermediate feature maps include three intermediate feature maps and the preset ratio is 1:2:4, then the size ratio of the three intermediate feature maps can be 1:2:4.
  • the sizes of multiple intermediate feature maps can be adjusted to be consistent through a convolutional neural network, and then the multiple intermediate feature maps after the size adjustment are fused to obtain a fused feature map. Further, the fusion feature map is analyzed and processed to obtain target key point information.
  • by performing multiple feature processing on the target feature map, multiple intermediate feature maps of different sizes are generated, where intermediate feature maps of different sizes correspond to different receptive fields; the multiple intermediate feature maps are then fused to obtain a fusion feature map, which can include features corresponding to the intermediate feature maps of different sizes; the target key point information is then determined based on the fusion feature map, so that the accuracy of key point detection can be improved.
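  • a sketch of steps S201 to S203 is given below, assuming heatmap-style key point output and arbitrary channel sizes and scale factors; it is meant only to make the resize-then-fuse idea concrete.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleKeypointHead(nn.Module):
    """Generate intermediate feature maps at several sizes, fuse them, and
    predict one heatmap per key point from the fused feature map."""

    def __init__(self, in_channels=256, num_keypoints=21, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_channels, in_channels, 3, padding=1) for _ in scales])
        self.fuse = nn.Conv2d(in_channels * len(scales), in_channels, 1)
        self.head = nn.Conv2d(in_channels, num_keypoints, 1)

    def forward(self, target_feature_map):
        h, w = target_feature_map.shape[-2:]
        # S201: multiple feature processing -> intermediate feature maps of different sizes
        intermediates = []
        for scale, branch in zip(self.scales, self.branches):
            x = F.interpolate(target_feature_map, scale_factor=1.0 / scale,
                              mode="bilinear", align_corners=False)  # smaller map, larger receptive field
            intermediates.append(branch(x))
        # S202: resize the intermediate maps to a common size and fuse them
        resized = [F.interpolate(x, size=(h, w), mode="bilinear", align_corners=False)
                   for x in intermediates]
        fused = self.fuse(torch.cat(resized, dim=1))
        # S203: target key point information as per-key-point heatmaps
        return self.head(fused)
```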
  • performing multiple feature processing on the target feature map includes: performing the current feature processing according to the following steps:
  • Step S301: at least one level of convolution processing is performed respectively on the feature maps of different sizes before the current feature processing, to obtain convolution feature maps of different sizes;
  • Step S302 Perform multiple fusion processing on convolution feature maps of different sizes to obtain feature maps of different sizes after the current feature processing.
  • Step S301 will be described.
  • the size of the convolution feature map obtained after at least one level of convolution processing and the size of the feature map before at least one level of convolution processing may be the same or different.
  • the sizes of convolution feature maps of different sizes obtained after at least one level of convolution processing also have a proportional relationship.
  • Step S302 is described.
  • the convolution feature maps of different sizes include a first convolution feature map of a first size, a second convolution feature map of a second size, and a third convolution feature map of a third size.
  • performing multiple fusion processing on the convolution feature maps of different sizes may include: adjusting the sizes of the second convolution feature map and the third convolution feature map to the first size, and performing feature fusion processing on the first convolution feature map, the size-adjusted second convolution feature map, and the size-adjusted third convolution feature map to obtain the feature map of the first size after the current feature processing; similarly, the first convolution feature map and the third convolution feature map can be respectively adjusted in size and fused with the second convolution feature map to obtain the feature map of the second size after the current feature processing, and the feature map of the third size can be obtained in the same way.
  • at least one level of convolution processing and multiple fusion processing are performed on the feature maps of different sizes to obtain feature maps of different sizes after the current feature processing, wherein the feature maps of different sizes have different receptive fields and include different feature information; that is, the feature maps of different sizes together carry more feature information, which provides more features for the subsequent detection of the first key point information or the second key point information and thereby improves the accuracy of key point detection.
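  • the following sketch illustrates one round of this per-size convolution followed by multiple fusion processing, for two input sizes and three output sizes (mirroring the F41/F42 to F61/F62/F63 example later in the text); the channel count and the concatenation-plus-1x1-convolution fusion are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CurrentFeatureProcessing(nn.Module):
    """One round of step S301 (per-size convolution) and step S302 (multiple fusion)."""

    def __init__(self, channels=256):
        super().__init__()
        self.conv_hi = nn.Conv2d(channels, channels, 3, padding=1)  # high-resolution branch
        self.conv_lo = nn.Conv2d(channels, channels, 3, padding=1)  # low-resolution branch
        self.fuse = nn.ModuleList([nn.Conv2d(2 * channels, channels, 1) for _ in range(3)])

    def forward(self, feat_hi, feat_lo):
        # S301: at least one level of convolution processing per size
        conv_hi, conv_lo = self.conv_hi(feat_hi), self.conv_lo(feat_lo)

        def resize(x, hw):
            return F.interpolate(x, size=hw, mode="bilinear", align_corners=False)

        hi_hw, lo_hw = conv_hi.shape[-2:], conv_lo.shape[-2:]
        xs_hw = (lo_hw[0] // 2, lo_hw[1] // 2)  # an extra, even smaller output size

        # S302: multiple fusion processing -> feature maps of three different sizes
        out_hi = self.fuse[0](torch.cat([conv_hi, resize(conv_lo, hi_hw)], dim=1))
        out_lo = self.fuse[1](torch.cat([resize(conv_hi, lo_hw), conv_lo], dim=1))
        out_xs = self.fuse[2](torch.cat([resize(conv_hi, xs_hw), resize(conv_lo, xs_hw)], dim=1))
        return out_hi, out_lo, out_xs
```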
  • the process of the key point detection method is illustrated by an example.
  • the target image can be detected through the key point detection neural network to obtain the object key point information corresponding to the target image.
  • the structure diagram of the key point detection neural network is shown in FIG. 4.
  • the key point detection neural network includes a first key point detection network 41, a face second key point detection network 42, and a hand second key point detection network 43.
  • the target image F0 is input into the key point detection neural network, feature extraction is performed on the target image F0 through at least one level of convolutional neural network to obtain the first-level first feature map F1, and feature extraction is then performed on the first-level first feature map F1 through at least one level of convolutional neural network to obtain the second-level first feature map F2.
  • the size of the first-level first characteristic map F1 and the second-level first characteristic map F2 may be the same or different.
  • the second-level first feature map F2 is input into the first key point detection network 41; feature extraction is performed on F2 through at least one level of convolutional neural network to obtain the feature map F3; at least one level of convolution processing is performed on the feature map F3 to obtain the feature map F41, and down-sampling or convolution processing is performed on the feature map F3 to obtain the feature map F42, where the sizes of the feature map F41 and the feature map F42 are proportional, for example, the ratio between the size of F41 and the size of F42 may be 2:1.
  • at least one level of convolution processing is then performed on the feature map F41 and the feature map F42 respectively to obtain the convolution feature map F51 and the convolution feature map F52; the size of the convolution feature map F51 may be the same as the size of the feature map F41, and the size of the convolution feature map F52 may be the same as the size of the feature map F42.
  • the convolution feature map F51 and the convolution feature map F52 are subjected to a variety of fusion processing to obtain the feature map F61, the feature map F62, and the feature map F63.
  • the size of the feature map F61 can be the same as the size of the convolution feature map F51.
  • the size of the feature map F62 can be the same as the size of the convolution feature map F52; the size ratio among the feature map F61, the feature map F62, and the feature map F63 can be 4:2:1.
  • the process of multiple fusion processing may be: adjusting the size of the convolution feature map F52 so that the adjusted F52 has the same size as the convolution feature map F51, and performing feature fusion processing on F51 and the size-adjusted F52 to obtain the feature map F61; adjusting the size of the convolution feature map F51 so that the adjusted F51 has the same size as the convolution feature map F52, and performing feature fusion processing on F52 and the size-adjusted F51 to obtain the feature map F62; and adjusting the sizes of both the convolution feature map F51 and the convolution feature map F52 to a preset size (i.e., the size corresponding to the feature map F63), and performing feature fusion processing on the size-adjusted F51 and F52 to obtain the feature map F63.
  • the ways of adjusting the size of a feature map include, but are not limited to, up-sampling, down-sampling, convolution processing, etc.; the feature fusion processing may fuse feature maps in a cascaded manner, fuse feature maps through a convolutional neural network, or cascade the feature maps and then input them into a convolutional neural network for fusion.
  • there are many ways of adjusting the size of a feature map and of performing feature fusion processing, which are not specifically limited here.
  • the process of obtaining the convolution feature map F71, the convolution feature map F72, and the convolution feature map F73 can refer to the process of obtaining the convolution feature maps F51 and F52, and will not be repeated here; the process of performing multiple fusion processing on the convolution feature maps F71, F72, and F73 to obtain the feature maps F81, F82, F83, and F84 can refer to the process of obtaining the feature maps F61, F62, and F63, and will not be repeated here.
  • the feature map F81, feature map F82, feature map F83, and feature map F84 are respectively subjected to at least one level of convolution processing to obtain the corresponding intermediate feature maps, the intermediate feature maps are then subjected to feature fusion processing to obtain the fusion feature map, and finally, based on the fusion feature map, the key position information is determined, where the key position information includes the first key point information, the position point information of the detection frame corresponding to the face, and the position point information of the detection frame corresponding to the hand.
  • based on the position point information of the detection frame corresponding to the hand, the first-level second feature map F12 and the second-level second feature map F22 corresponding to the hand can be intercepted from the first-level first feature map F1 and the second-level first feature map F2, respectively; the first-level second feature map F12 and the second-level second feature map F22 corresponding to the hand are then input into the hand second key point detection network 43 for processing to obtain the second key point information of the hand; the processing procedure of the hand second key point detection network 43 can refer to the processing procedure of the first key point detection network 41 and will not be repeated here.
  • similarly, the first-level second feature map F13 and the second-level second feature map F23 corresponding to the face can be obtained; the first-level second feature map F13 and the second-level second feature map F23 corresponding to the face are input into the face second key point detection network 42 for processing to obtain the second key point information of the face; the processing procedure of the face second key point detection network 42 can refer to the processing procedure of the first key point detection network 41 and will not be repeated here.
  • the structures of the first key point detection network 41, the face second key point detection network 42, and the hand second key point detection network 43 are only exemplary descriptions.
  • the object key point information of the target object includes the first key point information and the second key point information corresponding to each target part.
  • the number of first key points may range from 5 to 25; when the target part includes a face, the number of second key points on the contour of the face included in the face may range from 0 to 25, each The number of second key points on the eyes can range from 0 to 10, the number of second key points on each eyebrow can range from 0 to 10, and the number of second key points on the nose can range from 0 to 15.
  • the number of second key points on the lips can range from 0 to 15; in the case that the target part includes the foot, the foot includes the left foot and/or the right foot, and the number of second key points of any foot can range from 1 to 10; in the case that the target part includes the hand, the hand includes the left hand and/or the right hand, and the number of second key points of any hand can range from 1 to 25.
  • the number of first key points and the number of second key points corresponding to each target part can be determined according to the actual detection scene and the requirements for detection accuracy.
  • the number of the second key points corresponding to the face may be 6, which may be distributed on the facial features of the face, that is, on the eyes, eyebrows, nose, and lips of the face.
  • in the case that the target part includes the foot, the number of second key points of any foot can be two, distributed on the heel and the middle toe; that is, the number of second key points of the left foot may be two, and/or the number of second key points of the right foot may be two.
  • the hand includes the left hand and/or the right hand
  • the number of second key points on any hand can be six, which are distributed in the center of the palm and on the fingertips of each finger; that is, The number of second key points for the left hand may be six, and/or the number of second key points for the right hand may be six.
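  • as a concrete (hypothetical) example of how such counts might be recorded in an application, the configuration below echoes the numbers mentioned above; the part names and the body count of 17 are illustrative choices within the stated ranges.

```python
# Hypothetical key point counts per part, within the ranges described above.
KEYPOINT_COUNTS = {
    "body": 17,                        # first key points on the limbs and head (range 5-25)
    "face": 6,                         # one per facial feature: two eyes, two eyebrows, nose, lips
    "left_foot": 2, "right_foot": 2,   # heel and middle toe
    "left_hand": 6, "right_hand": 6,   # palm center and the five fingertips
}
```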
  • the method further includes:
  • the action category information of the target object is determined, or, based on the determined key point information of the object, a three-dimensional model of the target object is constructed.
  • the object key point information can be input into the motion detection neural network to obtain the action category information of the target object.
  • the key point information of the object can be input into the 3D model construction software to construct a 3D model of the target object.
  • the key point information of the object can be used to more accurately determine the action category information of the target object or construct a three-dimensional model of the target object.
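  • as one hedged example of such downstream use, a small classifier over the flattened key point coordinates could look like the sketch below; the layer sizes, key point count, and action categories are assumptions, and the motion detection neural network mentioned above is not specified by the disclosure.

```python
import torch.nn as nn

class ActionClassifier(nn.Module):
    """Map flattened (x, y) object key point coordinates to action category scores."""

    def __init__(self, num_keypoints=39, num_actions=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_keypoints * 2, 256), nn.ReLU(),
            nn.Linear(256, num_actions))

    def forward(self, object_keypoints):        # object_keypoints: (N, num_keypoints, 2)
        return self.net(object_keypoints.flatten(1))
```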
  • the method further includes:
  • the facial expression category of the target object is determined.
  • the determined key point information of the object can be input into the facial recognition neural network, and the facial expression category of the target object can be recognized.
  • the method further includes:
  • the gesture of the target object and the category corresponding to the gesture are determined.
  • the determined key point information of the object can be input into the gesture recognition neural network, and the gesture and gesture category of the target object can be recognized.
  • the object key point information can be used to more accurately determine the facial expression category of the target object and/or determine the gesture and gesture category of the target object.
  • the embodiments of the present disclosure also provide a device for detecting key points.
  • the device includes an image determining module 501.
  • the image determining module 501 is used to determine a target image including a target object
  • the first detection module 502 is configured to perform first key point detection on the target image to obtain key position information of the target object;
  • the key position information includes the first key point information on the target object, and
  • the position point information of the detection frame corresponding to at least one target part of the target object;
  • the second detection module 503 is configured to perform a second key point on the image area of each target part in the target image based on the position point information of the detection frame corresponding to each target part of the target object Detecting to obtain the second key point information corresponding to each of the target parts;
  • the key point determination module 504 is configured to determine the object key point information of the target object based on the first key point information and the second key point information.
  • the first detection module 502 when performing first key point detection on the target image to obtain key position information of the target object, is configured to:
  • when the second detection module 503 performs second key point detection on the image area of each target part in the target image based on the position point information of the detection frame corresponding to each target part of the target object, and obtains the second key point information corresponding to each target part, it is configured to:
  • second key point information corresponding to each target part is determined.
  • the first detection module 502 and the second detection module 503 are respectively configured to perform convolution processing on the target feature map according to the following steps to determine target key point information, where, in the case that the target feature map is the first feature map, the target key point information is the key position information of the target object and the first detection module 502 performs the following steps, and in the case that the target feature map is the second feature map, the target key point information is the second key point information corresponding to the target part and the second detection module 503 performs the following steps:
  • the target key point information is determined.
  • the first detection module 502 and the second detection module 503 are respectively used for performing the current feature processing according to the following steps when performing multiple feature processing on the target feature map :
  • At least one level of convolution processing is performed respectively to obtain convolution feature maps of different sizes
  • the convolution feature maps of different sizes are subjected to multiple fusion processing to obtain feature maps of different sizes after the current feature processing.
  • the first feature map includes multi-level first feature maps
  • the first feature maps of different levels are obtained through different levels of convolution processing
  • when the second detection module 503 intercepts, based on the position point information of the detection frame corresponding to each target part of the target object, the second feature map corresponding to each target part from the first feature map, it is configured to:
  • the target object includes a person
  • the first key point may be at least distributed on the limbs and head of the person
  • the number of the first key points ranges from 5 to 25.
  • the target part may include at least one of a person's face, feet, and hands;
  • the second key points corresponding to the face are distributed at least in at least one area of the facial contour, eyes, eyebrows, nose, and lips of the face;
  • the second key point corresponding to the foot is at least distributed in at least one area of at least one toe, the sole of the foot, and the heel of the foot;
  • the second key points corresponding to the hand are distributed at least on at least one finger of the hand and at least one area of the palm of the hand.
  • the number of second key points on the contour of the face ranges from 0 to 25, the number of second key points on each eye ranges from 0 to 10, the number of second key points on each eyebrow ranges from 0 to 10, the number of second key points on the nose ranges from 0 to 15, and the number of second key points on the lips ranges from 0 to 15;
  • the target part includes a foot
  • the foot includes a left foot and/or a right foot
  • the number of second key points of any one of the feet ranges from 1 to 10
  • the hand includes a left hand and/or a right hand; the number of second key points of any one of the hands ranges from 1-25.
  • the device further includes:
  • the determining module 505 is configured to determine the action category information of the target object based on the determined key point information of the object;
  • the construction module 506 is configured to construct a three-dimensional model of the target object based on the determined key point information of the object.
  • the device further includes:
  • the expression recognition module 507 is configured to determine the facial expression category of the target object based on the determined key point information of the object;
  • the device also includes:
  • the gesture recognition module 508 is configured to determine the gesture of the target object and the category corresponding to the gesture based on the determined key point information of the object.
  • the functions or modules contained in the device provided in the embodiments of the present disclosure can be used to execute the methods described in the above method embodiments.
  • FIG. 6 it is a schematic structural diagram of an electronic device 600 provided by an embodiment of the present disclosure.
  • the electronic device 600 includes a processor 601, a memory 602, and a bus 603.
  • the memory 602 is used to store machine-readable instructions executable by the processor 601 and includes an internal memory 6021 and an external memory 6022; the internal memory 6021 is used to temporarily store computation data in the processor 601 and data exchanged with the external memory 6022, such as a hard disk.
  • the processor 601 exchanges data with the external memory 6022 through the memory 6021.
  • the processor 601 and the memory 602 communicate through the bus 603, and when the machine-readable instructions are executed by the processor 601, the processor 601 is caused to execute the following steps:
  • the key position information includes first key point information on the target object, and position point information of a detection frame corresponding to at least one target part of the target object;
  • the object key point information of the target object is determined.
  • the embodiments of the present disclosure also provide a computer-readable storage medium with a computer program stored on the computer-readable storage medium.
  • when the computer program is executed by a processor, the processor executes the key point detection method described in the foregoing method embodiments.
  • the embodiments of the present disclosure also provide a computer program, which when the computer program is executed by a processor, causes the processor to execute the key point detection method described in the foregoing method embodiment.
  • the computer program product of the key point detection method provided by the embodiments of the present disclosure includes a computer-readable storage medium storing program code, and the program code includes instructions that can be used to execute the key point detection method described in the above method embodiments.
  • the specific working process of the device described above can refer to the corresponding process in the foregoing method embodiment, and will not be repeated here.
  • the disclosed device and method may be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division of the modules is only a logical function division, and there may be other divisions in actual implementation.
  • multiple modules or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be through some communication interfaces, indirect coupling or communication connection between devices or modules, and may be in electrical, mechanical or other forms.
  • Modules described as separate components may or may not be physically separated, and components displayed as modules may or may not be physical units; that is, they may be located in one place, or they may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the embodiments of the present disclosure.
  • In addition, the functional units in the various embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • If the functions are implemented in the form of a software functional unit and sold or used as an independent product, they can be stored in a non-volatile computer-readable storage medium executable by a processor.
  • Based on such an understanding, the embodiments of the present disclosure, in essence, or all or part of them, can be embodied in the form of a computer software product.
  • The computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • The aforementioned storage media include various media that can store program code, such as a USB flash disk, a mobile hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

一种关键点检测的方法和装置、电子设备、存储介质及计算机程序,该方法包括:确定包括目标对象的目标图像(S101);对目标图像进行第一关键点检测,得到目标对象的关键位置信息;关键位置信息包括目标对象上的第一关键点信息,以及目标对象的至少一个目标部位对应的检测框的位置点信息(S102);基于目标对象的至少一个目标部位中的每个目标部位对应的检测框的位置点信息,对每个目标部位在目标图像中的图像区域进行第二关键点检测,得到每个目标部位对应的第二关键点信息(S103);基于第一关键点信息和第二关键点信息,确定目标对象的对象关键点信息(S104)。

Description

关键点检测的方法和装置、电子设备、存储介质及计算机程序
相关申请的交叉引用
本公开要求于2020年3月30日提交的、申请号为202010239542.X、发明名称为“关键点检测的方法、装置、电子设备及存储介质”的中国专利申请的优先权,该中国专利申请公开的全部内容以引用的方式并入本文中。
技术领域
本公开涉及计算机视觉技术领域,具体而言,涉及一种关键点检测的方法和装置、电子设备、存储介质及计算机程序。
背景技术
近年来,关键点检测在视频分析中起到至关重要的作用,比如,在安防领域内,可以通过检测视频或图像中目标对象的面部关键点,对该目标对象进行识别。
目前在虚拟现实(Virtual Reality,VR)、增强现实(Augmented Reality,AR)等应用场景中,需要对目标对象的多种关键点进行检测,以提高目标对象显示的真实性,比如,多种关键点可以包括肢体关键点、手势关键点、面部关键点等。
发明内容
第一方面,本公开提供了一种关键点检测的方法,包括:
确定包括目标对象的目标图像;
对所述目标图像进行第一关键点检测,得到所述目标对象的关键位置信息;所述关键位置信息包括所述目标对象上的第一关键点信息,以及所述目标对象的至少一个目标部位对应的检测框的位置点信息;
基于所述目标对象的每个所述目标部位对应的检测框的位置点信息,对每个所述目标部位在所述目标图像中的图像区域进行第二关键点检测,得到每个所述目标部位对应的第二关键点信息;
基于所述第一关键点信息和所述第二关键点信息,确定所述目标对象的对象关键点信息。
考虑到如果在目标图像的全图上直接进行目标对象的各个部位的检测,目标对象的各个部位的特征在全图中的占比较小,很难关注目标对象的细粒度的特征,导致检测精度较低,而本公开提出两个阶段的关键点检测,在从目标图像中定位出目标对象后,通过第一关键点检测定位出目标对象的第一关键点信息以及至少一个目标部位的检测框的位置点信息,然后分别针对各个目标部位的图像区域进行更为细粒度的第二关键点检测,从而可以得到更为准确的目标对象的对象关键点信息。
一种可能的实施方式中,所述对所述目标图像进行第一关键点检测,得到所述目标对象的关键位置信息,包括:
对所述目标图像进行第一卷积处理,得到第一特征图;
基于所述第一特征图,确定所述目标对象的关键位置信息;
所述基于所述目标对象的每个所述目标部位分别对应的检测框的位置点信息,对每个所述目标部位在所述目标图像中的图像区域进行第二关键点检测,得到每个所述目标部位对应的第二关键点信息,包括:
基于所述目标对象的每个所述目标部位对应的检测框的位置点信息,从所述第一特征图中截取与每个所述目标部位对应的第二特征图;
基于所述第二特征图,确定每个所述目标部位对应的第二关键点信息。
上述实施方式中,通过从对目标图像进行卷积处理后得到的第一特征图中截取每个目标部位对应的第二特征图,在该第二特征图的基础上进行第二关键点检测,相比于从目标图像中截取每个目标部位对应的图像,再进行处理,可以减少特征处理的次数,减少关键点检测的运算量。
一种可能的实施方式中,根据以下步骤对目标特征图进行卷积处理,确定目标关键点信息,其中,在所述目标特征图为所述第一特征图的情况下,所述目标关键点信息为所述目标对象的关键位置信息,在所述目标特征图为所述第二特征图的情况下,所述目标关键点信息为所述目标部位对应的第二关键点信息:
对所述目标特征图进行多次特征处理,生成尺寸不同的多个中间特征图;
将所述多个中间特征图进行融合处理,得到融合特征图;
基于所述融合特征图,确定所述目标关键点信息。
上述实施方式中,通过对目标特征图进行多次特征处理,生成尺寸不同的多个中间特征图,不同尺寸的中间特征图对应的感受野不同,进而将多个中间特征图进行融合处理,得到融合特征图,得到的融合特征图中可以包括不同尺寸的中间特征图对应的特征,进而基于融合特征图确定目标关键点信息,从而可以提高关键点检测的准确度。
一种可能的实施方式中,所述对所述目标特征图进行多次特征处理,包括:根据以下步骤进行当前次特征处理:
针对进行当前次特征处理前的不同尺寸的特征图,分别进行至少一级卷积处理,得到不同尺寸的卷积特征图;
将所述不同尺寸的卷积特征图进行多种融合处理,得到当前次特征处理后的不同尺寸的特征图。
上述实施方式中,针对当前次特征处理,对不同尺寸的特征图进行至少一级卷积处理以及进行多种融合处理,得到当前次特征处理后的不同尺寸的特征图,其中,不同尺 寸的特征图的感受野不同,进而不同尺寸的特征图包括的特征信息也不同,即得到的不同尺寸的特征图包括的特征信息较多,故可以为后续检测第一关键点信息或第二关键点信息提供较多的特征信息,提高了关键点检测的精确度。
一种可能的实施方式中,所述第一特征图包括多级第一特征图,不同级第一特征图为经过不同级卷积处理得到的,所述基于所述目标对象的每个所述目标部位对应的检测框的位置点信息,从所述第一特征图中截取与每个所述目标部位对应的第二特征图,包括:
基于所述目标对象的每个所述目标部位对应的检测框的位置点信息,从所述多级第一特征图中的不同级第一特征图中,分别截取与每个所述目标部位对应的第二特征图。
一种可能的实施方式中,所述目标对象包括人物,第一关键点至少分布在所述人物的四肢、头部上;
所述第一关键点的数量范围为5~25。
一种可能的实施方式中,所述目标部位包括人物的面部、脚部、手部中的至少一种;
在所述目标部位包括面部的情况下,面部对应的第二关键点至少分布在所述面部的脸部轮廓、眼睛、眉毛、鼻子、和嘴唇中的至少一个区域;
在所述目标部位包括脚部的情况下,脚部对应的第二关键点至少分布在所述脚部的至少一根脚趾、脚心以及脚跟中的至少一个区域;
在所述目标部位包括手部的情况下,手部对应的第二关键点至少分布在所述手部的至少一根手指、以及手心中的至少一个区域。
上述实施方式中,通过至少一种目标部位的检测,可以在不同的应用场景下,基于检测需求对不同目标部位进行细粒度的关键点检测。
一种可能的实施方式中,在所述目标部位包括面部的情况下,所述脸部轮廓上的第二关键点的数量范围为0~25,每个所述眼睛上的第二关键点的数量范围为0~10,每个所述眉毛上的第二关键点的数量范围为0~10,所述鼻子上的第二关键点的数量范围为0~15,所述嘴唇上的第二关键点的数量范围为0~15;
在所述目标部位包括脚部的情况下,所述脚部包括左脚和/或右脚;任一所述脚部的第二关键点的数量范围为1~10;
在所述目标部位包括手部的情况下,所述手部包括左手和/或右手;任一所述手部的第二关键点的数量范围为1~25。
一种可能的实施方式中,所述方法还包括:
基于确定的所述对象关键点信息,确定所述目标对象的动作类别信息,或者,基于确定的所述对象关键点信息,构建所述目标对象的三维模型。
在基于上述实施方式较准确地检测得到对象关键点信息后,应用该对象关键点信息 就可以较准确地确定目标对象的动作类别信息或构建目标对象的三维模型。
一种可能的实施方式中,在所述目标部位包括面部时,所述方法还包括:
基于确定的所述对象关键点信息,确定所述目标对象的面部表情类别;
在所述目标部位包括手部时,所述方法还包括:
基于确定的所述对象关键点信息,确定所述目标对象的手势以及所述手势对应的类别。
在基于上述实施方式较准确地检测得到对象关键点信息后,应用该对象关键点信息就可以较准确地确定目标对象的面部表情类别或确定目标对象的手势以及手势类别。
以下装置、电子设备等的效果描述参见上述方法的说明,这里不再赘述。
第二方面,本公开提供了一种关键点检测的装置,包括:
图像确定模块,用于确定包括目标对象的目标图像;
第一检测模块,用于对所述目标图像进行第一关键点检测,得到所述目标对象的关键位置信息;所述关键位置信息包括所述目标对象上的第一关键点信息,以及所述目标对象的至少一个目标部位对应的检测框的位置点信息;
第二检测模块,用于基于所述目标对象的每个所述目标部位对应的检测框的位置点信息,对每个所述目标部位在所述目标图像中的图像区域进行第二关键点检测,得到每个所述目标部位对应的第二关键点信息;
关键点确定模块,用于基于所述第一关键点信息和所述第二关键点信息,确定所述目标对象的对象关键点信息。
一种可能的实施方式中,所述第一检测模块,在对所述目标图像进行第一关键点检测,得到所述目标对象的关键位置信息时,用于:
对所述目标图像进行第一卷积处理,得到第一特征图;
基于所述第一特征图,确定所述目标对象的关键位置信息;
所述第二检测模块,在基于所述目标对象的每个所述目标部位对应的检测框的位置点信息,对每个所述目标部位在所述目标图像中的图像区域进行第二关键点检测,得到每个所述目标部位对应的第二关键点信息时,用于:
基于所述目标对象的每个所述目标部位对应的检测框的位置点信息,从所述第一特征图中截取与每个所述目标部位对应的第二特征图;
基于所述第二特征图,确定每个所述目标部位对应的第二关键点信息。
一种可能的实施方式中,所述第一检测模块和所述第二检测模块,分别用于根据以下步骤对目标特征图进行卷积处理,确定目标关键点信息,其中,在所述目标特征图为所述第一特征图的情况下,所述目标关键点信息为所述目标对象的关键位置信息,且由所述第一检测模块执行以下步骤;在所述目标特征图为所述第二特征图的情况下,所述 目标关键点信息为所述目标部位对应的第二关键点信息,且由所述第二检测模块执行以下步骤:
对所述目标特征图进行多次特征处理,生成尺寸不同的多个中间特征图;
将所述多个中间特征图进行融合处理,得到融合特征图;
基于所述融合特征图,确定所述目标关键点信息。
一种可能的实施方式中,所述第一检测模块和所述第二检测模块,在对所述目标特征图进行多次特征处理时,分别用于:根据以下步骤进行当前次特征处理:
针对进行当前次特征处理前的不同尺寸的特征图,分别进行至少一级卷积处理,得到不同尺寸的卷积特征图;
将所述不同尺寸的卷积特征图进行多种融合处理,得到当前次特征处理后的不同尺寸的特征图。
一种可能的实施方式中,所述第一特征图包括多级第一特征图,不同级第一特征图为经过不同级卷积处理得到的,所述第二检测模块,在基于所述目标对象的每个所述目标部位对应的检测框的位置点信息,从所述第一特征图中截取与每个所述目标部位对应的第二特征图时,用于:
基于所述目标对象的每个所述目标部位对应的检测框的位置点信息,从所述多级第一特征图中的不同级第一特征图中,分别截取与每个所述目标部位对应的第二特征图。
一种可能的实施方式中,所述目标对象包括人物,第一关键点至少分布在所述人物的四肢、头部上;
所述第一关键点的数量范围为5~25。
一种可能的实施方式中,所述目标部位包括人物的面部、脚部、手部中的至少一种;
在所述目标部位包括面部的情况下,面部对应的第二关键点至少分布在所述面部的脸部轮廓、眼睛、眉毛、鼻子、和嘴唇中的至少一个区域;
在所述目标部位包括脚部的情况下,脚部对应的第二关键点至少分布在所述脚部的至少一根脚趾、脚心、以及脚跟中的至少一个区域;
在所述目标部位包括手部的情况下,手部对应的第二关键点至少分布在所述手部的至少一根手指、以及手心中的至少一个区域。
一种可能的实施方式中,在所述目标部位包括面部的情况下,所述脸部轮廓上的第二关键点的数量范围为0~25,每个所述眼睛上的第二关键点的数量范围为0~10,每个所述眉毛上的第二关键点的数量范围为0~10,所述鼻子上的第二关键点的数量范围为0~15,所述嘴唇上的第二关键点的数量范围为0~15;
在所述目标部位包括脚部的情况下,所述脚部包括左脚和/或右脚;任一所述脚部的第二关键点的数量范围为1~10;
在所述目标部位包括手部的情况下,所述手部包括左手和/或右手;任一所述手部的第二关键点的数量范围为1~25。
一种可能的实施方式中,所述装置还包括:
确定模块,用于基于确定的所述对象关键点信息,确定所述目标对象的动作类别信息;
构建模块,用于基于确定的所述对象关键点信息,构建所述目标对象的三维模型。
一种可能的实施方式中,所述装置还包括:
表情识别模块,用于基于确定的所述对象关键点信息,确定所述目标对象的面部表情类别;
所述装置还包括:
手势识别模块,用于基于确定的所述对象关键点信息,确定所述目标对象的手势以及所述手势对应的类别。
第三方面,本公开提供一种电子设备,包括:处理器、存储器和总线,所述存储器存储有所述处理器可执行的机器可读指令,当电子设备运行时,所述处理器与所述存储器之间通过总线通信,所述机器可读指令被所述处理器执行时,使所述处理器执行如上述第一方面或任一实施方式所述的关键点检测的方法。
第四方面,本公开提供一种计算机可读存储介质,该计算机可读存储介质上存储有计算机程序,该计算机程序被处理器执行时,使所述处理器执行如上述第一方面或任一实施方式所述的关键点检测的方法。
第五方面,本公开提供一种计算机程序,所述计算机程序被处理器执行时,使所述处理器执行如上述第一方面或任一实施方式所述的关键点检测的方法。
为使本公开的上述目的、特征和优点能更明显易懂,下文特举较佳实施例,并配合所附附图,作详细说明如下。
附图说明
为了更清楚地说明本公开实施例,下面将对实施例中所需要使用的附图作简单地介绍,此处的附图被并入说明书中并构成本说明书中的一部分,这些附图示出了符合本公开的实施例,并与说明书一起用于说明本公开。应当理解,以下附图仅示出了本公开的某些实施例,因此不应被看作是对范围的限定,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他相关的附图。
图1示出了本公开实施例所提供的一种关键点检测的方法的流程示意图;
图2示出了本公开实施例所提供的一种关键点检测的方法中,确定目标关键点信息的具体方法的流程示意图;
图3示出了本公开实施例所提供的一种关键点检测的方法中,对目标特征图进行多次特征处理的具体方法的流程示意图;
图4示出了本公开实施例所提供的一种关键点检测神经网络的结构示意图;
图5示出了本公开实施例所提供的一种关键点检测的装置的架构示意图;
图6示出了本公开实施例所提供的一种电子设备的结构示意图。
具体实施方式
为使本公开实施例的目的、特征和优点更加清楚,下面将结合本公开实施例中的附图,对本公开实施例进行清楚、完整地描述,显然,所描述的实施例仅仅是本公开一部分实施例,而不是全部的实施例。通常在此处附图中描述和示出的本公开实施例的组件可以以各种不同的配置来布置和设计。因此,以下对在附图中提供的本公开的实施例的详细描述并非旨在限制要求保护的本公开的范围,而是仅仅表示本公开的特定实施例。基于本公开的实施例,本领域技术人员在没有做出创造性劳动的前提下所获得的所有其他实施例,都属于本公开保护的范围。
通过对目标对象进行关键点检测,可以对目标对象进行动作、表情、手势等的识别。通常,在进行关键点检测时,可以分别通过不同的卷积神经网络对不同部位的关键点进行检测,比如,可以通过第一卷积神经网络对目标对象的肢体关键点进行检测,通过第二卷积神经网络对目标对象的面部关键点进行检测,通过第三卷积神经网络对目标对象的手部关键点进行检测等。利用上述方式对目标对象的多种关键点进行检测时,需要的卷积神经网络模型的数量较多,使得关键点检测过程中的计算量较大,进而导致关键点的检测效率较低。
或者,可以通过增加关键点的数量以及类型,实现对目标对象的多种关键点的检测。示例性的,可以通过关键点神经网络,得到包含肢体、面部、手部等的关键点。但是,由于手部区域、面部区域相对肢体区域的面积较小,使得这种方式得到的手部、面部等的关键点的精度较差。
因此,为了提高关键点检测的效率以及精度,本公开实施例提供了一种关键点检测的方法。
为便于对本公开实施例进行理解,首先对本公开实施例所提供的一种关键点检测的方法进行详细介绍。
参见图1所示,为本公开实施例所提供的关键点检测的方法的流程示意图,该方法包括:
步骤S101,确定包括目标对象的目标图像;
步骤S102,对目标图像进行第一关键点检测,得到目标对象的关键位置信息;关键位置信息包括目标对象上的第一关键点信息,以及目标对象的至少一个目标部位对应的检测框的位置点信息;
步骤S103,基于目标对象的至少一个目标部位中的每个目标部位对应的检测框的位置点信息,对每个目标部位在目标图像中的图像区域进行第二关键点检测,得到每个目标部位对应的第二关键点信息;
步骤S104,基于第一关键点信息和第二关键点信息,确定目标对象的对象关键点信息。
考虑到如果在目标图像的全图上直接进行目标对象的各个部位的检测,目标对象的各个部位的特征在全图中的占比较小,很难关注目标对象的细粒度的特征,导致检测精度较低,而本公开提出两个阶段的关键点检测,在从目标图像中定位出目标对象后,通过第一关键点检测定位出目标对象的第一关键点信息以及至少一个目标部位的检测框的位置点信息,然后分别针对各个目标部位的图像区域进行更为细粒度的第二关键点检测,从而可以得到更为准确的目标对象的对象关键点信息。
以下对步骤S101-S104进行说明。
针对步骤S101:
本公开实施例中,目标对象可以为人物、动物等,即目标图像可以为包括人物、动物等的图像。示例性的,可以获取初始图像,并可以将初始图像确定为目标图像,该初始图像中可以包括一个或多个目标对象;或者,也可以通过对象检测神经网络对初始图像进行对象检测,得到初始图像中包括的每个目标对象的检测框,根据每个目标对象的检测框,从初始图像中截取每个目标对象对应的区域图像。进一步的,可以将每个目标对象对应的区域图像作为该目标对象对应的目标图像,或者,也可以将每个目标对象对应的区域图像的尺寸调整为第一预设尺寸,将尺寸调整后的区域图像作为每个目标对象对应的目标图像。
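As a minimal illustration of this preparation step, the following Python sketch crops each detected target object from the initial image and resizes each region image to a first preset size. The `detect_objects` callable, the OpenCV-based resizing, and the 192×256 preset size are assumptions made only for illustration and are not fixed by the present disclosure.

```python
import cv2

def prepare_target_images(initial_image, detect_objects, preset_size=(192, 256)):
    """Crop each detected target object and resize the region image to a preset size.

    `detect_objects` is assumed to return (x1, y1, x2, y2) detection boxes in pixels;
    both the detector and the preset size are illustrative assumptions.
    """
    target_images = []
    height, width = initial_image.shape[:2]
    for (x1, y1, x2, y2) in detect_objects(initial_image):
        # Clamp the detection box to the image bounds before cropping.
        x1, y1 = max(0, int(x1)), max(0, int(y1))
        x2, y2 = min(width, int(x2)), min(height, int(y2))
        region_image = initial_image[y1:y2, x1:x2]
        # cv2.resize expects the target size as (width, height).
        target_images.append(cv2.resize(region_image, preset_size))
    return target_images
```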
针对步骤S102以及步骤S103:
这里,可以通过关键点检测神经网络对目标图像进行检测,得到目标图像对应的对象关键点信息。示例性的,关键点检测神经网络可以包括第一关键点检测网络、以及至少一个第二关键点检测网络等。
其中,可以通过关键点检测神经网络中的第一关键点检测网络对目标图像进行第一关键点检测,得到目标对象的关键位置信息,该关键位置信息包括目标对象上的第一关键点信息以及目标对象的至少一个目标部位对应的检测框的位置点信息。其中,第一关键点信息可以包括但不限于目标对象的关键点在图像坐标系中的坐标位置,目标部位对应的检测框的位置点信息可以包括但不限于目标部位对应的检测框的至少一个位置点在图像坐标系中的坐标位置。
示例性的,在目标对象包括人物的情况下,第一关键点可以至少分布在人物的四肢、头部上。目标部位可以包括人物的面部、脚部、手部中的至少一种;其中,在目标部位包括面部的情况下,面部对应的第二关键点至少分布在面部的脸部轮廓、眼睛、眉毛、鼻子、和嘴唇中的至少一个区域;在目标部位包括脚部的情况下,脚部对应的第 二关键点至少分布在脚部的至少一根脚趾、脚心以及脚跟中的至少一个区域;在目标部位包括手部的情况下,手部对应的第二关键点至少分布在手部的至少一根手指、以及手心中的至少一个区域。通过至少一种目标部位的检测,可以在不同的应用场景下,基于检测需求对不同目标部位进行细粒度的关键点检测。
这里,在目标对象包括人物的情况下,目标部位的类型和数量可以根据实际情况进行确定,比如,目标部位可以包括面部和手部,或者,目标部位也可以包括面部和脚部,或者,目标部位还可以包括面部、手部和脚部。进一步的,各目标部位可以有对应的第二关键点检测网络,具体可根据目标部位的情况,来确定使用的第二关键点检测网络的种类。
示例性的,目标部位可以包括人物的面部、脚部、手部,则至少一个第二关键点检测网络可以包括面部第二关键点检测网络、脚部第二关键点检测网络、手部第二关键点检测网络。进而可以基于面部对应的检测框的位置点信息,确定面部对应的图像区域,再通过面部第二关键点检测网络对面部对应的图像区域进行关键点检测,得到目标对象上面部对应的第二关键点信息。
示例性的,脚部第二关键点检测网络可以为左脚第二关键点检测网络和/或右脚第二关键点检测网络,手部第二关键点检测网络可以为左手第二关键点检测网络和/或右手第二关键点检测网络。
一种可能的实施方式中,在手部第二关键点检测网络为左手第二关键点检测网络的情况下,可以基于左手对应的检测框的位置点信息,确定左手对应的图像区域,再通过左手第二关键点检测网络对左手对应的图像区域进行关键点检测,得到目标对象上左手对应的第二关键点信息;并基于右手对应的检测框的位置点信息,确定右手对应的图像区域,将右手对应的图像区域进行水平翻转处理,并将水平翻转处理后的右手对应的图像区域输入至左手第二关键点检测网络,得到水平翻转处理后的图像区域对应的第二关键点信息,将得到的第二关键点信息再进行水平翻转处理,得到右手对应的第二关键点信息。脚部的第二关键点信息的确定过程可参考手部的第二关键点信息的确定过程,此处不再进行赘述。
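The horizontal-flip reuse of the left-hand second key point detection network described above can be sketched as follows. The `left_hand_network` callable and the (x, y) keypoint layout are illustrative assumptions; only the flip, predict, and flip-back pattern reflects the description.

```python
import numpy as np

def detect_right_hand_keypoints(right_hand_region, left_hand_network):
    """Detect right-hand keypoints by reusing a left-hand network via horizontal flipping.

    `left_hand_network` is assumed to map an H x W x C image region to an array of
    (x, y) keypoints expressed in that region's pixel coordinates.
    """
    region_width = right_hand_region.shape[1]
    # Flip the right-hand region horizontally so that it resembles a left hand.
    flipped_region = right_hand_region[:, ::-1].copy()
    keypoints = np.asarray(left_hand_network(flipped_region), dtype=np.float32)
    # Flip the predicted x coordinates back to the original (right-hand) orientation.
    keypoints[:, 0] = (region_width - 1) - keypoints[:, 0]
    return keypoints
```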
一种可选实施方式中,对目标图像进行第一关键点检测,得到目标对象的关键位置信息,包括:对目标图像进行第一卷积处理,得到第一特征图;基于第一特征图,确定目标对象的关键位置信息。
基于目标对象的至少一个目标部位中的每个目标部位对应的检测框的位置点信息,对每个目标部位在目标图像中的图像区域进行第二关键点检测,得到每个目标部位对应的第二关键点信息,包括:基于目标对象的每个目标部位对应的检测框的位置点信息,从第一特征图中截取与每个目标部位对应的第二特征图;基于第二特征图,确定每个目标部位对应的第二关键点信息。
示例性的,可以将得到的第二特征图的尺寸调整为第二预设尺寸,得到调整后的第二特征图;基于调整后的第二特征图,确定每个目标部位对应的第二关键点信息。
这里,关键点检测神经网络还可以包括至少一级卷积神经网络,通过关键点检测神经网络中包括的至少一级卷积神经网络对目标图像进行第一卷积处理,得到第一特征图,将第一特征图输入至第一关键点检测网络中,得到目标对象的关键位置信息。在目标部位包括面部、手部、脚部时,得到的关键位置信息中包括第一关键点信息、以及面部对应的检测框的位置点信息、手部对应的检测框的位置点信息、脚部对应的检测框的位置点信息。示例性的,位置点信息可包括检测框的四个顶点的位置信息和/或中心点的位置信息等。
进一步的,可以根据每个目标部位对应的检测框的位置点信息,从第一特征图中截取与该目标部位对应的第二特征图;将该目标部位对应的第二特征图输入至该目标部位对应的第二关键点检测网络中进行关键点检测,得到该目标部位对应的第二关键点信息。比如,基于面部对应的检测框的位置点信息,从第一特征图中截取与面部对应的第二特征图,并将该面部对应的第二特征图输入至面部第二关键点检测网络中进行面部关键点检测,得到面部对应的第二关键点信息。
示例性的,可以基于目标对象的每个目标部位对应的检测框的位置点信息以及RoIAlign技术,从第一特征图中截取与每个目标部位对应的第二特征图。比如,可以基于每个目标部位对应的检测框的位置点信息,利用RoIAlign技术,确定检测框的每个位置点信息在第一特征图上对应的目标位置信息,进而基于确定的每个目标部位对应的检测框的目标位置信息,从第一特征图中截取与该目标部位对应的第二特征图。
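One possible realization of this cropping step uses the RoIAlign operator provided by torchvision, where `spatial_scale` maps image-space box coordinates onto the first feature map. The feature stride of 4 and the 64×48 output size below are assumptions, not values specified by the present disclosure.

```python
from torchvision.ops import roi_align

def crop_part_features(first_feature_map, part_boxes, output_size=(64, 48), stride=4):
    """Crop a second feature map for each target part from the first feature map.

    `first_feature_map`: (1, C, H, W) tensor obtained from the target image.
    `part_boxes`: (K, 4) tensor of (x1, y1, x2, y2) detection boxes in image coordinates.
    """
    return roi_align(
        first_feature_map,
        [part_boxes],                 # one box tensor per image in the batch
        output_size=output_size,      # size of each cropped second feature map
        spatial_scale=1.0 / stride,   # converts image coordinates to feature-map coordinates
        aligned=True,
    )
```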
本公开实施例中,通过从对目标图像进行卷积处理后得到的第一特征图中截取每个目标部位对应的第二特征图,在该第二特征图的基础上进行第二关键点检测,相比于从目标图像中截取每个目标部位对应的图像,再进行处理,可以减少特征处理的次数,减少关键点检测的运算量。
一种可选实施例中,第一特征图可以包括多级第一特征图,不同级第一特征图为经过不同级卷积处理得到的。比如,关键点检测神经网络包括的至少一级卷积神经网络可以为三级卷积神经网络,即第一级卷积神经网络、第二级卷积神经网络、第三级卷积神经网络,可以将目标图像依次输入至第一级卷积神经网络和第二级卷积神经网络中进行卷积处理,得到第一级第一特征图,再将第一级第一特征图输入至第三级卷积神经网络中进行卷积处理,得到第二级第一特征图。其中,关键点检测神经网络包括的至少一级卷积神经网络的级数可以根据实际需要进行设置,例如,关键点检测神经网络包括的至少一级卷积神经网络可以为五级卷积神经网络、或者十级卷积神经网络等;得到第一级第一特征图的卷积次数和得到第二级第一特征图的卷积次数可以根据实际需要进行设置。
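A toy sketch of such a staged backbone is given below. The number of stages, channel counts, and strides are illustrative assumptions; the only property taken from the description is that the first-level and second-level first feature maps come out of different levels of convolution processing.

```python
import torch.nn as nn

class StagedBackbone(nn.Module):
    """Toy three-stage convolutional backbone producing two levels of first feature maps."""

    def __init__(self):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.stage3 = nn.Sequential(nn.Conv2d(64, 64, 3, stride=1, padding=1), nn.ReLU())

    def forward(self, target_image):
        f1 = self.stage2(self.stage1(target_image))   # first-level first feature map
        f2 = self.stage3(f1)                           # second-level first feature map
        return f1, f2
```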
在第一特征图包括多级第一特征图时,基于目标对象的每个目标部位对应的检测框的位置点信息,从第一特征图中截取与每个目标部位对应的第二特征图,可以包括:基于目标对象的每个目标部位对应的检测框的位置点信息,从多级第一特征图中的不同级第一特征图中,分别截取与每个目标部位对应的第二特征图。
这里,在多级第一特征图包括第一级第一特征图以及第二级第一特征图时,可以基于目标对象的每个目标部位对应的检测框的位置点信息以及RoIAlign技术,确定每个目标部位对应的检测框的位置点信息在第一级第一特征图上的目标位置信息以及在第二级第一特征图上的目标位置信息;并基于每个目标部位对应的检测框的位置点信息在第一级第一特征图上的目标位置信息,从第一级第一特征图上截取与目标部位对应的第一级第二特征图,以及基于每个目标部位对应的检测框的位置点信息在第二级第一特征图上的目标位置信息,从第二级第一特征图上截取与目标部位对应的第二级第二特征图。
示例性的,参见图2所示,可以根据以下步骤对目标特征图进行卷积处理,确定目标关键点信息,其中,在目标特征图为第一特征图的情况下,目标关键点信息为目标对象的关键位置信息,在目标特征图为第二特征图的情况下,目标关键点信息为目标部位对应的第二关键点信息:
步骤S201,对目标特征图进行多次特征处理,生成尺寸不同的多个中间特征图。
步骤S202,将多个中间特征图进行融合处理,得到融合特征图。
步骤S203,基于融合特征图,确定目标关键点信息。
这里,多个中间特征图的尺寸可以与预设的比例相符,比如,多个中间特征图包括三个中间特征图,预设的比例为1:2:4,则三个中间特征图的尺寸的比例可以为1:2:4。示例性的,可以通过卷积神经网络将多个中间特征图的尺寸调整为一致,再将尺寸调整后的多个中间特征图进行融合处理,得到融合特征图。进一步的,对融合特征图进行分析处理,得到目标关键点信息。
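As a sketch of this fusion step, the intermediate feature maps can be resized to a common size, concatenated along the channel dimension, and passed through a prediction head; peaks of the resulting heatmaps then give the target key point information. The bilinear resizing, concatenation, and argmax decoding below are assumed choices, and `heatmap_head` stands in for a trained prediction layer.

```python
import torch
import torch.nn.functional as F

def fuse_and_predict(intermediate_maps, heatmap_head):
    """Resize intermediate feature maps to one size, fuse them, and decode keypoints.

    `intermediate_maps`: list of (1, C_i, H_i, W_i) tensors at different scales, with the
    first map assumed to be the largest; `heatmap_head` maps the fused map to heatmaps.
    """
    target_size = intermediate_maps[0].shape[-2:]
    resized = [intermediate_maps[0]] + [
        F.interpolate(m, size=target_size, mode="bilinear", align_corners=False)
        for m in intermediate_maps[1:]
    ]
    fused_map = torch.cat(resized, dim=1)      # fused feature map
    heatmaps = heatmap_head(fused_map)         # (1, num_keypoints, H, W)
    # Take each heatmap's peak as the corresponding keypoint location (x, y).
    coords = [divmod(int(h.argmax()), h.shape[-1]) for h in heatmaps[0]]
    return [(x, y) for (y, x) in coords]
```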
上述实施方式中,通过对目标特征图进行多次特征处理,生成尺寸不同的多个中间特征图,不同尺寸的中间特征图对应的感受野不同,进而将多个中间特征图进行融合处理,得到融合特征图,得到的融合特征图中可以包括不同尺寸的中间特征图对应的特征,进而基于融合特征图确定目标关键点信息,从而可以提高关键点检测的准确度。
一种可选实施方式中,参见图3所示,对目标特征图进行多次特征处理,包括:根据以下步骤进行当前次特征处理:
步骤S301,针对进行当前次特征处理前的不同尺寸的特征图,分别进行至少一级卷积处理,得到不同尺寸的卷积特征图;
步骤S302,将不同尺寸的卷积特征图进行多种融合处理,得到当前次特征处理后的不同尺寸的特征图。
对步骤S301进行说明,至少一级卷积处理后得到的卷积特征图的尺寸与至少一级卷积处理前的特征图的尺寸可以相同,也可以不同。同时,至少一级卷积处理后得到的不同尺寸的卷积特征图的尺寸也存在比例关系。
对步骤S302进行说明,示例性的,若不同尺寸的卷积特征图包括第一尺寸的第一卷积特征图、第二尺寸的第二卷积特征图以及第三尺寸的第三卷积特征图,则将不同 尺寸的卷积特征图进行多种融合处理可以包括:可以分别将第二卷积特征图以及第三卷积特征图的尺寸调整为第一尺寸,并将第一卷积特征图、尺寸调整后的第二卷积特征图、以及尺寸调整后的第三卷积特征图进行特征融合处理,得到当前次特征处理后的第一尺寸的特征图;可以分别将第一卷积特征图以及第三卷积特征图的尺寸调整为第二尺寸,并将尺寸调整后的第一卷积特征图、第二卷积特征图、以及尺寸调整后的第三卷积特征图进行特征融合处理,得到当前次特征处理后的第二尺寸的特征图;以及可以分别将第一卷积特征图以及第二卷积特征图的尺寸调整为第三尺寸,并将尺寸调整后的第一卷积特征图、尺寸调整后的第二卷积特征图、以及第三卷积特征图进行特征融合处理,得到当前次特征处理后的第三尺寸的特征图。其中,当前次特征处理后的第一尺寸的特征图、第二尺寸的特征图以及第三尺寸的特征图即为当前次特征处理后的不同尺寸的特征图。
上述实施方式中,针对当前次特征处理,对不同尺寸的特征图进行至少一级卷积处理以及进行多种融合处理,得到当前次特征处理后的不同尺寸的特征图,其中,不同尺寸的特征图的感受野不同,进而不同尺寸的特征图包括的特征信息也不同,即得到的不同尺寸的特征图包括的特征信息较多,故可以为后续检测第一关键点信息或第二关键点信息提供较多的特征信息,提高了关键点检测的精确度。
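One such exchange between feature maps of different sizes can be sketched as follows, assuming the maps share a channel count. Bilinear resizing and element-wise addition are used as the resizing and fusion operators here purely for illustration; the description leaves the concrete operators open.

```python
import torch.nn.functional as F

def exchange_fuse(conv_maps):
    """Fuse convolutional feature maps of several sizes back into the same set of sizes.

    `conv_maps`: list of (1, C, H_i, W_i) tensors with a shared channel count C.
    Every other scale is resized to the current scale and added to it.
    """
    fused_maps = []
    for i, reference in enumerate(conv_maps):
        size = reference.shape[-2:]
        fused = reference.clone()
        for j, other in enumerate(conv_maps):
            if j == i:
                continue
            # Resize the other scale to this scale before element-wise fusion.
            fused = fused + F.interpolate(other, size=size, mode="bilinear", align_corners=False)
        fused_maps.append(fused)
    return fused_maps
```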
这里,对关键点检测的方法的过程进行举例说明,比如,可以通过关键点检测神经网络对目标图像进行检测,得到目标图像对应的对象关键点信息。其中,在目标部位包括面部以及手部的情况下,关键点检测神经网络的结构示意图如图4所示。
由图4可知,关键点检测神经网络包括第一关键点检测网络41、面部第二关键点检测网络42、以及手部第二关键点检测网络43。
具体的,将目标图像F0输入至关键点检测神经网络中,通过至少一级卷积神经网络对目标图像F0进行特征提取,得到第一级第一特征图F1,将第一级第一特征图F1再经过至少一级卷积神经网络进行特征提取,得到第二级第一特征图F2。其中,第一级第一特征图F1与第二级第一特征图F2的尺寸可以相同,也可以不同。
再将第二级第一特征图F2输入至第一关键点检测网络41中,通过至少一级卷积神经网络对第二级第一特征图F2进行特征提取,得到特征图F3,将特征图F3进行至少一级卷积处理得到特征图F41,并将特征图F3进行下采样处理或者卷积处理得到特征图F42,其中,特征图F41与特征图F42的尺寸存在比例关系,比如,特征图F41的尺寸与特征图F42的尺寸之间的比例可以为2:1。
再分别将特征图F41、以及特征图F42进行至少一级卷积处理,得到对应的卷积特征图F51、以及卷积特征图F52;其中,卷积特征图F51的尺寸可以与特征图F41的尺寸相同,以及卷积特征图F52的尺寸可以与特征图F42的尺寸相同。
再将卷积特征图F51和卷积特征图F52进行多种融合处理,得到特征图F61、特征图F62、特征图F63,其中,特征图F61的尺寸可以与卷积特征图F51的尺寸相同,特征图F62的尺寸可以与卷积特征图F52的尺寸相同;特征图F61、特征图F62、以及特征图F63之间的尺寸比例可以为4:2:1。具体的,多种融合处理的过程可以为:调 整卷积特征图F52的尺寸,使得调整后的卷积特征图F52的尺寸与卷积特征图F51的尺寸相同,将卷积特征图F51与尺寸调整后的卷积特征图F52进行特征融合处理,得到特征图F61;调整卷积特征图F51的尺寸,使得调整后的卷积特征图F51的尺寸与卷积特征图F52的尺寸相同,将卷积特征图F52与尺寸调整后的卷积特征图F51进行特征融合处理,得到特征图F62;调整卷积特征图F51和卷积特征图F52的尺寸,使得调整后的卷积特征图F51以及卷积特征图F52的尺寸为预设尺寸(即,特征图F63对应的尺寸),将尺寸调整后的卷积特征图F51与卷积特征图F52进行特征融合处理,得到特征图F63。
其中,对特征图的尺寸进行调整的方式包括但不限于上采样处理方式、下采样处理方式、卷积处理方式等;特征融合处理过程可以为将特征图以级联的方式融合,或者将特征图通过卷积神经网络进行融合,或者将特征图级联之后输入至卷积神经网络中进行融合等。这里,特征图尺寸调整的方式以及特征融合处理的方式有多种,此处不进行具体限定。
这里,通过特征图F61、特征图F62、以及特征图F63,得到卷积特征图F71、卷积特征图F72、以及卷积特征图F73的过程,可参考得到卷积特征图F51、卷积特征图F52的过程,此处不再赘述。通过对卷积特征图F71、卷积特征图F72、卷积特征图F73进行多种融合处理,得到特征图F81、特征图F82、特征图F83、以及特征图F84的过程,可参考得到特征图F61、特征图F62、以及特征图F63的过程,此处不再进行赘述。
最后将特征图F81、特征图F82、特征图F83、以及特征图F84分别进行至少一级卷积处理,得到对应的中间特征图,再将中间特征图进行特征融合处理,得到融合特征图,最后基于融合特征图,确定关键位置信息,关键位置信息中包括第一关键点信息、以及面部对应的检测框的位置点信息、手部对应的检测框的位置点信息。
进一步的,可以基于手部对应的检测框的位置点信息,从第一级第一特征图F1与第二级第一特征图F2中,分别得到手部对应的第一级第二特征图F12与第二级第二特征图F22,将手部对应的第一级第二特征图F12与第二级第二特征图F22输入至手部第二关键点检测网络43中进行处理,得到手部的第二关键点信息。其中,手部第二关键点检测网络43的处理过程可参考第一关键点检测网络41的处理过程,此处不再进行赘述。
同时,可以基于面部对应的检测框的位置点信息,从第一级第一特征图F1与第二级第一特征图F2中,分别得到面部对应的第一级第二特征图F13与第二级第二特征图F23,将面部对应的第一级第二特征图F13与第二级第二特征图F23输入至面部第二关键点检测网络42中进行处理,得到面部的第二关键点信息。其中,面部第二关键点检测网络42的处理过程可参考第一关键点检测网络41的处理过程,此处不再进行赘述。
这里,第一关键点检测网络41、面部第二关键点检测网络42、以及手部第二关键点检测网络43的结构仅为示例性说明。
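The overall data flow of FIG. 4 can be summarized by the sketch below. `backbone`, `first_net`, `part_nets`, and `crop_part_features` are placeholder callables standing in for the trained networks and the cropping step described above; the dictionary-based interfaces are assumptions made only to keep the example short.

```python
def two_stage_keypoint_detection(target_image, backbone, first_net, part_nets, crop_part_features):
    """End-to-end sketch of the two-stage key point detection illustrated by FIG. 4.

    `backbone(target_image)` -> (F1, F2), the two levels of first feature maps;
    `first_net(F2)` -> (first key point information, {part name: detection box});
    `part_nets[part]` -> the second key point detection network of that target part.
    """
    f1, f2 = backbone(target_image)
    first_keypoints, part_boxes = first_net(f2)       # key position information
    object_keypoints = {"body": first_keypoints}
    for part, box in part_boxes.items():
        # Crop the second feature maps of this part from both levels of first feature maps.
        part_features = (crop_part_features(f1, box), crop_part_features(f2, box))
        object_keypoints[part] = part_nets[part](*part_features)
    return object_keypoints
```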
针对步骤S104:
这里,目标对象的对象关键点信息包括第一关键点信息以及每个目标部位对应的第二关键点信息。
示例性的,第一关键点的数量范围可以为5~25;在目标部位包括面部的情况下,面部中包括的脸部轮廓上的第二关键点的数量范围可以为0~25,每个眼睛上的第二关键点的数量范围可以为0~10,每个眉毛上的第二关键点的数量范围可以为0~10,鼻子上的第二关键点的数量范围可以为0~15,嘴唇上的第二关键点的数量范围可以为0~15;在目标部位包括脚部的情况下,脚部包括左脚和/或右脚,任一脚部的第二关键点的数量范围可以为1~10;在目标部位包括手部的情况下,手部包括左手和/或右手,任一手部的第二关键点的数量范围可以为1~25。
这里,第一关键点的数量和各目标部位对应的第二关键点的数量可以根据实际检测场景和对于检测精度的需求进行确定。以下仅为示例性说明,第一关键点的数量可以为15个,可以分布在人体的四肢关节位置以及头部轮廓上。在目标部位包括面部的情况下,面部对应的第二关键点的数量可以为6个,可以分布于面部的五官上,即分布于面部的双眼、双眉、鼻子以及嘴唇上。在目标部位包括脚部的情况下,脚部包括左脚和/或右脚,任一脚部的第二关键点的数量可以为2个,分布于脚跟以及中脚趾上;即左脚的第二关键点的数量可以为2个,和/或右脚的第二关键点的数量可以为2个。在目标部位包括手部的情况下,手部包括左手和/或右手,任一手部上的第二关键点的数量可以为6个,分布于手掌中心位置以及每根手指的指端上;即左手的第二关键点的数量可以为6个,和/或右手的第二关键点的数量可以为6个。
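For reference, the example counts given above can be collected into a simple configuration; the names and numbers below merely restate the illustrative example and may be chosen differently depending on the detection scenario and the required accuracy.

```python
# Illustrative keypoint layout matching the example counts described above.
KEYPOINT_LAYOUT = {
    "body": 15,        # limb joints and head contour (first key points)
    "face": 6,         # eyes, eyebrows, nose, and lips
    "left_hand": 6,    # palm centre and five fingertips
    "right_hand": 6,
    "left_foot": 2,    # heel and middle toe
    "right_foot": 2,
}
```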
一种可选实施方式中,该方法还包括:
基于确定的对象关键点信息,确定目标对象的动作类别信息,或者,基于确定的对象关键点信息,构建目标对象的三维模型。
示例性的,在确定目标对象的对象关键点信息之后,可以将对象关键点信息输入至动作检测神经网络中,得到该目标对象的动作类别信息。或者,可以将对象关键点信息输入至三维模型构建软件中,构建目标对象的三维模型。
在基于上述实施方式较准确地检测得到对象关键点信息后,应用该对象关键点信息就可以较准确地确定目标对象的动作类别信息或构建目标对象的三维模型。
一种可选实施方式中,在目标部位包括面部时,该方法还包括:
基于确定的对象关键点信息,确定目标对象的面部表情类别。
示例性的,可以将确定的对象关键点信息输入至面部识别神经网络中,识别得到目标对象的面部表情类别。
在目标部位包括手部时,该方法还包括:
基于确定的对象关键点信息,确定目标对象的手势以及手势对应的类别。
示例性的,可以将确定的对象关键点信息输入至手势识别神经网络中,识别得 到目标对象的手势以及手势类别。
在基于上述实施方式较准确地检测得到对象关键点信息后,应用该对象关键点信息就可以较准确地确定目标对象的面部表情类别和/或确定目标对象的手势以及手势类别。
本领域技术人员可以理解,在上述方法中,各步骤的顺序并不意味着严格的执行顺序而对实施过程构成任何限定,各步骤的具体执行顺序应当由其功能和可能的内在逻辑确定。
基于相同的构思,本公开实施例还提供了一种关键点检测的装置,参见图5所示,为本公开实施例提供的关键点检测的装置的架构示意图,所述装置包括图像确定模块501、第一检测模块502、第二检测模块503、关键点确定模块504、确定模块505、构建模块506、表情识别模块507、手势识别模块508,具体的:
图像确定模块501,用于确定包括目标对象的目标图像;
第一检测模块502,用于对所述目标图像进行第一关键点检测,得到所述目标对象的关键位置信息;所述关键位置信息包括所述目标对象上的第一关键点信息,以及所述目标对象的至少一个目标部位对应的检测框的位置点信息;
第二检测模块503,用于基于所述目标对象的每个所述目标部位对应的检测框的位置点信息,对每个所述目标部位在所述目标图像中的图像区域进行第二关键点检测,得到每个所述目标部位对应的第二关键点信息;
关键点确定模块504,用于基于所述第一关键点信息和所述第二关键点信息,确定所述目标对象的对象关键点信息。
一种可能的实施方式中,所述第一检测模块502,在对所述目标图像进行第一关键点检测,得到所述目标对象的关键位置信息时,用于:
对所述目标图像进行第一卷积处理,得到第一特征图;
基于所述第一特征图,确定所述目标对象的关键位置信息;
所述第二检测模块503,在基于所述目标对象的每个所述目标部位对应的检测框的位置点信息,对每个所述目标部位在所述目标图像中的图像区域进行第二关键点检测,得到每个所述目标部位对应的第二关键点信息时,用于:
基于所述目标对象的每个所述目标部位对应的检测框的位置点信息,从所述第一特征图中截取与每个所述目标部位对应的第二特征图;
基于所述第二特征图,确定每个所述目标部位对应的第二关键点信息。
一种可能的实施方式中,所述第一检测模块502和所述第二检测模块503,分别用于根据以下步骤对目标特征图进行卷积处理,确定目标关键点信息,其中,在所述目标特征图为所述第一特征图的情况下,所述目标关键点信息为所述目标对象的关键位置信息,且由所述第一检测模块502执行以下步骤;在所述目标特征图为所述第二特征图 的情况下,所述目标关键点信息为所述目标部位对应的第二关键点信息,且由所述第二检测模块503执行以下步骤:
对所述目标特征图进行多次特征处理,生成尺寸不同的多个中间特征图;
将所述多个中间特征图进行融合处理,得到融合特征图;
基于所述融合特征图,确定所述目标关键点信息。
一种可能的实施方式中,所述第一检测模块502和所述第二检测模块503,在对所述目标特征图进行多次特征处理时,分别用于:根据以下步骤进行当前次特征处理:
针对进行当前次特征处理前的不同尺寸的特征图,分别进行至少一级卷积处理,得到不同尺寸的卷积特征图;
将所述不同尺寸的卷积特征图进行多种融合处理,得到当前次特征处理后的不同尺寸的特征图。
一种可能的实施方式中,所述第一特征图包括多级第一特征图,不同级第一特征图为经过不同级卷积处理得到的,所述第二检测模块503,在基于所述目标对象的每个所述目标部位对应的检测框的位置点信息,从所述第一特征图中截取与每个所述目标部位对应的第二特征图时,用于:
基于所述目标对象的每个所述目标部位对应的检测框的位置点信息,从所述多级第一特征图中的不同级第一特征图中,分别截取与每个所述目标部位对应的第二特征图。
一种可能的实施方式中,所述目标对象包括人物,第一关键点可以至少分布在所述人物的四肢、头部上;
所述第一关键点的数量范围为5~25。
一种可能的实施方式中,所述目标部位可以包括人物的面部、脚部、手部中的至少一种;
在所述目标部位包括面部的情况下,面部对应的第二关键点至少分布在所述面部的脸部轮廓、眼睛、眉毛、鼻子、和嘴唇中的至少一个区域;
在所述目标部位包括脚部的情况下,脚部对应的第二关键点至少分布在所述脚部的至少一根脚趾、脚心、以及脚跟中的至少一个区域;
在所述目标部位包括手部的情况下,手部对应的第二关键点至少分布在所述手部的至少一根手指、以及手心中的至少一个区域。
一种可能的实施方式中,在所述目标部位包括面部的情况下,所述脸部轮廓上的第二关键点的数量范围为0~25,每个所述眼睛上的第二关键点的数量范围为0~10,每个所述眉毛上的第二关键点的数量范围为0~10,所述鼻子上的第二关键点的数量范围为0~15,所述嘴唇上的第二关键点的数量范围为0~15;
在所述目标部位包括脚部的情况下,所述脚部包括左脚和/或右脚;任一所述脚部的第二关键点的数量范围为1~10;
在所述目标部位包括手部的情况下,所述手部包括左手和/或右手;任一所述手部的第二关键点的数量范围为1~25。
一种可能的实施方式中,所述装置还包括:
确定模块505,用于基于确定的所述对象关键点信息,确定所述目标对象的动作类别信息;
构建模块506,用于基于确定的所述对象关键点信息,构建所述目标对象的三维模型。
一种可能的实施方式中,所述装置还包括:
表情识别模块507,用于基于确定的所述对象关键点信息,确定所述目标对象的面部表情类别;
所述装置还包括:
手势识别模块508,用于基于确定的所述对象关键点信息,确定所述目标对象的手势以及所述手势对应的类别。
在一些实施例中,本公开实施例提供的装置具有的功能或包含的模板可以用于执行上文方法实施例描述的方法,其具体实现可以参照上文方法实施例的描述,为了简洁,这里不再赘述。
基于同一技术构思,本公开实施例还提供了一种电子设备。参照图6所示,为本公开实施例提供的电子设备600的结构示意图,电子设备600包括处理器601、存储器602、和总线603。其中,存储器602用于存储处理器601可执行的机器可读指令,包括内存6021和外部存储器6022;这里的内存6021也称内存储器,用于暂时存放处理器601中的运算数据,以及与硬盘等外部存储器6022交换的数据,处理器601通过内存6021与外部存储器6022进行数据交换,当电子设备600运行时,处理器601与存储器602之间通过总线603通信,所述机器可读指令被处理器601执行时,使得处理器601执行以下步骤:
确定包括目标对象的目标图像;
对所述目标图像进行第一关键点检测,得到所述目标对象的关键位置信息;所述关键位置信息包括所述目标对象上的第一关键点信息,以及所述目标对象的至少一个目标部位对应的检测框的位置点信息;
基于所述目标对象的每个所述目标部位对应的检测框的位置点信息,对每个所述目标部位在所述目标图像中的图像区域进行第二关键点检测,得到每个所述目标部位对应的第二关键点信息;
基于所述第一关键点信息和所述第二关键点信息,确定所述目标对象的对象关 键点信息。
此外,本公开实施例还提供一种计算机可读存储介质,该计算机可读存储介质上存储有计算机程序,该计算机程序被处理器执行时,使所述处理器执行上述方法实施例中所述的关键点检测的方法。
本公开实施例还提供一种计算机程序,所述计算机程序被处理器执行时,使所述处理器执行上述方法实施例中所述的关键点检测的方法。
本公开实施例所提供的关键点检测的方法的计算机程序产品,包括存储了程序代码的计算机可读存储介质,所述程序代码包括的指令可用于执行上述方法实施例中所述的关键点检测的方法,具体可参见上述方法实施例,在此不再赘述。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的装置的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。在本公开所提供的几个实施例中,应该理解到,所揭露的装置和方法,可以通过其它的方式实现。以上所描述的装置实施例仅仅是示意性的,例如,所述模块的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,又例如,多个模块或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些通信接口,装置或模块之间的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的模块可以是或者也可以不是物理上分开的,作为模块显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本公开实施例的目的。
另外,在本公开各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个处理器可执行的非易失性的计算机可读存储介质中。基于这样的理解,本公开实施例本质上或者说本公开实施例的全部或部分可以以计算机软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本公开各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
以上仅为本公开的具体实施方式,但本公开的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本公开揭露的技术范围内,可轻易想到的变化或替换,都应涵盖在本公开的保护范围之内。因此,本公开的保护范围应以权利要求的保护范围为准。

Claims (14)

  1. 一种关键点检测的方法,包括:
    确定包括目标对象的目标图像;
    对所述目标图像进行第一关键点检测,得到所述目标对象的关键位置信息;所述关键位置信息包括所述目标对象上的第一关键点信息,以及所述目标对象的至少一个目标部位对应的检测框的位置点信息;
    基于所述目标对象的每个所述目标部位对应的检测框的位置点信息,对每个所述目标部位在所述目标图像中的图像区域进行第二关键点检测,得到每个所述目标部位对应的第二关键点信息;
    基于所述第一关键点信息和所述第二关键点信息,确定所述目标对象的对象关键点信息。
  2. 根据权利要求1所述的方法,其中,所述对所述目标图像进行第一关键点检测,得到所述目标对象的关键位置信息,包括:
    对所述目标图像进行第一卷积处理,得到第一特征图;
    基于所述第一特征图,确定所述目标对象的关键位置信息;
    所述基于所述目标对象的每个所述目标部位对应的检测框的位置点信息,对每个所述目标部位在所述目标图像中的图像区域进行第二关键点检测,得到每个所述目标部位对应的第二关键点信息,包括:
    基于所述目标对象的每个所述目标部位对应的检测框的位置点信息,从所述第一特征图中截取与每个所述目标部位对应的第二特征图;
    基于所述第二特征图,确定每个所述目标部位对应的第二关键点信息。
  3. 根据权利要求2所述的方法,其中,根据以下步骤对目标特征图进行卷积处理,确定目标关键点信息,其中,在所述目标特征图为所述第一特征图的情况下,所述目标关键点信息为所述目标对象的关键位置信息,在所述目标特征图为所述第二特征图的情况下,所述目标关键点信息为所述目标部位对应的第二关键点信息:
    对所述目标特征图进行多次特征处理,生成尺寸不同的多个中间特征图;
    将所述多个中间特征图进行融合处理,得到融合特征图;
    基于所述融合特征图,确定所述目标关键点信息。
  4. 根据权利要求3所述的方法,其中,所述对所述目标特征图进行多次特征处理,包括:根据以下步骤进行当前次特征处理:
    针对进行当前次特征处理前的不同尺寸的特征图,分别进行至少一级卷积处理,得 到不同尺寸的卷积特征图;
    将所述不同尺寸的卷积特征图进行多种融合处理,得到当前次特征处理后的不同尺寸的特征图。
  5. 根据权利要求2至4任一所述的方法,其中,所述第一特征图包括多级第一特征图,不同级第一特征图为经过不同级卷积处理得到的,所述基于所述目标对象的每个所述目标部位对应的检测框的位置点信息,从所述第一特征图中截取与每个所述目标部位对应的第二特征图,包括:
    基于所述目标对象的每个所述目标部位对应的检测框的位置点信息,从所述多级第一特征图中的不同级第一特征图中,分别截取与每个所述目标部位对应的第二特征图。
  6. 根据权利要求1至5任一所述的方法,其中,所述目标对象包括人物,第一关键点至少分布在所述人物的四肢、头部上;
    所述第一关键点的数量范围为5~25。
  7. 根据权利要求1至6任一所述的方法,其中,所述目标部位包括人物的面部、脚部、手部中的至少一种;
    在所述目标部位包括面部的情况下,面部对应的第二关键点至少分布在所述面部的脸部轮廓、眼睛、眉毛、鼻子、和嘴唇中的至少一个区域;
    在所述目标部位包括脚部的情况下,脚部对应的第二关键点至少分布在所述脚部的至少一根脚趾、脚心、以及脚跟中的至少一个区域;
    在所述目标部位包括手部的情况下,手部对应的第二关键点至少分布在所述手部的至少一根手指、以及手心中的至少一个区域。
  8. 根据权利要求7所述的方法,其中,
    在所述目标部位包括面部的情况下,所述脸部轮廓上的第二关键点的数量范围为0~25,每个所述眼睛上的第二关键点的数量范围为0~10,每个所述眉毛上的第二关键点的数量范围为0~10,所述鼻子上的第二关键点的数量范围为0~15,所述嘴唇上的第二关键点的数量范围为0~15;
    在所述目标部位包括脚部的情况下,所述脚部包括左脚和/或右脚;任一所述脚部的第二关键点的数量范围为1~10;
    在所述目标部位包括手部的情况下,所述手部包括左手和/或右手;任一所述手部的第二关键点的数量范围为1~25。
  9. 根据权利要求1-8任一所述的方法,所述方法还包括:
    基于确定的所述对象关键点信息,确定所述目标对象的动作类别信息,或者,基于 确定的所述对象关键点信息,构建所述目标对象的三维模型。
  10. 根据权利要求1-8任一所述的方法,其中,在所述目标部位包括面部时,所述方法还包括:基于确定的所述对象关键点信息,确定所述目标对象的面部表情类别;
    在所述目标部位包括手部时,所述方法还包括:基于确定的所述对象关键点信息,确定所述目标对象的手势以及所述手势对应的类别。
  11. 一种关键点检测的装置,包括:
    图像确定模块,用于确定包括目标对象的目标图像;
    第一检测模块,用于对所述目标图像进行第一关键点检测,得到所述目标对象的关键位置信息;所述关键位置信息包括所述目标对象上的第一关键点信息,以及所述目标对象的至少一个目标部位对应的检测框的位置点信息;
    第二检测模块,用于基于所述目标对象的每个所述目标部位对应的检测框的位置点信息,对每个所述目标部位在所述目标图像中的图像区域进行第二关键点检测,得到每个所述目标部位对应的第二关键点信息;
    关键点确定模块,用于基于所述第一关键点信息和所述第二关键点信息,确定所述目标对象的对象关键点信息。
  12. 一种电子设备,包括:处理器、存储器和总线,所述存储器存储有所述处理器可执行的机器可读指令,当所述电子设备运行时,所述处理器与所述存储器之间通过所述总线通信,所述机器可读指令被所述处理器执行时,使所述处理器执行如权利要求1至10任一所述的关键点检测的方法。
  13. 一种计算机可读存储介质,该计算机可读存储介质上存储有计算机程序,该计算机程序被处理器执行时,使所述处理器执行如权利要求1至10任一所述的关键点检测的方法。
  14. 一种计算机程序,所述计算机程序被处理器执行时,使所述处理器执行如权利要求1至10任一所述的关键点检测的方法。
PCT/CN2020/135394 2020-03-30 2020-12-10 关键点检测的方法和装置、电子设备、存储介质及计算机程序 WO2021196718A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2022524649A JP2022553990A (ja) 2020-03-30 2020-12-10 キーポイントの検出方法および装置、電子設備、記憶媒体、およびコンピュータプログラム

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010239542.XA CN111444928A (zh) 2020-03-30 2020-03-30 关键点检测的方法、装置、电子设备及存储介质
CN202010239542.X 2020-03-30

Publications (1)

Publication Number Publication Date
WO2021196718A1 true WO2021196718A1 (zh) 2021-10-07

Family

ID=71650855

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/135394 WO2021196718A1 (zh) 2020-03-30 2020-12-10 关键点检测的方法和装置、电子设备、存储介质及计算机程序

Country Status (4)

Country Link
JP (1) JP2022553990A (zh)
CN (1) CN111444928A (zh)
TW (1) TWI763205B (zh)
WO (1) WO2021196718A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111444928A (zh) * 2020-03-30 2020-07-24 北京市商汤科技开发有限公司 关键点检测的方法、装置、电子设备及存储介质
CN112819885A (zh) * 2021-02-20 2021-05-18 深圳市英威诺科技有限公司 基于深度学习的动物识别方法、装置、设备及存储介质
CN113763440A (zh) * 2021-04-26 2021-12-07 腾讯科技(深圳)有限公司 一种图像处理方法、装置、设备及存储介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229293A (zh) * 2017-08-09 2018-06-29 北京市商汤科技开发有限公司 人脸图像处理方法、装置和电子设备
CN108229492A (zh) * 2017-03-29 2018-06-29 北京市商汤科技开发有限公司 提取特征的方法、装置及系统
CN110222685A (zh) * 2019-05-16 2019-09-10 华中科技大学 一种基于两阶段的服装关键点定位方法和系统
CN111444928A (zh) * 2020-03-30 2020-07-24 北京市商汤科技开发有限公司 关键点检测的方法、装置、电子设备及存储介质

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6573354B2 (ja) * 2014-11-28 2019-09-11 キヤノン株式会社 画像処理装置、画像処理方法、及びプログラム
CN207037676U (zh) * 2017-03-02 2018-02-23 北京旷视科技有限公司 人脸图像采集装置
CN109359568A (zh) * 2018-09-30 2019-02-19 南京理工大学 一种基于图卷积网络的人体关键点检测方法

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229492A (zh) * 2017-03-29 2018-06-29 北京市商汤科技开发有限公司 提取特征的方法、装置及系统
CN108229293A (zh) * 2017-08-09 2018-06-29 北京市商汤科技开发有限公司 人脸图像处理方法、装置和电子设备
CN110222685A (zh) * 2019-05-16 2019-09-10 华中科技大学 一种基于两阶段的服装关键点定位方法和系统
CN111444928A (zh) * 2020-03-30 2020-07-24 北京市商汤科技开发有限公司 关键点检测的方法、装置、电子设备及存储介质

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KE SUN; BIN XIAO; DONG LIU; JINGDONG WANG: "Deep High-Resolution Representation Learning for Human Pose Estimation", ARXIV.ORG, 25 February 2019 (2019-02-25), XP081032951 *

Also Published As

Publication number Publication date
TW202137053A (zh) 2021-10-01
JP2022553990A (ja) 2022-12-27
CN111444928A (zh) 2020-07-24
TWI763205B (zh) 2022-05-01

Similar Documents

Publication Publication Date Title
KR102523512B1 (ko) 얼굴 모델의 생성
WO2021196718A1 (zh) 关键点检测的方法和装置、电子设备、存储介质及计算机程序
WO2020207191A1 (zh) 虚拟物体被遮挡的区域确定方法、装置及终端设备
US20220004758A1 (en) Eye pose identification using eye features
WO2018028546A1 (zh) 一种关键点的定位方法及终端、计算机存储介质
US10817705B2 (en) Method, apparatus, and system for resource transfer
JP7476428B2 (ja) 画像の視線補正方法、装置、電子機器、コンピュータ可読記憶媒体及びコンピュータプログラム
WO2021169637A1 (zh) 图像识别方法、装置、计算机设备及存储介质
JP2021527877A (ja) 3次元人体姿勢情報の検出方法および装置、電子機器、記憶媒体
CN109635752B (zh) 人脸关键点的定位方法、人脸图像处理方法和相关装置
US10990170B2 (en) Eye tracking method, electronic device, and non-transitory computer readable storage medium
WO2022161301A1 (zh) 图像生成方法、装置、计算机设备及计算机可读存储介质
CN111652123B (zh) 图像处理和图像合成方法、装置和存储介质
US9213897B2 (en) Image processing device and method
US11574416B2 (en) Generating body pose information
US20200275017A1 (en) Tracking system and method thereof
CN111815768B (zh) 三维人脸重建方法和装置
CN112699857A (zh) 基于人脸姿态的活体验证方法、装置及电子设备
CN110533775B (zh) 一种基于3d人脸的眼镜匹配方法、装置及终端
US20100014760A1 (en) Information Extracting Method, Registration Device, Verification Device, and Program
CN112348069B (zh) 数据增强方法、装置、计算机可读存储介质及终端设备
CN113642354B (zh) 人脸姿态的确定方法、计算机设备、计算机可读存储介质
US20240161375A1 (en) System and method to display profile information in a virtual environment
WO2021087692A1 (zh) 散斑图像匹配方法、装置及存储介质
WO2023249694A1 (en) Object detection and tracking in extended reality devices

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20928454

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022524649

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20928454

Country of ref document: EP

Kind code of ref document: A1