WO2022206639A1 - Human body key point detection method and related apparatus - Google Patents

Human body key point detection method and related apparatus

Info

Publication number
WO2022206639A1
WO2022206639A1 PCT/CN2022/083227 CN2022083227W WO2022206639A1 WO 2022206639 A1 WO2022206639 A1 WO 2022206639A1 CN 2022083227 W CN2022083227 W CN 2022083227W WO 2022206639 A1 WO2022206639 A1 WO 2022206639A1
Authority
WO
WIPO (PCT)
Prior art keywords
key points
user
electronic device
image
angle
Prior art date
Application number
PCT/CN2022/083227
Other languages
English (en)
French (fr)
Inventor
赵杰
马春晖
黄磊
刘小蒙
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to EP22778821.3A (published as EP4307220A4)
Publication of WO2022206639A1


Classifications

    • G06V 40/103: Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G06T 7/0002: Image analysis; inspection of images, e.g. flaw detection
    • G06T 7/593: Depth or shape recovery from multiple images, from stereo images
    • G06T 7/70: Determining position or orientation of objects or cameras
    • G06T 7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G06V 10/7515: Image or video pattern matching; shifting the patterns to accommodate for positional errors
    • G06V 20/647: Three-dimensional objects by matching two-dimensional images to three-dimensional objects
    • G06V 40/23: Recognition of whole body movements, e.g. for sport training
    • G06T 2207/10028: Range image; depth image; 3D point clouds
    • G06T 2207/20081: Training; learning
    • G06T 2207/20084: Artificial neural networks [ANN]
    • G06T 2207/30196: Human being; person

Definitions

  • the present application relates to the field of terminal technology, and in particular, to a method for detecting key points of a human body and a related device.
  • Human key point detection is the foundation of many computer vision tasks. By detecting the three-dimensional (3D) key points of the human body, posture detection, action classification, intelligent fitness, and somatosensory games can be realized.
  • Electronic devices can capture images through cameras and identify the two-dimensional (2D) key points of the human body in the images. Based on the 2D key points, electronic devices can estimate the 3D key points of the human body using techniques such as deep learning.
  • the images collected by the camera will exhibit different degrees of perspective deformation. Perspective distortion causes errors in the detected 2D key points, and the 3D key points estimated from these 2D key points therefore also contain errors.
  • an electronic device can detect the position of the user relative to the camera through a plurality of cameras, so as to determine the degree of perspective deformation of the character in the image. Further, the electronic device can correct the 3D key points according to that degree of perspective deformation.
  • however, the above method requires multiple cameras whose placement must be carefully designed to improve detection accuracy, which is not only costly but also computationally complex when determining the 3D key points.
  • the present application provides a human body key point detection method and related device.
  • the method can be applied to electronic devices equipped with one or more cameras.
  • the electronic device can identify 3D key points of the user in the image captured by the camera.
  • the electronic device can detect whether the user's posture matches the preset posture according to the above-mentioned 3D key points.
  • the electronic device can calculate the angle between the human body model determined by the set of 3D key points and the image plane.
  • the electronic device can use the included angle to correct the position information of the 3D key points.
  • the method can save costs, reduce the error caused by image perspective deformation in the position information of the 3D key points, and improve the accuracy of 3D key point position detection (a rotation-correction sketch in code follows).
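  • Below is a minimal Python/NumPy sketch of what "rotation correction with a compensation angle" can look like geometrically: the 3D key points are rotated about the image plane's horizontal axis so the body model no longer leans toward the camera. The rotation center (the mid-hip point, assumed to be row 0) and the coordinate convention (the camera frame of FIG. 2B) are assumptions for illustration, not details fixed by the patent.

```python
import numpy as np

def rotation_correct(points_3d: np.ndarray, comp_angle_rad: float) -> np.ndarray:
    """Rotate an (N, 3) array of 3D key points about the x-axis by the
    compensation angle. Coordinates follow the camera frame of FIG. 2B:
    x right, y down, z along the optical axis, so a forward lean lives
    in the y-z plane and is removed by a rotation about x."""
    c, s = np.cos(comp_angle_rad), np.sin(comp_angle_rad)
    rot_x = np.array([[1.0, 0.0, 0.0],
                      [0.0,   c,  -s],
                      [0.0,   s,   c]])
    pivot = points_3d[0]   # assumed mid-hip reference point
    return (points_3d - pivot) @ rot_x.T + pivot
```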
  • the present application provides a method for detecting human key points, which can be applied to an electronic device including one or more cameras.
  • the electronic device may acquire the first image of the first user through the camera.
  • the electronic device may determine the first set of 3D key points of the first user according to the first image.
  • the electronic device may determine whether a plurality of 3D key points in the first group of 3D key points satisfy a first condition. If the first condition is satisfied, the electronic device may determine a first compensation angle according to the plurality of 3D key points and perform rotation correction on the first group of 3D key points using the first compensation angle.
  • the electronic device may use the second compensation angle to perform rotation correction on the first group of 3D key points.
  • the second compensation angle is determined from the second set of 3D keypoints.
  • the second set of 3D key points is the most recent set of 3D key points that satisfy the first condition before the first image is acquired.
  • the method for the electronic device to obtain the first image of the first user through the camera may be: the electronic device may determine a first moment according to first multimedia information, where the first moment is the moment at which the first multimedia information instructs the user to perform an action that satisfies the first condition.
  • the electronic device may acquire the first image of the first user through the camera within the first time period starting from the first moment.
  • the above-mentioned first time period may be the time period during which the first multimedia information instructs the user to perform an action that satisfies the first condition. For example, if the time required to complete the action satisfying the first condition is 1 second, then the first time period is the 1 second starting from the first moment.
  • the above-mentioned first time period may be a fixed time period.
  • the electronic device can use the third compensation angle to perform rotation correction on the 3D key points determined from the images acquired in the second time period.
  • the third compensation angle is determined according to the third set of 3D key points.
  • the third group of 3D key points is the most recent group of 3D key points that satisfy the first condition before the second moment.
  • the above-mentioned second time period may be a time period in which the first multimedia information instructs the user to perform an action that does not satisfy the first condition. For example, if the time required to complete that action is 1 second, then the second time period is the 1 second starting from the second moment.
  • the above-mentioned second time period may be a time period of a fixed time length.
  • the method for the electronic device to determine whether multiple 3D key points in the first group of 3D key points satisfy the first condition may be: detecting whether each of the plurality of 3D key points matches the corresponding 3D key point of a first action, where the first action is at least one of an upright upper-body action and an upright-leg action.
  • the first compensation angle may be the included angle between the image plane and the line connecting the neck point and the chest-abdomen point in the first group of 3D key points.
  • alternatively, the first compensation angle may be the included angle between the image plane and the straight line through any two of the hip point, knee point, and ankle point in the first group of 3D key points.
  • the plurality of 3D keypoints in the first set of 3D keypoints include hip points, knee points, ankle points.
  • the method for the electronic device to determine whether the plurality of 3D key points in the first group of 3D key points satisfy the first condition may be as follows: the electronic device may calculate a first included angle between the line through the left hip point and the left knee point and the line through the left knee point and the left ankle point in the first group of 3D key points, and a second included angle between the line through the right hip point and the right knee point and the line through the right knee point and the right ankle point.
  • the electronic device can then determine whether the plurality of 3D key points in the first group of 3D key points satisfy the first condition by detecting whether the difference between the first included angle and 180° is smaller than a first difference, and whether the difference between the second included angle and 180° is smaller than the first difference.
  • the plurality of 3D key points in the first group of 3D key points satisfy the first condition when the difference between the first included angle and 180° is smaller than the first difference and/or the difference between the second included angle and 180° is smaller than the first difference (see the sketch below).
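  • A minimal Python/NumPy sketch of this first-condition check follows. The key-point names, the dictionary layout, and the 10° default for the "first difference" are illustrative assumptions, not values fixed by the patent.

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle at joint b (degrees) formed by the segments b->a and b->c."""
    v1 = np.asarray(a, float) - np.asarray(b, float)
    v2 = np.asarray(c, float) - np.asarray(b, float)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def legs_upright(kp, first_difference_deg=10.0):
    """First-condition sketch: a leg counts as upright when its
    hip-knee-ankle angle differs from 180° by less than the first
    difference. The patent allows "and/or"; `or` is used here."""
    left = joint_angle(kp["left_hip"], kp["left_knee"], kp["left_ankle"])
    right = joint_angle(kp["right_hip"], kp["right_knee"], kp["right_ankle"])
    return (abs(left - 180.0) < first_difference_deg
            or abs(right - 180.0) < first_difference_deg)
```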
  • the electronic device when detecting that the user's 3D key points meet the first condition, can use the 3D key points determined based on the image to determine the degree of perspective deformation of the character in the image. If the above-mentioned 3D key points of the user satisfy the first condition, the electronic device may detect that the user performs an upright upper body and/or an upright leg action. That is to say, when the upper body of the user is upright and/or the legs are upright, the electronic device can use the 3D key points determined based on the image to determine the degree of perspective deformation of the character in the image.
  • the electronic device can correct the position information of the 3D key points determined based on the image, reduce the error caused by the image perspective deformation on the position information of the 3D key points, and improve the accuracy of the position information detection of the 3D key points.
  • This method requires only one camera, which not only saves cost but also keeps the computational complexity of key point detection low.
  • the electronic device can update the compensation angle.
  • the updated compensation angle can more accurately reflect the degree of perspective deformation of the character in the image captured by the camera at the current position of the user. This can reduce the influence of the position change between the user and the camera on the correction of the position information of the 3D key points, and improve the accuracy of the position information of the 3D key points after correction. That is to say, the electronic device can use the updated compensation angle to correct the position information of the 3D key points.
  • the human body model determined by the corrected 3D key points can more accurately reflect the user's posture and improve the accuracy of posture detection.
  • the electronic device can more accurately determine whether the user's posture is correct and whether the range of the user's actions meets the requirements, so that the user has a better experience when playing fitness or somatosensory games.
  • the second compensation angle is an included angle between a line where the upper body 3D key points and/or the leg 3D key points in the second group of 3D key points are located and the image plane.
  • the third compensation angle is the included angle between the straight line where the 3D key points of the upper body and/or the 3D key points of the legs are located in the third group of 3D key points and the image plane.
  • the second compensation angle is the sum of a first compensation angle change and a third included angle, where the third included angle is the angle between the image plane and the line through the upper-body 3D key points and/or leg 3D key points in the second group of 3D key points;
  • the first compensation angle change is determined according to a first height H1, a first distance Y1, a second height H2 and a second distance Y2, where H1 and Y1 are respectively the height of the first user and the distance between the first user and the camera when the first image is collected, and H2 and Y2 are respectively the height of the first user and the distance between the first user and the camera when the second image is collected;
  • the key points of the first user in the first image are the first group of 3D key points;
  • the key points of the first user in the second image are the second group of 3D key points.
  • the third compensation angle is the sum of a second compensation angle change and a fourth included angle; the fourth included angle is the angle between the image plane and the line through the upper-body 3D key points and/or leg 3D key points in the third group of 3D key points; the second compensation angle change is determined according to a third height H3, a third distance Y3, a fourth height H4 and a fourth distance Y4, where H3 and Y3 are respectively the height of the first user and the distance between the first user and the camera when the second image is collected, and H4 and Y4 are respectively the height of the first user and the distance between the first user and the camera when the third image is collected; the key points of the first user in the third image are the third group of 3D key points.
  • the above-mentioned first compensation angle change and second compensation angle change are both determined by a first model, and the first model is obtained by training on multiple sets of training samples (a toy sketch follows below).
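  • As a purely illustrative toy of such a model, the NumPy sketch below fits a linear regressor from (H1, Y1, H2, Y2) to an angle change. The model family, the feature layout, and the synthetic labels are all assumptions; the patent only states that the model is trained on multiple sets of training samples.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic samples: rows of [H1, Y1, H2, Y2] (heights in m, distances in m).
X = rng.uniform([1.5, 1.0, 1.5, 1.0], [1.9, 5.0, 1.9, 5.0], size=(200, 4))
# Placeholder labels from a toy geometric proxy, NOT the patent's data:
# treat the angle subtended by the user's height at each distance as a
# stand-in for the lean, and regress on its change between the two images.
y = np.degrees(np.arctan(X[:, 2] / X[:, 3]) - np.arctan(X[:, 0] / X[:, 1]))

A = np.hstack([X, np.ones((len(X), 1))])      # add a bias column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares fit

def predict_angle_change(h1, y1, h2, y2):
    """Predicted compensation-angle change (degrees) for one sample."""
    return np.array([h1, y1, h2, y2, 1.0]) @ coef

delta = predict_angle_change(1.7, 3.0, 1.7, 2.0)
```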
  • the electronic device can determine the degree of perspective deformation of the characters in the image in real time and correct the position information of the 3D key points, thereby improving the detection accuracy of the 3D key points.
  • if the angle between the image plane and the straight line through the upper-body 3D key points and/or leg 3D key points in the first group of 3D key points is not greater than a fifth included angle, the electronic device may determine the first compensation angle according to the plurality of 3D key points. If that angle is greater than the fifth included angle, the electronic device can use the second compensation angle to perform rotation correction on the first group of 3D key points.
  • the fifth included angle can be set according to empirical values. This embodiment of the present application does not limit this.
  • by setting the fifth included angle, the electronic device can determine whether the calculated angle can be used as a compensation angle for correcting the position information of the 3D key points, so as to prevent the compensation angle from taking an impossible value due to a miscalculated included angle, which improves the accuracy of detecting the 3D key points.
  • likewise, in this case the third compensation angle is the sum of the second compensation angle change and the fourth included angle, with the fourth included angle and the second compensation angle change defined as above (from H3, Y3, H4 and Y4, the key points of the first user in the third image being the third group of 3D key points), provided that the included angle between the image plane and the line through the upper-body 3D key points and/or leg 3D key points is smaller than the fifth included angle.
  • in a second aspect, the present application provides an electronic device. The electronic device includes a camera, a display screen, a memory, and a processor, where the camera can be used to capture images, the memory can be used to store a computer program, and the processor can be used to invoke the computer program, so that the electronic device performs any one of the possible implementations of the above-mentioned first aspect.
  • in a third aspect, the present application provides a computer storage medium, including instructions, which, when executed on an electronic device, cause the electronic device to execute any one of the possible implementations of the above-mentioned first aspect.
  • in a fourth aspect, an embodiment of the present application provides a chip applied to an electronic device. The chip includes one or more processors, and the processors are configured to invoke computer instructions to cause the electronic device to execute any one of the possible implementations of the above-mentioned first aspect.
  • in a fifth aspect, an embodiment of the present application provides a computer program product containing instructions, which, when run on an electronic device, enables the electronic device to execute any one of the possible implementations of the above-mentioned first aspect.
  • the electronic device provided in the second aspect, the computer storage medium provided in the third aspect, the chip provided in the fourth aspect, and the computer program product provided in the fifth aspect are all used to execute the methods provided by the embodiments of the present application. Therefore, for the beneficial effects that can be achieved, reference may be made to the beneficial effects in the corresponding method, which will not be repeated here.
  • FIG. 1 is a position distribution diagram of human body key points provided by an embodiment of the present application;
  • FIG. 2A is a schematic diagram of a 2D coordinate system for determining position information of 2D key points provided by an embodiment of the present application;
  • FIG. 2B is a schematic diagram of a 3D coordinate system for determining position information of 3D key points provided by an embodiment of the present application;
  • FIG. 3 is a schematic diagram of a scene for detecting human body key points according to an embodiment of the present application;
  • FIG. 4A and FIG. 4B are schematic diagrams of 3D key points detected by the electronic device 100 provided by an embodiment of the present application;
  • FIG. 5 is a schematic structural diagram of an electronic device 100 provided by an embodiment of the present application;
  • FIG. 6A to FIG. 6D are schematic diagrams of some human body key point detection scenarios provided by embodiments of the present application;
  • FIG. 7A and FIG. 7B are schematic diagrams of 3D key points detected by the electronic device 100 according to an embodiment of the present application;
  • FIG. 8 is a flowchart of a human body key point detection method provided by an embodiment of the present application;
  • FIG. 9A to FIG. 9C are schematic diagrams of some human body key point detection scenarios provided by embodiments of the present application;
  • FIG. 10 is a flowchart of another human body key point detection method provided by an embodiment of the present application;
  • FIG. 11 is a flowchart of another human body key point detection method provided by an embodiment of the present application;
  • FIG. 12 is a position distribution diagram of other human body key points provided by an embodiment of the present application;
  • FIG. 13 is a flowchart of another human body key point detection method provided by an embodiment of the present application.
  • the terms "first" and "second" are used for descriptive purposes only, and should not be construed as indicating or implying relative importance or the number of indicated technical features. Therefore, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the embodiments of the present application, unless otherwise specified, "multiple" means two or more.
  • FIG. 1 exemplarily shows a position distribution diagram of key points of the human body.
  • the key points of the human body may include: head point, neck point, left shoulder point, right shoulder point, right elbow point, left elbow point, right hand point, left hand point, right hip point, left hip point, midpoint of the left and right hips, right knee point, left knee point, right ankle point, and left ankle point.
  • the embodiments of the present application may also include other key points, which are not specifically limited here.
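  • For concreteness, the key points above can be organized as an indexed set. The Python sketch below shows one possible index layout; the ordering is an assumption for illustration, as the patent does not prescribe an index scheme.

```python
from enum import IntEnum

class BodyKeypoint(IntEnum):
    """One possible index layout for the key points of FIG. 1
    (ordering is an illustrative assumption)."""
    HEAD = 0
    NECK = 1
    LEFT_SHOULDER = 2
    RIGHT_SHOULDER = 3
    LEFT_ELBOW = 4
    RIGHT_ELBOW = 5
    LEFT_HAND = 6
    RIGHT_HAND = 7
    LEFT_HIP = 8
    RIGHT_HIP = 9
    MID_HIP = 10
    LEFT_KNEE = 11
    RIGHT_KNEE = 12
    LEFT_ANKLE = 13
    RIGHT_ANKLE = 14
```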
  • the 2D key points in the embodiments of the present application may represent key points distributed on a 2D plane.
  • the electronic device can capture an image of the user through a camera, and identify the 2D key points of the user in the image.
  • the above-mentioned 2D plane may be the image plane where the image collected by the camera is located.
  • the electronic device identifying the user's 2D key points in the image may specifically be determining the position information of the user's key points on the 2D plane.
  • the position information of 2D key points can be represented by two-dimensional coordinates in the 2D plane.
  • the position information of each 2D key point may be expressed relative to one 2D key point used as a reference point. For example, taking the midpoint of the left and right hips as the reference point, the position of this midpoint on the 2D plane may be the coordinates (0, 0). The electronic device can then determine the position information of the other 2D key points according to their positions relative to the midpoint of the left and right hips.
  • the electronic device may establish a 2D coordinate system x_i-y_i as shown in FIG. 2A based on the image plane.
  • the 2D coordinate system can take a vertex of the image collected by the camera as the origin, the horizontal direction of the object in the image as the direction of the x_i axis, and the vertical direction of the object in the image as the direction of the y_i axis.
  • the location information of each 2D key point may be the two-dimensional coordinates of the user's key point in the 2D coordinate system.
  • the embodiments of the present application do not limit the method for determining the location information of the above-mentioned 2D key points.
  • the electronic device can identify a set of 2D key points from one frame of image captured by the camera. This set of 2D key points may include all the key points of the human body shown in FIG. 1. A set of 2D key points can be used to define a human body model on the 2D plane (a small sketch of the reference-point representation follows).
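  • A minimal sketch of the relative-coordinate representation described above, assuming a small dictionary of named 2D key points in pixel coordinates (the names and values are illustrative):

```python
import numpy as np

# Illustrative 2D key points in the x_i-y_i image frame (pixel values
# are made up for this example).
keypoints_2d = {
    "head": (320, 120),
    "neck": (320, 180),
    "mid_hip": (322, 340),
    "left_knee": (300, 450),
}

ref = np.array(keypoints_2d["mid_hip"], dtype=float)
# Express every key point as an offset from the mid-hip reference point,
# so that relative["mid_hip"] == (0.0, 0.0).
relative = {name: tuple(np.asarray(p, float) - ref)
            for name, p in keypoints_2d.items()}
```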
  • the 3D key points in the embodiments of the present application may represent key points distributed in a 3D space.
  • the electronic device can estimate the user's 3D keypoints using techniques such as deep learning.
  • the above-mentioned 3D space may be the 3D space in which the camera of the electronic device is located.
  • the electronic device determining the user's 3D key points may specifically be determining the position information of the user's key points in the 3D space.
  • the location information of 3D key points can be represented by three-dimensional coordinates in 3D space.
  • the location information of 3D keypoints contains the depth information of the user's keypoints. That is, the location information of the 3D key points can reflect the distance of the user's key points relative to the camera.
  • the position information of each 3D key point may be expressed relative to one 3D key point used as a reference point. For example, taking the midpoint of the left and right hips as the reference point, the position information of this midpoint in the 3D space can be the coordinates (0, 0, 0). The electronic device may then determine the position information of the other 3D key points according to their positions relative to the midpoint of the left and right hips.
  • the electronic device may establish a 3D coordinate system x-y-z as shown in FIG. 2B in the 3D space where the camera is located.
  • the 3D coordinate system can take the optical center of the camera as the origin, the direction of the camera's optical axis (that is, the direction perpendicular to the image plane) as the direction of the z-axis, and the directions of the x_i axis and the y_i axis of the 2D coordinate system shown in FIG. 2A as the directions of the x-axis and the y-axis, respectively.
  • the location information of each 3D key point may be the three-dimensional coordinates of the user's key point in the 3D coordinate system.
  • the 3D coordinate system shown in FIG. 2B is a right-handed coordinate system.
  • -x can represent the negative direction of the x-axis.
  • This embodiment of the present application does not limit the method for establishing the above-mentioned 3D coordinate system.
  • the above-mentioned 3D coordinate system may also be a left-handed coordinate system.
  • the electronic device can determine the three-dimensional coordinates of the user's key points in the left-handed coordinate system.
  • the electronic device may determine the position information of the 3D key points through other methods.
  • for the implementation process of the electronic device estimating the position information of the 3D key points based on the 2D key points, reference may be made to implementations in the prior art, which will not be repeated in this embodiment of the present application.
  • the electronic device may determine a set of 3D keypoints using a set of 2D keypoints. Alternatively, the electronic device may determine a set of 3D keypoints by using multiple sets of 2D keypoints determined from consecutive multiple frames of images.
  • the above set of 3D key points may include all the key points of the human body shown in FIG. 1 .
  • a set of 3D keypoints can be used to define a human model in 3D space.
  • the electronic device may perform 3D key point detection on one frame of images or multiple consecutive frames of images, and determine a set of 3D key points from one frame or multiple consecutive frames of images. That is, the electronic device may not need to first determine the 2D key points of the image, and then estimate the user's 3D key points based on the 2D key points.
  • the embodiments of the present application do not limit the specific method for performing 3D key point detection on the electronic device.
  • in the following, the human body key point detection method provided by the present application is described by taking as an example the approach in which the electronic device first determines the 2D key points of the image and then estimates the user's 3D key points based on the 2D key points (a pipeline sketch follows).
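  • A structural sketch of that two-stage pipeline, with `detector_2d` and `lifter_3d` standing in for any off-the-shelf 2D pose estimator and learned 2D-to-3D lifting model (both hypothetical; the patent does not name specific models):

```python
import numpy as np

def detect_3d_keypoints(frame, detector_2d, lifter_3d):
    """Two-stage sketch: detect 2D key points in a camera frame, then
    lift them to 3D camera-frame coordinates with a learned model.
    Row 0 is assumed to be the mid-hip point (an assumption)."""
    kp_2d = detector_2d(frame)   # (K, 2) pixel coordinates in x_i-y_i
    kp_3d = lifter_3d(kp_2d)     # (K, 3) coordinates in the x-y-z frame
    # Re-center on the mid-hip point so it becomes the (0, 0, 0) reference.
    kp_3d = np.asarray(kp_3d, dtype=float)
    return kp_3d - kp_3d[0]
```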
  • FIG. 3 exemplarily shows a schematic diagram of a scenario where the electronic device 100 implements smart fitness by detecting 3D key points of the human body.
  • the electronic device 100 may include a camera 193 .
  • the electronic device 100 may capture images through the camera 193 .
  • the images captured by the camera 193 may include images of the user during exercise.
  • the electronic device 100 may display the image captured by the camera 193 on the user interface 210 .
  • This embodiment of the present application does not limit the content displayed on the above-mentioned user interface 210 .
  • the electronic device 100 can identify the 2D key points of the user in the image captured by the camera 193 . Based on the above 2D keypoints, the electronic device 100 may estimate 3D keypoints for a set of users. A set of 3D keypoints can be used to determine a user's body model in 3D space. The human body model determined by the 3D key points can reflect the user's pose. The more accurate the position information of the 3D key points is, the more accurately the human body model determined by the 3D key points can reflect the posture of the user.
  • in practice, the pitch angle, field of view, and mounting height of the camera vary, and while the camera is capturing images, the distance between the user and the camera may change. Affected by these factors, the image captured by the camera undergoes perspective deformation. If the electronic device 100 determines the user's 2D key points from an image with perspective deformation, the aspect ratio of the human body model determined by those 2D key points differs from the aspect ratio of the user's actual body. When the electronic device 100 then uses those 2D key points to determine the user's 3D key points, the position information of the 3D key points may contain errors, and the human body model determined by those 3D key points can hardly reflect the user's posture accurately.
  • 4A and 4B exemplarily show a set of 3D key points determined by the electronic device 100 from different orientations when the user is in a standing posture.
  • the 3D coordinate system shown in FIG. 4A may be the 3D coordinate system x-y-z shown in the aforementioned FIG. 2B .
  • Figure 4A shows a set of 3D keypoints for a user from the direction of the z-axis. This orientation is equivalent to observing the user's posture from behind the user.
  • the posture of the user reflected by a human body model determined by this group of 3D key points is a standing posture.
  • the 3D coordinate system shown in FIG. 4B may be the 3D coordinate system x-y-z shown in the aforementioned FIG. 2B .
  • z can represent the positive direction of the z-axis.
  • the positive direction of the z-axis is the direction in which the camera points to the object to be photographed.
  • Figure 4B shows the set of 3D key points for the user from the negative direction of the x-axis. This orientation is equivalent to observing the user's posture from the user's side.
  • the human body model determined by this group of 3D key points leans forward in the direction where the camera 193 is located.
  • when the user is in a standing posture, the human body model determined by the set of 3D key points should be perpendicular to the plane x-0-z. Due to the perspective deformation of the image, however, the human body model determined by the set of 3D key points takes on the forward-leaning posture shown in Figure 4B. The greater the forward lean of the 3D human body model in the direction of the camera 193, the greater the degree of perspective deformation of the character in the image captured by the camera 193.
  • the present application provides a method for detecting key points of a human body, which can be applied to an electronic device equipped with a monocular camera.
  • Electronic devices can detect 3D key points of the human body through images captured by a monocular camera.
  • the above-mentioned monocular camera may be the camera 193 in the aforementioned electronic device 100 .
  • the method can save costs and improve the accuracy of location information detection of 3D key points.
  • the electronic device can identify the 2D key point of the user in the image captured by the camera, and estimate the 3D key point of the user according to the 2D key point.
  • the electronic device may determine whether the user's legs are upright according to the above-mentioned 3D key points.
  • the electronic device can calculate a compensation angle between the image plane and the human body model determined by the set of 3D key points. Further, the electronic device can use the compensation angle to correct the position information of the 3D key points, reducing the error caused by image perspective deformation and improving the accuracy of 3D key point position detection.
  • the above-mentioned image plane is the plane where the image collected by the camera is located (ie, the plane perpendicular to the optical axis of the camera).
  • the angle between the straight line where the user's legs are located and the image plane should be 0 or a value close to 0.
  • the electronic device can therefore determine the degree of perspective deformation of the character in the image from the angle between the image plane and the straight line through the 3D key points of the user's legs, and correct the position information of the 3D key points accordingly (see the sketch below). Compared with detecting 3D key points using images collected by multiple cameras, this method requires only one camera, with lower cost and lower computational complexity.
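  • A minimal sketch of this compensation-angle computation, using the camera frame of FIG. 2B (x right, y down, z along the optical axis). The 30° plausibility guard stands in for the "fifth included angle" mentioned earlier and is a placeholder value:

```python
import numpy as np

def compensation_angle(hip, ankle):
    """Angle (radians) between the image plane (the x-y plane of the
    camera frame) and the line through two leg key points. With upright
    legs this should be close to 0; a larger value estimates the
    perspective-induced lean of the reconstructed body model."""
    d = np.asarray(ankle, float) - np.asarray(hip, float)
    # The image plane is perpendicular to the optical axis (z-axis), so
    # the line-plane angle is arcsin of the z-component over the length.
    return float(np.arcsin(d[2] / np.linalg.norm(d)))

# Usage sketch with a plausibility guard (placeholder for the "fifth
# included angle"): implausibly large angles are discarded so that the
# previously stored compensation angle can be reused instead.
theta = compensation_angle((0.0, 0.0, 0.0), (0.05, 0.90, 0.15))
MAX_PLAUSIBLE = np.radians(30.0)
theta_usable = abs(theta) < MAX_PLAUSIBLE
```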
  • during a fitness or somatosensory game, the user's active area usually does not change much. That is to say, in the process of the fitness or somatosensory game, the degree of perspective deformation of the characters in the images collected by the camera will not change much.
  • the electronic device can therefore detect the angle between the image plane and the straight line through the 3D key points of the user's legs when the user stands upright, and use this angle as the compensation angle to correct the position information of the 3D key points determined during the fitness or somatosensory game.
  • during the fitness or somatosensory game, the electronic device may update the compensation angle. That is, the electronic device can take the angle between the image plane and the straight line through the user's leg 3D key points the last time the user's legs were upright as the compensation angle, and use it to correct the position information of the 3D key points determined in subsequent stages of the fitness or somatosensory game (a stateful sketch follows). This method can reduce the influence of position changes between the user and the camera on the correction of the 3D key point position information, and improve the accuracy of the corrected position information.
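  • Putting the earlier sketches together, one possible stateful corrector refreshes the compensation angle whenever an upright pose is seen and otherwise reuses the most recent value. The key-point layout and helper names are the assumptions introduced above:

```python
class KeypointCorrector:
    """Sketch of the update strategy described above. `legs_upright`,
    `compensation_angle`, and `rotation_correct` are the sketches shown
    earlier in this document."""

    def __init__(self):
        self.comp_angle = 0.0   # no correction until the first upright pose

    def correct(self, kp_dict, points_3d):
        # Refresh the compensation angle only when the legs are upright;
        # otherwise keep correcting with the most recent angle.
        if legs_upright(kp_dict):
            self.comp_angle = compensation_angle(
                kp_dict["left_hip"], kp_dict["left_ankle"])
        return rotation_correct(points_3d, self.comp_angle)
```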
  • the electronic device may determine the 3D key points of the user according to the images collected by the camera, and determine the virtual character in the somatosensory game according to the above-mentioned 3D key points.
  • the electronic device may present the above-mentioned avatar in the user interface.
  • the above-mentioned avatar may reflect the gesture of the user. For example, if the user jumps forward, the above-mentioned avatar also jumps forward.
  • if the position information of the 3D key points is not corrected, then, due to the perspective deformation of the image, the position information of the 3D key points contains errors, and there is a difference between the posture of the virtual character and the actual posture of the user.
  • For example, the user may actually be in a standing posture while the above-mentioned avatar is in a forward-leaning posture. The action the user actually completes may be standard, yet the electronic device 100 will determine that the action is not standard and indicate that the user fails to clear the level. This harms the user's gaming experience.
  • after the position information of the 3D key points is corrected, the posture of the above virtual character better matches the actual posture of the user.
  • the electronic device 100 can more accurately determine whether the user's posture is correct and whether the user's motion range meets the requirements, etc., so that the user has a better experience when playing a fitness or somatosensory game.
  • FIG. 5 exemplarily shows a schematic structural diagram of an electronic device 100 provided by an embodiment of the present application.
  • the electronic device 100 may include a processor 110 , an external memory interface 120 , an internal memory 121 , a universal serial bus (USB) interface 130 , a charging management module 140 , a power management module 141 , and a battery 142 , Antenna 1, Antenna 2, Mobile Communication Module 150, Wireless Communication Module 160, Audio Module 170, Speaker 170A, Receiver 170B, Microphone 170C, Headphone Interface 170D, Sensor Module 180, Key 190, Motor 191, Indicator 192, Camera 193 , display screen 194, etc.
  • the structures illustrated in the embodiments of the present invention do not constitute a specific limitation on the electronic device 100 .
  • the electronic device 100 may include more or less components than shown, or combine some components, or separate some components, or arrange different components.
  • the illustrated components may be implemented in hardware, software, or a combination of software and hardware.
  • the processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), among others. Different processing units may be independent devices, or may be integrated in one or more processors.
  • the controller may be the nerve center and command center of the electronic device 100 .
  • the controller can generate an operation control signal according to the instruction operation code and timing signal, and complete the control of fetching and executing instructions.
  • a memory may also be provided in the processor 110 for storing instructions and data.
  • the memory in processor 110 is cache memory. This memory may hold instructions or data that have just been used or recycled by the processor 110 . If the processor 110 needs to use the instruction or data again, it can be called directly from the memory. Repeated accesses are avoided and the latency of the processor 110 is reduced, thereby increasing the efficiency of the system.
  • the USB interface 130 is an interface that conforms to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type C interface, and the like.
  • the USB interface 130 can be used to connect a charger to charge the electronic device 100, and can also be used to transmit data between the electronic device 100 and peripheral devices. It can also be used to connect headphones to play audio through the headphones.
  • the interface can also be used to connect other electronic devices, such as AR devices.
  • the charging management module 140 is used to receive charging input from the charger. While the charging management module 140 charges the battery 142 , it can also supply power to the electronic device 100 through the power management module 141 .
  • the power management module 141 is used for connecting the battery 142 , the charging management module 140 and the processor 110 .
  • the power management module 141 receives input from the battery 142 and/or the charging management module 140 and supplies power to the processor 110 , the internal memory 121 , the external memory, the display screen 194 , the camera 193 , and the wireless communication module 160 .
  • the wireless communication function of the electronic device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modulation and demodulation processor, the baseband processor, and the like.
  • Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in electronic device 100 may be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization.
  • the antenna 1 can be multiplexed as a diversity antenna of the wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
  • the mobile communication module 150 may provide wireless communication solutions including 2G/3G/4G/5G etc. applied on the electronic device 100 .
  • the mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA) and the like.
  • the mobile communication module 150 can receive electromagnetic waves from the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modulation and demodulation processor for demodulation.
  • the mobile communication module 150 can also amplify the signal modulated by the modulation and demodulation processor, and then turn it into an electromagnetic wave for radiation through the antenna 1 .
  • at least part of the functional modules of the mobile communication module 150 may be provided in the processor 110 .
  • at least part of the functional modules of the mobile communication module 150 may be provided in the same device as at least part of the modules of the processor 110 .
  • the modem processor may include a modulator and a demodulator.
  • the modulator is used to modulate the low frequency baseband signal to be sent into a medium and high frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low frequency baseband signal. Then the demodulator transmits the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the low frequency baseband signal is processed by the baseband processor and passed to the application processor.
  • the application processor outputs sound signals through audio devices (not limited to the speaker 170A, the receiver 170B, etc.), or displays images or videos through the display screen 194 .
  • the modem processor may be a stand-alone device.
  • the modem processor may be independent of the processor 110, and may be provided in the same device as the mobile communication module 150 or other functional modules.
  • the wireless communication module 160 can provide wireless communication solutions applied on the electronic device 100, including wireless local area network (WLAN) (such as wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR) technology.
  • the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 160 receives electromagnetic waves via the antenna 2 , frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110 .
  • the wireless communication module 160 can also receive the signal to be sent from the processor 110 , perform frequency modulation on it, amplify it, and convert it into electromagnetic waves for radiation through the antenna 2 .
  • the electronic device 100 implements a display function through a GPU, a display screen 194, an application processor, and the like.
  • the GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
  • Display screen 194 is used to display images, videos, and the like.
  • Display screen 194 includes a display panel.
  • the display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like.
  • the electronic device 100 may include one or N display screens 194 , where N is a positive integer greater than one.
  • the electronic device 100 may implement a shooting function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
  • the ISP is used to process the data fed back by the camera 193 .
  • when the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing, which converts it into an image visible to the naked eye.
  • ISP can also perform algorithm optimization on image noise, brightness, and skin tone.
  • ISP can also optimize the exposure, color temperature and other parameters of the shooting scene.
  • the ISP may be provided in the camera 193 .
  • Camera 193 is used to capture still images or video.
  • an optical image of an object is generated through the lens and projected onto the photosensitive element.
  • the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal.
  • the ISP outputs the digital image signal to the DSP for processing.
  • DSP converts digital image signals into standard RGB, YUV and other formats of image signals.
  • the electronic device 100 may include 1 or N cameras 193 , where N is a positive integer greater than 1.
  • the digital signal processor is used to process digital signals; in addition to digital image signals, it can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform a Fourier transform on the frequency point energy.
  • Video codecs are used to compress or decompress digital video.
  • the electronic device 100 may support one or more video codecs.
  • the electronic device 100 can play or record videos of various encoding formats, such as: Moving Picture Experts Group (moving picture experts group, MPEG) 1, MPEG2, MPEG3, MPEG4 and so on.
  • the NPU is a neural-network (NN) computing processor.
  • Applications such as intelligent cognition of the electronic device 100 can be implemented through the NPU, such as image recognition, face recognition, speech recognition, text understanding, and the like.
  • the external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100 .
  • the external memory card communicates with the processor 110 through the external memory interface 120 to implement the data storage function, for example, to save files such as music and videos in the external memory card.
  • Internal memory 121 may be used to store computer executable program code, which includes instructions.
  • the processor 110 executes various functional applications and data processing of the electronic device 100 by executing the instructions stored in the internal memory 121 .
  • the internal memory 121 may include a storage program area and a storage data area.
  • the storage program area can store an operating system, an application program required for at least one function (such as a sound playback function, an image playback function, etc.), and the like.
  • the storage data area may store data (such as audio data, phone book, etc.) created during the use of the electronic device 100 and the like.
  • the internal memory 121 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, universal flash storage (UFS), and the like.
  • the electronic device 100 may implement audio functions through an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, an application processor, and the like. Such as music playback, recording, etc.
  • the audio module 170 is used for converting digital audio information into analog audio signal output, and also for converting analog audio input into digital audio signal. Audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110 , or some functional modules of the audio module 170 may be provided in the processor 110 .
  • the speaker 170A, also referred to as a "loudspeaker", is used to convert audio electrical signals into sound signals.
  • the receiver 170B, also referred to as the "earpiece", is used to convert audio electrical signals into sound signals.
  • the microphone 170C, also called the "mic", is used to convert sound signals into electrical signals.
  • the earphone jack 170D is used to connect wired earphones.
  • the earphone interface 170D may be the USB interface 130, or may be a 3.5 mm open mobile terminal platform (OMTP) standard interface or a cellular telecommunications industry association of the USA (CTIA) standard interface.
  • the sensor module 180 may include a pressure sensor, a gyro sensor, an air pressure sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a proximity light sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, a bone conduction sensor, and the like.
  • the keys 190 include a power key, a volume key, and the like. The keys 190 may be mechanical keys or touch keys.
  • the electronic device 100 may receive key inputs and generate key signal inputs related to user settings and function control of the electronic device 100 .
  • Motor 191 can generate vibrating cues.
  • Motor 191 can be used for touch vibration feedback.
  • touch operations acting on different applications can correspond to different vibration feedback effects.
  • the motor 191 can also correspond to different vibration feedback effects for touch operations on different areas of the display screen 194 .
  • Different application scenarios for example: time reminder, receiving information, alarm clock, games, etc.
  • the touch vibration feedback effect can also support customization.
  • the indicator 192 can be an indicator light, which can be used to indicate the charging state, the change of the power, and can also be used to indicate a message, a missed call, a notification, and the like.
  • the electronic device 100 may contain more or fewer components.
  • the electronic device 100 in the embodiments of the present application may be a television, a mobile phone, a tablet computer, a notebook computer, an ultra-mobile personal computer (UMPC), a handheld computer, a netbook, a personal digital assistant (PDA), a portable multimedia player (PMP), a dedicated media player, an AR (augmented reality)/VR (virtual reality) device, or another type of electronic device.
  • the following describes a human body key point detection method provided by an embodiment of the present application based on a scenario in which the user follows a fitness course in the electronic device 100 to exercise and the electronic device 100 detects the user's 3D key points.
  • Multiple fitness classes may be stored in the electronic device 100 .
  • the electronic device 100 may acquire multiple fitness courses from a cloud server.
  • a fitness course usually includes a plurality of actions, and there may be a preset rest time between two consecutive actions in the above-mentioned plurality of actions, and any two actions in the above-mentioned plurality of actions may be the same or different.
  • the fitness courses may be recommended by the electronic device according to the user's historical fitness data, or may be selected by the user according to actual needs.
  • Fitness courses can be played locally or streamed online. There is no specific limitation here.
  • a fitness class may include multiple sub-classes, and each sub-class may include one or more consecutive movements of the fitness class.
  • the above-mentioned multiple sub-courses may be divided according to the type of exercise, the purpose of exercise, the part of exercise, and the like. There is no specific limitation here.
• For example, a fitness course may consist of three sub-courses, where the first sub-course is a warm-up exercise, the second is the formal exercise, and the third is a stretching exercise; any of the three sub-courses includes one or more consecutive movements.
  • the fitness course may include one or more types of content in the form of video, animation, voice, text, etc., which is not specifically limited here.
• Phase 1: Start a fitness course.
  • FIG. 6A exemplarily shows a user interface 61 on the electronic device 100 for presenting an application program installed by the electronic device 100.
  • User interface 61 may include icons 611 for the application fitness, as well as icons for other applications (eg, mail, gallery, music, etc.).
  • the icon of any application can be used to respond to a user's operation, such as a touch operation, so that the electronic device 100 starts the application corresponding to the icon.
  • the user interface 61 may further include more or less content, which is not limited in this embodiment of the present application.
• In response to a user operation acting on the icon 611, the electronic device 100 may display the fitness course interface 62 shown in FIG. 6B.
• the fitness course interface 62 may include an application title bar 621, a function bar 622, and a display area 623. Wherein:
• the application title bar 621 may be used to indicate that the current page is used to present the fitness interface of the electronic device 100.
  • the presentation form of the application title bar 621 may be text information "smart fitness", icons or other forms.
• the function bar 622 may include: a user center control, a course recommendation control, a fat burning area control, and a shaping area control. Not limited to the above controls, the function bar 622 may contain more or fewer controls.
• In response to a user operation acting on any of the above controls, the electronic device 100 may display the content indicated by that control in the display area 623.
• For example, in response to a user operation acting on the user center control, the electronic device 100 may display the interface content of the user's personal center in the display area 623.
• In response to a user operation acting on the course recommendation control, the electronic device 100 may display one or more recommended fitness courses in the display area 623.
  • the display area 623 displays course covers of a plurality of recommended courses.
  • the course cover may include the course classification, duration, and name of the corresponding fitness course.
• In response to a user operation acting on a course cover, the electronic device 100 may start the fitness course corresponding to that course cover and display the sports content of the fitness course.
  • the embodiments of the present application do not limit the user operations mentioned above.
  • the user can also control the electronic device 100 through the remote control to execute corresponding instructions (eg, start a fitness application, start a fitness course, etc.).
  • the fitness course interface 62 may further include more or less content, which is not limited in this embodiment of the present application.
• the electronic device 100 may start the fitness course. During playback of the fitness course, the electronic device 100 needs to collect images through the camera. Therefore, before playing the fitness course, the electronic device 100 may prompt the user that the camera is about to be turned on.
• Phase 2: Determine the target user and the initial compensation angle, and use the initial compensation angle to correct the position information of the 3D key points.
  • the target user may refer to a user who needs the electronic device 100 to detect 3D key points and record exercise data when the electronic device 100 plays a fitness course.
• the electronic device 100 may collect the user's face information to determine the target user who is exercising. Determining the target user helps the electronic device 100 track the user whose key points need to be detected and accurately acquire that user's movement data. In this way, other users within the camera's shooting range can be prevented from interfering with the electronic device 100's detection of the target user's key points and acquisition of the target user's movement data.
  • the electronic device 100 may also determine the initial compensation angle. Before the initial compensation angle is updated, the electronic device 100 may use the initial compensation angle to correct the position information of the 3D key points determined based on the 2D key points after the fitness course starts playing.
  • the electronic device 100 may display the target user determination interface 63 .
  • the target user determination interface 63 may include a prompt 631 and a user image 632 .
• the prompt 631 can be used to guide the user through the operations for determining the target user and the initial compensation angle.
  • the prompt can be a text prompt "Please stay standing in the area where you are exercising and point your face at the camera".
  • the embodiments of the present application do not limit the form and specific content of the above prompts.
  • the user image 632 is the image of the target user captured by the camera.
  • the user image collected by the above-mentioned camera can be used by the electronic device 100 to determine the target user who needs to perform key point detection and motion data recording during the playback of the fitness course.
  • the electronic device 100 can use a target tracking algorithm to track the target user.
• For the implementation of the above target tracking algorithm, reference may be made to specific implementations of target tracking algorithms in the prior art, which will not be repeated here.
  • the above prompt prompts the user to maintain a standing posture in the area where the exercise is performed, which can facilitate the electronic device 100 to determine the initial compensation angle.
• During the process in which the electronic device 100 determines the target user, the user maintains a standing posture in the area where he or she is about to exercise.
• the electronic device 100 may collect one or more frames of images including the user's motion posture, and identify one or more sets of the user's 2D key points from the one or more frames of images. Further, based on the above one or more sets of 2D key points, the electronic device 100 may estimate a set of the user's 3D key points. According to this set of 3D key points, the electronic device 100 can determine whether the user's legs are upright.
  • the electronic device 100 may calculate the angle between the straight line where the 3D key point of the user's leg is located and the image plane. If the included angle is smaller than the preset included angle, the electronic device 100 may determine the included angle as the initial compensation angle. Otherwise, the electronic device 100 may take the default angle as the initial compensation angle.
  • the above-mentioned default angle may be pre-stored in the electronic device 100 .
  • the above default angle can be used as a general compensation angle to correct the position information of the 3D key points, and reduce the error caused by the perspective deformation of the characters in the image to the position information of the 3D key points.
  • the above-mentioned default angle can be set according to the experience value. This embodiment of the present application does not limit the value of the above-mentioned default angle.
• the electronic device 100 determines whether the angle between the image plane and the straight line where the 3D key points are located is smaller than the preset angle, and determines the angle as the initial compensation angle only when it is. This prevents a miscalculated angle from producing an impossible value for the initial compensation angle.
  • the above-mentioned preset angle may be, for example, 45°. This embodiment of the present application does not limit the value of the above-mentioned preset included angle.
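• To make the above check concrete, the following is a minimal Python sketch, assuming 3D key points are numpy 3-vectors in a camera coordinate system whose z-axis is the camera's optical axis (so the image plane is the x-y plane). The function names and the default-angle value are illustrative placeholders, not values from this application.

```python
import numpy as np

def leg_plane_angle(hip, ankle):
    """Angle (in degrees) between the hip-ankle line and the image plane.

    The image plane is perpendicular to the optical axis (z), so the
    angle is the elevation of the line out of the x-y plane.
    """
    v = ankle - hip
    depth = abs(v[2])                    # component along the optical axis
    in_plane = np.linalg.norm(v[:2])     # component inside the image plane
    return np.degrees(np.arctan2(depth, in_plane))

def initial_compensation_angle(hip, ankle, preset_deg=45.0, default_deg=10.0):
    """Use the measured angle if plausible; otherwise fall back to a default.

    preset_deg follows the 45-degree example above; default_deg stands in
    for the empirically chosen default angle pre-stored in the device.
    """
    angle = leg_plane_angle(hip, ankle)
    return angle if angle < preset_deg else default_deg
```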
  • the electronic device 100 may play the fitness class. During the playback of the fitness course, the electronic device 100 may capture images through a camera, and identify 2D key points of the user in the images. Based on the above 2D key points, the electronic device 100 may determine the user's 3D key points. The electronic device 100 may use the initial compensation angle to correct the position information of the above-mentioned 3D key points.
  • the electronic device 100 may perform 2D key point detection on each frame of image. For a frame of image, the electronic device 100 can identify a set of 2D key points corresponding to this frame of image. The electronic device 100 may determine a set of 3D keypoints using the set of 2D keypoints. This set of 3D keypoints can be used to determine a human body model. This human body model can reflect the pose of the user in this frame of images.
  • the electronic device 100 may perform 2D keypoint detection on each frame of image.
  • the electronic device 100 may identify multiple groups of 2D key points corresponding to the consecutive multi-frame images.
  • the electronic device 100 may determine a set of 3D keypoints using the sets of 2D keypoints. This set of 3D keypoints can be used to determine a human body model. This human body model can reflect the user's pose in the consecutive multi-frame images. Compared with the 3D keypoints determined using the 2D keypoints identified from one frame of images, the 3D keypoints determined using the 2D keypoints identified from consecutive multi-frame images can more accurately reflect the user's pose.
  • the electronic device 100 may determine the target user by detecting whether the user's action matches a preset action.
  • the above-mentioned preset action may be an upward curling of the arms.
  • the electronic device 100 may display an action example of double-arm curling and a prompt for prompting the user to do double-arm curling on the target user determination interface 63 shown in FIG. 6C .
  • the electronic device 100 can perform human body posture detection according to the image collected by the camera. The electronic device 100 may determine a user whose posture is detected to be the same as that of curling up with both arms as a target user.
  • the electronic device 100 may also determine the initial compensation angle according to the method of the above-mentioned embodiment.
  • This embodiment of the present application does not limit the types of the foregoing preset actions.
  • the electronic device 100 when a fitness class is played, can obtain the actions included in the one fitness class.
  • the electronic device 100 may detect a user whose action matches the action in the fitness course to determine the target user.
  • the electronic device 100 may use the above-mentioned default angle as the initial compensation angle, and use the default angle to correct the position information of the 3D key points determined by the electronic device 100 . That is, the electronic device 100 may determine the target user and the initial compensation angle without using the method shown in FIG. 6C .
• If the electronic device 100 detects that the user's legs are upright at the moment a fitness course starts playing, or within a period of time (e.g., within 3 seconds) after the fitness course starts playing, the electronic device 100 can use the 3D key points detected while the user's legs are upright to determine the above initial compensation angle.
  • the initial compensation angle is the angle between the straight line where the 3D key point of the leg is located in the above 3D key points and the image plane.
• Phase 3: Update the compensation angle, and use the updated compensation angle to correct the position information of the 3D key points.
  • the electronic device 100 can acquire the actions included in the fitness course.
  • the electronic device 100 can determine whether the actions in this one fitness class include leg upright actions (such as standing actions, front-standing arms raising actions, etc.).
  • the electronic device 100 may display the user interface 64 shown in FIG. 6D .
• User interface 64 may include a fitness course window 641 and a user fitness window 642. Wherein:
• the fitness course window 641 can be used to display the specific content of the fitness course, such as images of the trainer performing the movements in the fitness course.
  • the user fitness window 642 can be used to display the image of the target user captured by the camera in real time.
  • This embodiment of the present application does not limit the distribution manner of the above-mentioned fitness course window 641 and the user fitness window 642 on the user interface 64 .
  • the fitness course window 641 and the user fitness window 642 may further include more contents, which are not limited in this embodiment of the present application.
  • the action that the fitness course instructs the user to complete is an upright leg action, such as a standing action.
  • the action of the trainer in the fitness class window 641 is a standing action.
  • the user can perform standing movements according to the instructions of the coach movements in the fitness course window 641 .
  • the electronic device 100 may obtain the 3D key points by using the image collected by the camera near time t1.
  • the electronic device 100 can determine whether the user's legs are upright according to the 3D key points. If the user's leg is upright, the electronic device 100 may calculate the angle between the straight line where the 3D key point of the user's leg is located and the image plane.
  • the electronic device 100 may determine the included angle as the compensation angle.
  • the electronic device 100 may use the compensation angle to update the previous compensation angle, and use the updated compensation angle to correct the position information of the 3D key points.
• When the fitness course instructs the user to complete a leg-upright action for the first time, the compensation angle replaced by the newly calculated compensation angle is the initial compensation angle in the foregoing embodiment.
• Otherwise, the compensation angle replaced by the newly calculated compensation angle may be the compensation angle calculated the last time the electronic device 100 detected that the user's legs were upright.
  • the electronic device 100 can detect whether the user's legs are upright when the fitness class instructs the user to complete the leg upright action. If the user's legs are upright, the electronic device 100 may update the compensation angle for correcting the position information of the 3D key points.
  • the updated compensation angle is the angle between the straight line where the 3D key point of the user's leg is located and the image plane when the user's leg is standing upright this time.
  • the updated compensation angle can more accurately reflect the degree of perspective deformation of the character in the image captured by the camera at the current position of the user. That is, the electronic device 100 can correct the position information of the 3D key points by using the updated compensation angle.
  • the human body model determined by the corrected 3D key points can more accurately reflect the user's posture and improve the accuracy of posture detection. In this way, in a fitness or somatosensory game, the electronic device 100 can more accurately determine whether the user's posture is correct and whether the user's motion range meets the requirements, etc., so that the user has a better experience when playing a fitness or somatosensory game.
  • the electronic device 100 may determine a set of 3D key points through a set of 2D key points identified by one frame of images or multiple sets of 2D key points identified by consecutive multiple frames of images during the playing of the fitness course.
  • the set of 3D keypoints can determine a human body model.
  • the electronic device 100 can determine whether the user's legs are upright through the set of 3D key points. If the user's leg is upright, the electronic device 100 may calculate the angle between the straight line where the 3D key point of the user's leg is located and the image plane. If the included angle is smaller than the preset included angle in the foregoing embodiment, the electronic device 100 may update the compensation angle.
  • the updated compensation angle is the above-mentioned included angle.
  • the electronic device 100 may correct the position information of the 3D key points using the updated compensation angle.
• each time the electronic device 100 determines a set of 3D key points, it can use that set of 3D key points to determine whether the compensation angle for correcting the position information of the 3D key points can be updated.
  • the method for detecting human body key points provided in the embodiments of the present application may also be applicable to other scenarios in which gesture detection is implemented by detecting 3D key points.
  • the following describes a method for judging whether a user's leg is upright and determining a compensation angle when the user's leg is upright, provided by an embodiment of the present application.
  • the user's legs are upright may refer to both of the user's legs being upright.
  • the angle between the thigh and the lower leg of each leg of the user should be 180° or a value close to 180°.
  • the electronic device 100 can use the 3D key points determined based on the 2D key points to determine whether the angle between the user's thigh and the lower leg is close to 180°, to determine whether the user's leg is upright.
  • FIG. 7A exemplarily shows a schematic diagram of a set of 3D keypoints.
• As shown in FIG. 7A, the angle between the line where the user's left thigh is located (that is, the line connecting the left hip point and the left knee point) and the line where the left calf is located (that is, the line connecting the left knee point and the left ankle point) is α1.
• The angle between the line where the user's right thigh is located (that is, the line connecting the right hip point and the right knee point) and the line where the right calf is located (that is, the line connecting the right knee point and the right ankle point) is α2.
• the electronic device 100 may calculate the difference between α1 and 180° and the difference between α2 and 180°.
• If both differences are smaller than a preset difference, the electronic device 100 may determine that the user's legs are upright.
  • the above-mentioned preset difference is a value close to 0.
  • the above-mentioned preset difference can be set according to experience. This embodiment of the present application does not limit the size of the above-mentioned preset difference.
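• A minimal sketch of this legs-upright check, assuming each key point is a numpy 3-vector and using a hypothetical preset difference of 15° (the application leaves the exact value to experience):

```python
import numpy as np

def joint_angle(a, b, c):
    """Interior angle at point b (in degrees) formed by segments b-a and b-c."""
    v1, v2 = a - b, c - b
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def legs_upright(kp, preset_diff=15.0):
    """True if both knee angles are within preset_diff of 180 degrees.

    kp is a dict of named 3D key points; preset_diff is an illustrative
    empirical threshold, not a value given in this application.
    """
    alpha1 = joint_angle(kp["left_hip"], kp["left_knee"], kp["left_ankle"])
    alpha2 = joint_angle(kp["right_hip"], kp["right_knee"], kp["right_ankle"])
    return (abs(alpha1 - 180.0) < preset_diff
            and abs(alpha2 - 180.0) < preset_diff)
```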
• When the user's legs are upright, the electronic device 100 can calculate the angle θ1 between the image plane and the line connecting the left hip point and the left ankle point, and the angle θ2 between the image plane and the line connecting the right hip point and the right ankle point.
• the electronic device 100 can calculate the average value of the included angle θ1 and the included angle θ2 to obtain the included angle θ shown in FIG. 7B. If the included angle θ is smaller than the preset included angle in the foregoing embodiment, the electronic device 100 may determine the included angle θ as a compensation angle, and use the compensation angle to correct the position information of the 3D key points.
• In another implementation, the electronic device 100 may determine the first projected straight line of the line connecting the left hip point and the left ankle point on the y-0-z plane of the 3D coordinate system shown in FIG. 7A, and the second projected straight line of the line connecting the right hip point and the right ankle point on that plane.
• the electronic device 100 can calculate the included angle θ3 between the above first projected straight line and the positive direction of the y-axis of the 3D coordinate system shown in FIG. 7A, and the included angle θ4 between the above second projected straight line and the positive direction of that y-axis.
• the electronic device 100 can calculate the average value of the included angle θ3 and the included angle θ4 to obtain the included angle θ shown in FIG. 7B. If the included angle θ is smaller than the preset included angle in the foregoing embodiment, the electronic device 100 may determine the included angle θ as a compensation angle, and use the compensation angle to correct the position information of the 3D key points.
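• Continuing the sketch, the compensation angle θ can be obtained by averaging the two hip-ankle angles. The helper leg_plane_angle from the earlier sketch is reused, and returning None when the check fails is an illustrative convention, not part of the source:

```python
def compensation_angle(kp, preset_deg=45.0):
    """Average the left and right hip-to-ankle angles to the image plane.

    Returns theta if it is below preset_deg, else None to signal that
    the previously stored compensation angle should be kept.
    """
    theta1 = leg_plane_angle(kp["left_hip"], kp["left_ankle"])
    theta2 = leg_plane_angle(kp["right_hip"], kp["right_ankle"])
    theta = 0.5 * (theta1 + theta2)
    return theta if theta < preset_deg else None
```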
• Alternatively, the user's legs being upright may refer to one of the user's legs being upright.
  • the electronic device 100 can determine whether any one of the user's legs is upright according to the methods of the foregoing embodiments. If it is determined that one of the user's legs is upright, the electronic device 100 may determine that the user's leg is upright. Further, the electronic device 100 may calculate the included angle between the line connecting the hip point and the ankle point on the upright leg and the image plane. If the included angle is smaller than the preset included angle in the foregoing embodiment, the electronic device 100 may determine the included angle as a compensation angle, and use the compensation angle to correct the position information of the 3D key points.
• the following introduces a method, provided by an embodiment of the present application, for correcting the position information of 3D key points by using the compensation angle θ.
  • the electronic device 100 can correct the position information of the 3D key points according to the following formula (1):
• In formula (1), the position information of a 3D key point before correction is estimated by the electronic device 100 based on the 2D key points, and the output is the corrected position information of that 3D key point.
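• The body of formula (1) appears only as an image in the original publication and is not reproduced in this text. The sketch below therefore shows one correction consistent with the surrounding description, assuming the correction rotates every 3D key point about the camera's horizontal x-axis by the compensation angle θ to undo the apparent tilt; this assumed form is an illustration, not the exact formula (1):

```python
import numpy as np

def correct_keypoints(points, theta_deg):
    """Rotate an (N, 3) array of 3D key points about the x-axis by theta.

    Assumed form of the correction: undo the apparent pitch introduced
    by perspective deformation of the person in the image.
    """
    t = np.radians(theta_deg)
    rot_x = np.array([[1.0, 0.0, 0.0],
                      [0.0, np.cos(t), -np.sin(t)],
                      [0.0, np.sin(t), np.cos(t)]])
    return points @ rot_x.T
```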
  • FIG. 8 exemplarily shows a flowchart of a method for detecting human key points provided by an embodiment of the present application.
• the method may include steps S101-S108. Wherein:
  • the electronic device 100 may collect images through a camera.
• the camera and the electronic device 100 may be integrated. As shown in FIG. 3, the electronic device 100 may include a camera 193.
  • the camera and the electronic device 100 may be two separate devices.
  • a communication connection is established between the camera and the electronic device 100 .
  • the camera can send the captured image to the electronic device 100 .
• In response to a user operation to start a fitness course, the electronic device 100 may turn on the camera or send the camera an instruction indicating that the camera is to be turned on. As shown in FIG. 6C, in response to a user operation acting on the determination control 624A, the electronic device 100 may turn on the camera.
  • This embodiment of the present application does not limit the time when the camera is turned on.
  • the camera may also be turned on before the electronic device 100 receives a user operation to start a fitness class.
  • the electronic device 100 may determine m groups of 2D key points corresponding to the m frames of images according to the m frames of images collected by the camera.
  • the electronic device 100 may perform 2D key point detection on images captured by the camera during the playback of a fitness class. That is, the m frames of images are acquired by the camera during the playback of the fitness course.
  • the value of m is a positive integer.
  • the electronic device 100 may identify, from a frame of image, 2D key points of a group of users in the frame of image. Wherein, the electronic device 100 may use a deep learning method (such as an openpose algorithm, etc.) to perform 2D keypoint detection on the image.
  • the embodiments of the present application do not limit the method for identifying 2D key points by the electronic device 100 .
• For the implementation process of identifying 2D key points by the electronic device 100, reference may be made to the implementation process in the prior art, which will not be repeated here.
  • the electronic device 100 may estimate a set of 3D key points corresponding to the m frames of images by the user according to the m sets of 2D key points.
  • the electronic device 100 can identify a set of 2D key points from one frame of image. Further, the electronic device 100 may use a set of 2D key points corresponding to this frame of image to estimate a set of 3D key points corresponding to this frame of image.
  • the m frames of images are consecutive multiple frames of images collected by the camera.
  • the electronic device 100 may estimate a set of 3D keypoints based on the consecutive multi-frame images. Specifically, the electronic device 100 may identify multiple groups of 2D key points from consecutive multiple frames of images. Using the multiple sets of 2D key points corresponding to the multiple frames of images, the electronic device 100 can estimate a set of 3D key points corresponding to the multiple frames of images.
  • the embodiments of the present application do not limit the method for estimating 3D key points by using 2D key points.
• For the implementation process of estimating 3D key points by the electronic device 100, reference may be made to the implementation process in the prior art; details are not described here.
  • the electronic device 100 may detect whether the user's legs are upright by using a set of 3D key points corresponding to the above m frames of images.
• Specifically, the electronic device 100 may detect whether the user's legs are upright according to the position information of the set of 3D key points. For the method for the electronic device 100 to detect whether the user's legs are upright, reference may be made to the foregoing embodiment shown in FIG. 7A.
• If the user's legs are upright, the electronic device 100 may perform step S105.
• Otherwise, the electronic device 100 may perform step S108.
• In step S105, the electronic device 100 can determine the angle between the image plane and the straight line where the leg 3D key points are located in the above set of 3D key points; the image plane is perpendicular to the optical axis of the camera.
  • the electronic device 100 can determine the degree of perspective deformation of the character in the image captured by the camera by detecting the angle between the straight line where the 3D key point of the user's leg is located and the image plane, so as to correct the position information of the 3D key point.
• For the method by which the electronic device 100 calculates the included angle between the image plane and the straight line where the leg 3D key points are located, reference may be made to the description of the foregoing embodiment shown in FIG. 7A, which will not be repeated here.
• the electronic device 100 may then determine whether the above included angle is smaller than the preset included angle.
• If the included angle is smaller than the preset included angle, the electronic device 100 may execute step S107 to update the compensation angle stored in the electronic device 100.
• Otherwise, the electronic device 100 may perform step S108 to correct the position information of the group of 3D key points corresponding to the m frames of images by using the compensation angle currently stored by the electronic device 100.
• For example, suppose the above included angle calculated by the electronic device 100 is 70°.
• The perspective deformation of the person in the image obviously cannot cause the angle between the image plane and the straight line where the user's leg 3D key points are located to reach 70°, so such a value indicates a miscalculation.
• By setting the preset angle, the electronic device 100 judges whether the calculated angle can be used as the compensation angle for correcting the position information of the 3D key points, thereby preventing a miscalculated angle from producing an impossible compensation angle and improving the accuracy of 3D key point detection.
  • the value of the above-mentioned preset angle can be set according to experience.
  • the value of the preset included angle may be 45°. This embodiment of the present application does not limit the value of the above-mentioned preset included angle.
• In step S107, the electronic device 100 can use the above included angle to update its stored compensation angle; after the update, the compensation angle stored by the electronic device 100 is the above included angle.
• That is, an included angle smaller than the preset included angle can be used as a compensation angle for correcting the position information of the 3D key points.
  • the electronic device 100 may correct the position information of the group of 3D key points by using the compensation angle stored by itself.
  • the compensation angle stored in the electronic device 100 may be the updated compensation angle in step S107 above.
  • the compensation angle after the above update is the angle between the straight line where the 3D key point of the leg is located in the set of 3D key points corresponding to the above m frames of images and the image plane.
  • the compensation angle stored in the electronic device 100 may be the initial compensation angle in the foregoing embodiments. It can be known from the foregoing embodiments that the above-mentioned initial compensation angle may be a default angle pre-stored by the electronic device 100 . Alternatively, the above-mentioned initial compensation angle may be the compensation angle calculated when it is detected that the user's legs are upright before a fitness course is played, as shown in FIG. 6C .
• If the electronic device 100 does not detect that the user's legs are upright within a time period T1 during the playback of the fitness course, the electronic device 100 does not update the above initial compensation angle.
• In that case, the electronic device 100 may use the above initial compensation angle to correct the position information of each group of 3D key points determined within the time period T1.
• Alternatively, the compensation angle stored in the electronic device 100 may be the compensation angle calculated the last time the user's legs were upright within the above time period T1.
• Specifically, suppose that within the time period T1 the electronic device 100 detects at least once that the user's legs are upright. Each time it detects that the user's legs are upright, the electronic device 100 may calculate, from the set of 3D key points determined at that time, the angle between the image plane and the straight line where the leg 3D key points are located. If that included angle can be used to correct the position information of the 3D key points, the electronic device 100 can update the compensation angle, using the included angle as the new compensation angle to replace the compensation angle calculated the previous time the user's legs were upright. The electronic device 100 corrects the position information of the 3D key points with the stored compensation angle until that stored angle is updated, and thereafter uses the updated compensation angle to correct the position information of the 3D key points determined after the update. This stored-angle policy is summarized in the sketch below.
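• The sketch reuses legs_upright, compensation_angle, and correct_keypoints from the earlier sketches; the class structure is illustrative only:

```python
import numpy as np

class CompensationTracker:
    """Holds the stored compensation angle and applies steps S104-S108."""

    def __init__(self, initial_angle):
        self.angle = initial_angle       # initial compensation angle

    def process(self, kp):
        # S104-S107: update the stored angle only when the legs are
        # upright and the measured angle passes the preset-angle check.
        if legs_upright(kp):
            candidate = compensation_angle(kp)
            if candidate is not None:
                self.angle = candidate
        # S108: correct the key points with the currently stored angle.
        points = np.stack(list(kp.values()))
        corrected = correct_keypoints(points, self.angle)
        return dict(zip(kp.keys(), corrected))
```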
• In some embodiments, when a fitness course is played, the electronic device 100 determines whether the actions in the fitness course include a leg-upright action.
• When the fitness course is played to a leg-upright action, the electronic device 100 may execute step S104 shown in FIG. 8. That is, the electronic device 100 may use the 3D key points determined based on the 2D key points to detect whether the user's legs are upright. If it detects that the user's legs are upright, the electronic device 100 can determine whether the angle between the image plane and the straight line where the leg 3D key points are located among those 3D key points can be used to correct the position information of the 3D key points. If the above included angle can be used for correction, the electronic device 100 can update the compensation angle by using it.
  • the electronic device 100 may determine the user's 2D key points by using the currently captured image, and determine the user's 3D key points based on the above 2D key points.
  • the electronic device 100 can correct the above-mentioned 3D key points by using the compensation angle updated last time.
• That is, the electronic device 100 may use the 3D key points determined based on the 2D key points to check whether the user's legs are upright only when the fitness course is played to the point of instructing the user to complete a leg-upright action. The electronic device 100 then does not need to use every set of 3D key points determined based on the 2D key points to check whether the user's legs are upright and whether the compensation angle needs to be updated. This can save computing resources of the electronic device 100.
  • This embodiment of the present application does not limit the time for the electronic device 100 to update the compensation angle.
  • the electronic device 100 may also regularly or irregularly detect whether the user's legs are upright, and update the compensation angle when detecting that the user's legs are upright.
• In the above method, the electronic device 100 can use the angle between the image plane and the straight line where the leg 3D key points are located, among the 3D key points determined based on the 2D key points, to determine the degree of perspective deformation of the person in the image captured by the camera.
  • the electronic device 100 can correct the position information of the 3D key points, reduce the error caused by the image perspective deformation on the position information of the 3D key points, and improve the accuracy of the position information detection of the 3D key points.
• This method requires only one camera, which not only saves cost but also keeps the computational complexity of key point detection low.
  • the electronic device 100 can update the compensation angle.
  • the updated compensation angle can more accurately reflect the degree of perspective deformation of the character in the image captured by the camera at the current position of the user. This can reduce the influence of the position change between the user and the camera on the correction of the position information of the 3D key points, and improve the accuracy of the position information of the 3D key points after correction. That is to say, the electronic device 100 can correct the position information of the 3D key points by using the updated compensation angle.
  • the human body model determined by the corrected 3D key points can more accurately reflect the user's posture and improve the accuracy of posture detection. In this way, in a fitness or somatosensory game, the electronic device 100 can more accurately determine whether the user's posture is correct and whether the user's motion range meets the requirements, etc., so that the user has a better experience when playing a fitness or somatosensory game.
  • the human body key point detection method in the foregoing embodiment mainly determines the degree of perspective deformation of the character in the image captured by the camera when the user's leg is upright.
  • the user will not keep the leg upright posture unchanged.
  • the user may perform actions such as squatting, lying on the side, and so on.
  • the electronic device 100 may determine 2D key points by using images captured by the camera, and determine one or more groups of 3D key points based on the above 2D key points.
• For one or more groups of 3D key points determined while the user's legs are not upright, the electronic device 100 can only correct the position information by using the compensation angle calculated the last time the user's legs were detected to be upright.
• However, during exercise, neither the relative position between the user and the camera nor the height of the user in the image captured by the camera stays unchanged, and both affect the degree of perspective deformation of the person in the images captured by the camera.
  • the compensation angle calculated by the electronic device 100 when the user's leg is upright at the previous moment cannot accurately reflect the perspective deformation degree of the character in the image captured by the camera when the user's leg is not upright at the next moment.
• Then, if the electronic device 100 corrects the position information of one or more groups of 3D key points determined while the user's legs are not upright by using the compensation angle calculated the last time the user's legs were detected to be upright, the corrected position information of the 3D key points may still not be accurate enough.
• For example, when the user performs an action such as squatting, the height of the user in the image captured by the camera becomes smaller, which causes the degree of perspective deformation of the person in the image to increase.
  • the distance between the user and the camera increases, the height of the user in the image captured by the camera becomes smaller, which also causes the degree of perspective deformation of the character in the image captured by the camera to increase.
• In view of this, the present application provides a human body key point detection method, which can determine the degree of perspective deformation of the person in an image in real time without requiring the user to keep their legs upright, and correct the position information of the 3D key points determined based on the 2D key points.
  • the electronic device 100 may determine the initial compensation angle.
  • the electronic device 100 can determine the amount of change in the degree of perspective deformation of the character between the two adjacent frames of images.
  • the sum of the degree of perspective deformation of the character in the previous frame of image and the amount of change of the degree of perspective deformation of the character between the two adjacent frames of images is the degree of perspective deformation of the character in the next frame of image.
• Wherein, the degree of perspective deformation of the person in the previous frame of image may be obtained by starting from the initial compensation angle and accumulating the changes in the degree of perspective deformation of the person between each pair of adjacent frames preceding the previous frame of image.
  • the electronic device 100 can determine the degree of perspective deformation of the character in the image captured by the camera.
  • the electronic device 100 corrects the position information of the 3D key points according to the degree of perspective deformation of the person in the image determined in real time, which can improve the accuracy of the 3D key point detection.
  • the following specifically introduces a method for determining the amount of change in the degree of perspective deformation of a character between two adjacent frames of images provided by an embodiment of the present application.
  • FIG. 9A exemplarily shows a scene diagram of a user during exercise.
  • the electronic device 100 can capture images through the camera 193 and display the above images on the user interface 910 .
  • the user moves toward the direction where the camera 193 is located. That is, the distance between the user and the camera 193 is reduced.
• the above n is an integer greater than 1. The user's position changes between the n-1th frame image and the nth frame image.
  • FIG. 9B and FIG. 9C exemplarily show the n-1 th frame image and the n th frame image collected by the camera 193 .
  • the electronic device 100 may perform human body detection on the image, and determine the rectangular frame of the user's human body in the image.
  • the human body rectangle can be used to determine the user's position in the image.
  • the height and width of the human body rectangle are adapted to the height and width of the user in the image, respectively.
• For each frame of image, the electronic device 100 can determine, in the image coordinate system (i.e., the 2D coordinate system x_i-y_i in the foregoing embodiment), the height of the human body rectangular frame and the distance of its lower edge from the x_i axis.
• As shown in FIG. 9B, in the n-1th frame image, the distance between the lower edge of the human body rectangular frame and the x_i axis is y_{n-1}, and the height of the human body rectangular frame is h_{n-1}.
• As shown in FIG. 9C, in the nth frame image, the distance between the lower edge of the human body rectangular frame and the x_i axis is y_n, and the height of the human body rectangular frame is h_n.
• the variation Δy of the distance between the lower edge of the human body rectangular frame and the x_i axis can reflect the variation of the relative position between the user and the camera.
• the above Δy is the distance between the lower edge of the human body rectangular frame and the x_i axis in the later frame of image minus that distance in the earlier frame of image. If the user approaches the camera, the distance between the lower edge of the human body rectangular frame and the x_i axis becomes smaller, and the degree of perspective deformation of the person in the image becomes larger.
• the compensation angle θ shown in the aforementioned FIG. 4B can represent the degree of perspective deformation of the person in the image, so the change Δθ in the compensation angle can represent the change in that degree between two adjacent frames of images.
• the above Δθ is the difference between the latter compensation angle and the former compensation angle.
  • the above-mentioned previous compensation angle is used to correct the position information of a group of 3D key points determined according to the previous frame of images in two adjacent frames of images.
  • the above-mentioned latter compensation angle is used to correct the position information of a group of 3D key points determined according to the next frame of images in the adjacent two frames of images.
• the variation Δh of the height of the human body rectangular frame can reflect the variation of the user's height in the image.
• the above Δh is the height of the human body rectangular frame in the later frame of image minus its height in the earlier frame of image. If the height of the human body rectangular frame becomes smaller, the degree of perspective deformation of the person in the image becomes larger. That is to say, the smaller the height of the human body rectangular frame in a frame of image, the larger the compensation angle θ used to correct the position information of the group of 3D key points determined from that frame of image.
• the larger the absolute value of Δh, the larger the absolute value of the above Δθ.
• If Δh < 0, then Δθ > 0. That is, if the height of the user in the image becomes smaller, the degree of perspective deformation of the person in the image becomes larger, and the latter compensation angle is greater than the former compensation angle. If Δh > 0, then Δθ < 0. That is, if the height of the user in the image increases, the degree of perspective deformation of the person in the image decreases, and the latter compensation angle is smaller than the former compensation angle.
• the height h of the human body rectangular frame in the later frame of image can also affect the amount of change in the degree of perspective deformation of the person between the two adjacent frames of images. Understandably, the smaller the height of the user in the image, the greater the degree of perspective deformation of the user in the image. For the same Δh between two adjacent frames of images, the smaller the user's height in those images, the greater the change in the degree of perspective deformation between them. That is, when the user's height in the image changes within a smaller value range, the change in the degree of perspective deformation is larger than when it changes within a larger value range.
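• To make these quantities concrete, the sketch below computes Δy, Δh, and h from the human body rectangles of two frames, with each rectangle represented as a hypothetical (y_lower, height) pair in the x_i-y_i image coordinate system:

```python
def rect_deltas(rect_prev, rect_next):
    """Compute (dy, dh, h) from two human body rectangles.

    Each rectangle is (y_lower, height): the distance of its lower edge
    from the x_i axis, and its height, in the image coordinate system.
    Negative dy (user approaching) and negative dh (user appearing
    shorter) both indicate growing perspective deformation, matching
    the sign conventions described above.
    """
    dy = rect_next[0] - rect_prev[0]   # later minus earlier lower-edge distance
    dh = rect_next[1] - rect_prev[1]   # later minus earlier height
    return dy, dh, rect_next[1]
```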
• Not limited to the human body rectangular frame, the electronic device 100 can also determine in other ways the change in the relative position between the user and the camera, and the user's height and its change in the images captured by the camera.
  • the compensation angle determination model may be stored in the electronic device 100 .
• the input of the compensation angle determination model may include Δy and Δh between two adjacent frames of images and the height h of the human body rectangular frame in the later frame of image, and the output may be Δθ between the two adjacent frames of images.
  • the compensation angle determination model may be, for example, a linear model, a nonlinear model, a neural network model, or the like.
  • the embodiment of the present application does not limit the type of the compensation angle determination model.
  • the compensation angle determination model can be obtained through multiple sets of training samples.
  • the sets of training samples may be determined using images captured by the camera when the user's legs are upright.
• a set of training samples may include Δy and Δh between two adjacent frames of images, the height h of the human body rectangular frame in the later frame of image, and Δθ.
• Wherein, Δθ in a set of training samples may be the difference obtained by subtracting the former compensation angle from the latter compensation angle.
  • the above-mentioned previous compensation angle is used to correct the position information of a group of 3D key points determined according to the previous frame of images in two adjacent frames of images.
  • the above-mentioned latter compensation angle is used to correct the position information of a group of 3D key points determined according to the next frame of images in the adjacent two frames of images.
  • the above-mentioned former compensation angle and the above-mentioned latter compensation angle are calculated by the method in the above-mentioned embodiment shown in FIG. 7A .
  • the embodiments of the present application do not limit the method for training the above compensation angle determination model, and for details, reference may be made to the training methods of models such as linear models or neural network models in the prior art.
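• As an illustration of the linear-model option named above, the sketch below fits Δθ ≈ w1·Δy + w2·Δh + w3·h + b by least squares from training samples gathered as described; the linear form and sample packing are assumptions made for this sketch:

```python
import numpy as np

def fit_compensation_model(samples):
    """Fit delta_theta ~ w . (dy, dh, h) + b by least squares.

    samples: iterable of (dy, dh, h, delta_theta) tuples collected, as
    described above, from images captured while the user's legs are
    upright.
    """
    X = np.array([[dy, dh, h, 1.0] for dy, dh, h, _ in samples])
    y = np.array([sample[3] for sample in samples])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w                             # (w1, w2, w3, b)

def predict_delta_theta(w, dy, dh, h):
    """Predict the compensation-angle change between two frames."""
    return float(np.dot(w, [dy, dh, h, 1.0]))
```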
• the electronic device 100 may determine Δy′ and Δh′ between a frame of image collected earlier and a frame of image collected later, and the height of the human body rectangular frame in the later frame of image. By inputting the above Δy′ and Δh′ and the height of the human body rectangular frame in the later frame of image into the compensation angle determination model, the electronic device 100 can obtain the compensation angle change between the earlier frame of image and the later frame of image. That is, the electronic device 100 is not limited to determining the compensation angle change between two adjacent frames of images; it may also determine the compensation angle change between two frames of images separated by multiple frames.
  • the electronic device 100 may identify a set of 2D keypoints from a frame of image, and estimate a set of 3D keypoints corresponding to the frame of image according to the set of 2D keypoints. That is, each frame of image can correspond to a set of 3D key points.
  • the electronic device 100 can determine the amount of change in the degree of perspective deformation of the character between two adjacent frames of images according to the method in the foregoing embodiment. That is, for each frame of image, the electronic device 100 can determine a compensation angle.
  • the electronic device 100 may correct the position information of a group of 3D key points corresponding to a frame of image by using the compensation angle corresponding to the frame of image.
  • the electronic device 100 may identify multiple sets of 2D keypoints from consecutive multiple frames of images, and estimate a set of 3D keypoints corresponding to the multiple frames of images according to the multiple sets of 2D keypoints. For example, the electronic device 100 may determine a set of 3D keypoints using consecutive k frames of images. The above k is an integer greater than 1. According to the amount of change in the degree of perspective deformation of the character between the first k frames of images and the last k frames of images, the electronic device 100 can determine the compensation angle corresponding to the last k frames of images. That is, for every consecutive k frames of images, the electronic device 100 can determine a compensation angle. The electronic device 100 may correct the position information of a group of 3D key points corresponding to the consecutive k frames of images by using the compensation angles corresponding to the consecutive k frames of images.
• Specifically, the electronic device 100 may select one frame of image from the first k frames of images and one frame of image from the last k frames of images.
  • the electronic device 100 may calculate the variation of the distance between the lower edge of the human body rectangular frame and the x_i axis and the variation of the height of the human body rectangular frame between the two selected frames of images. Based on the compensation angle determination model, the electronic device 100 may determine the amount of change in the degree of perspective deformation of the character between the above-mentioned first k frame images and the last k frame images.
• After determining the compensation angle corresponding to a group of 3D key points, the electronic device 100 may perform smoothing on the compensation angle and use the smoothed compensation angle to correct the position information of this group of 3D key points. Specifically, the electronic device 100 may acquire multiple compensation angles corresponding to multiple groups of 3D key points preceding the aforementioned group of 3D key points. The electronic device 100 may calculate a weighted average of the compensation angle corresponding to the above group of 3D key points and the multiple compensation angles corresponding to the multiple preceding groups of 3D key points.
• Wherein, the compensation angle corresponding to a group of 3D key points determined closer in time to the above group of 3D key points may be given a larger weight.
  • the embodiment of the present application does not specifically limit the size of the weight occupied by each compensation angle when calculating the above-mentioned weighted average value.
  • the above weighted average is the smoothed compensation angle.
  • the electronic device 100 performs the above-mentioned smoothing process on the compensation angle, which can reduce the sudden change of the calculated compensation angle and improve the accuracy of 3D key point detection.
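• A minimal sketch of this smoothing step, assuming a sliding window of the most recent compensation angles with linearly increasing weights toward the newest one (the window size and exact weights are illustrative, since the application does not fix them):

```python
import numpy as np

def smooth_compensation_angle(history, window=5):
    """Weighted average over the last `window` compensation angles.

    history: list of compensation angles, oldest first. The newest angle
    receives the largest weight, matching the description that angles
    determined closer in time carry more weight.
    """
    recent = np.array(history[-window:], dtype=float)
    weights = np.arange(1, len(recent) + 1, dtype=float)   # 1, 2, ..., k
    return float(np.average(recent, weights=weights))
```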
  • FIG. 10 exemplarily shows a flowchart of another method for detecting human body key points provided by an embodiment of the present application.
• In FIG. 10, the case where the electronic device 100 determines a set of 3D key points from each frame of image is used as an example for description. As shown in FIG. 10, the method may include steps S201-S207. Wherein:
  • the electronic device 100 may collect images through a camera.
• For step S201, reference may be made to step S101 in the aforementioned method shown in FIG. 8.
  • the electronic device 100 may determine an initial compensation angle.
• According to the nth frame of image collected by the camera, the electronic device 100 may determine the t_n-th group of 2D key points corresponding to the nth frame of image, and estimate, according to the t_n-th group of 2D key points, the user's t_n-th group of 3D key points corresponding to the nth frame of image.
• According to the n-1th frame image and the nth frame image collected by the camera, the electronic device 100 can determine the displacement Δy_n of the lower edge of the human body rectangular frame from the n-1th frame image to the nth frame image, the change Δh_n in the height of the human body rectangular frame, and the height h_n of the human body rectangular frame in the nth frame image.
  • n is an integer greater than 1.
  • the n-1th frame image and the nth frame image are any adjacent two frames of images in the images collected by the camera.
• Wherein, the electronic device 100 may perform 2D key point detection on the first frame of image collected by the camera, and determine the t_1-th group of 2D key points corresponding to the first frame of image. According to the t_1-th group of 2D key points, the electronic device 100 may estimate the user's t_1-th group of 3D key points corresponding to the first frame of image. The electronic device 100 may use the above initial compensation angle to correct the position information of the above t_1-th group of 3D key points.
• the displacement Δy_n of the lower edge of the human body rectangular frame is the variation of the distance between the lower edge of the human body rectangular frame and the x_i axis from the n-1th frame image to the nth frame image.
• the electronic device 100 may determine, according to the foregoing Δy_n, the foregoing Δh_n, and the foregoing h_n, the compensation angle change from the t_{n-1}-th group of 3D key points to the t_n-th group of 3D key points. The t_{n-1}-th group of 3D key points is obtained from the n-1th frame of image collected by the camera.
• Specifically, the electronic device 100 may use the compensation angle determination model to determine the compensation angle change Δθ_n from the t_{n-1}-th group of 3D key points to the t_n-th group of 3D key points.
• Δθ_n can reflect the variation of the degree of perspective deformation of the person between the n-1th frame image and the nth frame image.
• Wherein, the electronic device 100 may perform 2D key point detection on the n-1th frame of image collected by the camera, and determine the t_{n-1}-th group of 2D key points corresponding to the n-1th frame of image. According to the t_{n-1}-th group of 2D key points, the electronic device 100 may estimate the user's t_{n-1}-th group of 3D key points corresponding to the n-1th frame of image.
• the electronic device 100 may determine the t_n-th compensation angle according to the t_{n-1}-th compensation angle and the above compensation angle change; the t_{n-1}-th compensation angle is used to correct the position information of the t_{n-1}-th group of 3D key points.
• the t_{n-1}-th compensation angle is the sum of the initial compensation angle and the compensation angle changes between all adjacent groups of 3D key points from the t_1-th group of 3D key points to the t_{n-1}-th group of 3D key points.
• the electronic device 100 may determine the t_n-th compensation angle according to the following formula (2):
• θ_n = θ_{n-1} + Δθ_n    (2)
• Wherein, θ_n is the t_n-th compensation angle, θ_{n-1} is the t_{n-1}-th compensation angle, and θ_1 is the above-mentioned initial compensation angle.
• In some embodiments, the electronic device 100 may determine the compensation angle change between a former group of 3D key points and a latter group of 3D key points.
  • One or more groups of 3D key points may be spaced between the above-mentioned former group of 3D key points and the above-mentioned latter group of 3D key points.
  • the above compensation angle variation can be determined according to the height of the user and the distance between the user and the camera when the previous frame of image is collected, and the height of the user and the distance between the user and the camera when the next frame of image is collected.
• Wherein, the user's 3D key points determined from the earlier of the two frames of image are the above former group of 3D key points.
• the user's 3D key points determined from the later of the two frames of image are the above latter group of 3D key points.
  • α_nc may be the sum of the compensation angle of the t_1-th group of 3D key points and the compensation angle changes accumulated from the t_1-th group of 3D key points to the t_nc-th group of key points.
  • Δα′_n is the compensation angle change between the former group of 3D key points and the latter group of 3D key points.
  • The electronic device 100 may correct the position information of the t_n-th group of 3D key points by using the t_n-th compensation angle.
  • In this way, the electronic device 100 can still determine the degree of perspective deformation of the person in the image in real time and correct the position information of the 3D key points determined based on the 2D key points.
  • the electronic device 100 may determine the compensation angle corresponding to each group of 3D key points.
  • The compensation angle corresponding to each group of 3D key points can change as the relative position between the user and the camera changes and as the action performed by the user changes.
  • The compensation angle corresponding to a group of 3D key points can reflect the degree of perspective deformation of the person in the image used to determine this group of 3D key points.
  • Using the compensation angle corresponding to a group of 3D key points to correct the position information of this group of 3D key points can improve the accuracy of 3D key point detection.
  • The electronic device 100 starts from the above-mentioned initial compensation angle and accumulates the compensation angle changes of all adjacent pairs of groups of 3D key points from the t_1-th group of 3D key points to the t_n-th group of 3D key points. The errors in the compensation angle changes will then also accumulate: the larger the value of t_n, the larger the error of the t_n-th compensation angle determined by the electronic device 100, which reduces the accuracy of 3D key point detection.
  • The electronic device 100 may periodically or irregularly detect whether the user's legs are upright. When detecting that the user's legs are upright, the electronic device 100 may calculate the included angle between the image plane and the straight line where the leg 3D key points lie in the group of 3D key points determined based on the 2D key points. If the included angle is smaller than the preset included angle, the electronic device 100 may determine the included angle as the compensation angle corresponding to this group of 3D key points and use the compensation angle to correct the position information of this group of 3D key points.
  • The electronic device 100 may then determine, based on the compensation angle corresponding to this group of 3D key points and the method shown in FIG. 10, the compensation angle change amount and thereby the compensation angle corresponding to each subsequent group of 3D key points.
  • That is, the electronic device 100 can use, as the basis for determining the subsequent compensation angle corresponding to each group of 3D key points, the included angle between the image plane and the straight line where the leg 3D key points lie in the group of 3D key points determined based on the 2D key points when the user's legs are upright this time.
  • the electronic device 100 can reduce the accumulated error when the compensation angle is determined by accumulating the compensation angle variation.
  • the above embodiment can further reduce the error in calculating the perspective deformation degree of the person in the image, and improve the accuracy of 3D key point detection.
  • FIG. 11 exemplarily shows a flowchart of another method for detecting human body key points provided by an embodiment of the present application.
  • Here, the description takes as an example the case where the electronic device 100 determines a group of 3D key points from each frame of image. As shown in FIG. 11, the method may include steps S301-S310. Among them:
  • the electronic device 100 may collect images through a camera.
  • The electronic device 100 may determine the t_n-th group of 2D key points corresponding to the n-th frame of image according to the n-th frame of image collected by the camera, and estimate the user's t_n-th group of 3D key points corresponding to the n-th frame of image according to the t_n-th group of 2D key points.
  • The electronic device 100 may detect whether the user's legs are upright by using the t_n-th group of 3D key points.
  • The electronic device 100 may determine the included angle between the image plane and the straight line where the leg 3D key points lie in the t_n-th group of 3D key points, where the image plane is the plane perpendicular to the optical axis of the camera.
  • The electronic device 100 may determine whether the above-mentioned included angle is smaller than the preset included angle.
  • For steps S301 to S305, reference may be made to steps S101 to S106 in the method shown in FIG. 8; details are not repeated here.
  • The electronic device 100 may determine the above-mentioned included angle as the t_n-th compensation angle.
  • The electronic device 100 may correct the position information of the t_n-th group of 3D key points by using the t_n-th compensation angle.
  • The electronic device 100 may calculate the included angle between the image plane and the straight line where the leg 3D key points lie in the 3D key points determined based on the 2D key points.
  • The electronic device 100 may use the included angle as the compensation angle and use it to correct the position information of the 3D key points determined based on the 2D key points when the user's legs are upright this time. That is to say, if it is detected that the user's legs are upright, the electronic device 100 may determine the compensation angle corresponding to the current 3D key points without relying on the initial compensation angle or the compensation angle determined the previous time the user's legs were upright.
  • The electronic device 100 may determine, from the collected (n-1)-th frame image and n-th frame image, the displacement Δy_n of the lower edge of the human body rectangular frame from the (n-1)-th frame image to the n-th frame image, the change amount Δh_n of the height of the human body rectangular frame, and the height h_n of the human body rectangular frame in the n-th frame image.
  • The electronic device 100 may determine, according to the above Δy_n, Δh_n and h_n, the compensation angle change amount from the t_{n-1}-th group of 3D key points to the t_n-th group of 3D key points, where the t_{n-1}-th group of 3D key points is obtained from the (n-1)-th frame image captured by the camera.
  • The electronic device 100 may determine the t_n-th compensation angle according to the t_{n-1}-th compensation angle and the above compensation angle change amount. The t_{n-1}-th compensation angle is used to correct the position information of the t_{n-1}-th group of 3D key points; it is either the included angle between the image plane and the straight line where the leg 3D key points lie in the t_{n-1}-th group of 3D key points, or the sum of the t_p-th compensation angle and the compensation angle changes of all adjacent pairs of groups of 3D key points from the t_p-th group to the t_{n-1}-th group, where the t_p-th compensation angle was calculated the last time the user's legs were detected upright before the (n-1)-th frame.
  • α_n is the t_n-th compensation angle, and α_{n-1} is the t_{n-1}-th compensation angle.
  • The calculation method of α_{n-1} here is different from that in the method shown in the aforementioned FIG. 10.
  • If the electronic device 100 detects, using the t_{n-1}-th group of 3D key points, that the user's legs are upright, and the included angle between the image plane and the straight line where the leg 3D key points lie in the t_{n-1}-th group of 3D key points is smaller than the preset included angle, then the t_{n-1}-th compensation angle α_{n-1} may be that included angle.
  • p is a positive integer less than n-1. That is, the p-th frame of image is collected by the camera before the (n-1)-th frame of image.
  • α_p is the t_p-th compensation angle, which may be used to correct the position information of the t_p-th group of 3D key points.
  • α_p is calculated when the electronic device 100 detects, for the last time before the camera captures the (n-1)-th frame image, that the user's legs are upright. That is, α_p is the included angle between the image plane and the straight line where the leg 3D key points lie in the t_p-th group of 3D key points, and this included angle is smaller than the preset included angle. A minimal sketch of this update rule follows.
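The update rule described above can be summarized in a small Python sketch: reset the accumulation to the measured leg-line angle whenever the legs are detected upright and the angle passes the preset-angle check, and otherwise accumulate the model-predicted change as in FIG. 10. All names and the 45° preset value are illustrative assumptions.

```python
from typing import Optional

def next_compensation_angle(alpha_prev: float, delta_alpha: float,
                            measured_angle: Optional[float] = None,
                            preset_angle: float = 45.0) -> float:
    """measured_angle is the leg-line / image-plane angle computed when the
    user's legs are detected upright, or None when they are not. Resetting
    to it discards the error accumulated through the summed changes."""
    if measured_angle is not None and measured_angle < preset_angle:
        return measured_angle        # restart accumulation from a fresh measurement
    return alpha_prev + delta_alpha  # otherwise accumulate as in formula (2)
```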
  • The electronic device 100 may then perform step S307, that is, use α_n to correct the position information of the t_n-th group of 3D key points.
  • The electronic device 100 may determine the compensation angle change amount between a former group of 3D key points and a latter group of 3D key points.
  • One or more groups of 3D key points may be spaced between the above-mentioned former group of 3D key points and the above-mentioned latter group of 3D key points.
  • the above compensation angle variation can be determined according to the height of the user and the distance between the user and the camera when the previous frame of image is collected, and the height of the user and the distance between the user and the camera when the next frame of image is collected.
  • The user's key points in the former frame of image form the above-mentioned former group of 3D key points, and the user's key points in the latter frame of image form the above-mentioned latter group of 3D key points.
  • the electronic device 100 may determine the compensation angle of the latter group of 3D key points according to the compensation angle of the former group of 3D key points and the compensation angle change amount.
  • The former frame of image may be collected when the electronic device 100 detects that the user's legs are upright.
  • Specifically, the former frame of image may be the image collected at the most recent time, before the latter frame of image is collected, that the electronic device 100 detected that the user's legs were upright.
  • the electronic device 100 may not detect whether the user's legs are upright for each group of 3D key points.
  • the electronic device 100 may periodically or irregularly detect whether the user's legs are upright.
  • The electronic device 100 may detect whether the user's legs are upright when the fitness course plays to a point that instructs the user to complete a legs-upright action.
  • the electronic device 100 may determine the user's 3D key points by using the currently captured image.
  • the electronic device 100 may determine the compensation angle according to steps S308 to S310 in FIG. 11 , and use the compensation angle to correct the above-mentioned 3D key points.
  • If it is detected that the user's legs are upright, the electronic device 100 can directly determine the compensation angle according to the method shown in the foregoing FIG. 8. Otherwise, the electronic device 100 may determine the compensation angle according to the method shown in the aforementioned FIG. 10, on the basis of the initial compensation angle or the compensation angle determined the most recent time the user's legs were upright. This can reduce the error accumulated when the compensation angle is determined by accumulating compensation angle changes. Compared with the methods shown in FIG. 8 and FIG. 10, the above embodiment can further reduce the error in calculating the degree of perspective deformation of the person in the image and improve the accuracy of 3D key point detection.
  • the electronic device 100 may also determine the compensation angle when it is detected that the user's posture matches a preset posture.
  • the above-mentioned preset posture may be a posture in which the upper body is upright and/or the legs are upright. That is to say, the aforementioned detection of whether the user's legs are upright is an implementation manner of detecting whether the user's posture matches the preset posture.
  • the electronic device 100 may determine the angle between the straight line where the 3D key point of the upper body is located and the image plane as the compensation angle, and use the compensation angle to correct the 3D key point.
  • The electronic device 100 may determine, as the compensation angle, the included angle between the image plane and the straight line where the upper-body 3D key points and/or the leg 3D key points lie, and use the compensation angle to correct the 3D key points.
  • FIG. 12 exemplarily shows a position distribution diagram of key points of another human body.
  • The key points of the human body may include the head point, the first neck point, the second neck point, the left shoulder point, the right shoulder point, the right elbow point, the left elbow point, the right hand point, the left hand point, the first chest and abdomen point, the second chest and abdomen point, the third chest and abdomen point, the right hip point, the left hip point, the middle point of the left and right hips, the right knee point, the left knee point, the right ankle point, and the left ankle point.
  • the embodiments of the present application may also include other key points, which are not specifically limited here.
  • the electronic device 100 can identify more key points of the human body.
  • According to the images collected by the camera, the electronic device 100 can identify the position information of each key point shown in FIG. 12 in the 2D plane and in 3D space, and determine the 2D key point and the 3D key point corresponding to each key point shown in FIG. 12.
  • FIG. 13 exemplarily shows a flowchart of another method for detecting human body key points provided by an embodiment of the present application.
  • The method may include steps S401-S408. Among them:
  • the electronic device 100 may collect images through a camera.
  • the electronic device 100 may determine m groups of 2D key points corresponding to the m frames of images according to the m frames of images collected by the camera.
  • the electronic device 100 may estimate a group of 3D key points corresponding to the m frames of images by the user according to the m groups of 2D key points.
  • the electronic device 100 may use a set of 3D key points corresponding to the above m frames of images to determine whether the user's posture matches a preset posture, where the preset posture is a posture in which the upper body is upright and/or the legs are upright.
  • the above m frames of images may be any m frames of images collected by a camera. That is, the electronic device 100 can determine whether the posture of the user determined by each group of 3D key points matches the preset posture. If the user's posture matches the preset posture, the electronic device 100 may perform the following step S405. If the user's posture does not match the preset posture, the electronic device 100 may perform the following step S408.
  • the above-mentioned preset gestures may be gestures included in fitness classes or somatosensory games.
  • the electronic device 100 may acquire actions that a fitness class or a somatosensory game instructs the user to complete.
  • the electronic device 100 may store the time when the preset posture is played in the fitness course or the somatosensory game and the 3D key points corresponding to the preset posture.
  • The electronic device 100 may compare a group of 3D key points corresponding to the m frames of images with the 3D key points corresponding to the preset posture.
  • If they match, the electronic device 100 may perform the following step S405.
  • the above-mentioned m-frame images are collected by the camera at the moment when the fitness course or the somatosensory game is played to the above-mentioned preset posture.
  • the electronic device 100 may store the moment when the above-mentioned preset posture is played in a fitness class or a somatosensory game.
  • the electronic device 100 may acquire the 3D key points corresponding to the above-mentioned preset posture. Further, the electronic device 100 may compare a set of 3D key points corresponding to the above m frames of images with the 3D key points corresponding to the above preset posture.
  • The above-mentioned m frames of images are collected by the camera at a moment other than when the fitness course or the somatosensory game plays the above-mentioned preset posture.
  • the electronic device 100 may correct the 3D key points determined by the m-frame images by using the compensation angle determined when the latest fitness class or somatosensory game is played to the preset posture. That is to say, the electronic device 100 may not need to judge whether the posture of the user determined by each group of 3D key points matches the preset posture. This can save computing resources of the electronic device 100 .
  • the above-mentioned preset gestures may be gestures included in the action library of other applications.
  • the above-mentioned action library may store information indicating that the user completes each action (eg, image information of the action, audio information of the action, etc.).
  • the electronic device 100 may instruct the user to complete actions corresponding to the preset postures a period of time before the fitness course or the somatosensory game is played, or during the playing of the fitness course or the somatosensory game. That is, the above-mentioned preset postures may not be included in fitness courses or somatosensory games.
  • the electronic device 100 may instruct the user to complete the action corresponding to the preset posture at regular or irregular intervals.
  • the electronic device 100 may compare a set of 3D key points corresponding to the above m frames of images with a set of 3D key points corresponding to the above preset posture. If the posture of the user indicated by the set of 3D key points corresponding to the above m frames of images matches the above preset posture, the electronic device 100 may perform the following step S405. The above m frames of images are collected by the camera when the electronic device 100 instructs the user to complete the action corresponding to the preset posture.
  • the electronic device 100 may use a part of the key points in a set of 3D key points (eg, the key points of the upper body, or the key points of the legs) for comparison.
  • the above preset posture is an upright posture of the upper body, and the electronic device 100 can compare the upper body 3D key points in the set of 3D key points of the user with the upper body 3D key points in the set of 3D key points corresponding to the above preset posture. If the position information of the two sets of upper body 3D key points is the same or the difference is smaller than the threshold, the electronic device 100 may determine that the user's posture matches the preset posture.
  • the above-mentioned preset posture is an upright posture of the legs, and the electronic device 100 can compare the leg 3D key points in the set of 3D key points of the user with the leg 3D key points in the set of 3D key points corresponding to the above-mentioned preset posture. If the position information of the two sets of leg 3D key points is the same or the difference is smaller than the threshold, the electronic device 100 may determine that the user's posture matches the preset posture.
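As a rough illustration of the comparison just described, the following Python sketch treats "position information is the same or the difference is smaller than the threshold" as a mean per-point distance test over the selected subset of key points (upper-body or leg points). The array layout and the threshold value are assumptions for illustration, not values fixed by the embodiment.

```python
import numpy as np

def posture_matches(user_kps: np.ndarray, preset_kps: np.ndarray,
                    threshold: float = 0.1) -> bool:
    """user_kps and preset_kps are (K, 3) arrays holding the same K selected
    3D key points (e.g. only the leg points) for the user and for the preset
    posture; returns True when the mean per-point distance is below threshold."""
    distances = np.linalg.norm(user_kps - preset_kps, axis=1)
    return bool(distances.mean() < threshold)
```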
  • The electronic device 100 may determine the included angle between the image plane and the straight line where some 3D key points in the above-mentioned group of 3D key points lie, where these 3D key points include the upper-body 3D key points and/or the leg 3D key points, and the image plane is the plane perpendicular to the optical axis of the camera.
  • the above-mentioned preset posture is an upright posture of the upper body.
  • That the electronic device 100 detects that the user's posture matches the above-mentioned preset posture can indicate that, in the group of 3D key points corresponding to the above-mentioned m frames of images, the neck points (the first neck point and the second neck point shown in FIG. 12) and the chest and abdomen points (the first, second and third chest and abdomen points shown in FIG. 12) are approximately on a straight line.
  • The electronic device 100 can determine the included angle between the image plane and the straight line where any two of the first neck point, the second neck point, the first chest and abdomen point, the second chest and abdomen point and the third chest and abdomen point lie in the above group of 3D key points. For example, the electronic device 100 determines the included angle between the image plane and the straight line where the first neck point and the third chest and abdomen point lie in the above group of 3D key points.
  • Alternatively, the electronic device 100 may calculate the average of the included angles between the image plane and the straight lines where any two of the first neck point, the second neck point, the first chest and abdomen point, the second chest and abdomen point and the third chest and abdomen point lie.
  • the above-mentioned preset posture is a posture with upright legs.
  • the electronic device 100 may determine the angle between the straight line where the 3D key point of the leg is located and the image plane according to the method in the foregoing embodiment. I won't go into details here.
  • the above-mentioned preset posture is a posture in which the upper body is upright and the legs are upright.
  • That the electronic device 100 detects that the user's posture matches the above-mentioned preset posture may indicate that, among the group of 3D key points corresponding to the above-mentioned m frames of images, the first neck point, the second neck point, the first chest and abdomen point, the second chest and abdomen point, the third chest and abdomen point, the right hip point, the right knee point, the right ankle point, the left hip point, the left knee point and the left ankle point are approximately on the same plane.
  • The electronic device 100 can determine the included angle between the image plane and the straight line where any two of the first neck point, the second neck point, the first chest and abdomen point, the second chest and abdomen point, the third chest and abdomen point, the right hip point (or the left hip point), the right knee point (or the left knee point) and the right ankle point (or the left ankle point) lie in the above group of 3D key points. For example, the electronic device 100 determines the included angle between the image plane and the straight line where the first neck point and the right ankle point lie in the above group of 3D key points.
  • Alternatively, the electronic device 100 may calculate the average of the included angles between the image plane and the straight lines where any two of the above key points lie. A small sketch of this pairwise-angle averaging follows.
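The sketch below assumes the image plane is the x-y plane of the 3D coordinate system (perpendicular to the camera optical axis z); the angle between a line and that plane is then arcsin(|d_z|/|d|) for the line's direction vector d. Names are illustrative.

```python
import numpy as np
from itertools import combinations

def mean_angle_to_image_plane(points_3d: np.ndarray) -> float:
    """points_3d is a (K, 3) array of the selected 3D key points; returns the
    mean angle (degrees) between the image plane and the straight lines
    through every pair of these points."""
    angles = []
    for p, q in combinations(points_3d, 2):
        d = q - p
        # angle between a line and a plane = arcsin(|d_z| / |d|)
        angles.append(np.degrees(np.arcsin(abs(d[2]) / np.linalg.norm(d))))
    return float(np.mean(angles))
```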
  • the electronic device 100 may determine whether the above-mentioned included angle is smaller than the preset included angle.
  • the electronic device 100 can use the above-mentioned included angle to update its stored compensation angle, and the compensation angle stored by the electronic device 100 after the update is the above-mentioned included angle.
  • the electronic device 100 may use the compensation angle stored by itself to correct the position information of the group of 3D key points.
  • the method for determining the initial compensation angle in the method flow chart shown in the foregoing FIG. 10 may be the method shown in the foregoing FIG. 13 .
  • the electronic device 100 may also use the method shown in FIG. 13 to determine the compensation angle.
  • When it is detected that the user's posture matches the preset posture, the electronic device 100 can use the 3D key points determined based on the image to determine the degree of perspective deformation of the person in the image.
  • The above-mentioned preset posture may be a posture in which the upper body is upright and/or the legs are upright. That is to say, when the user's upper body is upright and/or the user's legs are upright, the electronic device 100 can use the 3D key points determined based on the image to determine the degree of perspective deformation of the person in the image.
  • the electronic device 100 can correct the position information of the 3D key points determined based on the image, reduce the error caused by the image perspective deformation on the position information of the 3D key points, and improve the accuracy of the position information detection of the 3D key points.
  • This method may require only one camera, which not only saves cost but also keeps the computational complexity of key point detection low.
  • the electronic device 100 can update the compensation angle.
  • The updated compensation angle can more accurately reflect the degree of perspective deformation of the person in the image captured by the camera with the user at the current position. This can reduce the influence of changes in the position between the user and the camera on the correction of the position information of the 3D key points and improve the accuracy of the corrected position information of the 3D key points. That is to say, the electronic device 100 can correct the position information of the 3D key points by using the updated compensation angle.
  • the human body model determined by the corrected 3D key points can more accurately reflect the user's posture and improve the accuracy of posture detection. In this way, in a fitness or somatosensory game, the electronic device 100 can more accurately determine whether the user's posture is correct and whether the user's motion range meets the requirements, etc., so that the user has a better experience when playing a fitness or somatosensory game.
  • the electronic device may determine the first time according to the first multimedia information, and the first time may be the time when the first multimedia information instructs the user to perform an action that satisfies the first condition.
  • the above-mentioned first multimedia information may be related content in a fitness course or a somatosensory game application.
  • the above-mentioned multimedia information may include one or more types of content in the form of video, animation, voice, and text.
  • the above-mentioned action satisfying the first condition may be an upright action of the upper body and/or an upright action of the leg.
  • the electronic device may determine whether the user performs the action satisfying the first condition by comparing the 3D key point of the user with the 3D key point corresponding to the above-mentioned action satisfying the first condition.
  • the electronic device may determine whether the user performs an action that satisfies the first condition according to the method shown in FIG. 7A .
  • the electronic device may determine the compensation angle variation according to the first model.
  • the above-mentioned first model is the compensation angle determination model in the foregoing embodiment.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Quality & Reliability (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Image Analysis (AREA)
  • Length Measuring Devices By Optical Means (AREA)

Abstract

This application provides a human body key point detection method and a related apparatus. The method is applicable to an electronic device configured with one or more cameras. The electronic device can identify the 3D key points of a user in the images collected by the camera. The electronic device can detect, according to the above 3D key points, whether the user's posture matches a preset posture. Using a group of 3D key points determined when the user's posture matches the preset posture, the electronic device can calculate the included angle between the image plane and the human body model determined by this group of 3D key points. The electronic device can use this included angle to correct the position information of the 3D key points. This method can save cost, reduce the error caused by image perspective deformation in the position information of the 3D key points, and improve the accuracy of detecting the position information of the 3D key points.

Description

Human body key point detection method and related apparatus
This application claims priority to the Chinese patent application No. 202110351266.0, filed with the China National Intellectual Property Administration on March 31, 2021 and entitled "Human Body Key Point Detection Method and Related Apparatus", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of terminal technologies, and in particular to a human body key point detection method and a related apparatus.
Background
Human body key point detection is the basis of many computer vision tasks. By detecting the three-dimensional (3 dimensions, 3D) key points of the human body, posture detection, action classification, intelligent fitness, somatosensory games and the like can be implemented.
An electronic device can collect images through a camera and identify the two-dimensional (2 dimensions, 2D) key points of the human body in the images. Based on the 2D key points, the electronic device can estimate the 3D key points of the human body using deep learning and other techniques. In the above method of estimating 3D key points from 2D key points, due to factors such as differences in users' heights and differences in the distance between the user and the camera, the images collected by the camera will have different degrees of perspective deformation. Image perspective deformation causes errors in the detected 2D key points, and the 3D key points estimated from these 2D key points will therefore also have errors.
At present, an electronic device can detect the position of the user relative to the camera through multiple cameras to determine the degree of perspective deformation of the person in the image. Further, the electronic device can correct the 3D key points according to the degree of perspective deformation of the person in the image. However, the above method requires multiple cameras and requires designing the placement of these cameras to improve detection accuracy; it is not only costly, but the computational complexity of determining the 3D key points is also high.
Summary
This application provides a human body key point detection method and a related apparatus. The method is applicable to an electronic device configured with one or more cameras. The electronic device can identify the 3D key points of a user in the images collected by the camera. The electronic device can detect, according to the above 3D key points, whether the user's posture matches a preset posture. Using a group of 3D key points determined when the user's posture matches the preset posture, the electronic device can calculate the included angle between the image plane and the human body model determined by this group of 3D key points. The electronic device can use this included angle to correct the position information of the 3D key points. This method can save cost, reduce the error caused by image perspective deformation in the position information of the 3D key points, and improve the accuracy of detecting the position information of the 3D key points.
In a first aspect, this application provides a human body key point detection method, which is applicable to an electronic device including one or more cameras. In the method, the electronic device may acquire a first image of a first user through the camera. The electronic device may determine a first group of 3D key points of the first user according to the first image. The electronic device may determine whether multiple 3D key points in the first group of 3D key points satisfy a first condition. If the first condition is satisfied, the electronic device may determine a first compensation angle according to the multiple 3D key points and use the first compensation angle to perform rotation correction on the first group of 3D key points.
With reference to the first aspect, in some embodiments, if the first condition is not satisfied, the electronic device may use a second compensation angle to perform rotation correction on the first group of 3D key points. The second compensation angle is determined according to a second group of 3D key points, which is the most recent group of 3D key points satisfying the first condition before the first image is acquired.
With reference to the first aspect, in some embodiments, the method for the electronic device to acquire the first image of the first user through the camera may be as follows: the electronic device may determine a first moment according to first multimedia information, the first moment being the moment at which the first multimedia information instructs the user to perform an action satisfying the first condition. The electronic device may acquire the first image of the first user through the camera within a first time period starting from the first moment.
The above first time period may be the time period during which the first multimedia information instructs the user to perform the action satisfying the first condition. For example, if the time required to complete the above action satisfying the first condition is 1 second, the above first time period is the 1 second starting from the first moment. Optionally, the above first time period may be a time period of fixed length.
With reference to the first aspect, in some embodiments, if at a second moment the 3D key points corresponding to the action that the multimedia information instructs the user to perform do not satisfy the first condition, the electronic device may use a third compensation angle to perform rotation correction on the 3D key points determined from the images collected within a second time period starting from the second moment. The third compensation angle is determined according to a third group of 3D key points, which is the most recent group of 3D key points satisfying the first condition before the second moment.
The above second time period may be the time period during which the first multimedia information instructs the user to perform an action that does not satisfy the first condition. For example, if the time required to complete the action is 1 second, the above second time period is the 1 second starting from the second moment. Optionally, the above second time period may be a time period of fixed length.
With reference to the first aspect, in some embodiments, the method for the electronic device to determine whether multiple 3D key points in the first group of 3D key points satisfy the first condition may be as follows: the electronic device may determine whether the multiple 3D key points in the first group of 3D key points match the 3D key points corresponding to a first action, the first action being an action in which at least one of the upper body and the legs is upright.
If the above first action is an action in which the upper body is upright, the first compensation angle may be the included angle between the image plane and the straight line where the neck points and the chest and abdomen points in the first group of 3D key points lie.
If the above first action is an action in which the legs are upright, the first compensation angle is the included angle between the image plane and the straight line where any two of the hip points, knee points and ankle points in the first group of 3D key points lie.
With reference to the first aspect, in some embodiments, the multiple 3D key points in the first group of 3D key points include hip points, knee points and ankle points. The method for the electronic device to determine whether the multiple 3D key points in the first group of 3D key points satisfy the first condition may be as follows: the electronic device may calculate a first included angle between the straight line where the left hip point and the left knee point lie and the straight line where the left knee point and the left foot point lie in the first group of 3D key points, and a second included angle between the straight line where the right hip point and the right knee point lie and the straight line where the right knee point and the right foot point lie in the first group of 3D key points. The electronic device may determine whether the multiple 3D key points in the first group of 3D key points satisfy the first condition by detecting whether the difference between the first included angle and 180° is smaller than a first difference and whether the difference between the second included angle and 180° is smaller than the first difference.
The case in which the multiple 3D key points in the first group of 3D key points satisfy the first condition includes: the difference between the first included angle and 180° is smaller than the first difference and/or the difference between the second included angle and 180° is smaller than the first difference.
It can be seen from the above method that, when detecting that the user's 3D key points satisfy the first condition, the electronic device can use the 3D key points determined based on the image to determine the degree of perspective deformation of the person in the image. That the user's 3D key points satisfy the first condition may mean that the electronic device detects that the user performs an action in which the upper body is upright and/or the legs are upright. That is to say, when the user's upper body is upright and/or the user's legs are upright, the electronic device can use the 3D key points determined based on the image to determine the degree of perspective deformation of the person in the image. Further, the electronic device can correct the position information of the 3D key points determined based on the image, reduce the error caused by image perspective deformation in the position information of the 3D key points, and improve the accuracy of detecting the position information of the 3D key points. This method may require only one camera, which not only saves cost but also keeps the computational complexity of key point detection low.
In addition, each time it is detected that the user's 3D key points satisfy the first condition, if the included angle between the image plane and the straight line where the upper-body 3D key points and/or leg 3D key points lie can be used to correct the position information of the 3D key points, the electronic device can update the compensation angle. The updated compensation angle can more accurately reflect the degree of perspective deformation of the person in the image collected by the camera with the user at the current position. This can reduce the influence of changes in the relative position between the user and the camera on the correction of the position information of the 3D key points and improve the accuracy of the corrected position information of the 3D key points. That is to say, the electronic device can use the updated compensation angle to correct the position information of the 3D key points. The human body model determined by the corrected 3D key points can more accurately reflect the user's posture and improve the accuracy of posture detection. In this way, in fitness or somatosensory games, the electronic device can more accurately determine whether the user's posture is correct and whether the range of the user's motion meets the requirements, so that the user has a better experience during fitness or somatosensory games.
With reference to the first aspect, in some embodiments, the second compensation angle is the included angle between the image plane and the straight line where the upper-body 3D key points and/or leg 3D key points in the second group of 3D key points lie. The third compensation angle is the included angle between the image plane and the straight line where the upper-body 3D key points and/or leg 3D key points in the third group of 3D key points lie.
With reference to the first aspect, in some embodiments, the second compensation angle is the sum of a first compensation angle change amount and a third included angle. The third included angle is the included angle between the image plane and the straight line where the upper-body 3D key points and/or leg 3D key points in the second group of 3D key points lie. The first compensation angle change amount is determined according to a first height H1, a first distance Y1, a second height H2 and a second distance Y2, where H1 and Y1 are respectively the height of the first user and the distance between the first user and the camera when the first image is collected, and H2 and Y2 are respectively the height of the first user and the distance between the first user and the camera when the second image is collected. The key points of the first user in the first image are the first group of 3D key points, and the key points of the first user in the second image are the second group of 3D key points. Within the time period from collecting the second image to collecting the first image, the more the height of the first user decreases, the more the second compensation angle increases relative to the third included angle; the more the distance between the first user and the camera decreases, the more the second compensation angle increases relative to the third included angle; and the smaller H1 is, the more the second compensation angle increases relative to the third included angle.
The third compensation angle is the sum of a second compensation angle change amount and a fourth included angle. The fourth included angle is the included angle between the image plane and the straight line where the upper-body 3D key points and/or leg 3D key points in the third group of 3D key points lie. The second compensation angle change amount is determined according to a third height H3, a third distance Y3, a fourth height H4 and a fourth distance Y4, where H3 and Y3 are respectively the height of the first user and the distance between the first user and the camera within the second time period, and H4 and Y4 are respectively the height of the first user and the distance between the first user and the camera when the third image is collected. The key points of the first user in the third image are the third group of 3D key points. Within the time period from collecting the second image to the second time period, the more the height of the first user decreases, the more the third compensation angle increases relative to the fourth included angle; the more the distance between the first user and the camera decreases, the more the third compensation angle increases relative to the fourth included angle; and the smaller H3 is, the more the third compensation angle increases relative to the fourth included angle.
Both the above first compensation angle change amount and second compensation angle change amount are determined by a first model. The first model is trained from multiple groups of training samples. One group of training samples includes: the change amount, between an earlier-collected image and a later-collected image, of the lower edge of the position of the human body in the image; the change amount of the height of the human body in the image; the height of the human body in the later-collected image; and the change amount of the included angle between the image plane and the straight line where the upper-body 3D key points and/or leg 3D key points lie in the two groups of 3D key points determined from the earlier-collected image and the later-collected image. The multiple 3D key points in both of these two groups of 3D key points satisfy the first condition.
It can be seen from the above method that, even if the user's 3D key points do not satisfy the first condition, the electronic device can determine the degree of perspective deformation of the person in the image in real time and correct the position information of the 3D key points, thereby improving the accuracy of 3D key point detection.
With reference to the first aspect, in some embodiments, if it is determined that the included angle between the image plane and the straight line where the upper-body 3D key points and/or leg 3D key points in the first group of 3D key points lie is smaller than a fifth included angle, the electronic device may determine the above first compensation angle according to the above multiple 3D key points. If it is determined that this included angle is larger than the fifth included angle, the electronic device may use the second compensation angle to perform rotation correction on the first group of 3D key points.
The above fifth included angle may be set according to empirical values, which is not limited in the embodiments of this application.
It can be seen from the above method that, by setting the fifth included angle, the electronic device can determine whether the calculated included angle can serve as the compensation angle for correcting the position information of the 3D key points, thereby avoiding an impossible compensation angle value caused by an error in calculating the above included angle and improving the accuracy of detecting 3D key points.
In a second aspect, this application provides an electronic device. The electronic device includes a camera, a display screen, a memory and a processor. The camera may be used to collect images, the memory may be used to store a computer program, and the processor may be used to invoke the computer program so that the electronic device performs any one of the possible implementations of the first aspect.
In a third aspect, this application provides a computer storage medium including instructions that, when run on an electronic device, cause the electronic device to perform any one of the possible implementations of the first aspect.
In a fourth aspect, an embodiment of this application provides a chip applied to an electronic device. The chip includes one or more processors configured to invoke computer instructions so that the electronic device performs any one of the possible implementations of the first aspect.
In a fifth aspect, an embodiment of this application provides a computer program product including instructions that, when run on a device, cause the electronic device to perform any one of the possible implementations of the first aspect.
It can be understood that the electronic device provided in the second aspect, the computer storage medium provided in the third aspect, the chip provided in the fourth aspect, and the computer program product provided in the fifth aspect are all used to perform the method provided in the embodiments of this application. Therefore, for the beneficial effects they can achieve, reference may be made to the beneficial effects of the corresponding method, and details are not repeated here.
Brief Description of the Drawings
FIG. 1 is a position distribution diagram of human body key points provided by an embodiment of this application;
FIG. 2A is a schematic diagram of a 2D coordinate system for determining the position information of 2D key points provided by an embodiment of this application;
FIG. 2B is a schematic diagram of a 3D coordinate system for determining the position information of 3D key points provided by an embodiment of this application;
FIG. 3 is a schematic diagram of a human body key point detection scenario provided by an embodiment of this application;
FIG. 4A and FIG. 4B are schematic diagrams of 3D key points detected by the electronic device 100 provided by an embodiment of this application;
FIG. 5 is a schematic structural diagram of the electronic device 100 provided by an embodiment of this application;
FIG. 6A to FIG. 6D are schematic diagrams of some human body key point detection scenarios provided by an embodiment of this application;
FIG. 7A and FIG. 7B are schematic diagrams of 3D key points detected by the electronic device 100 provided by an embodiment of this application;
FIG. 8 is a flowchart of a human body key point detection method provided by an embodiment of this application;
FIG. 9A to FIG. 9C are schematic diagrams of some human body key point detection scenarios provided by an embodiment of this application;
FIG. 10 is a flowchart of another human body key point detection method provided by an embodiment of this application;
FIG. 11 is a flowchart of another human body key point detection method provided by an embodiment of this application;
FIG. 12 is a position distribution diagram of other human body key points provided by an embodiment of this application;
FIG. 13 is a flowchart of another human body key point detection method provided by an embodiment of this application.
Detailed Description of Embodiments
The technical solutions in the embodiments of this application will be described clearly and in detail below with reference to the accompanying drawings. In the description of the embodiments of this application, unless otherwise stated, "/" means "or"; for example, A/B may mean A or B. "And/or" in the text merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B may mean: A alone, both A and B, or B alone. In addition, in the description of the embodiments of this application, "multiple" means two or more than two.
Hereinafter, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as implying or indicating relative importance or implicitly indicating the number of the indicated technical features. Therefore, a feature defined with "first" or "second" may explicitly or implicitly include one or more of the features. In the description of the embodiments of this application, unless otherwise stated, "multiple" means two or more.
FIG. 1 exemplarily shows a position distribution diagram of the key points of a human body. As shown in FIG. 1, the key points of the human body may include: a head point, a neck point, a left shoulder point, a right shoulder point, a right elbow point, a left elbow point, a right hand point, a left hand point, a right hip point, a left hip point, a middle point of the left and right hips, a right knee point, a left knee point, a right ankle point, and a left ankle point. Not limited to the above key points, the embodiments of this application may also include other key points, which are not specifically limited here.
The concepts of 2D key points and 3D key points involved in the embodiments of this application are described in detail below.
1. 2D key points
The 2D key points in the embodiments of this application may represent key points distributed on a 2D plane. The electronic device can collect an image of the user through the camera and identify the user's 2D key points in the image. The above 2D plane may be the image plane where the image collected by the camera lies. Identifying the user's 2D key points in the image may specifically be determining the position information of the user's key points on the 2D plane, where the position information of a 2D key point may be represented by two-dimensional coordinates in the 2D plane.
In a possible implementation, the position information of each 2D key point may be position information taking one of the 2D key points as a reference point. For example, taking the middle point of the left and right hips as the reference point, the position information of the middle point of the left and right hips on the 2D plane may be the coordinates (0, 0). Then, the electronic device may determine the position information of the other 2D key points according to their positions relative to the middle point of the left and right hips.
In another possible implementation, the electronic device may establish, based on the image plane, the 2D coordinate system x_i-y_i shown in FIG. 2A. The 2D coordinate system may take one vertex of the image collected by the camera as the origin, the horizontal direction of the photographed scene in the image as the direction of the x_i axis, and the vertical direction of the photographed scene in the image as the direction of the y_i axis. The position information of each 2D key point may be the two-dimensional coordinates of the user's key point in this 2D coordinate system.
The embodiments of this application do not limit the method for determining the position information of the above 2D key points.
The electronic device can identify one group of 2D key points from one frame of image collected by the camera. This group of 2D key points may include all of the key points of the human body shown in FIG. 1. One group of 2D key points can be used to determine a human body model on the 2D plane.
2. 3D key points
The 3D key points in the embodiments of this application may represent key points distributed in 3D space. Based on the 2D key points, the electronic device can estimate the user's 3D key points using deep learning and other techniques. The above 3D space may be the 3D space where the camera of the electronic device is located. Determining the user's 3D key points may specifically be determining the position information of the user's key points in the 3D space, where the position information of a 3D key point may be represented by three-dimensional coordinates in the 3D space. Compared with the 2D key points, the position information of the 3D key points contains the depth information of the user's key points; that is, the position information of the 3D key points can reflect how far the user's key points are from the camera.
In a possible implementation, the position information of each 3D key point may be position information taking one of the 3D key points as a reference point. For example, taking the middle point of the left and right hips as the reference point, the position information of the middle point of the left and right hips in 3D space may be the coordinates (0, 0, 0). Then, the electronic device may determine the position information of the other 3D key points according to their positions relative to the middle point of the left and right hips.
In another possible implementation, the electronic device may establish, in the 3D space where the camera is located, the 3D coordinate system x-y-z shown in FIG. 2B. The 3D coordinate system may take the optical center of the camera as the origin and the direction of the camera's optical axis (i.e., the direction perpendicular to the image plane) as the direction of the z axis; the directions of the x_i axis and the y_i axis of the 2D coordinate system shown in FIG. 2A are respectively the directions of the x axis and the y axis of this 3D coordinate system. The position information of each 3D key point may be the three-dimensional coordinates of the user's key point in this 3D coordinate system.
The 3D coordinate system shown in FIG. 2B is a right-handed coordinate system, where -x may represent the negative direction of the x axis. The embodiments of this application do not limit the method of establishing the above 3D coordinate system. For example, the above 3D coordinate system may also be a left-handed coordinate system, and the electronic device may determine the three-dimensional coordinates of the user's key points in the left-handed coordinate system.
Not limited to using deep learning methods to estimate the position information of the 3D key points, the electronic device may determine the position information of the 3D key points by other methods. For the implementation process of the electronic device estimating the position information of the 3D key points based on the 2D key points, reference may be made to implementations in the prior art, which are not described in detail in the embodiments of this application.
The electronic device may determine one group of 3D key points from one group of 2D key points. Alternatively, the electronic device may determine one group of 3D key points from multiple groups of 2D key points determined from multiple consecutive frames of images. The above group of 3D key points may include all of the key points of the human body shown in FIG. 1. One group of 3D key points can be used to determine a human body model in 3D space.
In some embodiments, the electronic device may perform 3D key point detection on one frame of image or multiple consecutive frames of images and determine one group of 3D key points directly from them. That is, the electronic device does not have to first determine the 2D key points of the image and then estimate the user's 3D key points based on the 2D key points. The embodiments of this application do not limit the specific method by which the electronic device performs 3D key point detection.
In the subsequent embodiments of this application, the human body key point detection method provided by this application is introduced by taking as an example the method in which the electronic device first determines the 2D key points of the image and then estimates the user's 3D key points based on the 2D key points.
Perspective deformation of the person in the image collected by the camera affects the accuracy of 3D key point detection. The influence of perspective deformation on 3D key point detection is described in detail below.
FIG. 3 exemplarily shows a schematic diagram of a scenario in which the electronic device 100 implements intelligent fitness by detecting the 3D key points of the human body.
As shown in FIG. 3, the electronic device 100 may include a camera 193. The electronic device 100 may collect images through the camera 193. The images collected by the camera 193 may include images of the user during fitness. The electronic device 100 may display the images collected by the camera 193 on a user interface 210. The embodiments of this application do not limit the content displayed on the above user interface 210.
The electronic device 100 can identify the user's 2D key points in the images collected by the camera 193. Based on the above 2D key points, the electronic device 100 can estimate a group of 3D key points of the user. A group of 3D key points can be used to determine a human body model of the user in 3D space. The human body model determined by the 3D key points can reflect the user's posture. The more accurate the position information of the 3D key points, the more accurately the human body model determined by the 3D key points can reflect the user's posture.
The pitch angle, field of view and height of the camera differ across products, and the distance between the user and the camera may change while the camera collects images. Affected by the above factors, the images collected by the camera will undergo perspective deformation. If the electronic device 100 determines the user's 2D key points from an image with perspective deformation, the aspect ratio of the human body model determined by the above 2D key points will differ from the aspect ratio of the user's actual body. The electronic device 100 determines the user's 3D key points from the above 2D key points; the position information of the above 3D key points will contain errors, and the human body model determined by the above 3D key points can hardly reflect the user's posture accurately.
FIG. 4A and FIG. 4B exemplarily show, from different orientations, a group of 3D key points determined by the electronic device 100 when the user is in a standing posture.
The 3D coordinate system shown in FIG. 4A may be the 3D coordinate system x-y-z shown in FIG. 2B. FIG. 4A shows a group of the user's 3D key points from the direction of the z axis. This orientation is equivalent to observing the user's posture from behind the user. It can be seen from FIG. 4A that the human body model determined by this group of 3D key points reflects that the user's posture is a standing posture.
The 3D coordinate system shown in FIG. 4B may be the 3D coordinate system x-y-z shown in FIG. 2B, where z may represent the positive direction of the z axis, which is the direction from the camera toward the photographed object. FIG. 4B shows a group of the user's 3D key points from the negative direction of the x axis. This orientation is equivalent to observing the user's posture from the user's side. It can be seen from FIG. 4B that the human body model determined by this group of 3D key points leans forward toward the camera 193. When the user is in a standing posture, the human body model determined by a group of 3D key points should be perpendicular to the plane x-0-z. Because of image perspective deformation, the human body model determined by a group of 3D key points shows the forward-leaning posture in FIG. 4B. The more the human body model leans toward the camera 193, the greater the degree of perspective deformation of the person in the image collected by the camera 193.
This application provides a human body key point detection method applicable to an electronic device configured with a monocular camera. The electronic device can detect the 3D key points of the human body from the images collected by the monocular camera. The above monocular camera may be the camera 193 in the aforementioned electronic device 100. This method can save cost and improve the accuracy of detecting the position information of the 3D key points.
Specifically, the electronic device can identify the user's 2D key points in the image collected by the camera and estimate the user's 3D key points from the 2D key points. The electronic device can determine, according to the above 3D key points, whether the user's legs are upright. Using a group of 3D key points determined when the user's legs are upright, the electronic device can calculate the compensation angle between the image plane and the human body model determined by this group of 3D key points. Further, the electronic device can use the compensation angle to correct the position information of the 3D key points, reduce the error caused by image perspective deformation in the position information of the 3D key points, and improve the accuracy of detecting the position information of the 3D key points.
The above image plane is the plane where the image collected by the camera lies (i.e., the plane perpendicular to the optical axis of the camera).
When the user's legs are upright, the included angle between the image plane and the straight line where the user's legs lie should be 0 or a value close to 0. When detecting that the user's legs are upright, the electronic device can determine the degree of perspective deformation of the person in the image through the included angle between the image plane and the straight line where the user's leg 3D key points lie, and correct the position information of the 3D key points. Compared with detecting 3D key points from images collected by multiple cameras, the above method may require only one camera, with lower cost and lower computational complexity.
While the user is doing fitness exercises or playing somatosensory games, the area of activity usually does not change much. That is to say, during fitness or a somatosensory game, the degree of perspective deformation of the person in the images collected by the camera does not change much. During fitness or a somatosensory game, the electronic device can detect, at some moment when the user's legs are upright, the included angle between the image plane and the straight line where the user's leg 3D key points lie, and use this included angle as the compensation angle to correct the position information of the 3D key points determined during the fitness or somatosensory game.
Optionally, during fitness or a somatosensory game, if the electronic device detects multiple times that the user's legs are upright, the electronic device can update the compensation angle. That is, the electronic device can take, as the compensation angle, the included angle between the image plane and the straight line where the user's leg 3D key points lie at the most recent time the user's legs were detected upright, and use it to correct the position information of the 3D key points determined in the subsequent stage of the fitness or somatosensory game. The above method can reduce the influence of changes in the position between the user and the camera on the correction of the position information of the 3D key points and improve the accuracy of the corrected position information of the 3D key points.
Exemplarily, in a somatosensory game scenario, the electronic device can determine the user's 3D key points from the images collected by the camera and determine the virtual character in the somatosensory game according to the above 3D key points. The electronic device can present the above virtual character on the user interface. The above virtual character can reflect the user's posture. For example, when the user jumps forward, the above virtual character also jumps forward. Before the position information of the 3D key points is corrected, the position information of the 3D key points contains errors because of image perspective deformation, so there are differences between the posture of the above virtual character and the user's actual posture. For example, the user may actually be in a standing posture while the above virtual character may be in a forward-leaning posture. Then, although the action actually completed by the user is standard, the electronic device 100 may judge that the user's action is not standard and indicate that the user failed to pass the level. This affects the user's gaming experience.
After the position information of the 3D key points is corrected, the posture of the above virtual character fits the user's actual posture better. In this way, in fitness or somatosensory games, the electronic device 100 can more accurately determine whether the user's posture is correct and whether the range of the user's motion meets the requirements, so that the user has a better experience during fitness or somatosensory games.
FIG. 5 exemplarily shows a schematic structural diagram of an electronic device 100 provided by an embodiment of this application.
As shown in FIG. 5, the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headset jack 170D, a sensor module 180, keys 190, a motor 191, an indicator 192, a camera 193, a display screen 194, and the like.
It can be understood that the structure illustrated in this embodiment of the present invention does not constitute a specific limitation on the electronic device 100. In other embodiments of this application, the electronic device 100 may include more or fewer components than shown, or combine some components, or split some components, or use a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. Different processing units may be independent devices or may be integrated in one or more processors.
The controller may be the nerve center and command center of the electronic device 100. The controller can generate operation control signals according to instruction operation codes and timing signals to complete the control of fetching and executing instructions.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache, which can store instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, they can be called directly from this memory. Repeated access is avoided and the waiting time of the processor 110 is reduced, thus improving the efficiency of the system.
The USB interface 130 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type-C interface, etc. The USB interface 130 can be used to connect a charger to charge the electronic device 100, and can also be used to transfer data between the electronic device 100 and peripheral devices. It can also be used to connect headsets and play audio through them. This interface can also be used to connect other electronic devices, such as AR devices.
The charging management module 140 is used to receive charging input from a charger. While charging the battery 142, the charging management module 140 can also supply power to the electronic device 100 through the power management module 141.
The power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charging management module 140 and supplies power to the processor 110, the internal memory 121, the external memory, the display screen 194, the camera 193, the wireless communication module 160, and the like.
The wireless communication function of the electronic device 100 can be implemented through the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and the like.
The antenna 1 and the antenna 2 are used to transmit and receive electromagnetic wave signals. Each antenna in the electronic device 100 can be used to cover a single communication frequency band or multiple communication frequency bands. Different antennas can also be multiplexed to improve antenna utilization. For example, the antenna 1 can be multiplexed as a diversity antenna of the wireless local area network. In other embodiments, the antennas can be used in combination with tuning switches.
The mobile communication module 150 can provide solutions for wireless communication including 2G/3G/4G/5G applied on the electronic device 100. The mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA), etc. The mobile communication module 150 can receive electromagnetic waves through the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modem processor for demodulation. The mobile communication module 150 can also amplify the signal modulated by the modem processor and convert it into electromagnetic waves for radiation through the antenna 1. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be provided in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be provided in the same device as at least some of the modules of the processor 110.
The modem processor may include a modulator and a demodulator. The modulator is used to modulate the low-frequency baseband signal to be sent into a medium-high frequency signal. The demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then transmits the demodulated low-frequency baseband signal to the baseband processor for processing. After being processed by the baseband processor, the low-frequency baseband signal is passed to the application processor. The application processor outputs sound signals through audio devices (not limited to the speaker 170A, the receiver 170B, etc.) or displays images or videos through the display screen 194. In some embodiments, the modem processor may be an independent device. In other embodiments, the modem processor may be independent of the processor 110 and provided in the same device as the mobile communication module 150 or other functional modules.
The wireless communication module 160 can provide solutions for wireless communication applied on the electronic device 100, including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and the like. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, frequency-modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110. The wireless communication module 160 can also receive signals to be sent from the processor 110, frequency-modulate and amplify them, and convert them into electromagnetic waves for radiation through the antenna 2.
The electronic device 100 implements display functions through the GPU, the display screen 194, the application processor, and the like. The GPU is a microprocessor for image processing that connects the display screen 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 194 is used to display images, videos, and the like. The display screen 194 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Miniled, a MicroLed, a Micro-oLed, quantum dot light emitting diodes (QLED), etc. In some embodiments, the electronic device 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.
The electronic device 100 can implement shooting functions through the ISP, the camera 193, the video codec, the GPU, the display screen 194, the application processor, and the like.
The ISP is used to process data fed back by the camera 193. For example, when taking a photo, the shutter is opened, light is transmitted through the lens to the photosensitive element of the camera, the light signal is converted into an electrical signal, and the photosensitive element of the camera transmits the electrical signal to the ISP for processing, converting it into an image visible to the naked eye. The ISP can also perform algorithm optimization on the noise, brightness and skin color of the image, and can optimize parameters such as exposure and color temperature of the shooting scene. In some embodiments, the ISP may be provided in the camera 193.
The camera 193 is used to capture still images or videos. An object generates an optical image through the lens, which is projected onto the photosensitive element. The photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the light signal into an electrical signal and then transmits the electrical signal to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV. In some embodiments, the electronic device 100 may include 1 or N cameras 193, where N is a positive integer greater than 1.
The digital signal processor is used to process digital signals; in addition to digital image signals, it can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform Fourier transform and the like on the frequency point energy.
Video codecs are used to compress or decompress digital video. The electronic device 100 may support one or more video codecs. In this way, the electronic device 100 can play or record videos in multiple encoding formats, such as moving picture experts group (MPEG) 1, MPEG2, MPEG3, MPEG4, etc.
The NPU is a neural-network (NN) computing processor. By drawing on the structure of biological neural networks, such as the transmission mode between neurons in the human brain, it processes input information quickly and can also learn continuously by itself. Applications such as intelligent cognition of the electronic device 100 can be implemented through the NPU, such as image recognition, face recognition, speech recognition, text understanding, etc.
The external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement data storage functions, such as saving music, video and other files in the external memory card.
The internal memory 121 can be used to store computer executable program code, which includes instructions. The processor 110 executes various functional applications and data processing of the electronic device 100 by running the instructions stored in the internal memory 121. The internal memory 121 may include a program storage area and a data storage area. The program storage area can store the operating system, applications required by at least one function (such as a sound playback function, an image playback function, etc.), and the like. The data storage area can store data created during use of the electronic device 100 (such as audio data, a phone book, etc.), and the like. In addition, the internal memory 121 may include a high-speed random access memory and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash storage (UFS), etc.
The electronic device 100 can implement audio functions, such as music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headset jack 170D, the application processor, and the like.
The audio module 170 is used to convert digital audio information into analog audio signal output and also to convert analog audio input into digital audio signals. The audio module 170 can also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110, or some functional modules of the audio module 170 may be provided in the processor 110.
The speaker 170A, also called the "horn", is used to convert audio electrical signals into sound signals.
The receiver 170B, also called the "earpiece", is used to convert audio electrical signals into sound signals.
The microphone 170C, also called the "mic" or "mouthpiece", is used to convert sound signals into electrical signals.
The headset jack 170D is used to connect wired headsets. The headset jack 170D may be the USB interface 130, or a 3.5 mm open mobile terminal platform (OMTP) standard interface or a cellular telecommunications industry association of the USA (CTIA) standard interface.
The sensor module 180 may include a pressure sensor, a gyroscope sensor, an air pressure sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a proximity light sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, a bone conduction sensor, and the like.
The keys 190 include a power key, volume keys, etc. The keys 190 may be mechanical keys or touch keys. The electronic device 100 can receive key input and generate key signal input related to user settings and function control of the electronic device 100.
The motor 191 can generate vibration prompts. The motor 191 can be used for touch vibration feedback. For example, touch operations acting on different applications (such as taking pictures, audio playback, etc.) can correspond to different vibration feedback effects. For touch operations acting on different areas of the display screen 194, the motor 191 can also correspond to different vibration feedback effects. Different application scenarios (such as time reminders, receiving messages, alarm clocks, games, etc.) can also correspond to different vibration feedback effects. The touch vibration feedback effects can also be customized.
The indicator 192 may be an indicator light, which can be used to indicate the charging status and power changes, and can also be used to indicate messages, missed calls, notifications, etc.
Not limited to the components shown in FIG. 5, the electronic device 100 may include more or fewer components. The electronic device 100 in the embodiments of this application may be a television, a mobile phone, a tablet computer, a notebook computer, an ultra-mobile personal computer (UMPC), a handheld computer, a netbook, a personal digital assistant (PDA), a portable multimedia player (PMP), a dedicated media player, an AR (augmented reality)/VR (virtual reality) device, or another type of electronic device. The embodiments of this application do not limit the specific category of the electronic device 100.
The following introduces a human body key point detection method provided by an embodiment of this application based on a scenario in which the user does fitness exercises following a fitness course on the electronic device 100 and the electronic device 100 detects the user's 3D key points.
Multiple fitness courses may be stored in the electronic device 100. Optionally, the electronic device 100 may obtain multiple fitness courses from a cloud server. A fitness course usually includes multiple actions; there may be a preset rest time between two consecutive actions, and any two of the actions may be the same or different. A fitness course may be recommended by the electronic device according to the user's historical fitness data or selected by the user according to actual needs. A fitness course may be played locally or online. None of these are specifically limited here.
A fitness course may include multiple sub-courses, and each sub-course may include one or more consecutive actions of the fitness course. The above sub-courses may be divided according to exercise type, exercise purpose, exercised body part, etc., which is not specifically limited here.
For example, a fitness course includes three sub-courses, in which the first sub-course is a warm-up exercise, the second sub-course is the formal exercise, and the third sub-course is a stretching exercise. Any of the above three sub-courses includes one or more consecutive actions.
In the embodiments of this application, a fitness course may include one or more types of content in the form of video, animation, voice, text, etc., which is not specifically limited here.
Stage 1: starting a fitness course.
As shown in FIG. 6A, FIG. 6A exemplarily shows a user interface 61 on the electronic device 100 for displaying the applications installed on the electronic device 100.
The user interface 61 may include an icon 611 of a fitness application and icons of other applications (such as Email, Gallery and Music). The icon of any application can be used to respond to a user operation, such as a touch operation, so that the electronic device 100 starts the application corresponding to the icon. The user interface 61 may also contain more or less content, which is not limited in the embodiments of this application.
In response to a user operation acting on the fitness icon 611, the electronic device 100 may display the fitness course interface 62 shown in FIG. 6B. The fitness course interface 62 may include an application title bar 621, a function bar 622 and a display area 623. Among them:
The application title bar 621 can be used to indicate that the current page is used to display the setting interface of the electronic device 100. The application title bar 621 may take the form of the text "Intelligent fitness", an icon, or another form.
The function bar 622 may include: a user center control, a course recommendation control, a fat-burning zone control, and a body-shaping zone control. Not limited to the above controls, the function bar 622 may contain more or fewer controls.
In response to a user operation acting on any control in the function bar 622, the electronic device 100 may display the content indicated by the control in the display area 623.
For example, in response to a user operation acting on the user center control, the electronic device 100 may display the interface content of the user's personal center in the display area 623. In response to a user operation acting on the course recommendation control, the electronic device 100 may display one or more recommended fitness courses in the display area 623. As shown in FIG. 6B, the display area 623 displays the course covers of multiple recommended courses. A course cover may include the course classification, duration and name of the corresponding fitness course. In response to a user operation acting on any course cover, the electronic device 100 may start the fitness course corresponding to the course cover and display the exercise content of the fitness course.
The embodiments of this application do not limit any of the user operations mentioned above. For example, the user may also control the electronic device 100 through a remote control to execute corresponding instructions (such as starting the fitness application, starting a fitness course, etc.).
The fitness course interface 62 may also contain more or less content, which is not limited in the embodiments of this application.
In response to a user operation acting on the course cover of any fitness course (such as the fitness course named "Full-body fat burning, beginner"), the electronic device 100 may start the fitness course. During the playing of the fitness course, the electronic device 100 needs to collect images through the camera, so before playing the fitness course, the electronic device 100 may prompt the user that the camera is about to be turned on.
Stage 2: determining the target user and the initial compensation angle, and using the initial compensation angle to correct the position information of the 3D key points.
The target user may refer to the user whose 3D key points need to be detected and whose exercise data needs to be recorded by the electronic device 100 during the playing of the fitness course.
In some embodiments, as shown in FIG. 6C, the electronic device 100 may collect the user's face information to determine the target user who will exercise. Determining the target user helps the electronic device 100 track the user whose key points need to be detected and accurately obtain the target user's exercise data. This can prevent other users within the shooting range of the camera from interfering with the electronic device 100 detecting the target user's key points and obtaining the target user's exercise data.
In the above process of determining the target user, the electronic device 100 may also determine the initial compensation angle. Before the initial compensation angle is updated, the electronic device 100 may use it to correct the position information of the 3D key points determined based on the 2D key points after the fitness course starts playing.
Exemplarily, the electronic device 100 may display a target user determination interface 63. The target user determination interface 63 may include a prompt 631 and a user image 632. The prompt can be used to prompt the user about the operations related to determining the target user and the initial compensation angle. The prompt may be the text "Please keep a standing posture in the area where you will exercise, and point your face at the camera". The embodiments of this application do not limit the form and specific content of the above prompt. The user image 632 is the image of the target user collected by the camera.
The user image collected by the above camera can be used by the electronic device 100 to determine, during the playing of the fitness course, the target user whose key points need to be detected and whose exercise data needs to be recorded. The electronic device 100 may use a target tracking algorithm to track the target user. For the implementation of the above target tracking algorithm, reference may be made to the specific implementation of target tracking algorithms in the prior art, which is not described in detail here.
Prompting the user in the above prompt to keep a standing posture in the area where the exercise will be carried out makes it convenient for the electronic device 100 to determine the initial compensation angle.
Specifically, under the prompt 631, the user keeps a standing posture in the area where he or she is about to exercise while the electronic device 100 determines the target user. The electronic device 100 can collect one or more frames of images containing the user's posture and identify one or more groups of the user's 2D key points from them. Further, based on the above one or more groups of 2D key points, the electronic device 100 can estimate a group of the user's 3D key points. According to the above group of the user's 3D key points, the electronic device 100 can determine whether the user's legs are upright. When determining that the user's legs are upright, the electronic device 100 can calculate the included angle between the image plane and the straight line where the user's leg 3D key points lie. If the included angle is smaller than the preset included angle, the electronic device 100 can determine the included angle as the initial compensation angle. Otherwise, the electronic device 100 can use a default angle as the initial compensation angle. The above default angle may be pre-stored in the electronic device 100 and can be used as a general compensation angle to correct the position information of the 3D key points, reducing the error caused by the perspective deformation of the person in the image in the position information of the 3D key points. The above default angle may be set according to empirical values; the embodiments of this application do not limit its value.
The electronic device 100 determines whether the included angle between the image plane and the straight line where the above 3D key points lie is smaller than the preset included angle, and determines the included angle as the initial compensation angle when it is smaller than the preset included angle. This can prevent an error in calculating the included angle between the image plane and the straight line where the above 3D key points lie from causing the initial compensation angle to take an impossible value. The above preset included angle may be, for example, 45°. The embodiments of this application do not limit the value of the above preset included angle.
When the target user and the initial compensation angle are determined, the electronic device 100 can play the fitness course. During the playing of the fitness course, the electronic device 100 can collect images through the camera and identify the user's 2D key points in the images. Based on the above 2D key points, the electronic device 100 can determine the user's 3D key points. The electronic device 100 can use the initial compensation angle to correct the position information of the above 3D key points.
The electronic device 100 may perform 2D key point detection on every frame of image. For one frame of image, the electronic device 100 can identify one group of 2D key points corresponding to this frame of image. The electronic device 100 can determine one group of 3D key points from this group of 2D key points. This group of 3D key points can be used to determine a human body model, which can reflect the user's posture in this frame of image.
Optionally, the electronic device 100 may perform 2D key point detection on every frame of image. For multiple consecutive frames of images, the electronic device 100 can identify multiple groups of 2D key points corresponding to these consecutive frames. The electronic device 100 can determine one group of 3D key points from these multiple groups of 2D key points. This group of 3D key points can be used to determine a human body model, which can reflect the user's posture in these consecutive frames. Compared with 3D key points determined from the 2D key points identified from one frame of image, 3D key points determined from the 2D key points identified from multiple consecutive frames of images can reflect the user's posture more accurately.
The embodiments of this application do not limit the method of determining the target user. For example, the electronic device 100 may determine the target user by detecting whether the user's action matches a preset action. The above preset action may be curling both arms upward. Before starting to play the fitness course, the electronic device 100 may display, on the target user determination interface 63 shown in FIG. 6C, an example of the both-arms-upward-curl action and a prompt prompting the user to perform it. In addition, the electronic device 100 may perform human body posture detection according to the images collected by the camera. The electronic device 100 may determine the user whose detected posture is the same as the both-arms-upward-curl posture as the target user. When the user performs the action indicated during the above target user determination process (such as curling both arms upward), the legs are upright. Then, in the above target user determination process, the electronic device 100 can also determine the initial compensation angle according to the method of the above embodiment.
The embodiments of this application do not limit the type of the above preset action.
In some embodiments, when playing a fitness course, the electronic device 100 can obtain the actions contained in the fitness course. When playing a fitness course, the electronic device 100 can determine the target user by detecting the user whose actions match the actions in the fitness course. In addition, the electronic device 100 can use the above default angle as the initial compensation angle and use it to correct the position information of the 3D key points determined by the electronic device 100. That is, the electronic device 100 does not have to determine the target user and the initial compensation angle through the method shown in FIG. 6C.
Optionally, if the electronic device 100 detects that the user's legs are upright at the moment a fitness course starts playing or within a period of time (such as 3 seconds) after the fitness course starts playing, the electronic device 100 can determine the above initial compensation angle according to the 3D key points used when detecting that the user's legs are upright. The initial compensation angle is the included angle between the image plane and the straight line where the leg 3D key points among the above 3D key points lie.
Stage 3: updating the compensation angle and using the updated compensation angle to correct the position information of the 3D key points.
Since the electronic device 100 can obtain the actions contained in a fitness course when playing it, the electronic device 100 can determine whether the actions in this fitness course include legs-upright actions (such as a standing action, a front-facing standing action with both arms raised, etc.).
As shown in FIG. 6D, during the playing of the fitness course, the electronic device 100 may display the user interface 64 shown in FIG. 6D. The user interface 64 may include a fitness course window 641 and a user fitness window 642. Among them:
The fitness course window 641 can be used to display the specific content of the fitness course, such as images of the coach performing the actions in the fitness course.
The user fitness window 642 can be used to display the image of the target user collected by the camera in real time.
The embodiments of this application do not limit the layout of the above fitness course window 641 and user fitness window 642 on the user interface 64. The fitness course window 641 and the user fitness window 642 may also contain more content, which is not limited in the embodiments of this application.
When the fitness course plays to moment t1, the action the fitness course instructs the user to complete is a legs-upright action, such as a standing action. As shown in FIG. 6D, the coach's action in the fitness course window 641 is a standing action. The user can perform the standing action following the indication of the coach's action in the fitness course window 641. The electronic device 100 can use the 3D key points obtained from the images collected by the camera around moment t1. The electronic device 100 can determine, according to these 3D key points, whether the user's legs are upright. If the user's legs are upright, the electronic device 100 can calculate the included angle between the image plane and the straight line where the user's leg 3D key points lie. If the included angle is smaller than the preset included angle in the foregoing embodiments, the electronic device 100 can determine the included angle as the compensation angle. The electronic device 100 can use this compensation angle to update the previous compensation angle and use the updated compensation angle to correct the position information of the 3D key points.
When the fitness course instructs the user to complete a legs-upright action for the first time, the compensation angle updated by the compensation angle calculated by the electronic device 100 is the initial compensation angle in the foregoing embodiments. When the fitness course instructs the user to complete legs-upright actions for the second and subsequent times, the compensation angle updated by the compensation angle calculated by the electronic device 100 may be the compensation angle calculated the previous time the electronic device 100 detected that the user's legs were upright.
It can be seen from the above embodiments that the electronic device 100 can detect whether the user's legs are upright when the fitness course instructs the user to complete a legs-upright action. If the user's legs are upright, the electronic device 100 can update the compensation angle used to correct the position information of the 3D key points. The updated compensation angle is the included angle, calculated by the electronic device 100 when the user's legs are upright this time, between the image plane and the straight line where the user's leg 3D key points lie. The updated compensation angle can more accurately reflect the degree of perspective deformation of the person in the image collected by the camera with the user at the current position. That is to say, the electronic device 100 can use the updated compensation angle to correct the position information of the 3D key points. The human body model determined by the corrected 3D key points can more accurately reflect the user's posture and improve the accuracy of posture detection. In this way, in fitness or somatosensory games, the electronic device 100 can more accurately determine whether the user's posture is correct and whether the range of the user's motion meets the requirements, so that the user has a better experience during fitness or somatosensory games.
In some embodiments, during the playing of the fitness course, the electronic device 100 can determine a group of 3D key points from a group of 2D key points identified from one frame of image or from multiple groups of 2D key points identified from multiple consecutive frames of images. This group of 3D key points can determine a human body model. The electronic device 100 can determine, through this group of 3D key points, whether the user's legs are upright. If the user's legs are upright, the electronic device 100 can calculate the included angle between the image plane and the straight line where the user's leg 3D key points lie. If the included angle is smaller than the preset included angle in the foregoing embodiments, the electronic device 100 can update the compensation angle; the updated compensation angle is the above included angle. The electronic device 100 can use the updated compensation angle to correct the position information of the 3D key points.
That is to say, each time the electronic device 100 determines a group of 3D key points, it can use this group of 3D key points to determine whether the compensation angle used to correct the position information of the 3D key points can be updated.
Not limited to the above intelligent fitness scenario, the human body key point detection method provided by the embodiments of this application is also applicable to other scenarios in which posture detection is implemented by detecting 3D key points.
The following introduces a method provided by an embodiment of this application for determining whether the user's legs are upright and for determining the compensation angle when the user's legs are upright.
In some embodiments, the user's legs being upright may mean that both of the user's legs are upright.
When both of the user's legs are upright, the included angle between the thigh and the shin of each leg should be 180° or a value close to 180°. The electronic device 100 can use the 3D key points determined based on the 2D key points to determine whether the included angle between the user's thigh and shin is close to 180°, and thereby determine whether the user's legs are upright.
FIG. 7A exemplarily shows a schematic diagram of a group of 3D key points. As shown in FIG. 7A, the included angle between the straight line where the user's left thigh lies (i.e., the straight line through the left hip point and the left knee point) and the straight line where the left shin lies (i.e., the straight line through the left knee point and the left ankle point) is β1. The included angle between the straight line where the user's right thigh lies (i.e., the straight line through the right hip point and the right knee point) and the straight line where the right shin lies (i.e., the straight line through the right knee point and the right ankle point) is β2. The electronic device 100 can calculate the difference between β1 and 180° and the difference between β2 and 180°. If both differences are smaller than a preset difference, the electronic device 100 can determine that the user's legs are upright. The above preset difference is a value close to 0 and may be set according to experience; the embodiments of this application do not limit its magnitude.
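The check described above can be sketched in Python as follows; the key-point naming and the tolerance on the 180° test are illustrative assumptions, not values fixed by the embodiment.

```python
import numpy as np

def joint_angle(a, b, c) -> float:
    """Angle at b, in degrees, between the segments b->a and b->c."""
    u = np.asarray(a, float) - np.asarray(b, float)
    v = np.asarray(c, float) - np.asarray(b, float)
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def legs_upright(kps: dict, max_dev: float = 10.0) -> bool:
    """kps maps key-point names to 3D coordinates. Both legs count as upright
    when each thigh-shin angle (beta1, beta2) is within max_dev of 180 degrees."""
    beta1 = joint_angle(kps["left_hip"], kps["left_knee"], kps["left_ankle"])
    beta2 = joint_angle(kps["right_hip"], kps["right_knee"], kps["right_ankle"])
    return abs(beta1 - 180.0) < max_dev and abs(beta2 - 180.0) < max_dev
```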
Further, when it is determined that the user's legs are upright, the electronic device 100 can calculate the included angle α1 between the image plane and the straight line through the left hip point and the left ankle point, and the included angle α2 between the image plane and the straight line through the right hip point and the right ankle point. The electronic device 100 can calculate the mean of α1 and α2 to obtain the included angle α shown in FIG. 7B. If the included angle α is smaller than the preset included angle in the foregoing embodiments, the electronic device 100 can determine α as the compensation angle and use it to correct the position information of the 3D key points.
Optionally, when it is determined that the user's legs are upright, the electronic device 100 can determine a first projection line of the straight line through the left hip point and the left ankle point on the y-0-z plane of the 3D coordinate system shown in FIG. 7A, and a second projection line of the straight line through the right hip point and the right ankle point on the y-0-z plane of the 3D coordinate system shown in FIG. 7A. The electronic device 100 can calculate the included angle α3 between the above first projection line and the positive direction of the y axis of the 3D coordinate system shown in FIG. 7A, and the included angle α4 between the above second projection line and the positive direction of the y axis of the 3D coordinate system shown in FIG. 7A. The electronic device 100 can calculate the mean of α3 and α4 to obtain the included angle α shown in FIG. 7B. If the included angle α is smaller than the preset included angle in the foregoing embodiments, the electronic device 100 can determine α as the compensation angle and use it to correct the position information of the 3D key points.
In some embodiments, the user's legs being upright may mean that one of the user's legs is upright. The electronic device 100 can determine, according to the method of the foregoing embodiments, whether either of the user's legs is upright. If one of the user's legs is determined to be upright, the electronic device 100 can determine that the user's legs are upright. Further, the electronic device 100 can calculate the included angle between the image plane and the straight line through the hip point and the ankle point on the upright leg. If the included angle is smaller than the preset included angle in the foregoing embodiments, the electronic device 100 can determine the included angle as the compensation angle and use it to correct the position information of the 3D key points.
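Putting the two leg-angle computations together, a minimal Python sketch for producing the candidate compensation angle α could look like this, again with the image plane taken as the x-y plane of the 3D coordinate system and all names and the 45° preset as illustrative assumptions.

```python
import numpy as np

def line_angle_to_image_plane(p, q) -> float:
    """Angle (degrees) between the image plane and the line through p and q;
    the image plane is perpendicular to the camera optical axis (z)."""
    d = np.asarray(q, float) - np.asarray(p, float)
    return float(np.degrees(np.arcsin(abs(d[2]) / np.linalg.norm(d))))

def leg_compensation_angle(kps: dict, preset_angle: float = 45.0):
    """Mean of the two hip-ankle line angles alpha1 and alpha2; returned as
    the compensation angle only if it passes the preset-angle check,
    otherwise None (the stored compensation angle is kept)."""
    alpha1 = line_angle_to_image_plane(kps["left_hip"], kps["left_ankle"])
    alpha2 = line_angle_to_image_plane(kps["right_hip"], kps["right_ankle"])
    alpha = 0.5 * (alpha1 + alpha2)
    return alpha if alpha < preset_angle else None
```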
The following introduces a method provided by an embodiment of this application for correcting the position information of the 3D key points using the compensation angle α.
The electronic device 100 can correct the position information of the 3D key points according to the following formula (1), which rotates each 3D key point about the x axis by the compensation angle α:
(x', y', z')^T = R_x(α) · (x, y, z)^T, where R_x(α) = [[1, 0, 0], [0, cos α, -sin α], [0, sin α, cos α]]    (1)
Here (x, y, z) is the position information of a 3D key point before correction, which the electronic device 100 estimates based on the 2D key points, and (x', y', z') is the position information of this 3D key point after correction.
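Applying the correction of formula (1) to a whole group of key points amounts to one matrix multiplication per point. The sketch below assumes the rotation is about the x axis with the sign convention shown in formula (1); if the estimated model leans the other way, the sign of α would be flipped.

```python
import numpy as np

def correct_keypoints(points: np.ndarray, alpha_deg: float) -> np.ndarray:
    """points is a (K, 3) array of uncorrected 3D key points; returns the
    points rotated about the x axis by the compensation angle, removing the
    forward tilt caused by perspective deformation."""
    a = np.radians(alpha_deg)
    rot_x = np.array([[1.0, 0.0, 0.0],
                      [0.0, np.cos(a), -np.sin(a)],
                      [0.0, np.sin(a), np.cos(a)]])
    return points @ rot_x.T
```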
FIG. 8 exemplarily shows a flowchart of a human body key point detection method provided by an embodiment of this application.
As shown in FIG. 8, the method may include steps S101-S108. Among them:
S101: The electronic device 100 may collect images through the camera.
In some embodiments, the camera and the electronic device 100 may be integrated. As shown in FIG. 3, the electronic device 100 may include the camera 193.
In other embodiments, the camera and the electronic device 100 may be two separate devices with a communication connection established between them. The camera can send the collected images to the electronic device 100.
In some embodiments, in response to a user operation of starting a fitness course, the electronic device 100 may turn on the camera or send the camera an instruction indicating that it should turn on. As shown in FIG. 6C, in response to a user operation acting on the confirmation control 624A, the electronic device 100 may turn on the camera.
The embodiments of this application do not limit the time at which the camera is turned on. For example, the camera may also be turned on before the electronic device 100 receives the user operation of starting a fitness course.
S102: The electronic device 100 may determine, according to m frames of images collected by the camera, m groups of 2D key points corresponding to these m frames of images.
In some embodiments, the electronic device 100 may perform 2D key point detection on the images collected by the camera during the playing of a fitness course. That is, these m frames of images are collected by the camera during the playing of the fitness course.
The value of m is a positive integer. The electronic device 100 can identify, from one frame of image, a group of the user's 2D key points in this frame of image. The electronic device 100 may use deep learning methods (such as the openpose algorithm) to perform 2D key point detection on the image. The embodiments of this application do not limit the method by which the electronic device 100 identifies 2D key points. For the implementation process of the electronic device 100 identifying 2D key points, reference may be made to implementation processes in the prior art, which are not described in detail here.
S103: The electronic device 100 may estimate, according to the m groups of 2D key points, a group of the user's 3D key points corresponding to these m frames of images.
If the above m is 1, then for each frame of image, the electronic device 100 can identify a group of 2D key points from this frame of image. Further, the electronic device 100 can use the group of 2D key points corresponding to this frame of image to estimate a group of 3D key points corresponding to this frame of image.
If the above m is an integer greater than 1, these m frames of images are multiple consecutive frames of images collected by the camera. The electronic device 100 can estimate a group of 3D key points based on these consecutive frames. Specifically, the electronic device 100 can identify multiple groups of 2D key points from the consecutive frames. Using the multiple groups of 2D key points corresponding to these frames, the electronic device 100 can estimate a group of 3D key points corresponding to these frames.
The embodiments of this application do not limit the method of estimating 3D key points from 2D key points. For the implementation process of the electronic device 100 estimating 3D key points, reference may be made to implementation processes in the prior art, which are not described in detail here.
S104: The electronic device 100 may use the group of 3D key points corresponding to the above m frames of images to detect whether the user's legs are upright.
When a group of 3D key points is determined, the electronic device 100 can detect, according to the position information of this group of 3D key points, whether the user's legs are upright. For the method by which the electronic device 100 detects whether the user's legs are upright, reference may be made to the embodiment shown in FIG. 7A.
If it is detected that the user's legs are upright, the electronic device 100 may perform step S105.
If it is detected that the user's legs are not upright, the electronic device 100 may perform step S108.
S105: If it is detected that the user's legs are upright, the electronic device 100 may determine the included angle between the image plane and the straight line where the leg 3D key points lie in the above group of 3D key points, where the image plane is the plane perpendicular to the optical axis of the camera.
When the user's legs are upright, the included angle between the image plane and the straight line where the user's legs lie should be a value close to 0. The electronic device 100 can determine the degree of perspective deformation of the person in the image collected by the camera by detecting the included angle between the image plane and the straight line where the user's leg 3D key points lie, and thereby correct the position information of the 3D key points.
When the group of 3D key points corresponding to the above m frames of images is obtained and it is detected that the user's legs are upright, the electronic device 100 can determine the included angle between the image plane and the straight line where the leg 3D key points lie in the above group of 3D key points. For the method by which the electronic device 100 calculates the above included angle, reference may be made to the introduction of the embodiment shown in FIG. 7A, which is not repeated here.
S106: The electronic device 100 may determine whether the above included angle is smaller than the preset included angle.
If it is determined that the above included angle is smaller than the preset included angle, the electronic device 100 may perform step S107 to update the compensation angle stored in the electronic device 100.
If it is determined that the above included angle is greater than or equal to the preset included angle, the electronic device 100 may perform step S108 and use the compensation angle it has stored to correct the position information of the group of 3D key points corresponding to the above m frames of images.
There may be errors when the electronic device 100 calculates the above included angle, resulting in an excessively large calculated angle. For example, suppose the above included angle calculated by the electronic device 100 is 70°. When the user's legs are upright, the influence of the perspective deformation of the person in the image on the 3D key points obviously cannot make the included angle between the image plane and the straight line where the user's leg 3D key points lie reach 70°. By setting the preset included angle, the electronic device 100 determines whether the calculated included angle can serve as the compensation angle for correcting the position information of the 3D key points, thereby avoiding an impossible compensation angle value caused by an error in calculating the above included angle and improving the accuracy of detecting 3D key points.
The value of the above preset included angle may be set according to experience; for example, it may be 45°. The embodiments of this application do not limit the value of the above preset included angle.
S107: If the above included angle is smaller than the preset included angle, the electronic device 100 may use the above included angle to update the compensation angle it has stored; after the update, the compensation angle stored in the electronic device 100 is the above included angle.
If the above included angle is smaller than the preset included angle, the above included angle can serve as the compensation angle for correcting the position information of the 3D key points. The electronic device 100 can use the above included angle to update the compensation angle it has stored.
S108: The electronic device 100 may use the compensation angle it has stored to correct the position information of this group of 3D key points.
For the method by which the electronic device 100 corrects the position information of the 3D key points using the compensation angle, reference may be made to the foregoing embodiments.
In some embodiments, the compensation angle stored in the electronic device 100 may be the compensation angle updated in the above step S107. The above updated compensation angle is the included angle between the image plane and the straight line where the leg 3D key points lie in the group of 3D key points corresponding to the above m frames of images.
In other embodiments, the compensation angle stored in the electronic device 100 may be the initial compensation angle in the foregoing embodiments. It can be seen from the foregoing embodiments that the above initial compensation angle may be a default angle pre-stored in the electronic device 100, or it may be the compensation angle calculated when it was detected, before the playing of a fitness course as shown in FIG. 6C, that the user's legs were upright.
That is to say, within the time period T1 from when the electronic device 100 starts playing the above fitness course to when the camera collects the above m frames of images, the electronic device 100 has not updated the above initial compensation angle. The electronic device 100 can use the above initial compensation angle to correct the position information of each group of 3D key points determined within the above time period T1.
In other embodiments, the compensation angle stored in the electronic device 100 may be the compensation angle calculated the last time the user's legs were detected upright within the above time period T1.
That is to say, within the above time period T1, the electronic device 100 detects at least once that the user's legs are upright. When detecting that the user's legs are upright, the electronic device 100 can calculate the included angle between the leg 3D key points and the image plane according to the group of 3D key points determined when the user's legs are upright this time. If the included angle can be used to correct the position information of the 3D key points, the electronic device 100 can update the compensation angle, replacing the compensation angle calculated the previous time the user's legs were upright with this included angle as the new compensation angle. The electronic device 100 can use the stored compensation angle to correct the position information of the 3D key points until the stored compensation angle is updated. The electronic device 100 can use the updated compensation angle to correct the position information of the 3D key points determined after the compensation angle is updated.
In some embodiments, when playing a fitness course, the electronic device 100 determines whether the actions in this fitness course include legs-upright actions. When the fitness course plays to a point instructing the user to complete a legs-upright action, the electronic device 100 can perform step S104 shown in FIG. 8. That is, the electronic device 100 can use the 3D key points determined based on the 2D key points to detect whether the user's legs are upright. When it is detected that the user's legs are upright, the electronic device 100 can determine whether the included angle between the image plane and the straight line where the leg 3D key points lie among the above 3D key points can be used to correct the position information of the 3D key points. If the above included angle can be used to correct the position information of the 3D key points, the electronic device 100 can use the above included angle to update the compensation angle.
When the fitness course plays to a point instructing the user to complete an action in which the legs are not upright, the electronic device 100 can use the currently collected images to determine the user's 2D key points and determine the user's 3D key points based on the above 2D key points. The electronic device 100 can use the most recently updated compensation angle to correct the above 3D key points.
That is to say, the electronic device 100 can determine whether the user's legs are upright by using the 3D key points determined based on the 2D key points at the time the fitness course plays to a point instructing the user to complete a legs-upright action. Then, the electronic device 100 does not have to use every group of 3D key points determined based on 2D key points to determine whether the user's legs are upright and whether the compensation angle needs to be updated. This can save computing resources of the electronic device 100.
The embodiments of this application do not limit the time at which the electronic device 100 updates the compensation angle. In addition to the methods in the above embodiments, the electronic device 100 may also periodically or irregularly detect whether the user's legs are upright and update the compensation angle when it is detected that the user's legs are upright.
It can be seen from the human body key point detection method shown in FIG. 8 that, when detecting that the user's legs are upright, the electronic device 100 can use the included angle between the image plane and the straight line where the leg 3D key points lie among the 3D key points determined based on the 2D key points to determine the degree of perspective deformation of the person in the image collected by the camera. Further, the electronic device 100 can correct the position information of the 3D key points, reduce the error caused by image perspective deformation in the position information of the 3D key points, and improve the accuracy of detecting the position information of the 3D key points. This method may require only one camera, which not only saves cost but also keeps the computational complexity of key point detection low.
In addition, each time it is detected that the user's legs are upright, if the included angle between the image plane and the straight line where the leg 3D key points lie can be used to correct the position information of the 3D key points, the electronic device 100 can update the compensation angle. The updated compensation angle can more accurately reflect the degree of perspective deformation of the person in the image collected by the camera with the user at the current position. This can reduce the influence of changes in the position between the user and the camera on the correction of the position information of the 3D key points and improve the accuracy of the corrected position information of the 3D key points. That is to say, the electronic device 100 can use the updated compensation angle to correct the position information of the 3D key points. The human body model determined by the corrected 3D key points can more accurately reflect the user's posture and improve the accuracy of posture detection. In this way, in fitness or somatosensory games, the electronic device 100 can more accurately determine whether the user's posture is correct and whether the range of the user's motion meets the requirements, so that the user has a better experience during fitness or somatosensory games.
The human body key point detection method in the foregoing embodiments mainly determines the degree of perspective deformation of the person in the image collected by the camera when the user's legs are upright. In practical applications, for example during fitness or somatosensory games, the user does not keep a legs-upright posture all the time; the user may perform actions such as squatting or lying on one side. When the user's legs are not upright, the electronic device 100 can determine the 2D key points from the images collected by the camera and determine one or more groups of 3D key points based on the above 2D key points. According to the method shown in FIG. 8, the electronic device 100 can only use the compensation angle calculated the most recent time the user's legs were detected upright to correct the position information of the one or more groups of 3D key points determined when the user's legs are not upright.
Although the area in which the user exercises is generally not too large and the degree of perspective deformation of the person in the images collected by the camera does not change much, changes in the relative position between the user and the camera and changes in the user's height in the images collected by the camera both affect the degree of perspective deformation of the person in the images. The compensation angle calculated by the electronic device 100 when the user's legs are upright at an earlier moment can hardly reflect accurately the degree of perspective deformation of the person in the images collected by the camera when the user's legs are not upright at a later moment. If the electronic device 100 uses the compensation angle calculated the most recent time the user's legs were detected upright to correct the position information of the one or more groups of 3D key points determined when the user's legs are not upright, the corrected position information of the 3D key points is still not accurate enough.
According to the principle of perspective deformation, with the same camera placement position and angle, the closer the user is to the camera, the greater the degree of perspective deformation of the person in the image collected by the camera (i.e., the larger the compensation angle α shown in FIG. 4B). The smaller the user's height in the image collected by the camera, the greater the degree of perspective deformation of the person in the image (i.e., the larger the compensation angle α shown in FIG. 4B). For example, when the user's action changes from standing to squatting, the user's height in the image collected by the camera becomes smaller, which causes the degree of perspective deformation of the person in the image to become larger. When the distance between the user and the camera becomes larger and the user's height in the image becomes smaller, this also causes the degree of perspective deformation of the person in the image to become larger.
This application provides a human body key point detection method that can determine the degree of perspective deformation of the person in the image in real time without requiring the user to keep the legs upright, and correct the position information of the 3D key points determined based on the 2D key points. Specifically, the electronic device 100 can determine the initial compensation angle. For the method of determining the initial compensation angle, reference may be made to the introduction of the foregoing embodiments, which is not repeated here. According to the change of the user's position in two adjacent frames of images, the electronic device 100 can determine the change amount of the degree of perspective deformation of the person between these two adjacent frames. The sum of the degree of perspective deformation of the person in the former frame and the change amount of the degree of perspective deformation between these two adjacent frames is the degree of perspective deformation of the person in the latter frame. The degree of perspective deformation of the person in the above former frame may be obtained by starting from the above initial compensation angle and accumulating the change amounts of the degree of perspective deformation of the person between each pair of adjacent frames before the above former frame.
It can be seen from the above method that, even if the user's legs are not upright, the electronic device 100 can determine the degree of perspective deformation of the person in the image collected by the camera. The electronic device 100 corrects the position information of the 3D key points according to the degree of perspective deformation of the person in the image determined in real time, which can improve the accuracy of 3D key point detection.
The following introduces in detail a method provided by an embodiment of this application for determining the change amount of the degree of perspective deformation of the person between two adjacent frames of images.
FIG. 9A exemplarily shows a schematic diagram of a scenario in which the user is exercising. As shown in FIG. 9A, the electronic device 100 can collect images through the camera 193 and display the above images on the user interface 910. Between the time the camera 193 collects the (n-1)-th frame image and the time it collects the n-th frame image, the user moves toward the camera 193; that is, the distance between the user and the camera 193 becomes smaller. The above n is an integer greater than 1. The user's position in the (n-1)-th frame image and in the n-th frame image changes.
FIG. 9B and FIG. 9C exemplarily show the (n-1)-th frame image and the n-th frame image collected by the camera 193.
The electronic device 100 can perform human body detection on an image and determine the rectangular frame of the user's body in the image. The body rectangular frame can be used to determine the user's position in the image. The height and width of the body rectangular frame fit the user's height and width in the image, respectively. The electronic device 100 can determine, for each frame of image, the height of the body rectangular frame in the image coordinate system (i.e., the 2D coordinate system x_i-y_i in the foregoing embodiments) and the distance between its lower edge and the x_i axis.
As shown in FIG. 9B, in the (n-1)-th frame image, the distance between the lower edge of the body rectangular frame and the x_i axis is y_{n-1}, and the height of the body rectangular frame is h_{n-1}.
As shown in FIG. 9C, in the n-th frame image, the distance between the lower edge of the body rectangular frame and the x_i axis is y_n, and the height of the body rectangular frame is h_n.
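From the two body rectangular frames, the three inputs consumed later by the compensation angle determination model follow directly; a trivial Python sketch (names are illustrative):

```python
def box_deltas(prev_box, cur_box):
    """Each box is (lower_edge_y, height) in the image coordinate system
    x_i-y_i. Returns (delta_y_n, delta_h_n, h_n) for the later frame."""
    (y_prev, h_prev), (y_cur, h_cur) = prev_box, cur_box
    return y_cur - y_prev, h_cur - h_prev, h_cur
```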
Between two adjacent frames, the change Δy in the distance from the lower edge of the body bounding box to the x_i axis can reflect the change in the relative position between the user and the camera. Δy is the distance in the later frame minus the distance in the earlier frame. If the user approaches the camera, this distance decreases and the perspective distortion of the person in the image increases. The compensation angle α shown in FIG. 4B can represent the degree of perspective distortion of the person in the image. That is, the smaller the distance from the lower edge of the body bounding box to the x_i axis in a frame, the larger the compensation angle α used to correct the position information of the group of 3D key points determined from that frame. The larger the absolute value of Δy, the larger the absolute value of Δα, where Δα is the later compensation angle minus the earlier one. The earlier compensation angle is used to correct the position information of the group of 3D key points determined from the earlier of the two adjacent frames; the later compensation angle is used to correct the position information of the group determined from the later frame.

Here, if Δy < 0, then Δα > 0; that is, if the user approaches the camera, the perspective distortion increases and the later compensation angle exceeds the earlier one. If Δy > 0, then Δα < 0; that is, if the user moves away from the camera, the perspective distortion decreases and the later compensation angle is smaller than the earlier one.
Between two adjacent frames, the change Δh in the height of the body bounding box can reflect the change in the user's height in the image. Δh is the box height in the later frame minus the box height in the earlier frame. If the box height decreases, the perspective distortion of the person in the image increases. That is, the smaller the box height in a frame, the larger the compensation angle α used to correct the position information of the group of 3D key points determined from that frame. The larger the absolute value of Δh, the larger the absolute value of Δα.

Here, if Δh < 0, then Δα > 0; that is, if the user's height in the image decreases, the perspective distortion increases and the later compensation angle exceeds the earlier one. If Δh > 0, then Δα < 0; that is, if the user's height in the image increases, the perspective distortion decreases and the later compensation angle is smaller than the earlier one. A sketch of these box-based deltas follows.
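The sign conventions above can be captured in a small sketch; the BodyBox structure and the pixel units are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class BodyBox:
    y_bottom: float  # distance from the box's lower edge to the x_i axis
    height: float    # box height in the image, in pixels

def box_deltas(prev: BodyBox, curr: BodyBox) -> tuple[float, float]:
    """Δy and Δh are 'later frame minus earlier frame': a user walking
    toward the camera gives Δy < 0 (hence Δα > 0), and a shrinking
    silhouette gives Δh < 0 (hence Δα > 0)."""
    return curr.y_bottom - prev.y_bottom, curr.height - prev.height

dy, dh = box_deltas(BodyBox(120.0, 540.0), BodyBox(95.0, 610.0))
# dy = -25.0 (user moved closer), dh = +70.0 (user appears taller)
```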
Between two adjacent frames, the magnitude of the body-bounding-box height h in the later frame can also affect the change in the degree of perspective distortion between the two frames. Understandably, the smaller the user's height in the image, the greater the perspective distortion of the user in the image. For the same Δh between two adjacent frames, the smaller the user's height in those frames, the larger the change in the degree of perspective distortion between them. In other words, compared with the user's image height varying within a range of larger values, variation within a range of smaller values produces a larger change in the degree of perspective distortion of the person in the image.

Here, the smaller h is, the larger the absolute value of Δα.

Determining the change in the relative position between the user and the camera and the change in the user's height in the captured image is not limited to using the body bounding box; the electronic device 100 may also use other methods to determine the change in relative position, the user's height in the captured image, and the change in that height.
The electronic device 100 may store a compensation-angle determination model. The input of this model may include Δy and Δh between two adjacent frames and the body-bounding-box height h in the later of the two frames; the output may be Δα between those two frames.

The compensation-angle determination model may be, for example, a linear model, a nonlinear model, a neural network model, and so on. The embodiments of this application do not limit the type of the model.

The compensation-angle determination model may be obtained from multiple groups of training samples. These training samples may be determined from images captured by the camera while the user's legs were straight. One group of training samples may include Δy and Δh between two adjacent frames, the body-bounding-box height h in the later frame, and Δα. The Δα of a training sample may be the later compensation angle minus the earlier one, where the earlier compensation angle is used to correct the position information of the group of 3D key points determined from the earlier of the two adjacent frames, and the later compensation angle is used to correct the position information of the group determined from the later frame. Both compensation angles are computed by the method of the embodiment shown in FIG. 7A.

The embodiments of this application do not limit the method of training the compensation-angle determination model; reference may be made to existing training methods for linear models, neural network models, and the like. A least-squares sketch follows.
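As one possible realization, not a model prescribed by this application, a linear compensation-angle model can be fitted by ordinary least squares. The feature set (Δy, Δh, 1/h, bias) and the sample format are assumptions consistent with the description above; using 1/h reflects that the same Δh changes the distortion more when the on-image height is small:

```python
import numpy as np

def fit_delta_alpha_model(samples):
    """Least-squares fit of Δα ≈ w1·Δy + w2·Δh + w3·(1/h) + b.
    `samples` is a list of (delta_y, delta_h, h, delta_alpha) tuples
    built from frames in which the user's legs were straight, so the
    target Δα could be measured directly."""
    X = np.array([[dy, dh, 1.0 / h, 1.0] for dy, dh, h, _ in samples])
    y = np.array([da for *_, da in samples])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def predict_delta_alpha(w, dy, dh, h):
    """Predicted compensation-angle change between two frames."""
    return float(w @ np.array([dy, dh, 1.0 / h, 1.0]))
```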
In some embodiments, the electronic device 100 may determine Δy′ and Δh′ between an earlier-captured frame and a later-captured frame, together with the body-bounding-box height in the later-captured frame. Feeding Δy′, Δh′ and that box height into the compensation-angle determination model, the electronic device 100 can obtain the compensation-angle change between the earlier-captured frame and the later-captured frame. That is, the electronic device 100 is not limited to determining the compensation-angle change between two adjacent frames; it may also determine the change between two frames separated by multiple intermediate frames.

In some embodiments, the electronic device 100 may recognize one group of 2D key points from one frame and, from that group, estimate the group of 3D key points corresponding to that frame; that is, each frame may correspond to one group of 3D key points. The electronic device 100 may determine the change in the degree of perspective distortion of the person between adjacent frames by the method of the foregoing embodiments. That is, for every frame, the electronic device 100 can determine one compensation angle, and it may use the compensation angle corresponding to a frame to correct the position information of the group of 3D key points corresponding to that frame.

In some embodiments, the electronic device 100 may recognize multiple groups of 2D key points from multiple consecutive frames and, from those groups, estimate one group of 3D key points corresponding to those frames. For example, the electronic device 100 may determine one group of 3D key points from k consecutive frames, where k is an integer greater than 1. From the change in the degree of perspective distortion between the previous k frames and the next k frames, the electronic device 100 can determine the compensation angle corresponding to the next k frames. That is, for every k consecutive frames, the electronic device 100 can determine one compensation angle, and it may use the compensation angle corresponding to k consecutive frames to correct the position information of the group of 3D key points corresponding to those frames.
When determining the change in the degree of perspective distortion between the previous k frames and the next k frames, the electronic device 100 may select one frame from each of the two groups of frames. The electronic device 100 may compute, between the two selected frames, the change in the distance from the lower edge of the body bounding box to the x_i axis and the change in the height of the body bounding box. Based on the compensation-angle determination model, the electronic device 100 can then determine the change in the degree of perspective distortion between the previous k frames and the next k frames, as in the sketch below.
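A sketch of picking representatives from two k-frame groups; choosing the last frame of each group is an arbitrary illustrative choice, and detect_box is injected because it stands for the human-detection step described above rather than a concrete API:

```python
def group_box_deltas(prev_frames, next_frames, detect_box, pick=-1):
    """When one group of 3D key points is estimated from k consecutive
    frames, pick one representative frame from each group and compute
    Δy/Δh between the two picks; detect_box(frame) returns an object
    with y_bottom and height attributes."""
    prev_box = detect_box(prev_frames[pick])
    next_box = detect_box(next_frames[pick])
    return (next_box.y_bottom - prev_box.y_bottom,
            next_box.height - prev_box.height)
```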
In some embodiments, after the electronic device 100 determines the compensation angle corresponding to one group of 3D key points by accumulating compensation-angle changes as in the above embodiments, it may smooth that compensation angle and use the smoothed angle to correct the position information of that group. Specifically, the electronic device 100 may obtain the compensation angles corresponding to multiple groups of 3D key points preceding the group in question, and compute a weighted average of the group's own compensation angle and those preceding compensation angles. Here, compensation angles of groups determined closer in time to the group in question may carry larger weights. The embodiments of this application do not specifically limit the weights used when computing this weighted average. The weighted average is the smoothed compensation angle.

Between adjacent frames captured by the camera, the distance from the lower edge of the body bounding box to the x_i axis and the height of the body bounding box vary continuously in each frame. Smoothing the compensation angle as above can reduce abrupt jumps in the computed compensation angle and improve the accuracy of 3D key point detection. A weighted-average sketch follows.
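A sketch of the weighted average; the linearly increasing recency weights are an illustrative default, since this application does not fix the weights:

```python
import numpy as np

def smooth_compensation(history, weights=None):
    """Weighted average of the current compensation angle and the angles
    of the preceding groups of 3D key points.  `history` is ordered
    oldest -> newest, so later (more recent) angles get larger weights."""
    history = np.asarray(history, dtype=float)
    if weights is None:
        weights = np.arange(1, len(history) + 1, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(history * weights) / np.sum(weights))
```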
FIG. 10 exemplarily shows a flowchart of another human-body key point detection method provided by an embodiment of this application.

The following description takes as an example the case in which the electronic device 100 determines one group of 3D key points from each frame. As shown in FIG. 10, the method may include steps S201–S207, where:

S201. The electronic device 100 may capture images through the camera.

For step S201, reference may be made to step S101 of the method shown in FIG. 8.

S202. The electronic device 100 may determine an initial compensation angle.

For how the initial compensation angle is determined, reference may be made to the embodiment shown in FIG. 6C; details are not repeated here.
S203. According to frame n captured by the camera, the electronic device 100 may determine the t_n-th group of 2D key points corresponding to frame n and, according to the t_n-th group of 2D key points, estimate the user's t_n-th group of 3D key points corresponding to frame n.

S204. According to frames n-1 and n captured by the camera, the electronic device 100 may determine the displacement Δy_n of the lower edge of the body bounding box from frame n-1 to frame n, the change Δh_n in the height of the body bounding box, and the height h_n of the body bounding box in frame n.

Here n is an integer greater than 1. Frames n-1 and n are any two adjacent frames among the images captured by the camera.

The electronic device 100 may perform 2D key point detection on the first frame captured by the camera and determine the t_1-th group of 2D key points corresponding to frame 1. According to the t_1-th group of 2D key points, the electronic device 100 may estimate the user's t_1-th group of 3D key points corresponding to frame 1. The electronic device 100 may correct the position information of the t_1-th group of 3D key points with the above initial compensation angle.

The displacement Δy_n of the lower edge of the body bounding box is the change, between frame n-1 and frame n, in the distance from the lower edge of the body bounding box to the x_i axis.

For how the electronic device 100 determines Δy_n and Δh_n, reference may be made to the embodiments shown in FIG. 9B and FIG. 9C; details are not repeated here.
S205. According to Δy_n, Δh_n and h_n, the electronic device 100 may determine the compensation-angle change from the t_{n-1}-th group of 3D key points to the t_n-th group of 3D key points, the t_{n-1}-th group being obtained according to frame n-1 captured by the camera.

The electronic device 100 may use the compensation-angle determination model to determine the compensation-angle change Δα_n from the t_{n-1}-th group to the t_n-th group. Δα_n can reflect the change in the degree of perspective distortion of the person between frame n-1 and frame n.

For how the electronic device 100 obtains the compensation-angle determination model, reference may be made to the foregoing embodiments.

The electronic device 100 may perform 2D key point detection on frame n-1 captured by the camera and determine the t_{n-1}-th group of 2D key points corresponding to frame n-1. According to that group, the electronic device 100 may estimate the user's t_{n-1}-th group of 3D key points corresponding to frame n-1.

S206. According to the t_{n-1}-th compensation angle and the above compensation-angle change, the electronic device 100 may determine the t_n-th compensation angle. The t_{n-1}-th compensation angle is used to correct the position information of the t_{n-1}-th group of 3D key points, and it is the sum of the initial compensation angle and the compensation-angle changes between all adjacent pairs of groups of 3D key points from the t_1-th group to the t_{n-1}-th group.
The electronic device 100 may determine the t_n-th compensation angle according to the following formula (2):

α_n = α_{n-1} + Δα_n        (2)

where α_n is the t_n-th compensation angle, and α_{n-1} is the t_{n-1}-th compensation angle.
Here α_{n-1} satisfies:

α_{n-1} = α_1 + ∑_{i=2}^{n-1} Δα_i

where α_1 is the above initial compensation angle, and ∑_{i=2}^{n-1} Δα_i is the sum of the compensation-angle changes between all adjacent pairs of groups of 3D key points from the t_1-th group to the t_{n-1}-th group. This sum can reflect the accumulated change in the degree of perspective distortion of the person between all pairs of adjacent frames from frame 1 to frame n-1.
In some embodiments, the electronic device 100 may determine the compensation-angle change between an earlier group of 3D key points and a later group of 3D key points, where one or more groups of 3D key points may lie between them. This compensation-angle change may be determined from the user's height and the user-to-camera distance at the time the earlier frame was captured, together with the user's height and the user-to-camera distance at the time the later frame was captured. The user's key points in the earlier frame form the earlier group of 3D key points; the user's key points in the later frame form the later group. The electronic device 100 may determine the later group's compensation angle from the earlier group's compensation angle and the compensation-angle change, i.e., α_n = α_{n-c} + Δα′_n, where α_{n-c} may be the sum of the compensation angle of the t_1-th group of 3D key points and the compensation-angle change from the t_1-th group to the t_{n-c}-th group, and Δα′_n is the compensation-angle change between the earlier group and the later group.

S207. The electronic device 100 may correct the position information of the t_n-th group of 3D key points with the t_n-th compensation angle.

As can be seen from the method shown in FIG. 10, even if the user does not keep the legs straight, the electronic device 100 can still determine in real time the degree of perspective distortion of the person in the image and correct the position information of the 3D key points determined from 2D key points. The electronic device 100 can determine a compensation angle corresponding to every group of 3D key points, and that compensation angle can vary with the relative position between the user and the camera and with the actions the user performs. The compensation angle corresponding to a group of 3D key points can reflect the degree of perspective distortion of the person in the images used to determine that group. Correcting the position information of a group of 3D key points with its own compensation angle can improve the accuracy of 3D key point detection. A sketch of the full per-frame loop follows.
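A sketch of the per-frame loop of steps S203–S207. Every injected callable is a stand-in for a component described in this document (2D/3D estimation, human detection, the compensation-angle model, rotation correction); none of them is a concrete library API:

```python
def run_keypoint_correction(frames, alpha_init,
                            estimate_3d, detect_box,
                            delta_model, rotate_correct):
    """Yield corrected 3D key points for each frame, accumulating the
    compensation angle from bounding-box changes between frames."""
    alpha = alpha_init                    # S202: initial compensation angle
    prev_box = None
    for frame in frames:
        kp3d = estimate_3d(frame)         # S203: 2D key points -> 3D key points
        box = detect_box(frame)           # S204: body bounding box
        if prev_box is not None:
            dy = box.y_bottom - prev_box.y_bottom
            dh = box.height - prev_box.height
            alpha += delta_model(dy, dh, box.height)  # S205–S206: alpha_n = alpha_{n-1} + delta
        prev_box = box
        yield rotate_correct(kp3d, alpha)  # S207: correct this frame's key points
```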
In the method shown in FIG. 10, the compensation-angle change determined by the electronic device 100 with the compensation-angle determination model may carry some error. When determining the t_n-th compensation angle, the electronic device 100 starts from the initial compensation angle and accumulates the compensation-angle changes between all adjacent pairs of groups from the t_1-th group to the t_n-th group of 3D key points, so the errors in those changes accumulate as well. The larger the value of t_n, the larger the error in the t_n-th compensation angle determined by the electronic device 100, which lowers the accuracy of 3D key point detection.

In some embodiments, based on the method shown in FIG. 10, the electronic device 100 may detect periodically or aperiodically whether the user's legs are straight. When the user's legs are detected to be straight, the electronic device 100 may compute the angle between the image plane and the line on which the leg 3D key points lie, in the group of 3D key points determined from 2D key points. If that angle is smaller than the preset angle, the electronic device 100 may take it as the compensation angle corresponding to that group and correct the group's position information with it. Further, starting from that group's compensation angle, the electronic device 100 may determine compensation-angle changes by the method shown in FIG. 10 and thereby determine the compensation angles corresponding to subsequent groups of 3D key points.

If the user's legs are detected to be straight again, the electronic device 100 may take the angle between the image plane and the line on which the leg 3D key points lie, in the group of 3D key points determined from 2D key points at that moment, as the new basis for determining the compensation angles corresponding to subsequent groups.

In this way, the electronic device 100 can reduce the error accumulated when the compensation angle is determined by summing compensation-angle changes. Compared with the method shown in FIG. 10, this embodiment can further reduce the error in computing the degree of perspective distortion of the person in the image and improve the accuracy of 3D key point detection.
FIG. 11 exemplarily shows a flowchart of another human-body key point detection method provided by an embodiment of this application.

The following description again takes as an example the case in which the electronic device 100 determines one group of 3D key points from each frame. As shown in FIG. 11, the method may include steps S301–S310, where:

S301. The electronic device 100 may capture images through the camera.

S302. According to frame n captured by the camera, the electronic device 100 may determine the t_n-th group of 2D key points corresponding to frame n and, according to the t_n-th group of 2D key points, estimate the user's t_n-th group of 3D key points corresponding to frame n.

S303. The electronic device 100 may use the t_n-th group of 3D key points to detect whether the user's legs are straight.

S304. If the user's legs are detected to be straight, the electronic device 100 may determine the angle between the image plane and the line on which the leg 3D key points of the t_n-th group lie, the image plane being the plane perpendicular to the optical axis of the camera.

S305. The electronic device 100 may judge whether the above angle is smaller than the preset angle.

For steps S301–S305, reference may be made to steps S101–S106 of the method shown in FIG. 8; details are not repeated here.
S306. If the above angle is smaller than the preset angle, the electronic device 100 may take it as the t_n-th compensation angle.

S307. The electronic device 100 may correct the position information of the t_n-th group of 3D key points with the t_n-th compensation angle.

Steps S301–S307 cover the case where, upon detecting the user's legs to be straight, the electronic device 100 computes the angle between the image plane and the line on which the leg 3D key points lie, among the 3D key points determined from 2D key points. The electronic device 100 may take this angle as the compensation angle and use it to correct the position information of the 3D key points determined from 2D key points while the user's legs are straight. That is, if the user's legs are detected to be straight, the electronic device 100 need not determine the current compensation angle on the basis of the initial compensation angle or of the compensation angle determined the last time the user's legs were detected to be straight.
S308. If the electronic device 100 detects in step S303 that the user's legs are not straight, or judges in step S305 that the above angle is greater than or equal to the preset angle, the electronic device 100 may determine, according to frames n-1 and n captured by the camera, the displacement Δy_n of the lower edge of the body bounding box from frame n-1 to frame n, the change Δh_n in the height of the body bounding box, and the height h_n of the body bounding box in frame n.

S309. According to Δy_n, Δh_n and h_n, the electronic device 100 may determine the compensation-angle change from the t_{n-1}-th group of 3D key points to the t_n-th group, the t_{n-1}-th group being obtained according to frame n-1 captured by the camera.

For steps S308 and S309, reference may be made to steps S204 and S205 of the method shown in FIG. 10, respectively; details are not repeated here.

S310. According to the t_{n-1}-th compensation angle and the above compensation-angle change, the electronic device 100 may determine the t_n-th compensation angle. The t_{n-1}-th compensation angle is used to correct the position information of the t_{n-1}-th group of 3D key points. The t_{n-1}-th compensation angle is either the angle between the image plane and the line on which the leg 3D key points of the t_{n-1}-th group lie, or the sum of the t_p-th compensation angle and the compensation-angle changes between all adjacent pairs of groups of 3D key points from the t_p-th group to the t_{n-1}-th group, the t_p-th compensation angle being the one computed the last time the electronic device 100 detected the user's legs to be straight before the camera captured frame n-1.
The electronic device 100 may determine the t_n-th compensation angle according to the foregoing formula (2), α_n = α_{n-1} + Δα_n, where α_n is the t_n-th compensation angle and α_{n-1} is the t_{n-1}-th compensation angle. The way α_{n-1} is computed differs from that of the method shown in FIG. 10.

Specifically, if the electronic device 100 detects, using the t_{n-1}-th group of 3D key points, that the user's legs are straight, and the angle between the image plane and the line on which the leg 3D key points of the t_{n-1}-th group lie is smaller than the preset angle, then the t_{n-1}-th compensation angle α_{n-1} may be that angle.
Otherwise:

α_{n-1} = α_p + ∑_{i=p+1}^{n-1} Δα_i

where p is a positive integer smaller than n-1; that is, frame p is a frame captured by the camera before frame n-1. α_p is the t_p-th compensation angle, usable for correcting the position information of the t_p-th group of 3D key points. α_p was computed the last time the electronic device 100 detected the user's legs to be straight before the camera captured frame n-1; that is, α_p is the angle between the image plane and the line on which the leg 3D key points of the t_p-th group lie, and that angle is smaller than the preset angle. The term ∑_{i=p+1}^{n-1} Δα_i is the sum of the compensation-angle changes between all adjacent pairs of groups of 3D key points from the t_p-th group to the t_{n-1}-th group. It can reflect the accumulated change in the degree of perspective distortion of the person between all pairs of adjacent frames from frame p to frame n-1.
Having determined the t_n-th compensation angle α_n, the electronic device 100 may perform step S307, i.e., correct the position information of the t_n-th group of 3D key points with α_n.

In some embodiments, the electronic device 100 may determine the compensation-angle change between an earlier group of 3D key points and a later group of 3D key points, where one or more groups may lie between them. This compensation-angle change may be determined from the user's height and the user-to-camera distance at the time the earlier frame was captured, together with the user's height and the user-to-camera distance at the time the later frame was captured. The user's key points in the earlier frame form the earlier group of 3D key points; the user's key points in the later frame form the later group. The electronic device 100 may determine the later group's compensation angle from the earlier group's compensation angle and the compensation-angle change.

Optionally, the earlier frame may be one captured when the electronic device 100 detected the user's legs to be straight. For example, the earlier frame may be the one captured the last time the electronic device 100 detected the user's legs to be straight before the later frame was captured.
In the method shown in FIG. 11, the electronic device 100 need not detect, for every group of 3D key points, whether the user's legs are straight. Optionally, the electronic device 100 may detect leg straightness periodically or aperiodically. Alternatively, the electronic device 100 may monitor whether the user's legs are straight when the fitness course reaches a point instructing the user to perform a legs-straight action. At such a point, the electronic device 100 may determine the user's 3D key points from the currently captured image, determine the compensation angle according to steps S308–S310 in FIG. 11, and correct those 3D key points with it.

As can be seen from the method shown in FIG. 11, when the user's legs are detected to be straight, the electronic device 100 may determine the compensation angle directly by the method shown in FIG. 8. Otherwise, the electronic device 100 may determine the compensation angle by the method shown in FIG. 10, starting from the initial compensation angle or from the compensation angle determined the last time the user's legs were detected to be straight. This reduces the error accumulated when determining the compensation angle by summing compensation-angle changes. Compared with the methods shown in FIG. 8 and FIG. 10, this embodiment can further reduce the error in computing the degree of perspective distortion of the person in the image and improve the accuracy of 3D key point detection. A sketch of this hybrid update follows.
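A sketch of the hybrid update, with a hypothetical 45° usability threshold; the measured leg-line angle is passed in only when the legs were detected straight:

```python
def hybrid_alpha_update(alpha_prev: float, delta_alpha: float,
                        straight_angle: float | None = None,
                        max_angle: float = 45.0) -> float:
    """FIG. 11 style update: if the legs were detected straight and the
    measured leg-line/image-plane angle is usable (< max_angle),
    re-anchor the compensation angle to it, discarding accumulated
    error; otherwise keep accumulating the per-frame change."""
    if straight_angle is not None and straight_angle < max_angle:
        return straight_angle          # S306: direct measurement
    return alpha_prev + delta_alpha    # S310: accumulated estimate
```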
The method of determining the compensation angle is not limited to the case, described in the foregoing embodiments, where the user's legs are detected to be straight; the electronic device 100 may also determine the compensation angle when the user's posture is detected to match a preset posture. The preset posture may be a posture with the upper body upright and/or the legs straight. That is, detecting whether the user's legs are straight is one implementation of detecting whether the user's posture matches a preset posture.

If the user's upper body is detected to be upright, the electronic device 100 may take the angle between the image plane and the line on which the upper-body 3D key points lie as the compensation angle, and correct the 3D key points with it.

If both the user's upper body and legs are detected to be upright, the electronic device 100 may take the angle between the image plane and the line on which the upper-body 3D key points and/or the leg 3D key points lie as the compensation angle, and correct the 3D key points with it.
FIG. 12 exemplarily shows a position distribution diagram of key points of another human body. As shown in FIG. 12, the key points of the human body may include a head point, a first neck point, a second neck point, a left shoulder point, a right shoulder point, a right elbow point, a left elbow point, a right hand point, a left hand point, a first chest-abdomen point, a second chest-abdomen point, a third chest-abdomen point, a right hip point, a left hip point, a mid-hip point, a right knee point, a left knee point, a right ankle point, and a left ankle point. The key points are not limited to the above; other key points may also be included in the embodiments of this application, which is not specifically limited here.

Compared with the key points shown in FIG. 1, the electronic device 100 can recognize more key points of the human body. The electronic device 100 may recognize, from the images captured by the camera, the position information of the key points shown in FIG. 12 on the 2D plane and in 3D space, determining the 2D key points and 3D key points corresponding to the key points shown in FIG. 12.
FIG. 13 exemplarily shows a flowchart of another human-body key point detection method provided by an embodiment of this application.

As shown in FIG. 13, the method may include steps S401–S408, where:

S401. The electronic device 100 may capture images through the camera.

S402. According to m frames captured by the camera, the electronic device 100 may determine the m groups of 2D key points corresponding to these m frames.

S403. According to the m groups of 2D key points, the electronic device 100 may estimate the user's group of 3D key points corresponding to these m frames.

For steps S401–S403, reference may be made to steps S101–S103 in FIG. 8; details are not repeated here.
S404. The electronic device 100 may use the group of 3D key points corresponding to the above m frames to determine whether the user's posture matches a preset posture, the preset posture being a posture with the upper body upright and/or the legs straight.

In some embodiments, the above m frames may be any m frames captured by the camera; that is, the electronic device 100 may judge, for every group of 3D key points, whether the posture determined by that group matches the preset posture. If the user's posture matches the preset posture, the electronic device 100 may perform step S405 below. If the user's posture does not match the preset posture, the electronic device 100 may perform step S408 below.
In some embodiments, the preset posture may be a posture contained in a fitness course or motion-sensing game. The electronic device 100 may obtain the actions the fitness course or motion-sensing game instructs the user to perform. The electronic device 100 may store the moment at which the preset posture is played in the fitness course or motion-sensing game, together with the 3D key points corresponding to the preset posture. When playback of the fitness course or motion-sensing game reaches the moment of the preset posture, the electronic device 100 may compare the group of 3D key points corresponding to the above m frames with the 3D key points corresponding to the preset posture. If the user's posture indicated by the m frames' group of 3D key points matches the preset posture, the electronic device 100 may perform step S405 below. The above m frames are those captured by the camera when playback of the fitness course or motion-sensing game reaches the moment of the preset posture.

Optionally, the electronic device 100 may store only the moment at which the preset posture is played in the fitness course or motion-sensing game. When playback reaches that moment, the electronic device 100 may obtain the 3D key points corresponding to the preset posture. Further, the electronic device 100 may compare the m frames' group of 3D key points with the 3D key points corresponding to the preset posture.

In some embodiments, the above m frames are captured by the camera at moments other than that of the preset posture in the fitness course or motion-sensing game. The electronic device 100 may correct the 3D key points determined from those m frames with the compensation angle determined the last time playback reached the preset posture. That is, the electronic device 100 need not judge for every group of 3D key points whether the user's posture matches the preset posture, which saves computing resources of the electronic device 100.
The preset posture is not limited to those contained in the above fitness courses or motion-sensing games; it may also be a posture contained in the action library of another application. The action library may store information instructing the user to perform each action (such as image information of the action, audio information of the action, and so on).

In some embodiments, during a period before a fitness course or motion-sensing game starts playing, or during its playback, the electronic device 100 may instruct the user to perform the action corresponding to the preset posture; that is, the preset posture need not be contained in the fitness course or motion-sensing game. While applications such as fitness courses or motion-sensing games run, the electronic device 100 may instruct the user, at regular or irregular intervals, to perform the action corresponding to the preset posture. When instructing the user to perform that action, the electronic device 100 may compare the m frames' group of 3D key points with the group of 3D key points corresponding to the preset posture. If the user's posture indicated by the m frames' group matches the preset posture, the electronic device 100 may perform step S405 below. The above m frames are those captured by the camera while the electronic device 100 instructs the user to perform the action corresponding to the preset posture.
Optionally, when judging whether the user's posture matches the preset posture, the electronic device 100 may compare only part of a group of 3D key points (such as the upper-body key points or the leg key points). Exemplarily, if the preset posture is an upper-body-upright posture, the electronic device 100 may compare the upper-body 3D key points of the user's group of 3D key points with the upper-body 3D key points of the group corresponding to the preset posture. If the position information of the two sets of upper-body 3D key points is identical, or differs by less than a threshold, the electronic device 100 may judge that the user's posture matches the preset posture. If the preset posture is a legs-straight posture, the electronic device 100 may compare the leg 3D key points of the user's group with the leg 3D key points of the group corresponding to the preset posture, and likewise judge a match when the position information is identical or differs by less than a threshold. A sketch of this subset comparison follows.
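A sketch of the subset comparison; the per-point tolerance and the coordinate units are assumptions, since this application does not fix the threshold:

```python
import numpy as np

def pose_matches(user_kp, preset_kp, indices, tol=0.05):
    """Compare only the 3D key points relevant to the preset posture,
    e.g. the upper-body points for an 'upper body upright' posture.
    `indices` selects those points from an (N, 3) array; `tol` is an
    assumed per-point distance threshold."""
    user_kp = np.asarray(user_kp, dtype=float)
    preset_kp = np.asarray(preset_kp, dtype=float)
    diff = np.linalg.norm(user_kp[indices] - preset_kp[indices], axis=1)
    return bool(np.all(diff < tol))
```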
S405. The electronic device 100 may determine the angle between the image plane and the line on which some of the 3D key points of the above group lie, where those key points include the upper-body 3D key points and/or the leg 3D key points, and the image plane is the plane perpendicular to the optical axis of the camera.

In some embodiments, the preset posture is an upper-body-upright posture. The electronic device 100 detecting that the user's posture matches this preset posture can indicate that, within the group of 3D key points corresponding to the above m frames, the neck points (the first neck point and second neck point shown in FIG. 12) and the chest-abdomen points (the first, second and third chest-abdomen points shown in FIG. 12) lie approximately on one line. The electronic device 100 may determine the angle between the image plane and the line through any two 3D key points among the first neck point, second neck point, first chest-abdomen point, second chest-abdomen point and third chest-abdomen point of the group. For example, the electronic device 100 may determine the angle between the image plane and the line through the first neck point and the third chest-abdomen point of the group.

Optionally, upon detecting that the user's posture matches the preset posture, the electronic device 100 may compute the mean of the angles between the image plane and multiple lines, each through any two 3D key points among the first neck point, second neck point, first chest-abdomen point, second chest-abdomen point and third chest-abdomen point.
In some embodiments, the preset posture is a legs-straight posture. The electronic device 100 may determine the angle between the image plane and the line on which the leg 3D key points lie according to the method of the foregoing embodiments; details are not repeated here.

In some embodiments, the preset posture is a posture with the upper body upright and the legs straight. The electronic device 100 detecting that the user's posture matches this preset posture can indicate that, within the group of 3D key points corresponding to the above m frames, the first neck point, second neck point, first chest-abdomen point, second chest-abdomen point, third chest-abdomen point, right hip point, right knee point, right ankle point, left hip point, left knee point and left ankle point lie approximately in one plane. The electronic device 100 may determine the angle between the image plane and the line through any two 3D key points among the first neck point, second neck point, first to third chest-abdomen points, right (or left) hip point, right (or left) knee point and right (or left) ankle point of the group. For example, the electronic device 100 may determine the angle between the image plane and the line through the first neck point and the right ankle point of the group.

Optionally, upon detecting that the user's posture matches the preset posture, the electronic device 100 may compute the mean of the angles between the image plane and multiple lines, each through any two 3D key points among the first neck point, second neck point, first to third chest-abdomen points, right (or left) hip point, right (or left) knee point and right (or left) ankle point. A sketch of this averaging follows.
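A sketch of the averaging option in step S405, with the z-axis taken along the optical axis as in the earlier sketches:

```python
import numpy as np
from itertools import combinations

def mean_line_plane_angle(points_3d):
    """Average, over every pair of the selected 3D key points, of the
    angle (degrees) between the line through that pair and the image
    plane (the plane perpendicular to the optical axis)."""
    angles = []
    for p, q in combinations(points_3d, 2):
        d = np.asarray(q, dtype=float) - np.asarray(p, dtype=float)
        angles.append(np.degrees(np.arctan2(abs(d[2]), np.linalg.norm(d[:2]))))
    return float(np.mean(angles))
```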
S406. The electronic device 100 may judge whether the above angle is smaller than the preset angle.

S407. If the above angle is smaller than the preset angle, the electronic device 100 may update its stored compensation angle with the above angle; after the update, the compensation angle stored by the electronic device 100 is the above angle.

S408. The electronic device 100 may correct the position information of this group of 3D key points with its stored compensation angle.

For steps S406–S408, reference may be made to steps S106–S108 in FIG. 8; details are not repeated here.
The method of determining the initial compensation angle in the flowchart shown in FIG. 10 may be the method shown in FIG. 13. In the flowchart shown in FIG. 11, determining the compensation angle is likewise not limited to the case where the user's legs are detected to be straight; the electronic device 100 may also determine the compensation angle by the method shown in FIG. 13.

As can be seen from the method shown in FIG. 13, when the user's posture is detected to match the preset posture, the electronic device 100 may use the 3D key points determined from the image to determine the degree of perspective distortion of the person in the image. The preset posture may be a posture with the upper body upright and/or the legs straight. That is, whenever the user's upper body is upright and/or the user's legs are straight, the electronic device 100 can use the 3D key points determined from the image to determine the degree of perspective distortion of the person in the image. The electronic device 100 may then correct the position information of the 3D key points determined from the image, reducing the error that image perspective distortion introduces into that position information and improving the accuracy of 3D key point detection. The method requires only one camera, which not only saves cost but also keeps the computational complexity of key point detection low.

In addition, each time the user's posture is detected to match the preset posture, if the angle between the image plane and the line on which the upper-body 3D key points and/or leg 3D key points lie is usable for correcting the position information of 3D key points, the electronic device 100 may update the compensation angle. The updated compensation angle can more accurately reflect the degree of perspective distortion of the person in the captured image at the user's current position. This reduces the impact of changes in the position between the user and the camera on the correction, and improves the accuracy of the corrected position information of the 3D key points. That is, the electronic device 100 may correct the position information of 3D key points with the updated compensation angle. A human body model determined from the corrected 3D key points can reflect the user's posture more accurately, improving the accuracy of posture detection. In this way, during fitness or motion-sensing games, the electronic device 100 can judge more accurately whether the user's posture is correct and whether the range of the user's motion meets requirements, giving the user a better experience.
In the embodiments of this application, the electronic device may determine a first moment according to first multimedia information, the first moment being the moment at which the first multimedia information instructs the user to perform an action satisfying a first condition. The first multimedia information may be related content in a fitness course or motion-sensing game application, and may include one or more types of content among video, animation, voice, text and the like. The action satisfying the first condition may be an upper-body-upright action and/or a legs-straight action. The electronic device may judge whether the user performs an action satisfying the first condition by comparing the user's 3D key points with the 3D key points corresponding to that action. Optionally, if the action satisfying the first condition is a legs-straight action, the electronic device may judge whether the user performs it according to the method shown in FIG. 7A.

In the embodiments of this application, the electronic device may determine the compensation-angle change according to a first model; the first model is the compensation-angle determination model of the foregoing embodiments.

The above embodiments are merely intended to describe the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, persons of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements to some technical features thereof, and such modifications or replacements do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of this application.

Claims (18)

  1. A human-body key point detection method, wherein the method is applied to an electronic device comprising one or more cameras, and the method comprises:
    obtaining a first image of a first user through the camera;
    determining a first group of 3D key points of the first user according to the first image;
    judging whether a plurality of 3D key points in the first group of 3D key points satisfy a first condition;
    if the first condition is satisfied, determining a first compensation angle according to the plurality of 3D key points;
    performing rotation correction on the first group of 3D key points by using the first compensation angle.
  2. The method according to claim 1, wherein the method further comprises:
    if the first condition is not satisfied, performing rotation correction on the first group of 3D key points by using a second compensation angle, wherein the second compensation angle is determined according to a second group of 3D key points, and the second group of 3D key points is the most recent group of 3D key points satisfying the first condition before the first image is obtained.
  3. The method according to claim 1 or 2, wherein the obtaining a first image of a first user through the camera specifically comprises:
    determining a first moment according to first multimedia information, wherein the first moment is a moment at which the first multimedia information instructs the user to perform an action satisfying the first condition;
    obtaining, through the camera, the first image of the first user within a first time period starting from the first moment.
  4. The method according to claim 3, wherein the method further comprises:
    if, at a second moment, the 3D key points corresponding to the action the multimedia information instructs the user to perform do not satisfy the first condition, performing rotation correction, by using a third compensation angle, on the 3D key points determined from images captured within a second time period starting from the second moment, wherein the third compensation angle is determined according to a third group of 3D key points, and the third group of 3D key points is the most recent group of 3D key points satisfying the first condition before the second moment.
  5. The method according to any one of claims 1 to 4, wherein the judging whether a plurality of 3D key points in the first group of 3D key points satisfy a first condition specifically comprises:
    judging whether the plurality of 3D key points in the first group of 3D key points match 3D key points corresponding to a first action, wherein the first action is an action in which at least one of the upper body and the legs is upright.
  6. The method according to claim 5, wherein the first action is an action in which the upper body is upright, and the first compensation angle is the angle between the image plane and the line on which a neck point and a chest-abdomen point in the first group of 3D key points lie.
  7. The method according to claim 5, wherein the first action is an action in which the legs are straight, and the first compensation angle is the angle between the image plane and the line on which any two 3D key points among a hip point, a knee point and an ankle point in the first group of 3D key points lie.
  8. The method according to any one of claims 1 to 4, wherein the plurality of 3D key points in the first group of 3D key points comprise hip points, knee points and ankle points, and the judging whether a plurality of 3D key points in the first group of 3D key points satisfy a first condition specifically comprises:
    computing a first angle between the line on which the left hip point and the left knee point lie and the line on which the left knee point and the left foot point lie in the first group of 3D key points, and a second angle between the line on which the right hip point and the right knee point lie and the line on which the right knee point and the right foot point lie in the first group of 3D key points;
    judging whether the plurality of 3D key points in the first group of 3D key points satisfy the first condition by detecting whether the difference between the first angle and 180° is smaller than a first difference and whether the difference between the second angle and 180° is smaller than the first difference.
  9. The method according to claim 8, wherein the case in which the plurality of 3D key points in the first group of 3D key points satisfy the first condition comprises: the difference between the first angle and 180° is smaller than the first difference and/or the difference between the second angle and 180° is smaller than the first difference.
  10. The method according to any one of claims 2 to 9, wherein the second compensation angle is the angle between the image plane and the line on which upper-body 3D key points and/or leg 3D key points in the second group of 3D key points lie.
  11. The method according to any one of claims 4 to 10, wherein the third compensation angle is the angle between the image plane and the line on which upper-body 3D key points and/or leg 3D key points in the third group of 3D key points lie.
  12. The method according to any one of claims 2 to 9, wherein the second compensation angle is the sum of a first compensation-angle change and a third angle; the third angle is the angle between the image plane and the line on which upper-body 3D key points and/or leg 3D key points in the second group of 3D key points lie; the first compensation-angle change is determined according to a first height H1, a first distance Y1, a second height H2 and a second distance Y2, wherein H1 and Y1 are respectively the height of the first user and the distance between the first user and the camera when the first image is captured, and H2 and Y2 are respectively the height of the first user and the distance between the first user and the camera when a second image is captured; the key points of the first user in the first image are the first group of 3D key points, and the key points of the first user in the second image are the second group of 3D key points;
    wherein, within the time period from capturing the second image to capturing the first image, the more the height of the first user decreases, the more the second compensation angle increases relative to the third angle; the more the distance between the first user and the camera decreases, the more the second compensation angle increases relative to the third angle; and the smaller H1 is, the more the second compensation angle increases relative to the third angle.
  13. The method according to any one of claims 4 to 9 or 12, wherein the third compensation angle is the sum of a second compensation-angle change and a fourth angle; the fourth angle is the angle between the image plane and the line on which upper-body 3D key points and/or leg 3D key points in the third group of 3D key points lie; the second compensation-angle change is determined according to a third height H3, a third distance Y3, a fourth height H4 and a fourth distance Y4, wherein H3 and Y3 are respectively the height of the first user and the distance between the first user and the camera within the second time period, and H4 and Y4 are respectively the height of the first user and the distance between the first user and the camera when a third image is captured; the key points of the first user in the third image are the third group of 3D key points;
    wherein, within the time period from capturing the second image to the second time period, the more the height of the first user decreases, the more the third compensation angle increases relative to the fourth angle; the more the distance between the first user and the camera decreases, the more the third compensation angle increases relative to the fourth angle; and the smaller H3 is, the more the third compensation angle increases relative to the fourth angle.
  14. The method according to claim 13, wherein the first compensation-angle change and the second compensation-angle change are both determined by a first model, and the first model is obtained by training with multiple groups of training samples, one group of the training samples comprising: the change, between an earlier-captured image and a later-captured image, in the lower edge of the position of the human body in the image; the change in the height of the human body in the image; the height of the human body in the later-captured image; and the change in the angle between the image plane and the line on which upper-body 3D key points and/or leg 3D key points lie, in the two groups of 3D key points determined from the earlier-captured image and the later-captured image; wherein the plurality of 3D key points in each of the two groups of 3D key points satisfy the first condition.
  15. The method according to any one of claims 2 to 14, wherein, before the determining a first compensation angle according to the plurality of 3D key points, the method further comprises:
    judging that the angle between the image plane and the line on which upper-body 3D key points and/or leg 3D key points in the first group of 3D key points lie is smaller than a fifth angle;
    if the angle between the image plane and the line on which upper-body 3D key points and/or leg 3D key points in the first group of 3D key points lie is greater than the fifth angle, performing rotation correction on the first group of 3D key points by using the second compensation angle.
  16. An electronic device, wherein the electronic device comprises a camera, a display, a memory and a processor, wherein the camera is configured to capture images, the memory is configured to store a computer program, and the processor is configured to invoke the computer program so that the electronic device performs the method according to any one of claims 1 to 15.
  17. A computer storage medium, comprising computer instructions, wherein, when the computer instructions are run on an electronic device, the electronic device is caused to perform the method according to any one of claims 1 to 15.
  18. A computer program product, wherein, when the computer program product is run on an electronic device, the electronic device is caused to perform the method according to any one of claims 1 to 15.