CN118230352A - Human body posture estimation method, device and system - Google Patents

Publication number: CN118230352A
Authority: CN (China)
Prior art keywords: color image, human body, image, target object, depth
Legal status: Pending
Application number: CN202410246305.4A
Other languages: Chinese (zh)
Inventors: 陈训教, 郑新莹, 陈侠达
Current Assignee: Orbbec Inc
Original Assignee: Orbbec Inc
Application filed by Orbbec Inc
Priority to CN202410246305.4A
Publication of CN118230352A

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T — CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 — Road transport of goods or passengers
    • Y02T 10/10 — Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 — Engine management systems

Landscapes

  • Image Analysis (AREA)

Abstract

The application belongs to the technical field of computer vision and provides a human body posture estimation method, device and system. The method comprises the following steps: acquiring a first color image and a first depth image corresponding to the first color image, wherein the first color image and the first depth image both include a target object; processing the first color image to obtain a first human body grid model of the target object; constructing a first point cloud corresponding to the target object according to the first depth image; and performing global rigid registration between the first point cloud and a first grid point set of the first human body grid model, and optimizing the first human body grid model using the registered first point cloud, so that a first human body posture of the target object is obtained through the optimized first human body grid model. The method and device can improve the accuracy of human body posture estimation and obtain a human body posture closer to the true posture of the target object.

Description

Human body posture estimation method, device and system
Technical Field
The application belongs to the technical field of computer vision, and particularly relates to a human body posture estimation method, device and system.
Background
Human body posture estimation refers to analyzing and identifying human body postures in images or videos using computer vision techniques, so as to determine the position and posture of each part of the human body. Human body posture estimation is widely used in fields such as computer vision, robotics and human-computer interaction, for example in motion capture and character animation, human behavior analysis, fitness training assistance and medical rehabilitation.
In the related art, the three-dimensional posture of a human body is estimated from a single color image (e.g., an RGB image) using deep learning or a posture estimation network. On the one hand, this approach requires large-scale annotation of training data, so data acquisition and labeling are difficult and costly; on the other hand, human postures are diverse and ambiguous, and the viewing angle of a single color image is limited, so comprehensive posture information cannot be obtained from a single color image, and the accuracy of three-dimensional human posture estimation is therefore poor.
Disclosure of Invention
The embodiments of the application provide a human body posture estimation method, device and system, which can improve the accuracy of human body posture estimation and obtain a human body posture closer to the true posture of the target object.
In a first aspect, an embodiment of the present application provides a method for estimating a human body posture, including: first, acquiring a first color image and a first depth image corresponding to the first color image, wherein the first color image and the first depth image both include a target object; next, processing the first color image to obtain a first human body grid model of the target object; then, constructing a first point cloud corresponding to the target object according to the first depth image; and finally, performing global rigid registration between the first point cloud and a first grid point set of the first human body grid model, and optimizing the first grid point set of the first human body grid model using the registered first point cloud, so as to obtain the first human body posture of the target object through the optimized first human body grid model.
In a second aspect, an embodiment of the present application provides a device for estimating a human body posture, including: an image acquisition module, a grid model acquisition module, a point cloud construction module, a first registration module and a human body posture reconstruction module. The image acquisition module is configured to acquire a first color image and a first depth image corresponding to the first color image, wherein the first color image and the first depth image both include a target object; the grid model acquisition module is configured to process the first color image to obtain a first human body grid model of the target object; the point cloud construction module is configured to construct a first point cloud corresponding to the target object according to the first depth image; the first registration module is configured to perform global rigid registration between the first point cloud and a first grid point set of the first human body grid model; and the human body posture reconstruction module is configured to optimize the first grid point set of the first human body grid model using the registered first point cloud, so as to obtain the first human body posture of the target object through the optimized first human body grid model.
In a third aspect, an embodiment of the present application provides a system for estimating a human body posture, including: the image acquisition device is used for acquiring color images and depth images of the target object; a processing device for processing the color image and the depth image according to the human body posture estimation method as in any one of the above first aspect to obtain the human body posture of the target object.
In a fourth aspect, an embodiment of the present application provides a computer readable storage medium storing a computer program, which when executed by a processor, implements the steps of the method for estimating a human body posture according to any of the first aspects described above.
In a fifth aspect, an embodiment of the present application provides a computer program product, which when run on an electronic device, causes the electronic device to perform the method of estimating a human body posture according to any of the first aspects described above.
It will be appreciated that the advantages of the second to fifth aspects may be found in the relevant description of the first aspect, and are not described here again.
When the human body posture estimation method provided by the embodiments of the application is used, first, a first color image of a target object and a first depth image corresponding to the first color image are acquired, the human body posture is estimated using the first color image, and a first human body grid model corresponding to the target object is generated. Then, a first point cloud of the target object is constructed according to the first depth image. Finally, global rigid registration is performed on the grid points of the first human body grid model using the first point cloud, which reflects the true posture of the target object whose human body posture is to be reconstructed, and the first human body grid model is adjusted based on the registered first point cloud, so that it fits the true posture of the target object more closely. In this way, the point cloud constructed from the depth image is used to register the grid points of the human body grid model constructed from the color image, which overcomes the problem that comprehensive human body posture information cannot be acquired from a single color image with a limited viewing angle; the accuracy of human body posture estimation is thereby improved, and a human body posture closer to the true posture of the target object is obtained. Furthermore, this embodiment does not require large-scale annotation of training data, which reduces both data-processing difficulty and cost.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments or the description of the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic structural diagram of a human body posture estimation system according to an embodiment of the present application;
fig. 2 is a flow chart of a method for estimating a human body posture according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a human body posture estimating apparatus according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
Furthermore, the terms "first," "second," "third," and the like in the description of the present specification and in the appended claims, are used for distinguishing between descriptions and not necessarily for indicating or implying a relative importance.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
In the related art, the three-dimensional posture of a human body is estimated from a single color image (e.g., an RGB image) using deep learning or a posture estimation network. First, this approach requires large-scale annotation of training data, so data acquisition and labeling are difficult and costly. Second, because the viewing angle of a single color image is limited, comprehensive human posture information cannot be acquired from it, so the accuracy of three-dimensional posture estimation is poor. Third, because human postures are diverse and ambiguous, the image quality, viewing angle and occlusion conditions of the color image all affect the accuracy of three-dimensional posture estimation, which is therefore difficult to guarantee.
In another prior-art approach, two-dimensional joint point information of a human body is extracted from a color image, and the posture is recovered using the correspondence between the two-dimensional joint points and the three-dimensional posture, so as to estimate the three-dimensional posture of the human body. However, this method needs to label each color image with its corresponding three-dimensional posture; that is, it still requires large-scale annotation of training data, and data acquisition and labeling remain difficult and costly.
In order to solve the above technical problems, an embodiment of the present application provides a method for estimating a human body posture. When estimating the human body posture, first, a first color image of a target object and a first depth image corresponding to the first color image are acquired, the human body posture is estimated using the first color image, and a first human body grid model corresponding to the target object is generated; then, a first point cloud of the target object is constructed according to the first depth image; finally, global rigid registration is performed on the grid points of the first human body grid model using the first point cloud, which reflects the true posture of the target object whose human body posture is to be reconstructed, and the first human body grid model is adjusted based on the registered grid points, so that a first human body posture that fits the true posture of the target object more closely can be obtained. In this way, the point cloud constructed from the depth image is used to register the grid points of the human body grid model constructed from the color image, which overcomes the problem that comprehensive human body posture information cannot be acquired from a single color image with a limited viewing angle; the accuracy of human body posture estimation is thereby improved, and a human body posture closer to the true posture of the target object is obtained. Furthermore, this embodiment does not require large-scale annotation of training data, which reduces both data-processing difficulty and cost.
In order to illustrate the technical scheme of the application, the following description is made by specific examples.
Referring to fig. 1, a schematic structural diagram of a human body posture estimation system according to an embodiment of the present application is shown. As shown in fig. 1, the human body pose estimation system may include an image acquisition device 110, a processing device 120, and a display device 130. The processing device 120 is in communication connection with the image acquisition device 110 through a wired or wireless network, so as to realize data transmission. The processing device 120 is in communication connection with the display device 130 via a wired or wireless network to enable data transmission. In the estimation system, the image acquisition device 110 is configured to acquire a color image and a depth image of a target object, and transmit the acquired color image and depth image to the processing device 120; the processing device 120 is configured to process the color image to obtain a human body mesh model of the target object, construct a target point cloud corresponding to the target object according to the depth image, and register grid points of the human body mesh model based on the target point cloud to obtain a human body posture of the target object; thereafter, the processing device 120 transmits the human body pose of the target object to the display device 130 to cause the display device 130 to display the human body pose of the target object.
In some embodiments, the image capture device 110 may include a color camera and a depth camera, wherein the color camera is used to capture color images including the target object. The color image may be an RGB image or a YUV image, for example.
The depth camera is used to acquire a depth image including a target object. A depth image is an image that includes depth information. In some examples, the depth camera may be a structured light camera, a time of flight (TOF) camera, or a binocular camera. When the depth camera is a structured light camera, the collected depth image may be a structured light image or an image obtained by processing the structured light image. When the depth camera is a TOF camera, the acquired depth image may be a phase image or a histogram, or an image obtained by processing the phase image or the histogram. When the depth camera is a binocular camera, the acquired depth image may be a parallax image or an image obtained by processing the parallax image.
In some examples, the depth camera may include a projection module and an image sensor. The depth camera is schematically illustrated below with specific examples.
In one example, a depth camera may include a projection module and a CMOS image sensor. The projection module is configured to project a structured light beam onto the target object; the CMOS image sensor is configured to receive the light beam reflected back by the target object and generate a structured light image. Matching calculation or triangulation is then performed on the structured light image to obtain the depth information of each pixel, that is, a depth image reflecting the depth information.
In another example, a depth camera may include a projection module and a TOF image sensor. The projection module is configured to project a pulsed or continuous-wave infrared light beam onto the target object; the TOF image sensor is configured to receive the light beam reflected by the target object and generate a phase image. The phase image is processed based on the indirect time-of-flight principle to obtain the flight time from the moment the beam is emitted by the projection module to the moment it is received by the TOF image sensor; the depth information of each pixel is then calculated from this flight time, yielding a depth image reflecting the depth information.
In yet another example, a depth camera may include a projection module and a SPAD image sensor. The projection module is configured to project a pulsed infrared beam onto the target object; the SPAD image sensor is configured to receive the light beam reflected by the target object and generate a histogram. The histogram is processed based on the direct time-of-flight principle to obtain the flight time from the moment the beam is emitted by the projection module to the moment it is received by the SPAD image sensor; the depth information of each pixel is then calculated from this flight time, yielding a depth image reflecting the depth information.
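The time-of-flight examples above all reduce to the same conversion: once the round-trip flight time of the light beam is known, depth is half the distance travelled at the speed of light. A minimal illustrative sketch (not part of the patent text):

```python
# Direct time-of-flight depth: depth = (speed of light * round-trip time) / 2.
C = 299_792_458.0  # speed of light in m/s

def depth_from_flight_time(t_seconds: float) -> float:
    """Convert a measured round-trip flight time into depth in metres."""
    return C * t_seconds / 2.0
```

For example, a return after about 6.67 nanoseconds corresponds to a depth of roughly one metre, which illustrates why ToF sensors need picosecond-scale timing resolution for millimetre-scale depth accuracy.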
In still another example, the depth camera may include two CMOS image sensors, which are respectively used to collect binocular images (i.e., left and right images) of the target object. The binocular images are matched based on the stereo-vision principle to obtain a parallax image; the depth information of each pixel is then calculated from the parallax image, yielding a depth image reflecting the depth information.
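The parallax-to-depth calculation in the binocular example follows the standard stereo triangulation relation Z = f·B/d (focal length in pixels, baseline in metres, disparity in pixels). A hedged sketch, with illustrative parameter names:

```python
import numpy as np

def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Triangulate a depth map (metres) from a disparity map: Z = f * B / d.
    Pixels with zero or negative disparity are marked as infinitely far."""
    disparity_px = np.asarray(disparity_px, dtype=np.float64)
    depth = np.full_like(disparity_px, np.inf)
    valid = disparity_px > 0
    depth[valid] = focal_px * baseline_m / disparity_px[valid]
    return depth
```

With f = 500 px and a baseline of 0.1 m, a disparity of 50 px corresponds to a depth of 1 m; halving the disparity doubles the depth, which is why stereo depth accuracy degrades with distance.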
In some embodiments, the processing device 120 may be a separate dedicated circuit; for example, the processing device 120 may be a dedicated SOC chip, FPGA chip or ASIC chip including a processor, memory, bus, and so on. In other embodiments, the processing device 120 may be a mobile phone, a tablet computer, a wearable device, an in-vehicle device, an augmented reality (AR)/virtual reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA) or another terminal device; the embodiments of the present application do not limit the specific type of the processing device.
It should be understood that the image capturing device 110 and the processing device 120 may be independent devices or may be integrated into a single device, which is not limited in the embodiments of the present application. Likewise, the processing device 120 and the display device 130 may be independent devices or may be integrated into a single device, which is not limited in the embodiments of the present application.
The above section schematically illustrates a human body posture estimation system, and the following schematically illustrates a human body posture estimation method provided by the embodiment of the present application.
Fig. 2 is a schematic flow chart of a human body posture estimation method according to an embodiment of the present application. The human body posture estimation method may be performed by an electronic device or a server. As shown in fig. 2, the human body posture estimating method includes the following steps 210 to 240.
Step 210, a first color image and a first depth image corresponding to the first color image are acquired, where the first color image and the first depth image each include a target object.
The first depth image is a depth image corresponding to the first color image; that is, at least some of the pixels of the first color image are aligned with pixels of the first depth image.
In this embodiment, the first color image and the first depth image each include a target object. The target object is the object whose human body posture needs to be estimated, that is, the target object is a human body. The first color image and the first depth image may include one object (i.e., one user) or may include a plurality of objects. When the first color image and the first depth image both include a single object, that object is the target object whose human body posture is to be estimated. When the first color image and the first depth image each include a plurality of objects, the target object may be any one of the plurality of objects.
The first color image may be, for example, an RGB image or a YUV image. The first depth image is an image containing depth information; it may be, for example, a structured light image, a phase image, a histogram, or a parallax image.
Step 220, processing the first color image to obtain a first human body grid model of the target object.
In this embodiment, after the first color image is acquired, the human body posture is estimated using the first color image to obtain a first human body grid model, which is the human body grid model of the target object. The first human body grid model is composed of a plurality of grid points; after the first human body grid model is obtained, its grid points are collected to form the first grid point set.
Step 230, constructing a first point cloud corresponding to the target object according to the first depth image.
In this embodiment, the first point cloud may include the three-dimensional coordinate information of each pixel in the image area of the first depth image where the target object is located, that is, the three-dimensional coordinate information of points on the human body of the target object. The first point cloud can therefore reflect the true posture of the target object.
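Constructing a point cloud from a depth image is typically done by back-projecting each valid depth pixel through the pinhole camera model: X = (u − cx)·Z/fx, Y = (v − cy)·Z/fy, Z = depth. The following is an illustrative sketch of this step (the intrinsics fx, fy, cx, cy and the optional mask parameter are assumptions, not specified by the patent):

```python
import numpy as np

def depth_to_point_cloud(depth_m, fx, fy, cx, cy, mask=None):
    """Back-project a depth image into an (N, 3) point cloud using the
    pinhole model. Pixels with zero depth are treated as invalid, and an
    optional boolean mask restricts the cloud to the target object."""
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth_m > 0
    if mask is not None:
        valid &= mask.astype(bool)  # keep only pixels of the target object
    z = depth_m[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)  # one 3-D point per valid pixel
```

Each row of the result is one candidate point of the first point cloud; feeding in the human-mask image described below as `mask` yields a cloud restricted to the target object.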
In some embodiments, before constructing the first point cloud corresponding to the target object according to the first depth image, the method may further include: and processing the first color image to obtain a human mask image corresponding to the target object. Thus, after the human mask image is obtained, a first point cloud corresponding to the target object can be constructed according to the first depth image and the human mask image.
In some embodiments, before the first point cloud corresponding to the target object is constructed according to the first depth image and the human mask image, outlier points in the second image area of the first depth image corresponding to the human mask image are removed. In this way, a first point cloud that better matches the real human body posture of the target object can be obtained; registering the grid points of the first human body grid model against this point cloud then generates a human body posture that better matches the real body of the target object, improving the accuracy of human body posture estimation.
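The patent does not specify the outlier criterion. One common, simple choice is a robust median/MAD threshold over the depths inside the mask region; the sketch below uses that as an illustrative assumption:

```python
import numpy as np

def remove_depth_outliers(depth_m, mask, k=3.0):
    """Zero out depth pixels inside the human-mask region whose depth
    deviates from the region median by more than k median-absolute-
    deviations (an illustrative criterion, not the patent's)."""
    region = depth_m[mask & (depth_m > 0)]
    med = np.median(region)
    mad = np.median(np.abs(region - med)) + 1e-9  # avoid division issues
    cleaned = depth_m.copy()
    outliers = mask & (np.abs(depth_m - med) > k * mad)
    cleaned[outliers] = 0.0  # zero depth marks an invalid pixel
    return cleaned
```

Points whose depth is zeroed here are simply skipped when the depth image is back-projected into the first point cloud, so flying pixels and background bleed-through along the mask boundary do not distort the registration target.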
Step 240, performing global rigid registration based on the first point cloud and the first grid point set of the first human body grid model, and optimizing the first grid point set of the first human body grid model by using the registered first point cloud to obtain the first human body posture of the target object.
Specifically, based on the first point cloud, global rigid registration is performed on the first grid point set of the first human body grid model to obtain a second grid point set. In this embodiment, because the first point cloud is constructed from the first depth image, it reflects the true posture of the target object whose human body posture is to be estimated. The first point cloud is therefore used as the registration target, and each grid point in the first grid point set is globally and rigidly registered to obtain the second grid point set. The second grid point set is the set of grid points that minimizes the error between the first grid point set and the first point cloud (i.e., the sum of the distances between each grid point in the second grid point set and its corresponding reference point in the first point cloud is minimal).
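The patent does not name a specific solver for the global rigid registration. For paired correspondences, the least-squares rotation and translation are classically given by the Kabsch/Procrustes algorithm; a full registration would alternate this solve with nearest-neighbour correspondence search, ICP-style. An illustrative single-step sketch:

```python
import numpy as np

def rigid_align(source_pts, target_pts):
    """Find the rotation R and translation t minimising
    sum_i ||R @ s_i + t - p_i||^2 for paired (N, 3) point sets
    (Kabsch algorithm via SVD)."""
    src_c = source_pts.mean(axis=0)
    tgt_c = target_pts.mean(axis=0)
    H = (source_pts - src_c).T @ (target_pts - tgt_c)  # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tgt_c - R @ src_c
    return R, t
```

Applied with the first grid point set as source and its matched first-point-cloud points as target, `R` and `t` give the single rigid transform that best overlays the grid points on the cloud, which matches the minimum-sum-of-distances criterion described above.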
In this embodiment, after the second grid point set is obtained through global rigid registration, the second grid point set in the first human body grid model is optimized using the first point cloud to obtain a first three-dimensional human body model, so that this model is closer to the real body of the target object; the first human body posture of the target object (i.e., the user) is then obtained from the first three-dimensional human body model and can reflect the true posture of the target object more accurately. After the first human body posture of the target object is obtained, it is displayed so as to provide a visual presentation of the human body posture.
In some embodiments, before the first grid point set of the first human body grid model is globally and rigidly registered based on the first point cloud to obtain the second grid point set, the method may further include: down-sampling the first human body grid model (that is, the first grid point set) and the first point cloud, and performing global rigid registration on the down-sampled first grid point set based on the down-sampled first point cloud. This improves processing efficiency, allows the human body posture to be estimated quickly, and provides a better user experience.
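The patent does not name a down-sampling scheme. Voxel-grid averaging is one common choice for point clouds: all points falling into the same cubic voxel are replaced by their centroid. A hedged sketch of that assumption:

```python
import numpy as np

def voxel_downsample(points, voxel_size):
    """Down-sample an (N, 3) point cloud by averaging all points that
    fall into the same cubic voxel of side `voxel_size` (one common
    scheme; the patent does not specify which one it uses)."""
    keys = np.floor(points / voxel_size).astype(np.int64)  # voxel indices
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    n_voxels = inverse.max() + 1
    sums = np.zeros((n_voxels, points.shape[1]))
    np.add.at(sums, inverse, points)  # accumulate points per voxel
    counts = np.bincount(inverse, minlength=n_voxels)[:, None]
    return sums / counts  # centroid of each occupied voxel
```

A coarser `voxel_size` shrinks both point sets before registration, which is where the efficiency gain described above comes from; the registration result can then be applied back to the full-resolution grid points.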
When the human body posture estimation method provided by the embodiments of the application is used, first, a first color image of a target object and a first depth image corresponding to the first color image are acquired, the human body posture is estimated using the first color image, and a first human body grid model corresponding to the target object is generated; then, a first point cloud of the target object is constructed according to the first depth image; finally, global rigid registration is performed on the grid points of the first human body grid model using the first point cloud, which reflects the true posture of the target object, and the grid points of the first human body grid model are optimized using the registered first point cloud to obtain the true posture of the target object. In this way, the point cloud constructed from the depth image is used to register the grid points of the human body grid model constructed from the color image, which overcomes the problem that comprehensive human body posture information cannot be acquired from a single color image with a limited viewing angle; the accuracy of human body posture estimation is thereby improved, and a human body posture closer to the true posture of the target object is obtained. Furthermore, this embodiment does not require large-scale annotation of training data, which reduces both data-processing difficulty and cost.
The foregoing is a schematic illustration of the overall arrangement of an embodiment of the application. The specific implementation of each step in the foregoing embodiments is schematically illustrated below by specific examples.
In some examples, the step 210 may include: acquiring a first color image and a depth image to be processed, which is synchronous with the first color image; and aligning the depth image to be processed with the first color image to obtain a first depth image.
Specifically, the first color image and the first depth image are acquired by the image acquisition device: the first color image is collected by the color camera of the image acquisition device, the depth image to be processed is collected by the depth camera of the image acquisition device, and the depth image to be processed is then aligned to the first color image to obtain the first depth image. The first depth image is the depth image corresponding to the first color image; that is, the first color image and the first depth image are images of the same scene and contain the same object. For example, the first color image is the current color frame acquired by the color camera, and the first depth image is derived from the current depth frame acquired by the depth camera.
The fields of view of the color camera and the depth camera may be the same or partially overlap. When the field of view of the color camera and the field of view of the depth camera are partially overlapped, an overlapping area exists between the first color image acquired by the color camera and the first depth image acquired by the depth camera, that is, the first color image and at least part of pixels of the first depth image have alignment relation, that is, the first color image and the first depth image both comprise the same object.
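Aligning the depth image to the color image, as described above, is typically done by back-projecting each depth pixel into 3-D with the depth camera's intrinsics, transforming by the depth-to-color extrinsics, and re-projecting with the color camera's intrinsics. The sketch below is a minimal illustration of this standard RGB-D alignment; the parameter names (K_d, K_c, R, t) are illustrative, not from the patent:

```python
import numpy as np

def align_depth_to_color(depth_m, K_d, K_c, R, t, color_shape):
    """Warp a depth image into the colour camera frame.
    K_d, K_c: 3x3 intrinsic matrices; (R, t): depth-to-colour extrinsics."""
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_m.ravel()
    valid = z > 0
    pix = np.stack([u.ravel(), v.ravel(), np.ones(h * w)], axis=0)
    pts_d = np.linalg.inv(K_d) @ (pix * z)       # 3-D in depth frame
    pts_c = R @ pts_d[:, valid] + t[:, None]     # 3-D in colour frame
    proj = K_c @ pts_c                           # re-project into colour image
    uc = np.round(proj[0] / proj[2]).astype(int)
    vc = np.round(proj[1] / proj[2]).astype(int)
    aligned = np.zeros(color_shape)
    inb = (uc >= 0) & (uc < color_shape[1]) & (vc >= 0) & (vc < color_shape[0])
    aligned[vc[inb], uc[inb]] = pts_c[2][inb]    # depth in the colour frame
    return aligned
```

After this warp, a depth pixel and the color pixel at the same coordinates observe the same scene point, which is the alignment relationship between the first color image and the first depth image that the method relies on.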
In some examples, the step 220 may include: step 310-step 330.
Step 310, performing foreground extraction processing on the first color image to obtain a human body detection frame.
In some examples, the first color image is processed using a preset foreground extraction model to obtain a human detection box (which may also be referred to as a human foreground box). The preset foreground extraction model may include a feature extraction module, a fusion module, and a detection module, for example. The step of processing the first color image by using the preset foreground extraction model to obtain a human body detection frame may include: step 410-step 430.
In step 410, the feature extraction module performs feature extraction on the first color image to obtain a plurality of feature images of different sizes.
And step 420, the fusion module fuses the plurality of feature images to obtain a fused feature image.
In step 430, the detection module determines a human detection frame according to the fused feature image.
Specifically, analyzing the fused feature image yields a plurality of detection frames (also called foreground frames). The confidence of the foreground category in each detection frame is obtained, and the maximum confidence (the largest confidence value among all detection frames) and its corresponding foreground category are determined. When the foreground category corresponding to the maximum confidence is a human body, the corresponding detection frame is taken as the human body detection frame. When it is not a human body, the current frame of color image (i.e., the first color image) does not contain an object for human body posture estimation; in that case, the next frame of color image is acquired and processed.
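The selection logic above (take the highest-confidence detection and accept it only if its foreground category is a human body) can be sketched as follows; the tuple layout and the "person" label are hypothetical:

```python
def select_human_box(detections):
    """detections: list of (box, class_name, confidence) tuples.
    Returns the box of the maximum-confidence detection when that detection's
    category is a human body; otherwise None, meaning the frame contains no
    object for posture estimation and the next frame should be processed."""
    if not detections:
        return None
    box, cls, conf = max(detections, key=lambda d: d[2])
    return box if cls == "person" else None
```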
In some examples, following step 420 described above, the method may further include: the detection module generates a mask image corresponding to the first color image by utilizing the fusion characteristic image; and on the basis of the human body detection frame, performing segmentation processing on the mask image corresponding to the first color image to obtain a human body mask image. The human body mask image may be a human body mask image corresponding to the target object, and the first point cloud of the target object may be constructed by using the human body mask image and the first depth image.
After step 310, step 320 is performed to determine a first image area corresponding to the human detection frame in the first color image. The first image area may be an image area including the target object in the first color image.
Step 330, estimating the human body posture based on the first image area to obtain a first human body grid model.
In some examples, the human body pose estimation is performed on the first image region using a preset reconstruction model to obtain a first human body mesh model. For example, the preset reconstruction model may include a preprocessing module, an estimation module, and a fitting module. The step 330 may include: step 510-step 530.
Step 510, the preprocessing module performs normalization processing and affine transformation processing on the first image area to obtain a standard human body image corresponding to the target object.
Step 520, the estimation module processes the standard human body image to obtain the coordinate information of the human body joint point, the self-rotation parameters of the human skeleton and the shape parameters; the human body joint point coordinate information may be two-dimensional coordinate information of the human body joint point.
Step 530, the fitting module obtains an initial pose parameter of the target object at rest using the shape parameter of a preset human body mesh model, determines the pose parameter from the human body joint point coordinate information, the human skeleton self-rotation parameter and the initial pose parameter, and adjusts the preset human body mesh model based on the shape parameter and the pose parameter to obtain the first human body mesh model. The shape parameter (i.e., shape) may be used to adjust the shape of the preset human body mesh model; the pose parameter (i.e., pose) may be used to adjust its posture.
In this embodiment, the preset human body mesh model may be a standard human body mesh generated in advance, for example an SMPL (Skinned Multi-Person Linear) parametric human model, a SCAPE (Shape Completion and Animation of People) parametric human model, or a 3DMM (3D Morphable Model). The embodiment of the application does not limit the specific type of the preset human body mesh model. That is, with the preset human body mesh model as a template, its shape is adjusted through the shape parameters and its posture is adjusted through the pose parameters, so that the resulting first human body mesh model is closer to the real shape and posture of the target object.
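As a rough, hedged illustration of how shape parameters deform a template mesh, here is a drastically simplified SMPL-style linear blend (not the actual SMPL formulation, which also includes pose blend shapes and linear blend skinning):

```python
def apply_shape_blend(template_vertices, shape_dirs, betas):
    """Simplified shape blend: v_i = t_i + sum_k beta_k * S_k[i].
    `template_vertices` is a list of (x, y, z) tuples;
    `shape_dirs[k][i]` is the per-vertex offset of shape basis k (hypothetical
    toy data, not real SMPL blend shapes)."""
    out = []
    for i, v in enumerate(template_vertices):
        offset = [0.0, 0.0, 0.0]
        for k, beta in enumerate(betas):
            for a in range(3):
                offset[a] += beta * shape_dirs[k][i][a]
        out.append(tuple(v[a] + offset[a] for a in range(3)))
    return out
```

With all betas at zero, the template is returned unchanged, which is why the preset model can serve as a neutral starting point for fitting.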
In some examples, after human body pose estimation is performed based on the first image region to obtain a first human body mesh model, grid points of the first human body mesh model are obtained, that is, a first grid point set is obtained. In this way, the first grid point set is globally registered in combination with the subsequent steps, so as to adjust the three-dimensional model of the human body, thereby improving the accuracy of human body posture estimation.
In some examples, when the first color image contains a plurality of objects, foreground extraction data is determined for each object, where the foreground extraction data includes a human detection frame and a human mask image; the foreground extraction data of each object is stored in association with that object's identification information. Thus, when constructing the first point cloud of the target object from the human mask image and the first depth image, the human mask image corresponding to the target object can be retrieved by the target object's identification information, and the first point cloud then constructed from that mask image and the first depth image. In this embodiment, when an image contains multiple objects, different identification information can be assigned to different objects so that their data can be distinguished.
In some examples, the step 230 may include: step 610-step 620.
Step 610, determining a second image area corresponding to the target object in the first depth image according to the human mask image.
Step 620, performing conversion processing on the second image area to obtain a first point cloud.
Specifically, the second image area is converted according to the following formula (1), resulting in the first point cloud (X, Y, Z):

X = (x - C_x) * z / f,  Y = (y - C_y) * z / f,  Z = z  (1)

Wherein (X, Y, Z) represents the three-dimensional coordinates of any point in the first point cloud; f is the focal length of the receiving end of the depth camera; (C_x, C_y) is the optical center of the receiving end of the depth camera; (x, y) is the coordinate of any pixel in the first depth image, and z is the depth value of that pixel.
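The back-projection of formula (1) translates directly into code; a minimal sketch with hypothetical intrinsic values:

```python
def back_project(x, y, z, f, cx, cy):
    """Convert a depth pixel (x, y) with depth value z into a 3D point
    (X, Y, Z) via the pinhole model of formula (1):
    X = (x - cx) * z / f, Y = (y - cy) * z / f, Z = z."""
    return ((x - cx) * z / f, (y - cy) * z / f, z)
```

A pixel at the optical center maps to X = Y = 0 with Z equal to its depth value.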
In some examples, in step 230, the step of performing outlier rejection on the second image area (the area corresponding to the human mask image in the first depth image) includes: determining the average depth of the second image area from the pixel value of each pixel; calculating the difference between each pixel value and the average depth; when the difference is outside a preset difference threshold, marking the corresponding pixel as abnormal and removing it from the second image area; and when the difference is within the preset difference threshold, converting the corresponding pixel to obtain the first point cloud.
In this embodiment, abnormal points in the second image area (the area corresponding to the human mask image in the first depth image) are removed, so that a first point cloud that better matches the real human posture of the target object is obtained. Registering the grid points of the first human body grid model against this point cloud produces a human posture that better fits the target object and improves the accuracy of human body posture estimation.
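A minimal sketch of the mean-depth outlier rejection described above, operating on a flat list of depth values (the deviation threshold is a hypothetical tuning parameter):

```python
def reject_depth_outliers(depths, max_dev):
    """Keep only depth values whose deviation from the average depth of the
    region is within max_dev; outliers are discarded before the remaining
    pixels are converted into the first point cloud."""
    mean = sum(depths) / len(depths)
    return [d for d in depths if abs(d - mean) <= max_dev]
```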
In some examples, in the step 240, using the first point cloud as the registration target, performing global rigid registration on each grid point in the first grid point set to obtain the second grid point set specifically includes: step 710-step 720.
And 710, performing global rigid registration on the first grid point set and the first point cloud to obtain a target transformation relation between the first grid point set and the first point cloud.
Specifically, based on an ICP registration algorithm, the relative transformation relation between the grid points of the first human body grid model (the grid points in the first grid point set) and the first point cloud (i.e., the human body point cloud of the target object) is computed, including a rotation matrix R and a translation matrix T. A target residual function is then constructed from the grid points of the first human body grid model, the first point cloud, R and T; a cost function is derived from the target residual function; and the cost values of all grid points in the first grid point set are summed under the cost function to obtain a reference cost value. A target cost value is determined by comparing the reference cost values of different relative transformation relations (different rotation and translation matrices): it is the smallest of the reference cost values. The relative transformation relation corresponding to the target cost value is taken as the target transformation relation, which includes a target rotation matrix and a target translation matrix.
For example, the above target residual function may be referred to as the following formula (2):
S = |RQ + T - q|²  (2)

Wherein S is the target residual function; Q is a grid point of the first human mesh model (i.e., any grid point in the first grid point set); q is any point in the first point cloud; R is the rotation matrix; T is the translation matrix.
It can be understood that different target residual functions can be constructed according to different relative transformation relations, different reference cost values can be obtained according to different target residual functions, and further different reference cost values are compared, so that the target cost values can be obtained, and further the target transformation relation is obtained.
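The cost comparison described above can be sketched in 2D (pure Python, hypothetical candidate transforms; the embodiment's ICP operates on 3D points and derives R and T analytically rather than scanning a candidate list):

```python
import math

def rigid_cost(mesh_pts, cloud_pts, theta, tx, ty):
    """Reference cost: sum over paired points of the squared residual of
    formula (2), S = |R*Q + T - q|^2, with R a 2D rotation by theta and
    T = (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    total = 0.0
    for (qx, qy), (px, py) in zip(mesh_pts, cloud_pts):
        rx, ry = c * qx - s * qy + tx, s * qx + c * qy + ty
        total += (rx - px) ** 2 + (ry - py) ** 2
    return total

def target_transform(mesh_pts, cloud_pts, candidates):
    """Target transformation relation: the candidate (theta, tx, ty) whose
    reference cost value is the smallest."""
    return min(candidates, key=lambda t: rigid_cost(mesh_pts, cloud_pts, *t))
```

When the point cloud is the mesh rotated by 90 degrees, the 90-degree candidate yields a near-zero reference cost and is selected as the target transformation.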
In step 720, rotation and translation are applied to the first point cloud based on the target transformation relation so that the first point cloud and the first grid point set are located in the same coordinate system, and the grid points with the minimum error between the first grid point set and the first point cloud are obtained, yielding the second grid point set.
In this embodiment, the point cloud data constructed by the depth image is used to register the grid points of the human body grid model constructed based on the color image, so that the problem that the comprehensive human body posture information cannot be obtained due to the limited viewing angle of the single color image can be solved, the accuracy of human body posture estimation can be improved, and the true human body posture of the more fitting target object can be obtained.
In some embodiments, in step 240 above, after the first human mesh model is optimized using the registered first point cloud, the method may further include: performing global non-rigid calibration (local deformation and bending) on the optimized first human body grid model using the first point cloud, so as to further correct it and obtain a corrected first human body grid model, and acquiring the first human body posture of the target object based on the corrected first human body grid model.
Specifically, after the second grid point set is obtained through global rigid registration and the first grid point set of the first human body grid model is optimized with the first point cloud to obtain a first human body three-dimensional model, global non-rigid calibration is performed on that model using the first point cloud: non-rigid transformations such as local deformation and bending are applied to the mesh surface of the model according to the first point cloud. This further corrects the first human body three-dimensional model, and the first human body posture of the target object is obtained from the corrected model.
In the embodiment of the application, after global rigid registration of the grid points of the human body grid model with the first point cloud yields the optimized first human body three-dimensional model, global non-rigid registration of its grid points with the first point cloud further corrects the model. This better adapts to spatial variation and deformation in the point cloud data, achieves fine local calibration of the first human body three-dimensional model, further reduces the error of human body posture estimation, and improves its accuracy.
Further, in some embodiments, the method may further comprise: and performing joint point filtering and rotation angle filtering on the first human body three-dimensional model. Specifically, before the step 240, the method may further include: acquiring a fourth depth image, wherein the fourth depth image is a depth image corresponding to a next frame of color image of the first color image; and constructing a fourth point cloud according to the fourth depth image.
In this embodiment, step 240 may include: weighting the first point cloud and the fourth point cloud to obtain a processed point cloud; performing global rigid registration between the processed point cloud and the first grid point set of the first human body grid model; and optimizing the first human body grid model with the registered processed point cloud, so that the first human body posture of the target object is obtained from the optimized model. In this way, the generated human body posture is displayed more stably and smoothly, improving the visual effect of human body posture estimation.
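The temporal weighting above can be sketched as a per-point weighted average, assuming the two clouds are already in point-to-point correspondence (a hypothetical simplification; the weight w is a tunable smoothing parameter):

```python
def fuse_point_clouds(cloud_a, cloud_b, w):
    """Weighted blend of two corresponding point clouds:
    p_i = w * a_i + (1 - w) * b_i, used here to smooth the point cloud
    across frames before registration."""
    return [tuple(w * ai + (1 - w) * bi for ai, bi in zip(a, b))
            for a, b in zip(cloud_a, cloud_b)]
```

With w = 1 the current frame dominates entirely; intermediate weights trade responsiveness for stability of the displayed posture.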
In this embodiment, for an image frame sequence, human body pose estimation may be performed based on each frame of color image and a depth image corresponding to the color image. Alternatively, when the human body posture estimation is performed based on any frame color image in the image frame sequence and the depth image corresponding to the color image, the current state of the object in the color image may be determined, so as to determine whether to perform the human body posture estimation based on the color image and the depth image corresponding to the color image according to the current state of the object in the color image. The following is a schematic illustration of specific examples.
In some embodiments, after step 210, the method may further comprise: step 810-step 840.
At step 810, a second color image is acquired, the second color image comprising the target object.
Wherein the second color image may be an image containing the target object acquired by the image acquisition device. The second color image is acquired later than the first color image, i.e., it is an image frame acquired after the first color image. For example, the first color image is the current frame and the second color image is the next frame; as another example, the first color image is the i-th frame and the second color image is the (i+5)-th frame.
Step 820, determining the current state of the target object according to the second color image and the first color image.
In this embodiment, the current state of the target object may be a change state of the target object in the second color image with respect to the target object in the first color image. The current state of the target object may include a stationary state and a moving state. When the current state of the target object is a static state, the posture of the target object in the second color image relative to the target object in the first color image is unchanged. When the current state of the target object is a motion state, the gesture of the target object in the second color image relative to the target object in the first color image is changed.
In step 830, the first human gesture is followed in the case where the current state is a stationary state.
When the current state of the target object is stationary, the posture of the target object in the second color image has not changed relative to the first color image. In this case, the human body posture does not need to be re-estimated; the first human body posture estimated from the first color image and the first depth image continues to be used and is output as the human body posture estimation result.
In step 840, in the case where the current state is a motion state, a second human body posture of the target object is estimated according to the second color image and the second depth image.
The second depth image is a depth image corresponding to the second color image, the second depth image comprises a target object, and the second human body posture is used for reflecting the human body posture after the target object changes.
When the current state of the target object is a motion state, the posture of the target object in the second color image relative to the target object in the first color image is changed, and at this time, the human body posture can be updated in real time based on the second human body posture generated by the second color image and the second depth image when the human body posture of the target object is changed.
Specifically, estimating the second human body posture of the target object from the second color image and the second depth image may include: processing the second color image to obtain a second human body grid model corresponding to the target object; constructing a second point cloud of the target object from the second depth image; performing global rigid registration between the second point cloud and the grid points of the second human body grid model; and optimizing the second human body grid model with the registered second point cloud, so that the second human body posture of the target object is obtained from the optimized second human body grid model.
It should be noted that, according to the second color image and the second depth image, a specific implementation manner of estimating the second human body posture of the target object may refer to a specific implementation manner of steps 210 to 240 in the foregoing embodiment, and in order to avoid repetition, a description is omitted here.
In this embodiment, the first color image and the second color image are acquired, and the current state of the target object whose posture is to be estimated is determined from them, so as to decide whether to re-estimate the human body posture of the target object. Thus, when the posture of the target object changes, it can be estimated and updated in real time, enabling real-time posture estimation in dynamic scenes. The human body posture estimation method provided by this embodiment also has strong motion robustness and is therefore applicable to more complex scenes. In addition, when the target object is stationary, the human body posture does not need to be updated: the first human body posture generated from the first color image and the first depth image is output as the estimation result. This maintains the real-time performance and accuracy of the estimation while reducing the amount of computation, and improves the stability and visual effect of the output.
The determination of the current state of the target object is schematically described below with specific examples.
In some examples, step 820 described above may include: step 910-step 930.
Step 910, determining a first differential image based on the second color image and the first color image.
Specifically, grayscale conversion is performed on the first color image to obtain a first gray image, and on the second color image to obtain a second gray image; a differential operation between the second gray image and the first gray image then yields the first differential image, which reflects the change of the second color image relative to the first color image.
In step 920, in the case that the pixel value of the first differential image is within the preset pixel range, it is determined that the current state of the target object is a stationary state.
When the pixel values of the first differential image are within the preset pixel range, the second color image has not changed significantly relative to the first color image, and it is determined that the target object in the second color image has not changed relative to that in the first color image; that is, the current state of the target object is stationary.
It should be noted that, the preset pixel range may be set according to practical experience and experimental data, and the specific numerical range of the preset pixel range is not limited in the embodiment of the present application.
In step 930, the current state of the target object is determined to be a motion state if the pixel value of the first differential image is outside the preset pixel range.
When a pixel value of the first differential image is outside the preset pixel range, the second color image has changed relative to the first color image, and it is determined that the target object in the second color image has changed relative to that in the first color image; that is, the current state of the target object is a motion state.
In this embodiment, the first color image and the second color image are subjected to differential operation to obtain the first differential image, and the current state of the target object in the second color image can be determined according to the pixel value of the first differential image, so that the processing mode is simple.
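Steps 910-930 can be sketched as follows (pure Python over nested lists; the luminance weights are the conventional ITU-R choice, and the threshold is hypothetical):

```python
def to_gray(rgb_img):
    """Grayscale conversion with standard luminance weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_img]

def is_static(color_a, color_b, max_abs_diff):
    """True (stationary) when every pixel of the differential image
    |gray_b - gray_a| lies within the preset pixel range."""
    ga, gb = to_gray(color_a), to_gray(color_b)
    return all(abs(pa - pb) <= max_abs_diff
               for ra, rb in zip(ga, gb) for pa, pb in zip(ra, rb))
```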
In other examples, step 820 may include: step 1010-step 1050.
In step 1010, interpolation processing is performed on the first color image and the second color image to obtain a processed first color image and a processed second color image.
Step 1020, determining a second difference image based on the processed second color image and the processed first color image.
Specifically, interpolation is performed on the first and second color images respectively to obtain a processed first color image and a processed second color image of a preset size; a differential operation between them yields the second differential image, which reflects the change of the second color image relative to the first color image.
Step 1030, determining the number of target pixel points in the second differential image, where the target pixel points are pixel points with pixel values outside the preset pixel range.
Step 1040, determining that the current state of the target object is a stationary state when the number of target pixel points is less than the preset number threshold.
When the number of target pixel points is smaller than the preset number threshold, the second color image has not changed significantly relative to the first color image; it is then determined that the target object in the second color image has not changed relative to that in the first color image, i.e., the current state of the target object is stationary.
It should be noted that, the preset number threshold may be set according to actual experience and test data, and the specific value of the preset number threshold is not limited in the embodiment of the present application.
In step 1050, the current state of the target object is determined to be a motion state when the number of target pixels is greater than or equal to the preset number threshold.
When the number of target pixel points is greater than or equal to the preset number threshold, the second color image has changed relative to the first color image; it is then determined that the target object in the second color image has changed relative to that in the first color image, i.e., the current state of the target object is a motion state.
In this embodiment, interpolation processing is performed on the first color image and the second color image, and difference operation is performed on the processed first color image and the processed second color image, so as to obtain a second difference image, and according to the pixel value of the second difference image, the current state of the target object in the second color image can be more accurately determined.
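Steps 1010-1050 can be sketched with nearest-neighbour resampling standing in for the unspecified interpolation method; the target size and thresholds are hypothetical:

```python
def resize_nearest(gray_img, w, h):
    """Nearest-neighbour resampling of a nested-list gray image to a preset
    size (a stand-in for the embodiment's interpolation processing)."""
    sh, sw = len(gray_img), len(gray_img[0])
    return [[gray_img[y * sh // h][x * sw // w] for x in range(w)]
            for y in range(h)]

def is_static_by_count(gray_a, gray_b, size, max_abs_diff, count_threshold):
    """Stationary when the number of target pixels (difference outside the
    preset pixel range) is below the preset count threshold."""
    w, h = size
    a, b = resize_nearest(gray_a, w, h), resize_nearest(gray_b, w, h)
    changed = sum(1 for ra, rb in zip(a, b)
                  for pa, pb in zip(ra, rb) if abs(pa - pb) > max_abs_diff)
    return changed < count_threshold
```

Counting out-of-range pixels rather than requiring all pixels to match makes the decision tolerant of isolated noise pixels.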
In still other examples, to improve detection accuracy, the above two ways of detecting the current state of the target object may be combined to determine the state comprehensively. Illustratively, step 820 may include: step 1110-step 1140.
Step 1110, determining a first differential image according to the second color image and the first color image.
In particular, the specific implementation of step 1110 may refer to the specific implementation of step 910, and in order to avoid repetition, a detailed description is omitted here.
Step 1120, performing interpolation processing on the first color image and the second color image to obtain a processed first color image and a processed second color image, and determining a second difference image according to the processed second color image and the processed first color image.
In particular, the specific implementation of step 1120 may refer to the specific implementation of step 1010-step 1020 described above, and will not be repeated here.
In step 1130, the current state of the target object is determined to be a stationary state when the first differential image satisfies the first preset condition and the second differential image satisfies the second preset condition.
In step 1140, the current state of the target object is determined to be a motion state if the first differential image does not satisfy the first preset condition and/or the second differential image does not satisfy the second preset condition.
The first differential image meets a first preset condition that the pixel value of the first differential image is located in a preset pixel range; the second differential image meets a second preset condition that the number of target pixel points, of which the pixel values are out of a preset pixel range, in the second differential image is smaller than a preset number threshold.
That is, after the first differential image and the second differential image are obtained, it is determined whether the pixel values of the first differential image are within the preset pixel range, and whether the number of target pixel points in the second differential image whose pixel values are outside the preset pixel range is smaller than the preset number threshold. When both conditions hold, the current state of the target object is determined to be stationary. When the pixel values of the first differential image are outside the preset pixel range and/or the number of such target pixel points in the second differential image is greater than or equal to the preset number threshold, the current state of the target object is determined to be a motion state.
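The combined decision of steps 1130-1140 can be sketched as follows, simplified to operate on one pair of gray images (in the embodiment the two conditions are evaluated on the first and second differential images respectively; here both are derived from the same difference for brevity):

```python
def combined_state(gray_a, gray_b, max_abs_diff, count_threshold):
    """'stationary' only when (1) every difference pixel is within the preset
    pixel range AND (2) the number of out-of-range pixels is below the preset
    count threshold; otherwise 'motion'."""
    diffs = [abs(pa - pb) for ra, rb in zip(gray_a, gray_b)
             for pa, pb in zip(ra, rb)]
    cond1 = all(d <= max_abs_diff for d in diffs)
    cond2 = sum(1 for d in diffs if d > max_abs_diff) < count_threshold
    return "stationary" if cond1 and cond2 else "motion"
```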
In some embodiments, if the target object remains stationary over consecutive frames after the first color image, an interval duration is determined, and the human body posture of the target object is updated according to that interval duration, enabling real-time estimation of the target object's human body posture.
Illustratively, after the first color image and the first depth image corresponding to the first color image are acquired, the method further includes: step 1210-step 1240.
In step 1210, a color image sequence is acquired that includes a plurality of third color images, each of the plurality of third color images including a target object.
Wherein the sequence of color images may comprise a consecutive plurality of third color images acquired after the first color image. For example, the first color image is an i-th frame color image acquired by the image acquisition device, and the color image sequence may include i+1th to i+10th frame color images acquired by the image acquisition device.
In step 1220, in the case that the current state of the target object included in each third color image in the color image sequence is a stationary state, a duration of the stationary state of the target object is determined.
Specifically, when the current state of the target object in each third color image of the color image sequence acquired after the first color image is stationary, the duration of the stationary state is determined from the acquisition time of the first color image and the acquisition time of the last third color image in the sequence; the interval duration of the human body posture estimation operation is then determined from this duration.
It should be noted that, for the process of determining the current state of the target object in the third color image, reference may be made to the specific implementation of step 820 in the foregoing embodiment; to avoid repetition, details are omitted here.
In step 1230, an interval duration is determined based on the duration.
The interval duration represents the time between two successive human body posture estimation operations, i.e., it sets the execution frequency (restart frequency) of the human body posture estimation step while the target object is stationary. The longer the target object has been stationary, the longer the interval duration and the lower the execution frequency of the human body posture estimation step; the shorter the stationary duration, the shorter the interval duration and the higher the execution frequency.
Illustratively, the interval duration may be determined according to the following equation (3).
t=2T (3)
Wherein t is the interval duration; t is the duration of time that the target object is in a stationary state.
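The interval computation and restart decision of equation (3) can be sketched as follows; the function names are illustrative, and the doubling rule t = 2T is the assumed form given above:

```python
def interval_duration(still_duration_s: float) -> float:
    """Equation (3): t = 2T — the re-estimation interval grows with the
    time T (seconds) the target object has already been stationary."""
    return 2.0 * still_duration_s

def should_rerun(elapsed_s: float, still_duration_s: float) -> bool:
    """Re-run the pose estimation step once the timer, started at the
    acquisition time of the target color image, reaches the interval."""
    return elapsed_s >= interval_duration(still_duration_s)

print(interval_duration(0.5))   # 1.0
print(should_rerun(0.9, 0.5))   # False
print(should_rerun(1.0, 0.5))   # True
```

Longer stillness therefore directly lowers the restart frequency, matching the relationship described above.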
In step 1240, timing starts from the acquisition time of the target color image, and when the timed duration reaches the interval duration, the first human body posture is updated based on the third depth image to obtain a third human body posture.
The third depth image is the depth image corresponding to the last third color image in the color image sequence. The target color image is the first frame image in which the target object is detected to be in a stationary state; for example, the target color image is the first third color image in the color image sequence.
Specifically, a third point cloud is constructed according to the third depth image; global registration is performed between the third point cloud and the grid points of the first human body three-dimensional model, and the first human body three-dimensional model is updated based on the registered grid points (i.e., the first human body posture is updated) to obtain the third human body posture. The third human body posture may reflect small changes of the target object in the last third color image of the sequence relative to the first color image.
For example, taking the first color image as the i-th frame color image and the color image sequence as including the (i+1)-th to (i+10)-th frame color images, the duration T for which the target object is stationary is determined from the acquisition time of the i-th frame color image and the acquisition time of the (i+10)-th frame color image; the interval duration t is then determined from the duration T. Timing starts from the acquisition time of the (i+1)-th frame color image (the first frame in which the user is detected to be stationary), and when the timed duration reaches the interval duration t, i.e., at the restart moment, the human body posture estimation step is executed. Specifically, a third point cloud is constructed based on the depth image corresponding to the (i+10)-th frame color image; the third point cloud is registered with the grid points of the first human body three-dimensional model, the first human body three-dimensional model is updated based on the registered grid points (i.e., the first human body posture is updated), and the third human body posture is obtained from the updated first human body three-dimensional model.
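The global rigid registration step between the point cloud and the model grid points can be illustrated, under the simplifying assumption of known one-to-one correspondences, by the closed-form Kabsch alignment used inside each ICP iteration; this is a generic sketch, not the specific registration procedure of the embodiment:

```python
import numpy as np

def rigid_register(source, target):
    """Least-squares rigid transform (R, t) aligning `source` points to
    `target` points with known correspondences (Kabsch algorithm).
    A full pipeline would re-estimate correspondences iteratively (ICP);
    this shows only the closed-form alignment step."""
    src_c = source.mean(axis=0)
    tgt_c = target.mean(axis=0)
    H = (source - src_c).T @ (target - tgt_c)   # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tgt_c - R @ src_c
    return R, t

# Simulate grid points rotated and shifted to stand in for the captured cloud.
rng = np.random.default_rng(0)
mesh = rng.standard_normal((50, 3))
ang = np.deg2rad(30)
R_true = np.array([[np.cos(ang), -np.sin(ang), 0.0],
                   [np.sin(ang),  np.cos(ang), 0.0],
                   [0.0, 0.0, 1.0]])
cloud = mesh @ R_true.T + np.array([0.1, -0.2, 0.3])
R, t = rigid_register(mesh, cloud)
print(np.allclose(mesh @ R.T + t, cloud, atol=1e-8))  # True
```

In practice, point-cloud libraries provide iterative registration (e.g., ICP variants) that combine this alignment step with nearest-neighbor correspondence search.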
In this embodiment, if the target object remains stationary across the consecutive multi-frame images acquired after the first color image, an interval duration is determined and the human body posture of the target object is updated according to that interval duration, enabling real-time estimation of the human body posture. In this way, when the target object moves or deforms slightly, a human body posture that fits the real human body can be generated in real time, further improving the accuracy of human body posture estimation. Moreover, because the human body posture is updated at intervals rather than for every frame of image, the amount of computation is reduced, the estimation efficiency is improved, and the stability and visualization effect of the output human body posture estimation results are improved.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and shall not constitute any limitation on the implementation process of the embodiments of the present application.
Corresponding to the human body posture estimation method described in the above embodiments, fig. 3 shows a block diagram of a human body posture estimation device according to an embodiment of the present application; for convenience of explanation, only the portions related to the embodiment of the present application are shown.
Referring to fig. 3, the human body posture estimating apparatus 1300 includes: an image obtaining module 1301, configured to obtain a first color image and a first depth image corresponding to the first color image, where the first color image and the first depth image each include a target object; the grid model obtaining module 1302 is configured to process the first color image to obtain a first human body grid model of the target object; the point cloud construction module 1303 is configured to construct a first point cloud corresponding to the target object according to the first depth image; the pose estimation module 1304 is configured to perform global rigid registration on the first point cloud and a first grid point set of the first human body grid model, and optimize the first grid point of the first human body grid model by using the registered first point cloud to obtain a first human body pose of the target object through the optimized first human body grid model.
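The data flow through the four modules of apparatus 1300 can be sketched as a simple pipeline; the class and field names below are hypothetical illustrations of the module structure, not an implementation from the disclosure:

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class PoseEstimator1300:
    """Sketch of apparatus 1300: one callable per described module."""
    acquire_images: Callable[[], tuple]               # module 1301
    build_mesh: Callable[[Any], Any]                  # module 1302
    build_point_cloud: Callable[[Any], Any]           # module 1303
    register_and_optimize: Callable[[Any, Any], Any]  # module 1304

    def estimate(self):
        color, depth = self.acquire_images()          # first color + depth image
        mesh = self.build_mesh(color)                 # first human body grid model
        cloud = self.build_point_cloud(depth)         # first point cloud
        return self.register_and_optimize(cloud, mesh)  # first human body pose

# Stub wiring that only demonstrates the data flow between modules.
est = PoseEstimator1300(
    acquire_images=lambda: ("color", "depth"),
    build_mesh=lambda c: f"mesh({c})",
    build_point_cloud=lambda d: f"cloud({d})",
    register_and_optimize=lambda pc, m: f"pose[{pc},{m}]",
)
print(est.estimate())  # pose[cloud(depth),mesh(color)]
```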
It should be noted that, because the content of information interaction and execution process between the above devices/units is based on the same concept as the method embodiment of the present application, specific functions and technical effects thereof may be referred to in the method embodiment section, and will not be described herein.
It will be apparent to those skilled in the art that, for convenience and brevity of description, the above division into functional units and modules is merely illustrative; in practical applications, the above functions may be allocated to different functional units and modules as needed, i.e., the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit; the integrated units may be implemented in the form of hardware or in the form of software functional units. In addition, the specific names of the functional units and modules are used only to distinguish them from one another and do not limit the protection scope of the present application. For the specific working process of the units and modules in the above system, reference may be made to the corresponding process in the foregoing method embodiments, which is not repeated here.
Referring to fig. 4, the embodiment of the application further provides an electronic device. As shown in fig. 4, the electronic device 1400 includes a processor 1401, a memory 1402, and a computer program stored in the memory 1402 and executable on the processor 1401, the processor 1401 implementing the steps in any of the various method embodiments described above when executing the computer program.
The embodiment of the application also provides electronic equipment. The electronic device may comprise the human body posture estimation apparatus 1300 provided by the foregoing embodiments.
Embodiments of the present application also provide a computer readable storage medium storing a computer program which, when executed by a processor, implements steps for implementing the various method embodiments described above.
Embodiments of the present application further provide a computer program product which, when run on an electronic device, enables the electronic device to carry out the steps of the various method embodiments described above.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on this understanding, the present application may implement all or part of the flow of the methods of the above embodiments by instructing related hardware through a computer program; the computer program may be stored in a computer-readable storage medium, and when executed by a processor, implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include at least: any entity or device capable of carrying the computer program code to an apparatus/electronic device, a recording medium, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, such as a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts that are not described or detailed in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (10)

1. A method of estimating a human body pose, the method comprising:
Acquiring a first color image and a first depth image corresponding to the first color image, wherein the first color image and the first depth image both comprise a target object;
Processing the first color image to obtain a first human body grid model of the target object;
Constructing a first point cloud corresponding to the target object according to the first depth image;
And carrying out global rigid registration on the first point cloud and a first grid point set of the first human body grid model, and optimizing the first human body grid model by utilizing the registered first point cloud so as to obtain the first human body posture of the target object through the optimized first human body grid model.
2. The method of claim 1, wherein after the optimizing the first human mesh model with the registered first point cloud, the method further comprises:
And carrying out, by using the first point cloud, global non-rigid registration with local deformation and bending on the optimized first human body grid model, so as to further refine the optimized first human body grid model to obtain a refined first human body grid model, and acquiring the first human body posture of the target object based on the refined first human body grid model.
3. The method of claim 1, wherein after the acquiring the first color image and the first depth image corresponding to the first color image, the method further comprises:
acquiring a second color image, wherein the second color image comprises the target object;
Determining a current state of the target object according to the second color image and the first color image;
Using the first human body posture in a case that the current state is a static state;
Estimating a second human body posture of the target object according to the second color image and a second depth image in a case that the current state is a motion state; wherein the second depth image is a depth image corresponding to the second color image, the second depth image comprises the target object, and the second human body posture is used to reflect the changed human body posture of the target object.
4. A method according to claim 3, wherein said determining the current state of the target object from the second color image and the first color image comprises:
Determining a first differential image from the second color image and the first color image;
Determining that the current state of the target object is the static state under the condition that the pixel value of the first differential image is within a preset pixel range;
and determining the current state of the target object as the motion state under the condition that the pixel value of the first differential image is out of the preset pixel range.
5. A method according to claim 3, wherein said determining the current state of the target object from the second color image and the first color image comprises:
performing interpolation processing on the first color image and the second color image to obtain a processed first color image and a processed second color image;
determining a second differential image according to the processed second color image and the processed first color image;
determining the number of target pixel points in the second differential image, wherein the target pixel points are pixel points with pixel values outside a preset pixel range;
Determining that the current state of the target object is a static state under the condition that the number of the target pixel points is smaller than a preset number threshold value;
And under the condition that the number of the target pixel points is greater than or equal to the preset number threshold, determining that the current state of the target object is a motion state.
6. A method according to claim 3, wherein said determining the current state of the target object from the second color image and the first color image comprises:
Determining a first differential image from the second color image and the first color image;
Performing interpolation processing on the first color image and the second color image to obtain a processed first color image and a processed second color image, and determining a second difference image according to the processed second color image and the processed first color image;
Determining that the current state of the target object is a static state under the condition that the first differential image meets a first preset condition and the second differential image meets a second preset condition;
Determining that the current state of the target object is a motion state under the condition that the first differential image does not meet the first preset condition and/or the second differential image does not meet the second preset condition;
The first differential image meets a first preset condition, wherein the pixel value of the first differential image is located in a preset pixel range; the second differential image meets a second preset condition that the number of target pixel points, of which pixel values are located outside the preset pixel range, in the second differential image is smaller than a preset number threshold.
7. The method of claim 1, wherein after the acquiring the first color image and the first depth image corresponding to the first color image, the method further comprises:
Acquiring a color image sequence comprising a plurality of third color images, wherein each of the plurality of third color images comprises a target object;
determining the duration of the target object in the static state under the condition that the current state of the target object included in each third color image in the color image sequence is the static state;
determining an interval duration according to the duration;
Starting timing from the acquisition time of the target color image, and updating the first human body posture based on a third depth image to obtain a third human body posture when the timing time reaches the interval time; the target color image is a first frame image of detecting that the target object is in a static state, and the third depth image is a depth image corresponding to a third color image of a last frame in the color image sequence.
8. An apparatus for estimating a human body posture, the apparatus comprising:
The image acquisition module is used for acquiring a first color image and a first depth image corresponding to the first color image, wherein the first color image and the first depth image both comprise a target object;
The grid model acquisition module is used for processing the first color image to obtain a first human body grid model of the target object;
the point cloud construction module is used for constructing a first point cloud corresponding to the target object according to the first depth image;
And the gesture estimation module is used for carrying out global rigid registration on the first point cloud and the first grid point set of the first human body grid model, and optimizing the first human body grid model by utilizing the registered first point cloud so as to obtain the first human body gesture of the target object through the optimized first human body grid model.
9. A system for estimating a human body pose, the system comprising:
The image acquisition device is used for acquiring color images and depth images of the target object;
Processor means for processing the color image and the depth image according to the human body posture estimation method as claimed in any one of claims 1 to 7 to obtain the human body posture of the target object.
10. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program which, when executed, implements the method according to any one of claims 1 to 7.

Priority Applications (1)

CN202410246305.4A — Priority date 2024-03-05, Filing date 2024-03-05 — Human body posture estimation method, device and system

Publications (1)

CN118230352A — Publication date 2024-06-21

Family ID: 91509375

Legal Events

PB01 — Publication
SE01 — Entry into force of request for substantive examination