CN113449570A - Image processing method and device


Info

Publication number
CN113449570A
CN113449570A
Authority
CN
China
Prior art keywords
human body, model, result, value, image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010231605.7A
Other languages
Chinese (zh)
Inventor
甄海洋 (Zhen Haiyang)
周维 (Zhou Wei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Rainbow Software Co ltd
ArcSoft Corp Ltd
Original Assignee
Rainbow Software Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Rainbow Software Co ltd
Priority to CN202010231605.7A (published as CN113449570A)
Priority to PCT/CN2021/080280 (published as WO2021190321A1)
Priority to KR1020227037422A (published as KR20220160066A)
Priority to JP2022558577A (published as JP7448679B2)
Publication of CN113449570A
Legal status: Pending

Classifications

    • G06F18/00 Pattern recognition
    • G06F18/214 Pattern recognition; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N3/04 Neural networks; Architecture, e.g. interconnection topology
    • G06N3/045 Neural networks; Combinations of networks
    • G06T13/20 3D [Three Dimensional] animation
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T7/20 Image analysis; Analysis of motion
    • G06T7/70 Image analysis; Determining position or orientation of objects or cameras
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06T2207/20081 Indexing scheme for image analysis or image enhancement; Training; Learning

Abstract

The invention discloses an image processing method and device. The method comprises: acquiring an original image; performing human body detection on the original image to obtain a human body image; processing the human body image with a trained first model to obtain a processing result of the human body image, wherein the processing result comprises: two-dimensional joint points, three-dimensional joint points, and a Skinned Multi-Person Linear (SMPL) model; and generating a human body model according to the processing result of the human body image. The invention solves the technical problem in the related art of low recognition accuracy in locating two-dimensional and three-dimensional joint points and reconstructing human body parameters.

Description

Image processing method and device
Technical Field
The invention relates to the technical field of computer vision, in particular to an image processing method and device.
Background
Human-body-related techniques in the industry today include human body detection, two-dimensional and three-dimensional joint point localization, segmentation, and the like. For two-dimensional and three-dimensional joint point localization and human body parameter reconstruction, the following schemes are currently available. 1) The image is first processed by a deep learning detector; after detection, the human body region is cropped, two-dimensional joint points are estimated by a deep learning network, and the three-dimensional joint points and the human body pose and shape parameters are then estimated from the two-dimensional joint points. However, estimating three-dimensional joint points from two-dimensional ones suffers from motion ambiguity: the same two-dimensional joint configuration may correspond to three-dimensional joint points at different positions. Moreover, the recognition accuracy of the three-dimensional joint points depends on that of the two-dimensional joint points, resulting in low three-dimensional recognition accuracy. 2) The image is first processed by a deep learning detector; after detection, the human body region is cropped and the three-dimensional joint points are predicted directly by a deep learning network: the joint space is discretized into a three-dimensional voxel grid and the likelihood of each joint occupying each voxel is inferred for training and prediction. However, because three-dimensional joint point samples are difficult to obtain, most training samples are collected in laboratory environments, so robustness to outdoor scenes is poor; in addition, voxel-grid prediction is computationally expensive and offers poor real-time performance. 3) The human body is first detected, the detected picture is then segmented or parsed, and a human body model is finally estimated from the segmentation and parsing results by an optimization method. However, this places excessive demands on the quality of segmentation and parsing, and deviations in their results degrade the human body reconstruction.
No effective solution to the problems in the above schemes has been proposed so far.
Disclosure of Invention
The embodiments of the invention provide an image processing method and device, which at least solve the technical problem in the related art of low recognition accuracy in locating two-dimensional and three-dimensional joint points and reconstructing human body parameters.
According to one aspect of the embodiments of the invention, an image processing method is provided, comprising: acquiring an original image; performing human body detection on the original image to obtain a human body image; processing the human body image with a trained first model to obtain a processing result of the human body image, wherein the processing result comprises: two-dimensional joint points, three-dimensional joint points, and a Skinned Multi-Person Linear (SMPL) model; and generating a human body model according to the processing result of the human body image.
Optionally, the method further comprises: obtaining a plurality of groups of training samples, wherein each group of training samples comprises: a human body image, first label information of the two-dimensional joint points, second label information of the three-dimensional joint points, and parameter values of the SMPL model; training a preset model with the plurality of groups of training samples and obtaining a target loss value of the preset model; stopping training the preset model and determining the preset model as the first model when the target loss value is smaller than a preset value; and continuing to train the preset model with the plurality of groups of training samples when the target loss value is larger than the preset value, until the target loss value is smaller than the preset value.
Optionally, training the preset model with the plurality of groups of training samples and obtaining the target loss value of the preset model comprises: inputting the plurality of groups of training samples into the preset model and obtaining an output result of the preset model, wherein the output result comprises: a first result for the two-dimensional joint points, a second result for the three-dimensional joint points, and a third result for the SMPL model; obtaining a first loss value of the two-dimensional joint points based on the first label information and the first result; obtaining a second loss value of the three-dimensional joint points based on the second label information and the second result; obtaining a third loss value of the SMPL model based on the parameter values and the third result; and obtaining the target loss value based on the first loss value, the second loss value, and the third loss value.
Optionally, the parameter values of the SMPL model are real values acquired by an acquisition device, or adjusted values obtained by adjusting the parameter values acquired by the acquisition device.
Optionally, obtaining the third loss value of the SMPL model based on the parameter values and the third result comprises: obtaining the third loss value based on the parameter values and the third result when the parameter values are real values acquired by the acquisition device; and, when the parameter values are adjusted values obtained by adjusting the parameter values acquired by the acquisition device, obtaining three-dimensional joint points based on the parameter values, projecting the three-dimensional joint points onto a two-dimensional plane to obtain two-dimensional joint points, obtaining a fourth loss value of the two-dimensional joint points based on the projected two-dimensional joint points and the first label information, and determining the fourth loss value as the third loss value.
Optionally, the method further comprises: processing the parameter values of the third result with a discriminator to obtain a classification result of the parameter values of the third result, wherein the classification result characterizes whether the parameter values of the third result are real values acquired by an acquisition device; and determining whether to stop training the preset model based on the classification result and the target loss value.
Optionally, the discriminator is trained with a generative adversarial network.
Optionally, performing human body detection on the original image to obtain the human body image comprises: processing the original image with a trained second model to obtain position information of the human body in the original image; and cropping and normalizing the original image based on the position information to obtain the human body image.
Optionally, the first model adopts an hourglass-type network structure or a feature pyramid network (FPN) structure.
According to another aspect of the embodiments of the invention, an image processing apparatus is also provided, comprising: an obtaining module, configured to obtain an original image; a detection module, configured to perform human body detection on the original image to obtain a human body image; a processing module, configured to process the human body image with the trained first model to obtain a processing result of the human body image, wherein the processing result comprises: two-dimensional joint points, three-dimensional joint points, and a Skinned Multi-Person Linear (SMPL) model; and a generating module, configured to generate a human body model according to the processing result of the human body image.
According to another aspect of the embodiments of the present invention, there is also provided a storage medium including a stored program, wherein when the program is executed, an apparatus in which the storage medium is located is controlled to execute the above-mentioned image processing method.
According to another aspect of the embodiments of the present invention, there is also provided a processor, configured to execute a program, where the program executes the image processing method described above.
In the embodiments of the invention, after the original image is obtained, human body detection is first performed on it to obtain a human body image, and the trained first model then processes the human body image to obtain its processing result, so that human body detection, two-dimensional and three-dimensional joint point localization, and SMPL model estimation are accomplished together, from which the human body model can be generated. Notably, because a single model yields the two-dimensional joint points, the three-dimensional joint points, and the SMPL model simultaneously, the three-dimensional joint points need not be estimated from the two-dimensional ones, which improves image recognition accuracy and thereby solves the technical problem in the related art of low recognition accuracy in locating two-dimensional and three-dimensional joint points and reconstructing human body parameters.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flow chart of an image processing method according to an embodiment of the invention;
FIG. 2 is a schematic illustration of an alternative human body image according to an embodiment of the invention;
FIG. 3 is a schematic illustration of an alternative mannequin according to an embodiment of the present invention;
FIG. 4a is a schematic illustration of an alternative average shaped mannequin in accordance with an embodiment of the present invention;
FIG. 4b is a schematic diagram of an alternative human model generated with shape parameters added thereto according to an embodiment of the present invention;
FIG. 4c is a schematic diagram of an alternative human model generated after adding shape parameters and pose parameters according to an embodiment of the invention;
FIG. 4d is a schematic diagram of an alternative human model generated from detected human motion according to an embodiment of the invention;
FIG. 5 is a flow diagram of an alternative image processing method according to an embodiment of the invention;
FIG. 6 is a schematic diagram of an alternative GAN network in accordance with an embodiment of the present invention; and
FIG. 7 is a schematic diagram of an image processing apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
According to an embodiment of the present invention, an image processing method is provided. It should be noted that the steps shown in the flowchart of the drawings may be performed in a computer system, such as one executing a set of computer-executable instructions, and that, although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in a different order.
Fig. 1 is a flowchart of an image processing method according to an embodiment of the present invention, as shown in fig. 1, the method including the steps of:
step S102, obtaining an original image;
specifically, the original image may be an image obtained by capturing input video stream data, or may be an image directly obtained, where the original image includes a human body.
Step S104, carrying out human body detection on the original image to obtain a human body image;
specifically, the human body image may be a minimum image extracted from the original image and including a complete human body region, as shown in fig. 2.
In an alternative embodiment, human body detection can be performed with a deep learning model, such as Faster R-CNN (Faster Region-based Convolutional Neural Network), YOLO (You Only Look Once), SSD (Single Shot Detector), or other detection frameworks and their variants. As known to those skilled in the art, different detection frameworks can be selected for different devices and application scenarios to quickly and accurately detect the human body and obtain the human body image.
Optionally, performing human body detection on the original image to obtain the human body image comprises: processing the original image with the trained deep learning model to obtain the position information of the human body in the original image; and cropping and normalizing the original image based on the position information to obtain the human body image. The human body position can be represented by the minimum bounding rectangle containing the complete human body region in the original image, expressed as two-dimensional coordinates (left, top, bottom, right).
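As a minimal sketch of this cropping and normalization step, the snippet below assumes a detector has already produced the (left, top, bottom, right) rectangle; the function name, output size, and the choice of bilinear resizing and [0, 1] scaling are illustrative assumptions, not the patent's prescribed implementation:

```python
import cv2
import numpy as np

def crop_and_normalize(original, box, out_size=(256, 256)):
    """Crop the detected human region and normalize it for the first model.

    original: H x W x 3 uint8 image; box: (left, top, bottom, right), the
    minimum bounding rectangle returned by the detection model.
    """
    left, top, bottom, right = box
    crop = original[top:bottom, left:right]
    # Resize to a fixed input size (bilinear resize; an assumed choice).
    crop = cv2.resize(crop, out_size)
    # Scale pixel values to [0, 1]; mean/std normalization is also common.
    return crop.astype(np.float32) / 255.0
```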
Step S106, processing the human body image by using the trained first model to obtain a processing result of the human body image, wherein the processing result comprises: two-dimensional joint points, three-dimensional joint points, and an SMPL (Skinned Multi-Person Linear) model.
Alternatively, the first model may adopt an hourglass-type network structure or an FPN (Feature Pyramid Network) structure. For example, when the input is a w × h image, the output feature map may be a w × h or w/4 × h/4 image.
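One way to realize such a first model is a shared backbone with three task heads, one per output. The PyTorch sketch below is an assumed arrangement: the head shapes, the 64 depth bins for the 3D volumes, and the 82-dimensional SMPL parameter vector (10 shape + 72 pose values) are illustrative choices, and the backbone (hourglass or FPN) is passed in rather than defined here:

```python
import torch.nn as nn

class FirstModel(nn.Module):
    """Shared backbone with heads for 2D joints, 3D joints, and SMPL parameters."""
    def __init__(self, backbone, feat_dim=256, num_joints=16, smpl_dim=82):
        super().__init__()
        self.backbone = backbone                                 # hourglass or FPN
        self.head_2d = nn.Conv2d(feat_dim, num_joints, 1)        # 2D heat maps
        self.head_3d = nn.Conv2d(feat_dim, num_joints * 64, 1)   # 3D volumes, 64 depth bins
        self.head_smpl = nn.Sequential(                          # SMPL parameter regression
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(feat_dim, smpl_dim))

    def forward(self, x):
        feat = self.backbone(x)                                  # (N, feat_dim, h, w)
        return self.head_2d(feat), self.head_3d(feat), self.head_smpl(feat)
```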
Specifically, the above-described joint point may be a position coordinate of each joint on the human body, such as a wrist, an elbow, or the like, as shown in fig. 2.
The two-dimensional joint points can be expressed in the form of a heat map or in the form of a coordinate vector. In the heat-map form, each joint point is represented as a feature map: assuming the input human body image is w × h, the output feature map is an image of the same size or scaled in equal proportion, whose value is 1 at the joint position and 0 everywhere else. In one example, when the human body has 16 two-dimensional joint points, 16 feature maps of size w × h, w/2 × h/2, or smaller may be used to represent them.
The three-dimensional joint points likewise have heat-map and coordinate-vector representations; in the heat-map form, compared with a two-dimensional joint point, a three-dimensional joint point adds z-axis information in three-dimensional space, extending the heat map into a cuboid.
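The heat-map construction just described can be written down directly; the sketch below builds the 1-at-the-joint, 0-elsewhere maps from the text (in practice a small Gaussian blob around the joint is a common softer alternative; the function names are illustrative):

```python
import numpy as np

def heatmap_2d(x, y, w, h):
    """2D heat map as described: 1 at the joint location, 0 elsewhere."""
    hm = np.zeros((h, w), dtype=np.float32)
    hm[int(y), int(x)] = 1.0
    return hm

def heatmap_3d(x, y, z, w, h, d):
    """3D version: z-axis information extends the heat map into a cuboid."""
    vol = np.zeros((d, h, w), dtype=np.float32)
    vol[int(z), int(y), int(x)] = 1.0
    return vol
```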
In an alternative embodiment, the human body image may be processed by the first model to obtain the parameter values of the SMPL model, and the two-dimensional or three-dimensional joint points may then be obtained from those parameter values.
Step S108, generating a human body model according to the processing result of the human body image.
As shown in fig. 3, the SMPL model may include shape parameters and pose parameters, and the human body model generated from them may include a plurality of vertices and the three-dimensional joint points, each vertex being a three-dimensional vector of (x, y, z) coordinates. Fig. 4a to 4c show the process of generating a human body model from shape and pose parameters: fig. 4a shows a human body model of average shape, fig. 4b shows the model generated by adding shape parameters to the average shape, and fig. 4c shows the model generated by adding both shape and pose parameters. Fig. 4d shows, in addition to the model of fig. 4c, a human body model generated from the detected human motion. Comparing fig. 4b and 4c, the difference between the two is not large; therefore, in some applications, human body modeling can be achieved by generating the model from the shape parameters alone.
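The following numpy sketch shows, under simplifying assumptions, how shape parameters deform an average-shape template and how three-dimensional joints are regressed from the vertices. It omits SMPL's pose blend shapes and linear blend skinning (which apply the pose parameters), so it corresponds roughly to the fig. 4a-to-4b step; the array shapes follow the published SMPL model (10 shape parameters), but the function names are illustrative:

```python
import numpy as np

def shaped_vertices(template, shape_dirs, betas):
    """Apply shape blend shapes to the average-shape template (fig. 4a -> 4b).

    template:   (V, 3) vertices of the average-shape human body model
    shape_dirs: (V, 3, 10) shape blend-shape basis
    betas:      (10,) shape parameters
    """
    return template + shape_dirs @ betas        # (V, 3)

def regress_joints(vertices, joint_regressor):
    """3D joints as a sparse linear combination of mesh vertices.

    joint_regressor: (J, V) regression weights; returns (J, 3) joints.
    """
    return joint_regressor @ vertices
```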
According to the embodiments of the invention, after the original image is obtained, human body detection is first performed on it to obtain a human body image, and the trained first model then processes the human body image to obtain its processing result, so that human body detection, two-dimensional and three-dimensional joint point localization, and SMPL model estimation are accomplished together, from which the human body model can be generated. Notably, because a single model yields the two-dimensional joint points, the three-dimensional joint points, and the SMPL model simultaneously, the three-dimensional joint points need not be estimated from the two-dimensional ones, which improves image recognition accuracy and thereby solves the technical problem in the related art of low recognition accuracy in locating two-dimensional and three-dimensional joint points and reconstructing human body parameters.
In a first application scenario, human motion can be detected in real time to drive a human body animation (AVATAR) model: for example, the motion is captured from the two-dimensional and three-dimensional joint points so that the animated model follows the human motion, enabling real-time interaction.
In a second application scenario, the two-dimensional and three-dimensional joint points in the processing result can be used to edit the human body, for example by processing the image pixels at the corresponding arm, leg, and torso positions in the human body image, achieving slimming effects for the arms, legs, and waist.
Optionally, in the above embodiments of the invention, the image processing method further comprises: obtaining a plurality of groups of training samples, wherein each group of training samples comprises: a human body image, first label information of the two-dimensional joint points, second label information of the three-dimensional joint points, and parameter values of the SMPL model; training a preset model with the plurality of groups of training samples and obtaining a target loss value of the preset model; stopping training the preset model and determining it as the first model when the target loss value is smaller than a preset value; and continuing to train the preset model with the plurality of groups of training samples when the target loss value is larger than the preset value, until it is smaller than the preset value. The smaller the target loss value, the higher the recognition accuracy; the preset value can be set in advance according to the required recognition accuracy and efficiency, and it determines whether the model has finished training.
Optionally, training the preset model with the plurality of groups of training samples and obtaining its target loss value comprises: inputting the plurality of groups of training samples into the preset model and obtaining its output result, wherein the output result comprises: a first result for the two-dimensional joint points, a second result for the three-dimensional joint points, and a third result for the SMPL model; obtaining a first loss value of the two-dimensional joint points based on the first label information and the first result; obtaining a second loss value of the three-dimensional joint points based on the second label information and the second result; obtaining a third loss value of the SMPL model based on the parameter values and the third result; and obtaining the target loss value based on the first, second, and third loss values.
Optionally, in the foregoing embodiment of the present invention, the image processing method further includes labeling the training sample with first label information of a two-dimensional joint point and second label information of a three-dimensional joint point.
In an alternative embodiment, for the two-dimensional joint points, the first loss value may be obtained from the predicted heat map (i.e., the first result) and the labeled heat map (i.e., the first label information), or from the predicted coordinate vector and the labeled coordinate vector, or from combined heat-map and coordinate-vector information.
For the three-dimensional joint points, likewise, the second loss value may be obtained from the predicted heat map (i.e., the second result) and the labeled heat map (i.e., the second label information), or from the predicted coordinate vector and the labeled coordinate vector, or from combined heat-map and coordinate-vector information.
Compared with the heat-map form, the coordinate-vector form is more convenient to compute.
Alternatively, the parameter values of the SMPL model may be real values acquired by an acquisition device, or adjusted values obtained by adjusting the parameter values acquired by the acquisition device. In an alternative embodiment, the parameter values of the SMPL model may be predicted with the real values given a higher weight and the adjusted values given a lower weight.
Specifically, the above-mentioned acquisition device may be a camera or a sensor disposed at a plurality of fixed positions in a laboratory environment or an outdoor environment.
Accurate, real SMPL parameter values can only be obtained for data collected in a laboratory environment; for data collected outdoors there is no way to obtain accurate SMPL parameter values. Therefore, in actual computation, the third loss value of the SMPL model may be calculated in different ways depending on the type of parameter value. Optionally, when the parameter values are real values acquired by the acquisition device, the third loss value may be calculated by direct regression, i.e., from the parameter values and the third result. When the parameter values are adjusted values obtained by adjusting the parameter values acquired by the acquisition device, the three-dimensional joint points can be obtained from the SMPL parameter values and projected onto a two-dimensional plane to obtain two-dimensional joint points; a fourth loss value of the two-dimensional joint points is then calculated from the projected two-dimensional joint points and the first label information, used as the third loss value, and propagated back into the SMPL parameter space to update the SMPL parameter values.
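A common way to realize this projection step is a weak-perspective camera; the patent only states that the three-dimensional joints are projected onto a two-dimensional plane, so the camera model, its scale and translation parameters, and the mean-squared-error distance below are all assumptions for illustration:

```python
import torch

def project_to_2d(joints3d, scale, trans):
    """Weak-perspective projection of (J, 3) joints onto the image plane.

    scale is a scalar and trans a (2,) translation; both would normally be
    predicted alongside the SMPL parameters.
    """
    return scale * joints3d[:, :2] + trans                      # (J, 2)

def reprojection_loss(joints3d, scale, trans, joints2d_label):
    """Fourth loss value: distance between the projected 2D joints and the
    first label information; used as the third (SMPL) loss for adjusted values."""
    proj = project_to_2d(joints3d, scale, trans)
    return torch.mean((proj - joints2d_label) ** 2)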
During training, the target loss value is a combination of the first, second, and third loss values and can be calculated as their weighted sum.
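Putting the pieces together, the sketch below computes the weighted-sum target loss and stops training once it falls below the preset value. The mean-squared-error losses, the equal default weights, and the data-loader interface are assumptions; the stopping rule follows the description above:

```python
import torch.nn.functional as F

def target_loss(out2d, out3d, out_smpl, gt2d, gt3d, gt_smpl,
                w1=1.0, w2=1.0, w3=1.0):
    """Weighted sum of the three per-task losses (weights are assumed)."""
    loss1 = F.mse_loss(out2d, gt2d)          # first loss value: 2D joints
    loss2 = F.mse_loss(out3d, gt3d)          # second loss value: 3D joints
    loss3 = F.mse_loss(out_smpl, gt_smpl)    # third loss value: SMPL parameters
    return w1 * loss1 + w2 * loss2 + w3 * loss3

def train(model, loader, optimizer, preset_value):
    """Train until the target loss value drops below the preset value."""
    while True:
        for image, gt2d, gt3d, gt_smpl in loader:
            loss = target_loss(*model(image), gt2d, gt3d, gt_smpl)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            if loss.item() < preset_value:   # preset model becomes the first model
                return model
```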
In an alternative embodiment, during model training, the two-dimensional joint points, the three-dimensional joint points, and the SMPL model parameters may be learned simultaneously, with the model regressed and generated as a whole. In addition, as shown in fig. 5, an SMPL model discriminator may be used to judge whether the SMPL parameter values are values randomly generated by the network or real acquired values, thereby improving the realism of the model output. Specifically, the SMPL model discriminator processes the parameter values of the third result (i.e., the SMPL model output by the preset model) to obtain a classification result characterizing whether those parameter values are real values acquired by the acquisition device, and whether to stop training the preset model is determined based on the classification result and the target loss value. The D discriminator of a generative adversarial network (GAN) can be adopted as the SMPL model discriminator.
In an alternative embodiment, since data collected outdoors cannot yield accurate SMPL parameter values and may therefore produce abnormal ones, the embodiment of the invention adds a GAN to train the SMPL model discriminator (i.e., the D discriminator). As shown in fig. 6, the GAN comprises a G generator and a D discriminator. The D discriminator is a binary classification network that receives both values randomly generated by the G generator and real acquired values, and outputs a label indicating the authenticity of the data: when a real value is received, the output is close to the positive label (usually set to 1); when a randomly generated value from the G generator is received, the output is close to the negative label (usually set to 0). The D discriminator thus describes the difference between randomly generated values and real values, and the weights of the G generator are updated according to this difference, so that its randomly generated values come closer to real values while the D discriminator's ability to distinguish them improves.
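A minimal sketch of such a D discriminator and one training step follows; the hidden-layer width, the binary cross-entropy loss with 1/0 labels, and the 82-dimensional SMPL parameter vector are illustrative assumptions consistent with the fig. 6 description rather than the patent's exact network:

```python
import torch
import torch.nn as nn

class SMPLDiscriminator(nn.Module):
    """Binary network over SMPL parameter vectors: ~1 for real acquired
    values, ~0 for values randomly generated by the G generator."""
    def __init__(self, smpl_dim=82):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(smpl_dim, 256), nn.ReLU(),
            nn.Linear(256, 1), nn.Sigmoid())

    def forward(self, params):
        return self.net(params)

def discriminator_step(disc, real_params, fake_params, optimizer):
    """One update of the D discriminator: push real values toward the
    positive label (1) and generated values toward the negative label (0)."""
    bce = nn.BCELoss()
    loss = (bce(disc(real_params), torch.ones(len(real_params), 1))
            + bce(disc(fake_params.detach()), torch.zeros(len(fake_params), 1)))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```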
Example 2
According to an embodiment of the present invention, an image processing apparatus is provided, which can execute the image processing method described in Embodiment 1; the preferred implementations and application scenarios of this embodiment are the same as those of Embodiment 1 and are not repeated here.
Fig. 7 is a schematic diagram of an image processing apparatus according to an embodiment of the present invention, as shown in fig. 7, the apparatus including:
an obtaining module 72, configured to obtain an original image;
the detection module 74 is used for performing human body detection on the original image to obtain a human body image;
a processing module 76, configured to process the human body image by using the trained first model to obtain a processing result of the human body image, where the processing result includes: two-dimensional joint points, three-dimensional joint points, and parameter values of the SMPL model;
and a generating module 78, configured to generate a human body model according to the processing result of the human body image.
Optionally, in the above embodiments of the invention, the apparatus further comprises a training module and a training-stopping module. The obtaining module is further configured to obtain a plurality of groups of training samples, where each group of training samples comprises: a human body image, first label information of the two-dimensional joint points, second label information of the three-dimensional joint points, and parameter values of the SMPL model. The training module is configured to train the preset model with the plurality of groups of training samples and obtain a target loss value of the preset model. The training-stopping module is configured to stop training the preset model and determine it as the first model when the target loss value is smaller than the preset value. The training module is further configured to continue training the preset model with the plurality of groups of training samples when the target loss value is larger than the preset value, until it is smaller than the preset value.
Optionally, the training module comprises: an obtaining unit, configured to input the plurality of groups of training samples into the preset model and obtain its output result, where the output result comprises: a first result for the two-dimensional joint points, a second result for the three-dimensional joint points, and a third result for the SMPL model; a first processing unit, configured to obtain a first loss value of the two-dimensional joint points based on the first label information and the first result; a second processing unit, configured to obtain a second loss value of the three-dimensional joint points based on the second label information and the second result; a third processing unit, configured to obtain a third loss value of the SMPL model based on the parameter values and the third result; and a fourth processing unit, configured to obtain the target loss value based on the first, second, and third loss values.
Optionally, the third processing unit is further configured to obtain the third loss value based on the parameter values and the third result when the parameter values are real values acquired by the acquisition device; and, when the parameter values are adjusted values obtained by adjusting the parameter values acquired by the acquisition device, to obtain three-dimensional joint points based on the parameter values, project them onto a two-dimensional plane to obtain two-dimensional joint points, obtain a fourth loss value of the two-dimensional joint points based on the projected two-dimensional joint points and the first label information, and determine the fourth loss value as the third loss value.
Optionally, the apparatus further comprises: the processing module is further configured to process the parameter values of the third result with a discriminator to obtain a classification result characterizing whether those parameter values are real values acquired by the acquisition device; and the training-stopping module is further configured to determine, based on the classification result and the target loss value, whether to stop training the preset model.
Optionally, the training module is further configured to train the discriminator with a generative adversarial network.
Optionally, the detection module comprises: a detection unit, configured to process the original image with the trained second model to obtain the position information of the human body in the original image; and a fifth processing unit, configured to crop and normalize the original image based on the position information to obtain the human body image.
Example 3
According to an embodiment of the present invention, there is provided a storage medium including a stored program, wherein an apparatus in which the storage medium is located is controlled to execute the image processing method in embodiment 1 described above when the program is executed.
Example 4
According to an embodiment of the present invention, there is provided a processor configured to execute a program, where the program executes the image processing method in embodiment 1.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and decorations can be made without departing from the principle of the present invention, and these modifications and decorations should also be regarded as the protection scope of the present invention.

Claims (12)

1. An image processing method, comprising:
acquiring an original image;
carrying out human body detection on the original image to obtain a human body image;
processing the human body image by using the trained first model to obtain a processing result of the human body image, wherein the processing result comprises: two-dimensional joint points, three-dimensional joint points, and a Skinned Multi-Person Linear (SMPL) model;
and generating a human body model according to the processing result of the human body image.
2. The method of claim 1, further comprising:
obtaining a plurality of groups of training samples, wherein each group of training samples comprises: a human body image, first label information of the two-dimensional joint points, second label information of the three-dimensional joint points, and parameter values of an SMPL model;
training a preset model by using the multiple groups of training samples, and acquiring a target loss value of the preset model;
stopping training the preset model and determining the preset model as the first model under the condition that the target loss value is smaller than a preset value;
and under the condition that the target loss value is larger than the preset value, continuing to train the preset model by using the multiple groups of training samples until the target loss value is smaller than the preset value.
3. The method of claim 2, wherein training a preset model with the plurality of sets of training samples and obtaining a target loss value of the preset model comprises:
inputting the plurality of groups of training samples into the preset model, and obtaining an output result of the preset model, wherein the output result comprises: a first result of the two-dimensional joint points, a second result of the three-dimensional joint points, and a third result of the SMPL model;
obtaining a first loss value of the two-dimensional joint points based on the first label information and the first result;
obtaining a second loss value of the three-dimensional joint points based on the second label information and the second result;
obtaining a third loss value of the SMPL model based on the parameter values and the third result;
and obtaining the target loss value based on the first loss value, the second loss value, and the third loss value.
4. The method of claim 3, wherein the parameter values of the SMPL model are real values acquired by an acquisition device or adjusted values obtained by adjusting the parameter values acquired by the acquisition device.
5. The method of claim 4, wherein deriving a third loss value for the SMPL model based on the parameter value and the third result comprises:
obtaining the third loss value based on the parameter value and the third result under the condition that the parameter value is the real value acquired by the acquisition device;
and under the condition that the parameter value is an adjusted value obtained by adjusting the parameter value acquired by the acquisition device, obtaining a three-dimensional joint point based on the parameter value, projecting the three-dimensional joint point onto a two-dimensional plane to obtain a two-dimensional joint point, obtaining a fourth loss value of the two-dimensional joint point based on the projected two-dimensional joint point and the first label information, and determining the fourth loss value as the third loss value.
6. The method of claim 3, further comprising:
processing the parameter value of the third result by using a discriminator to obtain a classification result of the parameter value of the third result, wherein the classification result is used for representing whether the parameter value of the third result is a real numerical value acquired by an acquisition device;
and determining whether to stop training the preset model or not based on the classification result and the target loss value.
7. The method of claim 6, wherein the discriminator is trained using a generative adversarial network.
8. The method of claim 1, wherein the human body detection of the original image, and obtaining the human body image comprises:
processing the original image by using the trained second model to obtain the position information of the human body in the original image;
and cutting and normalizing the original image based on the position information to obtain the human body image.
9. The method of claim 1, wherein the first model employs an hourglass-type network structure or a feature pyramid network (FPN) structure.
10. An image processing apparatus characterized by comprising:
the acquisition module is used for acquiring an original image;
the detection module is used for carrying out human body detection on the original image to obtain a human body image;
the processing module is configured to process the human body image by using the trained first model to obtain a processing result of the human body image, where the processing result includes: two-dimensional joint points, three-dimensional joint points, and a Skinned Multi-Person Linear (SMPL) model;
and the generating module is used for generating a human body model according to the processing result of the human body image.
11. A storage medium, characterized in that the storage medium includes a stored program, wherein an apparatus in which the storage medium is located is controlled to execute the image processing method according to any one of claims 1 to 9 when the program is executed.
12. A processor, characterized in that the processor is configured to run a program, wherein the program is configured to execute the image processing method according to any one of claims 1 to 9 when running.
CN202010231605.7A 2020-03-27 2020-03-27 Image processing method and device Pending CN113449570A (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN202010231605.7A CN113449570A (en) 2020-03-27 2020-03-27 Image processing method and device
PCT/CN2021/080280 WO2021190321A1 (en) 2020-03-27 2021-03-11 Image processing method and device
KR1020227037422A KR20220160066A (en) 2020-03-27 2021-03-11 Image processing method and apparatus
JP2022558577A JP7448679B2 (en) 2020-03-27 2021-03-11 Image processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010231605.7A CN113449570A (en) 2020-03-27 2020-03-27 Image processing method and device

Publications (1)

Publication Number Publication Date
CN113449570A true CN113449570A (en) 2021-09-28

Family

ID=77808126

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010231605.7A Pending CN113449570A (en) 2020-03-27 2020-03-27 Image processing method and device

Country Status (4)

Country Link
JP (1) JP7448679B2 (en)
KR (1) KR20220160066A (en)
CN (1) CN113449570A (en)
WO (1) WO2021190321A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114299204B (en) * 2021-12-22 2023-04-18 深圳市海清视讯科技有限公司 Three-dimensional cartoon character model generation method and device
CN114157526B (en) * 2021-12-23 2022-08-12 广州新华学院 Digital image recognition-based home security remote monitoring method and device
CN115482557B (en) * 2022-10-09 2023-11-17 中国电信股份有限公司 Human body image generation method, system, equipment and storage medium
CN115775300A (en) * 2022-12-23 2023-03-10 北京百度网讯科技有限公司 Reconstruction method of human body model, training method and device of human body reconstruction model
CN117351432B (en) * 2023-12-04 2024-02-23 环球数科集团有限公司 Training system for multi-target recognition model of scenic spot tourist

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108053469A (en) * 2017-12-26 2018-05-18 清华大学 Complicated dynamic scene human body three-dimensional method for reconstructing and device under various visual angles camera
CN109285215A (en) * 2018-08-28 2019-01-29 腾讯科技(深圳)有限公司 A kind of human 3d model method for reconstructing, device and storage medium
CN109615582A (en) * 2018-11-30 2019-04-12 北京工业大学 A kind of face image super-resolution reconstruction method generating confrontation network based on attribute description
CN109859296A (en) * 2019-02-01 2019-06-07 腾讯科技(深圳)有限公司 Training method, server and the storage medium of SMPL parametric prediction model
CN110020633A (en) * 2019-04-12 2019-07-16 腾讯科技(深圳)有限公司 Training method, image-recognizing method and the device of gesture recognition model
CN110298916A (en) * 2019-06-21 2019-10-01 湖南大学 A kind of 3 D human body method for reconstructing based on synthesis depth data

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140204013A1 (en) * 2013-01-18 2014-07-24 Microsoft Corporation Part and state detection for gesture recognition
JP6373026B2 (en) * 2014-03-20 2018-08-15 株式会社東芝 Image processing apparatus, image processing system, image processing method, and program
JP6912215B2 (en) * 2017-02-09 2021-08-04 国立大学法人東海国立大学機構 Detection method and detection program to detect the posture of an object
CN108345869B (en) * 2018-03-09 2022-04-08 南京理工大学 Driver posture recognition method based on depth image and virtual data
JP2020030613A (en) * 2018-08-22 2020-02-27 富士通株式会社 Information processing device, data calculating program and data calculating method
CN109702741B (en) * 2018-12-26 2020-12-18 中国科学院电子学研究所 Mechanical arm vision grasping system and method based on self-supervision learning neural network
CN110188598B (en) * 2019-04-13 2022-07-05 大连理工大学 Real-time hand posture estimation method based on MobileNet-v2

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108053469A (en) * 2017-12-26 2018-05-18 清华大学 Complicated dynamic scene human body three-dimensional method for reconstructing and device under various visual angles camera
CN109285215A (en) * 2018-08-28 2019-01-29 腾讯科技(深圳)有限公司 A kind of human 3d model method for reconstructing, device and storage medium
CN109615582A (en) * 2018-11-30 2019-04-12 北京工业大学 A kind of face image super-resolution reconstruction method generating confrontation network based on attribute description
CN109859296A (en) * 2019-02-01 2019-06-07 腾讯科技(深圳)有限公司 Training method, server and the storage medium of SMPL parametric prediction model
CN110020633A (en) * 2019-04-12 2019-07-16 腾讯科技(深圳)有限公司 Training method, image-recognizing method and the device of gesture recognition model
CN110298916A (en) * 2019-06-21 2019-10-01 湖南大学 A kind of 3 D human body method for reconstructing based on synthesis depth data

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A. Kanazawa et al.: "End-to-End Recovery of Human Shape and Pose", 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition *
Matthew Loper et al.: "SMPL: A Skinned Multi-Person Linear Model", ACM Transactions on Graphics *

Also Published As

Publication number Publication date
KR20220160066A (en) 2022-12-05
JP2023519012A (en) 2023-05-09
WO2021190321A1 (en) 2021-09-30
JP7448679B2 (en) 2024-03-12


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination