WO2021190321A1

WO2021190321A1 - Image processing method and device

Info

Publication number: WO2021190321A1
Application number: PCT/CN2021/080280
Authority: WO
Inventors: 甄海洋; 周维
Original assignee: 虹软科技股份有限公司
Priority date: 2020-03-27
Filing date: 2021-03-11
Publication date: 2021-09-30
Also published as: CN113449570A; JP7448679B2; JP2023519012A; KR20220160066A

Abstract

An image processing method and device. The method comprises: acquiring an original image (S102); performing human body detection on the original image to obtain a human body image (S104); performing processing on the human body image by using a trained first model to obtain a processing result of the human body image, wherein the processing result comprises: a two-dimensional articulation point, a three-dimensional articulation point, and a skinned multi-person linear (SMPL) model (S106); and generating a human body model according to the processing result of the human body image (S108). The method solves the technical problem of low recognition accuracy for positioning of the two-dimensional and three-dimensional articulation points and reconstruction of a human body parameter.

Description

Image processing method and device

This application claims priority to Chinese patent application No. 202010231605.7, entitled "Image Processing Method and Apparatus" and filed with the Chinese Patent Office on March 27, 2020, the entire content of which is incorporated into this application by reference.

Technical field

This application relates to the field of computer vision technology, and specifically to an image processing method and device.

Background

At present, human-body-related technologies in the industry include human body detection, two-dimensional and three-dimensional joint point positioning, segmentation, and so on. For two-dimensional and three-dimensional joint point positioning and the reconstruction of human body parameter coefficients, the following schemes are currently available:

1) Human body detection is first performed on the image using a deep learning scheme. After detection, the human body region is cropped out, a deep learning network estimates the two-dimensional joint points, and the two-dimensional joint points are then used to estimate the three-dimensional joint points and the posture and shape parameters of the human body. However, estimating three-dimensional joint points from two-dimensional joint points makes actions ambiguous: for example, two-dimensional joint points in the same state can correspond to different three-dimensional joint points, in front or behind. In addition, the recognition accuracy of the three-dimensional joint points depends on that of the two-dimensional joint points, which results in low recognition accuracy for the three-dimensional joint points.

2) Human body detection is first performed on the image using a deep learning scheme. After detection, the human body region is cropped out, and a deep learning network directly predicts the three-dimensional joint points, which are converted into a three-dimensional voxel grid so that the likelihood of each joint in each voxel can be inferred for training and prediction. However, because samples of three-dimensional joint points are difficult to obtain, most training samples are collected in a laboratory environment, so the scheme is not robust to outdoor scenes. Moreover, prediction over a voxel grid requires a large amount of computation and has poor real-time performance.

3) The human body is detected first, the detected image is then segmented or parsed, and the segmentation and parsing results are used to estimate the human body model through an optimization method. However, this places high demands on human body segmentation and parsing, and any deviation in their results degrades the human body reconstruction.

In view of the problems in the above-mentioned schemes, no effective solutions have yet been proposed.

Summary of the invention

At least some embodiments of the present application provide an image processing method and device to at least solve the technical problem of low recognition accuracy for two-dimensional and three-dimensional joint point positioning and human body parameter reconstruction in related technologies.

According to one aspect of the embodiments of the present application, an image processing method is provided, which includes: obtaining an original image; performing human body detection on the original image to obtain a human body image; using a trained first model to process the human body image to obtain a processing result of the human body image, where the processing result includes: two-dimensional joint points, three-dimensional joint points, and a skinned multi-person linear (SMPL) model; and generating a human body model according to the processing result of the human body image.

Optionally, the method further includes: obtaining multiple sets of training samples, where each set of training samples includes: a human body image, first label information of two-dimensional joint points, second label information of three-dimensional joint points, and parameter values of the SMPL model; using the multiple sets of training samples to train a preset model and obtaining a target loss value of the preset model; when the target loss value is less than a preset value, stopping training of the preset model and determining that the preset model is the first model; and when the target loss value is greater than the preset value, continuing to use the multiple sets of training samples to train the preset model until the target loss value is less than the preset value.

Optionally, using multiple sets of training samples to train the preset model and obtaining the target loss value of the preset model includes: inputting the multiple sets of training samples into the preset model and obtaining an output result of the preset model, where the output result includes: a first result for the two-dimensional joint points, a second result for the three-dimensional joint points, and a third result for the SMPL model; obtaining a first loss value of the two-dimensional joint points based on the first label information and the first result; obtaining a second loss value of the three-dimensional joint points based on the second label information and the second result; obtaining a third loss value of the SMPL model based on the parameter values and the third result; and obtaining the target loss value based on the first loss value, the second loss value, and the third loss value.

Optionally, the parameter values of the SMPL model are real data collected by the collecting device, or adjusted data obtained by adjusting the parameter values collected by the collecting device.

Optionally, obtaining the third loss value of the SMPL model based on the parameter value and the third result includes: when the parameter value is a real value collected by the collecting device, obtaining the third loss value based on the parameter value and the third result; and when the parameter value is an adjusted value obtained by adjusting the parameter value collected by the collecting device, obtaining three-dimensional joint points based on the parameter value, projecting the three-dimensional joint points onto a two-dimensional plane to obtain two-dimensional joint points, obtaining a fourth loss value of the two-dimensional joint points based on the projected two-dimensional joint points and the first label information, and determining the fourth loss value as the third loss value.

Optionally, the method further includes: using a discriminator to process the parameter value of the third result to obtain a classification result of the parameter value of the third result, where the classification result characterizes whether the parameter value of the third result is a real value collected by the collecting device; and determining, based on the classification result and the target loss value, whether to stop training the preset model.

Optionally, the discriminator is trained using a generative adversarial network.

Optionally, performing human body detection on the original image to obtain the human body image includes: processing the original image by using a trained second model to obtain position information of the human body in the original image; and cropping and normalizing the original image based on the position information to obtain the human body image.

Optionally, the first model adopts an hourglass network structure or a feature pyramid network (FPN) structure.

According to another aspect of the embodiments of the present application, an image processing device is also provided, including: an acquisition module configured to acquire an original image; a detection module configured to perform human body detection on the original image to obtain a human body image; a processing module configured to use the trained first model to process the human body image to obtain a processing result of the human body image, where the processing result includes: two-dimensional joint points, three-dimensional joint points, and a skinned multi-person linear (SMPL) model; and a generation module configured to generate a human body model according to the processing result of the human body image.

According to another aspect of the embodiments of the present application, a storage medium is also provided. The storage medium includes a stored program, wherein the device where the storage medium is located is controlled to execute the above-mentioned image processing method when the program is running.

According to another aspect of the embodiments of the present application, a processor is also provided, the processor is configured to run a program, wherein the above-mentioned image processing method is executed when the program is running.

In at least some of the embodiments of the present application, after the original image is acquired, human body detection is first performed on the original image to obtain a human body image, and the trained first model is then used to process the human body image to obtain the processing result of the human body image. In this way, human body detection, two-dimensional and three-dimensional joint point positioning, and SMPL model establishment are achieved at the same time, and a human body model can further be generated. Because a single model obtains the two-dimensional joint points, the three-dimensional joint points, and the SMPL model at the same time, there is no need to estimate the three-dimensional joint points from the two-dimensional joint points. This achieves the technical effect of improving image recognition accuracy and thereby solves the technical problem of low recognition accuracy for two-dimensional and three-dimensional joint point positioning and human body parameter reconstruction in related technologies.

Description of the drawings

The drawings described here are used to provide a further understanding of the application and constitute a part of the application. The exemplary embodiments of the application and their descriptions are used to explain the application, and do not constitute an improper limitation of the application. In the drawings:

Fig. 1 is a flowchart of an image processing method according to an embodiment of the present application;

Fig. 2 is a schematic diagram of an optional human body image according to an embodiment of the present application;

Fig. 3 is a schematic diagram of an optional human body model according to an embodiment of the present application;

Fig. 4a is a schematic diagram of an optional average-shaped human body model according to an embodiment of the present application;

Fig. 4b is a schematic diagram of an optional human body model generated after adding shape parameters according to an embodiment of the present application;

Fig. 4c is a schematic diagram of an optional human body model generated after adding shape parameters and posture parameters according to an embodiment of the present application;

Fig. 4d is a schematic diagram of an optional human body model generated based on detected human actions according to an embodiment of the present application;

Fig. 5 is a flowchart of an optional image processing method according to an embodiment of the present application;

Fig. 6 is a schematic diagram of an optional GAN network according to an embodiment of the present application; and

Fig. 7 is a schematic diagram of an image processing device according to an embodiment of the present application.

Detailed description

In order to enable those skilled in the art to better understand the solutions of the application, the technical solutions in the embodiments of the application will be clearly and completely described below in conjunction with the drawings in the embodiments of the application. Obviously, the described embodiments are only a part of the embodiments of this application, not all of them. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work should fall within the protection scope of this application.

It should be noted that the terms "first" and "second" in the specification and claims of the application and the above-mentioned drawings are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It should be understood that data used in this way can be interchanged under appropriate circumstances, so that the embodiments of the present application described herein can be implemented in a sequence other than those illustrated or described herein. In addition, the terms "including" and "having" and any variations of them are intended to cover non-exclusive inclusions. For example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to the steps or units that are clearly listed, but may include other steps or units that are not clearly listed or that are inherent to the process, method, product, or device.

Embodiment 1

According to an embodiment of the present application, an image processing method is provided. It should be noted that the steps shown in the flowchart of the accompanying drawings can be executed in a computer system, for example as a set of computer-executable instructions. Although a logical order is shown in the flowchart, in some cases the steps shown or described can be performed in a different order than the one given here.

Fig. 1 is a flowchart of an image processing method according to an embodiment of the present application. As shown in Fig. 1, the method includes the following steps:

Step S102, obtaining an original image;

The foregoing original image may be an image intercepted from the input video stream data, or may be an image obtained directly, and the original image contains a human body.

Step S104: Perform human body detection on the original image to obtain a human body image;

The above-mentioned human body image may be the smallest image that contains a complete human body region extracted from the original image, as shown in FIG. 2.

In an alternative embodiment, human body detection can be performed with a deep learning model, for example a detection framework such as Faster RCNN (Faster Region Convolutional Neural Networks), YOLO (You Only Look Once), or SSD (Single Shot Detector), or a variant of one of these. Those skilled in the art will understand that different detection frameworks can be selected for different devices and application scenarios in order to realize human body detection quickly and accurately and obtain human body images.

Optionally, performing human body detection on the original image to obtain a human body image includes: using a trained deep learning model to process the original image to obtain the position information of the human body in the original image; and cropping and normalizing the original image based on the position information to obtain the human body image. The position of the human body can be represented by the smallest enclosing rectangular frame containing the complete human body area in the original image, expressed in the form of two-dimensional coordinates (left, top, bottom, right).
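The cropping and normalization step can be illustrated with a minimal sketch. Several details here are assumptions rather than part of the application: the image is a grayscale list of pixel rows with values 0 to 255, the right and bottom edges of the box are exclusive, normalization is read as scaling pixel values to [0, 1], and the function name crop_and_normalize is hypothetical. Plain Python lists are used to keep the sketch dependency-free.

```python
def crop_and_normalize(image, left, top, right, bottom):
    """Crop a grayscale image (list of rows) to a box and scale pixels to [0, 1].

    The box is clamped to the image bounds; right and bottom are exclusive.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    left, right = max(0, left), min(width, right)
    top, bottom = max(0, top), min(height, bottom)
    if left >= right or top >= bottom:
        raise ValueError("bounding box does not overlap the image")
    return [[pixel / 255.0 for pixel in row[left:right]] for row in image[top:bottom]]


# Toy 4x4 "original image" with a hypothetical detected box covering the centre.
original = [
    [0, 0, 0, 0],
    [0, 255, 51, 0],
    [0, 102, 204, 0],
    [0, 0, 0, 0],
]
human = crop_and_normalize(original, left=1, top=1, right=3, bottom=3)
```

The box arguments are passed by name because the coordinate order in the text, (left, top, bottom, right), differs from the more common (left, top, right, bottom).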

Step S106: Use the trained first model to process the human body image to obtain the processing result of the human body image, where the processing result includes: two-dimensional joint points, three-dimensional joint points, and an SMPL (Skinned Multi-Person Linear) model.

Optionally, the above-mentioned first model may adopt an hourglass network structure or an FPN (Feature Pyramid Networks) network structure. For example, when the input is a w*h image, the output feature map can be a w*h or w/4*h/4 image.

The above-mentioned joint points may be the position coordinates of each joint on the human body, such as a wrist, an elbow, etc., as shown in FIG. 2.

The two-dimensional joint points can be expressed in the form of a heat map or in the form of a coordinate vector. In the heat map form, each joint point is represented as one feature map. Assuming that the input human body image is a w*h image, the output feature map is an image of the same size or scaled in equal proportion; the value of the feature map at the joint point's location is 1, and the value at all other locations is 0. In an example, when the human body has 16 two-dimensional joint points, 16 feature maps of size w*h, w/2*h/2, or smaller can be used to represent them.
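The relationship between the two representations can be sketched as follows. This is an illustration only: the function names are hypothetical, and decoding by taking the location of the largest value is a common convention that the application does not itself specify.

```python
def encode_heatmap(x, y, width, height):
    """Return a height x width map that is 1 at the joint location and 0 elsewhere."""
    heatmap = [[0.0] * width for _ in range(height)]
    heatmap[y][x] = 1.0
    return heatmap


def decode_heatmap(heatmap):
    """Return the (x, y) position of the largest value in a heat map."""
    best_value, best_xy = float("-inf"), (0, 0)
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best_value:
                best_value, best_xy = value, (x, y)
    return best_xy


# Two hypothetical joints (for example a wrist and an elbow) on an 8x6 image.
joints = [(2, 1), (5, 4)]
heatmaps = [encode_heatmap(x, y, width=8, height=6) for x, y in joints]
decoded = [decode_heatmap(h) for h in heatmaps]
```

Each joint gets its own map, so 16 joints would yield 16 maps, matching the example in the text.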

Three-dimensional joint points can likewise be expressed in two ways: as a heat map or as a coordinate vector. In the heat map form, a three-dimensional joint point adds z-axis information compared with a two-dimensional joint point, which extends the heat map into a cuboid.
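A cuboid heat map for a single three-dimensional joint point might look like the sketch below. The depth-major layout volume[z][y][x] and the function names are assumptions made for illustration.

```python
def encode_volume(x, y, z, width, height, depth):
    """Return a depth x height x width cuboid that is 1 at the joint and 0 elsewhere."""
    volume = [[[0.0] * width for _ in range(height)] for _ in range(depth)]
    volume[z][y][x] = 1.0
    return volume


def decode_volume(volume):
    """Return the (x, y, z) position of the largest value in the cuboid."""
    cells = (
        (value, (x, y, z))
        for z, plane in enumerate(volume)
        for y, row in enumerate(plane)
        for x, value in enumerate(row)
    )
    return max(cells, key=lambda cell: cell[0])[1]


volume = encode_volume(x=3, y=2, z=1, width=4, height=5, depth=3)
joint_3d = decode_volume(volume)
cell_count = 4 * 5 * 3  # one joint already needs width * height * depth cells
```

The cell count grows with the product of all three dimensions, which is one reason the coordinate vector form can be cheaper to work with.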

In an optional embodiment, the first model may be used to process the human body image to obtain the parameter values of the SMPL model; and then the two-dimensional joint points or the three-dimensional joint points may be obtained based on the parameter values.

In step S108, a human body model is generated according to the processing result of the human body image.

As shown in Fig. 3, the SMPL model can include shape parameters and pose parameters. The human body model generated according to the shape parameters and pose parameters can include multiple vertices and three-dimensional joint points, where each vertex and each three-dimensional joint point is a three-dimensional vector containing (x, y, z) coordinates. Figs. 4a to 4c show the process of generating a human body model based on shape parameters and posture parameters: Fig. 4a shows a human body model with an average shape, Fig. 4b shows a human body model generated after adding shape parameters to the average shape, and Fig. 4c shows a human body model generated after adding both shape parameters and posture parameters to the average shape. Fig. 4d shows a human body model generated from detected human body motion on the basis of the human body model in Fig. 4c. Comparing Fig. 4b and Fig. 4c shows that the difference between the two is not very large. Therefore, in some applications, a human body model can be generated based on shape parameters alone to achieve human body modeling.
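The step from the average shape to a shaped body can be illustrated with a toy linear blend, in which each vertex is displaced by a weighted sum of per-vertex shape directions. This sketch covers shape parameters only and omits the pose-dependent part of the model; the two-vertex template, the shape directions, and the name apply_shape are all hypothetical values chosen for illustration, not data from the application.

```python
def apply_shape(template, shape_dirs, betas):
    """Offset each template vertex by a linear combination of shape directions."""
    vertices = []
    for vertex, dirs in zip(template, shape_dirs):
        # dirs[k] is the (dx, dy, dz) displacement of this vertex per unit of betas[k].
        vertices.append(tuple(
            vertex[axis] + sum(beta * d[axis] for beta, d in zip(betas, dirs))
            for axis in range(3)
        ))
    return vertices


# Toy "average shape" with two vertices and two shape parameters.
template = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
shape_dirs = [
    [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0)],
    [(0.0, 0.5, 0.0), (0.1, 0.0, 0.0)],
]
betas = [2.0, -1.0]
shaped = apply_shape(template, shape_dirs, betas)
```

Setting every shape parameter to zero returns the template unchanged, which corresponds to the average-shaped model.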

Through the above-mentioned embodiments of the application, after the original image is obtained, human body detection is performed on the original image to obtain the human body image, and the human body image is then processed by the trained first model to obtain the processing result of the human body image. Human body detection, two-dimensional and three-dimensional joint point positioning, and SMPL model establishment are thereby achieved at the same time, and a human body model can further be generated. Because a single model obtains the two-dimensional joint points, the three-dimensional joint points, and the SMPL model at the same time, there is no need to estimate the three-dimensional joint points from the two-dimensional joint points. This achieves the technical effect of improving image recognition accuracy and thereby solves the technical problem of low recognition accuracy for two-dimensional and three-dimensional joint point positioning and human body parameter reconstruction in related technologies.

In the first application scenario, human body motion can be detected in real time to drive a human body animation (AVATAR) model. For example, human body motion is captured based on the two-dimensional and three-dimensional joint points, so that the animation model follows the human body and makes the corresponding actions, enabling interaction.

In the second application scenario, slimming processing can be applied to the human body according to the two-dimensional and three-dimensional joint points in the processing result, achieving image processing effects such as slimmer arms, legs, and waist.

Optionally, in the foregoing embodiment of the present application, the image processing method further includes: acquiring multiple sets of training samples, where each set of training samples includes: a human body image, first label information of two-dimensional joint points, second label information of three-dimensional joint points, and parameter values of the SMPL model; using the multiple sets of training samples to train the preset model and obtaining the target loss value of the preset model; when the target loss value is less than the preset value, stopping training of the preset model and determining that the preset model is the first model; and when the target loss value is greater than the preset value, continuing to use the multiple sets of training samples to train the preset model until the target loss value is less than the preset value. The smaller the target loss value, the higher the recognition accuracy. The preset value can be set in advance according to the requirements for image recognition accuracy and efficiency, and is used to determine whether training of the model is complete.
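The stopping rule can be sketched with a deliberately tiny stand-in for the preset model. Everything specific here is an assumption for illustration: the "model" is a single weight fitted by gradient descent, the target loss is a mean squared error, and the max_rounds guard is added only so the loop always terminates.

```python
def train_until_threshold(samples, preset_value, learning_rate=0.1, max_rounds=1000):
    """Fit y = w * x by gradient descent, stopping once the loss is below preset_value."""
    weight = 0.0
    for round_index in range(1, max_rounds + 1):
        # Target loss: mean squared error over all training samples.
        loss = sum((weight * x - y) ** 2 for x, y in samples) / len(samples)
        if loss < preset_value:
            # Training stops and the current model is kept as the trained model.
            return weight, loss, round_index
        gradient = sum(2 * (weight * x - y) * x for x, y in samples) / len(samples)
        weight -= learning_rate * gradient
    raise RuntimeError("target loss did not fall below the preset value")


samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # toy data whose true weight is 2
weight, loss, rounds = train_until_threshold(samples, preset_value=1e-6)
```

A smaller preset value yields a more accurate model at the cost of more training rounds, which is the accuracy and efficiency trade-off mentioned in the text.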

Optionally, in the foregoing embodiment of the present application, the image processing method further includes marking the training sample with first label information of the two-dimensional joint points and second label information of the three-dimensional joint points.

In an alternative embodiment, for the two-dimensional joint points, the first loss value can be obtained based on the predicted heat map (i.e., the first result) and the labeled heat map (i.e., the first label information), or based on the predicted coordinate vector (i.e., the first result) and the labeled coordinate vector (i.e., the first label information), or based on a combination of the heat map and the coordinate vector.

Similarly, for the three-dimensional joint points, the second loss value can be obtained based on the predicted heat map (i.e., the second result) and the labeled heat map (i.e., the second label information), or based on the predicted coordinate vector (i.e., the second result) and the labeled coordinate vector (i.e., the second label information), or based on a combination of the heat map and the coordinate vector.

Of the two forms, the coordinate vector is more convenient to compute with than the heat map.

Optionally, the parameter value of the aforementioned SMPL model may be a real value collected by the collecting device, or an adjusted value obtained by adjusting the parameter value collected by the collecting device. In an optional embodiment, the parameter values of the SMPL model can be predicted using both real values and adjusted values, with the real values given a larger weight and the adjusted values a smaller weight.

The aforementioned collection device may be cameras or sensors installed in multiple fixed positions in a laboratory environment or an outdoor environment.

Accurate, real parameter values of the SMPL model can be obtained only from data collected in a laboratory environment; they cannot be obtained from data collected in an outdoor environment. Therefore, in actual calculations, different methods can be used to calculate the third loss value of the SMPL model depending on the type of parameter value. Optionally, when the parameter value is a real value collected by the collecting device, the third loss value can be calculated by direct regression, that is, the third loss value is obtained based on the parameter value and the third result. When the parameter value is an adjusted value obtained by adjusting the parameter value collected by the collecting device, three-dimensional joint points can be obtained from the parameter values of the SMPL model and projected onto the two-dimensional plane to obtain two-dimensional joint points; the fourth loss value of the two-dimensional joint points is calculated based on the projected two-dimensional joint points and the first label information, used as the third loss value, and passed back to the parameter space of the SMPL model to update the parameter values of the SMPL model.
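The two branches of the third loss value can be sketched as below. The application does not fix the loss function, the projection, or which parameter values feed the projection branch, so this sketch makes three labeled assumptions: mean squared error as the loss, an orthographic projection that drops z, and the predicted parameters as the input to the joint computation. The toy_joints function is a hypothetical stand-in for the SMPL joint computation.

```python
def mean_squared_error(predicted, target):
    """Average squared difference between two equally long flat sequences."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def project_to_2d(joints_3d):
    """Orthographic projection (an assumption): keep x and y, drop z."""
    return [(x, y) for x, y, _ in joints_3d]


def third_loss(predicted_params, label_params, is_real, joints_from_params, label_joints_2d):
    """Select the SMPL loss branch based on how the label parameters were obtained."""
    if is_real:
        # Real values: regress the predicted parameters directly onto the labels.
        return mean_squared_error(predicted_params, label_params)
    # Adjusted values: compare reprojected joints with the 2D joint labels instead.
    joints_2d = project_to_2d(joints_from_params(predicted_params))
    flat_pred = [c for joint in joints_2d for c in joint]
    flat_label = [c for joint in label_joints_2d for c in joint]
    return mean_squared_error(flat_pred, flat_label)


def toy_joints(params):
    """Hypothetical stand-in for SMPL: group consecutive values into (x, y, z)."""
    return [tuple(params[i:i + 3]) for i in range(0, len(params), 3)]


pred = [1.0, 2.0, 9.0, 3.0, 5.0, 7.0]
real_loss = third_loss(pred, [1.0, 2.0, 9.0, 3.0, 4.0, 7.0], True, toy_joints, None)
adjusted_loss = third_loss(pred, None, False, toy_joints, [(1.0, 2.0), (3.0, 3.0)])
```

In the projection branch the depth values never enter the loss, so only the two-dimensional labels constrain the parameters.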

In the training process, the target loss value combines the first loss value, the second loss value, and the third loss value, and can be calculated as their weighted sum.
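As a small worked example of the weighted sum, with weights that are purely illustrative since the application does not give values:

```python
def target_loss(loss_2d, loss_3d, loss_smpl, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the first, second, and third loss values."""
    w_2d, w_3d, w_smpl = weights
    return w_2d * loss_2d + w_3d * loss_3d + w_smpl * loss_smpl


# Hypothetical loss values and weights: 1.0*0.5 + 2.0*0.25 + 0.5*2.0
total = target_loss(0.5, 0.25, 2.0, weights=(1.0, 2.0, 0.5))
```

Raising one weight makes the corresponding task dominate the training signal, so the weights set the balance between joint localization and SMPL parameter regression.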

In an optional embodiment, during the model training process, the two-dimensional joint points, the three-dimensional joint points, and the parameters of the SMPL model can be learned at the same time, and an overall regression can be performed to generate the model. In addition, as shown in Fig. 5, an SMPL model discriminator can be used to judge the parameter values of the SMPL model and determine whether they are values randomly generated by the network or collected real values, thereby making the model's results more realistic. Optionally, the SMPL model discriminator processes the parameter values of the third result (that is, the SMPL model output by the preset model) to obtain a classification result of the parameter values of the third result, where the classification result characterizes whether the parameter values of the third result are real values collected by the collecting device; whether to stop training the preset model is then determined based on the classification result and the target loss value. The D discriminator of a Generative Adversarial Network (GAN) can be used as the SMPL model discriminator.

In an optional embodiment, because accurate parameter values of the SMPL model cannot be obtained from data collected in an outdoor environment, abnormal parameter values may be generated. To solve this problem, the embodiment of the present application adds a GAN to train the SMPL model discriminator (i.e., the D discriminator). As shown in Fig. 6, the GAN includes a G generator and a D discriminator. The D discriminator is a two-class network that receives values randomly generated by the G generator and collected real values, and outputs a label indicating the authenticity of the data: when a real value is received, the output is close to the positive label (usually set to 1), and when a value randomly generated by the G generator is received, the output is close to the negative label (usually set to 0). The D discriminator is used to describe the difference between the randomly generated values and the real values, and the weights with which the G generator produces values are then updated according to this difference, so that the values randomly generated by the G generator move closer to the real values while the ability of the D discriminator to distinguish randomly generated values from real values improves.
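The labeling scheme of the D discriminator can be illustrated with the standard binary cross-entropy GAN losses. The application does not state which loss is used, so this choice is an assumption, as are the single-layer logistic discriminator, its hand-picked weights, and the toy parameter vectors.

```python
import math


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def discriminator(params, weights, bias):
    """Two-class score in (0, 1): near 1 means 'real', near 0 means 'generated'."""
    return sigmoid(sum(w * p for w, p in zip(weights, params)) + bias)


def discriminator_loss(real_batch, generated_batch, weights, bias):
    """Binary cross-entropy with label 1 for real values and label 0 for generated ones."""
    loss = 0.0
    for params in real_batch:
        loss -= math.log(discriminator(params, weights, bias))
    for params in generated_batch:
        loss -= math.log(1.0 - discriminator(params, weights, bias))
    return loss / (len(real_batch) + len(generated_batch))


def generator_loss(generated_batch, weights, bias):
    """Low when the discriminator scores the generated values as real."""
    scores = [discriminator(p, weights, bias) for p in generated_batch]
    return -sum(math.log(s) for s in scores) / len(scores)


# Toy parameter vectors: real values near zero, generated values far from it.
real = [[0.1, -0.2], [0.0, 0.1]]
generated = [[3.0, 2.5], [2.8, 3.1]]
weights, bias = [-1.0, -1.0], 2.0  # hand-picked discriminator, not a trained one

d_loss = discriminator_loss(real, generated, weights, bias)
g_loss = generator_loss(generated, weights, bias)
```

With this discriminator the real vectors score above 0.5 and the generated ones below it, so the discriminator loss is small while the generator loss is large; updating the generator to reduce its loss is what pushes generated values toward the real ones.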

Embodiment 2

According to an embodiment of the present application, an image processing device is provided, which can execute the image processing method described in Embodiment 1 above. The preferred embodiments and application scenarios of this embodiment are the same as those of Embodiment 1 and are not repeated here.

Fig. 7 is a schematic diagram of an image processing device according to an embodiment of the present application. As shown in Fig. 7, the device includes:

The acquisition module 72 is configured to obtain the original image;

The detection module 74 is configured to perform human body detection on the original image to obtain a human body image;

The processing module 76 is configured to process the human body image by using the trained first model to obtain a processing result of the human body image, where the processing result includes: two-dimensional joint points, three-dimensional joint points, and parameter values of the SMPL model;

The generating module 78 is configured to generate a human body model according to the processing result of the human body image.
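The way the four modules chain together can be sketched as a simple pipeline. This is an architectural illustration only: the class name, the constructor arguments, and the placeholder callables are hypothetical, and real modules would wrap a frame source, a detector, the first model, and a mesh generator.

```python
class ImageProcessingDevice:
    """Chains four modules; each module is any callable taking one argument."""

    def __init__(self, acquire, detect, process, generate):
        self.acquire = acquire
        self.detect = detect
        self.process = process
        self.generate = generate

    def run(self, source):
        original_image = self.acquire(source)      # acquisition module 72
        human_image = self.detect(original_image)  # detection module 74
        result = self.process(human_image)         # processing module 76
        return self.generate(result)               # generating module 78


# Placeholder modules that only record the order in which they were called.
device = ImageProcessingDevice(
    acquire=lambda source: [source, "acquired"],
    detect=lambda image: image + ["detected"],
    process=lambda image: image + ["processed"],
    generate=lambda result: result + ["generated"],
)
trace = device.run("frame")
```

Passing the modules in as callables keeps each stage replaceable, for example swapping one detection framework for another without touching the rest of the pipeline.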

It should be noted here that the acquisition module 72, the detection module 74, the processing module 76, and the generation module 78 can be run in a computer terminal as a part of the device, and the functions implemented by the above modules can be executed by the processor in the computer terminal. The computer terminal can also be a smart phone (such as an Android phone, an iOS phone, etc.), a tablet computer, a palmtop computer, and a mobile Internet device (Mobile Internet Devices, MID), PAD, and other terminal devices.

Optionally, in the above-mentioned embodiment of the present application, the acquisition module is further configured to acquire multiple sets of training samples, where each set of training samples includes: a human body image, first label information of two-dimensional joint points, second label information of three-dimensional joint points, and parameter values of the SMPL model. The device further includes: a training module configured to use the multiple sets of training samples to train the preset model and obtain the target loss value of the preset model; and a stop-training module configured to, when the target loss value is less than the preset value, stop training the preset model and determine that the preset model is the first model. The training module is further configured to, when the target loss value is greater than the preset value, continue to use the multiple sets of training samples to train the preset model until the target loss value is less than the preset value.

It should be noted here that the aforementioned acquisition module, training module, and stop-training module can be run in a computer terminal as part of the device, and the functions implemented by these modules can be executed by the processor in the computer terminal. The computer terminal can also be a smart phone (such as an Android phone, an iOS phone, etc.), a tablet computer, a handheld computer, a mobile Internet device, a PAD, or another terminal device.

Optionally, the training module includes: an acquisition unit configured to input the multiple sets of training samples into the preset model and obtain the output result of the preset model, where the output result includes: the first result for the two-dimensional joint points, the second result for the three-dimensional joint points, and the third result for the SMPL model; a first processing unit configured to obtain the first loss value of the two-dimensional joint points based on the first label information and the first result; a second processing unit configured to obtain the second loss value of the three-dimensional joint points based on the second label information and the second result; a third processing unit configured to obtain the third loss value of the SMPL model based on the parameter value and the third result; and a fourth processing unit configured to obtain the target loss value based on the first loss value, the second loss value, and the third loss value.

It should be noted here that the obtaining unit, the first processing unit, the second processing unit, the third processing unit, and the fourth processing unit can run in a computer terminal as part of the device, and the functions implemented by these units can be executed by the processor in the computer terminal. The computer terminal may also be a smartphone (such as an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile Internet device, a PAD, or another terminal device.

Optionally, the third processing unit is further configured to: obtain the third loss value based on the parameter value and the third result when the parameter value is a real value collected by the collecting device; and, when the parameter value is an adjusted value obtained by adjusting the parameter value collected by the collecting device, obtain three-dimensional joint points based on the parameter value, project the three-dimensional joint points onto a two-dimensional plane to obtain two-dimensional joint points, obtain a fourth loss value of the two-dimensional joint points based on the projected two-dimensional joint points and the first label information, and determine the fourth loss value as the third loss value.
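The two branches of the third loss can be sketched as follows. Several points are assumptions rather than details from the application: the projection is taken to be orthographic (depth is simply dropped), the losses are mean squared errors, and the three-dimensional joint points in the second branch are derived from the predicted parameters (the third result) so that the loss depends on the model output. The text itself only says the joints are obtained "based on the parameter value". The callable `joints_from_params` is a hypothetical stand-in for the SMPL joint computation.

```python
from typing import Callable, List, Sequence, Tuple

Point3D = Tuple[float, float, float]
Point2D = Tuple[float, float]


def project_to_plane(joints_3d: Sequence[Point3D]) -> List[Point2D]:
    """Orthographic projection (an assumption): keep x and y, drop depth z."""
    return [(x, y) for x, y, _z in joints_3d]


def joint_loss_2d(pred: Sequence[Point2D], label: Sequence[Point2D]) -> float:
    """Mean squared distance between projected and labelled 2D joint points."""
    if len(pred) != len(label) or not pred:
        raise ValueError("pred and label must be non-empty and of equal length")
    total = sum((px - lx) ** 2 + (py - ly) ** 2
                for (px, py), (lx, ly) in zip(pred, label))
    return total / len(pred)


def parameter_loss(pred: Sequence[float], label: Sequence[float]) -> float:
    """Mean squared error directly on SMPL parameter values."""
    if len(pred) != len(label) or not pred:
        raise ValueError("pred and label must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, label)) / len(pred)


def third_loss(
    pred_params: Sequence[float],
    label_params: Sequence[float],
    label_is_real: bool,
    label_joints_2d: Sequence[Point2D],
    joints_from_params: Callable[[Sequence[float]], List[Point3D]],
) -> float:
    """Select the third loss according to how the label parameters were obtained."""
    if label_is_real:
        # Real collected values: compare parameters directly.
        return parameter_loss(pred_params, label_params)
    # Adjusted values: fall back to a 2D re-projection loss (the "fourth loss").
    joints_3d = joints_from_params(pred_params)
    return joint_loss_2d(project_to_plane(joints_3d), label_joints_2d)


def toy_joints_from_params(params: Sequence[float]) -> List[Point3D]:
    """Toy stand-in: read every three parameters as one (x, y, z) joint."""
    return [tuple(params[i:i + 3]) for i in range(0, len(params), 3)]
```

In a real system the toy function would be replaced by the SMPL joint computation, and the projection would use whatever camera model the training data assumes.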

Optionally, the processing module is further configured to process the parameter value of the third result using a discriminator to obtain a classification result of the parameter value of the third result, wherein the classification result characterizes whether the parameter value of the third result is a real value collected by the acquisition device; the stop training module is further configured to determine whether to stop training the preset model based on the classification result and the target loss value.

Optionally, the training module is further configured to train the discriminator by using a generative adversarial network.
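One possible way to combine the classification result with the target loss value is sketched below. The application does not state the decision rule, so the conjunction used here, the probability-style discriminator output, and the names `should_stop_training`, `real_probability`, and `real_threshold` are all assumptions.

```python
def should_stop_training(
    target_loss: float,
    preset_value: float,
    real_probability: float,
    real_threshold: float = 0.5,  # assumption: cut-off for "looks like a real value"
) -> bool:
    """Stop only if the loss is low enough AND the discriminator accepts the output.

    `real_probability` is assumed to be the discriminator's estimate that the
    predicted SMPL parameter values are real values collected by a device.
    """
    loss_ok = target_loss < preset_value
    looks_real = real_probability >= real_threshold
    return loss_ok and looks_real


print(should_stop_training(0.05, 0.1, 0.9))  # low loss, plausible parameters
print(should_stop_training(0.05, 0.1, 0.2))  # low loss, implausible parameters
```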

Optionally, the detection module includes: a detection unit, configured to process the original image using a trained second model to obtain position information of the human body in the original image; and a fifth processing unit, configured to crop and normalize the original image based on the position information to obtain the human body image.
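The crop-and-normalize step can be illustrated with a small pure-Python sketch. The application does not define the position information or the normalization, so this example assumes a bounding box of the form (left, top, width, height), a nearest-neighbour resize to a fixed output size, and scaling of 8-bit pixel values to the range 0 to 1. The function name and parameters are hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel values in 0..255


def crop_and_normalize(
    image: Image,
    box: Tuple[int, int, int, int],
    out_size: Tuple[int, int] = (2, 2),  # assumption: fixed (height, width) model input
) -> Image:
    """Crop the detected body region, resize it, and scale pixels to [0, 1]."""
    img_h, img_w = len(image), len(image[0])
    left, top, width, height = box

    # Clamp the box to the image so partially visible bodies do not fail.
    x0, y0 = max(0, left), max(0, top)
    x1, y1 = min(img_w, left + width), min(img_h, top + height)
    if x1 <= x0 or y1 <= y0:
        raise ValueError("box does not overlap the image")

    out_h, out_w = out_size
    crop_h, crop_w = y1 - y0, x1 - x0
    result: Image = []
    for r in range(out_h):
        src_y = y0 + (r * crop_h) // out_h  # nearest-neighbour row lookup
        row = []
        for c in range(out_w):
            src_x = x0 + (c * crop_w) // out_w  # nearest-neighbour column lookup
            row.append(image[src_y][src_x] / 255.0)
        result.append(row)
    return result


original = [
    [0, 0, 0, 0],
    [0, 255, 51, 0],
    [0, 102, 204, 0],
    [0, 0, 0, 0],
]
body = crop_and_normalize(original, box=(1, 1, 2, 2))
print(body)
```

A production pipeline would more likely use an image library with proper interpolation and per-channel mean and variance normalization; the sketch only shows the order of operations.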

It should be noted here that the above detection unit and fifth processing unit can run in a computer terminal as part of the device, and the functions implemented by these units can be executed by the processor in the computer terminal. The computer terminal can also be a smartphone (such as an Android phone or an iOS phone), a tablet computer, a handheld computer, a mobile Internet device, a PAD, or another terminal device.

The functional units provided in the embodiments of the present application may run in a mobile terminal, a computer terminal or a similar computing device, or may be stored as a part of a storage medium.

Therefore, the embodiments of the present application may provide a computer terminal, and the computer terminal may be any computer terminal device in a computer terminal group. Optionally, in this embodiment, the above-mentioned computer terminal may also be replaced with a terminal device such as a mobile terminal.

Optionally, in this embodiment, the above-mentioned computer terminal may be located in at least one network device among a plurality of network devices in the computer network.

In this embodiment, the above-mentioned computer terminal can execute program code for the following steps of the image processing method: obtaining an original image; performing human body detection on the original image to obtain a human body image; processing the human body image using the trained first model to obtain a processing result of the human body image, where the processing result includes: two-dimensional joint points, three-dimensional joint points, and a skinned multi-person linear (SMPL) model; and generating a human body model according to the processing result of the human body image.

Optionally, the computer terminal may include: one or more processors, memories, and transmission devices.

The memory can be used to store software programs and modules, such as the program instructions/modules corresponding to the image processing method and device in the embodiments of the present application. The processor executes various functional applications and data processing by running the software programs and modules stored in the memory, thereby implementing the above-mentioned image processing method. The memory may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory located remotely from the processor, and such remote memory may be connected to the terminal through a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.

The aforementioned transmission device is used to receive or send data via a network. Specific examples of the network may include a wired network and a wireless network. In one example, the transmission device includes a network interface controller (NIC), which can be connected to other network devices and routers via a network cable so as to communicate with the Internet or a local area network. In one example, the transmission device is a radio frequency (RF) module, which is used to communicate with the Internet wirelessly.

Specifically, the memory is used to store the first model, the skinned multi-person linear (SMPL) model, the processing result, and the application program.

The processor may call the information and application programs stored in the memory through the transmission device to execute the program code of the method steps of each optional or preferred embodiment in the foregoing method embodiments.

Those of ordinary skill in the art can understand that the computer terminal may also be a smart phone (such as an Android phone, an iOS phone, etc.), a tablet computer, a palmtop computer, a mobile Internet device, a PAD, and other terminal devices.

Those of ordinary skill in the art can understand that all or part of the steps in the various methods of the above-mentioned embodiments can be completed by a program instructing the relevant hardware of the terminal device. The program can be stored in a computer-readable storage medium, which may include: a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.

Example 3

According to an embodiment of the present application, a computer-readable storage medium is provided. The computer-readable storage medium includes a stored program, wherein, when the program runs, the device where the computer-readable storage medium is located is controlled to execute the image processing method in Embodiment 1 above.

Optionally, in this embodiment, the above-mentioned computer-readable storage medium may be located in any computer terminal in a computer terminal group in a computer network, or located in any mobile terminal in a mobile terminal group.

Optionally, in this embodiment, the computer-readable storage medium is configured to store program code for performing the following steps: obtaining an original image; performing human body detection on the original image to obtain a human body image; processing the human body image using the trained first model to obtain a processing result of the human body image, where the processing result includes: two-dimensional joint points, three-dimensional joint points, and a skinned multi-person linear (SMPL) model; and generating a human body model according to the processing result of the human body image.

Optionally, in this embodiment, the computer-readable storage medium may also be configured to store the program code of various preferred or optional method steps provided by the image processing method.

As above, the image processing method and device according to the present invention are described by way of example with reference to the accompanying drawings. However, those skilled in the art should understand that various improvements can be made to the image processing method and device proposed by the present invention without departing from the content of the present invention. Therefore, the protection scope of the present invention should be determined by the content of the appended claims.

Example 4

According to an embodiment of the present application, a processor is provided, and the processor is configured to run a program, wherein the image processing method in Embodiment 1 is executed when the program is running.

The serial numbers of the foregoing embodiments of the present application are for description only, and do not represent the superiority or inferiority of the embodiments.

In the above-mentioned embodiments of the present application, the description of each embodiment has its own focus. For a part that is not described in detail in an embodiment, reference may be made to related descriptions of other embodiments.

In the several embodiments provided in this application, it should be understood that the disclosed technical content can be implemented in other ways. The device embodiments described above are only illustrative. For example, the division of the units may be a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units, or modules, and may be electrical or take other forms.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

In addition, the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the method described in each embodiment of the present application. The aforementioned storage media include: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and other media that can store program code.

The above are only the preferred embodiments of this application. It should be pointed out that those of ordinary skill in the art can make several improvements and modifications without departing from the principle of this application, and these improvements and modifications should also be regarded as falling within the scope of protection of this application.

Industrial applicability

As described above, the image processing method and device provided by at least some of the embodiments of the present application have the following beneficial effects: since one model can be used to obtain the two-dimensional joint points, the three-dimensional joint points, and the SMPL model at the same time, there is no need to estimate the three-dimensional joint points from the two-dimensional joint points. This achieves the technical effect of improving recognition accuracy, and thereby solves the technical problem in the related art of low recognition accuracy for two-dimensional and three-dimensional joint point positioning and human body parameter reconstruction.

Claims

An image processing method, including:

Get the original image;

Performing human body detection on the original image to obtain a human body image;

Use the trained first model to process the human body image to obtain a processing result of the human body image, where the processing result includes: two-dimensional joint points, three-dimensional joint points, and a skinned multi-person linear SMPL model;

A human body model is generated according to the processing result of the human body image.
The method according to claim 1, wherein the method further comprises:

Obtain multiple sets of training samples, where each set of training samples includes: a human body image, the first label information of the two-dimensional joint points, the second label information of the three-dimensional joint points, and the parameter values of the SMPL model;

Training a preset model by using the multiple sets of training samples, and obtaining a target loss value of the preset model;

If the target loss value is less than a preset value, stop training the preset model, and determine that the preset model is the first model;

In a case where the target loss value is greater than the preset value, continue to use the multiple sets of training samples to train the preset model until the target loss value is less than the preset value.
The method according to claim 2, wherein training a preset model using the multiple sets of training samples and obtaining a target loss value of the preset model comprises:

The multiple sets of training samples are input to the preset model, and the output result of the preset model is obtained, where the output result includes: the first result of the two-dimensional joint points, the second result of the three-dimensional joint points, and the third result of the SMPL model;

Obtaining a first loss value of the two-dimensional joint point based on the first label information and the first result;

Obtaining a second loss value of the three-dimensional joint point based on the second label information and the second result;

Obtaining a third loss value of the SMPL model based on the parameter value and the third result;

Based on the first loss value, the second loss value, and the third loss value, the target loss value is obtained.
The method according to claim 3, wherein the parameter value of the SMPL model is a real value collected by a collecting device, or an adjusted value obtained by adjusting the parameter value collected by the collecting device.
The method according to claim 4, wherein, based on the parameter value and the third result, obtaining the third loss value of the SMPL model comprises:

In the case that the parameter value is a real value collected by the collecting device, obtain the third loss value based on the parameter value and the third result;

In the case where the parameter value is an adjusted value obtained by adjusting the parameter value collected by the collecting device, a three-dimensional joint point is obtained based on the parameter value, the three-dimensional joint point is projected onto a two-dimensional plane to obtain a two-dimensional joint point, a fourth loss value of the two-dimensional joint point is obtained based on the projected two-dimensional joint point and the first label information, and the fourth loss value is determined as the third loss value.
The method according to claim 3, wherein the method further comprises:

The parameter value of the third result is processed by a discriminator to obtain a classification result of the parameter value of the third result, wherein the classification result is used to characterize whether the parameter value of the third result is a real value collected by the acquisition device;

Based on the classification result and the target loss value, it is determined whether to stop training the preset model.
The method according to claim 6, wherein the discriminator is trained using a generative adversarial network.
The method according to claim 1, wherein performing human body detection on the original image to obtain a human body image comprises:

Processing the original image by using the trained second model to obtain position information of the human body in the original image;

The original image is cropped and normalized based on the position information to obtain the human body image.
The method according to claim 1, wherein the first model adopts an hourglass-type network structure or a feature pyramid network (FPN) structure.
The method according to claim 1, wherein processing the human body image by using the trained first model to obtain the processing result of the human body image comprises:

Processing the human body image by using the first model to obtain the SMPL model;

Obtain the two-dimensional joint point or the three-dimensional joint point based on the SMPL model.
The method according to claim 1, wherein the two-dimensional joint points are expressed in a heat map form or a coordinate vector form, and the three-dimensional joint points are expressed in a heat map form or a coordinate vector form.
The method according to claim 1, wherein after generating a human body model according to the processing result of the human body image, the method further comprises:

Capturing human body movements based on the two-dimensional joint points and the three-dimensional joint points;

The human body animation model is driven based on the human body motion.
The method according to claim 1, wherein after generating a human body model according to the processing result of the human body image, the method further comprises:

Based on the two-dimensional joint points and the three-dimensional joint points, image pixels of the target position on the human body image are processed.
The method according to claim 4, wherein the parameter value of the SMPL model is obtained by predicting the real value and the adjusted value, wherein the weight of the real value is greater than the weight of the adjusted value.
An image processing device, including:

The acquisition module is set to acquire the original image;

The detection module is configured to perform human body detection on the original image to obtain a human body image;

The processing module is configured to process the human body image by using the trained first model to obtain a processing result of the human body image, wherein the processing result includes: two-dimensional joint points, three-dimensional joint points, and a skinned multi-person linear SMPL model;

The generating module is configured to generate a human body model according to the processing result of the human body image.
A computer-readable storage medium, the computer-readable storage medium includes a stored program, wherein, when the program is running, the device where the computer-readable storage medium is located is controlled to execute the image processing method according to any one of claims 1 to 14.
A processor configured to run a program, wherein the image processing method according to any one of claims 1 to 14 is executed when the program is running.