CN114973396B - Image processing method, image processing device, terminal equipment and computer readable storage medium - Google Patents

Image processing method, image processing device, terminal equipment and computer readable storage medium

Info

Publication number
CN114973396B
CN114973396B (Application CN202111675804.8A)
Authority
CN
China
Prior art keywords
image
dimensional
target
target object
generating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111675804.8A
Other languages
Chinese (zh)
Other versions
CN114973396A (en)
Inventor
冷雨泉
闫丹琪
付成龙
张贶恩
林诚育
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southern University of Science and Technology
Original Assignee
Southern University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southern University of Science and Technology filed Critical Southern University of Science and Technology
Priority to CN202111675804.8A priority Critical patent/CN114973396B/en
Publication of CN114973396A publication Critical patent/CN114973396A/en
Application granted granted Critical
Publication of CN114973396B publication Critical patent/CN114973396B/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The application is applicable to the technical field of image processing, and provides an image processing method, an image processing device, terminal equipment and a computer readable storage medium, wherein the image processing method comprises the following steps: acquiring a first image containing a target object, wherein the first image is a two-dimensional image; acquiring a plurality of target three-dimensional postures of the target object; generating a second image of the target object in each target three-dimensional posture, wherein the second image is a two-dimensional image; converting the target object in the first image according to the posture of the target object in each second image to obtain a third image corresponding to each of the target three-dimensional postures; and generating a training set from the third images corresponding to the multiple target three-dimensional postures, wherein the training set is used for training a preset detection model, and the preset detection model is used for detecting the target object. By the method, the acquisition difficulty of the training data can be effectively reduced, and the data acquisition efficiency is improved.

Description

Image processing method, image processing device, terminal equipment and computer readable storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image processing method and apparatus, a terminal device, and a computer-readable storage medium.
Background
In the field of deep learning, the detection accuracy of an image detection model largely depends on the quantity and diversity of the training data. Therefore, in the process of acquiring the training data, not only does a large amount of training data need to be acquired, but the acquired training data also needs to be screened to ensure its diversity. Taking a detection model that estimates the three-dimensional pose of a hand from a two-dimensional image as an example, the training data are two-dimensional images, but the detection result is a three-dimensional pose. When collecting training data, a large number of two-dimensional images must be obtained, and the hand three-dimensional poses corresponding to these images must be guaranteed to be diverse. This greatly increases the difficulty of collecting training data and reduces collection efficiency.
Disclosure of Invention
The embodiment of the application provides an image processing method, an image processing device, terminal equipment and a computer readable storage medium, which can effectively reduce the acquisition difficulty of training data and improve the data acquisition efficiency.
In a first aspect, an embodiment of the present application provides an image processing method, including:
acquiring a first image containing a target object, wherein the first image is a two-dimensional image;
acquiring a plurality of target three-dimensional postures of the target object;
generating a second image of the target object in each target three-dimensional posture, wherein the second image is a two-dimensional image;
converting the target object in the first image according to the posture of the target object in each second image to obtain a third image corresponding to each of the target three-dimensional postures;
and generating a training set from the third images corresponding to the multiple target three-dimensional postures, wherein the training set is used for training a preset detection model, and the preset detection model is used for detecting the target object.
In the embodiment of the application, a two-dimensional image corresponding to the three-dimensional posture is generated, and the three-dimensional posture is known, so that the generated two-dimensional image equivalently contains the known three-dimensional posture information; the pose of the target object in the image to be processed is then converted into the pose of the target object in the generated two-dimensional image. By the method, a plurality of images which accord with any target posture can be generated from one image to be processed, the postures of the target objects in the generated images are various, the three-dimensional postures corresponding to the generated images are known, manual marking is not needed, the acquisition difficulty of training data is effectively reduced, and the data acquisition efficiency is greatly improved.
In a possible implementation manner of the first aspect, the acquiring a first image including a target object includes:
acquiring an image to be processed containing the target object;
and carrying out image segmentation processing on the target object and the background area in the image to be processed to obtain the first image.
In one possible implementation manner of the first aspect, the generating a second image of the target object in each of the target three-dimensional poses includes:
for each target three-dimensional gesture, converting the target three-dimensional gesture into a two-dimensional gesture in a pixel coordinate system;
generating the second image from the two-dimensional pose.
In a possible implementation manner of the first aspect, the converting the target object in the first image according to the posture of the target object in each of the second images to obtain a third image corresponding to each of the target three-dimensional postures includes:
acquiring a preset conversion model, wherein the conversion model comprises an encoder and a generator;
inputting the first image into the encoder, and obtaining a feature vector of the first image;
and inputting the feature vector of the first image and the second image into the generator to obtain the third image.
In a possible implementation manner of the first aspect, after the target object in the first image is converted according to the pose of the target object in each second image, and a third image corresponding to each of the multiple target three-dimensional poses is obtained, the method further includes:
replacing a background area in each third image to obtain fourth images corresponding to the target three-dimensional postures respectively;
and generating the training set by using the fourth images corresponding to the plurality of target three-dimensional poses respectively.
In a possible implementation manner of the first aspect, the replacing a background region in each of the third images and obtaining fourth images corresponding to the plurality of target three-dimensional poses includes:
for each third image, acquiring an image mask of the third image, wherein the pixel value corresponding to the target object in the image mask is 0, and the pixel value corresponding to the image area except the target object is 255;
generating a sixth image according to the image mask and the fifth image;
fusing the third image and the sixth image into the fourth image.
In a second aspect, an embodiment of the present application provides an image processing apparatus, including:
the image acquisition unit is used for acquiring a first image containing a target object, wherein the first image is a two-dimensional image;
a posture acquisition unit for acquiring a plurality of target three-dimensional postures of the target object;
an image generation unit, configured to generate a second image of the target object in each of the target three-dimensional poses, where the second image is a two-dimensional image;
the posture conversion unit is used for converting the target object in the first image according to the posture of the target object in each second image to obtain third images corresponding to the target three-dimensional postures;
and the data generation unit is used for generating a training set from the third images corresponding to the multiple target three-dimensional postures, wherein the training set is used for training a preset detection model, and the preset detection model is used for detecting the target object.
In a third aspect, an embodiment of the present application provides a terminal device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the image processing method according to any one of the above first aspects when executing the computer program.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, where a computer program is stored, and the computer program, when executed by a processor, implements the image processing method according to any one of the above first aspects.
In a fifth aspect, the present application provides a computer program product, which when run on a terminal device, causes the terminal device to execute the image processing method according to any one of the first aspect.
It is understood that the beneficial effects of the second aspect to the fifth aspect can be referred to the related description of the first aspect, and are not described herein again.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise.
Fig. 1 is a schematic flowchart of an image processing method provided in an embodiment of the present application;
FIG. 2 is a schematic diagram of a hand joint point provided by an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a transformation model provided in an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a training network provided in an embodiment of the present application;
FIG. 5 is a schematic diagram of a background replacement process provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of a data collection process provided by an embodiment of the present application;
fig. 7 is a block diagram of a configuration of an image processing apparatus according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items and includes such combinations.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted contextually to mean "upon determining", "in response to determining", "upon detecting [the described condition or event]", or "in response to detecting [the described condition or event]".
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise.
In the deep learning field, the effectiveness of a model depends largely on the quantity and diversity of the training data. For the problem of estimating the three-dimensional pose of the hand from a monocular RGB camera, which has developed rapidly in the last two years, obtaining large and diverse data sets remains an open problem, because the three-dimensional coordinates of the hand key points must be labeled in each two-dimensional image.
In order to solve the problem, an embodiment of the present application provides an image processing method. Referring to fig. 1, which is a schematic flow chart of an image processing method provided in an embodiment of the present application, by way of example and not limitation, the method may include the following steps:
s101, a first image containing a target object is obtained, and the first image is a two-dimensional image.
In one embodiment, the first image is acquired in a manner that includes:
acquiring an image to be processed containing the target object; and carrying out image segmentation processing on the target object and the background area in the image to be processed to obtain the first image.
The image segmentation process may use an existing image segmentation model, such as a U-Net network, which can separate the foreground and the background of an image. The realization principle is as follows: inputting the image into the image segmentation model generates a binary mask consisting of the values 0 and 255, where 255 marks the pixels that need to be preserved and 0 marks the pixels that do not. If the target object in the first image needs to be acquired, the mask value of the region corresponding to the target object is 255 and the mask value of the background region is 0 in the generated mask. Performing a bitwise AND operation between the generated mask and the original image preserves the part of the image belonging to the target object; equivalently, at the pixel level, the pixel values of all pixel points outside the region corresponding to the target object are set to 0. This completes the image segmentation processing and yields the first image containing only the target object.
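As an illustration of this bitwise-AND step, here is a minimal Python sketch (using OpenCV and NumPy; the function name and the assumption that the segmentation model outputs a 0/255 single-channel mask are ours, not taken from the patent):

```python
import cv2
import numpy as np

def extract_foreground(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Keep only the target-object pixels; set the background to black.

    `image` is an H x W x 3 BGR image; `mask` is an H x W uint8 array whose
    values are 255 inside the target-object region and 0 elsewhere, as
    produced by a segmentation model such as U-Net.
    """
    # Broadcast the single-channel mask over the three color channels,
    # then AND bitwise: 255 (0b11111111) keeps a pixel, 0 zeroes it.
    mask_3c = cv2.merge([mask, mask, mask])
    return cv2.bitwise_and(image, mask_3c)
```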
S102, obtaining a plurality of target three-dimensional postures of the target object.
Taking hand pose estimation as an example, current hand pose work commonly uses a hand model with 21 joints. As shown in FIG. 2, T represents the thumb, I represents the index finger, M represents the middle finger, R represents the ring finger, P represents the little finger, TIP represents the fingertip, DIP represents the distal interphalangeal joint, PIP represents the proximal interphalangeal joint, MCP represents the metacarpophalangeal joint, and Wrist represents the wrist. Accordingly, a target three-dimensional pose in the embodiment of the present application refers to the absolute three-dimensional coordinates of the 21 joint points of the hand in the world coordinate system.
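For concreteness, this 21-joint layout can be written down as a small data structure; a sketch (the index order and name strings follow the abbreviations of FIG. 2 but are our own illustrative convention):

```python
# 21 hand joints: the wrist plus four joints (MCP, PIP, DIP, TIP)
# on each of the five fingers (T, I, M, R, P).
FINGERS = ["T", "I", "M", "R", "P"]           # thumb .. little finger
JOINT_TYPES = ["MCP", "PIP", "DIP", "TIP"]    # palm to fingertip

JOINT_NAMES = ["Wrist"] + [
    f"{finger}_{joint}" for finger in FINGERS for joint in JOINT_TYPES
]
assert len(JOINT_NAMES) == 21

# A target three-dimensional pose is then a (21, 3) array of absolute
# world-coordinate positions, one row per joint in JOINT_NAMES order.
```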
S103, generating a second image of the target object in each target three-dimensional posture, wherein the second image is a two-dimensional image.
To generate a two-dimensional image from a three-dimensional pose, the three-dimensional pose first needs to be converted into a two-dimensional pose, and a two-dimensional image is then generated from the converted two-dimensional pose. In one embodiment, generating the second image includes:
for each target three-dimensional gesture, converting the target three-dimensional gesture into a two-dimensional gesture in a pixel coordinate system; generating the second image from the two-dimensional pose.
The process of converting the three-dimensional pose to the two-dimensional pose may include: first, converting the three-dimensional coordinates in the world coordinate system into two-dimensional coordinates in the image. This step follows the standard camera-model conversion between the world coordinate system, the camera coordinate system, and the image (pixel) coordinate system.
Continuing with hand pose estimation as an example, assume the coordinates of joint point $k$ in the world coordinate system are $P_k^w = (x_k^w, y_k^w, z_k^w)^T$, where $k$ denotes the $k$-th joint point and $w$ denotes the world coordinate system. The coordinates of that point in the pixel coordinate system, $(u_k, v_k)$, can then be obtained by the following formula:

$$z_k^c \begin{pmatrix} u_k \\ v_k \\ 1 \end{pmatrix} = \begin{pmatrix} f_x & 0 & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} R & t \end{pmatrix} \begin{pmatrix} x_k^w \\ y_k^w \\ z_k^w \\ 1 \end{pmatrix}$$

where $c$ denotes the camera coordinate system, $f_x$ and $f_y$ are the focal lengths of the camera, $u_0$ and $v_0$ are the pixel coordinates of the image center, the first $3 \times 3$ matrix is the intrinsic parameter matrix of the camera, and $(R \;\; t)$ is the extrinsic parameter matrix; both matrices can be obtained by calibrating the camera.
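A minimal NumPy sketch of this projection (function and variable names are ours; K, R, and t would come from camera calibration):

```python
import numpy as np

def project_to_pixels(joints_world: np.ndarray,
                      K: np.ndarray,
                      R: np.ndarray,
                      t: np.ndarray) -> np.ndarray:
    """Project (21, 3) world-coordinate joints to (21, 2) pixel coordinates.

    K is the 3x3 intrinsic matrix [[fx, 0, u0], [0, fy, v0], [0, 0, 1]];
    R (3x3) and t (3,) are the extrinsic parameters from calibration.
    """
    # World -> camera coordinates.
    joints_cam = joints_world @ R.T + t
    # Apply intrinsics, then perspective-divide by the camera-frame depth z^c.
    uvw = joints_cam @ K.T
    return uvw[:, :2] / uvw[:, 2:3]
```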
Since a skeleton map contains more information than isolated joint points, in the embodiment of the present application the converted two-dimensional joint-point pose is used to generate a skeleton map (i.e., the two-dimensional second image). Illustratively, given the two-dimensional coordinates $(u_k, v_k)$, $k = 1, 2, \ldots, 21$, of the 21 joint points obtained in the previous step, an all-zero matrix with the same size as the first image is created, and the two-dimensional coordinates of the 21 joint points are connected, following the hand skeleton topology, by lines with a width of 5 pixels and a value of 255 to obtain the skeleton map.
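A sketch of this skeleton-map rendering (OpenCV; the bone connectivity list follows the usual 21-joint hand topology, and the 0-based index order is our assumption):

```python
import cv2
import numpy as np

# Each pair (i, j) is a bone between joints i and j in the 21-joint
# indexing (0 = wrist; then four joints per finger, palm to fingertip).
BONES = [(0, 1), (1, 2), (2, 3), (3, 4),          # thumb
         (0, 5), (5, 6), (6, 7), (7, 8),          # index finger
         (0, 9), (9, 10), (10, 11), (11, 12),     # middle finger
         (0, 13), (13, 14), (14, 15), (15, 16),   # ring finger
         (0, 17), (17, 18), (18, 19), (19, 20)]   # little finger

def draw_skeleton_map(joints_2d: np.ndarray, height: int, width: int) -> np.ndarray:
    """Render a (height, width) skeleton map from (21, 2) pixel coordinates."""
    canvas = np.zeros((height, width), dtype=np.uint8)  # all-zero matrix
    for i, j in BONES:
        p1 = tuple(int(v) for v in np.round(joints_2d[i]))
        p2 = tuple(int(v) for v in np.round(joints_2d[j]))
        cv2.line(canvas, p1, p2, color=255, thickness=5)  # value 255, width 5
    return canvas
```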
S104, converting the target object in the first image according to the posture of the target object in each second image respectively to obtain third images corresponding to the plurality of target three-dimensional postures respectively.
Optionally, the target object in the first image may be converted according to the pose of the target object in the second image by using the trained conversion model.
To address the conversion problem between unpaired images, in one embodiment, S104 may include:
acquiring a preset conversion model, wherein the conversion model comprises an encoder and a generator; inputting the first image into the encoder, and obtaining a feature vector of the first image; and inputting the feature vector of the first image and the second image into the generator to obtain the third image.
Fig. 3 is a schematic structural diagram of the conversion model provided in the embodiment of the present application. As shown in FIG. 3, E is the encoder, G is the generator, D is the discriminator, I_s is the RGB picture of the hand, C_t is the skeleton map, and z is the latent feature space.
In the process of generating the third image, the first image I_s is input into the trained encoder to obtain a feature vector Z_s in the latent space; then Z_s and the second image C_t are input together into the trained generator G, which outputs the third image.
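A minimal PyTorch sketch of this inference path (the encoder/generator interfaces and tensor shapes are assumptions for illustration; the patent does not specify them):

```python
import torch
import torch.nn as nn

@torch.no_grad()
def convert_pose(encoder: nn.Module,
                 generator: nn.Module,
                 first_image: torch.Tensor,    # (1, 3, H, W) hand image I_s
                 skeleton_map: torch.Tensor    # (1, 1, H, W) skeleton map C_t
                 ) -> torch.Tensor:
    """Re-pose the hand in `first_image` to the pose drawn in `skeleton_map`."""
    encoder.eval()
    generator.eval()
    z_s = encoder(first_image)                  # feature vector in latent space
    third_image = generator(z_s, skeleton_map)  # hand in the target pose
    return third_image
```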
Before this, the generator and the encoder need to be trained in advance. Further, the embodiment of the present application improves on the basis of the original CycleGAN model. Fig. 4 is a schematic structural diagram of the training network provided in the embodiment of the present application.
In the training stage, the processed skeleton maps and hand images are paired, and the resulting data are randomly divided into two groups, each consisting of multiple matched skeleton-map/hand-image pairs. One group is defined as the source domain and the other as the target domain. Let the hand image in the source domain be I_s with corresponding skeleton map C_s, and the hand image in the target domain be I_t with corresponding skeleton map C_t. The two groups of data are fed simultaneously into the constructed hand-pose conversion network for training, and the model parameters giving the best and most stable results are kept.
As shown in fig. 4, the improved training network consists of two symmetric paths. Taking one of the paths as an example, the input is the image I_s; after it passes through the encoder E, a set of feature vectors Z_s corresponding to it in the latent feature space is obtained. The feature vector Z_s and the skeleton map C_t are then taken together as new input to the generator G, which produces the output image Ĩ_t; Ĩ_t then enters the discriminator D, which judges whether the generated Ĩ_t is a real image or a synthesized one. Similarly, the other path implements the conversion from the target domain to the source domain.
A loss function is introduced, and the parameters in the generator, the discriminator, and the encoder are updated using the loss values calculated by the loss function to accomplish training. Specifically, the loss function may be composed of an adversarial loss function, an identity loss function, a cycle loss function, and a latent-space similarity function, as follows:
$$\mathcal{L}_{adversarial} = \min_G \max_D \; \mathbb{E}_x\left[\log D(x)\right] + \mathbb{E}_z\left[\log\left(1 - D(G(z, C))\right)\right]$$

$$\mathcal{L}_{identity} = \left\lVert I_s - G(z_s, C_s) \right\rVert_1$$

$$\mathcal{L}_{cycle} = \left\lVert I_s - G(E(G(z_s, C_t)), C_s) \right\rVert_1 + \left\lVert I_t - G(E(G(z_t, C_s)), C_t) \right\rVert_1$$

$$\mathcal{L}_{latent} = \left\lVert z_s - E(G(z_s, C_t)) \right\rVert_1 + \left\lVert z_t - E(G(z_t, C_s)) \right\rVert_1$$
First, the parameters of the generator and the encoder are fixed, and the parameters of the discriminator are updated using the loss value calculated by the adversarial loss function; then the parameters of the discriminator are fixed, and the parameters of the generator and the encoder are updated using the loss values calculated by the identity loss function, the cycle loss function, and the latent-space similarity function.
Optionally, the respective values of the identity loss function, the cycle loss function, and the latent-space similarity function may be weighted, and the weighted sum used as the total loss value for updating the parameters of the generator and the encoder.
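A condensed sketch of this alternating update in PyTorch, under our own assumptions (the module interfaces E, G, D, a sigmoid-output discriminator, and the weight values are all illustrative; the patent fixes none of them):

```python
import torch
import torch.nn.functional as F

# Illustrative weights for the weighted sum; the patent does not specify them.
W_IDENTITY, W_CYCLE, W_LATENT = 5.0, 10.0, 1.0

def train_step(E, G, D, opt_d, opt_g, I_s, C_s, I_t, C_t):
    # --- Step 1: fix G and E, update D with the adversarial loss. ---
    fake_t = G(E(I_s), C_t).detach()
    loss_d = -(torch.log(D(I_t) + 1e-8).mean()
               + torch.log(1 - D(fake_t) + 1e-8).mean())
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # --- Step 2: fix D, update G and E with the remaining losses. ---
    z_s, z_t = E(I_s), E(I_t)
    fake_t, fake_s = G(z_s, C_t), G(z_t, C_s)
    loss_identity = F.l1_loss(G(z_s, C_s), I_s)
    loss_cycle = (F.l1_loss(G(E(fake_t), C_s), I_s)
                  + F.l1_loss(G(E(fake_s), C_t), I_t))
    loss_latent = F.l1_loss(E(fake_t), z_s) + F.l1_loss(E(fake_s), z_t)
    loss_g = (W_IDENTITY * loss_identity + W_CYCLE * loss_cycle
              + W_LATENT * loss_latent)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```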
After training is finished, the cyclic structure of the network is broken: only one forward path is kept, the discriminator D is discarded, the conversion model is built according to the structure shown in FIG. 3, and the trained parameters corresponding to the encoder E and the generator G are loaded.
And S105, generating a training set from the third images corresponding to the plurality of target three-dimensional postures, wherein the training set is used for training a preset detection model, and the preset detection model is used for detecting the target object.
In the embodiment of the application, a two-dimensional image corresponding to the three-dimensional posture is generated, and the three-dimensional posture is known, so that the generated two-dimensional image equivalently contains the known three-dimensional posture information; the pose of the target object in the image to be processed is then converted into the pose of the target object in the generated two-dimensional image. By the method, a plurality of images which accord with any target posture can be generated from one image to be processed, the postures of the target objects in the generated images are various, the corresponding three-dimensional postures in the generated images are known, manual marking is not needed, the acquisition difficulty of training data is effectively reduced, and the data acquisition efficiency is greatly improved.
To further increase the diversity of the acquired data, in one embodiment, S105 may further include:
replacing a background area in each third image to obtain fourth images corresponding to the target three-dimensional postures respectively; and generating the training set by using the fourth images corresponding to the plurality of target three-dimensional poses respectively.
Specifically, one implementation manner of replacing the background is as follows:
for each third image, acquiring an image mask of the third image, wherein the pixel value corresponding to the target object in the image mask is 0, and the pixel value corresponding to the image area except the target object is 255; generating a sixth image according to the image mask and the fifth image; fusing the third image and the sixth image into the fourth image.
The image mask of the third image may be generated again using the image segmentation model described in S101. Fig. 5 is a schematic diagram of the background replacement process provided in the embodiment of the present application. As shown in fig. 5, again taking hand pose estimation as an example, in this step the mask value of the region corresponding to the target object is 0 and the mask value of the background region is 255 in the generated image mask. A background picture I_b (the fifth image) is randomly selected from a background picture library, and a bitwise AND operation is performed between the background picture and the image mask Mask_t to obtain an image I_m (the sixth image); then the image I_m and the original target image (the third image) I_t are fused to obtain the image I_f (the fourth image).
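A sketch of this compositing step, assuming (as in the segmentation step above) that the third image is black outside the target region (OpenCV; the function and variable names are ours):

```python
import cv2
import numpy as np

def replace_background(third_image: np.ndarray,
                       mask_t: np.ndarray,
                       background: np.ndarray) -> np.ndarray:
    """Composite the re-posed hand onto a new background.

    `mask_t` is 0 inside the hand region and 255 elsewhere, so ANDing it
    with the background keeps the background only outside the hand.
    `third_image` is black outside the hand, so a saturating add fuses
    the two complementary regions.
    """
    mask_3c = cv2.merge([mask_t, mask_t, mask_t])
    i_m = cv2.bitwise_and(background, mask_3c)   # sixth image: holed background
    return cv2.add(third_image, i_m)             # fourth image I_f
```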
With this background replacement method, each obtained third image can be given multiple different backgrounds, deriving multiple training images from it and further increasing the diversity of the acquired data. In addition, in this way, additional samples are added purely through automatic background replacement, which effectively improves data acquisition efficiency.
Fig. 6 is a schematic diagram of the data acquisition process provided in an embodiment of the present application. As shown in fig. 6, the original image I_s is segmented into foreground and background; only the foreground (i.e., the target object) is kept, and the background is set to black. A two-dimensional skeleton map of the hand is generated according to the target pose. The hand in the segmented foreground image is converted according to the pose of the hand in the two-dimensional skeleton map to obtain the converted image I_t, realizing the hand pose conversion. The image mask Mask_t is obtained from the converted image I_t, and a bitwise AND operation is performed between the mask and a randomly acquired background image I_b to obtain the background image I_m; finally, the background image I_m and the converted image I_t are fused to generate the fused image I_f.
It should be noted that, for convenience of description, only hand pose estimation is taken as an example in the embodiment of the present application. In practical applications, the image processing method provided in the embodiment of the present application may also be applied to other target objects, and is not specifically limited herein.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Fig. 7 is a block diagram of an image processing apparatus according to an embodiment of the present application, which corresponds to the image processing method described in the foregoing embodiment, and only a part related to the embodiment of the present application is shown for convenience of description.
Referring to fig. 7, the apparatus includes:
an image acquisition unit 71 is configured to acquire a first image including a target object, where the first image is a two-dimensional image.
A pose acquisition unit 72 for acquiring a plurality of target three-dimensional poses of the target object.
An image generating unit 73, configured to generate a second image of the target object in each of the target three-dimensional poses, where the second image is a two-dimensional image.
And a posture conversion unit 74, configured to convert the target object in the first image according to the posture of the target object in each second image, respectively, so as to obtain a third image corresponding to each of the multiple target three-dimensional postures.
A data generating unit 75, configured to generate a training set from the third images corresponding to the multiple target three-dimensional poses, where the training set is used to train a preset detection model, and the preset detection model is used to detect the target object.
Optionally, the image acquiring unit 71 is further configured to:
acquiring an image to be processed containing the target object;
and carrying out image segmentation processing on the target object and the background area in the image to be processed to obtain the first image.
Optionally, the image generating unit 73 is further configured to:
for each target three-dimensional gesture, converting the target three-dimensional gesture into a two-dimensional gesture in a pixel coordinate system;
generating the second image from the two-dimensional pose.
Optionally, the posture conversion unit 74 is further configured to:
acquiring a preset conversion model, wherein the conversion model comprises an encoder and a generator;
inputting the first image into the encoder, and obtaining a feature vector of the first image;
and inputting the feature vector of the first image and the second image into the generator to obtain the third image.
Optionally, the data generating unit 75 is further configured to:
replacing a background area in each third image to obtain fourth images corresponding to the target three-dimensional postures respectively;
and generating the training set by using the fourth images corresponding to the plurality of target three-dimensional poses respectively.
Optionally, the data generating unit 75 is further configured to:
for each third image, acquiring an image mask of the third image, wherein the pixel value corresponding to the target object in the image mask is 0, and the pixel value corresponding to the image area except the target object is 255;
generating a sixth image according to the image mask and the fifth image;
fusing the third image and the sixth image into the fourth image.
It should be noted that, for the information interaction, execution process, and other contents between the above-mentioned devices/units, the specific functions and technical effects thereof are based on the same concept as those of the embodiment of the method of the present application, and specific reference may be made to the part of the embodiment of the method, which is not described herein again.
The image processing apparatus shown in fig. 7 may be a software unit, a hardware unit, or a combination of software and hardware unit built in an existing terminal device, may be integrated into the terminal device as a separate pendant, or may exist as a separate terminal device.
It should be clear to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional units and modules is only used for illustration, and in practical applications, the above function distribution may be performed by different functional units and modules as needed, that is, the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the above described functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Fig. 8 is a schematic structural diagram of a terminal device according to an embodiment of the present application. As shown in fig. 8, the terminal device 8 of this embodiment includes: at least one processor 80 (only one shown in fig. 8), a memory 81, and a computer program 82 stored in the memory 81 and operable on the at least one processor 80, the processor 80 implementing the steps in any of the various image processing method embodiments described above when executing the computer program 82.
The terminal device may be a desktop computer, a notebook computer, a palmtop computer, a cloud server, or another computing device. The terminal device may include, but is not limited to, a processor and a memory. Those skilled in the art will appreciate that fig. 8 is merely an example of the terminal device 8 and does not constitute a limitation of the terminal device 8, which may include more or fewer components than those shown, combine some components, or have different components, such as an input-output device, a network access device, and the like.
The Processor 80 may be a Central Processing Unit (CPU); the Processor 80 may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 81 may, in some embodiments, be an internal storage unit of the terminal device 8, such as a hard disk or a memory of the terminal device 8. The memory 81 may also, in other embodiments, be an external storage device of the terminal device 8, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, or a Flash memory Card (Flash Card) provided on the terminal device 8. Further, the memory 81 may also include both an internal storage unit and an external storage device of the terminal device 8. The memory 81 is used for storing an operating system, an application program, a boot loader (BootLoader), data, and other programs, such as the program code of the computer program. The memory 81 may also be used to temporarily store data that has been output or is to be output.
The embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps in the above-mentioned method embodiments.
The embodiments of the present application provide a computer program product, which when running on a terminal device, enables the terminal device to implement the steps in the above method embodiments when executed.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the processes in the methods of the embodiments described above may be implemented by a computer program, which may be stored in a computer-readable storage medium and which, when executed by a processor, implements the steps of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include at least: any entity or device capable of carrying computer program code to an apparatus/terminal device, a recording medium, a computer memory, a Read-Only Memory (ROM), a Random-Access Memory (RAM), an electrical carrier wave signal, a telecommunications signal, or a software distribution medium, such as a USB flash drive, a removable hard drive, a magnetic disk, or an optical disk. In certain jurisdictions, computer-readable media may not be an electrical carrier signal or a telecommunications signal in accordance with legislative and patent practice.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (8)

1. An image processing method, comprising:
acquiring a first image containing a target object, wherein the first image is a two-dimensional image;
acquiring a plurality of target three-dimensional postures of the target object;
generating a second image of the target object in each target three-dimensional posture, wherein the second image is a two-dimensional image;
converting the target object in the first image according to the posture of the target object in each second image to obtain a third image corresponding to each of the target three-dimensional postures;
generating a training set from third images corresponding to the multiple target three-dimensional postures, wherein the training set is used for training a preset detection model, and the preset detection model is used for detecting the target object;
the acquiring a first image containing a target object comprises:
acquiring an image to be processed containing the target object;
performing image segmentation processing on the target object and the background area in the image to be processed to obtain the first image;
correspondingly, the generating a second image of the target object in each of the target three-dimensional poses includes:
for each target three-dimensional gesture, converting the target three-dimensional gesture into a two-dimensional gesture in a pixel coordinate system;
generating the second image from the two-dimensional pose.
2. The image processing method according to claim 1, wherein the converting the target object in the first image according to the pose of the target object in each of the second images to obtain a third image corresponding to each of the target three-dimensional poses comprises:
acquiring a preset conversion model, wherein the conversion model comprises an encoder and a generator;
inputting the first image into the encoder, and obtaining a feature vector of the first image;
and inputting the feature vector of the first image and the second image into the generator to obtain the third image.
3. The image processing method according to claim 1, wherein after converting the target object in the first image according to the pose of the target object in each of the second images to obtain a third image corresponding to each of the plurality of target three-dimensional poses, the method further comprises:
replacing a background area in each third image to obtain fourth images corresponding to the target three-dimensional postures respectively;
correspondingly, the generating a training set from the third images corresponding to the plurality of target three-dimensional poses includes:
and generating the training set by using the fourth images corresponding to the plurality of target three-dimensional poses.
4. The image processing method according to claim 3, wherein the replacing the background area in each of the third images and obtaining a fourth image corresponding to each of the plurality of target three-dimensional poses comprises:
for each third image, acquiring an image mask of the third image, wherein the pixel value corresponding to the target object in the image mask is 0, and the pixel value corresponding to the image area except the target object is 255;
generating a sixth image according to the image mask and the fifth image;
fusing the third image and the sixth image into the fourth image.
5. An image processing apparatus characterized by comprising:
the image acquisition unit is used for acquiring a first image containing a target object, wherein the first image is a two-dimensional image;
a posture acquisition unit configured to acquire a plurality of target three-dimensional postures of the target object;
an image generating unit, configured to generate a second image of the target object in each of the target three-dimensional poses, where the second image is a two-dimensional image;
the posture conversion unit is used for converting the target object in the first image according to the posture of the target object in each second image to obtain third images corresponding to the target three-dimensional postures;
the data generating unit is used for generating a training set from third images corresponding to the multiple target three-dimensional postures, wherein the training set is used for training a preset detection model, and the preset detection model is used for detecting the target object;
an image acquisition unit, configured to acquire an image to be processed including the target object, and perform image segmentation processing on the target object and a background area in the image to be processed to obtain the first image;
and the image generating unit is used for converting the target three-dimensional posture into a two-dimensional posture in a pixel coordinate system for each target three-dimensional posture and generating the second image according to the two-dimensional posture.
6. The image processing apparatus according to claim 5, wherein the image acquisition unit is further configured to:
acquiring an image to be processed containing the target object;
and carrying out image segmentation processing on the target object and the background area in the image to be processed to obtain the first image.
7. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 4 when executing the computer program.
8. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 4.
CN202111675804.8A 2021-12-31 2021-12-31 Image processing method, image processing device, terminal equipment and computer readable storage medium Active CN114973396B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111675804.8A CN114973396B (en) 2021-12-31 2021-12-31 Image processing method, image processing device, terminal equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111675804.8A CN114973396B (en) 2021-12-31 2021-12-31 Image processing method, image processing device, terminal equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN114973396A CN114973396A (en) 2022-08-30
CN114973396B true CN114973396B (en) 2023-03-31

Family

ID=82974707

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111675804.8A Active CN114973396B (en) 2021-12-31 2021-12-31 Image processing method, image processing device, terminal equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN114973396B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10529137B1 (en) * 2016-11-29 2020-01-07 MAX-PLANCK-Gesellschaft zur Förderung der Wissenschaften e.V. Machine learning systems and methods for augmenting images
CN111047548A (en) * 2020-03-12 2020-04-21 腾讯科技(深圳)有限公司 Attitude transformation data processing method and device, computer equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11176699B2 (en) * 2019-05-24 2021-11-16 Tencent America LLC Augmenting reliable training data with CycleGAN for hand pose estimation
CN112150469B (en) * 2020-09-18 2022-05-27 上海交通大学 Laser speckle contrast image segmentation method based on unsupervised field self-adaption

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10529137B1 (en) * 2016-11-29 2020-01-07 MAX-PLANCK-Gesellschaft zur Förderung der Wissenschaften e.V. Machine learning systems and methods for augmenting images
CN111047548A (en) * 2020-03-12 2020-04-21 腾讯科技(深圳)有限公司 Attitude transformation data processing method and device, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Han Shujie. Research on a multi-pose face generation method based on a capsule network model. China Masters' Theses Full-text Database, Information Science and Technology, 2021, Vol. 2021, No. 08, pp. I138-327. *

Also Published As

Publication number Publication date
CN114973396A (en) 2022-08-30

Similar Documents

Publication Publication Date Title
CN110991319B (en) Hand key point detection method, gesture recognition method and related device
CN108694369B (en) Predicting multiple poses based on a graphical image
CN109636831B (en) Method for estimating three-dimensional human body posture and hand information
CN107103613B (en) A kind of three-dimension gesture Attitude estimation method
CN111598993B (en) Three-dimensional data reconstruction method and device based on multi-view imaging technology
CN113706699B (en) Data processing method and device, electronic equipment and computer readable storage medium
CN112530019B (en) Three-dimensional human body reconstruction method and device, computer equipment and storage medium
CN111208783B (en) Action simulation method, device, terminal and computer storage medium
CN110490959B (en) Three-dimensional image processing method and device, virtual image generating method and electronic equipment
CN111273772B (en) Augmented reality interaction method and device based on slam mapping method
CN114663593B (en) Three-dimensional human body posture estimation method, device, equipment and storage medium
CN112990154B (en) Data processing method, computer equipment and readable storage medium
CN116766213B (en) Bionic hand control method, system and equipment based on image processing
CN115083015B (en) 3D human body posture estimation data labeling mode and corresponding model construction method
CN110688897A (en) Pedestrian re-identification method and device based on joint judgment and generation learning
CN117635897B (en) Three-dimensional object posture complement method, device, equipment, storage medium and product
CN112036260A (en) Expression recognition method and system for multi-scale sub-block aggregation in natural environment
CN113627397A (en) Hand gesture recognition method, system, equipment and storage medium
CN110427864B (en) Image processing method and device and electronic equipment
CN112734632A (en) Image processing method, image processing device, electronic equipment and readable storage medium
CN112446253A (en) Skeleton behavior identification method and device
CN111179408A (en) Method and apparatus for three-dimensional modeling
CN113658035A (en) Face transformation method, device, equipment, storage medium and product
CN114973396B (en) Image processing method, image processing device, terminal equipment and computer readable storage medium
CN117197839A (en) Method and device for generating key point information of human skeleton and related equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant