CN111246113A - Image processing method, device, equipment and storage medium - Google Patents

Image processing method, device, equipment and storage medium

Info

Publication number
CN111246113A
Authority
CN
China
Prior art keywords
ideal
image
candidate
human
limb
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010147639.8A
Other languages
Chinese (zh)
Other versions
CN111246113B (en
Inventor
罗彤
李亚乾
蒋燚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jinsheng Communication Technology Co ltd
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Shanghai Jinsheng Communication Technology Co ltd
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jinsheng Communication Technology Co ltd, Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Shanghai Jinsheng Communication Technology Co ltd
Priority to CN202010147639.8A priority Critical patent/CN111246113B/en
Publication of CN111246113A publication Critical patent/CN111246113A/en
Application granted granted Critical
Publication of CN111246113B publication Critical patent/CN111246113B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/61Control of cameras or camera modules based on recognised objects
    • H04N23/611Control of cameras or camera modules based on recognised objects where the recognised objects include parts of the human body
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/62Control of parameters via user interfaces
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/63Control of cameras or camera modules by using electronic viewfinders
    • H04N23/631Graphical user interfaces [GUI] specially adapted for controlling image capture or setting capture parameters
    • H04N23/632Graphical user interfaces [GUI] specially adapted for controlling image capture or setting capture parameters for displaying or modifying preview images prior to image capturing, e.g. variety of image resolutions or capturing parameters
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/80Camera processing pipelines; Components thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The application discloses an image processing method, apparatus, device, and storage medium, belonging to the technical field of image processing. The method comprises: acquiring a non-ideal person image, wherein the person in the non-ideal person image has a non-ideal human body posture; acquiring an ideal person image, wherein the person in the ideal person image has an ideal human body posture; and correcting the human body posture of the person in the non-ideal person image according to the ideal person image, such that the difference between the corrected human body posture and the ideal human body posture is smaller than a preset difference threshold. The technical solution provided by the embodiments of the application can improve the efficiency of capturing person images.

Description

Image processing method, device, equipment and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image processing method, an image processing apparatus, an image processing device, and a storage medium.
Background
Currently, capturing images of people is increasingly common in daily life, and an aesthetically pleasing body posture (which may also be referred to as a photographing pose) can enhance the overall effect of a person image.
In the related art, after a person image is captured, the subject or the photographer can view it; if the body posture in the image is not attractive, the subject adjusts his or her posture and the photographer shoots again, repeating until the body posture in the captured image is satisfactory.
However, this approach is cumbersome and makes capturing person images inefficient.
Disclosure of Invention
Based on this, the embodiment of the application provides an image processing method, an image processing device, an image processing apparatus and a storage medium, which can improve the shooting efficiency of the person image.
In a first aspect, an image processing method is provided, which includes:
acquiring a non-ideal person image, wherein the person in the non-ideal person image has a non-ideal human body posture; acquiring an ideal person image, wherein the person in the ideal person image has an ideal human body posture; and correcting the human body posture of the person in the non-ideal person image according to the ideal person image, wherein the difference between the corrected human body posture and the ideal human body posture is smaller than a preset difference threshold.
In a second aspect, there is provided an image processing apparatus comprising:
a first acquisition module, configured to acquire a non-ideal person image, wherein the person in the non-ideal person image has a non-ideal human body posture;
a second acquisition module, configured to acquire an ideal person image, wherein the person in the ideal person image has an ideal human body posture;
a correction module, configured to correct the human body posture of the person in the non-ideal person image according to the ideal person image, wherein the difference between the corrected human body posture and the ideal human body posture is smaller than a preset difference threshold.
In a third aspect, a computer device is provided, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, implements the image processing method according to any of the first aspects above.
In a fourth aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the image processing method according to any of the first aspects described above.
The beneficial effects brought by the technical scheme provided by the embodiment of the application at least comprise:
A non-ideal person image and an ideal person image are acquired, where the person in the non-ideal person image has a non-ideal body posture and the person in the ideal person image has an ideal body posture; the body posture of the person in the non-ideal person image is then corrected according to the ideal person image, so that the difference between the corrected posture and the ideal posture is smaller than a preset difference threshold. When the body posture in a captured person image is unattractive, i.e., non-ideal, correction can thus be performed directly according to the ideal person image, bringing the corrected posture close to the ideal one. Since the ideal posture is an attractive posture, the correction improves the attractiveness of the posture in the captured image. As a result, the subject does not need to adjust his or her posture repeatedly and the photographer does not need to shoot repeatedly, so the efficiency of capturing person images is improved.
Drawings
FIG. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application;
fig. 2 is a flowchart of an image processing method according to an embodiment of the present application;
fig. 3 is a flowchart of a method for correcting a human posture of a person in a non-ideal human image according to an embodiment of the present application;
fig. 4 is a schematic network structure diagram of a key point identification network according to an embodiment of the present disclosure;
fig. 5 is a schematic network structure diagram of the 1st second recognition subnetwork provided in an embodiment of the present application;
fig. 6 is a schematic network structure diagram of the kth second recognition subnetwork provided in an embodiment of the present application;
fig. 7 is a schematic diagram of an STN network structure according to an embodiment of the present disclosure;
fig. 8 is a block diagram of an image processing apparatus according to an embodiment of the present application;
FIG. 9 is a block diagram of a calibration module provided in an embodiment of the present application;
fig. 10 is a block diagram of a computer device according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
In the following, a brief description will be given of an implementation environment related to the image processing method provided in the embodiment of the present application.
Fig. 1 is a schematic diagram of an implementation environment related to an image processing method provided in an embodiment of the present application, and as shown in fig. 1, the implementation environment may include a server 101 and a terminal 102, and the server 101 and the terminal 102 may communicate with each other through a wired network or a wireless network.
The terminal 102 may be a smart phone, a tablet computer, a wearable device, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, an e-book reader, or a vehicle-mounted device. The server 101 may be a single server or a server cluster comprising a plurality of servers.
In the implementation environment shown in fig. 1, the terminal 102 may transmit a non-ideal person image and an ideal person image to the server 101. The person in the non-ideal person image has a non-ideal body posture, that is, an unattractive posture or one the user is dissatisfied with; the person in the ideal person image has an ideal body posture, that is, the posture the user wishes to pose. The server 101 may then correct the human body posture of the person in the non-ideal person image using the ideal person image.
Of course, in some possible implementations, the implementation environment related to the image processing method provided by the embodiment of the present application may only include the terminal 102.
In the case where the implementation environment includes only the terminal 102, the terminal 102 may perform correction processing of the human body posture of the person in the non-ideal person image using the ideal person image directly after acquiring the non-ideal person image and the ideal person image.
Fig. 2 shows a flowchart of an image processing method provided in an embodiment of the present application. The image processing method may be applied to the server 101 or to the terminal 102; for brevity, this embodiment describes only its application to the terminal 102, the technical process on the server 101 being the same and not repeated here. As shown in fig. 2, the image processing method may include the following steps:
step 201, the terminal acquires a non-ideal person image.
In one embodiment of the present application, if the terminal detects a correction instruction for a personal image after capturing the personal image, the terminal may take the captured personal image as a non-ideal personal image.
Optionally, after the terminal captures the person image, the captured person image may be displayed in an image display interface, and the terminal may receive a correction instruction for the person image based on the image display interface.
In a possible implementation manner, a correction option may be set in the image presentation interface, and when a trigger operation on the correction option is detected, the terminal may receive a correction instruction for the person image.
In another possible implementation manner, when the terminal detects a preset type of touch operation in the image display interface, the terminal may receive a correction instruction for the person image, where the touch operation may be a double-click operation, a single-click operation, or a sliding operation.
Step 202, the terminal acquires an ideal person image.
In one embodiment of the present application, the terminal may determine the above-described ideal personal image from among the plurality of candidate ideal personal images according to a selection instruction of the user.
In practical applications, the terminal may store a plurality of candidate ideal person images in advance, or may request a plurality of candidate ideal person images from the server; the human body posture of the person in each candidate ideal person image is an ideal human body posture. The user may select one candidate ideal person image from the plurality as the ideal person image, and the terminal may then correct the human body posture of the person in the non-ideal person image based on the ideal person image selected by the user.
And step 203, the terminal corrects the human body posture of the person in the non-ideal person image according to the ideal person image.
Wherein, the difference between the corrected human body posture and the ideal human body posture is smaller than a preset difference threshold value, in other words, the corrected human body posture is close to the ideal human body posture.
In the image processing method provided by this embodiment of the application, a non-ideal person image and an ideal person image are obtained, where the person in the non-ideal person image has a non-ideal body posture and the person in the ideal person image has an ideal body posture; the body posture of the person in the non-ideal person image is then corrected according to the ideal person image, so that the difference between the corrected posture and the ideal posture is smaller than a preset difference threshold. Thus, when the body posture in a captured person image is unattractive, i.e., non-ideal, correction can be performed directly according to the ideal person image, bringing the corrected posture close to the ideal one; since the ideal posture is an attractive posture, the correction improves the attractiveness of the posture in the captured image. The subject therefore does not need to adjust his or her posture repeatedly, and the photographer does not need to shoot repeatedly, which improves the efficiency of capturing person images.
Referring to fig. 3, on the basis of the above embodiment, an embodiment of the present application provides a method for performing a correction process on a human posture of a person in a non-ideal person image according to an ideal person image, where the method may include the following steps:
step 2031, the terminal identifies the key points of the human skeleton for the non-ideal character image and the ideal character image respectively to obtain a plurality of key points of the human skeleton included in the non-ideal character image and a plurality of key points of the human skeleton included in the ideal character image.
Optionally, in an embodiment of the present application, the terminal may perform human skeleton keypoint recognition on the non-ideal person image and the ideal person image using a neural network. To simplify the description of this recognition process, the two images are collectively referred to below as the target person image. The technical process of recognizing human skeleton key points in the target person image with the neural network may comprise the following steps A and B.
And step A, the terminal inputs the target character image into a key point identification network to obtain a key point probability graph set and a limb direction vector graph set output by the key point identification network.
In the following, the embodiment of the present application will describe a keypoint probability map set and a limb direction vector map set separately.
First, the keypoint probability map set.
The key point probability map set comprises a plurality of key point probability maps which are in one-to-one correspondence with different types of human skeleton key points, and each key point probability map comprises a probability value used for indicating the probability that the corresponding type of human skeleton key point is located at the position point of the probability value.
For example, assume there are 3 kinds of human skeleton key points: a left-elbow key point, a left-wrist key point, and a left-shoulder key point. (In practical applications there are far more than 3 kinds; only 3 are used here for simplicity of explanation.)
In the case of 3 kinds of human skeleton key points in total, the key point probability map set described above includes 3 key point probability maps, each of which corresponds to one kind of human skeleton key points.
The keypoint probability map is essentially a matrix, each matrix element in the matrix is a probability value, the position of each matrix element (i.e., the probability value) in the matrix can be regarded as the position point of the matrix element in the keypoint probability map, and the position point has a mapping relationship with one or more pixels in the target person image.
As described above, the probability value included in the key point probability map is used to indicate the probability that the corresponding kind of human bone key point is located at the position of the probability value, and taking the key point probability map corresponding to the left-elbow key point as an example, the probability value included in the key point probability map is used to indicate the probability that the left-elbow key point is located at the position of the probability value.
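As a concrete illustration, reading the most probable location of the left-elbow key point from its probability map can be sketched as follows; the 4×4 map and its values are hypothetical (real maps output by the network are much larger):

```python
import numpy as np

# Hypothetical 4x4 keypoint probability map for the left-elbow key point;
# each entry is the probability that the left elbow lies at that position point.
left_elbow_map = np.array([
    [0.01, 0.02, 0.01, 0.00],
    [0.03, 0.10, 0.05, 0.01],
    [0.02, 0.85, 0.12, 0.02],
    [0.01, 0.04, 0.03, 0.00],
])

# The most probable position point is the matrix element with the largest value;
# this position maps back to one or more pixels of the target person image.
row, col = np.unravel_index(np.argmax(left_elbow_map), left_elbow_map.shape)
print(row, col)  # → 2 1
```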
Second, the limb direction vector map set.
The set of limb direction vector images includes a plurality of limb direction vector images in one-to-one correspondence with different kinds of limbs, each limb direction vector image including vector values for indicating directions of the corresponding kind of limb at a point where the vector values are located.
It should be noted that a limb in the embodiments of the present application is not a limb in the narrow sense, but refers to a human body region between associated human skeleton key points. For example, the region between the left-eye key point and the right-eye key point is one limb, and the region between the neck key point and the left-shoulder key point is another.
For example, assume there are 2 kinds of limbs: a left forearm and a left upper arm, where the left forearm is the human body region between the left-elbow key point and the left-wrist key point, and the left upper arm is the human body region between the left-elbow key point and the left-shoulder key point. (In practical applications there are far more than 2 kinds; only 2 are used here for simplicity of description.)
In the case of a total of 2 limbs, the set of limb direction vector images described above includes 2 limb direction vector images, each corresponding to one limb.
Wherein the body direction vector diagram is essentially a matrix, each matrix element in the matrix is a vector value, and the position of each matrix element (i.e., the vector value) in the matrix can be regarded as the position point of the matrix element in the body direction vector diagram, and the position point has a mapping relation with one or more pixels in the target person image.
Alternatively, the vector value may be a two-dimensional vector value, which may be represented by (x, y), and in general, when a certain vector value in the limb direction vector diagram is (0,0), it indicates that there is no limb of the kind corresponding to the limb direction vector diagram at the position point where the vector value is located.
As described above, each vector value in a limb direction vector map indicates the direction of the corresponding kind of limb at the position point where that value is located. Taking the map corresponding to the left forearm as an example, each vector value it contains indicates the direction of the left forearm at the position point of that value.
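To make the vector values concrete, the sketch below computes the unit direction vector that, under the scheme described here, would be stored at position points lying on the left forearm; the elbow and wrist coordinates are hypothetical:

```python
import numpy as np

# Hypothetical coordinates of the left-elbow and left-wrist key points (x, y).
elbow = np.array([40.0, 60.0])
wrist = np.array([70.0, 100.0])

# Unit vector pointing from the elbow to the wrist: the direction of the
# left forearm. In the limb direction vector map, position points on the
# forearm store this value; points off the limb store (0, 0).
direction = (wrist - elbow) / np.linalg.norm(wrist - elbow)
print(direction)  # → [0.6 0.8]
```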
After describing the set of the keypoint probability map and the set of the limb direction vector map, the embodiment of the present application will briefly describe the network structure of the keypoint identification network.
Referring to fig. 4, the keypoint recognition network may include a first recognition subnetwork w1 and a cascade of n second recognition subnetworks w2, where n is a positive integer greater than 1.
The first recognition subnetwork w1 is used for feature extraction of the target person image and outputting a feature map. The feature map is essentially a matrix whose matrix elements are the features of the image of the target person extracted by the first recognition subnetwork w 1.
Optionally, the first recognition subnetwork w1 may be a Convolutional Neural Network (CNN); for example, it may be a MobileNetV2 network.
The input to the 1st of the n second recognition subnetworks w2 may be the feature map output by the first recognition subnetwork w1. The 1st second recognition subnetwork performs recognition computation (optionally, convolution computation) on the feature map and outputs a 1st candidate keypoint probability map set and a 1st candidate limb direction vector map set.
The input to the kth second recognition subnetwork of the n second recognition subnetworks w2, where k is a positive integer greater than 1 and less than or equal to n, may be: the feature map output by the first recognition subnetwork w1, the (k-1)th candidate keypoint probability map set, and the (k-1)th candidate limb direction vector map set. The kth second recognition subnetwork performs recognition computation (optionally, convolution computation) on these inputs and outputs the kth candidate keypoint probability map set and the kth candidate limb direction vector map set.
Optionally, each second recognition subnetwork w2 may also be a convolutional neural network.
Fig. 5 is a schematic diagram of the network structure of an exemplary 1st second recognition subnetwork. As shown in fig. 5, the 1st second recognition subnetwork includes two branches; the input of both branches is the feature map output by the first recognition subnetwork w1, and their outputs are the 1st candidate keypoint probability map set and the 1st candidate limb direction vector map set, respectively. Each branch includes 3 3×3 convolutional layers and 2 1×1 convolutional layers.
Fig. 6 is a schematic diagram of the network structure of an exemplary kth second recognition subnetwork. As shown in fig. 6, the kth second recognition subnetwork includes two branches; the inputs of the branches are the feature map output by the first recognition subnetwork w1, the (k-1)th candidate keypoint probability map set, and the (k-1)th candidate limb direction vector map set, and their outputs are the kth candidate keypoint probability map set and the kth candidate limb direction vector map set, respectively. Each branch includes 5 7×7 convolutional layers and 2 1×1 convolutional layers.
Next, the embodiment of the present application will briefly describe the technical process of step a in conjunction with the network structure of the key point identification network.
The terminal inputs the target person image into the first recognition subnetwork, which performs feature extraction and outputs a feature map. The terminal then feeds the feature map into the n cascaded second recognition subnetworks: the ith second recognition subnetwork performs recognition computation on the ith input and outputs the ith candidate keypoint probability map set and the ith candidate limb direction vector map set. The nth candidate keypoint probability map set and the nth candidate limb direction vector map set, output by the nth second recognition subnetwork, serve as the keypoint probability map set and the limb direction vector map set finally output by the keypoint recognition network.
When i = 1, the ith input is the feature map output by the first recognition subnetwork; when 1 < i ≤ n, the ith input consists of the feature map output by the first recognition subnetwork, the (i-1)th candidate keypoint probability map set, and the (i-1)th candidate limb direction vector map set.
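The cascaded forward pass of step A can be sketched as follows; the subnetwork callables are stand-ins, since the description leaves the internals of the CNN sub-networks open:

```python
def keypoint_recognition_forward(image, first_subnet, second_subnets):
    # First recognition sub-network: feature extraction.
    feature_map = first_subnet(image)
    # The 1st second recognition sub-network takes only the feature map.
    heatmaps, pafs = second_subnets[0](feature_map)
    # The ith (i > 1) sub-network also takes the (i-1)th candidate
    # keypoint probability map set and limb direction vector map set.
    for subnet in second_subnets[1:]:
        heatmaps, pafs = subnet(feature_map, heatmaps, pafs)
    # The nth outputs are the final keypoint probability map set and
    # limb direction vector map set.
    return heatmaps, pafs
```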
And step B, the terminal identifies the human skeleton key points of the target character image according to the key point probability graph set and the limb direction vector graph set.
Wherein, step B may comprise the following substeps:
and a substep b1, for each kind of human skeleton key point, determining a plurality of candidate position points corresponding to the human skeleton key point in a key point probability graph corresponding to the human skeleton key point by the terminal.
Wherein the probability value at each candidate location point is the largest of the plurality of location points that are adjacent to the candidate location point.
Optionally, the terminal may perform maximum pooling operation on the key point probability map to obtain a pooled probability map, where the pooled probability map includes a plurality of pooled probability values, and the pooled probability values correspond to a plurality of probability values included in the key point probability map in a one-to-one manner. The terminal may then determine a target probability value from the keypoint probability map, the target probability value being equal to the corresponding pooling probability value. Then, the terminal may determine the location point where the target probability value is located as a candidate location point corresponding to the human skeleton key point.
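The max-pooling comparison of sub-step b1 can be sketched as follows. The `threshold` parameter is an assumption added here to suppress flat low-probability regions (where a value trivially equals its neighbourhood maximum); the description itself does not mention one:

```python
import numpy as np

def find_candidate_points(prob_map, pool_size=3, threshold=0.2):
    # Max-pool the probability map: each pooled value is the maximum over
    # the pool_size x pool_size neighbourhood of the corresponding point.
    pad = pool_size // 2
    padded = np.pad(prob_map, pad, mode="constant")
    pooled = np.zeros_like(prob_map)
    h, w = prob_map.shape
    for i in range(h):
        for j in range(w):
            pooled[i, j] = padded[i:i + pool_size, j:j + pool_size].max()
    # Target probability values equal their pooled value (local maxima);
    # the threshold is an added assumption, see lead-in.
    rows, cols = np.where((prob_map == pooled) & (prob_map >= threshold))
    return list(zip(rows.tolist(), cols.tolist()))

# Hypothetical probability map with two separated peaks.
prob_map = np.array([
    [0.9, 0.1, 0.0, 0.0],
    [0.1, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.1],
    [0.0, 0.0, 0.1, 0.8],
])
print(find_candidate_points(prob_map))  # → [(0, 0), (3, 3)]
```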
Sub-step b2: the terminal acquires a plurality of position point sets.
Each position point set comprises m candidate position points, the types of the human skeleton key points corresponding to the candidate position points in each position point set are different, and m is the number of the types of the human skeleton key points.
As in the above example, assuming that there are 3 kinds of human skeletal keypoints, where the 3 kinds of human skeletal keypoints are the left elbow keypoint, the left wrist keypoint, and the left shoulder keypoint, respectively, each position point set may include 3 candidate position points, where the 3 candidate position points correspond to the left elbow keypoint, the left wrist keypoint, and the left shoulder keypoint, respectively.
Sub-step b3: the terminal determines a target position point set from the plurality of position point sets according to the limb direction vector map set, and determines the human skeleton key points in the target person image from the candidate position points included in the target position point set.
1. For each position point set, the terminal treats the connecting line between each pair of candidate position points in the set as a candidate limb, obtaining a candidate limb set.
The candidate limb set can be described in the following mathematical language:

$$Z = \left\{ z_{j_1 j_2}^{xy} \;\middle|\; j_1, j_2 \in \{1, \dots, m\},\ x, y \in \{1, \dots, m\} \right\}$$

where Z is the candidate limb set, j1 and j2 represent the types of the human skeleton key points corresponding to the two candidate position points, x and y respectively represent the numbers of the candidate position points in the position point set, and m is the number of the candidate position points in the position point set. Each element $z_{j_1 j_2}^{xy}$ is the line connecting the candidate position point numbered x and the candidate position point numbered y, that is, a candidate limb.
2. And for each candidate limb of each position point set, the terminal determines a target limb direction vector diagram corresponding to the candidate limb from the limb direction vector diagram set according to the limb type of the candidate limb, and calculates the confidence coefficient of the candidate limb according to the vector value in the target limb direction vector diagram.
In practical application, the key point probability map and the limb direction vector map have the same size, so that the position points in the key point probability map and the position points in the limb direction vector map have a one-to-one correspondence relationship. Based on this one-to-one correspondence, the terminal may calculate a confidence level for the candidate limb. The confidence of the candidate limb is used for indicating the probability that the candidate position points at the two ends of the candidate limb belong to the same person.
The confidence of the candidate limb can be calculated according to the following formula:

$$E = \int_{u=0}^{1} L_c\big(P(u)\big) \cdot \frac{r_y - r_x}{\left\lVert r_y - r_x \right\rVert_2} \, du$$

where P(u) is the first coordinate, in the key point probability map, of a position point interpolated between the candidate position points at the two ends of the candidate limb, that is:

$$P(u) = (1-u)\, r_x + u\, r_y$$

where u is sampled at even intervals in [0, 1], r_x is the coordinate, in the key point probability map, of the candidate position point at one end of the candidate limb, and r_y is the coordinate of the candidate position point at the other end.

L_c(P(u)) is the vector value at the position point corresponding to the first coordinate P(u) in the target limb direction vector diagram, and (r_y − r_x)/‖r_y − r_x‖₂ is the unit vector pointing from one end of the candidate limb to the other. In practice, the integral is approximated by summing over the sampled values of u.
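The confidence computation can be sketched as follows (assumptions: the integral is approximated by ten evenly spaced samples of u, and the vector value at P(u) is read with nearest-neighbour rounding, neither of which is fixed by this embodiment):

```python
import numpy as np

def limb_confidence(paf_row, paf_col, r_x, r_y, num_samples=10):
    """Approximate E = integral of L_c(P(u)) . (r_y - r_x)/|r_y - r_x| du
    by sampling u at even intervals in [0, 1].
    paf_row, paf_col: the two channels of the target limb direction vector map.
    r_x, r_y: (row, col) coordinates of the candidate points at both ends."""
    r_x, r_y = np.asarray(r_x, float), np.asarray(r_y, float)
    direction = r_y - r_x
    norm = np.linalg.norm(direction)
    if norm < 1e-8:
        return 0.0
    unit = direction / norm
    total = 0.0
    for u in np.linspace(0.0, 1.0, num_samples):
        p = (1 - u) * r_x + u * r_y           # P(u) = (1-u)*r_x + u*r_y
        i, j = int(round(p[0])), int(round(p[1]))
        total += paf_row[i, j] * unit[0] + paf_col[i, j] * unit[1]
    return total / num_samples

# A direction vector map whose vectors point straight down the rows:
pafx = np.ones((10, 10))
pafy = np.zeros((10, 10))
print(limb_confidence(pafx, pafy, (1, 5), (8, 5)))  # 1.0: limb aligned with the field
print(limb_confidence(pafx, pafy, (5, 1), (5, 8)))  # 0.0: limb perpendicular to it
```

A candidate limb whose direction agrees with the vector field along its length scores high, which is exactly the property used to decide whether its two endpoints belong to the same person.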
3. For each position point set, the terminal calculates the confidence of the position point set according to the confidence of each candidate limb of the position point set.
Optionally, the terminal may superimpose the confidence degrees of each candidate limb of the position point set to obtain the confidence degree of the position point set.
4. And the terminal determines the target position point set from the plurality of position point sets according to the confidence coefficient of each position point set.
Optionally, the terminal may determine, as the target location point set, a location point set with the highest confidence in the plurality of location point sets.
The confidence of the position point set can be represented by the following formula:

$$E_S = \sum_{z \in Z_S} E_z$$

where Z_S is the candidate limb set of the position point set S, and E_z is the confidence of candidate limb z.
the process of determining the target position point set is a process of finding the maximum confidence coefficient, and in the embodiment of the present application, the maximum confidence coefficient can be found by using the hungarian algorithm.
After the target position point set is obtained, the terminal can determine the pixel corresponding to each candidate position point in the target position point set from the target person image according to the corresponding relation between the position point in the key point probability map and the pixel in the target person image, and the determined pixel is used as a human skeleton key point.
Step 2032, the terminal corrects the human posture of the person in the non-ideal character image according to the plurality of human skeleton key points included in the non-ideal character image and the plurality of human skeleton key points included in the ideal character image.
Optionally, the terminal may input the plurality of human skeleton key points included in the non-ideal person image, the plurality of human skeleton key points included in the ideal person image, and the non-ideal person image into a Spatial Transformer Network (STN), and perform correction processing on the human body posture of the person in the non-ideal person image through the STN.
Please refer to fig. 7, which is a diagram illustrating an exemplary STN network structure. As shown in fig. 7, the STN includes a localization network (referred to below as the local network), a grid generator, and a sampler.
The local network is a parameter predictor, which may be a multilayer neural network. Its inputs are the plurality of human skeleton key points included in the non-ideal person image, the plurality of human skeleton key points included in the ideal person image, and the non-ideal person image; its output is a set of grid generator parameters.
The grid generator is essentially a coordinate mapper whose parameters are the grid generator parameters output by the local network. It outputs an image coordinate mapping relationship between the non-ideal person image and the target image to be output, where the human body posture of the person in the target image is the human body posture after correction processing. In other words, for each pixel in the non-ideal person image, the grid generator can map it into the target image.
The coordinate mapping output by the grid generator may be represented in the following mathematical language:

$$\begin{pmatrix} x^{s} \\ y^{s} \end{pmatrix} = A_\theta \begin{pmatrix} x^{t} \\ y^{t} \\ 1 \end{pmatrix}$$

where (x^s, y^s) are the image coordinates of any pixel in the non-ideal person image, (x^t, y^t) are the image coordinates of that pixel after mapping to the target image, and A_θ is the grid generator parameter (for example, a 2 × 3 affine transformation matrix).
The sampler can convert the coordinates of each pixel point in the non-ideal character image by utilizing the image coordinate mapping relation output by the grid generator so as to obtain the target image, wherein the target image is the result of the human body posture correction processing.
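The interplay between the grid generator and the sampler can be sketched in NumPy as follows (nearest-neighbour sampling for brevity, whereas a trainable STN would use bilinear sampling; A_theta is an illustrative 2×3 affine parameter matrix, and, following the usual STN convention, the sketch iterates over target pixels and maps each back to source coordinates):

```python
import numpy as np

def affine_warp(image, A_theta):
    """Minimal grid-generator + sampler sketch (nearest-neighbour).
    For every pixel (x_t, y_t) of the target image, the grid generator
    yields source coordinates (x_s, y_s) = A_theta @ [x_t, y_t, 1],
    and the sampler reads the source image there."""
    h, w = image.shape
    target = np.zeros_like(image)
    for yt in range(h):
        for xt in range(w):
            xs, ys = A_theta @ np.array([xt, yt, 1.0])
            xi, yi = int(round(xs)), int(round(ys))
            if 0 <= xi < w and 0 <= yi < h:   # outside pixels stay zero
                target[yt, xt] = image[yi, xi]
    return target

img = np.arange(16.0).reshape(4, 4)
# Identity transform plus a 1-pixel horizontal shift (illustrative parameters):
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
shifted = affine_warp(img, A)
print(shifted[0])  # first row sampled one pixel to the right
```

In a real STN the sampling is differentiable (bilinear), so gradients flow back through the sampler to the local network that predicts A_theta.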
The following briefly describes the training process of the STN:
1. A sufficient number of person image pairs are collected, where each pair includes a person image whose human body posture has not been subjected to correction processing and a person image whose human body posture has been subjected to correction processing. (In practice, it is only necessary to ensure that the shooting location and shooting content of the two images are the same; either image of a pair can serve as the image whose posture has not been corrected, which reduces the amount of data to collect.)
2. And identifying the key points of the human skeleton of each acquired person image to obtain the key points of the human skeleton included in each image.
3. The STN is trained by taking, as inputs, the uncorrected-posture person image of each pair, the human skeleton key points of that uncorrected-posture image, and the human skeleton key points of the corrected-posture image, and taking the corrected-posture person image as the ground-truth output.
According to the embodiment of the application, the non-ideal human body posture is corrected according to the key points of the human skeleton, so that large-amplitude limb deviation and small-amplitude posture deviation can be adjusted simultaneously, and the human body posture obtained after correction is natural.
Referring to fig. 8, a block diagram of an image processing apparatus 400 according to an embodiment of the present application is shown, where the image processing apparatus 400 may be configured in the server 101 or the terminal 102 shown in fig. 1. As shown in fig. 8, the image processing apparatus 400 may include: a first acquisition module 401, a second acquisition module 402 and a correction module 403.
The first obtaining module 401 is configured to obtain a non-ideal human image, where a human in the non-ideal human image has a non-ideal human posture.
The second obtaining module 402 is configured to obtain an ideal human image, where a human in the ideal human image has an ideal human posture.
The correction module 403 is configured to perform correction processing on the human body posture of the person in the non-ideal person image according to the ideal person image; and the difference between the corrected human body posture and the ideal human body posture is smaller than a preset difference threshold value.
Referring to fig. 9, optionally, in an embodiment of the present application, the correction module 403 may include an identification sub-module 4031 and a correction sub-module 4032.
The identification submodule 4031 is configured to perform human skeleton key point identification on the non-ideal person image and the ideal person image respectively to obtain a plurality of human skeleton key points included in the non-ideal person image and a plurality of human skeleton key points included in the ideal person image.
The correction sub-module 4032 is configured to correct the human posture of the person in the non-ideal person image according to a plurality of human skeleton key points included in the non-ideal person image and a plurality of human skeleton key points included in the ideal person image.
In an embodiment of the present application, the identification submodule 4031 is specifically configured to:
inputting the target person image into a key point recognition network to obtain a key point probability map set and a limb direction vector map set output by the key point recognition network, where the target person image is the non-ideal person image or the ideal person image; the key point probability map set comprises a plurality of key point probability maps corresponding one-to-one to different kinds of human skeleton key points, each key point probability map comprising probability values, each probability value indicating the probability that the corresponding kind of human skeleton key point is located at the position point of that probability value; the limb direction vector map set comprises a plurality of limb direction vector maps corresponding one-to-one to different kinds of limbs, a limb being a human body region between associated human skeleton key points, each limb direction vector map comprising vector values, each vector value indicating the direction of the corresponding kind of limb at the position point of that vector value; and identifying the human skeleton key points of the target person image according to the key point probability map set and the limb direction vector map set.
In an embodiment of the present application, the keypoint identification network includes a first identification subnetwork and n cascaded second identification subnetworks, where n is a positive integer greater than 1, and the identification submodule 4031 is specifically configured to:
inputting the target person image into the first recognition sub-network to obtain a feature map output by the first recognition sub-network after feature extraction is carried out on the target person image; inputting the feature map into the n second recognition sub-networks, performing recognition calculation on an ith input map through the ith second recognition sub-network, and outputting an ith candidate keypoint probability map set and an ith candidate limb direction vector map set, wherein when i is 1, the ith input map is the feature map, and when i is more than 1 and less than or equal to n, the ith input map is the feature map, the ith-1 candidate keypoint probability map set and the ith-1 candidate limb direction vector map set; and taking the nth candidate key point probability map set and the nth candidate limb direction vector map set output by the nth second identification subnetwork as the key point probability map set and the limb direction vector map set output by the key point identification network respectively.
In an embodiment of the present application, the identification submodule 4031 is specifically configured to:
for each kind of human skeleton key point, determining a plurality of candidate position points corresponding to the human skeleton key point in a key point probability graph corresponding to the human skeleton key point; acquiring a plurality of position point sets, wherein each position point set comprises m candidate position points, the types of human skeleton key points corresponding to the candidate position points in each position point set are different, and m is the number of the types of the human skeleton key points; and determining a target position point set from the plurality of position point sets according to the limb direction vector diagram set, and determining human skeleton key points in the target human image according to each candidate position point included in the target position point set.
In an embodiment of the present application, the identification submodule 4031 is specifically configured to:
performing maximum pooling operation on the key point probability map to obtain a pooled probability map, wherein the pooled probability map comprises a plurality of pooled probability values, and the pooled probability values correspond to a plurality of probability values included in the key point probability map in a one-to-one mode; determining a target probability value from the keypoint probability map, the target probability value being equal to the corresponding pooling probability value; and determining the position point where the target probability value is located as a candidate position point corresponding to the human skeleton key point.
In an embodiment of the present application, the identification submodule 4031 is specifically configured to:
for each position point set, determining a connecting line between each candidate position point included in the position point set as a candidate limb; for each candidate limb of each position point set, determining a target limb direction vector diagram corresponding to the candidate limb from the limb direction vector diagram set according to the limb type of the candidate limb, and calculating the confidence coefficient of the candidate limb according to the vector value in the target limb direction vector diagram, wherein the confidence coefficient is used for indicating the probability that the candidate position points at the two ends of the candidate limb belong to the same person; for each position point set, calculating the confidence coefficient of the position point set according to the confidence coefficient of each candidate limb of the position point set; and determining the target position point set from the plurality of position point sets according to the confidence degree of each position point set.
In an embodiment of the present application, the identification submodule 4031 is specifically configured to:
and superposing the confidence degrees of each candidate limb of the position point set to obtain the confidence degree of the position point set.
In an embodiment of the present application, the identification submodule 4031 is specifically configured to:
and determining the position point set with the highest confidence coefficient in a plurality of position point sets as the target position point set.
In an embodiment of the present application, the correction sub-module 4032 is specifically configured to:
inputting a plurality of human skeleton key points included in the non-ideal character image, a plurality of human skeleton key points included in the ideal character image and the non-ideal character image into a space transformation network STN, and correcting the human posture of the character in the non-ideal character image through the STN.
In an embodiment of the present application, the STN includes a local network, a grid generator and a sampler, and the correction sub-module 4032 is specifically configured to:
obtaining grid generator parameters according to a plurality of human skeleton key points included by the non-ideal character image, a plurality of human skeleton key points included by the ideal character image and the non-ideal character image through the local network; obtaining an image coordinate mapping relation between the non-ideal character image and a target image to be output by using the grid generator parameters through the grid generator, wherein the human body posture of the character in the target image is the human body posture after the correction processing; and converting the coordinates of each pixel point in the non-ideal character image by using the image coordinate mapping relation through the sampler to obtain the target image.
The image processing apparatus provided in the embodiment of the present application can implement the method embodiments, and the implementation principle and the technical effect are similar, which are not described herein again.
For specific limitations of the image processing apparatus, reference may be made to the above limitations of the image processing method, which are not described herein again. The respective modules in the image processing apparatus described above may be wholly or partially implemented by software, hardware, and a combination thereof. The modules can be embedded in a hardware form or independent of a processor in the terminal, and can also be stored in a memory in the terminal in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment of the present application, a computer device is provided, and the computer device may be a terminal or a server, and its internal structure diagram may be as shown in fig. 10. The computer device includes a processor and a memory connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The computer program is executed by a processor to implement an image processing method provided by the embodiment of the application.
Those skilled in the art will appreciate that the architecture shown in fig. 10 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply; a particular computing device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.
In one embodiment of the present application, there is provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the following steps when executing the computer program:
acquiring a non-ideal character image, wherein characters in the non-ideal character image have non-ideal human body postures; acquiring an ideal character image, wherein characters in the ideal character image have ideal body postures; correcting the human body posture of the person in the non-ideal person image according to the ideal person image; and the difference between the corrected human body posture and the ideal human body posture is smaller than a preset difference threshold value.
In one embodiment of the application, the processor when executing the computer program further performs the steps of: respectively carrying out human skeleton key point identification on the non-ideal character image and the ideal character image to obtain a plurality of human skeleton key points included in the non-ideal character image and a plurality of human skeleton key points included in the ideal character image; and correcting the human posture of the person in the non-ideal person image according to a plurality of human skeleton key points included in the non-ideal person image and a plurality of human skeleton key points included in the ideal person image.
In one embodiment of the application, the processor when executing the computer program further performs the steps of: inputting the target person image into a key point recognition network to obtain a key point probability map set and a limb direction vector map set output by the key point recognition network, where the target person image is the non-ideal person image or the ideal person image; the key point probability map set comprises a plurality of key point probability maps corresponding one-to-one to different kinds of human skeleton key points, each key point probability map comprising probability values, each probability value indicating the probability that the corresponding kind of human skeleton key point is located at the position point of that probability value; the limb direction vector map set comprises a plurality of limb direction vector maps corresponding one-to-one to different kinds of limbs, a limb being a human body region between associated human skeleton key points, each limb direction vector map comprising vector values, each vector value indicating the direction of the corresponding kind of limb at the position point of that vector value; and identifying the human skeleton key points of the target person image according to the key point probability map set and the limb direction vector map set.
The keypoint identification network comprises a first identification subnetwork and a cascade of n second identification subnetworks, n being a positive integer greater than 1, and in one embodiment of the application, the processor, when executing the computer program, further implements the steps of: inputting the target person image into the first recognition sub-network to obtain a feature map output by the first recognition sub-network after feature extraction is carried out on the target person image; inputting the feature map into the n second recognition sub-networks, performing recognition calculation on an ith input map through the ith second recognition sub-network, and outputting an ith candidate keypoint probability map set and an ith candidate limb direction vector map set, wherein when i is 1, the ith input map is the feature map, and when i is more than 1 and less than or equal to n, the ith input map is the feature map, the ith-1 candidate keypoint probability map set and the ith-1 candidate limb direction vector map set; and taking the nth candidate key point probability map set and the nth candidate limb direction vector map set output by the nth second identification subnetwork as the key point probability map set and the limb direction vector map set output by the key point identification network respectively.
In one embodiment of the application, the processor when executing the computer program further performs the steps of: for each kind of human skeleton key point, determining a plurality of candidate position points corresponding to the human skeleton key point in a key point probability graph corresponding to the human skeleton key point; acquiring a plurality of position point sets, wherein each position point set comprises m candidate position points, the types of human skeleton key points corresponding to the candidate position points in each position point set are different, and m is the number of the types of the human skeleton key points; and determining a target position point set from the plurality of position point sets according to the limb direction vector diagram set, and determining human skeleton key points in the target human image according to each candidate position point included in the target position point set.
In one embodiment of the application, the processor when executing the computer program further performs the steps of: performing maximum pooling operation on the key point probability map to obtain a pooled probability map, wherein the pooled probability map comprises a plurality of pooled probability values, and the pooled probability values correspond to a plurality of probability values included in the key point probability map in a one-to-one mode; determining a target probability value from the keypoint probability map, the target probability value being equal to the corresponding pooling probability value; and determining the position point where the target probability value is located as a candidate position point corresponding to the human skeleton key point.
In one embodiment of the application, the processor when executing the computer program further performs the steps of: for each position point set, determining a connecting line between each candidate position point included in the position point set as a candidate limb; for each candidate limb of each position point set, determining a target limb direction vector diagram corresponding to the candidate limb from the limb direction vector diagram set according to the limb type of the candidate limb, and calculating the confidence coefficient of the candidate limb according to the vector value in the target limb direction vector diagram, wherein the confidence coefficient is used for indicating the probability that the candidate position points at the two ends of the candidate limb belong to the same person; for each position point set, calculating the confidence coefficient of the position point set according to the confidence coefficient of each candidate limb of the position point set; and determining the target position point set from the plurality of position point sets according to the confidence degree of each position point set.
In one embodiment of the application, the processor when executing the computer program further performs the steps of: and superposing the confidence degrees of each candidate limb of the position point set to obtain the confidence degree of the position point set.
In one embodiment of the application, the processor when executing the computer program further performs the steps of: and determining the position point set with the highest confidence coefficient in a plurality of position point sets as the target position point set.
In one embodiment of the application, the processor when executing the computer program further performs the steps of: inputting a plurality of human skeleton key points included in the non-ideal character image, a plurality of human skeleton key points included in the ideal character image and the non-ideal character image into a space transformation network STN, and correcting the human posture of the character in the non-ideal character image through the STN.
The STN comprises a local network, a grid generator and a sampler, and in one embodiment of the application, the processor when executing the computer program further performs the steps of: obtaining grid generator parameters according to a plurality of human skeleton key points included by the non-ideal character image, a plurality of human skeleton key points included by the ideal character image and the non-ideal character image through the local network; obtaining an image coordinate mapping relation between the non-ideal character image and a target image to be output by using the grid generator parameters through the grid generator, wherein the human body posture of the character in the target image is the human body posture after the correction processing; and converting the coordinates of each pixel point in the non-ideal character image by using the image coordinate mapping relation through the sampler to obtain the target image.
The implementation principle and technical effect of the computer device provided by the embodiment of the present application are similar to those of the method embodiment described above, and are not described herein again.
In an embodiment of the application, a computer-readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of:
acquiring a non-ideal character image, wherein characters in the non-ideal character image have non-ideal human body postures; acquiring an ideal character image, wherein characters in the ideal character image have ideal body postures; correcting the human body posture of the person in the non-ideal person image according to the ideal person image; and the difference between the corrected human body posture and the ideal human body posture is smaller than a preset difference threshold value.
In one embodiment of the application, the computer program when executed by the processor further performs the steps of: respectively carrying out human skeleton key point identification on the non-ideal character image and the ideal character image to obtain a plurality of human skeleton key points included in the non-ideal character image and a plurality of human skeleton key points included in the ideal character image; and correcting the human posture of the person in the non-ideal person image according to a plurality of human skeleton key points included in the non-ideal person image and a plurality of human skeleton key points included in the ideal person image.
In one embodiment of the application, the computer program when executed by the processor further performs the steps of: inputting the target person image into a key point recognition network to obtain a key point probability map set and a limb direction vector map set output by the key point recognition network, where the target person image is the non-ideal person image or the ideal person image; the key point probability map set comprises a plurality of key point probability maps corresponding one-to-one to different kinds of human skeleton key points, each key point probability map comprising probability values, each probability value indicating the probability that the corresponding kind of human skeleton key point is located at the position point of that probability value; the limb direction vector map set comprises a plurality of limb direction vector maps corresponding one-to-one to different kinds of limbs, a limb being a human body region between associated human skeleton key points, each limb direction vector map comprising vector values, each vector value indicating the direction of the corresponding kind of limb at the position point of that vector value; and identifying the human skeleton key points of the target person image according to the key point probability map set and the limb direction vector map set.
The keypoint identification network comprises a first identification subnetwork and a concatenation of n second identification subnetworks, n being a positive integer greater than 1, the computer program further realizing the following steps when executed by a processor in one embodiment of the application: inputting the target person image into the first recognition sub-network to obtain a feature map output by the first recognition sub-network after feature extraction is carried out on the target person image; inputting the feature map into the n second recognition sub-networks, performing recognition calculation on an ith input map through the ith second recognition sub-network, and outputting an ith candidate keypoint probability map set and an ith candidate limb direction vector map set, wherein when i is 1, the ith input map is the feature map, and when i is more than 1 and less than or equal to n, the ith input map is the feature map, the ith-1 candidate keypoint probability map set and the ith-1 candidate limb direction vector map set; and taking the nth candidate key point probability map set and the nth candidate limb direction vector map set output by the nth second identification subnetwork as the key point probability map set and the limb direction vector map set output by the key point identification network respectively.
In one embodiment of the application, the computer program, when executed by the processor, further performs the steps of: for each kind of human skeleton key point, determining a plurality of candidate position points corresponding to the human skeleton key point in the key point probability map corresponding to that human skeleton key point; acquiring a plurality of position point sets, wherein each position point set comprises m candidate position points, the kinds of human skeleton key points corresponding to the candidate position points in each position point set are different from one another, and m is the number of kinds of human skeleton key points; and determining a target position point set from the plurality of position point sets according to the limb direction vector map set, and determining the human skeleton key points in the target person image according to the candidate position points included in the target position point set.
In one embodiment of the application, the computer program, when executed by the processor, further performs the steps of: performing a maximum pooling operation on the key point probability map to obtain a pooled probability map, wherein the pooled probability map comprises a plurality of pooled probability values in one-to-one correspondence with the plurality of probability values included in the key point probability map; determining a target probability value from the key point probability map, the target probability value being equal to its corresponding pooled probability value; and determining the position point where the target probability value is located as a candidate position point corresponding to the human skeleton key point.
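The pooling-based candidate selection above amounts to non-maximum suppression on a heatmap: a position is kept when its probability equals the stride-1 max-pooled value, i.e. it is a local maximum. A minimal sketch of this idea follows; the 3x3 window and the detection threshold are illustrative assumptions, not values taken from the patent:

```python
import numpy as np

def find_candidate_points(prob_map, window=3, threshold=0.1):
    """Return candidate key point positions in a key point probability map:
    positions whose probability equals the stride-1 max-pooled value
    (local maxima) and exceeds a small detection threshold."""
    pad = window // 2
    padded = np.pad(prob_map, pad, mode="constant", constant_values=-np.inf)
    h, w = prob_map.shape
    pooled = np.empty_like(prob_map)
    for y in range(h):
        for x in range(w):
            # Stride-1 max pooling: maximum over the window centred here.
            pooled[y, x] = padded[y:y + window, x:x + window].max()
    # Keep positions where the probability equals its pooled value.
    ys, xs = np.where((prob_map == pooled) & (prob_map > threshold))
    return list(zip(ys.tolist(), xs.tolist()))
```

Because a location equals the maximum of its own neighbourhood only at a peak, this selects one candidate per local mode of the probability map without any explicit peak-merging logic.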
In one embodiment of the application, the computer program, when executed by the processor, further performs the steps of: for each position point set, determining a connecting line between candidate position points included in the position point set as a candidate limb; for each candidate limb of each position point set, determining a target limb direction vector map corresponding to the candidate limb from the limb direction vector map set according to the limb kind of the candidate limb, and calculating a confidence of the candidate limb according to the vector values in the target limb direction vector map, wherein the confidence indicates the probability that the candidate position points at the two ends of the candidate limb belong to the same person; for each position point set, calculating a confidence of the position point set according to the confidences of the candidate limbs of the position point set; and determining the target position point set from the plurality of position point sets according to the confidence of each position point set.
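The limb confidence described above can be computed in the style of a part-affinity-field line integral: sample the limb direction vector map at points along the candidate limb and average the projection of each sampled vector onto the limb's unit direction. The following is a hedged sketch under that assumption; the (H, W, 2) vector map layout and the sample count are illustrative choices, not taken from the patent:

```python
import numpy as np

def limb_confidence(vec_map, p_start, p_end, n_samples=10):
    """Score a candidate limb between two candidate position points.

    vec_map: limb direction vector map of shape (H, W, 2), one 2-D
    vector per position point, in (row, col) order.
    p_start, p_end: (row, col) coordinates of the limb's endpoints.
    Returns the mean dot product between sampled vectors and the limb's
    unit direction; a higher value means the two endpoints more likely
    belong to the same person's limb.
    """
    p_start = np.asarray(p_start, dtype=float)
    p_end = np.asarray(p_end, dtype=float)
    direction = p_end - p_start
    norm = np.linalg.norm(direction)
    if norm == 0.0:
        return 0.0
    unit = direction / norm
    scores = []
    for t in np.linspace(0.0, 1.0, n_samples):
        # Sample the vector map at evenly spaced points on the segment.
        r, c = np.round(p_start + t * direction).astype(int)
        scores.append(float(np.dot(vec_map[r, c], unit)))
    return sum(scores) / n_samples
```

When the sampled vectors consistently point along the candidate limb the score approaches 1, and a wrong pairing (e.g. one person's shoulder connected to another person's elbow) crosses regions whose vectors disagree with the line, driving the score down.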
In one embodiment of the application, the computer program, when executed by the processor, further performs the step of: summing the confidences of the candidate limbs of the position point set to obtain the confidence of the position point set.
In one embodiment of the application, the computer program, when executed by the processor, further performs the step of: determining the position point set with the highest confidence among the plurality of position point sets as the target position point set.
In one embodiment of the application, the computer program, when executed by the processor, further performs the steps of: inputting the plurality of human skeleton key points included in the non-ideal person image, the plurality of human skeleton key points included in the ideal person image, and the non-ideal person image into a spatial transformer network (STN), and correcting the human body posture of the person in the non-ideal person image through the STN.
In one embodiment of the application, the STN comprises a local network, a grid generator and a sampler, and the computer program, when executed by the processor, further performs the steps of: obtaining grid generator parameters, through the local network, according to the plurality of human skeleton key points included in the non-ideal person image, the plurality of human skeleton key points included in the ideal person image, and the non-ideal person image; obtaining, through the grid generator using the grid generator parameters, an image coordinate mapping relation between the non-ideal person image and a target image to be output, wherein the human body posture of the person in the target image is the corrected human body posture; and converting, through the sampler using the image coordinate mapping relation, the coordinates of each pixel point in the non-ideal person image to obtain the target image.
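For illustration only, the grid generator / sampler pair works like the sampling half of a standard spatial transformer: the grid generator maps each output pixel coordinate through the predicted transform into the input image, and the sampler reads the input at that location. The sketch below assumes an affine coordinate mapping and nearest-neighbour sampling (real STNs typically predict the transform with the localisation network and use bilinear sampling); all names are assumptions:

```python
import numpy as np

def stn_sample(image, theta):
    """Warp a single-channel image (H, W) with affine parameters theta (2, 3).

    Grid generator: each output coordinate (x_out, y_out) is mapped to an
    input coordinate via the affine transform theta.
    Sampler: the output pixel is read from the input at that coordinate
    (nearest neighbour here; bilinear in practice).
    """
    h, w = image.shape
    out = np.zeros_like(image)
    for y_out in range(h):
        for x_out in range(w):
            x_in, y_in = theta @ np.array([x_out, y_out, 1.0])
            xi, yi = int(round(x_in)), int(round(y_in))
            # Copy the pixel only if the source lies inside the image.
            if 0 <= yi < h and 0 <= xi < w:
                out[y_out, x_out] = image[yi, xi]
    return out
```

With identity parameters the image is unchanged; a translation or rotation encoded in theta re-poses the sampled content, which is how a corrected-posture target image can be produced from the non-ideal person image.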
The implementation principle and technical effect of the computer-readable storage medium provided by this embodiment are similar to those of the above-described method embodiment, and are not described herein again.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. The non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the embodiments described above may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described, but any such combination should be considered within the scope of this specification as long as it contains no contradiction.
The embodiments described above express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the claims. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these fall within the scope of protection of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (14)

1. An image processing method, characterized in that the method comprises:
acquiring a non-ideal person image, wherein a person in the non-ideal person image has a non-ideal human body posture;
acquiring an ideal person image, wherein a person in the ideal person image has an ideal human body posture; and
correcting the human body posture of the person in the non-ideal person image according to the ideal person image, wherein a difference between the corrected human body posture and the ideal human body posture is smaller than a preset difference threshold.
2. The method according to claim 1, wherein the correction processing of the human posture of the person in the non-ideal person image based on the ideal person image comprises:
respectively performing human skeleton key point identification on the non-ideal person image and the ideal person image to obtain a plurality of human skeleton key points included in the non-ideal person image and a plurality of human skeleton key points included in the ideal person image; and
correcting the human body posture of the person in the non-ideal person image according to the plurality of human skeleton key points included in the non-ideal person image and the plurality of human skeleton key points included in the ideal person image.
3. The method of claim 2, wherein the respectively performing human skeleton key point identification on the non-ideal person image and the ideal person image comprises:
inputting a target person image into a key point recognition network to obtain a key point probability map set and a limb direction vector map set output by the key point recognition network, wherein the target person image is the non-ideal person image or the ideal person image, the key point probability map set comprises a plurality of key point probability maps in one-to-one correspondence with different kinds of human skeleton key points, each key point probability map comprises probability values, each probability value indicating the probability that the human skeleton key point of the corresponding kind is located at the position point of that probability value, the limb direction vector map set comprises a plurality of limb direction vector maps in one-to-one correspondence with different kinds of limbs, a limb being a human body region between associated human skeleton key points, and each limb direction vector map comprises vector values, each vector value indicating the direction of the limb of the corresponding kind at the position point of that vector value;
and identifying the human skeleton key points of the target person image according to the key point probability map set and the limb direction vector map set.
4. The method of claim 3, wherein the key point recognition network comprises a first recognition sub-network and a cascade of n second recognition sub-networks, n being a positive integer greater than 1, and wherein the inputting the target person image into the key point recognition network to obtain the key point probability map set and the limb direction vector map set output by the key point recognition network comprises:
inputting the target person image into the first recognition sub-network to obtain a feature map output after the first recognition sub-network performs feature extraction on the target person image;
inputting the feature map into the n second recognition sub-networks, performing recognition calculation on an ith input map through the ith second recognition sub-network, and outputting an ith candidate key point probability map set and an ith candidate limb direction vector map set, wherein when i = 1 the ith input map is the feature map, and when 1 < i ≤ n the ith input map is the feature map together with the (i-1)th candidate key point probability map set and the (i-1)th candidate limb direction vector map set;
and taking the nth candidate key point probability map set and the nth candidate limb direction vector map set output by the nth second recognition sub-network as the key point probability map set and the limb direction vector map set output by the key point recognition network, respectively.
5. The method of claim 3, wherein the identifying the human skeleton key points of the target person image according to the key point probability map set and the limb direction vector map set comprises:
for each kind of human skeleton key point, determining a plurality of candidate position points corresponding to the human skeleton key point in the key point probability map corresponding to that human skeleton key point;
acquiring a plurality of position point sets, wherein each position point set comprises m candidate position points, the kinds of human skeleton key points corresponding to the candidate position points in each position point set are different from one another, and m is the number of kinds of human skeleton key points;
and determining a target position point set from the plurality of position point sets according to the limb direction vector map set, and determining the human skeleton key points in the target person image according to the candidate position points included in the target position point set.
6. The method of claim 5, wherein the determining a plurality of candidate position points corresponding to the human skeleton key point in the key point probability map corresponding to the human skeleton key point comprises:
performing a maximum pooling operation on the key point probability map to obtain a pooled probability map, wherein the pooled probability map comprises a plurality of pooled probability values in one-to-one correspondence with the plurality of probability values included in the key point probability map;
determining a target probability value from the key point probability map, the target probability value being equal to its corresponding pooled probability value;
and determining the position point where the target probability value is located as a candidate position point corresponding to the human skeleton key point.
7. The method of claim 5, wherein the determining a target position point set from the plurality of position point sets according to the limb direction vector map set comprises:
for each position point set, determining a connecting line between candidate position points included in the position point set as a candidate limb;
for each candidate limb of each position point set, determining a target limb direction vector map corresponding to the candidate limb from the limb direction vector map set according to the limb kind of the candidate limb, and calculating a confidence of the candidate limb according to the vector values in the target limb direction vector map, wherein the confidence indicates the probability that the candidate position points at the two ends of the candidate limb belong to the same person;
for each position point set, calculating a confidence of the position point set according to the confidences of the candidate limbs of the position point set;
and determining the target position point set from the plurality of position point sets according to the confidence of each position point set.
8. The method of claim 7, wherein the calculating a confidence of the position point set according to the confidences of the candidate limbs of the position point set comprises:
summing the confidences of the candidate limbs of the position point set to obtain the confidence of the position point set.
9. The method of claim 8, wherein the determining the target position point set from the plurality of position point sets according to the confidence of each position point set comprises:
determining the position point set with the highest confidence among the plurality of position point sets as the target position point set.
10. The method according to any one of claims 2 to 9, wherein the correcting the human body posture of the person in the non-ideal person image according to the plurality of human skeleton key points included in the non-ideal person image and the plurality of human skeleton key points included in the ideal person image comprises:
inputting the plurality of human skeleton key points included in the non-ideal person image, the plurality of human skeleton key points included in the ideal person image, and the non-ideal person image into a spatial transformer network (STN), and correcting the human body posture of the person in the non-ideal person image through the STN.
11. The method of claim 10, wherein the STN comprises a local network, a grid generator and a sampler, and wherein the correcting the human body posture of the person in the non-ideal person image through the STN comprises:
obtaining grid generator parameters, through the local network, according to the plurality of human skeleton key points included in the non-ideal person image, the plurality of human skeleton key points included in the ideal person image, and the non-ideal person image;
obtaining, through the grid generator using the grid generator parameters, an image coordinate mapping relation between the non-ideal person image and a target image to be output, wherein the human body posture of the person in the target image is the corrected human body posture;
and converting, through the sampler using the image coordinate mapping relation, the coordinates of each pixel point in the non-ideal person image to obtain the target image.
12. An image processing apparatus, characterized in that the apparatus comprises:
a first acquisition module, configured to acquire a non-ideal person image, wherein a person in the non-ideal person image has a non-ideal human body posture;
a second acquisition module, configured to acquire an ideal person image, wherein a person in the ideal person image has an ideal human body posture; and
a correction module, configured to correct the human body posture of the person in the non-ideal person image according to the ideal person image, wherein a difference between the corrected human body posture and the ideal human body posture is smaller than a preset difference threshold.
13. A computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, implements the image processing method of any one of claims 1 to 11.
14. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the image processing method according to any one of claims 1 to 11.
CN202010147639.8A 2020-03-05 2020-03-05 Image processing method, device, equipment and storage medium Active CN111246113B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010147639.8A CN111246113B (en) 2020-03-05 2020-03-05 Image processing method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010147639.8A CN111246113B (en) 2020-03-05 2020-03-05 Image processing method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111246113A true CN111246113A (en) 2020-06-05
CN111246113B CN111246113B (en) 2022-03-18

Family

ID=70864338

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010147639.8A Active CN111246113B (en) 2020-03-05 2020-03-05 Image processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111246113B (en)


Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW201339986A (en) * 2012-03-30 2013-10-01 Altek Corp Method and device for capturing image
US20160065843A1 (en) * 2014-09-02 2016-03-03 Alibaba Group Holding Limited Method and apparatus for creating photo-taking template database and for providing photo-taking recommendation information
CN105701763A (en) * 2015-12-30 2016-06-22 青岛海信移动通信技术股份有限公司 Method and device for adjusting face image
CN106909892A (en) * 2017-01-24 2017-06-30 珠海市魅族科技有限公司 A kind of image processing method and system
CN107358241A (en) * 2017-06-30 2017-11-17 广东欧珀移动通信有限公司 Image processing method, device, storage medium and electronic equipment
CN108133207A (en) * 2017-11-24 2018-06-08 阿里巴巴集团控股有限公司 The image of auxiliary items closes the method, apparatus and electronic equipment of rule
CN108510435A (en) * 2018-03-28 2018-09-07 北京市商汤科技开发有限公司 Image processing method and device, electronic equipment and storage medium
CN108712603A (en) * 2018-04-27 2018-10-26 维沃移动通信有限公司 A kind of image processing method and mobile terminal
CN108710868A (en) * 2018-06-05 2018-10-26 中国石油大学(华东) A kind of human body critical point detection system and method based under complex scene
CN108985132A (en) * 2017-05-31 2018-12-11 腾讯科技(深圳)有限公司 A kind of face image processing process, calculates equipment and storage medium at device
CN108986023A (en) * 2018-08-03 2018-12-11 北京字节跳动网络技术有限公司 Method and apparatus for handling image
CN109145927A (en) * 2017-06-16 2019-01-04 杭州海康威视数字技术股份有限公司 The target identification method and device of a kind of pair of strain image
CN109472795A (en) * 2018-10-29 2019-03-15 三星电子(中国)研发中心 A kind of image edit method and device
CN110045823A (en) * 2019-03-12 2019-07-23 北京邮电大学 A kind of action director's method and apparatus based on motion capture
US20190294871A1 (en) * 2018-03-23 2019-09-26 Microsoft Technology Licensing, Llc Human action data set generation in a machine learning system
WO2019216593A1 (en) * 2018-05-11 2019-11-14 Samsung Electronics Co., Ltd. Method and apparatus for pose processing
CN110570383A (en) * 2019-09-25 2019-12-13 北京字节跳动网络技术有限公司 image processing method and device, electronic equipment and storage medium


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LIAO YUFAN: "Human key point detection and application based on deep learning", Master's thesis, Zhejiang University *
XU ZHENG: "Human skeleton point detection based on deep learning", Master's thesis, University of Jinan *
WEI LU: "Research on face replacement technology based on 3D morphable models", Master's thesis, Beijing University of Posts and Telecommunications *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111800574A (en) * 2020-06-23 2020-10-20 维沃移动通信有限公司 Imaging method and device and electronic equipment
CN111800574B (en) * 2020-06-23 2022-06-24 维沃移动通信有限公司 Imaging method and device and electronic equipment

Also Published As

Publication number Publication date
CN111246113B (en) 2022-03-18

Similar Documents

Publication Publication Date Title
CN109448090B (en) Image processing method, device, electronic equipment and storage medium
CN110517278B (en) Image segmentation and training method and device of image segmentation network and computer equipment
CN111598993B (en) Three-dimensional data reconstruction method and device based on multi-view imaging technology
CN111476097A (en) Human body posture assessment method and device, computer equipment and storage medium
WO2021237875A1 (en) Hand data recognition method and system based on graph convolutional network, and storage medium
CN111951167B (en) Super-resolution image reconstruction method, super-resolution image reconstruction device, computer equipment and storage medium
CN112911393B (en) Method, device, terminal and storage medium for identifying part
CN110287836B (en) Image classification method and device, computer equipment and storage medium
CN111383232B (en) Matting method, matting device, terminal equipment and computer readable storage medium
CN110751039A (en) Multi-view 3D human body posture estimation method and related device
CN110302524A (en) Limbs training method, device, equipment and storage medium
CN110992243B (en) Intervertebral disc cross-section image construction method, device, computer equipment and storage medium
CN112163479A (en) Motion detection method, motion detection device, computer equipment and computer-readable storage medium
CN110991293A (en) Gesture recognition method and device, computer equipment and storage medium
CN110570373A (en) Distortion correction method and apparatus, computer-readable storage medium, and electronic apparatus
CN111246113B (en) Image processing method, device, equipment and storage medium
CN115984447A (en) Image rendering method, device, equipment and medium
CN110910512A (en) Virtual object self-adaptive adjusting method and device, computer equipment and storage medium
CN116766213B (en) Bionic hand control method, system and equipment based on image processing
CN113658035A (en) Face transformation method, device, equipment, storage medium and product
CN111062362B (en) Face living body detection model, method, device, equipment and storage medium
CN112464860A (en) Gesture recognition method and device, computer equipment and storage medium
CN112417985A (en) Face feature point tracking method, system, electronic equipment and storage medium
CN113421182B (en) Three-dimensional reconstruction method, three-dimensional reconstruction device, electronic equipment and storage medium
CN113887319A (en) Three-dimensional attitude determination method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant