CN113920282B - Image processing method and device, computer readable storage medium, and electronic device


Info

Publication number
CN113920282B
Authority
CN
China
Prior art keywords: processed, face, image, dimensional, dimensional model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111349936.1A
Other languages
Chinese (zh)
Other versions
CN113920282A (en)
Inventor
晋博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Boguan Information Technology Co Ltd
Original Assignee
Guangzhou Boguan Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Boguan Information Technology Co Ltd filed Critical Guangzhou Boguan Information Technology Co Ltd
Priority to CN202111349936.1A
Publication of CN113920282A
Application granted
Publication of CN113920282B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T19/20 Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20 Finite element generation, e.g. wire-frame surface description, tessellation
    • G06T17/205 Re-meshing
    • G06T2219/00 Indexing scheme for manipulating 3D models or images for computer graphics
    • G06T2219/20 Indexing scheme for editing of 3D models
    • G06T2219/2024 Style variation

Abstract

The present disclosure relates to an image processing method and apparatus, a computer-readable storage medium, and an electronic device, in the field of computer technology. The method includes: acquiring detection key points of a face image to be processed; acquiring a preset three-dimensional model, and obtaining, through the three-dimensional model and the detection key points, a world space transformation matrix corresponding to the face image to be processed; transforming the three-dimensional model into screen space according to the world space transformation matrix, and obtaining a minimum bounding cube of the three-dimensional model based on coordinates in the screen space; and acquiring a preset face mesh of the face image to be processed, and mapping the face mesh into the minimum bounding cube to obtain a three-dimensional deformation special effect of the face image to be processed. The method and device reduce the complexity of face deformation in images and improve the stereoscopic quality of the deformation.

Description

Image processing method and device, computer readable storage medium, and electronic device
Technical Field
The embodiments of the present disclosure relate to the field of computer technology, and in particular to an image processing method, an image processing apparatus, a computer-readable storage medium, and an electronic device.
Background
In entertainment live streaming, various face deformation special effects are added to the live room to increase interaction between the streamer and the audience and to liven up the atmosphere.
At present, face deformation special effects are based on two-dimensional space: a standard model image is prepared, the face region of the standard model image is stretched, the stretch offsets are recorded, and the offsets are applied to the target face to be deformed to realize the face deformation special effect. The stretching is usually performed with techniques such as moving least squares deformation, liquify deformation, differential deformation, and offset-map deformation, of which moving least squares deformation and liquify deformation are the most commonly used.
However, moving least squares deformation controls the image by dragging control points, so the selection of control points is critical, the influence range of changing a single control point is generally very large, and high-precision moving least squares deformation is computationally complex and unsuitable for real-time deformation. Liquify deformation only requires controlling the circle radius and the deformation strength; it is suitable for controlling local deformation effects such as the eyes and mouth shape, but not for adjusting the overall face shape. In practical application scenarios, image deformation is therefore controlled jointly by moving least squares deformation and liquify deformation, which increases the complexity of deforming the target face.
Therefore, it is desirable to provide a new image processing method.
It should be noted that the information disclosed in the Background section above is only intended to enhance understanding of the background of the present disclosure, and therefore may include information that does not constitute prior art already known to a person of ordinary skill in the art.
Disclosure of Invention
An object of the present disclosure is to provide an image processing method, an image processing apparatus, a computer-readable storage medium, and an electronic device, thereby overcoming, at least to some extent, the problem of the high complexity of face image deformation caused by the limitations and drawbacks of the related art.
According to an aspect of the present disclosure, there is provided an image processing method including:
acquiring a detection key point of a face image to be processed;
acquiring a preset three-dimensional model, and obtaining, through the three-dimensional model and the detection key points, a world space transformation matrix corresponding to the face image to be processed;
transforming the three-dimensional model into screen space according to the world space transformation matrix, and obtaining a minimum bounding cube of the three-dimensional model based on coordinates in the screen space;
and acquiring a preset face mesh of the face image to be processed, and mapping the face mesh into the minimum bounding cube to obtain a three-dimensional deformation special effect of the face image to be processed.
In an exemplary embodiment of the present disclosure, processing the to-be-processed face image to obtain a detection key point of the to-be-processed face image includes:
and inputting the face image to be processed into a multitask convolution neural network, and obtaining a detection key point of the face image to be processed through the multitask convolution neural network.
In an exemplary embodiment of the present disclosure, obtaining the detection key points of the face image to be processed through the multitask convolutional neural network model includes:
obtaining a key area of the face image to be processed through a face detection model of the multitask convolution neural network;
and inputting the key area into a key point detection model of the multitask convolutional neural network to obtain a detection key point of the face image to be processed.
In an exemplary embodiment of the present disclosure, obtaining, through the three-dimensional model and the detection key point, a world space transformation matrix corresponding to the facial image to be processed includes:
acquiring world coordinates of the three-dimensional model and image coordinates of the detection key points;
obtaining camera coordinates of the three-dimensional model according to camera parameters and the image coordinates of the detection key points;
and obtaining a rotation matrix and a translation matrix according to the camera coordinates and the world coordinates, and combining the rotation matrix and the translation matrix into a world space transformation matrix corresponding to the face image to be processed.
In an exemplary embodiment of the present disclosure, after obtaining a world space transformation matrix corresponding to the facial image to be processed, the image processing method further includes:
acquiring the rotation matrix;
and calculating according to the rotation matrix to obtain the attitude angle of the face image to be processed.
In an exemplary embodiment of the present disclosure, transforming the three-dimensional model into a screen space according to the world space transformation matrix, and obtaining a minimum bounding cube of the three-dimensional model based on coordinates in the screen space includes:
acquiring coordinates and depth values of three-dimensional image pixels included in the screen space;
and obtaining a minimum bounding cube of the three-dimensional model according to the maximum and minimum of the depth values of the three-dimensional image pixels and the maxima and minima of the coordinates of the three-dimensional image pixels in the horizontal-axis and vertical-axis directions.
In an exemplary embodiment of the present disclosure, mapping the face mesh into the minimum bounding cube to obtain a three-dimensional deformation special effect of the face image to be processed includes:
acquiring the number of cells included in the face mesh of the face image to be processed;
interpolating the minimum bounding cube of the three-dimensional model by cubic spline interpolation, and dividing the minimum bounding cube into a three-dimensional mesh with the same number of cells as the face mesh;
and mapping the face mesh onto the three-dimensional mesh, and inputting the vertices of the three-dimensional mesh into a shader to obtain the three-dimensional deformation special effect of the face image to be processed.
In an exemplary embodiment of the present disclosure, after obtaining the three-dimensional deformation special effect of the facial image to be processed, the image processing method further includes:
and adjusting the angle of the face in the three-dimensional deformation special effect according to the attitude angle.
According to an aspect of the present disclosure, there is provided an image processing apparatus including:
the detection key point generating module is used for acquiring the detection key points of the face image to be processed;
the world space transformation matrix generation module is used for acquiring a preset three-dimensional model and obtaining, through the three-dimensional model and the detection key points, a world space transformation matrix corresponding to the face image to be processed;
the minimum bounding cube determination module is used for transforming the three-dimensional model into screen space according to the world space transformation matrix and obtaining a minimum bounding cube of the three-dimensional model based on coordinates in the screen space;
and the deformation special effect generation module is used for acquiring a preset face mesh of the face image to be processed and mapping the face mesh into the minimum bounding cube to obtain a three-dimensional deformation special effect of the face image to be processed.
According to an aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the image processing method of any of the above-described exemplary embodiments.
According to an aspect of the present disclosure, there is provided an electronic device including:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the image processing method of any of the above exemplary embodiments via execution of the executable instructions.
The image processing method provided by the embodiments of the present disclosure acquires detection key points of a face image to be processed; acquires a preset three-dimensional model and obtains, through the three-dimensional model and the detection key points, a world space transformation matrix corresponding to the face image to be processed; transforms the three-dimensional model into screen space according to the world space transformation matrix and obtains a minimum bounding cube of the three-dimensional model based on coordinates in the screen space; and acquires a preset face mesh of the face image to be processed and maps the face mesh into the minimum bounding cube to obtain a three-dimensional deformation special effect of the face image to be processed. On the one hand, when the face image is deformed, a preset three-dimensional model is acquired, the world space transformation matrix of the face image to be processed is solved from the three-dimensional model and the face image, the three-dimensional model is projected into screen space according to the world space transformation matrix to obtain the minimum bounding cube of the three-dimensional model, and the face mesh of the face image to be processed is mapped into the minimum bounding cube to obtain the three-dimensional deformation effect; this avoids the prior-art practice of controlling the deformation with the two algorithms of moving least squares deformation and liquify deformation together, and thus reduces the complexity of image deformation. On the other hand, mapping the face mesh of the face image to be processed into the minimum bounding cube of the three-dimensional model realizes the three-dimensional deformation special effect of the face image to be processed and increases the stereoscopic impression of the deformation.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
Fig. 1 schematically illustrates a scene diagram of image deformation in a related art according to an example embodiment of the present disclosure.
Fig. 2 schematically illustrates a flow chart of an image processing method according to an example embodiment of the present disclosure.
FIG. 3 schematically shows a block diagram of an image processing system according to an example embodiment of the present disclosure.
Fig. 4 schematically illustrates a flowchart of a method for obtaining detection key points of a face image to be processed through a multitask convolutional neural network according to an exemplary embodiment of the present disclosure.
Fig. 5 schematically illustrates a scene diagram of obtaining detection key points of a face image to be processed through a multitask convolutional neural network according to an exemplary embodiment of the present disclosure.
FIG. 6 schematically shows a schematic diagram of a standard three-dimensional model according to an example embodiment of the present disclosure.
Fig. 7 schematically illustrates a flowchart of a method for deriving a world space transformation matrix from a three-dimensional model and detected keypoints, according to an example embodiment of the present disclosure.
Fig. 8 schematically illustrates a flow chart of a method for transforming a three-dimensional model to a screen space to obtain a minimum bounding cube of the three-dimensional model based on coordinates in the screen space, according to an exemplary embodiment of the present disclosure.
FIG. 9 schematically illustrates a scene schematic of a generated minimum bounding cube, according to an example embodiment of the present disclosure.
FIG. 10 schematically illustrates a schematic diagram of a generated face mesh for a facial image to be processed according to an example embodiment of the present disclosure.
FIG. 11 schematically illustrates a flowchart of a method for mapping a face mesh into a minimum bounding cube to obtain a three-dimensional deformation special effect of the face image to be processed according to an example embodiment of the present disclosure.
Fig. 12 schematically illustrates a schematic diagram of generating a three-dimensional deformation special effect of a face image to be processed according to an example embodiment of the present disclosure.
Fig. 13 schematically illustrates a block diagram of an image processing apparatus according to an example embodiment of the present disclosure.
Fig. 14 schematically illustrates an electronic device for implementing the above-described image processing method according to an exemplary embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the invention.
Furthermore, the drawings are merely schematic illustrations of the invention and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
In the related art, there are many image deformation algorithms, which may include moving least squares deformation, liquify deformation, differential deformation, offset-map deformation, and the like; of these, moving least squares deformation and liquify deformation are described below for comparison.
In moving least squares deformation, the movement of each pixel in the image to be processed is calculated from the change of the control points, specifically as follows: first, control points are set in the image to be processed and manually dragged to obtain their new positions; then, every pixel in the image to be processed is traversed, the distance from each pixel to the new positions of all the control points is calculated, and weights are computed from these distances; finally, the displacement of each pixel is calculated from the weights and the position changes of the control points. Referring to fig. 1, from left to right are the image to be processed, the control points set in the image, the dragged control points, and the resulting deformed image. This method is suitable for overall deformation of a two-dimensional image, but high-precision moving least squares deformation requires a weighted calculation over every pixel in the image to be processed, so its complexity is high and it is not suitable for real-time deformation.
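To make the per-pixel cost concrete, the following Python sketch implements only the inverse-distance weighting step described above, in which each pixel blends the displacements of all control points; full moving least squares additionally fits a local affine or rigid transform per pixel, which this sketch omits. The function name and the weight exponent `alpha` are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def weighted_displacement(pixels, ctrl_src, ctrl_dst, alpha=1.0, eps=1e-8):
    """Blend control-point displacements into every pixel with
    inverse-distance weights w_i = 1 / |p_i - v|^(2 * alpha)."""
    disp = ctrl_dst - ctrl_src                 # how far each control point moved
    out = np.empty_like(pixels, dtype=np.float64)
    for k, v in enumerate(pixels.astype(np.float64)):
        d2 = np.sum((ctrl_src - v) ** 2, axis=1) + eps   # squared distances
        w = 1.0 / d2 ** alpha                            # nearer points dominate
        out[k] = v + (w[:, None] * disp).sum(axis=0) / w.sum()
    return out  # O(n_pixels * n_control_points) per frame
```

The loop over every pixel against every control point is exactly the weighted calculation that makes high-precision moving least squares costly for real-time use.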
Liquify deformation generally refers to deformation controlled by a circle (an ellipse can also be used) and is usually divided into center liquify and forward liquify. Center liquify controls the deformation with a function that diverges from or converges toward the circle center, where the deformation at the center and on the circle is 0; forward liquify controls the deformation with a function that pushes from the circle center in a given direction, where the deformation is largest at the center and 0 on the circle. Liquify deformation is suitable for controlling local deformation of the eyes, mouth shape, and the like of a person in the image to be processed, and is not suitable for overall deformation.
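A minimal sketch of the forward liquify just described, assuming a quadratic falloff profile; the description only constrains the displacement to be maximal at the center and zero on the circle, so the exact profile and the function name are assumptions:

```python
import numpy as np

def forward_liquify(pixels, center, radius, direction, strength):
    """Push pixels inside the circle along `direction`, with displacement
    maximal at the center and falling to zero on the circle boundary."""
    pixels = pixels.astype(np.float64)
    d = np.linalg.norm(pixels - center, axis=1)          # distance to center
    falloff = np.clip(1.0 - d / radius, 0.0, 1.0) ** 2   # 1 at center, 0 on circle
    return pixels + strength * falloff[:, None] * np.asarray(direction, np.float64)
```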
In view of one or more of the above problems, the present exemplary embodiment first provides an image processing method, which may be executed on a device terminal; the device terminal may include a desktop computer, a portable computer, a smart phone, a tablet computer, and so on. Of course, those skilled in the art may also run the method of the present invention on other platforms as needed, and this is not particularly limited in this exemplary embodiment. Referring to fig. 2, the image processing method may include the following steps:
Step S210: acquiring detection key points of a face image to be processed;
Step S220: acquiring a preset three-dimensional model, and obtaining, through the three-dimensional model and the detection key points, a world space transformation matrix corresponding to the face image to be processed;
Step S230: transforming the three-dimensional model into screen space according to the world space transformation matrix, and obtaining a minimum bounding cube of the three-dimensional model based on coordinates in the screen space;
Step S240: acquiring a preset face mesh of the face image to be processed, and mapping the face mesh into the minimum bounding cube to obtain a three-dimensional deformation special effect of the face image to be processed.
The image processing method obtains detection key points of the face image to be processed; acquires a preset three-dimensional model and obtains, through the three-dimensional model and the detection key points, a world space transformation matrix corresponding to the face image to be processed; transforms the three-dimensional model into screen space according to the world space transformation matrix and obtains a minimum bounding cube of the three-dimensional model based on coordinates in the screen space; and acquires a preset face mesh of the face image to be processed and maps the face mesh into the minimum bounding cube to obtain a three-dimensional deformation special effect of the face image to be processed. On the one hand, when the face image is deformed, a preset three-dimensional model is acquired, the world space transformation matrix of the face image to be processed is solved from the three-dimensional model and the face image, the three-dimensional model is projected into screen space according to the world space transformation matrix to obtain the minimum bounding cube of the three-dimensional model, and the face mesh of the face image to be processed is mapped into the minimum bounding cube to obtain the three-dimensional deformation effect; this avoids the prior-art practice of controlling the deformation with the two algorithms of moving least squares deformation and liquify deformation together, and thus reduces the complexity of image deformation. On the other hand, mapping the face mesh of the face image to be processed into the minimum bounding cube of the three-dimensional model realizes the three-dimensional deformation special effect of the face image to be processed and increases the stereoscopic impression of the deformation.
Hereinafter, each step involved in the image processing method of the exemplary embodiment of the present disclosure is explained and explained in detail.
First, the application scenario and object of the exemplary embodiments of the present disclosure are explained. Specifically, the exemplary embodiments of the present disclosure may be used to generate a three-dimensional deformation special effect for an image to be processed, and mainly study how to improve the efficiency of generating the deformation effect and how to improve its stereoscopic impression. In the present disclosure, on the basis of a face image to be processed acquired from a video or live stream, the acquired face image is first processed to obtain its detection key points; then a PnP solution is performed with a preset three-dimensional model and the detection key points to obtain the world space transformation matrix and attitude angle of the face image to be processed; the minimum bounding cube of the preset three-dimensional model is then obtained through the world space transformation matrix, and the face image to be processed is mapped into the minimum bounding cube to obtain its three-dimensional deformation effect, which reduces the complexity of face image deformation and increases the stereoscopic effect of the deformation.
Next, the image processing system involved in the exemplary embodiments of the present disclosure is explained. Referring to fig. 3, the image processing system may include a face detection module 310, a PnP solution module 320, a three-dimensional model projection module 330, and a face special effect generation module 340. The face detection module 310 is configured to acquire a face image to be processed from a video or live stream and detect it through an MTCNN (Multi-Task Convolutional Neural Network) to obtain the detection key points of the face image to be processed. The PnP solution module 320 is connected to the face detection module 310 over a network and is configured to acquire a preset three-dimensional model, obtain its world coordinates, and, based on the world coordinates, perform a PnP solution over the three-dimensional model and the detection key points of the face image to be processed to obtain the world space transformation matrix and attitude angle of the face image to be processed. The three-dimensional model projection module 330 is connected to the PnP solution module 320 over a network and is configured to project the three-dimensional model into screen space according to the world space transformation matrix and obtain the coordinate values and depth value of each pixel of the projected model in screen space, so as to obtain the minimum bounding cube of the three-dimensional model. The face special effect generation module 340 is connected to the three-dimensional model projection module 330 over a network and is configured to acquire the detection key points of the face image to be processed and generate the corresponding face mesh from the coordinates of the detection key points by interpolation; it obtains the number of cells in the face mesh, divides the minimum bounding cube into a three-dimensional mesh with an equal number of cells, maps the face image to be processed into the three-dimensional mesh, and uses the vertices of the three-dimensional mesh as the input of a shader to obtain the three-dimensional deformation effect of the face image to be processed.
Hereinafter, steps S210 to S240 will be explained and explained in detail with reference to fig. 3.
In step S210, detection key points of the face image to be processed are acquired.
The face image to be processed may be a two-dimensional image containing a face, or a live stream or video containing a face, where the face may be that of a human or an animal; the face image to be processed is not specifically limited in this exemplary embodiment. The face image to be processed may be detected through an MTCNN, which may include a face detection model and a key point detection model; the face detection model may be LFFD (A Light and Fast Face Detector for Edge Devices) or another face detection algorithm, and is not specifically limited in this exemplary embodiment.
In this exemplary embodiment, processing the to-be-processed face image to obtain the detection key point of the to-be-processed face image may include:
and inputting the facial image to be processed into a multitask convolution neural network, and obtaining the detection key points of the facial image to be processed through the multitask convolution neural network.
Specifically, the acquired face image to be processed may be input into the multitask convolutional neural network, and the detection key point of the face image to be processed is obtained through the multitask convolutional neural network. Referring to fig. 4, obtaining the detection key points of the facial image to be processed through the multitask convolutional neural network model may include steps S410 and S420:
s410, obtaining a key area of the face image to be processed through a face detection model of the multitask convolutional neural network;
and S420, inputting the key area into a key point detection model of the multitask convolutional neural network to obtain a detection key point of the face image to be processed.
Hereinafter, steps S410 and S420 will be explained. Specifically, the face image to be processed is first input into the face detection model of the multitask convolutional neural network to obtain its key area, where the key area may be a region of interest obtained by collecting statistics on users' visual focus data; then, the key area of the face image to be processed is input into the key point detection model to obtain the detection key points of the face image to be processed. The number of detection key points obtained for the face image to be processed through the multitask convolutional neural network is 106. The generated detection key points of the image to be processed are shown in fig. 5.
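The two-stage pipeline above might be wired up as in the following sketch. The `FaceDetector` and `LandmarkDetector` wrappers and their methods are hypothetical stand-ins for the face detection model and the 106-point key point model; actual model loading and inference APIs will differ.

```python
import numpy as np
# Hypothetical wrappers for the two detection stages; not a real library API.
from face_models import FaceDetector, LandmarkDetector

def detect_keypoints(image_bgr: np.ndarray) -> np.ndarray:
    detector = FaceDetector()        # stage 1: locate the key area (face box)
    landmarker = LandmarkDetector()  # stage 2: 106 detection key points
    x, y, w, h = detector.detect(image_bgr)[0]      # first detected face box
    crop = image_bgr[y:y + h, x:x + w]              # the "key area"
    pts = landmarker.predict(crop)                  # (106, 2), crop coordinates
    return pts + np.array([x, y], dtype=pts.dtype)  # back to full-image coordinates
```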
In step S220, a preset three-dimensional model is obtained, and a world space transformation matrix corresponding to the to-be-processed face image is obtained through the three-dimensional model and the detection key point.
The preset three-dimensional model is a standard three-dimensional model, which may be as shown in fig. 6. The world space transformation matrix of the face image to be processed may be obtained by PnP (Perspective-n-Point, a method for solving the camera pose from correspondences between three-dimensional points and two-dimensional points). There are many ways to solve the PnP problem, such as direct linear transformation, EPnP, or SDP (semidefinite programming); the way the PnP problem is solved is not particularly limited in this exemplary embodiment.
In this exemplary embodiment, referring to fig. 7, obtaining a world space transformation matrix corresponding to the to-be-processed face image through the three-dimensional model and the detection key point may include steps S710 to S730:
step S710, acquiring world coordinates of the three-dimensional model and image coordinates of the detection key points;
s720, obtaining camera coordinates of the three-dimensional model according to camera parameters and the image coordinates of the detection key points;
and step S730, obtaining a rotation matrix and a translation matrix according to the camera coordinates and the world coordinates, and combining the rotation matrix and the translation matrix into the world space transformation matrix corresponding to the face image to be processed.
Hereinafter, steps S710 to S730 will be explained. Specifically, when the PnP problem is solved by direct linear transformation, the preset world coordinates $(U, V, W)$ of the three-dimensional model and the image coordinates $(x, y)$ of the detection key points in the face image to be processed are obtained first. Then the camera parameters $f_x, f_y, c_x, c_y$ are acquired, where $f_x, f_y$ are the focal lengths and $c_x, c_y$ are the principal point coordinates relative to the imaging plane, and the camera coordinates $(X, Y, Z)$ of the three-dimensional model are obtained from the camera parameters and the image coordinates according to equation (1):

$$s \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} \tag{1}$$

where $s$ is a scaling constant. After the camera coordinates $(X, Y, Z)$ are obtained through equation (1), the world space transformation matrix of the face image to be processed can be obtained through equation (2):

$$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = R \begin{bmatrix} U \\ V \\ W \end{bmatrix} + t \tag{2}$$

In equation (2), $R$ is the rotation matrix and $t$ is the translation matrix; equation (2) can be combined with equation (1) into equation (3) and solved:

$$s \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} [R \mid t] \begin{bmatrix} U \\ V \\ W \\ 1 \end{bmatrix} \tag{3}$$

where $[R \mid t]$ is the world space transformation matrix. In the present exemplary embodiment, the world space transformation matrix may be obtained by combining the rotation matrix and translation matrix obtained from equation (2), or directly by solving equation (3).
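A minimal sketch of this solve using OpenCV's PnP routines; here `cv2.solvePnP` with its default iterative solver stands in for the direct linear transformation described above, and lens distortion is assumed to be zero:

```python
import cv2
import numpy as np

def solve_world_transform(model_pts, image_pts, fx, fy, cx, cy):
    """Recover the world space transformation matrix [R | t] from the 3D
    model key points (U, V, W) and their 2D detections (x, y), per (1)-(3)."""
    K = np.array([[fx, 0., cx],
                  [0., fy, cy],
                  [0., 0., 1.]])                     # camera intrinsics
    ok, rvec, tvec = cv2.solvePnP(
        model_pts.astype(np.float64),                # (N, 3) world coordinates
        image_pts.astype(np.float64),                # (N, 2) image coordinates
        K, None)                                     # None: no distortion assumed
    R, _ = cv2.Rodrigues(rvec)                       # rotation vector -> 3x3 R
    return np.hstack([R, tvec]), R, tvec             # 3x4 [R | t], plus R and t
```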
After obtaining the world space transformation matrix, the image processing method may further include:
acquiring the rotation matrix;
and calculating according to the rotation matrix to obtain the attitude angle of the face image to be processed.
Specifically, in a video containing a face, the face undergoes various pose transformations, producing attitude angles that include a pitch angle (rotation about the X axis), a yaw angle (rotation about the Y axis), and a roll angle (rotation about the Z axis). Once the rotation matrix

$$R = \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{bmatrix}$$

is obtained, the attitude angles of the face image to be processed can be generated from it, where the pitch angle $\theta_x$ is calculated as in equation (4), the yaw angle $\theta_y$ as in equation (5), and the roll angle $\theta_z$ as in equation (6):

$$\theta_x = \operatorname{atan2}(r_{32}, r_{33}) \tag{4}$$

$$\theta_y = \operatorname{atan2}\left(-r_{31}, \sqrt{r_{32}^2 + r_{33}^2}\right) \tag{5}$$

$$\theta_z = \operatorname{atan2}(r_{21}, r_{11}) \tag{6}$$

where atan2 computes the angle of the face image from its two arguments, taking the correct quadrant into account.
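Equations (4) to (6) translate directly into code; a minimal sketch:

```python
import math

def rotation_to_attitude(R):
    """Pitch, yaw, and roll angles (radians) from a 3x3 rotation matrix,
    following equations (4)-(6); r_ij maps to R[i-1][j-1]."""
    pitch = math.atan2(R[2][1], R[2][2])                      # eq. (4)
    yaw = math.atan2(-R[2][0], math.hypot(R[2][1], R[2][2]))  # eq. (5)
    roll = math.atan2(R[1][0], R[0][0])                       # eq. (6)
    return pitch, yaw, roll
```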
In step S230, the three-dimensional model is transformed to a screen space according to the world space transformation matrix, and a minimum bounding cube of the three-dimensional model is obtained based on coordinates in the screen space.
The space where the three-dimensional model is located is a three-dimensional space, and the screen space is a two-dimensional space. After transforming the three-dimensional model to screen space, each pixel of the three-dimensional image in screen space also includes a depth value.
In this exemplary embodiment, referring to fig. 8, transforming the three-dimensional model into screen space according to the world space transformation matrix, and obtaining a minimum bounding cube of the three-dimensional model based on coordinates in the screen space, may include steps S810 and S820:
Step S810: obtaining coordinates and depth values of the three-dimensional image pixels included in the screen space;
Step S820: obtaining the minimum bounding cube of the three-dimensional model according to the maximum and minimum of the depth values of the three-dimensional image pixels and the maxima and minima of the coordinates of the three-dimensional image pixels in the horizontal-axis and vertical-axis directions.
Hereinafter, steps S810 and S820 will be explained. Specifically, after the world space transformation matrix is obtained, the three-dimensional model is transformed into clip space through it, where the view frustum determines the clipping of the rendering primitives of the image; the view frustum is a region of space that determines what is visible to the camera. After clipping, the actual projection is performed: the contents of the view frustum are projected into screen space to obtain the three-dimensional image pixels included in the screen space. After the three-dimensional model is projected into screen space, the coordinate values and depth values of the three-dimensional image pixels may be acquired; the minimum bounding cube of the three-dimensional model is then determined from them, spanned by the maximum and minimum of the pixel coordinate values in the horizontal (x) axis direction, the maximum and minimum in the vertical (y) axis direction, and the maximum and minimum of the pixel depth values. The generated minimum bounding cube may be as shown in fig. 9.
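A minimal sketch of this step, assuming the projected pixels arrive as an (N, 2) array of screen coordinates plus an (N,) array of depths:

```python
import numpy as np

def min_bounding_cube(screen_xy, depth):
    """Axis-aligned extents of the projected model: the minimum bounding
    cube is spanned by the x, y, and depth minima and maxima."""
    (x_min, y_min), (x_max, y_max) = screen_xy.min(0), screen_xy.max(0)
    return (x_min, x_max), (y_min, y_max), (float(depth.min()), float(depth.max()))
```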
In step S240, a preset face mesh of the face image to be processed is acquired, and the face mesh is mapped into the minimum bounding cube to obtain a three-dimensional deformation special effect of the face image to be processed.
The preset face mesh of the face image to be processed is a pre-designed face mesh: a coordinate system may be established based on the face image to be processed, the coordinates of the detection key points determined, and the face mesh generated from those coordinates by interpolation. As shown in fig. 10, the face region in the face image to be processed may be divided into 36 triangular cells; the number of triangular cells into which the face region is divided is not specifically limited in this exemplary embodiment.
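The patent does not fix the meshing scheme; the sketch below uses a Delaunay triangulation of the detection key points as one plausible way to build such a face mesh.

```python
import numpy as np
from scipy.spatial import Delaunay

def build_face_mesh(keypoints_2d):
    """Triangulate the 2D detection key points into a face mesh; returns the
    vertex array and an (n_triangles, 3) array of vertex-index triples."""
    tri = Delaunay(np.asarray(keypoints_2d, dtype=np.float64))
    return tri.points, tri.simplices
```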
In this exemplary embodiment, as shown in fig. 11, mapping the face mesh into the minimum bounding cube to obtain a three-dimensional deformation special effect of the face image to be processed may include steps S1110 to S1130:
Step S1110: acquiring the number of cells included in the face mesh of the face image to be processed;
Step S1120: interpolating the minimum bounding cube of the three-dimensional model by cubic spline interpolation, and dividing the minimum bounding cube into a three-dimensional mesh with the same number of cells as the face mesh;
Step S1130: mapping the face mesh onto the three-dimensional mesh, and inputting the vertices of the three-dimensional mesh into a shader to obtain the three-dimensional deformation special effect of the face image to be processed.
Hereinafter, steps S1110 to S1130 will be explained. Specifically, the number of cells in the face mesh of the face region in the face image to be processed is obtained; then the minimum bounding cube of the three-dimensional model is interpolated and divided into a three-dimensional mesh whose number of cells equals that of the face mesh. The interpolation may use cubic spline interpolation, i.e. piecewise interpolation with the function $aX^3 + bX^2 + cX + d$. Linear interpolation could also be used, but the division points it produces are not smooth, and severe texture stretching occurs after the image to be processed is mapped onto the three-dimensional mesh. After the three-dimensional mesh is obtained, the face mesh can be mapped into it, the vertices of the three-dimensional mesh are used as the input of a shader, and the three-dimensional deformation effect of the face image to be processed is obtained by rendering.
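A minimal sketch of subdividing one axis of the minimum bounding cube with a cubic spline (SciPy's `CubicSpline` fits piecewise $aX^3 + bX^2 + cX + d$ segments); the knot layout is an illustrative assumption, since the patent does not specify it:

```python
import numpy as np
from scipy.interpolate import CubicSpline

def subdivide_axis(lo, hi, n_cells):
    """Place n_cells + 1 smoothly varying grid lines between lo and hi.
    Unlike evenly spaced linear division, the spline keeps the division
    points smooth, reducing texture stretching after mapping."""
    u = np.array([0.0, 0.25, 0.5, 0.75, 1.0])   # parameter knots (assumed)
    v = np.array([0.0, 0.20, 0.5, 0.80, 1.0])   # normalized line positions (assumed)
    spline = CubicSpline(u, lo + (hi - lo) * v)
    return spline(np.linspace(0.0, 1.0, n_cells + 1))
```

Applying `subdivide_axis` to the x, y, and depth extents of the minimum bounding cube yields the three-dimensional mesh whose cell count matches the face mesh.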
When the face mesh is mapped into the three-dimensional mesh, each vertex of the face mesh is mapped to the corresponding vertex of the three-dimensional mesh; for example, the top-left vertex of the face mesh is mapped to the top-left vertex of the three-dimensional mesh.
In this exemplary embodiment, after obtaining the three-dimensional deformation effect of the face image to be processed, the image processing method further includes:
and adjusting the angle of the face in the three-dimensional deformation special effect according to the attitude angle.
Specifically, after the three-dimensional deformation special effect is generated, the angle of the face in the special effect must remain consistent with the angle of the face in the actual video; that is, when the face in the video turns left, the face in the three-dimensional deformation special effect also needs to be shown turning left. In a specific implementation, the angle of the face in the three-dimensional deformation special effect can be adjusted using the attitude angle of the face image to be processed. The generated three-dimensional deformation special effect may be as shown in fig. 12.
The image processing method provided by the exemplary embodiments of the present disclosure has at least the following advantages. First, the detection key points of the image to be processed are generated, a preset three-dimensional model is acquired, a PnP solution over the three-dimensional model and the detection key points yields the world space transformation matrix, the three-dimensional model is projected into screen space according to this matrix to obtain its minimum bounding cube, and the face mesh of the face image to be processed is mapped into the minimum bounding cube to obtain the three-dimensional deformation special effect; this reduces the complexity of face deformation in the image. Second, mapping the face mesh of the face image to be processed into the minimum bounding cube of the three-dimensional model realizes a three-dimensional deformation special effect and increases the stereoscopic impression of the deformation. Third, dividing the minimum bounding cube into a three-dimensional mesh with the same number of cells as the face mesh by cubic spline interpolation avoids excessive stretching after the face image to be processed is mapped into the minimum bounding cube.
An exemplary embodiment of the present disclosure also provides an image processing apparatus, as shown in fig. 13, which may include: a detection key point generation module 1310, a world space transformation matrix generation module 1320, a minimum bounding cube determination module 1330, and a deformation special effect generation module 1340. Wherein:
a detection key point generating module 1310, configured to obtain a detection key point of the face image to be processed;
a world space transformation matrix generation module 1320, configured to acquire a preset three-dimensional model and obtain, through the three-dimensional model and the detection key points, a world space transformation matrix corresponding to the face image to be processed;
a minimum bounding cube determination module 1330, configured to transform the three-dimensional model into screen space according to the world space transformation matrix and obtain a minimum bounding cube of the three-dimensional model based on coordinates in the screen space;
and a deformation special effect generation module 1340, configured to acquire a preset face mesh of the face image to be processed and map the face mesh into the minimum bounding cube to obtain a three-dimensional deformation special effect of the face image to be processed.
The specific details of each module in the image processing apparatus have been described in detail in the corresponding image processing method, and therefore are not described herein again.
In an exemplary embodiment of the present disclosure, processing the to-be-processed face image to obtain a detection key point of the to-be-processed face image includes:
and inputting the facial image to be processed into a multitask convolution neural network, and obtaining the detection key points of the facial image to be processed through the multitask convolution neural network.
In an exemplary embodiment of the present disclosure, obtaining the detection key points of the facial image to be processed through the multitask convolutional neural network model includes:
obtaining a key area of the facial image to be processed through a face detection model of the multitask convolutional neural network;
and inputting the key area into a key point detection model of the multitask convolutional neural network to obtain a detection key point of the face image to be processed.
In an exemplary embodiment of the present disclosure, obtaining, by the three-dimensional model and the detection key point, a world space transformation matrix corresponding to the facial image to be processed includes:
acquiring world coordinates of the three-dimensional model and image coordinates of the detection key points;
obtaining a camera coordinate of the three-dimensional model according to the camera parameter and the image coordinate of the detection key point;
and obtaining a rotation matrix and a translation matrix according to the camera coordinates and the world coordinates, and combining the rotation matrix and the translation matrix into the world space transformation matrix corresponding to the face image to be processed.
In an exemplary embodiment of the present disclosure, after obtaining a world space transformation matrix corresponding to the facial image to be processed, the image processing method further includes:
acquiring the rotation matrix;
and calculating according to the rotation matrix to obtain the attitude angle of the face image to be processed.
In an exemplary embodiment of the present disclosure, transforming the three-dimensional model into a screen space according to the world space transformation matrix, and obtaining a minimum bounding cube of the three-dimensional model based on coordinates in the screen space includes:
acquiring coordinates and depth values of three-dimensional image pixels included in the screen space;
and obtaining the minimum bounding cube of the three-dimensional model according to the maximum and minimum of the depth values of the three-dimensional image pixels and the maxima and minima of the coordinates of the three-dimensional image pixels in the horizontal-axis and vertical-axis directions.
In an exemplary embodiment of the present disclosure, mapping the face mesh into the minimum bounding cube to obtain a three-dimensional deformation special effect of the face image to be processed includes:
acquiring the number of cells included in the face mesh of the face image to be processed;
interpolating the minimum bounding cube of the three-dimensional model by cubic spline interpolation, and dividing the minimum bounding cube into a three-dimensional mesh with the same number of cells as the face mesh;
and mapping the face mesh onto the three-dimensional mesh, and inputting the vertices of the three-dimensional mesh into a shader to obtain the three-dimensional deformation special effect of the face image to be processed.
In an exemplary embodiment of the present disclosure, after obtaining the three-dimensional deformation special effect of the facial image to be processed, the image processing method further includes:
and adjusting the angle of the face in the three-dimensional deformation special effect according to the attitude angle.
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the invention. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
Moreover, although the steps of the methods of the present invention are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.
In an exemplary embodiment of the present invention, there is also provided an electronic device capable of implementing the above method.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method, or program product. Thus, various aspects of the invention may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may all generally be referred to herein as a "circuit," "module," or "system."
An electronic device 1400 according to this embodiment of the invention is described below with reference to fig. 14. The electronic device 1400 shown in fig. 14 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 14, electronic device 1400 is in the form of a general purpose computing device. The components of electronic device 1400 may include, but are not limited to: the at least one processing unit 1410, the at least one memory unit 1420, the bus 1430 that connects the various system components (including the memory unit 1420 and the processing unit 1410), and the display unit 1440.
Wherein the storage unit stores program code that is executable by the processing unit 1410, such that the processing unit 1410 performs steps according to various exemplary embodiments of the present invention described in the above section "exemplary methods" of the present specification. For example, the processing unit 1410 may perform step S210 as shown in fig. 2: acquiring a detection key point of a face image to be processed; s220: acquiring a preset three-dimensional model, and acquiring a world space conversion matrix corresponding to the face image to be processed through the three-dimensional model and the detection key points; s230: transforming the three-dimensional model to a screen space according to the world space transformation matrix, and obtaining a minimum circumscribed cube of the three-dimensional model based on coordinates in the screen space; s240: and acquiring a face mesh of a preset face image to be processed, and mapping the face mesh to the minimum circumscribed cube to obtain a three-dimensional deformation special effect of the face image to be processed.
The memory unit 1420 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM) 14201 and/or a cache memory unit 14202, and may further include a read only memory unit (ROM) 14203.
Storage unit 1420 may also include a program/utility 14204 having a set (at least one) of program modules 14205, such program modules 14205 including but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 1430 may be any bus representing one or more of several types of bus structures, including a memory cell bus or memory cell controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 1400 may also communicate with one or more external devices 1500 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 1400, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 1400 to communicate with one or more other computing devices. Such communication can occur via an input/output (I/O) interface 1450. Also, the electronic device 1400 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 1460. As shown, the network adapter 1460 communicates with the other modules of the electronic device 1400 via the bus 1430. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 1400, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiment of the present invention can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to make a computing device (which can be a personal computer, a server, a terminal device, or a network device, etc.) execute the method according to the embodiment of the present invention.
In an exemplary embodiment of the present invention, there is also provided a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, the various aspects of the invention may also be implemented in the form of a program product comprising program code means for causing a terminal device to carry out the steps according to various exemplary embodiments of the invention described in the above section "exemplary method" of this description, when said program product is run on said terminal device.
The program product for implementing the above method may employ a portable compact disc read-only memory (CD-ROM) including program code, and can be run on a terminal device such as a personal computer. However, the program product of the present invention is not limited thereto; in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., through the Internet using an Internet service provider).
Furthermore, the above-described drawings are only schematic illustrations of processes involved in methods according to exemplary embodiments of the invention, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed, for example, synchronously or asynchronously in multiple modules.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

Claims (10)

1. An image processing method, comprising:
acquiring detection key points of a face image to be processed;
acquiring a preset three-dimensional model, and obtaining a world space transformation matrix corresponding to the face image to be processed through the three-dimensional model and the detection key points;
transforming the three-dimensional model to a screen space according to the world space transformation matrix, and obtaining a minimum circumscribed cube of the three-dimensional model based on coordinates in the screen space;
acquiring a preset face mesh of the face image to be processed and the number of meshes included in the face mesh;
interpolating the minimum circumscribed cube of the three-dimensional model by cubic spline interpolation, and dividing the minimum circumscribed cube into a three-dimensional mesh having the same number of meshes as the face mesh;
and mapping the face mesh to the three-dimensional mesh, and inputting the vertices of the three-dimensional mesh into a shader to obtain the three-dimensional deformation special effect of the face image to be processed.
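As one hedged reading of the subdivision step of claim 1, the sketch below uses SciPy's CubicSpline to interpolate division planes along each edge of the minimum circumscribed cube, producing a three-dimensional mesh with the same number of cells as the face mesh. The uniform knot placement and the per-axis cell counts (nx, ny, nz) are assumptions of the sketch, not details fixed by the claim.

    import numpy as np
    from scipy.interpolate import CubicSpline

    def subdivide_cube(cube_min, cube_max, cells):
        # cells = (nx, ny, nz): per-axis cell counts taken from the face mesh,
        # so the 3D mesh has exactly as many cells as the face mesh
        axes = []
        for lo, hi, n in zip(cube_min, cube_max, cells):
            # a cubic spline through knot points along this cube edge yields
            # the interpolated division planes (uniform knots here, an
            # assumption; non-uniform knots would concentrate cells)
            spline = CubicSpline(np.linspace(0., 1., 4), np.linspace(lo, hi, 4))
            axes.append(spline(np.linspace(0., 1., n + 1)))
        gx, gy, gz = np.meshgrid(*axes, indexing="ij")
        # lattice of shape (nx+1, ny+1, nz+1, 3): the three-dimensional mesh
        # vertices that are later handed to the vertex shader
        return np.stack([gx, gy, gz], axis=-1)

Mapping the face mesh then amounts to assigning each face-mesh vertex the lattice vertex with the same grid index, after which the lattice vertices are passed to the shader; for example, subdivide_cube(lo, hi, (16, 16, 8)) would return the vertex lattice for a 16 x 16 x 8 mesh.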
2. The image processing method according to claim 1, wherein obtaining the detection key points of the face image to be processed comprises:
and inputting the facial image to be processed into a multitask convolution neural network, and obtaining the detection key points of the facial image to be processed through the multitask convolution neural network.
3. The image processing method according to claim 2, wherein obtaining the detection key points of the face image to be processed through the multitask convolutional neural network comprises:
obtaining a key area of the face image to be processed through a face detection model of the multitask convolutional neural network;
and inputting the key area into a key point detection model of the multitask convolutional neural network to obtain the detection key points of the face image to be processed.
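Claims 2 and 3 describe the two-stage cascade typical of a multitask convolutional neural network (MTCNN): a face detection stage proposes a key area, and a key point detection stage localizes landmarks inside it. A minimal sketch using the open-source mtcnn package follows; the choice of this particular implementation is an assumption, as the patent does not name a library.

    import cv2
    from mtcnn import MTCNN   # pip install mtcnn

    def detect_face_keypoints(image_path):
        image = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB)
        results = MTCNN().detect_faces(image)    # one dict per detected face
        if not results:
            return None
        best = max(results, key=lambda r: r["confidence"])
        # "box" is the key area proposed by the face detection stage;
        # "keypoints" are the detection key points located inside that area
        # (left_eye, right_eye, nose, mouth_left, mouth_right)
        return best["box"], best["keypoints"]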
4. The image processing method according to claim 1, wherein obtaining the world space transformation matrix corresponding to the face image to be processed through the three-dimensional model and the detection key points comprises:
acquiring world coordinates of the three-dimensional model and image coordinates of the detection key points;
obtaining camera coordinates of the three-dimensional model according to camera parameters and the image coordinates of the detection key points;
and obtaining a rotation matrix and a translation matrix according to the camera coordinates and the world coordinates, and combining the rotation matrix and the translation matrix into the world space transformation matrix corresponding to the face image to be processed.
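Claim 4 is in essence the perspective-n-point (PnP) problem. The sketch below solves it with OpenCV's solvePnP; the five-point model coordinates and the focal-length approximation are illustrative assumptions only.

    import numpy as np
    import cv2

    # World coordinates of the preset three-dimensional model's key points
    # (a generic five-point face model; the values are illustrative only)
    MODEL_POINTS = np.array([
        [-30.,  40., -30.],   # left eye
        [ 30.,  40., -30.],   # right eye
        [  0.,   0.,   0.],   # nose tip
        [-25., -30., -30.],   # left mouth corner
        [ 25., -30., -30.],   # right mouth corner
    ])

    def world_space_matrix(image_points, image_size):
        # image_points: (5, 2) detected key points; image_size: (height, width)
        h, w = image_size
        camera = np.array([[w, 0., w / 2.],     # pinhole model with the focal
                           [0., w, h / 2.],     # length approximated by the
                           [0., 0., 1.]])       # image width (an assumption)
        ok, rvec, tvec = cv2.solvePnP(MODEL_POINTS,
                                      np.asarray(image_points, np.float64),
                                      camera, np.zeros(4),
                                      flags=cv2.SOLVEPNP_EPNP)
        rotation, _ = cv2.Rodrigues(rvec)       # rotation vector -> 3x3 matrix
        # combining rotation and translation gives the 3x4 world space
        # transformation matrix of the claim
        return np.hstack([rotation, tvec])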
5. The image processing method according to claim 4, wherein after obtaining the world space transformation matrix corresponding to the face image to be processed, the image processing method further comprises:
acquiring the rotation matrix;
and calculating the attitude angle of the face image to be processed according to the rotation matrix.
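The attitude angle of claim 5 can be recovered by decomposing the rotation matrix into Euler angles. The sketch below assumes the common Z-Y-X (yaw-pitch-roll) convention; the claim itself does not fix a convention.

    import numpy as np

    def attitude_angles(rotation):
        # decompose a 3x3 rotation matrix into yaw-pitch-roll (Z-Y-X order)
        sy = np.hypot(rotation[0, 0], rotation[1, 0])
        if sy > 1e-6:
            roll  = np.arctan2(rotation[2, 1], rotation[2, 2])
            pitch = np.arctan2(-rotation[2, 0], sy)
            yaw   = np.arctan2(rotation[1, 0], rotation[0, 0])
        else:                        # gimbal lock: pitch near +/- 90 degrees
            roll  = np.arctan2(-rotation[1, 2], rotation[1, 1])
            pitch = np.arctan2(-rotation[2, 0], sy)
            yaw   = 0.0
        return np.degrees(np.array([yaw, pitch, roll]))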
6. The image processing method according to claim 5, wherein transforming the three-dimensional model to a screen space according to the world space transformation matrix, and obtaining a minimum circumscribed cube of the three-dimensional model based on coordinates in the screen space comprises:
acquiring coordinates and depth values of three-dimensional image pixels included in the screen space;
and obtaining the minimum circumscribed cube of the three-dimensional model according to the maximum and minimum depth values of the three-dimensional image pixels, and the maximum and minimum coordinate values of the three-dimensional image pixels in the horizontal axis and vertical axis directions.
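Claim 6 reduces to an axis-aligned bounding box over the screen-space coordinates and depth values. A minimal sketch, assuming the projected pixels are packed as an (N, 3) array of (x, y, depth):

    import numpy as np

    def minimum_circumscribed_cube(screen_points):
        # screen_points: (N, 3) array of (x, y, depth) per projected model pixel
        x, y, z = screen_points[:, 0], screen_points[:, 1], screen_points[:, 2]
        # the extremes along the horizontal axis, the vertical axis, and the
        # depth direction give two opposite corners of the circumscribed cube
        near_corner = np.array([x.min(), y.min(), z.min()])
        far_corner = np.array([x.max(), y.max(), z.max()])
        return near_corner, far_corner

The two corners are exactly the maxima and minima enumerated in the claim.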
7. The image processing method according to claim 6, wherein after obtaining the three-dimensional deformation special effect of the face image to be processed, the image processing method further comprises:
and adjusting the angle of the face in the three-dimensional deformation special effect according to the attitude angle.
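Claim 7 ties the deformation back to the attitude angle obtained in claim 5. One way to apply it is sketched below, rotating the special-effect mesh about its centroid under the same Z-Y-X Euler convention assumed earlier; both the convention and the centroid pivot are assumptions of the sketch.

    import numpy as np

    def adjust_effect_angle(vertices, angles_deg):
        # vertices: (N, 3) special-effect mesh; angles_deg: (yaw, pitch, roll)
        yaw, pitch, roll = np.radians(angles_deg)
        cz, sz = np.cos(yaw), np.sin(yaw)
        cy, sy = np.cos(pitch), np.sin(pitch)
        cx, sx = np.cos(roll), np.sin(roll)
        rz = np.array([[cz, -sz, 0.], [sz, cz, 0.], [0., 0., 1.]])
        ry = np.array([[cy, 0., sy], [0., 1., 0.], [-sy, 0., cy]])
        rx = np.array([[1., 0., 0.], [0., cx, -sx], [0., sx, cx]])
        rotation = rz @ ry @ rx
        # rotate about the mesh centroid so the effect follows the face pose
        center = vertices.mean(axis=0)
        return (vertices - center) @ rotation.T + center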
8. An image processing apparatus characterized by comprising:
the detection key point generating module is used for acquiring the detection key points of the face image to be processed;
the world space transformation matrix generation module is used for acquiring a preset three-dimensional model and obtaining a world space transformation matrix corresponding to the face image to be processed through the three-dimensional model and the detection key points;
the minimum circumscribed cube determining module is used for transforming the three-dimensional model to a screen space according to the world space transformation matrix and obtaining a minimum circumscribed cube of the three-dimensional model based on coordinates in the screen space;
the system comprises a deformation special effect generation module, a face detection module and a face detection module, wherein the deformation special effect generation module is used for acquiring a preset face mesh of a face image to be processed and the number of meshes included in the face mesh of the face image to be processed;
utilizing a cubic spline interpolation method to interpolate a minimum circumscribed cube of the three-dimensional model, and dividing the minimum circumscribed cube into three-dimensional grids with the same number as grids included in the face grid;
and mapping the face mesh to the three-dimensional mesh, and inputting the vertex of the three-dimensional mesh into a shader to obtain the three-dimensional deformation special effect of the face image to be processed.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the image processing method of any one of claims 1 to 7.
10. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the image processing method of any of claims 1-7 via execution of the executable instructions.
CN202111349936.1A 2021-11-15 2021-11-15 Image processing method and device, computer readable storage medium, and electronic device Active CN113920282B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111349936.1A CN113920282B (en) 2021-11-15 2021-11-15 Image processing method and device, computer readable storage medium, and electronic device

Publications (2)

Publication Number Publication Date
CN113920282A CN113920282A (en) 2022-01-11
CN113920282B (en) 2022-11-04

Family

ID=79246695

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111349936.1A Active CN113920282B (en) 2021-11-15 2021-11-15 Image processing method and device, computer readable storage medium, and electronic device

Country Status (1)

Country Link
CN (1) CN113920282B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106295502A * 2016-07-25 2017-01-04 厦门中控生物识别信息技术有限公司 Face detection method and device

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5059503B2 (en) * 2007-07-13 2012-10-24 花王株式会社 Image composition apparatus, image composition method, and image composition program
US8090160B2 (en) * 2007-10-12 2012-01-03 The University Of Houston System Automated method for human face modeling and relighting with application to face recognition
CN104376594B (en) * 2014-11-25 2017-09-29 福建天晴数码有限公司 Three-dimensional face modeling method and device
CN108986221B (en) * 2018-07-27 2022-07-22 河海大学常州校区 Non-standard three-dimensional face grid texture method based on template face approximation
US10949649B2 (en) * 2019-02-22 2021-03-16 Image Metrics, Ltd. Real-time tracking of facial features in unconstrained video
CN110310318B (en) * 2019-07-03 2022-10-04 北京字节跳动网络技术有限公司 Special effect processing method and device, storage medium and terminal
CN110728196B (en) * 2019-09-18 2024-04-05 平安科技(深圳)有限公司 Face recognition method and device and terminal equipment
CN110807836B (en) * 2020-01-08 2020-05-12 腾讯科技(深圳)有限公司 Three-dimensional face model generation method, device, equipment and medium
CN113139892A (en) * 2020-01-19 2021-07-20 株式会社理光 Sight line track calculation method and device and computer readable storage medium
CN111428579A (en) * 2020-03-03 2020-07-17 平安科技(深圳)有限公司 Face image acquisition method and system
CN113470162A (en) * 2020-03-30 2021-10-01 京东方科技集团股份有限公司 Method, device and system for constructing three-dimensional head model and storage medium
CN111768477A (en) * 2020-07-06 2020-10-13 网易(杭州)网络有限公司 Three-dimensional facial expression base establishment method and device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN113920282A (en) 2022-01-11

Similar Documents

Publication Publication Date Title
US9639914B2 (en) Portrait deformation method and apparatus
JP7337104B2 (en) Model animation multi-plane interaction method, apparatus, device and storage medium by augmented reality
WO2020048484A1 (en) Super-resolution image reconstruction method and apparatus, and terminal and storage medium
US9965893B2 (en) Curvature-driven normal interpolation for shading applications
Du et al. Video fields: fusing multiple surveillance videos into a dynamic virtual environment
CN112598780B (en) Instance object model construction method and device, readable medium and electronic equipment
CN112766027A (en) Image processing method, device, equipment and storage medium
WO2021083133A1 (en) Image processing method and device, equipment and storage medium
US11922568B2 (en) Finite aperture omni-directional stereo light transport
CN116097316A (en) Object recognition neural network for modeless central prediction
US11631154B2 (en) Method, apparatus, device and storage medium for transforming hairstyle
CN111652795A (en) Face shape adjusting method, face shape adjusting device, live broadcast method, live broadcast device, electronic equipment and storage medium
CN112528707A (en) Image processing method, device, equipment and storage medium
CN113920282B (en) Image processing method and device, computer readable storage medium, and electronic device
CN115439634B (en) Interactive presentation method of point cloud data and storage medium
WO2023056879A1 (en) Model processing method and apparatus, device, and medium
CN112862981B (en) Method and apparatus for presenting a virtual representation, computer device and storage medium
CN111652807B (en) Eye adjusting and live broadcasting method and device, electronic equipment and storage medium
CN111652023B (en) Mouth-type adjustment and live broadcast method and device, electronic equipment and storage medium
CN110827411B (en) Method, device, equipment and storage medium for displaying augmented reality model of self-adaptive environment
KR20230013099A (en) Geometry-aware augmented reality effects using real-time depth maps
CN112465692A (en) Image processing method, device, equipment and storage medium
CN112488909A (en) Multi-face image processing method, device, equipment and storage medium
CN117406867B (en) Webpage-based augmented reality interaction method and device
Roth et al. Guided high-quality rendering

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant