CN115482327A - Image processing method, image processing device, computer equipment and storage medium


Info

Publication number: CN115482327A
Authority: CN (China)
Prior art keywords: model, dimensional, image, target, target object
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Application number: CN202110669026.5A
Other languages: Chinese (zh)
Inventor: 汪晏如
Current Assignee: Tencent Technology (Shenzhen) Co., Ltd. (the listed assignee may be inaccurate; Google has not performed a legal analysis)
Original Assignee: Tencent Technology (Shenzhen) Co., Ltd.
Application filed by Tencent Technology (Shenzhen) Co., Ltd.
Priority to CN202110669026.5A
Publication of CN115482327A


Classifications

    • G06T 15/02 Non-photorealistic rendering (under G06T 15/00, 3D [three-dimensional] image rendering; G06T: image data processing or generation, in general)
    • G06N 3/04 Architecture, e.g. interconnection topology (under G06N 3/02 Neural networks; G06N: computing arrangements based on specific computational models)
    • G06N 3/08 Learning methods (under G06N 3/02 Neural networks)
    • G06T 17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T 19/20 Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts (under G06T 19/00 Manipulating 3D models or images for computer graphics)

Abstract

The present application provides an image processing method, an image processing apparatus, a computer device, and a storage medium. The image processing method includes: acquiring an image to be processed, and generating a three-dimensional target object model for a target object in the image to be processed; performing model expansion processing on the three-dimensional target object model to obtain a target object expansion map of the three-dimensional target object model, the target object expansion map being a two-dimensional image; performing prediction conversion processing on the target object expansion map to obtain a virtual object expansion map of the target object expansion map, the virtual object expansion map being a two-dimensional image; performing three-dimensional restoration processing on the virtual object expansion map to obtain a three-dimensional virtual object model; and generating a three-dimensional avatar corresponding to the target object in the image to be processed according to the three-dimensional virtual object model. The embodiments of the present application can improve the accuracy of image processing.

Description

Image processing method, image processing device, computer equipment and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to an image processing method and apparatus, a computer device, and a storage medium.
Background
In recent years, cartoon products have attracted people of all ages. With the development of three-dimensional image reconstruction and three-dimensional rendering, converting real people into three-dimensional avatars has found increasingly wide application in fields such as virtual livestreaming, cartoon videos, online games, virtual communities, and assisted teaching.
At present, three-dimensional avatars are mainly produced manually by designers using three-dimensional production software, and a number of tools for designing three-dimensional cartoon characters have been developed for this purpose; with these tools, a three-dimensional cartoon character matching the user's expectations is obtained by manually adjusting a three-dimensional cartoon template. In the existing scheme, such manual production requires a large amount of labor and time, and is inefficient.
Disclosure of Invention
The embodiments of the present application provide an image processing method, an image processing apparatus, a computer device, and a storage medium, which can improve the accuracy of image processing.
In one aspect, an embodiment of the present application provides an image processing method, where the method includes:
acquiring an image to be processed, and generating a three-dimensional target object model for a target object in the image to be processed;
performing model expansion processing on the three-dimensional target object model to obtain a target object expansion map of the three-dimensional target object model, the target object expansion map being a two-dimensional image;
performing prediction conversion processing on the target object expansion map to obtain a virtual object expansion map of the target object expansion map, the virtual object expansion map being a two-dimensional image;
performing three-dimensional restoration processing on the virtual object expansion map to obtain a three-dimensional virtual object model; and
generating a three-dimensional avatar corresponding to the target object in the image to be processed according to the three-dimensional virtual object model.
In one aspect, an embodiment of the present application provides an image processing apparatus, including:
an acquisition unit, configured to acquire an image to be processed and generate a three-dimensional target object model for a target object in the image to be processed;
a processing unit, configured to perform model expansion processing on the three-dimensional target object model to obtain a target object expansion map of the three-dimensional target object model, the target object expansion map being a two-dimensional image;
the processing unit being further configured to perform prediction conversion processing on the target object expansion map to obtain a virtual object expansion map of the target object expansion map, the virtual object expansion map being a two-dimensional image;
the processing unit being further configured to perform three-dimensional restoration processing on the virtual object expansion map to obtain a three-dimensional virtual object model; and
a generating unit, configured to generate a three-dimensional avatar corresponding to the target object in the image to be processed according to the three-dimensional virtual object model.
In one aspect, the present application provides a computer device, which includes a memory and a processor, where the memory stores a computer program, and the computer program, when executed by the processor, causes the processor to execute the image processing method described above.
In one aspect, an embodiment of the present application provides a computer-readable storage medium, which stores a computer program, and when the computer program is read and executed by a processor of a computer device, the computer device is caused to execute the image processing method.
In one aspect, embodiments of the present application provide a computer program product or a computer program, which includes computer instructions stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the image processing method described above.
In the embodiments of the present application, an image to be processed is first acquired, and a three-dimensional target object model is generated for a target object in the image to be processed; model expansion processing is then performed on the three-dimensional target object model to obtain a target object expansion map of the three-dimensional target object model, the target object expansion map being a two-dimensional image; prediction conversion processing is performed on the target object expansion map to obtain a virtual object expansion map of the target object expansion map, the virtual object expansion map also being a two-dimensional image; three-dimensional restoration processing is then performed on the virtual object expansion map to obtain a three-dimensional virtual object model; finally, a three-dimensional avatar corresponding to the target object in the image to be processed is generated according to the three-dimensional virtual object model. Compared with producing a three-dimensional avatar manually, the whole process can be executed automatically by a computer device once an image containing a face has been captured, which effectively improves the efficiency of producing three-dimensional avatars. In addition, the two-dimensional target object expansion map obtained by model expansion is converted by prediction into a corresponding two-dimensional virtual object expansion map, and the three-dimensional virtual object model is then determined by performing three-dimensional restoration processing on that two-dimensional map; the image processing therefore takes place in a two-dimensional space rather than a three-dimensional space, which reduces the difficulty and complexity of the processing and improves its accuracy.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic architecture diagram of an image processing system according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of an image processing method according to an embodiment of the present application;
Fig. 3a is a schematic diagram of an image to be processed according to an embodiment of the present application;
Fig. 3b is a schematic diagram of a three-dimensional target object model according to an embodiment of the present application;
Fig. 3c is a schematic diagram of a target object expansion map according to an embodiment of the present application;
Fig. 4 is a model diagram of an image processing model according to an embodiment of the present application;
Fig. 5 is a schematic flowchart of a model training method according to an embodiment of the present application;
Fig. 6 is a schematic image of an initial sample set according to an embodiment of the present application;
Fig. 7a is a schematic structural diagram of a generative adversarial network according to an embodiment of the present application;
Fig. 7b is a schematic structural diagram of a generator network according to an embodiment of the present application;
Fig. 7c is a schematic structural diagram of a discriminator network according to an embodiment of the present application;
Fig. 8 is a rendering schematic diagram of a three-dimensional avatar according to an embodiment of the present application;
Fig. 9 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
Fig. 10 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application; rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The embodiments of the present application provide a data processing scheme that starts from processing image data and arrives at a final three-dimensional cartoon character. The scheme is applicable to scenarios in which a 3D (three-dimensional) avatar is needed: for example, during a video call a user may want to replace their real appearance with a 3D cartoon character, and in some webcast scenarios a 3D cartoon character standing in for the anchor needs to be presented to the viewers. The data processing scheme includes: acquiring an image to be processed through the camera of a device such as a mobile phone or a personal computer, and generating a three-dimensional target object model for a target object in the image to be processed, which amounts to obtaining a real three-dimensional face model for a target object such as a face; then performing model expansion processing on the three-dimensional target object model to obtain a two-dimensional target object expansion map corresponding to it; specifically, the real three-dimensional face model may be unfolded by UV expansion (where U and V are the horizontal and vertical axes of the two-dimensional texture space) to obtain its 2D (two-dimensional) expansion map; after the 2D expansion map is obtained, performing prediction conversion processing on the target object expansion map to obtain a corresponding two-dimensional virtual object expansion map, that is, a virtual face expansion map; after the 2D virtual object expansion map is obtained, performing 3D restoration processing on it to obtain a 3D virtual object model; and finally, generating a 3D avatar corresponding to the target object in the image to be processed according to the 3D virtual object model. Thus, in the embodiments of the present application, the 2D target object expansion map obtained by model expansion is used to predict a corresponding 2D virtual object expansion map, and the 3D virtual object model is then determined by performing three-dimensional restoration processing on that map; the image processing is thereby brought down from 3D space to 2D space, which reduces the difficulty and complexity of the processing and improves its accuracy. The resulting three-dimensional avatar can be applied in fields such as virtual livestreaming, animated film and television, online games, virtual communities, and assisted teaching.
In the data recognition and processing stages, the scheme may be implemented through Artificial Intelligence (AI). AI is the theory, methodology, technology, and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results. AI is a comprehensive discipline spanning a wide range of fields, covering both hardware-level and software-level technologies. Basic AI technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. AI software technologies mainly include computer vision, speech processing, natural language processing, machine learning/deep learning, autonomous driving, and intelligent transportation.
Computer Vision (CV) is the science of how to make machines "see": it uses cameras and computers instead of human eyes to recognize, track, and measure targets, and further processes the images so that they become more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies theories and techniques that attempt to build artificial intelligence systems capable of extracting information from images or multidimensional data. Computer vision technologies generally include image processing, image recognition, image semantic understanding, image retrieval, OCR (Optical Character Recognition), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping, autonomous driving, and intelligent transportation, as well as common biometric technologies such as face recognition and fingerprint recognition.
In the embodiments of the present application, converting the expansion map of a real face into the expansion map of a cartoon face may be implemented by a model, and the model may be optimized by Machine Learning (ML). ML is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It studies how a computer can simulate or realize human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied in all fields of AI. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from instruction.
The scheme provided by the embodiments of the present application relates to computer vision technology and deep learning/machine learning technology in the field of artificial intelligence.
Based on the scheme provided by the embodiments of the present application, the generated three-dimensional avatar can be applied in fields such as animated film and television, online games, virtual communities, and assisted teaching, for example in three-dimensional object reconstruction, three-dimensional avatar generation, avatar decoration, and expression driving.
Referring to fig. 1, fig. 1 is a schematic architecture diagram of an image processing system according to an embodiment of the present application. The architecture shown in fig. 1 is by way of example only; it may include a server 140 and a smart device cluster, where the smart device cluster may include smart device 110, smart device 120, smart device 130, and so on. The smart device cluster and the server 140 may be directly or indirectly connected through wired or wireless communication, which is not limited in the present application.
The server 140 shown in fig. 1 may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms.
The smart device 110, smart device 120, smart device 130, and the like shown in fig. 1 may be devices with an image processing function, such as a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a Mobile Internet Device (MID), a vehicle-mounted device, a roadside device, an aircraft, a wearable device (for example a smart watch, a smart band, or a pedometer), or a smart television.
In one possible implementation, taking the smart device 110 as an example, the smart device 110 acquires an image to be processed and sends it to the server 140. The server 140 generates a three-dimensional target object model for the target object in the image to be processed; performs model expansion processing on the three-dimensional target object model to obtain a target object expansion map of the three-dimensional target object model, the target object expansion map being a two-dimensional image; performs prediction conversion processing on the target object expansion map to obtain a virtual object expansion map of the target object expansion map, the virtual object expansion map being a two-dimensional image; and performs three-dimensional restoration processing on the virtual object expansion map to obtain a three-dimensional virtual object model.
Finally, the server 140 sends the three-dimensional virtual object model to the smart device 110. The smart device 110 generates a three-dimensional avatar corresponding to the target object in the image to be processed according to the three-dimensional virtual object model. Subsequently, the smart device 110 may also perform decoration, expression driving, and the like based on the three-dimensional avatar, or apply it in fields such as animated film and television, online games, virtual communities, and assisted teaching.
Of course, the steps up to generating the three-dimensional avatar corresponding to the target object in the image to be processed are not necessarily performed by the server 140; they may also be performed by the smart device 110 or any other computer device in the smart device cluster. Likewise, generating the three-dimensional avatar according to the three-dimensional virtual object model is not necessarily performed by the smart device 110 and may also be performed by the server 140. That is, the processing flow described above as executed by the server 140 may instead be implemented by a smart device such as the smart device 110, with the server 140 used only to exchange data between the smart devices in the cluster. For example, the architecture shown in fig. 1 may correspond to a video conference scenario: in the video conference, the smart device 110 performs everything from acquiring the image to be processed to obtaining the three-dimensional virtual object model, and then sends the model to the server 140 to replace the real image of the corresponding user displayed on the other smart devices.
In a possible implementation, the image processing system provided in this embodiment may be deployed on blockchain nodes: for example, the server 140 and each smart device in the smart device cluster may be regarded as node devices of a blockchain that together form a blockchain network. Thus, in the embodiments of the present application, the image processing flow that generates the three-dimensional avatar corresponding to the target object in the image to be processed can be executed on the blockchain, which ensures the fairness and impartiality of the flow, makes it traceable, and improves its security.
It can be understood that the system architecture described in the embodiments of the present application is intended to illustrate the technical solutions of the embodiments more clearly and does not constitute a limitation on them; those of ordinary skill in the art will appreciate that, as system architectures evolve and new service scenarios emerge, the technical solutions provided by the embodiments of the present application are equally applicable to similar technical problems.
Based on the above analysis, the image processing method of the present application is described below with reference to fig. 2. Referring to fig. 2, fig. 2 is a schematic flowchart of an image processing method according to an embodiment of the present application. The image processing method can be applied to a computer device; the computer device may be a smart device such as a vehicle-mounted device, a smartphone, a tablet computer, or a smart wearable device, or it may be a server. As shown in fig. 2, the image processing method may include steps S210 to S250. Wherein:
S210: Acquire an image to be processed, and generate a three-dimensional target object model for a target object in the image to be processed. In the embodiments of the present application, the image to be processed may include a face image; the face image may be a two-dimensional face photograph, and the face in it may have any pose and any expression, as long as the face is clear enough that reliable facial features can be detected. In practical applications, the face image may be a frontal photograph taken in real time under good lighting conditions, or a face photograph obtained from a social platform or a photo album; see fig. 3a, which is a schematic diagram of an image to be processed according to an embodiment of the present application. Starting from a face image that meets these requirements, the three-dimensional model reconstructed from the two-dimensional image resembles the target face more closely and is more realistic. In one embodiment, the target object may be an object including a face, a head, and the like, while other parts such as hands and legs are not treated as the target object. In other embodiments, the whole human body may be treated as the target object. In still other embodiments, the method is not limited to humans: some animals can also be processed to obtain their corresponding cartoon characters.
The three-dimensional target object model is a three-dimensional model generated for the determined target object; a built-in 2D-to-3D conversion tool can realize the conversion from a 2D image to a three-dimensional model. For example, if the target object is a face image, the three-dimensional target object model may be a three-dimensional face model highly similar to the face image, and the real face can be restored through the large number of model vertices included in the three-dimensional face model. The more model vertices, the higher the precision of the three-dimensional face model; see fig. 3b, which is a schematic diagram of a three-dimensional target object model according to an embodiment of the present application. The three-dimensional face model shown in fig. 3b includes, for example, about 20,000 model vertices. Each model vertex in the three-dimensional face model can be given a unique number, and the vertex data corresponding to each model vertex can be position data, which may include the (x, y, z) coordinates of the model vertex on the three-dimensional target object model. Faces with different shapes and expressions can be simulated by adjusting the coordinates of the model vertices, so the shape and expression of a three-dimensional face are determined once the coordinates of all model vertices in the three-dimensional face model are determined. The whole three-dimensional face model can be divided into several feature regions, such as an eye region, a nose region, a mouth region, and a chin region; each feature region includes several vertices, and facial features of different shapes and under different expressions are simulated based on the vertices in each region: the vertices in the eye region simulate the shape of the eyes, and the vertices in the nose region simulate the shape of the nose.
In one possible implementation, features of the image to be processed may be extracted based on a 3DMM (3D Morphable Model, a statistical model of 3D face deformation) to generate the three-dimensional target object model. The 3DMM is a general three-dimensional face model that represents a face with a fixed number of vertices and is used to recover a three-dimensional face from a two-dimensional face image. The core idea of the 3DMM is that faces can be placed in one-to-one correspondence in three-dimensional space and that any face can be obtained as a weighted linear combination of an orthogonal basis built from many other faces. In other words, every three-dimensional face can be represented in a basis vector space composed of a set of deformable face templates in a 3DMM database, and solving for any three-dimensional face model is actually equivalent to solving for the coefficients of each basis vector.
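As an illustration of the linear-combination idea behind the 3DMM, the following is a minimal sketch that reconstructs a face shape as the mean shape plus a weighted sum of basis vectors. The basis matrix, coefficient values, and dimensions here are hypothetical placeholders, not data from the present application:

```python
# Minimal sketch of the 3DMM linear combination (illustrative; the basis and
# coefficients below are random placeholders, not a real face database).
import numpy as np

num_vertices = 20000   # e.g. roughly the vertex count mentioned for fig. 3b
num_basis = 100        # number of face basis vectors (assumed)

mean_shape = np.zeros(3 * num_vertices)                      # mean face, flattened (x, y, z)
shape_basis = np.random.randn(3 * num_vertices, num_basis)   # face bases (placeholder)
alpha = np.random.randn(num_basis)                           # coefficients, in practice regressed from the 2D image

# Any three-dimensional face is the mean plus a linear combination of the bases:
face_shape = mean_shape + shape_basis @ alpha
vertices = face_shape.reshape(num_vertices, 3)               # (x, y, z) per model vertex
```

Solving for `alpha` given a two-dimensional face image is exactly the coefficient-fitting problem described above.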
S220: Perform model expansion processing on the three-dimensional target object model to obtain a target object expansion map of the three-dimensional target object model.
In the embodiments of the present application, the target object expansion map is a two-dimensional image; see fig. 3c, which is a schematic diagram of a target object expansion map according to an embodiment of the present application. The three-dimensional target object model includes a source model vertex set containing a plurality of source model vertices, and the target object expansion map includes a source topology point set containing a plurality of source topology points; each source topology point is located at a fixed position on the expansion map, and one source topology point corresponds to one source model vertex. That is, the target object expansion map in the embodiments of the present application has a fixed topological structure, and the same fixed topology is used across the expansion maps of different objects; only the color value of each source topology point may differ between them. The color value of each source topology point may be, for example, an RGB (Red, Green, Blue) value obtained by converting the three-dimensional coordinates of the source model vertex corresponding to that topology point.
In this way, each source model vertex on the three-dimensional target object model corresponds to a pixel (represented in the embodiments of the present application by an RGB color value) at a fixed position on the topological UV expansion map (that is, the target object expansion map). Thanks to this correspondence, three-dimensional face data can be stored cleanly as two-dimensional image information, a data representation that is well suited to convolutional neural networks.
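To make the fixed vertex-to-pixel correspondence concrete, the following sketch scatters per-vertex color values onto fixed UV positions. The UV table is a random placeholder; in a real pipeline it would come from the UV unwrapping of the model:

```python
# Sketch of storing a 3D model as a 2D expansion map via a fixed topology
# (uv_coords is a placeholder; a real table comes from UV unwrapping).
import numpy as np

num_vertices = 20000
uv_coords = np.random.randint(0, 512, size=(num_vertices, 2))  # fixed (u, v) per source topology point

def write_expansion_map(vertex_colors, size=512):
    """Place each source model vertex's color at its fixed source topology point."""
    img = np.zeros((size, size, 3), dtype=np.uint8)
    for idx, (u, v) in enumerate(uv_coords):
        img[v, u] = vertex_colors[idx]   # one pixel per source model vertex
    return img

colors = np.random.randint(0, 256, size=(num_vertices, 3), dtype=np.uint8)
expansion_map = write_expansion_map(colors)
```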
In one possible implementation, the specific process by which the computer device performs model expansion processing on the three-dimensional target object model to obtain its target object expansion map may include the following.
First, obtain the difference value between the position data of a target source model vertex in the source model vertex set on the three-dimensional target object model and the position data of a target source reference vertex, where the target source reference vertex is the source reference vertex, in the source reference vertex set included in a three-dimensional object mean model, that matches the target source model vertex. The target source model vertex is any source model vertex in the source model vertex set; the description is given for one such vertex, and the processing of the other source model vertices in the present application follows the processing of the target source model vertex, which is not repeated here. The three-dimensional object mean model is preset; for example, it may be a mean model obtained by averaging a large number of three-dimensional object models, and it may also be obtained directly. The three-dimensional object mean model and the three-dimensional target object model have the same topology, meaning the same number of vertices and the same adjacency between vertices; for example, if the target source reference vertex is numbered 11 in the three-dimensional object mean model, the target source model vertex is also numbered 11 in the three-dimensional target object model.
The position data may be coordinate data, the coordinate data may be (x, y, z) coordinates, and the difference value may be the residual obtained by subtracting the position data of the target source reference vertex from the position data of the target source model vertex on the three-dimensional target object model. For example, if the position data of the target source model vertex on the three-dimensional target object model is (x1, y1, z1) and the position data of the target source reference vertex is (x0, y0, z0), the difference value between them may be (x1-x0, y1-y0, z1-z0).
Then, determine the color value of the target source topology point corresponding to the target source model vertex according to the difference value. The difference value obtained by the subtraction may be used directly as the color value of the target source topology point; alternatively, the difference value may be scaled by a scale adjustment multiple k to obtain that color value; or the difference value (or the value obtained by scaling it by k) may be translated, with the translated value used as the color value of the target source topology point corresponding to the target source model vertex.
In this way, compared with using the position data of the target source model vertex directly as the color value of its target source topology point, deriving the color value from the difference value and then using the resulting color value as the stored data for the subsequent processing related to the virtual object model makes better use of the limited range of RGB color values, which in turn ensures the accuracy of the subsequent prediction conversion processing and yields a more suitable three-dimensional virtual object model.
In one embodiment, scaling the difference value by the scale adjustment multiple k means: after the difference value (x1-x0, y1-y0, z1-z0) is obtained, the color value of the target source topology point is determined according to k, that is, the color values obtained by the k conversion are R = k(x1-x0), G = k(y1-y0), and B = k(z1-z0). Here k is a value that can be tuned to the specific data set during actual computation: when the difference values are small (that is, when the target source model vertices differ little from the target source reference vertices, or the three-dimensional target object model differs little from the three-dimensional object mean model), k may take a larger value to make better use of the color value range (0, 255); when the difference values are large, k may take a smaller value, for example a value greater than 0 and less than 1, to compress the difference values into the color value range (0, 255).
In one embodiment, translating the difference value, or the value obtained by scaling it by k, may mean translating by a preset value, for example by 128. For instance, translating (x1-x0, y1-y0, z1-z0) by 128 gives the color value (x1-x0+128, y1-y0+128, z1-z0+128); likewise, translating the initial color values R = k(x1-x0), G = k(y1-y0), B = k(z1-z0) by 128 gives the final color values R = k(x1-x0)+128, G = k(y1-y0)+128, B = k(z1-z0)+128.
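Putting the scaling and translation together, the following sketch encodes per-vertex offsets from the mean model as RGB values. The value of k and the clamp into [0, 255] are assumptions added for illustration:

```python
# Sketch of the offset-to-RGB encoding: color = k * (vertex - mean) + 128.
# k is data-set dependent; the clamp is an added assumption to stay in range.
import numpy as np

def encode_vertex_offsets(model_vertices, mean_vertices, k=5.0):
    diff = model_vertices - mean_vertices            # (x1-x0, y1-y0, z1-z0) per vertex
    rgb = k * diff + 128.0                           # scale by k, translate by 128
    return np.clip(rgb, 0, 255).astype(np.uint8)

model = np.random.randn(20000, 3) + 100.0            # placeholder reconstructed model
mean = np.full((20000, 3), 100.0)                    # placeholder three-dimensional object mean model
colors = encode_vertex_offsets(model, mean)          # color values for the source topology points
```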
Finally, generate the target object expansion map according to the color values of the target source topology points. In one possible implementation, the color value of the source topology point corresponding to each source model vertex in the source model vertex set may be obtained by the steps above. Then, given the color values determined for the N source topology points, the detailed process of generating the target object expansion map may include the following.
First, interpolation may be performed on the points inside each expansion map region enclosed by the N source topology points to obtain color values for those points, where N is an integer greater than 3. Since the target object expansion map is a network topology map formed by many source topology points, as shown in fig. 3c, the expansion map region enclosed between every three source topology points may be a triangular region. The interpolation serves to avoid leaving too many holes in the target object expansion map that is subsequently fed into the image processing model; without interpolation, the image processing model might find it hard to learn the key aspects of the image during processing, so interpolation improves the model's learning accuracy during image processing. The interpolated values exist only to assist the learning of the image processing model, however; in the subsequent step of "performing three-dimensional restoration processing on the virtual object expansion map to obtain the three-dimensional virtual object model", the interpolated values may be ignored and only the color values of the topology points used.
Then, generate the target object expansion map according to the color values of the points in each expansion map region together with the color value of each source topology point. The number of points obtained by interpolation in each expansion map region is not limited and can be adjusted according to how well the image processing model processes the map.
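One common way to fill the triangular regions is barycentric interpolation of the three corner colors; the text does not mandate a particular scheme, so the following is only one possible sketch:

```python
# Sketch: rasterize one triangular expansion map region by barycentric
# interpolation of its three source topology points' colors (one possible
# interpolation scheme; others would serve the same hole-filling purpose).
import numpy as np

def fill_triangle(img, pts, colors):
    (x0, y0), (x1, y1), (x2, y2) = pts
    xmin, xmax = min(x0, x1, x2), max(x0, x1, x2)
    ymin, ymax = min(y0, y1, y2), max(y0, y1, y2)
    denom = (y1 - y2) * (x0 - x2) + (x2 - x1) * (y0 - y2)
    for y in range(ymin, ymax + 1):
        for x in range(xmin, xmax + 1):
            w0 = ((y1 - y2) * (x - x2) + (x2 - x1) * (y - y2)) / denom
            w1 = ((y2 - y0) * (x - x2) + (x0 - x2) * (y - y2)) / denom
            w2 = 1.0 - w0 - w1
            if w0 >= 0 and w1 >= 0 and w2 >= 0:      # pixel lies inside the triangle
                img[y, x] = w0 * colors[0] + w1 * colors[1] + w2 * colors[2]

img = np.zeros((64, 64, 3))
fill_triangle(img, [(5, 5), (50, 10), (20, 55)], np.eye(3))  # red/green/blue corners
```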
S230: Perform prediction conversion processing on the target object expansion map to obtain a virtual object expansion map of the target object expansion map.
In the embodiments of the present application, the virtual object expansion map is a two-dimensional image. An image processing model can be called to perform the prediction conversion processing on the target object expansion map to obtain its virtual object expansion map. The image processing model may include downsampling convolutional layers and transposed convolutional layers.
For example, see fig. 4, which is a model diagram of an image processing model according to an embodiment of the present application. As shown in fig. 4, the downsampling convolutional layers and the transposed convolutional layers form a "U-shaped" network structure, and this "U-shaped" network structure is the image processing model. The image processing model receives the two-dimensional target object expansion map as input and outputs the two-dimensional virtual object expansion map of the target object expansion map.
In a specific implementation, performing prediction conversion processing on the target object expansion map to obtain its virtual object expansion map may include: extracting image features of the target object expansion map based on the downsampling convolutional layers; and processing the extracted image features based on the transposed convolutional layers to output the virtual object expansion map of the target object expansion map.
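A minimal sketch of such a "U-shaped" structure is given below in PyTorch. The depth, channel counts, and skip connection are assumptions for illustration; the text does not fix the exact architecture:

```python
# Minimal "U-shaped" encoder-decoder sketch: downsampling convolutions
# extract features, transposed convolutions restore resolution.
# (Depth, channels, and the skip connection are illustrative assumptions.)
import torch
import torch.nn as nn

class UShapedConverter(nn.Module):
    def __init__(self):
        super().__init__()
        self.down1 = nn.Sequential(nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU())
        self.down2 = nn.Sequential(nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU())
        self.up1 = nn.Sequential(nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU())
        self.up2 = nn.ConvTranspose2d(128, 3, 4, stride=2, padding=1)  # 128 = 64 upsampled + 64 skip

    def forward(self, target_expansion_map):
        d1 = self.down1(target_expansion_map)   # extract image features (downsampling conv layers)
        d2 = self.down2(d1)
        u1 = self.up1(d2)                       # transposed conv restores spatial resolution
        u1 = torch.cat([u1, d1], dim=1)         # skip connection across the "U"
        return self.up2(u1)                     # predicted virtual object expansion map

x = torch.randn(1, 3, 256, 256)                 # a target object expansion map (assumed size)
virtual_map = UShapedConverter()(x)             # same spatial size as the input
```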
S240: Perform three-dimensional restoration processing on the virtual object expansion map to obtain a three-dimensional virtual object model.
In the embodiments of the present application, the three-dimensional virtual object model may be an anthropomorphic three-dimensional cartoon face model, generally a three-dimensional model with a small number of model vertices, from which the anthropomorphic three-dimensional cartoon face can be reconstructed.
In addition, the virtual object expansion map includes a conversion topology point set containing a plurality of conversion topology points, and the three-dimensional virtual object model includes a conversion model vertex set containing a plurality of conversion model vertices; each conversion topology point is located at a fixed position, and one conversion topology point corresponds to one conversion model vertex.
In one possible implementation, the detailed flow by which the computer device performs three-dimensional restoration processing on the virtual object expansion map to obtain the three-dimensional virtual object model may include the following.
(1) Obtain the topology point weighted value between the color value of a target conversion topology point in the conversion topology point set on the virtual object expansion map and the color value of a target conversion reference topology point, where the target conversion reference topology point is the conversion reference topology point, in the conversion reference topology point set included in a virtual object mean expansion map, that matches the target conversion topology point. The target conversion topology point is any conversion topology point in the conversion topology point set; the description is given for one such point, and the processing of the other conversion topology points in the present application follows the processing of the target conversion topology point, which is not repeated here. The virtual object mean expansion map is preset and may be obtained by averaging a large number of three-dimensional virtual object models into a mean model (which may be called the three-dimensional virtual mean model) and then performing model expansion processing on the three-dimensional virtual mean model; of course, the virtual object mean expansion map can also be obtained directly. It should be noted that the detailed process of "performing model expansion processing on the three-dimensional virtual mean model to obtain the virtual object mean expansion map" may refer to "performing model expansion processing on the three-dimensional target object model to obtain the target object expansion map" in step S220 of the present application, which is not repeated here.
The position of the target conversion reference topology point in the virtual object mean expansion map is the same as the position of the target conversion topology point in the virtual object expansion map; only their values differ. For example, if the target conversion reference topology point is in row 3, column 4 of the virtual object mean expansion map, the target conversion topology point is also in row 3, column 4 of the virtual object expansion map. The topology point weighted value can be computed from the color value of the target conversion topology point on the virtual object expansion map and the color value of the target conversion reference topology point. For example, if the color value of the target conversion topology point on the virtual object expansion map is (R1, G1, B1) and the color value of the target conversion reference topology point is (R0, G0, B0), the topology point weighted value between them may be (R1/k + R0, G1/k + G0, B1/k + B0), where k is the scale adjustment multiple mentioned above.
(2) Use the topology point weighted value as the position data of the target conversion model vertex corresponding to the target conversion topology point, so as to generate the three-dimensional virtual object model according to the position data of the target conversion model vertex.
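The following sketch applies the topology point weighted value formula above to recover conversion model vertex positions. Whether the 128 translation from the encoding step must be undone first depends on which encoding variant was used; that adjustment, like the value of k, is an assumption here:

```python
# Sketch of the three-dimensional restoration for topology points:
# weighted value = (R1/k + R0, G1/k + G0, B1/k + B0), used as vertex positions.
# (Subtracting 128 first is an assumption tied to the encoding variant that
# applies the translation; k must match the encoding step.)
import numpy as np

def restore_vertices(topo_colors, reference_values, k=5.0, translated=True):
    c = topo_colors.astype(np.float64)
    if translated:
        c = c - 128.0                       # undo the translation, if it was applied
    return c / k + reference_values         # topology point weighted values = position data

topo_colors = np.random.randint(0, 256, size=(20000, 3))   # colors on the virtual object expansion map
reference = np.random.randn(20000, 3)                       # target conversion reference values (placeholder)
conversion_vertices = restore_vertices(topo_colors, reference)
```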
In one possible implementation, performing three-dimensional restoration processing on the virtual object expansion map to obtain the three-dimensional virtual object model may further include the following.
(1) Obtain the common point weighted value between the color value of a target point in the virtual object expansion map outside the conversion topology point set and the color value of a target conversion reference point. The virtual object expansion map includes a plurality of conversion topology points and a plurality of common points, and the target point is any one of the common points. The common points may be points between two conversion topology points: for example, if conversion topology point 1 and conversion topology point 2 are connected by a fixed topological straight line in the virtual object expansion map, a common point may be the midpoint of that line, or the point one third of the way along it, the point one quarter of the way along it, and so on. Similarly, in addition to the conversion reference topology points, the virtual object mean expansion map also includes a plurality of conversion reference points, and the target conversion reference point is any one of them. The conversion reference points correspond one-to-one with the common points: if a common point is the midpoint of the line connecting conversion topology point 1 and conversion topology point 2, the corresponding conversion reference point is the midpoint of the line connecting conversion reference topology point 1 and conversion reference topology point 2.
(2) Generate the three-dimensional virtual object model according to the position data of the target conversion model vertices and the weighted value of each common point.
In the embodiments of the present application, introducing common point weighted values lets the image processing model learn from the conversion topology points while learning auxiliarily from the common points when processing the virtual object expansion map, which can improve the model's learning accuracy during image processing.
S250: Generate a three-dimensional avatar corresponding to the target object in the image to be processed according to the three-dimensional virtual object model.
In the embodiments of the present application, after the three-dimensional virtual object model is obtained, a rendering engine can be called to render the three-dimensional virtual object model into the corresponding three-dimensional avatar.
Using the image processing method provided by the embodiments of the present application, the generated three-dimensional avatar can be applied in all kinds of scenarios suited to avatars, such as news broadcasting, weather forecasts, virtual assistants, intelligent customer service, game commentary, game scenes, and product launch events, so that a three-dimensional avatar can be constructed as the user's face, meeting users' demand for greater personalization. Taking the virtual assistant scenario as an example, a user can send a photo of themselves to the computer device, which can generate a three-dimensional cartoon face similar to the face in the photo and use it as the virtual assistant, increasing the sense of familiarity the assistant gives the user. Taking a game scenario as an example, a three-dimensional cartoon face similar to the user can be generated from the user's photo, so that every player in the game can have a unique, recognizable three-dimensional avatar, meeting users' personalized needs. Of course, the method provided by the embodiments of the present application is not limited to the above application scenarios and can also be used in other possible scenarios, which the embodiments of the present application do not limit.
In the embodiments of the present application, an image to be processed is first acquired, and a three-dimensional target object model is generated for a target object in the image to be processed; model expansion processing is then performed on the three-dimensional target object model to obtain a target object expansion map of the three-dimensional target object model, the target object expansion map being a two-dimensional image; prediction conversion processing is performed on the target object expansion map to obtain a virtual object expansion map of the target object expansion map, the virtual object expansion map also being a two-dimensional image; three-dimensional restoration processing is then performed on the virtual object expansion map to obtain a three-dimensional virtual object model; finally, a three-dimensional avatar corresponding to the target object in the image to be processed is generated according to the three-dimensional virtual object model. Compared with producing a three-dimensional avatar manually, the process can be executed automatically by a computer device once an image containing a face has been captured, which effectively improves the efficiency of producing three-dimensional avatars. In addition, the two-dimensional target object expansion map obtained by model expansion is converted by prediction into a corresponding two-dimensional virtual object expansion map, and the three-dimensional virtual object model is then determined by performing three-dimensional restoration processing on that map; the image processing is thereby brought down from three-dimensional space to two-dimensional space, which reduces the difficulty and complexity of the processing and improves its accuracy.
Based on the above analysis, please refer to fig. 5; fig. 5 is a schematic flowchart of a model training method according to an embodiment of the present application. The embodiment of fig. 5 may be a specific implementation of step S230 in the embodiment of fig. 2. The model training method can be applied to a computer device; the computer device may be, for example, a vehicle-mounted device, a smartphone, a tablet computer, or a smart wearable device, or it may be a server. As shown in fig. 5, the model training method may include steps S510 to S530. Wherein:
S510: Obtain an initial sample set.
In this embodiment, the initial sample set may include a plurality of sample pairs, each formed by a manually produced three-dimensional object training model and a manually produced three-dimensional virtual labeling model that serves as training supervision data. The three-dimensional object training model may be a three-dimensional target object model generated from a manually collected user image. The user image may include a face image; the face image may be a two-dimensional face photograph, and the face in it may have any pose and any expression, as long as the face is clearly visible so that reliable facial features can be detected. The user images may be collected manually by real-time shooting or obtained from a gallery or photo album, which is not specifically limited in the present application. In addition, the three-dimensional virtual labeling model can be produced manually on the basis of the three-dimensional object training model using three-dimensional animation rendering and production software. Each sample pair consists of one three-dimensional object training model and its corresponding three-dimensional virtual labeling model.
For example, see fig. 6, which is a schematic image of an initial sample set according to an embodiment of the present application. As shown in fig. 6, the initial sample set may include 4 sample pairs: for example, sample pair 1 may consist of three-dimensional object training model 1 and three-dimensional virtual labeling model 1; sample pair 2 may consist of three-dimensional object training model 2 and three-dimensional virtual labeling model 2; and so on. Each three-dimensional object training model may be generated from a collected user image, which may be a face image captured by a mobile phone or camera. As shown in fig. 6, three-dimensional object training model 1 may be generated by collecting user image 1 and extracting features of the face image it contains; three-dimensional object training model 2 may be generated by collecting user image 2 and extracting features of the face image it contains; three-dimensional object training models 3 and 4 may be generated similarly. Of course, the number of user images in the initial sample set and the schematic diagrams here are only examples; in practical applications, more user images may be collected as needed to generate the corresponding object training models. In addition, the user images used to generate the three-dimensional object training models in the initial sample set may be collected with an equal gender ratio, for example half male face images and half female face images; of course, they need not be collected with an equal gender ratio, which is not specifically limited in the present application. It should be noted that, since collecting a large number of data samples (that is, sample pairs) takes considerable time and labor, the initial sample set in the embodiment of the present application contains a small number of data samples (that is, sample pairs).
S520: preprocess the initial sample set to determine a sample data set.
In a possible implementation manner, in this embodiment of the present application, the preprocessing performed on the initial sample set may include the following steps:
(1) Perform a first averaging operation on the plurality of three-dimensional object training models in the initial sample set, and determine a three-dimensional object mean model based on the first averaging operation. The three-dimensional object training models in the initial sample set share an identical topological structure, i.e., the number of vertices and the adjacency relations of each vertex are the same. The first averaging operation averages the model vertices at the same position across the three-dimensional object training models. For example, suppose the initial sample set includes 5 three-dimensional object training models, denoted Q1, Q2, Q3, Q4, Q5. Then the position data of the vertices at position 1 — m1 (x1, y1, z1) in Q1, m2 (x2, y2, z2) in Q2, m3 (x3, y3, z3) in Q3, m4 (x4, y4, z4) in Q4, and m5 (x5, y5, z5) in Q5 — may be averaged to obtain the vertex m0 of the three-dimensional object mean model at position 1: ((x1+x2+x3+x4+x5)/5, (y1+y2+y3+y4+y5)/5, (z1+z2+z3+z4+z5)/5). A code sketch of both averaging operations follows this list.
(2) Perform a second averaging operation on the plurality of three-dimensional virtual labeling models in the initial sample set, and determine a three-dimensional virtual mean model based on the second averaging operation. Similarly, the three-dimensional virtual labeling models in the initial sample set also share the same topological structure, and the second averaging operation averages the model vertices at the same position across the three-dimensional virtual labeling models. For example, suppose the initial sample set includes 5 three-dimensional virtual labeling models, denoted P1, P2, P3, P4, P5. Then the position data of the vertices at position 1 — n1 (x1', y1', z1') in P1, n2 (x2', y2', z2') in P2, n3 (x3', y3', z3') in P3, n4 (x4', y4', z4') in P4, and n5 (x5', y5', z5') in P5 — may be averaged to obtain the vertex n0 (X, Y, Z) of the three-dimensional virtual mean model at position 1, where:
X = (x1' + x2' + x3' + x4' + x5')/5;
Y = (y1' + y2' + y3' + y4' + y5')/5;
Z = (z1' + z2' + z3' + z4' + z5')/5.
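As a concrete illustration, the following is a minimal numpy sketch of the two averaging operations, assuming each model is stored as a (num_vertices, 3) array of vertex coordinates in a shared topology; the array shapes, sizes, and variable names are illustrative assumptions, not the patent's exact data layout.

```python
import numpy as np

def mean_model(models):
    """Vertex-wise average of models sharing one topology.

    models: list of (num_vertices, 3) arrays; row i of every array is the
    same topological vertex, so averaging along axis 0 averages the model
    vertices at the same position, as in the two averaging operations.
    """
    return np.stack(models, axis=0).mean(axis=0)

# Example with five object training models Q1..Q5 and five labeling models P1..P5.
rng = np.random.default_rng(0)
Q = [rng.normal(size=(1000, 3)) for _ in range(5)]  # three-dimensional object training models
P = [rng.normal(size=(1000, 3)) for _ in range(5)]  # three-dimensional virtual labeling models
object_mean = mean_model(Q)                         # three-dimensional object mean model
virtual_mean = mean_model(P)                        # three-dimensional virtual mean model
```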
Further, after the three-dimensional virtual mean model and the three-dimensional object mean model are obtained, additional preprocessing may be performed on the plurality of three-dimensional virtual labeling models and three-dimensional object training models in the initial sample set. For example, a target three-dimensional object training model in the sample data set may be determined from the difference between the corresponding target three-dimensional object training model in the initial sample set and the three-dimensional object mean model. Likewise, a target three-dimensional virtual labeling model in the sample data set may be determined from the difference between the corresponding target three-dimensional virtual labeling model in the initial sample set and the three-dimensional virtual mean model. That is, the residual (obtained by subtraction) between each three-dimensional object training model in the initial sample set and the three-dimensional object mean model is stored as the three-dimensional object training model in the sample data set, and the residual between each three-dimensional virtual labeling model in the initial sample set and the three-dimensional virtual mean model is stored as the three-dimensional virtual labeling model in the sample data set. Storing residuals between the models in this way reduces the complexity of the operations in the subsequent model training process, thereby improving training efficiency.
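Continuing the sketch above, storing residuals rather than absolute coordinates is a one-line operation per model; the recovery step is shown only to make the relationship explicit.

```python
# Store each model as its residual from the corresponding mean model.
object_residuals = [q - object_mean for q in Q]    # sample-set object training models
virtual_residuals = [p - virtual_mean for p in P]  # sample-set virtual labeling models

# A model is recovered exactly by adding the mean back.
assert np.allclose(object_residuals[0] + object_mean, Q[0])
```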
In one possible implementation, the preprocessing performed on the initial sample set may further include:
(1) performing first weighting operation on a first model in the three-dimensional object training model included in the initial sample set and a second model in the three-dimensional object training model included in the initial sample set, and obtaining an expanded three-dimensional object training model through the first weighting operation; in the first weighting operation, the first model uses a first weighting coefficient, and the second model uses a second weighting coefficient;
(2) performing second weighting operation on a three-dimensional virtual labeling model corresponding to a first model in the three-dimensional virtual labeling models included in the initial sample set and a three-dimensional virtual labeling model corresponding to a second model in the three-dimensional virtual labeling models included in the initial sample set, and obtaining an expanded three-dimensional virtual labeling model through the second weighting operation; in the second weighting operation, the three-dimensional virtual labeling model corresponding to the first model uses a first weighting coefficient, and the three-dimensional virtual labeling model corresponding to the second model uses a second weighting coefficient;
Then, the sample pair formed by the expanded three-dimensional object training model and the expanded three-dimensional virtual labeling model is stored in the sample data set as an expanded sample pair.
For example, assume that the first model of the three-dimensional object training models is q1 and the second model is q2, and that the three-dimensional virtual labeling model corresponding to the first model is p1 and the one corresponding to the second model is p2. The expanded three-dimensional object training model q3 may then be q3 = 0.9q1 + 0.1q2, and the expanded three-dimensional virtual labeling model p3 may be p3 = 0.9p1 + 0.1p2; q3 and p3 may be stored in the sample data set as an expanded sample pair. In this embodiment, the three-dimensional object training models and three-dimensional virtual labeling models in the sample pairs are randomly combined within a suitable parameter range, thereby expanding the sample pairs and enriching the sample data set.
The first and second weights above are only examples. In this embodiment, when samples are expanded from the existing three-dimensional object training models and three-dimensional virtual labeling models of the initial sample set, the weights form a vector whose length equals the number of base models. For example, with five base sample pairs, the weight vector used in the recombination may be [0.3, 0.1, 0.1, 0.2, 0.3]. Assume the first to fifth models of the three-dimensional object training models are q1, q2, q3, q4, q5, and the corresponding three-dimensional virtual labeling models are p1, p2, p3, p4, p5. The expanded three-dimensional object training model q6 may then be q6 = 0.3q1 + 0.1q2 + 0.1q3 + 0.2q4 + 0.3q5, and the expanded three-dimensional virtual labeling model p6 may be p6 = 0.3p1 + 0.1p2 + 0.1p3 + 0.2p4 + 0.3p5; q6 and p6 may be stored in the sample data set as an expanded sample pair. In an embodiment, the weight vectors may also be recombined or drawn at random: after the recombination with [0.3, 0.1, 0.1, 0.2, 0.3], another random weight vector yields a further expanded sample pair. Research has found that when the entries of the weight vector lie within [0.1, 0.8], the expanded sample pairs obtained are better suited to optimizing the neural network mentioned below, yielding a better image processing model.
In this way, starting from a small number of sample pairs, samples are randomly combined within a suitable parameter range to obtain more diverse expanded sample pairs, enriching the sample data in the sample data set.
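A sketch of the expansion step under the same assumptions as before: one random weight vector, with entries drawn from the suitable range [0.1, 0.8] and normalized to sum to 1 (as in the example [0.3, 0.1, 0.1, 0.2, 0.3]), is applied to the object training models and, identically, to their corresponding virtual labeling models. The normalization is an illustrative assumption.

```python
import numpy as np

def expand_pair(object_models, label_models, rng):
    """Build one expanded sample pair as the same convex combination of both sides."""
    k = len(object_models)
    w = rng.uniform(0.1, 0.8, size=k)   # entries within the suitable range [0.1, 0.8]
    w /= w.sum()                        # normalize so the weights sum to 1 (assumption)
    q_new = sum(wi * q for wi, q in zip(w, object_models))
    p_new = sum(wi * p for wi, p in zip(w, label_models))
    return q_new, p_new

# Using the base models Q and P from the earlier sketch:
rng = np.random.default_rng(1)
q6, p6 = expand_pair(Q, P, rng)         # stored in the sample data set as a new pair
```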
S530: optimize the neural network using the sample data set to obtain an image processing model.
In this embodiment, the neural network may be an adversarial network, such as a GAN (Generative Adversarial Network); the GAN may adopt a pix2pix network architecture or a pix2pixHD network architecture (conditional adversarial network architectures). For example, fig. 7a is a schematic structural diagram of a generative adversarial network provided in this embodiment. In a particular implementation, the neural network may include a generation network and a discrimination network. Referring to fig. 7b, which is a schematic structural diagram of the generation network according to an embodiment of the present disclosure: the generation network is a U-shaped network composed of downsampling convolutional layers and transposed convolutional layers. Its input is the object training expansion map of the three-dimensional object training model in a sample pair of the sample data set, and its output is the virtual object training expansion map obtained by the generation network's prediction. Referring to fig. 7c, which is a schematic structural diagram of the discrimination network according to an embodiment of the present disclosure: the discrimination network is composed of downsampling convolutional layers, and its inputs are the virtual object labeling expansion map of the three-dimensional virtual labeling model in a sample pair of the sample data set and the virtual object training expansion map predicted by the generation network (the illustration in fig. 7c is only an example of the inputs defined for the discrimination network). Parameters of the generation network and the discrimination network are then optimized and updated based on the results computed by the discrimination network. In fig. 7c, there are some differences between the virtual object training expansion map and the virtual object labeling expansion map: if the generation network predicts an accurate virtual object training expansion map, the difference between the two is very small; if the prediction deviates, the difference is very large.
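As an illustrative PyTorch sketch of this layout (not the patent's exact configuration), the following builds a small U-shaped generation network from downsampling convolutions and transposed convolutions, and a discrimination network from downsampling convolutions that scores a condition map together with a candidate expansion map; all channel counts, depths, and layer choices are assumptions.

```python
import torch
import torch.nn as nn

def down(c_in, c_out):  # downsampling convolutional block (halves resolution)
    return nn.Sequential(nn.Conv2d(c_in, c_out, 4, stride=2, padding=1),
                         nn.BatchNorm2d(c_out), nn.LeakyReLU(0.2))

def up(c_in, c_out):    # transposed-convolution block (doubles resolution)
    return nn.Sequential(nn.ConvTranspose2d(c_in, c_out, 4, stride=2, padding=1),
                         nn.BatchNorm2d(c_out), nn.ReLU())

class Generator(nn.Module):
    """U-shaped network: encoder of downsampling convs, decoder of transposed convs."""
    def __init__(self):
        super().__init__()
        self.d1, self.d2, self.d3 = down(3, 64), down(64, 128), down(128, 256)
        self.u1, self.u2 = up(256, 128), up(256, 64)  # decoder inputs doubled by skips
        self.out = nn.Sequential(nn.ConvTranspose2d(128, 3, 4, 2, 1), nn.Tanh())

    def forward(self, x):                      # x: object training expansion map
        e1 = self.d1(x)
        e2 = self.d2(e1)
        e3 = self.d3(e2)
        h = self.u1(e3)
        h = self.u2(torch.cat([h, e2], dim=1))      # skip connection across the "U"
        return self.out(torch.cat([h, e1], dim=1))  # predicted virtual expansion map

class Discriminator(nn.Module):
    """Stack of downsampling convs scoring (condition map, candidate map) pairs."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(down(6, 64), down(64, 128),
                                 nn.Conv2d(128, 1, 4, padding=1), nn.Sigmoid())

    def forward(self, s, y):                   # s: object map; y: labeling or predicted map
        return self.net(torch.cat([s, y], dim=1))
```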
Wherein the loss function of the generation network is:

Loss_G = log(1 - D(s, G(s))) + ||G(s) - I_gt||_1

and the loss function of the discrimination network is:

Loss_D = log D(s, g) + log(1 - D(s, G(s)))

where (s, g) is a true sample pair: s is the input expansion map of real three-dimensional face data, i.e., the object training expansion map of the three-dimensional object training model in a sample pair of the sample data set in this application, and g is the ground truth corresponding to the predicted value, i.e., the virtual object labeling expansion map of the three-dimensional virtual labeling model in that sample pair (written I_gt in the L1 term). G(s) is the virtual object training expansion map predicted by the generation network, (s, G(s)) is a false sample pair obtained through the generation network, and D(s, g) and D(s, G(s)) are the results output by the discrimination network for the true sample pair and the false sample pair, respectively.
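Under the same assumptions as the sketch above, the two losses translate directly into PyTorch; `eps` is an illustrative guard against log(0), and `g` plays the role of I_gt in the L1 term.

```python
import torch

def generator_loss(D, s, fake, g, eps=1e-8):
    """Loss_G = log(1 - D(s, G(s))) + ||G(s) - I_gt||_1 (minimized by the generator)."""
    adv = torch.log(1.0 - D(s, fake) + eps).mean()
    l1 = torch.abs(fake - g).mean()
    return adv + l1

def discriminator_loss(D, s, fake, g, eps=1e-8):
    """Loss_D = log D(s, g) + log(1 - D(s, G(s))) (maximized by the discriminator)."""
    real_term = torch.log(D(s, g) + eps).mean()
    fake_term = torch.log(1.0 - D(s, fake.detach()) + eps).mean()  # no generator grads
    return real_term + fake_term
```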
Thus, following the above description, 8000 pairs of training data (i.e., sample pairs) can be used to optimize the neural network during training, and the resulting adversarial losses (including the loss of the generation network and the loss of the discrimination network) can be used to constrain the training of the neural network. In one possible implementation, the generation network of the trained neural network model is used as the image processing model mentioned in this application. In addition, during training, the three-dimensional object mean model, the three-dimensional virtual mean model, and the like may be stored with the neural network model, so that they can be obtained directly when image processing is subsequently performed using the generation network (i.e., the image processing model) of the trained model.
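An illustrative alternating update using the sketches above; the optimizer settings and the data loader over the expansion-map pairs are placeholder assumptions, not the patent's training schedule.

```python
G, D = Generator(), Discriminator()
g_opt = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
d_opt = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))

for s, g in loader:  # placeholder loader over (object map, labeling map) pairs
    # Discrimination network step: maximize Loss_D, i.e. minimize its negative.
    d_loss = -discriminator_loss(D, s, G(s), g)
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Generation network step: minimize Loss_G.
    g_loss = generator_loss(D, s, G(s), g)
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```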
For example, with the image processing model trained in this embodiment, the generated three-dimensional avatar can be applied, by calling the trained image processing model, to various application scenarios suitable for avatars, such as news broadcasts, weather forecasts, virtual assistants, intelligent customer service, game commentary, game scenes, and product launches, so as to construct a three-dimensional avatar matching the user's face and meet more personalized user requirements. Of course, decoration, expression driving, and the like may also be performed based on the three-dimensional avatar. As shown in fig. 8, which is a rendering schematic diagram of a three-dimensional avatar provided in an embodiment of the present application, a three-dimensional avatar rendering map and an expression-driven rendering map may further be determined based on the three-dimensional avatar generated by the image processing method provided in this application and the user map.
It should be noted that the user maps, three-dimensional object training models, three-dimensional virtual labeling models, object training expansion maps, virtual object labeling expansion maps, and final cartoon images in figs. 3a, 3b, 3c, 6, 7a, 7b, 7c, and 8 are only schematic examples. In practice, user maps 1 to 4 in figs. 3a and 6 would be color images obtained by shooting, picture loading, and the like; each expansion map would be a color image composed of a large number of RGB values, where different colors represent different position points on the corresponding three-dimensional model; and the resulting three-dimensional virtual object model may likewise be a colorful cartoon image rather than the simple black-and-white illustration of fig. 8.
In this embodiment, to address the drawback that generating current three-dimensional virtual object models requires substantial manpower or large database resources, a GAN-based method is provided for learning the style deformation relationship from a real three-dimensional target object model to a three-dimensional virtual object model. The method can be realized with only a small number of samples, saving cost and improving the efficiency of model training. The whole scheme is highly feasible, low in cost, and achieves a good three-dimensional style-transfer effect. In addition, deformation into other three-dimensional model styles, such as human faces to animal faces, can be achieved simply by replacing the sample data set.
Referring to fig. 9, fig. 9 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present disclosure. The image processing apparatus 900 may be applied to the computer device in the method embodiments corresponding to fig. 2 or fig. 5. The image processing apparatus 900 may be a computer program (comprising program code) running in a lightweight node, for example application software, and may be configured to perform the corresponding steps in the methods provided in the embodiments of the present application. The image processing apparatus 900 may include:
an obtaining unit 901, configured to obtain an image to be processed, and generate a three-dimensional target object model for a target object in the image to be processed;
the processing unit 902 is configured to perform model expansion processing on the three-dimensional target object model to obtain a target object expansion map of the three-dimensional target object model, where the target object expansion map is a two-dimensional image;
the processing unit 902 is configured to perform prediction conversion processing on the target object expansion map to obtain a virtual object expansion map of the target object expansion map, where the virtual object expansion map is a two-dimensional image;
the processing unit 902 is configured to perform three-dimensional reduction processing on the expanded view of the virtual object to obtain a three-dimensional virtual object model;
a generating unit 903, configured to generate a three-dimensional avatar corresponding to the target object in the image to be processed according to the three-dimensional virtual object model.
In a possible implementation manner, the three-dimensional target object model comprises a source model vertex set, the source model vertex set comprises a plurality of source model vertices, the target object expansion graph comprises a source topology point set, the source topology point set comprises a plurality of source topology points, each source topology point is located at a fixed position, and one source topology point corresponds to one source model vertex;
the processing unit 902 performs model expansion processing on the three-dimensional target object model to obtain a target object expansion diagram of the three-dimensional target object model, including:
acquiring difference values between position data of a target source model vertex in a source model vertex set on a three-dimensional target object model and position data of a target source reference vertex, wherein the target source reference vertex is a source reference vertex matched with a target source model vertex in a source reference vertex set included in a three-dimensional object mean value model;
determining a color value of a target source topological point corresponding to the vertex of the target source model according to the difference value;
and generating a target object expansion diagram according to the color value of the target source topological point.
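One way to realize this mapping is sketched below, under the assumption that each vertex's offset from the matching mean-model vertex is linearly scaled into the 0–255 color range; the scaling scheme, bound, and function names are illustrative, as this passage does not fix them.

```python
import numpy as np

def offsets_to_colors(model_vertices, mean_vertices, max_offset=0.5):
    """Map each vertex's difference from the mean-model vertex to an RGB value.

    model_vertices, mean_vertices: (num_vertices, 3) arrays in matched order,
    so row i is a target source model vertex and its matching source reference
    vertex. max_offset bounds the expected |difference| per axis (assumption).
    """
    diff = model_vertices - mean_vertices             # the difference values
    colors = 127.5 + (diff / max_offset) * 127.5      # linear map into [0, 255]
    return np.clip(colors, 0, 255).astype(np.uint8)   # one color per topology point
```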
In one possible implementation, each source model vertex in the set of source model vertices is allowed to obtain a color value of the corresponding source topology point;
the processing unit 902 generates a target object expansion diagram according to the color value of the target source topology point, including:
carrying out interpolation processing on points in an expansion map area enclosed by the N source topological points to obtain color values of the points in the expansion map area; n is an integer greater than 3;
and generating a target object expansion diagram according to the color values of the points in each expansion diagram area and the color value of each source topological point.
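A sketch of the interpolation step, assuming the source topology points carry known 2D positions on the expansion map and that scipy's griddata performs per-channel linear interpolation over the enclosed regions; the function and parameter names are illustrative.

```python
import numpy as np
from scipy.interpolate import griddata

def fill_expansion_map(points_uv, point_colors, height, width):
    """points_uv: (n, 2) pixel coordinates; point_colors: (n, 3) RGB values."""
    ys, xs = np.mgrid[0:height, 0:width]
    grid = np.stack([xs.ravel(), ys.ravel()], axis=1)   # every pixel position
    channels = [griddata(points_uv, point_colors[:, c].astype(float),
                         grid, method="linear")         # interpolate one channel
                for c in range(3)]
    image = np.stack(channels, axis=1).reshape(height, width, 3)
    return image  # NaN outside the region enclosed by the topology points
```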
In a possible implementation manner, the virtual object expansion diagram includes a conversion topology point set, the conversion topology point set includes a plurality of conversion topology points, the three-dimensional virtual object model includes a conversion model vertex set, the conversion model vertex set includes a plurality of conversion model vertices, each conversion topology point is located at a fixed position, and one conversion topology point corresponds to one conversion model vertex;
the processing unit 902 performs three-dimensional reduction processing on the virtual object development graph to obtain a three-dimensional virtual object model, including:
acquiring a topological point weighted value between a color value of a target conversion topological point in a conversion topological point set on a virtual object expansion graph and a color value of a target conversion reference topological point, wherein the target conversion reference topological point is a conversion reference topological point matched with the target conversion topological point in a conversion reference topological point set included in a virtual object mean value expansion graph;
and taking the weighted value of the topological point as the position data of the vertex of the target conversion model corresponding to the target conversion topological point so as to generate the three-dimensional virtual object model according to the position data of the vertex of the target conversion model.
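The restoration step can then be sketched as the inverse of the illustrative encoding above: the weighted value between a topology point's color and its reference color is decoded into a vertex offset and added to the mean-model vertex. The specific linear weighting here is an assumption mirroring the earlier encoding sketch, not the patent's fixed formula.

```python
import numpy as np

def colors_to_positions(colors, reference_colors, mean_vertices, max_offset=0.5):
    """Decode expansion-map colors back into conversion-model vertex positions.

    colors: (num_points, 3) RGB of the target conversion topology points;
    reference_colors: matching colors from the virtual object mean expansion map;
    mean_vertices: (num_points, 3) vertices of the three-dimensional virtual
    mean model. The linear scale mirrors the encoding sketch (assumption).
    """
    weighted = (colors.astype(float) - reference_colors.astype(float)) / 127.5 * max_offset
    return mean_vertices + weighted   # position data of the conversion model vertices
```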
In a possible implementation manner, the processing unit 902 performs three-dimensional reduction processing on the expanded view of the virtual object to obtain a three-dimensional virtual object model, and further includes:
acquiring a common point weighted value between the color value of a target point in the virtual object expansion diagram except the conversion topological point set and the color value of a target conversion reference point;
and generating a three-dimensional virtual object model according to the position data of the vertex of the target conversion model and the weighted value of each common point.
In one possible implementation, the processing unit 902 performs the prediction conversion process on the target object expansion map by calling an image processing model, where the image processing model includes a downsampled convolutional layer and a transposed convolutional layer;
the processing unit 902 performs prediction conversion processing on the target object expansion diagram to obtain a virtual object expansion diagram of the target object expansion diagram, including:
extracting image features of the target object expansion map based on the downsampled convolution layer;
processing the extracted image features based on the transposed convolution layer to output a virtual object development of the target object development.
In one possible implementation, the image processing model is obtained through training optimization on a sample data set, where the sample data set is determined by preprocessing an initial sample set, and the initial sample set includes a plurality of sample pairs, each composed of a produced three-dimensional object training model and a produced three-dimensional virtual labeling model serving as training supervision data.
In one possible implementation, the preprocessing performed by the processing unit 902 on the initial sample set includes:
performing first average operation on a plurality of three-dimensional object training models in the initial sample set, and determining a three-dimensional object mean model based on the first average operation;
performing second average operation on the plurality of three-dimensional virtual labeling models in the initial sample set, and determining a three-dimensional virtual mean value model based on the second average operation;
wherein, the training model of the target three-dimensional object in the sample data set comprises: determining the difference value between a target three-dimensional object training model in the initial sample set and the three-dimensional object mean value model; the target three-dimensional virtual labeling model in the sample data set comprises: and determining according to the difference value between the target three-dimensional virtual labeling model in the initial sample set and the three-dimensional virtual mean value model.
In a possible implementation manner, the preprocessing performed by the processing unit 902 on the initial sample set further includes:
performing first weighting operation on a first model in the three-dimensional object training model included in the initial sample set and a second model in the three-dimensional object training model included in the initial sample set, and obtaining an expanded three-dimensional object training model through the first weighting operation; in the first weighting operation, the first model uses a first weighting coefficient, and the second model uses a second weighting coefficient;
performing second weighting operation on a three-dimensional virtual labeling model corresponding to a first model in the three-dimensional virtual labeling models included in the initial sample set and a three-dimensional virtual labeling model corresponding to a second model in the three-dimensional virtual labeling models included in the initial sample set, and obtaining an expanded three-dimensional virtual labeling model through the second weighting operation; in the second weighting operation, the three-dimensional virtual labeling model corresponding to the first model uses a first weighting coefficient, and the three-dimensional virtual labeling model corresponding to the second model uses a second weighting coefficient;
the sample data set comprises a sample pair formed by the expanded three-dimensional object training model and the expanded three-dimensional virtual labeling model.
In one possible implementation, the image processing model is obtained by optimizing a neural network, and the neural network comprises a generation network and a discrimination network;
the input of the generation network is the object training expansion map of the three-dimensional object training model in a sample pair included in the sample data set, and the output of the generation network is the virtual object training expansion map obtained by the generation network's prediction;
the inputs of the discrimination network are the virtual object labeling expansion map of the three-dimensional virtual labeling model in a sample pair included in the sample data set and the virtual object training expansion map obtained by the generation network's prediction; parameters of the generation network and the discrimination network are optimized and updated based on the results computed by the discrimination network.
In one possible implementation, the loss function of the generation network is:

Loss_G = log(1 - D(s, G(s))) + ||G(s) - I_gt||_1

and the loss function of the discrimination network is:

Loss_D = log D(s, g) + log(1 - D(s, G(s)))

wherein (s, g) is a true sample pair, s is the object training expansion map of the three-dimensional object training model in a sample pair included in the sample data set, g is the virtual object labeling expansion map of the three-dimensional virtual labeling model in that sample pair, G(s) is the virtual object training expansion map obtained by the generation network's prediction, (s, G(s)) is a false sample pair obtained through the generation network, and D(s, g) and D(s, G(s)) are the results output by the discrimination network for the true sample pair and the false sample pair, respectively.
In the embodiment of the present application, an image to be processed is first obtained, and a three-dimensional target object model is generated for the target object in the image to be processed. Model expansion processing is then performed on the three-dimensional target object model to obtain its target object expansion map, which is a two-dimensional image; prediction conversion processing is performed on the target object expansion map to obtain a virtual object expansion map, also a two-dimensional image; three-dimensional reduction processing is then performed on the virtual object expansion map to obtain a three-dimensional virtual object model; and finally, a three-dimensional avatar corresponding to the target object in the image to be processed is generated from the three-dimensional virtual object model. Compared with manually producing a three-dimensional avatar, this process can be executed automatically by a computer device once an image containing a face is captured, effectively improving the efficiency of producing three-dimensional avatars. In addition, because the two-dimensional target object expansion map obtained by unfolding the model is predicted into a corresponding two-dimensional virtual object expansion map, and the three-dimensional virtual object model is then determined by three-dimensional reduction processing of that two-dimensional map, the image processing is reduced from three-dimensional space to two-dimensional space, lowering the difficulty and complexity of the processing and improving its accuracy.
Referring to fig. 10, fig. 10 is a schematic structural diagram of a computer device according to an embodiment of the present application. The computer device 1000 is configured to execute the steps executed by the computer device in the method embodiments corresponding to fig. 2 or fig. 5, and includes: one or more processors 1010, one or more input devices 1020, one or more output devices 1030, and a memory 1040. The processor 1010, the input device 1020, the output device 1030, and the memory 1040 are connected by a bus 1050. The memory 1040 is used to store a computer program comprising program instructions, and the processor 1010 is used to execute the program instructions stored in the memory 1040 to perform the following operations:
acquiring an image to be processed, and generating a three-dimensional target object model for a target object in the image to be processed;
carrying out model expansion processing on the three-dimensional target object model to obtain a target object expansion image of the three-dimensional target object model, wherein the target object expansion image is a two-dimensional image;
performing prediction conversion processing on the target object development image to obtain a virtual object development image of the target object development image, wherein the virtual object development image is a two-dimensional image;
carrying out three-dimensional reduction processing on the virtual object development image to obtain a three-dimensional virtual object model;
and generating a three-dimensional virtual image corresponding to the target object in the image to be processed according to the three-dimensional virtual object model.
In a possible implementation manner, the three-dimensional target object model comprises a source model vertex set, the source model vertex set comprises a plurality of source model vertices, the target object expansion graph comprises a source topology point set, the source topology point set comprises a plurality of source topology points, each source topology point is located at a fixed position, and one source topology point corresponds to one source model vertex;
the processor 1010 performs model expansion processing on the three-dimensional target object model to obtain a target object expansion diagram of the three-dimensional target object model, and the method includes:
acquiring difference values between position data of a target source model vertex in a source model vertex set on a three-dimensional target object model and position data of a target source reference vertex, wherein the target source reference vertex is a source reference vertex matched with a target source model vertex in a source reference vertex set included in a three-dimensional object mean value model;
determining a color value of a target source topological point corresponding to the vertex of the target source model according to the difference value;
and generating a target object expansion diagram according to the color value of the target source topological point.
In one possible implementation, each source model vertex in the set of source model vertices is allowed to obtain a color value of the corresponding source topology point;
the processor 1010 generates a target object expansion diagram according to the color value of the target source topology point, including:
carrying out interpolation processing on points in an expanded graph area enclosed by the N source topological points to obtain color values of the points in the expanded graph area; n is an integer greater than 3;
and generating a target object expansion diagram according to the color values of the points in each expansion diagram area and the color value of each source topological point.
In a possible implementation manner, the virtual object expansion diagram includes a conversion topology point set, the conversion topology point set includes a plurality of conversion topology points, the three-dimensional virtual object model includes a conversion model vertex set, the conversion model vertex set includes a plurality of conversion model vertices, each conversion topology point is located at a fixed position, and one conversion topology point corresponds to one conversion model vertex;
the processor 1010 performs three-dimensional reduction processing on the virtual object expansion diagram to obtain a three-dimensional virtual object model, including:
acquiring a topological point weighted value between a color value of a target conversion topological point in a conversion topological point set on a virtual object expansion graph and a color value of a target conversion reference topological point, wherein the target conversion reference topological point is a conversion reference topological point matched with the target conversion topological point in the conversion reference topological point set included in the virtual object mean expansion graph;
and taking the weighted value of the topological point as the position data of the vertex of the target conversion model corresponding to the target conversion topological point so as to generate the three-dimensional virtual object model according to the position data of the vertex of the target conversion model.
In a possible implementation manner, the processor 1010 performs three-dimensional reduction processing on the expanded view of the virtual object to obtain a three-dimensional virtual object model, and further includes:
acquiring a common point weighted value between the color value of a target point in the virtual object expansion diagram except the conversion topological point set and the color value of a target conversion reference point;
and generating a three-dimensional virtual object model according to the position data of the vertex of the target conversion model and the weighted value of each common point.
In one possible implementation, the processor 1010 performs the prediction conversion process on the target object unfolded graph by calling an image processing model, where the image processing model includes a downsampled convolutional layer and a transposed convolutional layer;
the processor 1010 performs the prediction conversion processing on the target object expansion diagram to obtain a virtual object expansion diagram of the target object expansion diagram, and includes:
extracting image features of the target object expansion map based on the downsampled convolution layer;
processing the extracted image features based on the transposed convolution layer to output a virtual object expansion map of the target object expansion map.
In one possible implementation, the image processing model is obtained through training optimization on a sample data set, where the sample data set is determined by preprocessing an initial sample set, and the initial sample set includes a plurality of sample pairs, each composed of a produced three-dimensional object training model and a produced three-dimensional virtual labeling model serving as training supervision data.
In one possible implementation, the preprocessing performed by the processor 1010 on the initial sample set includes:
performing first average operation on a plurality of three-dimensional object training models in the initial sample set, and determining a three-dimensional object mean model based on the first average operation;
performing second average operation on the plurality of three-dimensional virtual labeling models in the initial sample set, and determining a three-dimensional virtual mean value model based on the second average operation;
wherein, the training model of the target three-dimensional object in the sample data set comprises: determining the difference value between a target three-dimensional object training model in the initial sample set and the three-dimensional object mean value model; the target three-dimensional virtual labeling model in the sample data set comprises: and determining according to the difference value between the target three-dimensional virtual labeling model in the initial sample set and the three-dimensional virtual mean value model.
In one possible implementation, the preprocessing performed by the processor 1010 on the initial sample set further includes:
performing first weighting operation on a first model in the three-dimensional object training model included in the initial sample set and a second model in the three-dimensional object training model included in the initial sample set, and obtaining an expanded three-dimensional object training model through the first weighting operation; in the first weighting operation, the first model uses a first weighting coefficient, and the second model uses a second weighting coefficient;
performing second weighting operation on a three-dimensional virtual labeling model corresponding to a first model in the three-dimensional virtual labeling models included in the initial sample set and a three-dimensional virtual labeling model corresponding to a second model in the three-dimensional virtual labeling models included in the initial sample set, and obtaining an expanded three-dimensional virtual labeling model through the second weighting operation; in the second weighting operation, the three-dimensional virtual labeling model corresponding to the first model uses a first weighting coefficient, and the three-dimensional virtual labeling model corresponding to the second model uses a second weighting coefficient;
the sample data set comprises a sample pair formed by the expanded three-dimensional object training model and the expanded three-dimensional virtual labeling model.
In one possible implementation, the image processing model is obtained by optimizing a neural network, and the neural network comprises a generation network and a discrimination network;
the input of the generation network is the object training expansion map of the three-dimensional object training model in a sample pair included in the sample data set, and the output of the generation network is the virtual object training expansion map obtained by the generation network's prediction;
the inputs of the discrimination network are the virtual object labeling expansion map of the three-dimensional virtual labeling model in a sample pair included in the sample data set and the virtual object training expansion map obtained by the generation network's prediction; parameters of the generation network and the discrimination network are optimized and updated based on the results computed by the discrimination network.
In one possible implementation, the loss function of the generation network is:

Loss_G = log(1 - D(s, G(s))) + ||G(s) - I_gt||_1

and the loss function of the discrimination network is:

Loss_D = log D(s, g) + log(1 - D(s, G(s)))

wherein (s, g) is a true sample pair, s is the object training expansion map of the three-dimensional object training model in a sample pair included in the sample data set, g is the virtual object labeling expansion map of the three-dimensional virtual labeling model in that sample pair, G(s) is the virtual object training expansion map obtained by the generation network's prediction, (s, G(s)) is a false sample pair obtained through the generation network, and D(s, g) and D(s, G(s)) are the results output by the discrimination network for the true sample pair and the false sample pair, respectively.
It should be understood that the computer device described in this embodiment may perform the description of the image processing method in the embodiment corresponding to fig. 2 or fig. 5, and may also perform the description of the image processing apparatus 900 in the embodiment corresponding to fig. 9, which is not described herein again. In addition, the beneficial effects of the same method are not described in detail.
Furthermore, it should be noted that an embodiment of the present application further provides a computer storage medium, in which the aforementioned computer program executed by the image processing apparatus 900 is stored. The computer program includes program instructions, and when a processor executes the program instructions, the method in the embodiments corresponding to fig. 2 or fig. 5 can be performed; details are therefore not repeated here. For technical details not disclosed in the computer storage medium embodiments of the present application, refer to the description of the method embodiments of the present application. As an example, the program instructions may be deployed to be executed on one computer device, or on multiple computer devices located at one site, or on multiple computer devices distributed across multiple sites and interconnected by a communication network, where the multiple computer devices distributed across the multiple sites and interconnected by the communication network may form a blockchain system.
According to an aspect of the application, a computer program product or computer program is provided, comprising computer instructions, the computer instructions being stored in a computer readable storage medium. The processor of the computer device reads the computer instruction from the computer-readable storage medium, and executes the computer instruction, so that the computer device can execute the method in the embodiment corresponding to fig. 2 or fig. 5, and therefore, the detailed description thereof will not be repeated here.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above disclosure describes only preferred embodiments of the present application and is not intended to limit the scope of the present application; the present application is therefore not limited thereto, and all equivalent variations and modifications remain within its scope.

Claims (14)

1. An image processing method, characterized in that the method comprises:
acquiring an image to be processed, and generating a three-dimensional target object model for a target object in the image to be processed;
performing model expansion processing on the three-dimensional target object model to obtain a target object expansion diagram of the three-dimensional target object model, wherein the target object expansion diagram is a two-dimensional image;
performing prediction conversion processing on the target object development image to obtain a virtual object development image of the target object development image, wherein the virtual object development image is a two-dimensional image;
carrying out three-dimensional reduction processing on the virtual object development image to obtain a three-dimensional virtual object model;
and generating a three-dimensional virtual image corresponding to the target object in the image to be processed according to the three-dimensional virtual object model.
2. The method of claim 1, wherein the three-dimensional target object model comprises a set of source model vertices, the set of source model vertices comprising a plurality of source model vertices, the target object expansion graph comprises a set of source topology points, the set of source topology points comprising a plurality of source topology points, each source topology point located at a fixed location, one source topology point corresponding to one source model vertex;
the model expansion processing is carried out on the three-dimensional target object model to obtain a target object expansion diagram of the three-dimensional target object model, and the method comprises the following steps:
obtaining difference values between position data of a target source model vertex in the source model vertex set on the three-dimensional target object model and position data of a target source reference vertex, wherein the target source reference vertex is a source reference vertex matched with the target source model vertex in a source reference vertex set included in a three-dimensional object mean value model;
determining a color value of a target source topological point corresponding to the vertex of the target source model according to the difference value;
and generating a target object expansion diagram according to the color value of the target source topological point.
3. The method of claim 2, wherein each source model vertex in the set of source model vertices is allowed to obtain a color value for the corresponding source topology point;
generating a target object expansion diagram according to the color value of the target source topological point, wherein the generating of the target object expansion diagram comprises the following steps:
carrying out interpolation processing on points in an expanded graph area enclosed by the N source topological points to obtain color values of the points in the expanded graph area; n is an integer greater than 3;
and generating a target object expansion diagram according to the color values of the points in each expansion diagram area and the color value of each source topological point.
4. The method of claim 1, wherein the virtual object deployment graph comprises a set of conversion topology points, the set of conversion topology points comprising a plurality of conversion topology points, the three-dimensional virtual object model comprises a set of conversion model vertices, the set of conversion model vertices comprising a plurality of conversion model vertices, each conversion topology point located at a fixed position, one conversion topology point corresponding to one conversion model vertex;
the three-dimensional reduction processing is carried out on the virtual object development graph to obtain a three-dimensional virtual object model, and the three-dimensional virtual object model comprises the following steps:
acquiring a topological point weighted value between a color value of a target conversion topological point in the conversion topological point set on the virtual object expansion diagram and a color value of a target conversion reference topological point, wherein the target conversion reference topological point is a conversion reference topological point matched with the target conversion topological point in a conversion reference topological point set included in a virtual object mean value expansion diagram;
and taking the weighted value of the topological point as the position data of the vertex of the target conversion model corresponding to the target conversion topological point, so as to generate a three-dimensional virtual object model according to the position data of the vertex of the target conversion model.
5. The method of claim 4, wherein the three-dimensional reduction processing of the virtual object expansion map to obtain a three-dimensional virtual object model further comprises:
acquiring a common point weighted value between a color value of a target point in the virtual object expanded graph except the conversion topological point set and a color value of a target conversion reference point, wherein the target conversion reference point is a conversion reference point matched with the target point in the virtual object mean expanded graph;
and generating a three-dimensional virtual object model according to the position data of the vertex of the target conversion model and the weighted value of each common point.
6. The method of claim 1, wherein the predictive conversion processing of the target object unwind map is performed by calling an image processing model, the image processing model comprising a downsampled convolutional layer and a transposed convolutional layer;
the performing prediction conversion processing on the target object expansion diagram to obtain a virtual object expansion diagram of the target object expansion diagram includes:
extracting image features of the target object expansion map based on the downsampled convolution layer;
processing the extracted image features based on the transposed convolution layer to output a virtual object expansion map of the target object expansion map.
7. The method of claim 1 or 6, wherein the image processing model is derived by training optimization through a sample data set determined after preprocessing an initial sample set, the initial sample set comprising: and the plurality of sample pairs are composed of the three-dimensional object training model obtained by making and the three-dimensional virtual labeling model which is obtained by making and is used as training supervision data.
8. The method of claim 7, wherein the pre-processing of the initial sample set comprises:
performing a first averaging operation on a plurality of three-dimensional object training models in the initial sample set, and determining a three-dimensional object mean model based on the first averaging operation;
and performing second average operation on the plurality of three-dimensional virtual labeling models in the initial sample set, and determining a three-dimensional virtual mean value model based on the second average operation.
9. The method of claim 8, wherein the pre-processing of the initial sample set further comprises:
performing a first weighting operation on a first model in the three-dimensional object training model included in the initial sample set and a second model in the three-dimensional object training model included in the initial sample set, and obtaining an expanded three-dimensional object training model through the first weighting operation; in the first weighting operation, the first model uses a first weighting coefficient, and the second model uses a second weighting coefficient;
performing a second weighting operation on a three-dimensional virtual labeling model corresponding to a first model in the three-dimensional virtual labeling models included in the initial sample set and a three-dimensional virtual labeling model corresponding to a second model in the three-dimensional virtual labeling models included in the initial sample set, and obtaining an expanded three-dimensional virtual labeling model through the second weighting operation; in the second weighting operation, the three-dimensional virtual labeling model corresponding to the first model uses a first weighting coefficient, and the three-dimensional virtual labeling model corresponding to the second model uses a second weighting coefficient;
the sample data set comprises a sample pair formed by the expanded three-dimensional object training model and the expanded three-dimensional virtual labeling model.
10. The method of claim 1 or 6,
the image processing model is obtained by optimizing a neural network, and the neural network comprises a generation network and a discrimination network;
the input of the generation network is the object training expansion map of the three-dimensional object training model in a sample pair included in the sample data set, and the output of the generation network is the virtual object training expansion map obtained by the generation network's prediction;
the inputs of the discrimination network are the virtual object labeling expansion map of the three-dimensional virtual labeling model in a sample pair included in the sample data set and the virtual object training expansion map obtained by the generation network's prediction;
the generation network and the discrimination network are updated based on the calculation result of the discrimination network.
11. The method of claim 10,
the loss function of the generation network is:

Loss_G = log(1 - D(s, G(s))) + ||G(s) - I_gt||_1

the loss function of the discrimination network is:

Loss_D = log D(s, g) + log(1 - D(s, G(s)))

wherein (s, g) is a true sample pair, s is the object training expansion map of the three-dimensional object training model in a sample pair included in the sample data set, g is the virtual object labeling expansion map of the three-dimensional virtual labeling model in the sample pair included in the sample data set, G(s) is the virtual object training expansion map obtained by the generation network's prediction, (s, G(s)) is a false sample pair obtained through the generation network, and D(s, g) and D(s, G(s)) are the results obtained by the discrimination network for the true sample pair and the false sample pair, respectively.
12. An image processing apparatus, characterized in that the apparatus comprises:
the device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring an image to be processed and generating a three-dimensional target object model for a target object in the image to be processed;
the processing unit is used for performing model expansion processing on the three-dimensional target object model to obtain a target object expansion image of the three-dimensional target object model, and the target object expansion image is a two-dimensional image;
the processing unit is further configured to perform predictive conversion processing on the target object expansion map to obtain a virtual object expansion map of the target object expansion map, where the virtual object expansion map is a two-dimensional image;
the processing unit is further configured to perform three-dimensional reduction processing on the virtual object development diagram to obtain a three-dimensional virtual object model;
and the generating unit is used for generating a three-dimensional virtual image corresponding to the target object in the image to be processed according to the three-dimensional virtual object model.
13. A computer device, comprising:
a processor adapted to execute a computer program;
a computer-readable storage medium, in which a computer program is stored which, when executed by the processor, implements the image processing method according to any one of claims 1 to 11.
14. A computer-readable storage medium, characterized in that it stores a computer program adapted to be loaded by a processor and to perform the image processing method according to any one of claims 1 to 11.
Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination