CN110796593A - Image processing method, device, medium and electronic equipment based on artificial intelligence - Google Patents
Image processing method, device, medium and electronic equipment based on artificial intelligence
- Publication number
- CN110796593A (application CN201910980281.4A)
- Authority
- CN
- China
- Prior art keywords
- image
- face
- region
- sample
- face region
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/04—Context-preserving transformations, e.g. by using an importance map
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
Landscapes
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Image Analysis (AREA)
- Processing Or Creating Images (AREA)
Abstract
The present disclosure provides an image processing method, apparatus, medium, and electronic device based on artificial intelligence, and relates to the field of artificial intelligence. The method comprises the following steps: acquiring an original face image, and performing feature extraction on the original face image to obtain a face region image; inputting the face region image into a generation network model, performing feature extraction on the face region image through the generation network model, and generating a first image based on the extracted features, wherein the features of the target object in the first image are different from the features of the target object in the original face image; and aligning and superimposing the first image and the original face image to obtain a second image. The method and device can process received face images in real time while keeping the resolution of the generated image the same as that of the original face image, thereby improving the quality of the generated image and the user experience.
Description
Technical Field
The present disclosure relates to the field of artificial intelligence technology, and in particular, to an image processing method based on artificial intelligence, an image processing apparatus based on artificial intelligence, a computer storage medium, and an electronic device.
Background
With the revolution of computer technology and the rapid development of artificial intelligence, artificial intelligence technology is now widely applied in electronic products such as smartphones, portrait processing software, and robot vacuums.
At present, people like to use portrait processing software such as beauty cameras when taking pictures, hoping to modify the captured portrait, for example by adjusting and optimizing the eyes, nose, mouth, or hair. However, existing portrait processing software struggles to simulate a vivid three-dimensional effect in real time as the face angle changes, offers few special effects, and generates images at a lower resolution than the original, all of which degrade the user experience.
In view of the above, there is a need in the art to develop a new image processing method based on artificial intelligence.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The embodiments of the present disclosure provide an image processing method based on artificial intelligence, an image processing apparatus based on artificial intelligence, a computer storage medium, and an electronic device, so that a three-dimensional effect can be simulated on a face image in real time, at least to some extent, while the resolution of every part of the generated image is maintained, thereby improving the user experience.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to an aspect of the embodiments of the present disclosure, there is provided an artificial intelligence based image processing method, including: acquiring an original face image, and extracting the characteristics of the original face image to acquire a face region image; inputting the face region image into a generating network model, extracting the features of the face region image through the generating network model, and generating a first image based on the extracted features; wherein the features of the target object in the first image are different from the features of the target object in the original face image; and aligning and superposing the first image and the original face image to acquire a second image.
According to an aspect of the embodiments of the present disclosure, there is provided an artificial intelligence based image processing apparatus, including: an image acquisition module configured to acquire an original face image and perform feature extraction on the original face image to obtain a face region image; an image processing module configured to input the face region image into a generation network model, perform feature extraction on the face region image through the generation network model, and generate a first image based on the extracted features, wherein the features of the target object in the first image are different from the features of the target object in the original face image; and an image superposition module configured to align and superimpose the first image and the original face image to obtain a second image.
In some embodiments of the present disclosure, based on the foregoing, the image acquisition module is configured to: detect a face region in the original face image and perform feature registration on the face region to determine face feature points; determine the feature region corresponding to the face feature points according to the position information of the face feature points; and expand the feature region by a preset multiple and determine the face region image according to the expanded feature region.
In some embodiments of the present disclosure, based on the foregoing solution, the image processing module includes a channel image generation unit configured to: perform feature extraction on the face region image to obtain a contour feature and a facial feature, obtain an RGB channel image and an α channel image based on the contour feature and the facial feature, and use the RGB channel image and the α channel image as the first image; or perform feature extraction on the face region image to obtain a contour feature and a facial feature, obtain an RGB channel image based on the contour feature and the facial feature, and use the RGB channel image as the first image.
In some embodiments of the present disclosure, based on the foregoing solution, the image superposition module includes: an image alignment unit configured to align the first image and the original face image according to the face feature points; and an image synthesis unit configured to superimpose and synthesize the aligned first image and the original face image to obtain the second image.
In some embodiments of the present disclosure, the first image comprises an RGB channel image and an α channel image, and based on the foregoing, the image synthesis unit is configured to use the α channel image as a weight and perform a weighted summation of the RGB channel image and the original face image to obtain the second image.
In some embodiments of the present disclosure, based on the foregoing solution, the image processing apparatus further includes: a sample acquisition module configured to acquire a face region image sample and a synthetic image sample corresponding to the face region image sample, the synthetic image sample being obtained by transforming the target object in the face region image sample; and a model training module configured to train a face image model containing a generation network to be trained and a discrimination network to be trained according to the face region image sample and the synthetic image sample, and to use the trained generation network as the generation network model.
In some embodiments of the present disclosure, based on the foregoing, the sample acquisition module comprises: a feature point acquisition unit configured to acquire a face image sample, and to detect and register the face image sample to obtain the face feature points corresponding to the face image sample; a face region image sample acquisition unit configured to determine target feature points from the face feature points and extract face region image samples corresponding to different poses according to the target feature points; a region position map acquisition unit configured to perform feature extraction on the face region image sample through a region prediction module to obtain a region position map corresponding to the target object; and a synthetic image sample acquisition unit configured to acquire a two-dimensional texture image corresponding to the target object and superimpose the face region image sample, the region position map, and the two-dimensional texture image to obtain the synthetic image sample corresponding to the face region image sample.
In some embodiments of the present disclosure, based on the foregoing scheme, the face region image sample acquisition unit is configured to: determine a target feature region according to the target feature points; and, keeping the position of the target feature region fixed, acquire face images corresponding to different poses and use these face images as the face region image samples.
In some embodiments of the present disclosure, based on the foregoing scheme, the region position map acquisition unit is configured to: perform image segmentation on the face region image sample to obtain the target feature points; determine a region corresponding to the target object according to the target feature points and a preset distance; and obtain the region position map according to the region corresponding to the target object.
In some embodiments of the present disclosure, based on the foregoing, the model training module includes: a to-be-detected image generation unit configured to input the face region image sample into the face image model, perform feature extraction on the face region image sample through the generation network to be trained, and generate a to-be-detected image containing the transformed target object based on the extracted features; a realism acquisition unit configured to input the to-be-detected image and the synthetic image sample into the discrimination network to be trained, and perform feature extraction and feature comparison on the to-be-detected image and the synthetic image sample through the discrimination network to be trained so as to obtain the realism of the to-be-detected image; and a parameter adjustment unit configured to adjust the parameters of the face image model according to the realism so as to make the face image model converge.
In some embodiments of the present disclosure, based on the foregoing scheme, the to-be-detected image generation unit is configured to perform feature extraction on the face region image sample, generate an RGB channel image sample and an α channel image sample based on the extracted features, and superimpose the α channel image sample and the RGB channel image sample to obtain the to-be-detected image.
In some embodiments of the present disclosure, based on the foregoing, the image acquisition module is configured to: take a received single-frame face image as the original face image; or extract the face information in each image frame contained in a received video to obtain the original face image.
According to an aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, implementing the artificial intelligence based image processing method according to the embodiments described above.
According to an aspect of an embodiment of the present disclosure, there is provided an electronic device including one or more processors; a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to perform the artificial intelligence based image processing method as described in the above embodiments.
In the technical solutions provided by some embodiments of the present disclosure, a face region image is first obtained by performing feature extraction on an original face image; the face region image is then input into a generation network model, which performs feature extraction on it and generates a first image based on the extracted features, the features of the target object in the first image being different from the features of the target object in the original face image; finally, the first image and the original face image are aligned and superimposed to obtain a second image. The technical solution of the present disclosure can process received face images in real time, transform the target object in the face image, and keep the resolution of the second image the same as that of the original face image, thereby improving image quality and the user experience.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty. In the drawings:
fig. 1 shows a schematic diagram of an exemplary system architecture to which technical aspects of embodiments of the present disclosure may be applied;
FIGS. 2A-2B schematically illustrate interface diagrams of generating long hair by non-real-time single-frame picture processing in the related art;
FIGS. 3A-3C schematically illustrate interface diagrams of generating two-dimensional hair in real time in the related art;
FIGS. 4A-4C schematically illustrate interface diagrams of generating two-dimensional hair in real time in the related art;
FIGS. 5A-5C schematically illustrate interface diagrams of generating three-dimensional hair in real time in the related art;
FIG. 6 schematically shows a flow diagram of an artificial intelligence based image processing method according to an embodiment of the present disclosure;
FIG. 7 schematically shows a flowchart for acquiring a face region image according to an embodiment of the present disclosure;
FIG. 8 schematically illustrates a flow diagram for acquiring face region image samples and composite image samples according to one embodiment of the present disclosure;
FIGS. 9A-9F schematically illustrate interface diagrams of long-hair synthetic image samples in different poses according to one embodiment of the present disclosure;
FIG. 10 schematically illustrates a flow diagram for training a face image model according to one embodiment of the present disclosure;
FIG. 11 schematically illustrates a flow diagram for generating a face image containing long hair according to one embodiment of the present disclosure;
FIG. 12 schematically illustrates a block diagram of an artificial intelligence based image processing apparatus according to an embodiment of the present disclosure;
FIG. 13 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the disclosure.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
Fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solutions of the embodiments of the present disclosure may be applied.
As shown in fig. 1, system architecture 100 may include terminal device 101, network 102, and server 103. Network 102 is the medium used to provide communication links between terminal devices 101 and server 103. Network 102 may include various connection types, such as wired communication links, wireless communication links, and so forth.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired. For example, the server 103 may be a server cluster composed of a plurality of servers. The terminal device 101 may be a photographing apparatus with an imaging unit such as a video camera, a still camera, a smart phone, etc., and an original face image containing a target object may be acquired through the terminal device 101.
In an embodiment of the present disclosure, after acquiring an original face image containing a target object, the terminal device 101 may send the original face image to the server 103 through the network 102. After acquiring the original face image, the server 103 may perform face detection and registration on it to obtain face feature points; then calculate and crop pose-aligned face region images according to the face feature points; then perform feature extraction on the face region image through the trained generation network model and generate a first image based on the extracted features, wherein the first image is generated by transforming a target object in the face region image, for example changing short hair into long hair or small eyes into large eyes; and finally align and superimpose the first image containing the transformed target object with the original face image to obtain a second image, which is a face image with the transformed target object at the original resolution. Further, in the embodiment of the present disclosure, it is also possible, after acquiring the original face image, to change the face makeup in it, such as changing male makeup to female makeup or female makeup to male makeup, and then perform the above operations on the makeup-changed face image to obtain a second image in which both the face makeup and the target object are changed while the original resolution is maintained. The technical solution of the embodiments of the present disclosure can simulate a three-dimensional effect on the target object in the received original face image in real time, with the three-dimensional effect changing correspondingly as the face angle changes; at the same time, images with various shapes and makeup can be obtained at the original resolution, further improving the user experience.
It should be noted that the artificial intelligence based image processing method provided by the embodiment of the present disclosure is generally executed by a server, and accordingly, an artificial intelligence based image processing apparatus is generally disposed in the server. However, in other embodiments of the present disclosure, the artificial intelligence based image processing method provided by the embodiments of the present disclosure may also be executed by a terminal device.
In the related art, taking real-time hair generation for a face image as an example, there are the following three approaches: (1) generating long hair by non-real-time single-frame picture processing; (2) generating two-dimensional hair in real time; and (3) generating three-dimensional hair in real time. Specifically:
Non-real-time single-frame picture processing can generate long hair with a realistic three-dimensional effect. FIGS. 2A-2B show interface diagrams of generating long hair by non-real-time single-frame picture processing: in the original face image shown in FIG. 2A the hair is short; after processing, the short hair is changed into long hair with a three-dimensional effect, as shown in FIG. 2B. However, this method can only process a single frame, cannot run in real time, and has low simulation efficiency.
The real-time two-dimensional hair generation approach achieves real-time processing but can only generate two-dimensional hair, not three-dimensional hair, and the side-face effect is poor. FIGS. 3A-3C and FIGS. 4A-4C show interface diagrams of generating two-dimensional hair in real time. FIGS. 3A and 4A show the long-hair effect generated when the face turns right: the long hair does not fit the contour of the face, so the effect is poor. FIGS. 3B and 4B show the long-hair effect generated for a front face, where the hair fits the contour of the face more closely. FIGS. 3C and 4C show the long-hair effect generated when the face turns left, which, like the right-turn case, does not fit the contour of the face, so the effect is again poor.
The real-time three-dimensional hair generation approach can produce a three-dimensional effect in real time and can also realize facial gender conversion, but the generated image has only a single hair color. FIGS. 5A-5C show interface diagrams of generating three-dimensional hair in real time: FIG. 5A shows the long-hair effect generated when the face turns right, FIG. 5B the effect for a front face, and FIG. 5C the effect when the face turns left. The generated long hair fits the contour of the face and has a three-dimensional effect, but the hair color is single, limited to the hair color in the original face image, so various choices cannot be provided to the user.
In addition, the three methods share a common disadvantage: the corresponding application programs are large, usually requiring tens or even hundreds of megabytes of storage, so terminal devices with little storage space can hardly install, load, and run them effectively, and even when installed, the processing efficiency is low.
In view of the problems in the related art, the embodiments of the present disclosure provide an image processing method based on artificial intelligence, implemented on the basis of machine learning, one branch of artificial intelligence. Artificial Intelligence (AI) is the theory, method, technology, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the capabilities of perception, reasoning, and decision making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence infrastructure includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly includes computer vision, speech processing, natural language processing, and machine learning/deep learning.
Computer Vision (CV) is the science of studying how to make machines "see": using cameras and computers instead of human eyes to identify, track, and measure targets, and further performing image processing so that the processed image is more suitable for human observation or for transmission to an instrument for detection. As a scientific discipline, computer vision studies related theories and techniques in an attempt to build artificial intelligence systems that can capture information from images or multidimensional data. Computer vision technologies generally include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, and simultaneous localization and mapping, and also include common biometric technologies such as face recognition and fingerprint recognition.
Machine Learning (ML) is a multi-disciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It specializes in studying how computers can simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
With the research and progress of artificial intelligence technology, it has been studied and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, autonomous driving, drones, robots, smart healthcare, and smart customer service.
The solution provided by the embodiments of the present disclosure relates to artificial intelligence image processing technology and is explained in detail through the following embodiments:
The embodiments of the present disclosure first provide an image processing method based on artificial intelligence; the implementation details of the technical solution are explained below, taking real-time generation of long hair as an example:
FIG. 6 schematically shows a flow diagram of an artificial intelligence based image processing method according to one embodiment of the present disclosure; the method may be performed by a server, such as the server 103 shown in FIG. 1. Referring to FIG. 6, the artificial intelligence based image processing method includes at least steps S610 to S630, described in detail as follows:
In step S610, an original face image is obtained, and feature extraction is performed on the original face image to obtain a face region image.
In one embodiment of the present disclosure, the original face image may be obtained through the terminal device 101. It may be formed by the terminal device 101 shooting a face containing a target object, with an imaging unit therein imaging the captured image signal; it may be a face image containing the target object downloaded from the network through the terminal device 101; or, of course, it may be a face image containing the target object stored locally by the terminal device 101. The target object may be any of the facial features, or an object such as the hair or the face shape, which is not specifically limited in the embodiments of the present disclosure. For a clear understanding of the technical solution, hair is taken as the target object in the following description. After the original face image sent by the terminal device 101 is obtained, feature extraction may be performed on it to obtain a face region image.
FIG. 7 is a schematic diagram illustrating the flow of acquiring a face region image; as shown in FIG. 7, the flow includes at least steps S701 to S703, specifically:
In step S701, a face region in the original face image is detected, and feature registration is performed on the face region to determine face feature points.
In one embodiment of the present disclosure, the original face image may be input into a trained face detection model, which includes a detection sub-model and a registration sub-model: the detection sub-model is used to detect the face region in the original face image, and the registration sub-model is used to perform feature registration on the face region to determine the face feature points. The face feature points may specifically include the center points of the left and right eyes, the left and right mouth corners, and the nose tip.
In step S702, a feature region corresponding to the face feature point is determined from the position information of the face feature point.
In one embodiment of the present disclosure, after the face feature points are determined, their position information can be obtained, such as the positions of the center points of the left and right eyes, the positions of the left and right mouth corners, and the position of the nose tip. The feature region corresponding to a face feature point can then be determined from its position information. For example, with the center point (x, y) of the left eye as reference, (x-10mm, x+10mm) can be taken as the positions of the two ends of the left eye and (y-5mm, y+5mm) as the positions of its upper and lower vertices, and the feature region of the left eye can then be determined from these positions. It should be noted that eye sizes differ between people, so values other than 10mm and 5mm may be used when determining the positions of the ends and vertices. Similarly, the feature region of the mouth and the feature region of the nose can be determined from the positions of the left and right mouth corners and of the nose tip.
In step S703, the feature region is enlarged by a preset multiple, and a face region image is determined according to the enlarged feature region.
In an embodiment of the present disclosure, after the feature region corresponding to the face feature points is obtained, it may be expanded by a preset multiple to obtain the face region image. For example, the feature region may be expanded three times, four times, and so on, as long as the expanded region covers the entire face and a partial area outside the face contour.
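As an illustration of steps S702-S703, the following minimal Python sketch derives a feature region from a landmark and expands it into a face region crop. The coordinates, the half-width/half-height offsets, and the helper names are hypothetical; the landmarks are assumed to be pixel coordinates coming from any face registration model.

```python
def feature_region(center, half_w, half_h):
    """Axis-aligned feature region around a landmark, e.g. the box
    (x-dw, x+dw) x (y-dh, y+dh) around the left-eye center (step S702)."""
    x, y = center
    return (x - half_w, y - half_h, x + half_w, y + half_h)

def expand_region(box, factor, img_w, img_h):
    """Scale a (x0, y0, x1, y1) box about its center by `factor` (the preset
    multiple), clipped to the image bounds so the crop stays valid (step S703)."""
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    hw, hh = (x1 - x0) / 2 * factor, (y1 - y0) / 2 * factor
    return (max(0, int(cx - hw)), max(0, int(cy - hh)),
            min(img_w, int(cx + hw)), min(img_h, int(cy + hh)))

# Hypothetical usage: a landmark from the registration model, 3x expansion.
left_eye = (120, 150)                              # center point of the left eye
eye_box = feature_region(left_eye, 30, 15)         # step S702
face_box = expand_region(eye_box, 3.0, 512, 512)   # step S703
# image[face_box[1]:face_box[3], face_box[0]:face_box[2]] would then be the face region image
```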
In step S620, inputting the face region image into a generation network model, performing feature extraction on the face region image by the generation network model, and generating a first image based on the extracted features; wherein the features of the target object in the first image are different from the features of the target object in the original face image.
In an embodiment of the present disclosure, after the face region image is obtained, it may be input into a trained machine learning model for feature extraction, and a first image is generated based on the extracted features. The machine learning model may be the generation network of a generative adversarial network model; after the generative adversarial network model is trained, the trained generation network alone can serve as the model that finally performs face image processing, which reduces the model size and improves image processing efficiency. Feature extraction is performed on the face region image through the trained generation network model, and a first image with long hair is generated based on the extracted features. It should be noted that the hair in the original face image or the face region image may be long, short, or absent, and the hair in the first image differs from it. For example, if the face region image is of a short-haired male, the generation network model can convert the short hair into long hair; likewise, a female's long or short hair can be converted into short hair or longer hair. The generated hair may have various shapes, such as curly hair, straight hair, bangs, or an ancient style with a hairpiece; any shape is possible.
In an embodiment of the present disclosure, when feature extraction is performed on the face region image through the generation network model, the model may specifically extract a contour feature and a facial feature from the face region image and obtain, based on them, an RGB channel image and an α channel image corresponding to the face region image; the RGB channel image and the α channel image together form the first image.
It should also be noted that, in the embodiment of the present disclosure, the generation network model may instead extract a contour feature and a facial feature from the input face region image and obtain only an RGB channel image corresponding to the face region image based on them, in which case the RGB channel image alone is the first image.
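By way of illustration, below is a minimal sketch of a generation network with RGB and α output heads, assuming a small encoder-decoder; the layer sizes and the 4-channel output (3 RGB channels plus 1 α channel squashed by a sigmoid) are illustrative assumptions, not the actual architecture of the disclosure.

```python
import torch
import torch.nn as nn

class GenerationNetwork(nn.Module):
    """Encoder-decoder mapping a face region image to RGB + alpha channels."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(          # extracts contour/facial features
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(          # generates the first image
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 4, 4, stride=2, padding=1),  # 3 RGB + 1 alpha
        )

    def forward(self, x):
        out = self.decoder(self.encoder(x))
        rgb = torch.sigmoid(out[:, :3])        # RGB channel image in [0, 1]
        alpha = torch.sigmoid(out[:, 3:4])     # alpha channel image used as blend weight
        return rgb, alpha
```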
In an embodiment of the present disclosure, in order to obtain a stable generation network model, a face image model needs to be trained. The face image model is a generative adversarial network model containing a generation network to be trained and a discrimination network to be trained, and the trained generation network model is obtained by training the face image model. A Generative Adversarial Network (GAN) is an unsupervised learning method in which two neural networks play a game against each other. It consists of a generation network and a discrimination network: the generation network samples randomly from a latent space as input and must make its output imitate the real samples in the training set as closely as possible; the discrimination network takes as input both real samples and the output of the generation network and tries to distinguish the generated output from the real samples as well as possible, while the generation network tries to fool the discrimination network as much as possible. The two networks compete against each other, continually adjusting their parameters, and ultimately generate images that are indistinguishable from real ones.
In one embodiment of the present disclosure, training samples are first obtained, and the generative adversarial network model is then trained using them. The training samples comprise face region image samples and synthetic image samples corresponding to the face region image samples, where a synthetic image sample is obtained by transforming the target object in a face region image sample; for example, an image generated by transforming the hair in a face region image sample is a synthetic image sample. After the face region image samples and the synthetic image samples are obtained, the face image model can be trained on them, and the trained generation network is used as the generation network model for subsequent image processing.
In an embodiment of the present disclosure, when the training data are unpaired and unaligned, unsupervised GAN training is difficult; the required network is large and cannot meet real-time requirements. Therefore, in order to reduce training difficulty and improve network efficiency, the GAN may be trained by constructing paired training data, which also keeps background pixel values aligned so that the generated result looks more natural after being fused into a video or image. Paired real long-hair and short-hair data, i.e., long-hair and short-hair data of the same person, are very difficult to acquire; such data can generally be collected by having the same person wear a wig, or three-dimensional hair effects can be synthesized with high-precision three-dimensional modeling.
FIG. 8 is a schematic diagram of the process of acquiring face region image samples and synthetic image samples; as shown in FIG. 8, the process includes at least steps S801-S804, specifically:
In step S801, a face image sample is obtained, and the face image sample is detected and registered to obtain the face feature points corresponding to the face image sample.
In an embodiment of the present disclosure, similarly to step S701, the obtained face image sample may be input into a trained face detection model; the detection sub-model detects the face region in the face image sample, and the registration sub-model then performs feature registration on the detected face region to determine the face feature points corresponding to the face image sample.
In step S802, target feature points are determined from the face feature points, and face region image samples corresponding to different poses are extracted according to the target feature points.
In one embodiment of the present disclosure, relatively rigid points may be selected from the determined face feature points as target feature points, and a target feature region may be determined from them; the position of the target feature region is then kept fixed while face region images corresponding to different poses are acquired, and the acquired face region images are used as face region image samples. For example, the center points of the left and right eyes can serve as target feature points, the left and right eye regions can be determined from them, and face region image samples in different poses, such as the face front-on, turned left, turned right, or tilted up, can be extracted while the positions of the eye regions in the picture remain essentially unchanged.
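To illustrate how the eye regions can be kept essentially fixed across poses, the sketch below aligns each frame with a two-point similarity transform; the canonical eye coordinates and output size are made-up values, and OpenCV's estimateAffinePartial2D is one possible estimator, not the method prescribed by the disclosure.

```python
import numpy as np
import cv2

def align_by_eyes(img, left_eye, right_eye,
                  canon_left=(180, 200), canon_right=(332, 200), size=(512, 512)):
    """Warp `img` so the detected eye centers land on fixed canonical positions."""
    src = np.float32([left_eye, right_eye])
    dst = np.float32([canon_left, canon_right])
    # Similarity transform (rotation + uniform scale + translation) from two points.
    m, _ = cv2.estimateAffinePartial2D(src, dst)
    return cv2.warpAffine(img, m, size)
```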
In step S803, feature extraction is performed on the face region image sample by the region prediction module to obtain a region position map corresponding to the target object.
In an embodiment of the present disclosure, after a face region image sample is obtained, feature extraction may be performed on it through a region prediction module to obtain the region position map corresponding to the target object. For example, if the target object is hair, a long-hair region prediction module can extract features from the face region image sample and obtain the long-hair region position map from the contour features in the sample; if the target object is a nose, a nose region prediction module can extract features and obtain the nose region position map from the facial features in the sample, and so on. When the region prediction module extracts features from the face region image to obtain the region position map corresponding to the target object, the procedure is specifically as follows: first, image segmentation is performed on the face region image sample to obtain the target feature points; then the region corresponding to the target object is determined from the target feature points and a preset distance, and the region position map is obtained from that region. Specifically, during segmentation the facial features in the face region image sample can be obtained and relatively rigid part feature points selected from them as target feature points, for example the center points of the left and right eyes; the hair region is then determined from the eye center points and the preset distance, and the hair region position map is determined from the hair region. The preset distance is set according to the positional relationship between the eye centers and the face contour. The hair region comprises a number of small regions; the distance between each small region and the eye center on the same side does not change with the face angle, but as the face angle changes, the display attributes of the small regions change correspondingly. For example, when the face turns right, as the right-turn angle gradually increases, the hair on the outer right side is gradually hidden, presenting a vivid three-dimensional long-hair effect. It should be noted that the region position map is essentially a template of the region where the target object is predicted to lie; an image in which the target object is transformed can be obtained by superimposing a two-dimensional texture on this template.
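A minimal sketch of how such a region position map might be rasterized from the eye centers and a preset distance is given below; the band geometry and all constants are illustrative assumptions, not the disclosure's region prediction module.

```python
import numpy as np

def hair_region_map(shape, left_eye, right_eye, preset_dist=180, inner_dist=90):
    """Binary region position map: pixels whose distance from the nearer eye
    center lies in [inner_dist, preset_dist] are marked as hair region."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    d_left = np.hypot(xs - left_eye[0], ys - left_eye[1])
    d_right = np.hypot(xs - right_eye[0], ys - right_eye[1])
    d = np.minimum(d_left, d_right)   # distance to the same-side eye center
    return ((d >= inner_dist) & (d <= preset_dist)).astype(np.float32)

mask = hair_region_map((512, 512), (180, 200), (332, 200))
```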
In step S804, a two-dimensional texture image corresponding to the target object is acquired, and the face region image sample, the region position map, and the two-dimensional texture image are superimposed to obtain a synthesized image sample corresponding to the face region image sample.
In one embodiment of the present disclosure, a two-dimensional texture image corresponding to the target object, such as a two-dimensional hair texture image or a two-dimensional nose texture image, may be obtained through full-image two-dimensional texture design. After the two-dimensional texture image is acquired, it can be superimposed with the face region image sample and the region position map to obtain the synthetic image sample corresponding to the face region image sample. During superposition, the two-dimensional texture image, the face region image sample, and the region position map need to be aligned according to the target feature points before being superimposed, which ensures the fidelity of the synthetic image sample; moreover, the two-dimensional texture is displayed only inside the region position map, so as to obtain a face image with the target object transformed. Taking real-time generation of long hair as an example, after the long-hair region position map is obtained, a two-dimensional hair texture can be designed, and a texture superposition module aligns and superimposes the long-hair region position map, the two-dimensional hair texture, and the face region image to obtain a synthetic image sample with long hair. Further, an outer-edge transparency can be designed together with the full-image two-dimensional hair texture; this transparency increases the three-dimensionality and fidelity of the hair. Accordingly, the long-hair region position map, the two-dimensional hair texture, the outer-edge transparency, and the face region image can all be input into the texture superposition module and superimposed to obtain a three-dimensional long-hair synthetic image sample. By performing this operation on face region images of all poses, long-hair synthetic image samples with a three-dimensional rotation effect can be obtained. FIGS. 9A-9F show interface diagrams of long-hair synthetic image samples in different poses: each image shows the long-hair effect in a different face pose, the generated long hair exhibits a three-dimensional rotation effect as the head rotates, and under the effect of the outer-edge transparency the hair edges and tips appear more three-dimensional and more vivid.
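The texture superposition step can be sketched as a masked alpha blend, assuming the region position map, a full-image two-dimensional texture, and an outer-edge transparency map are all already aligned to the face crop (all names and value ranges are illustrative):

```python
import numpy as np

def superimpose_texture(face, texture, mask, edge_alpha):
    """Composite a 2D texture onto the face, but only inside the region
    position map; `edge_alpha` softens the outer hair edge for a 3D look."""
    w = (mask * edge_alpha)[..., None]    # per-pixel blend weight in [0, 1]
    return (texture * w + face * (1.0 - w)).astype(face.dtype)

# Hypothetical usage with float32 images in [0, 1]:
# sample = superimpose_texture(face_crop, hair_texture, mask, edge_alpha)
```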
In an embodiment of the present disclosure, when generating the two-dimensional hair texture images, textures with different colors and/or hairstyles may be generated to form multiple sets of paired training data with different colors and/or hairstyles. A generation network model trained on such paired data can then generate hair in multiple colors and/or hairstyles when processing a new face image, further increasing the kinds of special effects the model can generate and improving the user experience.
In an embodiment of the present disclosure, after the face region image samples and the synthetic image samples are obtained, the face image model containing the generation network to be trained and the discrimination network to be trained may be trained on them. FIG. 10 shows a flow diagram of training the face image model. As shown in FIG. 10, a face region image sample is input into the generation network to be trained 1001, which performs feature extraction on it and generates, based on the extracted features, a to-be-detected image containing the transformed target object; for example, if the input face region image sample is a short-haired male, the generated to-be-detected image is a long-haired male. The to-be-detected image and the synthetic image sample corresponding to the face region image sample are then input into the discrimination network to be trained 1002, which performs feature extraction and feature comparison on them to obtain the realism of the to-be-detected image. Finally, the parameters of the face image model are adjusted according to the realism so as to make the face image model converge.
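A minimal paired-GAN training step consistent with the flow of FIG. 10, sketched in PyTorch; the discriminator architecture, the loss choices (a BCE adversarial loss plus an L1 term, as in pix2pix-style paired training), and the hyperparameters are assumptions for illustration, not the disclosure's actual training configuration. `GenerationNetwork` is the sketch given earlier.

```python
import torch
import torch.nn as nn

gen = GenerationNetwork()                      # generation network to be trained
disc = nn.Sequential(                          # discrimination network to be trained
    nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 1, 4, stride=2, padding=1),  # patch-wise realism scores
)
opt_g = torch.optim.Adam(gen.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=2e-4)
bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()

def train_step(face_sample, synth_sample):
    rgb, alpha = gen(face_sample)
    fake = rgb * alpha + face_sample * (1 - alpha)   # to-be-detected image

    # Discriminator: distinguish real synthetic samples from generated images.
    d_real, d_fake = disc(synth_sample), disc(fake.detach())
    loss_d = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator: fool the discriminator and stay close to the paired target.
    d_fake = disc(fake)
    loss_g = bce(d_fake, torch.ones_like(d_fake)) + 100 * l1(fake, synth_sample)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```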
In an embodiment of the present disclosure, when the generation network to be trained performs feature extraction on a face region image sample, it may generate an RGB channel image sample and an α channel image sample and superimpose them to obtain the to-be-detected image. Of course, the generation network to be trained may also generate only the RGB channel image sample, in which case that sample is the to-be-detected image; however, using only the RGB channel image sample lowers the resolution precision of the to-be-detected image, which in turn lowers the resolution of face images generated with the trained generation network model and worsens the user experience.
In an embodiment of the present disclosure, after the face image model is trained, the trained generation network may be used as the generation network model for processing subsequently received face images to obtain face images containing the transformed target object. Using only the generation network model for image processing avoids deploying models of hundreds of megabytes or even gigabytes and improves image processing efficiency. The generation network model in the present disclosure is only 1.4 MB, computes quickly, and achieves a real-time processing rate of 20 frames per second (20 FPS) on mobile devices, and still 10 FPS on low- and mid-range mobile terminals.
In step S630, the first image and the original face image are aligned and superimposed to obtain a second image.
In one embodiment of the present disclosure, after the first image is generated, it may be aligned and superimposed with the original face image to obtain the second image, preserving the original resolution. The alignment may be based on relatively rigid target feature points among the facial feature points, or on all of the facial feature points. If the first image includes an RGB channel image and an α channel image, then when the aligned first image is superimposed with the original face image, the α channel image may be used as a per-pixel weight, and the RGB channel image and the original face image may be combined by weighted summation to obtain the second image. Specifically, the weighted summation may be performed using equation (1):
I_out = I_RGB × α + I_in × (1 − α)    (1)

where I_out is the second image, I_RGB is the RGB channel image, I_in is the original face image, and α is the α channel image.
It should be noted that if the first image contains only an RGB channel image, the RGB channel image and the original face image simply need to be aligned and superimposed. A weight may also be set to combine the two by weighted superimposition; this weight may be chosen according to actual needs and is not specifically limited in the embodiments of the present disclosure.
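Equation (1) is ordinary per-pixel α-blending and translates directly into code. A minimal NumPy sketch, assuming float images in [0, 1] that have already been aligned; the scalar-weight variant covers the RGB-only case:

```python
import numpy as np

def composite(rgb: np.ndarray, alpha: np.ndarray, original: np.ndarray) -> np.ndarray:
    """Equation (1): I_out = I_RGB * α + I_in * (1 - α).

    rgb:      (H, W, 3) RGB channel image from the generation network model
    alpha:    (H, W, 1) α channel image, used as the per-pixel weight
    original: (H, W, 3) original face image, already aligned with `rgb`
    """
    return rgb * alpha + original * (1.0 - alpha)

def composite_rgb_only(rgb: np.ndarray, original: np.ndarray,
                       weight: float = 0.8) -> np.ndarray:
    """RGB-only case: a single scalar weight, set according to actual needs."""
    return rgb * weight + original * (1.0 - weight)
```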
In an embodiment of the present disclosure, the original face image may be a single acquired face image, or a face image extracted from each image frame of a received video. That is, the technical solution of the present disclosure can process a single face image in real time, or process the consecutive face images of a video in real time, to obtain face images with the transformed target object.
In an embodiment of the disclosure, taking the simulation of long hair on a short-haired male as an example, fig. 11 shows a schematic flow chart of generating a face image containing long hair. As shown in fig. 11: in step S1101, a face image or video frame of a short-haired male is acquired; in step S1102, the face in the image or frame is detected and registered to obtain the facial feature points; in step S1103, pose-aligned face region images are computed and extracted from the facial feature points; in step S1104, the face region image is input to the generation network model to obtain an RGB channel image and an α channel image, both of which contain the long hair; and in step S1105, the RGB channel image, the α channel image, and the original face image or video frame are combined by a synthesis module to form the face image or video frame of a long-haired male.
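The flow of fig. 11 can be condensed into a per-frame driver. In the sketch below, `detect_landmarks`, `extract_region`, `generator`, and `paste_back` are hypothetical stand-ins for the detection/registration, alignment, generation, and synthesis stages; none of these names comes from the disclosure, and they are passed in as callables so the sketch stays self-contained.

```python
def process_frame(frame, detect_landmarks, extract_region, generator, paste_back):
    """One pass of the fig. 11 flow over a single face image or video frame."""
    # S1102: detect the face and register its feature points.
    landmarks = detect_landmarks(frame)
    if landmarks is None:
        return frame  # no face found: pass the frame through unchanged
    # S1103: compute and extract the pose-aligned face region.
    region, transform = extract_region(frame, landmarks)
    # S1104: the generation network model emits RGB and α channel images.
    rgb, alpha = generator(region)
    # S1105: warp back and blend with the original frame per equation (1).
    return paste_back(frame, rgb, alpha, transform)
```

For a video, the same call is applied to each frame in sequence, which is what the real-time frame rates quoted above refer to.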
In an embodiment of the present disclosure, when using image processing software a user may want to change not only individual features such as hair, eyes, or nose, but also the overall face makeup, for example changing a male face to a female face, or a plain female face to heavy makeup. In view of this, in the technical solution of the embodiments of the present disclosure, after a single face image or video frame is acquired, its face makeup may first be changed, and the makeup-changed face image may then be processed as shown in fig. 8 to obtain the corresponding long-haired face image. A face image model can then be trained on pairs of makeup-changed face images and their corresponding long-haired face images to obtain a trained generation network model which, upon receiving a new face image, generates the corresponding face image with both long hair and changed makeup.
In one embodiment of the present disclosure, besides simulating hair in real time, the eyes, nose, mouth, ears, and the like in the face image can also be simulated, such as converting single-fold eyelids into double-fold eyelids, a flat nose into a hooked (aquiline) nose, or ears into rabbit ears. It suffices that, when acquiring the paired training data, the pose-aligned face region image is passed through the corresponding region prediction module to obtain a region position map for the relevant part, a two-dimensional texture image is designed for the part to be changed, and the two-dimensional texture image, the region position map, and the face region image are superimposed by the texture overlay module to obtain the synthesized face image corresponding to the input face image; the face image (or face region image) and the synthesized face image then form the paired training data. A generative adversarial network model is trained on this paired training data to obtain a generation network model that transforms the relevant part of a received face image and generates a face image with the transformed part features.
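The texture-overlay step that turns a face region image, a region position map, and a designed two-dimensional texture into a synthesized sample is again a masked blend. A minimal NumPy sketch, with all array conventions assumed rather than taken from the disclosure:

```python
import numpy as np

def overlay_texture(face_region: np.ndarray,
                    region_map: np.ndarray,
                    texture: np.ndarray) -> np.ndarray:
    """Paste a designed 2D texture (long hair, double-fold eyelid,
    rabbit ear, ...) onto the face region wherever the region
    position map marks the target part.

    face_region: (H, W, 3) pose-aligned face region image in [0, 1]
    region_map:  (H, W, 1) soft mask in [0, 1] locating the target part
    texture:     (H, W, 3) two-dimensional texture image for the new part
    """
    return texture * region_map + face_region * (1.0 - region_map)

# (face_region, overlay_texture(face_region, region_map, texture))
# is one pair of the paired training data.
```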
The artificial-intelligence-based image processing method of the embodiments of the present disclosure can process face images in real time and obtain a three-dimensional effect through simulation, so that the transformed target object rotates and changes accordingly even as the face angle changes. Further, because the image generated by the generation network model is aligned and superimposed with the original face image, the resolution of the face and the background is preserved and the quality of the generated image is improved. Finally, the deployed model is only the generation network of the adversarial network model, so the model is small and image processing is efficient; the target-object transformation effect can even be applied while recording video, further improving the user experience.
The following describes embodiments of the apparatus of the present disclosure, which can be used to perform the artificial intelligence based image processing method in the above embodiments of the present disclosure. For the details not disclosed in the embodiments of the apparatus of the present disclosure, please refer to the embodiments of the image processing method based on artificial intelligence described above in the present disclosure.
FIG. 12 schematically illustrates a block diagram of an artificial intelligence based image processing apparatus according to one embodiment of the present disclosure.
Referring to fig. 12, an artificial intelligence based image processing apparatus 1200 according to an embodiment of the present disclosure includes: an image acquisition module 1201, an image processing module 1202, and an image superposition module 1203.
The image acquisition module 1201 is configured to acquire an original face image and perform feature extraction on it to obtain a face region image. The image processing module 1202 is configured to input the face region image into a generation network model, perform feature extraction on the face region image through the generation network model, and generate a first image based on the extracted features, where the features of the target object in the first image differ from those of the target object in the original face image. The image superposition module 1203 is configured to align and superimpose the first image and the original face image to obtain a second image.
In one embodiment of the present disclosure, the image acquisition module 1201 is configured to: detecting a face region in the original face image, and performing feature registration on the face region to determine a face feature point; determining a characteristic region corresponding to the face characteristic point according to the position information of the face characteristic point; and expanding the characteristic region by a preset multiple, and determining the face region image according to the expanded characteristic region.
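Expanding the feature region by a preset multiple can be pictured as scaling the landmark bounding box about its center. A sketch under assumed conventions (landmarks as an (N, 2) array, a scale factor of 1.5, clamping to the image bounds); the disclosure does not fix these values:

```python
import numpy as np

def face_region_box(landmarks: np.ndarray, img_w: int, img_h: int,
                    scale: float = 1.5):
    """Bounding box of the facial feature points, expanded by the
    preset multiple `scale` about its center and clamped to the image."""
    x0, y0 = landmarks.min(axis=0)
    x1, y1 = landmarks.max(axis=0)
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    half_w, half_h = (x1 - x0) * scale / 2.0, (y1 - y0) * scale / 2.0
    return (int(max(0, cx - half_w)), int(max(0, cy - half_h)),
            int(min(img_w, cx + half_w)), int(min(img_h, cy + half_h)))
```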
In an embodiment of the disclosure, the image processing module 1202 includes a channel image generation unit configured either to perform feature extraction on the face region image to obtain contour features and facial features, obtain an RGB channel image and an α channel image based on those features, and use the RGB channel image and the α channel image together as the first image; or to perform the same feature extraction, obtain only an RGB channel image based on the contour features and facial features, and use the RGB channel image as the first image.
In one embodiment of the present disclosure, the image superposition module 1203 includes: an image alignment unit, configured to align the first image and the original face image according to the facial feature points; and an image synthesis unit, configured to superimpose and synthesize the aligned first image and the original face image to obtain the second image.
In one embodiment of the present disclosure, the first image includes an RGB channel image and an α channel image, and the image synthesis unit is configured to use the α channel image as a weight and compute a weighted sum of the RGB channel image and the original face image to obtain the second image.
In one embodiment of the present disclosure, the image processing apparatus 1200 further includes: a sample acquisition module, configured to acquire face region image samples and the synthesized image samples corresponding to them, where each synthesized image sample is obtained by processing the target object in its face region image sample; and a model training module, configured to train a face image model containing a to-be-trained generation network and a to-be-trained discrimination network on the face region image samples and the synthesized image samples, and to use the trained generation network as the generation network model.
In one embodiment of the present disclosure, the sample acquisition module includes: a feature point acquisition unit, configured to acquire a face image sample and to detect and register it to obtain the facial feature points corresponding to the face image sample; a face region image sample acquisition unit, configured to determine target feature points from the facial feature points and to extract face region image samples corresponding to different poses according to the target feature points; a region position map acquisition unit, configured to perform feature extraction on the face region image sample through a region prediction module to obtain the region position map corresponding to the target object; and a synthesized image sample acquisition unit, configured to acquire a two-dimensional texture image corresponding to the target object and to superimpose the face region image sample, the region position map, and the two-dimensional texture image to obtain the synthesized image sample corresponding to the face region image sample.
In one embodiment of the present disclosure, the face region image sample acquisition unit is configured to: determine a target feature region according to the target feature points; and, keeping the position of the target feature region fixed, acquire face images corresponding to different poses and use those face images as the face region image samples.
In one embodiment of the present disclosure, the region position map acquisition unit is configured to: perform image segmentation on the face region image sample to obtain the target feature points; and determine the region corresponding to the target object from the target feature points and a preset distance, obtaining the region position map from that region.
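One way to realize a region position map from the target feature points and a preset distance is a soft distance-threshold mask around those points. This construction is an illustrative assumption; the disclosure does not fix the exact form:

```python
import numpy as np

def region_position_map(target_points: np.ndarray, h: int, w: int,
                        preset_distance: float = 20.0) -> np.ndarray:
    """Soft (H, W, 1) mask that is 1 at each target feature point
    (e.g. points along the hairline) and falls to 0 at the preset
    distance from the nearest point."""
    ys, xs = np.mgrid[0:h, 0:w]
    mask = np.zeros((h, w), dtype=np.float32)
    for px, py in target_points:
        d = np.sqrt((xs - px) ** 2 + (ys - py) ** 2)
        mask = np.maximum(mask, np.clip(1.0 - d / preset_distance, 0.0, 1.0))
    return mask[..., None]
```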
In one embodiment of the present disclosure, the model training module includes: a to-be-detected image generation unit, configured to input the face region image sample into the face image model, perform feature extraction on it through the to-be-trained generation network, and generate, from the extracted features, an image to be detected containing the optimized target object; a degree-of-reality acquisition unit, configured to input the image to be detected and the synthesized image sample into the to-be-trained discrimination network, which performs feature extraction and feature comparison on them to obtain the degree of reality of the image to be detected; and a parameter adjustment unit, configured to adjust the parameters of the face image model according to the degree of reality so that the face image model converges.
In an embodiment of the disclosure, the to-be-detected image generation unit is configured to extract features of the face region image sample, generate an RGB channel image sample and an α channel image sample based on the extracted features, and superimpose the α channel image sample and the RGB channel image sample to obtain the image to be detected.
In one embodiment of the present disclosure, the image acquisition module 1201 is configured to: take a received single-frame face image as the original face image; or extract the face information from each image frame of a received video to obtain the original face image.
FIG. 13 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present disclosure.
It should be noted that the computer system 1300 of the electronic device shown in fig. 13 is only an example, and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in fig. 13, the computer system 1300 includes a Central Processing Unit (CPU) 1301 that can perform various appropriate actions and processes, including the image processing method described in the above embodiments, according to a program stored in a Read-Only Memory (ROM) 1302 or a program loaded from a storage portion 1308 into a Random Access Memory (RAM) 1303. The RAM 1303 also stores the various programs and data necessary for system operation. The CPU 1301, the ROM 1302, and the RAM 1303 are connected to one another via a bus 1304, and an Input/Output (I/O) interface 1305 is also connected to the bus 1304.
The following components are connected to the I/O interface 1305: an input portion 1306 including a keyboard, a mouse, and the like; an output portion 1307 including a Cathode Ray Tube (CRT) or Liquid Crystal Display (LCD), a speaker, and the like; a storage portion 1308 including a hard disk and the like; and a communication portion 1309 including a network interface card such as a LAN (Local Area Network) card or a modem. The communication portion 1309 performs communication processing via a network such as the Internet. A drive 1310 is also connected to the I/O interface 1305 as needed. A removable medium 1311, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1310 as necessary, so that a computer program read from it can be installed into the storage portion 1308 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program containing program code for performing the methods illustrated in the flowcharts. In such embodiments, the computer program may be downloaded and installed from a network via the communication portion 1309, and/or installed from the removable medium 1311. When executed by the Central Processing Unit (CPU) 1301, the computer program performs the various functions defined in the system of the present disclosure.
It should be noted that the computer readable medium shown in the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash Memory, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or by hardware, and the described units may also be disposed in a processor. In some cases, the name of a unit does not constitute a limitation on the unit itself.
As another aspect, the present disclosure also provides a computer-readable medium, which may be contained in the image processing apparatus described in the above embodiments, or may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more programs which, when executed by an electronic device, cause the electronic device to implement the methods described in the above embodiments.
It should be noted that although several modules or units of the device for action execution are mentioned in the detailed description above, this division is not mandatory. Indeed, according to embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in a single module or unit; conversely, the features and functions of one module or unit described above may be further divided among a plurality of modules or units.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a touch terminal, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.
Claims (15)
1. An image processing method based on artificial intelligence, comprising:
acquiring an original face image, and performing feature extraction on the original face image to obtain a face region image;
inputting the face region image into a generating network model, extracting the features of the face region image through the generating network model, and generating a first image based on the extracted features; wherein the features of the target object in the first image are different from the features of the target object in the original face image;
and aligning and superimposing the first image and the original face image to obtain a second image.
2. The artificial intelligence based image processing method according to claim 1, wherein said performing feature extraction on the original face image to obtain a face region image comprises:
detecting a face region in the original face image, and performing feature registration on the face region to determine a face feature point;
determining a characteristic region corresponding to the face characteristic point according to the position information of the face characteristic point;
and expanding the characteristic region by a preset multiple, and determining the face region image according to the expanded characteristic region.
3. The artificial intelligence based image processing method of claim 1, wherein the extracting features of the face region image through the generating network model and generating a first image based on the extracted features comprises:
extracting the features of the face region image to obtain contour features and facial features, obtaining an RGB channel image and an α channel image based on the contour features and the facial features, and taking the RGB channel image and the α channel image as the first image, or,
and extracting the features of the face region image to obtain contour features and facial features, obtaining an RGB channel image based on the contour features and the facial features, and taking the RGB channel image as the first image.
4. The artificial intelligence based image processing method according to claim 2, wherein said aligning and superimposing the first image and the original face image to obtain a second image comprises:
aligning the first image with the original face image according to the face feature points;
and overlapping and synthesizing the aligned first image and the original face image to obtain the second image.
5. The artificial intelligence based image processing method of claim 4, wherein the first image comprises an RGB channel image and an α channel image;
the overlaying and synthesizing the aligned first image and the original face image to obtain the second image includes:
and taking the α channel image as a weight, and weighting and summing the RGB channel image and the original face image to obtain the second image.
6. The artificial intelligence based image processing method of claim 1, further comprising:
acquiring a face region image sample and a synthesized image sample corresponding to the face region image sample, wherein the synthesized image sample is obtained by processing the target object in the face region image sample;
and training a face image model containing a to-be-trained generating network and a to-be-trained discrimination network according to the face region image sample and the synthesized image sample, and taking the trained generating network as the generating network model.
7. The artificial intelligence based image processing method of claim 6, wherein the obtaining of the face region image samples and the synthesized image samples corresponding to the face region image samples comprises:
acquiring a face image sample, and detecting and registering the face image sample to acquire a face characteristic point corresponding to the face image sample;
determining target feature points from the face feature points, and extracting the face region image samples corresponding to different poses according to the target feature points;
performing feature extraction on the face region image sample through a region prediction module to obtain a region position map corresponding to the target object;
and acquiring a two-dimensional texture image corresponding to a target object, and overlapping the face region image sample, the region position map and the two-dimensional texture image to obtain a synthetic image sample corresponding to the face region image sample.
8. The artificial intelligence based image processing method according to claim 7, wherein said extracting the face region image samples corresponding to different poses according to the target feature points comprises:
determining a target characteristic region according to the target characteristic points;
and fixing the position of the target characteristic region unchanged, acquiring face images corresponding to different postures, and taking the face images as the face region image samples.
9. The artificial intelligence based image processing method according to claim 7, wherein said performing feature extraction on the face region image sample by a region prediction module to obtain a region location map corresponding to the target object comprises:
carrying out image segmentation on the face region image sample to obtain the target feature point;
and determining a region corresponding to the target object according to the target feature point and a preset distance, and obtaining the region position map according to the region corresponding to the target object.
10. The artificial intelligence based image processing method according to claim 6, wherein the training of the face image model including the generation network to be trained and the discrimination network to be trained according to the face region image sample and the synthesized image sample comprises:
inputting the face region image sample into the face image model, performing feature extraction on the face region image sample through the to-be-trained generating network, and generating an image to be detected containing the optimized target object based on the extracted features;
inputting the image to be detected and the synthetic image sample into the discrimination network to be trained, and performing feature extraction and feature comparison on the image to be detected and the synthetic image sample through the discrimination network to be trained to obtain the degree of reality of the image to be detected;
and adjusting parameters of the face image model according to the degree of reality to make the face image model converge.
11. The artificial intelligence based image processing method according to claim 10, wherein the performing feature extraction on the face region image sample through the to-be-trained generating network, and generating an image to be detected including the optimized target object based on the extracted features comprises:
extracting the characteristics of the face region image sample, and generating an RGB channel image sample and an α channel image sample based on the extracted characteristics;
and superposing the α channel image sample and the RGB channel image sample to acquire the image to be detected.
12. The artificial intelligence based image processing method of claim 1, wherein the acquiring of the original face image comprises:
taking the received single-frame face image as the original face image; or,
extracting the face information in each image frame contained in the received video to obtain the original face image.
13. An artificial intelligence-based image processing apparatus, comprising:
an image acquisition module, configured to acquire an original face image and perform feature extraction on the original face image to obtain a face region image;
an image processing module, configured to input the face region image into a generation network model, perform feature extraction on the face region image through the generation network model, and generate a first image based on the extracted features; wherein the features of the target object in the first image are different from the features of the target object in the original face image;
and an image superposition module, configured to align and superimpose the first image and the original face image to obtain a second image.
14. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the artificial intelligence based image processing method according to any one of claims 1 to 12.
15. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to perform the artificial intelligence based image processing method of any one of claims 1 to 12.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910980281.4A CN110796593A (en) | 2019-10-15 | 2019-10-15 | Image processing method, device, medium and electronic equipment based on artificial intelligence |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110796593A true CN110796593A (en) | 2020-02-14 |
Family
ID=69439265
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910980281.4A Pending CN110796593A (en) | 2019-10-15 | 2019-10-15 | Image processing method, device, medium and electronic equipment based on artificial intelligence |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110796593A (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105184249A (en) * | 2015-08-28 | 2015-12-23 | 百度在线网络技术(北京)有限公司 | Method and device for processing face image |
US20190014884A1 (en) * | 2017-07-13 | 2019-01-17 | Shiseido Americas Corporation | Systems and Methods for Virtual Facial Makeup Removal and Simulation, Fast Facial Detection and Landmark Tracking, Reduction in Input Video Lag and Shaking, and a Method for Recommending Makeup |
CN109978754A (en) * | 2017-12-28 | 2019-07-05 | 广东欧珀移动通信有限公司 | Image processing method, device, storage medium and electronic equipment |
CN108509915A (en) * | 2018-04-03 | 2018-09-07 | 百度在线网络技术(北京)有限公司 | The generation method and device of human face recognition model |
CN109800732A (en) * | 2019-01-30 | 2019-05-24 | 北京字节跳动网络技术有限公司 | The method and apparatus for generating model for generating caricature head portrait |
CN110070483A (en) * | 2019-03-26 | 2019-07-30 | 中山大学 | A kind of portrait cartooning method based on production confrontation network |
CN110288513A (en) * | 2019-05-24 | 2019-09-27 | 北京百度网讯科技有限公司 | For changing the method, apparatus, equipment and storage medium of face character |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111401283A (en) * | 2020-03-23 | 2020-07-10 | 北京达佳互联信息技术有限公司 | Face recognition method and device, electronic equipment and storage medium |
GB2596901A (en) * | 2020-05-15 | 2022-01-12 | Nvidia Corp | Content-aware style encoding using neural networks |
CN111652878A (en) * | 2020-06-16 | 2020-09-11 | 腾讯科技(深圳)有限公司 | Image detection method, image detection device, computer equipment and storage medium |
CN111815504A (en) * | 2020-06-30 | 2020-10-23 | 北京金山云网络技术有限公司 | Image generation method and device |
CN111798399A (en) * | 2020-07-10 | 2020-10-20 | 北京字节跳动网络技术有限公司 | Image processing method and device and electronic equipment |
CN111798399B (en) * | 2020-07-10 | 2024-04-30 | 抖音视界有限公司 | Image processing method and device and electronic equipment |
CN112614205A (en) * | 2020-12-28 | 2021-04-06 | 推想医疗科技股份有限公司 | Image reconstruction method and device |
CN113256513A (en) * | 2021-05-10 | 2021-08-13 | 杭州格像科技有限公司 | Face beautifying method and system based on antagonistic neural network |
CN113256513B (en) * | 2021-05-10 | 2022-07-01 | 杭州格像科技有限公司 | Face beautifying method and system based on antagonistic neural network |
CN114387168A (en) * | 2022-01-17 | 2022-04-22 | 腾讯科技(深圳)有限公司 | Image processing method, related apparatus, storage medium, and program product |
CN114387168B (en) * | 2022-01-17 | 2024-07-12 | 腾讯科技(深圳)有限公司 | Image processing method, related device, storage medium, and program product |
CN114758391A (en) * | 2022-04-08 | 2022-07-15 | 北京百度网讯科技有限公司 | Hairstyle image determining method and device, electronic equipment, storage medium and product |
CN114758391B (en) * | 2022-04-08 | 2023-09-12 | 北京百度网讯科技有限公司 | Hair style image determining method, device, electronic equipment, storage medium and product |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110796593A (en) | Image processing method, device, medium and electronic equipment based on artificial intelligence | |
CN111488865B (en) | Image optimization method and device, computer storage medium and electronic equipment | |
CN113496507B (en) | Human body three-dimensional model reconstruction method | |
US11587288B2 (en) | Methods and systems for constructing facial position map | |
US11562536B2 (en) | Methods and systems for personalized 3D head model deformation | |
CN113570684A (en) | Image processing method, image processing device, computer equipment and storage medium | |
US11461970B1 (en) | Methods and systems for extracting color from facial image | |
US11417053B1 (en) | Methods and systems for forming personalized 3D head and facial models | |
CN113362422B (en) | Shadow robust makeup transfer system and method based on decoupling representation | |
CN114821675B (en) | Object processing method and system and processor | |
WO2024174422A1 (en) | Model generation method and apparatus, electronic device, and storage medium | |
CN117218300A (en) | Three-dimensional model construction method, three-dimensional model construction training method and device | |
CN115147261A (en) | Image processing method, device, storage medium, equipment and product | |
CN117237542B (en) | Three-dimensional human body model generation method and device based on text | |
CN116863044A (en) | Face model generation method and device, electronic equipment and readable storage medium | |
CN110956599A (en) | Picture processing method and device, storage medium and electronic device | |
Xia et al. | 3D information guided motion transfer via sequential image based human model refinement and face-attention GAN | |
CN117079313A (en) | Image processing method, device, equipment and storage medium | |
CN115205171A (en) | Image generation method and device and electronic equipment | |
CN113706685A (en) | Face model reconstruction method, apparatus, device and storage medium | |
Yu et al. | Facial video coding/decoding at ultra-low bit-rate: a 2D/3D model-based approach | |
Liu | RETRACTED ARTICLE: Light image enhancement based on embedded image system application in animated character images | |
CN118411453B (en) | Digital human-computer interaction method and system | |
Yang et al. | Unsupervised Shape Enhancement and Factorization Machine Network for 3D Face Reconstruction | |
Huang et al. | A Reference-Based 3D Semantic-Aware Framework for Accurate Local Facial Attribute Editing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
REG | Reference to a national code |
Ref country code: HK Ref legal event code: DE Ref document number: 40021148 Country of ref document: HK |
|
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |