CN111709875A - Image processing method, image processing device, electronic equipment and storage medium - Google Patents

Image processing method, image processing device, electronic equipment and storage medium

Info

Publication number
CN111709875A
CN111709875A
Authority
CN
China
Prior art keywords
image
face
type
processed
style
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010549839.6A
Other languages
Chinese (zh)
Other versions
CN111709875B (en)
Inventor
李鑫
李甫
林天威
何栋梁
张赫男
孙昊
文石磊
丁二锐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010549839.6A
Publication of CN111709875A
Application granted
Publication of CN111709875B
Active legal-status Current
Anticipated expiration legal-status

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 - Geometric image transformations in the plane of the image
    • G06T3/04 - Context-preserving transformations, e.g. by using an importance map
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses an image processing method, an image processing apparatus, an electronic device and a storage medium, and relates to the fields of image processing and deep learning. The specific implementation scheme is as follows: acquiring a training image containing a first-style face image; selecting second-style feature images corresponding to at least part of the face regions of the first-style face image, wherein the first style is different from the second style; adjusting the face image based on the second-style feature images corresponding to the at least partial face regions to obtain an adjusted face image; and determining a target network using a training image containing the adjusted face image, where the target network is a trained network for converting an input to-be-processed image containing a face image into an output image containing a second-style face image.

Description

Image processing method, image processing device, electronic equipment and storage medium
Technical Field
The present application relates to the field of information processing, and in particular to the fields of image processing and deep learning.
Background
In the related art, a CycleGAN-like generative adversarial network (GAN) is usually adopted to convert images between different styles. However, such a generative network is strongly influenced by its training data, which easily makes the finally generated images uncontrollable and unclear.
Disclosure of Invention
The disclosure provides an image processing method, an image processing device, an electronic device and a storage medium.
According to an aspect of the present disclosure, there is provided an image processing method including:
acquiring a training image containing a first-style face image;
selecting second-style feature images corresponding to at least part of the face regions of the first-style face image, wherein the first style is different from the second style;
adjusting the face image based on the second-style feature images corresponding to the at least partial face regions to obtain an adjusted face image;
determining a target network using a training image containing the adjusted face image, where the target network is a trained network for converting an input to-be-processed image containing a face image into an output image containing a second-style face image.
According to another aspect of the present disclosure, there is provided an image processing apparatus including:
an image acquisition module, configured to acquire a training image containing a first-style face image;
an image preprocessing module, configured to select second-style feature images corresponding to at least part of the face regions of the first-style face image, and to adjust the face image based on these feature images to obtain an adjusted face image;
a training module, configured to determine a target network using a training image containing the adjusted face image, where the target network is a trained network for converting an input to-be-processed image containing a face image into an output image containing a second-style face image.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the aforementioned method.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the aforementioned method.
According to the technology of the application, the training image is preprocessed before the target network is trained: the face regions in the training image are adjusted to second-style feature images, and the target network is then trained on the adjusted images. This reduces the difficulty of the training task, lightens the load on the network, and makes the images generated by the network more controllable.
It should be understood that the statements in this section are not intended to identify key or critical features of the embodiments of the present disclosure, nor to limit its scope. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a first flowchart of an image processing method according to an embodiment of the present application;
FIG. 2 is a second flowchart of an image processing method according to an embodiment of the present application;
FIG. 3 is a cartoon face image obtained after image conversion in the related art;
FIG. 4 is a semi-finished image produced during face image preprocessing according to an embodiment of the present application;
FIG. 5 is a first schematic diagram of an image processing apparatus according to an embodiment of the present application;
FIG. 6 is a second schematic diagram of an image processing apparatus according to an embodiment of the present application;
FIG. 7 is a block diagram of an electronic device for implementing an image processing method according to an embodiment of the present application.
Detailed Description
The following describes exemplary embodiments of the present application with reference to the accompanying drawings, including various details of the embodiments to aid understanding, which should be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted below for clarity and conciseness.
An embodiment of the present application provides an image processing method, as shown in fig. 1, including:
S101: acquiring a training image containing a first-style face image;
S102: selecting second-style feature images corresponding to at least part of the face regions of the first-style face image, wherein the first style is different from the second style;
S103: adjusting the face image based on the second-style feature images corresponding to the at least partial face regions to obtain an adjusted face image;
S104: determining a target network using a training image containing the adjusted face image, where the target network is a trained network for converting an input to-be-processed image containing a face image into an output image containing a second-style face image.
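For ease of understanding, S101-S104 can be read as a data-preparation loop followed by ordinary style-transfer training. The following Python sketch is an illustrative reconstruction, not the claimed implementation: the region detector is a placeholder for any facial-landmark detector, and the two helpers it calls (`select_cartoon_patch`, `paste_patch`) are hypothetical names sketched under S102 and S103 below.

```python
from typing import Any, Dict, Tuple
from PIL import Image

Box = Tuple[int, int, int, int]  # left, top, right, bottom of a face region

def detect_regions(img: Image.Image) -> Dict[str, Tuple[Box, Any]]:
    """Locate recognizable face regions (eyes, mouth, ...) and measure the
    features S102 matches on (size, gender, opening-closing angle).
    Placeholder: any off-the-shelf landmark detector could back this."""
    raise NotImplementedError

def prepare_training_image(img: Image.Image, library) -> Image.Image:
    """S102 + S103: swap the configured regions for matching cartoon patches."""
    adjusted = img.copy()
    for name, (box, feats) in detect_regions(img).items():
        if name not in ("eyes", "mouth"):      # "at least part" of the regions
            continue
        entry = select_cartoon_patch(feats, library)  # sketched under S102
        patch = Image.open(entry.patch_path)          # hypothetical asset path
        adjusted = paste_patch(adjusted, patch, box)  # sketched under S103
    return adjusted

# S104 then trains the target network with the adjusted photos as domain A
# and second-style (e.g. cartoon) images as domain B; a loss sketch follows
# in the discussion of S104.
```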
The scheme provided by this embodiment may be applied to an electronic device, for example, a server or a terminal device, which is not limited herein.
The training images containing first-style face images may be all or at least some of the images in a training image set. That is, in training the target network, all the images in the training image set may be used in the manner described above.
The first style differs from the second style. In one example, the first style can be understood as a captured photograph or an image of a real person, and the second style as a cartoon style, an oil-painting style, a traditional Chinese painting style, or the like. Of course, the two styles can be chosen according to the actual situation; this example is not exhaustive.
In S102 above, selecting the second-style feature images corresponding to at least part of the face regions of the first-style face image includes:
selecting, from a preset image library, a second-style feature image matching the features of each face region, based on the features of each face region in the at least partial face regions of the first-style face image.
Here, the preset image library may be different from the training image library; it mainly contains at least one second-style feature image.
Specifically, a second-style feature image is an image of an individual face region rendered in the second style.
A face region may be one of the facial features in the face image, for example one of the eyes, nose, mouth, eyebrows, and ears. Accordingly, a second-style feature image is a second-style image of such a facial feature (any of the eyes, nose, mouth, eyebrows, or ears).
In one example, the second-style feature image matching the features of each face region may be selected as follows:
selecting, from the preset image library, a second-style feature image matching at least one feature of each face region.
The at least one feature may include at least one of: the size of the face region, the gender of the face the region belongs to, and the opening-closing angle of the face region. There may of course be more features; this embodiment is illustrative, not limiting.
For example, the face region may be an eye, and the corresponding features may include the size of the eye, the gender of the face (for example, female), the opening angle of the eye, and so on. Alternatively, the face region may be a mouth, and the corresponding features may include the width and height of the mouth, the gender of the face (for example, male), whether the mouth is closed, and so on. There may be further face regions and corresponding features, which are not listed exhaustively here.
Further, the second-style feature image matching the features of each face region may be selected by:
matching the features of the face region against each feature image in the preset image library in turn to obtain the matching second-style feature image;
alternatively, the feature images stored in the preset image library may carry labels or feature descriptions, and the features of the face region are matched against the label or feature description of each feature image in turn to obtain the matching second-style feature image.
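One straightforward realization of this matching is a nearest-neighbour lookup over tagged library entries, as in the illustrative sketch below. The feature fields, the asset paths, and the weighted distance are all assumptions; the embodiment does not fix a concrete distance function.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class RegionFeatures:
    width: float       # size of the face region
    height: float
    gender: str        # gender of the face the region belongs to
    open_angle: float  # opening-closing angle in degrees (0 = closed)

@dataclass
class LibraryEntry:
    patch_path: str    # hypothetical path to a cartoon feature image
    features: RegionFeatures

def select_cartoon_patch(query: RegionFeatures,
                         library: List[LibraryEntry]) -> LibraryEntry:
    """Return the library entry whose tags best match the face region."""
    def distance(entry: LibraryEntry) -> float:
        f = entry.features
        d = abs(f.width - query.width) + abs(f.height - query.height)
        d += 0.5 * abs(f.open_angle - query.open_angle)
        if f.gender != query.gender:   # treat gender mismatch as disqualifying
            d += 1e6
        return d
    return min(library, key=distance)

# Example: a two-entry "preset image library" for mouths.
library = [
    LibraryEntry("cartoon/mouth_open_female.png", RegionFeatures(60, 40, "female", 30.0)),
    LibraryEntry("cartoon/mouth_closed_male.png", RegionFeatures(55, 20, "male", 0.0)),
]
query = RegionFeatures(width=58, height=22, gender="male", open_angle=5.0)
print(select_cartoon_patch(query, library).patch_path)  # -> cartoon/mouth_closed_male.png
```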
In one example, the eyes and the mouth can be recognized in the face image and their features analyzed, yielding, for example, the sizes and opening-closing angles of the eyes, the size and maximum opening-closing angle of the mouth, and whether the gender corresponding to the face image is male or female. Then, based on the analyzed eye and mouth features, the matching second-style feature images of the eyes and the mouth, i.e. cartoon-style eye and mouth images, are selected from the preset image library.
If, say, the eyes, nose, and ears can be identified in the face image, a preset configuration may specify that only the eyes and the mouth undergo image adjustment; the above "at least partial regions" are then the eyes and the mouth among all the regions in the face image. That is, the at least partial regions may be regions determined by a preset configuration.
Alternatively, the at least partial face regions may be the recognizable face regions. For example, the face image may contain eyes, a nose, ears, and a mouth, but the contour of some region, such as the ears, cannot be clearly recognized; the at least partial face regions are then the eyes, the nose, and the mouth.
Selecting the second-style feature images in this way yields images that better fit the features of the face being replaced, so that the output of the finally trained target network better meets users' requirements.
In S103, adjusting the face image based on the second-style feature images corresponding to the at least partial face regions to obtain an adjusted face image includes:
adding the second-style feature image corresponding to each face region to the corresponding face region of the face image to obtain the adjusted face image.
For example, the at least partial face regions may be the eyes and the mouth, and the corresponding second-style feature images may be cartoon eye and mouth images; pasting the cartoon eye and mouth images onto the eye and mouth positions of the face image yields the adjusted face image.
Of course, the corresponding face regions may instead be replaced based on the second-style feature images, that is, the images of the corresponding regions in the face image are replaced with the cartoon eye and mouth images to obtain the adjusted face image.
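Whether "added onto" or "replaced", the adjustment comes down to ordinary image compositing: scale the second-style patch to the region's bounding box and paste it, with the patch's alpha channel as the mask. A minimal PIL sketch, using synthetic stand-in images rather than real assets:

```python
from PIL import Image, ImageDraw

def paste_patch(face: Image.Image, patch: Image.Image, box) -> Image.Image:
    """Resize an RGBA cartoon patch to the region box and composite it on."""
    left, top, right, bottom = box
    patch = patch.convert("RGBA").resize((right - left, bottom - top))
    out = face.convert("RGBA")
    out.paste(patch, (left, top), mask=patch)  # alpha channel masks the paste
    return out.convert("RGB")

# Synthetic demo: a flat "face" and a red ellipse standing in for a cartoon mouth.
face = Image.new("RGB", (256, 256), (182, 160, 149))
mouth = Image.new("RGBA", (64, 32), (0, 0, 0, 0))
ImageDraw.Draw(mouth).ellipse([0, 0, 63, 31], fill=(200, 40, 40, 255))
adjusted = paste_patch(face, mouth, box=(96, 168, 160, 200))
adjusted.save("adjusted_face.png")
```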
In S104, the target network may be a CycleGAN network; of course, other networks may also be used, which are not listed exhaustively here.
It should be noted that if the network is a CycleGAN, corresponding second-style images (e.g. cartoon face images) also need to be provided during training.
The training process of the CycleGAN network is not described in detail here.
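For readers unfamiliar with that related art, one generator update in CycleGAN couples an adversarial term with the cycle-consistency term that gives the network its name. The sketch below uses toy networks and random tensors purely so it executes; a real model would use the usual ResNet-style generators and PatchGAN discriminators, and the original paper uses a least-squares adversarial loss rather than the cross-entropy shown here.

```python
import torch
import torch.nn as nn

def tiny_gen() -> nn.Module:
    # Toy stand-in for a real generator network.
    return nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(16, 3, 3, padding=1))

G = tiny_gen()  # adjusted photo (domain A) -> cartoon (domain B)
F = tiny_gen()  # cartoon (domain B) -> photo (domain A)
D_B = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2), nn.ReLU(),
                    nn.Conv2d(16, 1, 3))  # toy domain-B discriminator

l1, bce = nn.L1Loss(), nn.BCEWithLogitsLoss()
photos = torch.randn(4, 3, 64, 64)  # batch of adjusted ("semi-finished") images

fake_cartoon = G(photos)
d_out = D_B(fake_cartoon)
adv = bce(d_out, torch.ones_like(d_out))  # G wants D_B to judge its output real
cyc = l1(F(fake_cartoon), photos)         # translating back must recover the input
loss_G = adv + 10.0 * cyc                 # lambda = 10, as in the CycleGAN paper
loss_G.backward()
```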
With the trained target network obtained, in one example the scheme provided by this embodiment may further include, as shown in fig. 2:
S201: acquiring a to-be-processed image, and extracting the first-style to-be-processed face image from it;
S202: selecting second-style feature images matching at least part of the face regions in the first-style to-be-processed face image;
S203: adding the second-style feature images to the at least partial face regions of the to-be-processed face image to obtain an adjusted to-be-processed image;
S204: inputting the adjusted to-be-processed image into the target network to obtain an output image containing a second-style face image.
The to-be-processed image may be any image captured by the user, a photograph input by the user, or the like.
The first style and the second style in S201-S204 are the same as in the preceding embodiment and are not described again.
In S202, selecting the second-style feature images matching at least part of the face regions in the first-style to-be-processed face image includes:
selecting, from the preset image library, a second-style feature image matching the features of each face region, based on the features of each face region in the first-style to-be-processed face image.
The specific processing of S202 is similar to the selection of matching feature images for the training image described above, and the adjustment of the to-be-processed image in S203 is likewise similar to the processing of the training image; neither is repeated here.
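Putting S201-S204 together, inference reuses the same region pasting before a single forward pass through the trained generator. The sketch below reuses the hypothetical helpers from the training-side sketches above:

```python
import torch
from torchvision.transforms.functional import to_pil_image, to_tensor

def convert_to_cartoon(photo, library, G):
    """S201-S204: paste matching second-style feature images onto the photo,
    then let the trained generator G finish the style conversion."""
    adjusted = prepare_training_image(photo, library)  # same logic as S102/S103
    x = to_tensor(adjusted).unsqueeze(0)               # PIL image -> 1x3xHxW batch
    with torch.no_grad():
        y = G(x)
    return to_pil_image(y.squeeze(0).clamp(0, 1))      # back to a PIL image
```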
In the related art, processing an input image with a CycleGAN easily produces unclear lines. Referring to fig. 3, which shows an image converted from an input face image (a photograph, not shown in the figure), the generated cartoon portrait is rather disordered: the eyebrow lines are abnormal and spurious lines appear on the face. This illustrates that the style-conversion process in the related art is not controllable.
With the present scheme, at least one second-style feature image (eyes, nose, eyebrows, or mouth) matching the face-region features of the original image is first pasted on, producing a semi-finished two-dimensional cartoon image (as shown in fig. 4); the semi-finished image is then used as input for training the CycleGAN. This lowers the task difficulty and the network burden: the model only needs to make fine adjustments to the input semi-finished image, so the generated image is more controllable. Likewise, at the processing stage, at least one second-style eye, nose, eyebrow, or mouth matching the face-region features of the to-be-processed image may be pasted onto it to produce a semi-finished two-dimensional cartoon image (the image shown in fig. 4 can serve as an example); the semi-finished image is then input into the CycleGAN to obtain the converted cartoon face image, again making the generated image more controllable.
An embodiment of the present application further provides an image processing apparatus, as shown in fig. 5, including:
an image acquisition module 51, configured to acquire a training image containing a first-style face image;
an image preprocessing module 52, configured to select second-style feature images corresponding to at least part of the face regions of the first-style face image, and to adjust the face image based on these feature images to obtain an adjusted face image;
a training module 53, configured to determine a target network using a training image containing the adjusted face image, where the target network is a trained network for converting an input to-be-processed image containing a face image into an output image containing a second-style face image.
The image preprocessing module 52 is configured to select, from a preset image library, a second-style feature image matching the features of each face region, based on the features of each face region in the at least partial face regions of the first-style face image.
The image preprocessing module 52 is further configured to add the second-style feature image corresponding to each face region to the corresponding face region of the face image to obtain the adjusted face image.
In one example, as shown in fig. 6, the apparatus further comprises:
the image processing module 54 is configured to acquire an image to be processed, and extract a first type of face image to be processed from the image to be processed; selecting a second-style characteristic image matched with at least part of face area in the first-style face image to be processed; adding the characteristic image of the second type style to at least part of the face area of the human face image to be processed to obtain an adjusted image to be processed; and inputting the adjusted image to be processed into the target network to obtain an output image containing the face image with the second type of style.
The image processing module 54 is configured to select a second type of feature image matched with the features of each face region from a preset image library based on the features of each face region in the first type of face image to be processed.
It should be understood that the processing performed by each module of the image processing apparatus in this embodiment is the same as in the foregoing method embodiment and is not described again here.
In addition, the image processing apparatus may be implemented within a single electronic device, that is, with all modules disposed in the same electronic device. Alternatively, the image acquisition module, the image preprocessing module, and the training module may be disposed in a first electronic device and the image processing module in a second electronic device; in that case the trained target network is sent from the first electronic device to the second electronic device for subsequent processing. Of course, many other module arrangements are possible, which are not listed exhaustively.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
As shown in fig. 7, it is a block diagram of an electronic device for the image processing method according to an embodiment of the present application. The electronic device may be either of the electronic devices mentioned above. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant as examples only and are not intended to limit the implementations of the present application described and/or claimed herein.
As shown in fig. 7, the electronic device includes: one or more processors 801, a memory 802, and interfaces for connecting the components, including a high-speed interface and a low-speed interface. The components are interconnected by different buses and may be mounted on a common motherboard or in other ways as required. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used together with multiple memories, if desired. Likewise, multiple electronic devices may be connected, each providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 7, one processor 801 is taken as an example.
The memory 802 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by at least one processor to cause the at least one processor to perform the image processing method provided by the present application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to execute the image processing method provided by the present application.
The memory 802, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules (e.g., an image acquisition module, an image pre-processing module, a training module, an image processing module shown in fig. 6) corresponding to the image processing method in the embodiments of the present application. The processor 801 executes various functional applications of the server and data processing by running non-transitory software programs, instructions, and modules stored in the memory 802, that is, implements the image processing method in the above-described method embodiment.
The memory 802 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the electronic device, and the like. Further, the memory 802 may include high speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 802 optionally includes memory located remotely from the processor 801, which may be connected to the electronic device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the image processing method may further include: an input device 803 and an output device 804. The processor 801, the memory 802, the input device 803, and the output device 804 may be connected by a bus or other means, as exemplified by the bus connection in fig. 7.
The input device 803 may receive input numeric or character information and generate key signal inputs related to user settings and function controls of the electronic device, such as a touch screen, keypad, mouse, track pad, touch pad, pointer stick, one or more mouse buttons, track ball, joystick, or other input device. The output devices 804 may include a display device, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or cloud host; it is a host product in the cloud computing service system and overcomes the defects of difficult management and weak service scalability found in traditional physical hosts and VPS services.
According to the technical solution of the embodiments of the application, the training image is preprocessed before the target network is trained: the face regions in the training image are adjusted to second-style feature images, and the target network is then trained on the adjusted images. This reduces the difficulty of the training task, lightens the load on the network, and makes the images generated by the network more controllable.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and the present invention is not limited thereto as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (12)

1. An image processing method comprising:
acquiring a training image containing a first-style face image;
selecting second-style feature images corresponding to at least part of the face regions of the first-style face image, wherein the first style is different from the second style;
adjusting the face image based on the second-style feature images corresponding to the at least partial face regions to obtain an adjusted face image; and
determining a target network using a training image containing the adjusted face image, wherein the target network is a trained network for converting an input to-be-processed image containing a face image into an output image containing a second-style face image.
2. The method according to claim 1, wherein the selecting second-style feature images corresponding to at least part of the face regions of the first-style face image comprises:
selecting, from a preset image library, a second-style feature image matching the features of each face region, based on the features of each face region in the at least partial face regions of the first-style face image.
3. The method according to claim 1, wherein the adjusting the face image based on the second-style feature images corresponding to the at least partial face regions to obtain an adjusted face image comprises:
adding the second-style feature image corresponding to each face region to the corresponding face region of the face image to obtain the adjusted face image.
4. The method according to any one of claims 1-3, further comprising:
acquiring a to-be-processed image, and extracting a first-style to-be-processed face image from the to-be-processed image;
selecting second-style feature images matching at least part of the face regions in the first-style to-be-processed face image;
adding the second-style feature images to the at least partial face regions of the to-be-processed face image to obtain an adjusted to-be-processed image; and
inputting the adjusted to-be-processed image into the target network to obtain an output image containing a second-style face image.
5. The method according to claim 4, wherein the selecting second-style feature images matching at least part of the face regions in the first-style to-be-processed face image comprises:
selecting, from a preset image library, a second-style feature image matching the features of each face region, based on the features of each face region in the first-style to-be-processed face image.
6. An image processing apparatus comprising:
an image acquisition module, configured to acquire a training image containing a first-style face image;
an image preprocessing module, configured to select second-style feature images corresponding to at least part of the face regions of the first-style face image, and to adjust the face image based on the second-style feature images corresponding to the at least partial face regions to obtain an adjusted face image; and
a training module, configured to determine a target network using a training image containing the adjusted face image, wherein the target network is a trained network for converting an input to-be-processed image containing a face image into an output image containing a second-style face image.
7. The apparatus according to claim 6, wherein the image preprocessing module is configured to select, from a preset image library, a second-style feature image matching the features of each face region, based on the features of each face region in the at least partial face regions of the first-style face image.
8. The apparatus according to claim 6, wherein the image preprocessing module is configured to add the second-style feature image corresponding to each face region to the corresponding face region of the face image to obtain the adjusted face image.
9. The apparatus of any of claims 6-8, further comprising:
an image processing module, configured to acquire a to-be-processed image and extract a first-style to-be-processed face image from it; select second-style feature images matching at least part of the face regions in the first-style to-be-processed face image; add the second-style feature images to the at least partial face regions of the to-be-processed face image to obtain an adjusted to-be-processed image; and input the adjusted to-be-processed image into the target network to obtain an output image containing a second-style face image.
10. The apparatus according to claim 9, wherein the image processing module is configured to select, from a preset image library, a second-style feature image matching the features of each face region, based on the features of each face region in the first-style to-be-processed face image.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-5.
CN202010549839.6A 2020-06-16 2020-06-16 Image processing method, device, electronic equipment and storage medium Active CN111709875B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010549839.6A CN111709875B (en) 2020-06-16 2020-06-16 Image processing method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010549839.6A CN111709875B (en) 2020-06-16 2020-06-16 Image processing method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111709875A true CN111709875A (en) 2020-09-25
CN111709875B CN111709875B (en) 2023-11-14

Family

ID=72540719

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010549839.6A Active CN111709875B (en) 2020-06-16 2020-06-16 Image processing method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111709875B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190035149A1 (en) * 2015-08-14 2019-01-31 Metail Limited Methods of generating personalized 3d head models or 3d body models
CN108717719A (en) * 2018-05-23 2018-10-30 腾讯科技(深圳)有限公司 Generation method, device and the computer storage media of cartoon human face image
CN109376582A (en) * 2018-09-04 2019-02-22 电子科技大学 A kind of interactive human face cartoon method based on generation confrontation network
CN109308681A (en) * 2018-09-29 2019-02-05 北京字节跳动网络技术有限公司 Image processing method and device
CN110321849A (en) * 2019-07-05 2019-10-11 腾讯科技(深圳)有限公司 Image processing method, device and computer readable storage medium
CN111047509A (en) * 2019-12-17 2020-04-21 中国科学院深圳先进技术研究院 Image special effect processing method and device and terminal
CN111275784A (en) * 2020-01-20 2020-06-12 北京百度网讯科技有限公司 Method and device for generating image

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022148248A1 (en) * 2021-01-06 2022-07-14 腾讯科技(深圳)有限公司 Image processing model training method, image processing method and apparatus, electronic device, and computer program product
CN112802162A (en) * 2021-02-02 2021-05-14 网易(杭州)网络有限公司 Face adjustment method and device for virtual character, electronic device and storage medium
CN112802162B (en) * 2021-02-02 2024-05-10 网易(杭州)网络有限公司 Face adjusting method and device for virtual character, electronic equipment and storage medium
CN112991150A (en) * 2021-02-08 2021-06-18 北京字跳网络技术有限公司 Style image generation method, model training method, device and equipment
CN113378696A (en) * 2021-06-08 2021-09-10 北京百度网讯科技有限公司 Image processing method, device, equipment and storage medium
CN113901997A (en) * 2021-09-29 2022-01-07 北京百度网讯科技有限公司 Image style conversion method, device, equipment, storage medium and program product
CN115100576A (en) * 2022-07-28 2022-09-23 北京字跳网络技术有限公司 Image recognition model training and image recognition method, device, equipment and medium

Also Published As

Publication number Publication date
CN111709875B (en) 2023-11-14

Similar Documents

Publication Publication Date Title
CN111709875B (en) Image processing method, device, electronic equipment and storage medium
US11587300B2 (en) Method and apparatus for generating three-dimensional virtual image, and storage medium
US20230120985A1 (en) Method for training face recognition model
CN111738910A (en) Image processing method and device, electronic equipment and storage medium
CN111968203B (en) Animation driving method, device, electronic equipment and storage medium
CN111862277A (en) Method, apparatus, device and storage medium for generating animation
CN111860362A (en) Method and device for generating human face image correction model and correcting human face image
CN111861955A (en) Method and device for constructing image editing model
CN111563855A (en) Image processing method and device
CN111539897A (en) Method and apparatus for generating image conversion model
CN112380566A (en) Method, apparatus, electronic device, and medium for desensitizing document image
EP3929876B1 (en) Face editing method and apparatus, electronic device and readable storage medium
CN112099645A (en) Input image generation method and device, electronic equipment and storage medium
CN111523467A (en) Face tracking method and device
CN112529180A (en) Method and apparatus for model distillation
CN112101552A (en) Method, apparatus, device and storage medium for training a model
CN112116525A (en) Face-changing identification method, device, equipment and computer-readable storage medium
CN113313048B (en) Facial expression recognition method and device
CN112200169B (en) Method, apparatus, device and storage medium for training a model
CN112016523B (en) Cross-modal face recognition method, device, equipment and storage medium
CN112381927A (en) Image generation method, device, equipment and storage medium
CN113269719A (en) Model training method, image processing method, device, equipment and storage medium
CN112116548A (en) Method and device for synthesizing face image
CN110889392B (en) Method and device for processing face image
CN112560854A (en) Method, apparatus, device and storage medium for processing image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant