CN114387649A - Image processing method, image processing apparatus, electronic device, and storage medium - Google Patents

Image processing method, image processing apparatus, electronic device, and storage medium

Info

Publication number
CN114387649A
CN114387649A
Authority
CN
China
Prior art keywords
sample image
target
image
training
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210025401.7A
Other languages
Chinese (zh)
Inventor
姜珊
郭知智
洪智滨
韩钧宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210025401.7A priority Critical patent/CN114387649A/en
Publication of CN114387649A publication Critical patent/CN114387649A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The present disclosure provides an image processing method, an image processing apparatus, an electronic device, and a storage medium. It relates to the technical field of artificial intelligence, specifically to deep learning and computer vision, and can be applied to scenes such as image processing. It addresses at least the technical problem in the related art that neural networks perform poorly when optimizing a target part in a face image. The specific implementation scheme is as follows: acquire a target face image that includes at least a target part; process the target part in the target face image with a target neural network to generate a target processing result. The target neural network is obtained by machine training on multiple groups of images, each group comprising at least one of a first-type sample image and a second-type sample image.

Description

Image processing method, image processing apparatus, electronic device, and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, in particular to deep learning and computer vision, applicable to image processing and similar scenes, and specifically concerns an image processing method and apparatus, an electronic device, and a storage medium.
Background
At present, optimizing a target part of a face with a neural network requires training the network on a large number of paired pre-optimization and post-optimization images. However, few such images exist for most target parts, so the trained network performs poorly.
Disclosure of Invention
The present disclosure provides an image processing method, an image processing apparatus, an electronic device, and a storage medium, to solve at least the technical problem in the related art that a neural network performs poorly when optimizing a target part in a face image.
According to an aspect of the present disclosure, there is provided an image processing method, including: acquiring a target face image, where the target face image includes at least a target part; and processing the target part in the target face image with a target neural network to generate a target processing result. The target neural network is obtained by machine training on multiple groups of images, and each group includes at least one of the following: a first-type sample image and a second-type sample image, where a first-type sample image consists of a first source sample image together with the target sample image obtained by optimizing that same first source sample image, and a second-type sample image consists of a first source sample image together with a target sample image obtained by optimizing a different, second source sample image.
Optionally, processing the target part in the target face image with the target neural network to generate the target processing result includes: performing attention processing on the target face image with the target neural network to obtain an attention image, where the weight of the region containing the target part in the attention image is greater than a preset value; and processing the target part in the target face image based on the attention image to generate the target processing result.
Optionally, the method further comprises: training a plurality of preset neural networks by using the first type of sample images to obtain training results; screening a plurality of preset neural networks based on the training result to obtain an initial neural network; and training the initial neural network based on the first type of sample image and the second type of sample image to obtain a target neural network.
Optionally, training the initial neural network based on the first type of sample image and the second type of sample image to obtain a target neural network, including: performing attention processing on the first type sample image and the second type sample image by using an initial neural network to obtain a plurality of groups of attention training images, wherein target parts in the first type sample image and the second type sample image comprise preset detection frames, and target parts in the plurality of groups of attention training images comprise training detection frames; generating a first loss function based on a preset detection frame and a training detection frame; and adjusting the model parameters of the initial neural network based on the first loss function to obtain the target neural network.
Optionally, after generating the first loss function based on the preset detection box and the training detection box, the method further includes: processing the first type sample image and the second type sample image based on the plurality of groups of attention training images, and outputting a target training image; generating a second loss function based on the first type sample image, the second type sample image and the target training image; and adjusting the model parameters of the initial neural network based on the first loss function and the second loss function to obtain the target neural network.
Optionally, the method further comprises: acquiring a target sample image; triangulating the target sample image to obtain at least one first source sample image; and pairing the target sample image with the at least one first source sample image to obtain a first-type sample image.
Optionally, pairing the target sample image with the at least one first source sample image to obtain a first-type sample image includes: pairing the target sample image with the at least one first source sample image to obtain an initial training image; and performing enhancement processing on the initial training image to obtain the first-type sample image.
Optionally, after triangulating the target sample image to obtain the at least one first source sample image, the method further includes: performing enhancement processing on the at least one first source sample image to obtain at least one third source sample image; and pairing the target sample image with the at least one third source sample image to obtain a first-type sample image.
According to a second aspect of the present disclosure, there is provided an image processing apparatus, comprising: an acquisition module, used for acquiring a target face image, where the target face image includes at least a target part; and a processing module, used for processing the target part in the target face image with a target neural network to generate a target processing result. The target neural network is obtained by machine training on multiple groups of images, and each group includes at least one of the following: a first-type sample image and a second-type sample image, where a first-type sample image consists of a first source sample image together with the target sample image obtained by optimizing that same first source sample image, and a second-type sample image consists of a first source sample image together with a target sample image obtained by optimizing a different, second source sample image.
Optionally, the processing module comprises: a first attention processing unit, used for performing attention processing on the target face image with the target neural network to obtain an attention image, where the weight of the region containing the target part in the attention image is greater than a preset value; and a target part processing unit, used for processing the target part in the target face image based on the attention image to generate the target processing result.
Optionally, the apparatus further comprises: the first training module is used for training a plurality of preset neural networks by utilizing the first type of sample images to obtain training results; the screening module is used for screening the plurality of preset neural networks based on the training result to obtain an initial neural network; and the second training module is used for training the initial neural network based on the first type of sample image and the second type of sample image to obtain the target neural network.
Optionally, a second training module comprising: the second attention processing unit is used for performing attention processing on the first type sample image and the second type sample image by using the initial neural network to obtain a plurality of groups of attention training images, wherein target parts in the first type sample image and the second type sample image comprise preset detection frames, and target parts in the plurality of groups of attention training images comprise training detection frames; the first loss function generating unit is used for generating a first loss function based on a preset detection frame and a training detection frame; and the first adjusting unit is used for adjusting the model parameters of the initial neural network based on the first loss function to obtain the target neural network.
Optionally, the second training module further comprises: the processing unit is used for processing the first type of sample images and the second type of sample images based on the plurality of groups of attention training images and outputting target training images; the second loss function generating unit is used for generating a second loss function based on the first type sample image, the second type sample image and the target training image; and the second adjusting unit is used for adjusting the model parameters of the initial neural network based on the first loss function and the second loss function to obtain the target neural network.
Optionally, the apparatus further comprises: the acquisition module is also used for acquiring a target sample image; the processing module is also used for carrying out triangulation processing on the target sample image to obtain at least one first source sample image; and the pairing module is used for pairing the target sample image and at least one first source sample image to obtain a first type of sample image.
Optionally, a pairing module comprising: the matching unit is used for matching the target sample image and at least one first source sample image to obtain an initial training image; and the enhancement processing unit is used for enhancing the initial training image to obtain a first type sample image.
Optionally, the apparatus further comprises: the enhancement processing module is used for carrying out enhancement processing on at least one first source sample image to obtain at least one third source sample image; the pairing module is further used for carrying out pairing processing on the target sample image and the at least one third source sample image to obtain a first type sample image.
According to a third aspect of the embodiments of the present disclosure, there is also provided an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executable by the at least one processor to enable the at least one processor to perform the method of image processing according to any of the above embodiments.
According to a fourth aspect of the embodiments of the present disclosure, there is also provided a computer-readable storage medium, where the computer-readable storage medium includes a stored program, and when the program runs, the apparatus where the computer-readable storage medium is located is controlled to execute any one of the image processing methods in the foregoing embodiments.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, performs any one of the image processing methods of the above embodiments.
In implementations of the present disclosure, a target face image containing at least a target part may first be obtained, and a target neural network may then process the target part to generate a target processing result. The target neural network may be obtained by machine training on multiple groups of images, each group comprising at least one of a first-type sample image and a second-type sample image, where a first-type sample image consists of a first source sample image together with the target sample image obtained by optimizing that same first source sample image, and a second-type sample image consists of a first source sample image together with a target sample image obtained by optimizing a different, second source sample image. Using both types of sample images increases the number of training samples, so a target neural network with high accuracy can be obtained through training.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a further understanding of the disclosure, and are not to be construed as limiting the disclosure. Wherein:
fig. 1 is a block diagram of a hardware structure of a computer terminal (or mobile device) for implementing an image processing method according to an embodiment of the present disclosure;
FIG. 2 is a flow chart of an image processing method according to a first embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a generator according to an embodiment of the present disclosure;
FIG. 4 is a schematic illustration of an attention mechanism according to an embodiment of the present disclosure;
FIG. 5 is a comparison graph of dental images according to an embodiment of the present disclosure;
FIG. 6 is a diagram of a training framework for a dental image beautification model according to an embodiment of the present disclosure;
FIG. 7 is a flow chart of a method of image processing according to a second embodiment of the present disclosure;
FIG. 8 is a flow chart of a method of image processing according to a third embodiment of the present disclosure;
FIG. 9 is a before-and-after comparison of triangulation of an image according to an embodiment of the disclosure;
FIG. 10 is a before-and-after comparison of random masking and Gaussian blur processing performed on an image according to an embodiment of the disclosure;
FIG. 11 is a before-and-after comparison of optimization processing of an image according to an embodiment of the disclosure;
fig. 12 is a block diagram of a structure of an image processing apparatus according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Currently, dental image beautification tasks typically require a large amount of training data, whose quality and quantity significantly affect the processing results and hence the fidelity and aesthetics of the generated teeth. Existing public face datasets usually show little tooth exposure, and paired before-and-after tooth-beautification images are almost nonexistent. Meanwhile, a tooth-beautification neural network that can be deployed in a real production environment needs a large amount of high-quality data, which is extremely expensive to acquire. On the one hand, high-quality paired before-and-after images with complete type coverage are required; on the other hand, such data are difficult to collect, and large-scale collection is hardly feasible.
In accordance with an embodiment of the present disclosure, there is provided an image processing method, it should be noted that the steps illustrated in the flowchart of the drawings may be performed in a computer system such as a set of computer executable instructions, and that while a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in an order different than here.
The method embodiments provided by the embodiments of the present disclosure may be executed in a mobile terminal, a computer terminal, or a similar electronic device. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein. Fig. 1 shows a hardware configuration block diagram of a computer terminal (or mobile device) for implementing an image processing method.
As shown in fig. 1, the computer terminal 100 includes a computing unit 101 that can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 102 or a computer program loaded from a storage unit 108 into a random-access memory (RAM) 103. In the RAM 103, various programs and data necessary for the operation of the computer terminal 100 can also be stored. The computing unit 101, the ROM 102, and the RAM 103 are connected to each other via a bus 104. An input/output (I/O) interface 105 is also connected to the bus 104.
A number of components in the computer terminal 100 are connected to the I/O interface 105, including: an input unit 106 such as a keyboard, a mouse, and the like; an output unit 107 such as various types of displays, speakers, and the like; a storage unit 108, such as a magnetic disk, optical disk, or the like; and a communication unit 109 such as a network card, modem, wireless communication transceiver, etc. The communication unit 109 allows the computer terminal 100 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
Computing unit 101 may be a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 101 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 101 performs the image processing method described herein. For example, in some embodiments, the image processing method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 108. In some embodiments, part or all of the computer program may be loaded and/or installed onto the computer terminal 100 via the ROM 102 and/or the communication unit 109. When the computer program is loaded into the RAM 103 and executed by the computing unit 101, one or more steps of the image processing method described herein may be performed. Alternatively, in other embodiments, the computing unit 101 may be configured to perform the image processing method by any other suitable means (e.g. by means of firmware).
Various implementations of the systems and techniques described here can be implemented in digital electronic circuitry, integrated circuitry, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
It should be noted here that in some alternative embodiments, the electronic device shown in fig. 1 may include hardware elements (including circuitry), software elements (including computer code stored on a computer-readable medium), or a combination of both. It should also be noted that fig. 1 is only one example and is intended to illustrate the types of components that may be present in the electronic device described above.
In the above operating environment, the present disclosure provides the image processing method as shown in fig. 2, which may be executed by the computer terminal shown in fig. 1 or a similar electronic device. Fig. 2 is a flowchart of an image processing method provided according to an embodiment of the present disclosure. As shown in fig. 2, the method may include the steps of:
step S201, a target face image is acquired.
The target face image at least comprises a target part.
The target face image can be a face image containing a part to be beautified.
The target part can be the teeth, the hairline, an eyelid, or any other part to be beautified.
In an optional embodiment, if the user wants to beautify a target part of a face image, the image may be input into the target neural network, which beautifies the target part so that it meets the user's expectations.
The teeth in the target face image may be dark and irregular; beautifying them can make them whiter and more regular.
The hairline in the target face image may be sparse; beautifying it can make the hair denser and fuller.
An eyelid in the target face image may be a single eyelid; beautification can turn it into a double eyelid.
Step S202, processing the target part in the target face image by using the target neural network to generate a target processing result.
The target neural network is obtained by machine training on multiple groups of images, and each group includes at least one of the following: a first-type sample image and a second-type sample image, where a first-type sample image consists of a first source sample image together with the target sample image obtained by optimizing that same first source sample image, and a second-type sample image consists of a first source sample image together with a target sample image obtained by optimizing a different, second source sample image.
A first-type sample image is thus a genuinely paired example, while a second-type sample image is a pseudo-pair built from unpaired images; both types combine a pre-optimization image with a post-optimization image.
In an alternative embodiment, a first source sample image may be an unoptimized sample image; optimizing it yields a target sample image, and pairing the two gives a first-type sample image. A second source sample image is any source sample image other than the first source sample image; the target sample image obtained by optimizing it can be paired with a first source sample image to obtain a second-type sample image.
In another alternative embodiment, because the images in the first-type sample images are genuinely paired, training on them yields a target neural network with relatively high accuracy. To further improve accuracy and reduce model overfitting, the unpaired second-type sample images can then be used to adjust the model parameters of the target neural network.
Optionally, processing a target portion in the target face image by using a target neural network, and generating a target processing result, including: performing attention processing on a target face image by using a target neural network to obtain an attention image, wherein the weight value of an area where a target part is located in the attention image is greater than a preset value; and processing the target part in the target face image based on the attention map image to generate a target processing result.
In an optional embodiment, an attention layer in the target neural network may perform attention processing on the target face image, focusing attention on the target part to obtain the attention image. Specifically, attention can be focused according to the weight of each pixel in the target face image: pixels with weights above the preset value are emphasized, and pixels with weights below it are de-emphasized, yielding an attention map concentrated on the target part.
Further, during processing the region of the target face image corresponding to the attention map, that is, the region containing the target part, can be processed, so that a more accurate target processing result is obtained.
In an optional embodiment, the target neural network may be pix2pixHD, a GAN-based network whose generator structure is shown in fig. 3. A target face image is fed into the network, downsampled, passed through several residual units, upsampled, and gradually restored to the input size to produce the output image. Because the whole image is processed at once, the network can find it difficult to focus on the region containing the target part. A self-learning attention mechanism can therefore be added on top of pix2pixHD: it leaves the output image essentially unchanged outside the target part while adaptively learning the target part itself, so the network focuses on the relevant region and the accuracy of the target processing result improves.
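The generator described above can be summarized in code. The following PyTorch sketch is illustrative only: layer counts, channel widths, and normalization choices are assumptions rather than the patent's exact configuration; the second output head produces the single-channel attention map discussed below.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.InstanceNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.InstanceNorm2d(channels),
        )

    def forward(self, x):
        return x + self.body(x)  # identity skip connection

class AttentiveGenerator(nn.Module):
    """Downsample -> residual units -> upsample, pix2pixHD-style."""
    def __init__(self, in_ch: int = 3, base: int = 64, n_res: int = 9):
        super().__init__()
        self.down = nn.Sequential(
            nn.Conv2d(in_ch, base, 7, padding=3), nn.ReLU(inplace=True),
            nn.Conv2d(base, base * 2, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(base * 2, base * 4, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.res = nn.Sequential(*[ResidualBlock(base * 4) for _ in range(n_res)])
        self.up = nn.Sequential(
            nn.ConvTranspose2d(base * 4, base * 2, 3, stride=2, padding=1, output_padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(base * 2, base, 3, stride=2, padding=1, output_padding=1),
            nn.ReLU(inplace=True),
        )
        # two heads: an RGB image and a single-channel attention map
        self.to_rgb = nn.Sequential(nn.Conv2d(base, 3, 7, padding=3), nn.Tanh())
        self.to_attn = nn.Sequential(nn.Conv2d(base, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x: torch.Tensor):
        h = self.up(self.res(self.down(x)))
        return self.to_rgb(h), self.to_attn(h)
```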
The schematic diagram of the above-mentioned attention mechanism is shown in fig. 4, and attention can be focused on a target portion in a target face image through the attention mechanism shown in fig. 4, so as to optimize the target portion in the face image, avoid affecting other regions except for the region where the target portion is located, and improve the optimization effect on the target portion.
Fig. 5 shows a tooth image (a) before beautification, the tooth image (b) after beautification, and the attention map (c) corresponding to the tooth region. With the attention mechanism added to the pix2pixHD structure, visualizing the attention map shows that the network's attention concentrates on the tooth area, with higher weight at tooth gloss points, dentition margins, and similar locations; the visual effect is shown in (c) of fig. 5, where darker areas indicate higher attention weight.
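How the attention map keeps regions outside the target part unchanged can be made concrete with a short sketch. This is a GANimation-style composition under the assumption that the generator returns an RGB image plus a [0, 1] attention map, as in the sketch above:

```python
import torch

def blend_with_attention(source: torch.Tensor,
                         generated: torch.Tensor,
                         attention: torch.Tensor) -> torch.Tensor:
    """Compose the final output: where attention is near 0 the source
    pixels pass through untouched, so only the target part is edited."""
    return attention * generated + (1.0 - attention) * source

# usage with the sketched generator (shapes: B x 3 x H x W and B x 1 x H x W):
# rgb, attn = generator(face)
# output = blend_with_attention(face, rgb, attn)
```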
Through the above steps, a target face image containing at least a target part can first be obtained, and a target neural network can then process the target part to generate a target processing result. The target neural network can be obtained by machine training on multiple groups of images, each group comprising at least one of a first-type sample image and a second-type sample image, where a first-type sample image consists of a first source sample image together with the target sample image obtained by optimizing that same first source sample image, and a second-type sample image consists of a first source sample image together with a target sample image obtained by optimizing a different, second source sample image. Using both types of sample images increases the number of training samples, so a target neural network with high accuracy can be obtained through training.
Optionally, the method further comprises: acquiring a target sample image; performing triangulation processing on the target sample image to obtain at least one first source sample image; and carrying out pairing processing on the target sample image and at least one first source sample image to obtain a first type sample image.
The target neural network can be a tooth-image beautification model, and the target part can be a tooth region. Fig. 6 shows the training framework of the tooth-image beautification model. First, the original paired data can be triangulated for data augmentation; masking and Gaussian blurring of the original paired data, as well as perturbing its position and color, serve the same purpose. In the first training stage, several tooth-beautification pre-training models are trained on the training set built from the augmented data, and the model with the best training result is selected among them. Unpaired data are then added to the training set, and the parameters of the selected pre-training model are adjusted according to the training result, yielding the tooth-image beautification model, that is, the target neural network.
Fig. 7 is a flowchart of an image processing method according to a second embodiment of the present disclosure, as shown in fig. 7, the method including the steps of:
step S701, training a plurality of preset neural networks by utilizing the first type of sample images to obtain training results.
Step S702, screening a plurality of preset neural networks based on the training result to obtain an initial neural network.
Step S703, training the initial neural network based on the first type sample image and the second type sample image to obtain the target neural network.
In an optional embodiment, the training process of the target neural network may be divided into two stages. In the first stage, to ensure the accuracy and stability of the subsequent network, the paired first-type sample images may be used to train several preset neural networks, producing a training result that characterizes the accuracy of each preset network. The preset networks are then screened according to this result, and the most accurate one is taken as the initial neural network.
Further, after the initial neural network is determined, it can undergo mixed training on the first-type and second-type sample images, with its model parameters adjusted accordingly, yielding a target neural network with higher accuracy.
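A minimal sketch of this stage-one screening is shown below. The training routine, metric, and candidate list are assumptions (the patent fixes neither the number of candidates nor the selection criterion); `train_on_paired` and `validation_score` are hypothetical helpers.

```python
from typing import Callable, List
import torch.nn as nn

def select_initial_network(candidates: List[nn.Module],
                           train_on_paired: Callable[[nn.Module], None],
                           validation_score: Callable[[nn.Module], float]) -> nn.Module:
    """Stage 1: train every candidate on the paired first-type samples,
    then keep the candidate with the best held-out score as the
    initial (pre-trained) network for stage-2 fine-tuning."""
    best_model, best_score = None, float("-inf")
    for model in candidates:
        train_on_paired(model)           # supervised training on paired data
        score = validation_score(model)  # higher is better, by assumption
        if score > best_score:
            best_model, best_score = model, score
    return best_model
```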
Optionally, training the initial neural network based on the first type of sample image and the second type of sample image to obtain a target neural network, including: performing attention processing on the first type sample image and the second type sample image by using an initial neural network to obtain a plurality of groups of attention training images, wherein target parts in the first type sample image and the second type sample image comprise preset detection frames, and target parts in the plurality of groups of attention training images comprise training detection frames; generating a first loss function based on a preset detection frame and a training detection frame; and adjusting the model parameters of the initial neural network based on the first loss function to obtain the target neural network.
The preset detection frame may be a detection frame pre-labeled for the target portion in the first type sample image and the second type sample image.
The training detection frame may be a detection frame labeled at the target portion after the initial neural network is used to perform attention processing on the first type sample image and the second type sample image.
In an alternative embodiment, during training of the initial neural network, the initial neural network may perform attention processing on the first-type and second-type sample images so that the network's attention is focused on the target part in each image, yielding multiple groups of attention training images. Comparing the preset detection frames on the target parts of the first-type and second-type sample images with the training detection frames produces a first loss function, which is used to adjust the model parameters of the initial neural network. As a result, when the obtained target neural network processes the target part of an image, it avoids touching regions other than the one containing the target part, processes the target part precisely, and thus improves the processing accuracy of the target neural network.
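The patent does not state the exact distance measure between the preset and training detection frames; a smooth-L1 penalty on box coordinates, sketched below, is one common assumption:

```python
import torch
import torch.nn.functional as F

def box_attention_loss(preset_boxes: torch.Tensor,
                       training_boxes: torch.Tensor) -> torch.Tensor:
    """First loss: penalize drift between the pre-labelled target-part
    boxes and the boxes derived from the attention training images.
    Both tensors hold (x1, y1, x2, y2) rows of shape (N, 4)."""
    return F.smooth_l1_loss(training_boxes, preset_boxes)
```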
Optionally, after generating the first loss function based on the preset detection box and the training detection box, the method further includes: processing the first type sample image and the second type sample image based on the plurality of groups of attention training images, and outputting a target training image; generating a second loss function based on the first type sample image, the second type sample image and the target training image; and adjusting the model parameters of the initial neural network based on the first loss function and the second loss function to obtain the target neural network.
In an optional embodiment, the groups of attention training images may be used to optimize the first source sample images within the first-type and second-type sample images, producing a target training image. A second loss function can then be computed from the target training image and the optimized target sample images contained in the first-type and second-type sample images. Adjusting the model parameters of the initial neural network with both the first and second loss functions lets the resulting target neural network optimize the target part of an image in a targeted way, achieving the goal of optimizing the target part and yielding a target neural network with higher accuracy.
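A sketch of the second loss and of the combined update follows; the L1 pixel measure and the loss weight are assumptions rather than values given in the patent:

```python
import torch
import torch.nn.functional as F

def reconstruction_loss(target_sample: torch.Tensor,
                        target_training: torch.Tensor) -> torch.Tensor:
    """Second loss: compare the network's output (target training image)
    against the ground-truth beautified image (target sample image)."""
    return F.l1_loss(target_training, target_sample)

# combined parameter update (the 10.0 weight is an assumed balance factor):
# loss = box_attention_loss(preset, pred_boxes) + 10.0 * reconstruction_loss(gt, out)
# optimizer.zero_grad(); loss.backward(); optimizer.step()
```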
Step S704, a target face image is acquired.
The target face image at least comprises a target part.
Step S705, a target neural network is used to process a target portion in the target face image, and a target processing result is generated.
The target neural network is obtained by machine training on multiple groups of images, and each group includes at least one of the following: a first-type sample image and a second-type sample image, where a first-type sample image consists of a first source sample image together with the target sample image obtained by optimizing that same first source sample image, and a second-type sample image consists of a first source sample image together with a target sample image obtained by optimizing a different, second source sample image.
Fig. 8 is a flowchart of an image processing method according to a third embodiment of the present disclosure, as shown in fig. 8, the method including the steps of:
step S801, a target sample image is acquired.
Step S802, triangulation processing is carried out on the target sample image to obtain at least one first source sample image.
There may be one or more target sample images.
In an optional embodiment, an optimized target sample image may be obtained and triangulated to produce at least one first source sample image, thereby enlarging the pool of sample images. Each first source sample image can then be paired with its corresponding target sample image to obtain a first-type sample image, which thus contains an augmented first source sample image together with the target sample image.
In another alternative embodiment, a random mask or Gaussian blur may be applied to the target sample image to obtain the at least one first source sample image, again enlarging the pool of sample images.
Fig. 9 compares a target sample image before triangulation with a first source sample image produced by triangulation: the left image is the target sample image and the right image is the resulting first source sample image.
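The patent names triangulation but not the landmarking or perturbation scheme; the sketch below assumes mouth landmarks are available and builds a virtual "before" image by warping each Delaunay triangle of the intra-oral region toward a perturbed layout:

```python
import cv2
import numpy as np
from scipy.spatial import Delaunay

def triangulate_mouth(image: np.ndarray,
                      src_pts: np.ndarray,
                      dst_pts: np.ndarray) -> np.ndarray:
    """Warp the mouth region triangle by triangle. src_pts are (N, 2)
    landmarks on the beautified image; dst_pts is a perturbed copy of
    them (the perturbation scheme itself is an assumption)."""
    out = image.copy()
    for tri in Delaunay(src_pts).simplices:
        src = np.float32(src_pts[tri])
        dst = np.float32(dst_pts[tri])
        x, y, w, h = cv2.boundingRect(np.int32(dst))
        offset = np.float32([x, y])
        # affine map from the source triangle to the destination's local box
        warp = cv2.getAffineTransform(src, dst - offset)
        patch = cv2.warpAffine(image, warp, (w, h))
        mask = np.zeros((h, w, 3), dtype=np.uint8)
        cv2.fillConvexPoly(mask, np.int32(dst - offset), (1, 1, 1))
        roi = out[y:y + h, x:x + w]
        out[y:y + h, x:x + w] = roi * (1 - mask) + patch * mask
    return out
```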
Optionally, after triangulating the target sample image to obtain at least one first source sample image, the method further includes: performing enhancement processing on at least one first source sample image to obtain at least one third source sample image; and matching the target sample image and at least one third source sample image to obtain a first type sample image.
In an alternative embodiment, after the at least one first source sample image is obtained, and since a first source sample image is a pre-optimization image, a randomly shaped mask may be placed on its target part, or Gaussian blur may be applied to that part, to achieve data enhancement of the first source sample image and obtain the at least one third source sample image. When pairing the target sample image with the at least one third source sample image, because each third source sample image is derived from a first source sample image, the correspondence between them determines which target sample image it should be paired with; the pairing yields a first-type sample image, achieving data enhancement and allowing the accuracy of the target neural network to be improved.
Fig. 10 shows before-and-after examples of random masking and Gaussian blur. The left image of fig. 10(a) shows a tooth region of a face with a random mask applied, i.e., a third source sample image, and the right image shows the corresponding target sample image; the left image of fig. 10(b) shows a tooth region with Gaussian blur applied, i.e., another third source sample image, and the right image shows its corresponding target sample image.
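A minimal sketch of this degradation step follows; the mouth bounding box, the coin-flip between masking and blurring, and the kernel size are all assumptions:

```python
import random
import cv2
import numpy as np

def degrade_mouth(image: np.ndarray, mouth_box: tuple) -> np.ndarray:
    """Build a 'pre-beautification' third source sample image by masking
    or blurring the intra-oral area; the paired beautified image stays
    the label. mouth_box = (x1, y1, x2, y2) in pixel coordinates."""
    x1, y1, x2, y2 = mouth_box
    out = image.copy()
    if random.random() < 0.5:
        # random rectangular mask inside the mouth region
        mx = random.randint(x1, max(x1, x2 - 2))
        my = random.randint(y1, max(y1, y2 - 2))
        out[my:y2, mx:x2] = 0
    else:
        out[y1:y2, x1:x2] = cv2.GaussianBlur(out[y1:y2, x1:x2], (15, 15), 0)
    return out
```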
Step S803, a pairing process is performed on the target sample image and the at least one first source sample image to obtain a first type sample image.
Optionally, pairing the target sample image with the at least one first source sample image to obtain a first-type sample image includes: pairing the target sample image with the at least one first source sample image to obtain an initial training image; and performing enhancement processing on the initial training image to obtain the first-type sample image.
In an optional embodiment, the target sample image and the at least one first source sample image may be paired to obtain initial training images. To further increase the number of training images, the initial training images may be enhanced; specifically, applying random rotation, mirroring, small-range color jitter, and similar augmentations to the initial training images yields the enhanced first-type sample images.
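A sketch of such paired augmentation is given below. The key design point is that the same geometric transform must hit both images of a pair so they stay aligned; the parameter ranges are assumptions for the "small-range" jitter mentioned above:

```python
import random
import torchvision.transforms.functional as TF
from torch import Tensor

def augment_pair(source: Tensor, target: Tensor,
                 max_deg: float = 10.0) -> tuple:
    """Random rotation, horizontal mirror, and mild brightness jitter,
    applied identically to the source and target of a paired example."""
    angle = random.uniform(-max_deg, max_deg)
    source, target = TF.rotate(source, angle), TF.rotate(target, angle)
    if random.random() < 0.5:
        source, target = TF.hflip(source), TF.hflip(target)
    factor = random.uniform(0.9, 1.1)  # small-range color jitter
    source = TF.adjust_brightness(source, factor)
    target = TF.adjust_brightness(target, factor)
    return source, target
```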
Through the above steps, the number of images used for training can be enhanced, and thus the accuracy of the target neural network can be improved.
Step S804, training a plurality of preset neural networks by utilizing the first type of sample images to obtain training results.
Step S805, a plurality of preset neural networks are screened based on the training result to obtain an initial neural network.
Step S806, training the initial neural network based on the first type sample image and the second type sample image to obtain a target neural network.
Step S807, a target face image is acquired.
The target face image at least comprises a target part.
And step S808, processing the target part in the target face image by using the target neural network to generate a target processing result.
To further describe the method of this embodiment, the overall framework of the present disclosure is shown in fig. 6. First, a real database of 8000 sets of before-and-after tooth-beautification comparison images can be established, and additional virtual pre-beautification images are manufactured with a triangulation-based method, followed by data enhancement. Preliminary model training is then performed with the enhanced images obtained in the previous step, and the model with the best training effect is kept as the pre-training model for subsequent training. Finally, using the enhanced images and the pre-training model from the two preceding steps, 2000 groups of unpaired images are added to adjust the model parameters, yielding the tooth-image beautification model.
The training process of the model specifically comprises the following steps: the method comprises three steps of data preprocessing, training frame construction and tooth image beautifying model training.
a) Data preprocessing: establish a real database of 8000 sets of before-and-after tooth-beautification comparison images covering a variety of characteristics such as physiological information, tooth form, and tooth exposure. Because this self-built dataset is of limited scale, triangulation can be used to manufacture some virtual pre-beautification images: other facial features are left unchanged and only the intra-oral area is transformed, which effectively enriches the distribution of pre-beautification tooth states; the comparison between an original image and its triangulated version is shown in fig. 9. Next, the pre-beautification images are further enhanced by randomly placing an arbitrarily shaped mask at the mouth within a controllable range, or by applying Gaussian blur to the intra-oral area, while the corresponding post-beautification image is still used as the match; both enhancement effects are shown in fig. 10. The paired data are then enhanced once more, with random rotation, mirroring, and small-range color jitter chosen as the data enhancement methods of the invention, and all the enhanced paired images are taken as training data.
b) Training framework construction: tooth-image beautification is an image generation task, so pix2pixHD, based on a GAN, is adopted as the basic training framework; the structure of its generator is shown in fig. 3. The network downsamples the input image, passes it through several residual units, upsamples it, and gradually restores the input size to generate the output image. Because the whole image is processed at once, the network has some difficulty focusing on the key area. The invention therefore adds a self-learning attention mechanism, derived from the GANimation network structure and illustrated in fig. 4: it leaves the output image essentially unchanged outside the mouth, while the intra-oral region is learned adaptively so that the model focuses on the key intra-oral area. With this attention mechanism added to pix2pixHD, visualizing the attention map shows that the network's attention concentrates on the tooth area, with higher weight at tooth gloss points and dentition margins; the visual effect is shown in fig. 5, where darker areas indicate higher attention weight.
c) Model training: first, the enhanced paired data from the preceding steps are used to train the tooth-image beautification task on the model structure described above, and the best model obtained is screened out as the pre-training model for the second stage. Then, 2000 groups of unpaired images are added to the training set, all data are used as a new training set, and mixed training on paired and unpaired images is carried out; the network structure during this fine-tuning is consistent with that of the first stage. These two steps complete the training of the tooth-image beautification model. The final end-to-end effect is shown in fig. 11, where the left image shows the teeth before optimization and the right image shows the teeth after optimization.
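A minimal sketch of the stage-2 mixed fine-tuning loop follows; the loader construction, sampling ratio, epoch count, and loss weight are assumptions, and `make_loader` is a hypothetical helper yielding (source, target, preset_box) triples from the merged paired and unpaired sets:

```python
import torch
import torch.nn.functional as F

def finetune_mixed(model, make_loader, paired_set, unpaired_set,
                   optimizer, epochs: int = 10) -> None:
    """Stage 2: fine-tune the selected pre-trained generator on the union
    of paired and unpaired (cross-identity) examples."""
    loader = make_loader(paired_set + unpaired_set)
    for _ in range(epochs):
        for source, target, preset_box in loader:
            rgb, attn = model(source)
            output = attn * rgb + (1.0 - attn) * source  # attention blend
            # reconstruction term plus the detection-frame term from above
            loss = 10.0 * F.l1_loss(output, target) \
                 + box_attention_loss(preset_box, boxes_from_attention(attn))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```

Here `boxes_from_attention` is another hypothetical helper that thresholds the attention map and returns the bounding box of the activated region.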
Through the above steps, the accuracy of the model can be improved with a small number of samples. The process saves cost: a large amount of paired tooth-beautification data is not needed, so data-acquisition costs drop substantially. The training process is simple, and the model has few parameters and a modest computational load, so deployment and invocation are efficient. The data augmented by triangulation and related methods generalizes well, and the teeth generated by this method are more attractive and more faithful.
In the technical scheme of the disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other processing of the personal information of the related user are all in accordance with the regulations of related laws and regulations and do not violate the good customs of the public order.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions to enable a terminal device (which may be a mobile phone, a computer, a server, or a network device) to execute the methods of the embodiments of the present disclosure.
The present disclosure also provides an image processing apparatus, which is used to implement the above embodiments and preferred implementations; descriptions already given are not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the embodiments below is preferably implemented in software, an implementation in hardware, or a combination of software and hardware, is also possible and contemplated.
Fig. 12 is a block diagram of an image processing apparatus according to an embodiment of the present disclosure, and as shown in fig. 12, an image processing apparatus 1200 includes: an acquisition module 1202 and a processing module 1204.
The acquisition module is used for acquiring a target face image, where the target face image includes at least a target part;
the processing module is used for processing a target part in the target face image by using the target neural network to generate a target processing result;
the target neural network is obtained by machine training by using a plurality of groups of images, and each group of images comprises at least one of the following images: the image processing method comprises a first type sample image and a second type sample image, wherein the first type sample image is composed of a first source sample image and a target sample image obtained after the first source sample image is optimized, and the second type sample image is composed of the first source sample image and the target sample image obtained after the second source sample image is optimized.
Optionally, the processing module comprises: a first attention processing unit, used for performing attention processing on the target face image with the target neural network to obtain an attention image, where the weight of the region containing the target part in the attention image is greater than a preset value; and a target part processing unit, used for processing the target part in the target face image based on the attention image to generate the target processing result.
Optionally, the apparatus further comprises: the first training module is used for training a plurality of preset neural networks by utilizing the first type of sample images to obtain training results; the screening module is used for screening the plurality of preset neural networks based on the training result to obtain an initial neural network; and the second training module is used for training the initial neural network based on the first type of sample image and the second type of sample image to obtain the target neural network.
Optionally, the second training module comprises: a second attention processing unit, configured to perform attention processing on the first type sample images and the second type sample images by using the initial neural network to obtain a plurality of groups of attention training images, wherein the target parts in the first type sample images and the second type sample images comprise preset detection frames, and the target parts in the plurality of groups of attention training images comprise training detection frames; a first loss function generating unit, configured to generate a first loss function based on the preset detection frames and the training detection frames; and a first adjusting unit, configured to adjust the model parameters of the initial neural network based on the first loss function to obtain the target neural network.
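The disclosure does not fix the form of the first loss function; one plausible sketch is a smooth-L1 penalty that shrinks as the training detection frames align with the preset detection frames, both encoded here as (x1, y1, x2, y2) boxes.

```python
import torch
import torch.nn.functional as F


def first_loss(training_frames: torch.Tensor,
               preset_frames: torch.Tensor) -> torch.Tensor:
    # Both tensors are (N, 4) boxes; the choice of smooth-L1 is an
    # assumption, not specified in the disclosure.
    return F.smooth_l1_loss(training_frames, preset_frames)
```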
Optionally, the second training module further comprises: a processing unit, configured to process the first type sample images and the second type sample images based on the plurality of groups of attention training images and output a target training image; a second loss function generating unit, configured to generate a second loss function based on the first type sample image, the second type sample image, and the target training image; and a second adjusting unit, configured to adjust the model parameters of the initial neural network based on the first loss function and the second loss function to obtain the target neural network.
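The second loss function is likewise unspecified; one plausible reading is a pixel reconstruction error between the target training image and the ground-truth target sample image, combined with the first loss under an assumed weighting.

```python
import torch.nn.functional as F


def second_loss(target_training_image, target_sample_image):
    # Pixel-wise reconstruction error between the network output and the
    # optimized ground-truth target sample image.
    return F.l1_loss(target_training_image, target_sample_image)


def total_loss(loss_1, loss_2, weight: float = 1.0):
    # The relative weighting is an assumption; the disclosure only states
    # that both losses are used to adjust the model parameters.
    return loss_1 + weight * loss_2
```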
Optionally, the acquisition module is further configured to acquire a target sample image; the processing module is further configured to perform triangulation processing on the target sample image to obtain at least one first source sample image; and the apparatus further comprises a pairing module, configured to pair the target sample image with the at least one first source sample image to obtain the first type sample image.
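Triangulation processing is not detailed in the disclosure; one common realization is a Delaunay triangulation over facial landmarks, as sketched below, where the landmark detector is assumed to be supplied separately.

```python
import numpy as np
from scipy.spatial import Delaunay


def triangulate_landmarks(landmarks: np.ndarray) -> np.ndarray:
    # landmarks: (K, 2) array of facial keypoints. Each returned row holds
    # three landmark indices forming one mesh triangle; warping such
    # triangles is one way a source sample could be derived from the
    # target sample, though the disclosure does not fix the algorithm.
    return Delaunay(landmarks).simplices
```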
Optionally, the pairing module comprises: a pairing unit, configured to pair the target sample image with the at least one first source sample image to obtain an initial training image; and an enhancement processing unit, configured to perform enhancement processing on the initial training image to obtain the first type sample image.
Optionally, the apparatus further comprises an enhancement processing module, configured to perform enhancement processing on the at least one first source sample image to obtain at least one third source sample image; the pairing module is further configured to pair the target sample image with the at least one third source sample image to obtain the first type sample image.
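The enhancement processing in the two preceding paragraphs is also unspecified; the sketch below assumes simple photometric and geometric transforms from torchvision, purely as an illustration.

```python
import torchvision.transforms as T

# Illustrative enhancement pipeline; the disclosure does not enumerate
# which operations are applied.
enhance = T.Compose([
    T.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    T.RandomHorizontalFlip(p=0.5),
])
```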
According to another aspect of the embodiments of the present disclosure, there is also provided an electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed, enable the at least one processor to perform the image processing method of any one of the above embodiments.
According to another aspect of the embodiments of the present disclosure, there is also provided a computer-readable storage medium comprising a stored program, wherein, when the program runs, the device on which the storage medium resides is controlled to execute the image processing method of any one of the above embodiments.
According to yet another aspect of the embodiments of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the image processing method of any one of the above embodiments.
In the technical solution of the present disclosure, the acquisition, storage, and use of the personal information involved likewise comply with the relevant laws and regulations and do not violate public order and good morals.
It should be noted that the above modules may be implemented by software or by hardware; in the latter case, this may be achieved in, but is not limited to, the following ways: the modules are all located in the same processor, or the modules are located in different processors in any combination.
According to an embodiment of the present disclosure, there is also provided an electronic device including a memory having stored therein computer instructions and at least one processor configured to execute the computer instructions to perform the steps in any of the above method embodiments.
Optionally, the electronic device may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
Alternatively, in the present disclosure, the processor may be configured, by means of a computer program, to execute the following steps:
S1, acquiring a target face image, wherein the target face image at least comprises a target part;
S2, processing the target part in the target face image by using the target neural network to generate a target processing result;
wherein the target neural network is obtained through machine training using a plurality of groups of images, each group comprising at least one of the following: a first type sample image and a second type sample image, wherein the first type sample image is composed of a first source sample image and a target sample image obtained by performing optimization processing on the first source sample image, and the second type sample image is composed of the first source sample image and a target sample image obtained by performing optimization processing on the second source sample image.
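Steps S1 and S2 can be read as a plain inference pass; the sketch below assumes the trained target network is a torch.nn.Module and the acquired face image is a (C, H, W) tensor, both assumptions made for illustration.

```python
import torch


def process_face(target_network: torch.nn.Module,
                 face_image: torch.Tensor) -> torch.Tensor:
    # S1: the acquired target face image arrives as a (C, H, W) tensor.
    # S2: run the trained target network to obtain the processing result.
    target_network.eval()
    with torch.no_grad():
        return target_network(face_image.unsqueeze(0))  # add batch dim
```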
Optionally, for specific examples in this embodiment, reference may be made to the examples described in the above embodiments and optional implementations, which are not repeated here.
According to an embodiment of the present disclosure, there is also provided a non-transitory computer-readable storage medium having computer instructions stored therein, wherein the computer instructions are configured to perform the steps in any of the above method embodiments when executed.
Alternatively, in this embodiment, the above non-transitory storage medium may be configured to store a computer program for executing the following steps:
S1, acquiring a target face image, wherein the target face image at least comprises a target part;
S2, processing the target part in the target face image by using the target neural network to generate a target processing result;
wherein the target neural network is obtained through machine training using a plurality of groups of images, each group comprising at least one of the following: a first type sample image and a second type sample image, wherein the first type sample image is composed of a first source sample image and a target sample image obtained by performing optimization processing on the first source sample image, and the second type sample image is composed of the first source sample image and a target sample image obtained by performing optimization processing on the second source sample image.
Alternatively, in the present embodiment, the non-transitory computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
According to an embodiment of the present disclosure, the present disclosure also provides a computer program product. Program code for implementing the image processing methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowcharts and/or block diagrams to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the above embodiments of the present disclosure, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the embodiments provided in the present disclosure, it should be understood that the disclosed technology can be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division of the units may be a logical functional division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through interfaces, units, or modules, and may be electrical or take other forms.
The units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed across a plurality of units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present disclosure may be embodied as a software product stored in a storage medium and including several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods of the embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
The foregoing is merely preferred embodiments of the present disclosure. It should be noted that those skilled in the art may make improvements and modifications without departing from the principles of the present disclosure, and such improvements and modifications shall also fall within the protection scope of the present disclosure.

Claims (19)

1. An image processing method, comprising:
acquiring a target face image, wherein the target face image at least comprises a target part;
processing a target part in the target face image by using a target neural network to generate a target processing result;
wherein the target neural network is obtained through machine training using a plurality of groups of images, each group comprising at least one of the following: a first type sample image and a second type sample image, wherein the first type sample image is composed of a first source sample image and a target sample image obtained by performing optimization processing on the first source sample image, and the second type sample image is composed of the first source sample image and a target sample image obtained by performing optimization processing on the second source sample image.
2. The method of claim 1, wherein processing the target portion in the target face image using a target neural network to generate a target processing result comprises:
performing attention processing on the target face image by using a target neural network to obtain an attention image, wherein the weight value of an area where a target part is located in the attention image is greater than a preset value;
and processing a target part in the target face image based on the attention image to generate the target processing result.
3. The method of claim 2, wherein the method further comprises:
training a plurality of preset neural networks by using the first type of sample images to obtain training results;
screening the plurality of preset neural networks based on the training result to obtain an initial neural network;
and training the initial neural network based on the first type of sample image and the second type of sample image to obtain the target neural network.
4. The method of claim 3, wherein training the initial neural network based on the first type of sample image and the second type of sample image to obtain the target neural network comprises:
performing attention processing on the first type sample image and the second type sample image by using the initial neural network to obtain a plurality of groups of attention training images, wherein target parts in the first type sample image and the second type sample image comprise preset detection frames, and target parts in the plurality of groups of attention training images comprise training detection frames;
generating a first loss function based on the preset detection frame and the training detection frame;
and adjusting the model parameters of the initial neural network based on the first loss function to obtain the target neural network.
5. The method of claim 4, wherein after generating a first loss function based on the preset detection box and the training detection box, the method further comprises:
processing the first type sample image and the second type sample image based on the plurality of groups of attention training images, and outputting a target training image;
generating a second loss function based on the first type of sample image, the second type of sample image and the target training image;
and adjusting the model parameters of the initial neural network based on the first loss function and the second loss function to obtain the target neural network.
6. The method of claim 5, wherein the method further comprises:
acquiring the target sample image;
performing triangulation processing on the target sample image to obtain at least one first source sample image;
and matching the target sample image and the at least one first source sample image to obtain the first type sample image.
7. The method of claim 6, wherein pairing the target sample image and the at least one first source sample image to obtain the first type of sample image comprises:
pairing the target sample image and the at least one first source sample image to obtain an initial training image;
and performing enhancement processing on the initial training image to obtain the first type sample image.
8. The method of claim 7, wherein after performing triangulation processing on the target sample image to obtain at least one first source sample image, the method further comprises:
performing enhancement processing on the at least one first source sample image to obtain at least one third source sample image;
and matching the target sample image and the at least one third source sample image to obtain the first type sample image.
9. An image processing apparatus, comprising:
the system comprises an acquisition module, a processing module and a display module, wherein the acquisition module is used for acquiring a target face image, and the target face image at least comprises a target part;
the processing module is used for processing a target part in the target face image by using a target neural network to generate a target processing result;
the target neural network is obtained by machine training by using a plurality of groups of images, and each group of images comprises at least one of the following images: the image processing method comprises a first type sample image and a second type sample image, wherein the first type sample image is composed of a first source sample image and a target sample image which is subjected to optimization processing on the first source sample image, and the second type sample image is composed of the first source sample image and the target sample image which is subjected to optimization processing on the second source sample image.
10. The apparatus of claim 9, wherein the processing module comprises:
the first attention processing unit is used for carrying out attention processing on the target face image by using a target neural network to obtain an attention image, wherein the weight value of an area where a target part is located in the attention image is larger than a preset value;
and a target part processing unit, configured to process a target part in the target face image based on the attention image and generate a target processing result.
11. The apparatus of claim 10, wherein the apparatus further comprises:
the first training module is used for training a plurality of preset neural networks by using the first type of sample images to obtain training results;
the screening module is used for screening the plurality of preset neural networks based on the training result to obtain an initial neural network;
and the second training module is used for training the initial neural network based on the first type of sample image and the second type of sample image to obtain the target neural network.
12. The apparatus of claim 11, wherein the second training module comprises:
a second attention processing unit, configured to perform attention processing on the first type sample image and the second type sample image by using the initial neural network to obtain a plurality of sets of attention training images, where target portions in the first type sample image and the second type sample image include a preset detection frame, and target portions in the plurality of sets of attention training images include a training detection frame;
a first loss function generating unit, configured to generate a first loss function based on the preset detection frame and the training detection frame;
and the first adjusting unit is used for adjusting the model parameters of the initial neural network based on the first loss function to obtain the target neural network.
13. The apparatus of claim 12, wherein the second training module further comprises:
the processing unit is used for processing the first type sample image and the second type sample image based on the plurality of groups of attention training images and outputting a target training image;
a second loss function generating unit, configured to generate a second loss function based on the first type sample image, the second type sample image, and the target training image;
and the second adjusting unit is used for adjusting the model parameters of the initial neural network based on the first loss function and the second loss function to obtain the target neural network.
14. The apparatus of claim 13, wherein the apparatus further comprises:
the acquisition module is also used for acquiring the target sample image;
the processing module is further used for carrying out triangulation processing on the target sample image to obtain at least one first source sample image;
and the pairing module is used for pairing the target sample image and the at least one first source sample image to obtain the first type of sample image.
15. The apparatus of claim 14, wherein the pairing module comprises:
the matching unit is used for matching the target sample image and the at least one first source sample image to obtain an initial training image;
and the enhancement processing unit is used for enhancing the initial training image to obtain the first type sample image.
16. The apparatus of claim 15, wherein the apparatus further comprises:
the enhancement processing module is used for enhancing the at least one first source sample image to obtain at least one third source sample image;
the pairing module is further used for performing pairing processing on the target sample image and the at least one third source sample image to obtain the first type sample image.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
18. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-8.
19. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-8.
CN202210025401.7A 2022-01-11 2022-01-11 Image processing method, image processing apparatus, electronic device, and storage medium Pending CN114387649A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210025401.7A CN114387649A (en) 2022-01-11 2022-01-11 Image processing method, image processing apparatus, electronic device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210025401.7A CN114387649A (en) 2022-01-11 2022-01-11 Image processing method, image processing apparatus, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
CN114387649A true CN114387649A (en) 2022-04-22

Family

ID=81200489

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210025401.7A Pending CN114387649A (en) 2022-01-11 2022-01-11 Image processing method, image processing apparatus, electronic device, and storage medium

Country Status (1)

Country Link
CN (1) CN114387649A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018121690A1 (en) * 2016-12-29 2018-07-05 北京市商汤科技开发有限公司 Object attribute detection method and device, neural network training method and device, and regional detection method and device
CN109409222A (en) * 2018-09-20 2019-03-01 中国地质大学(武汉) A kind of multi-angle of view facial expression recognizing method based on mobile terminal
CN109978754A (en) * 2017-12-28 2019-07-05 广东欧珀移动通信有限公司 Image processing method, device, storage medium and electronic equipment
CN112232117A (en) * 2020-09-08 2021-01-15 深圳微步信息股份有限公司 Face recognition method, face recognition device and storage medium
US20210097646A1 (en) * 2019-09-26 2021-04-01 Lg Electronics Inc. Method and apparatus for enhancing video frame resolution
US20210142111A1 (en) * 2019-04-16 2021-05-13 One Connect Smart Technology Co., Ltd. Method and device of establishing person image attribute model, computer device and storage medium
CN113378696A (en) * 2021-06-08 2021-09-10 北京百度网讯科技有限公司 Image processing method, device, equipment and storage medium
WO2021179852A1 (en) * 2020-03-13 2021-09-16 Oppo广东移动通信有限公司 Image detection method, model training method, apparatus, device, and storage medium
CN113435408A (en) * 2021-07-21 2021-09-24 北京百度网讯科技有限公司 Face living body detection method and device, electronic equipment and storage medium
EP3910553A2 (en) * 2021-01-21 2021-11-17 Beijing Baidu Netcom Science And Technology Co., Ltd. Image processing method, training method for a neural network, device and medium


Similar Documents

Publication Publication Date Title
CN113327278B (en) Three-dimensional face reconstruction method, device, equipment and storage medium
CN111028142B (en) Image processing method, device and storage medium
CN111696029B (en) Virtual image video generation method, device, computer equipment and storage medium
CN111047509B (en) Image special effect processing method, device and terminal
KR20190105745A (en) Electronic apparatus and control method thereof
CN110310247A (en) Image processing method, device, terminal and computer readable storage medium
EP3910507A1 (en) Method and apparatus for waking up screen
CN108665408A (en) Method for regulating skin color, device and electronic equipment
CN111243051B (en) Portrait photo-based simple drawing generation method, system and storage medium
CN113554726B (en) Image reconstruction method and device based on pulse array, storage medium and terminal
US11670031B2 (en) System and method for automatically generating an avatar with pronounced features
CN110378203A (en) Image processing method, device, terminal and storage medium
CN105979283A (en) Video transcoding method and device
CN112884889B (en) Model training method, model training device, human head reconstruction method, human head reconstruction device, human head reconstruction equipment and storage medium
US20190371039A1 (en) Method and smart terminal for switching expression of smart terminal
CN111932442B (en) Video beautifying method, device and equipment based on face recognition technology and computer readable storage medium
CN111489289B (en) Image processing method, image processing device and terminal equipment
CN112489144A (en) Image processing method, image processing apparatus, terminal device, and storage medium
CN114387649A (en) Image processing method, image processing apparatus, electronic device, and storage medium
CN107197353A (en) The processing method of the occupy-place figure of interface images
CN116681614A (en) Video processing method and device based on enhanced image definition and electronic equipment
CN112002017A (en) Virtual scene generation method, device, terminal, medium and virtual training system
EP4002280A1 (en) Method and apparatus for generating image
CN110163194A (en) A kind of image processing method, device and storage medium
CN116129534A (en) Image living body detection method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination