CN112581359A - Image processing method, device, terminal and storage medium - Google Patents

Image processing method, device, terminal and storage medium

Info

Publication number
CN112581359A
Authority
CN
China
Prior art keywords
image
area
region
determining
areas
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011536818.7A
Other languages
Chinese (zh)
Other versions
CN112581359B (en)
Inventor
邹子杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Oppo Chongqing Intelligent Technology Co Ltd
Original Assignee
Oppo Chongqing Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oppo Chongqing Intelligent Technology Co Ltd filed Critical Oppo Chongqing Intelligent Technology Co Ltd
Priority to CN202011536818.7A priority Critical patent/CN112581359B/en
Publication of CN112581359A publication Critical patent/CN112581359A/en
Application granted granted Critical
Publication of CN112581359B publication Critical patent/CN112581359B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/04Context-preserving transformations, e.g. by using an importance map
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/77Retouching; Inpainting; Scratch removal
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/13Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The application relates to an image processing method, an image processing device, a terminal and a storage medium, and belongs to the technical field of image processing. The method comprises the following steps: determining a plurality of image areas from a first image to be processed; for each image area, determining a plurality of salient feature areas from the image area; determining at least one target image area from the plurality of image areas based on the image area and the plurality of salient feature areas of the image area, the target image area being an image area of the face area that includes a facial flaw; determining a contour area of the facial flaw from the target image area; and performing image processing on the facial flaw in the contour area in the first image to obtain a second image. By means of this scheme, the face area is divided into areas and the contour area of the facial flaw is determined from the target image area containing the facial flaw, so that image processing is performed only on the facial flaw within the contour area, which improves the pertinence of the image processing and optimizes the beautifying effect.

Description

Image processing method, device, terminal and storage medium
Technical Field
The embodiment of the application relates to the technical field of image processing, in particular to an image processing method, an image processing device, a terminal and a storage medium.
Background
With the development of terminal technology, users increasingly demand that the face in a captured image be beautified. For example, a user may desire fewer blemishes on the face. Therefore, flaw areas in a human face, such as spots, acne, and moles, need to be processed.
Disclosure of Invention
The embodiment of the application provides an image processing method, an image processing device, a terminal and a storage medium, which can improve the accuracy of image processing. The technical scheme is as follows:
in one aspect, an image processing method is provided, and the method includes:
determining a plurality of image areas from a first image to be processed, wherein the image areas are local areas in a face area of the first image;
for each image area, determining a plurality of salient feature areas from the image areas, wherein the salient feature areas are the image areas with image feature difference values between image features and surrounding image areas exceeding a preset threshold value;
determining at least one target image region from the plurality of image regions based on the image region and the plurality of salient feature regions of the image region, the target image region being an image region of the face region that includes a facial flaw;
determining a contour region of a facial flaw from the target image region;
and performing image processing on the facial flaw in the contour region in the first image to obtain a second image.
In another aspect, there is provided an image processing apparatus, the apparatus including:
the device comprises a first determining module, a second determining module and a processing module, wherein the first determining module is used for determining a plurality of image areas from a first image to be processed, and the image areas are local areas in a face area of the first image;
the second determination module is used for determining a plurality of salient feature areas from the image areas for each image area, wherein the salient feature areas are image areas with image feature difference values of image features and surrounding image areas exceeding a preset threshold value;
a third determining module, configured to determine at least one target image region from the plurality of image regions based on the image region and the plurality of salient feature regions of the image region, where the target image region is an image region including a facial flaw in the face region;
a fourth determination module for determining a contour region of a facial flaw from the target image region;
and the processing module is used for performing image processing on the facial flaw in the contour region in the first image to obtain a second image.
In another aspect, a terminal is provided that includes a processor and a memory; the memory stores at least one instruction for execution by the processor to implement the image processing method as described in the above aspect.
In another aspect, a computer-readable storage medium is provided, the storage medium storing at least one instruction for execution by a processor to implement the image processing method of the above aspect.
In the embodiment of the application, the area of the face area is divided, and the outline area of the face defect is determined from the target image area with the face defect, so that the image processing is performed on the face defect in the outline area, the pertinence of the image processing is improved, and the beautifying effect is optimized.
Drawings
Fig. 1 illustrates a schematic structural diagram of a terminal provided in an exemplary embodiment of the present application;
FIG. 2 illustrates a flow chart of an image processing method shown in an exemplary embodiment of the present application;
FIG. 3 illustrates a schematic diagram of an image region shown in an exemplary embodiment of the present application;
FIG. 4 illustrates a flow chart of an image processing method shown in an exemplary embodiment of the present application;
FIG. 5 shows a schematic structural diagram of a DeepLabV3+ shown in an exemplary embodiment of the present application;
FIG. 6 illustrates a schematic diagram of a spatial pyramid pooling structure of an image segmentation model shown in an exemplary embodiment of the present application;
FIG. 7 is a diagram illustrating a codec structure of an image segmentation model according to an exemplary embodiment of the present application;
FIG. 8 is a diagram illustrating a codec structure incorporating a spatial pyramid pooling structure according to an exemplary embodiment of the present application;
FIG. 9 is a diagram illustrating parameters of an Xception architecture in accordance with an illustrative embodiment of the present application;
FIG. 10 illustrates a flow chart of a method of training an image segmentation model in accordance with an exemplary embodiment of the present application;
FIG. 11 illustrates a schematic diagram of a salient image region, shown in an exemplary embodiment of the present application;
FIG. 12 illustrates a flow chart of an image processing method shown in an exemplary embodiment of the present application;
FIG. 13 illustrates a flow chart of a training method for an information content determining network, shown in an exemplary embodiment of the present application;
FIG. 14 illustrates a schematic diagram of a feature extraction model shown in an exemplary embodiment of the present application;
FIG. 15 illustrates a schematic diagram of a feature extraction model shown in an exemplary embodiment of the present application;
FIG. 16 illustrates a schematic diagram of a feature extraction model shown in an exemplary embodiment of the present application;
FIG. 17 illustrates a schematic diagram of a feature extraction model shown in an exemplary embodiment of the present application;
FIG. 18 illustrates a schematic diagram of a feature extraction model shown in an exemplary embodiment of the present application;
FIG. 19 illustrates a schematic view of a contour region shown in an exemplary embodiment of the present application;
fig. 20 is a block diagram showing a configuration of an image processing apparatus according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Reference herein to "a plurality" means two or more. "And/or" describes the association relationship of the associated objects, meaning that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
Referring to fig. 1, a block diagram of a terminal 100 according to an exemplary embodiment of the present application is shown. In some embodiments, the terminal 100 is a smartphone, tablet, wearable device, camera, or the like having image processing functionality. The terminal 100 in the present application includes at least one or more of the following components: processor 110, memory 120, image collector 130.
In some embodiments, processor 110 includes one or more processing cores. The processor 110 connects various parts within the entire terminal 100 using various interfaces and lines, performs various functions of the terminal 100 and processes data by running or executing program codes stored in the memory 120 and calling data stored in the memory 120. In some embodiments, the processor 110 is implemented in hardware using at least one of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor 110 may integrate one or more of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a Neural-Network Processing Unit (NPU), a modem, and the like. Wherein, the CPU mainly processes an operating system, a user interface, an application program and the like; the GPU is used for rendering and drawing the content required to be displayed by the display screen; the NPU is used for realizing an Artificial Intelligence (AI) function; the modem is used to handle wireless communications. It is understood that the modem may not be integrated into the processor 110, but may be implemented by a single chip.
In some embodiments, the processor 110 is configured to perform analysis processing on the image captured by the image collector 130, for example, performing image segmentation, feature extraction, feature fusion, or brightness adjustment.
In some embodiments, Memory 120 comprises Random Access Memory (RAM), and in some embodiments, Memory 120 comprises Read-Only Memory (ROM). In some embodiments, the memory 120 includes a non-transitory computer-readable medium. The memory 120 may be used to store program code. The memory 120 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing various method embodiments described below, and the like; the storage data area may store data (such as audio data, a phonebook) created according to the use of the terminal 100, and the like.
In some embodiments, the memory 120 stores model parameters of a model of an image used for image segmentation, model parameters of a model used for feature extraction, and the like. In some embodiments, the memory 120 also stores images captured by the image collector 130, and the like.
The image collector 130 is a component for collecting images. In some embodiments, the image collector 130 is integrated in the terminal, for example a camera mounted on the terminal 100. In other embodiments, the image collector 130 is an image collection device connected to the terminal, for example an external camera connected to the terminal 100.
In some embodiments, a display screen is also included in terminal 100. The display screen is a display component for displaying a user interface. In some embodiments, the display screen has a touch function, and a user can perform touch operations on it with a finger, a stylus, or any other suitable object. In some embodiments, the display screen is provided on the front panel of the terminal 100. In some embodiments, the display screen is designed as a full screen, a curved screen, an irregularly shaped screen, a double-sided screen, or a foldable screen. In some embodiments, the display screen is designed as a combination of these, such as a full screen combined with a curved screen or an irregularly shaped screen combined with a curved screen, which is not limited by the embodiment.
In addition, those skilled in the art will appreciate that the configuration of terminal 100 illustrated in the above-described figures is not intended to be limiting of terminal 100, as terminal 100 may include more or less components than those illustrated, or some components may be combined, or a different arrangement of components. For example, the terminal 100 further includes a microphone, a speaker, a radio frequency circuit, an input unit, a sensor, an audio circuit, a Wireless Fidelity (Wi-Fi) module, a power supply, a bluetooth module, and other components, which are not described herein again.
With the development of terminal technology, users increasingly demand that the face in a captured image be beautified. For example, a user may desire fewer blemishes on the face. Therefore, flaw areas in a human face, such as spots, acne, and moles, need to be processed. At present, flaws in a human face are generally handled by applying treatments such as skin smoothing to the entire face region.
In the related art, in order to treat flaw regions such as spots, acne, and moles in partial regions of a face, skin beautifying operations such as skin smoothing are often performed on the whole face region, so the beautifying process is not targeted and the beautifying effect of the image is poor.
In the embodiment of the present application, the area of the face area is divided, and the contour area of the face defect is determined from the target image area having the face defect, so that the image processing is performed on the face defect in the contour area, the pertinence of the image processing is improved, and the beauty effect is optimized.
Referring to fig. 2, a flowchart of an image processing method according to an exemplary embodiment of the present application is shown. The executing agent in the embodiment of the present application may be the terminal 100, the processor 110 in the terminal 100, or the operating system in the terminal 100; in this embodiment, the terminal 100 is taken as the executing agent by way of example. The method comprises the following steps:
step 201: the terminal determines a plurality of image areas from a first image to be processed.
Wherein the image region is a local region in the face region of the first image. The first image is an image acquired by the terminal, or a data stream displayed in an image display frame in the process of acquiring the image by the terminal. In the embodiments of the present application, this is not particularly limited.
The terminal determines a plurality of image regions in the first image through an anchor frame mechanism. The process is realized by the following steps (1) to (2), and comprises the following steps:
(1) the terminal determines a plurality of target positions from the first image.
The target positions are preset positions, or the target positions are randomly selected positions. The number of target positions is set as needed and is not specifically limited in the embodiments of the present application. For example, in some embodiments, the plurality of target positions are the intersection points of the dividing lines obtained when the first image is evenly divided into a nine-square (3 × 3) grid based on the size of the first image.
(2) For each target position, the terminal determines a plurality of image areas corresponding to the target position based on a plurality of preset selection ranges.
In the anchor frame mechanism, the terminal determines, at each target position, a plurality of anchor frames based on different aspect ratios (Aspect ratio) and sizes (scale). The terminal determines the regions within these anchor frames as the plurality of image regions of the first image. The aspect ratios and sizes of the anchor frames are set as needed and are not specifically limited in the embodiments of the present application.
For example, referring to fig. 3, for the first image, three anchor frames are generated at each of target position 1 and target position 2 based on different aspect ratios and sizes, resulting in 6 anchor frames: anchor = {A, A1, A2, B, B1, B2}. A minimal generation sketch is given below.
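The anchor frame mechanism described above can be illustrated with a minimal sketch. The specific scales, aspect ratios, and target positions below are illustrative assumptions; the patent leaves them to be set as needed.

```python
import numpy as np

def generate_anchors(target_positions, scales=(32, 64), aspect_ratios=(0.5, 1.0, 2.0)):
    """Generate anchor boxes (x1, y1, x2, y2) centred on each target position.

    The scales and aspect ratios here are illustrative only.
    """
    anchors = []
    for (cx, cy) in target_positions:
        for s in scales:
            for r in aspect_ratios:
                w = s * np.sqrt(r)      # width grows with the aspect ratio
                h = s / np.sqrt(r)      # height shrinks correspondingly
                anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return np.array(anchors)

# Two target positions, e.g. grid intersections of a 3 x 3 split of the image.
boxes = generate_anchors([(120, 160), (240, 160)])
print(boxes.shape)  # (12, 4): 2 positions x 2 scales x 3 aspect ratios
```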
The terminal determines the plurality of image regions directly in the first image after acquiring the first image. Alternatively, the terminal performs face detection on the first image after acquiring it, performs image segmentation on the first image when the first image includes a face region, and determines the plurality of image regions in the segmented regions on which image processing can be performed. Accordingly, prior to this step, the terminal performs image segmentation on the first image. Referring to fig. 4, the terminal performs face detection on the first image, and does not apply the present scheme when no face region is detected. When a face region is detected, the first image is segmented to obtain a background region, a first face region, and a second face region. The process is as follows: in response to the presence of a face region in the first image, the terminal performs image segmentation on the face region to obtain a first face region and a second face region, the first face region being an image processing region and the second face region being a region that does not require image processing. For example, the first face region is the skin region of the human face, and the second face region is the beard region of the human face, the region where glasses are located, and the like.
In some embodiments, the terminal performs image segmentation on the first image through an image segmentation model; the image segmentation model is chosen as needed and is not specifically limited in this embodiment. For example, the image segmentation model is DeepLabV3+ (an image segmentation model). The DeepLabV3+ model adopts an encoding-decoding (Encoder-Decoder) structure. Referring to fig. 5, fig. 5 illustrates the structure of a DeepLabV3+ model according to an exemplary embodiment; the model includes an encoder and a decoder. The image segmentation model also fuses a spatial pyramid pooling structure, which pools features at different resolutions to capture rich contextual information. Referring to fig. 6, fig. 6 illustrates the spatial pyramid pooling structure of an image segmentation model according to an exemplary embodiment. The encoding-decoding structure is used to recover boundary regions whose features differ significantly from those of other regions in the image. Referring to fig. 7, fig. 7 illustrates the encoding-decoding structure of an image segmentation model according to an exemplary embodiment. In the embodiment of the application, the spatial pyramid pooling structure and the encoding-decoding structure are fused to obtain the image segmentation structure; referring to fig. 8, fig. 8 shows the encoding-decoding structure fused with the spatial pyramid pooling structure. The model up-samples the encoder features by a factor of 4, merges them with the corresponding low-level features, and up-samples again by a factor of 4 to restore the original image size.
More specifically, in the encoder, the stride of the last one or two blocks of the residual network is modified so that the output stride (the ratio of the input image resolution to the output feature resolution) is 16 or 8. The improved spatial pyramid pooling structure is then applied after block4, and the resulting feature maps are combined by a 1 × 1 convolution to obtain a 256-channel feature map. In the decoder, this feature map is first up-sampled by a factor of 4 and then concatenated with the low-level features of the corresponding resolution from the encoder. Before concatenation, the number of channels of the low-level feature map is reduced by a 1 × 1 convolution; this prevents the semantic information from being diluted, because the low-level feature map usually has many channels (256 or 512) while the semantically rich feature map from the encoder has only 256 channels. After concatenation, the features are refined by a 3 × 3 convolution and finally up-sampled by a factor of 4 to restore the original image size. In addition, in the embodiments of the present application, the type of the encoder is not specifically limited. For example, the encoder adopts the Xception architecture, and depthwise separable convolutions (depthwise separable convolution) are applied in the pooling structure and the decoder to improve the speed and accuracy of the segmentation network. The network parameters of the image segmentation model are adjusted as needed; referring to fig. 9, fig. 9 is a parameter diagram of an Xception architecture according to an exemplary embodiment. The Xception architecture includes three flows, namely an input flow (entry flow), an intermediate flow (middle flow), and an output flow (exit flow); the segmentation model includes a plurality of convolutional layers, and the size of the convolution kernel of each convolutional layer is set as needed, for example, 3 × 3 in each convolutional layer. A minimal sketch of the decoder fusion step is given below.
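A minimal PyTorch sketch of the decoder fusion step just described (1 × 1 channel reduction of the low-level features, 4× up-sampling of the 256-channel encoder output, concatenation, 3 × 3 refinement, final 4× up-sampling). The channel counts (48 reduced low-level channels, 256 low-level input channels) and the number of classes are assumptions for illustration, not values fixed by the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeepLabV3PlusDecoder(nn.Module):
    """Decoder fusion: reduce low-level channels with a 1x1 conv, upsample the
    256-channel ASPP output by 4x, concatenate, refine with a 3x3 conv, then
    upsample by 4x to the input resolution."""

    def __init__(self, low_level_channels=256, num_classes=3):
        super().__init__()
        self.reduce = nn.Sequential(
            nn.Conv2d(low_level_channels, 48, kernel_size=1, bias=False),
            nn.BatchNorm2d(48), nn.ReLU(inplace=True))
        self.refine = nn.Sequential(
            nn.Conv2d(256 + 48, 256, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(256), nn.ReLU(inplace=True),
            nn.Conv2d(256, num_classes, kernel_size=1))

    def forward(self, aspp_out, low_level_feat, input_size):
        low = self.reduce(low_level_feat)                        # 1x1 conv, fewer channels
        x = F.interpolate(aspp_out, size=low.shape[2:],
                          mode='bilinear', align_corners=False)  # 4x upsample
        x = self.refine(torch.cat([x, low], dim=1))              # concatenate + 3x3 refine
        return F.interpolate(x, size=input_size,
                             mode='bilinear', align_corners=False)  # restore image size
```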
Correspondingly, before this step, the terminal needs to train the image segmentation model. Referring to fig. 10, the terminal inputs a sample image into the image segmentation model to be trained to obtain an image segmentation result, and adjusts the model parameters of the image segmentation model according to the image segmentation result, the segmentation result labeled on the sample image, and the loss function, until the loss value obtained from these is basically unchanged (i.e., the loss converges); model training is then determined to be complete and the image segmentation model is obtained. A minimal training-loop sketch is given below.
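A minimal sketch of the training procedure described above, assuming a cross-entropy loss, an Adam optimizer, and a simple "loss basically unchanged" stopping rule; the patent does not prescribe these choices.

```python
import torch
import torch.nn as nn

def train_segmentation_model(model, loader, epochs=50, lr=1e-4, tol=1e-4):
    """Train until the loss is basically unchanged between epochs.
    The optimiser, loss function and tolerance are illustrative assumptions."""
    criterion = nn.CrossEntropyLoss()
    optimiser = torch.optim.Adam(model.parameters(), lr=lr)
    prev_loss = float('inf')
    for epoch in range(epochs):
        epoch_loss = 0.0
        for images, masks in loader:              # masks: labelled segmentation result
            optimiser.zero_grad()
            loss = criterion(model(images), masks)
            loss.backward()
            optimiser.step()
            epoch_loss += loss.item()
        epoch_loss /= len(loader)
        if abs(prev_loss - epoch_loss) < tol:     # loss basically unchanged: stop training
            break
        prev_loss = epoch_loss
    return model
```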
After image segmentation of the first image, the terminal determines the plurality of image regions in the first face region. Correspondingly, the method includes: determining the plurality of image regions in the first face region. The terminal determines at least one target position in the first face region.
In this implementation, by determining the plurality of image regions in the first face region after image segmentation, content in non-skin regions is prevented from being mistakenly judged as a flaw region (for example, the position of a beard in the face or the region where glasses are located being determined as the region where a spot is located), which improves the accuracy of removing flaw regions from the image.
Step 202: for each image region, the terminal determines a plurality of salient feature regions from the image region.
The salient feature region is an image region of which the difference value of the image features and the image features of the surrounding image region exceeds a preset threshold value. The salient feature region is an image region where a defective region may exist.
And the terminal detects each image area through the information quantity determination network and determines a plurality of salient feature areas of each image area. The number of the plurality of salient feature regions is set as required, and in the embodiment of the present application, the number of the salient feature regions is not particularly limited. For example, the plurality of salient feature regions is 3, 4, or 5, etc. The process is realized by the following steps (a1) - (A3), including:
(A1) the terminal determines information confidences of a plurality of local regions in the image region.
Wherein the information confidence is used to represent a probability that the local region includes a facial flaw. In this step, the terminal determines a local image area of the image in each image area, determines whether a facial defect exists in the local image area, and obtains an information confidence of the local area.
In some embodiments, the terminal performs feature extraction on each local region, determines the difference between the image features and the image features of other local regions according to the extracted image features, and determines the information confidence of the local region. Wherein the information confidence is positively correlated with the image feature difference.
(A2) And the terminal determines the local area with the information confidence degree larger than a second preset threshold value from the plurality of local areas.
The second preset threshold is set as needed, and in the embodiment of the present application, the second preset threshold is not specifically limited.
In some embodiments, the terminal can also determine the local regions according to a preset number of salient image regions. Correspondingly, the terminal sorts the plurality of local regions by information confidence to obtain a sorting result, and selects the preset number of local regions with the highest information confidence from the sorting result.
(A3) And the terminal determines the local area with the confidence coefficient larger than the second preset threshold value as the salient image area.
Referring to fig. 11, three salient image regions are selected in the image region. In this implementation, the terminal determines a plurality of salient image regions from the local regions, which makes the processing of the image region more targeted; processing different salient regions of the same image region makes feature extraction of the image region more sufficient and improves the accuracy of image processing. A minimal selection sketch is given below.
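A minimal sketch of steps (A1)-(A3), assuming the information confidence is derived from the distance between a local region's feature vector and the mean feature of the other local regions; the patent only requires the confidence to be positively correlated with the feature difference, so the metric and squashing below are assumptions.

```python
import numpy as np

def select_salient_regions(local_features, boxes, threshold=0.5, top_k=3):
    """Score each local region by how far its feature vector lies from the mean
    of the others, map the distance to a confidence in [0, 1), and keep regions
    above the threshold (falling back to the top-k highest if none pass)."""
    feats = np.asarray(local_features)
    confidences = []
    for i, f in enumerate(feats):
        others = np.delete(feats, i, axis=0).mean(axis=0)
        diff = np.linalg.norm(f - others)            # feature difference to surroundings
        confidences.append(1.0 - np.exp(-diff))      # monotone map into [0, 1)
    confidences = np.array(confidences)
    keep = np.where(confidences > threshold)[0]      # (A2) confidence above threshold
    if len(keep) == 0:
        keep = np.argsort(confidences)[::-1][:top_k]  # preset number of salient regions
    return [boxes[i] for i in keep], confidences[keep]
```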
In some embodiments, the terminal determines a plurality of salient feature regions in the image region through the information amount determination network, and referring to fig. 12, the terminal inputs each image region into the information amount determination network and outputs the plurality of salient feature regions through the information amount determination network. Correspondingly, before this step, the information quantity determination network needs to be trained, referring to fig. 13, the training process is as follows:
(B1) the terminal determines sample data.
The sample data is a partial image area including a defective area in any image.
In some embodiments, the terminal acquires a sample image and crops the regions of the sample image in which flaw regions exist as the sample data.
(B2) And the terminal determines a network through the information amount to be trained and determines a plurality of predicted image areas in the sample data.
In this step, the terminal inputs the sample data into the information quantity determination network to be trained, and obtains a plurality of predicted image areas output by the information quantity determination network.
(B3) And the terminal respectively extracts the characteristics of the plurality of predicted image areas to obtain predicted image characteristics.
In some embodiments, the terminal sequentially inputs the plurality of predicted image regions to the feature extraction model, resulting in predicted image features of the plurality of predicted images. Alternatively, the terminal inputs the plurality of predicted images into a plurality of feature extraction models to obtain predicted image features for each predicted image. Wherein the plurality of feature extraction models share the same network parameters and structure.
The feature extraction model can be selected according to the terminal type. For example, when the terminal is a large-scale device such as a server, the feature extraction model is ResNet (a feature extraction model); when the terminal is a small device such as a mobile terminal, MobileNet (a feature extraction model) is adopted.
The feature extraction model is selected as needed. In some embodiments, the feature extraction model is an AlexNet network. Referring to fig. 14, in the entire AlexNet (a neural network) network, apart from the pooling layers and the Local Response Normalization (LRN) layers, there are 8 layers with trainable parameters: the first 5 are convolutional layers and the last 3 are fully connected layers. The last layer of the AlexNet network is a Softmax (a classification function) layer with 1000 output classes used for classification. The LRN layers appear after the 1st and 2nd convolutional layers, while max pooling layers appear after the two LRN layers and after the last convolutional layer. Each convolutional layer determines sampling regions in the feature map and performs convolution on these sampling regions to obtain the feature map that is input to the next convolutional layer.
In some embodiments, the feature extraction model is a VGGNet (Visual Geometry Group network). Referring to fig. 15, the VGGNet network constructs 16- to 19-layer deep convolutional neural networks by repeatedly stacking small 3 × 3 convolution kernels and 2 × 2 max pooling layers. VGGNet has a significantly lower error rate than previous network architectures.
In some embodiments, the feature extraction model is a GoogleNet (a neural network) network. Referring to fig. 16, the GoogleNet network further reduces the amount of computation by factorizing convolutions. For example, a 5 × 5 convolution layer can be replaced by two 3 × 3 convolution layers, and a 3 × 3 convolution layer can be replaced by a 1 × 3 convolution layer followed by a 3 × 1 convolution layer, which greatly reduces the amount of computation, as quantified in the short example below.
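A back-of-the-envelope count (an illustrative calculation, not from the patent) showing how much the factorizations above reduce the number of convolution weights, for an assumed channel count:

```python
# Weight counts for C input channels and C output channels (biases ignored).
C = 256  # illustrative channel count
five_by_five = 5 * 5 * C * C                 # one 5x5 layer
two_three_by_three = 2 * (3 * 3 * C * C)     # two stacked 3x3 layers
three_by_three = 3 * 3 * C * C               # one 3x3 layer
asym_pair = (1 * 3 + 3 * 1) * C * C          # a 1x3 layer followed by a 3x1 layer
print(two_three_by_three / five_by_five)     # 0.72  -> ~28% fewer weights than 5x5
print(asym_pair / three_by_three)            # ~0.67 -> ~33% fewer weights than 3x3
```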
In some embodiments, the feature extraction model is a ResNet (residual network). Referring to fig. 17, the ResNet network passes the input directly to the output through a residual structure, which protects the integrity of the information; the network only needs to learn the residual between the input and the output, which simplifies the learning objective and reduces the learning difficulty. This prevents the information loss and degradation that occur, to a greater or lesser extent, when information is passed through traditional convolutional or fully connected layers.
In some embodiments, the feature extraction model is an SE-Net (Squeeze-and-Excitation Network), see fig. 18, which introduces an attention mechanism. The SE-Net network learns feature weights from the loss, so that effective feature maps receive large weights while ineffective or weakly effective feature maps receive small weights; trained in this way, the model achieves better results and the classification accuracy is improved. A sketch of the squeeze-and-excitation block is given below.
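A minimal PyTorch sketch of the squeeze-and-excitation block described above; the reduction ratio of 16 is the commonly used value and is an assumption here.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: global average pooling ('squeeze'), a small
    bottleneck MLP, and a sigmoid gate that reweights each channel ('excitation')."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)
        self.excite = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.squeeze(x).view(b, c)        # per-channel statistics
        w = self.excite(w).view(b, c, 1, 1)   # learned channel weights in (0, 1)
        return x * w                          # emphasise informative feature maps
```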
(B4) The terminal adjusts the parameters of the information amount determination network to be trained through a supervision network, until the confidence of the image regions predicted by the information amount determination network, as evaluated by the supervision network, reaches a third preset threshold; model training is then complete and the information amount determination network is obtained.
In this step, the supervision network determines, according to the features of each predicted image region, the loss value of the predicted image regions selected by the information amount determination network, and adjusts the network parameters of the information amount determination network based on the loss value, until the confidence of the image regions determined by the information amount determination network exceeds the third preset threshold; model training is then complete and the information amount determination network is obtained.
In the implementation mode, the local candidate frame is selected through the information quantity determination network, the image in the local candidate frame is subjected to feature extraction, the optimal local image area, namely the salient image area, in each image area is determined, and the image processing accuracy is improved.
Step 203: the terminal determines at least one target image area from the plurality of image areas based on the image area and a plurality of display feature areas of the image area.
Wherein the target image area is an image area including a facial flaw in the face area.
In this step, the terminal traverses the plurality of image regions, obtains a probability that each image region corresponds to a defective region, and selects at least one target image region including a facial defect based on the probability. The process is realized by the following steps (1) to (2), and comprises the following steps:
(1) The terminal performs feature extraction on the image region and on the plurality of salient image regions within the image region, respectively, to obtain a plurality of image features.
In this step, the terminal inputs the image region and the plurality of salient image regions within the image region into feature extraction models, respectively. For example, referring to fig. 11, the image region includes three salient image regions; the terminal inputs the image region into a first feature extraction model and the three salient image regions into a second, a third, and a fourth feature extraction model, respectively.
It should be noted that the first, second, third, and fourth feature extraction models are network models sharing the same network parameters and structure. In the embodiment of the application, the terminal can thus extract the image features of the image region and of the plurality of salient image regions through what is effectively the same feature extraction model, which reduces the storage occupied by the model structure and parameters, ensures that the output image features are of the same type, and facilitates the subsequent concatenation of the image features.
(2) The terminal determines the at least one target image region from the plurality of image regions based on the plurality of image features of each image region.
In the step, the terminal performs feature splicing on the obtained multiple image features, performs type prediction of a facial defect region on the image region based on the fusion features obtained by splicing, and determines at least one target image region with higher probability from a prediction result. The process is realized by the following steps (2-1) - (2-3), and comprises the following steps:
and (2-1) the terminal performs feature fusion on the plurality of image features of the image area to obtain a fused image feature.
In this step, the terminal concatenates the image features of the image region to obtain a fused image feature. For example, if the plurality of image features are all one-dimensional vector features, the terminal concatenates the one-dimensional vectors end to end to obtain the fused image feature.
(2-2) The terminal classifies the image region according to the fused image feature to obtain the image category of the image region, and determines the probability that the image region belongs to that image category.
In some embodiments, the terminal classifies the image region according to the fused image feature, obtains the image categories corresponding to the facial flaws included in the image region and the probability of each category, and outputs the image category corresponding to the facial flaw with the highest probability together with that probability.
And (2-3) the terminal determining at least one target image area, of the plurality of image areas, with the probability of the image class being greater than a first preset threshold.
When the probability of an image region is smaller than the first preset threshold, the image region is discarded; if the probability is greater than the first preset threshold, the image region is determined to be a target image region. A minimal sketch of these steps is given after this paragraph.
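A minimal sketch of steps (2-1)-(2-3), assuming one-dimensional feature vectors, a single linear classifier, and illustrative flaw categories; none of these specifics are fixed by the patent.

```python
import torch
import torch.nn as nn

class DefectClassifier(nn.Module):
    """Fuse the image-region feature with its salient-region features by
    end-to-end concatenation, classify the facial-flaw category, and keep the
    region only if the top probability exceeds a threshold."""

    def __init__(self, feat_dim=256, num_salient=3,
                 categories=('spot', 'acne', 'mole', 'none')):
        super().__init__()
        self.categories = categories
        self.classifier = nn.Linear(feat_dim * (1 + num_salient), len(categories))

    def forward(self, region_feat, salient_feats, threshold=0.5):
        fused = torch.cat([region_feat] + list(salient_feats), dim=-1)  # (2-1) fuse
        probs = torch.softmax(self.classifier(fused), dim=-1)           # (2-2) classify
        p, idx = probs.max(dim=-1)
        category = self.categories[idx.item()]
        if p.item() > threshold and category != 'none':                 # (2-3) keep target
            return category, p.item()
        return None, p.item()
```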
In this implementation, a plurality of salient image regions are determined within an image region, and feature extraction is performed on the image region and on its salient image regions, respectively. On the basis of the whole image region, several salient image regions carrying a large amount of information are used to determine whether a facial flaw exists in the image region and the image category of the facial flaw, which improves the accuracy of facial-flaw determination. Adding the salient image regions with a large amount of information also makes the fused image feature more specific, so that the terminal can determine the image category corresponding to the facial flaw from the fused image feature and process facial flaws of different categories separately.
Step 204: the terminal determines a contour region of the facial blemish from the target image region.
In this step, the terminal performs edge detection on each target image region, and determines the detection result as a contour region of the facial defect. The process is as follows: and the terminal carries out edge detection in the target image area to obtain a facial flaw outline area.
The terminal determines the contour region by any edge detection algorithm; see, for example, fig. 19. The terminal extracts edge feature points in the target image region by a Difference of Gaussians (DoG) algorithm and determines the line formed by connecting the extracted edge feature points as the contour region. A minimal sketch is given below.
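A minimal OpenCV sketch of the contour determination described above, assuming an 8-bit grayscale target image region, a Difference-of-Gaussians edge response, Otsu binarisation, and external-contour extraction; the binarisation step and the sigma values are assumptions.

```python
import cv2
import numpy as np

def facial_flaw_contour(region_gray, sigma1=1.0, sigma2=2.0):
    """Difference-of-Gaussians edge response followed by contour extraction.
    Expects an 8-bit single-channel image of the target image region."""
    g1 = cv2.GaussianBlur(region_gray, (0, 0), sigma1)
    g2 = cv2.GaussianBlur(region_gray, (0, 0), sigma2)
    dog = cv2.absdiff(g1, g2)                               # edge feature response
    _, edges = cv2.threshold(dog, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # OpenCV 4.x return signature: (contours, hierarchy)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return max(contours, key=cv2.contourArea) if contours else None
```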
Step 205: and the terminal performs image processing on the facial flaws in the contour area in the first image to obtain a second image.
In this step, the terminal determines an image processing method of each image region based on the image type of the image region, and performs image processing on a facial defect in a contour region in the image region based on the image processing method. The process is realized by the following steps (1) to (2), and comprises the following steps:
(1) the terminal determines an image processing mode for processing the image type based on the image type of the target image area.
The terminal stores the corresponding relation between the image type and the image processing mode. In this step, the terminal determines, from the correspondence between the image type and the image processing method, the image processing method corresponding to the image type, based on the image type of each target image region.
(2) And the terminal performs image processing on the target image area in the first image based on the image processing mode of each target image area to obtain the second image.
In this step, the terminal performs image processing on the regions corresponding to the facial flaws in the different image regions based on their respective image categories, obtaining the second image, as sketched below.
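A minimal sketch of steps (1)-(2), assuming inpainting as the concrete processing mode and an illustrative category-to-parameter mapping; the patent only requires that a processing mode be looked up per image category and applied within the contour region.

```python
import cv2
import numpy as np

# Correspondence between flaw category and processing mode (illustrative values).
PROCESSING_MODES = {
    'spot': {'inpaint_radius': 3},
    'acne': {'inpaint_radius': 5},
    'mole': {'inpaint_radius': 4},
}

def remove_flaw(image_bgr, contour, category):
    """Inpaint only the pixels inside the flaw contour, with parameters chosen
    per flaw category."""
    mask = np.zeros(image_bgr.shape[:2], dtype=np.uint8)
    cv2.drawContours(mask, [contour], -1, 255, thickness=cv2.FILLED)   # contour region only
    radius = PROCESSING_MODES.get(category, {'inpaint_radius': 3})['inpaint_radius']
    return cv2.inpaint(image_bgr, mask, radius, cv2.INPAINT_TELEA)
```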
In the implementation mode, when the first image is processed, the image is processed according to different image processing modes, so that the processing of different facial flaws in the first image is targeted, and the image processing result is optimized.
In the embodiment of the application, the area of the face area is divided, and the outline area of the face defect is determined from the target image area with the face defect, so that the image processing is performed on the face defect in the outline area, the pertinence of the image processing is improved, and the beautifying effect is optimized.
Referring to fig. 20, a block diagram of an image processing apparatus according to an embodiment of the present application is shown. The image processing device may be implemented as all or part of the processor 110 by software, hardware, or a combination of both. The device includes:
a first determining module 2001, configured to determine, from a first image to be processed, a plurality of image regions, which are local regions in a face region of the first image;
a second determining module 2002, configured to determine, for each image region, a plurality of salient feature regions from the image region, where a salient feature region is an image region whose image feature difference from the surrounding image regions exceeds a preset threshold;
a third determining module 2003, configured to determine at least one target image region from the plurality of image regions based on the image region and the plurality of salient feature regions of the image region, the target image region being an image region including a facial flaw in the face region;
a fourth determination module 2004 for determining a contour region of the facial blemish from the target image region;
the processing module 2005 is configured to perform image processing on the facial defect in the contour region in the first image to obtain a second image.
In one possible implementation, the third determining module 2003 includes:
the extraction unit is used for respectively extracting the features of the image area and a plurality of salient image areas in the image area to obtain a plurality of image features;
a determining unit for determining at least one target image area from the plurality of image areas based on the plurality of image features of each image area.
In another possible implementation manner, the determining unit is configured to perform feature fusion on the plurality of image features of the image region to obtain a fused image feature; classify the image region according to the fused image feature to obtain the image category of the image region and determine the probability that the image region belongs to that image category; and determine, from the plurality of image regions, at least one target image region whose image-category probability is greater than a first preset threshold.
In another possible implementation manner, the processing module 2005 is configured to determine, based on an image category of the target image region, an image processing manner for processing the image category; and performing image processing on the target image area in the first image based on the image processing mode of each target image area to obtain a second image.
In another possible implementation, the second determining module 2002 is configured to determine information confidence levels of a plurality of local regions in the image region, where the information confidence levels are used to indicate probabilities that the local regions include facial flaws; determining a local area with information confidence coefficient larger than a second preset threshold value from the plurality of local areas; and determining the local area with the confidence coefficient larger than a second preset threshold value as a salient image area.
In another possible implementation, the first determining module 2001 is configured to determine a plurality of target positions from the first image; and for each target position, determining a plurality of image areas corresponding to the target position based on a plurality of preset selection ranges.
In another possible implementation manner, the fourth determining module 2004 is configured to perform edge detection in the target image area to obtain a contour area of the facial defect.
In another possible implementation manner, the apparatus further includes:
the segmentation module is used for responding to the existence of a face region in the first image, and performing image segmentation on the face region to obtain a first face region and a second face region, wherein the first face region is an image processing region, and the second face region is a region which does not need image processing;
a first determining module 2001 for determining a plurality of image areas in the first face area.
In the embodiment of the application, the area of the face area is divided, and the outline area of the face defect is determined from the target image area with the face defect, so that the image processing is performed on the face defect in the outline area, the pertinence of the image processing is improved, and the beautifying effect is optimized.
The embodiment of the present application also provides a computer-readable medium, which stores at least one instruction, and the at least one instruction is loaded and executed by the processor to implement the image processing method as shown in the above embodiments.
The embodiment of the present application further provides a computer program product, where at least one instruction is stored, and the at least one instruction is loaded and executed by the processor to implement the image processing method as shown in the above embodiments.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in the embodiments of the present application may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.
The above description is only exemplary of the present application and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (11)

1. An image processing method, characterized in that the method comprises:
determining a plurality of image areas from a first image to be processed, wherein the image areas are local areas in a face area of the first image;
for each image area, determining a plurality of salient feature areas from the image areas, wherein the salient feature areas are the image areas with image feature difference values between image features and surrounding image areas exceeding a preset threshold value;
determining at least one target image region from the plurality of image regions based on the image region and a plurality of salient feature regions of the image region, the target image region being an image region of the face region that includes a facial flaw;
determining a contour region of a facial flaw from the target image region;
and carrying out image processing on the facial flaws in the contour area in the first image to obtain a second image.
2. The method of claim 1, wherein determining at least one target image region from the plurality of image regions based on the image region and a plurality of salient feature regions of the image region comprises:
respectively extracting the features of the image area and a plurality of significant image areas in the image area to obtain a plurality of image features;
determining the at least one target image region from the plurality of image regions based on the plurality of image features for each image region.
3. The method of claim 2, wherein determining the at least one target image region from the plurality of image regions based on the plurality of image features of each image region comprises:
performing feature fusion on the plurality of image features of the image area to obtain fused image features;
classifying the image area according to the fused image features to obtain the image category of the image area, and determining the probability that the image area is of the image category;
and determining at least one target image area with the probability of the image category larger than a first preset threshold from the plurality of image areas.
4. The method of claim 3, wherein said image processing the facial blemish in the outline region in the first image to obtain a second image comprises:
determining an image processing mode for processing the image type based on the image type of the target image area;
and performing image processing on the target image area in the first image based on the image processing mode of each target image area to obtain the second image.
5. The method of claim 1, wherein determining a plurality of salient feature regions from the image region comprises:
determining information confidence of a plurality of local regions in the image region, wherein the information confidence is used for representing the probability that the local regions comprise facial flaws;
determining a local area with information confidence coefficient larger than a second preset threshold value from the plurality of local areas;
and determining the local area with the confidence coefficient larger than the second preset threshold value as the salient image area.
6. The method of claim 1, wherein determining a plurality of image regions from the first image to be processed comprises:
determining a plurality of target positions from the first image;
for each target position, determining a plurality of image areas corresponding to the target position based on a plurality of preset selection ranges.
7. The method of claim 1, wherein determining a contour region of a facial blemish from the target image region comprises:
and carrying out edge detection in the target image area to obtain a facial flaw contour area.
8. The method of claim 1, wherein prior to determining the plurality of image regions from the first image to be processed, the method further comprises:
responding to the existence of a face area in the first image, and performing image segmentation on the face area to obtain a first face area and a second face area, wherein the first face area is an image processing area, and the second face area is an area which does not need image processing;
the determining a plurality of image areas from a first image to be processed comprises:
in the first face region, the plurality of image regions are determined.
9. An image processing apparatus, characterized in that the apparatus comprises:
a first determining module, configured to determine a plurality of image regions from a first image to be processed, wherein the image regions are local regions in a face region of the first image;
a second determining module, configured to determine, for each image region, a plurality of salient feature regions from the image region, wherein a salient feature region is an image region whose image feature differs from the image features of surrounding image regions by more than a preset threshold;
a third determining module, configured to determine at least one target image region from the plurality of image regions based on the image region and the plurality of salient feature regions of the image region, wherein the target image region is an image region in the face region that includes a facial flaw;
a fourth determining module, configured to determine a contour region of the facial flaw from the target image region;
and a processing module, configured to perform image processing on the facial flaw in the contour region in the first image to obtain a second image.
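Read end to end, the five modules of claim 9 form a simple pipeline; a schematic, purely hypothetical composition (the `modules` object and its method names are placeholders, not the patent's API) might look like:

```python
# Hypothetical end-to-end composition of the five modules in claim 9.
def process_image(first_image, modules):
    regions = modules.first.determine_regions(first_image)                # first determining module
    second_image = first_image.copy()
    for region in regions:
        salient = modules.second.salient_regions(region)                  # second determining module
        if not modules.third.is_target(region, salient):                  # third determining module
            continue
        contour = modules.fourth.contour_region(region)                   # fourth determining module
        second_image = modules.processing.retouch(second_image, contour)  # processing module
    return second_image
```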
10. A terminal, characterized in that the terminal comprises a processor and a memory; the memory stores at least one instruction for execution by the processor to implement the image processing method of any of claims 1 to 8.
11. A computer-readable storage medium having stored thereon at least one instruction for execution by a processor to implement the image processing method of any one of claims 1 to 8.
CN202011536818.7A 2020-12-23 2020-12-23 Image processing method, device, terminal and storage medium Active CN112581359B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011536818.7A CN112581359B (en) 2020-12-23 2020-12-23 Image processing method, device, terminal and storage medium

Publications (2)

Publication Number Publication Date
CN112581359A (en) 2021-03-30
CN112581359B CN112581359B (en) 2023-06-09

Family

ID=75139399

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011536818.7A Active CN112581359B (en) 2020-12-23 2020-12-23 Image processing method, device, terminal and storage medium

Country Status (1)

Country Link
CN (1) CN112581359B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080205712A1 (en) * 2007-02-28 2008-08-28 Fotonation Vision Limited Separating Directional Lighting Variability in Statistical Face Modelling Based on Texture Space Decomposition
CN107563977A (en) * 2017-08-28 2018-01-09 维沃移动通信有限公司 A kind of image processing method, mobile terminal and computer-readable recording medium
CN110443747A (en) * 2019-07-30 2019-11-12 Oppo广东移动通信有限公司 Image processing method, device, terminal and computer readable storage medium
CN110705460A (en) * 2019-09-29 2020-01-17 北京百度网讯科技有限公司 Image category identification method and device
CN111767858A (en) * 2020-06-30 2020-10-13 北京百度网讯科技有限公司 Image recognition method, device, equipment and computer storage medium
CN111860169A (en) * 2020-06-18 2020-10-30 北京旷视科技有限公司 Skin analysis method, device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN112581359B (en) 2023-06-09

Similar Documents

Publication Publication Date Title
CN112541503B (en) Real-time semantic segmentation method based on context attention mechanism and information fusion
CN110599492B (en) Training method and device for image segmentation model, electronic equipment and storage medium
CN111210443A (en) Deformable convolution mixing task cascading semantic segmentation method based on embedding balance
CN111507994A (en) Portrait extraction method, portrait extraction device and mobile terminal
CN112560831B (en) Pedestrian attribute identification method based on multi-scale space correction
CN111160350A (en) Portrait segmentation method, model training method, device, medium and electronic equipment
KR20190099914A (en) Electronic apparatus, method for processing image thereof and computer-readable recording medium
CN109389076B (en) Image segmentation method and device
CN113096140B (en) Instance partitioning method and device, electronic device and storage medium
CN111914654B (en) Text layout analysis method, device, equipment and medium
CN114238904B (en) Identity recognition method, and training method and device of dual-channel hyper-resolution model
CN113255630B (en) Moving target recognition training method, moving target recognition method and device
CN112308866A (en) Image processing method, image processing device, electronic equipment and storage medium
CN114037640A (en) Image generation method and device
CN111222440A (en) Portrait background separation method, device, server and storage medium
CN111401322A (en) Station entering and exiting identification method and device, terminal and storage medium
CN114639150A (en) Emotion recognition method and device, computer equipment and storage medium
CN115761834A (en) Multi-task mixed model for face recognition and face recognition method
CN111444788A (en) Behavior recognition method and device and computer storage medium
CN116503932B (en) Method, system and storage medium for extracting eye periphery characteristics of weighted key areas
WO2023208134A1 (en) Image processing method and apparatus, model generation method and apparatus, vehicle, storage medium, and computer program product
CN117351487A (en) Medical image segmentation method and system for fusing adjacent area and edge information
CN112215188A (en) Traffic police gesture recognition method, device, equipment and storage medium
CN112581359B (en) Image processing method, device, terminal and storage medium
CN116894820A (en) Pigment skin disease classification detection method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant