CN112990331A - Image processing method, electronic device, and storage medium


Info

Publication number: CN112990331A
Application number: CN202110328263.5A
Authority: CN (China)
Prior art keywords: image, processed, image processing, ternary, model
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 陈济楠 (Chen Jinan), 豆泽阳 (Dou Zeyang), 蒋阳 (Jiang Yang)
Current assignee: Gongdadi Innovation Technology Shenzhen Co., Ltd.
Original assignee: Gongdadi Innovation Technology Shenzhen Co., Ltd.
Application filed by Gongdadi Innovation Technology Shenzhen Co., Ltd.; priority to CN202110328263.5A; publication of CN112990331A.


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06T 3/04
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/20 - Image preprocessing
    • G06V 10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/267 - Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds

Abstract

The application provides an image processing method, an electronic device, and a storage medium. The method comprises the following steps: analyzing images to be processed of different preset sizes with a pre-trained first image processing model to obtain a ternary image set of the images to be processed; inputting the ternary image set and a target training data set into a pre-trained second image processing model for training to obtain a third image processing model; analyzing the image to be processed with the third image processing model to obtain a foreground mask of the image to be processed; and filling the foreground mask with a target background to obtain a target image. The method improves both the efficiency of acquiring ternary images and the flexibility with which the image processing model can be used.

Description

Image processing method, electronic device, and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image processing method, an electronic device, and a storage medium.
Background
At present, image processing, for example matting, is generally performed with big-data models, but training such a model requires a large number of original images together with the ternary images (trimaps) of those originals. The existing way to obtain the ternary image of an original image is mainly manual annotation; because manual annotation is inefficient and its accuracy cannot be guaranteed, obtaining training samples for a big-data model is difficult. Moreover, the different ternary images corresponding to different images cannot be acquired in a uniform way, so a separate big-data model has to be trained for each application scene. Existing image processing with big-data models therefore suffers from training samples that are hard to obtain and models that cannot be applied flexibly.
Disclosure of Invention
The application provides an image processing method, an electronic device, and a storage medium, aiming to reduce the difficulty of obtaining training samples for an image processing model and to allow the model to be applied flexibly in different scenes.
In a first aspect, an embodiment of the present application provides an image processing method, including:
analyzing images to be processed with different preset sizes according to a pre-trained first image processing model to obtain a ternary image set of the images to be processed;
inputting the ternary image set and the target training data set into a pre-trained second image processing model for training to obtain a third image processing model;
analyzing the image to be processed based on the third image processing model to obtain a foreground mask of the image to be processed;
and filling a target background for the foreground mask to obtain a target image.
In a second aspect, an embodiment of the present application provides an electronic device, including a memory and a processor; the memory is used for storing a computer program; the processor is configured to execute the computer program and, when executing the computer program, implement the image processing method according to the first aspect.
In a third aspect, an embodiment of the present application provides a computer-readable storage medium, where a computer program is stored, and when executed by a processor, the computer program causes the processor to implement the image processing method according to the first aspect.
The embodiments of the application provide an image processing method, an electronic device, and a storage medium. First, images to be processed of different preset sizes are analyzed by a pre-trained first image processing model to obtain a ternary image set of the images to be processed; this avoids manual participation and effectively improves the efficiency and accuracy of ternary image acquisition. Second, the pre-trained second image processing model is retrained on the acquired ternary image set and the target data set to obtain a third image processing model suited to the current application scene; the image to be processed is analyzed by the third image processing model to obtain its foreground mask, and the foreground mask is filled with a target background to obtain the target image. The method improves the efficiency and accuracy of obtaining the ternary image set while making the image processing model flexible to apply in different scenes.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure of the embodiments of the application.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings show only some embodiments of the present application; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of an implementation of an image processing method provided in an embodiment of the present application;
FIG. 2 is a schematic diagram of a training process of a first image processing model provided in an embodiment of the present application;
FIG. 3 is a flowchart of a specific implementation of a training process for a first image processing model;
FIG. 4 is a schematic structural diagram of a preset image segmentation model;
FIG. 5(a) is a flowchart of an implementation of S101 in FIG. 1;
FIG. 5(b) is a flowchart of another specific implementation of S101 in FIG. 1;
fig. 6 is a schematic view of an application scenario of an image processing method provided in an embodiment of the present application;
fig. 7 is a schematic block diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The flow diagrams depicted in the figures are merely illustrative and do not necessarily include all of the elements and operations/steps, nor do they necessarily have to be performed in the order depicted. For example, some operations/steps may be decomposed, combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
Before explaining the image processing method provided by the embodiments of the present application, existing image processing methods are first described by way of example. Two approaches are currently common: fully automatic matting and semi-automatic matting. Fully automatic matting means that an image is fed to an AI matting application, which completes the matting automatically. Semi-automatic matting requires simple manual marking: the user draws lines to select the parts of the picture to be cut out and the parts to be erased, prompting the AI matting application where to cut. Of course, some applications combine the two: the AI matting application automatically cuts out the object it identifies as the main subject, then prompts the operator to mark the parts to keep and the parts to erase, helping the application correct its errors.
Whether matting is fully or semi-automatic, automatic matting with AI requires the support of an AI model, and the accuracy of the matting result is directly tied to the selection of the model's training samples. During training, the more numerous the samples and the richer the image information they contain, the higher the recognition accuracy of the resulting model. The training samples required by existing AI matting models comprise an original picture from the application scene, the trimap corresponding to that picture, and the foreground image (Alpha Matte). In the prediction stage a trimap must still be provided for supervision: the user either supplies an annotated trimap or intervenes to specify definite background and foreground regions, after which the AI matting model infers the final Alpha Matte. However, because existing training-set acquisition depends on manual annotation in image software (for example, labelling the ternary image and the foreground image in PS software), it consumes substantial human resources, and it is difficult to obtain enough training samples to cover different application scenes. As a result, existing AI matting models cannot be applied flexibly: a model built for portraits may not matte other animals accurately, and when the background colour matches the foreground colour in an image, accurate matting may be impossible.
In view of this, the embodiments of the present application provide an image processing method. Images to be processed of different preset sizes are analyzed by a pre-trained first image processing model to obtain a ternary image set of the images to be processed, avoiding manual participation and effectively improving the efficiency and accuracy of ternary image acquisition. The pre-trained second image processing model is then retrained on the acquired ternary image set and the target data set to obtain a third image processing model suited to the current application scene; the image to be processed is analyzed by the third image processing model to obtain its foreground mask, and the foreground mask is filled with a target background to obtain the target image. The method improves the efficiency and accuracy of obtaining the ternary image set while making the image processing model flexible to apply in different scenes.
Referring to fig. 1, fig. 1 is a schematic flow chart of an image processing method provided in an embodiment of the present disclosure. The method may be executed, in hardware or software, by a terminal device or a server; the terminal device may be an electronic device such as a smartphone, tablet computer, notebook computer, or desktop computer, and the server may be an independent server or a server cluster. The details are as follows:
S101, analyzing the images to be processed with different preset sizes according to the pre-trained first image processing model to obtain a ternary image set of the images to be processed.
The first image processing model is obtained by training a preset image segmentation model on a target training data set, where the target training data set comprises the images to be processed of different preset sizes. Illustratively, the preset image segmentation model is an image segmentation model already trained on a preset image data set. The preset image data set is defined relative to the target training data set; the two may contain images of the same category or of different categories. Specifically, the images in the target training data set are determined by the images to be processed in the current application scene, while the images in the preset image data set are determined by the images to be processed in historical application scenes.
If the image to be processed in the current application scene and that in the historical application scene belong to the same category, the target training data set and the preset image data set contain images of the same category; otherwise they contain images of different categories. For example, suppose the preset image data set is a face image data set, so that the image segmentation model trained on it can segment images containing faces. If the image to be processed in the current application scene contains no face but some other target object, say a building, and the current scene requires a foreground mask of the building image, then the preset image segmentation model must be retrained on a target training data set to obtain the first image processing model; a ternary image of the building image is obtained from the first image processing model, and the foreground mask of the building image is then determined from the building image and the obtained ternary image. In this application scene, the target training data set is a data set composed of images of buildings.
For example, as shown in fig. 2, fig. 2 is a schematic diagram of the training process of the first image processing model provided in an embodiment of the present application. As fig. 2 shows, in this embodiment the image to be processed in the current application scene need not belong to the same category as that in the historical application scene: the current image may contain a puppy or another animal, while the historical image contains a person. Correspondingly, the target training data set consists of differently sized images of puppies or other animals, while the preset image segmentation model was trained on a preset training set (a preset image data set of images of people), specifically as a model for face segmentation.
In this embodiment, the image segmentation model trained on the preset training set of images of people is retrained on the target training data set of puppies or other animals, so that the resulting first image processing model can accurately segment images of puppies or other animals in the current application scene and produce a ternary image of the image to be processed, realizing flexible use of the image segmentation model across application scenes.
Conversely, suppose the images to be processed in the current and historical application scenes belong to the same category, for example both contain people, and in both scenes images of people need matting. Then the target training data set and the preset image data set contain images of the same category: both are data sets formed of images containing people. Moreover, when the two data sets contain the same kind of object, the training samples in the target training data set may be obtained by re-annotating the training samples in the preset image data set. For example, when a training sample in the preset image data set contains a target object and a background with similar features, say the clothing colour of the target person matches the background colour, the clothing and the background are annotated separately to obtain the training samples of the target data set; or when the target person and other people wear the same ornaments, the ornaments of the target and of the others are annotated separately.
In this embodiment, retraining the preset image segmentation model on the target training data set is akin to enriching the training samples of the preset image data set, so the segmentation accuracy of the trained first image processing model exceeds that of the preset image segmentation model.
In addition, the preset image segmentation model may be a neural network model: any existing model such as a convolutional neural network, a sequence neural network, or an end-to-end trainable neural network based on image-sequence recognition. If the latter is used, it performs the intelligent matting analysis on the image to be processed and outputs its foreground mask; such a model accepts input images of different sizes and generates predicted images containing foreground masks of the same size.
In an embodiment of the present application, data enhancement may be performed on the image to be processed (for example, an image to be matted of size 512×512): random cropping at different preset sizes (for example, 224×224) yields images to be processed of different preset sizes, which are used as training samples of the target training data set and input to the preset image segmentation model for training, giving the first image processing model.
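The random-cropping data enhancement just described could look like the following minimal sketch (Python is assumed; the function and parameter names are illustrative, not part of the application):

```python
import random
import numpy as np

def random_crops(image: np.ndarray, crop_size: int = 224, n_crops: int = 8):
    """Randomly crop an H x W x C image into n_crops patches of crop_size x crop_size.

    Illustrative sketch of the data-enhancement step described above: a
    512 x 512 image to be matted is cropped at random offsets to build
    training samples of a different preset size.
    """
    h, w = image.shape[:2]
    assert h >= crop_size and w >= crop_size, "image smaller than crop size"
    crops = []
    for _ in range(n_crops):
        top = random.randint(0, h - crop_size)
        left = random.randint(0, w - crop_size)
        crops.append(image[top:top + crop_size, left:left + crop_size])
    return crops

# e.g. build target-training-set samples from one 512x512 image to be processed:
# samples = random_crops(np.zeros((512, 512, 3), dtype=np.uint8))
```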
Illustratively, after the target training data set is input, the end-to-end trainable neural network model based on image-sequence recognition does not need detailed labels for each individual element of a training sample (such as a puppy's eyes or nose); it runs directly on a coarse-grained label (such as "puppy"), treats the image as a sequence-recognition problem, predicts directly from inputs of different sizes the sequence of pixels forming the foreground image, and outputs the predicted foreground mask.
In addition, if the preset image segmentation model is an end-to-end trainable neural network model based on image-sequence recognition, training it on the target training data set yields a compact and efficient first image processing model.
Specifically, the first image processing model and the preset image segmentation model share the same model architecture but have different loss functions.
Specifically, the training process of the first image processing model includes: training the preset image segmentation model based on the distribution rule of the target training data set to obtain the first image processing model.
Training the preset image segmentation model comprises iteratively optimizing its model parameters based on the value of the loss function of the first image processing model until that value satisfies a preset condition, at which point iteration stops and the corresponding first image processing model is obtained.
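The iterative optimization could be sketched as follows (a hedged sketch assuming PyTorch; the optimizer choice, learning rate, and epoch cap are illustrative assumptions, not taken from the application):

```python
import torch

def train_until_converged(model, loader, loss_fn, threshold=0.03, max_epochs=100):
    """Update model parameters until the loss value satisfies the preset
    condition (here: falls to or below a preset threshold). All names are
    illustrative."""
    optimiser = torch.optim.Adam(model.parameters(), lr=1e-4)
    for _ in range(max_epochs):
        for images, targets in loader:
            optimiser.zero_grad()
            loss = loss_fn(model(images), targets)
            loss.backward()
            optimiser.step()
            if loss.item() <= threshold:  # preset condition met: stop iterating
                return model
    return model
```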
As shown in fig. 3, fig. 3 is a flowchart of a specific implementation of the training process of the first image processing model. As can be seen from fig. 3, the training process of the first image processing model includes S301 to S303. The details are as follows:
S301, determining a loss function of the first image processing model according to the distribution rule of the target training data set.
The loss function of the first image processing model estimates the difference between the model's predicted values and the true values when classifying and labelling pixels; it is a non-negative real-valued function. When the training data set is fixed, the input and output of the first image processing model can be regarded as fixed regardless of the amount of training data, so the distribution of the target training data set effectively acts as a parameter of the loss function. If the distribution of the training data does not match that of the test data, then even when the loss value is driven to its minimum while training the preset image segmentation model into the first image processing model, the loss on real data may remain large, i.e. the error between predicted and true values is large and the model cannot predict reliably. The loss function of the first image processing model therefore needs to be determined from the distribution rule of the target training data set, so as to obtain a model with small error and accurate prediction.
In an embodiment, a mapping between distribution rules of the target data set and loss functions is preset, and determining the loss function of the first image processing model according to the distribution rule of the target training data set may comprise: determining the loss function from this preset mapping.
For example, if the distribution rule of the target data set conforms to a long-tail distribution, the mapping may designate the focal loss (a cross-entropy variant that down-weights easy examples) as the loss function for long-tailed data; if the distribution rule conforms to a small-target distribution, the mapping may designate the cross-entropy loss.
The long-tail distribution is also known as Zipf's law: rank the target objects in the image from most to least frequent, denote the rank by r and the frequency of the object of rank r by g(r); then the product of g(r) and a certain power of r is asymptotically constant, i.e. g(r)·r^β ≈ C. A small-target distribution means that small targets (an animal in an image is small relative to the background) are far more numerous in the data set than other targets (e.g. medium and large ones). For example, across all training pictures the proportion of small targets may reach 52.3% (e.g. small targets formed by a crowd of people), while medium targets (other background buildings) and large targets (the image background) are distributed relatively evenly, each below 52.3%.
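The mapping between distribution rules and loss functions described above might be sketched as follows (assuming PyTorch; the focal-loss form shown is the standard one and the gamma value is an illustrative assumption):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    # Standard focal loss: down-weights well-classified pixels so the rare
    # (long-tail) classes dominate the gradient.
    ce = F.cross_entropy(logits, targets, reduction="none")
    pt = torch.exp(-ce)  # probability assigned to the true class
    return ((1.0 - pt) ** gamma * ce).mean()

def select_loss(distribution_rule: str):
    # Illustrative mapping table between the distribution rule of the target
    # training data set and the loss function, as described above.
    if distribution_rule == "long_tail":
        return focal_loss
    if distribution_rule == "small_target":
        return lambda logits, targets: F.cross_entropy(logits, targets)
    raise ValueError(f"no loss registered for rule: {distribution_rule}")
```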
S302, inputting the images to be processed with different preset sizes into the preset image segmentation model, and determining the value of the loss function based on the output result of the preset image segmentation model.
After the images to be processed with different preset sizes are input into the preset image segmentation model, the preset image segmentation model respectively outputs the ternary image predicted values corresponding to the images to be processed with different preset sizes.
The loss function compares the predicted and true ternary images to obtain their difference; this comparison value corresponds to the value of the loss function. When the comparison value does not satisfy the preset condition (for example, is not below a preset change threshold of 0.1), the network parameters of the preset image segmentation model are iteratively updated until it does; the comparison value at that point is the value of the loss function.
It should be noted that this embodiment retrains the preset image segmentation model on a target training data set composed of images to be processed of different preset sizes to obtain the first image processing model; accordingly, every loss function and loss value appearing in the embodiments of the present application refers to the loss function of the first image processing model determined from the distribution rule of the different images to be processed in the target training data set, and its value.
In an embodiment, to further improve the segmentation accuracy of the first image processing model, an attention mechanism is introduced while training the preset image segmentation model into the first image processing model. As shown in fig. 4, the preset image segmentation model 400 comprises a backbone network 401 and an attention mechanism 402. Correspondingly, inputting the images to be processed of different preset sizes into the preset image segmentation model and determining the value of the loss function from its output comprises: inputting the images to be processed of different preset sizes into the backbone network 401 for training to obtain a first output result; learning on the first output result with the attention mechanism 402 to obtain a second output result; and determining the value of the loss function from the first and second output results.
Inputting the images to be processed of different preset sizes into the preset image segmentation model 400 for training actually trains its backbone network 401. As shown in fig. 4, an attention mechanism (Attn) 402 is attached after the trunk network (Trunk) 401; it performs attention-allocation-probability learning on the first output result of the trunk, i.e. on the segmentation results of different probabilities represented by that output, adding a change-prediction step that yields the second output result. The ratio of the second output result to the first represents the probability ratio of foreground image to background image in the currently predicted segmentation (equivalently, the ratio of a pixel's score as foreground to its score as background). A first loss value is obtained by multiplying the current loss value of the preset image segmentation model by the foreground proportion of this ratio, and a second loss value by multiplying it by the background proportion; the value of the loss function is the sum of the two.
In this embodiment, attention-allocation-probability learning is applied to the first result output by the backbone network of the preset image segmentation model, and the first result and the second result (output after that learning) are considered together to determine the value of the loss function of the first image processing model, further improving its prediction accuracy.
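The loss combination described in this embodiment could be sketched as follows (a sketch under stated assumptions: the tensor shapes, the normalization of the ratio, and the use of mean pooling are illustrative, since the application does not fix them):

```python
import torch

def attention_weighted_loss(base_loss: torch.Tensor,
                            trunk_out: torch.Tensor,
                            attn_out: torch.Tensor) -> torch.Tensor:
    """Combine the trunk and attention outputs into the loss value.

    The ratio of the attention output (second result) to the trunk output
    (first result) is read as the foreground/background probability ratio;
    the current loss is split proportionally and the two parts are summed.
    """
    ratio = attn_out / (trunk_out + 1e-8)   # second output / first output
    fg_share = ratio / (1.0 + ratio)        # proportion attributed to foreground
    bg_share = 1.0 - fg_share               # proportion attributed to background
    first_loss = base_loss * fg_share.mean()    # foreground-weighted part
    second_loss = base_loss * bg_share.mean()   # background-weighted part
    return first_loss + second_loss
```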
S303, adjusting the parameters of the preset image segmentation model based on the value of the loss function to obtain the first image processing model.
In an embodiment, adjusting the parameters of the preset image segmentation model based on the value of the loss function to obtain the first image processing model comprises: adjusting the parameters of its backbone network based on the value of the loss function, and stopping once the value determined from the first and second output results satisfies the preset condition, giving the first image processing model.
The preset condition is that the value of the loss function is less than or equal to a preset threshold (e.g. 0.03); the value of the loss function represents the difference between the predicted and true results. In this embodiment, because the loss function is determined from the distribution rule of the target training data set and that distribution is consistent with the distribution of the test data, a smaller preset threshold means a smaller difference between prediction and truth. Furthermore, since an attention mechanism is introduced during training and applied to the prediction output of the backbone network for attention-allocation-probability learning, the prediction accuracy of the first image processing model is effectively improved.
It is to be understood that the above description is only an exemplary description of the training process of the first image processing model, and does not constitute a limitation of the first image processing model.
Optionally, during the training of the first image processing model, the distribution rule of the target data set may be extracted automatically following the AutoML principle. The preset image segmentation model may use an enhanced Pipeline class that chains its processing steps (such as data scaling) with a supervised model (such as a classifier), combining multiple processing steps into a single step list and thereby reducing the preprocessing and classification work; for example, the preset image segmentation model may be abstracted as an encoder-decoder model comprising an encoder module and a decoder module, with a spatial-information branch and a high-order semantic branch used for training, improving the training efficiency of the first image processing model.
As another example, suppose the preset image segmentation model uses an HRNet + OCRNet structure. Based on the knowledge-distillation principle, the first image processing model may adopt an on-device network structure such as MobileNet, trained in combination with a few-shot algorithm, to obtain the trained first image processing model. The resulting network structure is more compact and efficient, data from different application scenes can be migrated with this training method, and the training efficiency of the model is improved.
Specifically, analyzing the images to be processed of different preset sizes with the pre-trained first image processing model to obtain the ternary image set of the images to be processed includes: inputting the image to be processed into the first image processing model for pixel classification and labelling, and obtaining the ternary image set of the image to be processed from the classification and labelling results.
That is, the image to be processed is input into the first image processing model for pixel classification and labelling, scores of each pixel as foreground image and as background image are obtained, the pixels of the foreground region, background region, and undetermined region of the image to be processed are determined from those scores, and these pixels together form the ternary image set of the image to be processed.
For example, the output layer of the first image processing model applies the Softmax function to each pixel in the image to be processed to obtain its foreground and background scores, and maps each pixel to the foreground, background, or undetermined-area category.
For example, the first image processing model finally outputs a predicted image the same size as the input image to be matted, comprising a foreground image, a background image, and an undetermined-area image; this predicted image is called the ternary image of the image to be processed. Pixels predicted as foreground are shown in white, pixels predicted as background in black, and pixels of the undetermined area in a transition colour between the two: for example, black pixels have value 0, white pixels 255, and transition pixels 128.
The foreground image generally represents the target object in the image to be processed; the background image represents the background and all objects other than the target object; and the undetermined-area image is the region where it is difficult to recognize whether a pixel belongs to the target object or to another object.
For example, if a region of the image to be processed has a colour similar to that of the target object, that region may be an undetermined area; or, when the image contains a group of people and the target object is one individual who shares similar features with the others, the other individuals may be undetermined areas.
Specifically, whether each pixel belongs to the foreground image, the background image, or an undetermined area can be determined from the foreground and background scores output by the Softmax function. For example, if a pixel's score as background is 0.94 and as foreground 0.06, it is determined to be background; if both its background and foreground scores are low, for example 0.03, it is determined to belong to the undetermined area.
In this embodiment, the first image processing model performs pixel classification and labelling on the image to be processed, and the ternary image is determined from the prediction scores of pixels belonging to different categories, so manual labelling is not required and the efficiency and accuracy of obtaining the ternary image are improved.
Illustratively, as shown in fig. 5(a), fig. 5(a) is a flowchart of a specific implementation of S101 in fig. 1. As shown in fig. 5(a), S101 includes S1011 to S1013. The details are as follows:
S1011, inputting the image to be processed into the first image processing model for pixel classification and labelling, and obtaining the prediction score of each pixel in the image to be processed.
The prediction score of each pixel comprises a foreground score and a background score; in this implementation, the prediction score of each pixel is compared with a preset image category threshold to obtain the ternary image of the image to be processed.
S1012, determining the ternary image of the image to be processed according to the prediction score of each pixel and a preset image category threshold.
The prediction score of a pixel reflects the confidence of the judgment of the image category (foreground, background, or uncertain) to which that pixel belongs; comparing this confidence with the preset image category threshold determines the pixel's category, and the ternary image of the image to be processed is then determined from the categories of all pixels.
In an embodiment, determining the ternary image of the image to be processed according to the prediction score of each pixel and the preset image category threshold may include: comparing the prediction score of each pixel with the threshold, determining each pixel's category in the ternary image from the comparison result, and obtaining the ternary image of the image to be processed from those categories.
Specifically, the prediction score of each pixel comprises a foreground score and a background score, and comparing them with the preset image category threshold determines the pixel's category in the ternary image as follows: if the foreground score of a pixel is greater than or equal to the threshold and its background score is below it, the pixel corresponds to the foreground image in the ternary image; if its background score is greater than or equal to the threshold and its foreground score is below it, the pixel corresponds to the background image; and if both scores are below the threshold, or both are above it, the pixel corresponds to the uncertain image. A minimal sketch of this rule follows.
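A minimal sketch of the threshold rule, assuming NumPy arrays of per-pixel scores and the 0/128/255 trimap encoding described earlier (the threshold value is illustrative):

```python
import numpy as np

def scores_to_trimap(fg_score: np.ndarray, bg_score: np.ndarray,
                     threshold: float = 0.9) -> np.ndarray:
    """Map per-pixel foreground/background scores to a ternary image.

    Encoding used in this description: background 0, uncertain 128,
    foreground 255. Pixels failing both comparisons stay uncertain.
    """
    trimap = np.full(fg_score.shape, 128, dtype=np.uint8)            # uncertain
    trimap[(fg_score >= threshold) & (bg_score < threshold)] = 255   # foreground
    trimap[(bg_score >= threshold) & (fg_score < threshold)] = 0     # background
    return trimap
```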
S1013, performing data expansion on the ternary image of the image to be processed to obtain the ternary image set of the image to be processed.
Data expansion of the ternary image may comprise randomly cropping it in different directions and scaling it to different sizes, yielding the ternary image set of the image to be processed.
In this embodiment, the prediction scores of the pixel points in the image to be processed by the first image processing model are respectively compared with the preset image category threshold, and the category of the ternary image corresponding to each pixel point is accurately determined by introducing the image category threshold, so that the ternary image of the image to be processed is automatically generated based on the first image processing model, and further the ternary image of the image to be processed is subjected to data expansion, and the acquisition efficiency of the ternary image is improved.
Exemplarily, in another embodiment of the present application, as shown in fig. 5(b), fig. 5(b) is a flowchart of another specific implementation of S101 in fig. 1. As shown in fig. 5(b), S101 includes S1014 to S1016. The details are as follows:
S1014, for an image to be processed of any preset size, inputting it into the first image processing model for pixel classification and labelling to obtain the segmentation mask covering each pixel in the image to be processed.
The image to be processed is input into the first image processing model for pixel classification and labelling, and the model outputs three prediction results: a target detection box; the prediction score of each pixel; and the segmentation mask covering each pixel. In this embodiment, the segmentation mask is the output selected.
S1015, performing morphological processing on the segmentation mask to obtain the ternary image of the image to be processed.
In this embodiment, the morphological processing of the segmentation mask comprises performing morphological erosion and morphological dilation on the mask multiple times.
Illustratively, the morphological processing comprises: eroding the segmentation mask multiple times to obtain a first morphological image; expanding the segmentation mask based on a superpixel algorithm and dilating the expanded mask multiple times to obtain a second morphological image; and determining the ternary image of the image to be processed from the first and second morphological images.
The segmentation mask is eroded multiple times because the mask produced by the first image processing model has low edge fineness: it is a single fully connected block whose interior is not segmented, so it cannot reflect the spatial detail inside the target object. Since the mask is taken as the foreground region, it must be eroded to a suitable size: small enough that using it as the definite foreground does not pull in excessive background regions that would spoil the segmentation. Specifically, the mask is eroded several times; experiments confirm that eroding it until its area is only about 1/3 of the original mask area gives good results. Pixels in this 1/3 core are set to 1, i.e. the determined foreground region, and the surrounding pixels are set to 0, i.e. the determined background region. The image obtained by eroding the segmentation mask to 1/3 of the original mask area is called the first morphological image.
Unlike a simple dilation of the segmentation mask, in this embodiment the mask is first expanded with a superpixel algorithm and the expanded mask is then dilated multiple times (for example, experiments show a satisfactory effect when the area obtained by dilation is 1/3 of the area of the superpixel-expanded mask). The aim is to enlarge the region to be segmented as much as possible: the mask's edge information is missing, the missing parts may be sharp or narrow in shape or form large connected runs of pixels, and simple dilation cannot cover them. Superpixel expansion extends the object's edges somewhat and partially fills the missing parts, and repeated dilation fills the missing target foreground as far as possible. The mask must not grow too large, however, or objects that do not belong to the target but have pixel values close to the initial mask will be mistaken for the target during segmentation. Pixels in this band are set to 2, i.e. the uncertain region, and all surrounding regions are background (pixel value 0). The image obtained by these morphological dilation steps is called the second morphological image.
The ternary image of the image to be processed is then determined from the first and second morphological images: the foreground region of the first morphological image is spliced with the uncertain region of the second, pixels whose spliced value is 3 are set to 1, and the remaining pixels are unchanged. The result is a ternary image with three regions: from outside to inside, the determined background region, the uncertain region, and the determined foreground region.
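The morphological pipeline of S1015 might be sketched as follows (assuming OpenCV; the superpixel-based expansion is approximated here by a plain dilation, and the kernel size and iteration counts are illustrative assumptions):

```python
import cv2
import numpy as np

def trimap_from_mask(mask: np.ndarray) -> np.ndarray:
    """Build a ternary image from a binary segmentation mask by morphology.

    Sketch under stated assumptions: the 1/3-area stopping rule is
    implemented as an iterative erosion loop, and the superpixel expansion
    is stood in for by a fixed dilation.
    """
    mask = (mask > 0).astype(np.uint8)
    kernel = np.ones((5, 5), np.uint8)

    # Erode repeatedly until the area is ~1/3 of the original mask area:
    # the surviving core is the determined foreground region.
    fg = mask.copy()
    target_area = mask.sum() / 3.0
    while fg.sum() > target_area and fg.any():
        fg = cv2.erode(fg, kernel, iterations=1)

    # Expand the mask and dilate it several times: everything gained over
    # the eroded core is treated as the uncertain region.
    expanded = cv2.dilate(mask, kernel, iterations=5)  # stand-in for superpixel expansion

    trimap = np.zeros_like(mask)       # determined background = 0
    trimap[expanded > 0] = 128         # uncertain region
    trimap[fg > 0] = 255               # determined foreground
    return trimap
```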
S1016, obtaining the ternary image set according to the ternary images of the images to be processed with different preset sizes.
After obtaining the ternary images of the images to be processed with different preset sizes, performing expansion processing on all the ternary images, for example, performing scaling processing on each ternary image to obtain a ternary image set.
In this embodiment, after the segmentation mask of the image to be processed is obtained from the first image processing model, the ternary image is obtained by morphological processing of that mask, realizing automatic acquisition of the ternary image, avoiding manual participation, and ensuring the accuracy and efficiency of acquisition.
S102, inputting the ternary image set and the target training data set into a pre-trained second image processing model for training to obtain a third image processing model.
The pre-trained second image processing model may be represented by the compositing equation I = αF + (1 − α)B, and training it can be understood as optimizing the parameters α (opacity, varying between 0 and 1, with 1 representing opaque and 0 transparent), F (the colour of the foreground region), and B (the colour of the background region). I is the currently observable colour of the input image and is generally known.
In an embodiment of the present application, inputting the ternary image set and the target training data set into the pre-trained second image processing model for training proceeds until the colour produced by the superposition synthesis of the input image becomes the preset superposition-synthesis colour; the corresponding values of the parameters α, F, and B at that point are the parameter values of the third image processing model, which is thereby obtained.
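The quantity being optimized can be illustrated as the residual of the compositing equation (a sketch assuming NumPy float images; it shows only the objective, not the optimization itself):

```python
import numpy as np

def compositing_residual(I, alpha, F, B):
    """Residual of the compositing equation I = alpha*F + (1 - alpha)*B.

    Training the second image processing model can be read as driving this
    residual toward zero over the ternary image set; I, F, B are H x W x 3
    float images and alpha is an H x W map in [0, 1].
    """
    a = alpha[..., None]            # broadcast alpha over the colour channels
    I_hat = a * F + (1.0 - a) * B   # colour synthesized from current parameters
    return float(np.mean((I - I_hat) ** 2))
```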
S103, analyzing the image to be processed based on the third image processing model to obtain a foreground mask of the image to be processed.
Specifically, the image to be processed is input into a third image processing model, the third image processing model performs foreground mask identification on the image to be processed, and the foreground mask of the image to be processed is directly output.
S104, filling the foreground mask with the target background to obtain the target image.
The target background may be any image that suits the current application scene. For example, if the foreground mask is a face image, the target background may be an image containing a sea and/or a mountain; filling the face image onto a background containing the sea yields the target image.
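Target-background filling in S104 amounts to alpha compositing with the foreground mask; a minimal sketch, assuming float images with alpha values in [0, 1]:

```python
import numpy as np

def fill_target_background(foreground: np.ndarray, alpha: np.ndarray,
                           target_background: np.ndarray) -> np.ndarray:
    """Composite the extracted foreground onto a new target background.

    The foreground mask (alpha) produced by the third image processing model
    is blended over a background chosen for the current application scene
    (e.g. an image containing a sea or a mountain). All inputs are assumed
    to be float arrays of matching H x W size.
    """
    a = alpha[..., None]  # broadcast the mask over the colour channels
    return a * foreground + (1.0 - a) * target_background
```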
According to the image processing method provided by the embodiments of the present application, images to be processed of different preset sizes are analyzed by the pre-trained first image processing model to obtain the ternary image set of the images to be processed, avoiding manual participation and effectively improving the efficiency and accuracy of ternary image acquisition; the pre-trained second image processing model is retrained on the acquired ternary image set and the target data set to obtain a third image processing model suited to the current application scene; the image to be processed is analyzed by the third image processing model to obtain its foreground mask; and the foreground mask is filled with a target background to obtain the target image. The method improves the efficiency and accuracy of obtaining the ternary image set while making the image processing model flexible to apply in different scenes.
As shown in fig. 6, fig. 6 is a schematic view of an application scenario of the image processing method provided in an embodiment of the present application. In this embodiment, the foreground area 601 of the image to be processed 600 contains people, the background area contains buildings, trees, and the like, and the undetermined area contains objects that cannot be clearly identified, such as roads. The method extracts the foreground mask of the image 600 and then fills it with a target background to obtain the target image. Specifically, extracting the foreground mask of the image 600 comprises: analyzing the image 600 with the pre-trained first image processing model 602 to obtain its ternary image 603; inputting the ternary image 603 and the image 600 into the second image processing model 604 for training to obtain a third image processing model (not shown in fig. 6); and removing the background 605 of the image 600 with the third image processing model, leaving the foreground mask of the image 600 (also not shown in fig. 6). The subsequent target-background filling is likewise not shown; it follows the existing image filling process and is not detailed here. For the training processes and explanations of the first image processing model 602, the second image processing model 604, and the third image processing model, refer to the explanations of the first, second, and third image processing models above.
Referring to fig. 7 in conjunction with the above embodiments, fig. 7 is a schematic block diagram of an electronic device 700 according to an embodiment of the present application.
For example, the electronic device 700 may be a terminal device or a server; the terminal device may be a mobile phone, a tablet computer, a notebook computer, a desktop computer, a personal digital assistant, a wearable device, or the like; the server may be an independent server or a server cluster.
The electronic device 700 comprises a processor 701 and a memory 702.
Illustratively, the processor 701 and the memory 702 are connected by a bus 703, such as an I2C (Inter-Integrated Circuit) bus.
Specifically, the processor 701 may be a Microcontroller Unit (MCU), a Central Processing Unit (CPU), a Digital Signal Processor (DSP), or the like.
Specifically, the memory 702 may be a Flash chip, a Read-Only Memory (ROM), a magnetic disk, an optical disk, a USB disk, or a removable hard disk.
The processor 701 is configured to run a computer program stored in the memory 702, and when executing the computer program, implement the image processing method.
Illustratively, the processor 701 is configured to run a computer program stored in the memory 702 and to implement the following steps when executing the computer program:
analyzing images to be processed with different preset sizes according to a pre-trained first image processing model to obtain a ternary image set of the images to be processed;
inputting the ternary image set and the target training data set into a pre-trained second image processing model for training to obtain a third image processing model;
analyzing the image to be processed based on the third image processing model to obtain a foreground mask of the image to be processed;
and filling a target background for the foreground mask to obtain a target image.
In one embodiment, the training process of the first image processing model includes:
and training a preset image segmentation model based on the distribution rule of the target training data set to obtain the first image processing model.
In an embodiment, the training a preset image segmentation model based on the distribution rule of the target training data set to obtain the first image processing model includes:
determining a loss function of the first image processing model according to the distribution rule of the target training data set;
inputting the images to be processed with different preset sizes into the preset image segmentation model, and determining the value of the loss function based on the output result of the preset image segmentation model;
and adjusting parameters of the preset image segmentation model based on the value of the loss function to obtain the first image processing model.
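A minimal PyTorch-style sketch of this training loop follows. The segmentation model, the data loader, and the concrete loss function determined from the distribution rule (for example, a class-weighted cross-entropy) are all assumptions for illustration, not the patent's concrete choices:

import torch

def train_first_model(segmentation_model, loader, loss_fn, epochs=10, lr=1e-4):
    optimizer = torch.optim.Adam(segmentation_model.parameters(), lr=lr)
    for _ in range(epochs):
        for images, labels in loader:  # batches of images at the preset sizes
            logits = segmentation_model(images)
            loss = loss_fn(logits, labels)  # value of the loss function
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()  # adjust the parameters of the segmentation model
    return segmentation_model  # the trained first image processing model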
In an embodiment, the inputting the images to be processed with different preset sizes into the preset image segmentation model, and determining the value of the loss function based on the output result of the preset image segmentation model includes:
inputting the images to be processed with different preset sizes into the preset image segmentation model for model training to obtain a first output result;
learning the first output result based on an attention mechanism to obtain a second output result;
and determining the value of the loss function according to the first output result and the second output result.
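The two-output loss computation can be sketched as below. The attention module and the equal weighting of the two loss terms are assumptions, not the patent's concrete formulation:

def two_output_loss(segmentation_model, attention, images, labels, loss_fn):
    first_output = segmentation_model(images)  # first output result
    second_output = attention(first_output)    # learned via the attention mechanism
    # Determine the value of the loss function from both output results.
    return loss_fn(first_output, labels) + loss_fn(second_output, labels)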
In an embodiment, the analyzing, according to a pre-trained first image processing model, images to be processed with different preset sizes to obtain a ternary image set of the images to be processed includes:
inputting the image to be processed into the first image processing model for pixel point classification and labeling to obtain a prediction score of each pixel point in the image to be processed;
obtaining a ternary image of the image to be processed according to the prediction score of each pixel point and a preset image category threshold;
and performing data expansion on the ternary image of the image to be processed to obtain a ternary image set of the image to be processed.
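Purely as an illustration of the data-expansion step, one common way to expand a single ternary image into a set is by mirroring and rescaling; the specific augmentations below are assumptions, as the application only states that the ternary image is expanded:

import cv2

def expand_trimap(trimap, scales=(0.5, 1.0, 2.0)):
    expanded = [cv2.flip(trimap, 1)]  # horizontal mirror
    for s in scales:
        # Nearest-neighbour interpolation keeps the three category labels intact.
        expanded.append(cv2.resize(trimap, None, fx=s, fy=s,
                                   interpolation=cv2.INTER_NEAREST))
    return expanded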
In an embodiment, the obtaining a ternary image of the image to be processed according to the prediction score of each pixel point and a preset image category threshold includes:
respectively comparing the prediction score of each pixel point with the preset image category threshold, and determining the category of each pixel point in the ternary image according to the comparison result;
and obtaining the ternary image of the image to be processed based on the category of each pixel point in the ternary image.
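As a hedged sketch of this comparison, each pixel's prediction score can be mapped to foreground, background, or the undetermined region using two assumed category thresholds; the label values 0/128/255 and the threshold values are illustrative only:

import numpy as np

def scores_to_trimap(scores, fg_thresh=0.9, bg_thresh=0.1):
    trimap = np.full(scores.shape, 128, dtype=np.uint8)  # undetermined by default
    trimap[scores >= fg_thresh] = 255  # confidently foreground
    trimap[scores <= bg_thresh] = 0    # confidently background
    return trimap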
In an embodiment, the analyzing, according to a pre-trained first image processing model, images to be processed with different preset sizes to obtain a ternary image set of the images to be processed includes:
for an image to be processed of any preset size, inputting the image to be processed into the first image processing model for pixel point classification and labeling to obtain a segmentation mask covering each pixel point in the image to be processed;
performing morphological processing on the segmentation mask to obtain a ternary image of the image to be processed;
and obtaining the ternary image set according to the ternary images of the images to be processed with different preset sizes.
In an embodiment, the performing morphological processing on the segmentation mask to obtain a ternary image of the image to be processed includes:
performing multiple morphological erosion operations on the segmentation mask to obtain a first morphological image;
expanding the segmentation mask based on a superpixel algorithm, and performing multiple morphological dilation operations on the expanded mask to obtain a second morphological image;
and determining a ternary image of the image to be processed according to the first morphological image and the second morphological image.
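An illustrative OpenCV sketch of this morphological route follows, assuming a binary segmentation mask with values {0, 255}. The kernel size and iteration counts are assumptions, and the superpixel-based mask expansion is simplified to a plain dilation for brevity:

import cv2
import numpy as np

def mask_to_trimap(mask, kernel_size=5, iterations=3):
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    eroded = cv2.erode(mask, kernel, iterations=iterations)    # first morphological image
    dilated = cv2.dilate(mask, kernel, iterations=iterations)  # second morphological image
    trimap = np.full(mask.shape, 128, dtype=np.uint8)  # undetermined by default
    trimap[dilated == 0] = 0     # outside the dilated mask: background
    trimap[eroded == 255] = 255  # inside the eroded mask: foreground
    return trimap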
The specific principle and implementation manner of the electronic device provided in the embodiment of the present application are similar to those of the image processing method in the foregoing embodiment, and are not described herein again.
The embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to implement the steps of the image processing method provided by the above embodiments.
The computer-readable storage medium may be an internal storage unit of the electronic device according to any of the foregoing embodiments, for example, a hard disk or a memory of the electronic device. The computer-readable storage medium may also be an external storage device of the electronic device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash memory card (Flash Card) provided on the electronic device.
It is to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.
It should also be understood that the term "and/or" as used in this application and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
While the present application has been described with reference to specific embodiments, the scope of the present application is not limited thereto, and those skilled in the art can easily conceive of various equivalent modifications or substitutions within the technical scope disclosed herein. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. An image processing method, comprising:
analyzing images to be processed with different preset sizes according to a pre-trained first image processing model to obtain a ternary image set of the images to be processed;
inputting the ternary image set and the target training data set into a pre-trained second image processing model for training to obtain a third image processing model;
analyzing the image to be processed based on the third image processing model to obtain a foreground mask of the image to be processed;
and filling a target background for the foreground mask to obtain a target image.
2. The image processing method of claim 1, wherein the training process of the first image processing model comprises:
and training a preset image segmentation model based on the distribution rule of the target training data set to obtain the first image processing model.
3. The image processing method according to claim 2, wherein the training a preset image segmentation model based on the distribution rule of the target training data set to obtain the first image processing model comprises:
determining a loss function of the first image processing model according to the distribution rule of the target training data set;
inputting the images to be processed with different preset sizes into the preset image segmentation model, and determining the value of the loss function based on the output result of the preset image segmentation model;
and adjusting parameters of the preset image segmentation model based on the value of the loss function to obtain the first image processing model.
4. The image processing method according to claim 3, wherein the inputting the images to be processed with different preset sizes into the preset image segmentation model, and determining the value of the loss function based on the output result of the preset image segmentation model comprises:
inputting the images to be processed with different preset sizes into the preset image segmentation model for model training to obtain a first output result;
learning the first output result based on an attention mechanism to obtain a second output result;
and determining the value of the loss function according to the first output result and the second output result.
5. The image processing method according to any one of claims 1 to 4, wherein the analyzing the images to be processed with different preset sizes according to the pre-trained first image processing model to obtain the ternary image set of the images to be processed comprises:
inputting the image to be processed into the first image processing model for pixel point classification and labeling to obtain a prediction score of each pixel point in the image to be processed;
obtaining a ternary image of the image to be processed according to the prediction score of each pixel point and a preset image category threshold;
and performing data expansion on the ternary image of the image to be processed to obtain a ternary image set of the image to be processed.
6. The image processing method according to claim 5, wherein obtaining the ternary image of the image to be processed according to the prediction score of each pixel point and a preset image category threshold comprises:
respectively comparing the prediction score of each pixel point with the preset image category threshold, and determining the category of each pixel point in the ternary image according to the comparison result;
and obtaining the ternary image of the image to be processed based on the category of each pixel point in the ternary image.
7. The image processing method according to any one of claims 1 to 4, wherein the analyzing the images to be processed with different preset sizes according to the pre-trained first image processing model to obtain the ternary image set of the images to be processed comprises:
for an image to be processed of any preset size, inputting the image to be processed into the first image processing model for pixel point classification and labeling to obtain a segmentation mask covering each pixel point in the image to be processed;
performing morphological processing on the segmentation mask to obtain a ternary image of the image to be processed;
and obtaining the ternary image set according to the ternary images of the images to be processed with different preset sizes.
8. The image processing method according to claim 7, wherein the performing morphological processing on the segmentation mask to obtain a ternary image of the image to be processed comprises:
performing multiple morphological erosion operations on the segmentation mask to obtain a first morphological image;
expanding the segmentation mask based on a superpixel algorithm, and performing multiple morphological dilation operations on the expanded mask to obtain a second morphological image;
and determining a ternary image of the image to be processed according to the first morphological image and the second morphological image.
9. An electronic device comprising a memory and a processor;
the memory is used for storing a computer program;
the processor for executing the computer program and implementing the image processing method according to any of claims 1-8 when executing the computer program.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to implement the image processing method according to any one of claims 1 to 8.
CN202110328263.5A 2021-03-26 2021-03-26 Image processing method, electronic device, and storage medium Pending CN112990331A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110328263.5A CN112990331A (en) 2021-03-26 2021-03-26 Image processing method, electronic device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110328263.5A CN112990331A (en) 2021-03-26 2021-03-26 Image processing method, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
CN112990331A true CN112990331A (en) 2021-06-18

Family

ID=76333902

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110328263.5A Pending CN112990331A (en) 2021-03-26 2021-03-26 Image processing method, electronic device, and storage medium

Country Status (1)

Country Link
CN (1) CN112990331A (en)

Patent Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108830780A (en) * 2018-05-09 2018-11-16 北京京东金融科技控股有限公司 Image processing method and device, electronic equipment, storage medium
CN108961279A (en) * 2018-06-28 2018-12-07 Oppo(重庆)智能科技有限公司 Image processing method, device and mobile terminal
CN109461167A (en) * 2018-11-02 2019-03-12 Oppo广东移动通信有限公司 The training method of image processing model scratches drawing method, device, medium and terminal
CN109712145A (en) * 2018-11-28 2019-05-03 山东师范大学 A kind of image matting method and system
CN111383232A (en) * 2018-12-29 2020-07-07 Tcl集团股份有限公司 Matting method, matting device, terminal equipment and computer-readable storage medium
CN110188765A (en) * 2019-06-05 2019-08-30 京东方科技集团股份有限公司 Image, semantic parted pattern generation method, device, equipment and storage medium
CN110427868A (en) * 2019-07-30 2019-11-08 上海工程技术大学 A kind of pedestrian identify again in feature extracting method
CN110751655A (en) * 2019-09-16 2020-02-04 南京工程学院 Automatic cutout method based on semantic segmentation and significance analysis
CN110610509A (en) * 2019-09-18 2019-12-24 上海大学 Optimized matting method and system capable of assigning categories
CN110706234A (en) * 2019-10-08 2020-01-17 浙江工业大学 Automatic fine segmentation method for image
CN111223106A (en) * 2019-10-28 2020-06-02 稿定(厦门)科技有限公司 Full-automatic portrait mask matting method and system
CN110930296A (en) * 2019-11-20 2020-03-27 Oppo广东移动通信有限公司 Image processing method, device, equipment and storage medium
CN111080656A (en) * 2019-12-10 2020-04-28 腾讯科技(深圳)有限公司 Image processing method, image synthesis method and related device
CN111161286A (en) * 2020-01-02 2020-05-15 大连理工大学 Interactive natural image matting method
CN111476284A (en) * 2020-04-01 2020-07-31 网易(杭州)网络有限公司 Image recognition model training method, image recognition model training device, image recognition method, image recognition device and electronic equipment
CN111553894A (en) * 2020-04-24 2020-08-18 上海杏脉信息科技有限公司 Medical image segmentation model training method, medium and electronic device
CN111724400A (en) * 2020-06-29 2020-09-29 北京高思博乐教育科技股份有限公司 Automatic video matting method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
小胖子跳大绳: "Understanding and analysis of the CrossEntropy, FocalLoss and DiceLoss loss functions in pytorch semantic segmentation", https://blog.csdn.net/patience_of_study/article/details/113457134, page 1 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114550129A (en) * 2022-01-26 2022-05-27 江苏联合职业技术学院苏州工业园区分院 Machine learning model processing method and system based on data set
WO2023230936A1 (en) * 2022-05-31 2023-12-07 北京小米移动软件有限公司 Image segmentation model training method and apparatus, and image segmentation method and apparatus
CN114782460A (en) * 2022-06-21 2022-07-22 阿里巴巴达摩院(杭州)科技有限公司 Image segmentation model generation method, image segmentation method and computer equipment
CN115205533A (en) * 2022-09-19 2022-10-18 华能信息技术有限公司 Management data analysis method and system
CN116385641A (en) * 2023-03-29 2023-07-04 北京百度网讯科技有限公司 Image processing method and device, electronic equipment and storage medium
CN116385641B (en) * 2023-03-29 2024-03-19 北京百度网讯科技有限公司 Image processing method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN112990331A (en) Image processing method, electronic device, and storage medium
CN112163465A (en) Fine-grained image classification method, fine-grained image classification system, computer equipment and storage medium
CN112508975A (en) Image identification method, device, equipment and storage medium
CN110807757B (en) Image quality evaluation method and device based on artificial intelligence and computer equipment
CN110097084B (en) Knowledge fusion method for training multitask student network through projection characteristics
CN116258719B (en) Flotation foam image segmentation method and device based on multi-mode data fusion
CN114463586A (en) Training and image recognition method, device, equipment and medium of image recognition model
CN113128478B (en) Model training method, pedestrian analysis method, device, equipment and storage medium
CN111932577B (en) Text detection method, electronic device and computer readable medium
CN113111716B (en) Remote sensing image semiautomatic labeling method and device based on deep learning
CN112232346A (en) Semantic segmentation model training method and device and image semantic segmentation method and device
CN110852199A (en) Foreground extraction method based on double-frame coding and decoding model
CN113223011B (en) Small sample image segmentation method based on guide network and full-connection conditional random field
CN113989556A (en) Small sample medical image classification method and system
CN111242216A (en) Image generation method for generating anti-convolution neural network based on conditions
CN116543351A (en) Self-supervision group behavior identification method based on space-time serial-parallel relation coding
CN116994021A (en) Image detection method, device, computer readable medium and electronic equipment
WO2022226744A1 (en) Texture completion
CN115690704A (en) LG-CenterNet model-based complex road scene target detection method and device
CN111932447B (en) Picture processing method, device, equipment and storage medium
Das et al. Object Detection on Scene Images: A Novel Approach
CN111666878B (en) Object detection method and device
CN110580503A (en) AI-based double-spectrum target automatic identification method
CN113554068B (en) Semi-automatic labeling method, device and readable medium for instance segmentation data set
CN117095244B (en) Infrared target identification method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination