CN109461167B - Training method, matting method, device, medium and terminal of image processing model

Info

Publication number
CN109461167B
Authority
CN
China
Prior art keywords
image
matting
deep learning
training
learning network
Prior art date
Legal status
Active
Application number
CN201811302651.0A
Other languages
Chinese (zh)
Other versions
CN109461167A (en)
Inventor
朱豪
刘耀勇
陈岩
Current Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN201811302651.0A
Publication of CN109461167A
Application granted
Publication of CN109461167B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/10: Segmentation; Edge detection
    • G06T7/136: Segmentation; Edge detection involving thresholding
    • G06T7/187: Segmentation; Edge detection involving region growing; involving region merging; involving connected component labelling
    • G06T7/194: Segmentation; Edge detection involving foreground-background segmentation
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20081: Training; Learning
    • G06T2207/20084: Artificial neural networks [ANN]
    • G06T2207/30: Subject of image; Context of image processing
    • G06T2207/30196: Human being; Person
    • G06T2207/30204: Marker

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The embodiment of the application discloses a training method, a matting method, a device, a medium and a terminal for an image processing model. The training method of the image processing model comprises: acquiring a trimap of an original image; generating a training sample set according to the original image and the trimap; and training a preset deep learning network based on the training sample set to obtain an image processing model, wherein the image processing model is used for labeling an original image to obtain a trimap. With this technical scheme, an input original image can be labeled automatically to obtain its trimap, so large amounts of hair-level data annotation need not be performed manually; the annotation workload is reduced and image annotation efficiency is improved. In addition, because the original image is labeled by the image processing model, errors that manual annotation may introduce are avoided, and the matting effect can be optimized.

Description

Training method, matting method, device, medium and terminal of image processing model
Technical Field
The embodiment of the application relates to image processing technology, and in particular to a training method, a matting method, a device, a medium and a terminal for an image processing model.
Background
At present, matting has become one of the most common operations in image processing. For example, more and more people choose to buy clothes on the Internet, which has given rise to the search-by-image functions of e-commerce. Searching a complete photograph for similar clothes accurately is difficult, so the portrait in the picture needs to be segmented first. As another example, portrait-beautification functions also rely on accurate segmentation of background and portrait.
The matting schemes in the related art are mainly based on pixel-clustering methods and algorithms such as graph partitioning. When the background is relatively complex, or the background and the foreground (which may also be referred to as the matting target) are very similar, the segmentation effect is poor; for example, delicate objects such as hair, fine clothing and branches cannot be segmented perfectly.
Disclosure of Invention
The embodiment of the application provides a training method, a matting method, a device, a medium and a terminal for an image processing model, which can improve upon the matting schemes in the related art.
In a first aspect, an embodiment of the present application provides a training method for an image processing model, including:
acquiring a trimap image of an original image;
generating a training sample set according to the original image and the trimap image;
and training a preset deep learning network based on the training sample set to obtain an image processing model, wherein the image processing model is used for labeling an original image to obtain a trimap.
In a second aspect, an embodiment of the present application further provides a matting method, including:
acquiring a target picture to be matted;
labeling the target picture through an image processing model to obtain a trimap of the target picture, wherein the image processing model is a deep learning network trained on a training sample set formed from original images and trimaps;
and based on the target picture and the trimap, performing matting on the target picture with a set matting algorithm to obtain a matte image.
In a third aspect, an embodiment of the present application further provides a training apparatus for an image processing model, including:
the trimap image acquisition module is used for acquiring trimap images of the original images;
the sample generation module is used for generating a training sample set according to the original image and the trimap image;
and the model training module is used for training a preset deep learning network based on the training sample set to obtain an image processing model, wherein the image processing model is used for labeling an original image to obtain a trimap.
In a fourth aspect, an embodiment of the present application further provides a matting device, including:
the target picture acquisition module is used for acquiring a target picture to be matted;
the image annotation module is used for labeling the target picture through an image processing model to obtain a trimap of the target picture, wherein the image processing model is a deep learning network trained on a training sample set formed from original images and trimaps;
and the matting module is used for performing matting on the target picture with a set matting algorithm based on the target picture and the trimap, to obtain a matte image.
In a fifth aspect, embodiments of the present application further provide a computer-readable storage medium, on which a computer program is stored; the program is executed by a processor to implement a training method of an image processing model as described in the embodiments of the present application, or the program is executed by a processor to implement a matting method as described in the embodiments of the present application.
In a sixth aspect, an embodiment of the present application provides a terminal, including a memory, a processor, and a computer program stored in the memory and executable on the processor; the processor implements the training method of the image processing model according to the embodiment of the application when executing the computer program, or implements the matting method according to the embodiment of the application when executing the computer program.
The embodiment of the application provides a training scheme for an image processing model: a trimap of an original image is acquired, and a training sample set is generated from a plurality of original images and their corresponding trimaps; a preset deep learning network is trained on the training sample set to iteratively update its parameter values, and after training an image processing model is obtained, through which an original image can be labeled to obtain its trimap. With this technical scheme, the deep learning network is trained on original images and their corresponding trimaps, so an input original image can be labeled automatically to obtain a trimap; large amounts of hair-level data annotation need not be performed manually, the annotation workload is reduced, and image annotation efficiency is improved. In addition, because the original image is labeled by the image processing model, errors that manual annotation may introduce are avoided, and the matting effect can be improved.
Drawings
Fig. 1 is a flowchart of a training method of an image processing model according to an embodiment of the present application;
Fig. 2 is a flowchart of another training method of an image processing model according to an embodiment of the present application;
Fig. 3 is a flowchart of a matting method according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a training apparatus for an image processing model according to an embodiment of the present application;
Fig. 5 is a block diagram of a matting device according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of a terminal according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of another terminal according to an embodiment of the present application;
Fig. 8 is a block diagram of a smart phone according to an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be further noted that, for the convenience of description, only some of the structures related to the present application are shown in the drawings, not all of the structures.
Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the steps as a sequential process, many of the steps can be performed in parallel, concurrently or simultaneously. In addition, the order of the steps may be rearranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.
Fig. 1 is a flowchart of a training method of an image processing model according to an embodiment of the present application, which may be performed by an apparatus for training an image processing model, where the apparatus may be implemented by software and/or hardware, and may be generally integrated in a terminal or a server. As shown in fig. 1, the method includes:
and step 110, acquiring a trimap image of the original image.
It should be noted that the terminal in the embodiment of the present application may include intelligent devices such as mobile phones, tablet computers, notebook computers and desktop computers.
It should be noted that the trimap, also called a trisection map, is a contour image used to label the edge of the target object in an image. It is obtained by manually marking the outline of the target object in the original image, based on the set matting algorithm and the user's input operations, and thereby provides constraint information for the matting operation. For example, a trimap algorithm is adopted to roughly divide the original image into a foreground, a background and an unknown region to be solved, with white representing the foreground, black representing the background and gray representing the unknown region, so as to obtain the trimap.
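As an illustration of this three-way division, the following is a minimal sketch (not the application's implementation) that derives a trimap from an already-available binary foreground mask by eroding and dilating it; the mask, the OpenCV calls and the band width are assumptions made for the example:

    import cv2
    import numpy as np

    def make_trimap(fg_mask, band=10):
        """Rough trimap: white = foreground, black = background, gray = unknown.

        fg_mask: uint8 array with 255 on the target object and 0 elsewhere.
        band:    width in pixels of the uncertain strip around the contour.
        """
        kernel = np.ones((band, band), np.uint8)
        sure_fg = cv2.erode(fg_mask, kernel)     # shrunken mask: certainly foreground
        sure_bg = cv2.dilate(fg_mask, kernel)    # grown mask: outside it, certainly background
        trimap = np.full(fg_mask.shape, 128, np.uint8)  # start all gray (unknown)
        trimap[sure_fg == 255] = 255             # foreground region -> white
        trimap[sure_bg == 0] = 0                 # background region -> black
        return trimap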
Illustratively, label information of pixels in an original image is acquired, and a trimap of the original image is generated according to the label information. The label information may be the outline of the target object, or the boundary between foreground and background, in the manually labeled original image.
Step 120: generating a training sample set according to the original image and the trimap.
Illustratively, the original image is associated with a corresponding trimap image to serve as a training sample, and a set number of training samples form a training sample set. Wherein the set number may be a system default number.
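One way the association between originals and trimaps could be held is sketched below, as a PyTorch-style dataset; the directory layout, matching file names and class name are illustrative assumptions, not the application's design:

    import os
    from PIL import Image
    from torch.utils.data import Dataset

    class TrimapSampleSet(Dataset):
        """Each sample associates an original image with its trimap."""
        def __init__(self, image_dir, trimap_dir, transform=None):
            # Assumes an original and its trimap share the same file name.
            self.names = sorted(os.listdir(image_dir))
            self.image_dir, self.trimap_dir = image_dir, trimap_dir
            self.transform = transform

        def __len__(self):
            return len(self.names)

        def __getitem__(self, idx):
            name = self.names[idx]
            image = Image.open(os.path.join(self.image_dir, name)).convert("RGB")
            trimap = Image.open(os.path.join(self.trimap_dir, name)).convert("L")
            if self.transform:                  # e.g. conversion to tensors
                image, trimap = self.transform(image, trimap)
            return image, trimap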
Since the trimap is obtained by manually labeling set regions in the original image, labeling errors that are not easily perceived by the human eye may occur during manual labeling, and a matting operation performed based on such a trimap then performs poorly. In view of this problem, whether the trimap meets a set condition can be verified, and whether the trimap and its corresponding original image are used as a training sample is decided according to the verification result. The set condition may be that the score of the resulting matte image exceeds a set score threshold; the embodiment of the present application is not particularly limited in this respect. If the trimap meets the set condition, the trimap and its corresponding original image are taken as a training sample, and a set number of training samples form the training sample set.
Illustratively, based on the original image and the trimap, a set matting algorithm is adopted to perform matting on the original image to obtain a matte image. Evaluation information of the matte image is then acquired, and whether the trimap meets the set condition is judged according to the evaluation information. In the embodiment of the present application, the evaluation information may be data that evaluates the matting effect of the matte image, such as a score given to the matte image, or a ranking of matte images by matting effect.
There are many ways to acquire the evaluation information, and the embodiment of the present application is not particularly limited. In some examples, the evaluation information may be the result of analyzing user operations. For example, after a matte image is generated, if a user's correction operation on the matte image is detected, the matte image is given a lower score (a score below a set threshold). The matte image can also be scored according to the number of positions the user corrected: the more positions corrected, the lower the score. The score of the matte image is then compared with a set score threshold, and when the score exceeds the threshold, the trimap is judged to meet the set condition.
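Read as code, the correction-based check might look like the following sketch; the linear penalty and the threshold value are assumptions for illustration only:

    def score_from_corrections(num_corrections, penalty=10, full_score=100):
        # The more positions the user had to correct, the lower the score.
        return max(0, full_score - penalty * num_corrections)

    def trimap_meets_condition(num_corrections, score_threshold=80):
        # The trimap qualifies as training data only if the matte image
        # it produced scores above the set score threshold.
        return score_from_corrections(num_corrections) >= score_threshold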
Step 130: training a preset deep learning network based on the training sample set to obtain an image processing model.
In the embodiment of the application, the image processing model can automatically label an input original image to obtain its trimap.
Deep learning refers to a set of algorithms that apply machine learning over multi-layer neural networks to solve problems involving data such as images and text. Deep learning broadly falls into the category of neural networks, but with many variations in the concrete implementation. The core of deep learning is feature learning: the aim is to obtain hierarchical feature information through a layered network, thereby removing the previous need to design features by hand. Deep learning is a framework that contains a number of important algorithms, such as convolutional neural networks (CNN), autoencoders, sparse coding, restricted Boltzmann machines (RBM), deep belief networks (DBN) and recurrent neural networks (RNN). Different frameworks can be selected for different problems (such as images, speech or text), and the selection can weigh factors such as running speed and accuracy.
The parameters of the deep learning network include the weights of the edges between layers of the network and the biases θ of the neurons, among others.
For example, the preset deep learning network in the present application may be a deep learning model in which a post-processing layer is added after the output layer of a deep learning network based on image semantic segmentation. It should be noted that an image is composed of many pixels, and semantic segmentation means grouping/segmenting those pixels according to the difference in semantic meaning they express in the image. A deep convolutional neural network model can be trained with segmented sample images to obtain the deep learning network based on image semantic segmentation, so that original images can then be segmented semantically by this network.
Taking an original image containing a portrait as an example, with the portrait as the target image: to segment the portrait from the background, the original image can be semantically segmented with a deep convolutional neural network model, and the output is a class map in which the portrait is represented by white pixels and the background by black pixels.
The post-processing layer is arranged after the output layer of the semantic-segmentation network, and performs segmented thresholding on the black-and-white class map output by that network to obtain a trimap with the three colors black, white and gray. The segmented thresholding may be an operation that classifies the pixels of the class map into several classes by threshold-based segmentation. For example, the pixels of the black-and-white class map may be divided by gray level into 3 pixel sets, where pixels in the same set fall in the same threshold interval and pixels in different sets fall in different intervals.
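The shape of such a network can be sketched in PyTorch as follows; the backbone is left abstract, and the two probability thresholds are assumptions (note that hard thresholding passes no gradients, so a training sketch may compute its loss on a soft output instead):

    import torch
    import torch.nn as nn

    class TrimapNet(nn.Module):
        def __init__(self, seg_backbone, low=0.2, high=0.8):
            super().__init__()
            self.seg = seg_backbone   # semantic-segmentation network, B x 1 x H x W logits
            self.low, self.high = low, high

        def forward(self, x):
            class_map = torch.sigmoid(self.seg(x))    # black/white class map in [0, 1]
            # Post-processing layer: segmented thresholding into three levels.
            trimap = torch.full_like(class_map, 0.5)  # gray: unknown region
            trimap[class_map <= self.low] = 0.0       # black: background
            trimap[class_map >= self.high] = 1.0      # white: foreground
            return trimap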
Illustratively, based on the training sample set, the preset deep learning network is trained with the forward propagation algorithm (i.e., an original image is input into the preset deep learning network) to obtain an actual output result. The difference between the actual output result and the ideal output result (i.e., the trimap in the training sample) is then computed, and the parameter values of the deep learning network are adjusted by minimizing this error with the back propagation algorithm.
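A condensed version of that loop is sketched below; the loss function, optimizer and hyperparameters are illustrative assumptions, and TrimapNet and TrimapSampleSet refer to the sketches above (with a transform that yields tensors):

    import torch
    from torch.utils.data import DataLoader

    def train(model, sample_set, epochs=10, lr=1e-4):
        loader = DataLoader(sample_set, batch_size=8, shuffle=True)
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        loss_fn = torch.nn.MSELoss()
        for _ in range(epochs):
            for image, trimap_gt in loader:
                trimap_pred = model(image)               # forward propagation
                loss = loss_fn(trimap_pred, trimap_gt)   # actual vs. ideal output
                optimizer.zero_grad()
                loss.backward()                          # back propagation
                optimizer.step()                         # minimize the error
        return model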
It should be noted that if a server executes the training method of the image processing model, then after training is completed the server delivers the image processing model to the terminal. When a model update event is triggered, the server updates the image processing model and sends the updated model to the terminal. For example, the server receives feedback from terminals on the labeling quality of the trimaps produced by the image processing model, and if the number of reports of poor labeling exceeds a set threshold, the update of the image processing model is started.
Optionally, if the computing power of the terminal is strong enough, the terminal may also execute the above training method itself. Users evaluate matting effects subjectively: a matte that one user considers good, another may not. By executing the training method on the terminal, the image processing model can be updated in time according to the user's personalized requirements, yielding a model suited to that end user.
According to the technical scheme of the embodiment of the application, a trimap of an original image is acquired, and a training sample set is generated from a plurality of original images and their corresponding trimaps; a preset deep learning network is trained on the training sample set to iteratively update its parameter values, and after training an image processing model is obtained, through which an original image can be labeled to obtain its trimap. With this technical scheme, the deep learning network is trained on original images and their corresponding trimaps, so an input original image can be labeled automatically to obtain a trimap; large amounts of hair-level data annotation need not be performed manually, the annotation workload is reduced, and image annotation efficiency is improved. In addition, because the original image is labeled by the image processing model, errors that manual annotation may introduce are avoided, and the labeling effect can be improved.
Fig. 2 is a flowchart of another method for training an image processing model according to an embodiment of the present application, and as shown in fig. 2, the method includes:
Step 201: acquiring a trimap of an original image.
Illustratively, the user's operations on the original image for separating foreground from background are acquired, and a trimap of the original image is generated according to those operations.
Step 202: based on the original image and the trimap, performing matting on the original image with a set matting algorithm to obtain a matte image.
It should be noted that matting is generally an ultra-fine image segmentation technique for separating the foreground from the background (generalized image segmentation may also include separation between equivalent objects, etc.). A matting algorithm is an algorithm applied in this segmentation technique, including but not limited to the Bayesian matting algorithm, the Poisson matting algorithm, the GrabCut segmentation algorithm, a neighborhood-sampling matting algorithm (Shared Sampling for Real-Time Alpha Matting), robust matting, and the Lazy Snapping algorithm. Different matting algorithms can be selected for different original images, and a suitable algorithm can also be chosen by weighing the original image, the matting effect and the matting efficiency.
Illustratively, the original image and the trimap are used as input data of the selected matting algorithm, and the matte image is obtained through image segmentation processing.
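Only to make the shared interface of these algorithms concrete (an image and a trimap in, an alpha matte out), here is a deliberately naive stand-in; it is none of the algorithms named above, and a real matting algorithm estimates alpha far more carefully:

    import numpy as np

    def naive_matte(image, trimap):
        """image: H x W x 3 float RGB in [0, 1]; trimap: uint8 with 0/128/255."""
        alpha = np.zeros(trimap.shape, np.float32)
        alpha[trimap == 255] = 1.0                    # known foreground
        fg_mean = image[trimap == 255].mean(axis=0)   # average foreground color
        bg_mean = image[trimap == 0].mean(axis=0)     # average background color
        unknown = trimap == 128
        # Project unknown pixels onto the fg-bg color axis:
        # alpha = (I - B) . (F - B) / |F - B|^2
        diff = fg_mean - bg_mean
        a = ((image[unknown] - bg_mean) @ diff) / (diff @ diff + 1e-8)
        alpha[unknown] = np.clip(a, 0.0, 1.0)
        return alpha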
Step 203: acquiring a target image from the original image.
Illustratively, the pixel points labeled by the user in the process of generating the trimap can be recorded, the target object the user cares about is determined from those pixel points, and the target image of the target object is obtained from the original image.
Step 204: determining the similarity between the matte image and the target image, and scoring the matte image according to the similarity.
Illustratively, the matte image is compared with the target image to determine their similarity. The correspondence between similarity and score interval can be predetermined, so that the matte image is scored according to the similarity. For example, if the similarity exceeds a first threshold, the score of the matte image is determined to be 80-100 points; if it exceeds a second threshold but is below the first, 60-80 points; if it exceeds a third threshold but is below the second, 50-60 points; if it exceeds a fourth threshold but is below the third, 40-50 points; and if it exceeds a fifth threshold but is below the fourth, 0-40 points.
It should be noted that, in the embodiment of the present application, the similarity may be determined as follows. The matte image and the target image are reduced in size, for example to 8 x 8; shrinking an image to a set size removes its details, keeps only basic information such as structure and brightness, and discards image differences caused by size or scale. The reduced 8 x 8 images are converted to 64-level gray scale, i.e., each pixel takes one of only 64 values. The average gray level of all 64 pixels is computed. The gray value of each pixel in the matte image is compared with the average: a pixel is marked 1 if it is greater than or equal to the average and 0 if it is below it, and the 64 comparison results are concatenated into a 64-bit integer, recorded as the matte image fingerprint. A target image fingerprint is computed in the same way. The Hamming distance between the two fingerprints (the Hamming distance between two equal-length strings is the number of positions at which the corresponding characters differ) is then used as the similarity measure of the two images: the larger the Hamming distance, the greater the difference between the two images; the smaller the distance, the more similar they are.
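This fingerprint is a perceptual average hash; a minimal sketch follows, using PIL for the resizing, with the 64-level quantization taken from the description above (the helper names are assumptions):

    from PIL import Image

    def average_hash(img):
        """64-bit fingerprint of a PIL image, as described above."""
        small = img.convert("L").resize((8, 8))       # drop detail, keep structure/brightness
        pixels = [p // 4 for p in small.getdata()]    # 256 gray levels -> 64 levels
        avg = sum(pixels) / 64.0
        bits = 0
        for p in pixels:                              # >= average -> 1, below -> 0
            bits = (bits << 1) | (1 if p >= avg else 0)
        return bits

    def hamming_distance(h1, h2):
        # Number of differing bit positions; smaller means more similar.
        return bin(h1 ^ h2).count("1")

The resulting distance can then be mapped into the score bands described in step 204.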
It should be noted that there are many ways to score the matte image, not limited to those listed in the embodiments of the present application. For example, the matte image can also be scored according to its completeness, which can be determined by whether pixels of adjacent regions in the matte image are continuous, whether the boundary of the matte image is complete, and so on.
Step 205: judging whether the score of the matte image exceeds a set threshold; if so, execute step 206, otherwise return to step 201.
Illustratively, the score of the matte image is compared with the set threshold, and when the score is greater than or equal to the threshold, step 206 is executed. When the score is below the threshold, the matting effect of the matte image is judged to be poor, the trimap used to generate the matte image is determined not to meet the set condition, the operation of generating a training sample from that trimap is abandoned, and execution returns to step 201 to acquire a trimap again.
Step 206: taking the original image and the trimap as a training sample, and generating a training sample set from such training samples.
Illustratively, when the trimap meets the set condition, an association between the trimap and the original image is established and taken as a training sample. After a set number of training samples has been acquired, the sample collection operation is deemed complete, and the set number of training samples constitutes the training sample set.
Step 207: training a preset deep learning network based on the training sample set to obtain an image processing model.
The preset deep learning network comprises a deep learning network based on image semantic segmentation and a post-processing layer arranged after its output layer; the post-processing layer is the output layer of the image processing model. A set piecewise function is placed in the post-processing layer, and the class map output by the semantic-segmentation network is processed by this function to obtain the trimap of the original image. The set piecewise function defines the association between gray-value intervals and target pixel values. For example, pixels with gray values in a1-a2 in the class map are black, and their target pixel value may be (0,0,0); pixels with gray values in a5-a6 are white, with target pixel value (255,255,255). It should be noted that pixels whose color is not pure black (or white) but close to it also fall in these ranges; whether a pixel is closer to black or to white can be judged from its gray value, with the target pixel value set to (0,0,0) for near-black pixels and (255,255,255) for near-white ones. Pixels with gray values in a3-a4 are gray (including pixels close to neither black nor white), and their target pixel value may be (192,192,192). It should be understood that many values are possible for the target pixel values, not only those listed in the above examples; for example, the pixel value of cold gray is (128,138,135), that of ivory black is (88,87,86), and so on.
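The set piecewise function could be realized as in the following NumPy sketch; the concrete interval boundaries stand in for a1-a6 and are assumptions, while the target pixel values are those of the example above:

    import numpy as np

    def piecewise_to_trimap(class_map, a2=64, a5=192):
        """class_map: H x W gray-scale class map output by the segmentation network."""
        trimap = np.empty(class_map.shape + (3,), np.uint8)
        trimap[...] = (192, 192, 192)              # middle interval: gray (unknown)
        trimap[class_map <= a2] = (0, 0, 0)        # near-black interval -> black
        trimap[class_map >= a5] = (255, 255, 255)  # near-white interval -> white
        return trimap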
Illustratively, a record in the training sample set is taken, and the original image in the record is input into the deep learning network; the semantic-segmentation part produces a black-and-white class map, the class map is passed into the post-processing layer, and the pixel values of the class map are adjusted by the set function to obtain a trimap with the three colors black, white and gray. This trimap is compared with the trimap acquired in step 201, and the parameter values of the deep learning network are adjusted with the back propagation algorithm so that the trimap output by the model approaches the trimap acquired in step 201. After training is finished, the preset deep learning network is recorded as the image processing model.
According to the technical scheme of this embodiment, a matting operation is performed based on the original image and the manually labeled trimap to obtain a matte image; the matting effect is determined by acquiring the target image from the original image and comparing it with the matte image, so as to judge whether the trimap meets the set condition; when it does, the original image is associated with the trimap and used as a training sample. This improves the labeling accuracy of the samples, and hence the processing accuracy of the image processing model trained on them, so a better matting effect can be obtained. In addition, labeling original images with the image processing model to generate trimaps automatically greatly reduces the labeling workload.
Fig. 3 is a flowchart of a matting method provided in an embodiment of the present application, which can be performed by a matting device, wherein the matting device can be implemented by software and/or hardware, and can be generally integrated in a terminal. As shown in fig. 3, the method includes:
and step 310, acquiring a target picture to be scratched.
It should be noted that the terminal in the embodiment of the present application may include devices provided with an operating system, such as mobile phones, tablet computers, notebook computers, handheld game consoles and intelligent appliances. The type of operating system is not limited in the embodiment of the present application, and may include the Android operating system, the Windows operating system, the Apple iOS operating system, and the like.
The target picture may be a local picture determined based on a user operation, or a picture on an Internet platform. The user operation may include a touch operation, a voice operation, a gesture operation, an eye-gaze operation, or the like. For example, if a user clicks a certain picture in the local picture library, that picture may be determined as the target picture. For another example, if the user watches a certain picture on an Internet platform, a prompt message may pop up asking whether the user wants to perform matting on the picture; if a confirmation instruction input by the user is detected, the picture the user is watching is determined as the target picture.
Illustratively, a user operation is detected, the target picture to be matted is determined according to the user operation, and the target picture is acquired from its storage location.
Step 320: labeling the target picture through an image processing model to obtain a trimap of the target picture.
In the embodiment of the present application, the trisection map may also be referred to as a trimap. The image processing model is a deep learning network trained on a training sample set formed from original images and trimaps, and is used to label an original image to obtain its trimap; this avoids the labeling errors caused by individual cognitive differences in manual labeling and greatly improves labeling precision, thereby improving the accuracy of matting performed based on the trimap.
Illustratively, the target picture is input into the image processing model to obtain the trimap of the target picture.
Step 330: based on the target picture and the trimap, performing matting on the target picture with a set matting algorithm to obtain a matte image.
The set matting algorithm is an image segmentation algorithm for separating the foreground from the background (generalized image segmentation may also include separation between equivalent objects, etc.). The specific algorithms it may include are described in the above embodiments and are not repeated here.
Illustratively, the target picture and its trimap are used as input data of the set matting algorithm, and the matte image is obtained through image segmentation processing. Assuming the target picture contains a portrait, the target picture and its corresponding trimap are fed to the matting algorithm, and a matte image of the portrait is obtained through segmentation, achieving a matting effect accurate to the hairline level.
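Put end to end, the matting method of this embodiment reduces to the short pipeline sketched below; the function names refer to the illustrative sketches earlier in this text and are assumptions, not the application's API:

    def matting_pipeline(target_picture, model, matting_algorithm):
        """target_picture: H x W x 3 float RGB in [0, 1]."""
        trimap = model(target_picture)                     # step 320: automatic labeling
        alpha = matting_algorithm(target_picture, trimap)  # step 330: e.g. naive_matte above
        return target_picture * alpha[..., None]           # matte image: foreground kept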
According to the technical scheme of this embodiment, when a matting operation is executed, a target picture to be matted is acquired; the target picture is labeled by the image processing model to obtain its trimap; the target picture and the trimap are then used as input data of the set matting algorithm, and the matting operation is executed to obtain a matte image of the target picture. This avoids the labeling errors introduced by individual cognitive differences in manual labeling, greatly improves labeling precision and, with it, the accuracy of matting performed based on the trimap.
In some examples, after the target picture to be matted is acquired, the method further includes: judging whether the attribute information of the target picture matches the sample attribute information of the original images in the training sample set; when they match, labeling the target picture through the image processing model; and when they do not match, adjusting the attribute information of the target picture according to the sample attribute information. Judging whether the attribute information matches may be judging whether the color space of the target picture is the same as that of the sample pictures; color spaces include but are not limited to the RGB, YUV, HSV and HSI formats. With this technical scheme, the attribute information of the target picture is checked before the picture is input into the image processing model, which ensures that the target picture matches the attribute information of the original images in the training sample set; this avoids obtaining an inaccurate trimap because mismatched attributes introduce labeling errors, or even failing to generate a trimap at all because the labeling fails, and can effectively improve matting efficiency and effect.
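A sketch of the attribute check, assuming PIL images and that the recorded sample attribute is the PIL color-mode string (both assumptions for illustration):

    def ensure_matching_attributes(target, sample_mode="RGB"):
        # If the target picture's color space differs from that of the
        # training samples, convert it before feeding it to the model.
        if target.mode != sample_mode:
            target = target.convert(sample_mode)
        return target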
Fig. 4 is a schematic structural diagram of a training apparatus for an image processing model according to an embodiment of the present application, which may be implemented by software and/or hardware, is generally integrated in a terminal or a server, and can train a deep learning network by executing the training method of the image processing model to obtain an image processing model. As shown in fig. 4, the apparatus includes:
a trimap image obtaining module 410, configured to obtain a trimap image of an original image;
a sample generation module 420, configured to generate a training sample set according to the original image and the trimap image;
and the model training module 430 is configured to train a preset deep learning network based on the training sample set to obtain an image processing model, where the image processing model is used to label an original image to obtain a trimap.
The embodiment of the application provides a training apparatus for an image processing model: a trimap of an original image is acquired, and a training sample set is generated from a plurality of original images and their corresponding trimaps; a preset deep learning network is trained on the training sample set to iteratively update its parameter values, and after training an image processing model is obtained, through which an original image can be labeled to obtain its trimap. With this technical scheme, the deep learning network is trained on original images and their corresponding trimaps, so an input original image can be labeled automatically to obtain a trimap; large amounts of hair-level data annotation need not be performed manually, the annotation workload is reduced, and image annotation efficiency is improved. In addition, because the original image is labeled by the image processing model, errors that manual annotation may introduce are avoided, and the labeling effect can be improved.
Optionally, the sample generating module 420 includes:
the pre-matting submodule is used for performing matting on the original image with a set matting algorithm based on the original image and the trimap, to obtain a matte image;
the evaluation submodule is used for acquiring evaluation information of the matte image and judging whether the trimap meets a set condition according to the evaluation information;
and the sample generation submodule is used for taking the original image and the trimap as a training sample and generating a training sample set from such training samples if the trimap is determined to meet the set condition.
Optionally, the evaluation sub-module is specifically configured to:
acquiring a target image from the original image;
determining the similarity between the matte image and the target image, and scoring the matte image according to the similarity;
and when the score of the matte image exceeds a set threshold, determining that the trimap meets the set condition.
Optionally, the preset deep learning network is a deep learning model in which a post-processing layer is added after the output layer of a deep learning network based on image semantic segmentation, where the post-processing layer is configured to perform segmented thresholding on the black-and-white class map output by the semantic-segmentation network to obtain a trimap with the three colors black, white and gray.
Optionally, the performing of segmented thresholding on the black-and-white class map output by the semantic-segmentation network includes:
acquiring a set piecewise function, where the set piecewise function defines the association between gray-value intervals and target pixel values;
and acquiring the black-and-white class map output by the semantic-segmentation network, and adjusting the pixel values of the class map with the set piecewise function.
Fig. 5 is a block diagram of a matting device provided in an embodiment of the present application, where the device may be implemented by software and/or hardware, may be generally integrated in a terminal, and can perform matting on a target picture by executing the matting method based on the image processing model. As shown in fig. 5, the device includes:
a target picture acquisition module 510, configured to acquire a target picture to be matted;
an image labeling module 520, configured to label the target picture through an image processing model to obtain a trimap of the target picture, where the image processing model is a deep learning network trained on a training sample set formed from original images and trimaps;
and a matting module 530, configured to perform matting on the target picture with a set matting algorithm based on the target picture and the trimap, to obtain a matte image.
The embodiment of the application provides a matting device: when a matting operation is executed, a target picture to be matted is acquired; the target picture is labeled through an image processing model to obtain its trimap; the target picture and the trimap are then used as input data of a set matting algorithm, and the matting operation is executed to obtain a matte image of the target picture. This avoids the labeling errors introduced by individual cognitive differences in manual labeling, greatly improves labeling precision and, with it, the accuracy of matting performed based on the trimap.
Optionally, the matting device is further configured to:
after the target picture to be matted is acquired, judge whether the attribute information of the target picture matches the sample attribute information of the original images in the training sample set;
when they match, label the target picture through the image processing model;
and when they do not match, adjust the attribute information of the target picture according to the sample attribute information.
Embodiments of the present application also provide a storage medium containing computer-executable instructions, which when executed by a computer processor, perform a method of training an image processing model, the method comprising:
acquiring a trimap image of an original image;
generating a training sample set according to the original image and the trimap image;
and training a preset deep learning network based on the training sample set to obtain an image processing model, wherein the image processing model is used for labeling an original image to obtain a trimap.
Additionally, embodiments of the present application also provide a storage medium containing computer-executable instructions, which when executed by a computer processor, are configured to perform a matting method, the method including:
acquiring a target picture to be matted;
labeling the target picture through an image processing model to obtain a trimap of the target picture, wherein the image processing model is a deep learning network trained on a training sample set formed from original images and trimaps;
and based on the target picture and the trimap, performing matting on the target picture with a set matting algorithm to obtain a matte image.
Storage medium: any of various types of memory devices or storage devices. The term "storage medium" is intended to include: installation media such as CD-ROMs, floppy disks or tape devices; computer system memory or random access memory such as DRAM, DDR RAM, SRAM, EDO RAM, Rambus RAM, etc.; non-volatile memory such as flash memory or magnetic media (e.g., hard disks or optical storage); registers or other similar types of memory elements, etc. The storage medium may also include other types of memory or combinations thereof. In addition, the storage medium may be located in the first computer system in which the program is executed, or in a different, second computer system connected to the first computer system through a network (such as the Internet); the second computer system may provide program instructions to the first computer for execution. The term "storage medium" may include two or more storage media that may reside in different locations, such as in different computer systems connected by a network. The storage medium may store program instructions (e.g., embodied as a computer program) executable by one or more processors.
Of course, the storage medium provided in the embodiments of the present application contains computer-executable instructions, and the computer-executable instructions are not limited to the training operation of the image processing model described above, and may also perform related operations in the training method of the image processing model provided in any embodiments of the present application.
Of course, the storage medium provided in the embodiments of the present application contains computer-executable instructions, and the computer-executable instructions are not limited to the matting operation described above, and may also perform related operations in the matting method provided in any embodiments of the present application.
The embodiment of the application also provides a terminal, in which the training apparatus of the image processing model provided by the embodiment of the application can be integrated. Fig. 6 is a schematic structural diagram of a terminal according to an embodiment of the present application. As shown in fig. 6, the terminal includes a memory 610 and a processor 620. The memory 610 is used for storing computer programs and the like; the processor 620 reads and executes the computer programs stored in the memory 610. The processor 620 includes, inter alia, a trimap acquisition module 621, a sample generation module 622 and a model training module 623. The processor 620, when executing the computer program, performs the steps of: acquiring a trimap of an original image; generating a training sample set according to the original image and the trimap; and training a preset deep learning network based on the training sample set to obtain an image processing model, wherein the image processing model is used for labeling an original image to obtain a trimap.
In addition, the embodiment of the application also provides another terminal, in which the matting device provided by the embodiment of the application can be integrated. Fig. 7 is a schematic structural diagram of another terminal provided in the embodiment of the present application. As shown in fig. 7, the terminal includes a memory 710 and a processor 720. The memory 710 is used for storing computer programs and the like; the processor 720 reads and executes the computer programs stored in the memory 710. The processor 720 includes a target picture acquisition module 721, a picture labeling module 722 and a matting module 723. The processor 720, when executing the computer program, performs the steps of: acquiring a target picture to be matted; labeling the target picture through an image processing model to obtain a trimap of the target picture, wherein the image processing model is a deep learning network trained on a training sample set formed from original images and trimaps; and based on the target picture and the trimap, performing matting on the target picture with a set matting algorithm to obtain a matte image.
The memory and the processor listed in the above examples are part of the components of the terminal, and the terminal may further include other components. Taking a smart phone as an example, a possible structure of the terminal is described. Fig. 8 is a block diagram of a structure of a smart phone according to an embodiment of the present application. As shown in fig. 8, the smart phone may include: memory 801, a Central Processing Unit (CPU) 802 (also known as a processor, hereinafter CPU), a peripheral interface 803, a Radio Frequency (RF) circuit 805, an audio circuit 806, a speaker 811, a touch screen 812, a power management chip 808, an input/output (I/O) subsystem 809, other input/control devices 810, and an external port 804, which communicate via one or more communication buses or signal lines 807.
It should be understood that the illustrated smartphone 800 is merely one example of a terminal, and that the smartphone 800 may have more or fewer components than shown in the figures, may combine two or more components, or may have a different configuration of components. The various components shown in the figures may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.
The following describes in detail a smartphone integrated with a training apparatus for an image processing model according to this embodiment.
A memory 801, which can be accessed by the CPU 802, the peripheral interface 803 and so on. The memory 801 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. The memory 801 stores a computer program, and may also store a preset file, a preset white list, and the like.
A peripheral interface 803, said peripheral interface 803 allowing input and output peripherals of the device to be connected to the CPU802 and the memory 801.
I/O subsystem 809, which I/O subsystem 809 may connect input and output peripherals on the device, such as touch screen 812 and other input/control devices 810, to peripheral interface 803. The I/O subsystem 809 may include a display controller 8091 and one or more input controllers 8092 for controlling other input/control devices 810. Where one or more input controllers 8092 receive electrical signals from or transmit electrical signals to other input/control devices 810, other input/control devices 810 may include physical buttons (push buttons, rocker buttons, etc.), dials, slide switches, joysticks, click wheels. It is worth noting that the input controller 8092 may be connected to any of the following: a keyboard, an infrared port, a USB interface, and a pointing device such as a mouse.
A touch screen 812, which touch screen 812 is an input interface and an output interface between the user terminal and the user, displays visual output to the user, which may include graphics, text, icons, video, and the like.
The display controller 8091 in the I/O subsystem 809 receives electrical signals from the touch screen 812 or sends electrical signals to the touch screen 812. The touch screen 812 detects a contact on the touch screen, and the display controller 8091 converts the detected contact into an interaction with a user interface object displayed on the touch screen 812, that is, implements a human-computer interaction, and the user interface object displayed on the touch screen 812 may be an icon for running a game, an icon networked to a corresponding network, or the like. It is worth mentioning that the device may also comprise a light mouse, which is a touch sensitive surface that does not show visual output, or an extension of the touch sensitive surface formed by the touch screen.
The RF circuit 805 is mainly used to establish communication between the mobile phone and the wireless network (i.e., the network side), and implement data reception and transmission between the mobile phone and the wireless network. Such as sending and receiving short messages, e-mails, etc. In particular, the RF circuitry 805 receives and transmits RF signals, also referred to as electromagnetic signals, which the RF circuitry 805 converts to or from electrical signals, and communicates with communication networks and other devices over. RF circuitry 805 may include known circuitry for performing these functions including, but not limited to, an antenna system, an RF transceiver, one or more amplifiers, a tuner, one or more oscillators, a digital signal processor, a CODEC (CODEC) chipset, a Subscriber Identity Module (SIM), and so forth.
The audio circuit 806 is mainly used to receive audio data from the peripheral interface 803, convert the audio data into an electric signal, and transmit the electric signal to the speaker 811.
The speaker 811 is used to convert the voice signal received by the handset from the wireless network through the RF circuit 805 into sound and play the sound to the user.
The power management chip 808 supplies power to, and manages power for, the hardware connected to the CPU 802, the I/O subsystem, and the peripheral interface.
The terminal provided by this embodiment of the application can train a deep learning network based on original images and their corresponding trimaps, so that an input original image can be automatically labeled to obtain a trimap. Large amounts of hair-level data annotation by hand are no longer required, which reduces the labeling workload and improves labeling efficiency. In addition, labeling the original image with the image processing model avoids errors that manual labeling may introduce, improving the labeling quality and, in turn, the accuracy of matting performed on the basis of the trimap.
The training apparatus, the matting apparatus, the storage medium and the terminal provided in the above embodiments can execute the training method or the matting method of an image processing model provided in any embodiment of the present application, and have the corresponding functional modules and beneficial effects for executing those methods. For technical details not described in the above embodiments, reference may be made to the training method or the matting method of an image processing model provided in any embodiment of the present application.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present application and the technical principles employed. It will be understood by those skilled in the art that the present application is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the application. Therefore, although the present application has been described in more detail with reference to the above embodiments, the present application is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present application, and the scope of the present application is determined by the scope of the appended claims.

Claims (10)

1. A method for training an image processing model, comprising:
acquiring a trimap image of an original image;
generating a training sample set according to the original image and the trimap image;
training a preset deep learning network based on the training sample set to obtain an image processing model, wherein the image processing model is used for labeling an original image to obtain a trimap, the preset deep learning network is a deep learning model in which a post-processing layer is added after the output layer of a deep learning network based on image semantic segmentation, and the post-processing layer is used for performing segmented thresholding on the two-color (black and white) class map output by the deep learning network based on image semantic segmentation to obtain a trimap with three colors: black, white and gray.
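For illustration only (the following is not part of the claimed subject matter): a minimal sketch of how such a post-processing layer might be appended after the output layer of a semantic-segmentation backbone. PyTorch, the band limits lo and hi, and the (N, 1, H, W) probability layout are all assumptions of the sketch, not features recited in the claim.

```python
import torch
import torch.nn as nn

class TrimapPostProcess(nn.Module):
    """Thresholds the two-color (black/white) class map produced by a
    segmentation network into a three-color (black/gray/white) trimap.
    The band limits lo and hi are illustrative assumptions."""

    def __init__(self, lo: float = 0.2, hi: float = 0.8):
        super().__init__()
        self.lo, self.hi = lo, hi

    def forward(self, class_map: torch.Tensor) -> torch.Tensor:
        # class_map: (N, 1, H, W) foreground probabilities in [0, 1]
        trimap = torch.full_like(class_map, 0.5)  # gray: unknown region
        trimap[class_map <= self.lo] = 0.0        # black: background
        trimap[class_map >= self.hi] = 1.0        # white: foreground
        return trimap

# The preset deep learning network would then be, schematically:
# model = nn.Sequential(segmentation_backbone, TrimapPostProcess())
```

Since hard thresholding is not differentiable, a step of this kind would act as a fixed label-generation stage applied to the trained segmentation output rather than as a layer that itself receives gradients.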
2. The method of claim 1, wherein generating a training sample set from the original image and the trimap image comprises:
performing matting processing on the original image with a set matting algorithm, based on the original image and the trimap image, to obtain a matte image;
obtaining evaluation information of the matte image, and judging whether the trimap meets a set condition according to the evaluation information;
and if so, taking the original image and the trimap as a training sample, and generating the training sample set from such training samples.
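Purely as a sketch of the flow recited in this claim: the helper names matting_fn and score_fn are hypothetical placeholders, and the threshold value is arbitrary.

```python
def build_training_set(pairs, matting_fn, score_fn, threshold=90.0):
    """Assemble the training sample set of claim 2.

    pairs      -- iterable of (original_image, trimap) candidates
    matting_fn -- the set matting algorithm (not named by the patent)
    score_fn   -- evaluation of the resulting matte (cf. claim 3)
    """
    samples = []
    for original, trimap in pairs:
        matte = matting_fn(original, trimap)
        # Keep the pair only when the matte meets the set condition.
        if score_fn(matte, original) > threshold:
            samples.append((original, trimap))
    return samples
```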
3. The method as claimed in claim 2, wherein obtaining evaluation information of the matte image, and determining whether the trimap meets the set condition according to the evaluation information, comprises:
acquiring a target image in the original image;
determining the similarity between the matte image and the target image, and scoring the matte image according to the similarity;
and when the score of the matte image exceeds a set threshold, determining that the trimap meets the set condition.
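One possible reading of this scoring step: the claim requires only some similarity measure, so the choice below (mean absolute difference mapped to a 0-100 scale) is an assumption of the sketch.

```python
import numpy as np

def score_matte(matte: np.ndarray, target: np.ndarray) -> float:
    """Score a matte against the target region of the original image.

    Both inputs are uint8 images of the same shape; the score maps
    mean absolute difference to [0, 100], where 100 means identical.
    """
    diff = np.abs(matte.astype(np.float64) - target.astype(np.float64))
    return 100.0 * (1.0 - diff.mean() / 255.0)

def meets_set_condition(matte: np.ndarray, target: np.ndarray,
                        threshold: float = 90.0) -> bool:
    # The trimap that produced this matte is kept as a training label
    # only when the matte's score exceeds the set threshold.
    return score_matte(matte, target) > threshold
```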
4. The method according to claim 1, wherein performing segmented thresholding on the two-color (black and white) class map output by the deep learning network based on image semantic segmentation comprises:
acquiring a set piecewise function, wherein the set piecewise function defines the mapping between gray-value intervals and target pixel values;
and acquiring the two-color (black and white) class map output by the deep learning network based on image semantic segmentation, and adjusting the pixel values of the class map with the set piecewise function.
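One way to express the "set piecewise function" of this claim; the interval boundaries (85 and 170) are chosen arbitrarily for the sketch.

```python
import numpy as np

# Each (low, high, target) triple maps one gray-value interval to one
# target pixel value: black 0, gray 128, white 255.
SET_PIECEWISE = ((0, 85, 0), (85, 170, 128), (170, 256, 255))

def apply_piecewise(class_map: np.ndarray) -> np.ndarray:
    """Adjust the pixel values of a grayscale (uint8) class map so
    that only the three trimap colors remain."""
    out = np.empty_like(class_map)
    for low, high, target in SET_PIECEWISE:
        out[(class_map >= low) & (class_map < high)] = target
    return out
```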
5. A matting method, comprising:
acquiring a target picture to be matted;
labeling the target picture through an image processing model to obtain a trimap of the target picture, wherein the image processing model is a deep learning network trained on a training sample set consisting of original images and trimaps, the deep learning network is a deep learning model in which a post-processing layer is added after the output layer of a deep learning network based on image semantic segmentation, and the post-processing layer is used for performing segmented thresholding on the two-color (black and white) class map output by the deep learning network based on image semantic segmentation to obtain a trimap with three colors: black, white and gray;
and performing matting processing on the target picture with a set matting algorithm, based on the target picture and the trimap, to obtain a matte image.
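By way of example only: closed-form matting from the open-source pymatting library standing in for the unspecified "set matting algorithm". The library choice and the file names are assumptions of the sketch.

```python
from pymatting import load_image, estimate_alpha_cf

# Hypothetical input paths; in the claimed flow the trimap would come
# from the image processing model of the previous step.
image = load_image("target.png", "RGB")    # H x W x 3, floats in [0, 1]
trimap = load_image("trimap.png", "GRAY")  # H x W, floats in [0, 1]

# Closed-form matting solves for a per-pixel opacity (alpha) map,
# using the trimap's gray band as the unknown region.
alpha = estimate_alpha_cf(image, trimap)

# Composite the foreground over black to obtain the matte image.
matte = alpha[:, :, None] * image
```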
6. The method as claimed in claim 5, wherein after acquiring the target picture to be matted, the method further comprises:
judging whether the attribute information of the target picture matches the sample attribute information of the original images in the training sample set;
when they match, performing labeling processing on the target picture through the image processing model;
and when they do not match, adjusting the attribute information of the target picture according to the sample attribute information.
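A minimal sketch of this attribute check, taking resolution as the attribute purely for illustration (the claim speaks of "attribute information" in general).

```python
import cv2
import numpy as np

def prepare_target(target: np.ndarray,
                   sample_size: tuple = (512, 512)) -> np.ndarray:
    """Resize the target picture to the sample resolution when its
    attributes do not match those of the training samples."""
    h, w = target.shape[:2]
    if (w, h) != sample_size:
        # cv2.resize expects (width, height)
        target = cv2.resize(target, sample_size,
                            interpolation=cv2.INTER_AREA)
    return target
```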
7. An apparatus for training an image processing model, comprising:
the trimap image acquisition module is used for acquiring trimap images of the original images;
the sample generation module is used for generating a training sample set according to the original image and the trimap image;
and the model training module is used for training a preset deep learning network based on the training sample set to obtain an image processing model, wherein the image processing model is used for labeling an original image to obtain a trimap, the preset deep learning network is a deep learning model in which a post-processing layer is added after the output layer of a deep learning network based on image semantic segmentation, and the post-processing layer is used for performing segmented thresholding on the two-color (black and white) class map output by the deep learning network based on image semantic segmentation to obtain a trimap with three colors: black, white and gray.
8. A matting device, comprising:
the target picture acquisition module is used for acquiring a target picture to be matted;
the image labeling module is used for labeling the target picture through an image processing model to obtain a trimap of the target picture, wherein the image processing model is a deep learning network trained on a training sample set consisting of original images and trimaps, the deep learning network is a deep learning model in which a post-processing layer is added after the output layer of a deep learning network based on image semantic segmentation, and the post-processing layer is used for performing segmented thresholding on the two-color (black and white) class map output by the deep learning network based on image semantic segmentation to obtain a trimap with three colors: black, white and gray;
and the matting module is used for performing matting processing on the target picture with a set matting algorithm, based on the target picture and the trimap, to obtain a matte image.
9. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the training method of an image processing model according to any one of claims 1 to 4, or implements the matting method according to claim 5 or 6.
10. A terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the training method of an image processing model according to any one of claims 1 to 4, or implements the matting method according to claim 5 or 6.
CN201811302651.0A 2018-11-02 2018-11-02 Training method, matting method, device, medium and terminal of image processing model Active CN109461167B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811302651.0A CN109461167B (en) 2018-11-02 2018-11-02 Training method, matting method, device, medium and terminal of image processing model


Publications (2)

Publication Number Publication Date
CN109461167A CN109461167A (en) 2019-03-12
CN109461167B true CN109461167B (en) 2020-07-21

Family

ID=65609358

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811302651.0A Active CN109461167B (en) 2018-11-02 2018-11-02 Training method, matting method, device, medium and terminal of image processing model

Country Status (1)

Country Link
CN (1) CN109461167B (en)

Families Citing this family (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109409246B (en) * 2018-09-30 2020-11-27 中国地质大学(武汉) Sparse coding-based accelerated robust feature bimodal gesture intention understanding method
CN110188760B (en) * 2019-04-01 2021-10-22 上海卫莎网络科技有限公司 Image processing model training method, image processing method and electronic equipment
CN110197490B (en) * 2019-04-15 2021-02-26 广州像素数据技术股份有限公司 Automatic portrait matting method based on deep learning
WO2020211003A1 (en) * 2019-04-17 2020-10-22 深圳市欢太科技有限公司 Image processing method, computer readable storage medium, and computer device
CN110110723B (en) * 2019-05-07 2021-06-29 艾瑞迈迪科技石家庄有限公司 Method and device for automatically extracting target area in image
CN110456960B (en) 2019-05-09 2021-10-01 华为技术有限公司 Image processing method, device and equipment
CN110599515A (en) * 2019-08-14 2019-12-20 北京影谱科技股份有限公司 Automatic layering processing method, device and system for foreground object and storage medium
CN110751654B (en) * 2019-08-30 2022-06-28 稿定(厦门)科技有限公司 Image matting method, medium, equipment and device
CN110717060B (en) * 2019-09-04 2023-08-18 平安科技(深圳)有限公司 Image mask filtering method, device and storage medium
CN110751655B (en) * 2019-09-16 2021-04-20 南京工程学院 Automatic cutout method based on semantic segmentation and significance analysis
CN110930296B (en) * 2019-11-20 2023-08-08 Oppo广东移动通信有限公司 Image processing method, device, equipment and storage medium
CN113468350B (en) * 2020-03-31 2024-09-17 京东方科技集团股份有限公司 Image labeling method, device and system
CN111754521B (en) * 2020-06-17 2024-06-25 Oppo广东移动通信有限公司 Image processing method and device, electronic equipment and storage medium
CN111739035B (en) * 2020-06-30 2022-09-30 腾讯科技(深圳)有限公司 Image processing method, device and equipment based on artificial intelligence and storage medium
CN113706372B (en) * 2020-06-30 2024-07-05 稿定(厦门)科技有限公司 Automatic matting model building method and system
CN114205642B (en) * 2020-08-31 2024-04-26 北京金山云网络技术有限公司 Video image processing method and device
CN112734691B (en) * 2020-12-17 2023-06-16 郑州金惠计算机系统工程有限公司 Industrial product defect detection method and device, terminal equipment and storage medium
CN112541927A (en) * 2020-12-18 2021-03-23 Oppo广东移动通信有限公司 Method, device, equipment and storage medium for training and matting model
CN114792325A (en) * 2021-01-25 2022-07-26 清华大学 Image matting method and system
CN112990331A (en) * 2021-03-26 2021-06-18 共达地创新技术(深圳)有限公司 Image processing method, electronic device, and storage medium
CN115205307A (en) * 2021-04-09 2022-10-18 Oppo广东移动通信有限公司 Image processing method and device and electronic equipment
CN113034523A (en) * 2021-04-23 2021-06-25 腾讯科技(深圳)有限公司 Image processing method, image processing device, storage medium and computer equipment
CN113379786B (en) * 2021-06-30 2024-02-02 深圳万兴软件有限公司 Image matting method, device, computer equipment and storage medium
CN113344910B (en) * 2021-07-02 2022-12-23 深圳市燕麦科技股份有限公司 Defect labeling image generation method and device, computer equipment and storage medium
CN113592074B (en) * 2021-07-28 2023-12-12 北京世纪好未来教育科技有限公司 Training method, generating method and device and electronic equipment
CN113657402B (en) * 2021-10-18 2022-02-01 北京市商汤科技开发有限公司 Image matting processing method and device, electronic equipment and storage medium
CN114155260A (en) * 2021-12-08 2022-03-08 广州绿怡信息科技有限公司 Appearance image matting model training method and matting method for recycling detection
CN115439720B (en) * 2022-11-08 2023-01-24 成都数联云算科技有限公司 CAM image reconstruction method, training method, device, equipment and medium
CN116991089B (en) * 2023-09-28 2023-12-05 深圳市微琪思网络有限公司 Intelligent control method and system for electric iron based on wireless connection


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103473780A (en) * 2013-09-22 2013-12-25 广州市幸福网络技术有限公司 Portrait background cutout method
CN103955918A (en) * 2014-04-03 2014-07-30 吉林大学 Full-automatic fine image matting device and method
CN106204567A (en) * 2016-07-05 2016-12-07 华南理工大学 A kind of natural background video matting method
CN108460770A (en) * 2016-12-13 2018-08-28 华为技术有限公司 Scratch drawing method and device
CN108537859A (en) * 2017-03-02 2018-09-14 奥多比公司 Use the image masks of deep learning
CN108257144A (en) * 2018-01-25 2018-07-06 深圳市商汤科技有限公司 Stingy drawing method, device, equipment, storage medium and program based on neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A Survey of Interactive Foreground Matting Techniques; Shen Yang; Journal of Computer-Aided Design & Computer Graphics; 2014-04-30; Vol. 26, No. 4; full text *
Recent Advances in Digital Matting; Zhang Zhanpeng; Acta Automatica Sinica; 2012-10-31; Vol. 38, No. 10; full text *


Similar Documents

Publication Publication Date Title
CN109461167B (en) Training method, matting method, device, medium and terminal of image processing model
US10311574B2 (en) Object segmentation, including sky segmentation
US9697416B2 (en) Object detection using cascaded convolutional neural networks
US11455491B2 (en) Method and device for training image recognition model, and storage medium
US9928439B2 (en) Facilitating text identification and editing in images
CN111209970B (en) Video classification method, device, storage medium and server
WO2020244075A1 (en) Sign language recognition method and apparatus, and computer device and storage medium
CN112200062A (en) Target detection method and device based on neural network, machine readable medium and equipment
WO2022227218A1 (en) Drug name recognition method and apparatus, and computer device and storage medium
CN107368182B (en) Gesture detection network training, gesture detection and gesture control method and device
WO2020244074A1 (en) Expression interaction method and apparatus, computer device, and readable storage medium
CN104077597B (en) Image classification method and device
WO2019120025A1 (en) Photograph adjustment method and apparatus, storage medium and electronic device
CN112150347A (en) Image modification patterns learned from a limited set of modified images
WO2023197648A1 (en) Screenshot processing method and apparatus, electronic device, and computer readable medium
CN114332149A (en) Image segmentation method and device, electronic equipment and storage medium
CN111754414B (en) Image processing method and device for image processing
CN114360053A (en) Action recognition method, terminal and storage medium
CN107133361A (en) Gesture identification method, device and terminal device
CN111815748B (en) Animation processing method and device, storage medium and electronic equipment
CN115100492B (en) Yolov3 network training and PCB surface defect detection method and device
US20230041795A1 (en) Machine learning artificial intelligence system for producing 360 virtual representation of an object
CN114067334A (en) Handwriting track recognition method and device, electronic equipment and storage medium
CN113807407A (en) Target detection model training method, model performance detection method and device
CN114358102A (en) Data classification method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant