CN113420769A - Image mask recognition, matting and model training method and device and electronic equipment


Info

Publication number: CN113420769A
Application number: CN202011265865.2A
Authority: CN (China)
Other languages: Chinese (zh)
Prior art keywords: image, feature, pixel, diffusion, processing
Inventors: 陈汐, 赵志艳
Current and original assignee: Alibaba Group Holding Ltd
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/22: Matching criteria, e.g. proximity measures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention provides an image mask recognition, matting and model training method and apparatus, and an electronic device. The image mask recognition method includes: performing feature extraction on an image to be recognized and on a target image reference region selected by a user in the image to be recognized; determining the feature similarity between first feature data corresponding to a feature diffusion source and second feature data corresponding to a feature diffusion destination; superimposing the first feature data onto the second feature data, using the feature similarity as the weight; and performing image mask recognition on the feature data of the image to be recognized after the weighted superposition, to generate an image mask recognition result. By introducing a feature diffusion mechanism that takes the target image reference region selected by the user as the feature diffusion source and diffuses its features through the image to be recognized, the embodiment of the invention improves the accuracy of target object recognition in the image to be recognized and the efficiency of image segmentation processing.

Description

Image mask recognition, matting and model training method and device and electronic equipment
Technical Field
The present application relates to an image mask recognition, matting and model training method and apparatus, and to an electronic device, and belongs to the field of computer technology.
Background
Segmentation masks are widely used in image processing. For example, matting tools use image masking techniques to separate a specific object in a picture from its background.
In the prior art, however, labeling the segmentation masks of the different objects in an image usually requires extensive manual outlining or brushing, and in many cases the user must mark target image reference points a considerable number of times before obtaining a satisfactory segmentation mask, which reduces the efficiency of image segmentation and matting.
Disclosure of Invention
The embodiments of the invention provide an image mask recognition, matting and model training method and apparatus, and an electronic device, aiming to improve the efficiency of image mask recognition.
In order to achieve the above object, an embodiment of the present invention provides an image mask recognition method, including:
performing feature extraction on an image to be recognized and on a target image reference region selected by a user in the image to be recognized;
taking the target image reference region as a feature diffusion source and one or more regions other than the target image reference region as feature diffusion destinations, and determining the feature similarity between first feature data corresponding to the feature diffusion source and second feature data corresponding to the feature diffusion destinations;
superimposing the first feature data onto the second feature data, using the feature similarity as the weight;
and performing image mask recognition on the feature data of the image to be recognized after the weighted superposition, to generate an image mask recognition result.
An embodiment of the invention further provides a training method for an image segmentation model, including:
taking a plurality of training images as images to be recognized, acquiring a target image reference region selected by a user for a target object, and acquiring an image mask for the target object using the image mask recognition method described above;
and training, with the training images and the corresponding image masks as training data, a machine learning model for performing image segmentation processing that segments the target object from a specified image.
An embodiment of the invention further provides an image mask recognition apparatus, including:
a feature extraction module, configured to perform feature extraction on an image to be recognized and on a target image reference region selected by a user in the image to be recognized;
a feature similarity determination module, configured to take the target image reference region as a feature diffusion source and one or more regions other than the target image reference region as feature diffusion destinations, and determine the feature similarity between first feature data corresponding to the feature diffusion source and second feature data corresponding to the feature diffusion destinations;
a feature diffusion module, configured to superimpose the first feature data onto the second feature data, using the feature similarity as the weight;
and a recognition processing module, configured to perform image mask recognition on the feature data of the image to be recognized after the weighted superposition, and generate an image mask recognition result.
An embodiment of the invention further provides a training apparatus for an image segmentation model, including:
an image mask acquisition module, configured to take a plurality of training images as images to be recognized, acquire a target image reference region selected by a user for a target object, perform feature diffusion processing with the target image reference region as the feature diffusion source, perform image mask recognition on the images to be recognized after the feature diffusion processing, and acquire an image mask for the target object;
and a training processing module, configured to train, with the training images and the corresponding image masks as training data, a machine learning model for performing image segmentation processing that segments the target object from a specified image.
An embodiment of the present invention further provides an electronic device, including:
a memory for storing a program;
and a processor for running the program stored in the memory to perform the image mask recognition method described above.
An embodiment of the present invention further provides an electronic device, including:
a memory for storing a program;
and a processor for running the program stored in the memory to perform the training method of the image segmentation model described above.
An embodiment of the invention further provides an image matting method, including:
using an image segmentation model to perform matting processing on an input image for a specified target image;
in response to a correction operation by the user on the extracted target image, acquiring a target image reference region selected by the user;
and performing feature diffusion processing with the target image reference region as the feature diffusion source, performing image mask recognition on the image to be recognized after the feature diffusion processing, acquiring an image mask of the target object, and using the image mask to perform matting processing on the input image for the specified target image again, to generate a new target image.
An embodiment of the present invention further provides an electronic device, including:
a memory for storing a program;
a processor for executing the program stored in the memory to perform the aforementioned image matting method.
According to the image mask recognition, matting and model training method and apparatus and the electronic device of the embodiments of the invention, a feature diffusion mechanism is introduced: the target image reference region selected by the user serves as the feature diffusion source, and feature diffusion based on feature similarity is performed over the image to be recognized. This enhances the feature saliency of image points similar to the user-selected diffusion source, improves the accuracy with which the image recognition model recognizes the target object in the image to be recognized, reduces the number of times the user must select target image reference points, and improves the efficiency of image segmentation processing.
The foregoing is merely an overview of the technical solutions of the present invention. To make the technical means of the present invention clearer, and to make the above and other objects, features and advantages of the present invention easier to understand, embodiments of the present invention are described below.
Drawings
FIG. 1 is a schematic diagram of an exemplary configuration of an image mask recognition processing system according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating an image mask recognition method according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating a method for training an image segmentation model according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an apparatus for identifying an image mask according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an apparatus for training an image segmentation model according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of an application scenario of an image matting method according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
An embodiment of the invention provides an image mask recognition method that introduces a feature diffusion mechanism and a pixel diffusion mechanism: the foreground points and/or background points selected by the user serve as diffusion sources, from which information is diffused through the image to be recognized. This improves the accuracy with which the image recognition model recognizes the target object in the image to be recognized, allows the user to mark the image mask region corresponding to the target object with only a few foreground and/or background point selections, and improves both the efficiency and the accuracy of image segmentation processing.
Fig. 1 is a schematic diagram illustrating an exemplary structure of an image mask recognition processing system according to an embodiment of the present invention. The processing system can be deployed on a cloud server to provide users with cloud-platform-based image processing services; users interact with the cloud server through image processing tools installed on their terminal devices, such as various matting tools. The user can display the image to be recognized on the terminal device and select the target image reference region by clicking on it. Alternatively, the processing system can be deployed directly on the terminal device.
The target image reference region may be obtained by the user selecting a target image reference point; once the reference point is selected, a region within a preset range around it serves as the target image reference region. The target image reference points may include foreground points and/or background points selected by the user as references for determining the target object image; during image processing, the user-selected foreground and/or background points serve as the reference for subsequently judging whether other regions of the image are foreground or background, that is, the reference for determining the target object. A foreground point lies within the range of the target object that the user wants to extract, while a background point lies in the region outside that range. An image mask is a template for covering a specified target region in an image, and is generally used to separate the image within the specified target region from the image to be processed. In digital image processing, the image mask may take the form of a two-dimensional matrix array (e.g., a mask matrix) or of an image template used to filter a prepared region (a small illustrative example follows). Mask-based processing uses a selected image or graphic to occlude, entirely or partially, the image being processed, so as to control the region or course of image processing. In a typical application, the image mask is recognized in order to segment a designated object in an image; for example, to segment the image of a dog from a photograph that includes a background environment, the image region corresponding to the dog must be recognized and formed into an image mask. When segmenting a specific target, a point within the image region where the target is located is a foreground point, and a point outside that region is a background point. The user can successively select foreground points and/or background points to trigger the image recognition model to judge the image region where the target is located, so that an image mask is formed and image segmentation is then performed.
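To make the mask representation concrete, the following is a minimal sketch of a mask matrix and how it separates a target from an image; the array values and sizes are illustrative only and are not taken from the patent.

```python
import numpy as np

# A hypothetical 2-D mask matrix for a 4x6 image: 1 marks pixels that belong
# to the target object (foreground), 0 marks background pixels.
mask = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 1, 1, 0, 0],
], dtype=np.uint8)

image = np.random.rand(4, 6, 3)      # placeholder RGB image
cutout = image * mask[..., None]     # background zeroed out, target kept
```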
In theory, a user could drive the image recognition model to an accurate target image region through a great many selections of foreground and/or background points, but this would heavily consume the user's time and be extremely inefficient. The embodiment of the invention therefore introduces a diffusion mechanism based on the target image reference points selected by the user, to improve the efficiency of mask recognition.
As shown in fig. 1, after the user selects foreground points and/or background points, a Gaussian kernel processing module processes them to generate a Gaussian representation of the foreground and/or background points. Specifically, a pixel region within a predetermined range around the foreground and/or background points is taken as the target image reference region, a Gaussian kernel with preset amplitude and standard deviation (for example, amplitude 1 and standard deviation 10) is applied to this region to generate its corresponding Gaussian map, and the Gaussian map is then fed into a feature extraction module together with the image to be recognized. The main role of the Gaussian kernel is to denoise the reference point region so that accurate feature information for the local region can be obtained. In practice, two Gaussian maps can be generated from the foreground point image reference region and the background point image reference region, and concatenated with the three RGB channels of the image to be recognized to form a 5-channel image as the input of the feature extraction module (see the sketch below). The feature extraction module can be implemented with the feature extraction part of the DeepLab V3+ segmentation model. DeepLab V3+ is a semantic segmentation model that uses an encoder-decoder structure for inference: the encoder extracts abstract high-level semantic information, and the decoder recovers image detail information. In the encoder, DeepLab V3+ uses atrous (dilated) convolution to enlarge the receptive field while keeping the scale of the feature map unchanged, so that the model can draw on context from a larger range during inference and obtain a better high-level semantic representation. In the decoder, DeepLab V3+ fuses a low-level feature map with the high-level feature map to reconstruct the detail features of the image. The predictions of DeepLab V3+ therefore combine high-level semantic features with low-level detail features.
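The following is a minimal sketch of this input construction, assuming single-pixel clicks and the amplitude-1, standard-deviation-10 Gaussian kernel mentioned above; the placeholder image, the click coordinates and the use of the maximum over overlapping Gaussians are illustrative assumptions.

```python
import numpy as np

def gaussian_map(h, w, points, amplitude=1.0, sigma=10.0):
    """Render user click points as a Gaussian heat map (one channel)."""
    ys, xs = np.mgrid[0:h, 0:w]
    g = np.zeros((h, w), dtype=np.float32)
    for (py, px) in points:
        d2 = (ys - py) ** 2 + (xs - px) ** 2
        g = np.maximum(g, amplitude * np.exp(-d2 / (2 * sigma ** 2)))
    return g

# Hypothetical example: one foreground click and one background click on an
# RGB image; the two Gaussian maps are stacked with the three RGB channels
# to form the 5-channel input of the feature extraction module.
rgb = np.random.rand(256, 256, 3).astype(np.float32)   # placeholder image
fg_map = gaussian_map(256, 256, [(120, 128)])           # foreground clicks
bg_map = gaussian_map(256, 256, [(10, 20)])             # background clicks
five_channel = np.dstack([rgb, fg_map, bg_map])         # shape (256, 256, 5)
```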
The feature data produced by the feature extraction module is then input into a feature diffusion module, so that the features of the target image reference region are diffused into regions similar to it. Specifically, the feature diffusion process may include the following steps (a code sketch follows the list):
1) Constructing the feature diffusion source: the foreground point image reference region and/or the background point image reference region serve respectively as the foreground and/or background feature diffusion source; the features to be diffused are the feature vectors corresponding to the target image reference region.
2) Constructing the feature diffusion destinations: based on the foreground point image reference region and/or background point image reference region determined in 1), the image recognition model makes a preliminary foreground/background prediction for every pixel of the whole image, and each pixel is assigned as a foreground or background feature diffusion destination according to this preliminary prediction.
3) Computing the feature similarity: the feature similarity between the feature vector of the feature diffusion source and the feature vector of each feature diffusion destination is computed as a vector dot product.
4) Feature diffusion: weighting by the feature similarity computed in 3), that is, using the feature similarity as the weight, the feature vector of the feature diffusion source is weighted and superimposed onto the feature vector of each feature diffusion destination. Since 2) has already determined the foreground and background feature diffusion destinations, the feature vectors of the foreground and background diffusion sources are superimposed onto the feature vectors of their respective destinations, thereby realizing feature diffusion.
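A minimal sketch of steps 1) to 4), assuming a single diffusion source whose feature vector is the mean over the reference region, and unnormalized dot-product similarity; in the method described above, the foreground and background sources would each be diffused onto their own destination pixels.

```python
import numpy as np

def feature_diffusion(features, source_mask):
    """Diffuse the diffusion-source feature vector onto every pixel,
    weighted by dot-product similarity (steps 1-4 above).

    features:    (H, W, C) feature map from the extractor
    source_mask: (H, W) bool map of the target image reference region
    """
    h, w, c = features.shape
    flat = features.reshape(-1, c)                       # (H*W, C)
    source_vec = flat[source_mask.reshape(-1)].mean(0)   # diffusion-source feature
    similarity = flat @ source_vec                       # dot-product similarity
    # weighted superposition: add the source feature to each destination,
    # scaled by its similarity to the source
    diffused = flat + similarity[:, None] * source_vec[None, :]
    return diffused.reshape(h, w, c)
```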
After the feature diffusion processing, the feature data of the whole image to be recognized has been updated and adjusted. Based on the feature data after feature diffusion, the image recognition model predicts foreground and background points for every pixel of the image and generates an intermediate prediction result, which is subsequently used for pixel diffusion processing; the intermediate prediction result contains the predicted probability that each pixel is a foreground and/or background point. The image recognition model here can again adopt the DeepLab V3+ segmentation model described above.
After the intermediate prediction result is input into the pixel diffusion module, the module diffuses outward from the user-selected foreground points and/or background points to surrounding pixels based on the similarity of pixel data, in order to adjust the predicted foreground and/or background probabilities of the pixels reached by the diffusion. Specifically, the pixel diffusion process may include the following steps (a code sketch follows the list):
1) Constructing the pixel diffusion source: pixels in the foreground point image reference region and/or the background point image reference region are chosen as the foreground and/or background pixel diffusion sources; one or more pixels may be chosen, or several pixels may form a pixel region serving as the diffusion source. The content being diffused is the pixel data of these pixels, for example RGB-based color, brightness and saturation.
2) Constructing the pixel diffusion destinations: the pixels surrounding a pixel diffusion source serve as its diffusion destinations; that is, the diffusion source radiates outward to its neighboring pixels.
3) Computing the pixel similarity: the RGB channel difference between the pixel diffusion source and the pixel diffusion destination is computed as the pixel similarity; the smaller the difference, the higher the similarity.
4) Pixel diffusion: based on the pixel similarity from 3), the predicted foreground and/or background probability of the pixel diffusion source in the intermediate prediction result is weighted and superimposed onto that of the pixel diffusion destination, thereby adjusting the destination pixel's prediction. The weighting follows the magnitude of the pixel similarity: the higher the similarity, the more the diffusion source influences the destination's prediction during superposition.
5) Iteration: in practice, the pixel diffusion of 1) to 4) is repeated; after one round, the pixels that served as that round's diffusion destinations become the new diffusion sources, and diffusion continues outward as above, expanding the range of influence of the initial diffusion source. The number of iterations can be preset as needed; the more iterations, the wider the pixel diffusion range.
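A minimal sketch of steps 1) to 5), assuming diffusion to 4-neighbours, a pixel similarity derived from the summed RGB channel differences, and an illustrative scale factor for the weighted superposition; none of these specific choices is prescribed by the text.

```python
import numpy as np

def pixel_diffusion(rgb, fg_prob, seed_mask, rounds=3, scale=0.1):
    """Iteratively propagate the foreground probability of seed pixels to
    their 4-neighbours, weighted by RGB similarity.

    rgb:       (H, W, 3) image, values in [0, 1]
    fg_prob:   (H, W) intermediate foreground probability
    seed_mask: (H, W) bool map of the initial pixel diffusion source
    """
    prob = fg_prob.copy()
    src = seed_mask.copy()
    for _ in range(rounds):
        new_src = np.zeros_like(src)
        ys, xs = np.nonzero(src)
        for y, x in zip(ys, xs):
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < rgb.shape[0] and 0 <= nx < rgb.shape[1]:
                    diff = np.abs(rgb[y, x] - rgb[ny, nx]).sum()
                    sim = 1.0 - diff / 3.0     # smaller difference -> higher similarity
                    prob[ny, nx] += scale * sim * prob[y, x]
                    new_src[ny, nx] = True
        src = new_src                           # this round's destinations seed the next
    return np.clip(prob, 0.0, 1.0)
```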
Through the pixel diffusion processing, the pixel predictions within a certain range are adjusted; that is, the intermediate prediction result obtained after feature diffusion is fine-tuned, and the fine-tuning effect is most noticeable at the mask boundary.
It should be noted that the feature diffusion processing and the pixel diffusion processing are both, in essence, similarity-weighted accumulation. Their purpose is to propagate the information of the diffusion source to the diffusion destinations, so as to make fuller use of each target image reference point the user selects, improve the accuracy of target mask recognition, and thereby reduce the number of times the user needs to select target image reference points.
After the pixel diffusion module has adjusted the intermediate prediction result, the result is input into a threshold judgment module, which compares the predicted foreground and/or background probability of each pixel against a preset confidence threshold and outputs the final decision of whether each pixel is a foreground or a background point, thereby determining the mask region of the target object (a sketch follows). The confidence threshold can be set according to actual requirements. The confidence judgment module can be implemented with the Sigmoid function, a threshold function with value range (0, 1) that maps any real number into the interval (0, 1) and supports binary classification.
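A minimal sketch of this final step, assuming the module receives per-pixel logits and that 0.5 is the chosen confidence threshold (the threshold is configurable, as stated above).

```python
import numpy as np

def finalize_mask(logits, confidence=0.5):
    """Map per-pixel logits into (0, 1) with a sigmoid, then binarize
    against the confidence threshold to obtain the final mask region."""
    prob = 1.0 / (1.0 + np.exp(-logits))   # Sigmoid: R -> (0, 1)
    return prob >= confidence              # True = foreground point
```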
It should be noted that the intermediate prediction result may also be output directly as the final recognition result: even if only feature diffusion is used, without pixel diffusion, the technical solution of the embodiment of the present invention still reduces the number of times the user must select foreground and/or background points and improves the efficiency and accuracy of image segmentation processing compared with the prior art.
According to the image mask recognition method provided by the embodiment of the invention, a feature diffusion mechanism is introduced: the region containing the user-selected foreground and/or background points serves as the feature diffusion source, and features are diffused through the image to be recognized, improving the accuracy with which the image recognition model recognizes the target object. Pixel diffusion is then performed on the pixel data of the pixels in the selected foreground and/or background point regions, fine-tuning the prediction result and further improving the accuracy of the per-pixel prediction probabilities. With this scheme, the user can mark the image mask region corresponding to the target object with only a small number of foreground and/or background point selections, improving both the efficiency and the accuracy of image segmentation processing.
The image mask recognition method provided by the embodiment of the invention can be applied to generating training data for a machine learning model used in image segmentation processing. In some application scenarios, a user needs a matting tool for a certain type of target object, and automatic recognition in such a tool is generally achieved through machine learning. For example, a clothing design company may need to extract clothing images from a large number of pictures on the Internet to acquire clothing data that assists design and industry analysis. For such a need, a machine learning model for recognizing a target object in a picture, such as one based on a deep neural network, can be provided to the user, who trains it according to his own needs so that it fits his use. The basis for training the model well is the ability to construct a large amount of training data, and the image mask recognition method of the embodiment of the invention can be used to generate it: the user collects a large number of pictures as training images, and then, with only one or a few selections of foreground and/or background points of the target object, uses the method of the embodiment to acquire an accurate image mask of the target object. The machine learning model is then trained with the training images and corresponding image masks as training data; the trained model performs image segmentation processing that segments the target object from a specified image. For example, if the target object of the user's training data is clothing, the machine learning model can automatically recognize and segment the clothing in a given picture. The method of the embodiment can generate a large amount of training data quickly and accurately, reducing the cost of collecting training data.
In terms of product application, the image mask recognition method provided by the embodiment of the invention can be used for data annotation according to the user's needs: with only a small number of clicks on the image region of the target object and/or on non-target regions, the user obtains a high-quality image mask, completing the data annotation process efficiently and generating training data. In this way, a large amount of training data can be produced very conveniently and rapidly, so that the image segmentation model in a matting tool can be trained to meet the user's matting needs. In addition, the method of the embodiment can be used to correct image segmentation results, after which the image segmentation model is trained on the corrected image masks, improving the accuracy with which the model segments the target object. Overall, the image mask recognition method, image matting method and model training method of the embodiments of the invention realize a complete 'annotate data - train model - model prediction - correct results' pipeline: they reduce the cost of collecting model training data, make it convenient for the user to correct the model's predictions (that is, the matting results), and feed the corrected results back into training, further improving the accuracy of the image segmentation model and better meeting the user's matting needs.
In addition, while executing the image mask recognition method or the image matting processing of the embodiments of the present invention, candidate click positions or regions may be suggested based on a preliminary analysis of the image; for example, after each target object in the image is recognized, an indicator prompting the user's selection may be generated at the boundary of the object image. The image mask recognition method or the image matting processing of the embodiments may also be embedded in a picture editing and production workflow, that is, serve as part of a picture editing tool. For example, when producing poster publicity material, some objects may need to be removed from an original picture, or extracted from photographic works for later compositing; such applications can all use the method provided by the embodiments of the invention.
The technical solution of the present invention is further illustrated by some specific examples.
Example one
As shown in fig. 2, which is a schematic flow chart of an image mask recognition method according to an embodiment of the present invention, the method may be applied on a cloud service platform, with the user interacting with the platform through a terminal device so as to realize interactive recognition of the image mask. Alternatively, the method may run on a terminal device, so that the interactive recognition of the image mask is completed locally. Specifically, the method comprises the following steps:
S101: performing feature extraction on an image to be recognized and on a target image reference region selected by a user in the image to be recognized. The target image reference points are points selected by the user as feature references and may include foreground points and/or background points; during image processing, they serve as the reference for subsequently judging whether other regions of the image are foreground or background. A foreground point lies within the range of the target that the user wants to extract, and a background point lies in the region outside that range. In actual processing, a region covering a specified range around the target image reference point is generally selected as the feature reference; this region is the so-called target image reference region, whose size can be set as needed.
The feature extraction of the image to be recognized and of the target image reference region can be realized with a neural network model for image recognition processing, for example the feature extraction part of the DeepLab V3+ segmentation model. The target image reference region can be denoised with a Gaussian kernel function so that more accurate feature information is obtained to serve as the subsequent feature diffusion source. Specifically, the feature extraction of the target image reference region selected by the user may include: obtaining the foreground points and/or background points selected by the user; and taking a pixel region of a preset range containing the foreground points and/or background points as the target image reference region and extracting its features. The feature extraction may specifically be: processing the region with a Gaussian kernel function to generate a Gaussian map of the target image reference region, and performing feature extraction on the Gaussian map. In practical application, two Gaussian maps can be generated from the foreground point image reference region and the background point image reference region corresponding to the user-selected foreground and background points, and concatenated with the three RGB channels of the image to be recognized to form a 5-channel image as the input of the feature extraction processing.
S102: taking the target image reference region as a feature diffusion source and one or more regions other than the target image reference region as feature diffusion destinations, and determining the feature similarity between first feature data corresponding to the feature diffusion source and second feature data corresponding to the feature diffusion destinations. The first and second feature data may specifically be the feature vector of the feature diffusion source and the feature vector of a feature diffusion destination, and the feature similarity between the two can be computed as a vector dot product.
Further, as described above, the user may select foreground points and/or background points, so the target image reference region may comprise a foreground point image reference region and/or a background point image reference region. Accordingly, determining the feature diffusion destinations may include: predicting foreground and/or background points from the feature data of the foreground point image reference region or the background point image reference region together with the feature data of the image to be recognized, generating a preliminary prediction result; and determining the feature diffusion destinations corresponding to the foreground point image reference region and/or the background point image reference region according to the preliminary prediction result. The preliminary prediction referred to here is a foreground/background prediction on the image to be recognized before feature diffusion; it roughly determines the probability that each point of the image is a foreground and/or background point, so that diffusion can be targeted. That is, a part whose predicted probability leans toward foreground (for example, a foreground probability greater than 0.5) is taken as a feature diffusion destination corresponding to the foreground point image reference region, and likewise a part whose predicted probability leans toward background is taken as a feature diffusion destination corresponding to the background point image reference region (see the sketch after step S104). This makes the feature diffusion processing more effective.
S103: superimposing the first feature data onto the second feature data, using the feature similarity as the weight. Specifically, the feature vector of the feature diffusion source may be weighted by the feature similarity and superimposed onto the feature vector of the feature diffusion destination. Through this similarity-weighted superposition of feature vectors, the features of the user-selected foreground and/or background points are diffused over the whole image to be recognized, so that each of the user's target image reference point selections becomes more effective.
S104: performing image mask recognition on the feature data of the image to be recognized after the weighted superposition, to generate an image mask recognition result. The image mask recognition may be implemented with the aforementioned neural network model for image recognition processing, for example the DeepLab V3+ segmentation model.
After the feature diffusion processing, the feature data of the whole image to be recognized has been updated and adjusted. Based on this feature data, the image recognition model predicts foreground and background points for every pixel of the image and generates a per-pixel prediction result containing the predicted probability that each pixel is a foreground and/or background point. By comparing each pixel's predicted foreground and/or background probability with a preset confidence threshold, the recognition result of each pixel in the image to be recognized can be determined, and the mask region of the target object is thereby determined. This process may be implemented by the Sigmoid function described previously.
Through the feature diffusion processing, the similarity between the features of the user-selected target image reference region and the features of other regions is fully exploited to adjust the feature data of the whole image to be recognized before image mask recognition is performed.
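As referenced in step S102 above, below is a minimal sketch of the diffusion-destination assignment, assuming the preliminary prediction is available as a per-pixel foreground probability map and using the 0.5 split given in the example above.

```python
import numpy as np

def assign_diffusion_destinations(fg_prob, threshold=0.5):
    """Split pixels into feature diffusion destinations based on the
    preliminary per-pixel foreground probability map."""
    fg_dest = fg_prob > threshold   # destinations of the foreground diffusion source
    bg_dest = ~fg_dest              # destinations of the background diffusion source
    return fg_dest, bg_dest
```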
To further improve the accuracy of mask recognition, a prediction result can be obtained from the feature data of the image to be recognized after feature diffusion, followed by pixel diffusion processing. That is, step S104 may further include:
S1041: predicting foreground and/or background points from the feature data of the image to be recognized after the weighted superposition, generating an intermediate prediction result. In other words, the foreground/background predictions described above are not output directly but serve as an intermediate prediction result for the subsequent pixel diffusion processing.
S1042: taking pixels in the target image reference region as pixel diffusion sources and the pixels around them as pixel diffusion destinations, and performing weighted superposition on the intermediate prediction result, using the pixel similarity between source and destination as the weight. This step may comprise the following iterative process:
S10421: determining the pixel similarity between a pixel diffusion source and a pixel diffusion destination based on the pixel data of the pixels. Specifically, the RGB channel difference between the pixel diffusion source and the pixel diffusion destination may be computed as the pixel similarity.
S10422: superimposing the prediction result of the pixel diffusion source onto the prediction result of the pixel diffusion destination, weighted by the pixel similarity.
S10423: taking the current pixel diffusion destinations as new pixel diffusion sources and the pixels around them as new pixel diffusion destinations, and repeating the pixel similarity computation and the weighted superposition of prediction results, that is, repeating S10421 and S10422, until a preset number of rounds is reached. The more rounds are executed, the larger the pixel diffusion range and the stronger the diffusion effect; in practical application, a balance can be struck between processing efficiency and accuracy by choosing a reasonable number of iteration rounds.
S1043: generating the image mask recognition result from the intermediate prediction result after the weighted superposition. After pixel diffusion, the predicted probabilities of many pixels have been adjusted; finally, the recognition result of each pixel in the image to be recognized is determined from its predicted foreground and/or background probability in the adjusted intermediate prediction result and the preset confidence threshold. This part of the processing may likewise be implemented by the aforementioned Sigmoid function.
The above describes the image mask recognition processing performed after the user selects target image reference points once; a single selection may include several foreground and/or background points, and the user can trigger the recognition processing of the embodiment of the invention via a confirmation button. Image mask recognition essentially predicts, from the user-selected foreground and/or background reference points, whether each pixel of the whole image is a foreground or background point, and then determines the extent of the image mask from the prediction result.
In practical applications, several rounds of selection may be needed, that is, the method of the embodiment of the invention is executed several times, to determine the exact extent of the image mask. Accordingly, the method may further include: in response to the user's operation of selecting a new target image reference region in the image to be recognized, taking the newly selected target image reference region as the feature diffusion source and one or more regions other than it as the feature diffusion destinations, computing the feature similarity between the feature vector of the feature diffusion source and the feature vectors of the feature diffusion destinations, and performing the subsequent processing.
According to the image mask recognition method of the embodiment of the invention, a feature diffusion mechanism is introduced: the region containing the user-selected foreground and/or background points serves as the feature diffusion source, and features are diffused through the image to be recognized, improving the accuracy with which the image recognition model recognizes the target object. Pixel diffusion is then performed on the pixel data corresponding to the user-selected foreground and/or background points, fine-tuning the prediction result and further improving the accuracy of the per-pixel prediction probabilities. With this scheme, the user can mark the image mask region corresponding to the target object with only a small number of foreground and/or background point selections, improving both the efficiency and the accuracy of image segmentation processing.
Example two
As shown in fig. 3, which is a schematic flow chart of a training method for an image segmentation model according to an embodiment of the present invention, the method may be applied on a cloud service platform, with the user interacting with the platform through a terminal device, so as to realize training data acquisition and training of the image segmentation model. Alternatively, the method may run on the terminal device, so that the processing is completed locally. Specifically, the method comprises the following steps:
S201: taking a plurality of training images as images to be recognized, acquiring a target image reference region selected by a user for a target object, performing feature diffusion processing with the target image reference region as the feature diffusion source, performing image mask recognition on the images to be recognized after the feature diffusion processing, and acquiring an image mask for the target object. In the specific feature-diffusion-based mask recognition, the image mask recognition method of the foregoing embodiment may be used to obtain the image mask of the target object. The target object here may represent a class of objects, for example clothes, animals, or a more specific category such as dogs; the choice of target object also determines the specific application of the machine learning model obtained after training.
S202: training, with the training images and the corresponding image masks as training data, a machine learning model for performing image segmentation processing that segments the target object from a specified image. The machine learning model may, for example, be based on a deep neural network, and the user can train it according to his own needs so that it fits his use (a training sketch is given at the end of this example). The trained machine learning model can be embedded in the user's matting tool or web tool, providing personalized automatic image segmentation processing.
Further, since the trained machine learning model may still recognize some images inaccurately, the recognition result can be corrected using the image mask recognition method provided by the embodiment of the invention. Specifically, the method of this embodiment may further include:
S203: performing image segmentation processing with the machine learning model, generating a segmented result image. The result image can be shown to the user on the terminal device, and the user can perform a subsequent correction operation if the target object was not segmented accurately.
S204: in response to the user's correction operation on the result image, acquiring a target image reference region selected by the user, taking the result image as the image to be recognized, and correcting it using the image mask recognition method of the embodiment of the invention. Just as a user may need several target image reference point selections to acquire an accurate image mask, as described in the foregoing embodiment, the user corrects the segmented result image by selecting foreground and/or background points as target image reference points. The corrected result image can be presented to the user again on the terminal device, and if inaccuracies remain, the user can repeat the correction.
The corrected result image can be used as training data to train the machine learning model through online feedback, updating the model parameters and further improving accuracy. Specifically, the following step S205 may be performed.
S205: acquiring the image mask obtained by correcting the result image, and training the machine learning model with the corrected image mask and the corresponding original image as training data. The original image here is the image that was input to the machine learning model for image segmentation processing.
Through this training method for the image segmentation model, a large amount of accurate training data can be formed efficiently; moreover, processing results can be corrected while the model is in use and fed back into training, so that the accuracy of image segmentation processing keeps improving.
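To make step S202 concrete, the following is a minimal PyTorch-style training sketch under the assumption that the (image, mask) pairs come from the interactive mask recognition above; the loss and optimizer choices are illustrative, not prescribed by the text, and the model is any segmentation network supplied by the user.

```python
import torch
import torch.nn as nn

def train_segmentation_model(model, loader, epochs=10, lr=1e-4):
    """Train a segmentation model on (image, mask) pairs, where the masks
    were produced by the image mask recognition method."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()       # per-pixel foreground/background loss
    for _ in range(epochs):
        for images, masks in loader:       # masks: float, same shape as logits
            opt.zero_grad()
            logits = model(images)         # (N, 1, H, W) predicted mask logits
            loss = loss_fn(logits, masks)
            loss.backward()
            opt.step()
    return model
```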
EXAMPLE III
As shown in fig. 4, which is a schematic structural diagram of an image mask recognition apparatus according to an embodiment of the present invention, the apparatus may be applied on a cloud service platform, with the user interacting with the platform through a terminal device, so as to realize interactive recognition of the image mask. Alternatively, the apparatus may run on the terminal device, so that the interactive recognition of the image mask is completed locally. Specifically, the apparatus includes:
a feature extraction module 11, configured to perform feature extraction on an image to be recognized and on a target image reference region selected by the user in the image to be recognized. The target image reference points are points selected by the user as feature references and may include foreground points and/or background points. A foreground point lies within the range of the target that the user wants to extract, and a background point lies in the region outside that range. In actual processing, a region covering a specified range around the target image reference point is generally selected as the feature reference; this region is the so-called target image reference region, whose size can be set as needed.
The feature extraction of the image to be recognized and of the target image reference region can be realized with a neural network model for image recognition processing, for example the feature extraction part of the DeepLab V3+ segmentation model. The target image reference region can be denoised with a Gaussian kernel function so that more accurate feature information is obtained to serve as the subsequent feature diffusion source. Specifically, the feature extraction of the target image reference region selected by the user may include: obtaining the foreground points and/or background points selected by the user; and taking a pixel region of a preset range containing the foreground points and/or background points as the target image reference region and extracting its features. The feature extraction may specifically be: processing the region with a Gaussian kernel function to generate a Gaussian map of the target image reference region, and performing feature extraction on the Gaussian map.
The feature similarity determining module 12 is configured to determine feature similarity between first feature data corresponding to a feature diffusion source and second feature data corresponding to a feature diffusion destination, where a target image reference region is used as the feature diffusion source and one or more regions other than the target image reference region are used as the feature diffusion destinations. Specifically, the first feature data and the second feature data may specifically be a feature vector corresponding to a feature diffusion source and a feature vector corresponding to a feature diffusion destination, and specifically, the feature similarity between the feature vector of the feature diffusion source and the feature vector of the feature diffusion destination may be calculated through processing of a vector dot product.
Further, as described above, the user may select foreground points and/or background points, and accordingly, image reference regions including foreground point image reference regions and/or background point image reference regions may be formed. Accordingly, the process of determining a feature diffusion destination may include: predicting foreground points and/or background points according to the characteristic data of the foreground point image reference area or the background point image reference area and the characteristic data of the image to be recognized to generate a preliminary prediction result; and determining a characteristic diffusion destination corresponding to the foreground point image reference region and/or the background point image reference region according to the preliminary prediction result.
The feature diffusion module 13 is configured to superpose the first feature data and the second feature data with weights identified by the feature similarity. Specifically, the feature vector of the feature diffusion source may be weighted by the feature similarity and added onto the feature vector of each feature diffusion destination. Through this similarity-weighted superposition of feature vectors, the features of the foreground points and/or background points selected by the user are diffused across the whole image to be recognized, so that each reference-point selection by the user takes broader effect.
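Continuing the sketch above, the weighted superposition itself could then be a single broadcast operation; again this is one illustrative reading of the weighting step, not the patented implementation.

    def diffuse_features(source_feat, dest_feats, weights):
        # Weighted superposition: add the diffusion-source feature onto every
        # diffusion-destination feature in proportion to its similarity weight.
        return dest_feats + weights[:, None] * source_feat[None, :]

    # Usage with the earlier sketch (shapes are hypothetical):
    # updated = diffuse_features(src, dst, diffusion_weights(src, dst))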
The recognition processing module 14 is configured to perform image mask recognition according to the feature data of the image to be recognized after the weighted superposition processing, and to generate the recognition result of the image mask. The image mask recognition may likewise be implemented with a neural network model for image recognition processing, for example a DeepLab V3+ segmentation model.
After the feature diffusion processing, the feature data of the whole image to be recognized has been updated and adjusted. Based on this diffused feature data, the image recognition model predicts foreground points and background points for every pixel of the image, generating a per-pixel prediction result: the prediction probability that each pixel is a foreground point and/or a background point. By comparing each pixel's prediction probability with a preset confidence threshold, the recognition result for that pixel can be determined and the mask region of the target object obtained. This step may be implemented with the Sigmoid function described previously.
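For instance, the threshold comparison might look as follows; the 0.5 default and the helper name mask_from_probs are assumptions for illustration.

    import numpy as np

    def mask_from_probs(logits, threshold=0.5):
        # Sigmoid turns per-pixel logits into foreground probabilities, which
        # are then compared against the preset confidence threshold.
        probs = 1.0 / (1.0 + np.exp(-logits))
        return probs >= threshold   # boolean mask region of the target object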
To further improve the accuracy of mask recognition, the prediction result obtained from the feature data after feature diffusion can additionally be refined by pixel diffusion processing. That is, the processing of the recognition processing module 14 may further include:
taking pixel points in the target image reference region as pixel diffusion sources and the pixel points around them as pixel diffusion destinations, performing weighted superposition of the intermediate prediction results with weights identified by the pixel similarity between the pixel diffusion source and the pixel diffusion destination, and generating the recognition result of the image mask from the intermediate prediction result after the weighted superposition processing.
The weighted superposition of intermediate prediction results, with weights identified by the pixel similarity between pixel diffusion source and pixel diffusion destination, may be the following iterative process. Pixel points in the target image reference region are taken as pixel diffusion sources and the pixel points around them as pixel diffusion destinations, and the pixel similarity between source and destination is calculated from the pixel data, for example as the data difference of the RGB channels between the pixel diffusion source and the pixel diffusion destination. The prediction result of each pixel diffusion source is then weighted by this pixel similarity and superposed onto the prediction result of the pixel diffusion destination. Next, each current pixel diffusion destination becomes a new pixel diffusion source, the pixel points around it become new pixel diffusion destinations, and the similarity calculation and weighted superposition are repeated until a preset number of rounds is reached. After the iteration finishes, the recognition result of the image mask is generated from the intermediate prediction result after the weighted superposition processing.
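A minimal sketch of this iteration is given below, assuming 4-connected neighbourhoods, an exponential colour-difference weight and a fixed round count; all three are illustrative choices, since the embodiment fixes only the RGB difference as the similarity measure and a preset number of turns. Note that np.roll wraps at image borders, which a production version would handle with padding.

    import numpy as np

    def pixel_diffusion(image, probs, seed_mask, rounds=5, bandwidth=10.0):
        # image: (H, W, 3) RGB data; probs: (H, W) intermediate foreground
        # probabilities; seed_mask: (H, W) bool, True inside the target
        # image reference region.
        probs = probs.astype(np.float32).copy()
        rgb = image.astype(np.float32)
        frontier = seed_mask.astype(bool).copy()   # current diffusion sources
        for _ in range(rounds):
            reached = np.zeros_like(frontier)
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                src = np.roll(frontier, (dy, dx), axis=(0, 1))
                src_rgb = np.roll(rgb, (dy, dx), axis=(0, 1))
                src_prob = np.roll(probs, (dy, dx), axis=(0, 1))
                # RGB channel difference as pixel similarity: similar colours
                # give weights near 1, dissimilar colours near 0.
                diff = np.linalg.norm(rgb - src_rgb, axis=-1)
                w = np.exp(-diff / bandwidth) * src
                # Weighted superposition of the source prediction onto the
                # destination prediction.
                probs = (1.0 - w) * probs + w * src_prob
                reached |= src
            frontier = reached   # destinations become the new sources
        return probs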
After the pixel diffusion processing, the prediction probabilities of many pixel points have been adjusted. Finally, the recognition result for each pixel in the image to be recognized is determined from the probability that it is a foreground point and/or background point in the intermediate prediction result after the weighted superposition processing, compared against the preset confidence threshold.
The image mask recognition apparatus of this embodiment of the invention thus introduces a feature diffusion mechanism: the region containing the foreground points and/or background points selected by the user serves as a feature diffusion source, and feature diffusion is performed within the image to be recognized, improving the accuracy with which the image recognition model recognizes the target object. Pixel diffusion is then performed on the pixel data corresponding to the selected foreground points and/or background points, fine-tuning the prediction and further improving the accuracy of the per-pixel prediction probabilities. With this scheme, the user can mark the image mask region of the target object with only a small number of foreground-point and/or background-point selections, improving both the efficiency and the accuracy of image segmentation processing.
Example four
Fig. 5 is a schematic structural diagram of a training apparatus for an image segmentation model according to an embodiment of the present invention. The apparatus may be deployed on a cloud service platform, with the user interacting with the platform through a terminal device to acquire training data and carry out training of the image segmentation model; alternatively, it may run on the terminal device so that the processing is completed locally. Specifically, the apparatus includes:
The image mask acquiring module 21 is configured to take a plurality of training images as images to be recognized, acquire the target image reference regions selected by the user for a target object, perform feature diffusion processing with each target image reference region as a feature diffusion source, and perform image mask recognition on the images after the feature diffusion processing, thereby acquiring image masks for the target object. In the specific feature-diffusion-based mask recognition, the image mask recognition apparatus of the foregoing embodiment can be used to acquire the image mask for the target object. The target object here may denote a class of objects, for example clothes or animals, or a more specific kind such as dogs. The choice of target object also determines the specific application of the machine learning model obtained after training.
The training processing module 22 is configured to train, using the training images and the corresponding image masks as training data, a machine learning model for performing image segmentation processing that segments the target object from a specified image. The machine learning model may, for example, be based on a deep neural network, and the user can train it according to their own needs so that it fits their use. The trained model can then be embedded in the user's matting tool or web tool to provide personalized automatic image segmentation.
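As a rough illustration of this training step, the loop below trains a generic segmentation network on (training image, image mask) pairs; the optimizer, loss, batch size and epoch count are all assumed choices, since this embodiment leaves the concrete model and schedule open.

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    def train_segmentation(model, images, masks, epochs=10, lr=1e-4):
        # images: (N, 3, H, W) float tensor; masks: (N, 1, H, W) in {0, 1},
        # produced by the feature-diffusion mask recognition described above.
        loader = DataLoader(TensorDataset(images, masks),
                            batch_size=8, shuffle=True)
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        criterion = torch.nn.BCEWithLogitsLoss()  # per-pixel fg/bg loss
        model.train()
        for _ in range(epochs):
            for x, y in loader:
                optimizer.zero_grad()
                loss = criterion(model(x), y)
                loss.backward()
                optimizer.step()
        return model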
Further, the trained machine learning model may still produce inaccurate recognitions. In that case the recognition result can be corrected with the image mask recognition apparatus provided by the embodiment of the present invention, and the corrected result image can be fed back online as training data to update the parameters of the machine learning model, further improving its accuracy. The detailed processing has been described in the foregoing embodiments.
Therefore, with this training scheme for the image segmentation model, a large amount of accurate training data can be formed efficiently; moreover, processing results can be corrected while the model is in use and used for update training, so the accuracy of the image segmentation processing improves continuously.
Example six
Fig. 6 is a schematic view of an application scenario of the image matting method according to an embodiment of the present invention. The method may be executed on a cloud service platform, with the user interacting with the platform through a terminal device to carry out the matting processing; it may also run on a terminal device, embedded in a matting tool, to help the user matte a target object. Specifically, the method includes the following steps:
S301: perform matting processing on an input image for a specified target image using an image segmentation model. In some application scenarios, the cloud service platform can provide the user with a matting tool based on a machine learning model, in which an image segmentation model trained to the user's requirements is embedded to matte the target image the user specifies. For example, a clothing design company may need to matte clothing images from a large number of pictures on the internet; the cloud service platform can therefore provide that user with an image segmentation model dedicated to clothing matting.
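In code, this step might amount to compositing the predicted mask with the input image as an alpha channel; the hard 0.5 threshold and the helper name matte are assumptions, and a soft alpha taken directly from the probabilities would fit the described flow equally well.

    import numpy as np

    def matte(image, probs, threshold=0.5):
        # Pixels whose predicted foreground probability clears the confidence
        # threshold keep their colour; all others become fully transparent.
        alpha = (probs >= threshold).astype(np.uint8) * 255
        return np.dstack([image, alpha])   # (H, W, 4) RGBA cut-out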
As shown in fig. 6, the matting tool includes an already trained image segmentation model together with the image mask recognition apparatus described above, which generates an image mask through interaction with the user. The box at the upper left represents the input image to be matted; it is processed by the image segmentation model, and the pre-correction image at the lower left is output. In the figure, the four-pointed star pattern represents the extracted target image, which may still have some defects and need further correction by the user.
S302: in response to the user's correction operation on the extracted target image, acquire the target image reference region selected by the user. In some cases the image segmentation model deviates in the matting details; after seeing the matting result, the user can select some foreground points and/or background points through the click interaction described earlier, thereby marking a target image reference region for correcting the image.
With reference to fig. 6, on the pre-correction image obtained in the previous step, the user selects foreground points (solid dots) and background points (hollow dots) near the edge of the target image to be corrected (the dotted portion in the figure) via the interactive interface provided by the matting tool, triggering the image mask recognition apparatus of the embodiment to generate an image mask.
S303: acquire the image mask of the target object using the image mask recognition method of the foregoing embodiments, and use it to matte the input image again for the specified target image, generating a new target image. Based on the previous step, an image mask is regenerated from the target image reference region marked by the user, and matting is performed with this new mask to form the corrected target image at the lower right; as the figure shows, its edge has changed. If the user is still unsatisfied with the new target image, step S302 can be repeated to keep correcting it until a satisfactory target image is extracted.
As fig. 6 also shows, if the user remains unsatisfied with the new matted image, they can return to the interactive interface of the matting tool, reselect foreground points and/or background points to trigger generation of a new image mask, and run the matting processing again until satisfied.
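The overall correct-and-redo loop of steps S301 to S303 could be driven as sketched below, reusing the matte helper above; segment, get_user_points and recognize_mask are hypothetical stand-ins for the segmentation model, the tool's click interface and the mask recognition flow.

    def interactive_matting(image, segment, get_user_points, recognize_mask):
        probs = segment(image)                             # S301: initial matting
        while True:
            points = get_user_points(matte(image, probs))  # S302: user correction
            if points is None:                             # user accepts result
                return matte(image, probs)
            probs = recognize_mask(image, points)          # S303: new mask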
In addition, once a target image the user is satisfied with has been generated, the corrected and user-confirmed target image can be combined with the corresponding original image as training data to train the image segmentation model, improving its accuracy at matting the specified target image and better fitting the user's needs.
Example seven
The foregoing embodiments describe the flow and apparatus structure of the image mask recognition processing, the image matting method, and the training processing of the image segmentation model. The functions of these methods and apparatuses can be implemented by an electronic device, whose structural schematic is shown in fig. 7; it specifically includes a memory 110 and a processor 120.
The memory 110 is used for storing a program.
In addition to the programs described above, the memory 110 may also be configured to store other various data to support operations on the electronic device. Examples of such data include instructions for any application or method operating on the electronic device, contact data, phonebook data, messages, pictures, videos, and so forth.
The memory 110 may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The processor 120, coupled to the memory 110, executes the program stored in the memory 110 to perform the operation steps of the image mask recognition processing method and/or the image matting method and/or the training processing method of the image segmentation model described in the foregoing embodiments.
Furthermore, the processor 120 may also include various modules described in the foregoing embodiments to perform the recognition process of the image mask and/or the image matting method and/or the training process of the image segmentation model, and the memory 110 may be used, for example, to store data required by these modules to perform the operations and/or the output data.
The detailed description of the above processing procedure, the detailed description of the technical principle, and the detailed analysis of the technical effect are described in the foregoing embodiments, and are not repeated herein.
In addition, as shown, the electronic device may further include: communication components 130, power components 140, audio components 150, a display 160, and other components. Only some components are shown schematically in the figure; this does not mean that the electronic device includes only those components.
The communication component 130 is configured to facilitate wired or wireless communication between the electronic device and other devices. The electronic device may access a wireless network based on a communication standard, such as WiFi, a mobile communication network, such as 2G, 3G, 4G/LTE, 5G, or a combination thereof. In an exemplary embodiment, the communication component 130 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 130 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
The power supply component 140 provides power to the various components of the electronic device. The power components 140 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for an electronic device.
The audio component 150 is configured to output and/or input audio signals. For example, the audio component 150 includes a Microphone (MIC) configured to receive external audio signals when the electronic device is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 110 or transmitted via the communication component 130. In some embodiments, audio assembly 150 also includes a speaker for outputting audio signals.
The display 160 includes a screen, which may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The aforementioned program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (21)

1. A method of identifying an image mask, comprising:
performing feature extraction on an image to be recognized and a target image reference region selected by a user in the image to be recognized;
taking the target image reference region as a feature diffusion source and one or more regions other than the target image reference region as feature diffusion destinations, and determining feature similarity between first feature data corresponding to the feature diffusion source and second feature data corresponding to the feature diffusion destinations;
superposing the first feature data and the second feature data with weights identified by the feature similarity;
and performing image mask recognition according to the feature data corresponding to the image to be recognized after the weighted superposition processing, to generate a recognition result of the image mask.
2. The method according to claim 1, wherein performing image mask recognition according to the feature data of the image to be recognized after the weighted superposition processing and generating the recognition result of the image mask comprises:
predicting foreground points and/or background points according to the characteristic data of the image to be recognized after the weighted superposition processing to generate an intermediate prediction result;
taking pixel points in the reference region of the target image as pixel diffusion sources, taking pixel points around the pixel diffusion sources as pixel diffusion destinations, and performing weighted superposition processing on intermediate prediction results according to the weight of pixel similarity identification between the pixel diffusion sources and the pixel diffusion destinations;
and generating an identification result of the image mask according to the intermediate prediction result after the weighted superposition processing.
3. The method of claim 2, wherein the weighted superposition processing of the intermediate prediction result comprises:
determining pixel similarity between the pixel diffusion source and the pixel diffusion destination based on pixel data of a pixel point;
weighting and superposing an intermediate prediction result serving as a pixel diffusion source on an intermediate prediction result of the pixel diffusion destination by the weight identified by the pixel similarity;
and taking the current pixel diffusion destination as a new pixel diffusion source, taking the pixel points around the new pixel diffusion source as the new pixel diffusion destination, and repeatedly executing pixel similarity calculation and weighted superposition processing on the intermediate prediction result until a preset turn is reached.
4. The method of claim 1, wherein the extracting the features of the target image reference region selected by the user in the image to be recognized comprises:
obtaining foreground points and/or background points selected by a user;
taking a pixel region of a predetermined range containing the foreground point and/or the background point as the reference region of the target image;
and performing feature extraction on the target image reference region.
5. The method of claim 4, wherein feature extracting the target image reference region comprises:
and performing processing by using a Gaussian kernel function to generate a Gaussian map of the target image reference region, and performing feature extraction on the Gaussian map.
6. The method of claim 4, wherein determining a pixel similarity between the pixel diffusion source and the pixel diffusion destination comprises:
determining a data difference value of RGB channels between the pixel diffusion source and the pixel diffusion destination as the pixel similarity.
7. The method of claim 2, wherein generating the recognition result of the image mask according to the intermediate prediction result after the weighted superposition processing comprises:
and comparing the prediction probability of the pixel point as a foreground point and/or a background point in the weighted and superimposed intermediate prediction result with a preset confidence threshold value, and determining the recognition result of the pixel point in the image to be recognized.
8. The method of claim 1, wherein the selected target image reference region comprises a foreground point image reference region and/or a background point image reference region, the method further comprising:
predicting foreground points and/or background points according to the characteristic data of the foreground point image reference area or the background point image reference area and the characteristic data of the image to be recognized to generate a preliminary prediction result;
and determining a characteristic diffusion destination corresponding to the foreground point image reference region and/or the background point image reference region according to the preliminary prediction result.
9. The method of claim 1, further comprising:
and in response to the operation of the user on the target image reference area selected again in the image to be recognized, taking the target image reference area selected again as a feature diffusion source and one or more areas except the target image reference area as feature diffusion destinations, determining feature similarity between first feature data corresponding to the feature diffusion source and second feature data corresponding to the feature diffusion destinations, and performing subsequent processing.
10. A training method of an image segmentation model comprises the following steps:
taking a plurality of training images as images to be identified, acquiring a target image reference region selected by a user aiming at a target object, and acquiring an image mask aiming at the target object by using the image mask identification method of any one of claims 1 to 9;
training a machine learning model for performing an image segmentation process of segmenting the target object from a specified image, using the training image and a corresponding image mask as training data.
11. The method of claim 10, further comprising:
performing image segmentation processing by using the machine learning model to generate a segmentation processed result image;
in response to the correction operation of the result image by the user, acquiring a target image reference area selected by the user, taking the result image as the image to be recognized, and correcting the result image by using the image mask recognition method according to any one of claims 1 to 9.
12. The method of claim 11, further comprising:
acquiring an image mask for correcting the result image;
and taking the corrected image mask and the corresponding original image as training data, and training the machine learning model.
13. An apparatus for identifying an image mask, comprising:
the characteristic extraction module is used for extracting characteristics of an image to be recognized and a target image reference area selected by a user in the image to be recognized;
a feature similarity determination module, configured to determine feature similarity between first feature data corresponding to the feature diffusion source and second feature data corresponding to the feature diffusion destination, where the target image reference region is used as a feature diffusion source and one or more regions other than the target image reference region are used as feature diffusion destinations;
the feature diffusion module is used for superposing the first feature data and the second feature data with weights identified by the feature similarity;
and the recognition processing module is used for carrying out image mask recognition according to the characteristic data corresponding to the image to be recognized after the weighted superposition processing, and generating a recognition result of the image mask.
14. The apparatus according to claim 13, wherein performing image mask recognition according to the feature data of the image to be recognized after the weighted superposition processing and generating the recognition result of the image mask comprises:
predicting foreground points and/or background points according to the characteristic data of the image to be recognized after the weighted superposition processing to generate an intermediate prediction result;
taking pixel points in the reference region of the target image as pixel diffusion sources, taking pixel points around the pixel diffusion sources as pixel diffusion destinations, and performing weighted superposition processing on intermediate prediction results according to the weight of pixel similarity identification between the pixel diffusion sources and the pixel diffusion destinations;
and generating an identification result of the image mask according to the intermediate prediction result after the weighted superposition processing.
15. The apparatus of claim 14, wherein generating the recognition result of the image mask according to the intermediate prediction result after the weighted superposition processing comprises:
and comparing the prediction probability of the pixel point as a foreground point and/or a background point in the weighted and superimposed intermediate prediction result with a preset confidence threshold value, and determining the recognition result of the pixel point in the image to be recognized.
16. An apparatus for training an image segmentation model, comprising:
the image mask acquisition module is used for taking a plurality of training images as images to be recognized, acquiring a target image reference region selected by a user aiming at a target object, performing feature diffusion processing by taking the target image reference region as a feature diffusion source, performing image mask recognition on the images to be recognized after the feature diffusion processing, and acquiring an image mask aiming at the target object;
and the training processing module is used for training a machine learning model by taking the training image and the corresponding image mask as training data, wherein the machine learning model is used for executing image segmentation processing for segmenting the target object from a specified image.
17. An electronic device, comprising:
a memory for storing a program;
a processor for executing the program stored in the memory to perform the image mask identification method according to any one of claims 1 to 9.
18. An electronic device, comprising:
a memory for storing a program;
a processor for executing the program stored in the memory to perform the method of training an image segmentation model according to any one of claims 10 to 12.
19. An image matting method comprising:
carrying out matting processing on an input image aiming at a specified target image by using an image segmentation model;
responding to the correction operation of the user on the extracted target image, and acquiring a target image reference area selected by the user;
and performing feature diffusion processing by taking the target image reference region as a feature diffusion source, performing image mask recognition on the image to be recognized after the feature diffusion processing, acquiring an image mask of the target object, and performing matting processing on the input image aiming at the specified target image again by using the image mask to generate a new target image.
20. The method of claim 19, further comprising:
and taking the new target image and the corresponding original image as training data, and training the image segmentation model.
21. An electronic device, comprising:
a memory for storing a program;
a processor for executing the program stored in the memory to perform the image matting method of claim 19 or 20.
CN202011265865.2A 2020-11-12 2020-11-12 Image mask recognition, matting and model training method and device and electronic equipment Pending CN113420769A (en)

Priority Applications (1)

Application: CN202011265865.2A · Priority date: 2020-11-12 · Filing date: 2020-11-12 · Title: Image mask recognition, matting and model training method and device and electronic equipment

Publications (1)

Publication: CN113420769A · Publication date: 2021-09-21

Family ID: 77711729

Family Applications (1)

Application: CN202011265865.2A · Status: Pending · Title: Image mask recognition, matting and model training method and device and electronic equipment

Country Status (1)

Country: CN · Publication: CN113420769A (en)

Cited By (4)

* Cited by examiner, † Cited by third party

CN113888537A * 2021-12-03 2022-01-04 Shenzhen Wangxu Technology Co., Ltd. Mask extraction method, device, equipment and storage medium
CN114708363A * 2022-04-06 2022-07-05 Guangzhou Huya Technology Co., Ltd. Game live broadcast cover generation method and server
CN115543161A * 2022-11-04 2022-12-30 Guangzhou Baolun Electronics Co., Ltd. Matting method and device suitable for whiteboard all-in-one machine
CN115543161B * 2022-11-04 2023-08-15 Guangdong Baolun Electronics Co., Ltd. Image matting method and device suitable for whiteboard integrated machine

Patent Citations (9)

* Cited by examiner, † Cited by third party

US20170287137A1 * 2016-03-31 2017-10-05 Adobe Systems Incorporated Utilizing deep learning for boundary-aware image segmentation
US20200082542A1 * 2017-12-11 2020-03-12 Tencent Technology (Shenzhen) Company Limited Image processing method and apparatus
CN110263604A * 2018-05-14 2019-09-20 Guilin Yuanwang Intelligent Communication Technology Co., Ltd. Method and device for separating pedestrian image background at pixel level
CN111414922A * 2019-01-07 2020-07-14 Alibaba Group Holding Ltd Feature extraction method, image processing method, model training method and device
CN110555481A * 2019-09-06 2019-12-10 Tencent Technology (Shenzhen) Company Limited Portrait style identification method and device and computer readable storage medium
CN110717919A * 2019-10-15 2020-01-21 Alibaba (China) Co., Ltd. Image processing method, device, medium and computing equipment
CN111274921A * 2019-11-27 2020-06-12 Beijing University of Posts and Telecommunications Method for recognizing human body behaviors using a pose mask
CN111178211A * 2019-12-20 2020-05-19 Beijing Megvii Technology Co., Ltd. Image segmentation method and device, electronic equipment and readable storage medium
CN111814647A * 2020-07-01 2020-10-23 Chongqing Technology and Business University Two-stage bicycle subclass key point confidence coefficient image CNN identification method


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party

Y. Liang, R. He, Y. Li and Z. Wang, "Simultaneous segmentation and classification of breast lesions from ultrasound images using Mask R-CNN", 2019 IEEE International Ultrasonics Symposium (IUS), 8 December 2019.
Song Kefeng, "Research on Moving Object Detection Technology Based on Machine Vision", China Master's Theses Full-text Database (Information Science and Technology), 15 May 2017.
Huang Haisong, Wei Zhongyu, Yao Liguo, "Research on Part Instance Segmentation and Recognition Based on Deep Learning", Modular Machine Tool & Automatic Manufacturing Technique, no. 5, 20 May 2019.



Legal Events

PB01 — Publication
SE01 — Entry into force of request for substantive examination