Disclosure of Invention
The application aims to provide an anti-occlusion face image segmentation method that is logically clear, safe, effective, reliable, and simple to operate, and that improves both the segmentation accuracy on difficult samples, such as faces overlapped by occluders, and the overall efficiency of face image segmentation.
Based on the above purpose, the technical scheme provided by the application is as follows:
an anti-occlusion face image segmentation method comprises the following steps:
segmenting and labeling an original image to obtain an occlusion labeling map and a face labeling map;
training an initial image segmentation model on the original image, the occlusion labeling map, and the face labeling map to obtain a trained image segmentation model;
obtaining a face segmentation image from a preprocessed image to be segmented and the trained image segmentation model;
wherein both the original image and the image to be segmented contain a face and an occluder.
Preferably, the training of the initial image segmentation model according to the original image, the occlusion labeling graph and the face labeling graph to obtain a trained image segmentation model includes the following steps:
obtaining an occlusion confidence map and a face confidence map, respectively, from the preprocessed original image and the initial image segmentation model;
obtaining model loss according to the occlusion labeling graph, the occlusion confidence graph, the face labeling graph and the face confidence graph;
and reducing the model loss through multiple iterations so as to adjust preset parameters in the initial image segmentation model and obtain a trained image segmentation model.
Preferably, the initial image segmentation model comprises:
the first sub-network module and the second sub-network module are of the same structure;
the first sub-network module includes: a first feature extraction module and a first segmentation module;
the second sub-network module includes: a second feature extraction module and a second segmentation module.
Preferably, the step of obtaining an occlusion confidence map and a face confidence map, respectively, from the preprocessed original image and the initial image segmentation model includes the following steps:
preprocessing the original image to obtain a preprocessed original image;
extracting occlusion features from the preprocessed original image with the first feature extraction module;
segmenting the extracted occlusion features with the first segmentation module to obtain the occlusion confidence map;
extracting face features from the preprocessed original image with the second feature extraction module;
and segmenting the extracted face features with the second segmentation module to obtain the face confidence map.
Preferably, the obtaining model loss according to the occlusion labeling graph and the occlusion confidence graph, the face labeling graph and the face confidence graph includes the following steps:
calculating and acquiring a first loss according to the occlusion labeling graph and the occlusion confidence graph;
calculating and acquiring a second loss according to the face annotation graph and the face confidence map;
the first loss and the second loss are weighted and summed to obtain a model loss.
Preferably, the calculating to obtain the first loss according to the occlusion labeling graph and the occlusion confidence graph includes the following specific calculation formula:
where pre_occ is the occlusion confidence map, label_occ is the occlusion labeling map, loss_occ is the first loss, and n is the number of input samples.
Preferably, the calculating to obtain the second loss according to the face label graph and the face confidence graph includes the following specific calculation formula:
where pre_seg is the face confidence map, label_seg is the face labeling map, loss_seg is the second loss, and n is the number of input samples.
Preferably, after the obtaining the occlusion confidence map and the face confidence map according to the preprocessed original image and the initial image segmentation model, the method further includes the following steps:
defining a plurality of face features;
and establishing a mapping between each face feature and a preset pixel value.
Preferably, the step of obtaining a face segmentation image from the preprocessed image to be segmented and the trained image segmentation model includes the following steps:
obtaining the face confidence map of the image to be segmented from the preprocessed image and the trained image segmentation model;
and, for each pixel, outputting the face feature corresponding to the maximum confidence value, forming the face segmentation image.
According to the anti-occlusion face image segmentation method provided by the application, an original image is segmented and labeled to obtain an occlusion labeling map and a face labeling map, and an initial image segmentation model is trained on the original image and the two labeling maps to obtain a trained image segmentation model. When a face segmentation image is needed, the image to be segmented is preprocessed and then passed through the trained model to obtain the face segmentation image.
Compared with the prior art, the method decouples the occluder from the face, which effectively distinguishes their adjacent boundaries, improves the model's segmentation accuracy on difficult samples such as occluded faces and images with weak light, strong light, or blur, and accelerates model inference.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The embodiment of the application is written in a progressive manner.
The embodiment of the application provides an anti-occlusion face image segmentation method. The method mainly addresses the low segmentation accuracy, in the prior art, on difficult samples in which a face and an occluder overlap heavily.
As shown in fig. 1, an anti-occlusion face image segmentation method includes the following steps:
S1, segmenting and labeling an original image to obtain an occlusion labeling map and a face labeling map;
S2, training an initial image segmentation model on the original image, the occlusion labeling map, and the face labeling map to obtain a trained image segmentation model;
S3, obtaining a face segmentation image from the preprocessed image to be segmented and the trained image segmentation model;
wherein both the original image and the image to be segmented contain a face and an occluder.
In step S1, an original image is acquired by a capture device and processed with a segmentation labeling tool to obtain an occlusion segmentation labeling image and a face segmentation labeling image, which are scaled to 256×256;
in step S2, an initial image segmentation model is constructed and trained on the original image, the occlusion segmentation labeling image, and the face segmentation labeling image to obtain the trained image segmentation model;
in this embodiment, the training set contains 30000 labeled images, the number of training epochs is set to 140, and the learning rate is 1e-3;
in step S3, the image to be segmented is preprocessed and then input into the trained image segmentation model to obtain the face segmentation result.
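The iterative loss reduction described for step S2 (140 epochs, learning rate 1e-3) can be sketched generically. The quadratic loss below is a hypothetical stand-in for the real segmentation loss; only the loop structure, epoch count, and learning rate mirror the embodiment:

```python
# Hedged sketch of the training iteration: repeatedly compute the model
# loss and adjust parameters to reduce it. The scalar "parameter" and
# quadratic loss (p - 3)^2 are toy stand-ins for the segmentation
# network and its loss; the schedule (140 epochs, lr = 1e-3) follows
# the embodiment.

def train(initial_param, epochs=140, lr=1e-3):
    """Minimise the toy loss (p - 3)^2 by gradient descent."""
    p = initial_param
    for _ in range(epochs):
        grad = 2.0 * (p - 3.0)  # d/dp of (p - 3)^2
        p -= lr * grad          # adjust the preset parameter
    return p

final = train(0.0)  # moves from 0 toward the loss minimum at 3
```

The loop does not fully converge in 140 steps at this learning rate; a real training run would monitor validation loss rather than a fixed epoch count.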
In this embodiment, the model is a twin parallel network composed of two sub-networks with the same structure: one sub-network performs occluder segmentation and the other performs face segmentation. Each sub-network consists of a feature extraction module and a segmentation module; the feature extraction module uses a Unet, and the segmentation module is designed around a pixel difference convolution (PDC) structure for fine segmentation. The face branch fuses the occlusion features with the face features and then performs face segmentation through its PDC segmentation module.
The specific implementation of the segmentation module is as follows: the input is a 256×256×64 feature map. It is convolved by 64 central difference convolution kernels (conv1, 5×5) and passed through a regularized ReLU activation, giving 64 feature maps of size 256×256; these are convolved by 64 central difference kernels (conv2, 3×3), and the conv2 output is added to the 256×256×64 input as a residual connection. Twelve ordinary 1×1 convolutions (conv3) then produce a 256×256×12 result, which is finally output through a softmax normalized exponential function.
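The central difference convolution used by conv1 and conv2 can be illustrated for a single channel. This is a minimal numpy sketch of the operator only; the embodiment's 64-kernel module, ReLU, residual addition, and softmax are omitted:

```python
import numpy as np

def central_difference_conv2d(x, w):
    """2-D central difference convolution, valid padding, one channel.

    Each output pixel is sum_i w_i * (x_i - x_centre): an ordinary
    convolution applied to the differences between every pixel in the
    window and the window's centre pixel, which emphasises local
    gradients and edge detail.
    """
    kh, kw = w.shape
    H, W = x.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + kh, j:j + kw]
            centre = patch[kh // 2, kw // 2]
            out[i, j] = np.sum(w * (patch - centre))
    return out
```

On a constant image the output is identically zero, because all pixel-to-centre differences vanish; this is the property that makes the operator sensitive to edges rather than absolute intensity.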
As shown in fig. 2, preferably, in step S2, the following steps are included:
A1. obtaining an occlusion confidence map and a face confidence map, respectively, from the preprocessed original image and the initial image segmentation model;
A2. obtaining the model loss from the occlusion labeling map, the occlusion confidence map, the face labeling map, and the face confidence map;
A3. reducing the model loss through multiple iterations to adjust the preset parameters of the initial image segmentation model and obtain the trained image segmentation model.
In step A1, the preprocessed original image is input into the initial image segmentation model for prediction, yielding the occlusion confidence map and the face confidence map;
in step A2, the model loss is computed from the occlusion labeling map together with the occlusion confidence map, and the face labeling map together with the face confidence map;
in step A3, the model loss is reduced by multiple iterations, so that preset parameters in the initial image segmentation model are adjusted, and the trained image segmentation model is obtained.
Preferably, the initial image segmentation model comprises:
the first sub-network module and the second sub-network module are of the same structure;
the first subnetwork module includes: a first feature extraction module and a first segmentation module;
the second sub-network module includes: a second feature extraction module and a second segmentation module.
As shown in fig. 3, preferably, step A1 includes the steps of:
B1. preprocessing an original image to obtain a preprocessed original image;
B2. extracting the image features of the preprocessed original shielding object according to the first feature extraction module;
B3. dividing the preprocessed original occlusion image characteristics according to a first dividing module to obtain an occlusion confidence map;
B4. extracting the preprocessed original face image features according to the second feature extraction module;
B5. and according to the second segmentation module, segmenting the preprocessed original face image features to obtain a face confidence map.
In practice, the initial image segmentation model is designed with two parallel sub-network modules of the same structure. Each sub-network module contains a feature extraction module and a segmentation module: after the feature extraction module extracts the corresponding features, the features are fused and the segmentation module performs face segmentation;
in step B1, image preprocessing refers to the processing applied to the input image before feature extraction, segmentation, and matching in image analysis.
The main purpose of image preprocessing is to eliminate irrelevant information in the image, recover useful real information, enhance the detectability of relevant information, and simplify the data as much as possible, thereby improving the reliability of feature extraction, image segmentation, matching, and recognition.
In this embodiment, the acquired original image, its occlusion segmentation labeling image, and its face segmentation labeling image are all scaled to 256×256.
Steps B2 to B5 extract the corresponding features from the preprocessed original image through the feature extraction modules of the first and second sub-network modules, and then segment those features with the corresponding segmentation modules to obtain the occlusion confidence map and the face confidence map.
In this embodiment, a twin parallel network model is constructed, which includes 2 sub-networks of the same structure, each sub-network including a feature extraction module and a segmentation module.
The feature extraction module adopts the Unet to extract features;
the segmentation module is designed with a pixel difference convolution (PDC) structure, which uses difference information to capture abrupt changes and fine details in the edge context for precise edge segmentation;
specifically, one sub-network uses a Unet to extract occlusion features and segments them through a PDC segmentation module to obtain an occlusion segmentation confidence map (its values are continuous in [0, 1]); the other sub-network uses a Unet to extract face features, fuses them with the occlusion features, and performs face segmentation through a PDC segmentation module to obtain a 12-class segmentation confidence map (the 12 classes being skin, left eyebrow, right eyebrow, left eye, right eye, nose, upper lip, lower lip, mouth cavity, glasses, occluder, and background).
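The fusion of occlusion features with face features is not specified in detail in the text; channel-wise concatenation is one plausible reading, sketched here with numpy arrays standing in for the Unet feature maps (the shapes are illustrative assumptions):

```python
import numpy as np

# Hedged sketch of the fusion step in the face branch: occlusion
# features from one sub-network are combined with face features from
# the other before the PDC segmentation module. Channel-wise
# concatenation is a common fusion operator; the patent does not name
# the exact one, and the 64-channel / 8x8 shapes are hypothetical.
rng = np.random.default_rng(0)
occ_feat = rng.random((64, 8, 8))   # (channels, H, W) occlusion features
face_feat = rng.random((64, 8, 8))  # face features, same spatial size
fused = np.concatenate([occ_feat, face_feat], axis=0)  # (128, 8, 8)
```

Whatever the operator, the key design point from the passage is that the face branch sees the occlusion evidence before segmenting, which is what sharpens the occluder/face boundary.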
As shown in fig. 4, preferably, step A2 includes the steps of:
C1. calculating and acquiring a first loss according to the occlusion labeling graph and the occlusion confidence graph;
C2. calculating and acquiring a second loss according to the face annotation graph and the face confidence graph;
C3. the first loss and the second loss are weighted and summed to obtain a model loss.
Preferably, according to the occlusion labeling graph and the occlusion confidence graph, the first loss is calculated and acquired, and a specific calculation formula is as follows:
where pre_occ is the occlusion confidence map, label_occ is the occlusion labeling map, loss_occ is the first loss, and n is the number of input samples.
Preferably, according to the face label graph and the face confidence graph, calculating to obtain the second loss, wherein the specific calculation formula is as follows:
where pre_seg is the face confidence map, label_seg is the face labeling map, loss_seg is the second loss, and n is the number of input samples.
In steps C1 to C3, the preprocessed original image is input into the twin parallel network model for prediction, yielding the occlusion segmentation confidence map pre_occ and the face segmentation confidence map pre_seg. The loss loss_occ between pre_occ and the corresponding occlusion labeling map label_occ, and the loss loss_seg between pre_seg and the corresponding face labeling map label_seg, are computed, and the two losses are weighted and summed to obtain the final model loss. The model loss is reduced through continuous iteration, adjusting the parameters of the twin parallel network model until the trained model is obtained.
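The concrete loss formulas are not reproduced in this text. A common choice consistent with the description (a binary confidence map for the occlusion branch, a 12-class confidence map for the face branch) is cross-entropy; the functions and the weights w_occ and w_seg below are therefore assumptions, not the patent's formulas:

```python
import numpy as np

def binary_ce(pre, label, eps=1e-7):
    """Assumed form of loss_occ: mean binary cross-entropy between the
    occlusion confidence map pre and the binary labeling map label."""
    pre = np.clip(pre, eps, 1 - eps)
    return -np.mean(label * np.log(pre) + (1 - label) * np.log(1 - pre))

def multiclass_ce(pre, label, eps=1e-7):
    """Assumed form of loss_seg: cross-entropy between a (C, H, W)
    confidence map pre and an (H, W) integer label map."""
    C, H, W = pre.shape
    pre = np.clip(pre, eps, 1.0)
    # pick, at every pixel, the confidence assigned to the true class
    picked = pre[label, np.arange(H)[:, None], np.arange(W)[None, :]]
    return -np.mean(np.log(picked))

def model_loss(loss_occ, loss_seg, w_occ=0.5, w_seg=0.5):
    """Weighted sum of the two branch losses (weights are hypothetical)."""
    return w_occ * loss_occ + w_seg * loss_seg
```

A perfect prediction drives both terms toward zero, so iterating to reduce `model_loss` adjusts the network toward the labeling maps, as step C3 describes.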
Preferably, after step A1, the method further comprises the following steps:
defining a plurality of face features;
and establishing a mapping between each face feature and a preset pixel value.
In this embodiment, in the occlusion segmentation label map, a pixel value of 1 indicates an occluder and 0 indicates non-occluder. In the face segmentation label map, pixel value 0 represents the background, 1 skin, 2 the left eyebrow, 3 the right eyebrow, 4 the left eye, 5 the right eye, 6 the nose, 7 the upper lip, 8 the lower lip, 9 the mouth cavity, 10 glasses, and 11 an occluder.
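The mapping between face features and preset pixel values listed above can be written out directly; the class names follow this embodiment's list (the dictionary name is illustrative):

```python
# Preset pixel value -> face feature, per the embodiment's face
# segmentation label map. The occlusion label map is binary and is
# handled separately (1 = occluder, 0 = non-occluder).
FACE_LABELS = {
    0: "background", 1: "skin",
    2: "left eyebrow", 3: "right eyebrow",
    4: "left eye", 5: "right eye",
    6: "nose", 7: "upper lip", 8: "lower lip",
    9: "mouth cavity", 10: "glasses", 11: "occluder",
}
```

The 12 entries match the 12 output channels of the segmentation module, so a pixel's predicted class index can be looked up here to name the region it belongs to.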
As shown in fig. 5, preferably, step S3 includes the steps of:
D1. obtaining the face confidence map of the image to be segmented from the preprocessed image and the trained image segmentation model;
D2. for each pixel, outputting the face feature corresponding to the maximum confidence value, forming the face segmentation image.
In steps D1 to D2, the image to be segmented is preprocessed and input into the trained twin parallel network model, which outputs a face segmentation confidence map; the index of the maximum value at each pixel of the confidence map gives the face segmentation result.
In this embodiment, the face segmentation confidence map consists of 12 confidence maps of size 256×256 (one per class). Each pixel outputs the class with the greatest confidence at that pixel, i.e., the index of the per-pixel maximum, and the final output is a 1×256×256 face segmentation map whose values are integers in [0, 11].
In the embodiments provided in the present application, it should be understood that the disclosed method may be implemented in other manners. The method embodiments described above are merely illustrative; for example, the division into modules is merely a logical functional division, and other divisions are possible in actual implementation: multiple modules or components may be combined or integrated into another system, and some features may be omitted or not performed. In addition, the components shown or discussed may be coupled, directly coupled, or communicatively connected to each other through interfaces, devices, or modules, whether electrically, mechanically, or otherwise.
In addition, each functional module in each embodiment of the present application may be integrated in one processor, or each module may be separately used as one device, or two or more modules may be integrated in one device; the functional modules in the embodiments of the present application may be implemented in hardware, or may be implemented in a form of hardware plus a software functional unit.
Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the above method embodiments may be implemented by program instructions and associated hardware, where the program instructions may be stored in a computer readable storage medium, and where the program instructions, when executed, perform steps comprising the above method embodiments; and the aforementioned storage medium includes: a mobile storage device, a Read Only Memory (ROM), a magnetic disk or an optical disk, or the like, which can store program codes.
It should be appreciated that the use of "systems," "devices," "units," and/or "modules" in this disclosure is but one way to distinguish between different components, elements, parts, portions, or assemblies at different levels. However, if other words can achieve the same purpose, the word can be replaced by other expressions.
As used in the specification and in the claims, the terms "a," "an," "the," and/or "the" are not specific to the singular and may include the plural unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the identified steps and elements are included; they do not constitute an exclusive list, as a method or apparatus may include other steps or elements. The inclusion of an element introduced by the phrase "comprising a …" does not exclude the presence of additional identical elements in a process, method, article, or apparatus that comprises that element.
The terms "first" and "second" are used below for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature.
If a flowchart is used in the present application, the flowchart is used to describe the operations performed by a system according to an embodiment of the present application. It should be appreciated that the preceding or following operations are not necessarily performed in order precisely. Rather, the steps may be processed in reverse order or simultaneously. Also, other operations may be added to or removed from these processes.
The anti-occlusion face image segmentation method provided by the application is described in detail. The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.