CN115471896A - Method and device for making certificate photo - Google Patents

Method and device for making certificate photo

Info

Publication number
CN115471896A
Authority
CN
China
Prior art keywords: target, auxiliary, portrait, feature, image
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN202211161872.7A
Other languages: Chinese (zh)
Inventors: 施佳子, 李艳宇, 吕朝辉, 于海燕
Current Assignee: Industrial and Commercial Bank of China Ltd (ICBC) (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Industrial and Commercial Bank of China Ltd (ICBC)
Priority claimed from CN202211161872.7A (the priority date is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed)
Application filed by Industrial and Commercial Bank of China Ltd (ICBC)
Publication of CN115471896A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161 Detection; Localisation; Normalisation
    • G06V 40/168 Feature extraction; Face representation
    • G06V 40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships

Abstract

The application relates to the technical field of information security, and in particular to a method and a device for making a certificate photo. The method comprises the following steps: acquiring a target original image; inputting the target original image into a portrait semantic segmentation model to obtain a recognition result for the target portrait, wherein the portrait semantic segmentation model comprises a main module built from depthwise separable convolution layers and an auxiliary module built from depthwise separable convolution layers, and the portrait semantic segmentation model obtains the recognition result for the target portrait by fusing the main module's processing data for the target original image with the auxiliary module's processing data for the target original image; acquiring the target portrait from the target original image according to the recognition result for the target portrait; and adding the target portrait to a preset target background to obtain the certificate photo. This method protects the user's privacy while preserving the quality of the generated certificate photo.

Description

Method and device for making certificate photo
Technical Field
The present application relates to the field of information security technologies, and in particular to a method and an apparatus for making a certificate photo.
Background
The main difficulty in making a certificate photo is replacing the complex background of a portrait photo with a solid-color background while meeting the requirements of a standard certificate photo. The person is usually segmented from the complex background by a portrait semantic segmentation algorithm built on a deep learning framework, and the segmented portrait is then superimposed on the solid background.
Conventional portrait semantic segmentation includes a variety of excellent algorithms such as DeepLabv3, BiSeNetv2 and Fast-SCNN. Two problems arise in practical application. On the one hand, owing to hardware performance factors, mobile devices have limited computing power, storage and other resources, and the maximum model size they support is low; as a result, after a portrait semantic segmentation model is deployed for mobile use, the user still needs to upload the original image to a server to complete the certificate photo, so the user's privacy cannot be guaranteed. On the other hand, if the model is run directly on the mobile device without a server side in order to protect the user's privacy, compressing the model to fit greatly reduces its accuracy, the portrait segmentation precision is low, and the generated certificate photo is poor.
Therefore, when current portrait semantic segmentation algorithms are applied to certificate photo production, user privacy and the quality of the generated certificate photo cannot both be ensured.
Disclosure of Invention
In view of the above, it is necessary to provide a method, an apparatus, a computer device, a computer-readable storage medium and a computer program product for making a certificate photo that ensure the user's privacy while also preserving the quality of the generated certificate photo.
In a first aspect, the present application provides a method of making a certificate photo, the method comprising:
acquiring a target original image;
inputting the target original image into a portrait semantic segmentation model to obtain a recognition result for the target portrait, wherein the portrait semantic segmentation model comprises a main module built from depthwise separable convolution layers and an auxiliary module built from depthwise separable convolution layers, and the portrait semantic segmentation model obtains the recognition result for the target portrait by fusing the main module's processing data for the target original image with the auxiliary module's processing data for the target original image;
acquiring the target portrait from the target original image according to the recognition result for the target portrait;
and adding the target portrait to a preset target background to obtain a certificate photo.
In one embodiment, the inputting of the target original image into a portrait semantic segmentation model to obtain a recognition result for the target portrait includes:
downsampling the target original image to obtain a first target auxiliary image and a second target auxiliary image, wherein the downsampling dimensions of the first target auxiliary image and the second target auxiliary image are different;
inputting the first target auxiliary image and the second target auxiliary image into the auxiliary module respectively to obtain a first auxiliary feature corresponding to the first target auxiliary image and a second auxiliary feature corresponding to the second target auxiliary image;
and inputting the target original image into the main module, and superimposing the first auxiliary feature and the second auxiliary feature respectively onto the output of the main module to obtain the recognition result for the target portrait.
In one embodiment, the main module includes a main encoder and a main decoder, and the inputting of the target original image into the main module and superimposing of the first auxiliary feature and the second auxiliary feature onto the output of the main module to obtain the recognition result for the target portrait includes:
inputting the target original image into the main encoder, and superimposing the first auxiliary feature and the second auxiliary feature respectively onto the output of the main encoder to obtain a target image feature;
and inputting the target image feature into the main decoder, and superimposing the first auxiliary feature and the second auxiliary feature respectively onto the output of the main decoder to obtain the recognition result for the target portrait.
In one embodiment, the auxiliary module includes an auxiliary encoder and an auxiliary decoder, the first auxiliary feature includes a first auxiliary encoding feature and a first auxiliary decoding feature, the second auxiliary feature includes a second auxiliary encoding feature and a second auxiliary decoding feature, and the inputting of the first target auxiliary image and the second target auxiliary image into the auxiliary module to obtain the first auxiliary feature corresponding to the first target auxiliary image and the second auxiliary feature corresponding to the second target auxiliary image includes:
inputting the first target auxiliary image and the second target auxiliary image into the auxiliary encoder respectively to obtain the first auxiliary encoding feature corresponding to the first target auxiliary image and the second auxiliary encoding feature corresponding to the second target auxiliary image;
and inputting the first auxiliary encoding feature and the second auxiliary encoding feature into the auxiliary decoder respectively to obtain the first auxiliary decoding feature corresponding to the first target auxiliary image and the second auxiliary decoding feature corresponding to the second target auxiliary image.
In one embodiment, the superimposing of the first auxiliary feature and the second auxiliary feature onto the output of the main encoder includes:
determining a corresponding first target convolutional layer from the depthwise separable convolution layers of the main encoder according to the scale of the first auxiliary encoding feature;
superimposing the first auxiliary encoding feature onto the output of the first target convolutional layer to obtain a first encoding feature, and inputting the first encoding feature to the next convolutional layer after the first target convolutional layer;
determining a corresponding second target convolutional layer from the depthwise separable convolution layers of the main encoder according to the scale of the second auxiliary encoding feature;
and superimposing the second auxiliary encoding feature onto the output of the second target convolutional layer to obtain a second encoding feature, and inputting the second encoding feature to the next convolutional layer after the second target convolutional layer.
In one embodiment, the superimposing of the first auxiliary feature and the second auxiliary feature respectively onto the output of the main decoder includes:
determining a corresponding first target deconvolution layer from the main decoder according to the scale of the first auxiliary decoding feature;
superimposing the first auxiliary decoding feature onto the output of the first target deconvolution layer to obtain a first decoding feature, and inputting the first decoding feature to the next deconvolution layer after the first target deconvolution layer;
determining a corresponding second target deconvolution layer from the main decoder according to the scale of the second auxiliary decoding feature;
and superimposing the second auxiliary decoding feature onto the output of the second target deconvolution layer to obtain a second decoding feature, and inputting the second decoding feature to the next deconvolution layer after the second target deconvolution layer.
In one embodiment, the method of making a certificate photo further comprises:
training an initial portrait semantic segmentation model according to a preset training set to obtain a first portrait semantic segmentation model;
for the plurality of convolution kernels in each depthwise separable convolution layer of the first portrait semantic segmentation model, determining the sum of the absolute values of the values in each convolution kernel, determining a plurality of target convolution kernels from the convolution kernels according to those sums, and deleting the plurality of target convolution kernels to obtain an intermediate portrait semantic segmentation model;
and retraining the intermediate portrait semantic segmentation model according to the training set until the accuracy of the trained intermediate portrait semantic segmentation model reaches a preset value, and determining the intermediate portrait semantic segmentation model as the portrait semantic segmentation model.
In a second aspect, the present application also provides an apparatus for making a certificate photo, the apparatus comprising:
the image acquisition module is used for acquiring a target original image;
the recognition module is used for inputting the target original image into a portrait semantic segmentation model to obtain a recognition result for the target portrait, where the portrait semantic segmentation model comprises a main module built from depthwise separable convolution layers and an auxiliary module built from depthwise separable convolution layers, and the portrait semantic segmentation model obtains the recognition result for the target portrait by fusing the main module's processing data for the target original image with the auxiliary module's processing data for the target original image;
the portrait acquisition module is used for acquiring the target portrait from the target original image according to the recognition result for the target portrait;
and the synthesis module is used for adding the target portrait to a preset target background to obtain a certificate photo.
In one embodiment, the recognition module is further configured to downsample the target original image to obtain a first target auxiliary image and a second target auxiliary image, where the downsampling dimensions of the first target auxiliary image and the second target auxiliary image are different; input the first target auxiliary image and the second target auxiliary image into the auxiliary module respectively to obtain a first auxiliary feature corresponding to the first target auxiliary image and a second auxiliary feature corresponding to the second target auxiliary image; and input the target original image into the main module and superimpose the first auxiliary feature and the second auxiliary feature respectively onto the output of the main module to obtain the recognition result for the target portrait.
In one embodiment, the main module includes a main encoder and a main decoder, and the recognition module is further configured to input the target original image into the main encoder and superimpose the first auxiliary feature and the second auxiliary feature respectively onto the output of the main encoder to obtain a target image feature; and input the target image feature into the main decoder and superimpose the first auxiliary feature and the second auxiliary feature respectively onto the output of the main decoder to obtain the recognition result for the target portrait.
In one embodiment, the auxiliary module includes an auxiliary encoder and an auxiliary decoder, the first auxiliary feature includes a first auxiliary encoding feature and a first auxiliary decoding feature, and the second auxiliary feature includes a second auxiliary encoding feature and a second auxiliary decoding feature, and the recognition module is further configured to input the first target auxiliary image and the second target auxiliary image into the auxiliary encoder respectively to obtain the first auxiliary encoding feature corresponding to the first target auxiliary image and the second auxiliary encoding feature corresponding to the second target auxiliary image; and input the first auxiliary encoding feature and the second auxiliary encoding feature into the auxiliary decoder respectively to obtain the first auxiliary decoding feature corresponding to the first target auxiliary image and the second auxiliary decoding feature corresponding to the second target auxiliary image.
In one embodiment, the recognition module is further configured to determine a corresponding first target convolutional layer from the depthwise separable convolution layers of the main encoder according to the scale of the first auxiliary encoding feature; superimpose the first auxiliary encoding feature onto the output of the first target convolutional layer to obtain a first encoding feature, and input the first encoding feature to the next convolutional layer after the first target convolutional layer; determine a corresponding second target convolutional layer from the depthwise separable convolution layers of the main encoder according to the scale of the second auxiliary encoding feature; and superimpose the second auxiliary encoding feature onto the output of the second target convolutional layer to obtain a second encoding feature, and input the second encoding feature to the next convolutional layer after the second target convolutional layer.
In one embodiment, the recognition module is further configured to determine a corresponding first target deconvolution layer from the main decoder according to the scale of the first auxiliary decoding feature; superimpose the first auxiliary decoding feature onto the output of the first target deconvolution layer to obtain a first decoding feature, and input the first decoding feature to the next deconvolution layer after the first target deconvolution layer; determine a corresponding second target deconvolution layer from the main decoder according to the scale of the second auxiliary decoding feature; and superimpose the second auxiliary decoding feature onto the output of the second target deconvolution layer to obtain a second decoding feature, and input the second decoding feature to the next deconvolution layer after the second target deconvolution layer.
In one embodiment, the apparatus for making a certificate photo further comprises a training module, where the training module is used for training an initial portrait semantic segmentation model according to a preset training set to obtain a first portrait semantic segmentation model; for the plurality of convolution kernels in each depthwise separable convolution layer of the first portrait semantic segmentation model, determining the sum of the absolute values of the values in each convolution kernel, determining a plurality of target convolution kernels from the convolution kernels according to those sums, and deleting the plurality of target convolution kernels to obtain an intermediate portrait semantic segmentation model; and retraining the intermediate portrait semantic segmentation model according to the training set until the accuracy of the trained intermediate portrait semantic segmentation model reaches a preset value, and determining the intermediate portrait semantic segmentation model as the portrait semantic segmentation model.
In a third aspect, the present application further provides a computer device, where the computer device includes a memory and a processor, where the memory stores a computer program, and the processor implements the steps in the foregoing method embodiments when executing the computer program.
In a fourth aspect, the present application further provides a computer-readable storage medium having a computer program stored thereon, where the computer program is executed by a processor to implement the steps in the above-mentioned method embodiments.
In a fifth aspect, the present application further provides a computer program product comprising a computer program that, when executed by a processor, performs the steps of the above-described method embodiments.
The above method, apparatus, computer device, computer-readable storage medium and computer program product for making a certificate photo acquire a target original image; input the target original image into a portrait semantic segmentation model to obtain a recognition result for the target portrait, where the portrait semantic segmentation model comprises a main module built from depthwise separable convolution layers and an auxiliary module built from depthwise separable convolution layers and obtains the recognition result by fusing the main module's processing data for the target original image with the auxiliary module's processing data for the target original image; acquire the target portrait from the target original image according to the recognition result; and add the target portrait to a preset target background to obtain a certificate photo. Compared with conventional portrait semantic segmentation algorithms, building the portrait semantic segmentation model from depthwise separable convolution layers reduces the number of convolution layers, thereby reducing the operation and parameter counts and shrinking the model, so the original image need not be uploaded to a server side during certificate photo production and the user's privacy is guaranteed; at the same time, the auxiliary module increases the accuracy of the portrait semantic segmentation model, improving the precision with which the target portrait is segmented and the quality of the generated certificate photo.
Drawings
FIG. 1 is a schematic flow chart illustrating a method for producing a certificate photo in one embodiment.
FIG. 2 is a schematic diagram of a portrait semantic segmentation model in an embodiment.
FIG. 3 is a flowchart illustrating step 104 according to an embodiment.
FIG. 4 is a flowchart illustrating step 306 according to one embodiment.
FIG. 5 is a flowchart illustrating step 304 according to an embodiment.
FIG. 6 is a flow diagram illustrating step 402 in one embodiment.
FIG. 7 is a flowchart of step 404 in one embodiment.
FIG. 8 is a flowchart illustrating a method of producing a certificate photo in one embodiment.
FIG. 9 is a block diagram of a portrait semantic segmentation model in an embodiment.
FIG. 10 is a block diagram of the structure of a certificate photo making device in one embodiment.
FIG. 11 is a diagram illustrating an internal structure of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In one embodiment, as shown in fig. 1, a method for making a certificate photo is provided, illustrated here as applied to a mobile terminal. In this embodiment, the method includes the following steps:
and 102, acquiring a target original image.
The target original image is a photo containing a portrait. It can be captured by the user through the terminal's image acquisition device or read from the terminal's memory. For example, a user may take a photo through a mobile phone applet as the target original image, or select a photo from the local album.
And step 104, inputting the target original image into a portrait semantic segmentation model to obtain a recognition result for the target portrait, wherein the portrait semantic segmentation model comprises a main module built from depthwise separable convolution layers and an auxiliary module built from depthwise separable convolution layers, and the portrait semantic segmentation model obtains the recognition result for the target portrait by fusing the main module's processing data for the target original image with the auxiliary module's processing data for the target original image.
In portrait semantic segmentation, the target original image can be regarded simply as consisting of a Foreground and a Background. The target portrait is the foreground, i.e. the portrait area of interest. As shown in fig. 2, the role of the portrait semantic segmentation model is to extract the target portrait from the target original image and filter out the background. The basic principle of portrait semantic segmentation is to take an RGB (red, green and blue) image as input and output the category (foreground or background) of every pixel: high-level semantic features of the portrait contour are extracted from the input RGB image through convolution layers, and deconvolution and feature fusion convert the feature information back to the size of the input image, yielding a black-and-white portrait contour image in which black represents the background and white represents the portrait. The recognition result for the target portrait is this black-and-white portrait contour image output by the portrait semantic segmentation model.
In the embodiment of the application, the portrait semantic segmentation model may include a main module and an auxiliary module, both of which may have an encoder-decoder structure. The main module may use MobileNet, a lightweight neural network suited to mobile terminals, as its backbone network. The encoder structure may include multiple dw (depthwise separable) convolution layers to reduce the amount of computation, and the decoder structure may include a plurality of deconvolution layers. After the target original image is input into the portrait semantic segmentation model, the encoder extracts image features through multiple convolution-pooling operations, the decoder upsamples these features through deconvolution layers back to the scale of the target original image, and the upsampled feature map is classified pixel by pixel, white representing foreground pixels and black representing background pixels, to obtain the recognition result for the target portrait. The auxiliary module is added on top of the main module: the encoder features and decoder features it extracts are added into the main module, i.e. the main module's processing data for the target original image is fused with the auxiliary module's processing data, which improves the accuracy of the main module and hence the accuracy of the portrait semantic segmentation model, i.e. of the target portrait recognition result.
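To make the building block concrete, the following is a minimal PyTorch sketch of a dw (depthwise separable) convolution layer of the kind described above; the kernel size, BatchNorm and ReLU6 activation are illustrative assumptions, not details taken from the patent.

```python
import torch.nn as nn

class DWSeparableConv(nn.Module):
    """Depthwise separable convolution: a per-channel (depthwise) 3x3
    convolution followed by a 1x1 pointwise convolution. Factorizing a
    standard convolution this way reduces operation and parameter counts."""
    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU6(inplace=True)  # activation choice is an assumption

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))
```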
And step 106, acquiring the target portrait from the target original image according to the recognition result for the target portrait.
In the embodiment of the application, the recognition result for the target portrait is a black-and-white portrait contour image. This contour image can be superimposed on the target original image so that the background of the target original image is masked out in black and only the target portrait (the foreground) remains, yielding the target portrait segmented from the background.
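A minimal sketch of this masking step using OpenCV; the file names are hypothetical, and the mask is assumed to be the black-and-white contour image output by the model (white foreground, black background).

```python
import cv2

original = cv2.imread("target_original.jpg")                  # hypothetical path
mask = cv2.imread("portrait_mask.png", cv2.IMREAD_GRAYSCALE)  # hypothetical path
mask = cv2.resize(mask, (original.shape[1], original.shape[0]))

# Zero out background pixels; only the target portrait (foreground) remains.
portrait = cv2.bitwise_and(original, original, mask=mask)
```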
Compared with the high-precision segmentation obtained with a green screen, a portrait semantic segmentation model built through deep learning can segment the target portrait quickly and accurately, and when trained on large-scale data its segmentation quality is better.
And step 108, adding the target portrait to a preset target background to obtain a certificate photo.
In the embodiment of the application, the preset target background is the solid background required for the certificate photo the user wants to make, for example a red or blue background, and can be chosen freely according to actual needs. Adding the segmented target portrait to the target background yields the certificate photo corresponding to the target original image.
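The compositing step can be sketched as a per-pixel blend of the portrait and a solid background, reusing original and mask from the sketch above; the BGR value for the blue background is an illustrative assumption.

```python
import numpy as np

background = np.zeros_like(original)
background[:] = (219, 142, 67)  # assumed BGR shade of certificate-photo blue

# Blend with the normalized mask so portrait edges stay clean.
alpha = mask.astype(np.float32)[..., None] / 255.0
id_photo = (original * alpha + background * (1.0 - alpha)).astype(np.uint8)
```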
Illustratively, brightness adjustment can be added to the certificate photo generation process: the brightness of the image is judged from its gray values, and darker images are brightened. Different light-source positions are reflected in the face brightness information, the side nearer the light source being brighter. A face region can be located through face recognition and the brightness of the two sides of the face computed, from which the position of the light source is judged. This detects pictures with an excessive brightness difference caused by a light source biased to one side, meets the brightness requirement of a standard certificate photo, removes the influence of light-source factors on the image, and improves the quality of the generated standard certificate photo.
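A rough sketch of these brightness checks; the face-box source, the gray-level threshold and the left/right tolerance are all assumptions.

```python
import cv2

gray = cv2.cvtColor(id_photo, cv2.COLOR_BGR2GRAY)

# Overall brightness: brighten darker images (threshold and gain are assumed).
if gray.mean() < 90:
    id_photo = cv2.convertScaleAbs(id_photo, alpha=1.2, beta=20)

# Light-source position: compare the mean brightness of the two face halves.
x, y, w, h = face_box  # assumed to come from any face detector
left_half = gray[y:y + h, x:x + w // 2].mean()
right_half = gray[y:y + h, x + w // 2:x + w].mean()
if abs(left_half - right_half) > 25:  # assumed tolerance
    print("light source appears biased to one side")
```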
Illustratively, a standard certificate photo requires the person's shoulders to be level, so a shoulder detection function may be added to the certificate photo production process. After the target portrait is added to the target background, the certificate photo is converted to a grayscale image, the height of the shoulders on the two sides is determined by traversing the gray values along both sides of the image, and whether the shoulders are level is judged from the height difference between the two sides. Detecting uneven shoulders in this way improves the quality of the generated certificate photo.
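One way to realize the shoulder check on the black-and-white mask is to scan a column near each edge from the top; the sampled column positions and the pixel tolerance are assumptions.

```python
import numpy as np

def first_foreground_row(column: np.ndarray) -> int:
    """Row index of the first white (portrait) pixel, scanning from the top."""
    rows = np.nonzero(column > 127)[0]
    return int(rows[0]) if rows.size else column.shape[0]

h, w = mask.shape  # mask from the segmentation step
left_shoulder = first_foreground_row(mask[:, w // 8])       # assumed column
right_shoulder = first_foreground_row(mask[:, w - w // 8])  # assumed column
if abs(left_shoulder - right_shoulder) > 10:                # assumed tolerance
    print("shoulders do not appear level")
```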
For example, a standard certificate photo may be required to be 352 pixels wide by 440 pixels high, with the person's vertex 7-21 pixels from the top edge of the image, the eyes no less than 207 pixels from the bottom edge of the photo, and a face width of 207 ± 14 pixels. To meet these requirements, the coordinates of face key points can be identified through a face recognition algorithm and used to crop and scale the target portrait or the certificate photo image.
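A sketch of cropping and scaling id_photo to the stated 352 by 440 layout from face landmarks; the landmark source (face_left, face_right, vertex_y) and the exact crop arithmetic are assumptions.

```python
import cv2

TARGET_W, TARGET_H = 352, 440
FACE_W = 207  # required face width in pixels

# face_left, face_right, vertex_y assumed to come from a face-landmark model.
scale = FACE_W / float(face_right - face_left)
resized = cv2.resize(id_photo, None, fx=scale, fy=scale)

center_x = int((face_left + face_right) / 2 * scale)
top = max(0, int(vertex_y * scale) - 14)  # aims the vertex at 7-21 px from the top
crop = resized[top:top + TARGET_H,
               center_x - TARGET_W // 2:center_x + TARGET_W // 2]
```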
According to the certificate photo making method provided by the embodiment of the application, a target original image is acquired; the target original image is input into a portrait semantic segmentation model to obtain a recognition result for the target portrait, where the model comprises a main module and an auxiliary module, both built from depthwise separable convolution layers, and obtains the recognition result by fusing the main module's processing data for the target original image with the auxiliary module's processing data for the target original image; the target portrait is acquired from the target original image according to the recognition result; and the target portrait is added to a preset target background to obtain a certificate photo. Compared with conventional portrait semantic segmentation algorithms, building the model from depthwise separable convolution layers reduces the number of convolution layers and hence the operation and parameter counts, shrinking the model so that the original image need not be uploaded to a server during certificate photo production, which guarantees the user's privacy; at the same time, the auxiliary module increases the model's accuracy, improving the precision of target portrait segmentation and the quality of the generated certificate photo.
In one embodiment, as shown in fig. 3, in step 104, inputting the target original image into a portrait semantic segmentation model to obtain a recognition result for the target portrait, which may include:
step 302, down-sampling the target original image to obtain a first target auxiliary image and a second target auxiliary image, where the down-sampling dimensions of the first target auxiliary image and the second target auxiliary image are different.
The down-sampling dimension of the first target auxiliary image may be 1/2, that is, the first target auxiliary image is an image obtained by down-sampling the target original image by 1/2. The down-sampling dimension of the second target auxiliary image may be 1/4, that is, the second target auxiliary image is an image obtained by performing 1/4 down-sampling on the target original image.
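In PyTorch terms, the two auxiliary inputs could be produced with bilinear interpolation, for example (the interpolation mode is an assumption; the patent only fixes the 1/2 and 1/4 dimensions).

```python
import torch.nn.functional as F

# x: the target original image as an NCHW tensor.
aux1 = F.interpolate(x, scale_factor=0.5, mode="bilinear", align_corners=False)
aux2 = F.interpolate(x, scale_factor=0.25, mode="bilinear", align_corners=False)
```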
Step 304, inputting the first target auxiliary image and the second target auxiliary image into the auxiliary module respectively to obtain a first auxiliary feature corresponding to the first target auxiliary image and a second auxiliary feature corresponding to the second target auxiliary image.
The auxiliary module may comprise an encoder structure and a decoder structure, and both the first auxiliary feature and the second auxiliary feature may comprise features output by the encoder structure and the decoder structure of the auxiliary module, respectively.
And step 306, inputting the target original image into the main module, and superimposing the first auxiliary feature and the second auxiliary feature respectively onto the output of the main module to obtain the recognition result for the target portrait.
After the target original image is input into the main module, while the main module processes it, the auxiliary module processes the first target auxiliary image and the second target auxiliary image obtained by downsampling at different dimensions. The auxiliary module superimposes the first auxiliary feature and the second auxiliary feature of the different downsampling dimensions onto the outputs of the corresponding layers of the main module, fine-tuning the output of each such layer, so that the main module finally outputs the recognition result for the target portrait.
According to the embodiment of the disclosure, the output of the main module is adjusted by superimposing onto it the first auxiliary feature and the second auxiliary feature of the auxiliary module, which have different downsampling dimensions. This adjusts the final recognition result for the target portrait, increases the accuracy of the portrait semantic segmentation model and of the output recognition result, i.e. the precision of target portrait segmentation, and improves the quality of the generated certificate photo.
In one embodiment, the main module includes a main encoder and a main decoder. As shown in fig. 4, in step 306, inputting the target original image into the main module and superimposing the first auxiliary feature and the second auxiliary feature respectively onto the output of the main module to obtain the recognition result for the target portrait may include:
step 402, inputting the target original image into the main encoder, and respectively superimposing the first auxiliary feature and the second auxiliary feature on the output of the main encoder to obtain the target image feature.
The main encoder may use dw convolution, inverted residual structures and linear bottleneck structures to reduce the operation and parameter counts. It may include 4 dw convolution layers, which can do the work of the 16 convolution layers of a conventional portrait semantic segmentation model; that is, the number of convolution layers, and hence the amount of computation, is reduced. The target original image is input into the main encoder, which performs multiple dw convolution-pooling operations on it. Meanwhile, the auxiliary module superimposes the first auxiliary feature and the second auxiliary feature onto the outputs of different convolution layers in the main encoder to fine-tune the image features they output; each adjusted image feature enters the next convolution layer for further convolution pooling, and after the convolution pooling of the last convolution layer the main encoder outputs the target image feature.
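A minimal sketch of this main encoder, reusing the DWSeparableConv block sketched earlier: four dw convolution stages whose outputs are fused with same-scale auxiliary encoding features by element-wise addition. Channel counts, strides and the fusion points are assumptions.

```python
import torch.nn as nn

class MainEncoder(nn.Module):
    """Four dw-convolution stages; auxiliary encoding features are added
    element-wise to the outputs of the stages whose scales match."""
    def __init__(self):
        super().__init__()
        self.stages = nn.ModuleList([
            DWSeparableConv(3, 32, stride=2),    # from the earlier sketch
            DWSeparableConv(32, 64, stride=2),
            DWSeparableConv(64, 128, stride=2),
            DWSeparableConv(128, 256, stride=2),
        ])

    def forward(self, x, aux_feats):
        # aux_feats: {stage index: auxiliary encoding feature of matching scale}
        for i, stage in enumerate(self.stages):
            x = stage(x)
            if i in aux_feats:
                x = x + aux_feats[i]  # "superimpose" = element-wise addition
        return x
```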
Step 404, inputting the target image feature into the main decoder, and superimposing the first auxiliary feature and the second auxiliary feature respectively onto the output of the main decoder to obtain the recognition result for the target portrait.
The target image feature is input into the main decoder, which upsamples it through deconvolution layers. Meanwhile, the auxiliary module superimposes the first auxiliary feature and the second auxiliary feature onto the outputs of different deconvolution layers in the main decoder to fine-tune the feature maps of different scales they output; each adjusted feature map enters the next deconvolution layer for further upsampling, and after the upsampling of the last deconvolution layer the main decoder outputs the recognition result for the target portrait.
According to the embodiment of the application, superimposing the first auxiliary feature and the second auxiliary feature of the auxiliary module onto the outputs of the main encoder and the main decoder respectively improves the accuracy of the main module, and therefore the accuracy of the portrait semantic segmentation model and of the output recognition result, increasing the precision of target portrait segmentation and the quality of the generated certificate photo.
In one embodiment, the auxiliary module includes an auxiliary encoder and an auxiliary decoder, the first auxiliary feature includes a first auxiliary encoding feature and a first auxiliary decoding feature, and the second auxiliary feature includes a second auxiliary encoding feature and a second auxiliary decoding feature. As shown in fig. 5, in step 304, inputting the first target auxiliary image and the second target auxiliary image into the auxiliary module respectively to obtain a first auxiliary feature corresponding to the first target auxiliary image and a second auxiliary feature corresponding to the second target auxiliary image may include:
step 502, inputting the first target auxiliary image and the second target auxiliary image into the auxiliary encoder respectively, to obtain a first auxiliary encoding feature corresponding to the first target auxiliary image and a second auxiliary encoding feature corresponding to the second target auxiliary image.
The scale of the first target auxiliary image is 1/2 that of the target original image, and the scale of the second target auxiliary image is 1/4. The first target auxiliary image and the second target auxiliary image are input into the auxiliary encoder respectively, and the auxiliary encoder extracts the first auxiliary encoding feature and the second auxiliary encoding feature at different scales through multiple convolution-pooling operations. The first auxiliary encoding feature and the second auxiliary encoding feature are extracted image features, i.e. arrays, and their scale is the size of the array, for example 4 × 4 or 8 × 8.
Step 504, the first auxiliary encoding feature and the second auxiliary encoding feature are respectively input into an auxiliary decoder, so as to obtain a first auxiliary decoding feature corresponding to the first target auxiliary image and a second auxiliary decoding feature corresponding to the second target auxiliary image.
The auxiliary decoder obtains the first auxiliary decoding feature and the second auxiliary decoding feature at different scales through multiple deconvolutions. The first auxiliary decoding feature and the second auxiliary decoding feature may be upsampled feature maps.
According to the embodiment of the disclosure, from the first target auxiliary image and the second target auxiliary image of different scales, the auxiliary module produces first and second auxiliary encoding features and first and second auxiliary decoding features of different scales, which are used to adjust the output of each layer in the main encoder and the main decoder and so improve the accuracy of the main module.
In one embodiment, as shown in fig. 6, inputting the target original image into the main encoder and superimposing the first auxiliary feature and the second auxiliary feature respectively onto the output of the main encoder in step 402 may include:
step 602, a corresponding first target convolutional layer is determined from the depthwise separable convolution layers of the main encoder according to the scale of the first auxiliary encoding feature.
A first target convolutional layer is found in the main encoder according to the scale of the first auxiliary encoding feature; the scale of the image feature output by the first target convolutional layer is the same as the scale of the first auxiliary encoding feature, the scale being the length of the array. For example, if the main encoder includes 4 depthwise separable convolution layers, the first auxiliary encoding feature may correspond to the 2nd dw convolution layer.
Step 604, the first auxiliary encoding feature is superimposed onto the output of the first target convolutional layer to obtain a first encoding feature, and the first encoding feature is input to the next convolutional layer after the first target convolutional layer.
The first auxiliary encoding feature and the output of the first target convolutional layer can both be represented as arrays of the same scale, i.e. the same length. Superimposing the first auxiliary encoding feature onto the output of the first target convolutional layer means directly adding the values in the arrays to obtain a new array, namely the first encoding feature. The first target convolutional layer of the main encoder thus outputs the first encoding feature, which is input to the next convolutional layer. For example, when the main encoder includes 4 depthwise separable convolution layers, the first auxiliary encoding feature, whose downsampling dimension is 1/2, is added to the output of the 2nd dw convolution layer, and the result is input to the 3rd dw convolution layer.
Step 606, a corresponding second target convolutional layer is determined from the depthwise separable convolution layers of the main encoder according to the scale of the second auxiliary encoding feature.
A second target convolutional layer is found in the main encoder according to the scale of the second auxiliary encoding feature; the scale of the image feature output by the second target convolutional layer is the same as the scale of the second auxiliary encoding feature, the scale being the length of the array. For example, if the main encoder includes 4 depthwise separable convolution layers, the second auxiliary encoding feature may correspond to the 3rd dw convolution layer.
Step 608, the second auxiliary encoding feature is superimposed onto the output of the second target convolutional layer to obtain a second encoding feature, and the second encoding feature is input to the next convolutional layer after the second target convolutional layer.
The second auxiliary encoding feature and the output of the second target convolutional layer can both be represented as arrays of the same scale, i.e. the same length. Superimposing the second auxiliary encoding feature onto the output of the second target convolutional layer means directly adding the values in the arrays to obtain a new array, namely the second encoding feature. The second target convolutional layer of the main encoder thus outputs the second encoding feature, which is input to the next convolutional layer. For example, when the main encoder includes 4 depthwise separable convolution layers, the second auxiliary encoding feature, whose downsampling dimension is 1/4, is added to the output of the 3rd dw convolution layer, and the result is input to the 4th dw convolution layer.
It should be noted that in the embodiment of the application there is no strict order among these steps; they need not be performed at the same moment or strictly sequentially, and may be performed in turn or alternately with other steps or with at least part of the steps or stages in other steps. For example, step 602 and step 606 may be performed simultaneously, step 604 and step 608 may be performed simultaneously, or steps 602 and 604 may be performed first and then steps 606 and 608.
According to the embodiment of the disclosure, the first target convolutional layer and the second target convolutional layer in the main encoder are determined from the scales of the first and second auxiliary encoding features, the first auxiliary encoding feature is superimposed at the first target convolutional layer, and the second auxiliary encoding feature is superimposed at the second target convolutional layer, adjusting the outputs of the convolution layers in the main encoder, which improves the feature extraction precision of the main encoder and hence the accuracy of the main module.
In one embodiment, as shown in fig. 7, the superimposing of the first auxiliary feature and the second auxiliary feature onto the output of the main decoder in step 404 may include:
step 702, determining a corresponding first target deconvolution layer from the main decoder according to the scale of the first auxiliary decoding feature.
A first target deconvolution layer is found in the main decoder according to the scale of the first auxiliary decoding feature; the scale of the feature map output by the first target deconvolution layer is the same as the scale of the first auxiliary decoding feature. For example, if the main decoder comprises 4 deconvolution layers, the first auxiliary decoding feature may correspond to the 2nd-to-last deconvolution layer.
Step 704, superimpose the first auxiliary decoding feature with the output of the first target deconvolution layer to obtain a first decoding feature, and input the first decoding feature to the next deconvolution layer of the first target deconvolution layer.
Superimposing the first auxiliary decoding feature onto the output of the first target deconvolution layer means directly superimposing the feature maps to obtain a new feature map, namely the first decoding feature. The first target deconvolution layer of the main decoder thus outputs the first decoding feature, which is input to the next deconvolution layer. For example, when the main decoder includes 4 deconvolution layers, the first auxiliary decoding feature, whose downsampling dimension is 1/2, is superimposed onto the output of the 2nd-to-last deconvolution layer, and the result is input to the last deconvolution layer.
Step 706, determining a corresponding second target deconvolution layer from the main decoder according to the scale of the second auxiliary decoding feature.
A second target deconvolution layer is found in the main decoder according to the scale of the second auxiliary decoding feature; the scale of the feature map output by the second target deconvolution layer is the same as the scale of the second auxiliary decoding feature. For example, if the main decoder comprises 4 deconvolution layers, the second auxiliary decoding feature may correspond to the 3rd-to-last deconvolution layer.
Step 708, superimpose the second auxiliary decoding feature with the output of the second target deconvolution layer to obtain a second decoding feature, and input the second decoding feature to the next deconvolution layer of the second target deconvolution layer.
Superimposing the second auxiliary decoding feature onto the output of the second target deconvolution layer means directly superimposing the feature maps to obtain a new feature map, namely the second decoding feature. The second target deconvolution layer of the main decoder thus outputs the second decoding feature, which is input to the next deconvolution layer. For example, when the main decoder includes 4 deconvolution layers, the second auxiliary decoding feature, whose downsampling dimension is 1/4, is superimposed onto the output of the 3rd-to-last deconvolution layer, and the result is input to the 2nd-to-last deconvolution layer.
It should be noted that in the embodiment of the application there is no strict order among these steps; they need not be performed at the same moment or strictly sequentially, and may be performed in turn or alternately with other steps or with at least part of the steps or stages in other steps. For example, step 702 and step 706 may be performed simultaneously, step 704 and step 708 may be performed simultaneously, or steps 706 and 708 may be performed first and then steps 702 and 704.
In the embodiment of the disclosure, the first target deconvolution layer and the second target deconvolution layer in the main decoder are determined from the scales of the first and second auxiliary decoding features, the first auxiliary decoding feature is superimposed at the first target deconvolution layer, and the second auxiliary decoding feature is superimposed at the second target deconvolution layer, adjusting the outputs of the deconvolution layers in the main decoder and thereby improving the accuracy of the main module.
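A matching sketch of the main decoder: deconvolution (transposed convolution) stages restore the input scale, auxiliary decoding features are added element-wise where the scales match, and the last stage emits a two-class foreground/background map. Channel counts and fusion points are assumptions.

```python
import torch.nn as nn

class MainDecoder(nn.Module):
    """Deconvolution stages with element-wise fusion of auxiliary
    decoding features at matching scales."""
    def __init__(self):
        super().__init__()
        self.stages = nn.ModuleList([
            nn.ConvTranspose2d(256, 128, kernel_size=2, stride=2),
            nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2),
            nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2),
            nn.ConvTranspose2d(32, 2, kernel_size=2, stride=2),  # fg/bg classes
        ])

    def forward(self, x, aux_feats):
        # aux_feats: {stage index: auxiliary decoding feature of matching scale}
        for i, stage in enumerate(self.stages):
            x = stage(x)
            if i in aux_feats:
                x = x + aux_feats[i]
        return x
```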
In one embodiment, as shown in FIG. 8, the method of making a certificate photo may include the following steps:
step 802, training the initial portrait semantic segmentation model according to a preset training set to obtain a first portrait semantic segmentation model.
The preset training set may consist of frontal photos of single portraits close to the usage scenario. The initial portrait semantic segmentation model may have an encoder-decoder structure. Data enhancement methods such as background cross-replacement, background augmentation, brightness adjustment and noise addition may be applied to the selected frontal single-portrait photos to extend the training set. When training the initial portrait semantic segmentation model, the original frontal single-portrait photos may be used to train the main module and their 1/2 and 1/4 downsampled images to train the auxiliary module, while label sample data expanded through Gaussian blur and erosion may be selected to train the output of the auxiliary module. The main module and the auxiliary module of the initial portrait semantic segmentation model are trained according to the preset training set to obtain the first portrait semantic segmentation model.
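The Gaussian-blur and erosion label expansion mentioned above could look like this in OpenCV; the kernel sizes are assumptions.

```python
import cv2
import numpy as np

# label_mask: a black-and-white ground-truth portrait mask.
blurred_label = cv2.GaussianBlur(label_mask, (5, 5), 0)
eroded_label = cv2.erode(label_mask, np.ones((3, 3), np.uint8))
```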
Step 804, for the plurality of convolution kernels in each depthwise separable convolution layer of the first portrait semantic segmentation model, determining the sum of the absolute values of the values in each convolution kernel, determining a plurality of target convolution kernels from the convolution kernels according to those sums, and deleting the plurality of target convolution kernels to obtain an intermediate portrait semantic segmentation model.
Each depthwise separable convolution layer may include a plurality of convolution kernels. A convolution kernel is a matrix, and adding the absolute values of all the values in the matrix gives the sum of absolute values of the kernel, also called its L1 norm. The L1 norm of every convolution kernel in the first portrait semantic segmentation model is computed, and a preset number of kernels with the smallest L1 norms are taken as the target convolution kernels. For example, the preset number may be 5, i.e. the 5 convolution kernels with the lowest L1 norms in the first portrait semantic segmentation model are taken as target convolution kernels and deleted to obtain the intermediate portrait semantic segmentation model. This process of removing convolution kernels may be called a pruning operation.
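A simplified sketch of this L1-norm pruning criterion. For brevity it zeroes out the selected kernels instead of structurally deleting them (true deletion also requires rewiring the following layer); the count of 5 follows the example above.

```python
import torch.nn as nn

def prune_smallest_kernels(model: nn.Module, num_prune: int = 5) -> None:
    """Rank every convolution kernel by the sum of absolute values (L1 norm)
    of its weights and zero out the num_prune smallest ones."""
    scores = []
    for name, module in model.named_modules():
        if isinstance(module, nn.Conv2d):
            w = module.weight.data            # shape (out_ch, in_ch, kH, kW)
            l1 = w.abs().sum(dim=(1, 2, 3))   # one L1 norm per kernel
            for k in range(w.shape[0]):
                scores.append((l1[k].item(), name, k))
    modules = dict(model.named_modules())
    for _, name, k in sorted(scores)[:num_prune]:
        modules[name].weight.data[k].zero_()
```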
Step 806, retraining the intermediate portrait semantic segmentation model according to the training set until the accuracy of the retrained model reaches a preset value, and determining the intermediate portrait semantic segmentation model as the portrait semantic segmentation model.
The intermediate portrait semantic segmentation model is retrained on the training set, and its accuracy is then measured. If the accuracy reaches the preset value, the intermediate model is taken as the final portrait semantic segmentation model; otherwise, step 804 is executed again and the new intermediate model is retrained. The preset value can be chosen from the accuracy of the first portrait semantic segmentation model and the actual requirements: for example, if the first model's accuracy is 95% and a drop of at most 5% after pruning is acceptable, the preset value can be 90%.
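A compact transcription of this prune-retrain loop, reusing find_prune_targets and mask_filters from the sketch above, might look as follows. The stopping condition in the text is terse, so this sketch assumes the usual iterative-pruning recipe: keep pruning while retraining can still recover the preset accuracy, and stop once it cannot:

```python
def compress(model, retrain_fn, eval_fn, preset_acc=0.90, num_prune=5):
    """Iterative prune-and-retrain loop over steps 804 and 806 (sketch only)."""
    while True:
        mask_filters(model, find_prune_targets(model, num_prune))  # step 804
        retrain_fn(model)                                          # step 806
        if eval_fn(model) < preset_acc:  # accuracy no longer recoverable:
            return model                 # stop (in practice, also revert the last step)
```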
According to this embodiment of the disclosure, deleting target convolution kernels based on the sum of the absolute values of their entries compresses the portrait semantic segmentation model while keeping its accuracy at or above the preset value. As a result, the certificate photo can be made with the model on the device itself, no upload to a server is needed, the user's privacy is preserved, and the quality of the generated certificate photo is maintained.
In one embodiment, the portrait semantic segmentation model can additionally be quantized: parameters stored as 32-bit floating point numbers are approximately stored and computed as 8-bit integers, reducing the storage the quantized model occupies by 75%. This further compresses the portrait semantic segmentation model so that it can be used on a mobile terminal, again safeguarding the user's privacy.
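The 75% figure follows directly from replacing 4-byte float32 parameters with 1-byte int8 values. Below is a minimal NumPy sketch of the affine quantization involved; a real deployment would use a framework's quantization toolkit instead:

```python
import numpy as np

def quantize_int8(w):
    """Affine float32 -> int8 quantization: 4 bytes -> 1 byte per parameter."""
    scale = (w.max() - w.min()) / 255.0
    zero_point = np.round(-128.0 - w.min() / scale)
    q = np.clip(np.round(w / scale + zero_point), -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Approximate float32 values recovered for computation."""
    return scale * (q.astype(np.float32) - zero_point)

w = np.random.randn(32, 3, 3, 3).astype(np.float32)  # one conv layer's weights
q, s, z = quantize_int8(w)
print(w.nbytes, q.nbytes)  # 3456 vs 864 bytes: a 75% reduction
```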
To aid understanding of the method for making a certificate photo provided by the present application, FIG. 9 provides a block diagram of the structure of the portrait semantic segmentation model; with reference to that figure, a complete embodiment of the method is walked through below.

The target original image is downsampled by 1/2 to obtain a first target auxiliary image and by 1/4 to obtain a second target auxiliary image. Both are input to the auxiliary encoder, which processes the first target auxiliary image to output a first auxiliary coding feature and the second target auxiliary image to output a second auxiliary coding feature.

The target original image is input to the main encoder of the portrait semantic segmentation model, where each depthwise-separable ("dw") convolution layer performs convolution and pooling on it. The first auxiliary coding feature output by the auxiliary encoder is superimposed on the output of the corresponding first target convolutional layer in the main encoder, whose output has the same scale as that feature, to obtain a first coding feature; the first coding feature is input to the next convolutional layer after the first target convolutional layer to continue convolution and pooling. Likewise, the second auxiliary coding feature is superimposed on the output of the corresponding second target convolutional layer, whose output has the same scale as the second auxiliary coding feature, to obtain a second coding feature, which is input to the next convolutional layer after the second target convolutional layer.

Through this process the main encoder outputs the target image feature, which is input to the main decoder, where each deconvolution layer processes it. Meanwhile, the first and second auxiliary coding features output by the auxiliary encoder are input to the auxiliary decoder, which outputs a first auxiliary decoding feature and a second auxiliary decoding feature. The first auxiliary decoding feature is superimposed on the output of the corresponding first target deconvolution layer in the main decoder, whose output has the same scale as that feature, to obtain a first decoding feature; the first decoding feature is input to the next deconvolution layer after the first target deconvolution layer to continue upsampling. Likewise, the second auxiliary decoding feature is superimposed on the output of the corresponding second target deconvolution layer, whose output has the same scale as the second auxiliary decoding feature, to obtain a second decoding feature, which is input to the next deconvolution layer after the second target deconvolution layer to continue upsampling.

Through this process the main decoder outputs the recognition result for the target portrait: a black-and-white portrait contour image at the same scale as the target original image.
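The walkthrough above maps onto a small PyTorch sketch as follows. The channel widths, layer counts, and downsampling pattern are illustrative assumptions (the patent does not fix them); only the superposition of same-scale auxiliary features onto the main encoder and decoder follows the description:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def dw(cin, cout, stride=1):
    """One depthwise-separable ("dw") layer: depthwise 3x3 then pointwise 1x1."""
    return nn.Sequential(
        nn.Conv2d(cin, cin, 3, stride, 1, groups=cin), nn.ReLU(inplace=True),
        nn.Conv2d(cin, cout, 1), nn.ReLU(inplace=True))

class PortraitSegSketch(nn.Module):
    """Main/auxiliary encoder-decoder with same-scale feature superposition."""
    def __init__(self):
        super().__init__()
        # Main encoder: three stride-2 dw layers (H -> H/2 -> H/4 -> H/8).
        self.enc1, self.enc2, self.enc3 = dw(3, 16, 2), dw(16, 32, 2), dw(32, 64, 2)
        # Auxiliary encoder branches for the 1/2 and 1/4 downsampled inputs.
        self.aux_enc1, self.aux_enc2 = dw(3, 16), dw(3, 32)
        # Auxiliary decoder branches producing same-scale decoding features.
        self.aux_dec1, self.aux_dec2 = dw(16, 16), dw(32, 32)
        # Main decoder: three stride-2 deconvolution layers back to H x W.
        self.dec3 = nn.ConvTranspose2d(64, 32, 2, 2)
        self.dec2 = nn.ConvTranspose2d(32, 16, 2, 2)
        self.dec1 = nn.ConvTranspose2d(16, 1, 2, 2)

    def forward(self, x):                              # x: (N, 3, H, W), H and W % 8 == 0
        half = F.interpolate(x, scale_factor=0.5)      # first target auxiliary image
        quarter = F.interpolate(x, scale_factor=0.25)  # second target auxiliary image
        a1 = self.aux_enc1(half)       # first auxiliary coding feature (H/2)
        a2 = self.aux_enc2(quarter)    # second auxiliary coding feature (H/4)
        e1 = self.enc1(x) + a1         # superposed on the first target conv layer
        e2 = self.enc2(e1) + a2        # superposed on the second target conv layer
        feat = self.enc3(e2)           # target image feature (H/8)
        d3 = self.dec3(feat) + self.aux_dec2(a2)  # superposed on a target deconv layer (H/4)
        d2 = self.dec2(d3) + self.aux_dec1(a1)    # superposed on a target deconv layer (H/2)
        return torch.sigmoid(self.dec1(d2))       # black-and-white contour mask (H x W)

mask = PortraitSegSketch()(torch.randn(1, 3, 256, 256))  # -> (1, 1, 256, 256)
```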
Superimposing the black-and-white portrait contour image on the target original image yields the target portrait, and adding the target portrait to a preset target background yields the certificate photo.
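A minimal sketch of this final compositing step with NumPy (the background color and helper name are assumptions; the mask is the contour image output by the model):

```python
import numpy as np

def compose_id_photo(original, mask, background_bgr=(219, 142, 0)):
    """Cut out the portrait with the black-and-white contour mask and place
    it on a solid target background (an arbitrary blue by default)."""
    alpha = (mask.astype(np.float32) / 255.0)[..., None]  # (H, W, 1) in [0, 1]
    background = np.zeros_like(original)
    background[:] = background_bgr
    return (alpha * original + (1.0 - alpha) * background).astype(np.uint8)
```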
It should be understood that, although the steps in the flowcharts of the embodiments described above are shown in the sequence indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated herein, there is no strict ordering restriction, and the steps may be performed in other orders. Moreover, at least some of the steps in those flowcharts may comprise multiple sub-steps or stages, which need not be completed at the same moment but may be performed at different times, and which need not be performed sequentially but may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
Based on the same inventive concept, an embodiment of the present application further provides a certificate photo making apparatus for implementing the method of making a certificate photo described above. The solution this apparatus provides is similar to the solution described in the method above, so for the specific limitations in the one or more apparatus embodiments below, reference can be made to the limitations of the method above; they are not repeated here.
In one embodiment, referring to FIG. 10, a certificate photo making apparatus 1000 is provided. The apparatus 1000 includes:
an image obtaining module 1002, configured to obtain a target original image;
the recognition module 1004, configured to input the target original image into a portrait semantic segmentation model to obtain a recognition result for the target portrait, where the portrait semantic segmentation model includes a main module constructed from depth separable convolution layers and an auxiliary module, and obtains the recognition result of the target portrait by fusing the main module's processing data for the target original image with the auxiliary module's processing data for the target original image;
a portrait acquisition module 1006, configured to acquire a target portrait from a target original image according to a recognition result of the target portrait;
and the synthesizing module 1008, configured to add the target portrait to a preset target background to obtain the certificate photo.
The certificate photo making apparatus provided by this embodiment of the application acquires a target original image; inputs the target original image into a portrait semantic segmentation model to obtain a recognition result for the target portrait, where the model includes a main module constructed from depth separable convolution layers and an auxiliary module constructed from depth separable convolution layers, and obtains the recognition result by fusing the main module's processing data for the target original image with the auxiliary module's processing data for the target original image; acquires the target portrait from the target original image according to the recognition result; and adds the target portrait to a preset target background to obtain the certificate photo. Compared with a traditional portrait semantic segmentation algorithm, building the model from depth separable convolution layers reduces the amount of computation and the number of parameters, and hence the model size, so that the original image need not be uploaded to a server during certificate photo making, protecting the user's privacy. At the same time, the auxiliary module increases the precision of the portrait semantic segmentation model, improving the segmentation of the target portrait and the quality of the generated certificate photo.
In one embodiment, the recognition module 1004 is further configured to downsample the target original image to obtain a first target auxiliary image and a second target auxiliary image, where the downsampling scales of the first target auxiliary image and the second target auxiliary image differ; input the first target auxiliary image and the second target auxiliary image into the auxiliary module respectively to obtain a first auxiliary feature corresponding to the first target auxiliary image and a second auxiliary feature corresponding to the second target auxiliary image; and input the target original image into the main module and superimpose the first auxiliary feature and the second auxiliary feature respectively on the output of the main module to obtain the recognition result for the target portrait.

In one embodiment, the main module includes a main encoder and a main decoder, and the recognition module 1004 is further configured to input the target original image into the main encoder and superimpose the first auxiliary feature and the second auxiliary feature respectively on the output of the main encoder to obtain the target image feature; and input the target image feature into the main decoder and superimpose the first auxiliary feature and the second auxiliary feature respectively on the output of the main decoder to obtain the recognition result for the target portrait.

In one embodiment, the auxiliary module includes an auxiliary encoder and an auxiliary decoder, the first auxiliary feature includes a first auxiliary coding feature and a first auxiliary decoding feature, and the second auxiliary feature includes a second auxiliary coding feature and a second auxiliary decoding feature. The recognition module 1004 is further configured to input the first target auxiliary image and the second target auxiliary image into the auxiliary encoder respectively to obtain the first auxiliary coding feature corresponding to the first target auxiliary image and the second auxiliary coding feature corresponding to the second target auxiliary image; and input the first auxiliary coding feature and the second auxiliary coding feature into the auxiliary decoder respectively to obtain the first auxiliary decoding feature corresponding to the first target auxiliary image and the second auxiliary decoding feature corresponding to the second target auxiliary image.

In one embodiment, the recognition module 1004 is further configured to determine a corresponding first target convolutional layer from the depth separable convolutional layers of the main encoder according to the scale of the first auxiliary coding feature; superimpose the first auxiliary coding feature on the output of the first target convolutional layer to obtain a first coding feature, and input the first coding feature to the next convolutional layer after the first target convolutional layer; determine a corresponding second target convolutional layer from the depth separable convolutional layers of the main encoder according to the scale of the second auxiliary coding feature; and superimpose the second auxiliary coding feature on the output of the second target convolutional layer to obtain a second coding feature, and input the second coding feature to the next convolutional layer after the second target convolutional layer.

In one embodiment, the recognition module 1004 is further configured to determine a corresponding first target deconvolution layer from the main decoder according to the scale of the first auxiliary decoding feature; superimpose the first auxiliary decoding feature on the output of the first target deconvolution layer to obtain a first decoding feature, and input the first decoding feature to the next deconvolution layer after the first target deconvolution layer; determine a corresponding second target deconvolution layer from the main decoder according to the scale of the second auxiliary decoding feature; and superimpose the second auxiliary decoding feature on the output of the second target deconvolution layer to obtain a second decoding feature, and input the second decoding feature to the next deconvolution layer after the second target deconvolution layer.
In one embodiment, the certificate photo making apparatus 1000 further includes a training module. The training module is configured to train an initial portrait semantic segmentation model according to a preset training set to obtain a first portrait semantic segmentation model; determine, for the convolution kernels in any depth separable convolution layer of the first portrait semantic segmentation model, the sum of the absolute values of the values in each kernel, determine a plurality of target convolution kernels from all the kernels according to those sums, and delete the target convolution kernels to obtain an intermediate portrait semantic segmentation model; and retrain the intermediate portrait semantic segmentation model according to the training set until the accuracy of the retrained model reaches a preset value, at which point the intermediate model is determined to be the portrait semantic segmentation model.
The modules in the above certificate photo making apparatus can be implemented wholly or partially by software, hardware, or a combination thereof. The modules can be embedded in, or independent of, a processor in a computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can invoke them and execute the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 11. The computer device includes a processor, a memory, and a network interface connected by a system bus, where the processor provides computing and control capabilities. The memory includes a non-volatile storage medium and an internal memory; the non-volatile storage medium stores an operating system, a computer program, and a database, and the internal memory provides an environment for running the operating system and the computer program. The network interface communicates with external terminals over a network connection. The computer program, when executed by the processor, implements a method of making a certificate photo.
Those skilled in the art will appreciate that the architecture shown in FIG. 11 is merely a block diagram of part of the structure relevant to the disclosed solution and does not limit the computer devices to which the solution applies; a particular computer device may include more or fewer components than shown, combine certain components, or arrange the components differently.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the above-described method embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
In an embodiment, a computer program product is provided, comprising a computer program which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
It should be noted that, the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party.
It will be understood by those skilled in the art that all or part of the processes of the methods in the above embodiments may be implemented by a computer program instructing the relevant hardware; the computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may carry out the processes of the above method embodiments. Any reference to memory, databases, or other media used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the various embodiments provided herein may include at least one of relational and non-relational databases; non-relational databases may include, without limitation, blockchain-based distributed databases and the like. The processors referred to in the embodiments provided herein may be, without limitation, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, data processing logic devices based on quantum computing, and so on.
For the sake of brevity, not all possible combinations of the technical features of the above embodiments are described; nevertheless, as long as a combination of these technical features contains no contradiction, it should be considered within the scope of this specification.
The above-mentioned embodiments express only several implementations of the present application, and their description is specific and detailed, but they should not for that reason be construed as limiting the scope of the application. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims (11)

1. A method of producing a certificate photo, the method comprising:
acquiring a target original image;
inputting the target original image into a portrait semantic segmentation model to obtain an identification result aiming at the target portrait, wherein the portrait semantic segmentation model comprises a main module constructed by a depth separable convolution layer and an auxiliary module constructed by the depth separable convolution layer, and the portrait semantic segmentation model obtains the identification result of the target portrait by fusing the processing data of the main module aiming at the target original image and the processing data of the auxiliary module aiming at the target original image;
acquiring the target portrait from the target original image according to the identification result of the target portrait;
and adding the target portrait to a preset target background to obtain a certificate photo.
2. The method of claim 1, wherein the inputting the target original image into a portrait semantic segmentation model to obtain a recognition result for a target portrait comprises:
down-sampling the target original image to obtain a first target auxiliary image and a second target auxiliary image, wherein the down-sampling dimensionality of the first target auxiliary image is different from that of the second target auxiliary image;
inputting the first target auxiliary image and the second target auxiliary image into the auxiliary module respectively to obtain a first auxiliary feature corresponding to the first target auxiliary image and a second auxiliary feature corresponding to the second target auxiliary image;
and inputting the target original image into the main module, and respectively superposing the first auxiliary feature and the second auxiliary feature to the output of the main module to obtain a recognition result aiming at the target portrait.
3. The method according to claim 2, wherein the main module comprises a main encoder and a main decoder, and the inputting the target original image into the main module and the superimposing the first assistant feature and the second assistant feature on the output of the main module respectively obtain the recognition result for the target portrait comprises:
inputting the target original image into the main encoder, and respectively superposing the first auxiliary feature and the second auxiliary feature to the output of the main encoder to obtain target image features;
and inputting the target image features into the main decoder, and respectively superposing the first auxiliary features and the second auxiliary features to the output of the main decoder to obtain a recognition result aiming at the target portrait.
4. The method according to claim 3, wherein the auxiliary module comprises an auxiliary encoder and an auxiliary decoder, the first auxiliary feature comprises a first auxiliary encoding feature and a first auxiliary decoding feature, the second auxiliary feature comprises a second auxiliary encoding feature and a second auxiliary decoding feature, and the inputting the first target auxiliary image and the second target auxiliary image into the auxiliary module respectively obtains a first auxiliary feature corresponding to the first target auxiliary image and a second auxiliary feature corresponding to the second target auxiliary image comprises:
inputting the first target auxiliary image and the second target auxiliary image into the auxiliary encoder respectively to obtain the first auxiliary coding feature corresponding to the first target auxiliary image and the second auxiliary coding feature corresponding to the second target auxiliary image;
and inputting the first auxiliary coding feature and the second auxiliary coding feature into the auxiliary decoder respectively to obtain the first auxiliary decoding feature corresponding to the first target auxiliary image and the second auxiliary decoding feature corresponding to the second target auxiliary image.
5. The method of claim 4, wherein the superimposing the first and second assist features onto the output of the primary encoder, respectively, comprises:
determining a corresponding first target convolutional layer from the depth separable convolutional layers of the primary encoder according to the scale of the first auxiliary coding feature;
superposing the first auxiliary coding feature with the output of the first target convolutional layer to obtain a first coding feature, and inputting the first coding feature to a next convolutional layer of the first target convolutional layer;
determining a corresponding second target convolutional layer from the depth separable convolutional layers of the primary encoder according to the scale of the second auxiliary coding feature;
and superposing the second auxiliary coding characteristic with the output of the second target convolution layer to obtain a second coding characteristic, and inputting the second coding characteristic to the next convolution layer of the second target convolution layer.
6. The method of claim 4, wherein said superimposing the first and second assist features onto the output of the primary decoder, respectively, comprises:
determining a corresponding first target deconvolution layer from the main decoder according to the scale of the first auxiliary decoding feature;
superimposing the first auxiliary decoding feature with an output of the first target deconvolution layer to obtain a first decoding feature, and inputting the first decoding feature to a next deconvolution layer of the first target deconvolution layer;
determining a corresponding second target deconvolution layer from the main decoder according to the scale of the second auxiliary decoding feature;
and superposing the second auxiliary decoding characteristic with the output of the second target deconvolution layer to obtain a second decoding characteristic, and inputting the second decoding characteristic to a next deconvolution layer of the second target deconvolution layer.
7. The method of claim 1, further comprising:
training the initial portrait semantic segmentation model according to a preset training set to obtain a first portrait semantic segmentation model;
determining a sum of absolute values of each numerical value in the convolution kernels for a plurality of convolution kernels in any one of the depth-separable convolution layers in the first portrait semantic segmentation model, determining a plurality of target convolution kernels from each convolution kernel according to the sum of absolute values of each numerical value in each convolution kernel in the first portrait semantic segmentation model, and deleting the plurality of target convolution kernels to obtain an intermediate portrait semantic segmentation model;
and retraining the intermediate portrait semantic segmentation model according to the training set until the accuracy of the trained intermediate portrait semantic segmentation model reaches a preset value, and determining the intermediate portrait semantic segmentation model as the portrait semantic segmentation model.
8. An apparatus for producing an identification photograph, the apparatus comprising:
the image acquisition module is used for acquiring a target original image;
the recognition module is used for inputting the target original image into a portrait semantic segmentation model to obtain a recognition result aiming at the target portrait, the portrait semantic segmentation model comprises a main module constructed by a depth separable convolution layer and an auxiliary module constructed by the depth separable convolution layer, and the portrait semantic segmentation model obtains the recognition result aiming at the target portrait by fusing the processing data of the main module aiming at the target original image and the processing data of the auxiliary module aiming at the target original image;
the portrait acquisition module is used for acquiring the target portrait from the target original image according to the identification result of the target portrait;
and the synthesis module is used for adding the target portrait to a preset target background to obtain a certificate photo.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
11. A computer program product comprising a computer program, characterized in that the computer program realizes the steps of the method of any one of claims 1 to 7 when executed by a processor.
CN202211161872.7A, filed 2022-09-23: Method and device for making certificate photo (Pending)

Priority application: CN202211161872.7A, priority date and filing date 2022-09-23
Publication: CN115471896A, published 2022-12-13
Family ID: 84334701
Country: CN

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination