CN116091865A - Image acquisition model training method for photograph, image acquisition method, computer device, and computer-readable storage medium - Google Patents


Info

Publication number
CN116091865A
Authority
CN
China
Prior art keywords
data
image acquisition
amplified
model
photograph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310010092.0A
Other languages
Chinese (zh)
Inventor
卢闰霆
岳永旺
钞蓓英
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jinbangda Co ltd
Original Assignee
Jinbangda Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinbangda Co., Ltd.
Priority to CN202310010092.0A
Publication of CN116091865A
Legal status: Pending

Classifications

    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06T5/30 Erosion or dilatation, e.g. thinning
    • G06T7/136 Segmentation; Edge detection involving thresholding
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/82 Arrangements for image or video recognition or understanding using neural networks
    • G06T2207/10024 Color image
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]


Abstract

The invention provides an image acquisition model training method for photographs, an image acquisition method, a computer device, and a computer-readable storage medium. The training method comprises: collecting photograph data and inputting it into an initial image acquisition model to obtain a transparency prediction map; performing global thresholding on the transparency prediction map to generate a segmentation mask map; amplifying the photograph data to obtain amplified data; training a segmentation model based on a segmentation neural network with the amplified data as input; segmenting the amplified data with the trained segmentation model to obtain an inference mask map; performing morphological dilation on the inference mask map to generate an inference mask morphology map; channel-splicing the amplified data with its corresponding inference mask morphology map to form a four-channel map; and training an image acquisition model based on an image acquisition neural network with the four-channel map as input. The method improves the image acquisition model's ability to recognize the edges of the foreground image.

Description

Image acquisition model training method for photograph, image acquisition method, computer device, and computer-readable storage medium
Technical Field
The present invention relates to the field of image acquisition (matting), and in particular to an image acquisition model training method for credential photographs, an image acquisition method, a computer device, and a computer-readable storage medium.
Background
Credential photographs are generally taken in a photo studio, but when the subject cannot be photographed in person, some credential photographs are instead produced by matting a life photo taken with a mobile phone camera. In the process of generating a credential photograph by matting, the main figure is located in a frontal portrait photo shot against a non-green-screen background, the portrait foreground and its transparency map are extracted by matting, and the portrait foreground is composited with the target background color according to the transparency map, thereby generating a credential photograph that meets the requirements.
Conventional matting techniques can train a model using a trimap-based static image matting algorithm, but such a model requires both the original image and a trimap as input. The trimap is a gray-scale image that is difficult to obtain, and producing it increases the computational load.
Conventional matting techniques can also train a matting neural network such as MODNet, which does not require a trimap for training. However, such image acquisition models are generally trained on data sets of relatively clear pictures: a general-purpose portrait model performs well on clear pictures, but because the pictures used to make credential photographs may be taken in low-resolution, low-light environments, the matting effect on such pictures is poor.
Disclosure of Invention
A first object of the present invention is to provide an image acquisition model training method for photographs that improves the ability to discriminate the image edges of the foreground person.
A second object of the present invention is to provide an image acquisition method using an image acquisition model.
A third object of the present invention is to provide a computer apparatus to which the above-described image acquisition model training method and image acquisition method are applied.
A fourth object of the present invention is to provide a computer-readable storage medium storing a program that implements the above-described image acquisition model training method and image acquisition method.
In order to achieve the first object, the present invention provides an image acquisition model training method, comprising: collecting photograph data and inputting it into an initial image acquisition model to obtain a transparency prediction map; performing global thresholding on the transparency prediction map to generate a segmentation mask map; amplifying the photograph data to obtain amplified data; training a segmentation model based on a segmentation neural network with the amplified data as input and the segmentation mask map as label; segmenting the amplified data with the trained segmentation model to obtain an inference mask map; performing morphological dilation on the inference mask map to generate an inference mask morphology map; channel-splicing the amplified data with its corresponding inference mask morphology map to form a four-channel map; and training an image acquisition model based on an image acquisition neural network with the four-channel map as input and the transparency prediction map as label.
According to this scheme, the segmentation model and the image acquisition model are trained, and the image acquisition operation is carried out by both models together. The amplified data is fed into the segmentation model to obtain an inference mask map, which is morphologically dilated into an inference mask morphology map and then spliced with the image to form a four-channel map. The inference mask morphology map in the four-channel map serves as auxiliary prior information for distinguishing the foreground: it lets the image acquisition model discard complex background information and strengthens the model's ability to discriminate image edges.
In a further scheme, the number of channels of the convolution layer of the image acquisition neural network is four.
Thus, since the input of the image acquisition neural network is a four-channel image, the number of channels of the convolution layer of the image acquisition neural network is also changed to four channels.
In a further aspect, global thresholding is performed on the transparency prediction graph, and generating the segmentation mask graph includes: setting a pixel threshold; judging whether each pixel point in the transparency prediction graph is larger than or equal to the pixel threshold value; if the pixel point in the transparency prediction graph is greater than or equal to the pixel threshold value, setting the pixel point to 255; if the pixel point in the transparency prediction graph is smaller than the pixel threshold value, the pixel point is set to be 0.
In this scheme, the pixel threshold is set to 120.
In a further aspect, the acquiring of photograph data comprises: collecting portrait pictures from the network; removing pictures containing two or more faces with a face detection model; removing pictures containing two or more persons with a person detection model; removing pictures in which any of the three Euler angles exceeds a preset angle threshold with a head pose estimation model; removing pictures in which the person wears a hat with a hat detection model; and removing pictures in which the face is occluded with an occlusion detection model, to obtain the photograph data.
In this way, photos can be collected from the network, star photos or person photos can be obtained through multiple channels, and single-person, unoccluded face photos are selected by means of the face detection, person detection, head pose estimation, hat detection, and occlusion detection models.
In a further aspect, amplifying the photograph data to obtain amplified data comprises: performing face parsing on the photograph data with a face parsing model; performing an adjustment operation on each picture in the photograph data to generate first amplified photograph data; dividing the photograph data into a first group and a second group of photos of equal size, and placing each first amplified photograph into the group corresponding to its source photo; using a makeup transfer neural network model, with the first group as reference, to transfer the makeup of the first group onto the second group, generating second amplified photograph data; swapping the portraits of the second group onto the first group to generate third amplified photograph data; selecting background pictures and performing background replacement on the photograph data and the first, second, and third amplified photograph data to generate fourth amplified photograph data; and combining the photograph data with the first, second, third, and fourth amplified photograph data to form the amplified data.
In this way, data amplification is achieved by changing the texture, color, makeup, and other attributes of different parts of the portrait and by changing the background pattern, while the area occupied by the portrait is left unchanged. This improves the stability of the model and the robustness of its background discrimination, and thus the accuracy of image acquisition on low-quality pictures.
In a further aspect, performing background replacement on the photograph data and the first, second, and third amplified photograph data to generate the fourth amplified photograph data comprises: normalizing the transparency prediction map and denoting it alpha, denoting the data to be composited F, the background picture B, and the fourth amplified photograph data I; the fourth amplified photograph data is generated by the formula I = alpha × F + (1 - alpha) × B.
Therefore, the background image and the foreground image are fused through a formula, and the fourth amplified image is obtained.
In a further aspect, the background picture is a monotone background or a background of an unmanned image.
It can be seen that different background pictures can improve the discrimination of the image acquisition model to the background.
In order to achieve the second object, the present invention provides an image acquisition method, comprising: acquiring a picture to be image-acquired; segmenting the picture to be image-acquired with a segmentation model to generate a mask map; channel-splicing the mask map with the picture to be image-acquired to generate a four-channel picture to be image-acquired; and performing the image acquisition operation on the four-channel picture with an image acquisition model.
According to this scheme, during image acquisition the mask map produced by the segmentation model is channel-spliced with the picture to be image-acquired to obtain a four-channel picture, on which the image acquisition model then performs the image acquisition operation. Using the two models together improves the discrimination of image edges.
In order to achieve the third object, the present invention provides a computer device including a processor and a memory, the memory storing a computer program, the computer program implementing the image acquisition model training method of the photograph and the image acquisition method when executed by the processor.
In order to achieve the fourth object, the present invention provides a computer-readable storage medium having a computer program stored thereon; the computer program, when executed, implements the image acquisition model training method for photographs and the image acquisition method described above.
Drawings
FIG. 1 is a flow chart of an embodiment of the image acquisition model training method of the present invention for photographs.
FIG. 2 is a flow chart of acquiring photo data for an embodiment of the photo image acquisition model training method of the present invention.
FIG. 3 is a flow chart of a transparency prediction graph global thresholding process for an embodiment of an image acquisition model training method of the present invention.
FIG. 4 is a flowchart of a photograph data augmentation process of an embodiment of a photograph image acquisition model training method of the present invention.
The invention is further described below with reference to the drawings and examples.
Detailed Description
Image acquisition model training method embodiment of photo:
Referring to fig. 1, fig. 1 is a flowchart of an embodiment of the image acquisition model training method for photographs of the present invention. The segmentation model and image acquisition model trained by the invention show a clear improvement in discriminating image edges. First, step S11 is executed: photograph data is collected and input into an initial image acquisition model to obtain transparency prediction maps. The photograph data is collected from the network, and the initial image acquisition model is an image acquisition model trained on a public data set. A technician manually inspects the transparency prediction maps and corrects defective ones with image editing software.
After step S11 is completed, step S12 is performed to perform global thresholding on the transparency prediction map, and a segmentation mask map is generated. The transparency prediction graph subjected to global thresholding can better separate an image from a background.
After the segmentation mask map is generated, step S13 is executed to amplify the photograph data and obtain the amplified data. Changing attributes such as the texture, color, and makeup of different parts of the portrait, or changing the background pattern of the photograph data, improves the discrimination robustness of the segmentation model and the image acquisition model during training, so that both models handle complex backgrounds better, which improves the accuracy of image acquisition on low-quality pictures.
After the amplified data is obtained, step S14 is executed: with the amplified data as input and the segmentation mask map as label, the segmentation model is trained based on a segmentation neural network. The segmentation neural networks used in the invention are a BiSeNet neural network and a lightweight U-Net neural network. BiSeNet has higher accuracy and precision but higher resource consumption, and is suitable for pictures with complex backgrounds or low quality. The lightweight U-Net fuses the U-Net structure with the MobileNet-V2 structure, which improves the execution efficiency of the segmentation model at low resource consumption, and is suitable for good-quality pictures with simple backgrounds. During training of the segmentation model, operations such as random cropping, flipping, and brightness or contrast transformation can be applied to the amplified data, giving the segmentation model better convergence and robustness.
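The random cropping, flipping, and brightness/contrast operations described above can be sketched in plain numpy. This is an illustrative sketch only; the function name, crop size, and jitter ranges are assumptions, not values taken from the patent:

```python
import numpy as np

def random_augment(image, mask, crop=192, rng=None):
    """Randomly crop, flip, and brightness/contrast-jitter an image
    together with its segmentation mask (illustrative sketch)."""
    rng = rng if rng is not None else np.random.default_rng()
    h, w = image.shape[:2]
    # Random crop: pick the same window for the image and its mask.
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    image = image[top:top + crop, left:left + crop]
    mask = mask[top:top + crop, left:left + crop]
    # Random horizontal flip with probability 0.5, applied to both.
    if rng.random() < 0.5:
        image = image[:, ::-1]
        mask = mask[:, ::-1]
    # Random brightness/contrast jitter (image only; the mask is a label).
    gain = rng.uniform(0.8, 1.2)     # contrast gain
    offset = rng.uniform(-20, 20)    # brightness offset
    image = np.clip(image.astype(np.float32) * gain + offset, 0, 255)
    return image.astype(np.uint8), mask
```

Applying the same geometric transform to image and mask keeps the label aligned, while photometric jitter is applied to the image alone.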
After the segmentation model is trained, step S15 is executed: the trained segmentation model segments the amplified data to obtain inference mask maps. After the inference mask maps are obtained, step S16 applies morphological dilation to each inference mask map to generate an inference mask morphology map. The dilation expands the boundary points of the mask outwards, making the boundary of the foreground image clearer.
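The morphological dilation of step S16 can be illustrated in plain numpy; a production pipeline would more likely call a library routine such as OpenCV's dilate, but the following sketch produces the same effect for a 3x3 square structuring element:

```python
import numpy as np

def dilate(mask, iterations=1):
    """Binary morphological dilation with a 3x3 square structuring
    element: a pixel becomes foreground if itself or any of its
    8 neighbours is foreground."""
    out = mask.astype(bool)
    h, w = mask.shape
    for _ in range(iterations):
        padded = np.pad(out, 1, mode="constant")
        grown = np.zeros_like(out)
        # Take the maximum over all nine shifted copies of the mask.
        for dy in (0, 1, 2):
            for dx in (0, 1, 2):
                grown |= padded[dy:dy + h, dx:dx + w]
        out = grown
    return out.astype(np.uint8) * 255
```

A single foreground pixel grows into a 3x3 block, which is exactly the outward boundary expansion the text describes.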
After the inference mask morphology maps are generated, step S17 is executed: the amplified data and its corresponding inference mask morphology map are channel-spliced to form a four-channel map. The amplified data is an RGB image with three channels, and the inference mask morphology map is a one-channel image; splicing them yields the four-channel map.
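The channel splicing of step S17 amounts to a concatenation along the channel axis; a minimal numpy sketch (the function name is illustrative):

```python
import numpy as np

def make_four_channel(rgb, mask):
    """Concatenate a 3-channel RGB image (H, W, 3) with a 1-channel
    mask (H, W) along the channel axis to form an (H, W, 4) input."""
    assert rgb.ndim == 3 and rgb.shape[2] == 3
    assert mask.shape == rgb.shape[:2]
    return np.concatenate([rgb, mask[..., None]], axis=2)
```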
After the four-channel maps are formed, step S18 is executed: with the four-channel map as input and the transparency prediction map as label, the image acquisition model is trained based on an image acquisition neural network. The image acquisition neural network used in the invention is a MODNet neural network, which has high accuracy and strong generalization. However, an image acquisition model trained with MODNet can still show flaws from poor handling of image edges. The MODNet input is therefore changed to a four-channel image, and the number of channels of its convolution layer is changed to four. During training of the image acquisition model, operations such as random cropping, flipping, and brightness or contrast transformation can be applied to the amplified data, giving the model better convergence and robustness. The trained image acquisition model has a strong ability to recognize the foreground image.
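Widening the first convolution from three to four input channels requires initializing the extra channel's weights. The patent does not say how; a common choice is to reuse the pretrained RGB weights, e.g. by appending their channel mean. The sketch below shows that weight-inflation step in plain numpy, assuming the usual (out_channels, in_channels, k, k) weight layout:

```python
import numpy as np

def inflate_first_conv(w3):
    """Turn pretrained 3-input-channel conv weights of shape
    (out_c, 3, k, k) into 4-input-channel weights (out_c, 4, k, k)
    by appending the mean of the RGB channels as the mask channel."""
    assert w3.shape[1] == 3
    extra = w3.mean(axis=1, keepdims=True)  # shape (out_c, 1, k, k)
    return np.concatenate([w3, extra], axis=1)
```

In a deep-learning framework the same array would simply be assigned to the widened layer's weight tensor before training.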
Referring to fig. 2, fig. 2 is a flowchart of acquiring photograph data in an embodiment of the image acquisition model training method for photographs of the present invention. First, step S31 is executed to collect portrait pictures from the network; 34438 star photos or person photos of Asians were collected. Steps S32 and S33 are then executed: pictures containing two or more faces are removed with the face detection model, and pictures containing two or more persons are removed with the person detection model, leaving single-person portrait pictures.
After the single-person photos are obtained, step S35 is executed to remove, with the head pose estimation model, pictures in which any of the three Euler angles exceeds a preset angle threshold. The three Euler angles are pitch, yaw, and roll, and the preset angle threshold is 15 degrees; when pitch, yaw, and roll are all less than 15 degrees, the photo is a frontal photo.
Steps S36 and S37 are then executed: pictures with hats are removed with the hat detection model, and pictures with occluded faces are removed with the occlusion detection model, yielding 4000 photos of single, unoccluded, front-facing persons. Because the invention is applied to image acquisition for credential photographs, Asian portrait pictures with standard, unoccluded facial poses are used as training data.
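The filtering pipeline of fig. 2 is a chain of reject tests. The sketch below shows that control flow; the detector arguments are hypothetical callables, since the patent does not specify the detection models' implementations:

```python
def filter_photos(photos, count_faces, count_persons, euler_angles,
                  has_hat, is_occluded, angle_threshold=15.0):
    """Keep only single-person, front-facing, hat-free, unoccluded
    photos. All detector arguments are hypothetical callables."""
    kept = []
    for photo in photos:
        if count_faces(photo) != 1:        # S32: exactly one face
            continue
        if count_persons(photo) != 1:      # S33: exactly one person
            continue
        # S35: reject if any Euler angle (pitch, yaw, roll) exceeds 15 deg.
        if any(abs(a) > angle_threshold for a in euler_angles(photo)):
            continue
        if has_hat(photo) or is_occluded(photo):  # S36, S37
            continue
        kept.append(photo)
    return kept
```

With stub detectors, a photo with two persons or a 20-degree roll angle is rejected while a compliant one passes.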
Referring to fig. 3, fig. 3 is a flowchart of a transparency prediction graph global thresholding process of an embodiment of an image acquisition model training method of the present invention. First, step S21 is performed to set a pixel threshold. The pixel threshold set by the invention is 120.
After setting the pixel threshold, step S22 is executed to determine whether each pixel of the transparency prediction graph is greater than or equal to the pixel threshold, and if the pixel in the transparency prediction graph is greater than or equal to the pixel threshold, step S23 is executed to set the pixel to 255. If the pixel in the transparency prediction graph is smaller than the pixel threshold, step S24 is executed to set the pixel to 0. The transparency prediction graph subjected to global thresholding can better separate an image from a background.
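The global thresholding of steps S21 to S24 amounts to a single vectorized comparison; a minimal numpy sketch using the threshold of 120 described above:

```python
import numpy as np

def global_threshold(alpha_map, threshold=120):
    """Binarize a transparency prediction map: pixels >= threshold
    become 255 (foreground), all others become 0 (background)."""
    return np.where(alpha_map >= threshold, 255, 0).astype(np.uint8)
```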
Referring to fig. 4, fig. 4 is a flowchart of the photograph data amplification process in an embodiment of the image acquisition model training method for photographs of the present invention. First, step S41 is executed to parse the faces in the photograph data with a face parsing model, distinguishing hair, face, clothing, and other regions.
Then, step S42 is executed to apply an adjustment operation to each photo in the photograph data, generating the first amplified photograph data. The adjustment operations include brightness enhancement, contrast enhancement, whitening or darkening, and shadow or dim-light addition, applied to the whole picture or to parts of it. For each photo, one or more adjustment operations are selected at random; two adjustment operations were applied to each photo, yielding 8000 first amplified photographs.
After the first amplified photograph data is generated, step S43 is executed to divide the photograph data into a first group and a second group of photos of equal size and to place each first amplified photograph into the group corresponding to its source photo. The 4000 photos are divided into the first and second groups of 2000 photos each; after the first amplified photographs are added to their corresponding groups, the first group and the second group each contain 6000 photos.
After step S43, step S44 is executed: using the makeup transfer neural network model, the makeup of the first group of photos is transferred onto the second group, generating the second amplified photograph data of 6000 photos.
After the second amplified photograph data is generated, step S45 is executed to swap the portraits of the second group onto the first group, generating the third amplified photograph data. The technique used is image face swapping, and the third amplified photograph data comprises 6000 photos.
After the third amplified photograph data is generated, step S46 is executed to select background pictures and perform background replacement on the photograph data and the first, second, and third amplified photograph data, generating the fourth amplified photograph data. These four sets total 24000 photos, and background replacement yields 24000 new photos; some of the clearer photos underwent background replacement several times, so the fourth amplified photograph data comprises 26000 photos.
The background replacement method is as follows: the transparency prediction map is normalized and denoted alpha, the data to be composited is denoted F, the background picture B, and the fourth amplified photograph data I; the fourth amplified photograph data is generated by the formula I = alpha × F + (1 - alpha) × B. The background picture is a monotone background picture or a background picture without people.
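The compositing formula I = alpha × F + (1 - alpha) × B can be sketched directly in numpy. The sketch assumes the transparency map is stored as an 8-bit image and normalized to [0, 1], which the patent's "normalizing the transparency prediction map" step implies:

```python
import numpy as np

def composite(alpha, foreground, background):
    """Blend foreground and background by I = alpha*F + (1-alpha)*B,
    with alpha normalized to [0, 1] from an 8-bit transparency map."""
    a = alpha.astype(np.float32) / 255.0
    a = a[..., None]  # broadcast the (H, W) map over the RGB channels
    out = (a * foreground.astype(np.float32)
           + (1.0 - a) * background.astype(np.float32))
    return np.clip(out, 0, 255).astype(np.uint8)
```

Where alpha is 255 the output is pure foreground, where it is 0 pure background, and fractional values blend the two along the portrait edges.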
After step S46, step S47 is executed to combine the photograph data with the first, second, third, and fourth amplified photograph data, forming the amplified data of 50000 pictures. Changing the texture, color, makeup, and other attributes of different parts of the portrait, or changing the background pattern, improves the discrimination robustness of the segmentation model and the image acquisition model during training, so that both models handle complex backgrounds better and image acquisition on low-quality pictures is more accurate.
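The dataset sizes quoted in this embodiment can be checked with a little arithmetic:

```python
# Counts quoted in the embodiment above.
original = 4000
first = original * 2        # two adjustment operations per photo -> 8000
second = 6000               # makeup transfer over the 6000-photo groups
third = 6000                # face swapping
before_bg = original + first + second + third
assert before_bg == 24000   # photos fed into background replacement
fourth = 26000              # 24000 replacements plus 2000 extra passes
total = before_bg + fourth
assert total == 50000       # final size of the amplified data
```

The 2000-photo difference between 24000 and 26000 corresponds to the clearer photos that received multiple background replacements.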
Image acquisition method embodiment:
A picture to be image-acquired is obtained; the segmentation model segments the picture to generate a mask map; the mask map and the picture are channel-spliced to generate a four-channel picture to be image-acquired; and the image acquisition model performs the image acquisition operation on the four-channel picture.
When the image acquisition model operates on the four-channel picture, the mask map produced by the segmentation model serves as prior information for distinguishing the foreground. It helps the image acquisition model discard a large amount of background interference, increases the model's accuracy, and further improves its ability to discriminate image edges.
Computer apparatus embodiment:
the computer device of the present embodiment includes a processor and a memory; the memory stores a computer program that, when executed by the processor, implements the image acquisition model training method for photographs and the image acquisition method described above.
Computer-readable storage medium:
the image acquisition model training method and the image acquisition method for a photograph in a computer apparatus described in the above embodiments can be stored in a computer-readable storage medium in the form of a computer program which, when executed by a processor, can perform the steps of the image acquisition model training method embodiment and the image acquisition method embodiment for a photograph in a computer apparatus described above. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium may be, for example, but not limited to: an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing is merely a preferred embodiment of the present invention; the inventive concept is not limited thereto, and, as will be apparent to those skilled in the art, many other equivalent embodiments are possible without departing from the scope of the invention.

Claims (10)

1. A method of training an image acquisition model of a photograph, comprising:
collecting photo data, and inputting the photo data into an initial image acquisition model to obtain a transparency prediction map;
performing global thresholding on the transparency prediction map to generate a segmentation mask map;
performing amplification processing on the photo data to obtain amplified data;
training a segmentation model based on a segmentation neural network, taking the amplified data as input and the segmentation mask map as a label;
segmenting the amplified data by using the trained segmentation model to obtain an inference mask map;
performing morphological dilation processing on the inference mask map to generate an inference mask morphology map;
the method is characterized in that:
performing channel splicing on the amplified data and the inference mask map corresponding to the amplified data to form a four-channel map;
and training an image acquisition model based on an image acquisition neural network, taking the four-channel map as input and the transparency prediction map as a label.
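The morphological dilation step in the preamble of claim 1 can be illustrated with a minimal 3×3 dilation written in plain NumPy; a production system would more likely use a library routine such as OpenCV's `cv2.dilate`, and the 3×3 structuring element here is an assumption, not taken from the patent:

```python
import numpy as np

def dilate_3x3(mask: np.ndarray) -> np.ndarray:
    """Morphological dilation of a binary mask with a 3x3 square structuring element."""
    h, w = mask.shape
    padded = np.pad(mask, 1, mode="constant")
    out = np.zeros_like(mask)
    # Each output pixel is the maximum over its 3x3 neighbourhood.
    for dy in range(3):
        for dx in range(3):
            out = np.maximum(out, padded[dy:dy + h, dx:dx + w])
    return out

mask = np.zeros((5, 5), dtype=np.uint8)
mask[2, 2] = 255  # single foreground pixel
dilated = dilate_3x3(mask)
print(int(dilated.sum() // 255))  # 9: the pixel grows into a 3x3 block
```

Dilation expands the inference mask outward, so thin foreground details near the boundary are less likely to be cut off by the mask.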
2. The method for training an image acquisition model of a photograph according to claim 1, wherein:
the number of the convolution layer channels of the image acquisition neural network is four.
3. The image acquisition model training method of a photograph according to claim 1 or 2, characterized in that:
performing global thresholding on the transparency prediction map to generate the segmentation mask map comprises:
setting a pixel threshold;
judging whether each pixel in the transparency prediction map is greater than or equal to the pixel threshold;
if a pixel in the transparency prediction map is greater than or equal to the pixel threshold, setting the pixel to 255;
and if a pixel in the transparency prediction map is smaller than the pixel threshold, setting the pixel to 0.
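The thresholding rule of this claim snaps every pixel to 255 or 0. A minimal sketch, assuming the transparency prediction map is normalized to [0, 1] and using 0.5 as an illustrative threshold (the patent does not fix a value):

```python
import numpy as np

def global_threshold(alpha_map: np.ndarray, threshold: float) -> np.ndarray:
    """Binarize a transparency prediction map into a segmentation mask map."""
    return np.where(alpha_map >= threshold, 255, 0).astype(np.uint8)

alpha_map = np.array([[0.1, 0.6],
                      [0.5, 0.9]])
mask = global_threshold(alpha_map, 0.5)
print(mask.tolist())  # [[0, 255], [255, 255]]
```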
4. The image acquisition model training method of a photograph according to claim 3, wherein:
collecting the photo data comprises:
acquiring portrait pictures from a network;
removing, through a face detection model, pictures containing more than two human faces;
removing, through a person detection model, pictures containing more than two persons;
removing, through a head pose estimation model, pictures whose three Euler angles are larger than a preset angle threshold;
removing, through a hat detection model, pictures containing hats;
and removing, through an occlusion detection model, pictures with occluded faces, to obtain the photo data.
5. The method for training the image acquisition model of a photograph as claimed in claim 4, wherein:
performing amplification processing on the photo data to obtain amplified data comprises:
performing face parsing on the photo data by using a face parsing model;
performing adjustment operations on each picture in the photo data to generate first amplified photo data;
dividing the photo data into a first group of photos and a second group of photos of equal size, and assigning each item of the first amplified photo data to the group containing its corresponding original photo;
migrating the makeup of the first group of photos onto the second group of photos by using a makeup transfer neural network model to generate second amplified photo data;
swapping the portraits of the second group of photos onto the first group of photos to generate third amplified photo data;
selecting a background picture, and performing background replacement on the photo data, the first amplified photo data, the second amplified photo data and the third amplified photo data to generate fourth amplified photo data;
and combining the photo data, the first amplified photo data, the second amplified photo data, the third amplified photo data and the fourth amplified photo data to form the amplified data.
6. The method for training the image acquisition model of a photograph as claimed in claim 5, wherein:
performing background replacement on the photo data, the first amplified photo data, the second amplified photo data and the third amplified photo data to generate fourth amplified photo data comprises:
normalizing the transparency prediction map and recording it as alpha, recording the amplified data as F, the background picture as B, and the fourth amplified photo data as I;
generating the fourth amplified photo data by the formula I = alpha × F + (1 − alpha) × B.
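The compositing formula of this claim, I = alpha × F + (1 − alpha) × B, is standard per-pixel alpha blending. A minimal sketch, assuming the transparency map is normalized to [0, 1] and broadcast over the three color channels:

```python
import numpy as np

def composite(alpha: np.ndarray, foreground: np.ndarray, background: np.ndarray) -> np.ndarray:
    """Blend foreground onto background per pixel: I = alpha*F + (1 - alpha)*B."""
    if alpha.ndim == 2:
        alpha = alpha[..., np.newaxis]  # broadcast the single-channel alpha over RGB
    return (alpha * foreground + (1.0 - alpha) * background).astype(np.uint8)

alpha = np.array([[1.0, 0.0]])                 # left pixel fully foreground, right fully background
fg = np.full((1, 2, 3), 200, dtype=np.uint8)   # stand-in foreground (F)
bg = np.full((1, 2, 3), 50, dtype=np.uint8)    # stand-in background (B)
out = composite(alpha, fg, bg)
print(int(out[0, 0, 0]), int(out[0, 1, 0]))  # 200 50
```

Replacing B with different background pictures while keeping F and alpha fixed is exactly how the fourth amplified photo data multiplies the training set.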
7. The method for training the image acquisition model of a photograph as claimed in claim 6, wherein:
the background picture is a monochrome background or a background containing no person.
8. An image acquisition method, comprising:
acquiring a picture to be image-acquired;
segmenting the picture to be image-acquired by using a segmentation model to generate a mask image;
performing channel splicing on the mask image and the picture to be image-acquired to generate a four-channel picture to be image-acquired;
and performing an image acquisition operation on the four-channel picture to be image-acquired by using an image acquisition model.
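The four steps of this claim can be sketched end to end. The `segment` and `matting_model` callables below are stand-ins (not from the patent) for the trained segmentation model and image acquisition model; a real system would load trained networks here:

```python
import numpy as np

def segment(picture: np.ndarray) -> np.ndarray:
    """Stand-in segmentation model: returns a binary mask of the same spatial size."""
    return np.full(picture.shape[:2], 255, dtype=np.uint8)

def matting_model(four_channel: np.ndarray) -> np.ndarray:
    """Stand-in image acquisition model: returns a single-channel transparency map."""
    return four_channel[..., 3].astype(np.float32) / 255.0

picture = np.zeros((4, 4, 3), dtype=np.uint8)                       # step 1: picture to be image-acquired
mask = segment(picture)                                             # step 2: generate mask image
four_channel = np.concatenate([picture, mask[..., None]], axis=-1)  # step 3: channel splicing
alpha = matting_model(four_channel)                                 # step 4: image acquisition operation
print(four_channel.shape, alpha.shape)  # (4, 4, 4) (4, 4)
```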
9. A computer device, characterized in that it comprises a processor and a memory, the memory storing a computer program which, when executed by the processor, implements the image acquisition model training method of a photograph according to any one of claims 1 to 7 and the image acquisition method according to claim 8.
10. A computer readable storage medium having stored thereon a computer program characterized by:
the computer program when executed implements the image acquisition model training method of a photograph as claimed in any one of claims 1 to 7 and the image acquisition method as claimed in claim 8.
CN202310010092.0A 2023-01-04 2023-01-04 Image acquisition model training method for photograph, image acquisition method, computer device, and computer-readable storage medium Pending CN116091865A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310010092.0A CN116091865A (en) 2023-01-04 2023-01-04 Image acquisition model training method for photograph, image acquisition method, computer device, and computer-readable storage medium


Publications (1)

Publication Number Publication Date
CN116091865A true CN116091865A (en) 2023-05-09

Family

ID=86211567


Country Status (1)

Country Link
CN (1) CN116091865A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination