CN110766645B - Target person recurrence map generation method based on person identification and segmentation - Google Patents
Target person recurrence map generation method based on person identification and segmentation
- Publication number
- CN110766645B (application CN201911017510.9A)
- Authority
- CN
- China
- Prior art keywords
- picture
- target person
- face
- image
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 230000011218 segmentation Effects 0.000 title claims abstract description 25
- 238000000034 method Methods 0.000 title claims abstract description 20
- 238000004422 calculation algorithm Methods 0.000 claims abstract description 19
- 238000001514 detection method Methods 0.000 claims abstract description 9
- 238000003709 image segmentation Methods 0.000 claims description 12
- 238000013527 convolutional neural network Methods 0.000 claims description 10
- 210000000697 sensory organ Anatomy 0.000 claims description 4
- 238000013528 artificial neural network Methods 0.000 claims description 2
- 230000001815 facial effect Effects 0.000 claims description 2
- 238000010586 diagram Methods 0.000 description 2
- 238000003672 processing method Methods 0.000 description 2
- 238000007635 classification algorithm Methods 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 238000000605 extraction Methods 0.000 description 1
- 230000003993 interaction Effects 0.000 description 1
- 239000011159 matrix material Substances 0.000 description 1
- 230000009466 transformation Effects 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/194—Segmentation; Edge detection involving foreground-background segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10004—Still image; Photographic image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
- G06T2207/30201—Face
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Evolutionary Computation (AREA)
- Multimedia (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Human Computer Interaction (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The invention discloses a target person reproduction picture generation method based on person recognition and segmentation. The scheme is as follows: from several input pictures collected in the same scene, one picture is arbitrarily selected as the reference picture for generating a reproduction picture of the target person; the target person in each picture is determined by a face recognition and detection algorithm; non-target persons are removed from the reference picture by an instance segmentation algorithm; the background of the reference picture is completed using the backgrounds of the other input pictures; and the region images of the target person in the other pictures replace the images at the corresponding positions in the reference picture. The result is a reproduction picture with complete background information, composed of the target person as it appears in all input pictures. The method accurately selects the target person by face recognition and detection, and completes the background information of the picture by instance segmentation.
Description
Technical Field
The invention belongs to the technical field of image processing, and further relates to a method for generating a target person reproduction picture based on person recognition and segmentation in the field of computer vision. The method can be used to remove extraneous persons from multiple pictures collected in a crowded place and to generate a single target person reproduction picture from those pictures.
Background
In natural pictures shot in complex scenes, a dense crowd and a cluttered background prevent the target person from standing out, and heavy occlusion of the background means that neither an accurate target person nor sufficient background information can be obtained for image-processing tasks such as person reproduction and comparison. For example, to take a picture containing only the target person in a crowded place such as an airport, tourist attraction, shopping center, gym, or sports training ground, one must spend time and effort finding a suitable angle, and background information is lost with the chosen viewpoint. Removing non-target persons lets the picture retain more background information and highlights the target person. In addition, combining the target person from different positions in multiple pictures of the same scene into a single picture makes the picture more interesting and more informative, for example by showing, in one picture, the sequence of sign-language gestures that make up a sentence.
The patent document of China University of Petroleum, "A street photo target person extraction method" (application number 201711135299.1, filed 2017.11.15, publication number 109145911A), discloses a method for extracting a target person from a street photo. The method first generates candidate regions using a Region Proposal Network (RPN). It then extracts global image features with the convolutional layers of Faster R-CNN to obtain a feature map for each candidate region, and applies RoIAlign for pixel-level correction of each region. After the feature map of each candidate region is obtained, each region is predicted to yield its category and bounding box. A designed FCN branch then predicts the category of each pixel within each candidate region, producing the final instance segmentation result. The individual image of the target person is extracted using the mask matrix from the segmentation result together with manual interaction. The disadvantages of this method are that the target person is not identified automatically: the target person's image must be extracted manually, the operation is complex, and the wrong target person may be selected.
The patent document filed by Kyowa Digital Technology Co., Ltd., "A human image processing method and apparatus" (application number 201510235866.5, filed 2015.5.11, publication number 104794462B), discloses a human image processing method. The method obtains face regions in a target image with a preset face recognition algorithm; when at least two face regions are found, it divides them into foreground and background face regions according to a preset classification algorithm and/or a user's selection; it then processes the background face regions with a preset first image processing algorithm so as to visually suppress them. The drawback of this method is that blurring the non-target face regions also blurs the target person and the background, so the position and pixel information of the target person are lost.
Disclosure of Invention
In view of the above drawbacks of the prior art, the present invention provides a method for generating a target person reproduction picture based on person recognition and segmentation, addressing the problems of too many persons in a picture and a cluttered picture background.
The technical idea of the invention is to remove non-target persons from the input pictures and to integrate all instances of the target person into one picture.
The implementation steps of the invention comprise the following steps:
step 1, inputting pictures:
inputting at least two pictures collected in the same scene, wherein each picture at least comprises a target person to be determined;
step 2, selecting a reference graph:
arbitrarily selecting an input picture as the reference picture for generating the target person reproduction picture;
step 3, establishing a face data set:
carrying out face detection on each input picture and a reference picture by using a face detection algorithm, carrying out face correction, and forming a face data set by all corrected face pictures;
step 4, determining a target person:
inputting the pictures in the face data set into a trained face recognition network, outputting a face feature vector for each person, and taking the person whose feature vector occurs most frequently as the target person in each picture;
step 5, determining the position of the non-target person in the reference image:
obtaining the region images of the non-target persons (all persons other than the target person) and their position information in the reference picture using an instance segmentation algorithm;
step 6, finding, in the remaining pictures in which the target person has been determined, the positions coinciding with the positions of the non-target persons in the reference picture; judging whether those positions contain a complete background; if not, executing step 7; otherwise, executing step 8;
step 7, executing step 3 after inputting a picture meeting the condition in step 1;
step 8, replacing the non-target person region images at the corresponding positions in the reference picture with the background regions from pictures having a complete background, to obtain an updated reference picture;
step 9, determining the position of the target person:
determining the target person region image and the position information of the target person in the other pictures by adopting the same example segmentation algorithm as the step 5;
step 10, obtaining a reproduction diagram:
finding, in the updated reference picture, the positions coinciding with the positions of the target person in the remaining pictures; and replacing the images at those positions in the reference picture with the target person region images from the remaining pictures, to obtain a reproduction picture composed of the target persons from all input pictures.
Compared with the prior art, the invention has the following advantages:
1. Because the invention uses an instance segmentation algorithm to determine the target person region image and position information in each picture, it overcomes the prior-art problems that manually selecting the target person is complex and that the wrong target person may be selected; the target person is selected accurately, producing a correct target person reproduction picture.
2. Because background regions from pictures with a complete background replace the non-target person region images at the corresponding positions in the reference picture, the invention overcomes the prior-art problem that blurring non-target face regions also blurs the target person and the background, losing the target person's position and pixel information; the generated reproduction picture retains the background information completely.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The specific steps of the implementation of the present invention are further described below with reference to fig. 1.
Step 1, inputting a picture.
At least two pictures collected in the same scene are input, each containing at least one target person to be determined. In the embodiment of the invention, 8 pictures collected at the same place in a school are input; each picture contains the target person and a varying number of non-target persons.
And 2, selecting a reference graph.
One input picture is arbitrarily selected from the 8 pictures input in the embodiment of the present invention as a reference picture for generating a reproduction picture of a target person.
And step 3, establishing a face data set.
And performing face detection on the reference image and the remaining 7 images in the embodiment of the invention by adopting a face detection algorithm, performing face rectification, and forming a face data set by all the rectified face images.
Firstly, each input picture and the reference picture are sequentially input into a trained face region proposal network, which outputs every face box in each picture.
The trained face region proposal network comprises a fully connected layer and a face key-point locator. The fully connected layer marks every face box in each input picture.
In the embodiment of the invention, the trained face region proposal network adopts the Proposal Network (P-Net) of the Multi-task Cascaded Convolutional Network (MTCNN).
Secondly, regression is performed on each face box with a trained convolutional neural network, which outputs the coordinates of the upper-left and lower-right corners of the regressed face box, and a face key-point locator outputs the five feature points corresponding to the five facial landmarks of each face;
The regression uses a trained convolutional neural network containing a fully connected bounding-box regression layer. For each face in each input picture, a real face box (the ground truth) is selected manually; the network judges whether the Intersection over Union (IoU) between the face box marked in the first step and the ground-truth box exceeds a threshold. If it does, the coordinates of the upper-left and lower-right corners of the marked box are output; otherwise the box is fine-tuned until the IoU exceeds the threshold, and the coordinates of the regressed face box are output.
The face key-point locator extracts, for each face in an input picture, the five feature points corresponding to the five facial landmarks and outputs the pixel coordinates of each point.
In the embodiment of the invention, the trained bounding-box regressor and face key-point locator adopt the Refine Network (R-Net) of the Multi-task Cascaded Convolutional Network (MTCNN).
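The Intersection over Union check used in this regression step can be sketched as follows; the corner-coordinate box format matches the step above, while the concrete acceptance threshold is a design choice the patent does not fix:

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```

A detected box would be accepted once, e.g., `iou(detected, ground_truth) > 0.5`, with the exact threshold left to the implementation.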
Thirdly, each regressed face box is located with the coordinates of its upper-left and lower-right corners, each face box is cut out of the input pictures and the reference picture, and the faces are aligned using the five regressed feature points of each face.
In the embodiment of the invention, a template without pixel information is preset, containing the five feature-point positions of a frontal face; each face picture cut out in the third step is mapped by an affine transformation to the corresponding positions in the template, yielding the rectified face picture.
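The alignment step can be sketched as a least-squares affine fit between the five detected landmarks and a fixed frontal template; the template coordinates below are illustrative assumptions, not the patent's actual preset template:

```python
import numpy as np

# Assumed template landmark positions (eyes, nose tip, mouth corners) in a
# 112x112 aligned face crop; the patent does not specify these values.
TEMPLATE = np.array([[38.3, 51.7], [73.5, 51.5], [56.0, 71.7],
                     [41.5, 92.4], [70.7, 92.2]])

def fit_affine(src, dst):
    """Least-squares 2x3 affine A such that dst ~= src @ A[:, :2].T + A[:, 2]."""
    n = len(src)
    X = np.hstack([src, np.ones((n, 1))])        # (n, 3) homogeneous coords
    A, *_ = np.linalg.lstsq(X, dst, rcond=None)  # (3, 2) solution
    return A.T                                   # (2, 3)

def apply_affine(A, pts):
    """Apply the 2x3 affine transform to an (n, 2) array of points."""
    pts = np.asarray(pts, float)
    return pts @ A[:, :2].T + A[:, 2]
```

The fitted matrix would then drive an image warp (e.g. OpenCV's `warpAffine`) to resample the cropped face onto the template grid.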
And 4, determining the target person.
The pictures in the face data set are input into a trained face recognition network, which outputs a face feature vector for each person; the person whose feature vector occurs most frequently is taken as the target person in each picture.
The face feature vector of each person is produced by the fully connected layer of a trained convolutional neural network, which converts each face picture in the input face data set into a feature vector; the person whose feature vector occurs most frequently is taken as the target person in each picture.
In the embodiment of the invention, the face recognition network adopts a deep neural network ArcFace.
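The majority-vote selection of the target person from face feature vectors might look like the following sketch; the greedy cosine-similarity grouping and the 0.6 threshold are assumptions, since the patent only states that the most frequently occurring feature vector determines the target person:

```python
import numpy as np

def most_frequent_identity(embeddings, threshold=0.6):
    """Greedily group L2-normalized face embeddings by cosine similarity;
    return the indices of the largest group (assumed to be the target
    person, who appears in every input picture)."""
    embs = [e / np.linalg.norm(e) for e in embeddings]
    groups = []  # each group is a list of embedding indices
    for i, e in enumerate(embs):
        for g in groups:
            # Compare against the group's first member as its representative.
            if float(np.dot(e, embs[g[0]])) >= threshold:
                g.append(i)
                break
        else:
            groups.append([i])
    return max(groups, key=len)
```

With ArcFace-style embeddings, crops of the same person would cluster together, so the largest group spans all pictures and identifies the target person in each.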
And 5, determining the position of the non-target person in the reference image.
The following instance segmentation algorithm is employed to obtain the region images of the non-target persons and their position information in the reference picture.
Firstly, an image segmentation model is built by connecting, in sequence, a convolutional neural network, a region-of-interest proposal network, and a segmentation network;
the convolutional neural network adopts a 50-layer residual network (ResNet-50);
the region-of-interest proposal network has a tree structure with one trunk and two branches; the convolution kernel size of the trunk convolutional layer is set to 3 x 3 and that of the branch convolutional layers to 1 x 1;
the segmentation network consists of 6 convolutional layers and 2 fully connected layers; the convolution kernel size of each convolutional layer is set to 3 x 3 and the kernel size of each fully connected layer to 7 x 7;
Secondly, the image segmentation model is trained on an image set containing persons until persons can be segmented pixel by pixel, yielding a trained model. The reference picture is input into the trained model, which outputs the non-target person region images and the position information of the non-target persons.
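Given a per-instance boolean mask from the trained segmentation model, the region image and position information of this step can be recovered as a masked crop plus a bounding box; a minimal numpy sketch, assuming the mask is already binarized:

```python
import numpy as np

def region_and_bbox(image, mask):
    """Given an H x W x C image and a boolean instance mask, return the
    masked region crop (non-person pixels zeroed) and its bounding box
    as (x1, y1, x2, y2)."""
    ys, xs = np.where(mask)
    y1, y2 = ys.min(), ys.max() + 1
    x1, x2 = xs.min(), xs.max() + 1
    region = np.zeros_like(image[y1:y2, x1:x2])
    sub = mask[y1:y2, x1:x2]
    region[sub] = image[y1:y2, x1:x2][sub]
    return region, (x1, y1, x2, y2)
```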
Step 6: after the target person has been determined, the positions in the remaining pictures coinciding with the positions of the non-target persons in the reference picture are found; whether those positions contain a complete background is judged; if not, step 7 is executed; otherwise, step 8 is executed.
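The complete-background test of step 6 amounts to checking that no person in the candidate picture occupies the region that must be filled; a minimal sketch, assuming boolean person masks from the instance segmentation step and pixel-aligned pictures:

```python
import numpy as np

def has_complete_background(hole_mask, person_masks):
    """True if none of the candidate picture's person masks overlaps the
    region (hole_mask) that needs to be filled with background pixels."""
    occupied = np.zeros_like(hole_mask)
    for m in person_masks:
        occupied |= m
    return not bool((hole_mask & occupied).any())
```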
Step 7: after a picture satisfying the condition of step 1 is input, step 3 is executed.
Step 8: the background regions from pictures having a complete background replace the non-target person region images at the corresponding positions in the reference picture, yielding an updated reference picture.
And 9, determining the position of the target person.
The region images and position information of the target person in the remaining pictures are determined with the following instance segmentation algorithm;
firstly, an image segmentation model is built by connecting, in sequence, a convolutional neural network, a region-of-interest proposal network, and a segmentation network;
the convolutional neural network adopts a 50-layer residual network (ResNet-50);
the region-of-interest proposal network has a tree structure with one trunk and two branches; the convolution kernel size of the trunk convolutional layer is set to 3 x 3 and that of the branch convolutional layers to 1 x 1;
the segmentation network consists of 6 convolutional layers and 2 fully connected layers; the convolution kernel size of each convolutional layer is set to 3 x 3 and the kernel size of each fully connected layer to 7 x 7;
secondly, the image segmentation model is trained on an image set containing persons until persons can be segmented pixel by pixel, yielding a trained model. The remaining pictures other than the reference picture are input into the trained model, which outputs the target person region images and the position information of the target person.
And step 10, obtaining a reproduction diagram.
The positions coinciding with the positions of the target person in the remaining pictures are found in the updated reference picture; the images at those positions in the reference picture are replaced with the target person region images from the remaining pictures, yielding a reproduction picture composed of the target persons from all input pictures.
In the embodiment of the invention, the target person regions in the 7 pictures other than the reference picture replace the images at the corresponding positions in the reference picture, producing a reproduction picture in which the target person appears 8 times at different positions.
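The replacement in step 10 (and likewise the background fill in step 8) is mask-based compositing between pictures of the same scene; a minimal numpy sketch, assuming all pictures are pixel-aligned views of one scene:

```python
import numpy as np

def composite(reference, sources_and_masks):
    """Paste each source picture's masked pixels (e.g. a target person
    region) into a copy of the reference picture at the same coordinates."""
    out = reference.copy()
    for src, mask in sources_and_masks:
        out[mask] = src[mask]  # boolean mask selects the person's pixels
    return out
```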
Claims (4)
1. A target person reproduction picture generation method based on person recognition and segmentation, characterized in that the target person is judged from facial features, non-target persons are removed by an instance segmentation algorithm, and a target person reproduction picture is generated, the method comprising the following steps:
step 1, inputting pictures:
inputting at least two pictures collected in the same scene, wherein each picture at least comprises a target person to be determined;
step 2, selecting a reference graph:
arbitrarily selecting an input picture as the reference picture for generating the target person reproduction picture;
step 3, establishing a face data set:
carrying out face detection on each input picture and a reference picture by using a face detection algorithm, carrying out face correction, and forming a face data set by all corrected face pictures;
step 4, determining a target person:
inputting the pictures in the face data set into a trained face recognition network, outputting a face feature vector for each person, and taking the person whose feature vector occurs most frequently as the target person in each picture;
step 5, determining the position of the non-target person in the reference image:
obtaining the region images of the non-target persons (all persons other than the target person) and their position information in the reference picture using an instance segmentation algorithm;
step 6, finding, in the remaining pictures in which the target person has been determined, the positions coinciding with the positions of the non-target persons in the reference picture; judging whether those positions contain a complete background; if not, executing step 7; otherwise, executing step 8;
step 7, executing step 3 after inputting a picture meeting the condition in step 1;
step 8, replacing the non-target person region images at the corresponding positions in the reference picture with the background regions from pictures having a complete background, to obtain an updated reference picture;
step 9, determining the position of the target person:
determining the target person region image and the position information of the target person in the other pictures by adopting the same example segmentation algorithm as the step 5;
step 10, obtaining a reproduction chart:
finding, in the updated reference picture, the positions coinciding with the positions of the target person in the remaining pictures; and replacing the images at those positions in the reference picture with the target person region images from the remaining pictures, to obtain a reproduction picture composed of the target persons from all input pictures.
2. The method of claim 1, characterized in that the face detection algorithm in step 3 comprises the following steps:
firstly, inputting each input picture and the reference picture sequentially into a trained face region proposal network, which outputs every face box in each picture;
secondly, performing regression on each face box with a trained convolutional neural network, outputting the coordinates of the upper-left and lower-right corners of the regressed face box, and outputting, with a face key-point locator, the five feature points corresponding to the five facial landmarks of each face;
thirdly, locating each regressed face box with the coordinates of its upper-left and lower-right corners, cutting each face box out of the input pictures and the reference picture, and aligning the faces using the five regressed feature points of each face.
3. The method of claim 1, characterized in that the face recognition network in step 4 adopts the deep neural network ArcFace.
4. The method of claim 1, characterized in that the instance segmentation algorithm in steps 5 and 9 comprises the following steps:
firstly, establishing an image segmentation model by connecting, in sequence, a convolutional neural network, a region-of-interest proposal network, and a segmentation network;
the convolutional neural network adopting a 50-layer residual network (ResNet-50);
the region-of-interest proposal network having a tree structure with one trunk and two branches, the convolution kernel size of the trunk convolutional layer being set to 3 x 3 and that of the branch convolutional layers to 1 x 1;
the segmentation network consisting of 6 convolutional layers and 2 fully connected layers, the convolution kernel size of each convolutional layer being set to 3 x 3 and the kernel size of each fully connected layer to 7 x 7;
secondly, training the image segmentation model on an image set containing persons until persons can be segmented pixel by pixel, yielding a trained model; inputting a picture into the trained model, which outputs a person region image and the person's position information, the person being a non-target person in step 5 and the target person in step 9, and the picture being the reference picture in step 5 and the remaining pictures other than the reference picture in step 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911017510.9A CN110766645B (en) | 2019-10-24 | 2019-10-24 | Target person recurrence map generation method based on person identification and segmentation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911017510.9A CN110766645B (en) | 2019-10-24 | 2019-10-24 | Target person recurrence map generation method based on person identification and segmentation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110766645A CN110766645A (en) | 2020-02-07 |
CN110766645B true CN110766645B (en) | 2023-03-10 |
Family
ID=69333391
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911017510.9A Active CN110766645B (en) | 2019-10-24 | 2019-10-24 | Target person recurrence map generation method based on person identification and segmentation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110766645B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111626091B (en) * | 2020-03-09 | 2023-09-22 | 咪咕文化科技有限公司 | Face image labeling method and device and computer readable storage medium |
CN112446820A (en) * | 2020-10-31 | 2021-03-05 | 浙江工业大学 | Method for removing irrelevant portrait of scenic spot photo |
WO2023087215A1 (en) * | 2021-11-18 | 2023-05-25 | Citrix Systems, Inc. | Online meeting non-participant detection and remediation |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109978754A (en) * | 2017-12-28 | 2019-07-05 | 广东欧珀移动通信有限公司 | Image processing method, device, storage medium and electronic equipment |
CN108932536B (en) * | 2018-07-18 | 2021-11-09 | 电子科技大学 | Face posture reconstruction method based on deep neural network |
CN109961006A (en) * | 2019-01-30 | 2019-07-02 | 东华大学 | A kind of low pixel multiple target Face datection and crucial independent positioning method and alignment schemes |
CN109993089B (en) * | 2019-03-22 | 2020-11-24 | 浙江工商大学 | Video target removing and background restoring method based on deep learning |
- 2019-10-24: application CN201911017510.9A filed (CN); patent CN110766645B granted, status active
Also Published As
Publication number | Publication date |
---|---|
CN110766645A (en) | 2020-02-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109299274B (en) | Natural scene text detection method based on full convolution neural network | |
WO2020224424A1 (en) | Image processing method and apparatus, computer readable storage medium, and computer device | |
CN110766645B (en) | Target person recurrence map generation method based on person identification and segmentation | |
CN101558404B (en) | Image segmentation | |
US11854244B2 (en) | Labeling techniques for a modified panoptic labeling neural network | |
CN108229397A (en) | Method for text detection in image based on Faster R-CNN | |
CN106682108A (en) | Video retrieval method based on multi-modal convolutional neural network | |
CN110765833A (en) | Crowd density estimation method based on deep learning | |
CN110766020A (en) | System and method for detecting and identifying multi-language natural scene text | |
CN108388882A (en) | Based on the gesture identification method that the overall situation-part is multi-modal RGB-D | |
CN110909724B (en) | Thumbnail generation method of multi-target image | |
CN102332095A (en) | Face motion tracking method, face motion tracking system and method for enhancing reality | |
CN110263768A (en) | A kind of face identification method based on depth residual error network | |
CN106897681A (en) | A kind of remote sensing images comparative analysis method and system | |
CN112232199A (en) | Wearing mask detection method based on deep learning | |
CN108921850B (en) | Image local feature extraction method based on image segmentation technology | |
CN106845513A (en) | Staff detector and method based on condition random forest | |
CN109919157A (en) | A kind of vision positioning method and device | |
CN111160194B (en) | Static gesture image recognition method based on multi-feature fusion | |
CN110533026A (en) | The competing image digitization of electricity based on computer vision and icon information acquisition methods | |
CN114120389A (en) | Network training and video frame processing method, device, equipment and storage medium | |
CN113724273A (en) | Edge light and shadow fusion method based on neural network regional target segmentation | |
CN110852172B (en) | Method for expanding crowd counting data set based on Cycle Gan picture collage and enhancement | |
CN113628349B (en) | AR navigation method, device and readable storage medium based on scene content adaptation | |
CN116862920A (en) | Portrait segmentation method, device, equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||