CN114973384A - Electronic face photo collection method based on key point and visual salient target detection

Info

Publication number
CN114973384A
CN114973384A
Authority
CN
China
Prior art keywords
image
face
electronic
background
carrying
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210820790.2A
Other languages
Chinese (zh)
Inventor
张滢雪
司占军
种政
于彦辉
杨文鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University of Science and Technology
Original Assignee
Tianjin University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University of Science and Technology filed Critical Tianjin University of Science and Technology
Priority to CN202210820790.2A priority Critical patent/CN114973384A/en
Publication of CN114973384A publication Critical patent/CN114973384A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 - Detection; Localisation; Normalisation
    • G06V40/166 - Detection; Localisation; Normalisation using acquisition arrangements
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/46 - Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 - Salient features, e.g. scale invariant feature transforms [SIFT]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/56 - Extraction of image or video features relating to colour
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 - Feature extraction; Face representation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 - Target detection

Abstract

The invention discloses an electronic face photo acquisition method. An input image is subjected in sequence to a face uniqueness check, a face integrity check, and an image quality check, and an initial image that meets the requirements for producing an electronic photo is determined from the results. Face correction is performed on the initial image to obtain a face image in a standard pose. Salient target detection is performed on the standard-pose face image, and the image is segmented according to the detection result to obtain a person image in which the person foreground is separated from the background. Image post-processing then adjusts the image size and background color to obtain an electronic portrait photo meeting the requirements. Because the usability of the input image is judged comprehensively before image processing, the method avoids the computation wasted on invalid images on the one hand and guarantees that the processed output image is effectively usable on the other, giving higher efficiency and stronger stability while avoiding additional manual post-processing.

Description

Electronic face photo collection method based on key point and visual salient target detection
Technical Field
The invention relates to the field of image processing, in particular to an electronic face photo acquisition method based on key point and visual salient target detection.
Background
With the rapid development of the internet and the growing informatization of society, more and more application scenarios require electronic certificate photos containing a portrait. This places new demands on electronic face photo acquisition: both the real-time performance and the degree of automation of photo shooting and processing need to improve. For example, in large-scale acquisition scenarios such as student information collection, the shooting system must, on the one hand, judge in real time whether the parameters of a captured photo meet the acquisition requirements so that a non-conforming photo can be retaken immediately; on the other hand, the system must process the photos automatically to avoid consuming excessive manpower and time. Accordingly, the demand for automatic and efficient photo acquisition systems in various application scenarios is increasingly prominent, and intelligent portrait photo acquisition and processing algorithms have become a research hotspot in image processing and computer vision.
Face recognition algorithms, a research focus in the field of computer vision, have developed rapidly and found wide application, and they can also be applied to the acquisition and automatic processing of electronic face photos. However, because an electronic face photo must also satisfy requirements on picture quality, size, proportion, color, and other factors beyond those on the face region, relying on a face recognition algorithm alone is not sufficient. In view of this, a complete algorithmic pipeline and system need to be designed so that automatic acquisition and processing of face photos can be completed under different parameter settings.
Disclosure of Invention
The invention aims to provide an electronic face photo acquisition method based on key point and visual salient target detection that can effectively extract the portrait target, replace the background color, process the input image end to end, and efficiently acquire an electronic face photo meeting preset requirements.
In order to achieve the purpose, the invention provides the following scheme:
the electronic face photo acquisition method based on key point and visual salient object detection comprises the following steps:
performing, on the input image I, a face uniqueness check, a face integrity check, and an image quality check in sequence, and determining from the results an initial image that meets the requirements for producing an electronic photo;
performing face correction on the initial image to obtain a face image in a standard pose;
performing salient target detection on the standard-pose face image and segmenting the image according to the detection result to obtain a person image in which the person foreground is separated from the background;
performing image post-processing on the person image and adjusting the image size and background color to obtain an electronic portrait photo that meets the requirements.
Preferably, the face uniqueness check comprises: performing face detection on the input image I to obtain a set of face-region bounding boxes {f_m | m = 1, 2, ..., M}; if M = 1, the face uniqueness check passes, and if M ≠ 1, it fails.
Preferably, the face integrity check comprises: performing key point detection on the unique valid face f_1 to obtain 81 facial key points {(x_i, y_i) | i = 1, 2, ..., 81} and judging whether the conditions

max(x) < W, min(x) > 0, max(y) < H, min(y) > 0    (1)

are satisfied simultaneously, where W is the horizontal resolution of the image and H is the vertical resolution; if they are, the face integrity check passes, otherwise it fails.
Preferably, the image quality check comprises: converting the input image into a two-dimensional grayscale image; convolving the grayscale image with a Laplacian operator L and computing an image sharpness index q:

L = | 0  1  0 |
    | 1 -4  1 |
    | 0  1  0 |    (2)

q = std(C_L(x, y))    (3)

where C_L(x, y) is the Laplacian convolution response at each pixel (x, y) of the grayscale image and std() is the standard deviation; a sharpness threshold T_q is set, and the input image I passes the quality check if q > T_q and fails if q < T_q.
Preferably, obtaining the face image in the standard pose comprises: applying an affine transformation, defined by an affine transformation matrix, to the initial image to obtain the face image in the standard pose:

[x']   [a_1  -a_2] [x]   [a_3]
[y'] = [a_2   a_1] [y] + [a_4]    (4)

where (x, y) and (x', y') are the pixel coordinates of the input image I and of the affine-transformed image I', respectively. The parameters a_1, a_2, a_3, a_4 are solved by substituting the coordinates (x_k, y_k), (x_k', y_k') of 5 key points in the input image I and in the standard-pose face image I'.
Preferably, obtaining the person image in which the person foreground is separated from the background comprises: segmenting the image into superpixels with the simple linear iterative clustering (SLIC) superpixel segmentation algorithm; constructing a graph model G(N, E) with the superpixels as nodes, where N is the node set and E is the set of undirected edges between nodes;

calculating the weight matrix Ω = {ω_ij | i, j = 1, 2, ..., n} of the graph model G and the degree matrix D = diag{d_11, ..., d_nn}:

ω_ij = exp(-||c_i - c_j|| / σ²)    (5)

d_ii = Σ_j ω_ij    (6)

where c_i and c_j are the mean colors of graph nodes i and j in the LAB color space and σ is a constant;

based on Ω and D, computing the optimal affinity matrix A = (D - αΩ)^(-1) and the ranking score r* = AY as the saliency values of the superpixels, where Y = [y_i | i = 1, 2, ..., n] is an indicator vector marking whether each node is a seed point: y_i = 1 if it is and 0 otherwise;

taking the superpixels on the top, bottom, left, and right edges of the image in turn as background seed points to obtain the initial saliency map S_bq, S_bq(i) = S_t(i) × S_b(i) × S_l(i) × S_r(i), i = 1, 2, ..., n, where S_t(i), S_b(i), S_l(i), S_r(i) are the saliency maps obtained with the top, bottom, left, and right edges as seed points, S_e(i) = 1 - r̄*_e(i), with r̄*_e the corresponding normalized saliency value vector;

reselecting the foreground seed points based on the initial saliency map and computing the final saliency map S_fq(i) = r̄*(i), i = 1, 2, ..., n;

thresholding the resulting final saliency map, which contains two parts with clearly different gray values, with a segmentation threshold T_seg: pixels with gray value greater than T_seg form the portrait region and pixels with gray value less than T_seg form the background region, giving the portrait region coordinate set {(x_u, y_u) | u = 1, 2, ..., U} and the background region coordinate set {(x_v, y_v) | v = 1, 2, ..., V}, with U + V = W × H.
Preferably, obtaining the person image in which the person foreground is separated from the background further comprises: performing salient target detection on the standard-pose face image and segmenting the person foreground from the background based on a saliency threshold applied to the detection result.
Preferably, adjusting the background color of the image comprises: replacing the RGB three-channel color vectors of the pixels in the background region of the image while keeping the pixels in the foreground person region unchanged, obtaining the image I'' with the replaced background color.
Preferably, resizing the image comprises: taking the highest point (x_t, y_t), lowest point (x_b, y_b), leftmost point (x_l, y_l), and rightmost point (x_r, y_r) of the facial key points; taking the rectangular region bounded by the lines through these four points as the face region; setting the preset width and height of the output image to W' and H'; and calculating the corner coordinates of the crop box by formulas (10) to (13):

x_1 = x_l - eW    (10)
x_2 = x_r + eW    (11)
y_1 = y_t - eT    (12)
y_2 = y_b + eB    (13)

where eW, eT, and eB are the distances from the face region to the left/right, top, and bottom edges of the output image, respectively:

eW = (W' - (x_r - x_l)) / 2    (14)
eT = ratio_t × (H' - (y_b - y_t))    (15)
eB = ratio_b × (H' - (y_b - y_t))    (16)

The calculation of eW ensures that the face region is centered horizontally in the output portrait photo; ratio_t and ratio_b represent the relative distances of the face region from the top and bottom, both take values in [0, 1], and ratio_t + ratio_b = 1, so eT and eB determine the distance from the top of the forehead to the upper edge of the output portrait photo and from the chin to the lower edge. The rectangular region within these coordinates is used as the crop box to crop the image, and the output image is the electronic face photo meeting the preset requirements.
The invention provides an electronic face photo acquisition method based on key point and visual salient target detection. The image usability judgment fully considers requirements at different levels: the three verification stages, face uniqueness check, face integrity check, and image sharpness evaluation, judge the usability of the input image comprehensively before image processing. On the one hand, this avoids the computation time and resources wasted on invalid images; on the other hand, it guarantees that the processed output image is effectively usable, giving higher efficiency and stronger stability. At the same time, the invention adopts an end-to-end design: the original image captured by the camera is taken as input, and an electronic face photo conforming to the preset background and size is output automatically, avoiding additional manual post-processing and suiting large-scale electronic face photo collection scenarios.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.
FIG. 1 is a flow chart of the electronic face photo acquisition provided in an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide an electronic face photo acquisition method based on key point and visual salient target detection that can effectively extract the portrait target, replace the background color, process the input image end to end, and efficiently acquire an electronic face photo meeting preset requirements.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below. As shown in fig. 1, the method for collecting an electronic face photo based on key point and visual salient object detection is characterized by comprising the following steps:
performing, on the input image, a face uniqueness check, a face integrity check, and an image quality check in sequence, and determining from the results an initial image that meets the requirements for producing an electronic photo;
performing face correction on the initial image to obtain a face image in a standard pose;
performing salient target detection on the standard-pose face image and segmenting the image according to the detection result to obtain a person image in which the person foreground is separated from the background;
performing image post-processing on the person image and adjusting the image size and background color to obtain an electronic portrait photo that meets the requirements.
Further, the face uniqueness check comprises: acquiring the set of face-region bounding boxes {f_m | m = 1, 2, ..., M}; if M = 1, the face uniqueness check passes, and if M ≠ 1, it fails.
Further, the face integrity check comprises: performing key point detection on the face to obtain 81 facial key points {(x_i, y_i) | i = 1, 2, ..., 81} and judging whether max(x) < W, min(x) > 0, max(y) < H, min(y) > 0 are satisfied simultaneously, where W is the horizontal resolution of the image and H is the vertical resolution; if they are, the face integrity check passes, otherwise it fails;
further, the checking of the image quality includes: converting an input image into a two-dimensional gray image; performing convolution operation on the gray level image by using a Laplace operator L, and calculating an image definition index q:
Figure BDA0003742361810000061
q=std(C L (x, y)), wherein C L (x, y) is the convolution of Laplacian operators at all pixel points (x, y) of the gray level image, and std is standard deviation calculation; setting a sharpness threshold T q If q > T q The image quality passes the inspection, if q is less than T q The image quality check fails.
Further, obtaining the face image in the standard pose comprises: applying an affine transformation, based on the affine transformation matrix, to the input image to obtain a face image in the standard, level pose.
Further, the step of obtaining the person image in which the person foreground is separated from the background comprises: segmenting the image into superpixels with the simple linear iterative clustering (SLIC) superpixel segmentation algorithm; constructing a graph model G(N, E) with the superpixels as nodes, where N is the node set and E is the set of undirected edges between nodes;

calculating the weight matrix Ω = {ω_ij | i, j = 1, 2, ..., n} of the graph model G and the degree matrix D = diag{d_11, ..., d_nn}, with ω_ij = exp(-||c_i - c_j|| / σ²) and d_ii = Σ_j ω_ij, where c_i and c_j are the mean colors of graph nodes i and j in the LAB color space and σ is a constant;

based on Ω and D, computing the optimal affinity matrix A = (D - αΩ)^(-1) and the ranking score r* = AY as the saliency values of the superpixels, where Y = [y_i | i = 1, 2, ..., n] is an indicator vector marking whether each node is a seed point: y_i = 1 if it is and 0 otherwise;

taking the superpixels on the top, bottom, left, and right edges of the image in turn as background seed points to obtain the initial saliency map S_bq, S_bq(i) = S_t(i) × S_b(i) × S_l(i) × S_r(i), i = 1, 2, ..., n, where S_t(i), S_b(i), S_l(i), S_r(i) are the saliency maps obtained with the top, bottom, left, and right edges as seed points, S_e(i) = 1 - r̄*_e(i), with r̄*_e the corresponding normalized saliency value vector;

reselecting the foreground seed points based on the initial saliency map and computing the final saliency map S_fq(i) = r̄*(i), i = 1, 2, ..., n;

thresholding the resulting final saliency map, which contains two parts with clearly different gray values, with a segmentation threshold T_seg: pixels with gray value greater than T_seg form the portrait region and pixels with gray value less than T_seg form the background region, giving the portrait region coordinate set {(x_u, y_u) | u = 1, 2, ..., U} and the background region coordinate set {(x_v, y_v) | v = 1, 2, ..., V}, with U + V = W × H.
Further, obtaining the person image in which the person foreground is separated from the background further comprises: performing salient target detection on the standard-pose face image and segmenting the person foreground from the background based on a saliency threshold applied to the detection result.
Further, adjusting the background color of the image comprises: replacing the RGB three-channel color vectors of the pixels in the background region while keeping the pixels in the foreground person region unchanged, obtaining the image with the replaced background color.
Further, resizing the image comprises: taking the highest point (x_t, y_t), lowest point (x_b, y_b), leftmost point (x_l, y_l), and rightmost point (x_r, y_r) of the facial key points; taking the rectangular region bounded by the lines through these four points as the face region; setting the preset width and height of the output image to W' and H'; and calculating the corner coordinates of the crop box, where eW, eT, and eB are the distances from the face region to the left/right, top, and bottom edges of the output image, respectively, with eW = (W' - (x_r - x_l)) / 2, eT = ratio_t × (H' - (y_b - y_t)), and eB = ratio_b × (H' - (y_b - y_t)).

The calculation of eW ensures that the face region is centered horizontally in the output portrait photo; ratio_t and ratio_b represent the relative distances of the face region from the top and bottom, both take values in [0, 1], and ratio_t + ratio_b = 1, so eT and eB determine the distance from the top of the forehead to the upper edge of the output portrait photo and from the chin to the lower edge. The rectangular region within the resulting coordinates is used as the crop box to crop the image, and the output image is the electronic face photo meeting the preset requirements.
The invention also provides the step flow of a specific embodiment:
and 1.1, checking the uniqueness and integrity of the face. Reading in an original image shot by a camera, performing face and face key point detection on the original image, and judging the uniqueness and integrity of the face in the image according to the result. And if the unique and complete human face is included, the manufacturing requirement is considered to be met, the step 1.2 is carried out, otherwise, the requirement is not met, the problem is prompted, shooting is required to be carried out again, and the step 1.1 is carried out again.
Step 1.2, face image quality evaluation. Quality evaluation is performed on the images that pass the face uniqueness and integrity checks. If an image meets the quality requirement, step 2 is entered; otherwise the image is marked as not meeting the production requirements, the problem is prompted, re-shooting is required, and step 1.1 is executed again.
The image processing stage processes an input image that meets the production requirements and generates an electronic face photo according to the preset requirements, through the following steps:
Step 2.1, face correction. Face angle correction is performed based on the face detection and key point detection results, converting the image into a standard pose in which the face is not tilted.
Step 2.2, salient target detection is performed on the obtained standard-pose face image, the image is segmented based on the detection result, and the person foreground is separated from the background.
Step 2.3, image post-processing is performed, and the image size, background color, and other properties are adjusted according to preset parameters to obtain the electronic face photo meeting the preset requirements.
Further, regarding the face uniqueness and integrity of step 1.1: uniqueness means that there is exactly one valid face in the input image; integrity means that the detected face is complete and contains all key facial elements. Both are checked through face detection and facial key point detection, with the following specific process:
step 1.1.1, for input image
Figure BDA0003742361810000081
Face detection is carried out to obtain a frame selection set { f) containing a face region m 1, 2.., M }, and based on this execution condition, judges:
(1) if M = 1, exactly 1 valid face is detected in the image; the image is marked as meeting the requirement and step 1.1.2 is entered;
(2) if M > 1, more than 1 valid face is detected in the image; the image is marked as not meeting the uniqueness requirement, the subsequent steps are not performed, re-shooting is prompted, and this step is executed again;
(3) if M = 0, no valid face is detected in the image; the image is marked as not meeting the uniqueness requirement, the subsequent steps are not performed, re-shooting is prompted, and this step is executed again.
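As an illustration, the uniqueness check of step 1.1.1 can be sketched as follows, assuming OpenCV's bundled Haar cascade as a stand-in face detector (the method itself does not prescribe a particular detector):

```python
import cv2

def check_uniqueness(image_bgr):
    """Return the single face box if exactly one valid face is found, else None."""
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    M = len(faces)            # size of the bounding-box set {f_m}
    if M == 1:
        return faces[0]       # (x, y, w, h) of the unique valid face f_1
    return None               # M = 0 or M > 1: the uniqueness check fails
```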
Step 1.1.2, key point detection is performed on the unique valid face f_1 obtained in step 1.1.1 to obtain 81 facial key points {(x_i, y_i) | i = 1, 2, ..., 81} covering the eyebrow, eye, nose, and mouth contours from forehead to chin, and the completeness of the face in the image is judged from the coordinates of the facial contour key points. With the horizontal resolution of the image denoted W and the vertical resolution H, if the key point coordinates satisfy

max(x) < W, min(x) > 0, max(y) < H, min(y) > 0    (1)

the facial contour is completely contained in the image, the image is marked as meeting the integrity requirement, and step 1.2 is entered. If any condition of formula (1) is not met, part of the face falls outside the image range, the face is incomplete and marked as not meeting the integrity requirement, the subsequent steps are not performed, re-shooting is prompted, and step 1.1 is executed again. Note that if key facial organs are missing, a valid face cannot be detected at all, i.e. step 1.1.1 cannot be passed; therefore only the integrity of the facial contour is checked here, and the integrity of the key facial organs is not checked again.
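The contour-integrity test of formula (1) is then a simple bounds check. In the sketch below, `landmarks` is assumed to be an 81 × 2 array of key point coordinates from whichever 81-point landmark model is used (the method does not name one):

```python
import numpy as np

def check_integrity(landmarks, W, H):
    """Formula (1): all 81 key points must lie strictly inside the W x H image."""
    x, y = landmarks[:, 0], landmarks[:, 1]
    return x.max() < W and x.min() > 0 and y.max() < H and y.min() > 0
```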
Further, the face image quality evaluation of step 1.2 evaluates image sharpness with a Laplacian gradient function, through the following steps:
step 1.2.1, for input image
Figure BDA0003742361810000092
Converting the image into a two-dimensional gray image, performing convolution operation on the gray image by using a Laplace operator L shown as a formula (2) to calculate an image definition index q:
Figure BDA0003742361810000093
q=std(C L (x,y)) (3)
wherein, C L (x, y) is the convolution of laplacian operators at all pixel points (x, y) of the gray image, and std () is the standard deviation calculation.
Step 1.2.2, whether the image quality meets the requirement is judged from the computed sharpness index q. A sharpness threshold T_q is set; if q < T_q, the image sharpness does not meet the requirement, the subsequent steps are not performed, adjusting parameters such as the camera focus is prompted, and step 1.1 is executed again; if q > T_q, the image meets the sharpness requirement and step 2 is entered.
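Steps 1.2.1 and 1.2.2 can be sketched with OpenCV as follows; with its default aperture, cv2.Laplacian applies exactly the 3 × 3 kernel of formula (2), while the threshold value below is purely illustrative:

```python
import cv2

def sharpness_index(image_bgr):
    """q = std(C_L(x, y)): standard deviation of the Laplacian response (3)."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    C_L = cv2.Laplacian(gray, cv2.CV_64F)   # default aperture = kernel of (2)
    return C_L.std()

def quality_check(image_bgr, T_q=100.0):    # T_q = 100 is an assumed threshold
    return sharpness_index(image_bgr) > T_q
```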
Further, the face correction of step 2.1 aims to convert the image into a standard pose in which the face is not tilted, implemented as follows:
step 2.1.1, taking the two end points of the canthus and the nose end point of the left eye and the right eye from the 81 key points of the face part in the step 1.1.2, and taking the total 5 key point coordinates { (x) k ,y k ) 1, 2, 3, 4, 5 as a reference point for image transformation.
Step 2.1.2, an affine transformation based on the affine transformation matrix is applied to the image I, converting it into the standard pose in which the face is level:

[x']   [a_1  -a_2] [x]   [a_3]
[y'] = [a_2   a_1] [y] + [a_4]    (4)

where (x, y) and (x', y') are the pixel coordinates of the input image I and of the affine-transformed image I', respectively. The parameters a_1, a_2, a_3, a_4 are solved by substituting the coordinates (x_k, y_k), (x_k', y_k') of the 5 key points in the input image I and in the standard-pose image I'.
Further, the salient target detection of step 2.2 applies a graph-based manifold ranking method to the image I' containing the standard-oriented face obtained in step 2.1, then segments the image based on a saliency threshold applied to the detection result, separating the person foreground from the background. The specific implementation is as follows:
step 2.2.1, for the image
Figure BDA0003742361810000108
The image is segmented into superpixels using a simple linear iterative clustering superpixel segmentation algorithm. And constructing a graph model G (N, E) by taking the super pixels as nodes, wherein N is the node, and E is a non-directional edge between the nodes.
Step 2.2.2, for the graph model G containing n nodes, its weight matrix Ω = {ω_ij | i, j = 1, 2, ..., n} and degree matrix D = diag{d_11, ..., d_nn} are calculated:

ω_ij = exp(-||c_i - c_j|| / σ²)    (5)

d_ii = Σ_j ω_ij    (6)

where c_i and c_j are the mean colors of graph nodes i and j in the LAB color space and σ is a constant.
Step 2.2.3, based on Ω and D, the optimal affinity matrix A = (D - αΩ)^(-1) is computed, and the ranking score r* is calculated as the saliency value of each superpixel:

r* = AY    (7)

where Y = [y_i | i = 1, 2, ..., n] is an indicator vector marking whether each node is a seed point: y_i = 1 if it is and 0 otherwise. Taking the superpixels on the top, bottom, left, and right edges of the image in turn as background seed points, the initial saliency map S_bq is obtained:

S_bq(i) = S_t(i) × S_b(i) × S_l(i) × S_r(i),  i = 1, 2, ..., n    (8)

where S_t(i), S_b(i), S_l(i), S_r(i) are the saliency maps obtained with the top, bottom, left, and right edges as seed points, S_e(i) = 1 - r̄*_e(i), with r̄*_e the corresponding normalized saliency value vector.
Step 2.2.4, based on the initial saliency map, the foreground seed points are reselected and the final saliency map is computed:

S_fq(i) = r̄*(i),  i = 1, 2, ..., n    (9)
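To make steps 2.2.1 through 2.2.4 concrete, the following is a compact sketch of graph-based manifold ranking over SLIC superpixels using scikit-image and NumPy. The graph here links only directly adjacent superpixels, and all parameter values (number of segments, α, σ²) are illustrative assumptions rather than values fixed by the method:

```python
import numpy as np
from skimage.color import rgb2lab
from skimage.segmentation import slic

def manifold_ranking_saliency(img_rgb, n_segments=300, alpha=0.99, sigma2=10.0):
    labels = slic(img_rgb, n_segments=n_segments, start_label=0)
    n = labels.max() + 1
    lab = rgb2lab(img_rgb)
    # mean LAB color c_i of every superpixel (one graph node each)
    colors = np.array([lab[labels == i].mean(axis=0) for i in range(n)])

    # weight matrix (5): w_ij = exp(-||c_i - c_j|| / sigma^2) on adjacent superpixels
    W = np.zeros((n, n))
    for a, b in ((labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])):
        for i, j in np.unique(np.stack([a.ravel(), b.ravel()], axis=1), axis=0):
            if i != j:
                W[i, j] = W[j, i] = np.exp(
                    -np.linalg.norm(colors[i] - colors[j]) / sigma2)

    D = np.diag(W.sum(axis=1))                 # degree matrix (6)
    A = np.linalg.inv(D - alpha * W)           # optimal affinity matrix

    def ranking(seeds):
        y = np.zeros(n)                        # indicator vector Y
        y[list(seeds)] = 1.0
        r = A @ y                              # ranking score r* = AY (7)
        return (r - r.min()) / (r.max() - r.min() + 1e-12)

    # stage 1 (8): each border as background seeds, S_e = 1 - normalized r*
    S_bq = np.ones(n)
    for edge in (labels[0], labels[-1], labels[:, 0], labels[:, -1]):
        S_bq *= 1.0 - ranking(set(edge.tolist()))

    # stage 2 (9): re-rank with foreground seeds taken above the mean saliency
    S_fq = ranking(set(np.flatnonzero(S_bq > S_bq.mean()).tolist()))
    return S_fq[labels]                        # per-pixel final saliency map
```

The returned per-pixel map can be scaled to gray values in [0, 255] before the thresholding of step 2.2.5.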
and 2.2.5, obtaining a saliency map through saliency target detection, wherein the foreground target and the background have clear saliency difference, the gray value of the foreground area is close to 255, and the gray value of the background area is close to 0. Thus, the division threshold T is set seg Make the gray value greater than T seg Is divided into portrait areas, less than T seg Is divided into background areas, and a portrait area coordinate set is respectively obtained
Figure BDA0003742361810000121
And background region coordinate set
Figure BDA0003742361810000122
u=1,2,...,U,v=1,2,...,V,U+V=W×H;
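A short sketch of the thresholding of step 2.2.5, assuming `S` is the final saliency map scaled to gray values in [0, 255]; the threshold value is illustrative:

```python
import numpy as np

def split_regions(S, T_seg=128):
    """Split pixels into portrait and background regions by the saliency threshold."""
    portrait_mask = S > T_seg            # gray value > T_seg: portrait region
    ys, xs = np.nonzero(portrait_mask)   # portrait coordinate set {(x_u, y_u)}
    yb, xb = np.nonzero(~portrait_mask)  # background coordinate set {(x_v, y_v)}
    return portrait_mask, list(zip(xs, ys)), list(zip(xb, yb))
```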
Further, the image post-processing of step 2.3 adjusts the size, background color, and other properties of the image I' according to preset parameters to obtain the electronic face photo meeting the preset requirements. The specific implementation process is as follows:
step 2.3.1, image
Figure BDA0003742361810000123
The center coordinate is
Figure BDA0003742361810000124
Replacing the RGB three-channel color vector c of the pixel point in the background area with a preset color vector c', keeping the foreground portrait area unchanged, and obtaining an image with the replaced background color
Figure BDA0003742361810000125
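A sketch of the background replacement of step 2.3.1, reusing the portrait mask from the previous sketch; the preset color vector c' shown here (a typical certificate-photo blue, in BGR order) is an assumption:

```python
def replace_background(image_bgr, portrait_mask, new_color=(219, 142, 67)):
    """Replace the color vector of every background pixel with the preset c'."""
    out = image_bgr.copy()
    out[~portrait_mask] = new_color   # background pixels only
    return out                        # foreground portrait region unchanged
```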
Step 2.3.2, the background-replaced image I'' is cropped to the preset size. From the facial key points obtained in step 1.1.2, the highest point (x_t, y_t), lowest point (x_b, y_b), leftmost point (x_l, y_l), and rightmost point (x_r, y_r) are taken, and the rectangular region bounded by the lines through these four points is taken as the face region, i.e. the rectangle with corners (x_l, y_t), (x_l, y_b), (x_r, y_t), (x_r, y_b). With the preset width and height of the output image denoted W' and H', the corner coordinates of the crop box are calculated by formulas (10) to (13):

x_1 = x_l - eW    (10)
x_2 = x_r + eW    (11)
y_1 = y_t - eT    (12)
y_2 = y_b + eB    (13)

where eW, eT, and eB are the distances from the face region to the left/right, top, and bottom edges of the output image, respectively:

eW = (W' - (x_r - x_l)) / 2    (14)
eT = ratio_t × (H' - (y_b - y_t))    (15)
eB = ratio_b × (H' - (y_b - y_t))    (16)

The calculation of eW ensures that the face region is centered horizontally in the output portrait photo; ratio_t and ratio_b represent the relative distances of the face region from the top and bottom, both take values in [0, 1], and ratio_t + ratio_b = 1, so eT and eB determine the distance from the top of the forehead to the upper edge of the output portrait photo and from the chin to the lower edge. The rectangular region within these coordinates is used as the crop box to crop the image I'', and the output image is the electronic face photo meeting the preset requirements.
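Finally, a sketch of the crop-box computation of formulas (10) to (16); ratio_t and ratio_b are preset parameters, and the default values here are illustrative:

```python
def crop_box(x_l, y_t, x_r, y_b, W_out, H_out, ratio_t=0.3, ratio_b=0.7):
    """Corner coordinates (x_1, y_1, x_2, y_2) of the crop box around the face region."""
    eW = (W_out - (x_r - x_l)) / 2        # centers the face horizontally (14)
    eT = ratio_t * (H_out - (y_b - y_t))  # headroom above the forehead   (15)
    eB = ratio_b * (H_out - (y_b - y_t))  # margin below the chin         (16)
    return (x_l - eW, y_t - eT, x_r + eW, y_b + eB)   # formulas (10)-(13)
```

By construction the box spans exactly W' × H', so cropping the background-replaced image to it directly yields the output electronic face photo.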
The invention provides an electronic face photo acquisition method based on key point and visual salient target detection. The image usability judgment fully considers requirements at different levels: the three verification stages, face uniqueness check, face integrity check, and image sharpness evaluation, judge the usability of the input image comprehensively before image processing. On the one hand, this avoids the computation time and resources wasted on invalid images; on the other hand, it guarantees that the processed output image is effectively usable, giving higher efficiency and stronger stability. At the same time, the invention adopts an end-to-end design: the original image captured by the camera is taken as input, and an electronic face photo conforming to the preset background and size is output automatically, avoiding additional manual post-processing and suiting large-scale electronic face photo collection scenarios.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the invention; meanwhile, a person skilled in the art may, following the idea of the present invention, vary the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the invention.

Claims (9)

1. An electronic face photo collection method based on key point and visual salient object detection is characterized by comprising the following steps:
performing, on the input image I, a face uniqueness check, a face integrity check, and an image quality check in sequence, and determining from the results an initial image that meets the requirements for producing an electronic photo;
performing face correction on the initial image to obtain a face image in a standard pose;
performing salient target detection on the standard-pose face image and segmenting the image according to the detection result to obtain a person image in which the person foreground is separated from the background;
performing image post-processing on the person image and adjusting the image size and background color to obtain an electronic portrait photo that meets the requirements.
2. The electronic face photo acquisition method of claim 1, wherein the face uniqueness check comprises: performing face detection on the input image I to obtain a set of face-region bounding boxes {f_m | m = 1, 2, ..., M}; if M = 1, the face uniqueness check passes, and if M ≠ 1, it fails.
3. The electronic face photo acquisition method of claim 2, wherein the face integrity check comprises: performing key point detection on the unique valid face f_1 to obtain 81 facial key points {(x_i, y_i) | i = 1, 2, ..., 81} and judging whether the conditions

max(x) < W, min(x) > 0, max(y) < H, min(y) > 0    (1)

are satisfied simultaneously, where W is the horizontal resolution of the image and H is the vertical resolution; if they are, the face integrity check passes, otherwise it fails.
4. The electronic face photo acquisition method of claim 3, wherein the image quality check comprises: converting the input image into a two-dimensional grayscale image; convolving the grayscale image with a Laplacian operator L and computing an image sharpness index q:

L = | 0  1  0 |
    | 1 -4  1 |
    | 0  1  0 |    (2)

q = std(C_L(x, y))    (3)

where C_L(x, y) is the Laplacian convolution response at each pixel (x, y) of the grayscale image and std() is the standard deviation; a sharpness threshold T_q is set, and the input image I passes the quality check if q > T_q and fails if q < T_q.
5. The electronic face photo acquisition method of claim 1, wherein obtaining the face image in the standard pose comprises: applying an affine transformation, defined by an affine transformation matrix, to the initial image to obtain the face image in the standard pose:

[x']   [a_1  -a_2] [x]   [a_3]
[y'] = [a_2   a_1] [y] + [a_4]    (4)

where (x, y) and (x', y') are the pixel coordinates of the input image I and of the affine-transformed image I', respectively, and the parameters a_1, a_2, a_3, a_4 are solved by substituting the coordinates (x_k, y_k), (x_k', y_k') of 5 key points in the input image I and in the standard-pose face image I'.
6. The electronic face photo acquisition method of claim 1, wherein obtaining the person image in which the person foreground is separated from the background comprises: segmenting the image into superpixels with the simple linear iterative clustering (SLIC) superpixel segmentation algorithm; constructing a graph model G(N, E) with the superpixels as nodes, where N is the node set and E is the set of undirected edges between nodes;

calculating the weight matrix Ω = {ω_ij | i, j = 1, 2, ..., n} of the graph model G and the degree matrix D = diag{d_11, ..., d_nn}:

ω_ij = exp(-||c_i - c_j|| / σ²)    (5)

d_ii = Σ_j ω_ij    (6)

where c_i and c_j are the mean colors of graph nodes i and j in the LAB color space and σ is a constant;

based on Ω and D, computing the optimal affinity matrix A = (D - αΩ)^(-1) and the ranking score r* = AY as the saliency values of the superpixels, where Y = [y_i | i = 1, 2, ..., n] is an indicator vector marking whether each node is a seed point: y_i = 1 if it is and 0 otherwise;

taking the superpixels on the top, bottom, left, and right edges of the image in turn as background seed points to obtain the initial saliency map S_bq, S_bq(i) = S_t(i) × S_b(i) × S_l(i) × S_r(i), i = 1, 2, ..., n, where S_t(i), S_b(i), S_l(i), S_r(i) are the saliency maps obtained with the top, bottom, left, and right edges as seed points, S_e(i) = 1 - r̄*_e(i), with r̄*_e the corresponding normalized saliency value vector;

reselecting the foreground seed points based on the initial saliency map and computing the final saliency map S_fq(i) = r̄*(i), i = 1, 2, ..., n;

thresholding the resulting final saliency map, which contains two parts with clearly different gray values, with a segmentation threshold T_seg: pixels with gray value greater than T_seg form the portrait region and pixels with gray value less than T_seg form the background region, giving the portrait region coordinate set {(x_u, y_u) | u = 1, 2, ..., U} and the background region coordinate set {(x_v, y_v) | v = 1, 2, ..., V}, with U + V = W × H.
7. The electronic face photo acquisition method of claim 6, wherein obtaining the person image in which the person foreground is separated from the background further comprises: performing salient target detection on the standard-pose face image and segmenting the person foreground from the background based on a saliency threshold applied to the detection result.
8. The electronic face photo acquisition method of claim 1, wherein adjusting the image background color comprises: replacing the RGB three-channel color vectors of the pixels in the background region of the image while keeping the pixels in the foreground person region unchanged, obtaining the image I'' with the replaced background color.
9. The electronic face photo acquisition method of claim 1, wherein resizing the image comprises: taking the highest point (x_t, y_t), lowest point (x_b, y_b), leftmost point (x_l, y_l), and rightmost point (x_r, y_r) of the facial key points; taking the rectangular region bounded by the lines through these four points as the face region; setting the preset width and height of the output image to W' and H'; and calculating the corner coordinates of the crop box by formulas (10) to (13):

x_1 = x_l - eW    (10)
x_2 = x_r + eW    (11)
y_1 = y_t - eT    (12)
y_2 = y_b + eB    (13)

where eW, eT, and eB are the distances from the face region to the left/right, top, and bottom edges of the output image, respectively:

eW = (W' - (x_r - x_l)) / 2    (14)
eT = ratio_t × (H' - (y_b - y_t))    (15)
eB = ratio_b × (H' - (y_b - y_t))    (16)

The calculation of eW ensures that the face region is centered horizontally in the output portrait photo; ratio_t and ratio_b represent the relative distances of the face region from the top and bottom, both take values in [0, 1], and ratio_t + ratio_b = 1, so eT and eB determine the distance from the top of the forehead to the upper edge of the output portrait photo and from the chin to the lower edge. The rectangular region within these coordinates is used as the crop box to crop the image, and the output image is the electronic face photo meeting the preset requirements.
CN202210820790.2A 2022-07-12 2022-07-12 Electronic face photo collection method based on key point and visual salient target detection Pending CN114973384A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210820790.2A CN114973384A (en) 2022-07-12 2022-07-12 Electronic face photo collection method based on key point and visual salient target detection


Publications (1)

Publication Number Publication Date
CN114973384A true CN114973384A (en) 2022-08-30

Family

ID=82970397

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210820790.2A Pending CN114973384A (en) 2022-07-12 2022-07-12 Electronic face photo collection method based on key point and visual salient target detection

Country Status (1)

Country Link
CN (1) CN114973384A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115509351A (en) * 2022-09-16 2022-12-23 上海仙视电子科技有限公司 Sensory linkage situational digital photo frame interaction method and system


Similar Documents

Publication Publication Date Title
CN110163114B (en) Method and system for analyzing face angle and face blurriness and computer equipment
Zhu et al. A three-pathway psychobiological framework of salient object detection using stereoscopic technology
JP6438403B2 (en) Generation of depth maps from planar images based on combined depth cues
US11087169B2 (en) Image processing apparatus that identifies object and method therefor
US8411932B2 (en) Example-based two-dimensional to three-dimensional image conversion method, computer readable medium therefor, and system
JP5538617B2 (en) Methods and configurations for multi-camera calibration
JP4234381B2 (en) Method and computer program product for locating facial features
CN113065558A (en) Lightweight small target detection method combined with attention mechanism
CN107909081B (en) Method for quickly acquiring and quickly calibrating image data set in deep learning
Zhang et al. Detecting and extracting the photo composites using planar homography and graph cut
JP2004078912A (en) Method for positioning face in digital color image
CN109711268B (en) Face image screening method and device
CN110473221B (en) Automatic target object scanning system and method
WO2018053952A1 (en) Video image depth extraction method based on scene sample library
CN110717936B (en) Image stitching method based on camera attitude estimation
CN108470178B (en) Depth map significance detection method combined with depth credibility evaluation factor
CN109815823B (en) Data processing method and related product
CN111695431A (en) Face recognition method, face recognition device, terminal equipment and storage medium
CN111553422A (en) Automatic identification and recovery method and system for surgical instruments
CN110120013A (en) A kind of cloud method and device
CN110909724A (en) Multi-target image thumbnail generation method
CN111222433A (en) Automatic face auditing method, system, equipment and readable storage medium
CN114973384A (en) Electronic face photo collection method based on key point and visual salient target detection
CN115222884A (en) Space object analysis and modeling optimization method based on artificial intelligence
CN116229189B (en) Image processing method, device, equipment and storage medium based on fluorescence endoscope

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination