CN111222473A - Analysis and recognition method for clustering faces in video - Google Patents

Analysis and recognition method for clustering faces in video

Info

Publication number
CN111222473A
CN111222473A (application CN202010022969.4A)
Authority
CN
China
Prior art keywords
preset
video frame
face
target
boundary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010022969.4A
Other languages
Chinese (zh)
Other versions
CN111222473B (en)
Inventor
冯中华
李岩超
相丹宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baimu Technology Co Ltd
Original Assignee
Beijing Baimu Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baimu Technology Co Ltd filed Critical Beijing Baimu Technology Co Ltd
Priority to CN202010022969.4A
Publication of CN111222473A
Application granted
Publication of CN111222473B
Active (legal status)
Anticipated expiration (legal status)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an analysis and recognition method for clustering faces in a video, which comprises the following steps: capturing a target face in a video stream to be identified; performing cluster analysis and recognition on the captured target face to obtain a cluster analysis and recognition result; and performing first preset marking on the same target face among all the captured target faces based on the cluster analysis recognition result, and classifying the same target face bearing the first preset marking as the same person. By adopting cluster analysis and recognition, classification of the same target face is realized with high classification efficiency.

Description

Analysis and recognition method for clustering faces in video
Technical Field
The invention relates to the technical field of face recognition, in particular to an analysis and recognition method for clustering faces in videos.
Background
With the rapid development of imaging devices, automatic face recognition has become an increasingly important task, and the prevalence of video poses new challenges to the face recognition problem.
At present, video face recognition generally performs image processing on video face images on the basis of a representative example or an image set and then classifies the processed images; however, the classification efficiency of this approach is low.
Disclosure of Invention
The invention provides an analysis and recognition method for clustering faces in a video, which is used for classifying the same target face by adopting clustering analysis and recognition and has high classification efficiency.
The embodiment of the invention provides an analysis and recognition method for clustering faces in a video, which comprises the following steps:
capturing a target face in a video stream to be identified;
performing cluster analysis and recognition on the captured target face to obtain a cluster analysis and recognition result;
and performing first preset marking on the same target face in all the captured target faces based on the cluster analysis recognition result, and classifying the same target face with the first preset marking as the same person.
In one possible implementation manner, the process of grabbing the target face in the video stream to be recognized includes:
carrying out video shooting on a preset area to obtain a video stream to be identified;
capturing a dynamic video frame in the video stream to be identified;
determining whether a target face exists in the dynamic video frame, if so, reserving the dynamic video frame, and capturing the target face in the dynamic video frame;
otherwise, removing the dynamic video frame.
In one possible implementation manner, when a target face exists in the dynamic video frame, the process of grabbing the target face in the dynamic video frame includes:
determining a pixel value of each plane coordinate point in the dynamic video frame based on a plane coordinate system;
for each plane coordinate point, acquiring the pixel values of all plane coordinate points in a preset comparison area around the current position, performing first comparison processing on the pixel value of the plane coordinate point at the current position, and judging whether the pixel association degree between the pixel value of the plane coordinate point at the current position and the pixel values of all plane coordinate points in the preset comparison area is greater than a preset association degree; if so, judging that the pixel value of the plane coordinate point at the current position is qualified;
otherwise, performing first correction processing on the pixel values of the plane coordinate point at the current position based on the pixel values of all the plane coordinate points in the preset comparison area and a pixel correction model until the pixel values of the plane coordinate point at the current position are qualified;
meanwhile, carrying out second preset marking on the plane coordinate points corresponding to all qualified pixel values in the same pixel range to obtain a marked image;
when the pixel value of each plane coordinate point in the plane coordinate system is judged to be qualified, determining a boundary area of the target face based on boundary information of a reference face acquired through historical video streaming in advance;
meanwhile, dynamically drawing a boundary line of the target face in the boundary area; when the area ratio between the drawn boundary line and the internal line of the boundary area is greater than a preset ratio, performing preset adjustment on the boundary line until the area ratio between the adjusted boundary line and the internal line of the boundary area is not greater than the preset ratio;
and grabbing boundary pixel values in all pixel values in a grabbing area formed by the boundary line and the internal line of the boundary area, and realizing grabbing of the target face in the dynamic video frame based on the boundary pixel values.
In a possible implementation manner, the performing cluster analysis and recognition on the captured target face to obtain a cluster analysis and recognition result includes:
acquiring a historical face image, and constructing a face feature model according to historical feature attributes contained in the historical face image;
and determining the target characteristic attribute of the captured target face based on the constructed face characteristic model, and performing cluster analysis and identification on the determined target characteristic attribute to obtain a cluster analysis and identification result.
In a possible implementation manner, the process of determining, based on the constructed face feature model, the target feature attributes of the captured target face includes:
cutting the captured target face to obtain a plurality of cutting area blocks;
calculating the difference degree between each cutting area block in each target face;
when the difference degree is smaller than a preset difference threshold value, acquiring the region block attribute of the cutting region block corresponding to the difference degree smaller than the preset difference threshold value based on the human face feature model;
when the difference degree is not smaller than a preset difference threshold value, performing second correction processing on the cutting area block with the difference degree not smaller than the preset difference threshold value according to a pre-established area block model to obtain an area block to be identified;
acquiring, based on the face feature model, the region block attribute of the region block to be identified;
and forming target characteristic attributes according to the acquired attributes of all the area blocks related to the target face.
In a possible implementation manner, the process of performing a first preset marking on the same target face among all the captured target faces based on the cluster analysis recognition result, and classifying the same target face with the first preset marking as the same person includes:
classifying the target human faces existing in all the dynamic video frames in the acquired video stream to be identified according to the clustering analysis result;
carrying out first preset marking on target faces existing in the classified similar dynamic video frames;
and obtaining the same target face according to the first preset marking result.
In a possible implementation manner, when an area ratio from a drawn boundary line to an internal line of the boundary area is greater than a preset ratio, the process of performing preset adjustment on the boundary line includes:
determining all boundary lines drawn in the boundary area and connectors of all the boundary lines;
judging whether the inclination angle of the current boundary line in all the determined boundary lines exceeds a preset angle range or not based on a pre-trained line inclination model;
if so, adjusting the inclination angle of the current boundary line to a first preset angle based on the first connecting head of the current boundary line;
meanwhile, when the inclination angle of the current boundary line is adjusted, the inclination angle of the next boundary line connected with the second connector of the current boundary line is adjusted to a second preset angle until all the boundary lines are adjusted;
wherein all of the boundary lines constitute a critical region.
In a possible implementation manner, while determining all boundary lines drawn in the boundary area and all connectors of the boundary lines, the method further includes:
carrying out line identity setting on the determined boundary line and head identity setting on the connector, and storing the line identity and the head identity into an identity database;
when the boundary line is subjected to preset adjustment, the line identity and the head identity of the boundary line subjected to preset adjustment are called from the identity database and are stored in a first type of database, and meanwhile, the line identity and the head identity of the boundary line not subjected to preset adjustment are stored in a second type of database;
and acquiring corresponding boundary lines and connectors based on the stored first-class database and the second-class database, so as to realize the training of the line inclination model.
In one embodiment, the steps of capturing a target face in a video stream to be identified and performing cluster analysis and recognition on the captured target face to obtain a cluster analysis and recognition result include:
determining gray level histograms corresponding to all video frames in the video stream to be identified;
extracting N reference video frames from all the video frames according to a time sequence, wherein P video frames are arranged between two adjacent reference video frames in the N reference video frames;
for each reference video frame, calculating a matching degree between the current reference video frame and each of the P video frames following the current reference video frame according to a preset calculation method, where the preset calculation method includes steps S1-S6:
s1, determining a first gray level histogram of the current ith reference video frame;
s2, determining a second gray level histogram of each video frame in the P video frames after the current ith reference video frame;
s3, respectively calculating a matching degree between the first gray level histogram of the current i-th reference video frame and the second gray level histogram of the j-th video frame in the P video frames, where the calculating formula of the matching degree is as follows (1):
[Formula (1) appears only as an image in the source and is not reproduced here; it defines the matching degree R_ij in terms of the quantities listed below.]
wherein R_ij is the matching degree between the first gray-level histogram of the current i-th reference video frame and the second gray-level histogram of the j-th video frame; A_i is the discrete distribution function of the first gray-level histogram of the current i-th reference video frame; B_j is the discrete distribution function of the second gray-level histogram of the j-th video frame; M_i is the total number of pixels of the current i-th reference video frame; and M_j is the total number of pixels of the j-th video frame;
s4, determining U video frames with the matching degree in a preset numerical range in the P video frames;
s5, taking the U video frames as a class of face images, randomly extracting one video frame from the U video frames for face recognition, and obtaining a class of face images corresponding to the U video frames;
s6, determining (P-U) video frames with the matching degree outside a preset numerical range in the P video frames, and performing face recognition on a first target video frame with the face feature missing amount not reaching the preset missing amount in the (P-U) video frames to obtain faces corresponding to the first target video frame;
when a second target video frame whose face feature missing amount reaches the preset missing amount exists in the (P-U) video frames, judging whether the missing face features are missing due to shooting illumination; if so, determining a third gray histogram of the second target video frame; when the minimum gray value of the third gray histogram is equal to or smaller than a first preset gray value, lowering the gray value of each pixel of the first gray histogram corresponding to each of the N reference video frames by a preset amount, so that the minimum gray value of the processed first gray histogram corresponding to each reference video frame equals the minimum gray value of the third gray histogram; when the maximum gray value of the third gray histogram is equal to or greater than a second preset gray value (the second preset gray value being greater than the first preset gray value), raising the gray value of each pixel of the first gray histogram corresponding to each of the N reference video frames by a preset amount, so that the maximum gray value of the processed first gray histogram corresponding to each reference video frame equals the maximum gray value of the third gray histogram;
calculating the correlation degree between the processed first gray level histogram corresponding to each reference video frame and the third gray level histogram of the second target video frame according to the following formula (2):
[Formula (2) appears only as an image in the source and is not reproduced here; it defines the correlation degree S_ik in terms of the quantities listed below.]
wherein S_ik is the correlation degree between the processed first gray-level histogram corresponding to the i-th reference video frame and the third gray-level histogram of the k-th second target video frame; t is the abscissa of a pixel; r is the ordinate of a pixel; T is the maximum value of the pixel abscissa; R is the maximum value of the pixel ordinate; Q(t, r) is the gray value of the processed first gray-level histogram corresponding to the i-th reference video frame at pixel point (t, r); and P(t, r) is the gray value of the third gray-level histogram of the k-th second target video frame at pixel point (t, r);
and when the correlation degree between the processed first gray level histogram corresponding to the ith reference video frame and the third gray level histogram of the kth second target video frame is equal to or greater than the preset correlation degree, determining that the face corresponding to the kth second target video frame and the face corresponding to the ith reference video frame belong to a class of faces.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
fig. 1 is a flowchart of an analysis and recognition method for clustering faces in a video according to an embodiment of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.
The embodiment of the invention provides an analysis and identification method for clustering faces in a video, which comprises the following steps of:
step 1: capturing a target face in a video stream to be identified;
step 2: performing cluster analysis and recognition on the captured target face to obtain a cluster analysis and recognition result;
and step 3: and performing first preset marking on the same target face in all the captured target faces based on the cluster analysis recognition result, and classifying the same target face with the first preset marking as the same person.
The video stream to be recognized can be any video segment including a human face;
the target face is a face in a video, and is required to be subjected to cluster analysis and recognition.
The cluster analysis is to classify the human faces according to the characteristics of the human faces, to ensure that individuals in the same class have higher similarity, and to classify all the individuals in the class as the same person;
the first preset marking of the same target face is to mark the same type of target face, so that the target face is convenient to identify quickly.
The cluster analysis recognition result refers to multiple classes of faces, and each class of faces is obtained by grouping faces with higher similarity together.
The beneficial effects of the above technical scheme are: by adopting clustering analysis and recognition, the classification of the same target face is realized, and the classification efficiency is high.
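To make the three-step flow concrete, here is a minimal Python sketch; the helper callables capture_target_faces, extract_embedding and cluster are hypothetical stand-ins for the capture, feature and clustering operations detailed in the embodiments below, not the patent's own implementation.

```python
# A minimal sketch of the three-step flow; all helper callables are
# hypothetical stand-ins for the operations detailed in the embodiments.
from collections import defaultdict

def cluster_and_mark(video_stream, capture_target_faces, extract_embedding, cluster):
    faces = capture_target_faces(video_stream)               # step 1: capture target faces
    labels = cluster([extract_embedding(f) for f in faces])  # step 2: cluster analysis
    groups = defaultdict(list)
    for face, label in zip(faces, labels):
        groups[label].append(face)                           # step 3: same mark => same person
    return groups
```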
The embodiment of the invention provides an analysis and identification method for clustering faces in a video, wherein the process of capturing a target face in a video stream to be identified comprises the following steps:
carrying out video shooting on a preset area to obtain a video stream to be identified;
capturing a dynamic video frame in the video stream to be identified;
determining whether a target face exists in the dynamic video frame, if so, reserving the dynamic video frame, and capturing the target face in the dynamic video frame;
otherwise, removing the dynamic video frame.
The preset area may be any area, for example an office area. The corresponding video stream to be identified is then the area video stream of that office area, i.e., the video stream formed by shooting the area over a period of time; the corresponding dynamic video frames are the images obtained by cutting the area video stream at a preset time interval, for example 50 ms/frame, each image corresponding to one dynamic video frame.
The beneficial effects of the above technical scheme are: the dynamic video frames are reserved or removed to improve the efficiency of cluster analysis and identification, and meanwhile, the occupation of storage space can be effectively reduced by deleting the dynamic video frames without target faces.
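As an illustration of this frame-capture step, the following sketch assumes OpenCV is available and uses a Haar cascade as a stand-in for the face detector; the 50 ms sampling interval follows the example above.

```python
# A sketch of frame capture and filtering, assuming OpenCV (cv2) is
# installed; the Haar cascade is an illustrative face-detector choice.
import cv2

def capture_dynamic_frames(video_path, interval_ms=50):
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap = cv2.VideoCapture(video_path)
    kept, t = [], 0.0
    while True:
        cap.set(cv2.CAP_PROP_POS_MSEC, t)   # cut the stream every interval_ms
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(faces) > 0:
            kept.append((t, frame, faces))  # keep frames containing a target face
        t += interval_ms                    # frames without a face are discarded
    cap.release()
    return kept
```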
The embodiment of the invention provides an analysis and identification method for clustering faces in a video, wherein when a target face exists in a dynamic video frame, the process of capturing the target face in the dynamic video frame comprises the following steps:
determining a pixel value of each plane coordinate point in the dynamic video frame based on a plane coordinate system;
for each plane coordinate point, acquiring the pixel values of all plane coordinate points in a preset comparison area around the current position, performing first comparison processing on the pixel value of the plane coordinate point at the current position, and judging whether the pixel association degree between the pixel value of the plane coordinate point at the current position and the pixel values of all plane coordinate points in the preset comparison area is greater than a preset association degree; if so, judging that the pixel value of the plane coordinate point at the current position is qualified;
otherwise, performing first correction processing on the pixel values of the plane coordinate point at the current position based on the pixel values of all the plane coordinate points in the preset comparison area and a pixel correction model until the pixel values of the plane coordinate point at the current position are qualified;
meanwhile, carrying out second preset marking on the plane coordinate points corresponding to all qualified pixel values in the same pixel range to obtain a marked image;
when the pixel value of each plane coordinate point in the plane coordinate system is judged to be qualified, determining a boundary area of the target face based on boundary information of a reference face acquired through historical video streaming in advance;
meanwhile, dynamically drawing a boundary line of the target face in the boundary area; when the area ratio between the drawn boundary line and the internal line of the boundary area is greater than a preset ratio, performing preset adjustment on the boundary line until the area ratio between the adjusted boundary line and the internal line of the boundary area is not greater than the preset ratio;
and grabbing boundary pixel values in all pixel values in a grabbing area formed by the boundary line and the internal line of the boundary area, and realizing grabbing of the target face in the dynamic video frame based on the boundary pixel values.
The plane coordinate system is a two-dimensional coordinate system established on the dynamic video frame; the dynamic video frame comprises a plurality of coordinate points, each with a corresponding pixel value, which can be obtained through gray-scale processing, binarization and the like;
The preset comparison area at the current position is a preset region, contained in the target face, that does not include the current position itself. For example, if the current position is a and the pixel value at a is a1, the preset comparison area may be the region formed by coordinate points whose pixel values equal a1 or lie close to a1 (for example, within an absolute difference of 10 from a1); the preset comparison area is set in advance and may be any area of the target face other than the position a.
In the first comparison processing, if the pixel association degree between the pixel value a1 of the plane coordinate point at the current position a and the pixel values of all plane coordinate points in the preset comparison area is greater than a preset association degree (set in advance, generally 80% or more), the pixel value a1 of the plane coordinate point at the current position a is judged to be qualified;
The pixel correction model is used for correcting pixel values: in the process of obtaining pixel values, conversion defects and other factors may arise during gray-scale conversion, so the pixel values need to be corrected;
The second preset marking is performed on the plane coordinate points corresponding to all qualified pixel values in the same pixel range to obtain a marked image. For example, for the pixel range [100, 150], all coordinate points whose qualified pixel values fall in this range are marked with the same color, forming a marked image that makes the related information easy to grasp in time;
determining a boundary region of the target face according to the boundary information of the reference face acquired through the historical video stream, where the historical video stream is a video related to the video stream to be recognized, the reference face is a sample face that has been captured, and the boundary information refers to information related to the captured reference face, such as: a boundary region of a reference face;
the drawing of the boundary line of the target face is to reduce the proportion of the boundary region in the captured target face, and when the region proportion from the drawn boundary line to the internal line of the boundary region is greater than the preset proportion.
The beneficial effects of the above technical scheme are: the second preset mark makes the relevant information easy to grasp in time; drawing the boundary line of the target face reduces the occupation ratio of the boundary region in the captured target face; the preset adjustment of the boundary line tunes that ratio and further improves the boundary precision of the captured target face; and determining the boundary pixel values among all pixel values in the grabbing area makes the boundary determination more accurate.
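The following sketch illustrates the pixel-qualification idea under two stated assumptions: the pixel association degree is modelled as the similarity between a pixel and the mean of its preset comparison area (here a small neighbourhood), and the pixel correction model is replaced by substituting that mean. Neither choice is fixed by the patent.

```python
# A sketch of the pixel-qualification step; the association measure and
# the correction rule below are illustrative assumptions.
import numpy as np

def qualify_pixels(gray, radius=2, preset_assoc=0.8):
    h, w = gray.shape
    out = gray.astype(np.float64).copy()
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            window = out[y0:y1, x0:x1].ravel()
            centre = (y - y0) * (x1 - x0) + (x - x0)
            area = np.delete(window, centre)          # comparison area excludes the point
            assoc = 1.0 - abs(out[y, x] - area.mean()) / 255.0
            if assoc <= preset_assoc:                 # not qualified:
                out[y, x] = area.mean()               # first correction processing
    return out.astype(np.uint8)
```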
The embodiment of the invention provides an analysis and recognition method for clustering faces in a video, wherein the process of clustering, analyzing and recognizing the captured target faces to obtain a clustering, analyzing and recognizing result comprises the following steps:
acquiring a historical face image, and constructing a face feature model according to historical feature attributes contained in the historical face image;
and determining the target characteristic attribute of the captured target face based on the constructed face characteristic model, and performing cluster analysis and identification on the determined target characteristic attribute to obtain a cluster analysis and identification result.
The above historical characteristic attributes are as follows: characteristic attributes of eyebrows, eyes, nose, mouth, etc. in a human face;
the face feature model is trained in advance based on a neural network.
The cluster analysis recognition result is, for example, a recognition result of eyebrows, eyes, a nose, a mouth, or the like of the same person, and the eyebrows, the eyes, the nose, the mouth, or the like are classified as the same person due to high similarity.
The beneficial effects of the above technical scheme are: the efficiency of cluster analysis and recognition is improved by determining the target characteristic attribute of the target face.
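As an illustration, the sketch below assumes the face feature model yields one embedding vector per captured face and uses DBSCAN as a stand-in for the cluster analysis; both choices are assumptions, not the patent's specific model.

```python
# A sketch of cluster analysis on face feature attributes; the embedding
# representation and DBSCAN are illustrative substitutes.
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_faces(embeddings, eps=0.5):
    X = np.asarray(embeddings, dtype=np.float64)
    X /= np.linalg.norm(X, axis=1, keepdims=True)    # unit-normalise the features
    labels = DBSCAN(eps=eps, min_samples=2).fit_predict(X)
    return labels                                    # equal labels => same person; -1 = noise
```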
The embodiment of the invention provides an analysis and identification method for clustering faces in a video, wherein the process of determining the target feature attribute of the captured target face based on the constructed face feature model comprises the following steps:
cutting the captured target face to obtain a plurality of cutting area blocks;
calculating the difference degree between each cutting area block in each target face;
when the difference degree is smaller than a preset difference threshold value, acquiring the region block attribute of the cutting region block corresponding to the difference degree smaller than the preset difference threshold value based on the human face feature model;
when the difference degree is not smaller than a preset difference threshold value, performing second correction processing on the cutting area block with the difference degree not smaller than the preset difference threshold value according to a pre-established area block model to obtain an area block to be identified;
acquiring, based on the face feature model, the region block attribute of the region block to be identified;
and forming target characteristic attributes according to the acquired attributes of all the area blocks related to the target face.
The cutting processing cuts the target face into a plurality of cutting region blocks such as eyebrows, eyes, nose and mouth; the difference degree is, for example, the inverse of the similarity between such region blocks, and the preset difference threshold is set in advance, for example at a difference of 0-10%;
The region block attribute refers, for example, to eye attributes for the eye region, such as eye age, eye expression and the like;
The second correction processing of a cutting region block yields the region block to be identified by adjusting the cutting line of that block, for example by enlarging the area range enclosed by the cutting line;
All the region block attributes together form the target feature attributes, i.e., the feature attributes of the target face.
The beneficial effects of the above technical scheme are: the cutting processing and the second correction processing make it convenient to adjust the cutting regions and their boundaries, improve the accuracy of the acquired cutting regions, help capture complete target feature attributes, and provide a basis for the subsequent classification of the same target.
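A sketch of the cutting and difference-degree computation, assuming a regular grid of region blocks and a normalised mean absolute pixel difference as the difference degree; the patent fixes neither choice.

```python
# A sketch of the cutting step; grid layout and difference measure are
# illustrative assumptions.
import numpy as np

def cut_blocks(face, rows=4, cols=4):
    h, w = face.shape[:2]
    return [face[r * h // rows:(r + 1) * h // rows,
                 c * w // cols:(c + 1) * w // cols]
            for r in range(rows) for c in range(cols)]

def difference_degree(block_a, block_b):
    a = np.asarray(block_a, dtype=np.float64)
    b = np.asarray(block_b, dtype=np.float64)
    if a.shape != b.shape:                       # unequal blocks: compare means only
        return abs(a.mean() - b.mean()) / 255.0
    return np.abs(a - b).mean() / 255.0          # 0 = identical, 1 = maximal difference
```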
The embodiment of the invention provides an analysis and recognition method for clustering faces in a video, which is characterized in that based on the clustering analysis and recognition result, the process of carrying out first preset marking on the same target face in all the captured target faces and classifying the same target face with the first preset marking as the same person comprises the following steps:
classifying the target human faces existing in all the dynamic video frames in the acquired video stream to be identified according to the clustering analysis result;
carrying out first preset marking on target faces existing in the classified similar dynamic video frames;
and obtaining the same target face according to the first preset marking result.
The classification processing may be to classify the eyes, nose, mouth, and other parts, and classify the eyes, nose, and mouth with high similarity into the same person;
or the human faces with high similarity are classified as the same person directly according to the human face images.
The first preset mark may be a color mark such as a highlight mark.
The beneficial effects of the above technical scheme are: the same target face can be conveniently obtained through the first preset mark.
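As an illustration of the first preset marking, the sketch below assumes the mark is a coloured highlight rectangle with a person label, one colour per cluster; the OpenCV drawing calls are an assumption for display purposes.

```python
# A sketch of the first preset marking; the highlight style is assumed.
import cv2

PALETTE = [(0, 255, 255), (255, 0, 255), (255, 255, 0), (0, 128, 255)]

def mark_frame(frame, boxes, labels):
    for (x, y, w, h), label in zip(boxes, labels):
        color = PALETTE[label % len(PALETTE)]    # one colour per person class
        cv2.rectangle(frame, (x, y), (x + w, y + h), color, 2)
        cv2.putText(frame, "person %d" % label, (x, max(0, y - 5)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 1)
    return frame
```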
The embodiment of the invention provides an analysis and identification method for clustering faces in a video, wherein when the area ratio of drawn boundary lines to internal lines of the boundary area is greater than the preset ratio, the process of presetting and adjusting the boundary lines comprises the following steps:
determining all boundary lines drawn in the boundary area and connectors of all the boundary lines;
judging whether the inclination angle of the current boundary line in all the determined boundary lines exceeds a preset angle range or not based on a pre-trained line inclination model;
if so, adjusting the inclination angle of the current boundary line to a first preset angle based on the first connecting head of the current boundary line;
meanwhile, when the inclination angle of the current boundary line is adjusted, the inclination angle of the next boundary line connected with the second connector of the current boundary line is adjusted to a second preset angle until all the boundary lines are adjusted;
wherein all of the boundary lines constitute a critical region.
If the number of the boundary lines is n, the number of the corresponding connectors is n, and the corresponding critical area should be a closed area.
For example, suppose the preset angle range is [-10°, 10°]. If the inclination angle of the current boundary line is 20°, it is adjusted from 20° to the first preset angle of 10°; likewise, if the inclination angle of the next boundary line is -20°, it is adjusted from -20° to the second preset angle of -10°;
For example, the joint of the current boundary line and the upper boundary line is the first connector, and the joint of the current boundary line and the lower boundary line is the second connector.
The beneficial effects of the above technical scheme are: determining the inclination angle of the current boundary line allows the boundary line to be adjusted so that the area ratio from the drawn boundary line to the internal line of the boundary area is not larger than the preset ratio, improving the effectiveness of capturing the boundary of the target face; obtaining and adjusting the boundary lines and connectors improves the boundary accuracy of the constructed critical area.
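The adjustment can be sketched as follows, assuming each out-of-range inclination angle is clamped to the nearest bound of the preset range (matching the 20° to 10° example above); the pre-trained line inclination model is replaced here by a direct range check.

```python
# A sketch of the preset adjustment of boundary-line inclination angles;
# clamping to the range bound is an assumed adjustment rule.
def adjust_boundary_lines(angles, low=-10.0, high=10.0):
    adjusted = []
    for angle in angles:            # walk the lines connector by connector
        if angle > high:
            angle = high            # e.g. first preset angle
        elif angle < low:
            angle = low             # e.g. second preset angle
        adjusted.append(angle)
    return adjusted                 # the adjusted lines close the critical region
```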
The embodiment of the invention provides an analysis and identification method for clustering faces in a video, which determines all boundary lines drawn in the boundary area and connectors of all the boundary lines, and simultaneously comprises the following steps:
carrying out line identity setting on the determined boundary line and head identity setting on the connector, and storing the line identity and the head identity into an identity database;
when the boundary line is subjected to preset adjustment, the line identity and the head identity of the boundary line subjected to preset adjustment are called from the identity database and are stored in a first type of database, and meanwhile, the line identity and the head identity of the boundary line not subjected to preset adjustment are stored in a second type of database;
and acquiring corresponding boundary lines and connectors based on the stored first-class database and the second-class database, so as to realize the training of the line inclination model.
The line identities and head identities are set to give each line and each connector an ID, which makes retrieval convenient and improves processing efficiency;
The first type of database stores the line identities and head identities that have been adjusted;
The second type of database stores the unadjusted line identities and head identities, which makes it convenient to readjust the adjusted lines and connectors in the follow-up process.
The beneficial effects of the above technical scheme are: by expanding the samples, the training of the line inclination model is convenient to realize, and the recognition precision of the line inclination model is improved.
In one embodiment, the steps of capturing a target face in a video stream to be identified and performing cluster analysis and recognition on the captured target face to obtain a cluster analysis and recognition result include:
determining gray level histograms corresponding to all video frames in the video stream to be identified;
extracting N reference video frames from all the video frames according to a time sequence, wherein P video frames are arranged between two adjacent reference video frames in the N reference video frames;
for each reference video frame, calculating a matching degree between the current reference video frame and each of the P video frames following the current reference video frame according to a preset calculation method, where the preset calculation method includes steps S1-S6:
s1, determining a first gray level histogram of the current ith reference video frame;
s2, determining a second gray level histogram of each video frame in the P video frames after the current ith reference video frame;
s3, respectively calculating a matching degree between the first gray level histogram of the current i-th reference video frame and the second gray level histogram of the j-th video frame in the P video frames, where the calculating formula of the matching degree is as follows (1):
[Formula (1) appears only as an image in the source and is not reproduced here; it defines the matching degree R_ij in terms of the quantities listed below.]
wherein R_ij is the matching degree between the first gray-level histogram of the current i-th reference video frame and the second gray-level histogram of the j-th video frame; A_i is the discrete distribution function of the first gray-level histogram of the current i-th reference video frame; B_j is the discrete distribution function of the second gray-level histogram of the j-th video frame; M_i is the total number of pixels of the current i-th reference video frame; and M_j is the total number of pixels of the j-th video frame;
s4, determining U video frames with the matching degree in a preset numerical range in the P video frames;
s5, taking the U video frames as a class of face images, randomly extracting one video frame from the U video frames for face recognition, and obtaining a class of face images corresponding to the U video frames;
s6, determining (P-U) video frames with the matching degree outside a preset numerical range in the P video frames, and performing face recognition on a first target video frame with the face feature missing amount not reaching the preset missing amount in the (P-U) video frames to obtain faces corresponding to the first target video frame;
when a second target video frame whose face feature missing amount reaches the preset missing amount exists in the (P-U) video frames, judging whether the missing face features are missing due to shooting illumination; if so, determining a third gray histogram of the second target video frame; when the minimum gray value of the third gray histogram is equal to or smaller than a first preset gray value, lowering the gray value of each pixel of the first gray histogram corresponding to each of the N reference video frames by a preset amount, so that the minimum gray value of the processed first gray histogram corresponding to each reference video frame equals the minimum gray value of the third gray histogram; when the maximum gray value of the third gray histogram is equal to or greater than a second preset gray value (the second preset gray value being greater than the first preset gray value), raising the gray value of each pixel of the first gray histogram corresponding to each of the N reference video frames by a preset amount, so that the maximum gray value of the processed first gray histogram corresponding to each reference video frame equals the maximum gray value of the third gray histogram;
calculating the correlation degree between the processed first gray level histogram corresponding to each reference video frame and the third gray level histogram of the second target video frame according to the following formula (2):
[Formula (2) appears only as an image in the source and is not reproduced here; it defines the correlation degree S_ik in terms of the quantities listed below.]
wherein S_ik is the correlation degree between the processed first gray-level histogram corresponding to the i-th reference video frame and the third gray-level histogram of the k-th second target video frame; t is the abscissa of a pixel; r is the ordinate of a pixel; T is the maximum value of the pixel abscissa; R is the maximum value of the pixel ordinate; Q(t, r) is the gray value of the processed first gray-level histogram corresponding to the i-th reference video frame at pixel point (t, r); and P(t, r) is the gray value of the third gray-level histogram of the k-th second target video frame at pixel point (t, r);
and when the correlation degree between the processed first gray level histogram corresponding to the ith reference video frame and the third gray level histogram of the kth second target video frame is equal to or greater than the preset correlation degree, determining that the face corresponding to the kth second target video frame and the face corresponding to the ith reference video frame belong to a class of faces.
The beneficial effects of the above technical scheme are: the class of face contained in each video frame of the video stream is calculated rapidly by using the matching degree between gray-level histograms; moreover, when a video frame with serious facial feature loss is encountered, the class of face it contains can still be calculated intelligently and conveniently, which improves clustering efficiency, reduces the probability that the face class of some video frames cannot be determined, and improves the completeness of the clustering result.
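As an illustration of steps S1-S5, the sketch below uses OpenCV's correlation comparison as an assumed stand-in for formula (1), whose exact form is published only as an image; P and the matching range are illustrative parameters.

```python
# A sketch of histogram-based frame grouping; the matching measure is an
# assumed substitute for the patent's formula (1).
import cv2

def gray_hist(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return cv2.calcHist([gray], [0], None, [256], [0, 256])  # float32, shape (256, 1)

def group_by_reference(frames, P=10, match_range=(0.9, 1.0)):
    groups = []
    for i in range(0, len(frames), P + 1):          # every (P+1)-th frame is a reference
        ref = gray_hist(frames[i])
        same, differing = [i], []
        for j in range(i + 1, min(i + 1 + P, len(frames))):
            m = cv2.compareHist(ref, gray_hist(frames[j]), cv2.HISTCMP_CORREL)
            if match_range[0] <= m <= match_range[1]:
                same.append(j)                      # the U frames: one class of faces
            else:
                differing.append(j)                 # the (P-U) frames: recognised separately
        groups.append((same, differing))
    return groups
```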
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (9)

1. An analysis and recognition method for clustering faces in a video, characterized by comprising the following steps:
capturing a target face in a video stream to be identified;
performing cluster analysis and recognition on the captured target face to obtain a cluster analysis and recognition result;
and performing first preset marking on the same target face in all the captured target faces based on the cluster analysis recognition result, and classifying the same target face with the first preset marking as the same person.
2. The analysis and recognition method according to claim 1, wherein the process of grabbing the target face in the video stream to be identified comprises:
carrying out video shooting on a preset area to obtain a video stream to be identified;
capturing a dynamic video frame in the video stream to be identified;
determining whether a target face exists in the dynamic video frame, if so, reserving the dynamic video frame, and capturing the target face in the dynamic video frame;
otherwise, removing the dynamic video frame.
3. The analysis and recognition method according to claim 2, wherein when the target face exists in the dynamic video frame, the process of grabbing the target face in the dynamic video frame comprises:
determining a pixel value of each plane coordinate point in the dynamic video frame based on a plane coordinate system;
for each plane coordinate point, acquiring the pixel values of all plane coordinate points in a preset comparison area around the current position, performing first comparison processing on the pixel value of the plane coordinate point at the current position, and judging whether the pixel association degree between the pixel value of the plane coordinate point at the current position and the pixel values of all plane coordinate points in the preset comparison area is greater than a preset association degree; if so, judging that the pixel value of the plane coordinate point at the current position is qualified;
otherwise, performing first correction processing on the pixel values of the plane coordinate point at the current position based on the pixel values of all the plane coordinate points in the preset comparison area and a pixel correction model until the pixel values of the plane coordinate point at the current position are qualified;
meanwhile, carrying out second preset marking on the plane coordinate points corresponding to all qualified pixel values in the same pixel range to obtain a marked image;
when the pixel value of each plane coordinate point in the plane coordinate system is judged to be qualified, determining a boundary area of the target face based on boundary information of a reference face acquired through historical video streaming in advance;
meanwhile, dynamically drawing a boundary line of the target face in the boundary area; when the area ratio between the drawn boundary line and the internal line of the boundary area is greater than a preset ratio, performing preset adjustment on the boundary line until the area ratio between the adjusted boundary line and the internal line of the boundary area is not greater than the preset ratio;
and grabbing boundary pixel values in all pixel values in a grabbing area formed by the boundary line and the internal line of the boundary area, and realizing grabbing of the target face in the dynamic video frame based on the boundary pixel values.
4. The analysis and recognition method according to claim 1, wherein the process of performing cluster analysis and recognition on the captured target face to obtain a cluster analysis and recognition result comprises:
acquiring a historical face image, and constructing a face feature model according to historical feature attributes contained in the historical face image;
and determining the target characteristic attribute of the captured target face based on the constructed face characteristic model, and performing cluster analysis and identification on the determined target characteristic attribute to obtain a cluster analysis and identification result.
5. The analysis and recognition method according to claim 4, wherein the process of determining the target feature attributes of the captured target face based on the constructed face feature model comprises:
cutting the captured target face to obtain a plurality of cutting area blocks;
calculating the difference degree between each cutting area block in each target face;
when the difference degree is smaller than a preset difference threshold value, acquiring the region block attribute of the cutting region block corresponding to the difference degree smaller than the preset difference threshold value based on the human face feature model;
when the difference degree is not smaller than a preset difference threshold value, performing second correction processing on the cutting area block with the difference degree not smaller than the preset difference threshold value according to a pre-established area block model to obtain an area block to be identified;
acquiring, based on the face feature model, the region block attribute of the region block to be identified;
and forming target characteristic attributes according to the acquired attributes of all the area blocks related to the target face.
6. The analysis and recognition method according to claim 1, wherein the process of performing a first preset marking on the same target face among all the captured target faces based on the cluster analysis recognition result, and classifying the same target face with the first preset marking as the same person comprises:
classifying the target human faces existing in all the dynamic video frames in the acquired video stream to be identified according to the clustering analysis result;
carrying out first preset marking on target faces existing in the classified similar dynamic video frames;
and obtaining the same target face according to the first preset marking result.
7. The analysis and recognition method according to claim 3, wherein when the area ratio from the drawn boundary line to the internal line of the boundary area is greater than a preset ratio, the process of performing preset adjustment on the boundary line comprises:
determining all boundary lines drawn in the boundary area and connectors of all the boundary lines;
judging whether the inclination angle of the current boundary line in all the determined boundary lines exceeds a preset angle range or not based on a pre-trained line inclination model;
if so, adjusting the inclination angle of the current boundary line to a first preset angle based on the first connecting head of the current boundary line;
meanwhile, when the inclination angle of the current boundary line is adjusted, the inclination angle of the next boundary line connected with the second connector of the current boundary line is adjusted to a second preset angle until all the boundary lines are adjusted;
wherein all of the boundary lines constitute a critical region.
8. The analysis and recognition method according to claim 7, wherein, while determining all the boundary lines drawn in the boundary region and all the connectors of the boundary lines, the method further comprises:
carrying out line identity setting on the determined boundary line and head identity setting on the connector, and storing the line identity and the head identity into an identity database;
when the boundary line is subjected to preset adjustment, the line identity and the head identity of the boundary line subjected to preset adjustment are called from the identity database and are stored in a first type of database, and meanwhile, the line identity and the head identity of the boundary line not subjected to preset adjustment are stored in a second type of database;
and acquiring corresponding boundary lines and connectors based on the stored first-class database and the second-class database, so as to realize the training of the line inclination model.
9. The analysis and recognition method according to claim 1, wherein the steps of capturing a target face in a video stream to be identified and performing cluster analysis and recognition on the captured target face to obtain a cluster analysis and recognition result comprise:
determining gray level histograms corresponding to all video frames in the video stream to be identified;
extracting N reference video frames from all the video frames according to a time sequence, wherein P video frames are arranged between two adjacent reference video frames in the N reference video frames;
for each reference video frame, calculating a matching degree between the current reference video frame and each of the P video frames following the current reference video frame according to a preset calculation method, where the preset calculation method includes steps S1-S6:
s1, determining a first gray level histogram of the current ith reference video frame;
s2, determining a second gray level histogram of each video frame in the P video frames after the current ith reference video frame;
s3, respectively calculating a matching degree between the first gray level histogram of the current i-th reference video frame and the second gray level histogram of the j-th video frame in the P video frames, where the calculating formula of the matching degree is as follows (1):
[Formula (1) appears only as an image in the source and is not reproduced here; it defines the matching degree R_ij in terms of the quantities listed below.]
wherein R_ij is the matching degree between the first gray-level histogram of the current i-th reference video frame and the second gray-level histogram of the j-th video frame; A_i is the discrete distribution function of the first gray-level histogram of the current i-th reference video frame; B_j is the discrete distribution function of the second gray-level histogram of the j-th video frame; M_i is the total number of pixels of the current i-th reference video frame; and M_j is the total number of pixels of the j-th video frame;
s4, determining U video frames with the matching degree in a preset numerical range in the P video frames;
s5, taking the U video frames as a class of face images, randomly extracting one video frame from the U video frames for face recognition, and obtaining a class of face images corresponding to the U video frames;
s6, determining (P-U) video frames with the matching degree outside a preset numerical range in the P video frames, and performing face recognition on a first target video frame with the face feature missing amount not reaching the preset missing amount in the (P-U) video frames to obtain faces corresponding to the first target video frame;
when a second target video frame whose face feature missing amount reaches the preset missing amount exists in the (P-U) video frames, judging whether the missing face features are missing due to shooting illumination; if so, determining a third gray histogram of the second target video frame; when the minimum gray value of the third gray histogram is equal to or smaller than a first preset gray value, lowering the gray value of each pixel of the first gray histogram corresponding to each of the N reference video frames by a preset amount, so that the minimum gray value of the processed first gray histogram corresponding to each reference video frame equals the minimum gray value of the third gray histogram; when the maximum gray value of the third gray histogram is equal to or greater than a second preset gray value (the second preset gray value being greater than the first preset gray value), raising the gray value of each pixel of the first gray histogram corresponding to each of the N reference video frames by a preset amount, so that the maximum gray value of the processed first gray histogram corresponding to each reference video frame equals the maximum gray value of the third gray histogram;
calculating the correlation degree between the processed first gray level histogram corresponding to each reference video frame and the third gray level histogram of the second target video frame according to the following formula (2):
[Formula (2) appears only as an image in the source and is not reproduced here; it defines the correlation degree S_ik in terms of the quantities listed below.]
wherein S_ik is the correlation degree between the processed first gray-level histogram corresponding to the i-th reference video frame and the third gray-level histogram of the k-th second target video frame; t is the abscissa of a pixel; r is the ordinate of a pixel; T is the maximum value of the pixel abscissa; R is the maximum value of the pixel ordinate; Q(t, r) is the gray value of the processed first gray-level histogram corresponding to the i-th reference video frame at pixel point (t, r); and P(t, r) is the gray value of the third gray-level histogram of the k-th second target video frame at pixel point (t, r);
and when the correlation degree between the processed first gray-level histogram corresponding to the i-th reference video frame and the third gray-level histogram of the k-th second target video frame is equal to or greater than a preset correlation degree, determining that the face corresponding to the k-th second target video frame and the face corresponding to the i-th reference video frame belong to the same class of faces.
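Formula (2), like formula (1), survives only as an image; its variable list (a double scan over pixel coordinates t and r with gray values Q(t, r) and P(t, r)) is consistent with a normalized cross-correlation, which the sketch below uses as a stand-in. The decision threshold is an assumed example:

    import numpy as np

    def correlation_degree(q_img, p_img):
        # Stand-in for S_ik: normalized cross-correlation over the pixel
        # grid (t, r), 1 <= t <= T, 1 <= r <= R, of the gray values
        # Q(t, r) (processed first histogram, i-th reference frame) and
        # P(t, r) (third histogram, k-th second target frame).
        q = q_img.astype(np.float64).ravel()
        p = p_img.astype(np.float64).ravel()
        denom = np.sqrt((q * q).sum() * (p * p).sum())
        return float((q * p).sum() / denom) if denom else 0.0

    # Final decision of the claim: same face class when S_ik reaches
    # the preset correlation degree (0.9 here is an assumed example).
    # same_class = correlation_degree(processed_first_hist, third_hist) >= 0.9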
CN202010022969.4A 2020-01-09 2020-01-09 Analysis and recognition method for clustering faces in video Active CN111222473B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010022969.4A CN111222473B (en) 2020-01-09 2020-01-09 Analysis and recognition method for clustering faces in video

Publications (2)

Publication Number Publication Date
CN111222473A (en) 2020-06-02
CN111222473B CN111222473B (en) 2020-11-06

Family

ID=70831065

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010022969.4A Active CN111222473B (en) 2020-01-09 2020-01-09 Analysis and recognition method for clustering faces in video

Country Status (1)

Country Link
CN (1) CN111222473B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180060695A1 (en) * 2015-08-28 2018-03-01 International Business Machines Corporation System, method, and recording medium for detecting video face clustering with inherent and weak supervision
CN105631422A (en) * 2015-12-28 2016-06-01 北京酷云互动科技有限公司 Video identification method and video identification system
US10311290B1 (en) * 2015-12-29 2019-06-04 Rogue Capital LLC System and method for generating a facial model
CN105740758A (en) * 2015-12-31 2016-07-06 上海极链网络科技有限公司 Internet video face recognition method based on deep learning
US20180039823A1 (en) * 2016-08-04 2018-02-08 International Business Machines Corporation Clustering large database of images using multilevel clustering approach for optimized face recognition process
CN109214247A (en) * 2017-07-04 2019-01-15 腾讯科技(深圳)有限公司 Face identification method and device based on video
CN108171207A (en) * 2018-01-17 2018-06-15 百度在线网络技术(北京)有限公司 Face identification method and device based on video sequence
CN110598535A (en) * 2019-07-31 2019-12-20 广西大学 Face recognition analysis method used in monitoring video data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZAWBAA et al.: "Semi-automatic annotation system for home videos", International Conference on Intelligent Systems Design & Applications, Cairo, Egypt *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112528809A (en) * 2020-12-04 2021-03-19 东方网力科技股份有限公司 Method, device and equipment for identifying suspect and storage medium
CN117953567A (en) * 2024-01-26 2024-04-30 广州宏途数字科技有限公司 Multi-mode face recognition method and system for school entrance guard attendance

Also Published As

Publication number Publication date
CN111222473B (en) 2020-11-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant