CN114241586B - Face detection method and device, storage medium and electronic equipment - Google Patents

Face detection method and device, storage medium and electronic equipment

Info

Publication number
CN114241586B
Authority
CN
China
Prior art keywords
video
face
video frame
tracking
target
Prior art date
Legal status
Active
Application number
CN202210154746.2A
Other languages
Chinese (zh)
Other versions
CN114241586A (en)
Inventor
李志华
杨松
Current Assignee
Feihu Information Technology Tianjin Co Ltd
Original Assignee
Feihu Information Technology Tianjin Co Ltd
Priority date
Filing date
Publication date
Application filed by Feihu Information Technology Tianjin Co Ltd filed Critical Feihu Information Technology Tianjin Co Ltd
Priority to CN202210154746.2A priority Critical patent/CN114241586B/en
Publication of CN114241586A publication Critical patent/CN114241586A/en
Application granted granted Critical
Publication of CN114241586B publication Critical patent/CN114241586B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a face detection method and apparatus, a storage medium, and an electronic device. A video to be detected may be divided into a plurality of video segments, and a detection operation is performed on each segment. The detection operation includes: performing face recognition on each video frame of the segment in sequence and, when the face recognition result of a frame indicates that a target face is present and no tracking result for that face yet exists, tracking the target face with a tracker to obtain its tracking result. Then, for each video frame of each segment after the detection operation, when the face recognition result indicates that the target face is present in the frame and a tracking result for the target face also exists for the frame, the face detection result of the frame is determined from both the face recognition result and the tracking result. This greatly improves the accuracy of face detection.

Description

Face detection method and device, storage medium and electronic equipment
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a face detection method and apparatus, a storage medium, and an electronic device.
Background
With the development of the video industry, users increasingly demand extended information about video content. In recent years, artificial intelligence and deep learning have advanced rapidly, and face recognition technology in the field of computer vision has matured. Applying this technology to video content understanding can effectively meet user needs: extended video information can be obtained without disturbing the viewing experience, saving users retrieval time.
When detecting faces in a video, a face recognition algorithm is usually applied to each video frame. However, existing face recognition algorithms suffer from a certain amount of false detection and missed detection, and recognition accuracy is especially low for profile faces, occluded faces, and similar conditions.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a face detection method that improves the accuracy of face detection results.
The invention further provides a face detection apparatus to ensure that the method can be implemented and applied in practice.
A face detection method, comprising:
acquiring a video to be detected;
dividing the video to be detected into a plurality of video segments;
performing a detection operation on each of the video segments, the detection operation including: performing face recognition on each video frame of the video segment in order of video time from earliest to latest; and, when the face recognition result of a video frame indicates that a target face is present in the frame and no tracking result for the target face currently exists, initializing a tracker for the target face according to the position information of the target face in the face recognition result, and tracking the target face with the tracker to obtain a tracking result for the target face in each target video frame, where a target video frame is a video frame of the video segment in which the target face is tracked;
and for each video frame in the video segment after the detection operation has been performed, when the face recognition result of the video frame indicates that the target face is present in the frame and a tracking result for the target face exists for the frame, determining the face detection result of the video frame according to both the face recognition result and the tracking result of the video frame.
The above method, optionally, further includes:
when the face recognition result of a video frame indicates that a new target face is present in the frame, initializing a tracker for the new target face according to the position information of the new target face in the face recognition result, and performing backward tracking and/or forward tracking of the new target face with the tracker until the tracking condition is met, to obtain a tracking result for the new target face in each new target video frame; a new target video frame is a video frame of the video segment in which the new target face is tracked; backward tracking means tracking through the frames to be tracked that precede the current frame in the segment, in reverse video-time order, and forward tracking means tracking through the frames to be tracked that follow the current frame in the segment, in forward video-time order.
The above method, optionally, further includes:
and for each video frame in the video segment after the detection operation has been performed, when the face recognition result of the video frame indicates that no target face is present in the frame but a tracking result for the target face exists for the frame, determining the tracking result as the face detection result of the video frame.
The above method, optionally, further includes:
and for each video frame in the video segments after the detection operation has been performed, when the face recognition result of the video frame indicates that a target face is present in the frame but no tracking result for the target face exists for the frame, determining the face recognition result as the face detection result of the video frame.
Optionally, in the above method, dividing the video to be detected into a plurality of video segments includes:
determining the picture variation amplitude between adjacent video frames in the video to be detected;
dividing the video to be detected according to the picture variation amplitude between adjacent video frames to obtain a plurality of video segments; for any two adjacent segments, referred to as a first video segment and a second video segment, the picture variation amplitude between the end frame of the first segment and the start frame of the second segment is greater than a preset variation amplitude threshold, the first segment being the segment preceding the second.
Optionally, determining the face detection result of the video frame according to the face recognition result and the tracking result of the video frame includes:
determining the intersection-over-union (IoU) between the face recognition box of the target face in the face recognition result of the video frame and the face tracking box of the target face in the tracking result of the video frame;
when the IoU is greater than a preset IoU threshold, determining the tracking result of the video frame as the face detection result of the video frame;
when the IoU is less than or equal to the IoU threshold, comparing the confidence in the face recognition result of the video frame with the confidence in the tracking result of the video frame;
when the confidence in the face recognition result is greater than the confidence in the tracking result, determining the face recognition result as the face detection result of the video frame;
and when the confidence in the face recognition result is not greater than the confidence in the tracking result, determining the tracking result as the face detection result of the video frame.
Optionally, after determining the face detection result of the video frame, the above method further includes:
determining, from the video time of the video frame to which each face detection result belongs, the time information at which the target face appears in the video to be detected;
and outputting the time information at which the target face appears in the video to be detected to a preset front end.
A face detection apparatus comprising:
the acquisition unit is used for acquiring a video to be detected;
the dividing unit is used for dividing the video to be detected into a plurality of video segments;
an execution unit, configured to perform a detection operation on each of the video segments, the detection operation including: performing face recognition on each video frame of the video segment in order of video time from earliest to latest; and, when the face recognition result of a video frame indicates that a target face is present in the frame and no tracking result for the target face currently exists, initializing a tracker for the target face according to the position information of the target face in the face recognition result, and tracking the target face with the tracker until a preset tracking condition is met, to obtain a tracking result for the target face in each target video frame, where a target video frame is a video frame of the video segment in which the target face is tracked;
and a determining unit, configured to determine, for each video frame in the video segment after the detection operation has been performed, the face detection result of the video frame according to both the face recognition result and the tracking result of the video frame, when the face recognition result indicates that the target face is present in the frame and a tracking result for the target face exists for the frame.
A storage medium comprising stored instructions, wherein when the instructions are executed, a device on which the storage medium resides is controlled to perform the above face detection method.
An electronic device comprising a memory and one or more instructions, wherein the one or more instructions are stored in the memory and configured to be executed by one or more processors to perform the face detection method described above.
Based on the face detection method and apparatus, storage medium, and electronic device provided by embodiments of the present invention, the method includes: acquiring a video to be detected; dividing the video into a plurality of video segments; and performing a detection operation on each segment. The detection operation includes: performing face recognition on each video frame of the segment in order of video time from earliest to latest; and, when the face recognition result of a frame indicates that a target face is present and no tracking result for the target face currently exists, initializing a tracker for the target face according to the position information in the face recognition result and tracking the target face with the tracker to obtain a tracking result in each target video frame, a target video frame being a frame of the segment in which the target face is tracked. Then, for each video frame in the segments after the detection operation, when the face recognition result indicates that the target face is present in the frame and a tracking result for the target face exists, the face detection result of the frame is determined according to both the face recognition result and the tracking result. By applying the method provided by the embodiments of the invention, the face detection result is determined by combining the face recognition result and the tracking result, which effectively improves face detection accuracy.
Drawings
To more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flow chart of a method of face detection according to the present invention;
fig. 2 is a flowchart of a process for dividing a video to be detected into a plurality of video segments according to the present invention;
FIG. 3 is a flowchart of a process for determining a face detection result according to a face recognition result and a tracking result according to the present invention;
fig. 4 is a schematic structural diagram of a face detection apparatus provided in the present invention;
fig. 5 is a schematic structural diagram of an electronic device provided in the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
In this application, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The embodiment of the invention provides a face detection method that can be applied to electronic equipment. A flowchart of the method is shown in FIG. 1; it specifically includes the following steps:
S101: acquire the video to be detected.
In this embodiment, the video to be detected may be acquired upon receipt of a detection instruction. It may be a pre-downloaded video and may contain pictures of one or more person objects.
S102: and dividing the video to be detected into a plurality of video segments.
In this embodiment, a video clip may refer to a shot clip, each video clip may include a plurality of video frames, and a picture variation range between an end frame of each video clip and a start frame of a next video clip of the video clip is greater than a preset picture variation range threshold.
Optionally, for each video segment, part or all of the video frames of the video segment may contain face information of the human object.
S103: performing a detection operation on each of the video segments; the detecting operation comprises: carrying out face recognition on each video frame of the video clip according to the sequence of video time from first to last, and initializing a tracker of a target face according to the position information of the target face in the face recognition result under the condition that the face recognition result of the video frame represents the target face existing in the video frame and the tracking result of the target face does not exist at present, and tracking the target face by using the tracker to obtain the tracking result of the target face in each target video frame; the target video frame is a video frame of the video clip, wherein the target face is tracked.
In this embodiment, each video frame has its corresponding video time, i.e., the time that the video frame appears during video playback.
Optionally, the process of performing face recognition on a video frame may be as follows: determine whether a face is present in the frame; if so, extract the face features and compare them with the face features of each target face in a preset target face library. If the Euclidean distance between the frame's face features and one set of target face features in the library is smaller than a preset distance threshold, while the Euclidean distances to all remaining face features in the library are not smaller than the threshold, the face recognition result of the frame indicates that the target face is present; otherwise, it indicates that the target face is absent. Here, the target face is the face of the target person object whose features in the library lie within the distance threshold of the frame's face features.
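The matching rule described above can be sketched in Python. This is a minimal illustration, not the patent's implementation: the function and variable names, the feature representation (plain lists), and the threshold value are all assumptions.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_target_face(features, library, dist_threshold=1.0):
    """Return the ID of the unique library entry whose features lie
    within dist_threshold of `features`, else None.

    Mirrors the rule in the embodiment: recognition succeeds only when
    exactly one target's features are below the threshold, i.e. the
    distances to all remaining library entries are not below it.
    """
    hits = [pid for pid, feat in library.items()
            if euclidean(features, feat) < dist_threshold]
    return hits[0] if len(hits) == 1 else None
```

For example, a frame's features matching one star's stored features uniquely yields that star's ID; features roughly equidistant from several entries yield no match.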
Optionally, the target face library includes face features of a plurality of target person objects. Specifically, the scrfd algorithm may be used to detect faces in images of the target persons, the arcface algorithm may be used to extract features from the detected faces, and the extracted features are stored in the target face library.
Optionally, a tracker may be used to perform forward tracking and/or backward tracking of a target face until a preset tracking condition is met, yielding a tracking result for the target face in each target video frame. The tracker may be a SiamRPN tracker. The tracking condition may be that the tracking score falls below a preset tracking score threshold, or that a termination frame is reached: the termination frame of backward tracking is the start frame of the video segment, and the termination frame of forward tracking is the end frame of the segment. The tracking score is computed by the SiamRPN algorithm and may specifically be the confidence in the tracking result; the higher the score, the more credible the face tracking box. The position information of the target face may be the position of the face recognition box in the face recognition result.
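The forward/backward tracking loop with both stopping conditions can be sketched as below. The `Tracker` interface (`init`/`update`) stands in for a SiamRPN-style single-object tracker and is an assumption for illustration; the score threshold is likewise illustrative.

```python
def track_face(tracker, frames, start_idx, init_box,
               score_threshold=0.5, forward=True):
    """Run a single-object tracker forward or backward through a shot.

    Stops when the tracking score drops below score_threshold, or when
    the shot boundary (first/last frame of the segment) is reached.
    Returns {frame_index: (box, score)} for every tracked frame.
    """
    # Region of interest = the face recognition box (tracker initialization).
    tracker.init(frames[start_idx], init_box)
    step = 1 if forward else -1
    results = {}
    idx = start_idx + step
    while 0 <= idx < len(frames):          # shot boundary terminates tracking
        box, score = tracker.update(frames[idx])
        if score < score_threshold:        # tracking score too low: stop
            break
        results[idx] = (box, score)
        idx += step
    return results
```

One such loop would be run per recognized face, matching the embodiment's one-tracker-per-target design.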
In some embodiments, the target face may be the first recognized face of a first target person object; when the target faces of several target person objects appear for the first time in one video frame, one tracker may be initialized for each target face.
S104: and for each video frame in the video segment after the detection operation is executed, determining the face detection result of the video frame according to the face identification result and the tracking result of the video frame when the face identification result of the video frame represents that the target face exists in the video frame and the tracking result of the target face exists in the video frame.
In this embodiment, one of the face recognition result and the tracking result may be determined as the face detection result of the video frame.
Optionally, the face recognition result may include at least one of: a face recognition box, the identifier of the target person object to which the target face belongs, and the confidence of the face recognition box. The tracking result may include at least one of: a face tracking box, the identifier of the target person object to which the target face belongs, and the confidence of the face tracking box.
By applying the method provided by the embodiment of the invention, the face detection result is determined by combining the face recognition result and the tracking result, so that the face detection precision is effectively improved.
In an embodiment provided by the present invention, based on the implementation process, optionally, the implementation process further includes:
when the face recognition result of a video frame indicates that a new target face is present in the frame, initializing a tracker for the new target face according to the position information of the new target face in the face recognition result, and performing backward tracking and/or forward tracking of the new target face with the tracker until the tracking condition is met, to obtain a tracking result for the new target face in each new target video frame; a new target video frame is a video frame of the video segment in which the new target face is tracked; backward tracking means tracking through the frames to be tracked that precede the current frame in the segment, in reverse video-time order, and forward tracking means tracking through the frames to be tracked that follow the current frame in the segment, in forward video-time order.
In this embodiment, the video frame to be tracked may be a video frame other than the video frame in the video segment.
Optionally, the target person object to which the target face belongs differs from the target person object to which the new target face belongs; the new target face may be the face of a second target person object.
By integrating a plurality of single-target trackers, the embodiment of the invention achieves multi-target tracking. The number of trackers can be adjusted automatically according to the video content, which preserves the tracking quality while effectively reducing unnecessary computation.
In an embodiment provided by the present invention, based on the implementation process, optionally, the implementation process further includes:
and for each video frame in the video segment after the detection operation has been performed, when the face recognition result of the video frame indicates that no target face is present in the frame but a tracking result for the target face exists for the frame, determining the tracking result as the face detection result of the video frame.
In an embodiment provided by the present invention, based on the implementation process, optionally, the implementation process further includes:
and for each video frame in the video segments after the detection operation has been performed, when the face recognition result of the video frame indicates that a target face is present in the frame but no tracking result for the target face exists for the frame, determining the face recognition result as the face detection result of the video frame.
In an embodiment provided by the present invention, based on the foregoing implementation process, the process of dividing the video to be detected into a plurality of video segments may specifically include, as shown in FIG. 2:
S201: determine the picture variation amplitude between adjacent video frames in the video to be detected.
In this embodiment, the picture variation amplitude between two adjacent video frames is the difference taken over all pixels of the two frames' pictures.
S202: divide the video to be detected according to the picture variation amplitude between adjacent video frames to obtain a plurality of video segments. For any two adjacent segments, referred to as a first and a second video segment, the picture variation amplitude between the end frame of the first segment and the start frame of the second segment is greater than a preset variation amplitude threshold, the first segment being the one preceding the second.
In this embodiment, if the picture variation amplitude between two adjacent video frames exceeds the threshold, the video is cut between them, i.e., the two frames are assigned to different video segments. The variation amplitude threshold may be set according to actual requirements.
By applying the method provided by this embodiment, video segmentation effectively removes interference between video frames of different segments while keeping the results within the same segment consistent.
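The shot-segmentation rule of S201–S202 can be sketched as follows. Frames are represented here as flat pixel lists, and the summed absolute pixel difference serves as the "picture variation amplitude"; both representations are illustrative assumptions, not the patent's exact definitions.

```python
def frame_diff(a, b):
    """Picture variation amplitude: summed absolute difference over all pixels."""
    return sum(abs(x - y) for x, y in zip(a, b))

def split_into_segments(frames, amplitude_threshold):
    """Cut the video between any two adjacent frames whose variation
    amplitude exceeds the threshold (S202). Returns lists of frame
    indices, one list per shot."""
    segments, current = [], [0]
    for i in range(1, len(frames)):
        if frame_diff(frames[i - 1], frames[i]) > amplitude_threshold:
            segments.append(current)   # shot boundary: start a new segment
            current = []
        current.append(i)
    segments.append(current)
    return segments
```

A large jump in pixel difference thus marks the end frame of one segment and the start frame of the next, as the embodiment requires.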
In an embodiment of the invention, based on the foregoing implementation process, the process of determining the face detection result of the video frame according to the face recognition result and the tracking result may include, as shown in FIG. 3:
S301: determine the intersection-over-union (IoU) between the face recognition box of the target face in the face recognition result of the video frame and the face tracking box of the target face in the tracking result of the video frame.
In this embodiment, the IoU of the face recognition box and the face tracking box is the ratio of the area of their intersection to the area of their union, and reflects their degree of overlap.
S302: when the IoU is greater than a preset IoU threshold, determine the tracking result of the video frame as the face detection result of the video frame.
S303: when the IoU is less than or equal to the IoU threshold, compare the confidence in the face recognition result of the video frame with the confidence in the tracking result of the video frame.
In this embodiment, the confidence in the face recognition result represents the credibility of the face recognition box and may be computed by the face recognition algorithm; the confidence in the tracking result represents the credibility of the face tracking box and may be computed by the tracking algorithm.
S304: when the confidence in the face recognition result is greater than the confidence in the tracking result, determine the face recognition result of the video frame as the face detection result of the video frame.
S305: when the confidence in the face recognition result is not greater than the confidence in the tracking result, determine the tracking result of the video frame as the face detection result of the video frame.
In an embodiment provided by the present invention, based on the foregoing implementation process, after determining the face detection result of a video frame, the method optionally further includes:
determining, from the video time of the video frame to which each face detection result belongs, the time information at which the target face appears in the video to be detected;
and outputting the time information at which the target face appears in the video to be detected to a preset front end.
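Converting per-frame detections into the time information sent to the front end can be sketched as merging consecutive frame indices into time intervals. The frame rate and the `(start, end)` tuple output format are assumptions for illustration.

```python
def appearance_intervals(frame_indices, fps=25.0):
    """Merge sorted frame indices in which a target face was detected
    into (start_seconds, end_seconds) intervals."""
    if not frame_indices:
        return []
    intervals, start, prev = [], frame_indices[0], frame_indices[0]
    for idx in frame_indices[1:]:
        if idx != prev + 1:                 # gap: close the current interval
            intervals.append((start / fps, prev / fps))
            start = idx
        prev = idx
    intervals.append((start / fps, prev / fps))
    return intervals
```

Each target face's interval list can then be serialized and pushed to the front end as its appearance time periods.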
The face detection method provided by the embodiment of the invention can be applied in many fields; for example, it can be used for star (celebrity) recognition on a video platform, helping users learn which stars appear in a video.
A face recognition algorithm alone produces some false and missed detections; in particular, recognition accuracy drops sharply for profile faces, occlusion, and the like. Adding a target tracking algorithm largely compensates for recognition's missed detections. A tracking algorithm, however, has no identification capability: it only matches features between similar regions of consecutive frames, so it is somewhat blind and prone to false detections as the time sequence grows. Face recognition can update the tracking targets in time according to the video content, reducing tracking false detections. Recognition thus provides targets for tracking, and tracking supplements recognition; combined organically, they can effectively identify star identities in TV dramas, movies, and other scenes, and quickly and accurately satisfy user queries. The following takes the detection of star faces in a video as an example:
step one, a target face library is manufactured.
Optionally, platform star information is collected; the scrfd algorithm is used to detect faces in the pictures in the star database, the arcface algorithm is used to extract features from the detected faces, and the features are stored in a database.
Step two: video downloading and shot segmentation.
In this embodiment, the video to be detected may be downloaded, divided into a plurality of shots according to the variation amplitude between consecutive frames, and the start and end frames of each shot recorded.
The picture variation amplitude is the difference taken over all pixels between the pictures of two consecutive frames; comparing it against a threshold segments the video.
Step three: star recognition and tracking.
In this embodiment, faces may be detected frame by frame starting from the initial frame of the shot and compared against the star face library. Successfully recognized stars are added to the tracking set; a tracker is initialized with the position of each star's face box, forward face tracking is performed with the SiamRPN algorithm, and the tracking results are recorded. When a new star is recognized, its information is added to the tracking set and backward face tracking is started; after backward tracking finishes, forward tracking continues with the updated tracking set. Tracking of a target stops automatically when the tracking score falls below a tracking score threshold; if the score stays above the threshold, the start and end frames of the shot serve as the stopping conditions for backward and forward tracking, respectively. When the shot changes, the tracking set is cleared and the process repeats.
Specifically, the tracker first needs an initial region of interest and then tracks by matching features extracted from that region. Initializing the tracker means assigning the position of a face frame detected by the face recognition algorithm to the tracker's region of interest. The tracking result comprises the tracked face frame position and a tracking score for that position; the score is output directly by the tracking algorithm and represents the confidence of the tracking frame.
It can be seen that the detection and tracking processes are independent for each shot, and the trackers of the individual stars are independent of one another, so their tracking results do not interfere. The star face feature comparison uses Euclidean distance, and star recognition is considered successful only when both of the following conditions hold: (1) the Euclidean distance between the detected face features and the target star's features in the star database is smaller than a Euclidean distance threshold; (2) the Euclidean distances between the detected face features and the features of every other star are all larger than that target distance.
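The two matching conditions can be sketched as follows; the library maps star names to feature vectors, and the distance threshold is illustrative:

```python
import numpy as np

def match_star(face_feat, library, dist_threshold):
    """Return the matched star name, or None if no star satisfies both
    conditions above: (1) the best distance is below the threshold;
    (2) every other star is farther away than the best match.
    """
    dists = {name: float(np.linalg.norm(face_feat - feat))
             for name, feat in library.items()}
    name, best = min(dists.items(), key=lambda kv: kv[1])
    if best >= dist_threshold:
        return None  # condition (1) fails
    if all(d > best for n, d in dists.items() if n != name):
        return name  # condition (2) holds: unique nearest neighbour
    return None
```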
The face recognition algorithm has strong recognition capability on frontal faces, and the tracking algorithm can track a detected frontal face both backward and forward, thereby covering face results in conditions other than the frontal view. The integration strategy for the detection and tracking results can effectively handle various complex situations in the picture, such as interleaving characters and occlusion.
Step 4: result integration.
In this embodiment, when the current frame has only a face recognition result, that result is used as the final face detection result; when it has only a tracking result, the tracking result is used. When both exist, the positional relationship between their face frames is examined: when the intersection-over-union (IoU) is smaller than a threshold, the higher-scoring of the recognition result and the tracking result is taken as the final face detection result; when the IoU is larger than the threshold, the face recognition result is taken as the final face detection result. Owing to the blindness and gradual drift of the tracking algorithm, the tracking result is often inaccurate when face frames interleave. Finally, the per-frame star information is converted into time periods during which each star appears and output to the front end.
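The integration rule just described may be sketched as follows (a simplified illustration with axis-aligned `(x1, y1, x2, y2)` face frames; scores and the IoU threshold are illustrative):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) face frames."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def fuse(recognition, tracking, iou_threshold):
    """Pick the final face detection result for one frame.

    Each argument is a (box, score) pair or None. With only one result
    available, use it; with both, a large IoU favours the recognition
    result, otherwise the higher score wins.
    """
    if tracking is None:
        return recognition
    if recognition is None:
        return tracking
    if iou(recognition[0], tracking[0]) > iou_threshold:
        return recognition  # heavy overlap: trust recognition over drift-prone tracking
    return recognition if recognition[1] >= tracking[1] else tracking
```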
Specifically, for multiple tracked face frames, the IoU between every pair of frames is calculated. A small IoU indicates little overlap between the tracked frames and an accurate tracking result, so the higher-scoring of the detection result and the tracking result is taken as the final result; a large IoU indicates substantial overlap between the tracked frames, in which case the tracking result is often inaccurate.
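Step four finally converts the per-frame star information into appearance time periods. A minimal sketch, assuming per-star lists of sorted frame indices and a known frame rate:

```python
def frames_to_periods(star_frames, fps):
    """Convert per-frame star detections into appearance time periods.

    star_frames maps star name -> sorted list of frame indices in which
    that star was detected; returns star name -> list of
    (start_sec, end_sec) periods, merging runs of consecutive frames.
    """
    periods = {}
    for name, frames in star_frames.items():
        spans = []
        start = prev = frames[0]
        for f in frames[1:]:
            if f == prev + 1:
                prev = f  # extend the current run of consecutive frames
            else:
                spans.append((start / fps, prev / fps))
                start = prev = f
        spans.append((start / fps, prev / fps))
        periods[name] = spans
    return periods
```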
The detection capability of a face detection algorithm has an upper limit; in practical scenarios it often cannot cope well with side faces and occlusion, and combining it with a target tracking algorithm greatly extends that limit, so that face technology can be better applied to the video star recognition task. Existing multi-target tracking algorithms such as SORT and DeepSORT rely on Kalman filtering and the Hungarian algorithm, which are computationally expensive. The present invention realizes multi-target tracking by integrating multiple single-target trackers and can automatically adjust the number of trackers according to the video content, effectively reducing unnecessary computation while preserving the tracking effect.
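The ensemble of single-target trackers described here can be sketched as follows; `make_tracker` is a hypothetical factory standing in for a SiamRPN-style tracker exposing `track(frame) -> (box, score)`:

```python
class MultiTracker:
    """Multi-target tracking as an ensemble of independent single-target
    trackers, one per recognized star (a sketch, not the claimed design)."""

    def __init__(self, make_tracker, score_threshold):
        self.make_tracker = make_tracker      # factory: (frame, box) -> tracker
        self.score_threshold = score_threshold
        self.trackers = {}                    # star name -> tracker

    def add_target(self, name, frame, box):
        # One tracker is initialized per newly recognized star face frame.
        self.trackers[name] = self.make_tracker(frame, box)

    def step(self, frame):
        """Advance every tracker one frame; drop low-confidence targets."""
        results = {}
        for name in list(self.trackers):
            box, score = self.trackers[name].track(frame)
            if score < self.score_threshold:
                del self.trackers[name]       # stop tracking this target
            else:
                results[name] = (box, score)
        return results

    def reset(self):
        self.trackers.clear()                 # called at every shot change
```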
Corresponding to the method described in fig. 1, an embodiment of the present invention further provides a face detection apparatus for specifically implementing the method of fig. 1. The face detection apparatus may be applied to an electronic device; a schematic structural diagram of the apparatus is shown in fig. 4, and it specifically includes:
an obtaining unit 401, configured to obtain a video to be detected;
a dividing unit 402, configured to divide the video to be detected into a plurality of video segments;
an execution unit 403, configured to perform a detection operation on each of the video segments; the detection operation includes: performing face recognition on each video frame of the video segment in chronological order; when the face recognition result of a video frame indicates that a target face exists in the video frame and no tracking result of the target face currently exists, initializing a tracker for the target face according to the position information of the target face in the face recognition result, and tracking the target face with the tracker until a preset tracking condition is met, so as to obtain the tracking result of the target face in each target video frame; a target video frame is a video frame of the video segment in which the target face is tracked;
a determining unit 404, configured to determine, for each video frame in the video segment after the detection operation is performed, the face detection result of the video frame according to the face recognition result and the tracking result of the video frame, when the face recognition result of the video frame indicates that the target face exists in the video frame and the video frame has a tracking result of the target face.
In an embodiment provided by the present invention, based on the above scheme, optionally, the face detection apparatus further includes: a first processing unit;
the first processing unit is configured to, when the face recognition result of a video frame indicates that a new target face exists in the video frame, initialize a tracker for the new target face according to the position information of the new target face in the face recognition result, and perform backward tracking and/or forward tracking on the new target face with the tracker until the tracking condition is met, so as to obtain the tracking result of the new target face in each new target video frame; a new target video frame is a video frame of the video segment in which the new target face is tracked; backward tracking means tracking, in reverse chronological order, each video frame to be tracked that precedes the current video frame in the video segment, and forward tracking means tracking, in chronological order, each video frame to be tracked that follows the current video frame in the video segment.
In an embodiment provided by the present invention, based on the above scheme, optionally, the face detection apparatus further includes: a second processing unit for performing a second processing operation,
the second processing unit is configured to, for each video frame in the video segment after the detection operation is performed, determine, when a face recognition result of the video frame indicates that the video frame does not have a target face, and the video frame has a tracking result of the target face, the tracking result as a face detection result of the video frame.
In an embodiment provided by the present invention, based on the above scheme, optionally, the face detection apparatus further includes: a third processing unit for performing a third processing operation,
the third processing unit is configured to, for each video frame in the video segment after the detection operation is performed, determine a face recognition result as a face detection result of the video frame when a face recognition result of the video frame indicates that a target face exists in the video frame and a tracking result of the target face does not exist in the video frame.
In an embodiment provided by the present invention, based on the foregoing scheme, optionally, the dividing unit 402 includes:
the first determining subunit is used for determining the picture variation range between adjacent video frames in the video to be detected;
the dividing subunit is used for dividing the video to be detected according to the picture variation amplitude between adjacent video frames in the video to be detected to obtain a plurality of video segments; for a first video segment and a second video segment in two adjacent video segments, a picture variation range between an end frame of the first video segment and a start frame of the second video segment is greater than a preset variation range threshold, and the first video segment is a previous video segment of the second video segment.
In an embodiment provided by the present invention, based on the above scheme, optionally, the determining unit 404 includes:
a second determining subunit, configured to determine an intersection ratio between a face recognition frame of the target face in the face recognition result of the video frame and a face tracking frame of the target face in the tracking result of the video frame;
a third determining subunit, configured to determine, when the intersection ratio is greater than a preset intersection ratio threshold, a tracking result of the video frame as a face detection result of the video frame;
the comparison subunit is used for comparing the confidence in the face recognition result of the video frame with the confidence in the tracking result of the video frame under the condition that the intersection ratio is smaller than or equal to the intersection ratio threshold;
a fourth determining subunit, configured to determine, when the confidence in the face recognition result of the video frame is greater than the confidence in the tracking result of the video frame, the face recognition result of the video frame as the face detection result of the video frame;
a fifth determining subunit, configured to determine, when the confidence in the face recognition result of the video frame is not greater than the confidence in the tracking result of the video frame, the tracking result of the video frame as the face detection result of the video frame.
In an embodiment provided by the present invention, based on the above scheme, optionally, the face detection apparatus further includes:
the fourth processing unit is used for determining the time information of the target face appearing in the video to be detected according to the video time of the video frame to which each face detection result belongs;
and the output unit is used for outputting the time information of the target face appearing in the video to be detected to a preset front end.
The specific principle and the execution process of each unit and each module in the face detection device disclosed in the embodiment of the present invention are the same as those of the face detection method disclosed in the embodiment of the present invention, and reference may be made to corresponding parts in the face detection method provided in the embodiment of the present invention, which are not described herein again.
An embodiment of the present invention further provides a storage medium comprising stored instructions, wherein when the instructions run, the device on which the storage medium is located is controlled to execute the above face detection method.
An embodiment of the present invention further provides an electronic device, a schematic structural diagram of which is shown in fig. 5. The electronic device specifically includes a memory 501 and one or more instructions 502, where the one or more instructions 502 are stored in the memory 501 and are configured to be executed by one or more processors 503 to perform the following operations:
acquiring a video to be detected;
dividing the video to be detected into a plurality of video segments;
performing a detection operation on each of the video segments; the detecting operation includes: carrying out face recognition on each video frame of the video clip according to the sequence of video time from first to last, initializing a tracker of a target face according to position information of the target face in a face recognition result under the condition that the face recognition result of the video frame represents the target face existing in the video frame and the tracking result of the target face does not exist at present, and tracking the target face by using the tracker until a preset tracking condition is met to obtain the tracking result of the target face in each target video frame; the target video frame is a video frame of which the target face is tracked in the video clip;
and for each video frame in the video segment after the detection operation is executed, determining the face detection result of the video frame according to the face identification result and the tracking result of the video frame when the face identification result of the video frame represents that the target face exists in the video frame and the tracking result of the target face exists in the video frame.
It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other. For the device-like embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
Finally, it should also be noted that, in this document, relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functions of the units may be implemented in the same software and/or hardware or in a plurality of software and/or hardware when implementing the invention.
From the above description of the embodiments, it is clear to those skilled in the art that the present invention can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
The face detection method provided by the present invention has been described in detail above; specific examples were used to explain its principle and implementation, and these descriptions are intended only to help in understanding the method and its core idea. Meanwhile, for those skilled in the art, there may be variations in the specific embodiments and application scope according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (9)

1. A face detection method, comprising:
acquiring a video to be detected;
dividing the video to be detected into a plurality of video segments;
performing a detection operation on each of the video segments; the detecting operation includes: performing face recognition on each video frame of the video clip according to the sequence of video time from first to last, and initializing a tracker of a target face according to position information of the target face in a face recognition result under the condition that the face recognition result of the video frame represents the target face existing in the video frame and the tracking result of the target face does not exist at present, and performing forward tracking and/or backward tracking on the target face by using the tracker until a preset tracking condition is met, so as to obtain the tracking result of the target face in each target video frame; the target video frame is a video frame of which the target face is tracked in the video clip; the target face is a face of a first target person object recognized for the first time, and if a plurality of target faces of the first target person object appear for the first time in one video frame, a tracker is respectively initialized for each target face; the tracker is a SiamRPN algorithm tracker;
for each video frame in the video clips after the detection operation is executed, determining a face detection result of the video frame according to a face recognition result and a tracking result of the video frame under the condition that the face recognition result of the video frame represents that the video frame has the target face and the video frame has the tracking result of the target face;
the dividing the video to be detected into a plurality of video segments comprises:
determining the picture variation range between adjacent video frames in the video to be detected; the picture change amplitude is the difference value of all pixels of two adjacent video frame pictures;
dividing the video to be detected according to the picture variation amplitude between adjacent video frames in the video to be detected to obtain a plurality of video clips; for a first video segment and a second video segment in two adjacent video segments, a picture variation range between an end frame of the first video segment and a start frame of the second video segment is greater than a preset variation range threshold, and the first video segment is a previous video segment of the second video segment.
2. The method of claim 1, further comprising:
initializing a tracker of a new target face according to the position information of the new target face in the face recognition result under the condition that the face recognition result of the video frame represents that the video frame has the new target face, and performing backward tracking and/or forward tracking on the new target face by using the tracker to obtain the tracking result of the new target face in each new target video frame; the new target video frame is a video frame of the new target face tracked in the video segment, the backward tracking refers to tracking each video frame to be tracked before the video frame in the video segment according to the sequence of video time from back to front, and the forward tracking refers to tracking each video frame to be tracked after the video frame in the video segment according to the sequence of video time from front to back.
3. The method of claim 1, further comprising:
and for each video frame in the video segment after the detection operation is executed, determining the tracking result as the face detection result of the video frame under the condition that the face recognition result of the video frame represents that the video frame has no target face and the video frame has the tracking result of the target face.
4. The method of claim 1, further comprising:
and for each video frame in the video clips after each detection operation is executed, determining the face recognition result as the face detection result of the video frame under the condition that the face recognition result of the video frame represents that the video frame has a target face and the video frame does not have the tracking result of the target face.
5. The method of claim 1, wherein determining the face detection result of the video frame according to the face recognition result and the tracking result of the video frame comprises:
determining the intersection ratio between the face recognition frame of the target face in the face recognition result of the video frame and the face tracking frame of the target face in the tracking result of the video frame;
determining the tracking result of the video frame as the face detection result of the video frame under the condition that the intersection ratio is greater than a preset intersection ratio threshold;
comparing the confidence coefficient in the face recognition result of the video frame with the confidence coefficient in the tracking result of the video frame under the condition that the intersection ratio is less than or equal to the intersection ratio threshold value;
determining the face recognition result of the video frame as the face detection result of the video frame under the condition that the confidence coefficient in the face recognition result of the video frame is greater than the confidence coefficient in the tracking result of the video frame;
and under the condition that the confidence coefficient in the face recognition result of the video frame is not greater than the confidence coefficient in the tracking result of the video frame, determining the tracking result of the video frame as the face detection result of the video frame.
6. The method according to any one of claims 1 to 5, further comprising, after determining the face detection result of the video frame:
determining the time information of the target face appearing in the video to be detected according to the video time of the video frame to which each face detection result belongs;
and outputting the time information of the target face appearing in the video to be detected to a preset front end.
7. A face detection apparatus, comprising:
the acquisition unit is used for acquiring a video to be detected;
the dividing unit is used for dividing the video to be detected into a plurality of video segments;
an execution unit, configured to perform a detection operation on each of the video segments; the detecting operation includes: performing face recognition on each video frame of the video clip according to the sequence of video time from first to last, and initializing a tracker of a target face according to position information of the target face in a face recognition result under the condition that the face recognition result of the video frame represents the target face existing in the video frame and the tracking result of the target face does not exist at present, and performing forward tracking and/or backward tracking on the target face by using the tracker until a preset tracking condition is met, so as to obtain the tracking result of the target face in each target video frame; the target video frame is a video frame of which the target face is tracked in the video clip; the target face is a face of a first target person object recognized for the first time, and if a plurality of target faces of the first target person object appear for the first time in one video frame, a tracker is respectively initialized for each target face; the tracker is a SiamRPN algorithm tracker;
a determining unit, configured to, for each video frame in the video segment after the detection operation is performed, determine, according to a face recognition result of the video frame and a tracking result of the target face, a face detection result of the video frame when a face recognition result of the video frame indicates that the video frame has the target face and the video frame has the tracking result of the target face;
the dividing the video to be detected into a plurality of video segments comprises:
determining the picture variation range between adjacent video frames in the video to be detected; the picture change amplitude is the difference value of all pixels of two adjacent video frame pictures;
dividing the video to be detected according to the picture variation amplitude between adjacent video frames in the video to be detected to obtain a plurality of video clips; for a first video segment and a second video segment in two adjacent video segments, a picture variation range between an end frame of the first video segment and a start frame of the second video segment is greater than a preset variation range threshold, and the first video segment is a previous video segment of the second video segment.
8. A storage medium, characterized in that the storage medium comprises stored instructions, wherein when the instructions run, a device on which the storage medium is located is controlled to execute the face detection method according to any one of claims 1 to 6.
9. An electronic device comprising a memory and one or more instructions, wherein the one or more instructions are stored in the memory and configured to be executed by one or more processors to perform the face detection method according to any one of claims 1 to 6.
CN202210154746.2A 2022-02-21 2022-02-21 Face detection method and device, storage medium and electronic equipment Active CN114241586B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210154746.2A CN114241586B (en) 2022-02-21 2022-02-21 Face detection method and device, storage medium and electronic equipment


Publications (2)

Publication Number Publication Date
CN114241586A CN114241586A (en) 2022-03-25
CN114241586B true CN114241586B (en) 2022-05-27

Family

ID=80747586

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210154746.2A Active CN114241586B (en) 2022-02-21 2022-02-21 Face detection method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN114241586B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108376246A (en) * 2018-02-05 2018-08-07 南京蓝泰交通设施有限责任公司 A kind of identification of plurality of human faces and tracking system and method
CN108734107A (en) * 2018-04-24 2018-11-02 武汉幻视智能科技有限公司 A kind of multi-object tracking method and system based on face
CN110569785A (en) * 2019-09-05 2019-12-13 杭州立宸科技有限公司 Face recognition method based on fusion tracking technology
CN111339855A (en) * 2020-02-14 2020-06-26 睿魔智能科技(深圳)有限公司 Vision-based target tracking method, system, equipment and storage medium
CN113793358A (en) * 2021-11-16 2021-12-14 长沙理工大学 Target tracking and positioning method and device and computer readable medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109948611B (en) * 2019-03-14 2022-07-08 腾讯科技(深圳)有限公司 Information area determination method, information display method and device
CN112734803B (en) * 2020-12-31 2023-03-24 山东大学 Single target tracking method, device, equipment and storage medium based on character description


Also Published As

Publication number Publication date
CN114241586A (en) 2022-03-25

Similar Documents

Publication Publication Date Title
US11302315B2 (en) Digital video fingerprinting using motion segmentation
US20230077355A1 (en) Tracker assisted image capture
KR100459893B1 (en) Method and apparatus for color-based object tracking in video sequences
US6342904B1 (en) Creating a slide presentation from full motion video
CN106663196B (en) Method, system, and computer-readable storage medium for identifying a subject
US20180061076A1 (en) Fast multi-object detection and tracking system
Xu et al. Automatic moving object extraction for content-based applications
CN111080673B (en) Anti-occlusion target tracking method
Lee et al. Place recognition using straight lines for vision-based SLAM
CN112669349A (en) Passenger flow statistical method, electronic equipment and storage medium
CN111814655B (en) Target re-identification method, network training method thereof and related device
CN110660102B (en) Speaker recognition method, device and system based on artificial intelligence
CN114782499A (en) Image static area extraction method and device based on optical flow and view geometric constraint
CN110544268B (en) Multi-target tracking method based on structured light and SiamMask network
CN111553234A (en) Pedestrian tracking method and device integrating human face features and Re-ID feature sorting
Barhoumi et al. On-the-fly extraction of key frames for efficient video summarization
CN112308879A (en) Image processing apparatus, method of tracking target object, and storage medium
Osian et al. Video shot characterization
CN111444817A (en) Person image identification method and device, electronic equipment and storage medium
CN114998628A (en) Template matching-based twin network long-term target tracking method
CN113313733A (en) Hierarchical unmanned aerial vehicle target tracking method based on shared convolution
CN113298852A (en) Target tracking method and device, electronic equipment and computer readable storage medium
CN114241586B (en) Face detection method and device, storage medium and electronic equipment
CN115953434B (en) Track matching method, track matching device, electronic equipment and storage medium
CN111986231A (en) Multi-target tracking method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant