CN109063611B - Face recognition result processing method and device based on video semantics - Google Patents

Face recognition result processing method and device based on video semantics

Info

Publication number
CN109063611B
CN109063611B (application CN201810797921.3A)
Authority
CN
China
Prior art keywords
video
face
segments
video segments
gray level
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810797921.3A
Other languages
Chinese (zh)
Other versions
CN109063611A (en)
Inventor
Shen Can (沈灿)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Moviebook Technology Corp. Ltd.
Original Assignee
Beijing Moviebook Technology Corp. Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Moviebook Technology Corp. Ltd.
Priority to CN201810797921.3A
Publication of CN109063611A
Application granted
Publication of CN109063611B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/172 Classification, e.g. identification

Abstract

The application discloses a method and an apparatus for processing face recognition results based on video semantics. The method comprises the following steps: performing face detection and tracking on a video to obtain a plurality of video segments, where every video frame in a segment contains the face of the same person; recognizing the face in each video segment to obtain the name of the person in that segment; and, where the interval between two consecutive video segments is less than or equal to a first threshold, merging the two segments if the person names in both are the same and the similarity of the segments is greater than or equal to a second threshold. The method performs face detection, tracking and recognition on a given video and merges segments belonging to the same person by analyzing segment intervals, person names and similarity, thereby avoiding fragmentation of the segmentation result and improving the accuracy of the recognition result.

Description

Face recognition result processing method and device based on video semantics
Technical Field
The present application relates to the field of video processing technologies, and in particular, to a method and an apparatus for processing a face recognition result based on video semantics.
Background
With the development of multimedia technology, digital video has become an important medium for recording, transmitting and exchanging information. The growth of broadband has brought the internet into the era of online video: established applications such as long-form episodes and variety shows have grown rapidly, newer formats such as short video and live streaming have emerged in the last two years, and business applications built around video have multiplied. Video-based face processing differs from traditional still-image detection in that it can detect a target, obtain the target's appearance information within individual frames, and obtain its motion information across frames. However, in videos, and especially in films and variety shows, faces may twist frequently and expressions may be exaggerated and change rapidly. Under such conditions face tracking breaks off, so that a single person within a continuous time period is recognized as a number of discontinuous segments, which affects subsequent video analysis and processing; for example, video segmentation or clipping based on such a recognition result may lead users to consider the processing result inaccurate, degrading the user experience.
Disclosure of Invention
It is an object of the present application to overcome the above problems, or at least to partially solve or mitigate them.
According to one aspect of the application, a face recognition result processing method based on video semantics is provided, comprising the following steps:
a face detection step: detecting and tracking faces in a video to obtain a plurality of video segments, wherein every video frame in a video segment contains the face of the same person;
a face recognition step: recognizing the face in each video segment to obtain the name of the person in the video segment;
a video segment merging step: where the interval between two consecutive video segments is less than or equal to a first threshold, merging the two video segments if the person names in the two segments are the same and the similarity of the segments is greater than or equal to a second threshold.
With this method, face detection, tracking and recognition are performed on a given video, and even where tracking breaks off, video segments of the same person can be merged by analyzing segment intervals, person names and similarity. This compensates for shortcomings of the detection, tracking and recognition process, avoids fragmentation of the segmentation result, and improves the accuracy of the recognition result.
Optionally, the face detection step includes: performing face detection on each video frame of the video with a classifier, tracking each detected face, and taking consecutive video frames that contain the face of the same person as one video segment, thereby dividing the video into a plurality of video segments.
This step enables the video to be initially segmented quickly by person through the detection and tracking of faces.
Optionally, the face recognition step includes:
a face screenshot selection step: for each video segment, selecting a face screenshot from the video segment;
a recognition step: performing face recognition on the face screenshot with a neural network to obtain the name of the person in the video segment.
Through these steps, the identity of the person in each video segment can be obtained, which facilitates the subsequent merging of video segments; using a neural network to recognize the face screenshot gives high recognition accuracy and fast processing.
Optionally, in the video segment merging step, calculating the similarity of the video segments includes:
reducing the last video frame of the preceding video segment and the first video frame of the succeeding video segment each to a first number of pixels, quantizing the gray level of each pixel, comparing each quantized pixel's gray level with the average gray level of its video frame, and recording a 1 where the gray level is greater than or equal to the average and a 0 where it is below, thereby obtaining a fingerprint sequence for each video frame; the number of positions at which the two fingerprint sequences hold equal values is taken as the similarity of the video segments.
This step simplifies the gray levels of the video, reducing the subsequent amount of computation, and normalizes them so that the video frames can be compared under a unified standard, improving the accuracy of the frame similarity calculation.
Optionally, after the video segment merging step, the method further includes:
a result output step: repeating the video segment merging step until no more video segments can be merged, to obtain the final video segmentation result and person name recognition result.
According to another aspect of the present application, there is also provided a face recognition result processing apparatus based on video semantics, including:
a face detection module configured to detect and track faces in a video to obtain a plurality of video segments, wherein every video frame in a video segment contains the face of the same person;
a face recognition module configured to recognize the face in each video segment to obtain the name of the person in the video segment; and
a video segment merging module configured to, where the interval between two consecutive video segments is less than or equal to a first threshold, merge the two video segments if the person names in the two segments are the same and the similarity of the segments is greater than or equal to a second threshold.
With this apparatus, face detection, tracking and recognition can be performed on a given video, and even where tracking breaks off, video segments of the same person are merged by analyzing segment intervals, person names and similarity. This compensates for shortcomings of the detection, tracking and recognition process, avoids fragmentation of the segmentation result, and improves the accuracy of the recognition result.
Optionally, the face detection module is further configured to: perform face detection on each video frame of the video with a classifier, track each detected face, and take consecutive video frames that contain the face of the same person as one video segment, thereby dividing the video into a plurality of video segments.
Optionally, the face recognition module includes:
a face screenshot selection module configured to select, for each video segment, a face screenshot from the video segment; and
a recognition module configured to perform face recognition on the face screenshot with a neural network to obtain the name of the person in the video segment.
According to another aspect of the present application, there is also provided a computing device comprising a memory, a processor and a computer program stored in the memory and executable by the processor, wherein the processor implements the method as described above when executing the computer program.
According to another aspect of the application, there is also provided a computer-readable storage medium, preferably a non-volatile readable storage medium, having stored therein a computer program which, when executed by a processor, implements the method as described above.
The above and other objects, advantages and features of the present application will become more apparent to those skilled in the art from the following detailed description of specific embodiments thereof, taken in conjunction with the accompanying drawings.
Drawings
Some specific embodiments of the present application will be described in detail hereinafter by way of illustration and not limitation with reference to the accompanying drawings. The same reference numbers in the drawings identify the same or similar elements or components. Those skilled in the art will appreciate that the drawings are not necessarily drawn to scale. In the drawings:
FIG. 1 is a schematic flow chart diagram illustrating one embodiment of a video semantic based face recognition result processing method in accordance with the present application;
FIG. 2 is a schematic flow chart diagram illustrating another embodiment of a video semantic-based face recognition result processing method according to the present application;
FIG. 3 is a schematic block diagram of one embodiment of a video semantic based face recognition result processing apparatus according to the present application;
FIG. 4 is a schematic block diagram of another embodiment of a video semantic based face recognition result processing apparatus according to the present application;
FIG. 5 is a block diagram of one embodiment of a computing device of the present application;
FIG. 6 is a block diagram of one embodiment of a computer-readable storage medium of the present application.
Detailed Description
The above and other objects, advantages and features of the present application will become more apparent to those skilled in the art from the following detailed description of specific embodiments thereof, taken in conjunction with the accompanying drawings.
One embodiment of the application provides a face recognition result processing method based on video semantics. Fig. 1 is a schematic flow chart of an embodiment of the method according to the present application. The method can comprise the following steps:
S100, a face detection step: detecting and tracking faces in a video to obtain a plurality of video segments, wherein every video frame in a video segment contains the face of the same person;
S200, a face recognition step: recognizing the face in each video segment to obtain the name of the person in the video segment;
S300, a video segment merging step: where the interval between two consecutive video segments is less than or equal to a first threshold, merging the two video segments if the person names in the two segments are the same and the similarity of the segments is greater than or equal to a second threshold.
With this method, face detection, tracking and recognition are performed on a given video, and even where tracking breaks off, video segments of the same person can be merged by analyzing segment intervals, person names and similarity. This compensates for shortcomings of the detection, tracking and recognition process, avoids fragmentation of the segmentation result, and improves the accuracy of the recognition result.
In an alternative embodiment, the face detection step S100 includes: performing face detection on each video frame of the video with a classifier, tracking each detected face, and taking consecutive video frames that contain the face of the same person as one video segment, thereby dividing the video into a plurality of video segments.
This step enables the video to be initially segmented quickly by person through the detection and tracking of faces.
Optionally, the classifier may be a nearest neighbor classifier or a linear classifier. For example, face detection may be implemented by detecting facial key points with an AdaBoost (adaptive boosting) classifier.
Face tracking may be implemented using the mean shift algorithm. For example, face tracking may proceed as follows: in a given frame, a rectangle A1 containing the tracked target is selected according to the facial key points detected by the AdaBoost classifier; in the next frame, a candidate target rectangle A2 is detected with the AdaBoost classifier, background judgment is performed on every pixel of rectangle A2, and an indicator function BI2 is recorded as 1 for pixels judged to be background and as 0 otherwise; the probability density of rectangle A2 is calculated from the indicator function BI2, and the positional distance between candidate rectangle A2 and rectangle A1 is calculated from the position of A2; if the probability density is greater than or equal to a set probability density threshold and the distance is greater than or equal to a set distance threshold, the frame and its predecessor are determined to contain the face of the same person, and the two video frames are assigned to the same video segment.
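As a rough illustration, the following minimal single-target sketch segments a video with OpenCV's Haar cascade, an AdaBoost-based detector; the simple center-distance association stands in for the full mean-shift criterion described above, and MAX_SHIFT is an illustrative assumption, not a value from this application:

    import cv2

    MAX_SHIFT = 50  # assumed max center displacement (pixels) to keep a track alive

    def segment_video(path):
        # Haar cascade face detector (AdaBoost-based), shipped with OpenCV
        cascade = cv2.CascadeClassifier(
            cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
        cap = cv2.VideoCapture(path)
        segments, current, prev_center, idx = [], None, None, 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
            if len(faces) > 0:
                x, y, w, h = faces[0]  # single-target case for brevity
                center = (x + w / 2.0, y + h / 2.0)
                near = (prev_center is not None and
                        abs(center[0] - prev_center[0]) +
                        abs(center[1] - prev_center[1]) <= MAX_SHIFT)
                if current is not None and near:
                    current[1] = idx              # extend the running segment
                else:
                    if current is not None:
                        segments.append(tuple(current))
                    current = [idx, idx]          # start a new segment
                prev_center = center
            else:
                if current is not None:
                    segments.append(tuple(current))
                current, prev_center = None, None
            idx += 1
        if current is not None:
            segments.append(tuple(current))
        cap.release()
        return segments  # list of (start_frame, end_frame) pairs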
By detecting and tracking faces in the video in this way, a plurality of video segments can be obtained, and the start frame and end frame of each segment are recorded. When a video frame contains more than two persons, the method can also perform multi-target detection and tracking. For example, if a video contains two faces, then among the segments obtained by detection and tracking, the segments containing the first face and those containing the second face may overlap in time, but each segment is labeled with the result of only one face.
For example, step S100 may produce the following result for a video:
a first video segment, with start and end times of 0 to 5 seconds, labeled as the first face;
a second video segment, with start and end times of 6 to 8 seconds, labeled as the second face;
a third video segment, with start and end times of 7 to 10 seconds, labeled as the first face;
a fourth video segment, with start and end times of 13 to 15 seconds, labeled as the second face; and so on.
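A minimal sketch of how such a result might be held in memory; the field names are illustrative assumptions rather than anything prescribed by the application:

    # Hypothetical representation of the example above; times in seconds,
    # "face" is the track label assigned in step S100.
    segments = [
        {"start": 0,  "end": 5,  "face": "face_1"},
        {"start": 6,  "end": 8,  "face": "face_2"},
        {"start": 7,  "end": 10, "face": "face_1"},  # overlaps the second segment in time
        {"start": 13, "end": 15, "face": "face_2"},
    ]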
In an alternative embodiment, the face recognition step S200 includes:
a face screenshot selection step: for each video segment, selecting a face screenshot from the video segment;
a recognition step: performing face recognition on the face screenshot with a neural network to obtain the name of the person in the video segment.
Through these steps, the identity of the person in each video segment can be obtained, which facilitates the subsequent merging of video segments; using a neural network to recognize the face screenshot gives high recognition accuracy and fast processing.
Optionally, in the face screenshot selection step, the highest-quality face screenshot in each video segment may be selected. Highest quality means that the face in the screenshot is frontal, the lighting is good, and the expression is normal rather than exaggerated (such as crying or laughing). The judgment can be made by defining a quality function, which can be implemented by quantizing these criteria into classifier parameters. The selected face screenshot is then input into a trained neural network for recognition, yielding the person name corresponding to each video segment.
In the recognition step, face recognition can be performed on the face screenshot with a neural network to obtain the name of the person in the video segment. Optionally, the neural network may be a VGG network. For the face screenshot of a video segment, a person name and a confidence value are determined by a trained VGG network model to obtain a first identity information set, which contains at least one person name together with its confidence value. In the training stage, face image data of more than 1000 persons is used as training data, with no fewer than 100 images per person, covering various angles from frontal to profile. The trained VGG network model should achieve a mean average precision (mAP) on the test set of target video screenshots greater than a set threshold, e.g. 0.94. It should be understood that a model such as VGG can be trained for this purpose, and that an existing face recognition tool can also be used for recognition.
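As a sketch of this recognition step, the following assumes a torchvision VGG-16 whose final layer has been fine-tuned over a set of known identities; the checkpoint file face_vgg.pth, the names list, and NUM_IDENTITIES are hypothetical placeholders, not artifacts of the application:

    import torch
    import torch.nn.functional as F
    from PIL import Image
    from torchvision import models, transforms

    NUM_IDENTITIES = 1000  # assumption: one output class per known person

    model = models.vgg16()
    model.classifier[6] = torch.nn.Linear(4096, NUM_IDENTITIES)
    model.load_state_dict(torch.load("face_vgg.pth"))  # hypothetical fine-tuned weights
    model.eval()

    preprocess = transforms.Compose([
        transforms.Resize((224, 224)),  # VGG's expected input size
        transforms.ToTensor(),
    ])

    def recognize(face_crop_path, names):
        """Return (person_name, confidence) for one face screenshot."""
        x = preprocess(Image.open(face_crop_path).convert("RGB")).unsqueeze(0)
        with torch.no_grad():
            probs = F.softmax(model(x), dim=1)[0]
        conf, idx = probs.max(dim=0)
        return names[idx.item()], conf.item()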
Optionally, in the video segment merging step S300, two consecutive video segments are first selected; if the interval between them is greater than the first threshold, no processing is performed, i.e. the originally segmented video segments are kept unchanged. Optionally, the first threshold is 2 seconds: according to the subjective perception of a viewer, a person's presence in a video still feels continuous across a break of up to two seconds, so setting the first threshold to 2 seconds is a preferred scheme.
If the interval between the two segments is less than or equal to the first threshold, the person names recognized in the two segments are compared; if the names differ, no processing is performed.
If the person names are the same, the last video frame of the preceding segment and the first video frame of the succeeding segment are taken out and their similarity is compared. If the frames are not similar, no processing is performed; if they are similar, the video from the start frame of the preceding segment to the end frame of the succeeding segment is taken as one video segment.
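The three-way test just described might look as follows; the thresholds follow the values given in this section (a 2-second gap, and 59 of 64 matching fingerprint positions per the fingerprint embodiment below), while the segment fields and the pluggable similarity function are illustrative assumptions:

    FIRST_THRESHOLD = 2.0    # seconds; maximum gap between the two segments
    SECOND_THRESHOLD = 59    # matching fingerprint positions out of 64

    def try_merge(a, b, similarity):
        """Merge consecutive segments a and b, or return None to keep both."""
        if b["start"] - a["end"] > FIRST_THRESHOLD:
            return None                                   # too far apart
        if a["name"] != b["name"]:
            return None                                   # different persons
        if similarity(a["last_frame"], b["first_frame"]) < SECOND_THRESHOLD:
            return None                                   # boundary frames dissimilar
        return {"start": a["start"], "end": b["end"], "name": a["name"],
                "first_frame": a["first_frame"], "last_frame": b["last_frame"]}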
Regarding segment similarity, in an alternative embodiment the step of calculating the similarity of the video segments includes: obtaining a first set of gray values from the pixels of the last video frame of the preceding segment and a second set of gray values from the pixels of the first video frame of the succeeding segment, comparing the corresponding element values of the two sets in sequence, treating elements whose difference satisfies a constraint condition as identical, and taking the number of identical elements between the two sets as the similarity.
With these steps, frame similarity is computed from the gray levels of the video; the algorithm is simple, fast and accurate.
In this step, the gray values of the pixels of the last frame of the preceding segment are combined in order into the first gray value set, and the gray values of the pixels of the first frame of the succeeding segment into the second gray value set. Corresponding element values of the two sets are compared in sequence, and elements whose difference satisfies the constraint condition, for example a difference of at most 10, are treated as identical; the number of identical elements between the two sets is counted and used as the frame similarity.
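A minimal sketch of this first similarity variant, assuming both boundary frames are grayscale arrays of equal shape and using the tolerance of 10 gray levels mentioned above:

    import numpy as np

    def gray_set_similarity(frame_a, frame_b, tolerance=10):
        """Count pixel positions whose gray values differ by at most `tolerance`."""
        a = frame_a.astype(np.int32).ravel()
        b = frame_b.astype(np.int32).ravel()
        return int(np.count_nonzero(np.abs(a - b) <= tolerance))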
In an alternative embodiment, the step of calculating the similarity of the video segments includes: reducing the last video frame of the preceding video segment and the first video frame of the succeeding video segment each to a first number of pixels, quantizing the gray level of each pixel, comparing each quantized pixel's gray level with the average gray level of its video frame, and recording a 1 where the gray level is greater than or equal to the average and a 0 where it is below, thereby obtaining a fingerprint sequence for each video frame; the number of positions at which the two fingerprint sequences hold equal values is taken as the similarity.
This step simplifies the gray levels of the video, reducing the subsequent amount of computation, and normalizes them so that the video frames can be compared under a unified standard, improving the accuracy of the frame similarity calculation.
Optionally, in this step, both video frames are reduced to 8 × 8, giving 64 pixels each, and every pixel is quantized to 64 gray levels. For each video frame, the average of its 64 quantized gray values is taken as the frame's gray average; each pixel's gray value is compared with this average, recorded as 1 if greater than or equal to it and as 0 if below it, and the comparison results are combined into a sequence of 64 numbers, the fingerprint sequence of the video frame. The fingerprint sequences of the two frames are then compared: if the number of differing positions does not exceed 5, the two frames are considered similar.
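A sketch of this fingerprint embodiment, essentially an average-hash comparison, assuming OpenCV and NumPy; frames count as similar when at least 59 of the 64 positions agree, i.e. at most 5 differ:

    import cv2
    import numpy as np

    def fingerprint(frame):
        """64-element 0/1 fingerprint of one video frame."""
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        small = cv2.resize(gray, (8, 8))   # reduce to the "first number": 64 pixels
        quant = small // 4                 # 256 gray levels -> 64 levels
        return (quant >= quant.mean()).astype(np.uint8).ravel()  # 1 if >= mean, else 0

    def fingerprint_similarity(frame_a, frame_b):
        """Number of positions (0..64) at which the two fingerprints agree."""
        return int(np.sum(fingerprint(frame_a) == fingerprint(frame_b)))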
Fig. 2 is a schematic flow chart diagram of another embodiment of a face recognition result processing method based on video semantics according to the present application. In an alternative embodiment, after the video segment merging step, the method further comprises:
s400, result output step: and repeating the video segment merging step until the video segments cannot be merged to obtain a final video segmentation result and a character name recognition result.
And after circularly and repeatedly analyzing all the remaining identification result segments, sequencing the obtained video segments and the person name identification results according to time to obtain a final identification result.
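Putting the pieces together, the repeat-until-stable loop could be sketched as below, reusing the hypothetical try_merge() from the merging step; it sweeps the time-ordered segment list until a full pass produces no merge:

    def merge_until_stable(segments, similarity):
        """Repeat the merging step until no two segments can be merged."""
        segments = sorted(segments, key=lambda s: s["start"])
        changed = True
        while changed:
            changed = False
            i = 0
            while i < len(segments) - 1:
                merged = try_merge(segments[i], segments[i + 1], similarity)
                if merged is not None:
                    segments[i:i + 2] = [merged]   # replace the pair with the merge
                    changed = True
                else:
                    i += 1
        return segments                            # final, time-ordered result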
One embodiment of the application also provides a face recognition result processing device based on video semantics. Fig. 3 is a schematic block diagram of an embodiment of a face recognition result processing apparatus based on video semantics according to the present application. The apparatus may include:
a face detection module 100 configured to detect and track faces in a video to obtain a plurality of video segments, wherein every video frame in a video segment contains the face of the same person;
a face recognition module 200 configured to recognize the face in each video segment to obtain the name of the person in the video segment;
and a video segment merging module 300 configured to, where the interval between two consecutive video segments is less than or equal to a first threshold, merge the two video segments if the person names in the two segments are the same and the similarity of the segments is greater than or equal to a second threshold.
With this apparatus, face detection, tracking and recognition can be performed on a given video, and even where tracking breaks off, video segments of the same person are merged by analyzing segment intervals, person names and similarity. This compensates for shortcomings of the detection, tracking and recognition process, avoids fragmentation of the segmentation result, and improves the accuracy of the recognition result.
Optionally, the face detection module 100 is further configured to: perform face detection on each video frame of the video with a classifier, track each detected face, and take consecutive video frames that contain the face of the same person as one video segment, thereby dividing the video into a plurality of video segments.
Optionally, the face recognition module 200 includes:
a face screenshot selection module configured to select, for each video segment, a face screenshot from the video segment;
and a recognition module configured to perform face recognition on the face screenshot with a neural network to obtain the name of the person in the video segment.
Through the face recognition module, the identity of the person in each video segment can be obtained, which facilitates the subsequent merging of video segments; using a neural network to recognize the face screenshot gives high recognition accuracy and fast processing.
Optionally, the apparatus further includes a similarity calculation module. In one embodiment, the similarity calculation module is configured to obtain a first gray value set and a second gray value set from the pixel gray values of the last video frame of the preceding segment and the first video frame of the succeeding segment respectively, compare the corresponding element values of the two sets in sequence, treat elements whose difference satisfies a constraint condition as identical, and use the number of identical elements between the two sets as the similarity.
In another embodiment, the similarity calculation module is configured to reduce the last video frame of the preceding segment and the first video frame of the succeeding segment each to a first number of pixels, quantize the gray level of each pixel, compare each quantized pixel's gray level with the average gray level of its video frame, record a 1 where the gray level is at least the average and a 0 otherwise, thereby obtaining a fingerprint sequence for each video frame, and use the number of positions at which the two fingerprint sequences hold equal values as the similarity.
Fig. 4 is a schematic block diagram of another embodiment of a face recognition result processing apparatus based on video semantics according to the present application. In an alternative embodiment, the apparatus may further comprise:
a result output module 400 configured to repeat the video segment merging until no more video segments can be merged, yielding the final video segmentation result and person name recognition result.
Embodiments of the present application also provide a computing device; referring to Fig. 5, it comprises a memory 1120, a processor 1110 and a computer program stored in the memory 1120 and executable by the processor 1110, the computer program being held in a space 1130 for program code in the memory 1120 and implementing, when executed by the processor 1110, any of the method steps 1131 according to the present application.
Embodiments of the present application also provide a computer-readable storage medium. Referring to Fig. 6, the computer-readable storage medium comprises a storage unit for program code, provided with a program 1131' that performs the steps of the method according to the present application when executed by a processor.
Embodiments of the present application also provide a computer program product containing instructions comprising computer readable code which, when executed by a computing device, causes the computing device to perform the method as described above.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may take the form, in whole or in part, of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed by a computer, the procedures or functions described in the embodiments of the application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example from one website, computer, server or data center to another via wired (e.g. coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g. infrared, radio, microwave) means. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that incorporates one or more available media. The available medium may be a magnetic medium (e.g. floppy disk, hard disk, magnetic tape), an optical medium (e.g. DVD), or a semiconductor medium (e.g. solid state disk (SSD)).
Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be understood by those skilled in the art that all or part of the steps in the methods of the above embodiments may be implemented by a program, and that the program may be stored in a computer-readable storage medium, where the storage medium is a non-transitory medium such as a random access memory, read-only memory, flash memory, hard disk, solid state disk, magnetic tape, floppy disk, optical disk, or any combination thereof.
The above description is only for the preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (7)

1. A face recognition result processing method based on video semantics, comprising the following steps:
a face detection step: detecting and tracking faces in a video to obtain a plurality of video segments, wherein every video frame in a video segment contains the face of the same person;
a face recognition step: for each video segment, selecting the highest-quality face screenshot in the video segment, and performing face recognition on the face screenshot with a VGG (Visual Geometry Group) network to obtain the name of the person in the video segment; and
a video segment merging step: where the interval between two consecutive video segments is less than or equal to a first threshold, merging the two video segments if the person names in the two segments are the same and the similarity of the segments is greater than or equal to a second threshold;
wherein calculating the similarity of the video segments comprises the following steps:
reducing the last video frame of the preceding video segment and the first video frame of the succeeding video segment each to a first number of pixels, quantizing the gray level of each pixel, comparing each quantized pixel's gray level with the average gray level of its video frame, and recording a 1 where the gray level is greater than or equal to the average and a 0 where it is below, thereby obtaining a fingerprint sequence for each video frame; the number of positions at which the two fingerprint sequences hold equal values is taken as the similarity of the video segments.
2. The method of claim 1, wherein the face detection step comprises: performing face detection on each video frame of the video with a classifier, tracking each detected face, and taking consecutive video frames that contain the face of the same person as one video segment, thereby dividing the video into a plurality of video segments.
3. The method of claim 1, wherein after the video segment merging step, the method further comprises:
a result output step: repeating the video segment merging step until no more video segments can be merged, to obtain the final video segmentation result and person name recognition result.
4. A face recognition result processing device based on video semantics, comprising:
a face detection module configured to detect and track faces in a video to obtain a plurality of video segments, wherein every video frame in a video segment contains the face of the same person;
a face recognition module configured to select, for each video segment, the highest-quality face screenshot in the video segment, and to perform face recognition on the face screenshot with a VGG (Visual Geometry Group) network to obtain the name of the person in the video segment; and
a video segment merging module configured to, where the interval between two consecutive video segments is less than or equal to a first threshold, merge the two video segments if the person names in the two segments are the same and the similarity of the segments is greater than or equal to a second threshold;
wherein the similarity of the video segments is calculated by the following steps:
reducing the last video frame of the preceding video segment and the first video frame of the succeeding video segment each to a first number of pixels, quantizing the gray level of each pixel, comparing each quantized pixel's gray level with the average gray level of its video frame, and recording a 1 where the gray level is greater than or equal to the average and a 0 where it is below, thereby obtaining a fingerprint sequence for each video frame; the number of positions at which the two fingerprint sequences hold equal values is taken as the similarity of the video segments.
5. The apparatus of claim 4, wherein the face detection module is further configured to: perform face detection on each video frame of the video with a classifier, track each detected face, and take consecutive video frames that contain the face of the same person as one video segment, thereby dividing the video into a plurality of video segments.
6. A computing device comprising a memory, a processor, and a computer program stored in the memory and executable by the processor, wherein the processor implements the method of any of claims 1 to 3 when executing the computer program.
7. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 3.
CN201810797921.3A 2018-07-19 2018-07-19 Face recognition result processing method and device based on video semantics Active CN109063611B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810797921.3A CN109063611B (en) 2018-07-19 2018-07-19 Face recognition result processing method and device based on video semantics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810797921.3A CN109063611B (en) 2018-07-19 2018-07-19 Face recognition result processing method and device based on video semantics

Publications (2)

Publication Number Publication Date
CN109063611A 2018-12-21
CN109063611B 2021-01-05

Family

ID=64817435

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810797921.3A Active CN109063611B (en) 2018-07-19 2018-07-19 Face recognition result processing method and device based on video semantics

Country Status (1)

Country Link
CN (1) CN109063611B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111385641B (en) * 2018-12-29 2022-04-22 深圳Tcl新技术有限公司 Video processing method, smart television and storage medium
CN109922373B (en) * 2019-03-14 2021-09-28 上海极链网络科技有限公司 Video processing method, device and storage medium
CN110119711B (en) * 2019-05-14 2021-06-11 北京奇艺世纪科技有限公司 Method and device for acquiring character segments of video data and electronic equipment
CN110868632B (en) * 2019-10-29 2022-09-09 腾讯科技(深圳)有限公司 Video processing method and device, storage medium and electronic equipment
CN113810782B (en) * 2020-06-12 2022-09-27 阿里巴巴集团控股有限公司 Video processing method and device, server and electronic device
CN112861981B (en) * 2021-02-22 2023-06-20 每日互动股份有限公司 Data set labeling method, electronic equipment and medium
CN117615084B (en) * 2024-01-22 2024-03-29 南京爱照飞打影像科技有限公司 Video synthesis method and computer readable storage medium
CN117676245A (en) * 2024-01-31 2024-03-08 深圳市积加创新技术有限公司 Context video generation method and device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101635843A (en) * 2008-07-23 2010-01-27 北京大学 Method and system for extracting, seeking and comparing visual patterns based on frame-to-frame variation characteristics
CN103530652A (en) * 2013-10-23 2014-01-22 北京中视广信科技有限公司 Face clustering based video categorization method and retrieval method as well as systems thereof
CN104796781A (en) * 2015-03-31 2015-07-22 小米科技有限责任公司 Video clip extraction method and device
CN105718871A (en) * 2016-01-18 2016-06-29 成都索贝数码科技股份有限公司 Video host identification method based on statistics
CN105740758A (en) * 2015-12-31 2016-07-06 上海极链网络科技有限公司 Internet video face recognition method based on deep learning
CN106484837A (en) * 2016-09-30 2017-03-08 腾讯科技(北京)有限公司 The detection method of similar video file and device
CN108229322A (en) * 2017-11-30 2018-06-29 北京市商汤科技开发有限公司 Face identification method, device, electronic equipment and storage medium based on video

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10839221B2 (en) * 2016-12-21 2020-11-17 Facebook, Inc. Systems and methods for compiled video generation


Also Published As

Publication number Publication date
CN109063611A 2018-12-21

Similar Documents

Publication Publication Date Title
CN109063611B (en) Face recognition result processing method and device based on video semantics
CN109359636B (en) Video classification method, device and server
CN108875676B (en) Living body detection method, device and system
US8358837B2 (en) Apparatus and methods for detecting adult videos
CN110309795B (en) Video detection method, device, electronic equipment and storage medium
Zhang et al. Efficient video frame insertion and deletion detection based on inconsistency of correlations between local binary pattern coded frames
CN111061915B (en) Video character relation identification method
US20210390316A1 (en) Method for identifying a video frame of interest in a video sequence, method for generating highlights, associated systems
JP2004199669A (en) Face detection
CN111918130A (en) Video cover determining method and device, electronic equipment and storage medium
CN111414868B (en) Method for determining time sequence action segment, method and device for detecting action
EP3438883B1 (en) Method and apparatus for detecting a common section in moving pictures
CN113283368B (en) Model training method, face attribute analysis method, device and medium
CN111738120B (en) Character recognition method, character recognition device, electronic equipment and storage medium
Heng et al. How to assess the quality of compressed surveillance videos using face recognition
CN114519863A (en) Human body weight recognition method, human body weight recognition apparatus, computer device, and medium
CN116261009B (en) Video detection method, device, equipment and medium for intelligently converting video audience
CN116340551A (en) Similar content determining method and device
US20210374419A1 (en) Semi-Supervised Action-Actor Detection from Tracking Data in Sport
Gomez-Nieto et al. Quality aware features for performance prediction and time reduction in video object tracking
JP2018137639A (en) Moving image processing system, encoder and program, decoder and program
CN113472834A (en) Object pushing method and device
CN111860070A (en) Method and device for identifying changed object
JP2019169843A (en) Video recording device, video recording method and program
CN113810751B (en) Video processing method and device, electronic device and server

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A Method and Device for Processing Face Recognition Results Based on Video Semantics

Effective date of registration: 20230713

Granted publication date: 20210105

Pledgee: Bank of Jiangsu Limited by Share Ltd. Beijing branch

Pledgor: BEIJING MOVIEBOOK SCIENCE AND TECHNOLOGY Co.,Ltd.

Registration number: Y2023110000278