WO2017080196A1 - Video classification method and device based on face images - Google Patents

Video classification method and device based on face images

Info

Publication number
WO2017080196A1
Authority
WO
WIPO (PCT)
Prior art keywords
face image
frame
classified
video
face
Prior art date
Application number
PCT/CN2016/084620
Other languages
English (en)
French (fr)
Inventor
王甜甜
Original Assignee
深圳Tcl新技术有限公司
Priority date
Filing date
Publication date
Application filed by 深圳TCL新技术有限公司
Publication of WO2017080196A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 - Classification, e.g. identification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/46 - Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Definitions

  • The present invention relates to the field of multimedia, and in particular to a video classification method and apparatus based on face images.
  • The main object of the present invention is to provide a video classification method and apparatus based on face images, aiming to solve the prior-art technical problems of heavy computation, long computation time, and low efficiency when querying whether a certain face image exists in a video.
  • the present invention provides a video classification method based on face images, the method comprising the following steps:
  • the sample frame is the first key frame in the video, and the first key frame includes all face images in the video;
  • the preset algorithm is a face image extraction algorithm combining a circularly symmetric Gabor transform with the local binary pattern.
  • the preset algorithm comprises the following steps:
  • "comparing the face images in the frame to be classified with the face images included in the sample frame" is specifically:
  • the present invention also provides a video classification method based on face images, the method comprising the following steps:
  • the frames to be classified that contain face images with the same label are divided into one class, and the frames to be classified of the same class are regrouped to obtain video segments containing the same face image.
  • the sample frame is the first key frame in the video, and the first key frame includes all face images in the video.
  • the preset algorithm is a face image extraction algorithm combining a circularly symmetric Gabor transform with the local binary pattern.
  • the preset algorithm comprises the following steps:
  • "comparing the face images in the frame to be classified with the face images included in the sample frame" is specifically:
  • the present invention further provides a video classification device based on face images, the device comprising:
  • a first extraction module, configured to sequentially extract all key frames in the video, take the key frame that includes all face images in the video as a sample frame, and take the key frames other than the sample frame as frames to be classified;
  • a second extraction module, configured to extract the face images in the sample frame and in the frames to be classified by a preset algorithm, and label the face images in the sample frame;
  • a comparison module, configured to compare the face images in the frames to be classified with the face images included in the sample frame, to determine the labels corresponding to the face images in the frames to be classified;
  • a classification and regrouping module, configured to divide the frames to be classified that contain face images with the same label into one class, and regroup the frames to be classified of the same class to obtain video segments containing the same face image.
  • the sample frame is the first key frame in the video, and the first key frame includes all face images in the video.
  • the preset algorithm is a face image extraction algorithm combining a circularly symmetric Gabor transform with the local binary pattern.
  • the second extraction module comprises:
  • a first extraction unit, configured to extract the face images in the key frame, the key frame being the sample frame or a frame to be classified;
  • a preprocessing unit, configured to preprocess the face images;
  • a transform unit, configured to perform a circularly symmetric Gabor transform on the preprocessed face images;
  • a second extraction unit, configured to perform local binary pattern transform processing on the face images after the circularly symmetric Gabor transform, and extract the face image histograms after the local binary pattern transform, each face image corresponding to one face image histogram.
  • the comparison module is specifically:
  • the present invention adopts the following steps: extracting all key frames in a video, taking the key frame that includes all face images in the video as a sample frame, and taking the key frames other than the sample frame as frames to be classified; extracting the face images in the sample frame and in the frames to be classified by a preset algorithm, and labeling the face images in the sample frame; comparing the face images in the frames to be classified in sequence with the face images in the sample frame; and dividing the frames to be classified that have the same label into one class and regrouping the frames of the same class to obtain video segments containing the same face image. All key frames in the video are thus classified and regrouped on the basis of face images, so that when determining whether a face image exists in a video, or where in the video it appears, there is no need to extract every face image contained in the video from beginning to end and compare the face images one by one; it suffices to look up the video segments containing that face image, which greatly reduces the amount of computation, shortens the computation time, and improves efficiency.
  • FIG. 1 is a schematic flowchart of a preferred embodiment of the video classification method based on face images according to the present invention;
  • FIG. 2 is a schematic flowchart of the preset algorithm in an embodiment of the present invention;
  • FIG. 3 is a schematic diagram of the functional modules of a preferred embodiment of the video classification device based on face images according to the present invention;
  • FIG. 4 is a schematic diagram of the functional modules of the second extraction module in an embodiment of the present invention;
  • FIG. 5 is a schematic diagram of the face images in the sample frame after preprocessing according to the present invention;
  • FIG. 6 is a schematic diagram of the amplitude images corresponding to a face image after the circularly symmetric Gabor transform according to the present invention;
  • FIG. 7 is a schematic diagram of the texture images of a face image after the circularly symmetric Gabor transform in the present invention;
  • FIG. 8 is a schematic diagram of the face image histogram extracted from one of the face texture images in FIG. 7 after the local binary pattern transform;
  • FIG. 9 is a schematic diagram of the face image histogram obtained by superposing the five face images in FIG. 7;
  • FIG. 10 is a schematic diagram of the histogram matching result when a face image in a frame to be classified and the fifth face image in the sample frame are the same face image according to the present invention.
  • The present invention provides a video classification method based on face images.
  • FIG. 1 is a schematic flowchart of a preferred embodiment of the video classification method based on face images according to the present invention.
  • In this embodiment, the video classification method based on face images includes:
  • Step S10: sequentially extracting all key frames in the video, taking the key frame that includes all face images in the video as a sample frame, and taking the key frames other than the sample frame as frames to be classified;
  • The terminal captures video through a camera. The video includes I frames, B frames, and P frames, where the I frames are the key frames that best reflect and represent the important information in the video; therefore, when analyzing the video frames, only the I frames are extracted for analysis.
  • The terminal sequentially extracts all key frames in the video, takes the key frame that includes all face images in the video as the sample frame, and takes the key frames other than the sample frame as the frames to be classified.
  • In order to extract relatively clear face images from the video, a camera with a high resolution is selected.
  • Preferably, the sample frame is the first key frame in the video, and the first key frame includes all face images in the video.
  • For example, the terminal acquires each I frame in the video in turn and numbers the I frames sequentially; if the video includes five I frames in total, the five I frames are labeled 1, 2, 3, 4, and 5 in order.
  • The sample frame is the first of all the I frames, and the first I frame includes all the face images in the video; the frames to be classified are the remaining I frames, i.e. the remaining four I frames.
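  • As a minimal sketch of this step (not part of the patent text), the following Python snippet lists a video's I frames with ffprobe and grabs them with OpenCV; the file path, the use of ffprobe/OpenCV, and the frame-seeking strategy are all illustrative assumptions.

```python
import subprocess
import cv2

def extract_key_frames(path):
    # Ask ffprobe for the picture type (I/P/B) of every video frame.
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "frame=pict_type", "-of", "csv=p=0", path],
        capture_output=True, text=True, check=True).stdout
    i_frame_indices = [i for i, line in enumerate(out.splitlines())
                       if line.strip().endswith("I")]

    cap = cv2.VideoCapture(path)
    frames = []
    for idx in i_frame_indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)  # seek to the I frame
        ok, frame = cap.read()
        if ok:
            frames.append(frame)
    cap.release()
    # Following the embodiment, the first I frame is the sample frame
    # and the remaining I frames are the frames to be classified.
    return frames[0], frames[1:]
```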
  • Step S20: extracting the face images in the sample frame and in the frames to be classified by a preset algorithm, and labeling the face images in the sample frame;
  • The terminal extracts the face images in the sample frame and in the frames to be classified by a preset algorithm, and labels the face images in the sample frame.
  • The preset algorithm includes, but is not limited to, a face image extraction algorithm combining a circularly symmetric Gabor transform with the local binary pattern (LBP, Local Binary Pattern).
  • The preset algorithm may also be a face image extraction algorithm based on template matching, on singular value features, or on subspace analysis.
  • In this embodiment, the preset algorithm is a face image extraction algorithm combining a circularly symmetric Gabor transform with the local binary pattern (LBP).
  • For example, the terminal extracts 10 face images from the sample frame by the algorithm combining a circularly symmetric Gabor transform with the LBP, and labels the 10 face images, marking them in order as A, B, C, D, E, F, G, H, I, and J, where A denotes the first face image in the sample frame, B the second face image in the sample frame, C the third face image in the sample frame, and so on.
  • The circularly symmetric Gabor transform (CSGT, Circularly Symmetric Gabor Transform) is a wavelet transform at five scales and multiple orientations, which transforms one image into images at five scales and multiple orientations.
  • In this embodiment, in order to normalize the face images in the sample frame and in the frames to be classified, the terminal sets each face image in the sample frame and in the frames to be classified to an image of a specified size, for example a specified region of 54*54.
  • The terminal extracts the face images from the frames to be classified by the algorithm combining a circularly symmetric Gabor transform with the local binary pattern (LBP).
  • A conventional method of extracting the texture information of a face image is an algorithm combining the GT (Gabor Transform) with the LBP.
  • In the GT-based filtering extraction of the texture information of a face image, the face image is first transformed by the GT to obtain filtered face images; the GT is a transform over 5 scales and 8 orientations, i.e. it generates 40 filtered images; the LBP transform is then applied to the 40 filtered images, and finally the face image is recognized.
  • The computational complexity of that method is too high and its computation time too long, so reading and analyzing the video takes a long time and the efficiency is low.
  • In this embodiment, the texture information of the face image is instead extracted by the algorithm combining the CSGT with the LBP: the face image is transformed by the CSGT to generate five filtered images, which are superposed and recombined into five filtered images; energy extraction is then performed on the recombined filtered images to extract the image that best describes the texture information of the face, and the LBP transform is applied to that texture information.
  • Compared with the face recognition algorithm combining the GT with the LBP, only five filtered images need to be computed instead of forty, which reduces the amount of computation and shortens the computation time.
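  • A minimal sketch of such a five-scale filter bank is shown below; the patent does not give the CSGT kernel explicitly, so the radial-cosine Gaussian form, the kernel size, the bandwidth, and the five center frequencies are all assumptions, and the magnitude averaging is a simplified stand-in for the "energy extraction" step described above.

```python
import numpy as np
from scipy.signal import fftconvolve

def csgt_kernel(size, freq, sigma):
    # One assumed parameterization of a circularly symmetric Gabor
    # kernel: a Gaussian envelope modulated by a cosine of the radius.
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    r = np.hypot(x, y)
    return np.exp(-r**2 / (2 * sigma**2)) * np.cos(2 * np.pi * freq * r)

def csgt_texture(face, freqs=(0.05, 0.1, 0.2, 0.3, 0.4)):
    # Five scales -> five filtered images, whose magnitude responses are
    # superposed into a single texture image.
    responses = [np.abs(fftconvolve(face, csgt_kernel(15, f, 1.5 / f), mode="same"))
                 for f in freqs]
    return sum(responses) / len(responses)
```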
  • Step S30: comparing the face images in the frames to be classified with the face images included in the sample frame, to determine the labels corresponding to the face images in the frames to be classified;
  • The terminal compares the face images in the frames to be classified with the face images included in the sample frame, to determine the labels in the sample frame that correspond to the face images in the frames to be classified.
  • For example, the terminal compares the face images in the frames to be classified (the 2nd, 3rd, 4th, and 5th I frames) with the 10 face images A, B, C, D, E, F, G, H, I, and J in the sample frame, and finds that the face images in the 2nd and 3rd I frames match face image A, i.e. the face images in the 2nd and 3rd I frames correspond to label A; or that the face images in the 2nd, 4th, and 5th I frames match face image D, i.e. the face images in the 2nd, 4th, and 5th I frames correspond to label D.
  • It should be noted that a frame to be classified may include more than one face image; when it includes two or more face images, the same frame to be classified may belong to different classes and may therefore appear in different video segments.
  • Step S40: dividing the frames to be classified that contain face images with the same label into one class, and regrouping the frames of the same class to obtain video segments containing the same face image.
  • The terminal divides the frames to be classified that contain face images with the same label into one class, and regroups the frames to be classified of the same class to obtain video segments containing the same face image.
  • For example, the terminal divides the 2nd and 3rd I frames, which contain the face image labeled A, into the same class, i.e. it classifies the 2nd and 3rd I frames among the frames to be classified as I frames containing face image A, and regroups the 2nd and 3rd I frames to obtain a video segment containing face image A; the terminal divides the 2nd, 4th, and 5th I frames, which contain the face image labeled D, into the same class, i.e. it classifies the 2nd, 4th, and 5th I frames as I frames containing face image D, and regroups the 2nd, 4th, and 5th I frames to obtain a video segment containing face image D.
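  • As an illustration of this grouping (an assumed sketch, not the patent's code), labels can simply be collected per frame and inverted into label-to-frames groups:

```python
from collections import defaultdict

def group_frames_by_label(frame_labels):
    # frame_labels maps I-frame index -> set of face labels found in it.
    # Returns {label: ordered frame indices} so that each label's frames
    # can be regrouped into one video segment.
    groups = defaultdict(list)
    for idx in sorted(frame_labels):
        for label in frame_labels[idx]:
            groups[label].append(idx)
    return dict(groups)

# The example above: frames 2 and 3 match face A; frames 2, 4, 5 match face D.
segments = group_frames_by_label({2: {"A", "D"}, 3: {"A"}, 4: {"D"}, 5: {"D"}})
# segments == {"A": [2, 3], "D": [2, 4, 5]}; frame 2 belongs to both
# classes, matching the note above about frames containing several faces.
```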
  • In this embodiment, all key frames in the video are extracted, the key frame that includes all face images in the video is taken as a sample frame, and the key frames other than the sample frame are taken as frames to be classified; the face images in the sample frame and in the frames to be classified are extracted by a preset algorithm, and the face images in the sample frame are labeled; the face images in the frames to be classified are compared in sequence with the face images in the sample frame, the frames to be classified that have the same label are divided into one class, and the frames of the same class are regrouped to obtain video segments containing the same face image.
  • All key frames in the video are thus classified and regrouped on the basis of face images, so that when determining whether a face image exists in a video, or where in the video it appears, there is no need to extract every face image contained in the video from beginning to end and compare the face images one by one; it suffices to look up the video segments containing that face image, which greatly reduces the amount of computation, shortens the computation time, and improves efficiency.
  • FIG. 2 is a schematic flowchart of the preset algorithm in an embodiment of the present invention.
  • In this embodiment, the preset algorithm includes:
  • Step S21: extracting the face images in a key frame, where the key frame is the sample frame or a frame to be classified;
  • Step S22: preprocessing the face images;
  • The terminal extracts the face images in the key frame, i.e. extracts the face images in the sample frame or in a frame to be classified, and preprocesses the face images extracted from the sample frame and from the frames to be classified, obtaining preprocessed face images for the sample frame and for the frames to be classified.
  • The preprocessing includes grayscale transformation, equalization of the face image histogram, median filtering, and homomorphic filtering; the order in which the grayscale transformation, histogram equalization, median filtering, and homomorphic filtering are performed may vary.
  • For example, the terminal applies grayscale transformation, histogram equalization, median filtering, and homomorphic filtering to the 10 face images A, B, C, D, E, F, G, H, I, and J in the sample frame, obtaining the preprocessed face images A through J, and preprocesses the face images in the four I frames to be classified, obtaining preprocessed face images for the four I frames.
  • FIG. 5 is a schematic diagram of the face images in the sample frame after preprocessing according to the present invention.
  • Further, preprocessing the face images includes:
  • performing grayscale transformation on the face images; the terminal performs grayscale transformation on the face images in the sample frame and in the frames to be classified, obtaining grayscale-transformed face images for the sample frame and the frames to be classified.
  • Grayscale transformation, also called gray stretching or contrast stretching, is the most basic point operation: according to the gray value of each pixel in the original image and some mapping rule, each pixel is transformed to another gray value, and the image is enhanced by assigning a new gray value to each pixel of the original image.
  • performing histogram equalization on the grayscale-transformed face images.
  • The terminal performs histogram equalization on the grayscale-transformed face images, obtaining histogram-equalized face images for the sample frame and the frames to be classified, i.e. the corresponding preprocessed face images.
  • The steps of equalizing the face image histogram are: (1) compute the histogram of the grayscale-transformed face image; (2) transform the computed histogram with the cumulative distribution function to obtain the new gray levels; (3) replace the old gray levels with the new gray levels.
  • This last step is an approximation; it should be made as reasonable as possible for the intended purpose, while merging gray values that are equal or similar.
  • Further, the terminal performs median filtering on the face images in the sample frame and in the frames to be classified; median filtering sorts the pixels of a local region by gray level and takes the median gray level of that neighborhood as the gray value of the current pixel.
  • The steps of median filtering are: (1) move the filter template through the image, aligning the template center with a pixel position in the image; (2) read the gray values of the corresponding pixels under the template; (3) sort these gray values from small to large; (4) assign the middle value of this sorted sequence to the pixel at the template center.
  • The terminal performs homomorphic filtering on the face images in the sample frame and in the frames to be classified.
  • Homomorphic filtering converts the multiplicative (non-additive) luminance model of an image into an additive form so that filtering enhancement can be performed.
  • The steps of homomorphic filtering are: (1) take the logarithm of both sides of the luminance function and then the Fourier transform; (2) pass the result through the filter; (3) take the inverse Fourier transform of the filter output and then an exponential transform.
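  • A compact sketch of this preprocessing chain is given below; the 3x3 median window and the homomorphic filter's gamma/cutoff values are assumptions, and the grayscale mapping is taken to be a plain BGR-to-gray conversion since the text does not fix a specific mapping rule.

```python
import cv2
import numpy as np

def homomorphic_filter(img, gamma_low=0.5, gamma_high=1.5, cutoff=15.0):
    # log -> Fourier transform -> high-frequency-emphasis filter ->
    # inverse Fourier transform -> exponential, as in the steps above.
    log_img = np.log1p(img.astype(np.float64))
    spec = np.fft.fftshift(np.fft.fft2(log_img))
    rows, cols = img.shape
    y, x = np.mgrid[-(rows // 2):rows - rows // 2, -(cols // 2):cols - cols // 2]
    d2 = x**2 + y**2  # squared distance from the spectrum center
    h = (gamma_high - gamma_low) * (1 - np.exp(-d2 / (2 * cutoff**2))) + gamma_low
    out = np.fft.ifft2(np.fft.ifftshift(h * spec)).real
    return np.expm1(out)

def preprocess_face(face_bgr, size=(54, 54)):
    face = cv2.cvtColor(cv2.resize(face_bgr, size), cv2.COLOR_BGR2GRAY)
    face = cv2.equalizeHist(face)   # histogram equalization
    face = cv2.medianBlur(face, 3)  # median filtering, 3x3 template
    return homomorphic_filter(face)
```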
  • Step S23: performing a circularly symmetric Gabor transform on the preprocessed face images;
  • The terminal performs a circularly symmetric Gabor transform on the preprocessed face images in the sample frame, obtaining the CSGT-transformed face images of the sample frame, and performs a circularly symmetric Gabor transform on the preprocessed face images in the frames to be classified, obtaining the CSGT-transformed face images of the frames to be classified.
  • FIG. 6 is a schematic diagram of the amplitude images corresponding to a face image after the circularly symmetric Gabor transform according to the present invention, taking face image A in the sample frame as an example; that is, FIG. 6 shows the amplitude images corresponding to face image A after the circularly symmetric Gabor transform.
  • FIG. 7 is a schematic diagram of the texture images of a face image after the circularly symmetric Gabor transform in the present invention, likewise taking face image A in the sample frame as an example; that is, FIG. 7 shows the texture images corresponding to face image A after the circularly symmetric Gabor transform.
  • Step S24: performing local binary pattern transform processing on the face images after the circularly symmetric Gabor transform, and extracting the face image histograms after the local binary pattern transform, each face image corresponding to one face image histogram.
  • The terminal performs the LBP transform on the CSGT-transformed face images in the sample frame, obtaining the LBP-transformed face images of the sample frame, and extracts a face image histogram from each LBP-transformed face image, each face image corresponding to one face image histogram; for example, the histograms corresponding to the 10 face images A through J in the sample frame are extracted.
  • Likewise, the terminal performs the LBP transform on the CSGT-transformed face images in the frames to be classified and extracts their face image histograms, for example the face image histograms in the four I frames.
  • FIG. 8 is a schematic diagram of the face image histogram extracted from one of the face texture images in FIG. 7 after the local binary pattern (LBP) transform.
  • FIG. 9 is a schematic diagram of the face image histogram obtained by superposing the five face images in FIG. 7; that is, FIG. 9 shows the five histograms of face image A obtained after the circularly symmetric Gabor transform and the local binary pattern (LBP) transform, superposed together. If a frame to be classified contains several face images, several such face image histograms are generated.
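  • A minimal sketch of this step using scikit-image is shown below; the neighbor count, the radius, and the "uniform" code mapping are assumed settings, not values fixed by the patent.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(texture, points=8, radius=1):
    # LBP codes over the CSGT texture image, then a normalized histogram;
    # with method="uniform" the codes fall in [0, points + 1].
    codes = local_binary_pattern(texture, points, radius, method="uniform")
    n_bins = points + 2
    hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins))
    return hist / max(hist.sum(), 1)
```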
  • Further, comparing the face images in the frames to be classified with the face images included in the sample frame is specifically:
  • The terminal calculates, from the face image histograms corresponding to the frame to be classified and the histograms of all face images corresponding to the sample frame, using the Euclidean distance formula, the distances between a face image in the frame to be classified and all face images included in the sample frame; the face image in the frame to be classified is the same as the face image in the sample frame for which the distance is smallest.
  • The Euclidean distance formula is D_{i,j} = sqrt((x_i - x_j)^2 + (y_i - y_j)^2), where (x_i, y_i) are the position coordinates of the i-th face image in the frame to be classified, (x_j, y_j) are the position coordinates of the j-th face image in the sample frame, and D_{i,j} is the distance between the i-th face image in the frame to be classified and the j-th face image in the sample frame.
  • FIG. 10 is a schematic diagram of the histogram matching result when a face image in a frame to be classified and a face image in the sample frame are the same face image; for example, FIG. 10 shows that the face image in the 2nd I frame is the same as the fifth face image in the sample frame (i.e. face image E), so the labels corresponding to the 2nd I frame include E.
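  • As an illustrative reading of this matching step (the patent states the distance over position coordinates; applying it to histogram bins is an assumption here), nearest-histogram labeling can be written as:

```python
import numpy as np

def label_face(face_hist, sample_hists):
    # sample_hists: {"A": hist, "B": hist, ...} extracted from the sample
    # frame; returns the label whose histogram is closest in Euclidean
    # distance to the face from the frame to be classified.
    distances = {label: float(np.linalg.norm(face_hist - h))
                 for label, h in sample_hists.items()}
    return min(distances, key=distances.get)
```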
  • The present invention further provides a video classification device based on face images.
  • FIG. 3 is a schematic diagram of the functional modules of a preferred embodiment of the video classification device based on face images according to the present invention.
  • In this embodiment, the video classification device based on face images includes:
  • a first extraction module 10, configured to sequentially extract all key frames in the video, take the key frame that includes all face images in the video as a sample frame, and take the key frames other than the sample frame as frames to be classified;
  • The terminal captures video through a camera. The video includes I frames, B frames, and P frames, where the I frames are the key frames that best reflect and represent the important information in the video; therefore, when analyzing the video frames, only the I frames are extracted for analysis.
  • The terminal sequentially extracts all key frames in the video, takes the key frame that includes all face images in the video as the sample frame, and takes the key frames other than the sample frame as the frames to be classified.
  • In order to extract relatively clear face images from the video, a camera with a high resolution is selected.
  • Preferably, the sample frame is the first key frame in the video, and the first key frame includes all face images in the video.
  • For example, the terminal acquires each I frame in the video in turn and numbers the I frames sequentially; if the video includes five I frames in total, the five I frames are labeled 1, 2, 3, 4, and 5 in order.
  • The sample frame is the first of all the I frames, and the first I frame includes all the face images in the video; the frames to be classified are the remaining I frames, i.e. the remaining four I frames.
  • a second extraction module 20, configured to extract the face images in the sample frame and in the frames to be classified by a preset algorithm, and label the face images in the sample frame;
  • The terminal extracts the face images in the sample frame and in the frames to be classified by a preset algorithm, and labels the face images in the sample frame.
  • The preset algorithm includes, but is not limited to, a face image extraction algorithm combining a circularly symmetric Gabor transform with the local binary pattern (LBP); the preset algorithm may also be a face image extraction algorithm based on template matching, on singular value features, or on subspace analysis.
  • In this embodiment, the preset algorithm is a face image extraction algorithm combining a circularly symmetric Gabor transform with the local binary pattern (LBP).
  • For example, the terminal extracts 10 face images from the sample frame by the algorithm combining a circularly symmetric Gabor transform with the LBP, and labels the 10 face images, marking them in order as A, B, C, D, E, F, G, H, I, and J, where A denotes the first face image in the sample frame, B the second face image in the sample frame, C the third face image in the sample frame, and so on.
  • The circularly symmetric Gabor transform (CSGT) is a wavelet transform at five scales and multiple orientations, which transforms one image into images at five scales and multiple orientations.
  • In this embodiment, in order to normalize the face images in the sample frame and in the frames to be classified, the terminal sets each face image in the sample frame and in the frames to be classified to an image of a specified size, for example a specified region of 54*54.
  • The terminal extracts the face images from the frames to be classified by the algorithm combining a circularly symmetric Gabor transform with the local binary pattern (LBP).
  • A conventional method of extracting the texture information of a face image is an algorithm combining the GT (Gabor Transform) with the LBP.
  • In the GT-based filtering extraction of the texture information of a face image, the face image is first transformed by the GT to obtain filtered face images; the GT is a transform over 5 scales and 8 orientations, i.e. it generates 40 filtered images; the LBP transform is then applied to the 40 filtered images, and finally the face image is recognized.
  • The computational complexity of that method is too high and its computation time too long, so reading and analyzing the video takes a long time and the efficiency is low.
  • In this embodiment, the texture information of the face image is instead extracted by the algorithm combining the CSGT with the LBP: the face image is transformed by the CSGT to generate five filtered images, which are superposed and recombined into five filtered images; energy extraction is then performed on the recombined filtered images to extract the image that best describes the texture information of the face, and the LBP transform is applied to that texture information.
  • Compared with the face recognition algorithm combining the GT with the LBP, only five filtered images need to be computed instead of forty, which reduces the amount of computation and shortens the computation time.
  • a comparison module 30, configured to compare the face images in the frames to be classified with the face images included in the sample frame, to determine the labels corresponding to the face images in the frames to be classified;
  • The terminal compares the face images in the frames to be classified with the face images included in the sample frame, to determine the labels in the sample frame that correspond to the face images in the frames to be classified.
  • For example, the terminal compares the face images in the frames to be classified (the 2nd, 3rd, 4th, and 5th I frames) with the 10 face images A, B, C, D, E, F, G, H, I, and J in the sample frame, and finds that the face images in the 2nd and 3rd I frames match face image A, i.e. the face images in the 2nd and 3rd I frames correspond to label A; or that the face images in the 2nd, 4th, and 5th I frames match face image D, i.e. the face images in the 2nd, 4th, and 5th I frames correspond to label D.
  • It should be noted that a frame to be classified may include more than one face image; when it includes two or more face images, the same frame to be classified may belong to different classes and may therefore appear in different video segments.
  • a classification and regrouping module 40, configured to divide the frames to be classified that contain face images with the same label into one class, and regroup the frames of the same class to obtain video segments containing the same face image.
  • The terminal divides the frames to be classified that contain face images with the same label into one class, and regroups the frames to be classified of the same class to obtain video segments containing the same face image.
  • For example, the terminal divides the 2nd and 3rd I frames, which contain the face image labeled A, into the same class, i.e. it classifies the 2nd and 3rd I frames among the frames to be classified as I frames containing face image A, and regroups the 2nd and 3rd I frames to obtain a video segment containing face image A; the terminal divides the 2nd, 4th, and 5th I frames, which contain the face image labeled D, into the same class, i.e. it classifies the 2nd, 4th, and 5th I frames as I frames containing face image D, and regroups the 2nd, 4th, and 5th I frames to obtain a video segment containing face image D.
  • In this embodiment, all key frames in the video are extracted, the key frame that includes all face images in the video is taken as a sample frame, and the key frames other than the sample frame are taken as frames to be classified; the face images in the sample frame and in the frames to be classified are extracted by a preset algorithm, and the face images in the sample frame are labeled; the face images in the frames to be classified are compared in sequence with the face images in the sample frame, the frames to be classified that have the same label are divided into one class, and the frames of the same class are regrouped to obtain video segments containing the same face image.
  • All key frames in the video are thus classified and regrouped on the basis of face images, so that when determining whether a face image exists in a video, or where in the video it appears, there is no need to extract every face image contained in the video from beginning to end and compare the face images one by one; it suffices to look up the video segments containing that face image, which greatly reduces the amount of computation, shortens the computation time, and improves efficiency.
  • FIG. 4 is a schematic diagram of the functional modules of the second extraction module in an embodiment of the present invention.
  • In this embodiment, the second extraction module 20 includes:
  • a first extraction unit 21, configured to extract the face images in a key frame, where the key frame is the sample frame or a frame to be classified;
  • a preprocessing unit 22, configured to preprocess the face images;
  • The terminal extracts the face images in the key frame, i.e. extracts the face images in the sample frame or in a frame to be classified, and preprocesses the face images extracted from the sample frame and from the frames to be classified, obtaining preprocessed face images for the sample frame and for the frames to be classified.
  • The preprocessing includes grayscale transformation, equalization of the face image histogram, median filtering, and homomorphic filtering; the order in which the grayscale transformation, histogram equalization, median filtering, and homomorphic filtering are performed may vary.
  • For example, the terminal applies grayscale transformation, histogram equalization, median filtering, and homomorphic filtering to the 10 face images A, B, C, D, E, F, G, H, I, and J in the sample frame, obtaining the preprocessed face images A through J, and preprocesses the face images in the four I frames to be classified, obtaining preprocessed face images for the four I frames.
  • FIG. 5 is a schematic diagram of the face images in the sample frame after preprocessing according to the present invention.
  • Further, preprocessing the face images includes:
  • performing grayscale transformation on the face images; the terminal performs grayscale transformation on the face images in the sample frame and in the frames to be classified, obtaining grayscale-transformed face images for the sample frame and the frames to be classified.
  • Grayscale transformation, also called gray stretching or contrast stretching, is the most basic point operation: according to the gray value of each pixel in the original image and some mapping rule, each pixel is transformed to another gray value, and the image is enhanced by assigning a new gray value to each pixel of the original image.
  • performing histogram equalization on the grayscale-transformed face images.
  • The terminal performs histogram equalization on the grayscale-transformed face images, obtaining histogram-equalized face images for the sample frame and the frames to be classified, i.e. the corresponding preprocessed face images.
  • The steps of equalizing the face image histogram are: (1) compute the histogram of the grayscale-transformed face image; (2) transform the computed histogram with the cumulative distribution function to obtain the new gray levels; (3) replace the old gray levels with the new gray levels.
  • This last step is an approximation; it should be made as reasonable as possible for the intended purpose, while merging gray values that are equal or similar.
  • Further, the terminal performs median filtering on the face images in the sample frame and in the frames to be classified; median filtering sorts the pixels of a local region by gray level and takes the median gray level of that neighborhood as the gray value of the current pixel.
  • The steps of median filtering are: (1) move the filter template through the image, aligning the template center with a pixel position in the image; (2) read the gray values of the corresponding pixels under the template; (3) sort these gray values from small to large; (4) assign the middle value of this sorted sequence to the pixel at the template center.
  • The terminal performs homomorphic filtering on the face images in the sample frame and in the frames to be classified.
  • Homomorphic filtering converts the multiplicative (non-additive) luminance model of an image into an additive form so that filtering enhancement can be performed.
  • The steps of homomorphic filtering are: (1) take the logarithm of both sides of the luminance function and then the Fourier transform; (2) pass the result through the filter; (3) take the inverse Fourier transform of the filter output and then an exponential transform.
  • a transform unit 23, configured to perform a circularly symmetric Gabor transform on the preprocessed face images;
  • The terminal performs a circularly symmetric Gabor transform on the preprocessed face images in the sample frame, obtaining the CSGT-transformed face images of the sample frame, and performs a circularly symmetric Gabor transform on the preprocessed face images in the frames to be classified, obtaining the CSGT-transformed face images of the frames to be classified.
  • FIG. 6 is a schematic diagram of the amplitude images corresponding to a face image after the circularly symmetric Gabor transform according to the present invention, taking face image A in the sample frame as an example; that is, FIG. 6 shows the amplitude images corresponding to face image A after the circularly symmetric Gabor transform.
  • FIG. 7 is a schematic diagram of the texture images of a face image after the circularly symmetric Gabor transform in the present invention, likewise taking face image A in the sample frame as an example; that is, FIG. 7 shows the texture images corresponding to face image A after the circularly symmetric Gabor transform.
  • a second extraction unit 24, configured to perform local binary pattern transform processing on the face images after the circularly symmetric Gabor transform, and extract the face image histograms after the local binary pattern transform, each face image corresponding to one face image histogram.
  • The terminal performs the LBP transform on the CSGT-transformed face images in the sample frame, obtaining the LBP-transformed face images of the sample frame, and extracts a face image histogram from each LBP-transformed face image, each face image corresponding to one face image histogram; likewise, the terminal performs the LBP transform on the CSGT-transformed face images in the frames to be classified and extracts their face image histograms.
  • FIG. 8 is a schematic diagram of the face image histogram extracted from one of the face texture images in FIG. 7 after the local binary pattern (LBP) transform.
  • FIG. 9 is a schematic diagram of the face image histogram obtained by superposing the five face images in FIG. 7; that is, FIG. 9 shows the five histograms of face image A obtained after the circularly symmetric Gabor transform and the local binary pattern (LBP) transform, superposed together. If a frame to be classified contains several face images, several such face image histograms are generated.
  • Further, the comparison module 30 is specifically configured to: calculate, according to the face image histograms corresponding to the frame to be classified and the histograms of all face images corresponding to the sample frame, by means of the Euclidean distance formula, the distances between the face images in the frame to be classified and all face images included in the sample frame, where the face image in the frame to be classified is the same as the corresponding face image in the sample frame for which the distance is smallest.
  • The terminal calculates, according to the face image histograms corresponding to the frame to be classified and the histograms of all face images in the sample frame, by means of the Euclidean distance formula, the distances between the face images in the frame to be classified and all face images included in the sample frame, where the face image in the frame to be classified is the same as the corresponding face image in the sample frame for which the distance is smallest.
  • The Euclidean distance formula is D_{i,j} = sqrt((x_i - x_j)^2 + (y_i - y_j)^2), where (x_i, y_i) are the position coordinates of the i-th face image in the frame to be classified, (x_j, y_j) are the position coordinates of the j-th face image in the sample frame, and D_{i,j} is the distance between them.
  • FIG. 10 is a schematic diagram of the histogram matching result when a face image in a frame to be classified and a face image in the sample frame are the same face image; for example, FIG. 10 shows that the face image in the 2nd I frame is the same as the fifth face image in the sample frame (i.e. face image E), so the labels corresponding to the 2nd I frame include E.
  • Through the description of the above embodiments, those skilled in the art can clearly understand that the methods of the foregoing embodiments can be implemented by software plus the necessary general-purpose hardware platform, and of course also by hardware; in many cases, however, the former is the better implementation.
  • Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, can be embodied in the form of a software product stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and including a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to perform the methods described in the various embodiments of the present invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

A video classification method and a video classification device based on face images, comprising the steps of: sequentially extracting all key frames in a video to obtain a sample frame and frames to be classified (S10); extracting the face images in the sample frame and in the frames to be classified by a preset algorithm, and labeling the face images in the sample frame (S20); comparing the face images in the frames to be classified with the face images contained in the sample frame, to determine the labels corresponding to the face images in the frames to be classified (S30); and dividing the frames to be classified that contain face images with the same label into one class, and regrouping the frames of the same class to obtain video segments containing the same face image (S40). The method and device classify and regroup all the key frames in a video on the basis of face images, reducing the amount of computation needed to query whether a certain face image exists in the video, shortening the computation time, and improving query efficiency.

Description

Video classification method and device based on face images

Technical Field

The present invention relates to the field of multimedia, and in particular to a video classification method and device based on face images.

Background

With the rapid growth of video data, people often need to analyze whether a certain face image appears in a given video. When a user needs to determine whether a specific face image exists in a video, and where in the video it appears, the whole video has to be read from beginning to end, all the face images in the video extracted, and each face image checked in turn to see whether it is the one the user is looking for, so as to determine where that face image appears in the video.

However, every time the user performs the operation of "determining whether a certain face image exists in a video and where in that video it appears", all the face images in the video have to be extracted again and then compared one by one with the target face image. The procedure therefore involves a large amount of computation, a long computation time, and low efficiency.

Summary of the Invention

The main object of the present invention is to provide a video classification method and device based on face images, aiming to solve the prior-art technical problems of heavy computation, long computation time, and low efficiency when querying whether and where a certain face image exists in a video.
To achieve the above object, the present invention provides a video classification method based on face images, the method comprising the steps of:

sequentially extracting all key frames in a video, taking the key frame that includes all face images in the video as a sample frame, and taking the key frames other than the sample frame as frames to be classified;

extracting the face images in the sample frame and in the frames to be classified by a preset algorithm, and labeling the face images in the sample frame;

comparing the face images in the frames to be classified with the face images contained in the sample frame, to determine the labels corresponding to the face images in the frames to be classified;

dividing the frames to be classified that contain face images with the same label into one class, and regrouping the frames of the same class to obtain video segments containing the same face image;

wherein the sample frame is the first key frame in the video, and the first key frame includes all the face images in the video;

the preset algorithm is a face image extraction algorithm combining a circularly symmetric Gabor transform with the local binary pattern.

Preferably, the preset algorithm comprises the following steps:

extracting the face images in the key frame, the key frame being the sample frame or a frame to be classified;

preprocessing the face images;

performing a circularly symmetric Gabor transform on the preprocessed face images;

performing local binary pattern transform processing on the face images after the circularly symmetric Gabor transform, and extracting the face image histograms after the local binary pattern transform, each face image corresponding to one face image histogram.

Preferably, "comparing the face images in the frames to be classified with the face images contained in the sample frame" is specifically:

calculating, according to the face image histograms corresponding to the frame to be classified and the histograms of all face images corresponding to the sample frame, by means of the Euclidean distance formula, the distances between the face images in the frame to be classified and all the face images contained in the sample frame, wherein the face image in the frame to be classified is the same as the corresponding face image in the sample frame for which the distance is smallest.
In addition, to achieve the above object, the present invention further provides a video classification method based on face images, the method comprising the steps of:

sequentially extracting all key frames in a video, taking the key frame that includes all face images in the video as a sample frame, and taking the key frames other than the sample frame as frames to be classified;

extracting the face images in the sample frame and in the frames to be classified by a preset algorithm, and labeling the face images in the sample frame;

comparing the face images in the frames to be classified with the face images contained in the sample frame, to determine the labels corresponding to the face images in the frames to be classified;

dividing the frames to be classified that contain face images with the same label into one class, and regrouping the frames of the same class to obtain video segments containing the same face image.

Preferably, the sample frame is the first key frame in the video, and the first key frame includes all the face images in the video.

Preferably, the preset algorithm is a face image extraction algorithm combining a circularly symmetric Gabor transform with the local binary pattern.

Preferably, the preset algorithm comprises the following steps:

extracting the face images in the key frame, the key frame being the sample frame or a frame to be classified;

preprocessing the face images;

performing a circularly symmetric Gabor transform on the preprocessed face images;

performing local binary pattern transform processing on the face images after the circularly symmetric Gabor transform, and extracting the face image histograms after the local binary pattern transform, each face image corresponding to one face image histogram.

Preferably, "comparing the face images in the frames to be classified with the face images contained in the sample frame" is specifically:

calculating, according to the face image histograms corresponding to the frame to be classified and the histograms of all face images corresponding to the sample frame, by means of the Euclidean distance formula, the distances between the face images in the frame to be classified and all the face images contained in the sample frame, wherein the face image in the frame to be classified is the same as the corresponding face image in the sample frame for which the distance is smallest.
In addition, to achieve the above object, the present invention further provides a video classification device based on face images, the device comprising:

a first extraction module, configured to sequentially extract all key frames in a video, take the key frame that includes all face images in the video as a sample frame, and take the key frames other than the sample frame as frames to be classified;

a second extraction module, configured to extract the face images in the sample frame and in the frames to be classified by a preset algorithm, and label the face images in the sample frame;

a comparison module, configured to compare the face images in the frames to be classified with the face images contained in the sample frame, to determine the labels corresponding to the face images in the frames to be classified;

a classification and regrouping module, configured to divide the frames to be classified that contain face images with the same label into one class, and regroup the frames of the same class to obtain video segments containing the same face image.

Preferably, the sample frame is the first key frame in the video, and the first key frame includes all the face images in the video.

Preferably, the preset algorithm is a face image extraction algorithm combining a circularly symmetric Gabor transform with the local binary pattern.

Preferably, the second extraction module comprises:

a first extraction unit, configured to extract the face images in the key frame, the key frame being the sample frame or a frame to be classified;

a preprocessing unit, configured to preprocess the face images;

a transform unit, configured to perform a circularly symmetric Gabor transform on the preprocessed face images;

a second extraction unit, configured to perform local binary pattern transform processing on the face images after the circularly symmetric Gabor transform, and extract the face image histograms after the local binary pattern transform, each face image corresponding to one face image histogram.

Preferably, the comparison module is specifically configured to:

calculate, according to the face image histograms corresponding to the frame to be classified and the histograms of all face images corresponding to the sample frame, by means of the Euclidean distance formula, the distances between the face images in the frame to be classified and all the face images contained in the sample frame, wherein the face image in the frame to be classified is the same as the corresponding face image in the sample frame for which the distance is smallest.
Compared with the prior art, the present invention adopts the following steps: extracting all key frames in a video, taking the key frame that includes all face images in the video as a sample frame, and taking the key frames other than the sample frame as frames to be classified; extracting the face images in the sample frame and in the frames to be classified by a preset algorithm, and labeling the face images in the sample frame; comparing the face images in the frames to be classified in sequence with the face images in the sample frame; and dividing the frames to be classified that have the same label into one class and regrouping the frames of the same class to obtain video segments containing the same face image. All key frames in the video are thus classified and regrouped on the basis of face images, so that when determining whether a face image exists in a video, or where in the video it appears, there is no need to extract every face image contained in the video from beginning to end and compare the face images one by one; it suffices to look up the video segments containing that face image, which greatly reduces the amount of computation, shortens the computation time, and improves efficiency.
Brief Description of the Drawings

FIG. 1 is a schematic flowchart of a preferred embodiment of the video classification method based on face images according to the present invention;

FIG. 2 is a schematic flowchart of the preset algorithm in an embodiment of the present invention;

FIG. 3 is a schematic diagram of the functional modules of a preferred embodiment of the video classification device based on face images according to the present invention;

FIG. 4 is a schematic diagram of the functional modules of the second extraction module in an embodiment of the present invention;

FIG. 5 is a schematic diagram of the face images in the sample frame after preprocessing according to the present invention;

FIG. 6 is a schematic diagram of the amplitude images corresponding to a face image after the circularly symmetric Gabor transform according to the present invention;

FIG. 7 is a schematic diagram of the texture images of a face image after the circularly symmetric Gabor transform according to the present invention;

FIG. 8 is a schematic diagram of the face image histogram extracted from one of the face texture images in FIG. 7 after the local binary pattern transform;

FIG. 9 is a schematic diagram of the face image histogram obtained by superposing the five face images in FIG. 7;

FIG. 10 is a schematic diagram of the histogram matching result when a face image in a frame to be classified and the fifth face image in the sample frame are the same face image according to the present invention.

The realization of the objects, the functional features, and the advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed Description

It should be understood that the specific embodiments described here are only intended to explain the present invention and are not intended to limit it.
The present invention provides a video classification method based on face images.

Referring to FIG. 1, FIG. 1 is a schematic flowchart of a preferred embodiment of the video classification method based on face images according to the present invention.

In this embodiment, the video classification method based on face images includes:

Step S10: sequentially extracting all key frames in the video, taking the key frame that includes all face images in the video as a sample frame, and taking the key frames other than the sample frame as frames to be classified;

The terminal captures video through a camera. The video includes I frames, B frames, and P frames, where the I frames are the key frames that best reflect and represent the important information in the video; therefore, when analyzing the video frames, only the I frames of the video are extracted for analysis. The terminal sequentially extracts all key frames in the video, takes the key frame that includes all face images in the video as the sample frame, and takes the key frames other than the sample frame as the frames to be classified. In order to extract relatively clear face images from the video, a camera with a high resolution is selected.

Preferably, the sample frame is the first key frame in the video, and the first key frame includes all the face images in the video. For example, the terminal acquires each I frame in the video in turn and numbers the I frames sequentially; if the video includes five I frames in total, the five I frames are labeled 1, 2, 3, 4, and 5 in order. The sample frame is the first of all the I frames, and the first I frame includes all the face images in the video; the frames to be classified are the remaining I frames, i.e. the remaining four I frames.
Step S20: extracting the face images in the sample frame and in the frames to be classified by a preset algorithm, and labeling the face images in the sample frame;

The terminal extracts the face images in the sample frame and in the frames to be classified by a preset algorithm, and labels the face images in the sample frame. The preset algorithm includes, but is not limited to, a face image extraction algorithm combining a circularly symmetric Gabor transform with the local binary pattern (LBP, Local Binary Pattern); the preset algorithm may also be a face image extraction algorithm based on template matching, on singular value features, or on subspace analysis. In this embodiment, the preset algorithm is a face image extraction algorithm combining a circularly symmetric Gabor transform with the local binary pattern (LBP). For example, the terminal extracts 10 face images from the sample frame by the algorithm combining a circularly symmetric Gabor transform with the LBP, and labels the 10 face images, marking them in order as A, B, C, D, E, F, G, H, I, and J, where A denotes the first face image in the sample frame, B the second face image in the sample frame, C the third face image in the sample frame, and so on. The circularly symmetric Gabor transform (CSGT, Circularly Symmetric Gabor Transform) is a wavelet transform at five scales and multiple orientations, which transforms one image into images at five scales and multiple orientations. In this embodiment, in order to normalize the face images in the sample frame and in the frames to be classified, the terminal sets each face image in the sample frame and in the frames to be classified to an image of a specified size, for example a specified region of 54*54 (a face-detection and resizing sketch is given below). The terminal extracts the face images from the frames to be classified by the algorithm combining a circularly symmetric Gabor transform with the LBP.
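As a minimal sketch of obtaining normalized 54*54 face regions (the detector choice is an assumption; the patent does not name one), OpenCV's stock Haar cascade can be used:

```python
import cv2

# Haar cascade face detector shipped with OpenCV, used here as an assumed
# stand-in for whatever face detector the terminal actually employs.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def faces_in_frame(frame, size=(54, 54)):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    boxes = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    # Normalize every detected face to the 54x54 region mentioned above.
    return [cv2.resize(gray[y:y + h, x:x + w], size) for (x, y, w, h) in boxes]
```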
A conventional method of extracting the texture information of a face image is an algorithm combining the GT (Gabor Transform) with the LBP. In the GT-based filtering extraction of the texture information of a face image, the face image is first transformed by the GT to obtain filtered face images; the GT is a transform over 5 scales and 8 orientations, i.e. it generates 40 filtered images; the LBP transform is then applied to the 40 filtered images, and finally the face image is recognized. The computational complexity of that method is too high and its computation time too long, so reading and analyzing the video takes a long time and the efficiency is low. In this embodiment, the texture information of the face image is instead extracted by the algorithm combining the CSGT with the LBP: the face image is transformed by the CSGT to generate five filtered images, which are superposed and recombined into five filtered images; energy extraction is then performed on the recombined filtered images to extract the image that best describes the texture information of the face, and the LBP transform is applied to that texture information. Compared with the face recognition algorithm combining the GT with the LBP, only five filtered images need to be computed instead of forty, which reduces the amount of computation and shortens the computation time.
Step S30: comparing the face images in the frames to be classified with the face images contained in the sample frame, to determine the labels corresponding to the face images in the frames to be classified;

The terminal compares the face images in the frames to be classified with the face images contained in the sample frame, to determine the labels in the sample frame that correspond to the face images in the frames to be classified. For example, the terminal compares the face images in the frames to be classified (the 2nd, 3rd, 4th, and 5th I frames) with the 10 face images A, B, C, D, E, F, G, H, I, and J in the sample frame, and finds that the face images in the 2nd and 3rd I frames match face image A, i.e. the face images in the 2nd and 3rd I frames correspond to label A; or that the face images in the 2nd, 4th, and 5th I frames match face image D, i.e. the face images in the 2nd, 4th, and 5th I frames correspond to label D. It should be noted that a frame to be classified may contain more than one face image; when it contains two or more face images, the same frame to be classified may belong to different classes and may therefore appear in different video segments.
Step S40: dividing the frames to be classified that contain face images with the same label into one class, and regrouping the frames of the same class to obtain video segments containing the same face image.

The terminal divides the frames to be classified that contain face images with the same label into one class, and regroups the frames to be classified of the same class to obtain video segments containing the same face image. For example, the terminal divides the 2nd and 3rd I frames, which contain the face image labeled A, into the same class, i.e. it classifies the 2nd and 3rd I frames among the frames to be classified as I frames containing face image A, and regroups the 2nd and 3rd I frames to obtain a video segment containing face image A; the terminal divides the 2nd, 4th, and 5th I frames, which contain the face image labeled D, into the same class, i.e. it classifies the 2nd, 4th, and 5th I frames as I frames containing face image D, and regroups the 2nd, 4th, and 5th I frames to obtain a video segment containing face image D (a regrouping sketch follows).
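The regrouping into clips can be sketched as follows, writing each class of I frames out with OpenCV's VideoWriter; the codec and frame rate are illustrative choices rather than anything fixed by the patent:

```python
import cv2

def write_segment(frames, path, fps=25.0):
    # Regroup the I frames of one class (e.g. all frames labeled A)
    # into a short clip on disk.
    h, w = frames[0].shape[:2]
    writer = cv2.VideoWriter(path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    for f in frames:
        writer.write(f)
    writer.release()

# e.g. write_segment([iframe_2, iframe_3], "face_A.mp4") for label A,
# and write_segment([iframe_2, iframe_4, iframe_5], "face_D.mp4") for D.
```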
In this embodiment, all key frames in the video are extracted, the key frame that includes all face images in the video is taken as a sample frame, and the key frames other than the sample frame are taken as frames to be classified; the face images in the sample frame and in the frames to be classified are extracted by a preset algorithm, and the face images in the sample frame are labeled; the face images in the frames to be classified are compared in sequence with the face images in the sample frame, the frames to be classified that have the same label are divided into one class, and the frames of the same class are regrouped to obtain video segments containing the same face image. All key frames in the video are thus classified and regrouped on the basis of face images, so that when determining whether a face image exists in a video, or where in the video it appears, there is no need to extract every face image contained in the video from beginning to end and compare the face images one by one; it suffices to look up the video segments containing that face image, which greatly reduces the amount of computation, shortens the computation time, and improves efficiency.
Referring to FIG. 2, FIG. 2 is a schematic flowchart of the preset algorithm in an embodiment of the present invention.

In this embodiment, the preset algorithm includes:

Step S21: extracting the face images in a key frame, where the key frame is the sample frame or a frame to be classified;

Step S22: preprocessing the face images;

The terminal extracts the face images in the key frame, i.e. extracts the face images in the sample frame or in a frame to be classified, and preprocesses the face images extracted from the sample frame and from the frames to be classified, obtaining preprocessed face images for the sample frame and for the frames to be classified. The preprocessing includes grayscale transformation, equalization of the face image histogram, median filtering, and homomorphic filtering; the order in which these operations are performed may vary. For example, the terminal applies grayscale transformation, histogram equalization, median filtering, and homomorphic filtering to the 10 face images A, B, C, D, E, F, G, H, I, and J in the sample frame, obtaining the preprocessed face images A through J, and preprocesses the face images in the four I frames to be classified, obtaining preprocessed face images for the four I frames. Specifically, referring to FIG. 5, FIG. 5 is a schematic diagram of the face images in the sample frame after preprocessing according to the present invention.
Further, preprocessing the face images includes:

performing grayscale transformation on the face images;

The terminal performs grayscale transformation on the face images in the sample frame and in the frames to be classified, obtaining grayscale-transformed face images for the sample frame and the frames to be classified. Grayscale transformation, also called gray stretching or contrast stretching, is the most basic point operation: according to the gray value of each pixel in the original image and some mapping rule, each pixel is transformed to another gray value, and the image is enhanced by assigning a new gray value to each pixel of the original image.

performing histogram equalization on the grayscale-transformed face images.

The terminal performs histogram equalization on the grayscale-transformed face images, obtaining histogram-equalized face images for the sample frame and the frames to be classified, i.e. the corresponding preprocessed face images. The steps of equalizing the face image histogram are: (1) compute the histogram of the grayscale-transformed face image; (2) transform the computed histogram with the cumulative distribution function to obtain the new gray levels; (3) replace the old gray levels with the new gray levels; this last step is an approximation and should be made as reasonable as possible for the intended purpose, while merging gray values that are equal or similar.

Further, the terminal performs median filtering on the face images in the sample frame and in the frames to be classified; median filtering sorts the pixels of a local region by gray level and takes the median gray level of that neighborhood as the gray value of the current pixel. The steps of median filtering are: (1) move the filter template through the image, aligning the template center with a pixel position in the image; (2) read the gray values of the corresponding pixels under the template; (3) sort these gray values from small to large; (4) assign the middle value of this sorted sequence to the pixel at the template center. The terminal also performs homomorphic filtering on the face images in the sample frame and in the frames to be classified. Homomorphic filtering converts the multiplicative (non-additive) luminance model of an image into an additive form so that filtering enhancement can be performed. The steps of homomorphic filtering are: (1) take the logarithm of both sides of the luminance function and then the Fourier transform; (2) pass the result through the filter; (3) take the inverse Fourier transform of the filter output and then an exponential transform. Choosing a suitable filter compresses the dynamic range of the illumination component while boosting the reflectance component appropriately, which improves image contrast and highlights object contours.
Step S23: performing a circularly symmetric Gabor transform on the preprocessed face images;

The terminal performs a circularly symmetric Gabor transform on the preprocessed face images in the sample frame, obtaining the CSGT-transformed face images of the sample frame; the terminal likewise performs a circularly symmetric Gabor transform on the preprocessed face images in the frames to be classified, obtaining the CSGT-transformed face images of the frames to be classified. Specifically, referring to FIG. 6 and FIG. 7: FIG. 6 is a schematic diagram of the amplitude images corresponding to a face image after the circularly symmetric Gabor transform according to the present invention, taking face image A in the sample frame as an example, i.e. FIG. 6 shows the amplitude images corresponding to face image A after the circularly symmetric Gabor transform; FIG. 7 is a schematic diagram of the texture images of a face image after the circularly symmetric Gabor transform, likewise taking face image A in the sample frame as an example, i.e. FIG. 7 shows the texture images corresponding to face image A after the circularly symmetric Gabor transform.
Step S24: performing local binary pattern transform processing on the face images after the circularly symmetric Gabor transform, and extracting the face image histograms after the local binary pattern transform, each face image corresponding to one face image histogram.

The terminal performs the LBP transform on the CSGT-transformed face images in the sample frame, obtaining the LBP-transformed face images of the sample frame, and extracts a face image histogram from each LBP-transformed face image, each face image corresponding to one face image histogram; for example, the histograms corresponding to the 10 face images A, B, C, D, E, F, G, H, I, and J in the sample frame are extracted. The terminal likewise performs the LBP transform on the CSGT-transformed face images in the frames to be classified, obtains the LBP-transformed face images, and extracts their face image histograms, each face image corresponding to one face image histogram, for example the face image histograms in the four I frames. Specifically, referring to FIG. 8 and FIG. 9: FIG. 8 is a schematic diagram of the face image histogram extracted, after the local binary pattern (LBP) transform, from one of the five face texture images of face image A obtained by the circularly symmetric Gabor transform in FIG. 7; FIG. 9 is a schematic diagram of the face image histogram obtained by superposing the five face images in FIG. 7, i.e. FIG. 9 shows the five histograms of face image A obtained after the circularly symmetric Gabor transform and the LBP transform superposed together. If a frame to be classified contains several face images, several face image histograms as in FIG. 9 are generated.
Further, "comparing the face images in the frames to be classified with the face images contained in the sample frame" is specifically:

calculating, according to the face image histograms corresponding to the frame to be classified and the histograms of all face images corresponding to the sample frame, by means of the Euclidean distance formula, the distances between the face images in the frame to be classified and all the face images contained in the sample frame, wherein the face image in the frame to be classified is the same as the corresponding face image in the sample frame for which the distance is smallest.

The terminal calculates, according to the face image histograms corresponding to the frame to be classified and the histograms of all face images corresponding to the sample frame, by means of the Euclidean distance formula, the distances between the face images in the frame to be classified and all the face images contained in the sample frame, where the face image in the frame to be classified is the same as the corresponding face image in the sample frame for which the distance is smallest. The Euclidean distance formula is:
D_{i,j} = sqrt((x_i - x_j)^2 + (y_i - y_j)^2)
where (x_i, y_i) are the position coordinates of the i-th face image in the corresponding frame to be classified, (x_j, y_j) are the position coordinates of the j-th face image in the sample frame, and D_{i,j} is the distance between the i-th face image in the frame to be classified and the j-th face image in the sample frame. Specifically, referring to FIG. 10, FIG. 10 is a schematic diagram of the histogram matching result when a face image in a frame to be classified and a face image in the sample frame are the same face image; for example, FIG. 10 shows that the face image in the 2nd I frame is the same as the fifth face image in the sample frame (i.e. face image E), so the labels corresponding to the 2nd I frame include E.
The present invention further provides a video classification device based on face images.

Referring to FIG. 3, FIG. 3 is a schematic diagram of the functional modules of a preferred embodiment of the video classification device based on face images according to the present invention.

In this embodiment, the video classification device based on face images includes:

a first extraction module 10, configured to sequentially extract all key frames in the video, take the key frame that includes all face images in the video as a sample frame, and take the key frames other than the sample frame as frames to be classified;

The terminal captures video through a camera. The video includes I frames, B frames, and P frames, where the I frames are the key frames that best reflect and represent the important information in the video; therefore, when analyzing the video frames, only the I frames of the video are extracted for analysis. The terminal sequentially extracts all key frames in the video, takes the key frame that includes all face images in the video as the sample frame, and takes the key frames other than the sample frame as the frames to be classified. In order to extract relatively clear face images from the video, a camera with a high resolution is selected.

Preferably, the sample frame is the first key frame in the video, and the first key frame includes all the face images in the video. For example, the terminal acquires each I frame in the video in turn and numbers the I frames sequentially; if the video includes five I frames in total, the five I frames are labeled 1, 2, 3, 4, and 5 in order. The sample frame is the first of all the I frames, and the first I frame includes all the face images in the video; the frames to be classified are the remaining I frames, i.e. the remaining four I frames.
a second extraction module 20, configured to extract the face images in the sample frame and in the frames to be classified by a preset algorithm, and label the face images in the sample frame;

The terminal extracts the face images in the sample frame and in the frames to be classified by a preset algorithm, and labels the face images in the sample frame. The preset algorithm includes, but is not limited to, a face image extraction algorithm combining a circularly symmetric Gabor transform with the local binary pattern (LBP, Local Binary Pattern); the preset algorithm may also be a face image extraction algorithm based on template matching, on singular value features, or on subspace analysis. In this embodiment, the preset algorithm is a face image extraction algorithm combining a circularly symmetric Gabor transform with the local binary pattern (LBP). For example, the terminal extracts 10 face images from the sample frame by the algorithm combining a circularly symmetric Gabor transform with the LBP, and labels the 10 face images, marking them in order as A, B, C, D, E, F, G, H, I, and J, where A denotes the first face image in the sample frame, B the second face image in the sample frame, C the third face image in the sample frame, and so on. The circularly symmetric Gabor transform (CSGT, Circularly Symmetric Gabor Transform) is a wavelet transform at five scales and multiple orientations, which transforms one image into images at five scales and multiple orientations. In this embodiment, in order to normalize the face images in the sample frame and in the frames to be classified, the terminal sets each face image in the sample frame and in the frames to be classified to an image of a specified size, for example a specified region of 54*54. The terminal extracts the face images from the frames to be classified by the algorithm combining a circularly symmetric Gabor transform with the LBP.

A conventional method of extracting the texture information of a face image is an algorithm combining the GT (Gabor Transform) with the LBP. In the GT-based filtering extraction of the texture information of a face image, the face image is first transformed by the GT to obtain filtered face images; the GT is a transform over 5 scales and 8 orientations, i.e. it generates 40 filtered images; the LBP transform is then applied to the 40 filtered images, and finally the face image is recognized. The computational complexity of that method is too high and its computation time too long, so reading and analyzing the video takes a long time and the efficiency is low. In this embodiment, the texture information of the face image is instead extracted by the algorithm combining the CSGT with the LBP: the face image is transformed by the CSGT to generate five filtered images, which are superposed and recombined into five filtered images; energy extraction is then performed on the recombined filtered images to extract the image that best describes the texture information of the face, and the LBP transform is applied to that texture information. Compared with the face recognition algorithm combining the GT with the LBP, only five filtered images need to be computed instead of forty, which reduces the amount of computation and shortens the computation time.
a comparison module 30, configured to compare the face images in the frames to be classified with the face images contained in the sample frame, to determine the labels corresponding to the face images in the frames to be classified;

The terminal compares the face images in the frames to be classified with the face images contained in the sample frame, to determine the labels in the sample frame that correspond to the face images in the frames to be classified. For example, the terminal compares the face images in the frames to be classified (the 2nd, 3rd, 4th, and 5th I frames) with the 10 face images A, B, C, D, E, F, G, H, I, and J in the sample frame, and finds that the face images in the 2nd and 3rd I frames match face image A, i.e. the face images in the 2nd and 3rd I frames correspond to label A; or that the face images in the 2nd, 4th, and 5th I frames match face image D, i.e. the face images in the 2nd, 4th, and 5th I frames correspond to label D. It should be noted that a frame to be classified may contain more than one face image; when it contains two or more face images, the same frame to be classified may belong to different classes and may therefore appear in different video segments.
a classification and regrouping module 40, configured to divide the frames to be classified that contain face images with the same label into one class, and regroup the frames of the same class to obtain video segments containing the same face image.

The terminal divides the frames to be classified that contain face images with the same label into one class, and regroups the frames to be classified of the same class to obtain video segments containing the same face image. For example, the terminal divides the 2nd and 3rd I frames, which contain the face image labeled A, into the same class, i.e. it classifies the 2nd and 3rd I frames among the frames to be classified as I frames containing face image A, and regroups the 2nd and 3rd I frames to obtain a video segment containing face image A; the terminal divides the 2nd, 4th, and 5th I frames, which contain the face image labeled D, into the same class, i.e. it classifies the 2nd, 4th, and 5th I frames as I frames containing face image D, and regroups the 2nd, 4th, and 5th I frames to obtain a video segment containing face image D.

In this embodiment, all key frames in the video are extracted, the key frame that includes all face images in the video is taken as a sample frame, and the key frames other than the sample frame are taken as frames to be classified; the face images in the sample frame and in the frames to be classified are extracted by a preset algorithm, and the face images in the sample frame are labeled; the face images in the frames to be classified are compared in sequence with the face images in the sample frame, the frames to be classified that have the same label are divided into one class, and the frames of the same class are regrouped to obtain video segments containing the same face image. All key frames in the video are thus classified and regrouped on the basis of face images, so that when determining whether a face image exists in a video, or where in the video it appears, there is no need to extract every face image contained in the video from beginning to end and compare the face images one by one; it suffices to look up the video segments containing that face image, which greatly reduces the amount of computation, shortens the computation time, and improves efficiency.
Referring to FIG. 4, FIG. 4 is a schematic diagram of the functional modules of the second extraction module in an embodiment of the present invention.

In this embodiment, the second extraction module 20 includes:

a first extraction unit 21, configured to extract the face images in a key frame, where the key frame is the sample frame or a frame to be classified;

a preprocessing unit 22, configured to preprocess the face images;

The terminal extracts the face images in the key frame, i.e. extracts the face images in the sample frame or in a frame to be classified, and preprocesses the face images extracted from the sample frame and from the frames to be classified, obtaining preprocessed face images for the sample frame and for the frames to be classified. The preprocessing includes grayscale transformation, equalization of the face image histogram, median filtering, and homomorphic filtering; the order in which these operations are performed may vary. For example, the terminal applies grayscale transformation, histogram equalization, median filtering, and homomorphic filtering to the 10 face images A, B, C, D, E, F, G, H, I, and J in the sample frame, obtaining the preprocessed face images A through J, and preprocesses the face images in the four I frames to be classified, obtaining preprocessed face images for the four I frames. Specifically, referring to FIG. 5, FIG. 5 is a schematic diagram of the face images in the sample frame after preprocessing according to the present invention.
Further, preprocessing the face images includes:

performing grayscale transformation on the face images;

The terminal performs grayscale transformation on the face images in the sample frame and in the frames to be classified, obtaining grayscale-transformed face images for the sample frame and the frames to be classified. Grayscale transformation, also called gray stretching or contrast stretching, is the most basic point operation: according to the gray value of each pixel in the original image and some mapping rule, each pixel is transformed to another gray value, and the image is enhanced by assigning a new gray value to each pixel of the original image.

performing histogram equalization on the grayscale-transformed face images.

The terminal performs histogram equalization on the grayscale-transformed face images, obtaining histogram-equalized face images for the sample frame and the frames to be classified, i.e. the corresponding preprocessed face images. The steps of equalizing the face image histogram are: (1) compute the histogram of the grayscale-transformed face image; (2) transform the computed histogram with the cumulative distribution function to obtain the new gray levels; (3) replace the old gray levels with the new gray levels; this last step is an approximation and should be made as reasonable as possible for the intended purpose, while merging gray values that are equal or similar.

Further, the terminal performs median filtering on the face images in the sample frame and in the frames to be classified; median filtering sorts the pixels of a local region by gray level and takes the median gray level of that neighborhood as the gray value of the current pixel. The steps of median filtering are: (1) move the filter template through the image, aligning the template center with a pixel position in the image; (2) read the gray values of the corresponding pixels under the template; (3) sort these gray values from small to large; (4) assign the middle value of this sorted sequence to the pixel at the template center. The terminal also performs homomorphic filtering on the face images in the sample frame and in the frames to be classified. Homomorphic filtering converts the multiplicative (non-additive) luminance model of an image into an additive form so that filtering enhancement can be performed. The steps of homomorphic filtering are: (1) take the logarithm of both sides of the luminance function and then the Fourier transform; (2) pass the result through the filter; (3) take the inverse Fourier transform of the filter output and then an exponential transform. Choosing a suitable filter compresses the dynamic range of the illumination component while boosting the reflectance component appropriately, which improves image contrast and highlights object contours.
a transform unit 23, configured to perform a circularly symmetric Gabor transform on the preprocessed face images;

The terminal performs a circularly symmetric Gabor transform on the preprocessed face images in the sample frame, obtaining the CSGT-transformed face images of the sample frame; the terminal likewise performs a circularly symmetric Gabor transform on the preprocessed face images in the frames to be classified, obtaining the CSGT-transformed face images of the frames to be classified. Specifically, referring to FIG. 6 and FIG. 7: FIG. 6 is a schematic diagram of the amplitude images corresponding to a face image after the circularly symmetric Gabor transform according to the present invention, taking face image A in the sample frame as an example, i.e. FIG. 6 shows the amplitude images corresponding to face image A after the circularly symmetric Gabor transform; FIG. 7 is a schematic diagram of the texture images of a face image after the circularly symmetric Gabor transform, likewise taking face image A in the sample frame as an example, i.e. FIG. 7 shows the texture images corresponding to face image A after the circularly symmetric Gabor transform.
a second extraction unit 24, configured to perform local binary pattern transform processing on the face images after the circularly symmetric Gabor transform, and extract the face image histograms after the local binary pattern transform, each face image corresponding to one face image histogram.

The terminal performs the LBP transform on the CSGT-transformed face images in the sample frame, obtaining the LBP-transformed face images of the sample frame, and extracts a face image histogram from each LBP-transformed face image, each face image corresponding to one face image histogram; for example, the histograms corresponding to the 10 face images A, B, C, D, E, F, G, H, I, and J in the sample frame are extracted. The terminal likewise performs the LBP transform on the CSGT-transformed face images in the frames to be classified, obtains the LBP-transformed face images, and extracts their face image histograms, each face image corresponding to one face image histogram, for example the face image histograms in the four I frames. Specifically, referring to FIG. 8 and FIG. 9: FIG. 8 is a schematic diagram of the face image histogram extracted, after the local binary pattern (LBP) transform, from one of the five face texture images of face image A obtained by the circularly symmetric Gabor transform in FIG. 7; FIG. 9 is a schematic diagram of the face image histogram obtained by superposing the five face images in FIG. 7, i.e. FIG. 9 shows the five histograms of face image A obtained after the circularly symmetric Gabor transform and the LBP transform superposed together. If a frame to be classified contains several face images, several face image histograms as in FIG. 9 are generated.
Further, the comparison module 30 is specifically configured to: calculate, according to the face image histograms corresponding to the frame to be classified and the histograms of all face images corresponding to the sample frame, by means of the Euclidean distance formula, the distances between the face images in the frame to be classified and all the face images contained in the sample frame, where the face image in the frame to be classified is the same as the corresponding face image in the sample frame for which the distance is smallest.

The terminal calculates, according to the face image histograms corresponding to the frame to be classified and the histograms of all face images in the sample frame, by means of the Euclidean distance formula, the distances between the face images in the frame to be classified and all the face images contained in the sample frame, where the face image in the frame to be classified is the same as the corresponding face image in the sample frame for which the distance is smallest. The Euclidean distance formula is:
$$D_{i,j} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$$
where $(x_i, y_i)$ are the coordinates of the i-th face image in the corresponding frame to be classified, $(x_j, y_j)$ are the coordinates of the j-th face image in the sample frame, and $D_{i,j}$ is the distance between the i-th face image in the frame to be classified and the j-th sample face image in the sample frame. Specifically, referring to FIG. 10, FIG. 10 is a schematic diagram of the histogram matching result in which a face image in a frame to be classified and a face image in the sample frame are the same face image; for example, FIG. 10 shows that the face image in the 2nd I-frame is the same as the 5th face image in the sample frame (i.e. face image E), so the labels corresponding to the 2nd I-frame include E.
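A sketch of the matching step follows. Since the comparison module operates on face image histograms, the code below applies the Euclidean distance to 256-bin histograms rather than to raw coordinates; reading the formula this way is an interpretation of the embodiment, and the label names are hypothetical.

```python
import numpy as np

def match_face(query_hist, sample_hists):
    """Return the sample-frame label whose histogram lies at the smallest
    Euclidean distance from the query histogram, together with that
    distance. Labels "A".."J" would be the 10 sample faces above."""
    best_label, best_dist = None, float("inf")
    query = np.asarray(query_hist, dtype=np.float64)
    for label, hist in sample_hists.items():
        dist = float(np.sqrt(np.sum((query - np.asarray(hist, dtype=np.float64)) ** 2)))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist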
From the description of the above embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product stored in a storage medium (such as a ROM/RAM, a magnetic disk or an optical disc) and including several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present invention.
The above are merely preferred embodiments of the present invention and do not thereby limit its patent scope; any equivalent structural or process transformation made using the contents of the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (13)

  1. A video classification method based on face images, characterized in that the video classification method based on face images comprises the following steps:
    extracting all key frames in a video in sequence, taking the key frame that includes all face images in the video as a sample frame, and taking the key frames other than the sample frame as frames to be classified;
    extracting the face images in the sample frame and in the frames to be classified by means of a preset algorithm, and labeling the face images in the sample frame;
    comparing the face image in a frame to be classified with the face images contained in the sample frame, so as to determine the label corresponding to the face image in the frame to be classified;
    dividing the frames to be classified whose face images have the same label into one class, and recombining the frames to be classified of the same class to obtain a video segment having the same face image;
    wherein the sample frame is the first key frame in the video, and the first key frame includes all the face images in the video;
    the preset algorithm is a face image extraction algorithm combining the circularly symmetric Gabor transform with the local binary pattern.
  2. The video classification method based on face images according to claim 1, characterized in that the preset algorithm comprises the following steps:
    extracting the face image in a key frame, the key frame being the sample frame or a frame to be classified;
    preprocessing the face image;
    performing the circularly symmetric Gabor transform on the preprocessed face image;
    performing local binary pattern transformation on the face image after the circularly symmetric Gabor transform, and extracting the histogram of the face image after the local binary pattern transformation, each face image corresponding to one face image histogram.
  3. The video classification method based on face images according to claim 2, characterized in that "comparing the face image in a frame to be classified with the face images contained in the sample frame" is specifically:
    calculating, from the face image histogram corresponding to the frame to be classified and all the face image histograms corresponding to the sample frame, the distances between the face image in the frame to be classified and all the face images contained in the sample frame by means of the Euclidean distance formula, wherein when the distance is smallest the face image in the frame to be classified is the same as the corresponding face image in the sample frame.
  4. A video classification method based on face images, characterized in that the video classification method based on face images comprises the following steps:
    extracting all key frames in a video in sequence, taking the key frame that includes all face images in the video as a sample frame, and taking the key frames other than the sample frame as frames to be classified;
    extracting the face images in the sample frame and in the frames to be classified by means of a preset algorithm, and labeling the face images in the sample frame;
    comparing the face image in a frame to be classified with the face images contained in the sample frame, so as to determine the label corresponding to the face image in the frame to be classified;
    dividing the frames to be classified whose face images have the same label into one class, and recombining the frames to be classified of the same class to obtain a video segment having the same face image.
  5. The video classification method based on face images according to claim 4, characterized in that the sample frame is the first key frame in the video, and the first key frame includes all the face images in the video.
  6. The video classification method based on face images according to claim 4, characterized in that the preset algorithm is a face image extraction algorithm combining the circularly symmetric Gabor transform with the local binary pattern.
  7. The video classification method based on face images according to claim 6, characterized in that the preset algorithm comprises the following steps:
    extracting the face image in a key frame, the key frame being the sample frame or a frame to be classified;
    preprocessing the face image;
    performing the circularly symmetric Gabor transform on the preprocessed face image;
    performing local binary pattern transformation on the face image after the circularly symmetric Gabor transform, and extracting the histogram of the face image after the local binary pattern transformation, each face image corresponding to one face image histogram.
  8. The video classification method based on face images according to claim 7, characterized in that "comparing the face image in a frame to be classified with the face images contained in the sample frame" is specifically:
    calculating, from the face image histogram corresponding to the frame to be classified and all the face image histograms corresponding to the sample frame, the distances between the face image in the frame to be classified and all the face images contained in the sample frame by means of the Euclidean distance formula, wherein when the distance is smallest the face image in the frame to be classified is the same as the corresponding face image in the sample frame.
  9. A video classification device based on face images, characterized in that the video classification device based on face images comprises:
    a first extraction module, configured to extract all key frames in a video in sequence, take the key frame that includes all face images in the video as a sample frame, and take the key frames other than the sample frame as frames to be classified;
    a second extraction module, configured to extract the face images in the sample frame and in the frames to be classified by means of a preset algorithm, and label the face images in the sample frame;
    a comparison module, configured to compare the face image in a frame to be classified with the face images contained in the sample frame, so as to determine the label corresponding to the face image in the frame to be classified;
    a classification and recombination module, configured to divide the frames to be classified whose face images have the same label into one class, and recombine the frames to be classified of the same class to obtain a video segment having the same face image.
  10. The video classification device based on face images according to claim 9, characterized in that the sample frame is the first key frame in the video, and the first key frame includes all the face images in the video.
  11. The video classification device based on face images according to claim 9, characterized in that the preset algorithm is a face image extraction algorithm combining the circularly symmetric Gabor transform with the local binary pattern.
  12. The video classification device based on face images according to claim 11, characterized in that the second extraction module includes:
    a first extraction unit, configured to extract the face image in a key frame, the key frame being the sample frame or a frame to be classified;
    a preprocessing unit, configured to preprocess the face image;
    a transformation unit, configured to perform the circularly symmetric Gabor transform on the preprocessed face image;
    a second extraction unit, configured to perform local binary pattern transformation on the face image after the circularly symmetric Gabor transform, and extract the histogram of the face image after the local binary pattern transformation, each face image corresponding to one face image histogram.
  13. The video classification device based on face images according to claim 12, characterized in that the comparison module is specifically configured to:
    calculate, from the face image histogram corresponding to the frame to be classified and all the face image histograms corresponding to the sample frame, the distances between the face image in the frame to be classified and all the face images contained in the sample frame by means of the Euclidean distance formula, wherein when the distance is smallest the face image in the frame to be classified is the same as the corresponding face image in the sample frame.