CN105354543A - Video processing method and apparatus - Google Patents
- Publication number
- CN105354543A CN105354543A CN201510719389.XA CN201510719389A CN105354543A CN 105354543 A CN105354543 A CN 105354543A CN 201510719389 A CN201510719389 A CN 201510719389A CN 105354543 A CN105354543 A CN 105354543A
- Authority
- CN
- China
- Prior art keywords
- face
- frame
- facial image
- detection
- identified
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Oral & Maxillofacial Surgery (AREA)
- General Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
Abstract
The present disclosure relates to a video processing method and apparatus. The method comprises: locating face images in detection frames by performing face detection on the detection frames of a to-be-processed target video; performing face tracking on the tracking frames of the target video based on the face information obtained through detection, to determine the face information contained in the tracking frames; extracting from the target video the detection frames and tracking frames containing the face information; performing face recognition on the extracted to-be-recognized frame images based on a pre-acquired recognition model; and screening the recognition results to acquire a final face recognition result. The final result is then displayed to the user to indicate which actors appear in the video. The method effectively improves face recognition efficiency.
Description
Technical field
The present disclosure relates to the technical field of image processing, and in particular to a video processing method and apparatus.
Background
With the growing popularity of intelligent terminals and the rapid development of the entertainment industry, users can enjoy a rich variety of video programs on their intelligent terminals anytime and anywhere.
Users often watch a video program because of a personal liking for a particular star or actor, and from the user's perspective, users generally wish to find as many video programs as possible that feature the stars they like.
For current video programs, if cast information is provided in the video introduction, the user can learn whether the video includes an actor he or she likes; if no cast information is provided, the user has no way of knowing. Moreover, for many viewers it is difficult to match an actor's name in the cast list to the actor's face.
Summary of the invention
The present disclosure provides a video processing method and apparatus that extract face images from the frames that make up a video and automatically recognize the faces in the video based on a recognition model, so as to inform the user of the actors appearing in the video. The method effectively improves face recognition efficiency.
To overcome the problems in the related art, the present disclosure provides a video processing method and apparatus. The technical solutions are as follows:
According to a first aspect of the embodiments of the present disclosure, a video processing method is provided, including:
obtaining a to-be-processed target video, where the target video includes detection frames and tracking frames;
performing face detection on the detection frames to obtain detection data of the face images contained in the detection frames, where the detection data includes face identifiers for distinguishing different face images;
performing face tracking on the tracking frames according to the face identifiers, to determine whether the tracking frames contain the face images corresponding to the face identifiers;
extracting, from the detection frames and the tracking frames, the frames that contain the face identifiers, to obtain to-be-recognized frame images;
performing face recognition on the to-be-recognized frame images based on a pre-acquired recognition model, to obtain a face recognition result for the face image in each frame; and
screening all the face recognition results of the to-be-recognized frame images, to obtain a final recognition result for the face images appearing in the target video.
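The final screening step — aggregating the per-frame recognition results into one result per face — is not specified further in the text. A minimal sketch, assuming a simple majority vote over hypothetical `(face_id, actor)` results (the function name and data shape are illustrative, not from the source):

```python
from collections import Counter

def screen_results(frame_results):
    """Aggregate per-frame recognition results into a final result per face ID.

    frame_results: list of (face_id, actor_or_None) pairs, one per recognized
    frame. Each face ID's final label is the actor that wins the majority
    vote across its frames; frames with no database match (None) are ignored.
    """
    votes = {}
    for face_id, actor in frame_results:
        if actor is None:
            continue  # frame did not match any reference face
        votes.setdefault(face_id, Counter())[actor] += 1
    return {fid: c.most_common(1)[0][0] for fid, c in votes.items()}
```

A face seen as "A" in most frames but misrecognized as "B" in a few is thus still reported as "A".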
Further, the detection frames are the frames corresponding to the division points obtained by dividing the target video at a predetermined interval, and the tracking frames are the video frames in the target video other than the detection frames.
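The division described above can be sketched as follows, assuming equal intervals and zero-based frame indices (the function name is invented for illustration):

```python
def split_frames(num_frames, interval):
    """Split frame indices into detection frames (every `interval`-th frame,
    starting at index 0) and tracking frames (all remaining frames)."""
    detection = list(range(0, num_frames, interval))
    tracking = [i for i in range(num_frames) if i % interval != 0]
    return detection, tracking
```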
Further, performing face detection on the detection frames to obtain detection data of the face images contained in the detection frames includes:
performing, in chronological order, face detection on a current detection frame in the target video, to obtain detection data of the face images contained in the current detection frame, where the detection data includes the face identifiers corresponding to the current detection frame.
The method further includes: comparing the face identifiers corresponding to the current detection frame with the face identifiers already obtained, and storing the newly added face identifiers to obtain to-be-tracked face identifiers.
Correspondingly, performing face tracking on the tracking frames according to the face identifiers to determine whether the tracking frames contain the face images corresponding to the face identifiers includes:
performing face tracking, according to the stored to-be-tracked face identifiers, on the tracking frames between the current detection frame and the next detection frame, to determine whether those tracking frames contain the face images corresponding to the to-be-tracked face identifiers; and
updating the next detection frame to be the current detection frame, and returning to the step of performing face detection on the current detection frame in the target video.
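The sequential detect-then-track loop just described can be sketched as follows. `detect` and `track` are stand-in callables (the patent does not fix their implementation), and the interface is an assumption for illustration:

```python
def process_sequentially(frames, interval, detect, track):
    """Walk the video in chronological order: run `detect` on each detection
    frame, store any newly seen face IDs, then run `track` with the stored
    IDs on the tracking frames up to the next detection frame.

    detect(frame) -> set of face IDs found in the frame.
    track(frame, ids) -> subset of `ids` still present in the frame.
    Returns {frame_index: set_of_face_ids} for frames containing faces.
    """
    tracked = set()   # to-be-tracked face identifiers stored so far
    located = {}
    for i, frame in enumerate(frames):
        if i % interval == 0:          # detection frame
            ids = detect(frame)
            tracked |= ids             # store newly appearing face IDs
        else:                          # tracking frame
            ids = track(frame, tracked)
        if ids:
            located[i] = ids
    return located
```

With stub callables over synthetic "frames" (sets of ground-truth IDs), the loop yields the located frames directly.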
Further, performing face detection on the detection frames to obtain detection data of the face images contained in the detection frames includes:
performing face detection on all the detection frames in the target video, to obtain detection data of the face images contained in all the detection frames, where the detection data includes all the face identifiers corresponding to all the detection frames.
Correspondingly, performing face tracking on the tracking frames according to the face identifiers to determine whether the tracking frames contain the face images corresponding to the face identifiers includes:
performing face tracking on the tracking frames according to all the face identifiers, to determine whether the tracking frames contain the face images corresponding to all the face identifiers.
Further, the method also includes:
training a deep convolutional neural network with a predetermined number of face-image training samples, to obtain the pre-acquired recognition model.
Further, training the deep convolutional neural network with a predetermined number of face-image training samples to obtain the pre-acquired recognition model includes:
normalizing the training samples to obtain sample data of a standard size;
computing a ZCA matrix and a mean matrix from the standard-size sample data;
preprocessing the training samples based on the ZCA matrix and the mean matrix to obtain preprocessed input data, where the preprocessing includes ZCA whitening; and
inputting the input data into the deep convolutional neural network for training, to obtain the trained, pre-acquired recognition model.
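ZCA whitening as named above is a standard preprocessing step. A minimal NumPy sketch of computing the mean matrix and ZCA matrix from size-normalized samples and then applying them (the `eps` regularizer and function names are assumptions, not from the source):

```python
import numpy as np

def fit_zca(samples, eps=1e-5):
    """Compute the mean vector and ZCA whitening matrix from flattened,
    size-normalized samples (one sample per row)."""
    mean = samples.mean(axis=0)
    centered = samples - mean
    cov = centered.T @ centered / samples.shape[0]
    U, S, _ = np.linalg.svd(cov)
    # ZCA = U diag(1/sqrt(S+eps)) U^T: whitens while staying close to the input basis
    zca = U @ np.diag(1.0 / np.sqrt(S + eps)) @ U.T
    return mean, zca

def zca_whiten(samples, mean, zca):
    """Apply the stored mean matrix and ZCA matrix to samples."""
    return (samples - mean) @ zca
```

After whitening, the training set has (approximately) identity covariance, which typically speeds up and stabilizes network training.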
Further, performing face recognition on the to-be-recognized frame images based on the pre-acquired recognition model to obtain a face recognition result for the face image in each frame includes:
preprocessing the to-be-recognized frame images to obtain normalized to-be-recognized face images;
inputting the normalized to-be-recognized face images into the pre-acquired recognition model for feature extraction, to obtain a high-dimensional feature vector of the to-be-recognized face corresponding to each to-be-recognized frame image;
performing dimensionality reduction on the high-dimensional feature vectors of the to-be-recognized faces using a prestored linear discriminant analysis (LDA) projection matrix of reference faces, to obtain dimensionality-reduced feature vectors of the to-be-recognized faces;
computing a cosine-distance measure on the dimensionality-reduced feature vectors of the to-be-recognized faces, and comparing the measured result with a predetermined threshold;
if the measured result is greater than the predetermined threshold, determining that the to-be-recognized frame image matches a reference face feature prestored in a face database; and
if the measured result is less than or equal to the predetermined threshold, determining that the to-be-recognized frame image does not match any reference face feature prestored in the face database.
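The threshold comparison just described can be sketched as follows, assuming the measure is cosine similarity (higher means more similar) between a dimensionality-reduced query vector and the stored reference vectors; the function name and interface are illustrative only:

```python
import numpy as np

def match_face(query_vec, ref_vecs, threshold):
    """Compare a dimensionality-reduced query feature against stored
    reference features by cosine similarity. Return the index of the best
    reference if its similarity exceeds `threshold`, else None (no match)."""
    q = query_vec / np.linalg.norm(query_vec)
    refs = ref_vecs / np.linalg.norm(ref_vecs, axis=1, keepdims=True)
    sims = refs @ q                      # cosine similarity per reference
    best = int(np.argmax(sims))
    return best if sims[best] > threshold else None
```

The threshold trades precision against recall: a higher value rejects more uncertain matches as "not in the database".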
Further, the face database prestores face feature data of the reference faces, the face feature data including the LDA projection matrix of the reference faces; and the method further includes:
preprocessing the reference face images of the reference faces to obtain normalized reference face images;
inputting the normalized reference face images into the pre-acquired recognition model for feature extraction, to obtain a high-dimensional feature vector corresponding to each reference face image;
performing LDA training on the high-dimensional feature vectors of the reference face images to obtain dimensionality-reduced feature vectors of the reference face images; and
generating the LDA projection matrix of the reference faces according to the dimensionality-reduced feature vectors of the reference face images.
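Building an LDA projection matrix from labeled reference-face features, as described above, follows standard Fisher LDA. A compact NumPy sketch (the source does not specify the solver, so a pseudo-inverse generalized eigenproblem is assumed here):

```python
import numpy as np

def lda_projection(features, labels, n_components):
    """Fisher LDA: compute a projection matrix from reference-face feature
    vectors (rows) and their identity labels, maximizing between-class
    scatter relative to within-class scatter."""
    classes = np.unique(labels)
    overall_mean = features.mean(axis=0)
    d = features.shape[1]
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in classes:
        Xc = features[labels == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - overall_mean)[:, None]
        Sb += Xc.shape[0] * (diff @ diff.T)
    # Solve Sb w = lambda Sw w; keep the top eigenvectors as columns.
    vals, vecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(vals.real)[::-1]
    return vecs.real[:, order[:n_components]]
```

Projecting both reference and query features through this matrix compresses them to a space where identities are maximally separated before the cosine comparison.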
Further, performing face detection on the detection frames includes:
performing face detection on the detection frames using the AdaBoost iterative algorithm.
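The patent names the AdaBoost iterative algorithm for face detection; practical detectors in this family (e.g. Viola-Jones) cascade boosted classifiers over Haar-like features. As a toy sketch of the boosting iteration itself, over simple threshold stumps and invented scalar features (not a full face detector):

```python
import numpy as np

def adaboost_stumps(X, y, rounds):
    """Toy AdaBoost over threshold stumps, illustrating the iterative
    sample-reweighting at the core of boosted face detectors.
    X: (n, d) feature responses; y: labels in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)                  # uniform initial sample weights
    learners = []
    for _ in range(rounds):
        best = None
        for f in range(X.shape[1]):          # exhaustive stump search
            for t in np.unique(X[:, f]):
                for p in (1, -1):
                    pred = np.where(p * X[:, f] > p * t, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, f, t, p, pred)
        err, f, t, p, pred = best
        err = max(err, 1e-10)                # avoid log(0) on perfect stumps
        alpha = 0.5 * np.log((1 - err) / err)
        learners.append((f, t, p, alpha))
        w *= np.exp(-alpha * y * pred)       # up-weight misclassified samples
        w /= w.sum()
    return learners

def adaboost_predict(learners, X):
    score = np.zeros(len(X))
    for f, t, p, alpha in learners:
        score += alpha * np.where(p * X[:, f] > p * t, 1, -1)
    return np.sign(score)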
Further, the predetermined interval is a preset equal interval or a preset unequal interval.
According to a second aspect of the embodiments of the present disclosure, a video processing apparatus is provided, including:
a first acquisition module, configured to obtain a to-be-processed target video, where the target video includes detection frames and tracking frames;
a detection module, configured to perform face detection on the detection frames obtained by the first acquisition module;
a second acquisition module, configured to obtain detection data of the face images contained in the detection frames detected by the detection module, where the detection data includes face identifiers for distinguishing different face images;
a tracking module, configured to perform face tracking on the tracking frames according to the face identifiers;
a determination module, configured to determine whether the tracking frames tracked by the tracking module contain the face images corresponding to the face identifiers;
an extraction module, configured to extract, from the detection frames and the tracking frames, the frames that contain the face identifiers, to obtain to-be-recognized frame images;
a recognition module, configured to perform face recognition on the to-be-recognized frame images based on a pre-acquired recognition model, to obtain a face recognition result for the face image in each frame; and
a screening module, configured to screen all the face recognition results of the to-be-recognized frame images, to obtain a final recognition result for the face images appearing in the target video.
Further, the detection frames are the frames corresponding to the division points obtained by dividing the target video at a predetermined interval, and the tracking frames are the video frames in the target video other than the detection frames.
Further, the detection module includes a first detection submodule, configured to perform, in chronological order, face detection on a current detection frame in the target video;
the second acquisition module includes a first acquisition submodule, configured to obtain detection data of the face images contained in the current detection frame, where the detection data includes the face identifiers corresponding to the current detection frame;
the apparatus further includes:
a storage module, configured to compare the face identifiers corresponding to the current detection frame with the face identifiers already obtained, and to store the newly added face identifiers to obtain to-be-tracked face identifiers;
correspondingly, the tracking module includes a first tracking submodule, configured to perform face tracking, according to the to-be-tracked face identifiers stored by the storage module, on the tracking frames between the current detection frame and the next detection frame, to determine whether those tracking frames contain the face images corresponding to the to-be-tracked face identifiers; and
an update module, configured to update the next detection frame to be the current detection frame and to return to the first detection submodule.
Further, the detection module includes a second detection submodule, configured to perform face detection on all the detection frames in the target video;
the second acquisition module includes a second acquisition submodule, configured to obtain detection data of the face images contained in all the detection frames, where the detection data includes all the face identifiers corresponding to all the detection frames; and
correspondingly, the tracking module includes a second tracking submodule, configured to perform face tracking on the tracking frames according to all the face identifiers, to determine whether the tracking frames contain the face images corresponding to all the face identifiers.
Further, the apparatus also includes:
a training module, configured to train the deep convolutional neural network used by the recognition module with a predetermined number of face-image training samples, to obtain the pre-acquired recognition model.
Further, the training module includes:
a first normalization submodule, configured to normalize the training samples to obtain sample data of a standard size;
a calculation submodule, configured to compute a ZCA matrix and a mean matrix from the standard-size sample data;
a preprocessing submodule, configured to preprocess the training samples based on the ZCA matrix and the mean matrix to obtain preprocessed input data, where the preprocessing includes ZCA whitening; and
a training submodule, configured to input the input data into the deep convolutional neural network for training, to obtain the trained, pre-acquired recognition model.
Further, the recognition module includes:
a second normalization submodule, configured to preprocess the to-be-recognized frame images to obtain normalized to-be-recognized face images;
a feature extraction submodule, configured to input the normalized to-be-recognized face images obtained by the second normalization submodule into the pre-acquired recognition model for feature extraction, to obtain a high-dimensional feature vector of the to-be-recognized face corresponding to each to-be-recognized frame image;
a dimensionality reduction submodule, configured to perform dimensionality reduction on the high-dimensional feature vectors of the to-be-recognized faces using the prestored LDA projection matrix of the reference faces, to obtain dimensionality-reduced feature vectors of the to-be-recognized faces;
a measurement submodule, configured to compute a cosine-distance measure on the dimensionality-reduced feature vectors of the to-be-recognized faces;
a comparison submodule, configured to compare the measured result obtained by the measurement submodule with a predetermined threshold; and
a recognition submodule, configured to determine, when the measured result is greater than the predetermined threshold, that the to-be-recognized frame image matches a reference face feature prestored in the face database, and to determine, when the measured result is less than or equal to the predetermined threshold, that the to-be-recognized frame image does not match any reference face feature prestored in the face database.
Further, the face database prestores face feature data of the reference faces, the face feature data including the LDA projection matrix of the reference faces; and the apparatus further includes:
a normalization module, configured to preprocess the reference face images of the reference faces to obtain normalized reference face images;
a feature extraction module, configured to input the normalized reference face images into the pre-acquired recognition model for feature extraction, to obtain a high-dimensional feature vector corresponding to each reference face image;
a dimensionality reduction module, configured to perform LDA training on the high-dimensional feature vectors of the reference face images to obtain dimensionality-reduced feature vectors of the reference face images; and
a generation module, configured to generate the LDA projection matrix of the reference faces according to the dimensionality-reduced feature vectors of the reference face images.
Further, the detection module includes a third detection submodule, configured to perform face detection on the detection frames using the AdaBoost iterative algorithm.
Further, the predetermined interval is a preset equal interval or a preset unequal interval.
According to a third aspect of the embodiments of the present disclosure, a video processing apparatus is provided, including:
a processor; and
a memory for storing processor-executable instructions,
wherein the processor is configured to:
obtain a to-be-processed target video, where the target video includes detection frames and tracking frames;
perform face detection on the detection frames to obtain detection data of the face images contained in the detection frames, where the detection data includes face identifiers for distinguishing different face images;
perform face tracking on the tracking frames according to the face identifiers, to determine whether the tracking frames contain the face images corresponding to the face identifiers;
extract, from the detection frames and the tracking frames, the frames that contain the face identifiers, to obtain to-be-recognized frame images;
perform face recognition on the to-be-recognized frame images based on a pre-acquired recognition model, to obtain a face recognition result for the face image in each frame; and
screen all the face recognition results of the to-be-recognized frame images, to obtain a final recognition result for the face images appearing in the target video.
The method and apparatus provided in the embodiments of the present disclosure may have the following beneficial effects:
(1) In one embodiment, face detection is performed on the detection frames in the to-be-processed target video to locate the face images in the detection frames; face tracking is then performed on the tracking frames based on the face information obtained by detection, to determine the face information contained in the tracking frames; the detection frames and tracking frames containing face information are extracted from the target video; face recognition is performed on the extracted to-be-recognized frame images based on a pre-acquired recognition model; and the recognition results are screened to obtain a final face recognition result. The final result can thus be displayed to the user to indicate which actors appear in the video, and the method effectively improves face recognition efficiency.
(2) In another embodiment, the target video is divided into detection frames and tracking frames, where the detection frames are the frames corresponding to division points obtained by dividing the target video at a predetermined interval, and the tracking frames are the remaining video frames. Face images are thereby located comprehensively throughout the target video, providing as much to-be-recognized data as possible and improving recognition accuracy.
(3) In another embodiment, the frame images containing face images may be extracted from the target video as follows: face detection is performed in chronological order on the current detection frame, the face images contained in it are obtained and labeled with face identifiers; the identifiers corresponding to the current detection frame are compared with those already obtained, and the newly added identifiers are stored as to-be-tracked face identifiers; face tracking is performed, according to the stored to-be-tracked identifiers, on the tracking frames between the current detection frame and the next detection frame, to determine whether those frames contain the corresponding face images; the next detection frame is then updated to be the current detection frame, and the process returns to the face detection step. Detection and tracking thus proceed frame by frame in the order in which the frames occur, making the face-image locating and screening process simple and fast and improving its efficiency.
(4) In another embodiment, the frame images containing face images may be extracted as follows: face detection is performed on all the detection frames in the target video to obtain detection data of the face images they contain, the detection data including all the corresponding face identifiers; face tracking is then performed on the tracking frames according to all the face identifiers, to determine whether the tracking frames contain the corresponding face images. By tracking all the tracking frames using the face identifiers located in all the detection frames, as many face images as possible can be located in the target video, providing as much to-be-recognized data as possible for the subsequent recognition process and helping improve recognition accuracy.
(5) In another embodiment, the recognition model is a deep convolutional neural network trained with a predetermined number of face-image training samples. This training process and algorithm can effectively improve the accuracy of face image recognition.
(6) In another embodiment, the training samples are normalized to obtain sample data of a standard size; a ZCA matrix and a mean matrix are computed from the standard-size sample data; the training samples are preprocessed based on the ZCA matrix and the mean matrix, the preprocessing including ZCA whitening; and the preprocessed input data are fed into the deep convolutional neural network for training. This training process yields a recognition model with high recognition accuracy and a strong artificial-intelligence recognition capability.
(7) In another embodiment, the to-be-recognized frame images are preprocessed to obtain normalized to-be-recognized face images; the normalized images are input into the pre-acquired recognition model for feature extraction, yielding a high-dimensional feature vector for each to-be-recognized face; the prestored LDA projection matrix of the reference faces is used to reduce the dimensionality of these vectors; a cosine-distance measure is computed on the dimensionality-reduced feature vectors and compared with a predetermined threshold; if the measured result is greater than the threshold, the to-be-recognized frame image is determined to match a reference face feature prestored in the face database, and otherwise it is determined not to match. This recognition process is simple and fast and can effectively improve recognition accuracy.
(8) In another embodiment, the face database prestores face feature data of the reference faces, including their LDA projection matrix. The reference face images are preprocessed to obtain normalized reference face images, which are input into the pre-acquired recognition model for feature extraction to obtain the high-dimensional feature vector of each reference face image; LDA training is performed on these vectors to obtain dimensionality-reduced feature vectors, from which the LDA projection matrix of the reference faces is generated. Extracting the reference face features with the trained recognition model makes the reference data reliable and effective, providing an accurate and dependable reference standard for the subsequent recognition of to-be-recognized face images.
(9) In another embodiment, the AdaBoost iterative algorithm is used to perform face detection on the detection frames. AdaBoost, short for "Adaptive Boosting", is a machine learning method that can accurately detect the coordinate position of a face image, which facilitates subsequent image size normalization.
(10) In another embodiment, the predetermined interval may be set to a preset equal interval or a preset unequal interval, so that the number of detection frames in the target video can be calibrated flexibly according to the face detection results, indirectly improving recognition precision.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present disclosure.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the specification, serve to explain the principles of the present disclosure.
Fig. 1 is a flowchart of a video processing method according to an exemplary embodiment;
Fig. 2 is a flowchart of a video processing method according to another exemplary embodiment;
Fig. 3 is a flowchart of a video processing method according to another exemplary embodiment;
Fig. 4 is a block diagram of a video processing apparatus according to an exemplary embodiment;
Fig. 5 is a block diagram of a video processing apparatus according to another exemplary embodiment;
Fig. 6 is a block diagram of a video processing apparatus 600 according to an exemplary embodiment; and
Fig. 7 is a block diagram of a video processing apparatus 700 according to an exemplary embodiment.
Specific embodiments of the present disclosure have been shown by the above drawings and are described in more detail below. These drawings and the written description are not intended to limit the scope of the disclosed concepts in any way, but rather to explain the concepts of the present disclosure to those skilled in the art by reference to specific embodiments.
Detailed Description
Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
Before the embodiments of the present disclosure are described in detail, the main idea of the present disclosure is first summarized. To automatically identify the actors appearing in a video, frame images are extracted from the video, the frame images containing face images are further singled out, and a preset algorithm is then applied to those frame images to recognize the faces and thereby identify the actor information in the video. Specifically, embodiments of the present disclosure determine the actor information in a video based on a pre-acquired recognition model together with face detection and face tracking techniques, and then present the identified actor information to the user watching the video.
Fig. 1 is a flowchart of a video processing method according to an exemplary embodiment. As shown in Fig. 1, the video processing method of this embodiment may be applied in a video server on the video provider side or in a terminal (client device) on the video receiver side; application in a video server is used as the example below. The method of this embodiment comprises the following steps.
In step 101, a to-be-processed target video is acquired, the target video comprising detection frames and tracking frames.
Specifically, the target video may be the complete video of a program or a partial video. The target video is a sequence of still images; by recognizing the face images in the frame images that make up the target video, the actors appearing in the target video can be identified. However, one second of video usually contains dozens of frames; if face recognition were performed on every frame of the target video, the computational load would be enormous and recognition efficiency low. Therefore, certain specific frame images can be extracted from the target video and scanned to obtain the facial feature information they contain; these scanned frame images are the detection frames. For the frame images that are not scanned, feature tracking can be used instead, searching each tracking frame for the facial features previously obtained by scanning the detection frames. In this way, the frames of the target video that contain face images can be identified, preparing the data for the face recognition that follows.
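The detection-frame/tracking-frame split described above can be sketched minimally as follows. This is a hypothetical illustration assuming a fixed sampling interval; the text also allows unequal intervals.

```python
# Hypothetical sketch: partition a video's frame indices into detection frames
# (fully scanned for faces) and tracking frames (known faces only followed).
# The interval of 5 matches the preferred value mentioned later in the text.
def split_frames(num_frames, interval=5):
    """Return (detection_frame_indices, tracking_frame_indices)."""
    detection = [i for i in range(num_frames) if i % interval == 0]
    tracking = [i for i in range(num_frames) if i % interval != 0]
    return detection, tracking

det, trk = split_frames(12, interval=5)  # det is [0, 5, 10]
```

With an interval of 5, only one frame in five is fully scanned, which is where the efficiency gain over per-frame detection comes from.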
In step 102, face detection is performed on the detection frames to obtain detection data of the face images contained in the detection frames.
Specifically, each detection frame is scanned to determine whether it contains face information. If so, the detection data of the face image is recorded; the detection data comprises a face identifier that distinguishes different face images. A frame image may contain a single face or the faces of several people, and the face identifiers effectively distinguish the different faces that are scanned.
In step 103, face tracking is performed on the tracking frames according to the face identifiers, to determine whether a tracking frame contains the face image corresponding to a face identifier.
Specifically, as described above, the facial features obtained by scanning the detection frames are used to determine, in each tracking frame, whether a face that appeared in a detection frame is still present.
In step 104, the frames carrying face identifiers are extracted from the detection frames and tracking frames to obtain the to-be-recognized frame images.
Specifically, when face images are found in the detection frames and tracking frames, the detection frames and tracking frames containing face images are extracted from the target video as the to-be-recognized frame images for identifying the identities corresponding to those face images.
In step 105, face recognition is performed on the to-be-recognized frame images based on a pre-acquired recognition model, to obtain a face recognition result for the face image in each frame.
Specifically, many image recognition algorithms exist in the prior art, and image recognition models for particular scenarios can be computed in advance based on different algorithms. For example, an artificial neural network can be trained on sample image data to obtain a neural network model with machine-learning capability, and the trained model can then be used to recognize a to-be-recognized image and produce a recognition result. The particular scenario in this embodiment is identifying the faces in images: the function of the recognition model is to process an unknown input face image and output the identity of the person to whom it corresponds.
In step 106, all face recognition results of the to-be-recognized frame images are screened to obtain the final recognition result for each face image appearing in the target video.
Specifically, the face recognition result in each to-be-recognized frame image is recorded and then screened according to certain rules. One case is the same face identifier receiving different recognition results: for example, if face identifier A corresponds to both person B and person C, the results can be ranked by how many frames produced B and how many produced C, and the top-ranked person becomes the final recognition result for face identifier A. Another case is the same recognition result corresponding to different face identifiers: for example, if the recognition result is actor D but it corresponds to both face identifiers E and F, the identifiers can likewise be ranked by count, and the top-ranked identifier is taken as the one corresponding to actor D.
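The first screening rule above amounts to majority voting per face identifier; a minimal sketch follows, where the identifiers and names are illustrative, not from the original.

```python
from collections import Counter

# Hypothetical sketch of the screening rule: for each face identifier, tally
# the per-frame recognition results and keep the most frequent identity.
def screen_results(per_frame_results):
    """per_frame_results: list of (face_id, recognized_identity) pairs."""
    votes = {}
    for face_id, identity in per_frame_results:
        votes.setdefault(face_id, Counter())[identity] += 1
    # keep the identity with the highest vote count for each face identifier
    return {fid: counter.most_common(1)[0][0] for fid, counter in votes.items()}

results = [("A", "actor_B"), ("A", "actor_C"), ("A", "actor_B"), ("D", "actor_E")]
final = screen_results(results)  # face A resolves to actor_B (2 votes vs 1)
```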
In this embodiment, face detection is performed on the detection frames in the to-be-processed target video to locate the face images in the detection frames; face tracking is then performed on the tracking frames in the to-be-processed target video based on the face information obtained by detection, to determine whether the tracking frames contain face information; the detection frames and tracking frames containing face information are extracted from the target video, face recognition is performed on the extracted to-be-recognized frame images based on the pre-acquired recognition model, and the recognition results are screened to obtain the final face recognition result. The final face recognition result can thus be displayed to the user to indicate which actors appear in the video, and the method effectively improves face recognition efficiency.
Fig. 2 is a flowchart of a video processing method according to another exemplary embodiment. As shown in Fig. 2, the video processing method of this embodiment may be applied in a video server on the video provider side or in a terminal (client device) on the video receiver side; application in a video server is used as the example below. The method of this embodiment comprises the following steps.
In step 201, a to-be-processed target video is acquired, the target video comprising detection frames and tracking frames.
Optionally, the detection frames are the frames at the division points obtained by dividing the target video at a predetermined interval, and the tracking frames are the video frames of the target video other than the detection frames. The predetermined interval may be a preset equal interval or a preset unequal interval; if an equal interval is used, an interval of 5 frames is preferred.
Specifically, the interval between detection frames can be adjusted adaptively according to the face image recall rate of each detection frame, so as to supply as many to-be-recognized frame images containing face images as possible.
In step 202, face detection is performed on the current detection frame in the target video in chronological order, to obtain the detection data of the face images contained in the current detection frame.
Optionally, performing face detection on a detection frame may comprise: performing face detection on the detection frame using the AdaBoost iterative algorithm. The detection data comprises the face identifiers distinguishing different face images, and specifically the face identifiers corresponding to the current detection frame.
Specifically, the core idea of the AdaBoost iterative algorithm is to train different classifiers (weak classifiers) on the same training set and then combine these weak classifiers into a stronger final classifier (a strong classifier). Redundant training data features are thereby discarded and only the key features retained, which effectively improves the efficiency of face image detection while also improving detection accuracy.
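The weak-to-strong combination described above can be illustrated with a minimal from-scratch sketch. This is not the detector of the present disclosure, which operates on image features; it shows the AdaBoost mechanism itself on one-dimensional toy data: each round picks the best threshold "stump", reweights the samples it got wrong, and the final classifier is the weighted vote of all stumps.

```python
import numpy as np

def adaboost_train(x, y, rounds=3):
    """x: 1-D features, y: labels in {-1, +1}. Returns list of (thresh, sign, alpha)."""
    n = len(x)
    w = np.full(n, 1.0 / n)                 # sample weights, updated each round
    ensemble = []
    for _ in range(rounds):
        best = None
        for thresh in x:                    # try every sample value as a threshold
            for sign in (1, -1):
                pred = np.where(x < thresh, -sign, sign)
                err = w[pred != y].sum()    # weighted error of this stump
                if best is None or err < best[0]:
                    best = (err, thresh, sign, pred)
        err, thresh, sign, pred = best
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # weight of this weak classifier
        w *= np.exp(-alpha * y * pred)          # boost the misclassified samples
        w /= w.sum()
        ensemble.append((thresh, sign, alpha))
    return ensemble

def adaboost_predict(ensemble, x):
    total = sum(alpha * np.where(x < t, -s, s) for t, s, alpha in ensemble)
    return np.sign(total)                   # weighted vote of the weak classifiers
```

In the face detection setting, the stumps would instead test simple image features over a scanning window, but the reweight-and-vote loop is the same.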
In step 203, the face identifiers corresponding to the current detection frame are compared with the face identifiers already obtained, and any newly added face identifiers are stored, yielding the to-be-tracked face identifiers.
Specifically, as each detection frame is processed, the set of face identifiers is gradually enriched. For example, if only face A is scanned in the current detection frame and faces A and B are scanned in the next detection frame, the to-be-tracked face identifiers are A and B, and both face images can subsequently be tracked in the tracking frames.
In step 204, face tracking is performed on the tracking frames between the current detection frame and the next detection frame according to the stored to-be-tracked face identifiers, to determine whether the tracking frames contain the face images corresponding to the to-be-tracked face identifiers.
In step 205, the next detection frame is taken as the current detection frame, and the method returns to step 202.
Specifically, take a segment of target video as an example. Face detection is performed on the first frame image of the target video using the AdaBoost technique. If the first frame contains a face image, the detection data of that face image is recorded; if not, detection continues with the second frame image, and so on in frame order, until the first frame image containing a face image is found and the face detection data obtained from scanning it is recorded, for example the face identifier and the position of the face image. Taking this frame as the starting frame, no face detection is performed on the frames that follow it; only particle filter tracking is applied: if a frame contains the face image, the face is tracked, and if not, the frame is left unprocessed. Face detection can then be restarted every 5 frames, which ensures that newly appearing face images are not missed. Alternatively, the first frame image of the target video can be taken as the starting frame regardless of whether it contains a face image, the frame images can be scanned in the preset detection-frame interval order, and the gradually accumulating face detection data can then be used to track faces in each tracking frame lying between two detection frames.
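The interleaved schedule described above (detect every fifth frame, track in between) can be sketched as follows. Here `detect_faces` and `track_faces` are hypothetical stand-ins for the AdaBoost detector and the particle-filter tracker, and the scheduling details are an assumption for illustration.

```python
# Hypothetical scheduling sketch: full detection on every `interval`-th frame,
# tracking of already-known faces on the frames in between.
def process_video(frames, detect_faces, track_faces, interval=5):
    known_ids = set()                        # face identifiers accumulated so far
    frames_with_faces = []                   # indices of frames containing faces
    for i, frame in enumerate(frames):
        if i % interval == 0:
            ids = detect_faces(frame)        # full scan: may discover new faces
            known_ids.update(ids)
        else:
            ids = track_faces(frame, known_ids)  # only follow known faces
        if ids:
            frames_with_faces.append(i)
    return frames_with_faces
```

The returned indices are exactly the to-be-recognized frames extracted in the next step.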
In step 206, the frames carrying face identifiers are extracted from the detection frames and tracking frames to obtain the to-be-recognized frame images.
Specifically, during the detection and tracking of steps 202 to 205, the frames in which each face image appears are recorded. When the accumulated number of such frames exceeds a certain value (for a video of a few minutes, generally more than a few hundred frames containing face images can be collected), these frames are stored as the to-be-recognized frame images, which can then be compared and recognized against the pre-acquired recognition model and the celebrity faces collected in a face database.
In step 207, face recognition is performed on the to-be-recognized frame images based on the pre-acquired recognition model, to obtain a face recognition result for the face image in each frame.
Specifically, the recognition model may be a deep convolutional neural network. The method then further comprises: training the deep convolutional neural network with a training sample of a predetermined number of face images to obtain the pre-acquired recognition model.
Deep learning is a relatively new field in machine learning research. Its motivation is to build neural networks that simulate the analytic learning of the human brain, imitating the mechanisms by which the brain interprets data such as images, sound, and text. The concept of deep learning was proposed by Hinton et al. in 2006, who introduced an unsupervised greedy layer-wise training algorithm based on the deep belief network (DBN), offering hope for solving the optimization problems associated with deep structures, and subsequently proposed the multi-layer auto-encoder deep structure. In addition, the convolutional neural network proposed by LeCun et al. was the first truly multi-layer structured learning algorithm, which exploits spatial correlation to reduce the number of parameters and improve training performance.
A convolutional neural network (CNN) is a kind of artificial neural network, and a deep convolutional neural network is a supervised deep machine learning model; CNNs have become a research focus in speech analysis and image recognition. Their weight-sharing network structure makes them more similar to biological neural networks, reducing the complexity of the network model and the number of weights. This advantage is more pronounced when the network input is a multi-dimensional image: the image can be fed directly into the network, avoiding the complex feature extraction and data reconstruction of traditional recognition algorithms. A convolutional network is a multi-layer perceptron specially designed for recognizing two-dimensional shapes, and this network structure is highly invariant to translation, scaling, tilt, and other forms of deformation.
Optionally, the training sample of the predetermined number of face images may consist of M classes of face images, each class consisting of N face images, where M and N are natural numbers.
For example, in the stage of training the deep convolutional neural network with training samples, a large amount of face image data can be prepared and labeled. For instance, all face images of Zhang San are labeled 1, and all face images of Li Si are labeled 2. Then, for example, about 600,000 face images in 20,000 classes are prepared, equivalent to 20,000 people with 30 face images each. In this case, M is 20,000, N is 30, and the training sample of the predetermined number of face images is the 600,000 face images.
Optionally, the pre-acquired recognition model can be obtained by the following steps: normalizing the training samples to obtain standard-sized sample data; computing a ZCA matrix and a mean matrix from the standard-sized sample data; preprocessing the training samples based on the ZCA matrix and the mean matrix to obtain preprocessed input data, the preprocessing comprising ZCA whitening; and feeding the input data into the deep convolutional neural network for training, to obtain the trained, pre-acquired recognition model.
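A minimal sketch of the ZCA preprocessing named in these steps, assuming each sample is flattened into one row of a matrix; the epsilon regularizer is an illustrative assumption:

```python
import numpy as np

# Sketch: compute a mean vector and ZCA matrix from training data, then
# apply both to samples, so that the whitened features are decorrelated
# with unit variance while staying close to the raw data.
def fit_zca(X, eps=1e-5):
    """X: (n_samples, n_features). Returns (mean vector, ZCA matrix)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = Xc.T @ Xc / X.shape[0]
    U, S, _ = np.linalg.svd(cov)
    # rotate to PCA space, scale to unit variance, rotate back (the "ZCA" step)
    zca = U @ np.diag(1.0 / np.sqrt(S + eps)) @ U.T
    return mean, zca

def apply_zca(X, mean, zca):
    return (X - mean) @ zca.T
```

After `apply_zca`, the covariance of the transformed data is approximately the identity matrix, which is the property the whitening step relies on.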
PCA is principal component analysis, abbreviated "PCA". ZCA is regularized PCA: ZCA whitening performs an additional rotation on top of PCA whitening so that the whitened data stays closer to the raw data. ZCA whitening first removes the correlation between features via PCA and scales the input features to unit variance, yielding the PCA-whitened result; the data is then rotated back to obtain the ZCA-whitened result. The result is usually expressed in matrix form, giving the ZCA matrix. Continuing with the 600,000-sample training set above: these data are used to train the ZCA matrix P and the mean matrix E, the two matrices are used to preprocess all training data, and a CNN is then trained. The structure of the CNN can follow the ImageNet network structure, with some of the ImageNet parameters modified. ImageNet is a computer vision recognition project and currently the world's largest image recognition database, built by computer scientists to simulate the human recognition system and identify objects from pictures. Examples of the parameter modifications: the input image size is 100 x 100 pixels and the final output has 20,000 classes; other intermediate parameters can also be adjusted slightly, with the specific adjustments set by those skilled in the art according to the needs of the recognition task, which the present disclosure does not restrict. This completes the training of the CNN deep learning model and network. The trained CNN model, with its final output layer removed, is then applied to the 600,000 images to obtain 4096-dimensional feature vectors; this is equivalent to using the previously trained CNN for feature extraction. Linear discriminant analysis (LDA) training is then performed on the 600,000 4096-dimensional feature vectors. LDA is a classic algorithm of pattern recognition; its basic idea is to project high-dimensional pattern samples into the optimal discriminant vector space so as to extract classification information and compress the feature space dimensionality. After projection, the pattern samples have the maximum between-class distance and minimum within-class distance in the new subspace, i.e., the patterns have the best separability in that space, which makes LDA an effective feature extraction method. The 4096-dimensional feature vectors are thus reduced to final 200-dimensional vectors, and the projection matrix P of this LDA is saved.
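The LDA projection described above can be illustrated, for the two-class case, by Fisher's classic construction; the multi-class 4096-to-200-dimension reduction in the text generalizes this single direction to a projection matrix. The data shapes here are illustrative.

```python
import numpy as np

# Two-class Fisher LDA sketch: find the direction maximizing between-class
# scatter relative to within-class scatter, then project samples onto it.
def fisher_direction(X0, X1):
    """X0, X1: (n_i, d) samples of the two classes. Returns a unit vector w."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)  # within-class scatter
    w = np.linalg.solve(Sw, mu1 - mu0)   # closed-form optimal direction
    return w / np.linalg.norm(w)
```

Projecting samples with `X @ w` compresses them to one dimension while keeping the two classes well separated, which is the separability property the text appeals to.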
Further, step 207 may specifically comprise the following steps.
Step 1: preprocess the to-be-recognized frame image to obtain a normalized to-be-recognized face image.
Step 2: input the normalized to-be-recognized face image into the pre-acquired recognition model and perform feature extraction, to obtain the high-dimensional feature vector of the to-be-recognized face corresponding to each to-be-recognized frame image.
Step 3: reduce the dimensionality of the high-dimensional feature vector of the to-be-recognized face using the pre-stored linear discriminant analysis (LDA) projection matrix of the reference faces, to obtain the reduced-dimension feature vector of the to-be-recognized face.
Step 4: apply a cosine distance metric to the reduced-dimension feature vector of the to-be-recognized face and compare the metric result with a preset threshold. If the metric result is greater than the preset threshold, the to-be-recognized frame image is identified as matching a reference face feature pre-stored in a face database; if the metric result is less than or equal to the preset threshold, the to-be-recognized frame image is identified as not matching the reference face features pre-stored in the face database.
The face database in step 4 pre-stores the face feature data of the reference faces, which comprises the LDA projection matrix of the reference faces. The video processing method then further comprises: preprocessing the reference face images of the reference faces to obtain normalized reference face images; inputting the normalized reference face images into the pre-acquired recognition model and performing feature extraction, to obtain the high-dimensional feature vector corresponding to each reference face image; performing LDA training on the high-dimensional feature vectors of the reference face images to obtain the reduced-dimension feature vectors of the reference face images; and generating the LDA projection matrix of the reference faces from the reduced-dimension feature vectors of the reference face images.
Specifically, performing face recognition on the to-be-recognized frame images against the reference faces based on the pre-acquired recognition model, and saving the reference face feature data, both belong to the application stage of the trained CNN network. In simple terms, with the pre-acquired recognition model of step 207, i.e., the trained CNN network, when two new face images are to be compared, the face images are first normalized to the standard size of 100 x 100 and ZCA preprocessing is applied; the trained CNN network model then processes the preprocessed data to produce two 4096-dimensional feature vectors. The LDA projection matrix P reduces these two 4096-dimensional vectors to two 200-dimensional feature vectors. A cosine distance metric is applied to the two 200-dimensional vectors and split by a certain threshold: above the threshold, the two images can be considered to belong to the same face; otherwise, they show different faces. Whether two images show the same face is judged against the reference face feature data pre-stored in the face database.
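The final comparison step (a cosine metric on the reduced feature vectors, split by a threshold) can be sketched as follows; the threshold value and the vectors are illustrative assumptions, not values from the original.

```python
import numpy as np

# Sketch: cosine similarity between two reduced feature vectors, thresholded
# to decide whether two images show the same face.
def same_face(v1, v2, threshold=0.5):
    cos = float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))
    return cos > threshold

a = np.array([1.0, 2.0, 3.0])
b = np.array([1.1, 1.9, 3.2])    # nearly parallel to a: same face
c = np.array([-3.0, 0.5, -1.0])  # points elsewhere: different face
```

In practice the vectors would be the 200-dimensional LDA outputs, and the threshold would be tuned on held-out pairs.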
In step 208, all face recognition results of the to-be-recognized frame images are screened to obtain the final recognition result for each face image appearing in the target video.
Specifically, the recognition result in each to-be-recognized frame image is recorded and then screened according to certain rules: the results can be ranked in order and the top-ranked result selected, or the results exceeding a preset recognition threshold can be selected and voted on, with the result receiving the most votes taken as the final and most accurate recognition result. This recognition result can be displayed in a corner of the screen, so that even a user who cannot match an actor's face to a name can learn the video's starring actor information in real time; for example, a photo of celebrity A can be shown annotated with the celebrity's name, where the photo may be one of the to-be-recognized frame images or the photo of celebrity A from the face database.
In summary, this embodiment uses face detection and tracking techniques to extract and analyze the faces of the main actors in a video, uses deep-learning face recognition to identify these main faces, and finally uses multi-frame voting to obtain the most accurate recognition result. The recognition results for the video's actors can then be displayed on the screen.
Fig. 3 is a flowchart of a video processing method according to another exemplary embodiment. As shown in Fig. 3, the video processing method of this embodiment may be applied in a video server on the video provider side or in a terminal (client device) on the video receiver side; application in a video server is used as the example below. The method of this embodiment comprises the following steps.
In step 301, a to-be-processed target video is acquired, the target video comprising detection frames and tracking frames.
In step 302, face detection is performed on all detection frames in the target video, to obtain the detection data of the face images contained in all detection frames.
The detection data comprises the face identifiers distinguishing different face images, specifically all face identifiers corresponding to all detection frames.
In step 303, face tracking is performed on the tracking frames according to all face identifiers, to determine whether the tracking frames contain the face images corresponding to those face identifiers.
Specifically, this embodiment differs from the previous embodiment in the ordering of detection: the previous embodiment scans for face images detection frame by detection frame in the order in which they appear in the target video, whereas this embodiment first divides the target video at the preset interval, collects the complete face image information located in all detection frames, and then uses that complete information to perform the tracking operation on the tracking frames. This approach can obtain relatively more face images from the target video, providing as much to-be-recognized data as possible for the subsequent recognition process and helping to improve recognition accuracy.
In step 304, the frames carrying face identifiers are extracted from the detection frames and tracking frames to obtain the to-be-recognized frame images.
In step 305, face recognition is performed on the to-be-recognized frame images based on the pre-acquired recognition model, to obtain a face recognition result for the face image in each frame.
In step 306, all face recognition results of the to-be-recognized frame images are screened to obtain the final recognition result for each face image appearing in the target video.
The other method steps of this embodiment are similar to those of the previous embodiment; for their principles and implementation, refer to the previous embodiment, and they are not repeated here.
The following are apparatus embodiments of the present disclosure, which can be used to perform the method embodiments of the present disclosure. For details not disclosed in the apparatus embodiments, refer to the method embodiments of the present disclosure.
Fig. 4 is a block diagram of a video processing apparatus according to an exemplary embodiment. As shown in Fig. 4, the video processing apparatus can be implemented as part or all of an electronic device by software, hardware, or a combination of both. The video processing apparatus may comprise:
a first acquiring module 401, configured to acquire a to-be-processed target video, the target video comprising detection frames and tracking frames;
a detection module 402, configured to perform face detection on the detection frames acquired by the first acquiring module;
a second acquiring module 403, configured to acquire the detection data of the face images contained in the detection frames detected by the detection module 402, the detection data comprising the face identifiers distinguishing different face images;
a tracking module 404, configured to perform face tracking on the tracking frames according to the face identifiers;
a determining module 405, configured to determine whether the tracking frames tracked by the tracking module 404 contain the face images corresponding to the face identifiers;
an extracting module 406, configured to extract the frames carrying face identifiers from the detection frames and tracking frames, to obtain the to-be-recognized frame images;
a recognition module 407, configured to perform face recognition on the to-be-recognized frame images based on the pre-acquired recognition model, to obtain a face recognition result for the face image in each frame; and
a screening module 408, configured to screen the face recognition results of the to-be-recognized frame images, to obtain the final recognition result for each face image appearing in the target video.
In this embodiment, the detection module performs face detection on the detection frames in the to-be-processed target video, and the second acquiring module acquires the face images in the detection frames; the tracking module then performs face tracking on the tracking frames in the to-be-processed target video based on the face information obtained by detection, and the determining module determines whether the tracking frames contain face information; the extracting module extracts the detection frames and tracking frames containing face information from the target video, the recognition module performs face recognition on the extracted to-be-recognized frame images based on the pre-acquired recognition model, and the screening module screens the recognition results to obtain the final face recognition result. The final face recognition result can thus be displayed to the user to indicate which actors appear in the video, and the apparatus effectively improves face recognition efficiency.
Fig. 5 is a block diagram of a video processing apparatus according to another exemplary embodiment; the video processing apparatus can be implemented as part or all of an electronic device by software, hardware, or a combination of both. Based on the above apparatus embodiment, the detection frames are the frames at the division points obtained by dividing the target video at a predetermined interval, and the tracking frames are the video frames of the target video other than the detection frames.
Optionally, the detection module 402 may comprise a first detection submodule 4021.
The first detection submodule 4021 is configured to perform face detection on the current detection frame in the target video in chronological order.
The second acquiring module 403 may comprise a first acquiring submodule 4031.
The first acquiring submodule 4031 is configured to acquire the detection data of the face images contained in the current detection frame; the detection data comprises the face identifiers corresponding to the current detection frame.
The apparatus further comprises:
a storage module 409, configured to compare the face identifiers corresponding to the current detection frame with the face identifiers already obtained and store any newly added face identifiers, yielding the to-be-tracked face identifiers.
Accordingly, the tracking module 404 comprises a first tracking submodule 4041.
The first tracking submodule 4041 is configured to perform face tracking on the tracking frames between the current detection frame and the next detection frame according to the to-be-tracked face identifiers stored by the storage module 409, to determine whether the tracking frames contain the face images corresponding to the to-be-tracked face identifiers.
The apparatus further comprises:
an updating module 410, configured to take the next detection frame as the current detection frame and return to the first detection submodule 4021.
Optionally, the detection module 402 includes a second detection sub-module 4022.
The second detection sub-module 4022 is configured to perform face detection on all detection frames in the target video.
The second acquisition module 403 includes a second acquisition sub-module 4032.
The second acquisition sub-module 4032 is configured to acquire detection data of the facial images included in all the detection frames; the detection data includes all face identifiers corresponding to all the detection frames.
Accordingly, the tracking module 404 includes a second tracking sub-module 4042.
The second tracking sub-module 4042 is configured to perform face tracking on the tracking frames according to all the face identifiers, to determine whether the tracking frames include facial images corresponding to all the face identifiers.
Optionally, the recognition model used by the recognition module 407 is a deep convolutional neural network.
The apparatus further includes a training module 411, configured to train the deep convolutional neural network used by the recognition module 407 with training samples of a predetermined number of facial images, to obtain the pre-acquired recognition model.
Optionally, the training module 411 includes:
a first normalization sub-module 4111, configured to normalize the training samples to obtain sample data of a standard size;
a calculation sub-module 4112, configured to compute a ZCA matrix and a mean matrix from the standard-size sample data;
a preprocessing sub-module 4113, configured to preprocess the training samples based on the ZCA matrix and the mean matrix to obtain preprocessed input data, the preprocessing including ZCA whitening; and
a training sub-module 4114, configured to input the input data into the deep convolutional neural network for training, to obtain the trained, pre-acquired recognition model.
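The ZCA matrix, mean matrix, and whitening step performed by sub-modules 4112 and 4113 can be sketched in NumPy as follows. This is the standard ZCA formulation (eigendecomposition of the data covariance); the patent does not spell out the exact computation, so the function names and the `eps` regularizer are assumptions:

```python
import numpy as np

def zca_fit(samples, eps=1e-5):
    """Compute the mean vector and ZCA whitening matrix from flattened,
    normalized training samples (one row per sample)."""
    mean = samples.mean(axis=0)
    centered = samples - mean
    cov = centered.T @ centered / samples.shape[0]
    # Eigendecomposition of the covariance; eps guards tiny eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(cov)
    zca = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return mean, zca

def zca_whiten(samples, mean, zca):
    """Apply the stored mean and ZCA matrix to a batch of samples."""
    return (samples - mean) @ zca.T
```

After whitening, the sample covariance is approximately the identity matrix, which is the property ZCA preprocessing is used for before CNN training.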
Optionally, the recognition module 407 includes:
a second normalization sub-module 4071, configured to preprocess the to-be-identified frame images to obtain normalized to-be-identified facial images;
a feature extraction sub-module 4072, configured to input the normalized to-be-identified facial images obtained by the second normalization sub-module 4071 into the pre-acquired recognition model for feature extraction, to obtain a high-dimensional feature vector of the to-be-identified face corresponding to each to-be-identified frame image;
a dimension reduction sub-module 4073, configured to perform dimension reduction on the high-dimensional feature vector of the to-be-identified face by using a prestored linear discriminant analysis (LDA) projection matrix of reference faces, to obtain a reduced-dimension feature vector of the to-be-identified face;
a measurement sub-module 4074, configured to perform cosine distance measurement on the reduced-dimension feature vector of the to-be-identified face;
a comparison sub-module 4075, configured to compare the measurement result obtained by the measurement sub-module 4074 with a predetermined threshold; and
a recognition sub-module 4076, configured to identify, when the measurement result is greater than the predetermined threshold, that the to-be-identified frame image matches a reference face feature prestored in a face database, and to identify, when the measurement result is less than or equal to the predetermined threshold, that the to-be-identified frame image does not match the reference face features prestored in the face database.
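The cosine measurement and threshold comparison performed by sub-modules 4074 through 4076 can be sketched as follows. The function names and the threshold value 0.6 are illustrative assumptions, not values from the patent:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two reduced-dimension feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_face(query_vec, reference_vecs, threshold=0.6):
    """Return the index of the best-matching reference face, or None
    when no similarity exceeds the threshold (i.e. no match)."""
    scores = [cosine_similarity(query_vec, r) for r in reference_vecs]
    best = int(np.argmax(scores))
    return best if scores[best] > threshold else None
```

A query vector aligned with one reference and orthogonal to another will match the aligned one; a query orthogonal to every reference falls below the threshold and is reported as unmatched.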
Optionally, face feature data of the reference faces are prestored in the face database, the face feature data including the LDA projection matrix of the reference faces.
The apparatus further includes:
a normalization module 412, configured to preprocess reference facial images of the reference faces to obtain normalized reference facial images;
a feature extraction module 413, configured to input the normalized reference facial images into the pre-acquired recognition model for feature extraction, to obtain a high-dimensional feature vector corresponding to each reference facial image;
a dimension reduction module 414, configured to perform LDA training on the high-dimensional feature vectors of the reference facial images to obtain reduced-dimension feature vectors of the reference facial images; and
a generation module 415, configured to generate the LDA projection matrix of the reference faces according to the reduced-dimension feature vectors of the reference facial images.
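The LDA training performed by module 414 can be sketched with the classical Fisher formulation below: maximize between-class scatter relative to within-class scatter over labeled high-dimensional features. This is a generic LDA implementation under stated assumptions, not the patent's specific procedure:

```python
import numpy as np

def lda_projection(feats, labels, n_components):
    """Learn an LDA projection matrix from labeled high-dimensional
    feature vectors (one row per sample, one label per row)."""
    classes = np.unique(labels)
    mean_all = feats.mean(axis=0)
    d = feats.shape[1]
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in classes:
        Xc = feats[labels == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all).reshape(-1, 1)
        Sb += len(Xc) * (diff @ diff.T)
    # Solve the generalized eigenproblem Sw^-1 Sb w = lambda w and keep
    # the eigenvectors with the largest eigenvalues as the projection.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs[:, order[:n_components]].real

# Reduced-dimension vectors are then obtained as: reduced = feats @ W
```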
Optionally, the training samples of the predetermined number of facial images consist of M classes of facial images, each class consisting of N facial images, where M and N are natural numbers.
Optionally, the detection module 402 includes a third detection sub-module 4023.
The third detection sub-module 4023 is configured to perform face detection on the detection frames by using an AdaBoost iterative algorithm.
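The AdaBoost iteration underlying the third detection sub-module can be illustrated with a toy implementation over one-dimensional threshold stumps. A production face detector boosts Haar-feature classifiers in a cascade; this didactic sketch keeps only the core reweighting loop, and every name in it is an assumption for illustration:

```python
import numpy as np

def adaboost_stumps(X, y, rounds=10):
    """Toy AdaBoost: each round, pick the 1-D threshold stump with the
    lowest weighted error, then up-weight misclassified samples.
    X: (n,) scalar features; y: labels in {-1, +1}."""
    n = len(X)
    w = np.full(n, 1.0 / n)          # sample weights
    learners = []                    # (threshold, polarity, alpha)
    for _ in range(rounds):
        best = None
        for thr in X:                # candidate thresholds
            for pol in (1, -1):      # candidate polarities
                pred = pol * np.where(X >= thr, 1, -1)
                err = w[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, thr, pol, pred)
        err, thr, pol, pred = best
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)  # weak-learner weight
        w *= np.exp(-alpha * y * pred)         # boost the mistakes
        w /= w.sum()
        learners.append((thr, pol, alpha))
    return learners

def adaboost_predict(learners, X):
    """Sign of the alpha-weighted vote of all weak learners."""
    score = sum(a * p * np.where(X >= t, 1, -1) for t, p, a in learners)
    return np.where(score >= 0, 1, -1)
```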
Optionally, the predetermined interval is a preset equal interval or a preset unequal interval; if a preset equal interval is used, the interval is 5 frames.
With regard to the apparatus in the foregoing embodiments, the specific manner in which each module performs operations has been described in detail in the method embodiments, and will not be elaborated here.
Fig. 6 is a block diagram of a video processing apparatus 600 according to an exemplary embodiment. For example, the video processing apparatus 600 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, fitness equipment, a personal digital assistant, a router, a coordinator, or the like.
Referring to Fig. 6, the apparatus 600 may include one or more of the following components: a processing component 602, a memory 604, a power component 606, a multimedia component 608, an audio component 610, an input/output (I/O) interface 612, a sensor component 614, and a communication component 616.
The processing component 602 typically controls the overall operation of the apparatus 600, such as operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component 602 may include one or more processors 620 to execute instructions, so as to complete all or part of the steps of the above method. In addition, the processing component 602 may include one or more modules to facilitate interaction between the processing component 602 and other components. For example, the processing component 602 may include a multimedia module to facilitate interaction between the multimedia component 608 and the processing component 602.
The memory 604 is configured to store various types of data to support the operation of the apparatus 600. Examples of such data include instructions for any application or method operated on the apparatus 600, contact data, phonebook data, messages, pictures, videos, and the like. The memory 604 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
The power component 606 provides power to the various components of the apparatus 600. The power component 606 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 600.
The multimedia component 608 includes a screen that provides an output interface between the apparatus 600 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action, but also the duration and pressure associated with the touch or swipe. In some embodiments, the multimedia component 608 includes a front camera and/or a rear camera. When the apparatus 600 is in an operating mode, such as a photographing mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each of the front camera and the rear camera may be a fixed optical lens system or have focal length and optical zoom capability.
The audio component 610 is configured to output and/or input audio signals. For example, the audio component 610 includes a microphone (MIC), which is configured to receive external audio signals when the apparatus 600 is in an operating mode, such as a call mode, a recording mode, or a speech recognition mode. The received audio signals may be further stored in the memory 604 or transmitted via the communication component 616. In some embodiments, the audio component 610 also includes a speaker for outputting audio signals.
The I/O interface 612 provides an interface between the processing component 602 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, or the like. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 614 includes one or more sensors for providing status assessments of various aspects of the apparatus 600. For example, the sensor component 614 may detect the open/closed state of the apparatus 600 and the relative positioning of components, such as the display and keypad of the apparatus 600; the sensor component 614 may also detect a change in position of the apparatus 600 or a component of the apparatus 600, the presence or absence of user contact with the apparatus 600, the orientation or acceleration/deceleration of the apparatus 600, and a change in temperature of the apparatus 600. The sensor component 614 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 614 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 614 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 616 is configured to facilitate wired or wireless communication between the apparatus 600 and other devices. The apparatus 600 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 616 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 616 also includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, or other technologies.
In an exemplary embodiment, the apparatus 600 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above method.
In an exemplary embodiment, there is also provided a non-transitory computer-readable storage medium including instructions, such as the memory 604 including instructions, where the instructions are executable by the processor 620 of the apparatus 600 to perform the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
A non-transitory computer-readable storage medium is provided, where the instructions in the storage medium, when executed by a processor of a mobile terminal, enable the mobile terminal to perform a video processing method, the method comprising the steps described below.
The memory 604 is configured to store instructions executable by the processor 620. The processor 620 is configured to: acquire a to-be-processed target video, the target video including detection frames and tracking frames; perform face detection on the detection frames to obtain detection data of facial images included in the detection frames, the detection data including face identifiers that distinguish different facial images; perform face tracking on the tracking frames according to the face identifiers, to determine whether the tracking frames include facial images corresponding to the face identifiers; extract, from the detection frames and the tracking frames, the frames that include the face identifiers, to obtain to-be-identified frame images; perform face recognition on the to-be-identified frame images based on a pre-acquired recognition model, to obtain a face recognition result for the facial image in each frame; and screen all face recognition results of the to-be-identified frame images to obtain a final recognition result of the facial images appearing in the target video.
Fig. 7 is a block diagram of a video processing apparatus 700 according to an exemplary embodiment. For example, the apparatus 700 may be provided as a server. Referring to Fig. 7, the apparatus 700 includes a processing component 722, which further includes one or more processors (not shown), and memory resources represented by a memory 732 for storing instructions executable by the processing component 722, for example, applications. The applications stored in the memory 732 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 722 is configured to execute the instructions to perform the above video processing method.
The apparatus 700 may also include a power component 726 configured to perform power management for the apparatus 700, a wired or wireless network interface 750 configured to connect the apparatus 700 to a network, and an input/output (I/O) interface 758. The apparatus 700 may operate based on an operating system stored in the memory 732, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
Those skilled in the art will readily conceive of other embodiments of the present disclosure after considering the specification and practicing the invention disclosed herein. The present application is intended to cover any variations, uses, or adaptations of the present disclosure that follow the general principles of the present disclosure and include common knowledge or conventional technical means in the art not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present disclosure being indicated by the following claims.
It should be understood that the present disclosure is not limited to the precise structures described above and illustrated in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.
Claims (21)
1. A video processing method, characterized in that the method comprises:
acquiring a to-be-processed target video, the target video comprising detection frames and tracking frames;
performing face detection on the detection frames to obtain detection data of facial images included in the detection frames, the detection data comprising face identifiers that distinguish different facial images;
performing face tracking on the tracking frames according to the face identifiers, to determine whether the tracking frames include facial images corresponding to the face identifiers;
extracting, from the detection frames and the tracking frames, the frames that include the face identifiers, to obtain to-be-identified frame images;
performing face recognition on the to-be-identified frame images based on a pre-acquired recognition model, to obtain a face recognition result for the facial image in each frame; and
screening all face recognition results of the to-be-identified frame images to obtain a final recognition result of the facial images appearing in the target video.
2. The method according to claim 1, characterized in that:
the detection frames are the frames corresponding to division points obtained by partitioning the target video at a predetermined interval; and
the tracking frames are the video frames in the target video other than the detection frames.
3. The method according to claim 2, characterized in that:
performing face detection on the detection frames to obtain the detection data of the facial images included in the detection frames comprises:
performing face detection on a current detection frame in the target video in chronological order, to obtain detection data of a facial image included in the current detection frame, the detection data comprising a face identifier corresponding to the current detection frame;
the method further comprises: comparing the face identifier corresponding to the current detection frame with the face identifiers already acquired, and storing any newly added face identifier to obtain to-be-tracked face identifiers;
accordingly, performing face tracking on the tracking frames according to the face identifiers, to determine whether the tracking frames include facial images corresponding to the face identifiers, comprises:
performing face tracking, according to the stored to-be-tracked face identifiers, on the tracking frames between the current detection frame and a next detection frame, to determine whether the tracking frames include facial images corresponding to the to-be-tracked face identifiers; and
updating the next detection frame as the current detection frame, and returning to the step of performing face detection on the current detection frame in the target video.
4. The method according to claim 2, characterized in that:
performing face detection on the detection frames to obtain the detection data of the facial images included in the detection frames comprises:
performing face detection on all detection frames in the target video, to obtain detection data of the facial images included in all the detection frames, the detection data comprising all face identifiers corresponding to all the detection frames; and
accordingly, performing face tracking on the tracking frames according to the face identifiers, to determine whether the tracking frames include facial images corresponding to the face identifiers, comprises:
performing face tracking on the tracking frames according to all the face identifiers, to determine whether the tracking frames include facial images corresponding to all the face identifiers.
5. The method according to claim 1, characterized in that the method further comprises:
training a deep convolutional neural network with training samples of a predetermined number of facial images, to obtain the pre-acquired recognition model.
6. The method according to claim 5, characterized in that training the deep convolutional neural network with the training samples of the predetermined number of facial images, to obtain the pre-acquired recognition model, comprises:
normalizing the training samples to obtain sample data of a standard size;
computing a ZCA matrix and a mean matrix from the standard-size sample data;
preprocessing the training samples based on the ZCA matrix and the mean matrix to obtain preprocessed input data, the preprocessing comprising ZCA whitening; and
inputting the input data into the deep convolutional neural network for training, to obtain the trained, pre-acquired recognition model.
7. The method according to claim 1, characterized in that performing face recognition on the to-be-identified frame images based on the pre-acquired recognition model, to obtain the face recognition result for the facial image in each frame, comprises:
preprocessing the to-be-identified frame images to obtain normalized to-be-identified facial images;
inputting the normalized to-be-identified facial images into the pre-acquired recognition model for feature extraction, to obtain a high-dimensional feature vector of the to-be-identified face corresponding to each to-be-identified frame image;
performing dimension reduction on the high-dimensional feature vector of the to-be-identified face by using a prestored linear discriminant analysis (LDA) projection matrix of reference faces, to obtain a reduced-dimension feature vector of the to-be-identified face;
performing cosine distance measurement on the reduced-dimension feature vector of the to-be-identified face, and comparing the measurement result with a predetermined threshold;
if the measurement result is greater than the predetermined threshold, identifying that the to-be-identified frame image matches a reference face feature prestored in a face database; and
if the measurement result is less than or equal to the predetermined threshold, identifying that the to-be-identified frame image does not match the reference face features prestored in the face database.
8. The method according to claim 7, characterized in that face feature data of the reference faces are prestored in the face database, the face feature data comprising the LDA projection matrix of the reference faces; and the method further comprises:
preprocessing reference facial images of the reference faces to obtain normalized reference facial images;
inputting the normalized reference facial images into the pre-acquired recognition model for feature extraction, to obtain a high-dimensional feature vector corresponding to each reference facial image;
performing LDA training on the high-dimensional feature vectors of the reference facial images to obtain reduced-dimension feature vectors of the reference facial images; and
generating the LDA projection matrix of the reference faces according to the reduced-dimension feature vectors of the reference facial images.
9. The method according to claim 1, characterized in that performing face detection on the detection frames comprises:
performing face detection on the detection frames by using an AdaBoost iterative algorithm.
10. The method according to claim 2, characterized in that the predetermined interval is a preset equal interval or a preset unequal interval.
11. A video processing apparatus, characterized in that the apparatus comprises:
a first acquisition module, configured to acquire a to-be-processed target video, the target video comprising detection frames and tracking frames;
a detection module, configured to perform face detection on the detection frames acquired by the first acquisition module;
a second acquisition module, configured to acquire detection data of facial images included in the detection frames detected by the detection module, the detection data comprising face identifiers that distinguish different facial images;
a tracking module, configured to perform face tracking on the tracking frames according to the face identifiers;
a determination module, configured to determine whether the tracking frames tracked by the tracking module include facial images corresponding to the face identifiers;
an extraction module, configured to extract, from the detection frames and the tracking frames, the frames that include the face identifiers, to obtain to-be-identified frame images;
a recognition module, configured to perform face recognition on the to-be-identified frame images based on a pre-acquired recognition model, to obtain a face recognition result for the facial image in each frame; and
a screening module, configured to screen all face recognition results of the to-be-identified frame images to obtain a final recognition result of the facial images appearing in the target video.
12. The apparatus according to claim 11, characterized in that:
the detection frames are the frames corresponding to division points obtained by partitioning the target video at a predetermined interval; and
the tracking frames are the video frames in the target video other than the detection frames.
13. The apparatus according to claim 12, characterized in that:
the detection module comprises a first detection sub-module, configured to perform face detection on a current detection frame in the target video in chronological order;
the second acquisition module comprises a first acquisition sub-module, configured to acquire detection data of a facial image included in the current detection frame, the detection data comprising a face identifier corresponding to the current detection frame;
the apparatus further comprises a storage module, configured to compare the face identifier corresponding to the current detection frame with the face identifiers already acquired, and store any newly added face identifier to obtain to-be-tracked face identifiers;
accordingly, the tracking module comprises a first tracking sub-module, configured to perform face tracking, according to the to-be-tracked face identifiers stored by the storage module, on the tracking frames between the current detection frame and a next detection frame, to determine whether the tracking frames include facial images corresponding to the to-be-tracked face identifiers; and
the apparatus further comprises an update module, configured to update the next detection frame as the current detection frame and return to the first detection sub-module.
14. The apparatus according to claim 12, characterized in that:
the detection module comprises a second detection sub-module, configured to perform face detection on all detection frames in the target video;
the second acquisition module comprises a second acquisition sub-module, configured to acquire detection data of the facial images included in all the detection frames, the detection data comprising all face identifiers corresponding to all the detection frames; and
accordingly, the tracking module comprises a second tracking sub-module, configured to perform face tracking on the tracking frames according to all the face identifiers, to determine whether the tracking frames include facial images corresponding to all the face identifiers.
15. The apparatus according to claim 11, characterized in that the apparatus further comprises:
a training module, configured to train the deep convolutional neural network used by the recognition module with training samples of a predetermined number of facial images, to obtain the pre-acquired recognition model.
16. The apparatus according to claim 15, characterized in that the training module comprises:
a first normalization sub-module, configured to normalize the training samples to obtain sample data of a standard size;
a calculation sub-module, configured to compute a ZCA matrix and a mean matrix from the standard-size sample data;
a preprocessing sub-module, configured to preprocess the training samples based on the ZCA matrix and the mean matrix to obtain preprocessed input data, the preprocessing comprising ZCA whitening; and
a training sub-module, configured to input the input data into the deep convolutional neural network for training, to obtain the trained, pre-acquired recognition model.
17. The apparatus according to claim 11, characterized in that the recognition module comprises:
a second normalization sub-module, configured to preprocess the to-be-identified frame images to obtain normalized to-be-identified facial images;
a feature extraction sub-module, configured to input the normalized to-be-identified facial images obtained by the second normalization sub-module into the pre-acquired recognition model for feature extraction, to obtain a high-dimensional feature vector of the to-be-identified face corresponding to each to-be-identified frame image;
a dimension reduction sub-module, configured to perform dimension reduction on the high-dimensional feature vector of the to-be-identified face by using a prestored linear discriminant analysis (LDA) projection matrix of reference faces, to obtain a reduced-dimension feature vector of the to-be-identified face;
a measurement sub-module, configured to perform cosine distance measurement on the reduced-dimension feature vector of the to-be-identified face;
a comparison sub-module, configured to compare the measurement result obtained by the measurement sub-module with a predetermined threshold; and
a recognition sub-module, configured to identify, when the measurement result is greater than the predetermined threshold, that the to-be-identified frame image matches a reference face feature prestored in a face database, and to identify, when the measurement result is less than or equal to the predetermined threshold, that the to-be-identified frame image does not match the reference face features prestored in the face database.
18. The apparatus according to claim 17, characterized in that face feature data of the reference faces are prestored in the face database, the face feature data comprising the LDA projection matrix of the reference faces; and the apparatus further comprises:
a normalization module, configured to preprocess reference facial images of the reference faces to obtain normalized reference facial images;
a feature extraction module, configured to input the normalized reference facial images into the pre-acquired recognition model for feature extraction, to obtain a high-dimensional feature vector corresponding to each reference facial image;
a dimension reduction module, configured to perform LDA training on the high-dimensional feature vectors of the reference facial images to obtain reduced-dimension feature vectors of the reference facial images; and
a generation module, configured to generate the LDA projection matrix of the reference faces according to the reduced-dimension feature vectors of the reference facial images.
19. The apparatus according to claim 11, wherein the detection module comprises a third detection sub-module;
the third detection sub-module is configured to perform face detection on the detection frames by using an AdaBoost iterative algorithm.
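Claim 19 names the AdaBoost iterative algorithm, the boosting scheme behind Viola-Jones style face detectors. The sketch below shows only the boosting iteration itself, using decision stumps over generic feature vectors; a real face detector would instead boost Haar-feature classifiers over sliding image windows, and all names here are illustrative:

```python
import numpy as np

def train_adaboost_stumps(X, y, n_rounds=10):
    """Minimal AdaBoost with threshold stumps.

    X: (n_samples, n_features) array; y: labels in {-1, +1}.
    Returns a list of (feature_index, threshold, polarity, alpha).
    """
    n, d = X.shape
    w = np.full(n, 1.0 / n)                 # sample weights
    ensemble = []
    for _ in range(n_rounds):
        best = None
        for j in range(d):                  # exhaustive stump search
            for thr in np.unique(X[:, j]):
                for pol in (1, -1):
                    pred = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol, pred)
        err, j, thr, pol, pred = best
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # weak-learner weight
        w *= np.exp(-alpha * y * pred)          # up-weight mistakes
        w /= w.sum()
        ensemble.append((j, thr, pol, alpha))
    return ensemble

def adaboost_predict(ensemble, X):
    X = np.asarray(X, dtype=float)
    score = np.zeros(len(X))
    for j, thr, pol, alpha in ensemble:
        score += alpha * np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
    return np.where(score >= 0, 1, -1)
```

The iterative re-weighting is the part the claim refers to: each round a weak classifier is fit to the current weights, and misclassified samples are up-weighted for the next round.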
20. The apparatus according to claim 12, wherein the predetermined interval is a preset equal interval or a preset unequal interval.
21. A video processing apparatus, comprising:
a processor; and
a memory configured to store instructions executable by the processor;
wherein the processor is configured to:
obtain a to-be-processed target video, the target video comprising detection frames and tracking frames;
perform face detection on the detection frames to obtain detection data of face images contained in the detection frames, the detection data comprising face identifiers that distinguish different face images;
perform face tracking on the tracking frames according to the face identifiers, to determine whether the tracking frames contain face images corresponding to the face identifiers;
extract, from the detection frames and the tracking frames, the frames containing the face identifiers, to obtain to-be-identified frame images;
perform face recognition on the to-be-identified frame images based on a pre-acquired recognition model, to obtain a face recognition result of the face image in each frame; and
screen all face recognition results of the to-be-identified frame images to obtain a final recognition result of the face images appearing in the target video.
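The final step of claim 21 screens all per-frame recognition results into one result per face. The patent does not fix a specific screening rule; a plausible sketch is a majority vote with a minimum-support cutoff, where `screen_recognition_results`, `min_votes`, and the data layout are assumptions for illustration:

```python
from collections import Counter

def screen_recognition_results(per_frame_results, min_votes=3):
    """Reduce per-frame recognition results to a final result per face.

    per_frame_results: {face_id: [name recognized in each frame, ...]}
    Returns {face_id: final_name}, keeping only identities whose most
    frequent answer appears in at least `min_votes` frames; faces with
    too few consistent votes are dropped as unreliable.
    """
    final = {}
    for face_id, names in per_frame_results.items():
        name, count = Counter(names).most_common(1)[0]
        if count >= min_votes:
            final[face_id] = name
    return final
```

Voting across frames is what lets per-frame errors (blur, occlusion, pose) be outvoted by the frames where the face was recognized correctly, which is the stated purpose of the screening step.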
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510719389.XA CN105354543A (en) | 2015-10-29 | 2015-10-29 | Video processing method and apparatus |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105354543A true CN105354543A (en) | 2016-02-24 |
Family
ID=55330511
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510719389.XA Pending CN105354543A (en) | 2015-10-29 | 2015-10-29 | Video processing method and apparatus |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105354543A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101833569A (en) * | 2010-04-08 | 2010-09-15 | 中国科学院自动化研究所 | Method for automatically identifying film human face image |
CN102867173A (en) * | 2012-08-28 | 2013-01-09 | 华南理工大学 | Human face recognition method and system thereof |
CN103514442A (en) * | 2013-09-26 | 2014-01-15 | 华南理工大学 | Video sequence face identification method based on AAM model |
CN104866810A (en) * | 2015-04-10 | 2015-08-26 | 北京工业大学 | Face recognition method of deep convolutional neural network |
- 2015-10-29: CN CN201510719389.XA patent/CN105354543A/en, active, Pending
Non-Patent Citations (1)
Title |
---|
Wu Danyang: "Research on Real-time Multi-face Detection, Recognition and Tracking Methods Based on Video", China Master's Theses Full-text Database, Information Science and Technology Series * |
Cited By (62)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106022220A (en) * | 2016-05-09 | 2016-10-12 | 西安北升信息科技有限公司 | Method for performing multi-face tracking on participating athletes in sports video |
CN106022220B (en) * | 2016-05-09 | 2020-02-28 | 北京河马能量体育科技有限公司 | Method for tracking multiple faces of participating athletes in sports video |
CN107092883A (en) * | 2017-04-20 | 2017-08-25 | 上海极链网络科技有限公司 | Object identification method for tracing |
CN107451601A (en) * | 2017-07-04 | 2017-12-08 | 昆明理工大学 | Moving Workpieces recognition methods based on the full convolutional network of space-time context |
CN107424273A (en) * | 2017-07-28 | 2017-12-01 | 杭州宇泛智能科技有限公司 | A kind of management method of unmanned supermarket |
CN107392182A (en) * | 2017-08-17 | 2017-11-24 | 宁波甬慧智能科技有限公司 | A kind of face collection and recognition method and device based on deep learning |
CN108229297A (en) * | 2017-09-30 | 2018-06-29 | 深圳市商汤科技有限公司 | Face identification method and device, electronic equipment, computer storage media |
CN108229297B (en) * | 2017-09-30 | 2020-06-05 | 深圳市商汤科技有限公司 | Face recognition method and device, electronic equipment and computer storage medium |
WO2019071664A1 (en) * | 2017-10-09 | 2019-04-18 | 平安科技(深圳)有限公司 | Human face recognition method and apparatus combined with depth information, and storage medium |
CN107895378A (en) * | 2017-10-12 | 2018-04-10 | 西安天和防务技术股份有限公司 | Object detection method and device, storage medium, electronic equipment |
CN107798308A (en) * | 2017-11-09 | 2018-03-13 | 石数字技术成都有限公司 | A kind of face identification method based on short-sighted frequency coaching method |
CN107798308B (en) * | 2017-11-09 | 2020-09-22 | 一石数字技术成都有限公司 | Face recognition method based on short video training method |
US11457273B2 (en) | 2018-01-04 | 2022-09-27 | Samsung Electronics Co., Ltd. | Video playback device and control method thereof |
CN111567056A (en) * | 2018-01-04 | 2020-08-21 | 三星电子株式会社 | Video playing device and control method thereof |
US11831948B2 (en) | 2018-01-04 | 2023-11-28 | Samsung Electronics Co., Ltd. | Video playback device and control method thereof |
CN111567056B (en) * | 2018-01-04 | 2022-10-14 | 三星电子株式会社 | Video playing device and control method thereof |
CN108875531A (en) * | 2018-01-18 | 2018-11-23 | 北京迈格威科技有限公司 | Method for detecting human face, device, system and computer storage medium |
CN108875531B (en) * | 2018-01-18 | 2022-04-26 | 北京迈格威科技有限公司 | Face detection method, device and system and computer storage medium |
WO2019157977A1 (en) * | 2018-02-13 | 2019-08-22 | 腾讯科技(深圳)有限公司 | Method for labeling performance segment, video playing method and device, and terminal |
CN108337532A (en) * | 2018-02-13 | 2018-07-27 | 腾讯科技(深圳)有限公司 | Perform mask method, video broadcasting method, the apparatus and system of segment |
US11625920B2 (en) | 2018-02-13 | 2023-04-11 | Tencent Technology (Shenzhen) Company Ltd | Method for labeling performance segment, video playing method, apparatus and system |
CN108446385A (en) * | 2018-03-21 | 2018-08-24 | 百度在线网络技术(北京)有限公司 | Method and apparatus for generating information |
WO2019237657A1 (en) * | 2018-06-15 | 2019-12-19 | 北京字节跳动网络技术有限公司 | Method and device for generating model |
CN108898125A (en) * | 2018-07-10 | 2018-11-27 | 深圳市巨龙创视科技有限公司 | One kind being based on embedded human face identification and management system |
CN109034040A (en) * | 2018-07-19 | 2018-12-18 | 北京影谱科技股份有限公司 | A kind of character recognition method based on cast, device, equipment and medium |
CN108882033B (en) * | 2018-07-19 | 2021-12-14 | 上海影谱科技有限公司 | Character recognition method, device, equipment and medium based on video voice |
CN108882033A (en) * | 2018-07-19 | 2018-11-23 | 北京影谱科技股份有限公司 | A kind of character recognition method based on video speech, device, equipment and medium |
CN109034040B (en) * | 2018-07-19 | 2021-11-23 | 北京影谱科技股份有限公司 | Character recognition method, device, equipment and medium based on cast |
CN109034100A (en) * | 2018-08-13 | 2018-12-18 | 成都盯盯科技有限公司 | Face pattern detection method, device, equipment and storage medium |
CN110197107B (en) * | 2018-08-17 | 2024-05-28 | 平安科技(深圳)有限公司 | Micro-expression recognition method, micro-expression recognition device, computer equipment and storage medium |
CN110197107A (en) * | 2018-08-17 | 2019-09-03 | 平安科技(深圳)有限公司 | Micro- expression recognition method, device, computer equipment and storage medium |
CN109271888A (en) * | 2018-08-29 | 2019-01-25 | 汉王科技股份有限公司 | Personal identification method, device, electronic equipment based on gait |
WO2020047854A1 (en) * | 2018-09-07 | 2020-03-12 | Intel Corporation | Detecting objects in video frames using similarity detectors |
US11948340B2 (en) | 2018-09-07 | 2024-04-02 | Intel Corporation | Detecting objects in video frames using similarity detectors |
CN109299690A (en) * | 2018-09-21 | 2019-02-01 | 浙江中正智能科技有限公司 | A method of video real-time face accuracy of identification can be improved |
CN111126113A (en) * | 2018-11-01 | 2020-05-08 | 普天信息技术有限公司 | Method and device for processing face image |
CN111126113B (en) * | 2018-11-01 | 2023-10-10 | 普天信息技术有限公司 | Face image processing method and device |
CN110163066A (en) * | 2018-12-07 | 2019-08-23 | 腾讯科技(深圳)有限公司 | Multi-medium data recommended method, device and storage medium |
CN109815805A (en) * | 2018-12-18 | 2019-05-28 | 深圳壹账通智能科技有限公司 | Automatic identification drowned method, apparatus, storage medium and electronic equipment |
CN110503059A (en) * | 2019-08-27 | 2019-11-26 | 国网电子商务有限公司 | A kind of face identification method and system |
CN112767436A (en) * | 2019-10-21 | 2021-05-07 | 深圳云天励飞技术有限公司 | Face detection tracking method and device |
CN111209812B (en) * | 2019-12-27 | 2023-09-12 | 深圳市优必选科技股份有限公司 | Target face picture extraction method and device and terminal equipment |
CN111209812A (en) * | 2019-12-27 | 2020-05-29 | 深圳市优必选科技股份有限公司 | Target face picture extraction method and device and terminal equipment |
CN111209845A (en) * | 2020-01-03 | 2020-05-29 | 平安科技(深圳)有限公司 | Face recognition method and device, computer equipment and storage medium |
WO2021135064A1 (en) * | 2020-01-03 | 2021-07-08 | 平安科技(深圳)有限公司 | Facial recognition method and apparatus, and computer device and storage medium |
CN111260697A (en) * | 2020-01-19 | 2020-06-09 | 上海云从汇临人工智能科技有限公司 | Target object identification method, system, device and medium |
CN111259863A (en) * | 2020-03-03 | 2020-06-09 | 森兰信息科技(上海)有限公司 | Method for detecting/displaying playing hand type, medium, piano, terminal and server |
CN111401315B (en) * | 2020-04-10 | 2023-08-22 | 浙江大华技术股份有限公司 | Face recognition method based on video, recognition device and storage device |
CN111401315A (en) * | 2020-04-10 | 2020-07-10 | 浙江大华技术股份有限公司 | Face recognition method, recognition device and storage device based on video |
CN111914668A (en) * | 2020-07-08 | 2020-11-10 | 浙江大华技术股份有限公司 | Pedestrian re-identification method, device and system based on image enhancement technology |
CN111932442A (en) * | 2020-07-15 | 2020-11-13 | 厦门真景科技有限公司 | Video beautifying method, device and equipment based on face recognition technology and computer readable storage medium |
CN112232186A (en) * | 2020-10-14 | 2021-01-15 | 盈合(深圳)机器人与自动化科技有限公司 | Epidemic prevention monitoring method and system |
CN112232186B (en) * | 2020-10-14 | 2024-02-27 | 盈合(深圳)机器人与自动化科技有限公司 | Epidemic prevention monitoring method and system |
CN112329602A (en) * | 2020-11-02 | 2021-02-05 | 平安科技(深圳)有限公司 | Method and device for acquiring face annotation image, electronic equipment and storage medium |
CN112329602B (en) * | 2020-11-02 | 2024-06-18 | 平安科技(深圳)有限公司 | Method and device for acquiring face annotation image, electronic equipment and storage medium |
CN112381016A (en) * | 2020-11-19 | 2021-02-19 | 山东海博科技信息系统股份有限公司 | Vehicle-mounted face recognition algorithm optimization method and system |
CN112507824A (en) * | 2020-11-27 | 2021-03-16 | 长威信息科技发展股份有限公司 | Method and system for identifying video image features |
CN112598707A (en) * | 2020-12-23 | 2021-04-02 | 南京稻子菱机电设备有限公司 | Real-time video stream object detection and tracking method |
CN112991393A (en) * | 2021-04-15 | 2021-06-18 | 北京澎思科技有限公司 | Target detection and tracking method and device, electronic equipment and storage medium |
CN113642450A (en) * | 2021-08-09 | 2021-11-12 | 深圳市英威诺科技有限公司 | Video face recognition method, system and storage medium |
CN113674318A (en) * | 2021-08-16 | 2021-11-19 | 支付宝(杭州)信息技术有限公司 | Target tracking method, device and equipment |
CN113674318B (en) * | 2021-08-16 | 2024-07-23 | 支付宝(杭州)信息技术有限公司 | Target tracking method, device and equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105354543A (en) | Video processing method and apparatus | |
CN105631408B (en) | Face photo album processing method and device based on video | |
CN105426857B (en) | Human face recognition model training method and device | |
US10275672B2 (en) | Method and apparatus for authenticating liveness face, and computer program product thereof | |
CN106204435A (en) | Image processing method and device | |
CN109815844A (en) | Object detection method and device, electronic equipment and storage medium | |
TW201911130A (en) | Method and device for remake image recognition | |
CN107798327A (en) | Character identifying method and device | |
CN106548468B (en) | The method of discrimination and device of image definition | |
CN107133576A (en) | Age of user recognition methods and device | |
CN105335713A (en) | Fingerprint identification method and device | |
CN105335754A (en) | Character recognition method and device | |
CN105654033A (en) | Face image verification method and device | |
CN105631403A (en) | Method and device for human face recognition | |
CN105678242B (en) | Focusing method and device under hand-held certificate mode | |
CN106331504A (en) | Shooting method and device | |
CN106682736A (en) | Image identification method and apparatus | |
CN105354560A (en) | Fingerprint identification method and device | |
CN106295515A (en) | Determine the method and device of human face region in image | |
CN106228556A (en) | Image quality analysis method and device | |
CN109934275A (en) | Image processing method and device, electronic equipment and storage medium | |
CN107766820A (en) | Image classification method and device | |
CN105631406A (en) | Method and device for recognizing and processing image | |
CN110532956A (en) | Image processing method and device, electronic equipment and storage medium | |
CN107463903A (en) | Face key independent positioning method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | | Application publication date: 20160224 |