CN106845357B - Video face detection and recognition method based on a multichannel network - Google Patents

Video face detection and recognition method based on a multichannel network

Info

Publication number
CN106845357B
CN106845357B (application CN201611214990.4A)
Authority
CN
China
Prior art keywords
face
image
video
posture
feature
Prior art date
Legal status
Active
Application number
CN201611214990.4A
Other languages
Chinese (zh)
Other versions
CN106845357A (en)
Inventor
钱小鸿
车志聪
吴越
陈涛
李建元
Current Assignee
Yinjiang Technology Co.,Ltd.
Original Assignee
Enjoyor Co Ltd
Priority date
Filing date
Publication date
Application filed by Enjoyor Co Ltd filed Critical Enjoyor Co Ltd
Priority to CN201611214990.4A priority Critical patent/CN106845357B/en
Publication of CN106845357A publication Critical patent/CN106845357A/en
Application granted granted Critical
Publication of CN106845357B publication Critical patent/CN106845357B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G - PHYSICS
      • G06 - COMPUTING; CALCULATING OR COUNTING
        • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
          • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
            • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
              • G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
                • G06V 40/161 - Detection; Localisation; Normalisation
                  • G06V 40/165 - Detection; Localisation; Normalisation using facial parts and geometric relationships
                • G06V 40/168 - Feature extraction; Face representation
        • G06F - ELECTRIC DIGITAL DATA PROCESSING
          • G06F 18/00 - Pattern recognition
            • G06F 18/20 - Analysing
              • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
                • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
              • G06F 18/22 - Matching criteria, e.g. proximity measures
              • G06F 18/25 - Fusion techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

A video face detection and recognition method based on a multichannel network, comprising the following steps. S1: video preprocessing, attaching time information to each frame image. S2: target face detection and pose-coefficient calculation. S3: face pose correction, adjusting the pose of the m faces obtained in S2. S4: face feature extraction based on a deep neural network. S5: face feature comparison: for an input face, after obtaining its feature vector via step S4, the cosine distance is used to measure how well the input face feature vector matches the vectors in the feature database; classes whose cosine distance to the face to be identified exceeds a set threshold are added to the candidate classes, and if the cosine distances between the feature of the face to be identified and the central features of all classes are all below the threshold, the person is deemed not stored in the database and recognition ends. The present invention provides a video face detection and recognition method based on a multichannel network with higher accuracy.

Description

Video face detection and recognition method based on a multichannel network
Technical field
The present invention relates to the field of video face detection and recognition, and more particularly to a video face recognition method based on deep learning.
Background art
Video surveillance is an important component of security systems. With the development of video sensor technology and the associated supporting technologies, surveillance has evolved from the earliest analog systems, through hybrid digital-analog systems, to today's IP-based systems, and the range of application of video surveillance keeps growing; public security agencies in particular deploy video surveillance at large scale in fields such as security administration and suspect tracking.
Fast-developing video surveillance systems produce massive volumes of monitoring video. In security administration and suspect tracking, a central task over this data is to find, from one or several pictures of a person, the video files in which that person appears and the corresponding frame numbers. Traditionally these video data are searched and sorted manually, which is inefficient and error-prone, requires repeated cross-checking, and takes a long time; as video surveillance keeps developing and monitoring video keeps accumulating, the traditional manual search method becomes less and less applicable.
Automatically processing and recognizing the collected video information with video image processing and pattern recognition techniques has already reached very mature application in the field of traffic management. However, existing recognition methods based on image processing achieve high accuracy only under ideal conditions; when lighting varies in complex ways, image quality is low, or the pose of the target changes, the miss rate and false detection rate rise rapidly, so automatic recognition and analysis of faces in video has not yet reached the level of practical application.
With the development of big data and deep learning, deep learning has been applied to face detection systems with good results. Several patents already address deep-learning-based video face recognition. For example, patent CN201511033733.6 treats a video as a set of pictures and recognizes the faces on several pictures of better quality; this method ignores the connection between different frames and loses a large amount of information, so that recognition accuracy degrades sharply when video quality deteriorates.
Patent CN201510471210.3 proposes a real-time face recognition method based on a deep neural network. It extracts face feature vectors with a deep neural network, first computes the Hamming distance between feature vectors, selects the faces below a threshold as the objects of a second-stage comparison, then computes the Euclidean distances of the candidates' feature vectors and uses them to judge which face in the face database the face to be identified belongs to, thereby improving efficiency when the face database is large.
In summary, existing methods mainly extract the face information of selected frames of the video and use deep learning to train, detect and recognize on them; the spatio-temporal connection between the frames of the video has not yet been considered, which results in low accuracy.
Summary of the invention
To overcome the low accuracy of existing video face detection and recognition methods, the present invention provides a video face detection and recognition method based on a multichannel network with higher accuracy.
The technical solution adopted by the present invention to solve this technical problem is as follows:
A video face detection and recognition method based on a multichannel network, comprising the following steps:
S1: Video preprocessing
Receive the video data collected by the monitoring device, decompose it into individual images, and attach time information to each frame image;
S2: Target face detection and pose-coefficient calculation, as follows:
Extract the face locations and the corresponding facial-feature positions in the video images, compute the distances between the facial features of each face in the video image and those of the standard-pose face, compute a pose coefficient, and group pose-similar images: faces in adjacent frames whose locations are close and whose pose coefficients differ least are regarded as faces of the same face family. Define a threshold φ and, for each face family, choose m faces with p < φ. If only m_(p<φ) (< m) face images in the family satisfy p < φ, duplicate the face image with the smallest pose coefficient p in the family m − m_(p<φ) times, so that together with the other images there are m images, which are input to S3;
S3: Face pose correction: adjust the pose of the m faces obtained in S2;
S4: Face feature extraction based on a deep neural network, as follows:
S4.1 Training the face feature extraction network
Before extracting face features from video images, a feature model is trained in advance on a face database: obtain the images of each person in the face database under different angles and different illuminations, randomly select m of them, apply pose correction to the m images, and combine them into a w' × h' × 3m face image, where w' is the width of the training picture, h' the height of the training picture, and 3m is the three RGB channels multiplied by the number of images m; perform this operation for every person in the face database, attach labels, and input the results into the neural network for training;
S4.2 Video face feature extraction
From S3, m corrected w × h × 3 face color images have been obtained, each with 3 channels; fuse the different images as different channels, i.e. into a single w × h × 3m face image with 3 × m channels;
Input this w × h × 3m face image into the feature extraction network trained in S4.1 to finally obtain the feature vector representing the face;
S5 Face feature comparison
For an input face, after obtaining its feature vector via step S4, the cosine distance is used to measure how well the input face feature vector matches the vectors in the feature database, computed as follows:
S5.1 Preliminary screening
Compute the cosine distance between the feature vector a of the face to be identified and the central feature b of each class, as shown in formula (10), where ‖·‖ denotes the two-norm, i.e. the length, of a vector and cos θ is the cosine distance between a and b:
cos θ = (a · b) / (‖a‖ ‖b‖) (10)
Classes whose cosine distance to the face to be identified exceeds a set threshold are added to the candidate classes; if the cosine distances between the feature of the face to be identified and the central features of all classes are all below the threshold, the person is deemed not stored in the database and recognition ends.
Further, step S5 also comprises the following step:
S5.2 Fine screening
For every face in each candidate class, compute the cosine distance between its feature vector and the feature vector of the face to be identified; choose the faces whose cosine distance exceeds a set threshold ρ as the recognition result, and output the video images where the recognition result occurs; if the cosine distances of all faces in all candidate classes to the face to be identified are below ρ, the person is deemed not stored in the database.
Further, in step S1, the first frame image of the received video is image 1, and, in chronological order, the t-th frame image of the video is image t, denoted I_t; the set of frame images of one video is denoted I. After the preprocessing of the video is completed, the decomposed images are passed in chronological order to the face target detection module.
Further, in step S2, target face detection and pose-coefficient calculation proceed as follows:
S2.1 Extract the face locations and corresponding facial-feature positions in the video image
For each frame image I_t, use Haar-like features to find the faces present in the frame and the coordinates of the corresponding facial features, denoted F_1(x_1, y_1), F_2(x_2, y_2), F_3(x_3, y_3), F_4(x_4, y_4), F_5(x_5, y_5);
S2.2 Compute the distances between the facial features of the face in the video image and those of the standard-pose face
Let the coordinates of the facial features in the standard pose image I' be F'_1(x'_1, y'_1), F'_2(x'_2, y'_2), F'_3(x'_3, y'_3), F'_4(x'_4, y'_4), F'_5(x'_5, y'_5); compute with formulas (1) and (2) the pairwise distances between the facial features in the video image I_t and in the standard pose image I':
d_ij = √((x_i − x_j)² + (y_i − y_j)²) (1)
d'_ij = √((x'_i − x'_j)² + (y'_i − y'_j)²) (2)
where (x_i, y_i), (x_j, y_j) are the coordinates of different facial features of the face under consideration, (x'_i, y'_i), (x'_j, y'_j) the coordinates of different facial features in the standard pose image, d_ij the pairwise distances between the facial features of the face to be identified, and d'_ij the pairwise distances between the facial features in the standard pose image;
S2.3 Compute the pose coefficient and group pose-similar images
Define the pose coefficient p of a face and compute it with formula (3):
where λ is a scaling factor that avoids the error caused by a scale mismatch between the face image to be identified and the standard pose image; the value of λ can be computed by formula (4), i.e. λ takes the value that minimizes the pose coefficient;
In step S3, pose adjustment proceeds as follows:
S3.1 Compute the face rotation vector
From the known standard face model and the coordinates of the five feature points in the video, use the POSIT algorithm to obtain the pose information of the face in the image, i.e. the rotation vector R of the face;
S3.2 Compute the mapping between the corrected image and the original image
From the rotation vector of the face, obtain the mapping between each pixel of the corrected face image and a pixel of the original face image: in the corrected image, take the central axis of the face as the y-axis and the line perpendicular to it as the x-axis to construct a coordinate system, and let (x, y) = f(x', y') be the mapping between a point (x, y) on the corrected image and a point (x', y') on the original image, as follows:
S3.3 Pose correction
Let rgb'(x, y) be the RGB value at (x, y) on the corrected image, and rgb(x, y) the RGB value at (x, y) in the original face image; the RGB value at a point (x, y) of the corrected face image is then obtained with formula (7), where
G is a Gaussian probability matrix: in actual operation, because the real 3D model of a given face differs somewhat from the standard 3D model, the mapping between a point on the corrected image and the corresponding point on the original image carries some error; the RGB value of a point on the corrected image is therefore obtained jointly from the RGB values of the 9 points near the corresponding position on the original image, i.e. the expected RGB value at the point is computed through the Gaussian probability matrix G and taken as the point's RGB value; k in formula (7) is a preset ratio;
After pose correction has been applied to every face in the same face family, we obtain m face images of size w × h × 3, i.e. color images of w × h pixels with the three RGB channels.
In step S4.1, the neural network is trained with gradient descent: after each input batch of pictures the loss is computed and the weights of the neural network are updated. The 512-dimensional vector output by fully connected layer 3 of the network indicates the probability of which person the input face is; softmax regression is applied to it to obtain the corresponding loss function, shown in formula (9), where k is the class to which the input picture belongs and z_k is the k-th value of the 512-dimensional output of fully connected layer 3:
Loss = Σ −log f(z_k) (9)
where f is the softmax function, f(z_k) = e^(z_k) / Σ_j e^(z_j). After the loss function has been computed, forward computation and backward gradient computation yield the update value of every layer of the neural network, and the weights of each layer are updated;
Cluster analysis is carried out in advance on the face feature set in the database to establish a spatial index, with the following steps:
S4.1.1 Apply a clustering algorithm to the features in the face feature library, grouping the face features into several classes;
S4.1.2 For each class, compute the mean of the feature vectors of all faces in the class and record it as the class's central feature.
By exploiting the correlation between the frames of the surveillance video in combination with deep learning, the present invention can find, among massive video files, the video files in which a target face appears and the frame numbers where it occurs. Compared with other methods, the present invention gathers the image information of the same person in different frames and considers it jointly, making maximal use of the information in the video data and thereby improving recognition accuracy.
The beneficial effects of the present invention are mainly the following:
1. When previous methods detect faces in video, the images of different frames are considered in isolation, and the information in the video is not used effectively. This method treats different images as different channels, fusing the images of the same face in different frames into one multichannel image, and implicitly extracts the joint features of these images with a multilayer convolutional network. On the one hand, using the information in the video improves accuracy as much as possible; on the other hand, inputting several images into the neural network at the same time avoids the time wasted by repeated inputs;
2. Since several face images must be fused into one image, the features at the same positions in the different face images must stay roughly consistent, i.e. across images the eyes, nose and so on should lie at roughly the same positions; otherwise the feature extraction network is hard to bring to convergence. Pose correction must therefore be applied to the face images before feature extraction. In the face pose correction of this method, the symmetry of the face about its central axis is exploited: after correction, the RGB value of a point on the face is synthesized from the RGB value of the corresponding position on the original image and the RGB value of the point mirrored across the central axis. This largely avoids the information loss of common face pose correction methods and improves the final recognition accuracy;
3. The selection of particular frames of a video has to take frame quality into account, and the face pose in a frame image is a principal factor determining frame quality, but in practice estimating the pose of every face in every frame costs too much. This method therefore defines a pose coefficient with which to judge the pose of a face: it reflects the face pose well while its computation is cheap, so the frames of better quality within a face family can be found quickly and accurately.
Detailed description of the invention
Fig. 1 is the overall flowchart of the invention.
Fig. 2 is the deep learning network structure used in the embodiment of the present invention.
Specific embodiment
The invention will be further described below in conjunction with the accompanying drawings.
Referring to Fig. 1 and Fig. 2, a video face detection and recognition method based on a multichannel network comprises the following steps:
S1: Video preprocessing
Receive the video data collected by the monitoring device, decompose it into individual images, and attach time information to each frame image; specifically, the first frame image of the received video is image 1, and, in chronological order, the t-th frame image of the video is image t. In the following, I_t denotes the t-th frame image and I denotes the set of frame images of one video. After the preprocessing of the video is completed, the decomposed images are passed in chronological order to the face target detection module.
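As an illustration of step S1, a minimal sketch follows, assuming OpenCV for decoding; the patent names no library, and the function and field names here are ours:

```python
import cv2

def decompose_video(path):
    """Split a surveillance video into frames, attaching a frame index and a
    timestamp to each image (step S1). Frames are numbered from 1, as in the
    text: the first frame is image 1."""
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0  # fall back to 25 fps if unknown
    frames, t = [], 1
    while True:
        ok, image = cap.read()
        if not ok:
            break
        frames.append({"index": t, "time_sec": (t - 1) / fps, "image": image})
        t += 1
    cap.release()
    return frames  # passed on, in chronological order, to face detection
```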
S2: Target face detection and pose-coefficient calculation
S2.1 Extract the face locations and corresponding facial-feature positions in the video image
For each frame image I_t, use Haar-like features to find the faces present in the frame and the coordinates of the corresponding facial features (the two eyes, the nose, and the two sides of the mouth), denoted F_1(x_1, y_1), F_2(x_2, y_2), F_3(x_3, y_3), F_4(x_4, y_4), F_5(x_5, y_5).
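A minimal sketch of the detection part of S2.1 follows, assuming OpenCV's stock Haar cascade; the patent only states that Haar-like features are used, and locating the five landmarks F_1..F_5 is left as a placeholder:

```python
import cv2

# Stock frontal-face Haar cascade shipped with OpenCV (an assumption; the
# patent only says Haar-like features are used).
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(frame):
    """Return face bounding boxes (x, y, w, h) in one frame image I_t.
    Locating the five landmarks F_1..F_5 inside each box would follow here,
    e.g. with a separate landmark model (not specified by the patent)."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
```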
S2.2 Compute the distances between the facial features of the face in the video image and those of the standard-pose face
Let the coordinates of the facial features (the two eyes, the nose, and the two sides of the mouth) in the standard pose image I' be F'_1(x'_1, y'_1), F'_2(x'_2, y'_2), F'_3(x'_3, y'_3), F'_4(x'_4, y'_4), F'_5(x'_5, y'_5); compute with formulas (1) and (2) the pairwise distances between the facial features in the video image I_t and in the standard pose image I':
d_ij = √((x_i − x_j)² + (y_i − y_j)²) (1)
d'_ij = √((x'_i − x'_j)² + (y'_i − y'_j)²) (2)
where (x_i, y_i), (x_j, y_j) are the coordinates of different facial features of the face under consideration, (x'_i, y'_i), (x'_j, y'_j) the coordinates of different facial features in the standard pose image, d_ij the pairwise distances between the facial features of the face to be identified, and d'_ij the pairwise distances between the facial features in the standard pose image.
S2.3 Compute the pose coefficient and group pose-similar images.
Define the pose coefficient p of a face and compute it with formula (3). It measures the difference between the pose of the face to be identified and the standard pose: the smaller p is, the closer the pose of the face to be identified is to the standard, i.e. the higher the accuracy achievable with this image in face recognition; the larger p is, the further the pose is from the standard pose, i.e. the worse the image quality and the lower the accuracy achievable with this image in face recognition.
Here λ is a scaling factor that avoids the error caused by a scale mismatch between the face image to be identified and the standard pose image; the value of λ can be computed by formula (4), i.e. λ takes the value that minimizes the pose coefficient.
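Formulas (3) and (4) are not reproduced above. A plausible reconstruction consistent with this description, assuming a squared-difference form over the pairwise feature distances (our reading, not an expression confirmed by the source), is:

```latex
% Assumed forms of formulas (3) and (4); the originals may differ in detail.
p = \sum_{i<j} \bigl(d_{ij} - \lambda\, d'_{ij}\bigr)^2 \qquad (3)
\lambda = \arg\min_{\lambda} \sum_{i<j} \bigl(d_{ij} - \lambda\, d'_{ij}\bigr)^2
        = \frac{\sum_{i<j} d_{ij}\, d'_{ij}}{\sum_{i<j} \bigl(d'_{ij}\bigr)^2} \qquad (4)
```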
Compute the pose coefficients of all detected face images; faces in adjacent frames whose locations are close and whose pose coefficients differ least are regarded as faces of the same face family. Define a threshold φ and, for each face family, choose m faces with p < φ. If only m_(p<φ) (< m) face images in the family satisfy p < φ, duplicate the face image with the smallest pose coefficient p in the family m − m_(p<φ) times, so that together with the other images there are m images, which are input to S3.
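A sketch of the pose-coefficient computation and of the frame selection within one face family follows, built on the assumed formulas above; the function names and the per-face dictionary layout are illustrative:

```python
import numpy as np

def pose_coefficient(pts, std_pts):
    """Pose coefficient p (S2.3), under the assumed squared-difference form
    of formulas (3)-(4); lambda is chosen to minimize p in closed form."""
    def pairwise(p5):
        idx = [(i, j) for i in range(5) for j in range(i + 1, 5)]
        return np.array([np.linalg.norm(p5[i] - p5[j]) for i, j in idx])
    d = pairwise(np.asarray(pts, float))
    d_std = pairwise(np.asarray(std_pts, float))
    lam = float(d @ d_std) / float(d_std @ d_std)  # formula (4), least squares
    return float(np.sum((d - lam * d_std) ** 2))   # formula (3)

def select_faces(family, m, phi):
    """Pick m faces with p < phi from one face family (each face a dict with
    a precomputed key "p"); if fewer exist, duplicate the face with the
    smallest p until there are m, as described in S2."""
    good = sorted((f for f in family if f["p"] < phi), key=lambda f: f["p"])
    best = min(family, key=lambda f: f["p"])
    return (good + [best] * (m - len(good)))[:m]
```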
S3: Face pose correction
Adjust the pose of the m faces obtained in S2, with the following specific steps:
S3.1 Compute the face rotation vector
In surveillance video, the persons that appear are generally far from the camera, i.e. the distances of the facial feature points from the camera are much larger than the distances between the points themselves. Therefore, from the known standard face model and the coordinates of the five feature points in the video, the POSIT algorithm can be used to obtain the pose information of the face in the image, i.e. the rotation vector R of the face.
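A sketch of S3.1 follows. POSIT itself ships only with legacy OpenCV, so this uses cv2.solvePnP, its modern equivalent, with an assumed pinhole camera matrix (square pixels, principal point at the image centre):

```python
import cv2
import numpy as np

def face_rotation_vector(image_pts, model_pts, w, h):
    """Estimate the face rotation vector R (S3.1) from the 5 detected
    landmarks (image_pts, Nx2) and the 5 points of the standard 3D face
    model (model_pts, Nx3). The focal length guess is an assumption."""
    f = float(w)  # rough focal length in pixels
    K = np.array([[f, 0, w / 2.0],
                  [0, f, h / 2.0],
                  [0, 0, 1.0]])
    ok, rvec, tvec = cv2.solvePnP(
        np.asarray(model_pts, np.float64),
        np.asarray(image_pts, np.float64),
        K, None, flags=cv2.SOLVEPNP_EPNP)
    return rvec if ok else None
```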
S3.2 Compute the mapping between the corrected image and the original image
From the rotation vector of the face, the mapping between each pixel of the corrected face image and a pixel of the original face image can be obtained: in the corrected image, take the central axis of the face as the y-axis and the line perpendicular to it as the x-axis to construct a coordinate system, and let (x, y) = f(x', y') be the mapping between a point (x, y) on the corrected image and a point (x', y') on the original image. Specifically:
S3.3 Pose correction
Let rgb'(x, y) be the RGB value at (x, y) on the corrected image, and rgb(x, y) the RGB value at (x, y) in the original face image; the RGB value at a point (x, y) of the corrected face image is then obtained with formula (7), where
G is a Gaussian probability matrix: in actual operation, because the real 3D model of a given face differs somewhat from the standard 3D model, the mapping between a point on the corrected image and the corresponding point on the original image carries some error; the RGB value of a point on the corrected image is therefore obtained jointly from the RGB values of the 9 points near the corresponding position on the original image, i.e. the expected RGB value at the point is computed through the Gaussian probability matrix G and taken as the point's RGB value.
Here the corrected image exploits the symmetry of the face, combining the image information of both sides of the face after correction so as to extract as much information as possible and avoid the loss of information after pose correction when the rotation angle of the face is too large. k in formula (7) is a preset ratio with a value between 0 and 0.5: the smaller k is, the more the image information of a single side is considered; the larger k is, the more the image information of both sides is considered together.
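An illustrative sketch of the S3.3 idea follows, assuming a 3 x 3 Gaussian neighbourhood for the 9 points and a k-weighted mix of the mapped point with its mirror across the face axis; this mixing form is our interpretation of formula (7), not the patent's confirmed expression, and edge handling is omitted:

```python
import cv2
import numpy as np

# Gaussian probability matrix G over a 3x3 neighbourhood (the 9 points of S3.3).
g = cv2.getGaussianKernel(3, 1.0)
G = g @ g.T  # normalized 3x3 Gaussian weights

def corrected_rgb(orig, xo, yo, xm, ym, k):
    """Assumed reading of formula (7): the corrected pixel mixes the mapped
    original point (xo, yo) and its mirror across the face axis (xm, ym),
    each taken as a G-weighted expectation over the 9 neighbouring pixels;
    k in [0, 0.5] weights the mirrored-side contribution."""
    def expect(x, y):
        patch = orig[y - 1:y + 2, x - 1:x + 2].astype(np.float64)
        return np.tensordot(G, patch, axes=([0, 1], [0, 1]))  # per-channel mean
    return (1 - k) * expect(xo, yo) + k * expect(xm, ym)
```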
After pose correction has been applied to every face in the same face family, we obtain m face images of size w × h × 3, i.e. color images of w × h pixels with the three RGB channels.
S4: Face feature extraction based on a deep neural network
S4.1 Training the face feature extraction network
Before extracting face features from video images, a feature model is trained in advance on a face database. The training method is as follows:
Obtain the images of each person in the face database under different angles and different illuminations, randomly select m of them, apply pose correction to the m images, and combine them into a w' × h' × 3m face image, where w' is the width of the training picture, h' the height of the training picture, and 3m is the three RGB channels multiplied by the number of images m. Perform this operation for every person in the face database, number the results with labels, and input them into the neural network for training.
Because different images are fused as different channels and fed to the neural network as one input, we call the network a multichannel network.
The present invention trains the neural network with a gradient descent algorithm, with the batch size set to 256: after every 256 input pictures the loss is computed and the weights of the neural network are updated.
The 512-dimensional vector output by fully connected layer 3 of the above network indicates the probability of which person the input face is; softmax regression is applied to it to obtain the corresponding loss function, shown in formula (9), where k is the class to which the input picture belongs and z_k is the k-th value of the 512-dimensional output of fully connected layer 3:
Loss = Σ −log f(z_k) (9)
where f is the softmax function, f(z_k) = e^(z_k) / Σ_j e^(z_j). After the loss function has been computed, forward computation and backward gradient computation yield the update value of every layer of the neural network, and the weights of each layer are updated, so as to reduce the loss and optimize the network.
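A minimal PyTorch sketch of the multichannel network and one training step follows. The text fixes only the 3m input channels, the 512-dimensional output of fully connected layer 3 fed to the softmax loss of formula (9), gradient descent, and the batch size of 256; every layer size below is an assumption:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiChannelFaceNet(nn.Module):
    """m pose-corrected RGB crops are stacked into a 3m-channel input; the
    512-d output of fully connected layer 3 serves both as the face feature
    and, through softmax, as the per-person class scores."""
    def __init__(self, m, side=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3 * m, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        flat = 256 * (side // 8) ** 2
        self.fc1 = nn.Linear(flat, 1024)
        self.fc2 = nn.Linear(1024, 1024)
        self.fc3 = nn.Linear(1024, 512)   # "fully connected layer 3"

    def forward(self, x):                 # x: (batch, 3m, h, w)
        h = self.conv(x).flatten(1)
        return self.fc3(F.relu(self.fc2(F.relu(self.fc1(h)))))

model = MultiChannelFaceNet(m=4)          # m = 4 fused crops, an example value
opt = torch.optim.SGD(model.parameters(), lr=0.01)
x = torch.randn(256, 12, 64, 64)          # one batch of 256 fused 3m-channel images
y = torch.randint(0, 512, (256,))         # identity label k of each image
z = model(x)                              # 512-d vectors z
loss = F.cross_entropy(z, y)              # mean of -log softmax(z)_k, formula (9)
opt.zero_grad(); loss.backward(); opt.step()
```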
In practical applications, considering that the number of faces stored in the database may be very large, cluster analysis can first be carried out in advance on the face feature set in the database to establish a spatial index, with the following specific steps:
S4.1.1 Apply a clustering algorithm such as k-means to the features in the face feature library, grouping the face features into several classes.
S4.1.2 For each class, compute the mean of the feature vectors of all faces in the class and record it as the class's central feature.
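A sketch of the spatial-index construction of S4.1.1 and S4.1.2 follows, using scikit-learn's k-means (k-means is named as an example in the text; the number of classes is an assumption):

```python
import numpy as np
from sklearn.cluster import KMeans

def build_class_centers(features, n_classes=64):
    """Cluster the (N, d) face-feature library into n_classes classes and
    record the mean feature vector of each class as its central feature."""
    km = KMeans(n_clusters=n_classes, n_init=10).fit(features)
    centers = np.array([features[km.labels_ == c].mean(axis=0)
                        for c in range(n_classes)])
    return km.labels_, centers
```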
S4.2 Video face feature extraction
From S3 we have obtained m corrected w × h × 3 face color images, each with 3 channels; fuse the different images as different channels, i.e. into a single w × h × 3m face image with 3 × m channels.
Input this w × h × 3m face image into the feature extraction network trained in S4.1 to finally obtain the feature vector representing the face.
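A sketch of the S4.2 channel fusion and feature query follows, reusing the network sketched above; the names are illustrative:

```python
import numpy as np
import torch

def fuse_and_extract(images, model):
    """Concatenate m corrected w x h x 3 crops along the channel axis into one
    w x h x 3m image, reorder to PyTorch's (1, 3m, h, w) layout, and run the
    trained network to obtain the 512-d face feature vector."""
    fused = np.concatenate(images, axis=2)                   # (h, w, 3m)
    x = torch.from_numpy(fused).permute(2, 0, 1)[None].float()
    with torch.no_grad():
        return model(x)[0]                                   # (512,) feature
```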
S5 Face feature comparison
For an input face, after obtaining its feature vector via step S4, the cosine distance is used to measure how well the input face feature vector matches the vectors in the feature database, computed as follows:
S5.1 Preliminary screening
Compute the cosine distance between the feature vector a of the face to be identified and the central feature b of each class, as shown in formula (10), where ‖·‖ denotes the two-norm, i.e. the length, of a vector and cos θ is the cosine distance between a and b:
cos θ = (a · b) / (‖a‖ ‖b‖) (10)
Classes whose cosine distance to the face to be identified exceeds a set threshold are added to the candidate classes. If the cosine distances between the feature of the face to be identified and the central features of all classes are all below the threshold, the person is deemed not stored in the database and recognition ends.
S5.2 Fine screening
For every face in each candidate class, compute the cosine distance between its feature vector and the feature vector of the face to be identified; choose the faces whose cosine distance exceeds a set threshold ρ as the recognition result, and output the video images where the recognition result occurs. If the cosine distances of all faces in all candidate classes to the face to be identified are below ρ, the person is deemed not stored in the database, and recognition ends.
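A sketch of the two-stage comparison of S5 follows, with eps standing in for the unnamed preliminary threshold of S5.1 and rho for the threshold ρ of S5.2:

```python
import numpy as np

def cos_sim(a, b):
    """Formula (10): cosine of the angle between two feature vectors."""
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def identify(query, centers, members, eps, rho):
    """Two-stage comparison (S5): keep the classes whose central feature has
    cosine similarity above eps (S5.1), then return the member faces above
    rho (S5.2); None means the person is not stored in the database."""
    candidates = [c for c, center in enumerate(centers)
                  if cos_sim(query, center) > eps]
    if not candidates:
        return None
    hits = [(c, i) for c in candidates
            for i, f in enumerate(members[c]) if cos_sim(query, f) > rho]
    return hits or None
```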
This completes the recognition of faces in the video.

Claims (6)

1. A video face detection and recognition method based on a multichannel network, characterized in that the method comprises the following steps:
S1: Video preprocessing
Receive the video data collected by the monitoring device, decompose it into individual images, and attach time information to each frame image;
S2: Target face detection and pose-coefficient calculation, as follows:
Extract the face locations and the corresponding facial-feature positions in the video images, compute the distances between the facial features of each face in the video image and those of the standard-pose face, compute a pose coefficient, and group pose-similar images: faces in adjacent frames whose locations are close and whose pose coefficients differ least are regarded as faces of the same face family; define a threshold φ and, for each face family, choose m faces with p < φ; if only m_(p<φ) (< m) face images in the family satisfy p < φ, duplicate the face image with the smallest pose coefficient p in the family m − m_(p<φ) times, so that together with the other images there are m images, which are input to S3;
S3: Face pose correction: adjust the pose of the m faces obtained in S2;
S4: Face feature extraction based on a deep neural network, as follows:
S4.1 Training the face feature extraction network
Before extracting face features from video images, a feature model is trained in advance on a face database: obtain the images of each person in the face database under different angles and different illuminations, randomly select m of them, apply pose correction to the m images, and combine them into a w' × h' × 3m face image, where w' is the width of the training picture, h' the height of the training picture, and 3m is the three RGB channels multiplied by the number of images m; perform this operation for every person in the face database, number the results with labels, and input them into the neural network for training;
S4.2 Video face feature extraction
From S3, m corrected w × h × 3 face color images have been obtained, each with 3 channels; fuse the different images as different channels, i.e. into a single w × h × 3m face image with 3 × m channels;
Input this w × h × 3m face image into the feature extraction network trained in S4.1 to finally obtain the feature vector representing the face;
S5 Face feature comparison
For an input face, after obtaining its feature vector via step S4, the cosine distance is used to measure how well the input face feature vector matches the vectors in the feature database, computed as follows:
S5.1 Preliminary screening
Compute the cosine distance between the feature vector a of the face to be identified and the central feature b of each class, as shown in formula (10), where ‖·‖ denotes the two-norm, i.e. the length, of a vector and cos θ is the cosine distance between a and b:
cos θ = (a · b) / (‖a‖ ‖b‖) (10)
Classes whose cosine distance to the face to be identified exceeds a set threshold are added to the candidate classes; if the cosine distances between the feature of the face to be identified and the central features of all classes are all below the threshold, the person is deemed not stored in the database and recognition ends.
2. The video face detection and recognition method based on a multichannel network according to claim 1, characterized in that step S5 further comprises the following step:
S5.2 Fine screening
For every face in each candidate class, compute the cosine distance between its feature vector and the feature vector of the face to be identified; choose the faces whose cosine distance exceeds a set threshold ρ as the recognition result, and output the video images where the recognition result occurs; if the cosine distances of all faces in all candidate classes to the face to be identified are below ρ, the person is deemed not stored in the database.
3. The video face detection and recognition method based on a multichannel network according to claim 1 or 2, characterized in that: in step S1, the first frame image of the received video is image 1, and, in chronological order, the t-th frame image of the video is image t, denoted I_t; the set of frame images of one video is denoted I; after the preprocessing of the video is completed, the decomposed images are passed in chronological order to the face target detection module.
4. The video face detection and recognition method based on a multichannel network according to claim 1 or 2, characterized in that: in step S2, target face detection and pose-coefficient calculation proceed as follows:
S2.1 Extract the face locations and corresponding facial-feature positions in the video image
For each frame image I_t, use Haar-like features to find the coordinates of the faces present in the frame and of the corresponding facial features, denoted F_1(x_1, y_1), F_2(x_2, y_2), F_3(x_3, y_3), F_4(x_4, y_4), F_5(x_5, y_5);
S2.2 Compute the distances between the facial features of the face in the video image and those of the standard-pose face
Let the coordinates of the facial features in the standard pose image I' be F'_1(x'_1, y'_1), F'_2(x'_2, y'_2), F'_3(x'_3, y'_3), F'_4(x'_4, y'_4), F'_5(x'_5, y'_5); compute with formulas (1) and (2) the pairwise distances between the facial features in the video image I_t and in the standard pose image I':
d_ij = √((x_i − x_j)² + (y_i − y_j)²) (1)
d'_ij = √((x'_i − x'_j)² + (y'_i − y'_j)²) (2)
where (x_i, y_i), (x_j, y_j) are the coordinates of different facial features of the face under consideration, (x'_i, y'_i), (x'_j, y'_j) the coordinates of different facial features in the standard pose image, d_ij the pairwise distances between the facial features of the face to be identified, and d'_ij the pairwise distances between the facial features in the standard pose image;
S2.3 Compute the pose coefficient and group pose-similar images
Define the pose coefficient p of a face and compute it with formula (3):
where λ is a scaling factor that avoids the error caused by a scale mismatch between the face image to be identified and the standard pose image; the value of λ can be computed by formula (4), i.e. λ takes the value that minimizes the pose coefficient;
5. The video face detection and recognition method based on a multichannel network according to claim 4, characterized in that: in step S3, pose adjustment proceeds as follows:
S3.1 Compute the face rotation vector
From the known standard face model and the coordinates of the five feature points in the video, use the POSIT algorithm to obtain the pose information of the face in the image, i.e. the rotation vector R of the face;
S3.2 Compute the mapping between the corrected image and the original image
From the rotation vector of the face, obtain the mapping between each pixel of the corrected face image and a pixel of the original face image: in the corrected image, take the central axis of the face as the y-axis and the line perpendicular to it as the x-axis to construct a coordinate system, and let (x, y) = f(x', y') be the mapping between a point (x, y) on the corrected image and a point (x', y') on the original image, as follows:
S3.3 Pose correction
Let rgb'(x, y) be the RGB value at (x, y) on the corrected image, and rgb(x, y) the RGB value at (x, y) in the original face image; the RGB value at a point (x, y) of the corrected face image is then obtained with formula (7), where
G(i, j) is a Gaussian probability matrix: in actual operation, because the real 3D model of a given face differs somewhat from the standard 3D model, the mapping between a point on the corrected image and the corresponding point on the original image carries some error; the RGB value of a point on the corrected image is therefore obtained jointly from the RGB values of the 9 points near the corresponding position on the original image, i.e. the expected RGB value at the point is computed through the Gaussian probability matrix G and taken as the point's RGB value; k in formula (7) is a preset ratio;
After pose correction has been applied to every face in the same face family, m face images of size w × h × 3 are obtained, i.e. color images of w × h pixels with the three RGB channels.
6. The video face detection and recognition method based on a multichannel network according to claim 5, characterized in that: in step S4.1, the neural network is trained with a gradient descent algorithm: after each input batch of pictures the loss is computed and the weights of the neural network are updated; the 512-dimensional vector output by fully connected layer 3 of the network indicates the probability of which person the input face is; softmax regression is applied to it to obtain the corresponding loss function, shown in formula (9), where z_k is the k-th value of the 512-dimensional output of fully connected layer 3:
Loss = Σ −log f(z_k) (9)
After the loss function has been computed, forward computation and backward gradient computation yield the update value of every layer of the neural network, and the weights of each layer are updated;
Cluster analysis is carried out in advance on the face feature set in the database to establish a spatial index, with the following operating steps:
S4.1.1 Apply a clustering algorithm to the features in the face feature library, grouping the face features into several classes;
S4.1.2 For each class, compute the mean of the feature vectors of all faces in the class and record it as the class's central feature.
CN201611214990.4A 2016-12-26 2016-12-26 Video face detection and recognition method based on a multichannel network Active CN106845357B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611214990.4A CN106845357B (en) 2016-12-26 2016-12-26 Video face detection and recognition method based on a multichannel network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611214990.4A CN106845357B (en) 2016-12-26 2016-12-26 Video face detection and recognition method based on a multichannel network

Publications (2)

Publication Number Publication Date
CN106845357A CN106845357A (en) 2017-06-13
CN106845357B true CN106845357B (en) 2019-11-05

Family

ID=59137016

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611214990.4A Active CN106845357B (en) 2016-12-26 2016-12-26 Video face detection and recognition method based on a multichannel network

Country Status (1)

Country Link
CN (1) CN106845357B (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102004899A (en) * 2010-11-03 2011-04-06 无锡中星微电子有限公司 Human face identifying system and method
CN103049459A (en) * 2011-10-17 2013-04-17 天津市亚安科技股份有限公司 Feature recognition based quick video retrieval method
CN105654055A (en) * 2015-12-29 2016-06-08 广东顺德中山大学卡内基梅隆大学国际联合研究院 Method for performing face recognition training by using video data
CN106127170A (en) * 2016-07-01 2016-11-16 重庆中科云丛科技有限公司 A kind of merge the training method of key feature points, recognition methods and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Fusion of color, local spatial and global frequency information; Zhiming Liu et al.; Pattern Recognition; 2010-03-16; full text *

Also Published As

Publication number Publication date
CN106845357A (en) 2017-06-13


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 310012 1st floor, building 1, 223 Yile Road, Hangzhou City, Zhejiang Province

Patentee after: Yinjiang Technology Co.,Ltd.

Address before: 310012 1st floor, building 1, 223 Yile Road, Hangzhou City, Zhejiang Province

Patentee before: ENJOYOR Co.,Ltd.