CN106845357B - Video face detection and recognition method based on a multichannel network - Google Patents

Video face detection and recognition method based on a multichannel network

Info

Publication number
CN106845357B
CN106845357B (application CN201611214990.4A)
Authority
CN
China
Prior art keywords
face
image
video
posture
feature
Prior art date
Legal status
Active
Application number
CN201611214990.4A
Other languages
Chinese (zh)
Other versions
CN106845357A (en)
Inventor
钱小鸿
车志聪
吴越
陈涛
李建元
Current Assignee
Yinjiang Technology Co.,Ltd.
Original Assignee
Enjoyor Co Ltd
Priority date
Filing date
Publication date
Application filed by Enjoyor Co Ltd filed Critical Enjoyor Co Ltd
Priority to CN201611214990.4A priority Critical patent/CN106845357B/en
Publication of CN106845357A publication Critical patent/CN106845357A/en
Application granted granted Critical
Publication of CN106845357B publication Critical patent/CN106845357B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G - PHYSICS
      • G06 - COMPUTING; CALCULATING OR COUNTING
        • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
          • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
            • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
              • G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
                • G06V 40/161 - Detection; Localisation; Normalisation
                  • G06V 40/165 - Detection; Localisation; Normalisation using facial parts and geometric relationships
                • G06V 40/168 - Feature extraction; Face representation
        • G06F - ELECTRIC DIGITAL DATA PROCESSING
          • G06F 18/00 - Pattern recognition
            • G06F 18/20 - Analysing
              • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
                • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
              • G06F 18/22 - Matching criteria, e.g. proximity measures
              • G06F 18/25 - Fusion techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

A video face detection and recognition method based on a multichannel network, comprising the following steps. S1: video preprocessing, attaching time information to each frame image. S2: target face detection and pose-coefficient calculation. S3: face pose correction, adjusting the pose of the m faces obtained in S2. S4: face feature extraction based on a deep neural network. S5: face feature comparison: for an input face, after obtaining its feature vector via step S4, the cosine distance is used to measure how well the input face feature vector matches the vectors in the feature database; classes whose cosine distance to the face to be identified exceeds a set threshold are added to the candidate classes, and if the cosine distances between the feature of the face to be identified and the central features of all classes are all below the threshold, the person is deemed not stored in the database and recognition ends. The present invention provides a video face detection and recognition method based on a multichannel network with higher accuracy.

Description

Video face detection and recognition method based on a multichannel network
Technical field
The present invention relates to the field of video face detection and recognition, and more particularly to a video face recognition method based on deep learning.
Background art
Video surveillance is an important component of security systems. With the development of video sensor technology and the associated supporting technologies, surveillance has evolved from the earliest analog systems, through hybrid digital-analog systems, to today's IP-based systems, and the range of application of video surveillance keeps growing; public security agencies in particular deploy video surveillance at large scale in fields such as security administration and suspect tracking.
Fast-developing video surveillance systems produce massive volumes of monitoring video. In security administration and suspect tracking, a central task over this data is to find, from one or several pictures of a person, the video files in which that person appears and the corresponding frame numbers. Traditionally these video data are searched and sorted manually, which is inefficient and error-prone, requires repeated cross-checking, and takes a long time; as video surveillance keeps developing and monitoring video keeps accumulating, the traditional manual search method becomes less and less applicable.
Automatically processing and recognizing the collected video information with video image processing and pattern recognition techniques has already reached very mature application in the field of traffic management. However, existing recognition methods based on image processing achieve high accuracy only under ideal conditions; when lighting varies in complex ways, image quality is low, or the pose of the target changes, the miss rate and false detection rate rise rapidly, so automatic recognition and analysis of faces in video has not yet reached the level of practical application.
With the development of big data and deep learning, deep learning has been applied to face detection systems with good results. Several patents already address deep-learning-based video face recognition. For example, patent CN201511033733.6 treats a video as a set of pictures and recognizes the faces on several pictures of better quality; this method ignores the connection between different frames and loses a large amount of information, so that recognition accuracy degrades sharply when video quality deteriorates.
Patent CN201510471210.3 proposes a real-time face recognition method based on a deep neural network. It extracts face feature vectors with a deep neural network, first computes the Hamming distance between feature vectors, selects the faces below a threshold as the objects of a second-stage comparison, then computes the Euclidean distances of the candidates' feature vectors and uses them to judge which face in the face database the face to be identified belongs to, thereby improving efficiency when the face database is large.
In summary, existing methods mainly extract the face information of selected frames of the video and use deep learning to train, detect and recognize on them; the spatio-temporal connection between the frames of the video has not yet been considered, which results in low accuracy.
Summary of the invention
To overcome the low accuracy of existing video face detection and recognition methods, the present invention provides a video face detection and recognition method based on a multichannel network with higher accuracy.
The technical solution adopted by the present invention to solve this technical problem is as follows:
A video face detection and recognition method based on a multichannel network, comprising the following steps:
S1: Video preprocessing
Receive the video data collected by the monitoring device, decompose it into individual images, and attach time information to each frame image;
S2: Target face detection and pose-coefficient calculation, as follows:
Extract the face locations and the corresponding facial-feature positions in the video images, compute the distances between the facial features of each face in the video image and those of the standard-pose face, compute a pose coefficient, and group pose-similar images: faces in adjacent frames whose locations are close and whose pose coefficients differ least are regarded as faces of the same face family. Define a threshold φ and, for each face family, choose m faces with p < φ. If only m_(p<φ) (< m) face images in the family satisfy p < φ, duplicate the face image with the smallest pose coefficient p in the family m − m_(p<φ) times, so that together with the other images there are m images, which are input to S3;
S3: Face pose correction: adjust the pose of the m faces obtained in S2;
S4: Face feature extraction based on a deep neural network, as follows:
S4.1 Training the face feature extraction network
Before extracting face features from video images, a feature model is trained in advance on a face database: obtain the images of each person in the face database under different angles and different illuminations, randomly select m of them, apply pose correction to the m images, and combine them into a w' × h' × 3m face image, where w' is the width of the training picture, h' the height of the training picture, and 3m is the three RGB channels multiplied by the number of images m; perform this operation for every person in the face database, attach labels, and input the results into the neural network for training;
S4.2 Video face feature extraction
From S3, m corrected w × h × 3 face color images have been obtained, each with 3 channels; fuse the different images as different channels, i.e. into a single w × h × 3m face image with 3 × m channels;
Input this w × h × 3m face image into the feature extraction network trained in S4.1 to finally obtain the feature vector representing the face;
S5 Face feature comparison
For an input face, after obtaining its feature vector via step S4, the cosine distance is used to measure how well the input face feature vector matches the vectors in the feature database, computed as follows:
S5.1 Preliminary screening
Compute the cosine distance between the feature vector a of the face to be identified and the central feature b of each class, as shown in formula (10), where ‖·‖ denotes the two-norm, i.e. the length, of a vector and cos θ is the cosine distance between a and b:
cos θ = (a · b) / (‖a‖ ‖b‖) (10)
Classes whose cosine distance to the face to be identified exceeds a set threshold are added to the candidate classes; if the cosine distances between the feature of the face to be identified and the central features of all classes are all below the threshold, the person is deemed not stored in the database and recognition ends.
Further, step S5 also comprises the following step:
S5.2 Fine screening
For every face in each candidate class, compute the cosine distance between its feature vector and the feature vector of the face to be identified; choose the faces whose cosine distance exceeds a set threshold ρ as the recognition result, and output the video images where the recognition result occurs; if the cosine distances of all faces in all candidate classes to the face to be identified are below ρ, the person is deemed not stored in the database.
Further, in step S1, the first frame image of the received video is image 1, and, in chronological order, the t-th frame image of the video is image t, denoted I_t; the set of frame images of one video is denoted I. After the preprocessing of the video is completed, the decomposed images are passed in chronological order to the face target detection module.
Further, in step S2, target face detection and pose-coefficient calculation proceed as follows:
S2.1 Extract the face locations and corresponding facial-feature positions in the video image
For each frame image I_t, use Haar-like features to find the faces present in the frame and the coordinates of the corresponding facial features, denoted F_1(x_1, y_1), F_2(x_2, y_2), F_3(x_3, y_3), F_4(x_4, y_4), F_5(x_5, y_5);
S2.2 Compute the distances between the facial features of the face in the video image and those of the standard-pose face
Let the coordinates of the facial features in the standard pose image I' be F'_1(x'_1, y'_1), F'_2(x'_2, y'_2), F'_3(x'_3, y'_3), F'_4(x'_4, y'_4), F'_5(x'_5, y'_5); compute with formulas (1) and (2) the pairwise distances between the facial features in the video image I_t and in the standard pose image I':
d_ij = √((x_i − x_j)² + (y_i − y_j)²) (1)
d'_ij = √((x'_i − x'_j)² + (y'_i − y'_j)²) (2)
where (x_i, y_i), (x_j, y_j) are the coordinates of different facial features of the face under consideration, (x'_i, y'_i), (x'_j, y'_j) the coordinates of different facial features in the standard pose image, d_ij the pairwise distances between the facial features of the face to be identified, and d'_ij the pairwise distances between the facial features in the standard pose image;
S2.3 Compute the pose coefficient and group pose-similar images
Define the pose coefficient p of a face and compute it with formula (3):
where λ is a scaling factor that avoids the error caused by a scale mismatch between the face image to be identified and the standard pose image; the value of λ can be computed by formula (4), i.e. λ takes the value that minimizes the pose coefficient;
In step S3, pose adjustment proceeds as follows:
S3.1 Compute the face rotation vector
From the known standard face model and the coordinates of the five feature points in the video, use the POSIT algorithm to obtain the pose information of the face in the image, i.e. the rotation vector R of the face;
S3.2 Compute the mapping between the corrected image and the original image
From the rotation vector of the face, obtain the mapping between each pixel of the corrected face image and a pixel of the original face image: in the corrected image, take the central axis of the face as the y-axis and the line perpendicular to it as the x-axis to construct a coordinate system, and let (x, y) = f(x', y') be the mapping between a point (x, y) on the corrected image and a point (x', y') on the original image, as follows:
S3.3 Pose correction
Let rgb'(x, y) be the RGB value at (x, y) on the corrected image, and rgb(x, y) the RGB value at (x, y) in the original face image; the RGB value at a point (x, y) of the corrected face image is then obtained with formula (7), where
G is a Gaussian probability matrix: in actual operation, because the real 3D model of a given face differs somewhat from the standard 3D model, the mapping between a point on the corrected image and the corresponding point on the original image carries some error; the RGB value of a point on the corrected image is therefore obtained jointly from the RGB values of the 9 points near the corresponding position on the original image, i.e. the expected RGB value at the point is computed through the Gaussian probability matrix G and taken as the point's RGB value; k in formula (7) is a preset ratio;
After pose correction has been applied to every face in the same face family, we obtain m face images of size w × h × 3, i.e. color images of w × h pixels with the three RGB channels.
In step S4.1, the neural network is trained with gradient descent: after each input batch of pictures the loss is computed and the weights of the neural network are updated. The 512-dimensional vector output by fully connected layer 3 of the network indicates the probability of which person the input face is; softmax regression is applied to it to obtain the corresponding loss function, shown in formula (9), where k is the class to which the input picture belongs and z_k is the k-th value of the 512-dimensional output of fully connected layer 3:
Loss = Σ −log f(z_k) (9)
where f is the softmax function, f(z_k) = e^(z_k) / Σ_j e^(z_j). After the loss function has been computed, forward computation and backward gradient computation yield the update value of every layer of the neural network, and the weights of each layer are updated;
Cluster analysis is carried out in advance on the face feature set in the database to establish a spatial index, with the following steps:
S4.1.1 Apply a clustering algorithm to the features in the face feature library, grouping the face features into several classes;
S4.1.2 For each class, compute the mean of the feature vectors of all faces in the class and record it as the class's central feature.
By exploiting the correlation between the frames of the surveillance video in combination with deep learning, the present invention can find, among massive video files, the video files in which a target face appears and the frame numbers where it occurs. Compared with other methods, the present invention gathers the image information of the same person in different frames and considers it jointly, making maximal use of the information in the video data and thereby improving recognition accuracy.
The beneficial effects of the present invention are mainly the following:
1. When previous methods detect faces in video, the images of different frames are considered in isolation, and the information in the video is not used effectively. This method treats different images as different channels, fusing the images of the same face in different frames into one multichannel image, and implicitly extracts the joint features of these images with a multilayer convolutional network. On the one hand, using the information in the video improves accuracy as much as possible; on the other hand, inputting several images into the neural network at the same time avoids the time wasted by repeated inputs;
2. Since several face images must be fused into one image, the features at the same positions in the different face images must stay roughly consistent, i.e. across images the eyes, nose and so on should lie at roughly the same positions; otherwise the feature extraction network is hard to bring to convergence. Pose correction must therefore be applied to the face images before feature extraction. In the face pose correction of this method, the symmetry of the face about its central axis is exploited: after correction, the RGB value of a point on the face is synthesized from the RGB value of the corresponding position on the original image and the RGB value of the point mirrored across the central axis. This largely avoids the information loss of common face pose correction methods and improves the final recognition accuracy;
3. The selection of particular frames of a video has to take frame quality into account, and the face pose in a frame image is a principal factor determining frame quality, but in practice estimating the pose of every face in every frame costs too much. This method therefore defines a pose coefficient with which to judge the pose of a face: it reflects the face pose well while its computation is cheap, so the frames of better quality within a face family can be found quickly and accurately.
Detailed description of the invention
Fig. 1 is the overall flowchart of the invention.
Fig. 2 is the deep learning network structure used in the embodiment of the present invention.
Specific embodiment
The invention will be further described below in conjunction with the accompanying drawings.
Referring to Fig. 1 and Fig. 2, a video face detection and recognition method based on a multichannel network comprises the following steps:
S1: Video preprocessing
Receive the video data collected by the monitoring device, decompose it into individual images, and attach time information to each frame image; specifically, the first frame image of the received video is image 1, and, in chronological order, the t-th frame image of the video is image t. In the following, I_t denotes the t-th frame image and I denotes the set of frame images of one video. After the preprocessing of the video is completed, the decomposed images are passed in chronological order to the face target detection module.
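As an illustration of step S1, a minimal sketch follows, assuming OpenCV for decoding; the patent names no library, and the function and field names here are ours:

```python
import cv2

def decompose_video(path):
    """Split a surveillance video into frames, attaching a frame index and a
    timestamp to each image (step S1). Frames are numbered from 1, as in the
    text: the first frame is image 1."""
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0  # fall back to 25 fps if unknown
    frames, t = [], 1
    while True:
        ok, image = cap.read()
        if not ok:
            break
        frames.append({"index": t, "time_sec": (t - 1) / fps, "image": image})
        t += 1
    cap.release()
    return frames  # passed on, in chronological order, to face detection
```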
S2: Target face detection and pose-coefficient calculation
S2.1 Extract the face locations and corresponding facial-feature positions in the video image
For each frame image I_t, use Haar-like features to find the faces present in the frame and the coordinates of the corresponding facial features (the two eyes, the nose, and the two sides of the mouth), denoted F_1(x_1, y_1), F_2(x_2, y_2), F_3(x_3, y_3), F_4(x_4, y_4), F_5(x_5, y_5).
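A minimal sketch of the detection part of S2.1 follows, assuming OpenCV's stock Haar cascade; the patent only states that Haar-like features are used, and locating the five landmarks F_1..F_5 is left as a placeholder:

```python
import cv2

# Stock frontal-face Haar cascade shipped with OpenCV (an assumption; the
# patent only says Haar-like features are used).
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(frame):
    """Return face bounding boxes (x, y, w, h) in one frame image I_t.
    Locating the five landmarks F_1..F_5 inside each box would follow here,
    e.g. with a separate landmark model (not specified by the patent)."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
```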
S2.2 Compute the distances between the facial features of the face in the video image and those of the standard-pose face
Let the coordinates of the facial features (the two eyes, the nose, and the two sides of the mouth) in the standard pose image I' be F'_1(x'_1, y'_1), F'_2(x'_2, y'_2), F'_3(x'_3, y'_3), F'_4(x'_4, y'_4), F'_5(x'_5, y'_5); compute with formulas (1) and (2) the pairwise distances between the facial features in the video image I_t and in the standard pose image I':
d_ij = √((x_i − x_j)² + (y_i − y_j)²) (1)
d'_ij = √((x'_i − x'_j)² + (y'_i − y'_j)²) (2)
where (x_i, y_i), (x_j, y_j) are the coordinates of different facial features of the face under consideration, (x'_i, y'_i), (x'_j, y'_j) the coordinates of different facial features in the standard pose image, d_ij the pairwise distances between the facial features of the face to be identified, and d'_ij the pairwise distances between the facial features in the standard pose image.
S2.3 Compute the pose coefficient and group pose-similar images.
Define the pose coefficient p of a face and compute it with formula (3). It measures the difference between the pose of the face to be identified and the standard pose: the smaller p is, the closer the pose of the face to be identified is to the standard, i.e. the higher the accuracy achievable with this image in face recognition; the larger p is, the further the pose is from the standard pose, i.e. the worse the image quality and the lower the accuracy achievable with this image in face recognition.
Here λ is a scaling factor that avoids the error caused by a scale mismatch between the face image to be identified and the standard pose image; the value of λ can be computed by formula (4), i.e. λ takes the value that minimizes the pose coefficient.
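Formulas (3) and (4) are not reproduced above. A plausible reconstruction consistent with this description, assuming a squared-difference form over the pairwise feature distances (our reading, not an expression confirmed by the source), is:

```latex
% Assumed forms of formulas (3) and (4); the originals may differ in detail.
p = \sum_{i<j} \bigl(d_{ij} - \lambda\, d'_{ij}\bigr)^2 \qquad (3)
\lambda = \arg\min_{\lambda} \sum_{i<j} \bigl(d_{ij} - \lambda\, d'_{ij}\bigr)^2
        = \frac{\sum_{i<j} d_{ij}\, d'_{ij}}{\sum_{i<j} \bigl(d'_{ij}\bigr)^2} \qquad (4)
```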
Compute the pose coefficients of all detected face images; faces in adjacent frames whose locations are close and whose pose coefficients differ least are regarded as faces of the same face family. Define a threshold φ and, for each face family, choose m faces with p < φ. If only m_(p<φ) (< m) face images in the family satisfy p < φ, duplicate the face image with the smallest pose coefficient p in the family m − m_(p<φ) times, so that together with the other images there are m images, which are input to S3.
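A sketch of the pose-coefficient computation and of the frame selection within one face family follows, built on the assumed formulas above; the function names and the per-face dictionary layout are illustrative:

```python
import numpy as np

def pose_coefficient(pts, std_pts):
    """Pose coefficient p (S2.3), under the assumed squared-difference form
    of formulas (3)-(4); lambda is chosen to minimize p in closed form."""
    def pairwise(p5):
        idx = [(i, j) for i in range(5) for j in range(i + 1, 5)]
        return np.array([np.linalg.norm(p5[i] - p5[j]) for i, j in idx])
    d = pairwise(np.asarray(pts, float))
    d_std = pairwise(np.asarray(std_pts, float))
    lam = float(d @ d_std) / float(d_std @ d_std)  # formula (4), least squares
    return float(np.sum((d - lam * d_std) ** 2))   # formula (3)

def select_faces(family, m, phi):
    """Pick m faces with p < phi from one face family (each face a dict with
    a precomputed key "p"); if fewer exist, duplicate the face with the
    smallest p until there are m, as described in S2."""
    good = sorted((f for f in family if f["p"] < phi), key=lambda f: f["p"])
    best = min(family, key=lambda f: f["p"])
    return (good + [best] * (m - len(good)))[:m]
```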
S3: Face pose correction
Adjust the pose of the m faces obtained in S2, with the following specific steps:
S3.1 Compute the face rotation vector
In surveillance video, the persons that appear are generally far from the camera, i.e. the distances of the facial feature points from the camera are much larger than the distances between the points themselves. Therefore, from the known standard face model and the coordinates of the five feature points in the video, the POSIT algorithm can be used to obtain the pose information of the face in the image, i.e. the rotation vector R of the face.
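A sketch of S3.1 follows. POSIT itself ships only with legacy OpenCV, so this uses cv2.solvePnP, its modern equivalent, with an assumed pinhole camera matrix (square pixels, principal point at the image centre):

```python
import cv2
import numpy as np

def face_rotation_vector(image_pts, model_pts, w, h):
    """Estimate the face rotation vector R (S3.1) from the 5 detected
    landmarks (image_pts, Nx2) and the 5 points of the standard 3D face
    model (model_pts, Nx3). The focal length guess is an assumption."""
    f = float(w)  # rough focal length in pixels
    K = np.array([[f, 0, w / 2.0],
                  [0, f, h / 2.0],
                  [0, 0, 1.0]])
    ok, rvec, tvec = cv2.solvePnP(
        np.asarray(model_pts, np.float64),
        np.asarray(image_pts, np.float64),
        K, None, flags=cv2.SOLVEPNP_EPNP)
    return rvec if ok else None
```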
S3.2 Compute the mapping between the corrected image and the original image
From the rotation vector of the face, the mapping between each pixel of the corrected face image and a pixel of the original face image can be obtained: in the corrected image, take the central axis of the face as the y-axis and the line perpendicular to it as the x-axis to construct a coordinate system, and let (x, y) = f(x', y') be the mapping between a point (x, y) on the corrected image and a point (x', y') on the original image. Specifically:
S3.3 Pose correction
Let rgb'(x, y) be the RGB value at (x, y) on the corrected image, and rgb(x, y) the RGB value at (x, y) in the original face image; the RGB value at a point (x, y) of the corrected face image is then obtained with formula (7), where
G is a Gaussian probability matrix: in actual operation, because the real 3D model of a given face differs somewhat from the standard 3D model, the mapping between a point on the corrected image and the corresponding point on the original image carries some error; the RGB value of a point on the corrected image is therefore obtained jointly from the RGB values of the 9 points near the corresponding position on the original image, i.e. the expected RGB value at the point is computed through the Gaussian probability matrix G and taken as the point's RGB value.
Here the corrected image exploits the symmetry of the face, combining the image information of both sides of the face after correction so as to extract as much information as possible and avoid the loss of information after pose correction when the rotation angle of the face is too large. k in formula (7) is a preset ratio with a value between 0 and 0.5: the smaller k is, the more the image information of a single side is considered; the larger k is, the more the image information of both sides is considered together.
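An illustrative sketch of the S3.3 idea follows, assuming a 3 x 3 Gaussian neighbourhood for the 9 points and a k-weighted mix of the mapped point with its mirror across the face axis; this mixing form is our interpretation of formula (7), not the patent's confirmed expression, and edge handling is omitted:

```python
import cv2
import numpy as np

# Gaussian probability matrix G over a 3x3 neighbourhood (the 9 points of S3.3).
g = cv2.getGaussianKernel(3, 1.0)
G = g @ g.T  # normalized 3x3 Gaussian weights

def corrected_rgb(orig, xo, yo, xm, ym, k):
    """Assumed reading of formula (7): the corrected pixel mixes the mapped
    original point (xo, yo) and its mirror across the face axis (xm, ym),
    each taken as a G-weighted expectation over the 9 neighbouring pixels;
    k in [0, 0.5] weights the mirrored-side contribution."""
    def expect(x, y):
        patch = orig[y - 1:y + 2, x - 1:x + 2].astype(np.float64)
        return np.tensordot(G, patch, axes=([0, 1], [0, 1]))  # per-channel mean
    return (1 - k) * expect(xo, yo) + k * expect(xm, ym)
```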
After pose correction has been applied to every face in the same face family, we obtain m face images of size w × h × 3, i.e. color images of w × h pixels with the three RGB channels.
S4: Face feature extraction based on a deep neural network
S4.1 Training the face feature extraction network
Before extracting face features from video images, a feature model is trained in advance on a face database. The training method is as follows:
Obtain the images of each person in the face database under different angles and different illuminations, randomly select m of them, apply pose correction to the m images, and combine them into a w' × h' × 3m face image, where w' is the width of the training picture, h' the height of the training picture, and 3m is the three RGB channels multiplied by the number of images m. Perform this operation for every person in the face database, number the results with labels, and input them into the neural network for training.
Because different images are fused as different channels and fed to the neural network as one input, we call the network a multichannel network.
The present invention trains the neural network with a gradient descent algorithm, with the batch size set to 256: after every 256 input pictures the loss is computed and the weights of the neural network are updated.
The 512-dimensional vector output by fully connected layer 3 of the above network indicates the probability of which person the input face is; softmax regression is applied to it to obtain the corresponding loss function, shown in formula (9), where k is the class to which the input picture belongs and z_k is the k-th value of the 512-dimensional output of fully connected layer 3:
Loss = Σ −log f(z_k) (9)
where f is the softmax function, f(z_k) = e^(z_k) / Σ_j e^(z_j). After the loss function has been computed, forward computation and backward gradient computation yield the update value of every layer of the neural network, and the weights of each layer are updated, so as to reduce the loss and optimize the network.
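A minimal PyTorch sketch of the multichannel network and one training step follows. The text fixes only the 3m input channels, the 512-dimensional output of fully connected layer 3 fed to the softmax loss of formula (9), gradient descent, and the batch size of 256; every layer size below is an assumption:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiChannelFaceNet(nn.Module):
    """m pose-corrected RGB crops are stacked into a 3m-channel input; the
    512-d output of fully connected layer 3 serves both as the face feature
    and, through softmax, as the per-person class scores."""
    def __init__(self, m, side=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3 * m, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        flat = 256 * (side // 8) ** 2
        self.fc1 = nn.Linear(flat, 1024)
        self.fc2 = nn.Linear(1024, 1024)
        self.fc3 = nn.Linear(1024, 512)   # "fully connected layer 3"

    def forward(self, x):                 # x: (batch, 3m, h, w)
        h = self.conv(x).flatten(1)
        return self.fc3(F.relu(self.fc2(F.relu(self.fc1(h)))))

model = MultiChannelFaceNet(m=4)          # m = 4 fused crops, an example value
opt = torch.optim.SGD(model.parameters(), lr=0.01)
x = torch.randn(256, 12, 64, 64)          # one batch of 256 fused 3m-channel images
y = torch.randint(0, 512, (256,))         # identity label k of each image
z = model(x)                              # 512-d vectors z
loss = F.cross_entropy(z, y)              # mean of -log softmax(z)_k, formula (9)
opt.zero_grad(); loss.backward(); opt.step()
```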
In practical applications, considering that the number of faces stored in the database may be very large, cluster analysis can first be carried out in advance on the face feature set in the database to establish a spatial index, with the following specific steps:
S4.1.1 Apply a clustering algorithm such as k-means to the features in the face feature library, grouping the face features into several classes.
S4.1.2 For each class, compute the mean of the feature vectors of all faces in the class and record it as the class's central feature.
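A sketch of the spatial-index construction of S4.1.1 and S4.1.2 follows, using scikit-learn's k-means (k-means is named as an example in the text; the number of classes is an assumption):

```python
import numpy as np
from sklearn.cluster import KMeans

def build_class_centers(features, n_classes=64):
    """Cluster the (N, d) face-feature library into n_classes classes and
    record the mean feature vector of each class as its central feature."""
    km = KMeans(n_clusters=n_classes, n_init=10).fit(features)
    centers = np.array([features[km.labels_ == c].mean(axis=0)
                        for c in range(n_classes)])
    return km.labels_, centers
```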
S4.2 Video face feature extraction
From S3 we have obtained m corrected w × h × 3 face color images, each with 3 channels; fuse the different images as different channels, i.e. into a single w × h × 3m face image with 3 × m channels.
Input this w × h × 3m face image into the feature extraction network trained in S4.1 to finally obtain the feature vector representing the face.
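A sketch of the S4.2 channel fusion and feature query follows, reusing the network sketched above; the names are illustrative:

```python
import numpy as np
import torch

def fuse_and_extract(images, model):
    """Concatenate m corrected w x h x 3 crops along the channel axis into one
    w x h x 3m image, reorder to PyTorch's (1, 3m, h, w) layout, and run the
    trained network to obtain the 512-d face feature vector."""
    fused = np.concatenate(images, axis=2)                   # (h, w, 3m)
    x = torch.from_numpy(fused).permute(2, 0, 1)[None].float()
    with torch.no_grad():
        return model(x)[0]                                   # (512,) feature
```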
S5 Face feature comparison
For an input face, after obtaining its feature vector via step S4, the cosine distance is used to measure how well the input face feature vector matches the vectors in the feature database, computed as follows:
S5.1 Preliminary screening
Compute the cosine distance between the feature vector a of the face to be identified and the central feature b of each class, as shown in formula (10), where ‖·‖ denotes the two-norm, i.e. the length, of a vector and cos θ is the cosine distance between a and b:
cos θ = (a · b) / (‖a‖ ‖b‖) (10)
Classes whose cosine distance to the face to be identified exceeds a set threshold are added to the candidate classes. If the cosine distances between the feature of the face to be identified and the central features of all classes are all below the threshold, the person is deemed not stored in the database and recognition ends.
S5.2 Fine screening
For every face in each candidate class, compute the cosine distance between its feature vector and the feature vector of the face to be identified; choose the faces whose cosine distance exceeds a set threshold ρ as the recognition result, and output the video images where the recognition result occurs. If the cosine distances of all faces in all candidate classes to the face to be identified are below ρ, the person is deemed not stored in the database, and recognition ends.
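A sketch of the two-stage comparison of S5 follows, with eps standing in for the unnamed preliminary threshold of S5.1 and rho for the threshold ρ of S5.2:

```python
import numpy as np

def cos_sim(a, b):
    """Formula (10): cosine of the angle between two feature vectors."""
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def identify(query, centers, members, eps, rho):
    """Two-stage comparison (S5): keep the classes whose central feature has
    cosine similarity above eps (S5.1), then return the member faces above
    rho (S5.2); None means the person is not stored in the database."""
    candidates = [c for c, center in enumerate(centers)
                  if cos_sim(query, center) > eps]
    if not candidates:
        return None
    hits = [(c, i) for c in candidates
            for i, f in enumerate(members[c]) if cos_sim(query, f) > rho]
    return hits or None
```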
This completes the recognition of faces in the video.

Claims (6)

1. A video face detection and recognition method based on a multichannel network, characterized in that the method comprises the following steps:
S1: Video preprocessing
Receive the video data collected by the monitoring device, decompose it into individual images, and attach time information to each frame image;
S2: Target face detection and pose-coefficient calculation, as follows:
Extract the face locations and the corresponding facial-feature positions in the video images, compute the distances between the facial features of each face in the video image and those of the standard-pose face, compute a pose coefficient, and group pose-similar images: faces in adjacent frames whose locations are close and whose pose coefficients differ least are regarded as faces of the same face family; define a threshold φ and, for each face family, choose m faces with p < φ; if only m_(p<φ) (< m) face images in the family satisfy p < φ, duplicate the face image with the smallest pose coefficient p in the family m − m_(p<φ) times, so that together with the other images there are m images, which are input to S3;
S3: Face pose correction: adjust the pose of the m faces obtained in S2;
S4: Face feature extraction based on a deep neural network, as follows:
S4.1 Training the face feature extraction network
Before extracting face features from video images, a feature model is trained in advance on a face database: obtain the images of each person in the face database under different angles and different illuminations, randomly select m of them, apply pose correction to the m images, and combine them into a w' × h' × 3m face image, where w' is the width of the training picture, h' the height of the training picture, and 3m is the three RGB channels multiplied by the number of images m; perform this operation for every person in the face database, number the results with labels, and input them into the neural network for training;
S4.2 Video face feature extraction
From S3, m corrected w × h × 3 face color images have been obtained, each with 3 channels; fuse the different images as different channels, i.e. into a single w × h × 3m face image with 3 × m channels;
Input this w × h × 3m face image into the feature extraction network trained in S4.1 to finally obtain the feature vector representing the face;
S5 Face feature comparison
For an input face, after obtaining its feature vector via step S4, the cosine distance is used to measure how well the input face feature vector matches the vectors in the feature database, computed as follows:
S5.1 Preliminary screening
Compute the cosine distance between the feature vector a of the face to be identified and the central feature b of each class, as shown in formula (10), where ‖·‖ denotes the two-norm, i.e. the length, of a vector and cos θ is the cosine distance between a and b:
cos θ = (a · b) / (‖a‖ ‖b‖) (10)
Classes whose cosine distance to the face to be identified exceeds a set threshold are added to the candidate classes; if the cosine distances between the feature of the face to be identified and the central features of all classes are all below the threshold, the person is deemed not stored in the database and recognition ends.
2. The video face detection and recognition method based on a multichannel network according to claim 1, characterized in that step S5 further comprises the following step:
S5.2 Fine screening
For every face in each candidate class, compute the cosine distance between its feature vector and the feature vector of the face to be identified; choose the faces whose cosine distance exceeds a set threshold ρ as the recognition result, and output the video images where the recognition result occurs; if the cosine distances of all faces in all candidate classes to the face to be identified are below ρ, the person is deemed not stored in the database.
3. The video face detection and recognition method based on a multichannel network according to claim 1 or 2, characterized in that: in step S1, the first frame image of the received video is image 1, and, in chronological order, the t-th frame image of the video is image t, denoted I_t; the set of frame images of one video is denoted I; after the preprocessing of the video is completed, the decomposed images are passed in chronological order to the face target detection module.
4. The video face detection and recognition method based on a multichannel network according to claim 1 or 2, characterized in that: in step S2, target face detection and pose-coefficient calculation proceed as follows:
S2.1 Extract the face locations and corresponding facial-feature positions in the video image
For each frame image I_t, use Haar-like features to find the coordinates of the faces present in the frame and of the corresponding facial features, denoted F_1(x_1, y_1), F_2(x_2, y_2), F_3(x_3, y_3), F_4(x_4, y_4), F_5(x_5, y_5);
S2.2 Compute the distances between the facial features of the face in the video image and those of the standard-pose face
Let the coordinates of the facial features in the standard pose image I' be F'_1(x'_1, y'_1), F'_2(x'_2, y'_2), F'_3(x'_3, y'_3), F'_4(x'_4, y'_4), F'_5(x'_5, y'_5); compute with formulas (1) and (2) the pairwise distances between the facial features in the video image I_t and in the standard pose image I':
d_ij = √((x_i − x_j)² + (y_i − y_j)²) (1)
d'_ij = √((x'_i − x'_j)² + (y'_i − y'_j)²) (2)
where (x_i, y_i), (x_j, y_j) are the coordinates of different facial features of the face under consideration, (x'_i, y'_i), (x'_j, y'_j) the coordinates of different facial features in the standard pose image, d_ij the pairwise distances between the facial features of the face to be identified, and d'_ij the pairwise distances between the facial features in the standard pose image;
S2.3 Compute the pose coefficient and group pose-similar images
Define the pose coefficient p of a face and compute it with formula (3):
where λ is a scaling factor that avoids the error caused by a scale mismatch between the face image to be identified and the standard pose image; the value of λ can be computed by formula (4), i.e. λ takes the value that minimizes the pose coefficient;
5. The video face detection and recognition method based on a multichannel network according to claim 4, characterized in that: in step S3, pose adjustment proceeds as follows:
S3.1 Compute the face rotation vector
From the known standard face model and the coordinates of the five feature points in the video, use the POSIT algorithm to obtain the pose information of the face in the image, i.e. the rotation vector R of the face;
S3.2 Compute the mapping between the corrected image and the original image
From the rotation vector of the face, obtain the mapping between each pixel of the corrected face image and a pixel of the original face image: in the corrected image, take the central axis of the face as the y-axis and the line perpendicular to it as the x-axis to construct a coordinate system, and let (x, y) = f(x', y') be the mapping between a point (x, y) on the corrected image and a point (x', y') on the original image, as follows:
S3.3 Pose correction
Let rgb'(x, y) be the RGB value at (x, y) on the corrected image, and rgb(x, y) the RGB value at (x, y) in the original face image; the RGB value at a point (x, y) of the corrected face image is then obtained with formula (7), where
G(i, j) is a Gaussian probability matrix: in actual operation, because the real 3D model of a given face differs somewhat from the standard 3D model, the mapping between a point on the corrected image and the corresponding point on the original image carries some error; the RGB value of a point on the corrected image is therefore obtained jointly from the RGB values of the 9 points near the corresponding position on the original image, i.e. the expected RGB value at the point is computed through the Gaussian probability matrix G and taken as the point's RGB value; k in formula (7) is a preset ratio;
After pose correction has been applied to every face in the same face family, m face images of size w × h × 3 are obtained, i.e. color images of w × h pixels with the three RGB channels.
6. The video face detection and recognition method based on a multichannel network according to claim 5, characterized in that: in step S4.1, the neural network is trained with a gradient descent algorithm: after each input batch of pictures the loss is computed and the weights of the neural network are updated; the 512-dimensional vector output by fully connected layer 3 of the network indicates the probability of which person the input face is; softmax regression is applied to it to obtain the corresponding loss function, shown in formula (9), where z_k is the k-th value of the 512-dimensional output of fully connected layer 3:
Loss = Σ −log f(z_k) (9)
After the loss function has been computed, forward computation and backward gradient computation yield the update value of every layer of the neural network, and the weights of each layer are updated;
Cluster analysis is carried out in advance on the face feature set in the database to establish a spatial index, with the following operating steps:
S4.1.1 Apply a clustering algorithm to the features in the face feature library, grouping the face features into several classes;
S4.1.2 For each class, compute the mean of the feature vectors of all faces in the class and record it as the class's central feature.
CN201611214990.4A 2016-12-26 2016-12-26 Video face detection and recognition method based on a multichannel network Active CN106845357B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611214990.4A CN106845357B (en) 2016-12-26 2016-12-26 Video face detection and recognition method based on a multichannel network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611214990.4A CN106845357B (en) 2016-12-26 2016-12-26 Video face detection and recognition method based on a multichannel network

Publications (2)

Publication Number Publication Date
CN106845357A CN106845357A (en) 2017-06-13
CN106845357B true CN106845357B (en) 2019-11-05

Family

ID=59137016

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611214990.4A Active CN106845357B (en) 2016-12-26 2016-12-26 Video face detection and recognition method based on a multichannel network

Country Status (1)

Country Link
CN (1) CN106845357B (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102004899A (en) * 2010-11-03 2011-04-06 无锡中星微电子有限公司 Human face identifying system and method
CN103049459A (en) * 2011-10-17 2013-04-17 天津市亚安科技股份有限公司 Feature recognition based quick video retrieval method
CN105654055A (en) * 2015-12-29 2016-06-08 广东顺德中山大学卡内基梅隆大学国际联合研究院 Method for performing face recognition training by using video data
CN106127170A (en) * 2016-07-01 2016-11-16 重庆中科云丛科技有限公司 A kind of merge the training method of key feature points, recognition methods and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Fusion of color, local spatial and global frequency information; Zhiming Liu et al.; Pattern Recognition; 2010-03-16; full text *

Also Published As

Publication number Publication date
CN106845357A (en) 2017-06-13


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 310012 1st floor, building 1, 223 Yile Road, Hangzhou City, Zhejiang Province

Patentee after: Yinjiang Technology Co.,Ltd.

Address before: 310012 1st floor, building 1, 223 Yile Road, Hangzhou City, Zhejiang Province

Patentee before: ENJOYOR Co.,Ltd.