CN108564053A - Multi-cam dynamic human face recognition system based on FaceNet and method - Google Patents

Multi-cam dynamic human face recognition system based on FaceNet and method

Info

Publication number
CN108564053A
CN108564053A
Authority
CN
China
Prior art keywords
face
facenet
frame
video
frames
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810370308.3A
Other languages
Chinese (zh)
Inventor
桂冠
江斌
任强
戴菲
熊健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Post and Telecommunication University
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing Post and Telecommunication University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Post and Telecommunication University filed Critical Nanjing Post and Telecommunication University
Priority to CN201810370308.3A priority Critical patent/CN108564053A/en
Publication of CN108564053A publication Critical patent/CN108564053A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • G06V40/166Detection; Localisation; Normalisation using acquisition arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multi-camera dynamic face recognition system and method based on FaceNet. The applicable scene comprises multiple camera monitoring systems mounted in a managed region; the cameras of the monitoring systems capture video of pedestrians in real time from different angles, and all monitoring systems are connected over a network to the same server and share its intranet. The method comprises: step 1, capturing video of pedestrians from different angles and cropping face frames from the video stream, frame by frame, using a HAAR model; step 2, extracting facial features from all face frames using the FaceNet framework; and step 3, classifying the extracted features to achieve face recognition. By combining a multi-camera and server hardware platform, the invention applies deep learning and feature recognition to monitoring systems, which is of great significance for improving security management.

Description

Multi-cam dynamic human face recognition system based on FaceNet and method
Technical field
The present invention relates to the technical field of face recognition, and in particular to a FaceNet-based multi-camera dynamic face recognition system; it further relates to a recognition method for this system.
Background technology
With the rapid development of artificial intelligence technology and the increasing popularity of video monitoring equipment, intelligent monitoring has attracted broad attention from all sectors of society for its accuracy, timeliness and rich functionality. At present, many domestic venues are equipped with monitoring; video surveillance has become another major video application after digital television and video conferencing, and has grown into the largest video application system by volume. Security management is an important application in the field of video surveillance. However, video monitoring still has many shortcomings: its functions are limited, its recordings are voluminous, and the feature learning rate of intelligent monitoring for faces under different angles and illumination conditions is not high. How to improve the feature extraction rate of intelligent video monitoring, and how to address the underfitting of models trained in complex environments, remain significant challenges. With the continuous improvement of the cost-effectiveness of security systems and the development of technologies such as digital high definition and intelligence, the market application space will keep growing.
Currently, the key processing algorithms for video monitoring include automatic exposure, automatic white balance, automatic focusing and wide dynamic range. Excellent processing algorithms can achieve better color reproduction, make the acquired image more lifelike, and give the video of the monitored scene better performance under low illumination and large changes in lighting. Hard-disk storage, however, often suffers from fragmentation caused by repeated recording, data damage caused by sudden power loss, and the heat and vibration produced when multiple disks operate together; because of the particularity of the security industry, video and audio must be stored with uninterrupted, stable operation 24 hours a day, and the application scenarios are complex. Intelligent video analysis belongs to the third development stage of monitoring technology, the "machine eye + machine brain" stage. The "machine brain" uses the machine to make the judgments a human brain would make when watching the monitored pictures: it analyzes data, refines features into algorithms implanted in the machine, automatically detects and analyzes the video pictures, and raises alarms or takes other actions. Through the powerful data-processing capability of the computer, it filters out useless pictures or interference and automatically analyzes and extracts the key useful information from the video source, so that the camera becomes not only the eyes of a person but the computer also becomes the brain.
Feature extraction in existing intelligent video monitoring still cannot meet genuinely demanding requirements, for example:
1) The feature extraction rate is not high: most existing schemes can only learn a limited number of major facial organ features, and the extraction of finer-detail feature points falls far short of the requirements of real face recognition accuracy.
2) Underfitting of models trained in complex environments (the function fitted to the picture matrix has a large error with respect to the training set): in existing monitoring schemes, insufficient feature-point extraction causes the trained model to underfit under complex environmental conditions.
3) Multi-camera model sharing for dynamic monitoring: in current real-time intelligent multi-camera monitoring, identification features cannot be shared.
Invention content
The object of the invention is to overcome the deficiencies of the prior art by providing a FaceNet-based multi-camera dynamic face recognition system and method that combine a HAAR model and the FaceNet framework to perform modeling and matching on multi-dimensional facial features, which can improve the recognition rate and accelerate the running speed.
To solve the above technical problems, the present invention provides a FaceNet-based multi-camera dynamic face recognition system. The applicable scene comprises multiple camera monitoring systems mounted in a managed region, characterized in that the cameras of the monitoring systems capture video of pedestrians in real time from different angles, and all monitoring systems are connected over a network to the same server and share its intranet. The face recognition system comprises a video acquisition module, a feature extraction module and a feature classification module,
wherein in the video acquisition module, all monitoring systems capture pedestrian video streams in real time from different angles with their respective cameras; face frames are cropped from the video stream, frame by frame, using a HAAR model, and all face frames are sent to the server;
the feature extraction module, on the server, extracts facial features from all face frames using the FaceNet framework;
the feature classification module classifies the extracted facial features to achieve face recognition.
Further, the trained FaceNet model on the server is shared with all monitoring systems connected to the server.
Correspondingly, the present invention also provides a FaceNet-based multi-camera dynamic face recognition method, comprising the following steps:
Step S1: capture video of pedestrians from different angles, and crop face frames from the video stream, frame by frame, using a HAAR model;
Step S2: extract facial features from all face frames using the FaceNet framework;
Step S3: classify the extracted features to achieve face recognition.
Further, each face frame is represented by a 128-dimensional multi-dimensional matrix.
Further, in step S2, during triplet training, the argmax and argmin are computed and filtered on a subset every n (n ∈ N*) steps, i.e. the iteration step length is enlarged, reducing the number of iterations for the model to converge.
Further, in step S3, classification is performed using the KNN method.
Compared with the prior art, the advantageous effect achieved by the invention is: the system combines a HAAR model and the FaceNet framework to perform modeling and matching on multi-dimensional facial features, which can improve the recognition rate and accelerate the running speed.
Description of the drawings
Fig. 1 is a flow chart of the method of the present invention;
Fig. 2 is a schematic diagram of face detection on consecutive captured frames in an embodiment;
Fig. 3 is a schematic diagram of face detection with the HAAR model;
Fig. 4 is a schematic diagram of the FaceNet framework.
Specific implementation mode
The invention is further described below with reference to the accompanying drawings. The following embodiments are only used to clearly illustrate the technical solution of the present invention and are not intended to limit its protection scope.
The present invention is a FaceNet-based multi-camera dynamic face recognition system. The applicable scene comprises multiple camera monitoring systems mounted in a managed region; the cameras of the monitoring systems capture video of pedestrians in real time from different angles, and all monitoring systems are connected over a network to the same server and share its intranet. The face recognition system comprises a video acquisition module, a feature extraction module and a feature classification module,
wherein in the video acquisition module, all monitoring systems capture pedestrian video streams in real time from different angles with their respective cameras; face frames are cropped from the video stream, frame by frame, using a HAAR model, and all face frames are sent to the server;
the feature extraction module, on the server, extracts facial features from all face frames using the FaceNet framework;
the feature classification module classifies the extracted facial features to achieve face recognition.
In terms of feature recognition, the invention combines a HAAR model and the FaceNet framework to perform modeling and matching on the features of the multi-dimensional matrix representing the face picture, which can improve the recognition rate and accelerate the running speed. The invention can be applied to many video monitoring settings, including banks, shopping malls and enterprises.
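The three-module pipeline described above (acquisition, FaceNet embedding on the server, classification against a shared gallery) can be sketched as follows. This is a minimal illustration only: the detector and embedder are injected as functions, standing in for the HAAR detector and the FaceNet network detailed below, and all names are illustrative rather than taken from the patent.

```python
import numpy as np

class FaceRecognitionPipeline:
    """Sketch of the three-module pipeline: detect faces in a frame,
    embed each face, and classify the embedding by majority vote over
    its K nearest neighbors in a shared server-side gallery."""

    def __init__(self, detect_faces, embed, gallery_embeddings, gallery_labels, k=3):
        self.detect_faces = detect_faces              # frame -> list of face crops
        self.embed = embed                            # face crop -> feature vector
        self.gallery = np.asarray(gallery_embeddings) # known identities on the server
        self.labels = list(gallery_labels)
        self.k = k

    def identify(self, frame):
        """Detect every face in one video frame and return a label per face."""
        results = []
        for face in self.detect_faces(frame):
            emb = self.embed(face)
            # KNN over Euclidean distance to the shared gallery (step S3)
            d = np.linalg.norm(self.gallery - emb, axis=1)
            nearest = np.argsort(d)[: self.k]
            votes = [self.labels[i] for i in nearest]
            results.append(max(set(votes), key=votes.count))
        return results
```

Because the gallery and model live on the server, every connected camera system can call the same `identify` logic, which is the model-sharing point the patent emphasizes.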
Correspondingly, a FaceNet-based multi-camera dynamic face recognition method of the present invention, as shown in Fig. 1, comprises the following steps:
Step S1: capture video of pedestrians from different angles, and crop face frames from the video stream, frame by frame, using a HAAR model.
All camera monitoring systems are started, and video of pedestrians is captured in real time from different angles by the cameras; opening the cameras and recording may be realized here using prior-art OpenCV techniques.
In the embodiment of the present invention, face frames are extracted from the consecutive image frames of the video stream, and faces are detected using HAAR features; a schematic diagram of extracting faces from consecutive image frames is shown in Fig. 2. HAAR features reflect the gray-level changes of an image: pixels are grouped into blocks and the differences between blocks are computed. They fall into four classes: edge features, linear features, center features and diagonal features. Black and white rectangular boxes are combined into feature templates, and a template may contain several combinations. For example, as shown in Fig. 3, one black and one white rectangular box combine into a two-rectangle feature, three white-black-white boxes combine into a three-rectangle feature, and four black and white boxes combine into a four-rectangle feature template. The feature value of a template is the sum of the pixels in its black rectangles minus the sum of the pixels in its white rectangles.
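As a concrete illustration of the "black minus white" template value, a two-rectangle (edge) HAAR feature can be evaluated with an integral image, which reduces each rectangle sum to four table lookups. This is a minimal sketch of the feature computation only, not the patent's detector; in practice a trained prior-art cascade (e.g. OpenCV's `CascadeClassifier`) would be used, and the left-black/right-white orientation here is one illustrative choice.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero row and column prepended,
    so ii[i, j] equals the sum of img[0:i, 0:j]."""
    return np.pad(img, ((1, 0), (1, 0))).cumsum(0).cumsum(1)

def rect_sum(ii, y, x, h, w):
    """Sum of pixels in the h-by-w rectangle with top-left corner (y, x)."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def haar_edge_feature(img, y, x, h, w):
    """Two-rectangle edge feature at (y, x): the black (left) h-by-w
    rectangle sum minus the white (right) h-by-w rectangle sum."""
    ii = integral_image(np.asarray(img, dtype=np.int64))
    black = rect_sum(ii, y, x, h, w)
    white = rect_sum(ii, y, x + w, h, w)
    return black - white
```

A uniform image yields a feature value of zero, while a vertical brightness edge between the two halves yields a large value, which is exactly the gray-level change the text describes.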
The number N of HAAR features contained in a detection window is calculated as:
N = X · Y · (W + 1 − w · (X + 1)/2) · (H + 1 − h · (Y + 1)/2)
where W×H is the picture size, w×h is the rectangle-feature size, and X = ⌊W/w⌋ and Y = ⌊H/h⌋ are the maximum scale factors by which the rectangle feature can be enlarged in the horizontal and vertical directions, respectively.
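The count above can be evaluated directly. The sketch below assumes the standard Viola-Jones counting formula with X and Y as the maximal horizontal and vertical scale factors; the 24×24 window with a 2×1 template is the classic setting from that literature, used here only as a check.

```python
def haar_feature_count(W, H, w, h):
    """Number of positions and scales of a w x h rectangle template
    inside a W x H window:
        N = X * Y * (W + 1 - w*(X+1)/2) * (H + 1 - h*(Y+1)/2),
    where X = floor(W/w) and Y = floor(H/h) are the maximum horizontal
    and vertical scale factors."""
    X, Y = W // w, H // h
    return int(X * Y * (W + 1 - w * (X + 1) / 2) * (H + 1 - h * (Y + 1) / 2))
```

For instance, a 1×1 template in a 2×2 window gives 9 features (four at scale 1×1, two each at 2×1 and 1×2, one at 2×2), which the formula reproduces.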
Step S2: extract facial features from all face frames using the FaceNet framework.
After face pictures are obtained by detection with HAAR features, each detected face picture must be transformed, frame by frame, into a multi-dimensional matrix. A CNN maps the multi-dimensional matrix to a feature vector in Euclidean space, and the distances between the facial features of different pictures are computed. FaceNet is trained on the prior knowledge that the distance between faces of the same individual is always smaller than the distance between faces of different individuals. At test time, only the facial features need to be computed; the distance is then calculated, and a threshold determines whether two face photos belong to the same individual.
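The threshold-based verification just described can be sketched as follows. This is an illustrative sketch, not the patent's implementation: FaceNet embeddings are L2-normalized, and the threshold value 1.1 on the squared distance is a typical choice from common practice, not a value given by the patent.

```python
import numpy as np

def l2_normalize(v):
    """Project an embedding onto the unit hypersphere, as FaceNet does."""
    return v / np.linalg.norm(v)

def same_person(emb_a, emb_b, threshold=1.1):
    """Verification as described above: two face photos are declared the
    same individual iff the squared Euclidean distance between their
    (normalized) embeddings falls below a threshold."""
    d = np.sum((l2_normalize(emb_a) - l2_normalize(emb_b)) ** 2)
    return bool(d < threshold)
```

The training objective below guarantees exactly the property this test relies on: same-identity pairs end up closer than different-identity pairs.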
Unlike other applications of deep learning to faces, FaceNet does not perform classification learning with a traditional softmax and then extract some intermediate layer as the feature; instead, it directly learns, end to end, an encoding from the image into Euclidean space, and then performs face recognition, face verification, face clustering and so on based on this encoding.
The FaceNet algorithm removes the final softmax and instead trains the model by computing distances over tuples. The image representation learned in this way is very compact, using only 128 dimensions. Obtaining the multi-dimensional matrix of the face picture yields more detailed facial feature points, so as to meet the requirements of real face recognition accuracy.
The model framework of FaceNet is shown in Fig. 4: the Deep Architecture is a convolutional neural network with the softmax removed; after L2 normalization the feature representation is obtained, and the triplet loss is computed on this representation. The FaceNet model itself is prior art, so its framework is not repeated here; see the prior art.
A so-called triplet (composed of an Anchor, a Positive and a Negative: any picture can serve as a base point A, a picture of the same person is its P, and a picture of a different person is its N) is simply three samples, e.g. (anchor, pos, neg) (see Fig. 4). The learning process is then to find a representation such that, for as many triplets as possible, the distance between anchor and pos is smaller than the distance between anchor and neg, because only then can it be guaranteed that, as described above, the distance between faces of the same individual is always smaller than the distance between faces of different individuals. That is:
‖f(x_a) − f(x_p)‖² + α < ‖f(x_a) − f(x_n)‖²
Transforming this, the objective function is obtained:
L = Σ_i max(0, ‖f(x_i^a) − f(x_i^p)‖² − ‖f(x_i^a) − f(x_i^n)‖² + α)
where the norm on the left denotes the intra-class distance, the norm on the right denotes the inter-class distance, and α is a constant margin. The meaning of the objective function is to optimize only the triplets that do not satisfy the condition, i.e. to make the distance between anchor and pos smaller, as far as possible, than the distance between anchor and neg; triplets that already satisfy the condition are disregarded for the time being.
The selection of triplets is extremely important for the convergence of the model. For a given anchor x_a, one needs to select the picture of the same person that attains argmax_p ‖f(x_a) − f(x_p)‖² (the hard positive), and likewise the picture of a different person that attains argmin_n ‖f(x_a) − f(x_n)‖² (the hard negative). In actual training, computing this argmax and argmin over all training samples is impractical, and since the accuracy of the image labels is subject to error, convergence of training would be very difficult.
Therefore, in the present invention, the argmax and argmin are computed and filtered on a subset every n (n ∈ N*) steps, i.e. the iteration step length is enlarged, reducing the number of iterations for the model to converge while suppressing, as far as possible, the factors that hinder convergence.
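The objective and the subset-mining rule above can be sketched in NumPy as follows: `triplet_loss` is the hinge form of the objective, and `mine_on_subset` performs the argmax/argmin selection on a candidate subset, as would be done every n steps. Function names are illustrative, not from the patent.

```python
import numpy as np

def triplet_loss(anchor, pos, neg, alpha=0.2):
    """Hinge form of the objective above:
    max(0, ||f(a)-f(p)||^2 - ||f(a)-f(n)||^2 + alpha)."""
    d_ap = np.sum((anchor - pos) ** 2, axis=-1)
    d_an = np.sum((anchor - neg) ** 2, axis=-1)
    return np.maximum(0.0, d_ap - d_an + alpha)

def mine_on_subset(anchor, positives, negatives):
    """Offline mining on a subset: pick the hard positive
    argmax_p ||f(a)-f(p)||^2 and the hard negative
    argmin_n ||f(a)-f(n)||^2 for the given anchor embedding."""
    d_pos = np.sum((positives - anchor) ** 2, axis=1)
    d_neg = np.sum((negatives - anchor) ** 2, axis=1)
    return positives[np.argmax(d_pos)], negatives[np.argmin(d_neg)]
```

A triplet that already satisfies the margin contributes zero loss, so gradient effort is spent only on the violating triplets, which is the filtering behavior the text describes.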
Step S3: classify the extracted features to achieve face recognition and identity matching.
In the embodiment of the present invention, the features extracted in the previous step are classified using the prior-art KNN algorithm. KNN classifies by measuring the distances between different feature values. Its idea is: if most of the K most similar samples of a given sample (i.e. its nearest neighbors in feature space) belong to some category, then the sample also belongs to that category. K is typically an integer no greater than 20. In the KNN algorithm, the selected neighbors are objects that have already been classified correctly. In making its class decision, the method determines the category of the sample to be classified only according to the categories of the one or several nearest samples.
In KNN, the distance between objects is computed and used as a dissimilarity index between them, which avoids matching problems between objects. The distance used here is generally the Euclidean distance or the Manhattan distance:
Euclidean distance: d(x, y) = √( Σ_i (x_i − y_i)² )
where x and y are coordinates in Euclidean space.
Manhattan distance: d(x, y) = Σ_i |x_i − y_i|
where x and y are coordinates in Manhattan (taxicab) space.
Meanwhile KNN is by according to the classification progress decision being dominant in K object, rather than single object type decision. This 2 points be exactly KNN algorithms advantage.The detailed process that KNN algorithms are classified is:Known to training intensive data and label In the case of, the feature of test data feature corresponding with training set is compared to each other, finds instruction by input test data Practice and concentrate the most similar preceding K data therewith, then the corresponding classification of the test data be exactly in K data occurrence number it is most That classification, algorithm is described as:
1) compute the distance between the test data and each training datum;
2) sort by increasing distance;
3) select the K points with the smallest distances;
4) determine the frequency of occurrence of the categories of the first K points;
5) return the most frequent category among the first K points as the predicted classification of the test data.
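The five steps above can be sketched directly as a minimal majority-vote KNN over NumPy feature vectors. This is an illustrative sketch (in practice a library implementation such as scikit-learn's `KNeighborsClassifier` could serve the same role):

```python
import numpy as np
from collections import Counter

def knn_classify(test_vec, train_data, train_labels, k=5):
    """Majority-vote KNN, following the five steps above."""
    d = np.linalg.norm(np.asarray(train_data) - test_vec, axis=1)  # step 1: distances
    nearest = np.argsort(d)[:k]                                    # steps 2-3: sort, take K
    votes = Counter(train_labels[i] for i in nearest)              # step 4: class frequencies
    return votes.most_common(1)[0][0]                              # step 5: majority class
```

Applied to FaceNet embeddings, `train_data` would be the gallery of known 128-dimensional face vectors on the server and `train_labels` the corresponding identities.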
The previously trained model is shared with all monitoring systems connected to the server, i.e. all monitoring systems can directly invoke the same trained model through the server platform; once an identified feature is similar to a facial feature in the database, identity matching is carried out inside the server.
The FaceNet-based multi-camera dynamic face recognition system and method proposed by the invention combine a HAAR model and the FaceNet framework to perform modeling and matching on facial features, which can effectively improve the ability of a security monitoring system to recognize faces under different angles and illumination conditions. The model is shared with all monitoring devices connected to the server, i.e. all monitoring devices can directly invoke the same trained model through the server platform to perform face identity matching and identify target identities in time. Compared with traditional monitoring schemes, this greatly improves the intelligence and safety of the monitoring system.
The above are only preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art may make several improvements and modifications without departing from the technical principles of the invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (6)

1. A FaceNet-based multi-camera dynamic face recognition system, the applicable scene comprising multiple camera monitoring systems mounted in a managed region, characterized in that the cameras of the monitoring systems capture video of pedestrians in real time from different angles, and all monitoring systems are connected over a network to the same server and share its intranet; the face recognition system comprises a video acquisition module, a feature extraction module and a feature classification module,
wherein in the video acquisition module, all monitoring systems capture pedestrian video streams in real time from different angles with their respective cameras; face frames are cropped from the video stream, frame by frame, using a HAAR model, and all face frames are sent to the server;
the feature extraction module, on the server, extracts facial features from all face frames using the FaceNet framework;
the feature classification module classifies the extracted facial features to achieve face recognition.
2. The FaceNet-based multi-camera dynamic face recognition system according to claim 1, characterized in that the trained FaceNet model on the server is shared with all monitoring systems connected to the server.
3. A FaceNet-based multi-camera dynamic face recognition method, characterized by comprising the following steps:
Step S1: capture video of pedestrians from different angles, and crop face frames from the video stream, frame by frame, using a HAAR model;
Step S2: extract facial features from all face frames using the FaceNet framework;
Step S3: classify the extracted features to achieve face recognition.
4. The FaceNet-based multi-camera dynamic face recognition method according to claim 3, characterized in that each face frame is represented by a 128-dimensional multi-dimensional matrix.
5. The FaceNet-based multi-camera dynamic face recognition method according to claim 3, characterized in that in step S2, during triplet training, the argmax and argmin are computed and filtered on a subset every n (n ∈ N*) steps, i.e. the iteration step length is enlarged, reducing the number of iterations for the model to converge.
6. The FaceNet-based multi-camera dynamic face recognition method according to claim 3, characterized in that in step S3, classification is performed using the KNN method.
CN201810370308.3A 2018-04-24 2018-04-24 Multi-cam dynamic human face recognition system based on FaceNet and method Pending CN108564053A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810370308.3A CN108564053A (en) 2018-04-24 2018-04-24 Multi-cam dynamic human face recognition system based on FaceNet and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810370308.3A CN108564053A (en) 2018-04-24 2018-04-24 Multi-cam dynamic human face recognition system based on FaceNet and method

Publications (1)

Publication Number Publication Date
CN108564053A true CN108564053A (en) 2018-09-21

Family

ID=63536231

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810370308.3A Pending CN108564053A (en) 2018-04-24 2018-04-24 Multi-cam dynamic human face recognition system based on FaceNet and method

Country Status (1)

Country Link
CN (1) CN108564053A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110427843A (en) * 2019-07-18 2019-11-08 广州利科科技有限公司 A kind of face intelligent identification Method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101373514A (en) * 2007-08-24 2009-02-25 李树德 Method and system for recognizing human face
CN107403173A (en) * 2017-08-21 2017-11-28 合肥麟图信息科技有限公司 A kind of face identification system and method
CN107480658A (en) * 2017-09-19 2017-12-15 苏州大学 Face identification device and method based on multi-angle video

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101373514A (en) * 2007-08-24 2009-02-25 李树德 Method and system for recognizing human face
CN107403173A (en) * 2017-08-21 2017-11-28 合肥麟图信息科技有限公司 A kind of face identification system and method
CN107480658A (en) * 2017-09-19 2017-12-15 苏州大学 Face identification device and method based on multi-angle video

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
FLORIAN SCHROFF et al.: "FaceNet: A Unified Embedding for Face Recognition and Clustering", 2015 CVPR *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110427843A (en) * 2019-07-18 2019-11-08 广州利科科技有限公司 A kind of face intelligent identification Method
CN110427843B (en) * 2019-07-18 2021-07-13 广州利科科技有限公司 Intelligent face recognition method

Similar Documents

Publication Publication Date Title
CN108564052A (en) Multi-cam dynamic human face recognition system based on MTCNN and method
CN109819208B (en) Intensive population security monitoring management method based on artificial intelligence dynamic monitoring
CN110210276A (en) A kind of motion track acquisition methods and its equipment, storage medium, terminal
US6807286B1 (en) Object recognition using binary image quantization and hough kernels
Sun et al. Photo assessment based on computational visual attention model
Nguyen et al. Anomaly detection in traffic surveillance videos with gan-based future frame prediction
CN110543811B (en) Deep learning-based non-cooperative examination personnel management method and system
CN112183468A (en) Pedestrian re-identification method based on multi-attention combined multi-level features
CN111325051A (en) Face recognition method and device based on face image ROI selection
CN111353399A (en) Tamper video detection method
CN109902681B (en) User group relation determining method, device, equipment and storage medium
CN108363771B (en) Image retrieval method for public security investigation application
Heng et al. How to assess the quality of compressed surveillance videos using face recognition
CN110827432A (en) Class attendance checking method and system based on face recognition
CN112766119A (en) Method for accurately identifying strangers and constructing community security based on multi-dimensional face analysis
KR20210040604A (en) Action recognition method and device
KR20190071452A (en) Apparatus and method for object detection with shadow removed
CN116824641B (en) Gesture classification method, device, equipment and computer storage medium
CN112488072A (en) Method, system and equipment for acquiring face sample set
CN108564053A (en) Multi-cam dynamic human face recognition system based on FaceNet and method
CN111800428A (en) Real-time statistical method and system for digital conference participation
CN115471901A (en) Multi-pose face frontization method and system based on generation of confrontation network
CN115346169A (en) Method and system for detecting sleep post behaviors
CN116546304A (en) Parameter configuration method, device, equipment, storage medium and product
Charran et al. Real-Time Identity Censorship of Videos to Enable Live Telecast Using NVIDIA Jetson Nano

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180921
