CN111914613B - Multi-target tracking and facial feature information recognition method - Google Patents

Multi-target tracking and facial feature information recognition method

Info

Publication number
CN111914613B
CN111914613B (application CN202010437613.7A)
Authority
CN
China
Prior art keywords
face
image
target
feature
adaptive
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010437613.7A
Other languages
Chinese (zh)
Other versions
CN111914613A (en
Inventor
朱全银
马思伟
曹猛
李佳冬
高尚兵
李翔
陈伯伦
曹苏群
马甲林
周泓
马天龙
申奕
王梦迪
倪金霆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huaiyin Institute of Technology
Original Assignee
Huaiyin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huaiyin Institute of Technology filed Critical Huaiyin Institute of Technology
Priority to CN202010437613.7A priority Critical patent/CN111914613B/en
Publication of CN111914613A publication Critical patent/CN111914613A/en
Application granted granted Critical
Publication of CN111914613B publication Critical patent/CN111914613B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/40Image enhancement or restoration using histogram techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/70Denoising; Smoothing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/277Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20024Filtering details
    • G06T2207/20032Median filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multi-target tracking and facial feature information recognition method. A face video is first input and converted into a face key-frame set SF; the data set SFD is then obtained by classifying and data-enhancing SF; SFD features are extracted with an improved mini-Xception model to obtain the adaptive optimizing facial feature recognition model FFs_model; the key-frame sequence in the face data stream FEV is extracted and the face tracking features are adaptively aggregated to obtain the multi-target face key-frame position set FEC; the FFs_model is loaded and FEC is input to generate the multi-target facial feature classification result set FECR; finally, the adaptive recognition interface is opened to process terminal requests, and the resulting check-in and target feature tracking state recognition results in the multi-target scene are collected on the Web server. The method combines improved multi-target tracking with adaptive facial feature recognition, can effectively obtain the picture feature result label with the highest accuracy, and increases the accuracy and practical value of face tracking and facial feature recognition in multi-target scenes.

Description

Multi-target tracking and facial feature information recognition method
Technical Field
The invention belongs to the field of image classification and feature recognition, and particularly relates to a multi-target tracking and facial feature information recognition method.
Background
In recent years, the rapid development of computer technology and AI has led people to pay increasing attention to tracking and recognizing facial feature information by computational means. Detecting and tracking the facial feature information of multiple targets in real-time video has important practical significance in places such as schools and railway stations: it can provide schools with an effective set of in-class feature information and can help railway stations avoid certain safety hazards. The invention therefore provides a multi-target tracking and facial feature information recognition method based on an improved mini-Xception model and adaptive optimization. Facial feature information of multiple targets can be recognized in real time from video, the face tracking and recognition results under multi-target conditions become more accurate, and the practical value of face tracking and facial feature information recognition in multi-target scenes is increased.
Zhu Quanyin et al. have carried out prior research, including: Quanyin Zhu, Suqun Cao. A Novel Classes-independent Feature Selection Algorithm for Imbalanced Data. 2009, p.77-82; Li Xiang, Zhu Quanyin. Collaborative filtering recommendation [J]. Computer Science and Exploration, 2014, 8(6):751-759; Quanyin Zhu, Yunyang Yan, Jin Ding, Jin Qian. The Case Study for Price Extracting of Mobile Phone Sell Online. 2011, p.282-285; Quanyin Zhu, Suqun Cao, Pei Zhou, Yunyang Yan, Hong Zhou. Integrated Price Forecast based on Dichotomy Backfilling and Disturbance Factor Algorithm. International Review on Computers and Software, 2011, Vol.6(6):1089-1093. Zhu Quanyin et al. have also filed, published and been granted related patents: Zhu Quanyin, Liu Tao, Yan Yunyang, Gao Shangbing, et al. An OpenCV-based construction drawing label information detection method. Chinese patent publication No. CN109002824A, 2018.12.14; Zhu Quanyin, Xu Kang, Zong Hui, Feng Moli, et al. A building element extraction method based on the Faster-RCNN model. Chinese patent publication No. CN109002841A, 2018.12.14; Feng Moli, Yan Yunyang, Yang Maocan, Zhu Quanyin, et al. An intelligent terminal IC card authorizing and managing method for an identity authentication system. Chinese patent publication No. CN107016310B, 2019.12.10; Zhu Quanyin, Hu Ronglin, Feng Moli, et al. An expert combination recommendation method based on knowledge graphs. Chinese patent publication No. CN109062961A, 2018.12.21; Ma S, Cao M, Li J, et al. A Face Sequence Recognition Method Based on Deep Convolutional Neural Network [C]// 2019 18th International Symposium on Distributed Computing and Applications for Business Engineering and Science (DCABES). IEEE, 2019: 104-107.
Convolutional neural network:
A convolutional neural network (Convolutional Neural Network, CNN) is a feed-forward neural network whose artificial neurons respond to surrounding units within part of their coverage area, giving it excellent performance on large-scale image processing. A CNN takes image pixels as input and outputs features of the image; these feature outputs form the input to the logical layers, which process the features efficiently. The typical structure of a convolutional neural network comprises an input layer, convolution layers, pooling layers, fully connected layers and an output layer.
And (3) target detection:
Object detection is one of the four basic tasks in computer vision: finding all objects of interest in an image and determining their positions and sizes. It is one of the core problems in the field of machine vision. Because objects vary in appearance, shape and pose, and imaging is disturbed by factors such as illumination and occlusion, object detection has always been one of the most challenging problems in machine vision. The mainstream object detection algorithms and frameworks currently include the Faster R-CNN, SSD and YOLO detection frameworks.
Face tracking:
face Tracking (Face Tracking) refers to the process of determining the motion trajectory and size change of a Face in an input image sequence. The face tracking technology has important potential application value, and is used as a key technology in the fields of automatic face recognition, model-based coding, video retrieval, visual monitoring and the like. The common starting point of the face tracking method based on skin color information, the motion model and the local organ characteristics is that the related heuristic knowledge (such as defining a search space according to skin color and motion information) is utilized to achieve the purpose of rapid tracking, and only the distribution information of a small part of the face or local organs is generally used, so that a good face tracking effect can be achieved under some typical constraint environments.
BN (BatchNormalization) algorithm:
The BN algorithm normalizes the data of intermediate network layers and, through a transform-and-reconstruct scheme, introduces the learnable parameters γ and β so that the network can learn to recover the feature distribution learned by the original network. This solves the problem that normalizing a layer's output data would otherwise distort the features learned by that layer.
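For reference, the standard batch normalization transform with the learnable parameters γ and β (a textbook formulation, not specific to this patent) is:

x̂_i = (x_i − μ_B) / sqrt(σ_B² + ε),    y_i = γ · x̂_i + β,

where μ_B and σ_B² are the mean and variance of the current mini-batch and ε is a small constant for numerical stability.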
In multi-target feature information tracking and detection, most existing research addresses only one side of the problem, such as multi-target tracking or the feature recognition model alone. Research on adaptive classification of multi-target features in key-frame sequences with temporal attributes is lacking, information fusion is limited to a single source, and the efficiency of multi-target feature information tracking and analysis on data with temporal attributes is therefore constrained.
Disclosure of Invention
Objective of the invention: in view of the problems in the prior art, the invention provides a multi-target tracking and facial feature information recognition method to solve the problem of facial feature information recognition with multiple targets.
Technical solution: to solve the above technical problems, the invention provides a multi-target tracking and facial feature information recognition method, comprising the following specific steps:
(1) Input a facial feature information video, convert the video into a key-frame sequence S, and obtain the face key-frame set SF after face detection;
(2) Classify SF by labeling its feature points to obtain the label set SFL, and preprocess and data-enhance SFL to obtain the data set SFD;
(3) Perform adaptive feature extraction on SFD with an improved mini-Xception model and fuse the extracted feature vectors to obtain the adaptive optimizing facial feature recognition model FFs_model;
(4) Let the acquired facial feature data stream be FEV; cyclically extract the key-frame sequence in FEV, adaptively track the multiple target faces, and aggregate the extracted face tracking features to obtain the multi-target face key-frame position set FEC;
(5) Load the facial feature recognition model FFs_model and input the multi-target key-frame position set FEC into the trained adaptive optimizing model to generate the multi-target facial feature classification result set FECR;
(6) Open the facial feature adaptive recognition interface FFs API. A user initiates an HTTP request through a client program; the FFs API performs adaptive multi-target feature recognition on the request parameters, stores the resulting set FECR of check-in and target feature information tracking state recognition in the adaptive multi-target scene on the Web platform server, and returns the core information to the calling program. The user can obtain a visual display through the Web platform.
Further, the specific steps of step (1) are as follows:
(1.1) Input the image dataset S; define the function len(X) as the length of set X, and let S = {S_1, S_2, …, S_M}, where S_M is the M-th image in S, M ∈ [1, len(S)];
(1.2) Define a loop variable i1 for traversing S, i1 ∈ [1, len(S)], with an initial value of 1;
(1.3) If i1 ≤ len(S), go to step (1.4); otherwise go to step (1.10);
(1.4) Convert S_i1 to grayscale to obtain gray_S_i1;
(1.5) Enhance the grayscale image gray_S_i1 by converting it into the histogram-equalized image hist_S_i1;
(1.6) Apply median filtering to hist_S_i1 to obtain med_S_i1;
(1.7) Sharpen med_S_i1 to obtain sha_S_i1;
(1.8) Perform face detection on sha_S_i1 with the Haar cascade classifier in OpenCV, extract the faces, and place the detected faces into SF;
(1.9) i1 = i1 + 1; go to step (1.3);
(1.10) Face extraction is complete.
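A minimal OpenCV sketch of the preprocessing chain in steps (1.4)-(1.8) follows. The 3×3 sharpening kernel and the cascade file haarcascade_frontalface_default.xml are not specified in the patent and are assumptions used only for illustration.

import cv2
import numpy as np

# Haar cascade bundled with opencv-python; the specific cascade file is an assumption
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def extract_faces(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)            # (1.4) grayscale
    hist = cv2.equalizeHist(gray)                             # (1.5) histogram equalization
    med = cv2.medianBlur(hist, 3)                             # (1.6) median filtering
    kernel = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]])  # (1.7) sharpening (assumed kernel)
    sha = cv2.filter2D(med, -1, kernel)
    # (1.8) Haar cascade face detection; return the cropped face regions
    faces = face_cascade.detectMultiScale(sha, scaleFactor=1.1, minNeighbors=5)
    return [sha[y:y + h, x:x + w] for (x, y, w, h) in faces]

# SF would then collect the crops of every key frame in S:
# SF = [face for frame in S for face in extract_faces(frame)]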
Further, the specific steps of step (2) are as follows:
(2.1) Input the face dataset SF;
(2.2) Label the face dataset SF and divide the pictures into seven basic facial feature categories;
(2.3) Define id and label as the serial number and the label of the feature of a single labeled face picture, satisfying feature = {id, label}; let SFL = {feature_0, feature_1, …, feature_m}, where feature_m is the m-th face image in SFL and len(SFL) is the number of images in the dataset;
(2.4) Define a loop variable i2 for traversing SFL, i2 ∈ [1, len(SFL)], with an initial value of 1; image_i2 is the i2-th face image in SFL;
(2.5) If i2 ≤ len(SFL), go to step (2.6); otherwise go to step (2.12);
(2.6) Scale image_i2 proportionally to obtain img_i2, satisfying img_i2 = {img_shape, depth=3}, where img_shape = {height=48, width=48};
(2.7) Randomly adjust the brightness of img_i2 to obtain brig_img_i2;
(2.8) Rotate img_i2 once to obtain rota_img_i2;
(2.9) Mirror img_i2 to obtain mirr_img_i2;
(2.10) Perform face alignment on img_i2, rota_img_i2, brig_img_i2 and mirr_img_i2 to obtain alig_image_i2;
(2.11) i2 = i2 + 1; go to step (2.5);
(2.12) Obtain the image set SFD = {alig_image_1, alig_image_2, …, alig_image_3m}.
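A minimal OpenCV sketch of the augmentation in steps (2.6)-(2.10) follows. The brightness range, the 10-degree rotation angle and the horizontal flip are illustrative assumptions; the patent only states that the brightness is adjusted randomly, the image is rotated once and mirrored, and the variants are face-aligned before entering SFD.

import random
import cv2

def augment(image_i2):
    # (2.6) proportional scaling to 48x48; the input is assumed to be a 3-channel crop
    img = cv2.resize(image_i2, (48, 48))
    # (2.7) random brightness adjustment (range is an assumption)
    brig_img = cv2.convertScaleAbs(img, alpha=random.uniform(0.7, 1.3))
    # (2.8) one rotation (10 degrees here, an assumption)
    h, w = img.shape[:2]
    m = cv2.getRotationMatrix2D((w / 2, h / 2), 10, 1.0)
    rota_img = cv2.warpAffine(img, m, (w, h))
    # (2.9) horizontal mirror
    mirr_img = cv2.flip(img, 1)
    # (2.10) face alignment would be applied to each variant before adding it to SFD;
    # the alignment procedure itself is not specified in the patent
    return [img, brig_img, rota_img, mirr_img]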
Further, the specific steps of step (3) are as follows:
(3.1) Define a loop variable i3 for traversing SFD, i3 ∈ [1, len(SFD)], with an initial value of 1;
(3.2) If i3 ≤ len(SFD), go to step (3.3); otherwise go to step (3.17);
(3.3) Let the tensor set in the dataset SFD be Ts and the label set be Lk;
(3.4) Pass the tensor Ts through two 3×3 convolution layers with ReLU activation and batch normalization; let the output be C_0;
(3.5) Define a loop variable j3 with the number of topology layers n = 5, j3 ∈ [1, n], and an initial value of 1;
(3.6) If j3 ≤ n, go to step (3.7); otherwise go to step (3.12);
(3.7) Pass C_0 through a 1×1 convolution layer with stride 2 and batch normalization; let the output be C_1;
(3.8) Pass C_1 through a 3×3 separable convolution layer with batch normalization and ReLU activation; let the output be C_2;
(3.9) Pass C_2 through a 5×5 convolution layer with batch normalization; let the output be C_3;
(3.10) Pass C_3 through a pooling layer with window size 3×3 and stride 2; let the output be P_j3 and assign it to C_0;
(3.11) j3 = j3 + 1; go to step (3.6);
(3.12) Sum the resulting tensor set P = {P_1, P_2, P_3, P_4, P_5}; let the output be TP;
(3.13) Pass TP through a 3×3 convolution layer and then into the fully connected layer; let the output be F;
(3.14) Pass F through a neural network with softmax as the activation function to obtain the predicted label lk;
(3.15) Set a multi-class logarithmic loss function and the Adam optimizer to obtain the facial feature classification result R = {Lk, lk};
(3.16) i3 = i3 + 1; go to step (3.2);
(3.17) After model training is complete, obtain the facial feature recognition model FFs_model.
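To make the block structure of steps (3.4)-(3.15) concrete, a minimal Keras sketch follows. The patent does not state the filter counts, nor how the five block outputs P_1…P_5 (which have different spatial sizes) are aligned before the summation in step (3.12); the 64-filter width, the global average pooling applied before the sum, and the dense layer standing in for the final 3×3 convolution are assumptions made only for illustration.

import tensorflow as tf
from tensorflow.keras import layers, models

def build_ffs_model(num_classes=7, input_shape=(48, 48, 3), n_blocks=5):
    inp = layers.Input(shape=input_shape)
    # (3.4) two 3x3 convolutions with batch normalization and ReLU -> C_0
    x = layers.Conv2D(64, 3, padding="same")(inp)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(64, 3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    c0 = layers.ReLU()(x)
    pooled_blocks = []
    for _ in range(n_blocks):                                         # (3.5)-(3.11), n = 5
        c1 = layers.Conv2D(64, 1, strides=2, padding="same")(c0)      # (3.7)
        c1 = layers.BatchNormalization()(c1)
        c2 = layers.SeparableConv2D(64, 3, padding="same")(c1)        # (3.8)
        c2 = layers.BatchNormalization()(c2)
        c2 = layers.ReLU()(c2)
        c3 = layers.Conv2D(64, 5, padding="same")(c2)                 # (3.9)
        c3 = layers.BatchNormalization()(c3)
        p = layers.MaxPooling2D(pool_size=3, strides=2, padding="same")(c3)  # (3.10)
        pooled_blocks.append(p)
        c0 = p                                                        # P_j3 becomes the next C_0
    # (3.12) sum the block outputs; global average pooling is used here so the
    # differently sized tensors can be added (an assumption, see lead-in)
    summed = layers.Add()([layers.GlobalAveragePooling2D()(p) for p in pooled_blocks])
    # (3.13)-(3.14) the patent routes TP through a 3x3 convolution before the
    # fully connected layer; with pooled vectors a dense layer stands in for that step
    f = layers.Dense(128, activation="relu")(summed)
    out = layers.Dense(num_classes, activation="softmax")(f)
    model = models.Model(inp, out)
    # (3.15) multi-class log loss with the Adam optimizer
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    return model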
Further, the specific steps of step (4) are as follows:
(4.1) Input the face information video data stream FEV;
(4.2) Convert the video stream FEV into a frame sequence to obtain the video frame sequence set FEVC = {fecframe_1, fecframe_2, …, fecframe_M}, where fecframe_M is the M-th group of frame sequences in FEVC;
(4.3) Define a feature list object for recording the tracked face targets; define a loop variable i4 for traversing FEVC, i4 ∈ [1, len(FEVC)], with an initial value of 1; image_i4 is the i4-th video key frame in FEVC;
(4.4) Traverse FEVC: if i4 ≤ len(FEVC), go to step (4.5); otherwise end the traversal and go to step (4.19);
(4.5) Convert the BGR-format image_i4 into a grayscale picture to obtain gray_image_i4;
(4.6) Load the face detection classifier face_detection and cyclically detect faces in gray_image_i4;
(4.7) Define a face-detection flag d_flag for the current key frame: d_flag = 1 means a face is detected in gray_image_i4, and d_flag = 0 means no face is detected in gray_image_i4;
(4.8) If d_flag = 1, go to step (4.9); otherwise go to step (4.17);
(4.9) Extract the face features of all persons in the key frame with the adaptive aggregation network;
(4.10) If the system already has tracking targets in object, go to step (4.11); otherwise go to step (4.18);
(4.11) Use the adaptive aggregation network to compute an aggregated feature for each tracking target's feature list;
(4.12) Define a face association-matching flag m_flag for the current key frame: m_flag = 1 means gray_image_i4 is matched to face features in object, and m_flag = 0 means gray_image_i4 is not matched to face features in object;
(4.13) Construct a similarity matrix from the aggregated features and the predicted positions to associate and match tracking targets;
(4.14) If m_flag = 1, go to step (4.17); otherwise go to step (4.15);
(4.15) Create new tracking targets, create a feature list for each tracking target, and store them in object;
(4.16) If the system has tracking targets in object, go to step (4.18); otherwise go to step (4.19);
(4.17) Add the extracted features to the feature list of each tracking target;
(4.18) Predict the next-frame position of each tracking target in object with a Kalman filter, and remove trackers that have not been matched to a target for a long time;
(4.19) Increment the variable i4, i.e. i4 = i4 + 1, and go to step (4.4);
(4.20) Obtain the video-frame face position set FEC = {face_1, face_2, …, face_i4}, where face_i4 is the i4-th face in FEC.
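The association logic of steps (4.9)-(4.18) can be sketched as a single per-key-frame update, shown below. The adaptive aggregation network, the similarity measure and the matching thresholds are not specified in the patent; extract_features and aggregate are therefore passed in as placeholder callables, cosine similarity with the Hungarian algorithm is used as an illustrative association step, and the Kalman position prediction of step (4.18) is omitted for brevity.

import numpy as np
from scipy.optimize import linear_sum_assignment

def cosine_sim(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def update_tracks(tracks, detections, extract_features, aggregate,
                  sim_threshold=0.5, max_missed=30):
    # tracks: list of dicts {"features": [...], "missed": int};
    # detections: face crops found in the current key frame (steps 4.5-4.8)
    feats = [extract_features(d) for d in detections]              # (4.9)
    matched_tracks, matched_dets = set(), set()
    if tracks and feats:
        # (4.11)-(4.13) aggregate each target's feature list, build a similarity
        # matrix and associate detections to targets (Hungarian assignment)
        agg = [aggregate(t["features"]) for t in tracks]
        sim = np.array([[cosine_sim(a, f) for f in feats] for a in agg])
        rows, cols = linear_sum_assignment(-sim)                   # maximize similarity
        for r, c in zip(rows, cols):
            if sim[r, c] >= sim_threshold:                         # (4.14) matched
                tracks[r]["features"].append(feats[c])             # (4.17)
                matched_tracks.add(r)
                matched_dets.add(c)
    n_existing = len(tracks)
    for c in range(len(feats)):                                    # (4.15) new targets
        if c not in matched_dets:
            tracks.append({"features": [feats[c]], "missed": 0})
    for r in range(n_existing):                                    # (4.18) age out stale trackers
        tracks[r]["missed"] = 0 if r in matched_tracks else tracks[r]["missed"] + 1
    return [t for t in tracks if t["missed"] <= max_missed]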
Further, the specific steps of step (5) are as follows:
(5.1) Input the video-frame face position set FEC = {face_1, face_2, …, face_N}, where face_N is the key-frame position area of the N-th tracked face in FEC;
(5.2) Define Rid and Rlabel as the serial number and feature tag of face_N in a single labeled face picture of FEC, satisfying Rfeature = {Rid, Rlabel}; let FECL = {Rfeature_0, Rfeature_1, …, Rfeature_m}, where Rfeature_m is the m-th face image in FECL and len(FECL) is the length of the set FECL;
(5.3) Load the facial feature recognition model FFs_model; define a loop variable i5 for traversing FECL, i5 ∈ [1, len(FECL)], with an initial value of 1; define gray_face_i5 as the i5-th face grayscale image in FECL;
(5.4) If i5 ≤ len(FECL), go to step (5.5); otherwise go to step (5.12);
(5.5) Obtain the rectangular face position face_coordinates_i5 = {x1, y1, x2, y2}, where x1, y1, x2, y2 are the corner coordinates of the face-region bounding rectangle;
(5.6) Crop the face region to obtain gray_face_i5, where gray_face_i5 = gray_image_i5[y1:y2, x1:x2];
(5.7) Scale gray_face_i5 proportionally to obtain img_i5, satisfying img_i5 = {img_shape, depth=3}, where img_shape = {height=48, width=48};
(5.8) Normalize img_i5 to obtain norm_img_i5;
(5.9) Feed norm_img_i5 into the adaptive aggregation network facial feature classifier to obtain the facial feature prediction FFs_prediction_i5;
(5.10) Through the adaptive optimizing aggregation network, select the facial feature label with the maximum tendency in FFs_prediction_i5, FFs_label_arg_i5, to obtain the facial feature prediction value and prediction probability Rlabel_i5, where Rlabel_i5 = FFs_label_arg_i5;
(5.11) i5 = i5 + 1; go to step (5.4);
(5.12) Obtain the facial feature classification result set FECR = {FFs_image_1, FFs_image_2, …, FFs_image_m}, where FFs_image = {Rid, Rlabel}, Rid and Rlabel are the serial number of the face feature result and the feature recognition prediction label, and FFs_image_m is the m-th result sequence in the result set FECR;
(5.13) Adaptively aggregate the FECR feature information to obtain the face information of the monitored objects in the scene, yielding a result set of multi-target tracking check-in attendance and facial feature change tracking and recognition states.
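A minimal sketch of the per-face inference path of steps (5.5)-(5.10) follows, assuming an FFs_model with the Keras interface sketched earlier. The seven label names are illustrative assumptions; the patent only states that seven basic facial feature categories are used.

import cv2
import numpy as np

ASSUMED_LABELS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

def classify_face(ffs_model, gray_image, face_coordinates):
    x1, y1, x2, y2 = face_coordinates                         # (5.5) bounding rectangle
    gray_face = gray_image[y1:y2, x1:x2]                      # (5.6) crop the face region
    img = cv2.resize(gray_face, (48, 48))                     # (5.7) scale to 48x48
    img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)               # depth = 3
    norm_img = img.astype("float32") / 255.0                  # (5.8) normalization
    pred = ffs_model.predict(norm_img[np.newaxis, ...])[0]    # (5.9) FFs_prediction
    label_arg = int(np.argmax(pred))                          # (5.10) FFs_label_arg
    return {"Rlabel": ASSUMED_LABELS[label_arg], "probability": float(pred[label_arg])}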
Further, the specific steps of step (6) are as follows:
(6.1) Open the facial feature adaptive recognition interface FFs API;
(6.2) Create a thread pool (Thread Pool);
(6.3) Judge whether all tasks of the thread pool have been executed; if so, go to step (6.9), otherwise go to step (6.4);
(6.4) A user initiates an HTTP request through the client program;
(6.5) A child thread (Child Thread) picks up the task for processing;
(6.6) The adaptive recognition interface FFs API performs adaptive multi-target feature recognition on the request parameters initiated by the user, and stores the resulting set FECR of attendance check-in and target feature information tracking state recognition in the adaptive multi-target scene on the Web platform;
(6.7) Return the recognized multi-target feature state core parameter information (the FFs result set) to the calling program; the detailed core result set FECR is retained on the Web platform for visual display;
(6.8) End the child thread and go to step (6.3);
(6.9) Close the thread pool;
(6.10) After facial feature adaptive recognition is finished, return the visualized result set to the client for display.
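The serving flow of steps (6.1)-(6.10) can be sketched with Flask and a thread pool as below. Flask itself, the /ffs route, the request fields and run_pipeline are assumptions made for illustration; the patent only describes an HTTP interface backed by a thread pool that stores the detailed FECR result set on the Web platform and returns the core information to the caller.

from concurrent.futures import ThreadPoolExecutor
from flask import Flask, request, jsonify

app = Flask(__name__)
pool = ThreadPoolExecutor(max_workers=4)   # (6.2) create the thread pool
RESULT_STORE = {}                          # stands in for the Web-platform storage

def run_pipeline(video_url):
    # placeholder for steps (1)-(5): detect, track and classify the faces in the video
    return []

def recognize(task_id, video_url):
    # (6.6) adaptive multi-target feature recognition on the requested video
    fecr = run_pipeline(video_url)
    RESULT_STORE[task_id] = fecr           # keep the detailed FECR for visual display
    return fecr

@app.route("/ffs", methods=["POST"])       # (6.1)/(6.4) client initiates an HTTP request
def ffs_api():
    params = request.get_json()
    task_id = params["task_id"]
    future = pool.submit(recognize, task_id, params["video_url"])  # (6.5) child thread
    fecr = future.result()
    # (6.7) return only the core parameter information to the caller
    core = [{"Rid": r.get("Rid"), "Rlabel": r.get("Rlabel")} for r in fecr]
    return jsonify({"task_id": task_id, "results": core})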
By adopting the above technical solution, the invention has the following beneficial effects:
The method performs facial feature information recognition on face photos obtained from the video stream, recognizes the multi-target face position information in the video-stream key frames through adaptive multi-target tracking, and extracts features with the improved mini-Xception convolutional adaptive network, thereby realizing real-time tracking and recognition of multi-target facial feature information. The method overcomes the limitations of traditional detection and facial feature information recognition approaches; by combining improved multi-target tracking with adaptive facial feature information recognition, it can effectively obtain the picture facial feature information result label with the highest accuracy, makes the facial feature information recognition results for multiple target faces more accurate on data with temporal attributes, and increases the practical value of face tracking and facial feature information recognition in multi-target scenes.
Drawings
FIG. 1 is a general flow chart of the present invention;
FIG. 2 is a flow chart of face detection for the key frame sequence of FIG. 1;
FIG. 3 is a flow chart of the pre-processing and enhancing of the detected face data of FIG. 1;
FIG. 4 is a flow chart of the training feature extraction neural network model of FIG. 1;
FIG. 5 is a flow chart of the adaptive multi-objective tracking of FIG. 1 to obtain a set of face keyframe locations;
FIG. 6 is a flow chart of a facial feature classification result set obtained by adaptively optimizing the loading training model in FIG. 1;
fig. 7 is a flow chart of facial feature recognition by the facial feature adaptive recognition interface of fig. 1.
Detailed Description
The present invention is further illustrated below by an example of multi-target tracking check-in attendance of student faces and tracking recognition of students' facial feature information in a campus classroom scene. These examples are merely illustrative of the invention and do not limit its scope; after reading the invention, various equivalent modifications by those skilled in the art fall within the scope defined by the appended claims of the present application.
As shown in figs. 1 to 7, the multi-target tracking and facial feature information recognition method based on the improved mini-Xception model and adaptive optimization according to the present invention comprises the following steps:
step 1: inputting student facial feature information video, converting the video into a key frame sequence S, and obtaining a human face key frame set which is SF after human face detection;
step 1.1: input image dataset S, definition function len (X) represents the length of set X, let s= { S 1 ,S 2 ,…,S M S, where S M Representing the Mth image in S, M.epsilon.1, len (S)];
Step 1.2: defining a circulation variable i1 for traversing S, wherein i1 epsilon [1, len (S) ], and the initial value of i1 is 1;
step 1.3: if i1 is less than or equal to len (S), entering a step 1.4, otherwise entering a step 1.10;
step 1.4: for S i1 Gray_S is obtained by gray scale processing i1
Step 1.5: enhancing the gray scale image gray_s i1 Conversion to histogram equalization image hist_s i1
Step 1.6: for hist_S i1 Median filtering processing is carried out to obtain med_S i1
Step 1.7: for med_S i1 Sharpening to obtain sha_S i1
Step 1.8: sha_S pair using Haar Cascade classifier in OpenCV i1 Face detection is carried out, a student face is extracted, and the obtained face is put into SF;
step 1.9: i1 =i1+1, go to step 1.3;
step 1.10: and (5) finishing face extraction.
Step 2: Classify SF by labeling its feature points to obtain the label set SFL, and preprocess and data-enhance SFL to obtain the data set SFD;
Step 2.1: Input the student face dataset SF;
Step 2.2: Label the face dataset SF and divide the pictures into seven basic facial feature categories;
Step 2.3: Define id and label as the serial number and the label of a single labeled face picture's feature, satisfying feature = {id, label}; let SFL = {feature_0, feature_1, …, feature_m}, where feature_m is the m-th face image in SFL and len(SFL) is the number of images in the dataset;
Step 2.4: Define a loop variable i2 for traversing SFL, i2 ∈ [1, len(SFL)], with an initial value of 1; image_i2 is the i2-th face image in SFL;
Step 2.5: If i2 ≤ len(SFL), go to step 2.6; otherwise go to step 2.12;
Step 2.6: Scale image_i2 proportionally to obtain img_i2, satisfying img_i2 = {img_shape, depth=3}, where img_shape = {height=48, width=48};
Step 2.7: Randomly adjust the brightness of img_i2 to obtain brig_img_i2;
Step 2.8: Rotate img_i2 once to obtain rota_img_i2;
Step 2.9: Mirror img_i2 to obtain mirr_img_i2;
Step 2.10: Perform face alignment on img_i2, rota_img_i2, brig_img_i2 and mirr_img_i2 to obtain alig_image_i2;
Step 2.11: i2 = i2 + 1; go to step 2.5;
Step 2.12: Obtain the image set SFD = {alig_image_1, alig_image_2, …, alig_image_3m}.
Step 3: Perform adaptive feature extraction on SFD with an improved mini-Xception model and fuse the extracted feature vectors to obtain the adaptive optimizing facial feature recognition model FFs_model;
Step 3.1: Define a loop variable i3 for traversing SFD, i3 ∈ [1, len(SFD)], with an initial value of 1;
Step 3.2: If i3 ≤ len(SFD), go to step 3.3; otherwise go to step 3.17;
Step 3.3: Let the tensor set in the dataset SFD be Ts and the label set be Lk;
Step 3.4: Pass the tensor Ts through two 3×3 convolution layers with ReLU activation and batch normalization; let the output be C_0;
Step 3.5: Define a loop variable j3 with the number of topology layers n = 5, j3 ∈ [1, n], and an initial value of 1;
Step 3.6: If j3 ≤ n, go to step 3.7; otherwise go to step 3.12;
Step 3.7: Pass C_0 through a 1×1 convolution layer with stride 2 and batch normalization; let the output be C_1;
Step 3.8: Pass C_1 through a 3×3 separable convolution layer with batch normalization and ReLU activation; let the output be C_2;
Step 3.9: Pass C_2 through a 5×5 convolution layer with batch normalization; let the output be C_3;
Step 3.10: Pass C_3 through a pooling layer with window size 3×3 and stride 2; let the output be P_j3 and assign it to C_0;
Step 3.11: j3 = j3 + 1; go to step 3.6;
Step 3.12: Sum the resulting tensor set P = {P_1, P_2, P_3, P_4, P_5}; let the output be TP;
Step 3.13: Pass TP through a 3×3 convolution layer and then into the fully connected layer; let the output be F;
Step 3.14: Pass F through a neural network with softmax as the activation function to obtain the predicted label lk;
Step 3.15: Set a multi-class logarithmic loss function and the Adam optimizer to obtain the facial feature classification result R = {Lk, lk};
Step 3.16: i3 = i3 + 1; go to step 3.2;
Step 3.17: After model training is complete, obtain the facial feature recognition model FFs_model.
Step 4: Let the acquired facial feature data stream be FEV; cyclically extract the key-frame sequence in FEV, adaptively track the multiple target faces, and aggregate the extracted face tracking features to obtain the multi-target face key-frame position set FEC;
Step 4.1: Input the face information video data stream FEV;
Step 4.2: Convert the video stream FEV into a frame sequence to obtain the video frame sequence set FEVC = {fecframe_1, fecframe_2, …, fecframe_M}, where fecframe_M is the M-th group of frame sequences in FEVC;
Step 4.3: Define a feature list object for recording the tracked face targets; define a loop variable i4 for traversing FEVC, i4 ∈ [1, len(FEVC)], with an initial value of 1; image_i4 is the i4-th video key frame in FEVC;
Step 4.4: Traverse FEVC: if i4 ≤ len(FEVC), go to step 4.5; otherwise end the traversal and go to step 4.19;
Step 4.5: Convert the BGR-format image_i4 into a grayscale picture to obtain gray_image_i4;
Step 4.6: Load the face detection classifier face_detection and cyclically detect faces in gray_image_i4;
Step 4.7: Define a face-detection flag d_flag for the current key frame: d_flag = 1 means a face is detected in gray_image_i4, and d_flag = 0 means no face is detected in gray_image_i4;
Step 4.8: If d_flag = 1, go to step 4.9; otherwise go to step 4.17;
Step 4.9: Extract the face features of all persons in the key frame with the adaptive aggregation network;
Step 4.10: If the system already has tracking targets in object, go to step 4.11; otherwise go to step 4.18;
Step 4.11: Use the adaptive aggregation network to compute an aggregated feature for each tracking target's feature list;
Step 4.12: Define a face association-matching flag m_flag for the current key frame: m_flag = 1 means gray_image_i4 is matched to face features in object, and m_flag = 0 means gray_image_i4 is not matched to face features in object;
Step 4.13: Construct a similarity matrix from the aggregated features and the predicted positions to associate and match tracking targets;
Step 4.14: If m_flag = 1, go to step 4.17; otherwise go to step 4.15;
Step 4.15: Create new tracking targets, create a feature list for each tracking target, and store them in object;
Step 4.16: If the system has tracking targets in object, go to step 4.18; otherwise go to step 4.19;
Step 4.17: Add the extracted features to the feature list of each tracking target;
Step 4.18: Predict the next-frame position of each tracking target in object with a Kalman filter, and remove trackers that have not been matched to a target for a long time;
Step 4.19: Increment the variable i4, i.e. i4 = i4 + 1, and go to step 4.4;
Step 4.20: Obtain the student-face position set of the video frames FEC = {face_1, face_2, …, face_i4}, where face_i4 is the i4-th face in FEC.
Step 5: Load the facial feature recognition model FFs_model and input the multi-target key-frame position set FEC into the trained adaptive optimizing model to generate the multi-target facial feature classification result set FECR;
Step 5.1: Input the video-frame face position set FEC = {face_1, face_2, …, face_N}, where face_N is the key-frame position area of the N-th tracked face in FEC;
Step 5.2: Define Rid and Rlabel as the serial number and feature tag of face_N in a single labeled face picture of FEC, satisfying Rfeature = {Rid, Rlabel}; let FECL = {Rfeature_0, Rfeature_1, …, Rfeature_m}, where Rfeature_m is the m-th face image in FECL and len(FECL) is the length of the set FECL;
Step 5.3: Load the facial feature recognition model FFs_model; define a loop variable i5 for traversing FECL, i5 ∈ [1, len(FECL)], with an initial value of 1; define gray_face_i5 as the i5-th face grayscale image in FECL;
Step 5.4: If i5 ≤ len(FECL), go to step 5.5; otherwise go to step 5.12;
Step 5.5: Obtain the rectangular face position face_coordinates_i5 = {x1, y1, x2, y2}, where x1, y1, x2, y2 are the corner coordinates of the face-region bounding rectangle;
Step 5.6: Crop the face region to obtain gray_face_i5, where gray_face_i5 = gray_image_i5[y1:y2, x1:x2];
Step 5.7: Scale gray_face_i5 proportionally to obtain img_i5, satisfying img_i5 = {img_shape, depth=3}, where img_shape = {height=48, width=48};
Step 5.8: Normalize img_i5 to obtain norm_img_i5;
Step 5.9: Feed norm_img_i5 into the adaptive aggregation network facial feature classifier to obtain the facial feature prediction FFs_prediction_i5;
Step 5.10: Through the adaptive optimizing aggregation network, select the facial feature label with the maximum tendency in FFs_prediction_i5, FFs_label_arg_i5, to obtain the facial feature prediction value and prediction probability Rlabel_i5, where Rlabel_i5 = FFs_label_arg_i5;
Step 5.11: i5 = i5 + 1; go to step 5.4;
Step 5.12: Obtain the facial feature classification result set FECR = {FFs_image_1, FFs_image_2, …, FFs_image_m}, where FFs_image = {Rid, Rlabel}, Rid and Rlabel are the serial number of the face feature result and the feature recognition prediction label, and FFs_image_m is the m-th result sequence in the result set FECR.
Step 5.13: Adaptively aggregate the FECR feature information to obtain the face information of the monitored objects in the scene, yielding a result set of multi-target tracking check-in attendance and facial feature change tracking and recognition states.
Step 6: Open the facial feature adaptive recognition interface FFs API. A user initiates an HTTP request through a client program; the FFs API performs adaptive multi-target feature recognition on the request parameters initiated by the user, stores the resulting set FECR of check-in and target feature information tracking state recognition in the adaptive multi-target scene on the Web platform server, and returns the core information to the calling program; the user can obtain a visual display through the Web platform;
Step 6.1: Open the facial feature adaptive recognition interface FFs API;
Step 6.2: Create a thread pool (Thread Pool);
Step 6.3: Judge whether all tasks of the thread pool have been executed; if so, go to step 6.9, otherwise go to step 6.4;
Step 6.4: A user initiates an HTTP request through the client program;
Step 6.5: A child thread (Child Thread) picks up the task for processing;
Step 6.6: The adaptive recognition interface FFs API performs adaptive multi-target feature recognition on the request parameters initiated by the user, and stores the resulting set FECR of attendance check-in and target feature information tracking state recognition in the adaptive multi-target scene on the Web platform;
Step 6.7: Return the recognized multi-target feature state core parameter information (the FFs result set) to the calling program; the detailed core result set FECR is retained on the Web platform for visual display;
Step 6.8: End the child thread and go to step 6.3;
Step 6.9: Close the thread pool;
Step 6.10: After facial feature adaptive recognition is finished, return the visualized result set to the client for display.
To better illustrate the effectiveness of the method, 8956 student-face key-frame sequences were processed, with feature extraction performed by the improved mini-Xception neural network and adaptive multi-target tracking applied to obtain the results. The method further improves the accuracy of multi-target facial feature recognition on video streams and achieves an accuracy of 98.96% on the classification results.
A table in the original publication (rendered there as images) gives a detailed description of all the variables involved in the above steps.
The invention can be combined with a computer system to automatically track and classify multi-target facial features.
The invention creatively provides a multi-target tracking and facial feature information recognition method based on an improved mini-Xception model and adaptive optimization, and obtains, through repeated experiments, the optimal multi-target facial feature classification results in key-frame sequences.
The multi-target tracking and facial feature information recognition method based on the improved mini-Xception model and adaptive optimization provided by the invention can be used to track and classify the facial features of multiple target students in a key-frame sequence, and can also be used for tracking and classification of other sequence data.

Claims (4)

1. A multi-target tracking and facial feature information recognition method, characterized by comprising the following specific steps:
(1) Input a facial feature information video, convert the video into a key-frame sequence S, and obtain the face key-frame set SF after face detection; the function len(X) represents the length of the set X;
(2) Classify SF by labeling its feature points to obtain the label set SFL, and preprocess and data-enhance SFL to obtain the data set SFD;
(3) Perform adaptive feature extraction on SFD with an improved mini-Xception model and fuse the extracted feature vectors to obtain the adaptive optimizing facial feature recognition model FFs_model;
(4) Let the acquired facial feature data stream be FEV; cyclically extract the key-frame sequence in FEV, adaptively track the multiple target faces, and aggregate the extracted face tracking features to obtain the multi-target face key-frame position set FEC;
(5) Load the adaptive optimizing facial feature recognition model FFs_model and input the multi-target face key-frame position set FEC into it to generate the multi-target facial feature classification result set FECR;
(6) Open the facial feature adaptive recognition interface FFs API; a user initiates an HTTP request through a client program; the FFs API performs adaptive multi-target feature recognition on the request parameters initiated by the user, stores the resulting facial feature classification result set FECR of check-in and target feature information tracking state recognition in the adaptive multi-target scene on the Web platform server, and returns the core information to the calling program; the user can obtain a visual display through the Web platform;
the specific steps for obtaining the adaptive optimizing facial feature recognition model FFs_model in step (3) are as follows:
(3.1) Define a loop variable i3 for traversing SFD, i3 ∈ [1, len(SFD)], with an initial value of 1;
(3.2) If i3 ≤ len(SFD), go to step (3.3); otherwise go to step (3.17);
(3.3) Let the image set in the dataset SFD be Ts and the label set be Lk;
(3.4) Pass the image T_i3 in the image set Ts through two 3×3 convolution layers with ReLU activation and batch normalization; let the output be C_0;
(3.5) Define a loop variable j3 with the number of topology layers n = 5, j3 ∈ [1, n], and an initial value of 1;
(3.6) If j3 ≤ n, go to step (3.7); otherwise go to step (3.12);
(3.7) Pass C_0 through a 1×1 convolution layer with stride 2 and batch normalization; let the output be C_1;
(3.8) Pass C_1 through a 3×3 separable convolution layer with batch normalization and ReLU activation; let the output be C_2;
(3.9) Pass C_2 through a 5×5 convolution layer with batch normalization; let the output be C_3;
(3.10) Pass C_3 through a pooling layer with window size 3×3 and stride 2; let the output be P_j3 and assign it to C_0;
(3.11) j3 = j3 + 1; go to step (3.6);
(3.12) Sum the resulting tensor set P = {P_1, P_2, P_3, P_4, P_5}; let the output be TP;
(3.13) Pass TP through a 3×3 convolution layer and then into the fully connected layer; let the output be F;
(3.14) Pass F through a neural network with softmax as the activation function to obtain the predicted label lk;
(3.15) Set a multi-class logarithmic loss function and the Adam optimizer to obtain the facial feature classification result R = {Lk, lk};
(3.16) i3 = i3 + 1; go to step (3.2);
(3.17) After model training is complete, obtain the adaptive optimizing facial feature recognition model FFs_model;
the specific steps for obtaining the multi-target face key-frame position set FEC in step (4) are as follows:
(4.1) Input the face information video data stream FEV;
(4.2) Convert the video stream FEV into a frame sequence to obtain the video frame sequence set FEVC = {fecframe_1, fecframe_2, …, fecframe_M}, where fecframe_M is the M-th group of frame sequences in FEVC;
(4.3) Define a feature list object for recording the tracked face targets; define a loop variable i4 for traversing FEVC, i4 ∈ [1, len(FEVC)], with an initial value of 1; image_i4 is the i4-th video key frame in FEVC;
(4.4) Traverse FEVC: if i4 ≤ len(FEVC), go to step (4.5); otherwise end the traversal and go to step (4.19);
(4.5) Convert the BGR-format image_i4 into a grayscale picture to obtain gray_image_i4;
(4.6) Load the face detection classifier face_detection and cyclically detect faces in gray_image_i4;
(4.7) Define a face-detection flag d_flag for the current key frame: d_flag = 1 means a face is detected in gray_image_i4, and d_flag = 0 means no face is detected in gray_image_i4;
(4.8) If d_flag = 1, go to step (4.9); otherwise go to step (4.17);
(4.9) Extract the face features of all persons in the key frame with the adaptive aggregation network;
(4.10) If the system already has tracking targets in object, go to step (4.11); otherwise go to step (4.18);
(4.11) Use the adaptive aggregation network to compute an aggregated feature for each tracking target's feature list;
(4.12) Define a face association-matching flag m_flag for the current key frame: m_flag = 1 means gray_image_i4 is matched to face features in object, and m_flag = 0 means gray_image_i4 is not matched to face features in object;
(4.13) Construct a similarity matrix from the aggregated features and the predicted positions to associate and match tracking targets;
(4.14) If m_flag = 1, go to step (4.17); otherwise go to step (4.15);
(4.15) Create new tracking targets, create a feature list for each tracking target, and store them in object;
(4.16) If the system has tracking targets in object, go to step (4.18); otherwise go to step (4.19);
(4.17) Add the extracted features to the feature list of each tracking target;
(4.18) Predict the next-frame position of each tracking target in object with a Kalman filter, and remove trackers that have not been matched to a target for a long time;
(4.19) Increment the variable i4, i.e. i4 = i4 + 1, and go to step (4.4);
(4.20) Obtain the video-frame face position set FEC = {face_1, face_2, …, face_i4}, where face_i4 is the i4-th face position in FEC;
the specific steps for generating the multi-target facial feature classification result set FECR in step (5) are as follows:
(5.1) Input the multi-target face key-frame position set FEC = {face_1, face_2, …, face_N}, where face_N is the key-frame position area of the N-th tracked face in FEC;
(5.2) Define Rid and Rlabel as the serial number and feature tag of face_N, satisfying Rfeature = {Rid, Rlabel}; let FECL = {Rfeature_0, Rfeature_1, …, Rfeature_m}, where Rfeature_m is the m-th face image in FECL and len(FECL) is the length of the set FECL;
(5.3) Load the facial feature recognition model FFs_model; define a loop variable i5 for traversing FECL, i5 ∈ [1, len(FECL)], with an initial value of 1; define gray_face_i5 as the i5-th face grayscale image in FECL;
(5.4) If i5 ≤ len(FECL), go to step (5.5); otherwise go to step (5.12);
(5.5) Obtain the rectangular face position face_coordinates_i5 = {x1, y1, x2, y2}, where x1, y1, x2, y2 are the corner coordinates of the face-region bounding rectangle;
(5.6) Crop the face region to obtain gray_face_i5, where gray_face_i5 = gray_image_i5[y1:y2, x1:x2];
(5.7) Scale gray_face_i5 proportionally to obtain img_i5, satisfying img_i5 = {img_shape, depth=3}, where img_shape = {height=48, width=48};
(5.8) Normalize img_i5 to obtain norm_img_i5;
(5.9) Feed norm_img_i5 into the adaptive aggregation network to obtain the facial feature prediction FFs_prediction_i5;
(5.10) Through the adaptive aggregation network, select the facial feature label with the maximum tendency in FFs_prediction_i5, FFs_label_arg_i5, to obtain the facial feature prediction value and prediction probability Rlabel_i5, where Rlabel_i5 = FFs_label_arg_i5;
(5.11) i5 = i5 + 1; go to step (5.4);
(5.12) Obtain the facial feature classification result set FECR = {FFs_image_1, FFs_image_2, …, FFs_image_m}, where FFs_image = {Rid, Rlabel}, Rid and Rlabel are the serial number of the face feature result and the feature recognition prediction label, and FFs_image_m is the m-th result sequence in the facial feature classification result set FECR;
(5.13) Adaptively aggregate the FECR feature information to obtain the face information of the monitored objects in the scene, yielding a result set of multi-target tracking check-in attendance and facial feature change tracking and recognition states.
2. The multi-target tracking and facial feature information recognition method according to claim 1, wherein the specific steps for obtaining the face key-frame set SF in step (1) are as follows:
(1.1) Input the image dataset S; define the function len(X) as the length of set X, and let S = {S_1, S_2, …, S_M}, where S_M is the M-th image in S, M ∈ [1, len(S)];
(1.2) Define a loop variable i1 for traversing S, i1 ∈ [1, len(S)], with an initial value of 1;
(1.3) If i1 ≤ len(S), go to step (1.4); otherwise go to step (1.10);
(1.4) Convert S_i1 to grayscale to obtain gray_S_i1;
(1.5) Enhance the grayscale image gray_S_i1 by converting it into the histogram-equalized image hist_S_i1;
(1.6) Apply median filtering to hist_S_i1 to obtain med_S_i1;
(1.7) Sharpen med_S_i1 to obtain sha_S_i1;
(1.8) Perform face detection on sha_S_i1 with the Haar cascade classifier in OpenCV, extract the faces, and place the detected faces into SF;
(1.9) i1 = i1 + 1; go to step (1.3);
(1.10) Face extraction is complete.
3. The multi-target tracking and facial feature information recognition method according to claim 1, wherein the specific steps for obtaining the dataset SFD in step (2) are as follows:
(2.1) Input the face key-frame set SF;
(2.2) Label the face key-frame set SF and divide the pictures into seven basic facial feature categories;
(2.3) Define id and label as the serial number and the label of the feature of a single labeled face picture, satisfying feature = {id, label}; let SFL = {feature_0, feature_1, …, feature_m}, where feature_m is the m-th face image in SFL and len(SFL) is the number of images in the dataset;
(2.4) Define a loop variable i2 for traversing SFL, i2 ∈ [1, len(SFL)], with an initial value of 1; image_i2 is the i2-th face image in SFL;
(2.5) If i2 ≤ len(SFL), go to step (2.6); otherwise go to step (2.12);
(2.6) Scale image_i2 proportionally to obtain img_i2, satisfying img_i2 = {img_shape, depth=3}, where img_shape = {height=48, width=48};
(2.7) Randomly adjust the brightness of img_i2 to obtain brig_img_i2;
(2.8) Rotate img_i2 once to obtain rota_img_i2;
(2.9) Mirror img_i2 to obtain mirr_img_i2;
(2.10) Perform face alignment on img_i2, rota_img_i2, brig_img_i2 and mirr_img_i2 to obtain alig_image_i2;
(2.11) i2 = i2 + 1; go to step (2.5);
(2.12) Obtain the image set Ts = {alig_image_1, alig_image_2, …, alig_image_3m}.
4. The method for multi-target tracking and facial feature information recognition according to claim 1, wherein in step (6) an open facial feature adaptive recognition interface FFs API is provided; a user initiates an HTTP request through a client program, the adaptive recognition interface FFs API performs adaptive multi-target feature recognition on the request parameters initiated by the user, stores the sign-in and facial feature classification result set FECR obtained under the adaptive multi-target scene on the server of the Web platform, and returns the core information to the calling program; the specific steps by which the user obtains a visual presentation through the Web platform are as follows (a minimal serving sketch follows the claim):
(6.1) Open the facial feature adaptive recognition interface FFs API;
(6.2) Create a Thread Pool;
(6.3) Judge whether all tasks of the Thread Pool have been executed; if so, go to step (6.9), otherwise go to step (6.4);
(6.4) The user initiates an HTTP request through the client program;
(6.5) A Child Thread acquires a task for processing;
(6.6) The adaptive recognition interface FFs API performs adaptive multi-target feature recognition on the request parameters initiated by the user, and stores the sign-in and facial feature classification result set FECR obtained under the adaptive multi-target scene on the Web platform;
(6.7) Return the adaptive multi-target feature core parameter information FFs result set identified in step (6.6) to the calling program, while the facial feature classification result set FECR is retained on the Web platform for visual display;
(6.8) End the Child Thread and return to step (6.3);
(6.9) Close the Thread Pool;
(6.10) After the facial feature adaptive recognition is finished, the visual result set is returned to the client for display.
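A minimal sketch of the request flow in steps (6.2)-(6.7), assuming a Flask endpoint backed by Python's concurrent.futures thread pool; the route name, pool size, and the recognize() placeholder are hypothetical, and the actual adaptive multi-target recognition and FECR storage would be provided by the trained models described in the earlier steps:

from concurrent.futures import ThreadPoolExecutor
from flask import Flask, jsonify, request

app = Flask(__name__)
pool = ThreadPoolExecutor(max_workers=8)        # step (6.2): assumed pool size

def recognize(params):
    # Placeholder for step (6.6): adaptive multi-target feature recognition;
    # a real implementation would run the models and persist FECR on the server.
    return {"sign_in": [], "facial_features": []}

@app.route("/ffs", methods=["POST"])            # hypothetical FFs API route
def ffs_api():
    params = request.get_json(force=True)       # step (6.4): request parameters from the client
    future = pool.submit(recognize, params)     # step (6.5): a child thread takes the task
    core_result = future.result()               # wait for completion
    return jsonify(core_result)                 # step (6.7): return core information to the caller

if __name__ == "__main__":
    app.run()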
CN202010437613.7A 2020-05-21 2020-05-21 Multi-target tracking and facial feature information recognition method Active CN111914613B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010437613.7A CN111914613B (en) 2020-05-21 2020-05-21 Multi-target tracking and facial feature information recognition method

Publications (2)

Publication Number Publication Date
CN111914613A CN111914613A (en) 2020-11-10
CN111914613B (en) 2024-03-01

Family

ID=73237563

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010437613.7A Active CN111914613B (en) 2020-05-21 2020-05-21 Multi-target tracking and facial feature information recognition method

Country Status (1)

Country Link
CN (1) CN111914613B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112446337B (en) * 2020-12-02 2022-10-11 电子科技大学 Wisdom museum management system based on expression discernment
CN112488013B (en) * 2020-12-04 2022-09-02 重庆邮电大学 Depth-forged video detection method and system based on time sequence inconsistency
CN112712044B (en) * 2021-01-05 2023-08-08 百果园技术(新加坡)有限公司 Face tracking method and device, electronic equipment and storage medium
CN114329472B (en) * 2021-12-31 2023-05-19 淮阴工学院 BIOS malicious program detection method and device based on dual embedding and model pruning
CN115731287B (en) * 2022-09-07 2023-06-23 滁州学院 Moving target retrieval method based on aggregation and topological space

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9122931B2 (en) * 2013-10-25 2015-09-01 TCL Research America Inc. Object identification system and method
US10534965B2 (en) * 2017-11-22 2020-01-14 Amazon Technologies, Inc. Analysis of video content
WO2019232099A1 (en) * 2018-05-29 2019-12-05 Google Llc Neural architecture search for dense image prediction tasks

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106980822A (en) * 2017-03-14 2017-07-25 北京航空航天大学 Rotating machinery fault diagnosis method based on selective ensemble learning
CN108765394A (en) * 2018-05-21 2018-11-06 上海交通大学 Target identification method based on quality evaluation
CN108717537A (en) * 2018-05-30 2018-10-30 淮阴工学院 Face recognition method and system for complex scenes based on pattern recognition
CN109492604A (en) * 2018-11-23 2019-03-19 北京嘉华科盈信息系统有限公司 Face model feature statistical analysis system
CN109829436A (en) * 2019-02-02 2019-05-31 福州大学 Multi-face tracking method based on depth appearance characteristics and self-adaptive aggregation network
CN110390275A (en) * 2019-07-04 2019-10-29 淮阴工学院 Gesture classification method based on transfer learning
CN110633669A (en) * 2019-09-12 2019-12-31 华北电力大学(保定) Mobile terminal face attribute identification method based on deep learning in home environment
CN111079670A (en) * 2019-12-20 2020-04-28 北京百度网讯科技有限公司 Face recognition method, face recognition device, face recognition terminal and face recognition medium
CN111126333A (en) * 2019-12-30 2020-05-08 齐齐哈尔大学 Garbage classification method based on light convolutional neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
FIRE-DET: an efficient flame detection model; Chen Haolin et al.; Journal of Nanjing University of Information Science and Technology (Natural Science Edition); Vol. 15, No. 01; pp. 76-84 *

Also Published As

Publication number Publication date
CN111914613A (en) 2020-11-10

Similar Documents

Publication Publication Date Title
CN111914613B (en) Multi-target tracking and facial feature information recognition method
Sixt et al. Rendergan: Generating realistic labeled data
Singh et al. A deeply coupled ConvNet for human activity recognition using dynamic and RGB images
Funk et al. Beyond planar symmetry: Modeling human perception of reflection and rotation symmetries in the wild
Hu Design and implementation of abnormal behavior detection based on deep intelligent analysis algorithms in massive video surveillance
Ghosh et al. Feature selection for facial emotion recognition using late hill-climbing based memetic algorithm
Wang et al. Adaptive fusion CNN features for RGBT object tracking
Suryawati et al. Unsupervised feature learning-based encoder and adversarial networks
Zhang et al. High-quality face image generation based on generative adversarial networks
Wang et al. Density-aware curriculum learning for crowd counting
Zhu et al. Camera style transformation with preserved self-similarity and domain-dissimilarity in unsupervised person re-identification
Zhu et al. Human action recognition based on fusion features extraction of adaptive background subtraction and optical flow model
Fan Research and realization of video target detection system based on deep learning
Liu et al. Attentive semantic and perceptual faces completion using self-attention generative adversarial networks
Zhang et al. Human motion tracking and 3D motion track detection technology based on visual information features and machine learning
Xi et al. SR-POD: sample rotation based on principal-axis orientation distribution for data augmentation in deep object detection
Singh et al. Real time object detection using neural networks: a comprehensive survey
Yu Deep learning methods for human action recognition
Goswami et al. A comprehensive review on real time object detection using deep learing model
Alsaedi et al. Design and Simulation of Smart Parking System Using Image Segmentation and CNN
Oner et al. An activity recognition application based on markov decision process through fish eye camera
You et al. A novel trajectory-vlad based action recognition algorithm for video analysis
Zhong et al. [Retracted] Video Tactical Intelligence Analysis Method of Karate Competition Based on Convolutional Neural Network
Kian Ara et al. Efficient face detection based crowd density estimation using convolutional neural networks and an improved sliding window strategy
Parkhi et al. Review on deep learning based techniques for person re-identification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant