CN116229381A - River and lake sand production ship face recognition method

River and lake sand production ship face recognition method

Info

Publication number
CN116229381A
Authority
CN
China
Prior art keywords: sand, collecting, network, ship, model
Prior art date: 2023-05-11
Legal status: Granted (the legal status is an assumption and is not a legal conclusion)
Application number
CN202310525507.8A
Other languages: Chinese (zh)
Other versions: CN116229381B (en)
Inventor: Bao Xuecai (包学才)
Current Assignee: Nanchang Institute of Technology (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Original Assignee: Nanchang Institute of Technology
Priority date: 2023-05-11 (the priority date is an assumption and is not a legal conclusion)
Filing date: 2023-05-11
Publication date: 2023-06-06
Application filed by Nanchang Institute of Technology (2023-05-11)
Priority to CN202310525507.8A
Publication of CN116229381A (2023-06-06)
Application granted; publication of CN116229381B (2023-07-07)
Legal status: Active (current)
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/25 - Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V10/70 - Arrangements using pattern recognition or machine learning
    • G06V10/764 - Using classification, e.g. of video objects
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; blind source separation
    • G06V10/80 - Fusion, i.e. combining data from various sources at the sensor, preprocessing, feature extraction or classification level
    • G06V10/806 - Fusion of extracted features
    • G06V10/82 - Using neural networks
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/50 - Context or environment of the image
    • G06V20/52 - Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 - Target detection


Abstract

The invention discloses a river and lake sand mining ship face recognition method aimed at the task of identifying individual sand mining ships. The activation function in the RetinaFace basic convolution layer is changed to the GELU activation function to form an improved backbone feature extraction network, which helps the training model converge better. An ECA attention module is introduced after each of the three effective feature layers of different sizes output by the backbone, and the Efficient-RepGFPN network of the DAMO-YOLO model is used as the feature fusion network, so that model accuracy is improved without adding much computation. In addition, a suitable network structure is searched with the MAE-NAS technique to serve as the FaceNet backbone, and a GAM attention mechanism is introduced into the FaceNet network, so that the whole model is strengthened and optimized for individual face recognition of sand mining ships on rivers and lakes.

Description

River and lake sand production ship face recognition method
Technical Field
The invention relates to the technical field of computer vision target recognition, and in particular to a method for recognizing the faces of sand mining ships on rivers and lakes.
Background
River sand resources are an important material basis for economic and social development, and sand mining management is an important part of river and lake management. Sand mining supervision mainly involves detecting whether illegal sand mining vessels operate in prohibited areas and monitoring whether sand mining damages the channel structure. Monitoring sand mining activity on inland waterways and lakes with a video surveillance system is one of the effective means of managing sand mining vessels, and target recognition of individual sand mining vessels is the core of accurate supervision and an essential part of intelligent river sand mining management. However, the prior art can only identify the type of sand mining vessel: traditional target detection algorithms judge the type and position of a ship and then compile channel statistics.
Such methods can only detect the type and position of a sand mining ship; they cannot identify an individual ship or determine whether it is operating illegally. These algorithms therefore cannot support fine-grained management of sand mining ships and cannot be put into practical use.
To solve these problems, a river and lake sand mining ship face recognition method is provided and applied to the ship face recognition task for river and lake sand mining ships.
Disclosure of Invention
The invention provides a river and lake sand mining ship face recognition method, which adopts the following technical scheme.
A river and lake sand mining ship face recognition method comprises the following steps:
S1, collect historical sand mining ship pictures captured by cameras installed along rivers and lakes, including pictures taken in sunny and rainy weather, pictures taken in the evening and at night, and pictures taken with the ship face at different positions; screen the collected pictures to obtain a sand mining ship picture dataset; and use OpenCV to add salt-and-pepper noise, color jitter and color gamut distortion to the dataset so that it can be used for training;
S2, label the collected pictures of the different individual sand mining ships to form dataset I for training the RetinaFace model;
S3, crop the ship face images of the sand mining ships in dataset I and place the pictures belonging to the same ship in one folder to form dataset II for training the FaceNet model;
S4, for the specific task of individual sand mining ship face recognition, improve the RetinaFace model and the FaceNet model respectively to achieve higher accuracy;
The improved RetinaFace model is characterized in that the LeakyReLU activation function in the basic convolution layer of the RetinaFace backbone is replaced with the GELU activation function to form a CBG convolution module, giving a new backbone feature extraction network; the original RetinaFace backbone outputs three effective feature layers, and an ECA attention structure is added after each of the output layers C3, C4 and C5 to enhance adaptive attention to the region of interest, the shapes of C5, C4 and C3 being (20, 20), (40, 40) and (80, 80) respectively; the Efficient-RepGFPN network of the DAMO-YOLO model is used as the feature fusion network, and the feature fusion realized by linearly stacked convolution layers is improved to CSPNet connections, so that high-level semantic information and low-level spatial information are fully exchanged and model accuracy is improved; finally, the outputs of the effective feature layers are connected to the feature fusion network;
The improved FaceNet model is characterized in that the heuristic, training-free search method MAE-NAS is used to quickly search a wide range of backbone structures at different scales and to form a low-cost customized model under a latency budget, yielding the improved FaceNet backbone; based on a one-shot approach, latency data for all operators used are first obtained by sampling on the target device, the latency of a candidate model is predicted from these operator latencies, and if the predicted model size meets the preset target the candidate proceeds to subsequent updating and scoring, with iterative updating finally yielding the optimal model that satisfies the latency constraint; the finally determined backbone has a parameter size of 89 MB and 15.4 GFLOPs of computation; finally, a GAM attention mechanism is introduced after the global average pooling layer behind the backbone to enhance the network's adaptive attention to the region of interest;
S5, train the improved RetinaFace target detection model with dataset I and the improved FaceNet individual ship face recognition model with dataset II, obtaining an optimal model for each;
S6, use the trained RetinaFace+FaceNet model to encode the output feature vectors, obtaining NPY files containing the ship face information and the names of the sand mining ships as the database;
S7, use the improved RetinaFace+FaceNet model obtained by training to jointly recognize the individual ship faces of river and lake sand mining ships.
Preferably, pictures of different sand mining ships in river and lake sand mining areas are preprocessed by data augmentation and then labeled with the Labelme annotation tool to form target detection dataset I for training RetinaFace, and the individual ship faces in the pictures are cropped to form individual ship face dataset II for training FaceNet.
Preferably, the pictures in the sand mining ship picture dataset prepared for training in step S1 are numbered, and the labeling tool Labelme is used to annotate the target detection and keypoint information of the different sand mining ships in the river and lake sand mining areas, forming dataset I as COCO-format JSON annotation files.
Preferably, the JSON annotation files generated for dataset I are converted to text files in batches and all text files are merged into one text file containing the image path and name, the coordinates of the top-left and bottom-right corners of the bounding box, and the keypoint coordinates; this text file together with the original images constitutes dataset I.
Preferably, a captured sand mining ship picture is fed into the RetinaFace target detection and keypoint localization algorithm for classification and localization, and whether a sand mining ship is present in the picture is judged; if one is present, the target is marked with a prediction box and the ship face information is marked with keypoints, the picture is then rectified to the horizontal and vertical directions, the ship target selected by the box is cropped, and the cropped individual ship face picture is passed to the improved FaceNet network;
The cropped individual ship face image is passed to the improved FaceNet network to recognize the individual ship face, and the recognition output is compared with the encoded feature vectors in the database: the Euclidean distance between the feature vector output by the network for the picture to be recognized and each feature vector in the database is computed one by one; if a distance is smaller than the threshold determined from the training-set features during training, the input ship face is judged to be that of a ship stored in the database, an early warning is issued, manual intervention follows, the two ships with matching features are checked manually, and once confirmed the ship's name and number are marked in the image and displayed;
If the Euclidean distance between the input individual ship face information and every feature vector in the database is not smaller than the preset threshold, the input ship does not belong to the ships recorded in the database and is judged to be an illegally operating ship; the illegal sand mining ship is marked in the output and an early warning is issued to alert management staff.
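As an illustration of the matching step described above, the following is a minimal Python sketch of comparing a query feature vector with the encoded database by Euclidean distance; the NPY file names, the threshold value and the random query vector are assumptions made for the example, not values specified in the patent.

```python
import numpy as np

# Hypothetical database files produced in step S6: 128-d ship face encodings
# and the corresponding ship names (file names and threshold are illustrative).
db_vectors = np.load("ship_face_encodings.npy")   # shape (num_ships, 128)
db_names = np.load("ship_face_names.npy")         # shape (num_ships,)
THRESHOLD = 1.0                                    # in practice derived from training-set features

def match_ship(query_vec):
    """Return (ship name, distance) if matched, or (None, distance) if unregistered."""
    dists = np.linalg.norm(db_vectors - query_vec, axis=1)   # Euclidean distance to each entry
    best = int(np.argmin(dists))
    if dists[best] < THRESHOLD:
        return db_names[best], dists[best]    # registered ship: raise alert for manual check
    return None, dists[best]                  # unregistered ship: judged as illegal operation

name, dist = match_ship(np.random.rand(128).astype(np.float32))
print(name, dist)
```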
Compared with the prior art, the invention has the following beneficial effects:
According to the method, the activation function in the RetinaFace basic convolution layer is changed to the GELU activation function to form an improved backbone feature extraction network, which helps the training model converge better. An ECA attention module is introduced after each of the three effective feature layers of different sizes output by the backbone, and the Efficient-RepGFPN network of the DAMO-YOLO model is used as the feature fusion network; the feature fusion realized by linearly stacked convolution layers is improved to CSPNet connections, so that high-level semantic information and low-level spatial information are fully exchanged and model accuracy is improved without adding much computation. The backbone of the improved FaceNet is formed with the heuristic, training-free search method MAE-NAS, which quickly searches a wide range of backbone structures at different scales; MAE-NAS uses information theory to evaluate an initialized network from the perspective of entropy, and because the evaluation requires no training it overcomes the drawback of training-based search, allows a wide range of networks to be searched in a short time and reduces the search cost. A GAM attention mechanism is then introduced into FaceNet, so that the whole model recognizes individual sand mining ship faces better than conventional networks, including in difficult conditions.
The deep-learning-based individual face recognition algorithm for river and lake sand mining ships overcomes the shortcoming that existing ship supervision technology can only judge the type of a sand mining ship and cannot accurately recognize each individual ship; it recognizes the individual characteristics of illegal sand mining ships, reduces the labor of manual monitoring and improves the efficiency of sand mining ship management.
Drawings
Fig. 1 is a flow chart of the river and lake sand mining ship face recognition method.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely; obviously, the described embodiments are only some, not all, of the embodiments of the present invention.
Example 1
Referring to Fig. 1, a river and lake sand mining ship face recognition method comprises the following steps:
S1, collect historical sand mining ship pictures captured by cameras installed along rivers and lakes, including pictures taken in sunny and rainy weather, pictures taken in the evening and at night, and pictures taken with the ship face at different positions; screen the collected pictures to obtain a sand mining ship picture dataset; and use OpenCV to add salt-and-pepper noise, color jitter and color gamut distortion to the dataset so that it can be used for training;
S2, label the collected pictures of the different individual sand mining ships to form dataset I for training the RetinaFace model;
S3, crop the ship face images of the individual sand mining ships in dataset I and place the pictures belonging to the same ship in one folder to form dataset II for training the FaceNet model;
S4, for the specific task of individual sand mining ship face recognition, improve the RetinaFace model and the FaceNet model respectively to achieve higher accuracy;
The improved RetinaFace model is characterized in that the LeakyReLU activation function in the basic convolution layer of the RetinaFace backbone is replaced with the GELU activation function to form a CBG convolution module, giving a new backbone feature extraction network; the original RetinaFace backbone outputs three effective feature layers, and an ECA attention structure is added after each of the output layers C3, C4 and C5 to enhance adaptive attention to the region of interest, the shapes of C5, C4 and C3 being (20, 20), (40, 40) and (80, 80) respectively; the Efficient-RepGFPN network of the DAMO-YOLO model is used as the feature fusion network, and the feature fusion realized by linearly stacked convolution layers is improved to CSPNet connections, so that high-level semantic information and low-level spatial information are fully exchanged and model accuracy is improved; finally, the outputs of the effective feature layers are connected to the feature fusion network;
The improved FaceNet model is characterized in that the heuristic, training-free search method MAE-NAS is used to quickly search a wide range of backbone structures at different scales and to form a low-cost customized model under a latency budget, yielding the improved FaceNet backbone; based on a one-shot approach, latency data for all operators used are first obtained by sampling on the target device, the latency of a candidate model is predicted from these operator latencies, and if the predicted model size meets the preset target the candidate proceeds to subsequent updating and scoring, with iterative updating finally yielding the optimal model that satisfies the latency constraint (a minimal sketch of this latency-budget filtering is given after step S7 below); the DeepMAD-89M model is finally determined as the backbone network, with a parameter size of 89 MB and 15.4 GFLOPs of computation; finally, a GAM attention mechanism is introduced after the global average pooling layer behind the backbone to enhance the network's adaptive attention to the region of interest.
S5, train the improved RetinaFace target detection model with dataset I and the improved FaceNet individual ship face recognition model with dataset II, obtaining an optimal model for each;
S6, use the trained RetinaFace+FaceNet model to encode the output feature vectors, obtaining NPY files containing the ship face information and the names of the sand mining ships as the database;
S7, use the improved RetinaFace+FaceNet model obtained by training to jointly recognize the individual ship faces of river and lake sand mining ships.
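The latency-budget filtering used in the MAE-NAS search of step S4 can be illustrated with a minimal sketch, assuming that a candidate backbone is represented simply as a list of operator names and that per-operator latencies have already been measured on the target device; the operator names, latency values and budget below are illustrative and are not taken from the patent or from the MAE-NAS implementation.

```python
# Hypothetical per-operator latencies (ms) measured by sampling on the target device.
OP_LATENCY_MS = {"conv3x3": 0.42, "conv1x1": 0.15, "dwconv3x3": 0.09, "gap": 0.02, "fc": 0.05}

def predict_latency(candidate_ops):
    """Predict a candidate model's latency as the sum of its operators' measured latencies."""
    return sum(OP_LATENCY_MS[op] for op in candidate_ops)

def meets_budget(candidate_ops, budget_ms):
    """Only candidates within the latency budget go on to scoring and iterative updating."""
    return predict_latency(candidate_ops) <= budget_ms

candidate = ["conv3x3", "dwconv3x3", "conv1x1", "gap", "fc"]
print(predict_latency(candidate), meets_budget(candidate, budget_ms=1.0))
```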
In steps S1-S7, a river and lake sand mining ship target detection model based on the improved RetinaFace is built: the LeakyReLU activation function in the basic convolution layer of the RetinaFace backbone is replaced with the GELU activation function to form a CBG convolution module, giving a new backbone feature extraction network; the original RetinaFace backbone has three output effective feature layers, and an ECA attention structure is added after each of C3, C4 and C5 to enhance the network's adaptive attention to the region of interest, the shapes of C5, C4 and C3 being (20, 20), (40, 40) and (80, 80) respectively; the Efficient-RepGFPN network of the DAMO-YOLO model is used as the feature fusion network, and finally the outputs of the effective feature layers are connected to the feature fusion network;
Furthermore, different numbers of channels are used for features at different scales, so that the expressive capacity of high-level and low-level features is controlled flexibly under a lightweight computation constraint;
The feature fusion realized by linearly stacked convolution layers is improved to CSPNet connections, so that high-level semantic information and low-level spatial information are fully exchanged and model accuracy is improved without adding much computation.
According to step S7 of S1-S7, the trained improved RetinaFace model detects ships in the incoming river and lake pictures and judges whether a sand mining ship is present; if so, the ship's class, bounding-box regression and keypoint regression are marked and the individual ship face image is cropped;
The cropped individual ship face image is then passed to the improved FaceNet network to recognize the individual ship face, and the recognition output is compared with the encoded feature vectors in the database: the Euclidean distance between the feature vector output by the network for the picture to be recognized and each feature vector in the database is computed one by one; if a distance is smaller than the threshold determined from the training-set features during training, the input ship face is judged to be that of a ship stored in the database, an early warning is issued, manual intervention follows, the two ships with matching features are checked manually, and once confirmed the serial number of the sand mining ship is marked in the image and displayed;
If the Euclidean distance between the input individual ship face information and every feature vector in the database is not smaller than the preset threshold, the input ship does not belong to the ships recorded in the database and is judged to be an illegally operating ship; the illegal sand mining ship is marked in the output and an early warning is issued to alert management staff;
Building the individual ship face database: traverse all sand mining ship pictures, detect the sand mining ship in each picture with RetinaFace, crop the recognized ship face, align the cropped ship face picture using the keypoint information, and feed it into FaceNet for recognition; features are extracted from each individual ship face picture to generate a 128-dimensional vector containing the individual ship face information, the encoded results are stored in one NPY-format file, and the names of all individual sand mining ships are stored in another NPY-format file.
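A minimal sketch of building this ship face database follows. The helpers detect_and_align() and encode_128d() stand in for the trained RetinaFace and improved FaceNet models and are hypothetical, as are the folder layout and output file names.

```python
import glob
import numpy as np

def build_database(image_dir, detect_and_align, encode_128d):
    """Encode every registered ship face and save the encodings and names as NPY files."""
    encodings, names = [], []
    for path in sorted(glob.glob(f"{image_dir}/*/*.jpg")):   # one folder per individual ship
        face = detect_and_align(path)          # cropped, keypoint-aligned ship face (or None)
        if face is None:
            continue
        encodings.append(encode_128d(face))    # 128-dimensional feature vector
        names.append(path.split("/")[-2])      # folder name used as the ship name/number
    np.save("ship_face_encodings.npy", np.asarray(encodings, dtype=np.float32))
    np.save("ship_face_names.npy", np.asarray(names))
```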
Example 2
Taking Poyang Lake as an example, pictures of 30 different individual sand mining ships in a sand mining area of Poyang Lake are collected and preprocessed, and the dataset is enriched with data augmentation; in practical application, the ship pictures can be captured by cameras deployed on the banks of rivers and lakes;
An individual ship face recognition dataset for river and lake sand mining ships is produced: dataset I for training the improved RetinaFace contains sand mining ship pictures and pictures of other kinds of ships, and dataset II for training the improved FaceNet contains individual ship face images; the improved RetinaFace+FaceNet network is built to improve the accuracy of individual ship face recognition.
The preprocessed pictures are labeled with the Labelme annotation tool to form target detection dataset I for training RetinaFace, and the individual ship face pictures in dataset I are cropped to generate ship face dataset II for training FaceNet;
Further, collecting and preprocessing the individual ship face images means collecting pictures of the 30 different sand mining ships in a sand mining area of Poyang Lake, manually sorting the ship dataset, screening out clear and easily distinguishable images, and increasing the amount and complexity of the data by adding salt-and-pepper noise, color jitter and color gamut distortion;
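As an illustration of this augmentation step, a minimal OpenCV sketch of salt-and-pepper noise and HSV color jitter is given below; the noise amount, jitter ranges and file names are assumptions made for the example and are not taken from the patent.

```python
import cv2
import numpy as np

def add_salt_pepper(img, amount=0.01):
    """Add salt-and-pepper noise to a BGR image."""
    noisy = img.copy()
    num = int(amount * img.shape[0] * img.shape[1])
    ys = np.random.randint(0, img.shape[0], num)
    xs = np.random.randint(0, img.shape[1], num)
    noisy[ys[: num // 2], xs[: num // 2]] = 255   # salt
    noisy[ys[num // 2:], xs[num // 2:]] = 0       # pepper
    return noisy

def color_jitter(img, hue=10, sat=0.3, val=0.3):
    """Randomly jitter hue, saturation and value in HSV space (a simple color/gamut distortion)."""
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv[..., 0] = (hsv[..., 0] + np.random.uniform(-hue, hue)) % 180
    hsv[..., 1] *= 1 + np.random.uniform(-sat, sat)
    hsv[..., 2] *= 1 + np.random.uniform(-val, val)
    hsv = np.clip(hsv, 0, 255).astype(np.uint8)
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)

img = cv2.imread("ship_0001.jpg")                 # hypothetical file name
augmented = color_jitter(add_salt_pepper(img))
cv2.imwrite("ship_0001_aug.jpg", augmented)
```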
Further, the collected and preprocessed sand mining ship picture data are annotated: the processed pictures are numbered, and the labeling tool Labelme is used to mark the sand mining ship targets and keypoint information, forming COCO-format JSON annotation files and thus dataset I; the ship face images of the different individual ships in dataset I are cropped to form dataset II for training the FaceNet model;
Still further, the JSON annotation files generated for dataset I are converted to text files in batches, and all text files are merged into one text file containing the image path and name, the coordinates of the top-left and bottom-right corners of the bounding box, and the keypoint coordinates.
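A minimal sketch of this batch conversion and merging step follows, assuming Labelme-style JSON files with "rectangle" boxes and "point" keypoints; the directory layout, field usage and output file name are assumptions made for the example rather than details given in the patent.

```python
import glob
import json

# One output line per image: image path/name, x1 y1 x2 y2 of the box, then keypoint coordinates.
lines = []
for path in sorted(glob.glob("annotations/*.json")):
    ann = json.load(open(path, encoding="utf-8"))
    boxes, points = [], []
    for shape in ann["shapes"]:
        if shape["shape_type"] == "rectangle":
            (x1, y1), (x2, y2) = shape["points"]
            boxes.append(f"{x1:.1f} {y1:.1f} {x2:.1f} {y2:.1f}")
        elif shape["shape_type"] == "point":
            (px, py), = shape["points"]
            points.append(f"{px:.1f} {py:.1f}")
    lines.append(" ".join([ann["imagePath"]] + boxes + points))

with open("train_labels.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(lines))
```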
Example 3
An embodiment of the invention also provides a river and lake sand mining ship target detection model, in which the LeakyReLU activation function in the basic convolution layer of the RetinaFace backbone is replaced with the GELU activation function to form a CBG convolution module and a new backbone feature extraction network;
ECA attention modules are added after the three output feature maps C3, C4 and C5 of the RetinaFace backbone at their different scales; the ECA attention modules focus on the relations between channels, so that the model can automatically learn the importance of different channel features. At the same time, the Efficient-RepGFPN network of the DAMO-YOLO model is used as the feature fusion network, and the outputs of the effective feature layers are connected to it.
As an implementation, the new CBG module consists of a convolution layer (Conv), a batch normalization layer (Batch Normalization) and a GELU activation function;
The GELU activation function is given by:
GELU(x) = x·Φ(x) ≈ 0.5x(1 + tanh(√(2/π)·(x + 0.044715x³)))
where Φ(x) is the cumulative distribution function of the standard normal distribution.
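The CBG block and the ECA attention module described above can be sketched in PyTorch as follows; the channel count, kernel sizes and feature-map shape are illustrative, and the code is a minimal sketch rather than the patent's implementation.

```python
import torch
import torch.nn as nn

class CBG(nn.Module):
    """Conv + BatchNorm + GELU, replacing the original Conv-BN-LeakyReLU block."""
    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.GELU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class ECA(nn.Module):
    """Efficient Channel Attention: a 1-D convolution over channel-wise averages."""
    def __init__(self, k_size=3):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size, padding=k_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        y = self.pool(x)                                   # (B, C, 1, 1)
        y = self.conv(y.squeeze(-1).transpose(1, 2))       # 1-D conv across the channel dimension
        y = self.sigmoid(y.transpose(1, 2).unsqueeze(-1))  # back to (B, C, 1, 1)
        return x * y                                       # channel re-weighting

# e.g. attach ECA to an effective feature layer such as C5 before feature fusion
c5 = torch.randn(1, 256, 20, 20)                           # illustrative channel count
print(CBG(256, 256)(c5).shape, ECA()(c5).shape)            # both keep (1, 256, 20, 20)
```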
The RetinaFace model is trained with dataset I for 1000 epochs to obtain the optimal model, and the weight file is saved after training is completed;
A heuristic, training-free search method is used to form the backbone of the improved FaceNet: a wide range of backbone structures at different scales is searched quickly, the search cost is reduced by evaluating candidates from an entropy perspective using information theory, and the final improved FaceNet backbone is determined according to the recognition accuracy requirement. Finally, a GAM attention mechanism is introduced after the global average pooling layer behind the backbone to enhance the network's adaptive attention to the region of interest.
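A minimal PyTorch sketch of a GAM-style attention module of the kind introduced behind the backbone is given below; it follows the published GAM design (channel attention followed by spatial attention), and the channel count, reduction rate and feature-map size are illustrative rather than values from the patent.

```python
import torch
import torch.nn as nn

class GAM(nn.Module):
    """Global Attention Mechanism: channel attention followed by spatial attention."""
    def __init__(self, channels, rate=4):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // rate),
            nn.ReLU(inplace=True),
            nn.Linear(channels // rate, channels),
        )
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, channels // rate, kernel_size=7, padding=3),
            nn.BatchNorm2d(channels // rate),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // rate, channels, kernel_size=7, padding=3),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        b, c, h, w = x.shape
        # channel attention: an MLP over the channel dimension at every spatial position
        ca = self.channel_mlp(x.permute(0, 2, 3, 1).reshape(b, -1, c))
        ca = torch.sigmoid(ca.reshape(b, h, w, c).permute(0, 3, 1, 2))
        x = x * ca
        # spatial attention: two 7x7 convolutions producing a per-position weight map
        sa = torch.sigmoid(self.spatial(x))
        return x * sa

feat = torch.randn(2, 512, 7, 7)    # illustrative backbone feature map
print(GAM(512)(feat).shape)         # torch.Size([2, 512, 7, 7])
```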
The FaceNet model is trained with dataset II to obtain the optimal recognition model, and the weight file is saved after training is completed;
The accuracy of the optimal detection model obtained by training is evaluated on a test set, with accuracy (Accuracy) and the ROC curve as evaluation indexes;
The evaluation indexes, Accuracy and the ROC (Receiver Operating Characteristic) curve, are calculated as follows:
(1) Accuracy
Accuracy = (TP + TN) / (P + N)
where TP is the number of correctly classified positive examples, TN is the number of correctly classified negative examples, P is the number of positive samples and N is the number of negative samples.
(2) ROC curve
The ROC curve is used to evaluate the output quality of the classifier; the area under the ROC curve is the AUC (Area Under the Curve), and a larger AUC indicates a better classification effect.
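A minimal sketch of computing these two indexes with scikit-learn follows; the label and score arrays are illustrative placeholders, not experimental results from the patent.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score, roc_curve

# y_true marks whether two ship faces belong to the same ship; y_score is the model's similarity score.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_score = np.array([0.92, 0.20, 0.75, 0.66, 0.41, 0.08, 0.88, 0.55])

accuracy = accuracy_score(y_true, y_score > 0.5)    # Accuracy = (TP + TN) / (P + N)
auc = roc_auc_score(y_true, y_score)                # area under the ROC curve
fpr, tpr, thresholds = roc_curve(y_true, y_score)   # points along the ROC curve

print(f"Accuracy = {accuracy:.3f}, AUC = {auc:.3f}")
```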
Finally, the improved RetinaFace+FaceNet optimal recognition model obtained by training is used to recognize the individual faces of river and lake sand mining ships.
The recognition performance is further analyzed for pictures collected with relatively clear sand mining ship targets and for pictures collected in complex scenes. Pictures with relatively clear targets are those collected in good weather with clear ship images; pictures from complex scenes are those collected with the ship face at different positions relative to the camera, mostly in overcast, rainy, evening or night conditions. Experimental analysis of individual recognition before and after the improvement on relatively clear targets is shown in Table 1: the AUC of individual sand mining ship recognition is improved by 0.03 and the recognition accuracy by 2.06%, and the improved individual recognition algorithm can accurately distinguish different sand mining ships and recognize the same ship, indicating that the proposed network model achieves high accuracy in individual face recognition of river and lake sand mining ships.
Table 1 Comparison of the recognition effects of the methods before and after improvement on relatively clear sand mining ship targets
[Table 1 is provided as an image in the original publication.]
The individual recognition effects of the methods before and after improvement on sand mining ship targets in complex scenes such as darkness and blur are shown in Table 2: the AUC of individual recognition is improved by 0.04 and the recognition accuracy by 4.77%, and the improved algorithm can accurately distinguish different sand mining ships and recognize the same ship in complex scenes, indicating that the proposed network model achieves high accuracy in individual face recognition of river and lake sand mining ships.
Table 2 Comparison of the recognition effects of the methods before and after improvement on sand mining ships in complex scenes
[Table 2 is provided as an image in the original publication.]
The above examples are preferred embodiments of the present invention, but the embodiments of the present invention are not limited to them; any other change, modification, substitution, combination or simplification that does not depart from the spirit and principle of the present invention is an equivalent replacement and is included in the protection scope of the present invention.

Claims (5)

1. A river and lake sand mining ship face recognition method, characterized by comprising the following steps:
S1, collect historical sand mining ship pictures captured by cameras installed along rivers and lakes, including pictures taken in sunny and rainy weather, pictures taken in the evening and at night, and pictures taken with the ship face at different positions; screen the collected pictures to obtain a sand mining ship picture dataset; and use OpenCV to add salt-and-pepper noise, color jitter and color gamut distortion to the dataset so that it can be used for training;
S2, label the collected pictures of the different individual sand mining ships to form dataset I for training the RetinaFace model;
S3, crop the ship face images of the sand mining ships in dataset I and place the pictures belonging to the same ship in one folder to form dataset II for training the FaceNet model;
S4, for the specific task of individual sand mining ship face recognition, improve the RetinaFace model and the FaceNet model respectively to achieve higher accuracy;
the improved RetinaFace model is characterized in that the LeakyReLU activation function in the basic convolution layer of the RetinaFace backbone is replaced with the GELU activation function to form a CBG convolution module, giving a new backbone feature extraction network; the original RetinaFace backbone outputs three effective feature layers, and an ECA attention structure is added after each of the output layers C3, C4 and C5 to enhance adaptive attention to the region of interest, the shapes of C5, C4 and C3 being (20, 20), (40, 40) and (80, 80) respectively; the Efficient-RepGFPN network of the DAMO-YOLO model is used as the feature fusion network, and the feature fusion realized by linearly stacked convolution layers is improved to CSPNet connections, so that high-level semantic information and low-level spatial information are fully exchanged and model accuracy is improved; finally, the outputs of the effective feature layers are connected to the feature fusion network;
the improved FaceNet model is characterized in that the heuristic, training-free search method MAE-NAS is used to quickly search a wide range of backbone structures at different scales and to form a low-cost customized model under a latency budget, yielding the improved FaceNet backbone; based on a one-shot approach, latency data for all operators used are first obtained by sampling on the target device, the latency of a candidate model is predicted from these operator latencies, and if the predicted model size meets the preset target the candidate proceeds to subsequent updating and scoring, with iterative updating finally yielding the optimal model that satisfies the latency constraint; the finally determined backbone has a parameter size of 89 MB and 15.4 GFLOPs of computation; finally, a GAM attention mechanism is introduced after the global average pooling layer behind the backbone to enhance the network's adaptive attention to the region of interest;
S5, train the improved RetinaFace target detection model with dataset I and the improved FaceNet individual ship face recognition model with dataset II, obtaining an optimal model for each;
S6, use the trained RetinaFace+FaceNet model to encode the output feature vectors, obtaining NPY files containing the ship face information and the names of the sand mining ships as the database;
S7, use the improved RetinaFace+FaceNet model obtained by training to jointly recognize the individual ship faces of river and lake sand mining ships.
2. The river and lake sand mining ship face recognition method according to claim 1, characterized in that pictures of different sand mining ships in river and lake sand mining areas are preprocessed by data augmentation and then labeled with the Labelme annotation tool to form target detection dataset I for training RetinaFace, and the individual ship faces in the pictures are cropped to form individual ship face dataset II for training FaceNet.
3. The river and lake sand mining ship face recognition method according to claim 2, characterized in that the pictures in the sand mining ship picture dataset prepared for training in step S1 are numbered, and the labeling tool Labelme is used to annotate the target detection and keypoint information of the different sand mining ships in the river and lake sand mining areas, forming dataset I as COCO-format JSON annotation files.
4. The river and lake sand mining ship face recognition method according to claim 3, characterized in that the JSON annotation files generated for dataset I are converted to text files in batches and all text files are merged into one text file containing the image path and name, the coordinates of the top-left and bottom-right corners of the bounding box, and the keypoint coordinates; this text file together with the original images constitutes dataset I.
5. The river and lake sand mining ship face recognition method according to claim 1, characterized in that a captured sand mining ship picture is fed into the RetinaFace target detection and keypoint localization algorithm for classification and localization, and whether a sand mining ship is present in the picture is judged; if one is present, the target is marked with a prediction box and the ship face information is marked with keypoints, the picture is then rectified to the horizontal and vertical directions, the ship target selected by the box is cropped, and the cropped individual ship face picture is passed to the improved FaceNet network;
the cropped individual ship face image is passed to the improved FaceNet network to recognize the individual ship face, and the recognition output is compared with the encoded feature vectors in the database: the Euclidean distance between the feature vector output by the network for the picture to be recognized and each feature vector in the database is computed one by one; if a distance is smaller than the threshold determined from the training-set features during training, the input ship face is judged to be that of a ship stored in the database, an early warning is issued, manual intervention follows, the two ships with matching features are checked manually, and once confirmed the ship's name and number are marked in the image and displayed;
if the Euclidean distance between the input individual ship face information and every feature vector in the database is not smaller than the preset threshold, the input ship does not belong to the ships recorded in the database and is judged to be an illegally operating ship; the illegal sand mining ship is marked in the output and an early warning is issued to alert management staff.
CN202310525507.8A 2023-05-11 2023-05-11 River and lake sand production ship face recognition method Active CN116229381B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310525507.8A CN116229381B (en) 2023-05-11 2023-05-11 River and lake sand production ship face recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310525507.8A CN116229381B (en) 2023-05-11 2023-05-11 River and lake sand production ship face recognition method

Publications (2)

Publication Number Publication Date
CN116229381A (en) 2023-06-06
CN116229381B CN116229381B (en) 2023-07-07

Family

ID=86589576

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310525507.8A Active CN116229381B (en) 2023-05-11 2023-05-11 River and lake sand production ship face recognition method

Country Status (1)

Country Link
CN (1) CN116229381B (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021077785A1 (en) * 2019-10-21 2021-04-29 华中科技大学 Person re-identification driven positioning adjustment-based person search method
CN112749626A (en) * 2020-12-10 2021-05-04 同济大学 DSP platform-oriented rapid face detection and recognition method
CN113111208A (en) * 2021-05-11 2021-07-13 青岛以萨数据技术有限公司 Method, system, equipment and storage medium for searching picture by picture
CN113569667A (en) * 2021-07-09 2021-10-29 武汉理工大学 Inland ship target identification method and system based on lightweight neural network model
CN113987251A (en) * 2021-10-26 2022-01-28 武汉理工大学 Method, system, equipment and storage medium for establishing ship face characteristic database
CN114332749A (en) * 2021-11-16 2022-04-12 河海大学 Sand production monitoring method of sand production ship based on deep learning
CN113989914A (en) * 2021-12-24 2022-01-28 安维尔信息科技(天津)有限公司 Security monitoring method and system based on face recognition
CN115082872A (en) * 2022-08-11 2022-09-20 江西省水利科学院 River surface sand mining ship facing edge calculation and passing ship identification method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
牛作东; 覃涛; 李捍东; 陈进军: "Mask-wearing detection algorithm in natural scenes based on improved RetinaFace", Computer Engineering and Applications, no. 12 *
田有文: "Research on the application of deep-learning-based detection of personnel safety equipment in substations", Journal of Shenyang Agricultural University, vol. 53, no. 3, pp. 346-353 *
许小华, 包学才: "Research and application of collaborative perception and intelligent control technology for river and lake sand mining", Jiangxi Hydraulic Science and Technology, vol. 48, no. 5 *
高健; 陈先桥; 初秀民; 尹奇志: "Design and application of a sand mining monitoring system for inland waterways", Journal of Wuhan University of Technology (Information & Management Engineering Edition), no. 04 *

Also Published As

Publication number Publication date
CN116229381B (en) 2023-07-07

Similar Documents

Publication Publication Date Title
CN111368690B (en) Deep learning-based video image ship detection method and system under influence of sea waves
CN113436169B (en) Industrial equipment surface crack detection method and system based on semi-supervised semantic segmentation
CN111222574A (en) Ship and civil ship target detection and classification method based on multi-model decision-level fusion
CN111626993A (en) Image automatic detection counting method and system based on embedded FEFnet network
CN110598693A (en) Ship plate identification method based on fast-RCNN
CN112149547A (en) Remote sensing image water body identification based on image pyramid guidance and pixel pair matching
CN112418028A (en) Satellite image ship identification and segmentation method based on deep learning
CN114821326A (en) Method for detecting and identifying dense weak and small targets in wide remote sensing image
CN115830531A (en) Pedestrian re-identification method based on residual multi-channel attention multi-feature fusion
CN115147380A (en) Small transparent plastic product defect detection method based on YOLOv5
CN117830788B (en) Image target detection method for multi-source information fusion
CN115661932A (en) Fishing behavior detection method
CN113077438B (en) Cell nucleus region extraction method and imaging method for multi-cell nucleus color image
CN116229381B (en) River and lake sand production ship face recognition method
CN117593601A (en) Water gauge tide checking method based on deep learning
CN115082872B (en) River surface sand mining ship facing edge calculation and passing ship identification method
CN114898290A (en) Real-time detection method and system for marine ship
CN115424243A (en) Parking stall number identification method, equipment and medium based on yolov5-shufflenetv2
CN115035429A (en) Aerial photography target detection method based on composite backbone network and multiple measuring heads
CN111860332B (en) Dual-channel electrokinetic diagram part detection method based on multi-threshold cascade detector
CN115019243A (en) Monitoring floater lightweight target detection method and system based on improved YOLOv3
Zhang et al. Ship detection based on improved YOLO algorithm
CN113409327A (en) Example segmentation improvement method based on ordering and semantic consistency constraint
CN116453111B (en) Pineapple maturity analysis method based on lightweight YOLOv4
CN117576164B (en) Remote sensing video sea-land movement target tracking method based on feature joint learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CB03 Change of inventor or designer information

Inventor after: Bao Xuecai

Inventor after: He Haiqing

Inventor after: Li Yan

Inventor before: Bao Xuecai