CN112115783B - Depth knowledge migration-based face feature point detection method, device and equipment - Google Patents

Depth knowledge migration-based face feature point detection method, device and equipment

Info

Publication number
CN112115783B
CN112115783B (application CN202010809064.1A)
Authority
CN
China
Prior art keywords
face
network
feature
student
teacher
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010809064.1A
Other languages
Chinese (zh)
Other versions
CN112115783A (en)
Inventor
吕科
高鹏程
薛健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Chinese Academy of Sciences
Original Assignee
University of Chinese Academy of Sciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Chinese Academy of Sciences filed Critical University of Chinese Academy of Sciences
Priority to CN202010809064.1A priority Critical patent/CN112115783B/en
Publication of CN112115783A publication Critical patent/CN112115783A/en
Application granted granted Critical
Publication of CN112115783B publication Critical patent/CN112115783B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The embodiment of the invention discloses a face feature point detection method, device and equipment based on depth knowledge migration, wherein the method comprises the following steps: providing a face data set, and cropping face images according to the face detection frames or the bounding boxes of the face feature points provided by the face data set to obtain a training set, a verification set and a test set; inputting test samples and training samples into an initial face alignment network framework; training a teacher network and a student network in the initial face alignment network framework with PyTorch until the loss function and the maximum iteration number meet a preset condition, generating a training model; freezing the model parameters of the teacher network, extracting the deep dark knowledge learned by the teacher network, and transferring it to the student network to generate the final face alignment network model; and inputting an RGB face image from a natural scene into the final face alignment network model, and outputting the face feature point detection result. The method attains high face feature point detection accuracy while keeping model parameters and computational complexity low.

Description

Depth knowledge migration-based face feature point detection method, device and equipment
Technical Field
The embodiment of the invention relates to the field of computer vision and digital image processing, in particular to a face feature point detection method, device and equipment based on depth knowledge migration.
Background
Existing face feature point detection methods cannot effectively handle feature point localization in natural scenes. Complex methods carry huge model parameters and high computational complexity and cannot meet running-speed requirements, while simple methods cannot cope with interference from factors such as extreme pose, variable illumination and severe occlusion in natural scenes, so their accuracy falls short of application requirements.
Disclosure of Invention
The embodiment of the invention aims to provide a face feature point detection method, device and equipment based on depth knowledge migration, so as to solve the problems of high computational complexity, low running speed and low accuracy in existing face feature point detection.
In order to achieve the above purpose, the embodiment of the present invention mainly provides the following technical solutions:
in a first aspect, an embodiment of the present invention provides a face feature point detection method based on depth knowledge migration, including:
s1: providing a face data set containing face feature point labels, and cropping face images according to the face detection frames or the bounding boxes of the face feature points provided by the face data set to obtain a training set, a verification set and a test set;
s2: obtaining training samples from the training set, obtaining test samples from the test set, and inputting the test samples and the training samples into an initial face alignment network framework;
s3: setting parameters of a convolutional neural network, and training a teacher network and a student network in the initial face alignment network framework with PyTorch until the loss function and the maximum iteration number meet preset conditions, generating a training model;
s4: freezing model parameters of a teacher network, extracting deep dark knowledge learned by the teacher network, transmitting the deep dark knowledge to the student network, and supervising the training process of the student network to generate a final face alignment network model;
s5: and inputting the RGB face image in the natural scene into the final face alignment network model, and outputting a face feature point detection result.
In one embodiment of the present invention, step S1 includes:
s1-1: providing a WFLW data set, wherein the WFLW data set comprises N training pictures and M test pictures, each picture is provided with a picture tag comprising face frame information, face feature point position information and several kinds of attribute information, and N and M are positive integers greater than zero;
s1-2: cropping face images according to the face detection frames provided by the face data set, perturbing the face detection frames, and applying random rotation, size scaling and flipping to the face images for data augmentation, obtaining the training set, the verification set and the test set.
In one embodiment of the invention, the initial face alignment network framework is generated by:
generating the teacher network by adopting an encoder-decoder network structure, wherein the decoder of the teacher network comprises three combinations of an up-sampling layer and a convolution layer; the encoder performs feature extraction and encoding on the input image, retaining the feature extraction part of the original network while removing the final average pooling layer, the fully connected layer used for classification and the final dimension-raising 1×1 convolution layer;
adding the decoder after the encoder, spatially up-sampling the image features extracted by the encoder to obtain feature maps, converting the channel dimension of the feature maps into the number of face feature points, and computing the expected face feature point coordinates on each transformed feature map with a spatial softargmax operation;
providing a student network with the EfficientFAN structure for the final face feature point detection, wherein the decoder of the student network likewise comprises three combinations of an up-sampling layer and a convolution layer; EfficientNet-B0 serves as the backbone of the student network encoder, with the final average pooling layer, the fully connected layer used for classification and the final dimension-raising 1×1 convolution layer of EfficientNet-B0 removed;
adding a 1×1 convolution layer after the student network decoder, converting the channel number of the feature map obtained by the decoder's up-sampling into the number of face feature points, and computing the face feature point coordinates on the converted feature map with the spatial softargmax operation.
In one embodiment of the present invention, step S3 includes:
training the teacher network and the student network separately, using the feature point loss function $L_P$ to optimize network parameters, wherein $L_P$ is computed with the Wing loss function, expressed as follows:

$$L_P = \sum_{i=1}^{2N} f\big(|P_i - G_i|\big), \qquad f(x) = \begin{cases} \omega \ln\!\left(1 + x/\epsilon\right), & x < \omega \\ x - C, & \text{otherwise} \end{cases}$$

wherein $P \in \mathbb{R}^{1 \times 2N}$ is the predicted face feature point coordinate vector, $G \in \mathbb{R}^{1 \times 2N}$ is the ground-truth face feature point coordinate vector, $N$ is the number of face feature points, $\omega$ and $\epsilon$ are preset parameters of $f(x)$, and $C = \omega - \omega \ln(1 + \omega/\epsilon)$ is a constant.
In one embodiment of the present invention, in step S4, extracting deep dark knowledge learned by the teacher network includes:
extracting pixel distribution information on the feature maps based on the feature alignment knowledge distillation method, and aligning the pixel distributions of the feature maps of the teacher network and the student network, wherein the feature alignment knowledge distillation loss function is as follows:

$$L_{FA} = \big\lVert A - \phi(B) \big\rVert_2^2$$

wherein $A$ and $B$ are the feature maps of the teacher network and the student network at the same stage, respectively, and $\phi(\cdot)$ is a 1×1 convolution layer used to align the channel dimensions of the two feature maps $A$ and $B$.
In one embodiment of the present invention, in step S4, transferring the deep dark knowledge to the student network includes:
extracting face structure information at different scales with a knowledge distillation method based on block similarity, and transferring the structured information of the face image from the teacher network to the student network.
In a second aspect, an embodiment of the present invention further provides a face feature point detection device based on depth knowledge migration, including:
a providing module, configured to provide a face data set containing face feature point labels, and to crop face images according to the face detection frames or the bounding boxes of the face feature points provided by the face data set to obtain a training set, a verification set and a test set;
an output module;
a control processing module, configured to obtain training samples from the training set, obtain test samples from the test set, and input the test samples and the training samples into an initial face alignment network framework; the control processing module is further configured to set parameters of a convolutional neural network and train a teacher network and a student network in the initial face alignment network framework with PyTorch until the loss function and the maximum iteration number meet preset conditions, generating a training model; the control processing module is further configured to freeze the model parameters of the teacher network, extract the deep dark knowledge learned by the teacher network, transfer the deep dark knowledge to the student network, and supervise the training process of the student network to generate the final face alignment network model; the control processing module is further configured to input an RGB face image from a natural scene into the final face alignment network model and to output the face feature point detection result through the output module.
In a third aspect, an embodiment of the present invention further provides an electronic device, including: at least one processor and at least one memory; the memory is used for storing one or more program instructions; the processor is configured to execute one or more program instructions to perform the face feature point detection method based on depth knowledge migration according to the first aspect.
In a fourth aspect, an embodiment of the present invention further provides a computer readable storage medium containing one or more program instructions which, when executed, perform the face feature point detection method based on depth knowledge migration according to the first aspect.
The technical scheme provided by the embodiment of the invention has at least the following advantages:
According to the face feature point detection method, device and equipment based on depth knowledge migration provided by the embodiments of the invention, EfficientFAN is adopted as a simple and effective lightweight model; its decoder structure, based on up-sampling and depthwise separable convolution, quickly restores the spatial resolution of the feature maps while effectively preserving their spatial information.
Compared with current state-of-the-art large, complex models, the invention reaches comparable face feature point detection accuracy while markedly reducing model parameters and computational complexity.
The invention uses knowledge distillation and a knowledge migration module to improve the face feature point localization accuracy of the student network EfficientFAN: a block similarity knowledge distillation method is proposed to learn multi-scale structural information of the face, and it is combined with feature alignment knowledge distillation, which learns the pixel distribution information on the feature maps, to jointly supervise and guide the training of EfficientFAN. Without changing the network structure or increasing the model parameters, EfficientFAN obtains more accurate face feature point detection results through this knowledge migration method. Experimental results on public data sets show that EfficientFAN is a simple and effective face feature point detection network and that the knowledge distillation method effectively improves detection accuracy. Overall, EfficientFAN delivers excellent performance, combining accuracy and speed.
Drawings
Fig. 1 is a flowchart of a face feature point detection method based on depth knowledge migration.
Fig. 2 is a block diagram of a face feature point detection device based on depth knowledge migration according to the present invention.
Detailed Description
Further advantages and effects of the present invention will become apparent to those skilled in the art from the disclosure of the present invention, which is described by the following specific examples.
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, interfaces, techniques, etc., in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
In the description of the present invention, it is to be understood that the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it should be noted that, unless explicitly stated and limited otherwise, the term "connected" is to be construed broadly and may, for example, denote a direct connection or an indirect connection through an intermediary. The specific meaning of the above terms in the present invention will be understood by those of ordinary skill in the art according to the specific circumstances.
Fig. 1 is a flowchart of a face feature point detection method based on depth knowledge migration. As shown in fig. 1, the face feature point detection method based on depth knowledge migration of the present invention includes:
s1: providing a face data set containing face feature point labels, and cropping face images according to the face detection frames or the bounding boxes of the face feature points provided by the face data set to obtain a training set, a verification set and a test set.
Specifically, step S1 includes:
s1-1: A WFLW dataset is provided. The dataset originates from the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2018 and contains 10000 pictures (7500 training pictures and 2500 test pictures). Each picture tag provides face frame information, the positions of 98 face feature points, and 6 kinds of attribute information (pose, expression, illumination, make-up, occlusion, blur), and the dataset is divided into 6 subsets according to the image attribute information.
S1-2: cutting a face image according to a face detection frame provided by a face data set, disturbing the face detection frame, and applying random rotation, size scaling and overturning to the face image so as to enhance data and obtain a training set, a verification set and a test set.
S2: training samples are obtained from the training set, test samples are obtained from the testing set, and the test samples and the training samples are input into the initial face alignment network framework.
Specifically, the teacher network adopts an encoder-decoder network architecture, using EfficientNet-B7 as the backbone of its encoder. The encoder performs feature extraction and encoding on the input image: only the feature extraction part of the original network is preserved, removing the final average pooling layer and the fully connected layer used for classification as well as the final dimension-raising 1×1 convolution layer, so that features are extracted from the last inverted residual module. Compared with the feature map after the 1×1 convolution layer, the feature map extracted by the teacher network has fewer channels (640 vs. 2048); it therefore retains more of the original feature information, loses no information through dimension raising, and, being low-dimensional, is better suited to analysis by the decoder.
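As an illustrative sketch, such a head-stripped encoder can be derived from a stock EfficientNet-B7; the torchvision packaging assumed below (a `features` stack whose last stage is the dimension-raising 1×1 convolution) is an assumption and not part of the original disclosure.

```python
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b7

class TeacherEncoder(nn.Module):
    """EfficientNet-B7 backbone with the classification head removed:
    the final dimension-raising 1x1 conv, average pooling and fully
    connected layer are dropped, so features come from the last
    inverted residual block (640 channels)."""
    def __init__(self, pretrained: bool = True):
        super().__init__()
        backbone = efficientnet_b7(weights="DEFAULT" if pretrained else None)
        # backbone.features[-1] is the 1x1 conv that raises the channel
        # dimension for classification; keep everything before it.
        self.features = nn.Sequential(*list(backbone.features.children())[:-1])

    def forward(self, x):
        return self.features(x)

encoder = TeacherEncoder(pretrained=False)
print(encoder(torch.randn(1, 3, 256, 256)).shape)  # torch.Size([1, 640, 8, 8])
```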
A decoder is added after the last inverted residual module of EfficientNet-B7 to spatially up-sample the image features extracted by the encoder. The spatial dimension of the feature map is raised by a more natural up-sampling method: the combination of an up-sampling layer and a convolution layer replaces deconvolution, first up-sampling the feature map spatially with a generic up-sampling method and then applying a convolution on the up-sampled feature map to enrich its transformation.
The invention uses a combination of three up-sampling layers and convolution layers as the decoder of the face alignment network, added after the encoder. Depthwise separable convolutions replace conventional convolution operations in the network model, reducing the amount of computation in the up-sampling process.
Specifically, the scale factor of each up-sampling layer is set to 2, that is, the length and width of the feature map obtained by each up-sampling are double those of the input feature map, and the up-sampling of the feature map is realized with the nearest-neighbour interpolation algorithm. A 1×1 convolution layer after the decoder generates spatial heatmaps and converts the channel dimension of the feature map into the number of face feature points. The expected face feature point coordinates are then computed on each transformed feature map with the spatial softargmax operation.
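A sketch of such a decoder is given below; the 128-channel width matches the student network described later, while the use of BatchNorm/ReLU inside the depthwise separable convolutions and the 98-point default are assumptions.

```python
import torch.nn as nn

def ds_conv(cin, cout, k=3):
    """Depthwise separable convolution: depthwise k x k + pointwise 1x1."""
    return nn.Sequential(
        nn.Conv2d(cin, cin, k, padding=k // 2, groups=cin, bias=False),
        nn.BatchNorm2d(cin), nn.ReLU(inplace=True),
        nn.Conv2d(cin, cout, 1, bias=False),
        nn.BatchNorm2d(cout), nn.ReLU(inplace=True),
    )

class Decoder(nn.Module):
    """Three (x2 nearest-neighbour upsample, depthwise separable conv)
    stages, then a 1x1 conv mapping channels to one heatmap per point."""
    def __init__(self, cin, width=128, n_points=98):
        super().__init__()
        stages = []
        for _ in range(3):
            stages += [nn.Upsample(scale_factor=2, mode="nearest"),
                       ds_conv(cin, width)]
            cin = width
        self.stages = nn.Sequential(*stages)
        self.head = nn.Conv2d(width, n_points, kernel_size=1)

    def forward(self, x):                  # (B, cin, H, W)
        return self.head(self.stages(x))   # (B, n_points, 8H, 8W)
```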
The spatial softargmax operation can be divided into two steps. In the first step, the output feature map $H$ is normalized with a softmax operation, which can be expressed as:

$$M(x, y) = \frac{\exp\big(H(x, y)\big)}{\sum_{x'} \sum_{y'} \exp\big(H(x', y')\big)}$$

where $x, y$ are pixel indexes, $\exp$ denotes the exponential function, and $M$ is the normalized feature map. In the second step, the coordinates $P_l$ of feature point $l$ are finally expressed as the expected pixel coordinates under $M$:

$$P_l = \left( \sum_x \sum_y x\, M(x, y),\ \sum_x \sum_y y\, M(x, y) \right)$$
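The two steps translate directly into a differentiable operator over a batch of heatmaps; a sketch:

```python
import torch

def spatial_softargmax(heatmaps):
    """heatmaps: (B, N, H, W) -> coordinates (B, N, 2) in pixel units.
    Step 1: softmax-normalise each heatmap into a distribution M.
    Step 2: take the expected (x, y) coordinate under M."""
    b, n, h, w = heatmaps.shape
    m = torch.softmax(heatmaps.reshape(b, n, -1), dim=-1).reshape(b, n, h, w)
    xs = torch.arange(w, dtype=m.dtype, device=m.device)
    ys = torch.arange(h, dtype=m.dtype, device=m.device)
    ex = (m.sum(dim=2) * xs).sum(dim=-1)  # marginalise rows, then E[x]
    ey = (m.sum(dim=3) * ys).sum(dim=-1)  # marginalise columns, then E[y]
    return torch.stack([ex, ey], dim=-1)
```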
A small and lightweight student network, called the Efficient Face Alignment Network (EfficientFAN), has a network structure similar to the teacher network and is used for the final face feature point detection. EfficientNet-B0 serves as the backbone of the student network's encoder. As with the teacher network, the encoder of the student network removes the final average pooling layer and fully connected layer used for classification in EfficientNet-B0, as well as the final dimension-raising 1×1 convolution layer.
Likewise, a combination of three up-sampling layers and convolution layers is used as the decoder of the student network, added after the encoder. The scale factor of each up-sampling layer is 2 and the number of output channels of each convolution layer is 128. A 1×1 convolution layer is added after the decoder of the student network, converting the channel number of the feature map obtained by the decoder's up-sampling from 128 to the number of face feature points.
Finally, the face feature point coordinates are computed on the converted feature map with the spatial softargmax operation.
Table 1 Student network structure
The specific structure of the student network is shown in Table 1, where MBConv denotes the mobile inverted bottleneck module (Mobile Inverted Bottleneck) used by EfficientNet, DSConv denotes a depthwise separable convolution, and k denotes the size of the convolution kernel.
The teacher network located above and the student network located below are organically linked together through a knowledge migration (Knowledge Transfer) module.
The high-efficiency face alignment network based on depth knowledge migration uses two knowledge distillation methods, so that different types of dark knowledge are migrated from a teacher network to a student network EfficientFAN.
The knowledge distillation method for feature alignment extracts pixel distribution information on the feature map, and aligns pixel distribution of the teacher network and the student network feature map, so that the feature map distribution of the student network is close to the distribution of the teacher network.
Correspondingly, the knowledge distillation method of the block similarity extracts the face structure information under different scales, and the structured information of the face image is transmitted to the student network from the teacher network, so that the simple student network can learn the face structure information of the current image.
Feature alignment distillation aligns the channel dimensions of the feature maps at the same stage of the teacher network and the student network, and directly uses the difference between the teacher feature map and the aligned student feature map as supervision information in the student network training process.
S3: setting parameters of a convolutional neural network, and training a teacher network and a student network in an initial face alignment network framework by using Pytorch until a loss function and the maximum iteration number meet preset conditions to generate a training model.
Specifically, the teacher network and the student network are first trained separately, using the feature point loss function $L_P$ to optimize network parameters. The feature point loss function $L_P$ is computed with the Wing loss function, which can be expressed as follows:

$$L_P = \sum_{i=1}^{2N} f\big(|P_i - G_i|\big), \qquad f(x) = \begin{cases} \omega \ln\!\left(1 + x/\epsilon\right), & x < \omega \\ x - C, & \text{otherwise} \end{cases}$$

wherein $P \in \mathbb{R}^{1 \times 2N}$ is the predicted face feature point coordinate vector, $G \in \mathbb{R}^{1 \times 2N}$ is the ground-truth face feature point coordinate vector, and $N$ is the number of face feature points. $f(x)$ is a specially designed loss function: for small errors it behaves as a logarithmic loss function with an offset, and for larger errors as an L1 loss function; $\omega$ and $\epsilon$ are preset parameters of $f(x)$, and $C = \omega - \omega \ln(1 + \omega/\epsilon)$ is a constant.
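A sketch of this loss follows; the parameter values ω = 10 and ε = 2 are taken from the original Wing loss paper and are assumptions here, not values fixed by this disclosure.

```python
import math
import torch

def wing_loss(pred, target, omega=10.0, epsilon=2.0):
    """Wing loss over landmark coordinate vectors of shape (B, 2N):
    logarithmic for errors below omega, L1 minus a constant above."""
    x = (pred - target).abs()
    C = omega - omega * math.log(1.0 + omega / epsilon)
    loss = torch.where(x < omega,
                       omega * torch.log(1.0 + x / epsilon),
                       x - C)
    return loss.mean()
```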
S4: the model parameters of the teacher network are frozen, deep dark knowledge learned by the teacher network is extracted, the deep dark knowledge is transmitted to the student network, and the training process of the student network is supervised to generate a final face alignment network model.
Specifically, the feature alignment knowledge distillation method extracts the pixel distribution information on the feature maps and aligns the pixel distributions of the feature maps of the teacher network and the student network, so that the feature map distribution of the student network approaches that of the teacher network. The feature alignment knowledge distillation loss function can be defined as follows:

$$L_{FA} = \big\lVert A - \phi(B) \big\rVert_2^2$$

wherein $A$ and $B$ are the feature maps of the teacher network and the student network at the same stage, respectively, and $\phi(\cdot)$ is a 1×1 convolution layer used to align the channel dimensions of the two feature maps $A$ and $B$.
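A sketch of this term, with the mean squared difference assumed as the distance between the teacher feature map and the channel-aligned student feature map:

```python
import torch.nn as nn
import torch.nn.functional as F

class FeatureAlignLoss(nn.Module):
    """L_FA: align the student feature map B (C' channels) to the teacher
    feature map A (C channels, same spatial size) with a 1x1 conv phi,
    then penalise the pixel-wise difference."""
    def __init__(self, c_student, c_teacher):
        super().__init__()
        self.phi = nn.Conv2d(c_student, c_teacher, kernel_size=1)

    def forward(self, feat_teacher, feat_student):
        # The teacher is frozen, so gradients flow only through the student.
        return F.mse_loss(self.phi(feat_student), feat_teacher.detach())
```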
The knowledge distillation method of the block similarity extracts the face structure information under different scales, and transmits the structured information of the face image to the student network from the teacher network, so that the simple student network can learn the face structure information of the current image.
Relationship graphs of different scales are constructed for the input feature maps, and similarity matrices are computed on the constructed graphs. A feature map of size $H \times W$ can be partitioned into local blocks of different sizes, and the size generally satisfies $H = W = 2^n$. Taking the whole feature map as a connected domain, relationship graphs are built with local blocks of different sizes as nodes, where the nodes can be set as local blocks of size $2^k \times 2^k$, $k = 0, 1, \ldots, n-1$. For a $2^n \times 2^n$ feature map, the relationship graph built with nodes of size $2^k \times 2^k$ contains $2^{n-k} \times 2^{n-k}$ local blocks, i.e. relationship nodes. For simplicity, each $2^k \times 2^k$ local block is aggregated into a $1 \times 1$ graph node with an average pooling operation. For a feature map with $C$ channels, the vectorization of the $i$-th node of the constructed relationship graph can be expressed as $f_i \in \mathbb{R}^C$. The similarity between nodes in the relationship graph is measured with the cosine similarity of their vectors; the similarity $a_{ij}$ between the $i$-th node vector $f_i$ and the $j$-th node vector $f_j$ is computed as:

$$a_{ij} = \frac{f_i^\top f_j}{\lVert f_i \rVert_2 \, \lVert f_j \rVert_2}$$

Specifically, the intermediate feature maps of the teacher network and the student network at the same stage have the same resolution but different channel numbers. Assume the feature map of the teacher network is $A \in \mathbb{R}^{C \times H \times W}$ and that of the student network is $B \in \mathbb{R}^{C' \times H \times W}$. In the relationship graph constructed on a feature map with $2^k \times 2^k$ local blocks as nodes, the number of nodes is $4^{n-k}$, and computing the pairwise similarities between nodes yields a similarity matrix of size $4^{n-k} \times 4^{n-k}$. Let $a_{ij}^{T,k}$ denote the cosine similarity between the $i$-th and $j$-th nodes of the relationship graph built on the teacher network feature map with $2^k \times 2^k$ local blocks as nodes, and let $a_{ij}^{S,k}$ denote the corresponding similarity for the student network feature map. The loss function of the block similarity knowledge distillation method can then be summarized as follows, where the size of the feature map satisfies $H = W = 2^n$:

$$L_{PS} = \sum_{k=0}^{n-1} \frac{1}{4^{n-k} \cdot 4^{n-k}} \sum_{i=1}^{4^{n-k}} \sum_{j=1}^{4^{n-k}} \left( a_{ij}^{T,k} - a_{ij}^{S,k} \right)^2$$
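A sketch of the block similarity term over a set of scales, aggregating blocks by average pooling and comparing the teacher and student cosine-similarity matrices with a mean squared difference as reconstructed above:

```python
import torch.nn.functional as F

def block_similarity(feat, k):
    """Cosine-similarity matrix of the relationship graph whose nodes are
    2^k x 2^k local blocks of a (B, C, H, W) feature map."""
    pooled = F.avg_pool2d(feat, kernel_size=2 ** k)      # aggregate blocks
    b, c = pooled.shape[:2]
    nodes = pooled.reshape(b, c, -1).transpose(1, 2)     # (B, M, C)
    nodes = F.normalize(nodes, dim=-1)                   # unit node vectors
    return nodes @ nodes.transpose(1, 2)                 # (B, M, M)

def block_similarity_loss(feat_teacher, feat_student, scales=(0, 1, 2)):
    """L_PS: match teacher and student similarity matrices over scales."""
    loss = 0.0
    for k in scales:
        a_t = block_similarity(feat_teacher.detach(), k)
        a_s = block_similarity(feat_student, k)
        loss = loss + F.mse_loss(a_s, a_t)
    return loss
```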
Combining the feature alignment knowledge distillation method and the block similarity knowledge distillation method, the knowledge transfer loss function $L_{KT}$ is introduced as part of the network training loss to supervise the training process of the student network. The student network thus learns not only the ground-truth label information provided by the annotated face feature point coordinates, but also the finer face structure knowledge and data distribution knowledge extracted from the teacher network. To optimize the performance of the student network EfficientFAN with the knowledge migration module and knowledge distillation, the parameters of the pre-trained teacher network are kept frozen, the knowledge transfer loss function $L_{KT}$ is added to the training loss, and the dark knowledge distilled from the teacher network during EfficientFAN training is transferred to the student network, improving its face feature point localization accuracy. The loss function finally used to optimize the student network EfficientFAN combines the feature point loss function $L_P$ with $L_{KT}$:

$$L = L_P + \lambda L_{KT}, \qquad L_{KT} = \sum_d \left( L_{PS}^{(d)} + L_{FA}^{(d)} \right)$$

where $\lambda$ is an adjustable weight parameter balancing the effects of the two loss functions, and $L_{PS}^{(d)}$ and $L_{FA}^{(d)}$ are, respectively, the block similarity knowledge distillation loss function and the feature alignment knowledge distillation loss function of decoder stage $d$.
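Reusing the sketches above, one training step of the student under the combined loss might look as follows; the assumption that both networks return their per-stage decoder feature maps alongside the predicted coordinates, and the weight λ = 0.5, are illustrative and not part of the original disclosure.

```python
import torch

lam = 0.5  # assumed weight balancing L_P against L_KT

def train_step(teacher, student, align_losses, images, gt_landmarks, optimizer):
    """One step of L = L_P + lam * sum_d (L_PS^d + L_FA^d).
    `align_losses` holds one FeatureAlignLoss per decoder stage; its 1x1
    convs are trained jointly, so register them with the optimizer."""
    with torch.no_grad():                  # teacher parameters stay frozen
        t_feats, _ = teacher(images)
    s_feats, pred = student(images)
    loss = wing_loss(pred, gt_landmarks)   # supervised landmark term L_P
    for t_f, s_f, fa in zip(t_feats, s_feats, align_losses):
        loss = loss + lam * (block_similarity_loss(t_f, s_f) + fa(t_f, s_f))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```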
S5: and inputting the RGB face image in the natural scene into a final face alignment network model, and outputting a face feature point detection result.
According to the face feature point detection method based on depth knowledge migration provided by the embodiments of the invention, EfficientFAN is adopted as a simple and effective lightweight model; its decoder structure, based on up-sampling and depthwise separable convolution, quickly restores the spatial resolution of the feature maps while effectively preserving their spatial information.
Compared with current state-of-the-art large, complex models, the invention reaches comparable face feature point detection accuracy while markedly reducing model parameters and computational complexity.
The invention uses knowledge distillation and a knowledge migration module to improve the face feature point localization accuracy of the student network EfficientFAN: a block similarity knowledge distillation method is proposed to learn multi-scale structural information of the face, and it is combined with feature alignment knowledge distillation, which learns the pixel distribution information on the feature maps, to jointly supervise and guide the training of EfficientFAN. Without changing the network structure or increasing the model parameters, EfficientFAN obtains more accurate face feature point detection results through this knowledge migration method. Experimental results on public data sets show that EfficientFAN is a simple and effective face feature point detection network and that the knowledge distillation method effectively improves detection accuracy. Overall, EfficientFAN delivers excellent performance, combining accuracy and speed.
Fig. 2 is a block diagram of a face feature point detection device based on depth knowledge migration according to the present invention. As shown in fig. 2, the face feature point detection device based on depth knowledge migration of the present invention includes a providing module 100, an output module 200 and a control processing module 300.
The providing module 100 is configured to provide a face data set containing face feature point labels and to crop face images according to the face detection frames or the bounding boxes of the face feature points provided by the face data set to obtain a training set, a verification set and a test set. The control processing module 300 is configured to obtain training samples from the training set, obtain test samples from the test set, and input the test samples and the training samples into the initial face alignment network framework. The control processing module 300 is further configured to set parameters of the convolutional neural networks and to train the teacher network and the student network in the initial face alignment network framework with PyTorch until the loss function and the maximum iteration number meet preset conditions, generating a training model. The control processing module 300 is further configured to freeze the model parameters of the teacher network, extract the deep dark knowledge learned by the teacher network, transfer the deep dark knowledge to the student network, and supervise the training process of the student network to generate the final face alignment network model. The control processing module 300 is further configured to input an RGB face image from a natural scene into the final face alignment network model and to output the face feature point detection result through the output module.
It should be noted that the specific implementation of the face feature point detection device based on depth knowledge migration in the embodiment of the present invention is similar to that of the face feature point detection method based on depth knowledge migration; please refer to the description of the method, which is not repeated here to reduce redundancy.
In addition, other structures and functions of the facial feature point detection device based on depth knowledge migration according to the embodiments of the present invention are known to those skilled in the art, and in order to reduce redundancy, description is omitted.
The embodiment of the invention also provides electronic equipment, which comprises: at least one processor and at least one memory; the memory is used for storing one or more program instructions; the processor is configured to execute one or more program instructions to perform the face feature point detection method based on depth knowledge migration according to the first aspect.
The disclosed embodiments provide a computer readable storage medium having stored therein computer program instructions that, when executed on a computer, cause the computer to perform the above-described depth knowledge migration-based face feature point detection method.
In the embodiment of the invention, the processor may be an integrated circuit chip with signal processing capability. The processor may be a general purpose processor, a digital signal processor (Digital Signal Processor, DSP for short), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC for short), a field programmable gate array (Field Programmable Gate Array, FPGA for short), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components.
The disclosed methods, steps, and logic blocks in the embodiments of the present invention may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be embodied directly in the execution of a hardware decoding processor, or in the execution of a combination of hardware and software modules in a decoding processor. The software modules may be located in a random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The processor reads the information in the storage medium and, in combination with its hardware, performs the steps of the above method.
The storage medium may be memory, for example, may be volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory.
The nonvolatile Memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an electrically Erasable ROM (Electrically EPROM, EEPROM), or a flash Memory.
The volatile memory may be a random access memory (Random Access Memory, RAM for short) which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as Static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (Double Data Rate SDRAM), enhanced SDRAM (ESDRAM), synchronous Link DRAM (SLDRAM), and direct memory bus RAM (Direct Rambus RAM, DRRAM).
The storage media described in embodiments of the present invention are intended to comprise, without being limited to, these and any other suitable types of memory.
Those skilled in the art will appreciate that in one or more of the examples described above, the functions described in the present invention may be implemented in a combination of hardware and software. When the software is applied, the corresponding functions may be stored in a computer-readable medium or transmitted as one or more instructions or code on the computer-readable medium. Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.
The foregoing embodiments have been provided for the purpose of illustrating the general principles of the present invention in further detail, and are not to be construed as limiting the scope of the invention, but are merely intended to cover any modifications, equivalents, improvements, etc. based on the teachings of the invention.

Claims (5)

1. A face feature point detection method based on depth knowledge migration, characterized by comprising the following steps:
s1: providing a face data set containing face feature point labels, and cropping face images according to the face detection frames or the bounding boxes of the face feature points provided by the face data set to obtain a training set, a verification set and a test set;
s2: obtaining training samples from the training set, obtaining test samples from the test set, and inputting the test samples and the training samples into an initial face alignment network framework;
s3: setting parameters of a convolutional neural network, and training a teacher network and a student network in the initial face alignment network framework with PyTorch until the loss function and the maximum iteration number meet preset conditions, generating a training model;
s4: freezing model parameters of a teacher network, extracting deep dark knowledge learned by the teacher network, transmitting the deep dark knowledge to the student network, and supervising the training process of the student network to generate a final face alignment network model;
s5: inputting RGB face images in a natural scene into the final face alignment network model, and outputting a face feature point detection result;
the initial face alignment network framework is generated by:
generating a teacher network by adopting an encoder-decoder network structure, wherein the decoder of the teacher network comprises three combinations of an up-sampling layer and a convolution layer; the encoder performs feature extraction and encoding on the input image, retaining the feature extraction part of the original network while removing the final average pooling layer, the fully connected layer used for classification and the final dimension-raising 1×1 convolution layer;
adding the decoder after the encoder, spatially up-sampling the image features extracted by the encoder to obtain feature maps, converting the channel dimension of the feature maps into the number of face feature points, and computing the expected face feature point coordinates on each transformed feature map with a spatial softargmax operation;
providing a student network with the EfficientFAN structure for the final face feature point detection, wherein the decoder of the student network likewise comprises three combinations of an up-sampling layer and a convolution layer; EfficientNet-B0 serves as the backbone of the student network encoder, with the final average pooling layer, the fully connected layer used for classification and the final dimension-raising 1×1 convolution layer of EfficientNet-B0 removed;
adding a 1×1 convolution layer after the student network decoder, converting the channel number of the feature map obtained by the decoder's up-sampling into the number of face feature points, and computing the face feature point coordinates on the converted feature map with the spatial softargmax operation;
the step S3 comprises the following steps:
training the teacher network and the student network separately, using the feature point loss function $L_P$ to optimize network parameters, wherein $L_P$ is computed with the Wing loss function, expressed as follows:

$$L_P = \sum_{i=1}^{2N} f\big(|P_i - G_i|\big), \qquad f(x) = \begin{cases} \omega \ln\!\left(1 + x/\epsilon\right), & x < \omega \\ x - C, & \text{otherwise} \end{cases}$$

wherein $P \in \mathbb{R}^{1 \times 2N}$ is the predicted face feature point coordinate vector, $G \in \mathbb{R}^{1 \times 2N}$ is the ground-truth face feature point coordinate vector, $N$ is the number of face feature points, $\omega$ and $\epsilon$ are preset parameters of $f(x)$, and $C = \omega - \omega \ln(1 + \omega/\epsilon)$ is a constant;
in step S4, extracting deep dark knowledge learned by the teacher network, including:
extracting pixel distribution information on the feature maps based on the feature alignment knowledge distillation method, and aligning the pixel distributions of the feature maps of the teacher network and the student network, wherein the feature alignment knowledge distillation loss function is as follows:

$$L_{FA} = \big\lVert A - \phi(B) \big\rVert_2^2$$

wherein $A$ and $B$ are the feature maps of the teacher network and the student network at the same stage, respectively, and $\phi(\cdot)$ is a 1×1 convolution layer used to align the channel dimensions of the two feature maps $A$ and $B$;
in step S4, transferring the deep dark knowledge to the student network includes:
extracting face structure information at different scales with a knowledge distillation method based on block similarity, and transferring the structured information of the face image from the teacher network to the student network.
2. The face feature point detection method based on depth knowledge migration of claim 1, wherein step S1 comprises:
s1-1: providing a WFLW data set, wherein the WFLW data set comprises N training pictures and M test pictures, each picture is provided with a picture tag comprising face frame information, face feature point position information and several kinds of attribute information, and N and M are positive integers greater than zero;
s1-2: cropping face images according to the face detection frames provided by the face data set, perturbing the face detection frames, and applying random rotation, size scaling and flipping to the face images for data augmentation, obtaining the training set, the verification set and the test set.
3. A face feature point detection device based on depth knowledge migration, characterized by comprising:
a providing module, configured to provide a face data set containing face feature point labels, and to crop face images according to the face detection frames or the bounding boxes of the face feature points provided by the face data set to obtain a training set, a verification set and a test set;
an output module;
a control processing module, configured to obtain training samples from the training set, obtain test samples from the test set, and input the test samples and the training samples into an initial face alignment network framework; the control processing module is further configured to set parameters of a convolutional neural network and train a teacher network and a student network in the initial face alignment network framework with PyTorch until the loss function and the maximum iteration number meet preset conditions, generating a training model; the control processing module is further configured to freeze the model parameters of the teacher network, extract the deep dark knowledge learned by the teacher network, transfer the deep dark knowledge to the student network, and supervise the training process of the student network to generate the final face alignment network model; the control processing module is further configured to input an RGB face image from a natural scene into the final face alignment network model and to output the face feature point detection result through the output module;
the initial face alignment network framework is generated by:
generating a teacher network by adopting an encoder-decoder network structure, wherein the decoder of the teacher network comprises three combinations of an up-sampling layer and a convolution layer; the encoder performs feature extraction and encoding on the input image, retaining the feature extraction part of the original network while removing the final average pooling layer, the fully connected layer used for classification and the final dimension-raising 1×1 convolution layer;
adding the decoder after the encoder, spatially up-sampling the image features extracted by the encoder to obtain feature maps, converting the channel dimension of the feature maps into the number of face feature points, and computing the expected face feature point coordinates on each transformed feature map with a spatial softargmax operation;
providing a student network with the EfficientFAN structure for the final face feature point detection, wherein the decoder of the student network likewise comprises three combinations of an up-sampling layer and a convolution layer; EfficientNet-B0 serves as the backbone of the student network encoder, with the final average pooling layer, the fully connected layer used for classification and the final dimension-raising 1×1 convolution layer of EfficientNet-B0 removed;
adding a 1×1 convolution layer after the student network decoder, converting the channel number of the feature map obtained by the decoder's up-sampling into the number of face feature points, and computing the face feature point coordinates on the converted feature map with the spatial softargmax operation;
training the teacher network and the student network separately, using the feature point loss function $L_P$ to optimize network parameters, wherein $L_P$ is computed with the Wing loss function, expressed as follows:

$$L_P = \sum_{i=1}^{2N} f\big(|P_i - G_i|\big), \qquad f(x) = \begin{cases} \omega \ln\!\left(1 + x/\epsilon\right), & x < \omega \\ x - C, & \text{otherwise} \end{cases}$$

wherein $P \in \mathbb{R}^{1 \times 2N}$ is the predicted face feature point coordinate vector, $G \in \mathbb{R}^{1 \times 2N}$ is the ground-truth face feature point coordinate vector, $N$ is the number of face feature points, $\omega$ and $\epsilon$ are preset parameters of $f(x)$, and $C = \omega - \omega \ln(1 + \omega/\epsilon)$ is a constant;
wherein extracting the deep dark knowledge learned by the teacher network includes:
extracting pixel distribution information on the feature maps based on the feature alignment knowledge distillation method, and aligning the pixel distributions of the feature maps of the teacher network and the student network, wherein the feature alignment knowledge distillation loss function is as follows:

$$L_{FA} = \big\lVert A - \phi(B) \big\rVert_2^2$$

wherein $A$ and $B$ are the feature maps of the teacher network and the student network at the same stage, respectively, and $\phi(\cdot)$ is a 1×1 convolution layer used to align the channel dimensions of the two feature maps $A$ and $B$;
and wherein transferring the deep dark knowledge to the student network includes:
extracting face structure information at different scales with a knowledge distillation method based on block similarity, and transferring the structured information of the face image from the teacher network to the student network.
4. An electronic device, the electronic device comprising: at least one processor and at least one memory;
the memory is used for storing one or more program instructions;
the processor is configured to execute one or more program instructions to perform the depth knowledge migration-based face feature point detection method according to any one of claims 1-2.
5. A computer readable storage medium, wherein the computer readable storage medium contains one or more program instructions for performing the depth knowledge migration-based face feature point detection method according to any one of claims 1-2.
CN202010809064.1A 2020-08-12 2020-08-12 Depth knowledge migration-based face feature point detection method, device and equipment Active CN112115783B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010809064.1A CN112115783B (en) 2020-08-12 2020-08-12 Depth knowledge migration-based face feature point detection method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010809064.1A CN112115783B (en) 2020-08-12 2020-08-12 Depth knowledge migration-based face feature point detection method, device and equipment

Publications (2)

Publication Number Publication Date
CN112115783A CN112115783A (en) 2020-12-22
CN112115783B true CN112115783B (en) 2023-11-14

Family

ID=73805270

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010809064.1A Active CN112115783B (en) 2020-08-12 2020-08-12 Depth knowledge migration-based face feature point detection method, device and equipment

Country Status (1)

Country Link
CN (1) CN112115783B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112634441B (en) * 2020-12-28 2023-08-22 深圳市人工智能与机器人研究院 3D human body model generation method, system and related equipment
CN112767320A (en) * 2020-12-31 2021-05-07 平安科技(深圳)有限公司 Image detection method, image detection device, electronic equipment and storage medium
CN112633406A (en) * 2020-12-31 2021-04-09 天津大学 Knowledge distillation-based few-sample target detection method
CN112734632B (en) * 2021-01-05 2024-02-27 百果园技术(新加坡)有限公司 Image processing method, device, electronic equipment and readable storage medium
CN112418195B (en) * 2021-01-22 2021-04-09 电子科技大学中山学院 Face key point detection method and device, electronic equipment and storage medium
CN112819050B (en) * 2021-01-22 2023-10-27 北京市商汤科技开发有限公司 Knowledge distillation and image processing method, apparatus, electronic device and storage medium
CN113052144B (en) * 2021-04-30 2023-02-28 平安科技(深圳)有限公司 Training method, device and equipment of living human face detection model and storage medium
CN113343979B (en) * 2021-05-31 2022-11-08 北京百度网讯科技有限公司 Method, apparatus, device, medium and program product for training a model
CN113343898B (en) * 2021-06-25 2022-02-11 江苏大学 Mask shielding face recognition method, device and equipment based on knowledge distillation network
CN113470099B (en) * 2021-07-09 2022-03-25 北京的卢深视科技有限公司 Depth imaging method, electronic device and storage medium
CN113628635B (en) * 2021-07-19 2023-09-15 武汉理工大学 Voice-driven speaker face video generation method based on teacher student network
CN113487614B (en) * 2021-09-08 2021-11-30 四川大学 Training method and device for fetus ultrasonic standard section image recognition network model
CN113947801B (en) * 2021-12-21 2022-07-26 中科视语(北京)科技有限公司 Face recognition method and device and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108363962A (en) * 2018-01-25 2018-08-03 南京邮电大学 A kind of method for detecting human face and system based on multi-level features deep learning
WO2019128646A1 (en) * 2017-12-28 2019-07-04 深圳励飞科技有限公司 Face detection method, method and device for training parameters of convolutional neural network, and medium
CN110414400A (en) * 2019-07-22 2019-11-05 中国电建集团成都勘测设计研究院有限公司 A kind of construction site safety cap wearing automatic testing method and system
CN110674714A (en) * 2019-09-13 2020-01-10 东南大学 Human face and human face key point joint detection method based on transfer learning

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019128646A1 (en) * 2017-12-28 2019-07-04 深圳励飞科技有限公司 Face detection method, method and device for training parameters of convolutional neural network, and medium
CN108363962A (en) * 2018-01-25 2018-08-03 南京邮电大学 A kind of method for detecting human face and system based on multi-level features deep learning
CN110414400A (en) * 2019-07-22 2019-11-05 中国电建集团成都勘测设计研究院有限公司 A kind of construction site safety cap wearing automatic testing method and system
CN110674714A (en) * 2019-09-13 2020-01-10 东南大学 Human face and human face key point joint detection method based on transfer learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Face recognition based on deep convolutional neural network and center loss; 张延安, 王宏玉, 徐方; Science Technology and Engineering, No. 35; full text *
Facial expression recognition based on transfer convolutional neural network; 刘伦豪杰, 王晨辉, 卢慧, 王家豪; Computer Knowledge and Technology, No. 07; full text *

Also Published As

Publication number Publication date
CN112115783A (en) 2020-12-22

Similar Documents

Publication Publication Date Title
CN112115783B (en) Depth knowledge migration-based face feature point detection method, device and equipment
CN112287978A (en) Hyperspectral remote sensing image classification method based on self-attention context network
CN110782420A (en) Small target feature representation enhancement method based on deep learning
CN110569738A (en) natural scene text detection method, equipment and medium based on dense connection network
CN111582044A (en) Face recognition method based on convolutional neural network and attention model
CN115496928B (en) Multi-modal image feature matching method based on multi-feature matching
CN112001931A (en) Image segmentation method, device, equipment and storage medium
CN111353544A (en) Improved Mixed Pooling-Yolov 3-based target detection method
CN115424059B (en) Remote sensing land utilization classification method based on pixel level contrast learning
CN113066089B (en) Real-time image semantic segmentation method based on attention guide mechanism
CN113807214B (en) Small target face recognition method based on deit affiliated network knowledge distillation
Wu et al. STR transformer: a cross-domain transformer for scene text recognition
Yang et al. Robust visual tracking using adaptive local appearance model for smart transportation
CN117115880A (en) Lightweight face key point detection method based on heavy parameterization
CN114612961B (en) Multi-source cross-domain expression recognition method and device and storage medium
CN115953625A (en) Vehicle detection method based on characteristic diagram double-axis Transformer module
CN113807218B (en) Layout analysis method, device, computer equipment and storage medium
CN114663751A (en) Power transmission line defect identification method and system based on incremental learning technology
CN109871835B (en) Face recognition method based on mutual exclusion regularization technology
CN113706450A (en) Image registration method, device, equipment and readable storage medium
CN111274893A (en) Aircraft image fine-grained identification method based on component segmentation and feature fusion
Yian et al. Improved deeplabv3+ network segmentation method for urban road scenes
CN117058437B (en) Flower classification method, system, equipment and medium based on knowledge distillation
Wang et al. Image Semantic Segmentation Algorithm Based on Self-learning Super-Pixel Feature Extraction
CN115359304B (en) Single image feature grouping-oriented causal invariance learning method and system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant