WO2022127111A1

WO2022127111A1 - Cross-modal face recognition method, apparatus and device, and storage medium

Info

Publication number: WO2022127111A1
Application number: PCT/CN2021/107933
Authority: WO
Inventors: 陈碧辉; 高通; 钱贝贝; 黄源浩; 肖振中
Original assignee: 奥比中光科技集团股份有限公司
Priority date: 2020-12-14
Filing date: 2021-07-22
Publication date: 2022-06-23
Also published as: CN112507897A

Abstract

A cross-modal face recognition method, apparatus and device, and a storage medium. The method comprises: by using a cross-modal face recognition model obtained by training a visible light face preprocessing image sequence and an infrared light face preprocessing image sequence, performing face recognition on a face image to be recognized. Thus, the recognition accuracy of face images obtained under cameras of different modalities can be improved.

Description

Cross-modal face recognition method, device, device and storage medium

technical field

The present application belongs to the technical field of image processing, and in particular, relates to a cross-modal face recognition method, apparatus, device and storage medium.

Background technique

The accuracy of face recognition is greatly affected by ambient lighting. Common face recognition technologies are mainly used to recognize face images captured by near-infrared cameras that are not affected by ambient light. However, in the real environment, uneven lighting and poor lighting often occur, which requires recognition of images obtained under different modalities of cameras, and the current face recognition technology cannot detect different modalities of cameras. Accurate identification of the images acquired under. Therefore, in the prior art, there is a problem that the recognition accuracy of face images obtained under different modalities of cameras is not high.

SUMMARY OF THE INVENTION

The present application provides a cross-modal face recognition method, device, device and storage medium, which can solve the problem of low recognition accuracy of face images obtained under different modalities of cameras.

In a first aspect, the present application provides a cross-modal face recognition method, including:

Collect face images to be recognized;

Inputting the face image to be recognized into a pre-trained cross-modal face recognition model for face recognition;

The training process of the pre-trained cross-modal face recognition model includes: acquiring a first training sample set, where the first training sample set includes a first preset number of visible-light face preprocessing image sequences and a first training sample set. 2. A preset number of infrared face preprocessing image sequences;

According to the visible light face preprocessing image sequence and the first classification loss function, the preset cross-modal face recognition model is trained to obtain the first cross-modal face recognition model;

Input the visible light preprocessing image sequence and the infrared face preprocessing image sequence into the first cross-modal face recognition model, and classify the first cross-modal face based on the first classification loss function The recognition model is retrained to obtain a second cross-modal face recognition model, where the second cross-modal face recognition model is the pre-trained cross-modal face recognition model.

In an optional implementation, the cross-modal face recognition model includes a preset number of convolutional layers and fully connected layers; the convolutional layers include a convolutional layer for feature extraction and a convolutional layer for dimensionality reduction. The max pooling layer of .

In an optional implementation manner, before the acquiring the first training sample set, the method further includes:

acquiring the first preset number of visible light face image sequences;

Perform pixel equalization processing on the visible light face image sequence to obtain the visible light face preprocessing image sequence.

acquiring the second preset number of infrared face image sequences;

Perform pixel equalization processing on the infrared face image sequence to obtain the infrared face preprocessing image sequence.

In an optional implementation manner, performing pixel equalization processing on the visible light face image sequence to obtain the visible light face preprocessing image sequence, including:

Converting the visible light face image sequence into a grayscale image;

The grayscale image is normalized to obtain the infrared face preprocessing image sequence.

In an optional implementation manner, performing pixel equalization processing on the infrared face image sequence to obtain the infrared face preprocessing image sequence, including:

performing image contrast enhancement on the infrared face image sequence to obtain an enhanced infrared face image sequence;

The enhanced infrared face image sequence is normalized to obtain the infrared face preprocessing image sequence.

In an optional implementation manner, image contrast enhancement is performed on the infrared face image sequence to obtain an enhanced infrared face image sequence, including:

The histogram equalization process is performed on the infrared face image sequence to enhance the image contrast of the infrared face image sequence to obtain an enhanced infrared face image sequence.

In a second aspect, the present application provides a cross-modal face recognition device, including:

The acquisition module is used to collect the face image to be recognized;

A recognition module, for inputting the face image to be recognized into a pre-trained cross-modal face recognition model for face recognition;

The preset cross-modal face recognition model is trained according to the visible light face preprocessing image sequence and the first classification loss function to obtain a first cross-modal face recognition model;

In an optional implementation manner, it also includes:

a first acquisition module, configured to acquire the first preset number of visible light face image sequences;

The first processing module is configured to perform pixel equalization processing on the visible light face image sequence to obtain the visible light face preprocessing image sequence.

In an optional implementation manner, it also includes:

a second acquisition module, configured to acquire the first preset number of infrared face image sequences;

The second processing module is configured to perform pixel equalization processing on the infrared face image sequence to obtain the infrared face preprocessing image sequence.

In an optional implementation, the first processing module includes:

a conversion unit, for converting the visible light face image sequence into a grayscale image;

The first processing unit is configured to perform normalization processing on the grayscale image to obtain the visible-light face preprocessing image sequence.

In an optional implementation, the second processing module includes:

an enhancement module, configured to enhance the image contrast of the infrared face image sequence to obtain an enhanced infrared face image sequence;

The second processing unit is used for normalizing the enhanced infrared face image sequence to obtain the infrared face preprocessing image sequence.

In an optional implementation manner, the enhancement module is specifically used for:

In a third aspect, the present application provides a cross-modal face recognition device, the above-mentioned cross-modal face recognition device includes a memory, a processor, and a computer program stored in the above-mentioned memory and running on the above-mentioned processor, the above-mentioned When the processor executes the above-mentioned computer program, the steps of the method of the above-mentioned first aspect are implemented.

In a fourth aspect, the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, implements the steps of the method in the first aspect.

In a fifth aspect, the present application provides a computer program product, wherein the computer program product includes a computer program, and when the computer program is executed by one or more processors, the steps of the method of the first aspect are implemented.

The cross-modal face recognition method of the first aspect above uses a cross-modal face recognition model trained from a visible light face preprocessing image sequence and an infrared light face preprocessing image sequence to perform a facial image recognition process. Face recognition can improve the accuracy of face image recognition obtained under different modes of cameras.

It can be understood that, for the beneficial effects of the foregoing second aspect to the fifth aspect, reference may be made to the relevant descriptions in the foregoing first aspect, and details are not described herein again .

Description of drawings

In order to illustrate the technical solutions in the embodiments of the present application more clearly, the following briefly introduces the accompanying drawings that need to be used in the description of the embodiments or the prior art. Obviously, the drawings in the following description are only for the present application. In some embodiments, for those of ordinary skill in the art, other drawings can also be obtained according to these drawings without any creative effort.

Fig. 1 is the realization flow chart of the cross-modal face recognition method provided by the embodiment of the present application;

Fig. 2 is the training process schematic diagram of the cross-modal face recognition model that pre-training is completed;

3 is a schematic diagram of a cross-modal face recognition device provided by an embodiment of the present application;

FIG. 4 is a schematic structural diagram of a cross-modal face recognition device provided by an embodiment of the present application.

Detailed ways

In the following description, for the purpose of illustration rather than limitation, specific details such as a specific system structure and technology are set forth in order to provide a thorough understanding of the embodiments of the present application. However, it will be apparent to those skilled in the art that the present application may be practiced in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.

It is to be understood that, when used in this specification and the appended claims, the term "comprising" indicates the presence of the described feature, integer, step, operation, element and/or component, but does not exclude one or more other The presence or addition of features, integers, steps, operations, elements, components and/or sets thereof.

It will also be understood that, as used in this specification and the appended claims, the term "and/or" refers to and including any and all possible combinations of one or more of the associated listed items.

As used in the specification of this application and the appended claims, the term "if" may be contextually interpreted as "when" or "once" or "in response to determining" or "in response to detecting ". Similarly, the phrases "if it is determined" or "if the [described condition or event] is detected" may be interpreted, depending on the context, to mean "once it is determined" or "in response to the determination" or "once the [described condition or event] is detected. ]" or "in response to detection of the [described condition or event]".

In addition, in the description of the specification of the present application and the appended claims, the terms "first", "second", "third", etc. are only used to distinguish the description, and should not be construed as indicating or implying relative importance.

References in this specification to "one embodiment" or "some embodiments" and the like mean that a particular feature, structure or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," "in other embodiments," etc. in various places in this specification are not necessarily All refer to the same embodiment, but mean "one or more but not all embodiments" unless specifically emphasized otherwise. The terms "including", "including", "having" and their variants mean "including but not limited to" unless specifically emphasized otherwise.

The cross-modal face recognition method provided by the present application is exemplarily described below with reference to specific embodiments. As shown in FIG. 1 , FIG. 1 is an implementation flowchart of the cross-modal face recognition method provided by the embodiment of the present application. This implementation can be performed by cross-modal face recognition devices, including but not limited to self-service terminals, monitoring equipment, attendance equipment, and servers, robots, wearable devices or mobile terminal, etc. Details are as follows:

S101, collect a face image to be recognized.

In the embodiment of the present application, the face image to be recognized may be a face image collected in a visible light mode or an infrared mode. Exemplarily, a camera of a cross-modal face recognition device, such as a camera of a mobile terminal or an attendance device, may collect a face image in a visible light mode, or collect a face image in an infrared mode.

S102 Input the face image to be recognized into a pre-trained cross-modal face recognition model for face recognition.

In the embodiment of the present application, the pre-trained cross-modal face recognition model is obtained by pre-training the deep convolutional neural network through the face image in the visible light modality to obtain the pre-trained cross-modal image The deep convolutional neural network provides prior knowledge for the training of the deep convolutional neural network of cross-modal images, and then the face image in the visible light mode and the face image in the infrared mode are formed according to preset rules. Tuple training set, and select the pre-trained cross-modal image deep convolutional neural network for fine-tuning, and iterate repeatedly until the performance of the pre-trained cross-modal image deep convolutional neural network is no longer improved. A neural network model.

Among them, FIG. 2 is a schematic diagram of the training process of the pre-trained cross-modal face recognition model. As shown in Figure 2, the training process of the pre-trained cross-modal face recognition model includes the following steps:

S201: Obtain a first training sample set, where the first training sample set includes a first preset number of visible light face preprocessing image sequences and a second preset number of infrared face preprocessing image sequences.

It should be noted that a color camera or a multispectral camera can be used to collect a visible-light face image sequence including a human face. Among them, visible light face images contain rich texture features and are easily affected by ambient light. Therefore, in some optional implementation manners, the visible light face preprocessing image sequence is obtained by acquiring a first preset number of visible light face image sequences; and performing pixel equalization processing on the visible light face image sequence.

In an optional implementation manner, performing pixel equalization processing on the visible light face image sequence to obtain the visible light face preprocessing image sequence, which may include: collecting the face region including the visible light face image. Segmentation with the background area to obtain the visible face image.

In some embodiments of the present application, before image segmentation, the image detection model may be used to detect whether there is a face in the image to be processed. When the output result of the image detection model shows that there is no face in the image, it is not necessary to perform a face segmentation, and end face segmentation processing to reduce unnecessary workload. When the output result of the image detection model shows that there is a face in the image, it is possible to further screen the face, that is, to determine whether there is a face that meets the preset conditions in the image, for example, it can be to determine whether the face meets the requirements. , the above requirements can be preset for the position and/or size of the face, for example, the size of the face area can be preset to meet the preset size, and it is considered to meet the requirements. When the face meets the preset conditions, follow-up processing is performed, such as image rotation correction and object segmentation processing on the image. When the face does not meet the preset conditions, the image can not be segmented.

In some embodiments of the present application, a series of visible light face images are processed as above to obtain a visible light face image sequence, and further preprocessing of the visible light face image sequence can obtain a visible light face preprocessed image sequence. Optionally, preprocessing the visible light face image sequence may include: performing grayscale conversion and normalization processing on the visible light face image sequence to obtain a visible light face preprocessing image sequence.

Among them, the face images in the visible light face image sequence are converted into grayscale images by grayscale. Optionally, grayscale conversion can be performed by a preset grayscale conversion formula, and the preset grayscale conversion formula can be expressed as:

I _gary = 0.2989×R+0.5870×G+0.1140×B

Among them, Igray is the grayscale image output after grayscale conversion, and R, G, and B are the RGB values corresponding to the image before grayscale conversion.

The converted grayscale image is further subjected to normalization processing, exemplarily, normalization processing is performed by using a preset normalization processing formula.

S202: Train a preset cross-modal face recognition model according to the visible-light face preprocessing image sequence and the first classification loss function to obtain a first cross-modal face recognition model.

Since the face in the visible light modality is more complex, the cross-modal neural network can be pre-trained by using the visible light face preprocessing image sequence. In an embodiment of the present application, the visible light face preprocessing image sequence is divided into two parts: a training set and a validation set. Wherein, the training set and the verification set do not overlap, the training set is used to train the preset cross-modal face recognition model, and the verification is carried out through the training of the preset cross-modal face recognition model in the verification set. A classification loss function, by continuously saving the neural network that minimizes the loss of the validation set to determine the first cross-modal neural network finally trained in this embodiment, and the first cross-modal neural network is the first cross-modal neural network. face recognition model.

S203: Input the visible light preprocessing image sequence and the infrared face preprocessing image sequence into the first cross-modality face recognition model, and classify the first cross-modality face recognition model based on the first classification loss function The face recognition model is retrained to obtain a second cross-modal face recognition model.

It should be noted that the infrared face preprocessing image sequence is obtained by preprocessing the infrared face image sequence. Exemplarily, before acquiring the second training sample set, the method includes: acquiring the second preset number of infrared face image sequences; performing pixel equalization processing on the infrared face image sequences to obtain the infrared face pre-set number. Process image sequences.

Wherein, performing pixel equalization processing on the infrared face image sequence includes: performing image contrast enhancement and normalization processing on the infrared face image sequence. In some embodiments of the present application, histogram equalization may be performed on the infrared face image sequence to enhance image contrast. Among them, histogram equalization is a method to enhance the image contrast by stretching the pixel intensity distribution range. In some other embodiments, a logarithmic function and a power function can also be used to transform the infrared face image sequence to enhance the image contrast.

In addition, the second cross-modal face recognition model is the pre-trained cross-modal face recognition model.

Following the example of this application, the process of normalizing the infrared face image sequence is the same as the process of normalizing the visible light face image, and details are not repeated here.

In some embodiments of the present application, the cross-modal face recognition model includes a preset number of convolutional layers and fully connected layers. Among them, there can be any number of convolutional layers before the fully connected layer. Exemplarily, the cross-modal face recognition model includes five convolutional layers and one fully connected layer; wherein, the first convolutional layer and the second convolutional layer each include two convolutional layers for feature extraction. and a maximum pooling layer for dimensionality reduction, the third to fifth convolutional layers include three convolutional layers for feature extraction and one maximum pooling layer for dimensionality reduction, each layer The feature maps after the operation are all subjected to nonlinear activation functions; the visible light preprocessing image sequence and the infrared preprocessing image sequence are subjected to convolution operation to extract feature values through the convolution layer, and then the face feature vector is output through the fully connected layer.

Exemplarily, the first convolutional layer may include two convolutional layers with a convolution kernel size of 3×3, a stride of 1×1, and a convolutional kernel number of 64, and one convolutional kernel of 2×2. , a maximum pooling layer with stride 2×2; the second convolutional layer includes two convolutional layers with kernel size 3×3, stride 1×1, and the number of convolution kernels 128 and a volume The maximum pooling layer with a kernel of 2×2 and a stride of 2×2; the third convolutional layer includes three convolution kernels with a size of 3×3, a stride of 1×1, and a number of convolution kernels of 256. A convolutional layer and a max-pooling layer with a convolution kernel of 2×2 and a stride of 2×2; the fourth convolutional layer includes three convolutional kernels of size 3×3, a stride of 1×1, and a convolutional A convolutional layer with 512 kernels and a max pooling layer with 2×2 convolution kernels and 2×2 stride; the fifth convolutional layer includes three convolutional kernels with a size of 3×3 and a stride of 2×2. A 1×1 convolutional layer with 512 convolution kernels and a max pooling layer with 2×2 convolution kernels and a stride of 2×2; the two fully connected layers each have 4096 nodes. It should be understood that the above-mentioned cross-modal neural network can adopt any structure, and the above-mentioned examples are not limiting.

Among them, in the training process of the cross-modal face recognition model, the convolution kernel and weight are randomly initialized, and the bias term is set to 0. The stochastic gradient descent (SGD) algorithm is used to update the network parameters and optimize the gradient of the above-mentioned cross-modal neural network. When the number of network iterations reaches a preset value, the training stops and the trained cross-modal neural network is saved.

It can be seen from the above embodiments that the cross-modal face recognition method provided by the present application adopts the cross-modal face recognition model trained by the visible light face preprocessing image sequence and the infrared light face preprocessing image sequence. face image recognition, which can improve the recognition accuracy of face images obtained under different modalities of cameras.

It should be understood that the size of the sequence numbers of the steps in the above embodiments does not mean the sequence of execution, and the execution sequence of each process should be determined by its function and internal logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.

Corresponding to the cross-modal face recognition method described in the above embodiment, FIG. 3 shows a structural block diagram of the cross-modal face recognition device provided by the embodiment of the present application. Example relevant part.

As shown in FIG. 3 , FIG. 3 is a schematic diagram of a cross-modal face recognition apparatus provided by an embodiment of the present application. The cross-modal face recognition device 300 includes:

The collection module 301 is used to collect the face image to be recognized;

A recognition module 302, configured to input the face image to be recognized into a pre-trained cross-modal face recognition model for face recognition;

In an optional implementation manner, it also includes:

In an optional implementation, the first processing module includes:

In an optional implementation, the second processing module includes:

It should be noted that the information exchange, execution process and other contents between the above-mentioned devices/units are based on the same concept as the method embodiments of the present application. For specific functions and technical effects, please refer to the method embodiments section. It is not repeated here.

Those skilled in the art can clearly understand that, for the convenience and simplicity of description, only the division of the above-mentioned functional units and modules is used as an example for illustration. In practical applications, the above-mentioned functions can be allocated to different functional units, Module completion, that is, dividing the internal structure of the device into different functional units or modules to complete all or part of the functions described above. Each functional unit and module in the embodiment may be integrated in one processing unit, or each unit may exist physically alone, or two or more units may be integrated in one unit, and the above-mentioned integrated units may adopt hardware. It can also be realized in the form of software functional units. In addition, the specific names of the functional units and modules are only for the convenience of distinguishing from each other, and are not used to limit the protection scope of the present application. For the specific working processes of the units and modules in the above-mentioned system, reference may be made to the corresponding processes in the foregoing method embodiments, which will not be repeated here.

FIG. 4 is a schematic structural diagram of a cross-modal face recognition device provided by an embodiment of the present application. As shown in FIG. 4 , the cross-modal face recognition device 4 of this embodiment includes: at least one processor 40 (only one is shown in FIG. 4 ), a memory 41 , and a memory 41 stored in the memory 41 and available in the A computer program 42 running on at least one processor 40, when the processor 40 executes the computer program 42, the steps in the method embodiment described in FIG. 1 above are implemented.

The cross-modal face recognition device 4 may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server. The cross-modal face recognition device 4 may include, but is not limited to, a processor 40 and a memory 41 . Those skilled in the art can understand that FIG. 4 is only an example of the cross-modal face recognition device 4, and does not constitute a limitation to the cross-modal face recognition device 4, and may include more or less components than those shown in the figure. Alternatively, some components may be combined, or different components may also include, for example, input and output devices, network access devices, and the like.

The so-called processor 40 may be a central processing unit (Central Processing Unit, CPU), and the processor 40 may also be other general-purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuits) , ASIC), off-the-shelf programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.

In some embodiments, the memory 41 may be an internal storage unit of the cross-modality face recognition device 4 , such as a hard disk or memory of the cross-modality face recognition device 4 . The memory 41 may also be an external storage device of the cross-modal face recognition device 4 in other embodiments, such as a plug-in hard disk equipped on the cross-modal face recognition device 4, a smart memory card. (Smart Media Card, SMC), secure digital (Secure Digital, SD) card, flash memory card (Flash Card), etc. Further, the memory 41 may also include both an internal storage unit of the cross-modal face recognition device 4 and an external storage device. The memory 41 is used to store an operating system, an application program, a boot loader (Boot Loader), data, and other programs, such as program codes of the computer program. The memory 41 can also be used to temporarily store data that has been output or will be output.

An embodiment of the present application also provides a network device, the network device includes: at least one processor, a memory, and a computer program stored in the memory and executable on the at least one processor, the processor executing The computer program implements the steps in any of the foregoing method embodiments.

Embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the steps in the foregoing method embodiments can be implemented.

The embodiments of the present application provide a computer program product, when the computer program product runs on a cross-modal face recognition device, the steps in the above method embodiments can be implemented when the cross-modal face recognition device is executed.

The integrated unit, if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments can be implemented by a computer program to instruct the relevant hardware. The computer program can be stored in a computer-readable storage medium, and the computer program When executed by the processor, the steps of the above-mentioned various method embodiments may be implemented. Wherein, the computer program includes computer program code, and the computer program code may be in the form of source code, object code, executable file or some intermediate form, and the like. The computer-readable medium may include at least: any entity or device capable of carrying the computer program code to the photographing device/electronic device, recording medium, computer memory, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), electrical carrier signals, telecommunication signals, and software distribution media. For example, U disk, mobile hard disk, disk or CD, etc. In some jurisdictions, under legislation and patent practice, computer readable media may not be electrical carrier signals and telecommunications signals.

In the foregoing embodiments, the description of each embodiment has its own emphasis. For parts that are not described or described in detail in a certain embodiment, reference may be made to the relevant descriptions of other embodiments.

Those of ordinary skill in the art can realize that the units and algorithm steps of each example described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality using different methods for each particular application, but such implementations should not be considered beyond the scope of this application.

In the embodiments provided in this application, it should be understood that the disclosed apparatus/network device and method may be implemented in other manners. For example, the apparatus/network device embodiments described above are only illustrative. For example, the division of the modules or units is only a logical function division. In actual implementation, there may be other division methods, such as multiple units. Or components may be combined or may be integrated into another system, or some features may be omitted, or not implemented. On the other hand, the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.

The units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.

The above-mentioned embodiments are only used to illustrate the technical solutions of the present application, but not to limit them; although the present application has been described in detail with reference to the above-mentioned embodiments, those of ordinary skill in the art should understand that: it is still possible to implement the above-mentioned implementations. The technical solutions described in the examples are modified, or some technical features thereof are equivalently replaced; and these modifications or replacements do not make the essence of the corresponding technical solutions deviate from the spirit and scope of the technical solutions in the embodiments of the application, and should be included in the within the scope of protection of this application.

Claims

A cross-modal face recognition method, comprising:

Collect face images to be recognized;

Inputting the face image to be recognized into a pre-trained cross-modal face recognition model for face recognition;

The training process of the pre-trained cross-modal face recognition model includes: acquiring a first training sample set, where the first training sample set includes a first preset number of visible-light face preprocessing image sequences and a first training sample set. 2. A preset number of infrared face preprocessing image sequences;

The preset cross-modal face recognition model is trained according to the visible light face preprocessing image sequence and the first classification loss function to obtain a first cross-modal face recognition model;

Input the visible light preprocessing image sequence and the infrared face preprocessing image sequence into the first cross-modal face recognition model, and classify the first cross-modal face based on the first classification loss function The recognition model is retrained to obtain a second cross-modal face recognition model, where the second cross-modal face recognition model is the pre-trained cross-modal face recognition model.
The method of claim 1, wherein the cross-modal face recognition model comprises a preset number of convolutional layers and fully connected layers; the convolutional layers comprise convolutional layers for feature extraction and Max pooling layer for dimensionality reduction.
The method of claim 1, wherein before the acquiring the first training sample set, further comprising:

acquiring the first preset number of visible light face image sequences;

Perform pixel equalization processing on the visible light face image sequence to obtain the visible light face preprocessing image sequence.
The method of claim 1, wherein before the acquiring the first training sample set, further comprising:

acquiring the second preset number of infrared face image sequences;

Perform pixel equalization processing on the infrared face image sequence to obtain the infrared face preprocessing image sequence.
The method of claim 3, wherein performing pixel equalization processing on the visible light face image sequence to obtain the visible light face preprocessing image sequence, comprising:

Converting the visible light face image sequence into a grayscale image;

The grayscale image is normalized to obtain the visible light face preprocessing image sequence.
The method according to claim 4, wherein performing pixel equalization processing on the infrared face image sequence to obtain the infrared face preprocessing image sequence, comprising:

performing image contrast enhancement on the infrared face image sequence to obtain an enhanced infrared face image sequence;

The enhanced infrared face image sequence is normalized to obtain the infrared face preprocessing image sequence.
The cross-modal face recognition method according to claim 6, wherein the image contrast enhancement is performed on the infrared face image sequence to obtain an enhanced infrared face image sequence, comprising:

The histogram equalization process is performed on the infrared face image sequence to enhance the image contrast of the infrared face image sequence to obtain an enhanced infrared face image sequence.
A cross-modal face recognition device, comprising:

The acquisition module is used to collect the face image to be recognized;

A recognition module, for inputting the face image to be recognized into a pre-trained cross-modal face recognition model for face recognition;

The training process of the pre-trained cross-modal face recognition model includes: acquiring a first training sample set, where the first training sample set includes a first preset number of visible-light face preprocessing image sequences and a first training sample set. 2. A preset number of infrared face preprocessing image sequences;

The preset cross-modal face recognition model is trained according to the visible light face preprocessing image sequence and the first classification loss function to obtain a first cross-modal face recognition model;

Input the visible light preprocessing image sequence and the infrared face preprocessing image sequence into the first cross-modal face recognition model, and classify the first cross-modal face based on the first classification loss function The recognition model is retrained to obtain a second cross-modal face recognition model, where the second cross-modal face recognition model is the pre-trained cross-modal face recognition model.
A cross-modal face recognition device, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, when the processor executes the computer program A method as claimed in any one of claims 1 to 7 is carried out.
A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, the method according to any one of claims 1 to 7 is implemented.