WO2022127111A1 - Cross-modal face recognition method, apparatus and device, and storage medium - Google Patents

Cross-modal face recognition method, apparatus and device, and storage medium

Info

Publication number
WO2022127111A1
WO2022127111A1 (PCT/CN2021/107933)
Authority
WO
WIPO (PCT)
Prior art keywords
face
cross
modal
image sequence
face recognition
Prior art date
Application number
PCT/CN2021/107933
Other languages
French (fr)
Chinese (zh)
Inventor
陈碧辉
高通
钱贝贝
黄源浩
肖振中
Original Assignee
奥比中光科技集团股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 奥比中光科技集团股份有限公司
Publication of WO2022127111A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161 - Detection; Localisation; Normalisation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 - Feature extraction; Face representation

Definitions

  • The present application belongs to the technical field of image processing, and in particular relates to a cross-modal face recognition method, apparatus, device and storage medium.
  • The present application provides a cross-modal face recognition method, apparatus, device and storage medium, which can solve the problem of low recognition accuracy of face images acquired by cameras of different modalities.
  • The present application provides a cross-modal face recognition method, including:
  • the training process of the pre-trained cross-modal face recognition model includes: acquiring a first training sample set, where the first training sample set includes a first preset number of visible-light face preprocessing image sequences and a second preset number of infrared face preprocessing image sequences;
  • the preset cross-modal face recognition model is trained according to the visible-light face preprocessing image sequence and a first classification loss function to obtain a first cross-modal face recognition model;
  • the visible-light preprocessing image sequence and the infrared face preprocessing image sequence are input into the first cross-modal face recognition model, which is retrained based on the first classification loss function to obtain a second cross-modal face recognition model, where the second cross-modal face recognition model is the pre-trained cross-modal face recognition model.
  • the cross-modal face recognition model includes a preset number of convolutional layers and fully connected layers; the convolutional layers include convolutional layers for feature extraction and a max pooling layer for dimensionality reduction.
  • Before acquiring the first training sample set, the method further includes: acquiring the first preset number of visible-light face image sequences, and performing pixel equalization processing on the visible-light face image sequences to obtain the visible-light face preprocessing image sequence.
  • Before acquiring the first training sample set, the method further includes: acquiring the second preset number of infrared face image sequences, and performing pixel equalization processing on the infrared face image sequences to obtain the infrared face preprocessing image sequence.
  • Performing pixel equalization processing on the visible-light face image sequence to obtain the visible-light face preprocessing image sequence includes: converting the visible-light face image sequence into grayscale images, and normalizing the grayscale images to obtain the visible-light face preprocessing image sequence.
  • Performing pixel equalization processing on the infrared face image sequence to obtain the infrared face preprocessing image sequence includes: performing image contrast enhancement on the infrared face image sequence to obtain an enhanced infrared face image sequence, and normalizing the enhanced infrared face image sequence to obtain the infrared face preprocessing image sequence.
  • Performing image contrast enhancement on the infrared face image sequence to obtain an enhanced infrared face image sequence includes: performing histogram equalization on the infrared face image sequence to enhance the image contrast of the infrared face image sequence, obtaining an enhanced infrared face image sequence.
  • The present application provides a cross-modal face recognition apparatus, including:
  • an acquisition module, configured to collect a face image to be recognized;
  • a recognition module, configured to input the face image to be recognized into a pre-trained cross-modal face recognition model for face recognition;
  • the training process of the pre-trained cross-modal face recognition model includes: acquiring a first training sample set, where the first training sample set includes a first preset number of visible-light face preprocessing image sequences and a second preset number of infrared face preprocessing image sequences;
  • the preset cross-modal face recognition model is trained according to the visible-light face preprocessing image sequence and the first classification loss function to obtain a first cross-modal face recognition model;
  • the visible-light preprocessing image sequence and the infrared face preprocessing image sequence are input into the first cross-modal face recognition model, which is retrained based on the first classification loss function to obtain a second cross-modal face recognition model, where the second cross-modal face recognition model is the pre-trained cross-modal face recognition model.
  • the cross-modal face recognition model includes a preset number of convolutional layers and fully connected layers; the convolutional layers include convolutional layers for feature extraction and a max pooling layer for dimensionality reduction.
  • a first acquisition module configured to acquire the first preset number of visible light face image sequences
  • the first processing module is configured to perform pixel equalization processing on the visible light face image sequence to obtain the visible light face preprocessing image sequence.
  • a second acquisition module configured to acquire the second preset number of infrared face image sequences
  • the second processing module is configured to perform pixel equalization processing on the infrared face image sequence to obtain the infrared face preprocessing image sequence.
  • the first processing module includes:
  • a conversion unit for converting the visible light face image sequence into a grayscale image
  • the first processing unit is configured to perform normalization processing on the grayscale image to obtain the visible-light face preprocessing image sequence.
  • the second processing module includes:
  • an enhancement module configured to enhance the image contrast of the infrared face image sequence to obtain an enhanced infrared face image sequence
  • the second processing unit is used for normalizing the enhanced infrared face image sequence to obtain the infrared face preprocessing image sequence.
  • the enhancement module is specifically used for:
  • the histogram equalization process is performed on the infrared face image sequence to enhance the image contrast of the infrared face image sequence to obtain an enhanced infrared face image sequence.
  • The present application provides a cross-modal face recognition device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the steps of the method of the first aspect are implemented.
  • The present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the steps of the method of the first aspect are implemented.
  • The present application provides a computer program product, where the computer program product includes a computer program, and when the computer program is executed by one or more processors, the steps of the method of the first aspect are implemented.
  • The cross-modal face recognition method of the first aspect performs face recognition on the face image to be recognized by using a cross-modal face recognition model trained with visible-light face preprocessing image sequences and infrared face preprocessing image sequences, which can improve the recognition accuracy of face images acquired by cameras of different modalities.
  • FIG. 1 is a flowchart of an implementation of the cross-modal face recognition method provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of the training process of the pre-trained cross-modal face recognition model.
  • FIG. 3 is a schematic diagram of a cross-modal face recognition device provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a cross-modal face recognition device provided by an embodiment of the present application.
  • the term “if” may be interpreted, depending on the context, as “when”, “once”, “in response to determining” or “in response to detecting”.
  • the phrases “if it is determined” or “if [the described condition or event] is detected” may be interpreted, depending on the context, to mean “once it is determined”, “in response to determining”, “once [the described condition or event] is detected”, or “in response to detecting [the described condition or event]”.
  • references in this specification to “one embodiment” or “some embodiments” and the like mean that a particular feature, structure or characteristic described in connection with the embodiment is included in one or more embodiments of the present application.
  • appearances of the phrases “in one embodiment”, “in some embodiments”, “in other embodiments”, “in still other embodiments”, etc. in various places in this specification do not necessarily all refer to the same embodiment, but mean “one or more but not all embodiments”, unless specifically emphasized otherwise.
  • the terms “comprising”, “including”, “having” and their variants mean “including but not limited to”, unless specifically emphasized otherwise.
  • FIG. 1 is an implementation flowchart of the cross-modal face recognition method provided by an embodiment of the present application. This embodiment may be performed by a cross-modal face recognition device, including but not limited to self-service terminals, monitoring equipment, attendance equipment, as well as servers, robots, wearable devices, or mobile terminals in various application scenarios. Details are as follows:
  • S101: Collect a face image to be recognized.
  • The face image to be recognized may be a face image collected in the visible-light modality or the infrared modality.
  • Exemplarily, a camera of the cross-modal face recognition device, such as a camera of a mobile terminal or an attendance device, may collect a face image in the visible-light modality, or collect a face image in the infrared modality.
  • S102: Input the face image to be recognized into a pre-trained cross-modal face recognition model for face recognition.
  • The pre-trained cross-modal face recognition model is obtained as follows: a deep convolutional neural network is first pre-trained with face images in the visible-light modality, yielding a pre-trained cross-modal deep convolutional neural network that provides prior knowledge for the training of the cross-modal deep convolutional neural network; the face images in the visible-light modality and the face images in the infrared modality are then combined into a two-tuple training set according to preset rules, and the pre-trained cross-modal deep convolutional neural network is fine-tuned on this set, iterating repeatedly until its performance no longer improves; the resulting deep convolutional neural network model is the pre-trained cross-modal face recognition model.
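  • For illustration only, the following is a minimal inference sketch in Python/PyTorch. It assumes the trained model outputs a face feature vector that is matched against enrolled gallery features by cosine similarity; the model file name, the matching rule, and the threshold are assumptions and are not specified in the original disclosure.

```python
import torch
import torch.nn.functional as F

def recognize(model, face_tensor, gallery_feats, gallery_ids, threshold=0.5):
    """Match one preprocessed face tensor (1 x C x H x W) against enrolled features."""
    model.eval()
    with torch.no_grad():
        feat = F.normalize(model(face_tensor), dim=1)          # face feature vector
        sims = feat @ F.normalize(gallery_feats, dim=1).T      # cosine similarities, shape (1, N)
        score, idx = sims.max(dim=1)
    if score.item() >= threshold:
        return gallery_ids[idx.item()], score.item()
    return None, score.item()

# Usage (hypothetical): model = torch.load("cross_modal_face.pt")
# The same model is applied whether the input face was captured in the
# visible-light modality or the infrared modality, after the preprocessing described below.
```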
  • FIG. 2 is a schematic diagram of the training process of the pre-trained cross-modal face recognition model.
  • the training process of the pre-trained cross-modal face recognition model includes the following steps:
  • S201 Obtain a first training sample set, where the first training sample set includes a first preset number of visible light face preprocessing image sequences and a second preset number of infrared face preprocessing image sequences.
  • a color camera or a multispectral camera can be used to collect a visible-light face image sequence including a human face.
  • Visible-light face images contain rich texture features and are easily affected by ambient light. Therefore, in some optional implementations, the visible-light face preprocessing image sequence is obtained by acquiring a first preset number of visible-light face image sequences and performing pixel equalization processing on the visible-light face image sequences.
  • Performing pixel equalization processing on the visible-light face image sequence to obtain the visible-light face preprocessing image sequence may include: segmenting the face region from the background region in the collected visible-light face images to obtain the visible-light face images.
  • Before segmentation, an image detection model may be used to detect whether there is a face in the image to be processed.
  • When the output of the image detection model shows that there is no face in the image, face segmentation is unnecessary, and the face segmentation processing ends, which reduces unnecessary workload.
  • When the output of the image detection model shows that there is a face in the image, the face may be further screened, that is, it is determined whether the image contains a face that meets preset conditions, for example, whether the face meets certain requirements.
  • These requirements may be preset for the position and/or size of the face; for example, a face may be considered to meet the requirements only when the size of the face region reaches a preset size.
  • When the face meets the preset conditions, follow-up processing such as rotation correction and object segmentation is performed on the image; when the face does not meet the preset conditions, the image may be left unsegmented.
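  • A minimal sketch of this detection-and-screening gate is shown below. The detect_faces(), correct_rotation(), and segment_face() helpers and the minimum face size are hypothetical placeholders for whatever detector and segmentation method an implementation actually uses.

```python
MIN_FACE_SIZE = 80  # assumed minimum face side length, in pixels

def preprocess_candidate(image, detect_faces, correct_rotation, segment_face):
    """Return a segmented face image, or None if no qualifying face is found."""
    faces = detect_faces(image)              # hypothetical detector: list of (x, y, w, h)
    if not faces:
        return None                          # no face: skip segmentation entirely
    # keep only faces that satisfy the preset size condition
    qualified = [f for f in faces if f[2] >= MIN_FACE_SIZE and f[3] >= MIN_FACE_SIZE]
    if not qualified:
        return None                          # a face is present but does not meet the preset conditions
    image = correct_rotation(image, qualified[0])   # follow-up processing: rotation correction
    return segment_face(image, qualified[0])        # separate face region from background
```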
  • A series of visible-light face images are processed as above to obtain a visible-light face image sequence, and the visible-light face image sequence is further preprocessed to obtain the visible-light face preprocessing image sequence.
  • Preprocessing the visible-light face image sequence may include: performing grayscale conversion and normalization processing on the visible-light face image sequence to obtain the visible-light face preprocessing image sequence.
  • The face images in the visible-light face image sequence are converted into grayscale images.
  • Optionally, the grayscale conversion may be performed by a preset grayscale conversion formula, which may be expressed as: Igray = 0.2989 × R + 0.5870 × G + 0.1140 × B, where Igray is the grayscale image output after grayscale conversion, and R, G, and B are the RGB values of the image before grayscale conversion.
  • The converted grayscale images are further normalized, for example by using a preset normalization formula.
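  • As an illustration, the sketch below applies the grayscale conversion formula above and then a normalization step; since the disclosure does not spell out its normalization formula, simple scaling of 8-bit intensities to the range [0, 1] is assumed here.

```python
import numpy as np

def to_gray(rgb: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 RGB image to grayscale with the weights given above."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.2989 * r + 0.5870 * g + 0.1140 * b

def normalize(gray: np.ndarray) -> np.ndarray:
    """Assumed normalization: scale 8-bit intensities to [0, 1]."""
    return gray.astype(np.float32) / 255.0

def preprocess_visible_sequence(images):
    """Pixel equalization for a visible-light face image sequence."""
    return [normalize(to_gray(img.astype(np.float32))) for img in images]
```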
  • S202 Train a preset cross-modal face recognition model according to the visible-light face preprocessing image sequence and the first classification loss function to obtain a first cross-modal face recognition model.
  • Since faces in the visible-light modality are relatively complex, the cross-modal neural network may first be pre-trained using the visible-light face preprocessing image sequence.
  • In an embodiment of the present application, the visible-light face preprocessing image sequence is divided into two parts, a training set and a validation set, which do not overlap. The training set is used to train the preset cross-modal face recognition model, and the validation set is used to verify the training of the preset cross-modal face recognition model.
  • At the same time, a first classification loss function is constructed, and the neural network that minimizes the validation-set loss is continuously saved; the network determined in this way is the first cross-modal neural network finally trained in this embodiment, and this first cross-modal neural network is the first cross-modal face recognition model.
  • S203: Input the visible-light preprocessing image sequence and the infrared face preprocessing image sequence into the first cross-modal face recognition model, and retrain the first cross-modal face recognition model based on the first classification loss function to obtain a second cross-modal face recognition model.
  • The infrared face preprocessing image sequence is obtained by preprocessing the infrared face image sequence.
  • Exemplarily, the method includes: acquiring the second preset number of infrared face image sequences; and performing pixel equalization processing on the infrared face image sequences to obtain the infrared face preprocessing image sequence.
  • Performing pixel equalization processing on the infrared face image sequence includes: performing image contrast enhancement and normalization processing on the infrared face image sequence.
  • In some embodiments, histogram equalization may be performed on the infrared face image sequence to enhance image contrast.
  • Histogram equalization is a method that enhances image contrast by stretching the range of the pixel intensity distribution.
  • In other embodiments, logarithmic or power-function transforms may also be applied to the infrared face image sequence to enhance image contrast.
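  • For illustration, a minimal NumPy implementation of histogram equalization for an 8-bit infrared image is sketched below; the 256-bin setting reflects an assumed 8-bit input and is not mandated by the disclosure.

```python
import numpy as np

def equalize_histogram(ir_image: np.ndarray) -> np.ndarray:
    """Stretch the pixel intensity distribution of an 8-bit grayscale image."""
    hist, _ = np.histogram(ir_image.flatten(), bins=256, range=(0, 256))
    cdf = hist.cumsum().astype(np.float64)
    cdf_min = cdf[cdf > 0].min()                  # first non-zero CDF value
    denom = cdf[-1] - cdf_min
    if denom == 0:                                # constant image: nothing to equalize
        return ir_image.copy()
    # remap intensities so that the cumulative distribution becomes approximately linear
    lut = np.round((cdf - cdf_min) / denom * 255.0).clip(0, 255).astype(np.uint8)
    return lut[ir_image]
```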
  • In addition, the second cross-modal face recognition model is the pre-trained cross-modal face recognition model.
  • The process of normalizing the infrared face image sequence is the same as the process of normalizing the visible-light face images, and details are not repeated here.
  • In an optional implementation, the cross-modal face recognition model includes a preset number of convolutional layers and fully connected layers; any number of convolutional layers may precede the fully connected layers.
  • Exemplarily, the cross-modal face recognition model includes five convolutional layers and fully connected layers, where the first convolutional layer and the second convolutional layer each include two convolutional layers for feature extraction and a max pooling layer for dimensionality reduction, and the third to fifth convolutional layers each include three convolutional layers for feature extraction and one max pooling layer for dimensionality reduction.
  • The feature maps produced by each layer are passed through a nonlinear activation function; the visible-light preprocessing image sequence and the infrared preprocessing image sequence are convolved by the convolutional layers to extract features, and the face feature vector is then output through the fully connected layers.
  • Exemplarily, the first convolutional layer may include two convolutional layers with a convolution kernel size of 3 × 3, a stride of 1 × 1, and 64 convolution kernels, and a max pooling layer with a 2 × 2 kernel and a stride of 2 × 2; the second convolutional layer includes two convolutional layers with a kernel size of 3 × 3, a stride of 1 × 1, and 128 convolution kernels, and a max pooling layer with a 2 × 2 kernel and a stride of 2 × 2; the third convolutional layer includes three convolutional layers with a kernel size of 3 × 3, a stride of 1 × 1, and 256 convolution kernels, and a max pooling layer with a 2 × 2 kernel and a stride of 2 × 2; the fourth convolutional layer includes three convolutional layers with a kernel size of 3 × 3, a stride of 1 × 1, and 512 convolution kernels, and a max pooling layer with a 2 × 2 kernel and a stride of 2 × 2; the fifth convolutional layer includes three convolutional layers with a kernel size of 3 × 3, a stride of 1 × 1, and 512 convolution kernels, and a max pooling layer with a 2 × 2 kernel and a stride of 2 × 2; the two fully connected layers each have 4096 nodes.
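  • The layout above closely mirrors a VGG-style network. The PyTorch sketch below reconstructs it under stated assumptions: padding of 1 so that 3 × 3 convolutions preserve spatial size, ReLU as the nonlinear activation, a 112 × 112 single-channel input, and a final classification head sized to the number of training identities; none of these specifics are fixed by the disclosure.

```python
import torch
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int, n_convs: int) -> nn.Sequential:
    """n_convs 3x3 stride-1 convolutions followed by 2x2 stride-2 max pooling."""
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, kernel_size=3, stride=1, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

class CrossModalFaceNet(nn.Module):
    def __init__(self, num_classes: int, in_channels: int = 1):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(in_channels, 64, 2),   # first convolutional layer group
            conv_block(64, 128, 2),           # second group
            conv_block(128, 256, 3),          # third group
            conv_block(256, 512, 3),          # fourth group
            conv_block(512, 512, 3),          # fifth group
        )
        self.fc = nn.Sequential(              # two fully connected layers, 4096 nodes each
            nn.Flatten(),
            nn.Linear(512 * 3 * 3, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True),
        )
        self.classifier = nn.Linear(4096, num_classes)  # head for the classification loss

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feat = self.fc(self.features(x))      # 4096-dimensional face feature vector
        return self.classifier(feat)
```

  • With the assumed 112 × 112 input, five 2 × 2 poolings reduce the spatial size to 3 × 3 (112 → 56 → 28 → 14 → 7 → 3), hence the 512 × 3 × 3 input to the first fully connected layer.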
  • During training, the convolution kernels and weights are randomly initialized, and the bias terms are set to 0.
  • The stochastic gradient descent (SGD) algorithm is used to update the network parameters and optimize the gradients of the above cross-modal neural network.
  • When the performance of the network no longer improves, training stops and the trained cross-modal neural network is saved.
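  • A compressed sketch of the two-stage training described in S202 and S203 is given below; the learning rate, momentum, batch handling, and the simple "no improvement" early-stopping rule are illustrative assumptions layered on top of the SGD update named in the text.

```python
import copy
import torch
import torch.nn as nn

def train_stage(model, train_loader, val_loader, epochs=50, lr=0.01, patience=5):
    """Train with SGD and a classification loss; keep the weights with the lowest validation loss."""
    criterion = nn.CrossEntropyLoss()                      # the (first) classification loss
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    best_loss, best_state, stale = float("inf"), copy.deepcopy(model.state_dict()), 0
    for _ in range(epochs):
        model.train()
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        model.eval()
        with torch.no_grad():
            val_loss = sum(criterion(model(x), y).item() for x, y in val_loader) / len(val_loader)
        if val_loss < best_loss:
            best_loss, best_state, stale = val_loss, copy.deepcopy(model.state_dict()), 0
        else:
            stale += 1
            if stale >= patience:                          # performance no longer improves
                break
    model.load_state_dict(best_state)
    return model

# Stage 1 (S202): pre-train on visible-light preprocessed faces only.
# model = train_stage(model, visible_train_loader, visible_val_loader)
# Stage 2 (S203): retrain on visible-light plus infrared preprocessed faces.
# model = train_stage(model, mixed_train_loader, mixed_val_loader)
```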
  • The cross-modal face recognition method provided by the present application performs face recognition on the face image to be recognized by using a cross-modal face recognition model trained with visible-light face preprocessing image sequences and infrared face preprocessing image sequences, which can improve the recognition accuracy of face images acquired by cameras of different modalities.
  • FIG. 3 shows a structural block diagram of the cross-modal face recognition apparatus provided by an embodiment of the present application; for ease of description, only the parts relevant to this embodiment are shown.
  • FIG. 3 is a schematic diagram of a cross-modal face recognition apparatus provided by an embodiment of the present application.
  • the cross-modal face recognition device 300 includes:
  • the collection module 301 is used to collect the face image to be recognized
  • a recognition module 302 configured to input the face image to be recognized into a pre-trained cross-modal face recognition model for face recognition
  • the training process of the pre-trained cross-modal face recognition model includes: acquiring a first training sample set, where the first training sample set includes a first preset number of visible-light face preprocessing image sequences and a second preset number of infrared face preprocessing image sequences;
  • the preset cross-modal face recognition model is trained according to the visible light face preprocessing image sequence and the first classification loss function to obtain a first cross-modal face recognition model
  • the visible-light preprocessing image sequence and the infrared face preprocessing image sequence are input into the first cross-modal face recognition model, which is retrained based on the first classification loss function to obtain a second cross-modal face recognition model, where the second cross-modal face recognition model is the pre-trained cross-modal face recognition model.
  • the cross-modal face recognition model includes a preset number of convolutional layers and fully connected layers; the convolutional layers include convolutional layers for feature extraction and a max pooling layer for dimensionality reduction.
  • a first acquisition module configured to acquire the first preset number of visible light face image sequences
  • the first processing module is configured to perform pixel equalization processing on the visible light face image sequence to obtain the visible light face preprocessing image sequence.
  • a second acquisition module configured to acquire the second preset number of infrared face image sequences
  • the second processing module is configured to perform pixel equalization processing on the infrared face image sequence to obtain the infrared face preprocessing image sequence.
  • the first processing module includes:
  • a conversion unit for converting the visible light face image sequence into a grayscale image
  • the first processing unit is configured to perform normalization processing on the grayscale image to obtain the visible-light face preprocessing image sequence.
  • the second processing module includes:
  • an enhancement module configured to enhance the image contrast of the infrared face image sequence to obtain an enhanced infrared face image sequence
  • the second processing unit is used for normalizing the enhanced infrared face image sequence to obtain the infrared face preprocessing image sequence.
  • the enhancement module is specifically used for:
  • the histogram equalization process is performed on the infrared face image sequence to enhance the image contrast of the infrared face image sequence to obtain an enhanced infrared face image sequence.
  • FIG. 4 is a schematic structural diagram of a cross-modal face recognition device provided by an embodiment of the present application.
  • The cross-modal face recognition device 4 of this embodiment includes: at least one processor 40 (only one is shown in FIG. 4), a memory 41, and a computer program 42 stored in the memory 41 and executable on the at least one processor 40; when the processor 40 executes the computer program 42, the steps in the method embodiment described in FIG. 1 above are implemented.
  • the cross-modal face recognition device 4 may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server.
  • the cross-modal face recognition device 4 may include, but is not limited to, a processor 40 and a memory 41 .
  • FIG. 4 is only an example of the cross-modal face recognition device 4 and does not constitute a limitation on the cross-modal face recognition device 4; the device may include more or fewer components than shown, some components may be combined, or different components may be used, for example, input and output devices, network access devices, and the like.
  • The processor 40 may be a central processing unit (CPU), and may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the memory 41 may be an internal storage unit of the cross-modality face recognition device 4 , such as a hard disk or memory of the cross-modality face recognition device 4 .
  • the memory 41 may also be an external storage device of the cross-modal face recognition device 4 in other embodiments, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the cross-modal face recognition device 4.
  • the memory 41 may also include both an internal storage unit of the cross-modal face recognition device 4 and an external storage device.
  • the memory 41 is used to store an operating system, an application program, a boot loader (Boot Loader), data, and other programs, such as program codes of the computer program.
  • the memory 41 can also be used to temporarily store data that has been output or will be output.
  • An embodiment of the present application also provides a network device, the network device includes: at least one processor, a memory, and a computer program stored in the memory and executable on the at least one processor, the processor executing The computer program implements the steps in any of the foregoing method embodiments.
  • Embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the steps in the foregoing method embodiments can be implemented.
  • the embodiments of the present application provide a computer program product; when the computer program product runs on a cross-modal face recognition device, the cross-modal face recognition device, when executing it, implements the steps in the above method embodiments.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium.
  • all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware.
  • the computer program can be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the above method embodiments may be implemented.
  • the computer program includes computer program code, and the computer program code may be in the form of source code, object code, executable file or some intermediate form, and the like.
  • the computer-readable medium may include at least: any entity or device capable of carrying the computer program code to the photographing device/electronic device, recording medium, computer memory, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), electrical carrier signals, telecommunication signals, and software distribution media.
  • The software distribution media include, for example, a USB flash drive, a removable hard disk, a magnetic disk, or an optical disc.
  • computer-readable media may not include electrical carrier signals and telecommunication signals.
  • the disclosed apparatus/network device and method may be implemented in other manners.
  • the apparatus/network device embodiments described above are only illustrative.
  • the division of modules or units is only a logical functional division; in actual implementation there may be other ways of division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented.
  • the mutual coupling, direct coupling, or communication connections shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connections between apparatuses or units may be in electrical, mechanical, or other forms.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Molecular Biology (AREA)
  • Human Computer Interaction (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Image Analysis (AREA)

Abstract

A cross-modal face recognition method, apparatus and device, and a storage medium. The method comprises: performing face recognition on a face image to be recognized by using a cross-modal face recognition model obtained by training with a visible-light face preprocessing image sequence and an infrared face preprocessing image sequence. Thus, the recognition accuracy of face images obtained under cameras of different modalities can be improved.

Description

Cross-modal face recognition method, apparatus, device and storage medium
Technical Field
The present application belongs to the technical field of image processing, and in particular relates to a cross-modal face recognition method, apparatus, device and storage medium.
Background
The accuracy of face recognition is greatly affected by ambient illumination. Common face recognition technology mainly recognizes face images captured by near-infrared cameras, which are not affected by ambient illumination. However, uneven or poor illumination often occurs in real environments, which requires recognizing images acquired by cameras of different modalities, and current face recognition technology cannot accurately recognize images acquired by cameras of different modalities. Therefore, the prior art has the problem that the recognition accuracy of face images acquired by cameras of different modalities is not high.
Summary of the Invention
The present application provides a cross-modal face recognition method, apparatus, device and storage medium, which can solve the problem of low recognition accuracy of face images acquired by cameras of different modalities.
In a first aspect, the present application provides a cross-modal face recognition method, including:
collecting a face image to be recognized;
inputting the face image to be recognized into a pre-trained cross-modal face recognition model for face recognition;
wherein the training process of the pre-trained cross-modal face recognition model includes: acquiring a first training sample set, where the first training sample set includes a first preset number of visible-light face preprocessing image sequences and a second preset number of infrared face preprocessing image sequences;
training a preset cross-modal face recognition model according to the visible-light face preprocessing image sequence and a first classification loss function to obtain a first cross-modal face recognition model;
inputting the visible-light preprocessing image sequence and the infrared face preprocessing image sequence into the first cross-modal face recognition model, and retraining the first cross-modal face recognition model based on the first classification loss function to obtain a second cross-modal face recognition model, where the second cross-modal face recognition model is the pre-trained cross-modal face recognition model.
In an optional implementation, the cross-modal face recognition model includes a preset number of convolutional layers and fully connected layers; the convolutional layers include convolutional layers for feature extraction and a max pooling layer for dimensionality reduction.
In an optional implementation, before acquiring the first training sample set, the method further includes:
acquiring the first preset number of visible-light face image sequences;
performing pixel equalization processing on the visible-light face image sequences to obtain the visible-light face preprocessing image sequence.
In an optional implementation, before acquiring the first training sample set, the method further includes:
acquiring the second preset number of infrared face image sequences;
performing pixel equalization processing on the infrared face image sequences to obtain the infrared face preprocessing image sequence.
In an optional implementation, performing pixel equalization processing on the visible-light face image sequence to obtain the visible-light face preprocessing image sequence includes:
converting the visible-light face image sequence into grayscale images;
normalizing the grayscale images to obtain the visible-light face preprocessing image sequence.
In an optional implementation, performing pixel equalization processing on the infrared face image sequence to obtain the infrared face preprocessing image sequence includes:
performing image contrast enhancement on the infrared face image sequence to obtain an enhanced infrared face image sequence;
normalizing the enhanced infrared face image sequence to obtain the infrared face preprocessing image sequence.
In an optional implementation, performing image contrast enhancement on the infrared face image sequence to obtain an enhanced infrared face image sequence includes:
performing histogram equalization on the infrared face image sequence to enhance the image contrast of the infrared face image sequence to obtain an enhanced infrared face image sequence.
In a second aspect, the present application provides a cross-modal face recognition apparatus, including:
an acquisition module, configured to collect a face image to be recognized;
a recognition module, configured to input the face image to be recognized into a pre-trained cross-modal face recognition model for face recognition;
wherein the training process of the pre-trained cross-modal face recognition model includes: acquiring a first training sample set, where the first training sample set includes a first preset number of visible-light face preprocessing image sequences and a second preset number of infrared face preprocessing image sequences;
training a preset cross-modal face recognition model according to the visible-light face preprocessing image sequence and the first classification loss function to obtain a first cross-modal face recognition model;
inputting the visible-light preprocessing image sequence and the infrared face preprocessing image sequence into the first cross-modal face recognition model, and retraining the first cross-modal face recognition model based on the first classification loss function to obtain a second cross-modal face recognition model, where the second cross-modal face recognition model is the pre-trained cross-modal face recognition model.
In an optional implementation, the cross-modal face recognition model includes a preset number of convolutional layers and fully connected layers; the convolutional layers include convolutional layers for feature extraction and a max pooling layer for dimensionality reduction.
In an optional implementation, the apparatus further includes:
a first acquisition module, configured to acquire the first preset number of visible-light face image sequences;
a first processing module, configured to perform pixel equalization processing on the visible-light face image sequence to obtain the visible-light face preprocessing image sequence.
In an optional implementation, the apparatus further includes:
a second acquisition module, configured to acquire the second preset number of infrared face image sequences;
a second processing module, configured to perform pixel equalization processing on the infrared face image sequence to obtain the infrared face preprocessing image sequence.
In an optional implementation, the first processing module includes:
a conversion unit, configured to convert the visible-light face image sequence into grayscale images;
a first processing unit, configured to normalize the grayscale images to obtain the visible-light face preprocessing image sequence.
In an optional implementation, the second processing module includes:
an enhancement module, configured to perform image contrast enhancement on the infrared face image sequence to obtain an enhanced infrared face image sequence;
a second processing unit, configured to normalize the enhanced infrared face image sequence to obtain the infrared face preprocessing image sequence.
In an optional implementation, the enhancement module is specifically configured to:
perform histogram equalization on the infrared face image sequence to enhance the image contrast of the infrared face image sequence to obtain an enhanced infrared face image sequence.
In a third aspect, the present application provides a cross-modal face recognition device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the steps of the method of the first aspect are implemented.
In a fourth aspect, the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the steps of the method of the first aspect are implemented.
In a fifth aspect, the present application provides a computer program product, where the computer program product includes a computer program, and when the computer program is executed by one or more processors, the steps of the method of the first aspect are implemented.
In the cross-modal face recognition method of the first aspect, face recognition is performed on the face image to be recognized by using a cross-modal face recognition model trained with visible-light face preprocessing image sequences and infrared face preprocessing image sequences, which can improve the recognition accuracy of face images acquired by cameras of different modalities.
It can be understood that, for the beneficial effects of the second to fifth aspects, reference may be made to the relevant descriptions in the first aspect, and details are not repeated here.
Description of Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a flowchart of an implementation of the cross-modal face recognition method provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of the training process of the pre-trained cross-modal face recognition model;
FIG. 3 is a schematic diagram of the cross-modal face recognition apparatus provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of the cross-modal face recognition device provided by an embodiment of the present application.
Detailed Description
In the following description, for the purpose of illustration rather than limitation, specific details such as particular system structures and technologies are set forth in order to provide a thorough understanding of the embodiments of the present application. However, it will be apparent to those skilled in the art that the present application may be practiced in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, apparatuses, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It should be understood that, when used in this specification and the appended claims, the term "comprising" indicates the presence of the described features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or combinations thereof.
It should also be understood that the term "and/or" used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes these combinations.
As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining" or "in response to detecting". Similarly, the phrases "if it is determined" or "if [the described condition or event] is detected" may be interpreted, depending on the context, as "once it is determined", "in response to determining", "once [the described condition or event] is detected", or "in response to detecting [the described condition or event]".
In addition, in the description of the specification and the appended claims of the present application, the terms "first", "second", "third", etc. are only used to distinguish the description and should not be construed as indicating or implying relative importance.
Reference in this specification to "one embodiment" or "some embodiments" and the like means that a particular feature, structure or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, the phrases "in one embodiment", "in some embodiments", "in other embodiments", "in still other embodiments", etc. appearing in various places in this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments", unless specifically emphasized otherwise. The terms "comprising", "including", "having" and their variants mean "including but not limited to", unless specifically emphasized otherwise.
The cross-modal face recognition method provided by the present application is exemplarily described below with reference to specific embodiments. As shown in FIG. 1, FIG. 1 is a flowchart of an implementation of the cross-modal face recognition method provided by an embodiment of the present application. This embodiment may be performed by a cross-modal face recognition device, including but not limited to self-service terminals, monitoring equipment, attendance equipment, as well as servers, robots, wearable devices, or mobile terminals in various application scenarios. Details are as follows:
S101: Collect a face image to be recognized.
In the embodiments of the present application, the face image to be recognized may be a face image collected in the visible-light modality or the infrared modality. Exemplarily, a camera of the cross-modal face recognition device, such as a camera of a mobile terminal or an attendance device, may collect a face image in the visible-light modality, or collect a face image in the infrared modality.
S102: Input the face image to be recognized into the pre-trained cross-modal face recognition model for face recognition.
In the embodiments of the present application, the pre-trained cross-modal face recognition model is obtained as follows: a deep convolutional neural network is first pre-trained with face images in the visible-light modality to obtain a pre-trained cross-modal deep convolutional neural network, which provides prior knowledge for the training of the cross-modal deep convolutional neural network; the face images in the visible-light modality and the face images in the infrared modality are then combined into a two-tuple training set according to preset rules, and the pre-trained cross-modal deep convolutional neural network is selected for fine-tuning, iterating repeatedly until the performance of the pre-trained cross-modal deep convolutional neural network no longer improves; the deep convolutional neural network model obtained at that point is the pre-trained cross-modal face recognition model.
FIG. 2 is a schematic diagram of the training process of the pre-trained cross-modal face recognition model. As shown in FIG. 2, the training process of the pre-trained cross-modal face recognition model includes the following steps:
S201: Acquire a first training sample set, where the first training sample set includes a first preset number of visible-light face preprocessing image sequences and a second preset number of infrared face preprocessing image sequences.
It should be noted that a color camera or a multispectral camera may be used to collect visible-light face image sequences containing human faces. Visible-light face images contain rich texture features and are easily affected by ambient light. Therefore, in some optional implementations, the visible-light face preprocessing image sequence is obtained by acquiring a first preset number of visible-light face image sequences and performing pixel equalization processing on the visible-light face image sequences.
In an optional implementation, performing pixel equalization processing on the visible-light face image sequence to obtain the visible-light face preprocessing image sequence may include: segmenting the face region from the background region in the collected visible-light face images to obtain the visible-light face images.
In some embodiments of the present application, before image segmentation, an image detection model may first be used to detect whether there is a face in the image to be processed. When the output of the image detection model shows that there is no face in the image, face segmentation is unnecessary and the face segmentation processing ends, which reduces unnecessary workload. When the output of the image detection model shows that there is a face in the image, the face may be further screened, that is, it is determined whether the image contains a face that meets preset conditions, for example, whether the face meets certain requirements; specifically, the requirements may be preset for the position and/or size of the face, for example, a face may be considered to meet the requirements only when the size of the face region reaches a preset size. When the face meets the preset conditions, follow-up processing such as rotation correction and object segmentation is performed on the image; when the face does not meet the preset conditions, the image may be left unsegmented.
In some embodiments of the present application, a series of visible-light face images are processed as above to obtain a visible-light face image sequence, and the visible-light face image sequence is further preprocessed to obtain the visible-light face preprocessing image sequence. Optionally, preprocessing the visible-light face image sequence may include: performing grayscale conversion and normalization processing on the visible-light face image sequence to obtain the visible-light face preprocessing image sequence.
The face images in the visible-light face image sequence are converted into grayscale images. Optionally, the grayscale conversion may be performed by a preset grayscale conversion formula, which may be expressed as:
Igray = 0.2989 × R + 0.5870 × G + 0.1140 × B
where Igray is the grayscale image output after grayscale conversion, and R, G, and B are the RGB values of the image before grayscale conversion.
The converted grayscale images are further normalized; exemplarily, the normalization is performed by using a preset normalization formula.
S202: Train the preset cross-modal face recognition model according to the visible-light face preprocessed image sequence and a first classification loss function to obtain a first cross-modal face recognition model.
Since faces in the visible-light modality are relatively complex, the cross-modal neural network may first be pre-trained with the visible-light face preprocessed image sequence. In one embodiment of the present application, the visible-light face preprocessed image sequence is divided into a training set and a validation set, and the two sets do not overlap. The training set is used to train the preset cross-modal face recognition model, the validation set is used to verify the training, and the first classification loss function is constructed at the same time. The network that minimizes the validation loss is continuously saved and determined as the first cross-modal neural network finally obtained by training in this embodiment; this first cross-modal neural network is the first cross-modal face recognition model.
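The following PyTorch sketch illustrates this pretraining stage under stated assumptions: model is any identity-classification network, dataset yields (image, label) pairs, cross-entropy stands in for the first classification loss, and the split ratio and hyperparameters are illustrative.

```python
# Pretraining on visible-light data: non-overlapping train/validation split,
# keeping the checkpoint with the lowest validation loss.
import copy
import torch
from torch.utils.data import DataLoader, random_split

def pretrain(model, dataset, epochs=30, batch_size=64, lr=0.01, device="cpu"):
    n_val = int(0.2 * len(dataset))                      # assumed 80/20 split
    train_set, val_set = random_split(dataset, [len(dataset) - n_val, n_val])
    train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    val_loader = DataLoader(val_set, batch_size=batch_size)
    model.to(device)
    criterion = torch.nn.CrossEntropyLoss()              # first classification loss
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    best_loss, best_state = float("inf"), None
    for _ in range(epochs):
        model.train()
        for x, y in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(x.to(device)), y.to(device))
            loss.backward()
            optimizer.step()
        model.eval()
        val_loss = 0.0
        with torch.no_grad():
            for x, y in val_loader:
                val_loss += criterion(model(x.to(device)), y.to(device)).item()
        if val_loss < best_loss:     # keep the network that minimizes validation loss
            best_loss, best_state = val_loss, copy.deepcopy(model.state_dict())
    model.load_state_dict(best_state)
    return model
```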
S203: Input the visible-light preprocessed image sequence and the infrared face preprocessed image sequence into the first cross-modal face recognition model, and train the first cross-modal face recognition model again based on the first classification loss function to obtain a second cross-modal face recognition model.
It should be noted that the infrared face preprocessed image sequence is obtained by preprocessing an infrared face image sequence. Exemplarily, before acquiring the second training sample set, the method includes: acquiring the second preset number of infrared face image sequences, and performing pixel equalization processing on the infrared face image sequences to obtain the infrared face preprocessed image sequences.
Performing pixel equalization processing on the infrared face image sequence includes performing image contrast enhancement and normalization on the infrared face image sequence. In some embodiments of the present application, histogram equalization may be applied to the infrared face image sequence to enhance image contrast; histogram equalization enhances contrast by stretching the range of the pixel intensity distribution. In some other embodiments, logarithmic or power-function transforms may instead be applied to the infrared face image sequence to enhance image contrast.
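A sketch of this infrared pixel equalization is shown below, using OpenCV histogram equalization for the contrast-enhancement step; 8-bit single-channel input is assumed, and the logarithmic and power-function alternatives are not shown.

```python
# Infrared pixel equalization: histogram equalization for contrast enhancement,
# followed by the same [0, 1] normalization assumed for the visible-light images.
import cv2
import numpy as np

def preprocess_infrared_sequence(images_uint8):
    """images_uint8: list of single-channel uint8 infrared face crops."""
    out = []
    for img in images_uint8:
        eq = cv2.equalizeHist(img)          # stretch the pixel intensity distribution
        eq = eq.astype(np.float32) / 255.0  # normalize to [0, 1]
        out.append(eq)
    return out
```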
In addition, the second cross-modal face recognition model is the pre-trained cross-modal face recognition model.
Moreover, in the embodiments of the present application, the process of normalizing the infrared face image sequence is the same as the process of normalizing the visible-light face images, and is not repeated here.
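The second training stage described in S203 can then be sketched by mixing the two preprocessed datasets and retraining the first model with the same loss. This reuses the hypothetical pretrain helper from the sketch above; the epoch count and learning rate are illustrative.

```python
# Second-stage training: combine visible-light and infrared preprocessed samples
# and retrain the pretrained model with the same classification loss.
from torch.utils.data import ConcatDataset

def train_second_stage(first_model, visible_dataset, infrared_dataset):
    mixed = ConcatDataset([visible_dataset, infrared_dataset])
    # the result of this retraining is the "second cross-modal face recognition
    # model", i.e. the pre-trained model used at inference time
    return pretrain(first_model, mixed, epochs=20, lr=0.001)
```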
In some embodiments of the present application, the cross-modal face recognition model includes a preset number of convolutional layers and fully connected layers, and any number of convolutional layers may precede the fully connected layers. Exemplarily, the cross-modal face recognition model includes five convolutional layers and fully connected layers, where the first convolutional layer and the second convolutional layer each include two convolutional layers for feature extraction and one max-pooling layer for dimensionality reduction, and the third to fifth convolutional layers each include three convolutional layers for feature extraction and one max-pooling layer for dimensionality reduction. The feature map produced by each layer operation passes through a nonlinear activation function. The visible-light preprocessed image sequence and the infrared preprocessed image sequence are convolved by the convolutional layers to extract feature values, and the fully connected layers then output face feature vectors.
Exemplarily, the first convolutional layer may include two convolutional layers with a 3×3 kernel size, a 1×1 stride and 64 kernels, plus one max-pooling layer with a 2×2 kernel and a 2×2 stride; the second convolutional layer includes two convolutional layers with a 3×3 kernel size, a 1×1 stride and 128 kernels, plus one max-pooling layer with a 2×2 kernel and a 2×2 stride; the third convolutional layer includes three convolutional layers with a 3×3 kernel size, a 1×1 stride and 256 kernels, plus one max-pooling layer with a 2×2 kernel and a 2×2 stride; the fourth convolutional layer includes three convolutional layers with a 3×3 kernel size, a 1×1 stride and 512 kernels, plus one max-pooling layer with a 2×2 kernel and a 2×2 stride; the fifth convolutional layer includes three convolutional layers with a 3×3 kernel size, a 1×1 stride and 512 kernels, plus one max-pooling layer with a 2×2 kernel and a 2×2 stride; the two fully connected layers each have 4096 nodes. It should be understood that the cross-modal neural network may adopt any structure, and the above example is not limiting.
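An illustrative PyTorch sketch of this example backbone follows (VGG-16-style blocks of 64/128/256/512/512 filters with 3×3 kernels and 2×2 max pooling, ending in two 4096-node fully connected layers). The 224×224 single-channel input, padding, and class count are assumptions rather than details stated in the application.

```python
# Example backbone matching the layer configuration described above.
import torch.nn as nn

def conv_block(in_ch, out_ch, n_convs):
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch,
                             kernel_size=3, stride=1, padding=1),
                   nn.ReLU(inplace=True)]            # nonlinear activation per layer
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))  # dimensionality reduction
    return nn.Sequential(*layers)

class CrossModalBackbone(nn.Module):
    def __init__(self, num_classes, in_channels=1):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(in_channels, 64, 2),   # first block: 2 conv layers, 64 kernels
            conv_block(64, 128, 2),           # second block: 2 conv layers, 128 kernels
            conv_block(128, 256, 3),          # third block: 3 conv layers, 256 kernels
            conv_block(256, 512, 3),          # fourth block: 3 conv layers, 512 kernels
            conv_block(512, 512, 3),          # fifth block: 3 conv layers, 512 kernels
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            # 224x224 input -> 7x7x512 features after five 2x2 poolings (assumed)
            nn.Linear(512 * 7 * 7, 4096), nn.ReLU(inplace=True),  # first 4096-node FC layer
            nn.Linear(4096, 4096), nn.ReLU(inplace=True),         # second 4096-node FC layer
            nn.Linear(4096, num_classes),                         # identity classification head
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```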
During training of the cross-modal face recognition model, the convolution kernels and weights are randomly initialized and the bias terms are set to 0. A stochastic gradient descent (SGD) algorithm is used to update the network parameters and optimize the gradients of the above cross-modal neural network. When the number of network iterations reaches a preset value, training stops and the trained cross-modal neural network is saved.
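A hedged sketch of this setup follows: Gaussian random weight initialization, zero biases, plain SGD updates, and a stop after a preset iteration count. The momentum value, learning rate, standard deviation, and checkpoint path are placeholders.

```python
# Random initialization, zero biases, and SGD training for a preset number of
# iterations, after which the trained network is saved.
import torch
import torch.nn as nn

def init_weights(m):
    if isinstance(m, (nn.Conv2d, nn.Linear)):
        nn.init.normal_(m.weight, mean=0.0, std=0.01)  # random weight initialization
        if m.bias is not None:
            nn.init.zeros_(m.bias)                     # bias terms set to 0

def train_sgd(model, loader, max_iters=10000, lr=0.01, device="cpu"):
    model.apply(init_weights)
    model.to(device).train()
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    it = 0
    while it < max_iters:
        for x, y in loader:
            optimizer.zero_grad()
            loss = criterion(model(x.to(device)), y.to(device))
            loss.backward()
            optimizer.step()
            it += 1
            if it >= max_iters:                        # preset iteration count reached
                break
    torch.save(model.state_dict(), "cross_modal_face_model.pt")  # save the trained network
    return model
```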
It can be seen from the above embodiments that the cross-modal face recognition method provided by the present application performs face recognition on the face image to be recognized by using a cross-modal face recognition model trained with the visible-light face preprocessed image sequence and the infrared face preprocessed image sequence, which can improve the accuracy of recognizing face images captured by cameras of different modalities.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
Corresponding to the cross-modal face recognition method described in the above embodiments, FIG. 3 shows a structural block diagram of the cross-modal face recognition apparatus provided by the embodiments of the present application. For ease of description, only the parts relevant to the embodiments of the present application are shown.
As shown in FIG. 3, FIG. 3 is a schematic diagram of the cross-modal face recognition apparatus provided by an embodiment of the present application. The cross-modal face recognition apparatus 300 includes:
an acquisition module 301, configured to acquire a face image to be recognized;
a recognition module 302, configured to input the face image to be recognized into a pre-trained cross-modal face recognition model for face recognition;
where the training process of the pre-trained cross-modal face recognition model includes: acquiring a first training sample set, the first training sample set including a first preset number of visible-light face preprocessed image sequences and a second preset number of infrared face preprocessed image sequences;
training a preset cross-modal face recognition model according to the visible-light face preprocessed image sequences and a first classification loss function to obtain a first cross-modal face recognition model;
inputting the visible-light preprocessed image sequences and the infrared face preprocessed image sequences into the first cross-modal face recognition model, and training the first cross-modal face recognition model again based on the first classification loss function to obtain a second cross-modal face recognition model, the second cross-modal face recognition model being the pre-trained cross-modal face recognition model.
In an optional implementation, the cross-modal face recognition model includes a preset number of convolutional layers and fully connected layers; the convolutional layers include convolutional layers for feature extraction and max-pooling layers for dimensionality reduction.
In an optional implementation, the apparatus further includes:
a first acquisition module, configured to acquire the first preset number of visible-light face image sequences;
a first processing module, configured to perform pixel equalization processing on the visible-light face image sequences to obtain the visible-light face preprocessed image sequences.
In an optional implementation, the apparatus further includes:
a second acquisition module, configured to acquire the second preset number of infrared face image sequences;
a second processing module, configured to perform pixel equalization processing on the infrared face image sequences to obtain the infrared face preprocessed image sequences.
In an optional implementation, the first processing module includes:
a conversion unit, configured to convert the visible-light face image sequence into grayscale images;
a first processing unit, configured to normalize the grayscale images to obtain the visible-light face preprocessed image sequence.
In an optional implementation, the second processing module includes:
an enhancement module, configured to perform image contrast enhancement on the infrared face image sequence to obtain an enhanced infrared face image sequence;
a second processing unit, configured to normalize the enhanced infrared face image sequence to obtain the infrared face preprocessed image sequence.
In an optional implementation, the enhancement module is specifically configured to:
perform histogram equalization on the infrared face image sequence to enhance the image contrast of the infrared face image sequence and obtain an enhanced infrared face image sequence.
It should be noted that, because the information exchange and execution processes between the above apparatuses/units are based on the same concept as the method embodiments of the present application, their specific functions and technical effects can be found in the method embodiment section and are not repeated here.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the division of the above functional units and modules is merely illustrative. In practical applications, the above functions may be allocated to different functional units and modules as needed; that is, the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated units may be implemented in the form of hardware or in the form of software functional units. In addition, the specific names of the functional units and modules are only for convenience of distinguishing them from each other and are not intended to limit the protection scope of the present application. For the specific working processes of the units and modules in the above system, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
FIG. 4 is a schematic structural diagram of the cross-modal face recognition device provided by an embodiment of the present application. As shown in FIG. 4, the cross-modal face recognition device 4 of this embodiment includes: at least one processor 40 (only one is shown in FIG. 4), a memory 41, and a computer program 42 stored in the memory 41 and executable on the at least one processor 40. When the processor 40 executes the computer program 42, the steps in the method embodiment described above with reference to FIG. 1 are implemented.
The cross-modal face recognition device 4 may be a computing device such as a desktop computer, a notebook computer, a palmtop computer or a cloud server. The cross-modal face recognition device 4 may include, but is not limited to, the processor 40 and the memory 41. Those skilled in the art can understand that FIG. 4 is merely an example of the cross-modal face recognition device 4 and does not constitute a limitation on it; the device may include more or fewer components than shown, or combine certain components, or use different components, and may further include, for example, input/output devices and network access devices.
The processor 40 may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
In some embodiments, the memory 41 may be an internal storage unit of the cross-modal face recognition device 4, such as a hard disk or memory of the cross-modal face recognition device 4. In other embodiments, the memory 41 may also be an external storage device of the cross-modal face recognition device 4, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card or a flash card provided on the cross-modal face recognition device 4. Further, the memory 41 may include both an internal storage unit and an external storage device of the cross-modal face recognition device 4. The memory 41 is used to store an operating system, application programs, a boot loader (BootLoader), data and other programs, such as the program code of the computer program. The memory 41 may also be used to temporarily store data that has been output or is to be output.
An embodiment of the present application further provides a network device, including: at least one processor, a memory, and a computer program stored in the memory and executable on the at least one processor. When the processor executes the computer program, the steps in any of the above method embodiments are implemented.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, the steps in the above method embodiments are implemented.
An embodiment of the present application provides a computer program product. When the computer program product runs on a cross-modal face recognition device, the cross-modal face recognition device, upon execution, implements the steps in the above method embodiments.
If the integrated unit is implemented in the form of a software functional unit and is sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments of the present application may be implemented by instructing relevant hardware through a computer program. The computer program may be stored in a computer-readable storage medium, and when executed by a processor, implements the steps of the above method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may at least include: any entity or apparatus capable of carrying the computer program code to the photographing apparatus/electronic device, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal and a software distribution medium, such as a USB flash drive, a removable hard disk, a magnetic disk or an optical disc. In some jurisdictions, according to legislation and patent practice, computer-readable media may not include electrical carrier signals and telecommunication signals.
In the above embodiments, the description of each embodiment has its own emphasis. For parts that are not detailed or described in a certain embodiment, reference may be made to the relevant descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other ways. For example, the apparatus/network device embodiments described above are merely illustrative; the division of the modules or units is only a logical function division, and there may be other division methods in actual implementation, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, apparatuses or units, and may be in electrical, mechanical or other forms.
The units described as separate components may or may not be physically separated, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
The above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of the technical features therein; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all fall within the protection scope of the present application.

Claims (10)

  1. A cross-modal face recognition method, comprising:
    acquiring a face image to be recognized;
    inputting the face image to be recognized into a pre-trained cross-modal face recognition model for face recognition;
    wherein a training process of the pre-trained cross-modal face recognition model comprises: acquiring a first training sample set, the first training sample set comprising a first preset number of visible-light face preprocessed image sequences and a second preset number of infrared face preprocessed image sequences;
    training a preset cross-modal face recognition model according to the visible-light face preprocessed image sequences and a first classification loss function to obtain a first cross-modal face recognition model; and
    inputting the visible-light preprocessed image sequences and the infrared face preprocessed image sequences into the first cross-modal face recognition model, and training the first cross-modal face recognition model again based on the first classification loss function to obtain a second cross-modal face recognition model, the second cross-modal face recognition model being the pre-trained cross-modal face recognition model.
  2. The method according to claim 1, wherein the cross-modal face recognition model comprises a preset number of convolutional layers and fully connected layers, and the convolutional layers comprise convolutional layers for feature extraction and max-pooling layers for dimensionality reduction.
  3. The method according to claim 1, wherein, before the acquiring of the first training sample set, the method further comprises:
    acquiring the first preset number of visible-light face image sequences; and
    performing pixel equalization processing on the visible-light face image sequences to obtain the visible-light face preprocessed image sequences.
  4. The method according to claim 1, wherein, before the acquiring of the first training sample set, the method further comprises:
    acquiring the second preset number of infrared face image sequences; and
    performing pixel equalization processing on the infrared face image sequences to obtain the infrared face preprocessed image sequences.
  5. The method according to claim 3, wherein performing pixel equalization processing on the visible-light face image sequences to obtain the visible-light face preprocessed image sequences comprises:
    converting the visible-light face image sequences into grayscale images; and
    normalizing the grayscale images to obtain the visible-light face preprocessed image sequences.
  6. The method according to claim 4, wherein performing pixel equalization processing on the infrared face image sequences to obtain the infrared face preprocessed image sequences comprises:
    performing image contrast enhancement on the infrared face image sequences to obtain enhanced infrared face image sequences; and
    normalizing the enhanced infrared face image sequences to obtain the infrared face preprocessed image sequences.
  7. The cross-modal face recognition method according to claim 6, wherein performing image contrast enhancement on the infrared face image sequences to obtain enhanced infrared face image sequences comprises:
    performing histogram equalization on the infrared face image sequences to enhance the image contrast of the infrared face image sequences and obtain enhanced infrared face image sequences.
  8. A cross-modal face recognition apparatus, comprising:
    an acquisition module, configured to acquire a face image to be recognized; and
    a recognition module, configured to input the face image to be recognized into a pre-trained cross-modal face recognition model for face recognition;
    wherein a training process of the pre-trained cross-modal face recognition model comprises: acquiring a first training sample set, the first training sample set comprising a first preset number of visible-light face preprocessed image sequences and a second preset number of infrared face preprocessed image sequences;
    training a preset cross-modal face recognition model according to the visible-light face preprocessed image sequences and a first classification loss function to obtain a first cross-modal face recognition model; and
    inputting the visible-light preprocessed image sequences and the infrared face preprocessed image sequences into the first cross-modal face recognition model, and training the first cross-modal face recognition model again based on the first classification loss function to obtain a second cross-modal face recognition model, the second cross-modal face recognition model being the pre-trained cross-modal face recognition model.
  9. A cross-modal face recognition device, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method according to any one of claims 1 to 7.
  10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 7.
PCT/CN2021/107933 2020-12-14 2021-07-22 Cross-modal face recognition method, apparatus and device, and storage medium WO2022127111A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011467115.3A CN112507897A (en) 2020-12-14 2020-12-14 Cross-modal face recognition method, device, equipment and storage medium
CN202011467115.3 2020-12-14

Publications (1)

Publication Number Publication Date
WO2022127111A1 true WO2022127111A1 (en) 2022-06-23

Family

ID=74973029

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/107933 WO2022127111A1 (en) 2020-12-14 2021-07-22 Cross-modal face recognition method, apparatus and device, and storage medium

Country Status (2)

Country Link
CN (1) CN112507897A (en)
WO (1) WO2022127111A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112507897A (en) * 2020-12-14 2021-03-16 奥比中光科技集团股份有限公司 Cross-modal face recognition method, device, equipment and storage medium
CN113743379B (en) * 2021-11-03 2022-07-12 杭州魔点科技有限公司 Light-weight living body identification method, system, device and medium for multi-modal characteristics
CN115147679B (en) * 2022-06-30 2023-11-14 北京百度网讯科技有限公司 Multi-mode image recognition method and device, model training method and device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105608450A (en) * 2016-03-01 2016-05-25 天津中科智能识别产业技术研究院有限公司 Heterogeneous face identification method based on deep convolutional neural network
US20190258885A1 (en) * 2018-02-19 2019-08-22 Avigilon Corporation Method and system for object classification using visible and invisible light images
CN108520220A (en) * 2018-03-30 2018-09-11 百度在线网络技术(北京)有限公司 model generating method and device
CN112149635A (en) * 2020-10-23 2020-12-29 北京百度网讯科技有限公司 Cross-modal face recognition model training method, device, equipment and storage medium
CN112507897A (en) * 2020-12-14 2021-03-16 奥比中光科技集团股份有限公司 Cross-modal face recognition method, device, equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHANG DIAN, WANG HAI-TAO, JIANG YING, CHEN XING: "Research on Face Recognition Algorithm Based on Near Infrared and Visible Image Fusion of Lightweight Neural Network", JOURNAL OF CHINESE COMPUTER SYSTEMS, vol. 41, no. 4, 30 April 2020 (2020-04-30), XP055943347 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115565215A (en) * 2022-07-01 2023-01-03 北京瑞莱智慧科技有限公司 Face recognition algorithm switching method and device and storage medium
CN115565215B (en) * 2022-07-01 2023-09-15 北京瑞莱智慧科技有限公司 Face recognition algorithm switching method and device and storage medium

Also Published As

Publication number Publication date
CN112507897A (en) 2021-03-16

Similar Documents

Publication Publication Date Title
WO2022127112A1 (en) Cross-modal face recognition method, apparatus and device, and storage medium
WO2022127111A1 (en) Cross-modal face recognition method, apparatus and device, and storage medium
CN109117803B (en) Face image clustering method and device, server and storage medium
US8750573B2 (en) Hand gesture detection
Faraji et al. Face recognition under varying illuminations using logarithmic fractal dimension-based complete eight local directional patterns
CN111461165A (en) Image recognition method, recognition model training method, related device and equipment
WO2020143330A1 (en) Facial image capturing method, computer-readable storage medium and terminal device
TWI727548B (en) Method for face recognition and device thereof
WO2023010758A1 (en) Action detection method and apparatus, and terminal device and storage medium
KR101912748B1 (en) Scalable Feature Descriptor Extraction and Matching method and system
WO2020248848A1 (en) Intelligent abnormal cell determination method and device, and computer readable storage medium
CN106650568B (en) Face recognition method and device
WO2020143165A1 (en) Reproduced image recognition method and system, and terminal device
WO2024077781A1 (en) Convolutional neural network model-based image recognition method and apparatus, and terminal device
WO2023179095A1 (en) Image segmentation method and apparatus, terminal device, and storage medium
CN112464803A (en) Image comparison method and device
CN113158869A (en) Image recognition method and device, terminal equipment and computer readable storage medium
Roy et al. A novel quaternary pattern of local maximum quotient for heterogeneous face recognition
CN111400528A (en) Image compression method, device, server and storage medium
Rehman Light microscopic iris classification using ensemble multi‐class support vector machine
WO2021027155A1 (en) Verification method and apparatus based on finger vein image, and storage medium and computer device
CN111325709A (en) Wireless capsule endoscope image detection system and detection method
CN108960246B (en) Binarization processing device and method for image recognition
CN111126250A (en) Pedestrian re-identification method and device based on PTGAN
Fathee et al. Iris segmentation in uncooperative and unconstrained environments: state-of-the-art, datasets and future research directions

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21905044

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21905044

Country of ref document: EP

Kind code of ref document: A1