CN109583569B - Multi-mode feature fusion method and device based on convolutional neural network

Multi-mode feature fusion method and device based on convolutional neural network

Info

Publication number
CN109583569B
CN109583569B (application CN201811456123.0A)
Authority
CN
China
Prior art keywords
mode
feature
image
neural network
convolutional neural
Prior art date
Legal status
Active
Application number
CN201811456123.0A
Other languages
Chinese (zh)
Other versions
CN109583569A (en)
Inventor
仲崇亮
Current Assignee
Xiamen Entropy Technology Co ltd
Original Assignee
Entropy Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Entropy Technology Co Ltd
Priority to CN201811456123.0A
Publication of CN109583569A
Application granted
Publication of CN109583569B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/70 Multimodal biometrics, e.g. combining information from different biometric modalities

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Collating Specific Patterns (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a multi-mode feature fusion method based on a convolutional neural network, which comprises the following steps: extracting features of a plurality of modalities from different heterogeneous images to obtain a first feature set of each modality; in the multi-mode convolutional neural network, according to the correlation among different modes, screening out the characteristics which meet the preset conditions from the first characteristic set of each mode to obtain a second characteristic set of each mode; and determining the weight of the second feature set of each mode in a full connection layer of the multi-mode convolutional neural network, and fusing the second feature sets of the multiple modes according to the weight so that the fused second feature sets train the multi-mode convolutional neural network for biological feature recognition. Therefore, the problem of limitation of single mode identification in the prior art is solved, and the accuracy of biological feature identification is improved.

Description

Multi-mode feature fusion method and device based on convolutional neural network
Technical Field
The invention relates to the field of image processing, in particular to a multi-modal feature fusion method and device based on a convolutional neural network.
Background
With the development of science and technology, biometric recognition has matured, and daily life increasingly depends on biometric feature recognition technology; for example, functions such as fingerprint unlocking and face recognition unlocking rely on biometric technology.
In the prior art, biometric feature recognition is usually performed by using image features of a single modality, and different modalities can be understood as images obtained under different scenes, such as a visible light face image, a near-infrared iris image, and the like.
However, each modality has certain limitations, and if a recognition model trained in one modality is used, when the recognition model is used for recognizing images of other modalities, the accuracy of a recognition result is affected. Thus, there is a need for a way to fuse different modalities.
Disclosure of Invention
The embodiment of the invention discloses a multi-modal feature fusion method, device and system based on a convolutional neural network, which solve the problem of limitation of single-modal identification in the prior art and improve the accuracy of biological feature identification.
The embodiment of the invention discloses a multi-mode feature fusion method based on a convolutional neural network, which comprises the following steps:
extracting features of a plurality of modalities from different heterogeneous images to obtain a first feature set of each modality;
in the multi-mode convolutional neural network, according to the correlation among different modes, screening out the characteristics which meet the preset conditions from the first characteristic set of each mode to obtain a second characteristic set of each mode;
and determining the weight of the second feature set of each mode in a full connection layer of the multi-mode convolutional neural network, and fusing the second feature sets of the multiple modes according to the weight so that the fused second feature sets train the multi-mode convolutional neural network for biological feature recognition.
Optionally, the heterogeneous image includes:
a visible light face image, a near-infrared face image, a visible light iris image, and a near-infrared iris image, wherein each image corresponds to one modality.
Optionally, when the heterogeneous image is a near-infrared face image or a visible light face image, the extracting features of multiple modalities from different heterogeneous images includes:
detecting the input visible light face image or near infrared face image to obtain position information of a face and position information of key points;
preprocessing the input visible light face image or near-infrared face image;
and inputting the preprocessed near-infrared face image or visible light face image into a trained face image feature extraction model, and extracting face features under near-infrared light or visible light.
Optionally, when the heterogeneous image is a visible light iris image or a near-infrared iris image, the extracting features of multiple modalities from different heterogeneous images includes:
extracting correlation features of the two eyes in the visible light iris image or the near-infrared iris image in a first manner and a second manner, respectively, to obtain a first target feature set and a second target feature set;
and extracting the depth features of the iris from the first target feature set and the second target feature set according to the complementarity of the first target feature set and the second target feature set.
Optionally, the screening out, according to the correlation between different modalities, features that meet a preset condition from the first feature set of each modality includes:
respectively screening out features with maximized inter-class difference and minimized intra-class difference from the first feature set of each mode to obtain a third feature set of each mode;
and analyzing the third feature set of each mode through a multivariate regression model to obtain a second feature set of each mode.
The embodiment of the invention also discloses a multi-mode feature fusion device based on the convolutional neural network, which comprises:
the multi-modal feature extraction unit is used for extracting features of a plurality of modes from different heterogeneous images to obtain a first feature set of each mode;
the screening unit is used for screening out the characteristics meeting the preset conditions from the first characteristic set of each mode according to the correlation among different modes in the multi-mode convolutional neural network to obtain a second characteristic set of each mode;
and the fusion unit is used for determining the weight of the second feature set of each mode in a full connection layer of the multi-mode convolutional neural network, and fusing the second feature sets of the multiple modes according to the weight so that the fused second feature sets train the multi-mode convolutional neural network for biological feature recognition.
Optionally, the heterogeneous image includes:
a visible light face image, a near-infrared face image, a visible light iris image, and a near-infrared iris image, wherein each image corresponds to one modality.
Optionally, the screening unit includes:
the screening subunit is used for screening out the features with the maximized inter-class difference and the minimized intra-class difference from the first feature set of each mode respectively to obtain a third feature set of each mode;
and the analysis subunit is used for analyzing the third feature set of each mode through the multivariate regression model to obtain a second feature set of each mode.
The embodiment of the invention also discloses a multi-mode feature fusion system based on the convolutional neural network, which comprises:
the system comprises an acquisition end and a data processing end;
the acquisition terminal is used for acquiring heterogeneous images representing different modalities;
the data processing terminal is used for extracting features of a plurality of modes from different heterogeneous images to obtain a first feature set of each mode;
in the multi-mode convolutional neural network, according to the correlation among different modes, screening out the characteristics which meet the preset conditions from the first characteristic set of each mode to obtain a second characteristic set of each mode;
and determining the weight of the second feature set of each mode in a full connection layer of the multi-mode convolutional neural network, and fusing the second feature sets of the multiple modes according to the weight so that the fused second feature sets train the multi-mode convolutional neural network for biological feature recognition.
Optionally, the heterogeneous image includes:
a visible light face image, a near-infrared face image, a visible light iris image, and a near-infrared iris image, wherein each image corresponds to one modality.
The embodiment of the invention discloses a multi-mode feature fusion method, a device and a system based on a convolutional neural network, which comprise the following steps: extracting features of a plurality of modalities from different heterogeneous images to obtain a first feature set of each modality; in the multi-mode convolutional neural network, according to the correlation among different modes, screening out the characteristics which meet the preset conditions from the first characteristic set of each mode to obtain a second characteristic set of each mode; and determining the weight of the second feature set of each mode in a full connection layer of the multi-mode convolutional neural network, and fusing the second feature sets of the multiple modes according to the weight so that the fused second feature sets train the multi-mode convolutional neural network for biological feature recognition. Therefore, the multi-modal convolutional neural network for feature recognition is obtained by fusing the multi-modal features and training the multi-modal convolutional neural network according to the fused features, so that the problem of limitation of single-mode recognition in the prior art is solved, and the accuracy of biological feature recognition is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and that those skilled in the art can obtain other drawings from the provided drawings without creative effort.
FIG. 1 is a flow chart diagram illustrating a multi-modal feature fusion method based on a convolutional neural network according to an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating a process of feature extraction of a visible light face image or a near-infrared face image;
FIG. 3 is a schematic diagram showing a process of feature extraction for a visible light iris image or a near-infrared iris image;
FIG. 4 shows a schematic diagram of feature fusion;
FIG. 5 is a schematic structural diagram of a multi-modal feature fusion apparatus based on a convolutional neural network according to an embodiment of the present invention;
fig. 6 shows a schematic structural diagram of a multi-modal feature fusion system based on a convolutional neural network according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, a flow diagram of a multi-modal feature fusion method based on a convolutional neural network according to an embodiment of the present invention is shown, in this embodiment, the method includes:
s101: extracting features of a plurality of modals from different heterogeneous images to obtain a first feature set of each modality;
in this embodiment, the heterogeneous images are images under different scene conditions, such as different illumination, different shooting angles, different lens settings (near distance and far distance), and different shooting sites (offices, banks, cells, and the like).
In this embodiment, the following four images are taken as examples to explain the scheme, including: visible light face image, near-infrared face image, visible light iris image and near-infrared iris image.
In this embodiment, feature extraction may be performed on images of different modalities (for example, a human face or an iris) in multiple ways, which is not limited in this embodiment. However, in order to clearly explain a specific implementation process of the present embodiment, two ways are described in the present embodiment for performing feature extraction on a face image and an iris image, respectively.
In one embodiment, as shown in fig. 2, for a near-infrared face image or a visible-light face image, S101 includes:
S201: detecting the input visible light face image or near infrared face image to obtain position information of a face and position information of key points;
S202: preprocessing the input visible light face image or near-infrared face image;
Because illumination conditions and shooting angles differ between captures, the face image in a sample differs to some extent from the standard face image. To eliminate the error introduced by this difference, the visible light face image or near-infrared face image is preprocessed, which specifically includes:
acquiring key point position information and illumination conditions of a standard face;
the standard face may be preset, or the key point position information and the illumination condition of the average face calculated on the training set may be used as the standard face.
And aligning the key point position of the visible light face image or the near-infrared face image with the key point position of the standard face according to the acquired position information and the key point position information of the visible light face image or the near-infrared face image.
Acquiring illumination of a visible light face image or a near-infrared face image;
and converting the illumination of the visible light face image or the near-infrared face image into the illumination condition of a standard face through an image processing algorithm.
The key points of the sample are aligned with the key points of the standard face; the number of operations for aligning the key points and for converting the illumination of the face image to the illumination condition of the standard face is not limited. In addition, the order in which the key point alignment and the illumination conversion are performed may be arbitrarily adjusted, and is not limited in this embodiment.
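To make the preprocessing concrete, the following is a minimal Python sketch, assuming OpenCV, five facial key points, and a hypothetical standard-face template; the key point coordinates and illumination statistics are illustrative placeholders, since the patent does not prescribe a particular alignment algorithm or illumination model.

```python
import cv2
import numpy as np

# Hypothetical standard-face key points for a 112x112 crop (illustrative).
STANDARD_KEYPOINTS = np.array(
    [[38.3, 51.7], [73.5, 51.5], [56.0, 71.7], [41.5, 92.4], [70.7, 92.2]],
    dtype=np.float32)

def align_to_standard(image, keypoints, out_size=(112, 112)):
    """Warp the face so its detected key points match the standard face."""
    src = np.asarray(keypoints, dtype=np.float32)
    matrix, _ = cv2.estimateAffinePartial2D(src, STANDARD_KEYPOINTS)
    return cv2.warpAffine(image, matrix, out_size)

def match_illumination(image, standard_mean=127.0, standard_std=40.0):
    """Shift the image's brightness statistics toward the standard face's."""
    img = image.astype(np.float32)
    img = (img - img.mean()) / (img.std() + 1e-6)
    return np.clip(img * standard_std + standard_mean, 0, 255).astype(np.uint8)
```

As noted above, the two steps may be applied in either order.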
S203: and inputting the preprocessed visible light face image or near-infrared face image into a trained face image feature extraction model, and extracting face features under near-infrared light or visible light.
In this embodiment, the face image feature extraction model is obtained by training on standard faces, and the features the model can extract include identity feature vectors and feature vectors of different attributes, such as gender feature vectors and age feature vectors.
Specifically, the facial image feature extraction model may be a multi-task neural network model, and each task represents extracting different facial features, for example: identity characteristics, gender characteristics, age characteristics, and the like.
The objective of the multitask neural network is to minimize the weighted sum of the losses of its subtasks. To this end, different loss functions may be used to optimize the multitask neural network model. Specifically:
1. For the identity recognition task
The multitask neural network model may be optimized using, for example, a softmax loss function as the optimization objective, where the softmax loss function is as follows:
$$L_I = -\sum_{i=1}^{N} y_i^{\mathrm{Identity}} \log \frac{e^{f_i(x)}}{\sum_{j=1}^{N} e^{f_j(x)}}$$
where N is the number of classes, x is the input face image, y^Identity ∈ R^(N×1) is a class vector representing the class of the face image, and f_i(x) represents the output of the i-th node of the face identity classifier learned by the neural network.
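This is the standard softmax cross-entropy over class logits. A minimal sketch, assuming PyTorch (the patent does not name a framework) and illustrative batch and class sizes:

```python
import torch
import torch.nn as nn

num_classes = 1000                             # N: number of identities (assumed)
logits = torch.randn(8, num_classes)           # f(x): classifier outputs for a batch of 8
labels = torch.randint(0, num_classes, (8,))   # ground-truth identity classes

# nn.CrossEntropyLoss combines the softmax and the negative log-likelihood,
# matching the loss L_I above.
identity_loss = nn.CrossEntropyLoss()(logits, labels)
```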
2. For the gender recognition task
The face gender estimation task divides face images into two classes according to gender. This task may use a binary classification loss function, represented here by the hinge loss, as the optimization objective. The hinge loss function is as follows:
$$L_G = \max\left(0,\; 1 - y^{\mathrm{Gender}}\,\hat{y}^{\mathrm{Gender}}(x)\right)$$
where y^Gender ∈ {-1, +1} is the label representing the gender of the face image, and ŷ^Gender(x) is the neural network's prediction of the gender of the input face image.
3. For the age estimation task
The face age estimation task predicts the age of the face from the face image and is a regression task. This task may use a regression loss function, represented here by the squared loss, as the optimization objective. The squared loss is as follows:
$$L_A = \left(y^{\mathrm{Age}} - \hat{y}^{\mathrm{Age}}(x)\right)^2$$
where y^Age is the true age of the face image and ŷ^Age(x) is the neural network's prediction of the age of the input face image.
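Both losses are one-liners in code. A sketch with illustrative labels and predictions, again assuming PyTorch:

```python
import torch

gender_label = torch.tensor([1.0, -1.0, 1.0])   # y_Gender in {-1, +1}
gender_pred = torch.tensor([0.7, -0.2, -0.4])   # network output for gender
gender_loss = torch.clamp(1.0 - gender_label * gender_pred, min=0).mean()  # hinge

age_label = torch.tensor([23.0, 41.0, 35.0])    # y_Age: true ages
age_pred = torch.tensor([25.1, 39.8, 30.2])     # network output for age
age_loss = ((age_label - age_pred) ** 2).mean()  # squared loss
```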
It should be noted that identity classification, gender classification, and age estimation are not the only possible composition of the multitask neural network; subtasks may be replaced by ethnicity classification, hairstyle recognition, and the like. The number of subtasks is also not limited to three and may be any combination. The optimization objective of the overall multitask neural network is the weighted sum of the subtask losses, as follows:
$$L = \lambda_I L_I + \lambda_G L_G + \lambda_A L_A + \cdots$$
where each λ is the loss weight of the corresponding subtask.
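The combination is then a plain weighted sum. A self-contained sketch; the subtask loss values and λ weights below are stand-ins, since the patent leaves the weights unspecified:

```python
import torch

identity_loss = torch.tensor(2.31)  # L_I from the identity subtask (stand-in)
gender_loss = torch.tensor(0.45)    # L_G from the gender subtask (stand-in)
age_loss = torch.tensor(12.8)       # L_A from the age subtask (stand-in)

lambda_i, lambda_g, lambda_a = 1.0, 0.3, 0.3    # hypothetical loss weights
total_loss = lambda_i * identity_loss + lambda_g * gender_loss + lambda_a * age_loss
```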
In this embodiment, after the face feature extraction model is obtained, the preprocessed visible light image or near-infrared image is input into the face feature extraction model, and the face features under visible light or near-infrared light are extracted.
In a second embodiment, as shown in fig. 3, when the heterogeneous image is a visible light iris image or a near infrared iris image, S101 includes:
S301: extracting correlation features of the two eyes in the visible light iris image or the near-infrared iris image in a first manner and a second manner, respectively, to obtain a first target feature set and a second target feature set;
The first manner and the second manner are two different feature extraction methods. For example, the first manner may be a preset convolution algorithm, such as a pairwise-CNN algorithm, and the second manner may be a conventional feature extraction method, such as an ordinal measure filter.
S302: and extracting the depth features of the iris from the first target feature set and the second target feature set according to the complementarity of the first target feature set and the second target feature set.
Different feature extraction methods each have advantages and disadvantages, and the features they extract are to some extent complementary; exploiting the complementarity of the features extracted in the different manners yields depth features with better robustness.
Preferably, a convolutional neural network model based on maxout activation units may be adopted to extract the depth features of the iris. In this embodiment, the depth features extracted in this way express the similarities and differences between iris textures more robustly.
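A maxout unit outputs the element-wise maximum over several affine projections of its input. A minimal sketch of such a unit, assuming PyTorch; the layer sizes are illustrative, as the patent does not give the network's dimensions:

```python
import torch
import torch.nn as nn

class Maxout(nn.Module):
    """Maxout activation: max over `pieces` parallel linear projections."""
    def __init__(self, in_features, out_features, pieces=2):
        super().__init__()
        self.out_features = out_features
        self.pieces = pieces
        self.linear = nn.Linear(in_features, out_features * pieces)

    def forward(self, x):
        y = self.linear(x)                      # (..., out_features * pieces)
        y = y.view(*x.shape[:-1], self.out_features, self.pieces)
        return y.max(dim=-1).values             # element-wise max over pieces

depth_features = Maxout(512, 256)(torch.randn(4, 512))  # 4 fused iris descriptors
```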
Further, assuming that the heterogeneous images include a visible light face image, a near-infrared face image, a visible light iris image, and a near-infrared iris image, the obtained first feature sets include: the face feature set under visible light, the face feature set under near-infrared light, the iris feature set under visible light, and the iris feature set under near-infrared light.
S102: in the multi-mode convolutional neural network, according to the correlation among different modes, screening out the characteristics which meet the preset conditions from the first characteristic set of each mode to obtain a second characteristic set of each mode;
in this embodiment, for the same organism, different modalities have a certain correlation, and features having a certain correlation may be screened out from different feature sets according to the correlation, specifically, S102 includes:
respectively screening out features with maximized inter-class difference and minimized intra-class difference from the first feature set of each mode to obtain a third feature set of each mode;
and analyzing the third feature set of each mode through a multivariate regression model to obtain a second feature set of each mode.
In this embodiment, the intra-class difference refers to the similarity between different features in the same image, where the similarity between different features may be represented by the distance between features; inter-class differences refer to distances between features in different images. Wherein, in the same image, the larger the similarity between the features is, the smaller the intra-class difference is; the greater the distance between features in different images, the greater the difference between the classes.
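The patent does not fix a criterion for this screening step; one standard realization of "maximize the inter-class difference, minimize the intra-class difference" is a Fisher-style score per feature dimension, sketched below on random stand-in data (the embodiment's image-level definition of the differences would change the details):

```python
import numpy as np

def fisher_scores(features, labels):
    """Score each dimension by between-class variance over within-class
    variance; high scores mean large inter-class and small intra-class
    difference."""
    overall_mean = features.mean(axis=0)
    between = np.zeros(features.shape[1])
    within = np.zeros(features.shape[1])
    for c in np.unique(labels):
        cls = features[labels == c]
        between += len(cls) * (cls.mean(axis=0) - overall_mean) ** 2
        within += ((cls - cls.mean(axis=0)) ** 2).sum(axis=0)
    return between / (within + 1e-12)

rng = np.random.default_rng(0)
first_set = rng.normal(size=(200, 128))       # stand-in first feature set
labels = rng.integers(0, 10, size=200)        # stand-in class labels
top_dims = np.argsort(fisher_scores(first_set, labels))[::-1][:64]
third_set = first_set[:, top_dims]            # screened "third feature set"
```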
In this embodiment, the multivariate regression models include: CCA (Canonical Correlation Analysis), PLS (Partial Least Squares), CSR (Coupled Spectral Regression), and the like.
The main idea is to process the third feature set of each modality according to the correlations among the features in the third feature sets, obtaining a common feature space containing the second feature set of each modality.
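As one concrete instance, CCA (one of the models listed above) projects two modalities into a shared space of maximally correlated directions. A minimal sketch using scikit-learn, with random stand-in features and an assumed common dimension of 32:

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
face_third_set = rng.normal(size=(500, 128))   # third feature set, face modality
iris_third_set = rng.normal(size=(500, 96))    # third feature set, iris modality

cca = CCA(n_components=32)                     # dimension of the common space (assumed)
face_second_set, iris_second_set = cca.fit_transform(face_third_set, iris_third_set)
```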
In addition, in order to improve the accuracy of the subsequent multi-modal convolutional neural network training, the distance D between correlated feature points in the third feature sets of different modalities may be made approximately equal to the distance D between correlated feature points of samples of the same type.
Here, samples of the same type are samples of the same class, such as two face images under visible light.
S103: and determining the weight of the second feature set of each mode in a full connection layer of the multi-mode convolutional neural network, and fusing the second feature sets of the multiple modes according to the weight so that the fused second feature sets train the multi-mode convolutional neural network for biological feature recognition.
In this embodiment, as shown in FIG. 4, assume f1 represents the set of face features under visible light, f2 the set of face features under near-infrared light, f3 the set of iris features under visible light, and f4 the set of iris features under near-infrared light. The features in each feature set are input into the multi-modal feature convolution learning layer to obtain four corresponding feature matrices, respectively W1, W2, W3, and W4; the weighted features W1f1, W2f2, W3f3, and W4f4 of the different modalities are then connected in series.
The weights may be preset by a technician, or may be determined from the training results during the training process.
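A minimal sketch of this fusion step, assuming PyTorch and illustrative feature dimensions (the patent does not specify them): one learned matrix Wi per modality, followed by concatenation of the weighted features in series.

```python
import torch
import torch.nn as nn

class MultiModalFusion(nn.Module):
    """Apply a learned matrix W_i to each modality's features f_i and
    concatenate the weighted features W_i f_i in series."""
    def __init__(self, in_dims=(256, 256, 128, 128), proj_dim=64):
        super().__init__()
        self.proj = nn.ModuleList(nn.Linear(d, proj_dim, bias=False) for d in in_dims)

    def forward(self, feats):                  # feats = [f1, f2, f3, f4]
        return torch.cat([w(f) for w, f in zip(self.proj, feats)], dim=-1)

fusion = MultiModalFusion()
f1, f2 = torch.randn(8, 256), torch.randn(8, 256)   # face features (VIS, NIR)
f3, f4 = torch.randn(8, 128), torch.randn(8, 128)   # iris features (VIS, NIR)
fused = fusion([f1, f2, f3, f4])                    # (8, 256), feeds the FC layer
```

The concatenated vector then feeds the fully connected layer, whose training drives the per-modality weights.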
In the embodiment, the multi-modal convolutional neural network for feature recognition is obtained by fusing the multi-modal features and training the multi-modal convolutional neural network according to the fused features, so that the problem of limitation of single-mode recognition in the prior art is solved, and the accuracy of biological feature recognition is improved.
Referring to fig. 5, a schematic structural diagram of a multi-modal feature fusion apparatus based on a convolutional neural network according to an embodiment of the present invention is shown. In this embodiment, the apparatus includes:
the multi-modal feature extraction unit 501 is configured to extract features of multiple modalities from different heterogeneous images to obtain a first feature set of each modality;
the screening unit 502 is configured to screen out, in the multi-modal convolutional neural network, features meeting preset conditions from the first feature set of each modality according to correlations between different modalities to obtain a second feature set of each modality;
the fusion unit 503 is configured to determine, at a full connection layer of the multi-modal convolutional neural network, a weight of the second feature set of each modality, and fuse the second feature sets of the multiple modalities according to the weight, so that the fused second feature sets train the multi-modal convolutional neural network for biological feature recognition.
Optionally, the heterogeneous image includes:
a visible light face image, a near-infrared face image, a visible light iris image, and a near-infrared iris image, wherein each image corresponds to one modality.
Optionally, the screening unit includes:
the screening subunit is used for screening out the features with the maximized inter-class difference and the minimized intra-class difference from the first feature set of each mode respectively to obtain a third feature set of each mode;
and the analysis subunit is used for analyzing the third feature set of each mode through the multivariate regression model to obtain a second feature set of each mode.
Optionally, the multi-modal feature extraction unit is specifically configured to:
detecting the input visible light face image or near infrared face image to obtain position information of a face and position information of key points;
preprocessing the input visible light face image or near-infrared face image;
and inputting the preprocessed near-infrared face image or visible light face image into a trained face image feature extraction model, and extracting face features under near-infrared light or visible light.
And
extracting correlation features of the two eyes in the visible light iris image or the near-infrared iris image in a first manner and a second manner, respectively, to obtain a first target feature set and a second target feature set;
and extracting the depth features of the iris from the first target feature set and the second target feature set according to the complementarity of the first target feature set and the second target feature set.
By the device, the multi-modal characteristics are fused, and the multi-modal convolutional neural network is trained according to the fused characteristics to obtain the multi-modal convolutional neural network for characteristic recognition, so that the problem of limitation of single mode recognition in the prior art is solved, and the accuracy of biological characteristic recognition is improved.
Referring to fig. 6, a schematic structural diagram of a multi-modal feature fusion system based on a convolutional neural network according to an embodiment of the present invention is shown, where the system includes:
an acquisition end 601 and a data processing end 602;
the acquisition terminal 601 is configured to acquire heterogeneous images representing different modalities;
the data processing terminal 602 is configured to extract features of multiple modalities from different heterogeneous images to obtain a first feature set of each modality;
in the multi-mode convolutional neural network, according to the correlation among different modes, screening out the characteristics which meet the preset conditions from the first characteristic set of each mode to obtain a second characteristic set of each mode;
and determining the weight of the second feature set of each mode in a full connection layer of the multi-mode convolutional neural network, and fusing the second feature sets of the multiple modes according to the weight so that the fused second feature sets train the multi-mode convolutional neural network for biological feature recognition.
Optionally, the heterogeneous image includes:
a visible light face image, a near-infrared face image, a visible light iris image, and a near-infrared iris image, wherein each image corresponds to one modality.
Optionally, when the heterogeneous image is a near-infrared face image or a visible light face image, the data processing end, in executing the step of extracting features of multiple modalities from different heterogeneous images, is specifically configured to:
detecting the input visible light face image or near infrared face image to obtain position information of a face and position information of key points;
preprocessing the input visible light face image or near-infrared face image;
and inputting the preprocessed near-infrared face image or visible light face image into a trained face image feature extraction model, and extracting face features under near-infrared light or visible light.
Optionally, when the heterogeneous image is a visible light iris image or a near-infrared iris image, the data processing end, in executing the step of extracting features of multiple modalities from different heterogeneous images, is specifically configured to:
extracting correlation features of the two eyes in the visible light iris image or the near-infrared iris image in a first manner and a second manner, respectively, to obtain a first target feature set and a second target feature set;
and extracting the depth features of the iris from the first target feature set and the second target feature set according to the complementarity of the first target feature set and the second target feature set.
Optionally, in screening out, according to the correlation among the different modalities, the features meeting the preset condition from the first feature set of each modality, the data processing end is specifically configured to:
respectively screening out features with maximized inter-class difference and minimized intra-class difference from the first feature set of each mode to obtain a third feature set of each mode;
and analyzing the third feature set of each mode through a multivariate regression model to obtain a second feature set of each mode.
Through the system, heterogeneous images in complex scenes are collected, the multi-modal features are fused, and the multi-modal convolutional neural network is trained according to the fused features, so that a multi-modal convolutional neural network for feature recognition is obtained.
It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (5)

1. A multi-modal feature fusion method based on a convolutional neural network is characterized by comprising the following steps:
extracting features of a plurality of modalities from different heterogeneous images to obtain a first feature set of each modality; the heterogeneous images include: a visible light face image, a near-infrared face image, a visible light iris image, and a near-infrared iris image, wherein each image corresponds to one modality;
in the multi-mode convolutional neural network, according to the correlation among different modes, screening out the characteristics which meet the preset conditions from the first characteristic set of each mode to obtain a second characteristic set of each mode;
determining the weight of the second feature set of each mode in a full-connection layer of the multi-mode convolutional neural network, and fusing the second feature sets of the multiple modes according to the weight so that the fused second feature sets train the multi-mode convolutional neural network for biological feature recognition;
according to the correlation among different modes, the characteristics meeting the preset conditions are screened out from the first characteristic set of each mode, and the method comprises the following steps:
respectively screening out features with maximized inter-class difference and minimized intra-class difference from the first feature set of each mode to obtain a third feature set of each mode;
analyzing the third feature set of each mode through a multivariate regression model to obtain a second feature set of each mode;
the distance between the correlated feature points in the third feature sets of different modalities is approximately equal to the distance between the correlated feature points in samples of the same type; samples of the same type are samples of the same class.
2. The method according to claim 1, wherein in the case that the heterogeneous image is a near-infrared face image or a visible-light face image, the extracting features of multiple modalities from different heterogeneous images comprises:
detecting the input visible light face image or near infrared face image to obtain position information of a face and position information of key points;
preprocessing the input visible light face image or near-infrared face image;
and inputting the preprocessed near-infrared face image or visible light face image into a trained face image feature extraction model, and extracting face features under near-infrared light or visible light.
3. The method of claim 1, wherein in the case that the heterogeneous image is a visible light iris image or a near infrared iris image, the extracting features of a plurality of modalities from different heterogeneous images comprises:
extracting correlation features of the two eyes in the visible light iris image or the near-infrared iris image in a first manner and a second manner, respectively, to obtain a first target feature set and a second target feature set; and extracting the depth features of the iris from the first target feature set and the second target feature set according to the complementarity of the first target feature set and the second target feature set.
4. A multi-modal feature fusion device based on a convolutional neural network, comprising:
the multi-modal feature extraction unit is used for extracting features of a plurality of modes from different heterogeneous images to obtain a first feature set of each mode; the heterogeneous images include: a visible light face image, a near-infrared face image, a visible light iris image, and a near-infrared iris image, wherein each image corresponds to one modality;
the screening unit is used for screening out the characteristics meeting the preset conditions from the first characteristic set of each mode according to the correlation among different modes in the multi-mode convolutional neural network to obtain a second characteristic set of each mode;
the fusion unit is used for determining the weight of the second feature set of each mode in a full-connection layer of the multi-mode convolutional neural network, and fusing the second feature sets of the multiple modes according to the weight so that the fused second feature sets train the multi-mode convolutional neural network for biological feature recognition;
the screening unit includes:
the screening subunit is used for screening out the features with the maximized inter-class difference and the minimized intra-class difference from the first feature set of each mode respectively to obtain a third feature set of each mode;
the analysis subunit is used for analyzing the third feature set of each mode through the multivariate regression model to obtain a second feature set of each mode;
the distance between the correlated feature points in the third feature sets of different modalities is approximately equal to the distance between the correlated feature points in samples of the same type; samples of the same type are samples of the same class.
5. A multi-modal feature fusion system based on convolutional neural networks, comprising:
the system comprises an acquisition end and a data processing end;
the acquisition terminal is used for acquiring heterogeneous images representing different modalities; the heterogeneous images include: a visible light face image, a near-infrared face image, a visible light iris image, and a near-infrared iris image, wherein each image corresponds to one modality;
the data processing terminal is used for extracting features of a plurality of modes from different heterogeneous images to obtain a first feature set of each mode;
in the multi-mode convolutional neural network, according to the correlation among different modes, screening out the characteristics which meet the preset conditions from the first characteristic set of each mode to obtain a second characteristic set of each mode;
determining the weight of the second feature set of each mode in a full-connection layer of the multi-mode convolutional neural network, and fusing the second feature sets of the multiple modes according to the weight so that the fused second feature sets train the multi-mode convolutional neural network for biological feature recognition;
according to the correlation among different modes, the characteristics meeting the preset conditions are screened out from the first characteristic set of each mode, and the method comprises the following steps:
respectively screening out features with maximized inter-class difference and minimized intra-class difference from the first feature set of each mode to obtain a third feature set of each mode;
analyzing the third feature set of each mode through a multivariate regression model to obtain a second feature set of each mode;
the distance between the correlated feature points in the third feature sets of different modalities is approximately equal to the distance between the correlated feature points in samples of the same type; samples of the same type are samples of the same class.
CN201811456123.0A 2018-11-30 2018-11-30 Multi-mode feature fusion method and device based on convolutional neural network Active CN109583569B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811456123.0A CN109583569B (en) 2018-11-30 2018-11-30 Multi-mode feature fusion method and device based on convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811456123.0A CN109583569B (en) 2018-11-30 2018-11-30 Multi-mode feature fusion method and device based on convolutional neural network

Publications (2)

Publication Number Publication Date
CN109583569A CN109583569A (en) 2019-04-05
CN109583569B true CN109583569B (en) 2021-08-31

Family

ID=65926547

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811456123.0A Active CN109583569B (en) 2018-11-30 2018-11-30 Multi-mode feature fusion method and device based on convolutional neural network

Country Status (1)

Country Link
CN (1) CN109583569B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110046588B (en) * 2019-04-22 2019-11-01 吉林大学 It is a kind of with steal attack coping mechanism heterogeneous iris recognition method
CN110046698B (en) * 2019-04-28 2021-07-30 北京邮电大学 Heterogeneous graph neural network generation method and device, electronic equipment and storage medium
CN110674677A (en) * 2019-08-06 2020-01-10 厦门大学 Multi-mode multi-layer fusion deep neural network for anti-spoofing of human face
CN110598748B (en) * 2019-08-13 2021-09-21 清华大学 Heterogeneous image change detection method and device based on convolutional neural network fusion
CN110598573B (en) * 2019-08-21 2022-11-25 中山大学 Visual problem common sense reasoning model and method based on multi-domain heterogeneous graph guidance
CN111340239B (en) * 2020-03-13 2021-05-04 清华大学 Hesitation iterative computation method and device for multi-mode machine learning target recognition
CN111523663B (en) * 2020-04-22 2023-06-23 北京百度网讯科技有限公司 Target neural network model training method and device and electronic equipment
CN111968087B (en) * 2020-08-13 2023-11-07 中国农业科学院农业信息研究所 Plant disease area detection method
CN112085681B (en) * 2020-09-09 2023-04-07 苏州科达科技股份有限公司 Image enhancement method, system, device and storage medium based on deep learning
CN112365340A (en) * 2020-11-20 2021-02-12 无锡锡商银行股份有限公司 Multi-mode personal loan risk prediction method
CN113011485B (en) * 2021-03-12 2023-04-07 北京邮电大学 Multi-mode multi-disease long-tail distribution ophthalmic disease classification model training method and device
CN113191991B (en) * 2021-04-07 2024-04-12 山东师范大学 Information bottleneck-based multi-mode image fusion method, system, equipment and medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106682662A (en) * 2015-11-11 2017-05-17 天津中科智能识别产业技术研究院有限公司 Multimodal biometric feature recognition module and method for mobile terminal
CN106803098A (en) * 2016-12-28 2017-06-06 南京邮电大学 A kind of three mode emotion identification methods based on voice, expression and attitude
CN106815566A (en) * 2016-12-29 2017-06-09 天津中科智能识别产业技术研究院有限公司 A kind of face retrieval method based on multitask convolutional neural networks
CN108182441A (en) * 2017-12-29 2018-06-19 华中科技大学 Parallel multichannel convolutive neural network, construction method and image characteristic extracting method
CN105608450B (en) * 2016-03-01 2018-11-27 天津中科智能识别产业技术研究院有限公司 Heterogeneous face identification method based on depth convolutional neural networks

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106682662A (en) * 2015-11-11 2017-05-17 天津中科智能识别产业技术研究院有限公司 Multimodal biometric feature recognition module and method for mobile terminal
CN105608450B (en) * 2016-03-01 2018-11-27 天津中科智能识别产业技术研究院有限公司 Heterogeneous face identification method based on depth convolutional neural networks
CN106803098A (en) * 2016-12-28 2017-06-06 南京邮电大学 A kind of three mode emotion identification methods based on voice, expression and attitude
CN106815566A (en) * 2016-12-29 2017-06-09 天津中科智能识别产业技术研究院有限公司 A kind of face retrieval method based on multitask convolutional neural networks
CN108182441A (en) * 2017-12-29 2018-06-19 华中科技大学 Parallel multichannel convolutive neural network, construction method and image characteristic extracting method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Deep Feature Fusion for Iris and Periocular Biometrics on Mobile Devices; Qi Zhang et al; IEEE Transactions on Information Forensics and Security; 2018-05-08; abstract, sections 1-6, figures 1-6 *
Exploring Complementary Features for Iris Recognition on Mobile Devices; Qi Zhang et al; 2016 International Conference on Biometrics (ICB); 2016-12-31; abstract, sections 1-5 *
Research on Key Issues of Identity Authentication Based on Finger Multimodal Biometrics; Peng Jialiang; China Doctoral Dissertations Full-text Database, Information Science and Technology; 2015-01-15; abstract, chapter 4 *

Also Published As

Publication number Publication date
CN109583569A (en) 2019-04-05

Similar Documents

Publication Publication Date Title
CN109583569B (en) Multi-mode feature fusion method and device based on convolutional neural network
CN104933344B (en) Mobile terminal user identity authentication device and method based on multi-biological characteristic mode
JP5008269B2 (en) Information processing apparatus and information processing method
CN110235169B (en) Cosmetic evaluation system and operation method thereof
Raja et al. Binarized statistical features for improved iris and periocular recognition in visible spectrum
CN110889312A (en) Living body detection method and apparatus, electronic device, computer-readable storage medium
US20080298644A1 (en) System and method for controlling image quality
MX2012010637A (en) Biometric information processing device.
CN110741377A (en) Face image processing method and device, storage medium and electronic equipment
KR20130018763A (en) Face detection and method and apparatus
JP2008243093A (en) Dictionary data registration device and method
CN108446687B (en) Self-adaptive face vision authentication method based on interconnection of mobile terminal and background
WO2018061786A1 (en) Living body authentication device
EP3617993B1 (en) Collation device, collation method and collation program
CN109064613A (en) Face identification method and device
KR101016758B1 (en) Method for identifying image face and system thereof
CN116311400A (en) Palm print image processing method, electronic device and storage medium
Mane et al. Novel multiple impression based multimodal fingerprint recognition system
CN113792807A (en) Skin disease classification model training method, system, medium and electronic device
JP7337541B2 (en) Information processing device, information processing method and program
CN111937005A (en) Biological feature recognition method, device, equipment and storage medium
Chen Design and simulation of AI remote terminal user identity recognition system based on reinforcement learning
CN117253318B (en) Intelligent self-service payment terminal system and method
Jain et al. Face recognition
WO2021157023A1 (en) Authentication method, information processing device, and authentication program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 523710, 26, 188 Industrial Road, Pingshan Town, Guangdong, Dongguan, Tangxia

Applicant after: Entropy Technology Co.,Ltd.

Address before: 523710, 26, 188 Industrial Road, Pingshan Town, Guangdong, Dongguan, Tangxia

Applicant before: ZKTECO Co.,Ltd.

GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20221201

Address after: Room 1301, No.132, Fengqi Road, phase III, software park, Xiamen City, Fujian Province

Patentee after: Xiamen Entropy Technology Co.,Ltd.

Address before: 523710 26 Pingshan 188 Industrial Avenue, Tangxia Town, Dongguan, Guangdong

Patentee before: Entropy Technology Co.,Ltd.