CN115188084A - Multi-mode identity recognition system and method for non-contact voiceprint and palm print palm vein - Google Patents

Info

Publication number
CN115188084A
CN115188084A (application CN202210927661.3A)
Authority
CN
China
Prior art keywords
palm
module
feature
characteristic
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210927661.3A
Other languages
Chinese (zh)
Inventor
胡文艺
杜育佳
王洪坤
赵昆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu University of Technology
Original Assignee
Chengdu University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu University of Technology
Priority to CN202210927661.3A
Publication of CN115188084A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/70 Multimodal biometrics, e.g. combining information from different biometric modalities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/28 Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/12 Fingerprints or palmprints
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/12 Fingerprints or palmprints
    • G06V40/1347 Preprocessing; Feature extraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/14 Vascular patterns
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/04 Training, enrolment or model building
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/06 Decision making techniques; Pattern matching strategies
    • G10L17/08 Use of distortion metrics or a particular distance between probe pattern and reference templates
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/18 Artificial neural networks; Connectionist approaches
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/32 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials
    • H04L9/3226 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials using a predetermined code, e.g. password, passphrase or PIN
    • H04L9/3231 Biological data, e.g. fingerprint, voice or retina

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Security & Cryptography (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Business, Economics & Management (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Vascular Medicine (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Collating Specific Patterns (AREA)

Abstract

The invention discloses a multi-modal identity recognition system and method for non-contact voiceprint, palm print, and palm vein. The system comprises: a power supply module for powering the entire multi-modal identity recognition system; a fixed-wavelength infrared LED light source module that irradiates the human hand with an infrared LED light source to assist the image acquisition CCD module in capturing palm print and palm vein features; an image acquisition CCD module that acquires the palm print and palm vein features of the human body; a voice acquisition module that extracts voice information as MFCC features; a storage module for storing the data acquired by the voice acquisition module and the image acquisition CCD module; and a multi-modal identity recognition module that preprocesses the images, extracts image features, fuses and compares the features, and outputs the result. The invention improves authentication security, reduces the complexity of manual feature extraction, strengthens resistance to noise interference, and improves the robustness and portability of the system.

Description

Multi-mode identity recognition system and method for non-contact voiceprint and palm print palm vein
Technical Field
The invention relates to the technical field of biometric recognition, and in particular to a non-contact multi-modal identity recognition system and method for voiceprint, palm print, and palm vein.
Background
With the rapid development of global information industrialization, how to perform fast, accurate, and secure identity recognition and verification in a digital environment has become a hot topic in recent years. Traditional identity credentials are easily lost, forgotten, or forged, so biometric recognition technology has attracted increasing attention. Biometric recognition identifies the authenticity of identity information by collecting physiological and behavioral characteristics of the human body and processing them through a system [1]. At present, the more mature or widely applied biometric technologies include face, voice, fingerprint, iris, finger vein, DNA, signature, and gait recognition [2,3,4]. However, single-modality biometric recognition may lose accuracy because of sensor noise or unsuitable feature extraction and matching methods, and may also suffer security problems because features can be forged, such as fake fingerprints. Consequently, multimodal biometric recognition has come into view. Different descriptions or perspectives of the same object are called modalities, and multi-modal representation characterizes a particular task using information from several such modalities together [3]. In general, a multi-modal biometric system fuses two or more biometric features at different levels, which can be divided into the sensor level, feature level, score level, and decision level [5,6,7]. The research difficulty of multi-modal fusion authentication lies in how to effectively acquire, extract, and compare the features of multi-source heterogeneous data.
Representation learning is a set of techniques that can effectively recognize and exploit the distribution of raw, complex data for a given task, i.e., extract useful information from the data in order to learn its features, thereby greatly improving the effectiveness of algorithmic models and the accuracy of predictors. Building on representation learning research in a multi-modal data environment, a model can be established that processes and associates information from multiple modalities to perform multi-modal information fusion, improving the accuracy and security of identity authentication. The goal of multi-modal representation learning is to extract representations of data objects (users) from data of multiple heterogeneous modalities; a typical approach is to concatenate the individual representations of each modality into a joint representation and then perform subsequent task learning on this joint representation [8]. Fusing the data representations unifies data from multiple sources, overcoming the heterogeneity between them, and complementary information can be extracted from the sources, so the fused representation carries richer and more effective information than any single modality.
Prior art 1
Fingerprint recognition identifies a person by the ridge-and-valley texture of the skin on the front of the fingertip; fingerprints are unique and stable, and real identity is verified by comparing a fingerprint with one pre-stored in a database. Among the various biometric technologies, fingerprint recognition remains the most mature: it has been officially accepted in many countries, has become an effective means of identification in the judicial field, has been widely applied in many other industries, and has become a byword and de facto standard for biometric recognition. Fingerprint recognition mainly involves fingerprint image acquisition, fingerprint image preprocessing, fingerprint feature extraction, fingerprint database construction, and fingerprint feature comparison and matching. After years of research, many fingerprint recognition methods have emerged, of which the minutiae-based method is the most mature and the most widely applied. Laboratory images are captured with existing laboratory equipment, and the specific processing comprises: (1) computing the fingerprint orientation field; (2) segmenting the fingerprint image; (3) enhancing the fingerprint image; (4) binarizing and post-processing the fingerprint image; (5) thinning the fingerprint image; (6) extracting fingerprint features; (7) matching the fingerprint images.
Disadvantages of the first prior art

High environmental requirements: recognition is sensitive to finger cleanliness and humidity, and dirt, oil, or water can prevent recognition or distort the result;

low-quality fingerprints, such as those with scars or peeling skin, are difficult to recognize and yield a low recognition rate;

fingerprint capture demands strict operating procedures;

fingerprint traces may remain on the device, and these traces can be used to copy the fingerprint.
Prior art 2
Compared with other biometric technologies, palm print and palm vein fusion recognition offers higher recognition accuracy, convenience, and stability; it helps make daily life more convenient and, to a certain extent, improves the security of personal information.

Palm print and palm vein textures do not change with age, and palm print recognition has the advantages of rich texture features, easy user acceptance, and high security and stability.

Disadvantages of the second prior art

(1) The palm vein and palm print image acquisition environment. Palm vein acquisition is mainly either contact or non-contact, and whichever mode is used, the acquisition process is affected by factors such as illumination, acquisition background, and temperature.

(2) The influence of localized segmentation of the critical palm vein region. To obtain a region rich in vein features, the palm region-of-interest (ROI) image must be located and segmented. Researchers generally use the palm vein images of the Hong Kong University of Science and Technology database for vein recognition research; because the palm must be fixed during acquisition of those images, with hardware installed at the valley between the middle finger and the ring finger, the palm vein ROI is difficult to locate and segment. Lacking a suitable ROI localization and segmentation method, feature extraction is inaccurate and the recognition rate is low.

(3) Palm print interference with the palm vein. Palm vein images contain palm prints, and existing algorithms still cannot completely remove their interference; for example, fuzzy threshold decisions and global gray-value matching improve algorithm robustness but do not remove the palm print interference well, so palm vein recognition performs poorly.

(4) Non-contact acquisition mainly suffers from position offset, distance drift, image defocus, and brightness fluctuation in the palm print sample images. Regarding anti-counterfeiting in palm print recognition, the main forgery means are silica gel prostheses and palm print films. These factors are the main reason why non-contact palm print recognition systems are less accurate than contact systems, and the main obstacle to their practical deployment.
References

[1] Liu Qianying, Liu Ji. Biometric identification technology development in the authentication field [J]. Electronic World, 2020(05): 23-24.

[2] Xie Lu, Yu Fei. Secure authentication technology based on multi-modal biometrics [J]. Secrecy Science and Technology, 2016(01): 36-40.

[3] Zhou Chenyi. Multimodal biometric identification based on fusion algorithms and deep learning [D]. Southern Medical University, 2020.

[4] Zhang Lou, Wang Huabin, Tao Liang, Zhou Jian. Adaptive multimodal biometric fusion based on classification distance scores [J]. Journal of Computer Research and Development, 2018, 55(1): 151-162.

[5] Ma Ruru. Bimodal identity authentication based on fingerprints and electrocardiosignals [D]. Tianjin University of Science and Technology, 2021.

[6] Zhang Yue. Algorithmic study of multimodal biometric identification technology [D]. Changchun University, 2017.

[7] Ding Xuan. Multimodal biometric identification technology and its standardization trends [J]. Computer Knowledge and Technology, 2017, 13(36): 153-154.

[8] Halbernet, Lu Kai. A survey of representation learning for complex heterogeneous data [J]. Computer Science, 2020, 47(02): 1-9.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a non-contact multi-modal identity recognition system and method for voiceprint, palm print, and palm vein. With multi-modal biometric recognition for identity authentication based on intelligent data representation theory as its core, and with the related techniques integrated in a network security scenario, the invention offers high security, convenience, and reliability.

To realize this purpose, the technical scheme adopted by the invention is as follows:
a multi-modal identification system for non-contact voiceprint and palmprint metacarpal veins, comprising: the system comprises a power supply module, a fixed wavelength infrared LED light source module, an image acquisition CCD module, a voice acquisition module, a storage module and a multi-mode identity recognition module;
a power supply module: for powering the entire multimodal identity recognition system
Fixed wavelength infrared LED light source module: the human hand is irradiated by an infrared LED light source to assist the image acquisition CCD module in acquiring the information characteristics of the palm print and the palm vein of the human body;
image acquisition CCD module: collecting the information characteristics of the palm print and the palm vein of the human body;
the voice acquisition module: extracting voice information by using MFCC characteristics;
a storage module: the device is used for storing data acquired by the voice acquisition module and the image acquisition CCD module.
The multi-modal identity recognition module: and preprocessing the picture, extracting picture characteristics, fusing and comparing the characteristics and outputting a result.
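As a concrete illustration of the voice acquisition module, the following is a minimal MFCC extraction sketch. It assumes librosa and NumPy; the sample rate, coefficient count, and per-coefficient normalization are illustrative assumptions, not values fixed by the invention.

```python
# Hypothetical sketch of MFCC-based voice feature extraction; all parameter
# values below are assumptions for illustration, not patent specifications.
import librosa
import numpy as np

def extract_mfcc(wav_path: str, sr: int = 16000, n_mfcc: int = 13) -> np.ndarray:
    """Load a recording and return an (n_mfcc x frames) MFCC feature matrix."""
    signal, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    # Per-coefficient mean/variance normalization, a common hedge against
    # channel and loudness variation between enrollment and verification.
    return (mfcc - mfcc.mean(axis=1, keepdims=True)) / \
           (mfcc.std(axis=1, keepdims=True) + 1e-8)
```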
A multi-modal identity recognition method for non-contact voiceprint and palm print palm vein comprises the following steps:

Step 1, image preprocessing; the preprocessing mainly comprises three parts: first, the infrared palm image is denoised with low-pass filtering; second, the image enhancement part extracts a binary map of the palm region with the Sauvola algorithm; finally, the ROI positioning part applies a gray-level transformation to the palm print and palm vein to make the palm edge stand out, detects the palm edge with the Canny operator, and crops the image to obtain the palm region of interest (a preprocessing sketch follows step 5).

Step 2, feature extraction; feature extraction is divided into two parts: the first extracts voice features, and the second extracts the two hand features, palm print and palm vein. ResNet is adopted as the backbone, an SE module is introduced to construct an SE-ResNet network structure, and the preprocessed pictures are input into it; a global pooling layer is added to generate the feature distribution, and the information encoding is completed. To obtain the correlation between channels, a ReLU activation function is combined with a sigmoid gating mechanism to recalibrate the features.

Step 3, feature fusion; a multi-layer feature fusion mechanism is adopted: a factorized bilinear model performs the fusion to obtain the interaction between the hand and audio modalities, paired audio and hand features are input into the fusion model, and the final result is output through softmax on the fully connected layer.

Step 4, feature comparison; for the feature points preliminarily extracted with the improved FAST corner detection algorithm, the corner response function of each point is computed with the Shi-Tomasi algorithm, and the top N points with the largest response values are kept as feature points. At least 2 strong boundaries in different directions exist around the retained feature points. For matching the binary feature description vectors, the Hamming distance is adopted as the similarity measure between descriptors.

Step 5, interactive output; a joint discriminative sparse coding algorithm evaluates the within-modality sample feature points of the three modalities so that the within-class distance is minimized and the between-class distance is maximized. An appropriate threshold is set according to the actual scenario; if the two matched samples belong to the same class and the voiceprint, palm print, and palm vein all match successfully, the interface displays that authentication succeeded; otherwise it prompts that authentication failed.
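The following is a minimal sketch of the step-1 preprocessing chain (low-pass denoising, Sauvola binarization, gray-level stretching, Canny edge detection, and ROI cropping). It assumes OpenCV and scikit-image; every parameter value (kernel size, Sauvola window, Canny thresholds, output size) is an illustrative assumption rather than a value specified by the method.

```python
# Hedged sketch of the image preprocessing of step 1; parameter values are
# placeholders, not taken from the patent.
import cv2
import numpy as np
from skimage.filters import threshold_sauvola

def preprocess_palm(gray: np.ndarray) -> np.ndarray:
    # 1) Low-pass filtering to suppress sensor noise in the infrared image.
    denoised = cv2.GaussianBlur(gray, (5, 5), 0)
    # 2) Sauvola local thresholding yields a binary map of the palm region.
    mask = (denoised > threshold_sauvola(denoised, window_size=25)).astype(np.uint8) * 255
    # 3) Gray-level stretching, then Canny to make the palm edge stand out.
    stretched = cv2.normalize(denoised, None, 0, 255, cv2.NORM_MINMAX)
    edges = cv2.Canny(stretched, 50, 150)
    # 4) Crop the region of interest from the bounding box of the detected
    #    contour, falling back to the binary mask if no edges were found.
    pts = np.column_stack(np.nonzero(edges if edges.any() else mask))
    (r0, c0), (r1, c1) = pts.min(axis=0), pts.max(axis=0)
    return cv2.resize(denoised[r0:r1, c0:c1], (128, 128))
```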
Further, step 2 is specifically as follows: for any given input, after it enters the network module, the transformation of equation (1) is performed:

$F_{tr}: X \rightarrow U, \quad X \in R^{H' \times W' \times C'}, \quad U \in R^{H \times W \times C}$ (1)

where X is the input picture and U is the extracted feature map.

The SE module compresses global spatial information into a channel descriptor that contains the global distribution of the feature responses over the channel dimension; channel-wise statistics are obtained with a global average pooling layer. The statistic $z \in R^{C}$ is obtained by compressing U over its spatial dimensions H × W with equation (2):

$z_c = F_{sq}(u_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j)$ (2)

The transform output U is interpreted as a set of local descriptors whose statistics can express the entire image.

The dependencies along the channel dimension are fully captured from the aggregated information produced by the compression operation. A simple gating mechanism with a sigmoid activation function, equation (3), is chosen:

$s = F_{ex}(z, W) = \sigma(g(z, W)) = \sigma(W_2 \, \delta(W_1 z))$ (3)

where δ denotes the ReLU activation function, $W_1 \in R^{(C/r) \times C}$, and $W_2 \in R^{C \times (C/r)}$.

To limit the complexity of the model and help it generalize, the gating mechanism is parameterized as a bottleneck of two fully connected (FC) layers around the nonlinearity, and the final output of the block is obtained by rescaling the transform output U with the activations, equation (4):

$\tilde{x}_c = F_{scale}(u_c, s_c) = s_c \cdot u_c$ (4)

where $\tilde{X} = [\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_C]$ and $F_{scale}(u_c, s_c)$ denotes channel-wise multiplication between the feature map $u_c \in R^{H \times W}$ and the scalar $s_c$. The role of the activations s is to give each channel a weight based on the input feature descriptor z.
Further, step 3 is specifically as follows:

The factorized bilinear model considers each feature pair through a linear transformation, equation (5):

$z_i = x^T W_i y + b_i$ (5)

where $x \in R^n$ and $y \in R^m$ are input feature vectors from the hand and audio modalities, $W_i$ is a weight matrix, and $b_i$ is a bias.

The weight matrix $W_i$ is decomposed into two low-rank matrices, $W_i = U_i V_i^T$, where $U_i \in R^{n \times d}$ and $V_i \in R^{m \times d}$, with a constraint on the dimension d keeping $d \le \min(n, m)$. Equation (5) can then be rewritten as equation (6):

$z_i = x^T U_i V_i^T y + b_i$ (6)

The inherent correlation between the two heterogeneous modalities is captured by equation (7):

$z_i = \mathbf{1}^T (U_i^T x \circ V_i^T y) + b_i$ (7)

where $\mathbf{1} \in R^d$ is a column vector of ones and $\circ$ denotes the Hadamard (element-wise) product.

To obtain the output feature vector z, two third-order tensors are required: $U = [U_1, \ldots, U_o] \in R^{n \times d \times o}$ and $V = [V_1, \ldots, V_o] \in R^{m \times d \times o}$. Using a linear projection $P \in R^{d \times o}$ in place of the column vector, the vector z is represented as equation (8):

$z = P^T (U^T x \circ V^T y) + b$ (8)

where $b \in R^o$ is a bias vector.

A nonlinear activation function is added after each linear mapping, so the vector z is further represented as equation (9):

$z = P^T (\sigma(U^T x) \circ \sigma(V^T y)) + b$ (9)

where σ denotes any nonlinear activation function, and x and y denote the hand attention vector and the audio feature vector, respectively; each element of x is greater than 0 and each element of y lies in [-1, 1].

A ReLU function is further added to normalize the output of the network, so the final vector z can be expressed as equation (10):

$z = \mathrm{ReLU}\left(P^T (\sigma(U^T x) \circ \sigma(V^T y)) + b\right)$ (10)

Paired audio and hand features are input into the fusion model, and the final result is output through softmax on the fully connected layer.
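A minimal sketch of the factorized bilinear fusion of equations (6) to (10) follows, assuming PyTorch; the choice of tanh for σ, the initialization, and all dimensions are illustrative assumptions rather than values given in the text.

```python
import torch
import torch.nn as nn

class FactorizedBilinearFusion(nn.Module):
    """Factorized bilinear pooling of a hand vector x and an audio vector y."""
    def __init__(self, n_hand: int, m_audio: int, d: int, o: int):
        super().__init__()
        self.U = nn.Linear(n_hand, d * o, bias=False)    # U, flattened from R^{n x d x o}
        self.V = nn.Linear(m_audio, d * o, bias=False)   # V, flattened from R^{m x d x o}
        self.P = nn.Parameter(torch.randn(d, o) * 0.01)  # shared projection, eq. (8)
        self.b = nn.Parameter(torch.zeros(o))            # bias vector b
        self.d, self.o = d, o

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # sigma(U^T x) Hadamard sigma(V^T y), eq. (9), with tanh as sigma.
        h = torch.tanh(self.U(x)) * torch.tanh(self.V(y))
        h = h.view(-1, self.d, self.o)
        z = (h * self.P).sum(dim=1) + self.b             # P^T(...) + b, eq. (8)
        return torch.relu(z)                             # final normalization, eq. (10)
```

A classification head (a fully connected layer followed by softmax) would then consume z, matching the last sentence of step 3.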
Further, step 4 is specifically as follows:

An improved FAST algorithm is adopted, with the following specific change: the 24 pixels surrounding a pixel P are taken as the detection template; let the gray value of P be $I_P$ and set a threshold T; if 14 consecutive pixels among the 24 have gray values greater than $I_P + T$ or less than $I_P - T$, then P is a corner.

The feature points are refined with the Shi-Tomasi algorithm, which compares the smaller of the two eigenvalues with a given minimum threshold; if the smaller eigenvalue exceeds that threshold, a strong corner is obtained.

The Shi-Tomasi algorithm detects corners by computing the gray-level change after a local window W(x, y) is shifted in each direction. Shifting the window by (u, v) produces the gray-level change E(u, v) of equation (11):

$E(u, v) = \sum_{x, y} W(x, y)\,[I(x + u, y + v) - I(x, y)]^2 \approx [u\ v]\, M \,[u\ v]^T$ (11)

where M is a 2 × 2 autocorrelation matrix computed from the image derivatives:

$M = \sum_{x, y} W(x, y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$

The two eigenvalues $\lambda_{max}$ and $\lambda_{min}$ of the matrix M are analyzed; since the larger uncertainty of the curvature depends on the smaller eigenvalue, the corner response function is defined as $\lambda_{min}$. For the feature points preliminarily extracted with the improved FAST corner detection algorithm, the corner response $\lambda_{min}$ of each point is computed with the Shi-Tomasi algorithm, and the top N points with the largest response values are kept as feature points. At least 2 strong boundaries in different directions exist around the retained feature points, which are easy to identify and stable.

For matching the binary feature description vectors, the Hamming distance is adopted as the similarity measure between descriptors. Let the two descriptor feature vectors be $F_1$ and $F_2$; the Hamming distance between $F_1$ and $F_2$ is given by equation (12):

$D(F_1, F_2) = \sum_{i=1}^{n} \big(f_1(i) \oplus f_2(i)\big)$ (12)

Whether two feature vectors match is decided by a threshold on the Hamming distance.
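The corner test and the matching rule of step 4 can be sketched as follows. This is a hedged illustration in NumPy: the 24-pixel ring is assumed to be the perimeter of a 7 × 7 neighborhood (the text fixes only its size), and the matching threshold of 30 bits is an arbitrary placeholder.

```python
import math
import numpy as np

# 24-pixel ring around P, taken here as the perimeter of a 7x7 neighborhood
# and sorted by angle so that "consecutive" pixels are adjacent on the ring.
RING = sorted(
    [(dy, dx) for dy in range(-3, 4) for dx in range(-3, 4)
     if max(abs(dy), abs(dx)) == 3],
    key=lambda p: math.atan2(p[0], p[1]),
)

def is_corner(img: np.ndarray, y: int, x: int, t: int) -> bool:
    """Improved FAST test: P is a corner if at least 14 consecutive ring
    pixels are all brighter than I_P + T or all darker than I_P - T."""
    ip = int(img[y, x])
    ring = np.array([int(img[y + dy, x + dx]) for dy, dx in RING])
    for hits in (ring > ip + t, ring < ip - t):
        doubled = np.concatenate([hits, hits])  # handle wrap-around runs
        run = best = 0
        for v in doubled:
            run = run + 1 if v else 0
            best = max(best, run)
        if best >= 14:
            return True
    return False

def hamming_distance(f1: np.ndarray, f2: np.ndarray) -> int:
    """Eq. (12): number of differing bits between two binary descriptors."""
    return int(np.count_nonzero(f1 != f2))

def descriptors_match(f1: np.ndarray, f2: np.ndarray, threshold: int = 30) -> bool:
    return hamming_distance(f1, f2) <= threshold  # threshold is an assumption
```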
Further, step 5 is specifically as follows:

The joint discriminative sparse coding algorithm is as follows: given the feature matrices X, Y, and Z of the three modalities, three projection matrices $P_x$, $P_y$, and $P_z$ are learned jointly, mapping the three modal features to sparse matrices $V_x \in R^{d \times N}$, $V_y \in R^{d \times N}$, and $V_z \in R^{d \times N}$ that accurately approximate the original matrices X, Y, Z, i.e. $V_x \approx P_x X$, $V_y \approx P_y Y$, $V_z \approx P_z Z$.

After the feature representations $V_x$, $V_y$, and $V_z$ are obtained from the three modalities, they are quantized by equation (13):

$C_x = \mathrm{sgn}(V_x); \quad C_y = \mathrm{sgn}(V_y); \quad C_z = \mathrm{sgn}(V_z)$ (13)

where sgn(·) is an element-wise sign function producing sparse binary codes, $C_x$ (likewise $C_y$, $C_z$) $= [c_1, c_2, \ldots, c_N] \in R^{l \times N}$, $c_i \in \{0, 1\}^l$ denotes the learned sparse binary code of the i-th class, and l (l = 1, 2, …, 12) is the length of the binary code.

Sparsity constraints are applied to the projected feature representations through the projection matrices to reduce the projection error; with the Frobenius norm as the cost function, the projection error can be expressed as equation (14):

$E = \|V_x - P_x X\|_F^2 + a\,\|V_y - P_y Y\|_F^2 + b\,\|V_z - P_z Z\|_F^2$ (14)

where a, b > 0 and a + b ∈ (0, 1) are trade-off parameters that balance the three modalities.

Two constraints are imposed on the projected sparse features: 1) for the within-modality samples of each modality, the within-class distance is minimized and the between-class distance is maximized; 2) for within-class samples, the information correlation between feature points is maximized, and the distance is therefore minimized. These constraints give the projected sparse features stronger discriminability and compactness.
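A small sketch of the quantization of equation (13) follows, assuming NumPy; the projection matrices are taken as already learned, and the helper names are hypothetical.

```python
import numpy as np

def project_and_quantize(P: np.ndarray, X: np.ndarray) -> np.ndarray:
    """V ~= P @ X (the learned sparse projection), then eq. (13):
    C = sgn(V), mapped here to {0, 1} binary codes."""
    V = P @ X
    return (V > 0).astype(np.uint8)

# Example: stack the three modal codes into one binary template per sample,
# e.g. C = np.vstack([project_and_quantize(Px, X),
#                     project_and_quantize(Py, Y),
#                     project_and_quantize(Pz, Z)])
```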
Compared with the prior art, the invention has the following advantages:

(1) The voiceprint, palm print, and palm vein features are collected in a non-contact manner, which improves authentication security and suits scenarios with strict hygiene requirements, such as during epidemics.

(2) Feature extraction uses deep learning, which reduces the complexity of manual feature extraction, strengthens resistance to noise interference, and improves the robustness and portability of the system.

(3) Voiceprint recognition is integrated with palm print and palm vein recognition, and the three modal features are combined for identity authentication, improving the security, accuracy, and robustness of authentication.
Drawings
FIG. 1 is a diagram of a multi-modal identification system architecture in accordance with an embodiment of the present invention;
FIG. 2 is a flow diagram of the operation of a multimodal identity recognition system in accordance with an embodiment of the present invention;
FIG. 3 is a diagram of a SE-ResNet network architecture in accordance with an embodiment of the present invention;
FIG. 4 is a block diagram of a multi-layer feature fusion model according to an embodiment of the present invention;
fig. 5 is a feature matching flow chart of an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail below with reference to the accompanying drawings by way of examples.
The voice and hand multi-modal information acquisition device is the key device for recognizing human identity, and its acquisition principle is shown in fig. 1. In the voice and hand multi-modal identity recognition designed in this system, a human hand is placed in an infrared LED light source environment, a CCD device collects the palm print and palm vein features of the human body, voice information is extracted as MFCC features, and the extracted voice and hand multi-modal features are compared with the verification templates.

The overall architecture of the system is shown in fig. 1 and mainly comprises a hardware part and a software part. The hardware part mainly collects the multi-modal voice and hand feature information, and the software part mainly performs multi-modal information processing and recognition. The system flow is shown in fig. 2; the hardware part specifically comprises the power supply module, the fixed-wavelength infrared LED light source module, the image acquisition CCD module, the voice acquisition module, and the storage module; the software part comprises image preprocessing, the feature extraction algorithm, feature fusion and comparison, and the user interaction interface.
Feature extraction is divided into two parts: the first extracts voice features, and the second extracts the two hand features, palm print and palm vein. The invention adopts ResNet as the backbone and introduces an SE module on this basis to construct an SE-ResNet network structure, as shown in fig. 3. A global pooling layer is added to generate the feature distribution, from which the information encoding is completed. To obtain the correlation between channels, a ReLU activation function is combined with a sigmoid gating mechanism to recalibrate the features. In addition, to reduce the number of model parameters, 1 × 1 fully connected layers are used at both ends of the ReLU function.
The SE (Squeeze-and-Excitation Networks) module is a computational unit that can be built on any given transformation; for any given input, the transformation of equation (1) is performed after it enters the network module:

$F_{tr}: X \rightarrow U, \quad X \in R^{H' \times W' \times C'}, \quad U \in R^{H \times W \times C}$ (1)

X is the input picture and U is the extracted feature map. To make information from the network's global receptive field available to the lower layers, the SE module compresses global spatial information into a channel descriptor, which contains the global distribution of the feature responses over the channel dimension; channel-wise statistics are obtained with a global average pooling layer. The statistic $z \in R^{C}$ is derived by compressing U over its spatial dimensions H × W with equation (2):

$z_c = F_{sq}(u_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j)$ (2)

The transform output U can be interpreted as a set of local descriptors whose statistics can represent the entire image. To exploit the information aggregated by the compression operation, the next goal is to fully capture the dependencies along the channel dimension. A simple gating mechanism with a sigmoid activation function, equation (3), is chosen:

$s = F_{ex}(z, W) = \sigma(g(z, W)) = \sigma(W_2 \, \delta(W_1 z))$ (3)

where δ denotes the ReLU activation function, $W_1 \in R^{(C/r) \times C}$, and $W_2 \in R^{C \times (C/r)}$. To limit the complexity of the model and help it generalize, the gating mechanism is parameterized as a bottleneck of two fully connected (FC) layers around the nonlinearity: a dimensionality-reduction layer with parameters $W_1$ and reduction ratio r, a ReLU activation function, and a dimensionality-increasing layer with parameters $W_2$. The final output of the block is obtained by rescaling the transform output U with the activations, equation (4):

$\tilde{x}_c = F_{scale}(u_c, s_c) = s_c \cdot u_c$ (4)

where $\tilde{X} = [\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_C]$ and $F_{scale}(u_c, s_c)$ denotes channel-wise multiplication between the feature map $u_c \in R^{H \times W}$ and the scalar $s_c$. The role of the activations s is to give each channel a weight based on the input feature descriptor z.
Technical route and implementation of the multi-layer feature fusion mechanism

In general, concatenation or element-wise summation is the most common scheme for fusing heterogeneous features. Since the distributions of audio and hand features usually differ widely and their feature dimensions usually differ in size, the representational capability of these simple fusion schemes may not be sufficient for reliable speaker naming. Fusion with a factorized bilinear model (FBM) captures the interaction between the two modalities better and is generally superior to simple fusion methods (e.g., concatenation), as shown in fig. 4.

The factorized bilinear model considers each feature pair through a linear transformation, equation (5):

$z_i = x^T W_i y + b_i$ (5)

where $x \in R^n$ and $y \in R^m$ are input feature vectors from two different modalities (e.g., high-level hand and audio features), $W_i$ is a weight matrix, and $b_i$ is a bias. Although a bilinear model can capture the pairwise interaction between two modalities, it typically introduces a large number of parameters, which increases the computational cost. An effective remedy is to decompose the weight matrix $W_i$ into two low-rank matrices, $W_i = U_i V_i^T$, where $U_i \in R^{n \times d}$ and $V_i \in R^{m \times d}$, with a constraint on the dimension d keeping $d \le \min(n, m)$. Equation (5) can therefore be rewritten as equation (6):

$z_i = x^T U_i V_i^T y + b_i$ (6)

In general, the first term on the right of the equation can be further transformed with a Hadamard (element-wise) product to capture the inherent correlation between the two heterogeneous modalities, equation (7):

$z_i = \mathbf{1}^T (U_i^T x \circ V_i^T y) + b_i$ (7)

where $\mathbf{1} \in R^d$ is a column vector of ones and $\circ$ denotes the Hadamard product. To obtain the output feature vector z, two third-order tensors are required: $U = [U_1, \ldots, U_o] \in R^{n \times d \times o}$ and $V = [V_1, \ldots, V_o] \in R^{m \times d \times o}$. Using a linear projection $P \in R^{d \times o}$ in place of the column vector, the vector z can be expressed as equation (8):

$z = P^T (U^T x \circ V^T y) + b$ (8)

where $b \in R^o$ is a bias vector. Applying a nonlinear activation function generally helps to increase the representational capability of the bilinear model, so one is added after each linear mapping, and the vector z can be further expressed as equation (9):

$z = P^T (\sigma(U^T x) \circ \sigma(V^T y)) + b$ (9)

where σ denotes any nonlinear activation function, such as ReLU, sigmoid, or tanh. Assuming x and y denote the hand attention vector and the audio feature vector, respectively, each element of x is greater than 0 and each element of y lies in [-1, 1]. To avoid information loss, different nonlinear activation functions may be used to map values to a finite interval. Because element-wise multiplication is introduced to obtain the correlation between the two modalities, the magnitudes of the output neurons may vary greatly. To reduce the impact of such variation, a ReLU function is further added to normalize the output of the network, and the final vector z can be expressed as equation (10):

$z = \mathrm{ReLU}\left(P^T (\sigma(U^T x) \circ \sigma(V^T y)) + b\right)$ (10)

During training, the fusion parameters of the FBM can be updated and optimized by backpropagation. Paired audio and hand features are input into the fusion model, and the final result is output through softmax on the fully connected layer.
Technical route and implementation of feature matching

The FAST algorithm is currently one of the fastest corner detection algorithms, but it can falsely detect some edge points, producing spurious corners. To eliminate the interference of edge points with the detection result, the invention adopts an improved FAST algorithm with the following specific change: the 24 pixels surrounding a pixel P are taken as the detection template; let the gray value of P be $I_P$ and set a threshold T; if 14 consecutive pixels among the 24 have gray values greater than $I_P + T$ or less than $I_P - T$, then P is a corner. The invention refines the feature points with the Shi-Tomasi algorithm, which compares the smaller of the two eigenvalues with a given minimum threshold; if the smaller eigenvalue exceeds that threshold, a strong corner is obtained.

The Shi-Tomasi algorithm detects corners by computing the gray-level change after a local window W(x, y) is shifted in each direction. Shifting the window by (u, v) produces the gray-level change E(u, v) of equation (11):

$E(u, v) = \sum_{x, y} W(x, y)\,[I(x + u, y + v) - I(x, y)]^2 \approx [u\ v]\, M \,[u\ v]^T$ (11)

where M is a 2 × 2 autocorrelation matrix that can be computed from the image derivatives:

$M = \sum_{x, y} W(x, y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$

The two eigenvalues $\lambda_{max}$ and $\lambda_{min}$ of the matrix M are analyzed; since the larger uncertainty of the curvature depends on the smaller eigenvalue, the corner response function is defined as $\lambda_{min}$. For the feature points preliminarily extracted with the improved FAST corner detection algorithm, the corner response $\lambda_{min}$ of each point is computed with the Shi-Tomasi algorithm, and the top N points with the largest response values are kept as feature points. At least 2 strong boundaries in different directions exist around the retained feature points, which are easy to identify and stable.

For matching the binary feature description vectors (as shown in fig. 5), the Hamming distance is generally adopted as the similarity measure between descriptors. The Hamming distance is the minimum number of substitutions required to change one of two equal-length binary strings into the other. Assuming the two descriptor feature vectors are $F_1$ and $F_2$, the Hamming distance between $F_1$ and $F_2$ is given by equation (12):

$D(F_1, F_2) = \sum_{i=1}^{n} \big(f_1(i) \oplus f_2(i)\big)$ (12)

Whether two feature vectors match is decided by a threshold on the Hamming distance.
The above-described method according to the present invention can be implemented in hardware or firmware, as software or computer code storable on a recording medium such as a CD-ROM, RAM, floppy disk, hard disk, or magneto-optical disk, or as computer code originally stored on a remote recording medium or a non-transitory machine-readable medium and downloaded over a network for storage on a local recording medium, so that the method described herein can be processed by such software on a recording medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware such as an ASIC or FPGA. It is understood that a computer, processor, microprocessor, controller, or programmable hardware includes memory components (e.g., RAM, ROM, flash memory, etc.) that can store or receive software or computer code that, when accessed and executed by the computer, processor, or hardware, implements the processing methods described herein. Further, when a general-purpose computer accesses code for implementing the processing shown herein, execution of the code transforms the general-purpose computer into a special-purpose computer for performing that processing.
It will be appreciated by those of ordinary skill in the art that the examples described herein are intended to assist the reader in understanding the manner in which the invention is practiced, and it is to be understood that the scope of the invention is not limited to such specifically recited statements and examples. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from the spirit of the invention, and these changes and combinations are within the scope of the invention.

Claims (6)

1. A multi-modal identity recognition system for non-contact voiceprint and palm print palm vein, characterized by comprising: a power supply module, a fixed-wavelength infrared LED light source module, an image acquisition CCD module, a voice acquisition module, a storage module, and a multi-modal identity recognition module;

the power supply module: used for powering the entire multi-modal identity recognition system;

the fixed-wavelength infrared LED light source module: irradiates the human hand with an infrared LED light source to assist the image acquisition CCD module in acquiring the palm print and palm vein features of the human body;

the image acquisition CCD module: acquires the palm print and palm vein features of the human body;

the voice acquisition module: extracts voice information as MFCC features;

the storage module: used for storing the data acquired by the voice acquisition module and the image acquisition CCD module;

the multi-modal identity recognition module: preprocesses the images, extracts image features, fuses and compares the features, and outputs the result.
2. A multi-modal identity recognition method for non-contact voiceprint and palm print palm vein, characterized by comprising the following steps:

step 1, image preprocessing; the preprocessing mainly comprises three parts: first, the infrared palm image is denoised with low-pass filtering; second, the image enhancement part extracts a binary map of the palm region with the Sauvola algorithm; finally, the ROI positioning part applies a gray-level transformation to the palm print and palm vein to make the palm edge stand out, detects the palm edge with the Canny operator, and crops the image to obtain the palm region of interest;

step 2, feature extraction; feature extraction is divided into two parts: the first extracts voice features, and the second extracts the two hand features, palm print and palm vein; ResNet is adopted as the backbone, an SE module is introduced to construct an SE-ResNet network structure, the preprocessed pictures are input into it, a global pooling layer is added to generate the feature distribution, and the information encoding is completed; to obtain the correlation between channels, a ReLU activation function is combined with a sigmoid gating mechanism to recalibrate the features;

step 3, feature fusion; a multi-layer feature fusion mechanism is adopted: a factorized bilinear model performs the fusion to obtain the interaction between the hand and audio modalities, paired audio and hand features are input into the fusion model, and the final result is output through softmax on the fully connected layer;

step 4, feature comparison; for the feature points preliminarily extracted with the improved FAST corner detection algorithm, the corner response function of each point is computed with the Shi-Tomasi algorithm, and the top N points with the largest response values are kept as feature points; at least 2 strong boundaries in different directions exist around the retained feature points; for matching the binary feature description vectors, the Hamming distance is adopted as the similarity measure between descriptors;

step 5, interactive output; a joint discriminative sparse coding algorithm evaluates the within-modality sample feature points of the three modalities so that the within-class distance is minimized and the between-class distance is maximized; an appropriate threshold is set according to the actual scenario; if the two matched samples belong to the same class and the voiceprint, palm print, and palm vein all match successfully, the interface displays that authentication succeeded; otherwise it prompts that authentication failed.
3. The multi-modal identity recognition method of claim 2, characterized in that step 2 is specifically as follows: for any given input, after it enters the network module, the transformation of equation (1) is performed:

$F_{tr}: X \rightarrow U, \quad X \in R^{H' \times W' \times C'}, \quad U \in R^{H \times W \times C}$ (1)

where X is the input picture and U is the extracted feature map;

the SE module compresses global spatial information into a channel descriptor that contains the global distribution of the feature responses over the channel dimension, and channel-wise statistics are obtained with a global average pooling layer; the statistic $z \in R^{C}$ is obtained by compressing U over its spatial dimensions H × W with equation (2):

$z_c = F_{sq}(u_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j)$ (2)

the transform output U is interpreted as a set of local descriptors whose statistics can express the entire image;

the dependencies along the channel dimension are fully captured from the aggregated information produced by the compression operation; a simple gating mechanism with a sigmoid activation function, equation (3), is chosen:

$s = F_{ex}(z, W) = \sigma(g(z, W)) = \sigma(W_2 \, \delta(W_1 z))$ (3)

where δ denotes the ReLU activation function, $W_1 \in R^{(C/r) \times C}$, and $W_2 \in R^{C \times (C/r)}$;

to limit the complexity of the model and help it generalize, the gating mechanism is parameterized as a bottleneck of two fully connected (FC) layers around the nonlinearity, and the final output of the block is obtained by rescaling the transform output U with the activations, equation (4):

$\tilde{x}_c = F_{scale}(u_c, s_c) = s_c \cdot u_c$ (4)

where $\tilde{X} = [\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_C]$ and $F_{scale}(u_c, s_c)$ denotes channel-wise multiplication between the feature map $u_c \in R^{H \times W}$ and the scalar $s_c$; the role of the activations s is to give each channel a weight based on the input feature descriptor z.
4. The multi-modal identity recognition method of claim 2, characterized in that step 3 is specifically as follows:

the factorized bilinear model considers each feature pair through a linear transformation, equation (5):

$z_i = x^T W_i y + b_i$ (5)

where $x \in R^n$ and $y \in R^m$ are input feature vectors from the hand and audio modalities, $W_i$ is a weight matrix, and $b_i$ is a bias;

the weight matrix $W_i$ is decomposed into two low-rank matrices, $W_i = U_i V_i^T$, where $U_i \in R^{n \times d}$ and $V_i \in R^{m \times d}$, with a constraint on the dimension d keeping $d \le \min(n, m)$; equation (5) can then be rewritten as equation (6):

$z_i = x^T U_i V_i^T y + b_i$ (6)

the inherent correlation between the two heterogeneous modalities is captured by equation (7):

$z_i = \mathbf{1}^T (U_i^T x \circ V_i^T y) + b_i$ (7)

where $\mathbf{1} \in R^d$ is a column vector of ones and $\circ$ denotes the Hadamard (element-wise) product;

to obtain the output feature vector z, two third-order tensors are required: $U = [U_1, \ldots, U_o] \in R^{n \times d \times o}$ and $V = [V_1, \ldots, V_o] \in R^{m \times d \times o}$; using a linear projection $P \in R^{d \times o}$ in place of the column vector, the vector z is represented as equation (8):

$z = P^T (U^T x \circ V^T y) + b$ (8)

where $b \in R^o$ is a bias vector;

a nonlinear activation function is added after each linear mapping, so the vector z is further represented as equation (9):

$z = P^T (\sigma(U^T x) \circ \sigma(V^T y)) + b$ (9)

where σ denotes any nonlinear activation function, and x and y denote the hand attention vector and the audio feature vector, respectively; each element of x is greater than 0 and each element of y lies in [-1, 1];

a ReLU function is further added to normalize the output of the network, so the final vector z can be expressed as equation (10):

$z = \mathrm{ReLU}\left(P^T (\sigma(U^T x) \circ \sigma(V^T y)) + b\right)$ (10)

paired audio and hand features are input into the fusion model, and the final result is output through softmax on the fully connected layer.
5. The multimodal identification method of claim 2 wherein: the step 4 is as follows:
an improved FAST algorithm is adopted, and the specific improvement is as follows: taking 24 pixels around one pixel P as a detection template, setting the gray value of the P as IP, setting a threshold value T, and if the gray value of 14 continuous pixels in the 24 pixels is greater than IP + T or less than IP-T, then P is an angular point;
using Shi-Tomasi algorithm to optimize the characteristic points, comparing the smaller of the two characteristic values with a given minimum threshold value by the Shi-Tomasi algorithm, and if the smaller of the two characteristic values is larger than the minimum threshold value, obtaining a strong corner point;
the Shi-Tomasi algorithm detects corner points by computing the gray-level change of a small local window $W(x, y)$ after it is shifted in each direction; shifting the window by $(u, v)$ produces the gray-level change $E(u, v)$:

$$E(u, v) = \sum_{(x,y) \in W} w(x, y)\,\big[I(x+u, y+v) - I(x, y)\big]^2 \approx \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix} \qquad (11)$$

where $M$ is a $2 \times 2$ autocorrelation matrix computed from the image derivatives:

$$M = \sum_{(x,y) \in W} w(x, y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$$
The two eigenvalues $\lambda_{\max}$ and $\lambda_{\min}$ of the matrix $M$ are analyzed; since the larger uncertainty of the curvature depends on the smaller eigenvalue, the corner response function is defined as $\lambda_{\min}$. For the feature points preliminarily extracted by the improved FAST corner detection algorithm, the corner response function $\lambda_{\min}$ of each point is computed with the Shi-Tomasi algorithm, and the first N points with the largest response values are taken as feature points. At least two strong edges in different directions exist around the screened feature points, so the feature points are easy to identify and stable;
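A rough two-stage sketch using standard OpenCV primitives follows; note that OpenCV's built-in FAST uses a 16-pixel circle rather than the improved 24-pixel/14-contiguous test above, and the file name and parameter values are illustrative:

```python
import cv2

# Stage 1: preliminary corner candidates via FAST (threshold is illustrative).
img = cv2.imread("palm_roi.png", cv2.IMREAD_GRAYSCALE)
assert img is not None, "illustrative input image not found"
fast = cv2.FastFeatureDetector_create(threshold=20)
candidates = fast.detect(img, None)

# Stage 2: Shi-Tomasi refinement -- keep the N points whose minimum-eigenvalue
# response lambda_min is largest (useHarrisDetector=False selects lambda_min).
corners = cv2.goodFeaturesToTrack(
    img, maxCorners=200, qualityLevel=0.01, minDistance=5,
    useHarrisDetector=False)

n_strong = 0 if corners is None else len(corners)
print(len(candidates), "FAST candidates;", n_strong, "strong corners kept")
```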
for the matching of binary feature description vectors, the Hamming distance is adopted as the similarity measure between descriptors; let the two feature vectors of the descriptors be $F_1$ and $F_2$, then the Hamming distance between them is:

$$D(F_1, F_2) = \sum_{i=1}^{n} F_1(i) \oplus F_2(i) \qquad (12)$$

where $\oplus$ denotes the bitwise XOR; whether two feature vectors match is judged by setting a threshold on the Hamming distance.
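For illustration, a minimal NumPy sketch of the Hamming-distance test of equation (12) follows; the descriptor length and matching threshold are assumed values, not specified in the claim:

```python
import numpy as np

def hamming_distance(f1: np.ndarray, f2: np.ndarray) -> int:
    """Equation (12): XOR the binary descriptors and count differing bits."""
    return int(np.count_nonzero(np.bitwise_xor(f1, f2)))

# Hypothetical 256-bit binary descriptors and an illustrative threshold.
rng = np.random.default_rng(0)
f1 = rng.integers(0, 2, 256, dtype=np.uint8)
f2 = rng.integers(0, 2, 256, dtype=np.uint8)
THRESHOLD = 64   # assumed value
print("match" if hamming_distance(f1, f2) < THRESHOLD else "no match")
```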
6. The multimodal identification method of claim 2, wherein step 5 is as follows:
the joint discriminative sparse coding algorithm is as follows: given the feature matrices $X$, $Y$ and $Z$ of the three modalities, three projection matrices $P_x$, $P_y$ and $P_z$ are jointly learned, mapping the three modal features to sparse matrices $V_x \in \mathbb{R}^{d \times N}$, $V_y \in \mathbb{R}^{d \times N}$ and $V_z \in \mathbb{R}^{d \times N}$ that accurately approximate the original matrices $X$, $Y$, $Z$, i.e. $V_x \approx P_x X$, $V_y \approx P_y Y$ and $V_z \approx P_z Z$;
after the feature representations $V_x$, $V_y$ and $V_z$ are obtained from the three modalities, they are quantized to

$$C_x = \mathrm{sgn}(V_x); \quad C_y = \mathrm{sgn}(V_y); \quad C_z = \mathrm{sgn}(V_z) \qquad (13)$$

where $\mathrm{sgn}(\cdot)$ is the element-wise sign function used to obtain sparse binary codes; $C_x$ (likewise $C_y$ and $C_z$) $= [c_1, c_2, \ldots, c_N] \in \mathbb{R}^{l \times N}$, $c_i \in \{0, 1\}^l$ denotes the learned sparse binary code of the $i$-th sample, and $l$ ($l = 1, 2, \ldots, 12$) is the length of the binary code;
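A small NumPy sketch of the quantization in equation (13) follows; mapping positive entries to 1 and the rest to 0 is an assumption consistent with the stated $c_i \in \{0, 1\}^l$, and the shapes are illustrative:

```python
import numpy as np

def quantize(v: np.ndarray) -> np.ndarray:
    """Element-wise sign quantization of equation (13), mapping the sparse
    representation to a {0,1} binary code as the claim specifies."""
    return (v > 0).astype(np.uint8)

# Hypothetical sparse representations for three modalities (l=12, N=5).
rng = np.random.default_rng(1)
Vx, Vy, Vz = (rng.standard_normal((12, 5)) for _ in range(3))
Cx, Cy, Cz = quantize(Vx), quantize(Vy), quantize(Vz)
```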
sparsity constraints are applied to the projected feature representations $V_x$ and $V_y$ through the two projection matrices to reduce the projection error, and the Frobenius norm is used as the cost function, so the projection error can be expressed as

$$E = \|V_x - P_x X\|_F^2 + a\,\|V_y - P_y Y\|_F^2 + b\,\|V_z - P_z Z\|_F^2 \qquad (14)$$

where $a, b > 0$ and $a + b \in (0, 1)$ are balance parameters that weight the three different modalities;
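For concreteness, a NumPy sketch of a weighted Frobenius-norm projection error in the spirit of equation (14) follows; the exact weighting of the first term, all shapes, and the tanh stand-in for the sparse codes are assumptions:

```python
import numpy as np

def proj_error(V, P, X):
    """One Frobenius-norm reconstruction term ||V - P X||_F^2 from eq. (14)."""
    return np.linalg.norm(V - P @ X, "fro") ** 2

rng = np.random.default_rng(2)
N, dim, l = 50, 64, 12                         # samples, feature dim, code length
X, Y, Z = (rng.standard_normal((dim, N)) for _ in range(3))
Px, Py, Pz = (rng.standard_normal((l, dim)) * 0.1 for _ in range(3))
Vx, Vy, Vz = np.tanh(Px @ X), np.tanh(Py @ Y), np.tanh(Pz @ Z)  # stand-in codes
a, b = 0.3, 0.4                                # a, b > 0 and a + b in (0, 1)
E = proj_error(Vx, Px, X) + a * proj_error(Vy, Py, Y) + b * proj_error(Vz, Pz, Z)
print(f"weighted projection error: {E:.2f}")
```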
two constraints are imposed on the projected sparse features: 1) for the intra-modal samples of each modality, the intra-class distance is minimized and the inter-class distance is maximized; 2) for the intra-class samples, the information correlation between feature points is maximized, so that their distance is minimized; through these constraints the projected sparse features obtain stronger discriminability and compactness.
CN202210927661.3A 2022-08-03 2022-08-03 Multi-mode identity recognition system and method for non-contact voiceprint and palm print palm vein Pending CN115188084A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210927661.3A CN115188084A (en) 2022-08-03 2022-08-03 Multi-mode identity recognition system and method for non-contact voiceprint and palm print palm vein

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210927661.3A CN115188084A (en) 2022-08-03 2022-08-03 Multi-mode identity recognition system and method for non-contact voiceprint and palm print palm vein

Publications (1)

Publication Number Publication Date
CN115188084A true CN115188084A (en) 2022-10-14

Family

ID=83521810

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210927661.3A Pending CN115188084A (en) 2022-08-03 2022-08-03 Multi-mode identity recognition system and method for non-contact voiceprint and palm print palm vein

Country Status (1)

Country Link
CN (1) CN115188084A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111639560A (en) * 2020-05-15 2020-09-08 圣点世纪科技股份有限公司 Finger vein feature extraction method and device based on dynamic fusion of vein skeleton line and topographic relief characteristic
CN116504226A (en) * 2023-02-27 2023-07-28 佛山科学技术学院 Lightweight single-channel voiceprint recognition method and system based on deep learning
CN116504226B (en) * 2023-02-27 2024-01-02 佛山科学技术学院 Lightweight single-channel voiceprint recognition method and system based on deep learning
CN116580444A (en) * 2023-07-14 2023-08-11 广州思林杰科技股份有限公司 Method and equipment for testing long-distance running timing based on multi-antenna radio frequency identification technology

Similar Documents

Publication Publication Date Title
Xin et al. Multimodal feature-level fusion for biometrics identification system on IoMT platform
CN115188084A (en) Multi-mode identity recognition system and method for non-contact voiceprint and palm print palm vein
CN108416338B (en) Non-contact palm print identity authentication method
Barpanda et al. Iris recognition with tunable filter bank based feature
KR102483650B1 (en) User verification device and method
Malgheet et al. Iris recognition development techniques: a comprehensive review
Hou et al. Finger-vein biometric recognition: A review
Doublet et al. Robust grayscale distribution estimation for contactless palmprint recognition
Stojanović et al. Latent overlapped fingerprint separation: a review
CN112232163A (en) Fingerprint acquisition method and device, fingerprint comparison method and device, and equipment
Stojanović et al. A novel neural network based approach to latent overlapped fingerprints separation
Yang et al. A Face Detection Method Based on Skin Color Model and Improved AdaBoost Algorithm.
Rajasekar et al. Efficient multimodal biometric recognition for secure authentication based on deep learning approach
CN113657498B (en) Biological feature extraction method, training method, authentication method, device and equipment
Mehmood et al. Palmprint enhancement network (PEN) for robust identification
Prabu et al. A novel biometric system for person recognition using palm vein images
CN112232152B (en) Non-contact fingerprint identification method and device, terminal and storage medium
CN111428670B (en) Face detection method, face detection device, storage medium and equipment
Arora et al. Sp-net: One shot fingerprint singular-point detector
Dahea et al. An Efficient Feature Selection scheme based on Genetic Algorithm for Finger Vein Recognition
AlShemmary et al. Siamese Network-Based Palm Print Recognition
Santosh et al. Recent Trends in Image Processing and Pattern Recognition: Third International Conference, RTIP2R 2020, Aurangabad, India, January 3–4, 2020, Revised Selected Papers, Part I
Hariprasath et al. Bimodal biometric pattern recognition system based on fusion of iris and palmprint using multi-resolution approach
CN117688365B (en) Multi-mode biological identification access control system
Gao et al. On Designing a SwinIris Transformer Based Iris Recognition System

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination