CN115188084A - Multi-mode identity recognition system and method for non-contact voiceprint and palm print palm vein - Google Patents
- Publication number
- CN115188084A (application CN202210927661.3A)
- Authority
- CN
- China
- Prior art keywords
- palm
- module
- feature
- characteristic
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/70—Multimodal biometrics, e.g. combining information from different biometric modalities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/28—Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/761—Proximity, similarity or dissimilarity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/12—Fingerprints or palmprints
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/12—Fingerprints or palmprints
- G06V40/1347—Preprocessing; Feature extraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/14—Vascular patterns
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/04—Training, enrolment or model building
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/06—Decision making techniques; Pattern matching strategies
- G10L17/08—Use of distortion metrics or a particular distance between probe pattern and reference templates
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/18—Artificial neural networks; Connectionist approaches
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/32—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials
- H04L9/3226—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials using a predetermined code, e.g. password, passphrase or PIN
- H04L9/3231—Biological data, e.g. fingerprint, voice or retina
Abstract
The invention discloses a multi-modal identity recognition system and method for non-contact voiceprint, palm print and palm vein, comprising: a power supply module, which supplies power to the entire multi-modal identity recognition system; a fixed-wavelength infrared LED light source module, which irradiates the human hand with an infrared LED light source and assists the image acquisition CCD module in acquiring the palm print and palm vein information features; an image acquisition CCD module, which acquires the palm print and palm vein information features of the human body; a voice acquisition module, which extracts voice information using MFCC features; a storage module, which stores the data acquired by the voice acquisition module and the image acquisition CCD module; and a multi-modal identity recognition module, which preprocesses the pictures, extracts picture features, fuses and compares the features, and outputs the result. The advantages of the invention are: authentication security is improved, the complexity of manually extracting features is reduced, resistance to noise interference is enhanced, and the robustness and portability of the system are improved.
Description
Technical Field
The invention relates to the technical field of biometric recognition, in particular to a non-contact multi-modal identity recognition system and method for voiceprint, palm print and palm vein.
Background
With the rapid development of global information industrialization, how to perform fast, accurate and secure identity recognition and verification in a digital environment has become a widely discussed topic in recent years. Traditional identity authentication credentials are easily lost, forgotten or forged, so biometric recognition technology has attracted increasing attention. Biometric recognition is the process of verifying the authenticity of identity information by collecting physiological and behavioral characteristics of the human body and processing them through a system [1]. At present, the more mature or widely applied biometric recognition technologies include face, voice, fingerprint, iris, finger vein, DNA, signature and gait recognition [2,3,4]. However, single-mode biometric recognition may suffer reduced accuracy due to sensor noise or unsuitable feature extraction or matching methods, and may also face security problems because features can be forged, such as fake fingerprints. Consequently, multimodal biometric recognition has come into view. Different descriptions or perspectives of the same object are called modalities, and multi-modal representation uses information from multiple such modalities together to characterize a particular task [3]. In general, a multi-modal biometric system fuses two or more biometric features at different levels, which can be divided into the sensor level, feature level, score level and decision level [5,6,7]. The research difficulty of multi-modal fusion authentication lies in how to effectively acquire, extract and compare the features of multi-source heterogeneous data.
Representation learning is a set of techniques that can effectively identify and exploit the distribution of raw, complex data according to the task at hand; that is, useful information is extracted from the data to learn its features, thereby greatly improving the effectiveness of algorithm models and the accuracy of predictors. Applied in a multi-modal data environment, representation learning can build models that process and associate information from multiple modalities to perform multi-modal information fusion, thereby improving the accuracy and security of identity authentication. The goal of multi-modal representation learning is to extract representations of data objects (users) from data of multiple heterogeneous modalities; a typical approach is to concatenate the individual representations of each modality into a joint representation, on which subsequent task learning is performed [8]. Fusing the data representations unifies the data from multiple sources, overcoming the heterogeneity between them, and complementary information can be extracted from the sources, so that the fused representation carries richer and more effective information than any single modality.
Fingerprint recognition identifies a person by the ridge-and-valley texture on the skin of the front fingertip; fingerprints are unique and stable, and real identity is verified by comparing a fingerprint with those pre-stored in a database. Among the various biometric techniques, fingerprint recognition remains the most mature; it has been officially accepted in many countries, has become an effective means of identification in the judicial field, has been widely used in many other industries, and has become a byword and de facto standard for biometric recognition. Fingerprint recognition mainly involves fingerprint image acquisition, fingerprint image preprocessing, fingerprint feature extraction, fingerprint database construction, and fingerprint feature comparison and matching. After years of research, various fingerprint recognition methods have emerged, among which minutiae-based fingerprint recognition is the most mature and most widely applied. The images used in the laboratory are acquired with the laboratory's existing equipment, and the specific steps include: (1) computing the fingerprint image orientation map; (2) segmenting the fingerprint image; (3) enhancing the fingerprint image; (4) binarizing and post-processing the fingerprint image; (5) thinning the fingerprint image; (6) extracting the features of the fingerprint image; (7) matching the fingerprint images.
Disadvantages of the first prior art
(1) High environmental requirements: recognition is sensitive to the cleanliness and humidity of the finger, and dirt, oil or water may prevent recognition or affect the result;
(2) Low-quality fingerprints, such as those with scars or peeling skin, are difficult to recognize and have a low recognition rate;
(3) The operating procedure during fingerprint capture must be followed strictly;
(4) Fingerprint traces may remain on the device, and these traces can be used to copy the fingerprint.
Prior art 2
Compared with other biometric technologies, fused palm print and palm vein recognition offers higher recognition accuracy, convenience and stability; it helps make daily life more convenient and, to a certain extent, improves the security of personal information.
The textures of the palm print and palm vein do not change with age, and palm print recognition has the advantages of rich texture features, easy user acceptance, and high security and stability.
Disadvantages of the second prior art
(1) Palm vein and palm print image acquisition environment. Palm vein acquisition is mainly either contact or non-contact; whichever mode is used, the acquisition process is affected by factors such as illumination, acquisition background and temperature.
(2) The influence of locating and segmenting the key palm vein region. To obtain a region rich in vein features, a palm region-of-interest (ROI) image must be located and segmented. Researchers generally use the palm vein images of the Hong Kong science university database for vein recognition research; because the palm must be fixed during acquisition in that database, with hardware positioned at the valley between the middle finger and the ring finger, the palm vein ROI image is difficult to locate and segment. The lack of a suitable ROI localization and segmentation method leads to low feature extraction accuracy and a low recognition rate.
(3) Interference of palm prints with palm veins. Palm vein images contain palm prints, and existing algorithms still cannot completely remove their interference; for example, fuzzy threshold judgment and global gray value matching improve the robustness of the algorithm but do not remove the palm print interference well, so the recognition performance for palm veins remains poor.
(4) Non-contact acquisition mainly suffers from position deviation, distance drift, image defocus and brightness fluctuation in the palm print sample images. As for anti-counterfeiting, palm print recognition is mainly attacked with forgeries such as silica gel prostheses and palm print films. These factors are the main reasons why the accuracy of non-contact palm print recognition systems is lower than that of contact systems, and also the main reasons limiting their practical deployment.
References
[1] Liu Qianying, Liu Ji. Biometric identification technology development in the field of authentication [J]. Electronic World, 2020(05): 23-24.
[2] Xie Lu, Yu Fei. Secure authentication technology based on multi-modal biometrics [J]. Secret Science and Technology, 2016(01): 36-40.
[3] Zhou Chenyi. Multimodal biometric identification based on fusion algorithms and deep learning [D]. Southern Medical University, 2020.
[4] Zhang Lou, Wang Huabin, Tao Liang, Zhou Jian. Adaptive multimodal biometric fusion based on classification distance scores [J]. Journal of Computer Research and Development, 2018, 55(1): 151-162.
[5] Ma Ruru. Bimodal identity authentication research based on fingerprints and electrocardiosignals [D]. Tianjin University of Science, 2021.
[6] Zhang Yue. Algorithmic study of multimodal biometric identification technology [D]. University of vinpocetine, 2017.
[7] Ding Xuan. Multimodal biometric identification technology and its standardization dynamics [J]. Computer Knowledge and Technology, 2017, 13(36): 153-154.
[8] Halbernet, Lu Kai. A survey of representation learning for complex heterogeneous data [J]. Computer Science, 2020, 47(02): 1-9.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a non-contact multi-modal identity recognition system and method for voiceprint, palm print and palm vein. With multi-modal biometric recognition for identity authentication based on intelligent data representation theory as its core, and with the related technologies integrated in a network security scenario, the invention offers high security, convenience and reliability.
To achieve this purpose, the technical scheme adopted by the invention is as follows:
a multi-modal identification system for non-contact voiceprint and palmprint metacarpal veins, comprising: the system comprises a power supply module, a fixed wavelength infrared LED light source module, an image acquisition CCD module, a voice acquisition module, a storage module and a multi-mode identity recognition module;
a power supply module: for powering the entire multimodal identity recognition system
Fixed wavelength infrared LED light source module: the human hand is irradiated by an infrared LED light source to assist the image acquisition CCD module in acquiring the information characteristics of the palm print and the palm vein of the human body;
image acquisition CCD module: collecting the information characteristics of the palm print and the palm vein of the human body;
the voice acquisition module: extracting voice information by using MFCC characteristics;
a storage module: the device is used for storing data acquired by the voice acquisition module and the image acquisition CCD module.
The multi-modal identity recognition module: and preprocessing the picture, extracting picture characteristics, fusing and comparing the characteristics and outputting a result.
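As a rough illustration of the MFCC features the voice acquisition module relies on, the sketch below implements the textbook MFCC pipeline (pre-emphasis, framing, power spectrum, mel filterbank, log, DCT-II) in NumPy; every parameter value (16 kHz rate, 26 mel bands, 13 coefficients, etc.) is a common default, not a value specified by the patent.

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, frame_len=400, hop=160,
         n_mels=26, n_ceps=13):
    """Minimal MFCC extraction; all parameters are illustrative defaults."""
    # Pre-emphasis to boost high frequencies
    sig = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Frame the signal and apply a Hamming window
    n_frames = 1 + (len(sig) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = sig[idx] * np.hamming(frame_len)
    # Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel filterbank
    def hz2mel(f): return 2595 * np.log10(1 + f / 700)
    def mel2hz(m): return 700 * (10 ** (m / 2595) - 1)
    mel_pts = np.linspace(hz2mel(0), hz2mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel2hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    feats = np.log(power @ fbank.T + 1e-10)
    # DCT-II to decorrelate; keep the first n_ceps coefficients
    k = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * k + 1) / (2 * n_mels)))
    return feats @ dct.T

t = np.linspace(0, 1, 16000, endpoint=False)
tone = np.sin(2 * np.pi * 440 * t)   # 1 s of a 440 Hz tone as a toy signal
print(mfcc(tone).shape)              # (98, 13): 98 frames, 13 coefficients
```

In a real system the MFCC matrix (frames × coefficients) would be what the storage module keeps for later comparison.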
A multi-mode identity recognition method for non-contact voiceprints and palm print palm veins comprises the following steps:
Step 2, feature extraction. Feature extraction is divided into two parts: the first extracts voice features, and the second extracts the two hand features, palm print and palm vein. ResNet is adopted as the backbone, an SE module is introduced to construct an SE-ResNet network, the preprocessed pictures are input into the SE-ResNet network, a global pooling layer is added to generate the feature distribution, and the information encoding is completed. To obtain the correlation between channels, a ReLU activation function is combined with a sigmoid gating mechanism to recalibrate the features.
Step 3, feature fusion. A multi-layer feature fusion mechanism is adopted: the bilinear model is factorized to fuse and capture the interaction between the hand and audio modalities; paired audio and hand features are input into the fusion model, and the final result is output at the fully connected layer through softmax.
Step 4, feature comparison. For the feature points preliminarily extracted with the improved FAST corner detection algorithm, the corner response function of each point is calculated with the Shi-Tomasi algorithm, and the top N points with the largest response values are determined as the feature points. At least 2 strong boundaries in different directions exist around the screened feature points. For matching the binary feature description vectors, the Hamming distance is used as the similarity measure between descriptors.
Step 5, result output. The in-class sample feature points of the three modalities are judged with a joint-decision sparse coding algorithm, so that the intra-class distance is minimized and the inter-class distance is maximized. A suitable threshold is set according to the actual scenario: if the two matched samples belong to the same class and the voiceprint, palm print and palm vein all match successfully, the interface displays that authentication succeeded; otherwise it prompts that authentication failed.
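A minimal sketch of the step-5 decision rule, assuming each modality comparison yields a match score in [0, 1]; the modality names and the 0.8 threshold below are illustrative choices, not values fixed by the patent.

```python
def authenticate(scores, threshold=0.8):
    """Authentication succeeds only if the voiceprint, palm print and palm
    vein comparisons ALL clear the threshold, mirroring the all-three-match
    requirement of step 5. `scores` maps modality name -> match score."""
    required = ("voiceprint", "palmprint", "palmvein")
    if all(scores.get(m, 0.0) >= threshold for m in required):
        return "authentication successful"
    return "authentication failed"

print(authenticate({"voiceprint": 0.93, "palmprint": 0.88, "palmvein": 0.91}))
# -> authentication successful
print(authenticate({"voiceprint": 0.93, "palmprint": 0.42, "palmvein": 0.91}))
# -> authentication failed (one modality below threshold)
```

A missing modality score is treated as a failed match, which is the conservative reading of the patent's requirement that all three modalities match.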
Further, step 2 specifically comprises: for any given input, after entering the network module, the transformation of formula (1) is applied:
U = F_tr(X) (1)
where X is the input picture and U is the extracted feature map.
The SE module compresses the global spatial information into a channel descriptor that contains the global distribution of the feature responses along the channel dimension; a global average pooling layer is used to obtain the channel-wise statistics. The statistic z is obtained by compressing U, of spatial dimensions H × W, through formula (2):
z_c = F_sq(u_c) = (1 / (H × W)) Σ_{i=1}^{H} Σ_{j=1}^{W} u_c(i, j) (2)
The transform output U can be interpreted as a collection of local descriptors whose statistics express the whole image.
The aggregated information obtained by the compression operation is used to fully capture the channel-wise dependencies. A simple gating mechanism with a sigmoid activation function is chosen, formula (3):
s = F_ex(z, W) = σ(g(z, W)) = σ(W_2 δ(W_1 z)) (3)
To limit model complexity and aid generalization, the gating mechanism is parameterized by two fully connected (FC) layers around the non-linearity, forming a bottleneck structure. The final output of the block is obtained by rescaling the transform output U with the activations, formula (4):
x̃_c = F_scale(u_c, s_c) = s_c · u_c (4)
where F_scale(u_c, s_c) denotes the channel-wise product of the feature map u_c and the scalar s_c. The role of the activations s is to assign a weight to each channel based on the descriptor z of the input features.
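The squeeze, excitation and rescale steps of formulas (2)-(4) can be sketched in a few lines of NumPy; the weight shapes and the reduction ratio below are illustrative stand-ins, not parameters taken from the patent.

```python
import numpy as np

def se_block(U, W1, W2):
    """Squeeze-and-Excitation on a feature map U of shape (C, H, W).
    W1: (C//r, C) reduction weights, W2: (C, C//r) expansion weights
    (toy random weights here -- the patent specifies the structure only)."""
    # Squeeze: global average pooling over H x W, formula (2)
    z = U.mean(axis=(1, 2))                                   # shape (C,)
    # Excitation: FC -> ReLU -> FC -> sigmoid, formula (3)
    s = 1.0 / (1.0 + np.exp(-(W2 @ np.maximum(W1 @ z, 0.0))))  # shape (C,)
    # Scale: channel-wise reweighting, formula (4)
    return U * s[:, None, None]

rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2                     # r is the bottleneck reduction ratio
U = rng.standard_normal((C, H, W))
W1 = rng.standard_normal((C // r, C)) * 0.1
W2 = rng.standard_normal((C, C // r)) * 0.1
out = se_block(U, W1, W2)
print(out.shape)  # (8, 4, 4): same shape, each channel rescaled by s_c
```

Because s comes out of a sigmoid, every channel is multiplied by a weight strictly between 0 and 1, which is exactly the recalibration described above.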
Further, step 3 is specifically as follows:
In the factorized bilinear model, each feature pair is considered through a bilinear transformation, formula (5):
z_i = x^T W_i y + b_i (5)
where x ∈ R^n and y ∈ R^m are the input feature vectors from the hand and audio modalities, W_i is a weight matrix, and b_i is the bias.
The weight matrix W_i is decomposed into two low-rank matrices, W_i = U_i V_i^T, where U_i ∈ R^{n×d} and V_i ∈ R^{m×d}, with the constraint d ≤ min(n, m) on the dimension d. Formula (5) can then be rewritten as:
z_i = x^T U_i V_i^T y + b_i (6)
capturing the inherent correlation between two heterogeneous modes, equation (7):
To obtain the output feature vector z, two third-order tensors are required: U = [U_1, …, U_o] ∈ R^{n×d×o} and V = [V_1, …, V_o] ∈ R^{m×d×o}. Using a linear projection P ∈ R^{d×o} instead of the all-one column vector, the vector z is represented as:
z = P^T (U^T x ∘ V^T y) + b (8)
where b ∈ R^o is the bias vector.
A non-linear activation function is added after each linear mapping, and the vector z is further represented as:
z = P^T (σ(U^T x) ∘ σ(V^T y)) + b (9)
where σ denotes any non-linear activation function, and x and y denote the hand attention vector and the audio feature vector respectively; the values of x are all greater than 0, and y lies in the range [−1, 1].
A Relu function is further added to normalize the output of the network, and the final vector z can be expressed as:
and inputting paired audio and hand features into the fusion model, and outputting a final result on the full connection layer through softmax.
Further, step 4 is specifically as follows:
the improved FAST algorithm is adopted, and the specific improvement is as follows: taking 24 pixels around one pixel P as a detection template, setting the gray value of the P as IP, setting a threshold value T, and if the gray value of 14 continuous pixels in the 24 pixels is greater than IP + T or less than IP-T, then P is an angular point.
And (3) optimizing the characteristic points by using a Shi-Tomasi algorithm, wherein the Shi-Tomasi algorithm compares the smaller one of the two characteristic values with a given minimum threshold value, and if the smaller one of the two characteristic values is larger than the given minimum threshold value, a strong corner point is obtained.
The Shi-Tomasi algorithm detects corner points by calculating the gray level after the local small window W (x, y) is moved in each direction. Shifting the window u, v to produce a gray scale change E u, v
Where M is a 2 x 2 autocorrelation matrix, calculated from the derivatives of the image
For λ in two features of matrix M max And λ min The analysis is performed in that the corner response function is defined as λ, since the larger uncertainty of curvature depends on the small corner min . Calculating the corner response function of each point by using Shi-Tomasi algorithm for the characteristic points preliminarily extracted by using the improved FAST corner detection algorithmλ min According to λ min And taking the point with the maximum N response values to determine the characteristic point. At least 2 strong boundaries in different directions exist around the screened feature points, and the feature points are easy to identify and stable.
For matching of binary feature description vectors, hamming distance is used as a similarity measure between descriptors. Let two feature vectors of the descriptor be F1, F2, then the hamming distance of F1, F2 is:
and judging whether the feature vectors are matched or not by determining the threshold value of the Hamming distance.
Further, step 5 is specifically as follows:
The joint discriminative sparse coding algorithm is as follows: given the feature matrices X, Y and Z of the three modalities, three projection matrices P_x, P_y and P_z are learned jointly, mapping the three modality features to sparse matrices V_x ∈ R^{d×N}, V_y ∈ R^{d×N} and V_z ∈ R^{d×N} that accurately approximate the original matrices X, Y, Z, i.e. V_x ≈ P_x X, V_y ≈ P_y Y, V_z ≈ P_z Z.
After the feature representations V_x, V_y and V_z are obtained for the three modalities, they are quantized as
C_x = sgn(V_x);  C_y = sgn(V_y);  C_z = sgn(V_z)    (13)
where sgn(·) is the element-wise sign function yielding sparse binary codes, C_x (likewise C_y and C_z) = [c_1, c_2, …, c_N] ∈ R^{l×N}, c_i ∈ {0,1}^l denotes the learned sparse binary code of the i-th class, and l (l = 1, 2, …, 12) is the length of the binary code.
Sparsity constraints are applied to the projected feature representations V_x and V_y via the two projection matrices to reduce the projection error. Using the Frobenius norm as the cost function, the projection error can be expressed as
where a, b > 0 and a + b ∈ (0, 1) are trade-off parameters balancing the three different modalities.
Two constraints are imposed on the projected sparse features: 1) for the intra-modality samples of each modality, the intra-class distance is minimized and the inter-class distance is maximized; 2) for intra-class samples, the information correlation between feature points is maximized, and thus the distance is minimized. These constraints give the projected sparse features stronger discriminability and compactness.
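The sign quantization of Equation (13) can be sketched in a few lines of Python (a minimal illustration, not the patented implementation; the threshold-at-zero convention is an assumption, chosen so that sgn yields codes in {0, 1} as the text requires):

```python
def sgn_quantize(V):
    """Element-wise sign quantization of a projected feature matrix,
    in the spirit of Eq. (13): positive entries map to 1, all others
    to 0, yielding a sparse binary code."""
    return [[1 if v > 0 else 0 for v in row] for row in V]

# toy projected features for one modality
Vx = [[0.7, -0.2, 0.0],
      [-1.3, 2.1, 0.4]]
Cx = sgn_quantize(Vx)   # [[1, 0, 0], [0, 1, 1]]
```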
Compared with the prior art, the invention has the advantages that:
(1) The voiceprint, palm print and palm vein features are collected in a non-contact manner, which improves the safety of authentication and suits scenarios with strict hygiene requirements, such as during epidemics.
(2) Feature extraction uses deep learning, which reduces the complexity of manual feature engineering, strengthens resistance to noise interference, and improves the robustness and portability of the system.
(3) Voiceprint recognition is combined with palm print and palm vein recognition, and the three modal features are fused for identity authentication, improving the security, accuracy and robustness of authentication.
Drawings
FIG. 1 is a diagram of a multi-modal identification system architecture in accordance with an embodiment of the present invention;
FIG. 2 is a flow diagram of the operation of a multimodal identity recognition system in accordance with an embodiment of the present invention;
FIG. 3 is a diagram of a SE-ResNet network architecture in accordance with an embodiment of the present invention;
FIG. 4 is a block diagram of a multi-layer feature fusion model according to an embodiment of the present invention;
fig. 5 is a feature matching flow chart of an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail below with reference to the accompanying drawings by way of examples.
The voice and hand multi-modal information acquisition device is an important device for human identity recognition; its acquisition principle is shown in Fig. 1. The voice and hand multi-modal identity recognition designed in this system is realized by placing a human hand under an infrared LED light source, collecting the palm print and palm vein features of the human body with a CCD device, extracting voice information with MFCC features, and comparing the extracted voice and hand multi-modal features against the enrolled verification templates.
The overall architecture design of the system is shown in fig. 1, and mainly comprises a hardware part and a software part. The hardware part is mainly used for collecting multi-mode voice and hand characteristic information, and the software part is mainly used for multi-mode information processing and recognition. The system flow chart is as shown in fig. 2, and the hardware part specifically comprises a power supply module, a fixed wavelength infrared LED light source module, an image acquisition CCD module, a voice acquisition module and a storage module; the software part comprises image preprocessing, a feature extraction algorithm, feature fusion comparison and a user interaction interface.
The feature extraction is divided into two parts, wherein the first part is used for extracting voice features, and the second part is used for extracting two hand features of a palm print and a palm vein. The invention adopts ResNet as a main structure, and introduces an SE module on the basis to construct an SE-ResNet network structure, as shown in figure 3. And generating feature distribution by adding a global pooling layer, and finishing the extraction of information codes according to the feature distribution. In order to obtain the correlation between channels, a ReLU activation function and a sigmoid gate control mechanism are combined to complete the recalibration of the characteristics. In addition, in order to simplify the complexity of the model parameters, 1 × 1 full connection layers are also used at both ends of the ReLU function.
The SE (Squeeze-and-Excitation Networks) module is a computational unit that can be built upon any given transformation; for any given input, after entering the network module it performs the transformation shown in (1):
x is the input picture and U is the extracted feature. To make information from the network's global receptive field available to the lower layers, SE compresses the global spatial information into a channel descriptor, which contains the global distribution of the feature response along the channel dimension; statistics over the channel dimension are obtained with the global average pooling layer. The statistic z_c is derived by compressing U over its spatial dimensions H × W via (2):
The transformation output U can be interpreted as a set of local descriptors whose statistics can represent the entire image. In order to exploit the information aggregated by the compression operation, the next goal is to fully capture the dependencies along the channel dimension. A simple gating mechanism with a sigmoid activation function is chosen, Equation (3):
s = F_ex(z, W) = σ(g(z, W)) = σ(W_2 δ(W_1 z))    (3)
where δ denotes the ReLU activation function and W_1 and W_2 are the weights of the two fully connected layers. To limit the complexity of the model and to aid its generalization, the gating mechanism is parameterized as a bottleneck structure built from two fully connected layers (FC) around the nonlinearity: a dimensionality-reduction layer with parameters W_1 that reduces the parameter count by a factor of r, a ReLU activation function, and a dimensionality-restoring layer with parameters W_2. The final output of the block is obtained by rescaling the transform output U using the activation function, Equation (4):
In the formula, F_scale(u_c, s_c) denotes the channel-wise product of the feature map u_c and the scalar s_c. The role of this activation is to assign a weight to each channel based on the descriptor z of the input feature.
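The squeeze-excitation-scale pipeline of Equations (2)-(4) can be sketched with NumPy as follows (a minimal illustration: the random placeholder weights, toy channel count, and reduction ratio r = 2 are assumptions for the example, not values from the patent):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(U, W1, W2):
    """Recalibrate a feature map U of shape (C, H, W).
    Squeeze (Eq. 2): global average pooling -> channel descriptor z.
    Excitation (Eq. 3): s = sigmoid(W2 . relu(W1 . z)).
    Scale (Eq. 4): multiply each channel of U by its weight s_c."""
    z = U.mean(axis=(1, 2))                    # channel descriptor, shape (C,)
    s = sigmoid(W2 @ np.maximum(W1 @ z, 0.0))  # channel weights in (0, 1)
    return U * s[:, None, None]

# toy example: C = 4 channels, bottleneck reduction r = 2
rng = np.random.default_rng(0)
U = rng.standard_normal((4, 8, 8))
W1 = rng.standard_normal((2, 4))   # dimensionality-reduction FC
W2 = rng.standard_normal((4, 2))   # dimensionality-restoring FC
out = se_block(U, W1, W2)
```

Each output channel is the corresponding input channel scaled by a single weight in (0, 1), which is exactly the channel-wise product F_scale(u_c, s_c).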
Technical route and implementation scheme of multi-layer feature fusion mechanism
In general, concatenation or element-wise summation is the most common scheme for heterogeneous feature fusion. Since the distributions of audio and hand features typically differ widely and their feature dimensions often differ in size, the representational capacity of these simple fusion schemes may be insufficient for reliable speaker-naming performance. Fusion with a factorized bilinear model (FBM) better captures the interaction between the two different modalities and generally outperforms simple fusion methods (e.g., concatenation), as shown in Fig. 4.
The factorized bilinear model considers each feature pair through a linear transformation:
Z_i = x^T W_i y + b_i    (5)
where x ∈ R^n and y ∈ R^m are input feature vectors from two different modalities (e.g., high-level hand and audio features), W_i is a weight matrix, and b_i is a bias. Although a bilinear model can capture the pairwise interrelationship between two modalities, it typically introduces a large number of parameters, which increases computational cost. An effective remedy is to factorize the weight matrix W_i into two low-rank matrices, W_i = U_i V_i^T, where U_i ∈ R^{n×d} and V_i ∈ R^{m×d}, with the constraint d ≤ min(n, m) applied to the dimension d. Equation (5) can therefore be rewritten as:
Z_i = x^T U_i V_i^T y + b_i    (6)
In general, the first term on the right-hand side can be further transformed with a Hadamard (element-wise) product to capture the inherent correlation between the two heterogeneous modalities:
where 1 ∈ R^d is an all-ones column vector and ∘ denotes the Hadamard (element-wise) product. To obtain the output feature vector z, two third-order tensors are required: U = [U_1, …, U_o] ∈ R^{n×d×o} and V = [V_1, …, V_o] ∈ R^{m×d×o}. Using a linear projection P ∈ R^{d×o} in place of the all-ones column vector, the vector z can be expressed as:
where b ∈ R^o is a bias vector. Applying a nonlinear activation function generally helps increase the representational capacity of the bilinear model, so one is added after each linear mapping, and the vector z can be further expressed as:
where σ denotes any nonlinear activation function, such as ReLU, sigmoid or tanh. Assuming x and y represent the hand attention vector and the audio feature vector, respectively, every element of x is greater than 0 and y lies in the range [−1, 1]. To avoid information loss, different nonlinear activation functions may be used to map values to a finite interval. Because element-wise multiplication is introduced to capture the correlation between the two modalities, the magnitudes of the output neurons may vary greatly. To reduce the impact of this variation, a ReLU function is further added to normalize the output of the network, and the final vector z can be expressed as:
During training, the fusion parameters of the FBM can be updated and optimized by back-propagation. Paired audio and hand features are input into the fusion model, and the final result is output through softmax at the fully connected layer.
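The factorized fusion described above can be sketched with NumPy (a minimal illustration with random placeholder tensors; the choice of tanh as the per-mapping nonlinearity σ and the toy dimensions n, m, d, o are assumptions, not values from the patent):

```python
import numpy as np

def fbm_fuse(x, y, U, V, P, b, sigma=np.tanh):
    """Factorized bilinear fusion of two modality vectors.
    x: (n,) hand feature; y: (m,) audio feature;
    U: (n, d, o) and V: (m, d, o) low-rank factor tensors;
    P: (d, o) linear projection replacing the all-ones vector;
    b: (o,) bias; sigma: nonlinearity after each linear mapping."""
    xu = sigma(np.einsum('n,ndo->do', x, U))    # sigma(U_i^T x), all i at once
    yv = sigma(np.einsum('m,mdo->do', y, V))    # sigma(V_i^T y)
    z = np.einsum('do,do->o', P, xu * yv) + b   # project the Hadamard product
    return np.maximum(z, 0.0)                   # final ReLU normalization

rng = np.random.default_rng(1)
n, m, d, o = 6, 5, 3, 4
z = fbm_fuse(rng.standard_normal(n), rng.standard_normal(m),
             rng.standard_normal((n, d, o)), rng.standard_normal((m, d, o)),
             rng.standard_normal((d, o)), rng.standard_normal(o))
```

In a full system, z would feed a fully connected layer with softmax, and U, V, P, b would be trained by back-propagation rather than drawn at random.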
Technical route and implementation of feature matching
The FAST algorithm is currently one of the fastest corner detection algorithms, but it can falsely detect some edge points, producing spurious corners. To eliminate the interference of edge points on the detection result, the invention adopts an improved FAST algorithm with the following specific improvement: the 24 pixels surrounding a pixel P are taken as the detection template; let the gray value of P be I_P and set a threshold T; if 14 consecutive pixels among the 24 have gray values greater than I_P + T or less than I_P − T, then P is a corner. The invention uses the Shi-Tomasi algorithm to refine the feature points: the algorithm compares the smaller of the two eigenvalues with a given minimum threshold, and if the smaller eigenvalue exceeds this threshold, a strong corner is obtained.
The Shi-Tomasi algorithm detects corners by computing the change in gray level when a local window W(x, y) is shifted in every direction. Shifting the window by (u, v) produces a gray-level change E(u, v):
where M is a 2 × 2 autocorrelation matrix that can be computed from the image derivatives:
The two eigenvalues λ_max and λ_min of the matrix M are analyzed; since the larger curvature uncertainty depends on the smaller eigenvalue, the corner response function is defined as λ_min. For the feature points preliminarily extracted by the improved FAST corner detection algorithm, the Shi-Tomasi algorithm computes the corner response λ_min at each point, and the first N points with the largest response values are taken as the feature points. At least 2 strong boundaries in different directions exist around each retained feature point, and the feature points are easy to identify and stable.
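The λ_min response can be sketched directly from this definition (a minimal NumPy illustration; the window size and the synthetic step-corner test image are assumptions, and a production detector would additionally smooth the gradients):

```python
import numpy as np

def shi_tomasi_response(img, x, y, win=3):
    """Shi-Tomasi response at pixel (x, y): the smaller eigenvalue
    lambda_min of the 2x2 autocorrelation matrix M accumulated from
    image gradients inside a (2*win+1)-sized local window."""
    gy, gx = np.gradient(img.astype(float))   # derivatives along rows, cols
    ys = slice(max(y - win, 0), y + win + 1)
    xs = slice(max(x - win, 0), x + win + 1)
    ix, iy = gx[ys, xs].ravel(), gy[ys, xs].ravel()
    M = np.array([[ix @ ix, ix @ iy],
                  [ix @ iy, iy @ iy]])        # autocorrelation matrix
    return float(np.linalg.eigvalsh(M)[0])    # eigenvalues sorted ascending

# a synthetic step corner: the response is large at the corner
# and zero in the flat region
img = np.zeros((20, 20))
img[10:, 10:] = 1.0
```

Keeping the N points with the largest λ_min values then reproduces the selection rule described above.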
For matching binary feature description vectors (as shown in Fig. 5), the Hamming distance is generally used as the similarity measure between descriptors. The Hamming distance is the minimum number of substitutions required to turn one of two equal-length binary strings into the other. Let the two descriptor feature vectors be F1 and F2; the Hamming distance between F1 and F2 is:
Whether the feature vectors match is judged by setting a threshold on the Hamming distance.
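The descriptor-matching rule can be sketched in plain Python (a minimal illustration; the distance threshold of 8 is an assumed example value, to be tuned per application):

```python
def hamming_distance(f1, f2):
    """Hamming distance between two equal-length binary descriptors,
    given as '0'/'1' strings: the number of differing positions."""
    assert len(f1) == len(f2), "descriptors must have equal length"
    return sum(a != b for a, b in zip(f1, f2))

def descriptors_match(f1, f2, threshold=8):
    """Declare a match when the Hamming distance is below the threshold."""
    return hamming_distance(f1, f2) < threshold

d = hamming_distance("10110100", "10011100")   # differs at positions 2 and 4
```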
The above-described method according to the present invention can be implemented in hardware, firmware, or as software or computer code storable in a recording medium such as a CD ROM, a RAM, a floppy disk, a hard disk, or a magneto-optical disk, or as computer code originally stored in a remote recording medium or a non-transitory machine-readable medium and to be stored in a local recording medium downloaded through a network, so that the method described herein can be stored in such software processing on a recording medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware such as an ASIC or FPGA. It is understood that a computer, processor, microprocessor controller, or programmable hardware includes memory components (e.g., RAM, ROM, flash memory, etc.) that can store or receive software or computer code that, when accessed and executed by a computer, processor, or hardware, implements the processing methods described herein. Further, when a general-purpose computer accesses code for implementing the processes shown herein, execution of the code transforms the general-purpose computer into a special-purpose computer for performing the processes shown herein.
It will be appreciated by those of ordinary skill in the art that the examples described herein are intended to assist the reader in understanding the manner in which the invention is practiced, and it is to be understood that the scope of the invention is not limited to such specifically recited statements and examples. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from the spirit of the invention, and these changes and combinations are within the scope of the invention.
Claims (6)
1. A multi-modal identification system for voiceprints and palm veins, comprising: the device comprises a power supply module, a fixed wavelength infrared LED light source module, an image acquisition CCD module, a voice acquisition module and a storage module;
a power supply module: for powering the entire multi-modal identity recognition system;
Fixed wavelength infrared LED light source module: the human hand is irradiated by an infrared LED light source to assist the image acquisition CCD module in acquiring the information characteristics of the palm print and the palm vein of the human body;
image acquisition CCD module: collecting the information characteristics of the palm print and the palm vein of the human body;
the voice acquisition module: extracting voice information by using MFCC characteristics;
a storage module: the device is used for storing data acquired by the voice acquisition module and the image acquisition CCD module;
the multi-modal identity recognition module: and preprocessing the picture, extracting picture characteristics, fusing and comparing the characteristics and outputting a result.
2. A multi-mode identity recognition method for non-contact voiceprints and palm print palm veins is characterized by comprising the following steps:
step 1, image preprocessing; the preprocessing mainly comprises three steps: first, the infrared palm image is denoised with low-pass filtering; second, the image-enhancement part extracts a binary image of the palm region using the Sauvola algorithm; finally, the ROI-positioning part applies a gray-level transformation to the palm print and palm vein so that the palm edge stands out, detects the palm edge with a Canny operator, and crops the image to obtain the palm region of interest;
step 2, feature extraction; the feature extraction is divided into two parts, wherein the first part is used for extracting voice features, and the second part is used for extracting two hand features of a palm print and a palm vein; adopting ResNet as a main body structure, introducing an SE module, constructing an SE-ResNet network structure, inputting a preprocessed picture into the SE-ResNet network structure, generating feature distribution by adding a global pooling layer, and finishing the extraction of information codes; in order to obtain the correlation among channels, a ReLU activation function and a sigmoid gate control mechanism are combined to complete the recalibration of the characteristics;
step 3, feature fusion; a multi-layer characteristic fusion mechanism is adopted, the bilinear models are decomposed for fusion to obtain interaction between different modes of hands and audio, paired audio and hand characteristics are input into the fusion model, and a final result is output on a full connection layer through softmax;
step 4, feature comparison; for the feature points preliminarily extracted by the improved FAST corner detection algorithm, the Shi-Tomasi algorithm computes the corner response function at each point, and the first N points with the largest response values are taken as the feature points; at least 2 strong boundaries in different directions exist around each retained feature point; for matching binary feature description vectors, the Hamming distance is used as the similarity measure between descriptors;
step 5, interactive output; a joint discriminative sparse coding algorithm judges the intra-modality sample feature points of the three modalities so that the intra-class distance is minimized and the inter-class distance is maximized; a suitable threshold is set according to the actual scene requirements; if the two matched samples belong to the same class and the voiceprint, palm print and palm vein all match successfully, the interface shows that authentication succeeded; otherwise it prompts that authentication failed.
3. The multimodal identification method of claim 2 wherein: the step 2 specifically comprises the following steps: for any given information, after entering the network module, the conversion is performed as shown in formula (1):
x is the input picture and U is the extracted feature;
the SE compresses the global spatial information into a channel descriptor; the channel descriptor contains the global distribution of the feature response along the channel dimension, and statistics over the channel dimension are obtained with the global average pooling layer; the statistic z_c is obtained by compressing U over its spatial dimensions H × W via equation (2):
the transformation output U is interpreted as a set of local descriptors, and the statistics of the channel descriptor can express the whole image;
the dependencies along the channel dimension are fully captured using the aggregated information obtained by the compression operation; a simple gating mechanism with a sigmoid activation function is chosen, equation (3):
s = F_ex(z, W) = σ(g(z, W)) = σ(W_2 δ(W_1 z))    (3)
in order to limit the complexity of the model and to help its generalization, the gating mechanism is parameterized by forming a bottleneck structure from two fully connected layers (FC) around the nonlinearity, and the final output of the block is obtained by rescaling the transform output U using the activation function, equation (4):
in the formula, F_scale(u_c, s_c) denotes the channel-wise product of the feature map u_c and the scalar s_c; the role of this activation is to assign a weight to each channel based on the descriptor z of the input feature.
4. The multimodal identification method of claim 2 wherein: the step 3 is as follows:
the factorized bilinear model considers each feature pair through a linear transformation:
Z_i = x^T W_i y + b_i    (5)
where x ∈ R^n and y ∈ R^m are input feature vectors from the hand and audio modalities, W_i is a weight matrix, and b_i is a bias;
the weight matrix W_i is factorized into two low-rank matrices, W_i = U_i V_i^T, where U_i ∈ R^{n×d} and V_i ∈ R^{m×d}, with the constraint d ≤ min(n, m) applied to the dimension d; equation (5) can be further rewritten as:
Z_i = x^T U_i V_i^T y + b_i    (6)
the inherent correlation between the two heterogeneous modalities is captured by equation (7):
to obtain the output feature vector z, two third-order tensors are required: U = [U_1, …, U_o] ∈ R^{n×d×o} and V = [V_1, …, V_o] ∈ R^{m×d×o}; using a linear projection P ∈ R^{d×o} in place of the all-ones column vector, the vector z is expressed as:
where b ∈ R^o is a bias vector;
a nonlinear activation function is added after each linear mapping, and the vector z is further expressed as:
where σ denotes any nonlinear activation function, x and y denote the hand attention vector and the audio feature vector, respectively, every element of x is greater than 0, and y lies in the range [−1, 1];
a ReLU function is further added to normalize the output of the network, and the final vector z can be expressed as:
paired audio and hand features are input into the fusion model, and the final result is output through softmax at the fully connected layer.
5. The multimodal identification method of claim 2 wherein: the step 4 is as follows:
an improved FAST algorithm is adopted, with the following specific improvement: the 24 pixels surrounding a pixel P are taken as the detection template; let the gray value of P be I_P and set a threshold T; if 14 consecutive pixels among the 24 have gray values greater than I_P + T or less than I_P − T, then P is a corner;
the Shi-Tomasi algorithm is used to refine the feature points; it compares the smaller of the two eigenvalues with a given minimum threshold, and if the smaller eigenvalue exceeds this threshold, a strong corner is obtained;
the Shi-Tomasi algorithm detects corners by computing the change in gray level when a local window W(x, y) is shifted in every direction; shifting the window by (u, v) produces a gray-level change E(u, v):
where M is a 2 × 2 autocorrelation matrix computed from the image derivatives:
the two eigenvalues λ_max and λ_min of the matrix M are analyzed; since the larger curvature uncertainty depends on the smaller eigenvalue, the corner response function is defined as λ_min; for the feature points preliminarily extracted by the improved FAST corner detection algorithm, the Shi-Tomasi algorithm computes the corner response λ_min at each point, and the first N points with the largest response values are taken as the feature points; at least 2 strong boundaries in different directions exist around each retained feature point, and the feature points are easy to identify and stable;
for matching binary feature description vectors, the Hamming distance is used as the similarity measure between descriptors; let the two descriptor feature vectors be F1 and F2; the Hamming distance between F1 and F2 is:
whether the feature vectors match is judged by setting a threshold on the Hamming distance.
6. The multimodal identification method of claim 2 wherein: the step 5 is as follows:
the joint discriminative sparse coding algorithm is as follows: given the feature matrices X, Y and Z of the three modalities, three projection matrices P_x, P_y and P_z are learned jointly, mapping the three modality features to sparse matrices V_x ∈ R^{d×N}, V_y ∈ R^{d×N} and V_z ∈ R^{d×N} that accurately approximate the original matrices X, Y, Z, i.e. V_x ≈ P_x X, V_y ≈ P_y Y, V_z ≈ P_z Z;
after the feature representations V_x, V_y and V_z are obtained for the three modalities, they are quantized as
C_x = sgn(V_x);  C_y = sgn(V_y);  C_z = sgn(V_z);    (13)
where sgn(·) is the element-wise sign function yielding sparse binary codes, C_x (likewise C_y and C_z) = [c_1, c_2, …, c_N] ∈ R^{l×N}, c_i ∈ {0,1}^l denotes the learned sparse binary code of the i-th class, and l (l = 1, 2, …, 12) is the length of the binary code;
sparsity constraints are applied to the projected feature representations V_x and V_y via the two projection matrices to reduce the projection error; using the Frobenius norm as the cost function, the projection error can be expressed as
where a, b > 0 and a + b ∈ (0, 1) are trade-off parameters balancing the three different modalities;
two constraints are imposed on the projected sparse features: 1) for the intra-modality samples of each modality, the intra-class distance is minimized and the inter-class distance is maximized; 2) for intra-class samples, the information correlation between feature points is maximized, and thus the distance is minimized; these constraints give the projected sparse features stronger discriminability and compactness.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210927661.3A CN115188084A (en) | 2022-08-03 | 2022-08-03 | Multi-mode identity recognition system and method for non-contact voiceprint and palm print palm vein |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210927661.3A CN115188084A (en) | 2022-08-03 | 2022-08-03 | Multi-mode identity recognition system and method for non-contact voiceprint and palm print palm vein |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115188084A true CN115188084A (en) | 2022-10-14 |
Family
ID=83521810
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210927661.3A Pending CN115188084A (en) | 2022-08-03 | 2022-08-03 | Multi-mode identity recognition system and method for non-contact voiceprint and palm print palm vein |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115188084A (en) |
-
2022
- 2022-08-03 CN CN202210927661.3A patent/CN115188084A/en active Pending
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111639560A (en) * | 2020-05-15 | 2020-09-08 | 圣点世纪科技股份有限公司 | Finger vein feature extraction method and device based on dynamic fusion of vein skeleton line and topographic relief characteristic |
CN116504226A (en) * | 2023-02-27 | 2023-07-28 | 佛山科学技术学院 | Lightweight single-channel voiceprint recognition method and system based on deep learning |
CN116504226B (en) * | 2023-02-27 | 2024-01-02 | 佛山科学技术学院 | Lightweight single-channel voiceprint recognition method and system based on deep learning |
CN116580444A (en) * | 2023-07-14 | 2023-08-11 | 广州思林杰科技股份有限公司 | Method and equipment for testing long-distance running timing based on multi-antenna radio frequency identification technology |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Xin et al. | Multimodal feature-level fusion for biometrics identification system on IoMT platform | |
CN115188084A (en) | Multi-mode identity recognition system and method for non-contact voiceprint and palm print palm vein | |
CN108416338B (en) | Non-contact palm print identity authentication method | |
Barpanda et al. | Iris recognition with tunable filter bank based feature | |
KR102483650B1 (en) | User verification device and method | |
Malgheet et al. | Iris recognition development techniques: a comprehensive review | |
Hou et al. | Finger-vein biometric recognition: A review | |
Doublet et al. | Robust grayscale distribution estimation for contactless palmprint recognition | |
Stojanović et al. | Latent overlapped fingerprint separation: a review | |
CN112232163A (en) | Fingerprint acquisition method and device, fingerprint comparison method and device, and equipment | |
Stojanović et al. | A novel neural network based approach to latent overlapped fingerprints separation | |
Yang et al. | A Face Detection Method Based on Skin Color Model and Improved AdaBoost Algorithm. | |
Rajasekar et al. | Efficient multimodal biometric recognition for secure authentication based on deep learning approach | |
CN113657498B (en) | Biological feature extraction method, training method, authentication method, device and equipment | |
Mehmood et al. | Palmprint enhancement network (PEN) for robust identification | |
Prabu et al. | A novel biometric system for person recognition using palm vein images | |
CN112232152B (en) | Non-contact fingerprint identification method and device, terminal and storage medium | |
CN111428670B (en) | Face detection method, face detection device, storage medium and equipment | |
Arora et al. | Sp-net: One shot fingerprint singular-point detector | |
Dahea et al. | An Efficient Feature Selection scheme based on Genetic Algorithm for Finger Vein Recognition | |
AlShemmary et al. | Siamese Network-Based Palm Print Recognition | |
Santosh et al. | Recent Trends in Image Processing and Pattern Recognition: Third International Conference, RTIP2R 2020, Aurangabad, India, January 3–4, 2020, Revised Selected Papers, Part I | |
Hariprasath et al. | Bimodal biometric pattern recognition system based on fusion of iris and palmprint using multi-resolution approach | |
CN117688365B (en) | Multi-mode biological identification access control system | |
Gao et al. | On Designing a SwinIris Transformer Based Iris Recognition System |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |