CN110738985A - Cross-modal biometric feature recognition method and system based on voice signals - Google Patents

Cross-modal biometric feature recognition method and system based on voice signals

Info

Publication number
CN110738985A
Authority
CN
China
Prior art keywords
features
vector
voiceprint
modal
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910981216.3A
Other languages
Chinese (zh)
Inventor
潘成华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Net Into Polytron Technologies Inc
Original Assignee
Jiangsu Net Into Polytron Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Net Into Polytron Technologies Inc filed Critical Jiangsu Net Into Polytron Technologies Inc
Priority to CN201910981216.3A priority Critical patent/CN110738985A/en
Publication of CN110738985A publication Critical patent/CN110738985A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/172 Classification, e.g. identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/70 Multimodal biometrics, e.g. combining information from different biometric modalities
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 Speaker identification or verification techniques
    • G10L 17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Acoustics & Sound (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
  • Image Analysis (AREA)
  • Collating Specific Patterns (AREA)

Abstract

The invention provides a method for cross-modal biometric recognition based on voice signals. The method comprises: S1, acquiring the voice signal to be recognized and multi-modal biometric information of a plurality of persons; S2, extracting features for each single modality with a neural network model to obtain fixed-dimension vectors for the voiceprint features and for the corresponding biometric features of other modalities; S3, confirming whether a voiceprint feature vector of the multi-modal biometric features and a feature vector of another modality come from the same person; S4, performing supervised classification training on the obtained concatenated vector pairs and their corresponding 0 or 1 labels, selecting the model and parameters with the best loss-function evaluation, and outputting 0 or 1 as the recognition result. By inputting a voice signal, the system identifies biometric information of the speaker in other modalities.

Description

Cross-modal biometric feature recognition method and system based on voice signals
Technical Field
The invention relates to biometric recognition methods and systems, and in particular to a cross-modal biometric recognition method and system based on voice signals.
Background
With the widespread application of artificial intelligence in the field of biometric identification, technologies such as face recognition, voiceprint recognition, fingerprint recognition, iris recognition, palm-print recognition and gait recognition have achieved very high recognition rates and a large number of deployable application scenarios.
However, in some practical applications there is no registered data corresponding to the biometric modality to be identified. For example, a telephone recording of a fraud suspect may exist without any registered voice sample of that person, so voiceprint identification cannot be performed.
There are strong correlations between an individual's biometric data across modalities. For example, by listening to a segment of a recording we can often infer who the speaker is, their sex, approximate age, regional dialect, and vocal timbre (whether the voice is thin, sharp, etc.), and all of this information has a counterpart in the face image, because face recognition can likewise infer identity, sex, approximate age, region (south/north), height, character, and so on.
Therefore, it is necessary to provide a cross-modal biometric recognition method and system based on speech signals.
Disclosure of Invention
The invention aims to provide a cross-modal biometric recognition method and system based on voice signals, which identify biometric information of other modalities of the speaker from an input voice signal.
In order to achieve this aim, the invention adopts the following technical scheme: a cross-modal biometric recognition method for voice signals, comprising the following steps:
S1, acquiring the voice signal to be recognized and multi-modal biometric information of a plurality of persons;
S2, extracting features for each single modality with a neural network model, and acquiring fixed-dimension vectors for the voiceprint features and the corresponding biometric features of other modalities;
S3, confirming whether the voiceprint feature vector of the multi-modal biometric features and a feature vector of another modality come from the same person: the voiceprint feature vector extracted in step S2 and a feature vector of another modality are concatenated into a vector pair; the output of the vector pair is labeled 1 if the voiceprint feature and the other-modality feature come from the same person, and 0 if they come from two different persons.
S4: performing supervised classification training on the obtained concatenated vector pairs and their corresponding 0 or 1 labels, selecting the model and parameters with the best loss-function evaluation, and outputting 0 or 1 as the recognition result.
In step S2, the neural network model is used to process the voice signal to be recognized: mel-spectrum features of the input voice signal are extracted with a Python toolkit, and a Resnet neural network model is built. The input of the neural network model is the mel-spectrum vectors extracted by the Python toolkit, and the output is a g-vector feature with a fixed dimension of 128, the g-vector being the output of the neural network.
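The mel-spectrum front end described above can be sketched as follows. The patent does not name the Python toolkit, so this illustration computes log-mel features directly in NumPy; the filterbank construction and all parameter values (16 kHz sample rate, 512-point FFT, 160-sample hop, 64 mel bands) are assumptions for the sketch, not values taken from the patent.

```python
import numpy as np

def mel_filterbank(n_fft: int, n_mels: int, sr: int) -> np.ndarray:
    """Build a simplified triangular mel filterbank of shape (n_mels, n_fft//2+1)."""
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):                 # rising slope
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):                # falling slope
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb

def log_mel_features(signal: np.ndarray, sr: int = 16000, n_fft: int = 512,
                     hop: int = 160, n_mels: int = 64) -> np.ndarray:
    """Frame the waveform, take the power spectrum, apply the mel filterbank,
    and log-compress; returns a (n_frames, n_mels) feature matrix."""
    window = np.hanning(n_fft)
    frames = [signal[i:i + n_fft] * window
              for i in range(0, len(signal) - n_fft, hop)]
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    mel = power @ mel_filterbank(n_fft, n_mels, sr).T
    return np.log(mel + 1e-10)
```

In practice a library such as librosa would typically replace this hand-rolled filterbank; the sketch only shows the shape of the data entering the Resnet.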
In step S4, an SVM (support vector machine) with a nonlinear kernel function is trained and evaluated. Based on the kernel trick, the nonlinear SVM model can be expressed as:

$$\min_{\alpha}\;\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i \alpha_j y_i y_j K(x_i, x_j)\;-\;\sum_{i=1}^{N}\alpha_i \qquad (1)$$

subject to the conditions

$$\sum_{i=1}^{N}\alpha_i y_i = 0, \qquad 0 \le \alpha_i \le C,\quad i = 1,\dots,N \qquad (2)$$

Subject to condition (2), the parameter α minimizing formula (1) is obtained, where N denotes the number of samples, y the true label values, and x the input values; K(x_i, x_j) is a kernel function over the original low-dimensional feature space X.
The invention also provides a cross-modal biometric recognition system for voice signals, which comprises an acquisition module, an extraction module, a confirmation module and an output module. The acquisition module acquires the voice to be recognized and multi-modal biometric information of a plurality of persons. The extraction module extracts features for each single modality with a neural network model and acquires fixed-dimension vectors for the voiceprint features and the corresponding biometric features of other modalities. The confirmation module confirms whether a voiceprint feature vector of the multi-modal biometric features and a feature vector of another modality come from the same person: the output of the vector pair is labeled 1 if both features come from the same person, and 0 if they come from two different persons. The output module performs supervised classification training on the obtained concatenated vector pairs and their corresponding 0 or 1 labels, selects the model and parameters with the best loss-function evaluation, and outputs 0 or 1 as the recognition result.
Compared with the prior art, the cross-modal biometric recognition method and system based on voice signals have the following beneficial effect: given an input voice signal, the system identifies, among the biometric signals of other modalities from a plurality of candidate persons, the biometric information belonging to the speaker of that voice signal.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without inventive effort, wherein:
FIG. 1 is a flow chart of a cross-modal biometric identification method of a speech signal according to the present invention;
fig. 2 is a block diagram of a cross-modal biometric recognition system of a speech signal of the present invention.
Detailed Description
The present invention will be described in detail below with reference to the drawings, but it should be emphasized that the following embodiments are only exemplary and are not intended to limit the scope and application of the present invention.
Fig. 1 is a flow chart of a cross-modal biometric identification method of a speech signal according to the present invention.
S1: acquiring the voice signal to be recognized and multi-modal biometric information of a plurality of persons;
Specifically, the multi-modal biometric features include the voiceprint features of a speaker's voice, face features from facial information, gait features of walking posture, iris features of the human eye, and so on; these form a multi-modal biometric data set, or a public biometric data set is used, as the training set of the system model.
S2, extracting features for each single modality with a neural network model, and acquiring fixed-dimension vectors for the voiceprint features and the corresponding biometric features of other modalities;
the method comprises the steps of extracting a voice signal to be recognized by using a neural network model, extracting Mel spectral characteristics of the input voice signal to be recognized by using a python toolkit, building a Resnet neural network model through the network model, inputting the neural network model by using Mel spectral vectors extracted by the python program toolkit, outputting the model by using a fixed dimension 128-dimensional g-vector characteristic, and outputting the g-vector characteristic by using a neural network. The specific model structure is as follows:
In the above table, layer denotes the neural network layer, output size the output size of the corresponding layer, 3x3 a convolution kernel, stride the step size, T the number of time steps, and params the parameters of that layer.
The first layer of the network is the conv1 convolutional layer, the second layer is Res1, the third layer is Res2, and so on; the sixth layer is the GSP pooling layer, which adopts global statistics pooling; the seventh layer is a fully-connected layer (FC1); and the output layer is also a fully-connected layer (FC2).
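The GSP pooling layer described above is what turns a variable-length utterance into a fixed-length representation: it concatenates the per-dimension mean and standard deviation of the frame-level features over time. A minimal NumPy sketch of this pooling step (the 64-dimensional frame features are an illustrative assumption; in the network above, the fully-connected layers then project the pooled statistics to the 128-dimensional g-vector):

```python
import numpy as np

def global_statistics_pooling(frame_feats: np.ndarray) -> np.ndarray:
    """Collapse a (T, D) sequence of frame-level features into a fixed 2*D
    vector by concatenating the per-dimension mean and standard deviation."""
    mu = frame_feats.mean(axis=0)
    sigma = frame_feats.std(axis=0)
    return np.concatenate([mu, sigma])
```

Utterances of any length T map to the same output dimension, which is why the g-vector can have a fixed dimension regardless of recording length.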
For the extraction of face features, a DeepID network model is adopted, and a fixed-dimension vector is obtained for each face. The DeepID model is based on a convolutional neural network and comprises 4 convolutional layers (each followed by a max-pooling layer) and a fully-connected layer (i.e. the 160-dimensional DeepID feature).
Fixed-dimension feature vectors reflecting the identity of each person are thus obtained through the network models (that is, the collected multi-modal biometric information is represented by mathematical vectors). During feature extraction, a text record similar to an Excel sheet is kept: for example, for classmate Xiaoming, numbered 1, the extracted voiceprint feature is named 1-vector and the extracted face feature is named 1-factor; other biometric features are recorded similarly. After Xiaoming's features are extracted, any further persons are recorded in the same way, giving records such as:

Name       Number   Voiceprint feature   Face feature   Other features
Xiaoming   1        1-vector             1-factor       1-xxxxxxx
Zhang San  2        2-vector             2-factor       2-xxxxxx
S3, confirming whether the voiceprint feature vector of the multi-modal biometric features and a feature vector of another modality come from the same person: if the voiceprint feature and the other-modality feature come from the same person, the output of the vector pair is labeled 1; if they come from two different persons, the label is 0.
A number of voiceprint feature vectors generated from the training data are each concatenated with feature vectors of the other biometric modalities to form vector pairs; that is, the voiceprint feature vector extracted in step S2 and another-modality biometric feature vector are concatenated into a vector pair, labeled 1 if the two vectors come from the same person and 0 if they come from two different persons.
For example, let A be a voiceprint feature and B a face feature, both feature vectors extracted as in step S2; A and B may be the face and voiceprint of the same person, or of two different persons.
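The pairing scheme of step S3 can be sketched as follows. The exhaustive all-against-all pairing and the function name are illustrative assumptions; in practice the negative (label-0) pairs would usually be subsampled to keep the classes balanced.

```python
import numpy as np

def make_vector_pairs(voiceprints, faces, voice_ids, face_ids):
    """Concatenate each voiceprint vector with each face vector into a pair;
    label the pair 1 when both vectors belong to the same person, else 0."""
    pairs, labels = [], []
    for v, pid_v in zip(voiceprints, voice_ids):
        for f, pid_f in zip(faces, face_ids):
            pairs.append(np.concatenate([v, f]))   # e.g. 128-d + 160-d = 288-d
            labels.append(1 if pid_v == pid_f else 0)
    return np.array(pairs), np.array(labels)
```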
S4: performing supervised classification training on the obtained concatenated vector pairs and their corresponding 0 or 1 labels, selecting the model and parameters with the best loss-function evaluation, and outputting 0 or 1 as the recognition result.
All obtained vector pairs are split in an 8:2 ratio: a random 80% of the pairs form the training set and the remaining 20% form the test set.
Training is carried out with an SVM (support vector machine) based on a nonlinear kernel function; based on the kernel trick, the nonlinear SVM model can be expressed as:

$$\min_{\alpha}\;\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i \alpha_j y_i y_j K(x_i, x_j)\;-\;\sum_{i=1}^{N}\alpha_i \qquad (1)$$

subject to the conditions

$$\sum_{i=1}^{N}\alpha_i y_i = 0, \qquad 0 \le \alpha_i \le C,\quad i = 1,\dots,N \qquad (2)$$

Subject to condition (2), the parameter α minimizing formula (1) is obtained, where N denotes the number of samples, y the true label values, and x the input values; K(x_i, x_j) is a kernel function over the original low-dimensional feature space X, and the computation is carried out in the low-dimensional space, which is the essence of the kernel function.
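The 8:2 split and nonlinear-kernel SVM training of step S4 can be sketched with scikit-learn as a stand-in; the RBF kernel choice and the hyperparameter values (C, gamma) are assumptions for illustration, as the patent does not fix them.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def train_pair_classifier(pairs: np.ndarray, labels: np.ndarray, seed: int = 0):
    """Split the labeled vector pairs 80/20 at random, fit an RBF-kernel SVM
    on the training portion, and report accuracy on the held-out 20%."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        pairs, labels, test_size=0.2, random_state=seed, stratify=labels)
    clf = SVC(kernel="rbf", C=1.0, gamma="scale")
    clf.fit(X_tr, y_tr)
    return clf, clf.score(X_te, y_te)
```

Stratifying the split keeps the proportion of same-person and different-person pairs consistent between training and test sets.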
In application, a fixed-dimension voiceprint feature reflecting the identity of the person is obtained from the input voice signal. The candidate data of the other modality to be recognized are preprocessed, denoised and filtered to obtain fixed-dimension feature vectors reflecting identity information in that modality. Each such candidate vector is concatenated with the fixed-dimension voiceprint feature to form a vector pair to be recognized. The generated vector pairs are then recognized by the supervised-learning classifier model of step S4: a pair is input to the classifier model, which outputs 0 or 1, where 1 indicates that the voiceprint-modality data and the other-modality data come from the same person and 0 indicates different persons.
For example, when an arbitrary person speaks to the system, the person's voiceprint feature A is extracted; face feature vectors B are then extracted from all candidate face pictures; A and B are concatenated into feature vector pairs and used as the input of the system; the supervised classification model then predicts by inference, where 1 indicates that the A-modality data and the B-modality data come from the same person and 0 indicates different persons.
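The inference flow just described can be sketched as below; the function name is hypothetical, and `classifier` is any object with a scikit-learn style `predict` method, such as a classifier trained as in step S4.

```python
import numpy as np

def match_speaker_to_faces(voiceprint, candidate_faces, classifier):
    """Pair the query voiceprint A with every candidate face vector B and
    return the indices of candidates the classifier labels 1 (same person)."""
    pairs = np.array([np.concatenate([voiceprint, f]) for f in candidate_faces])
    predictions = classifier.predict(pairs)
    return [i for i, p in enumerate(predictions) if p == 1]
```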
Fig. 2 is a block diagram of a cross-modal biometric recognition system of a speech signal according to the present invention, which includes:
the system comprises an acquisition module 1, a recognition module and a processing module, wherein the acquisition module is used for acquiring multi-modal biological characteristic information of a plurality of people including voices to be recognized;
specifically, the voice to be recognized is acquired through a microphone, the system has multi-modal biological characteristics including voiceprint characteristics of the voice of a speaker, face characteristics of human face information, gait characteristics of walking posture, iris characteristics of human eyes and the like, and a multi-modal biological characteristic data set is formed or a public biological characteristic data set is used as a training set of a system model.
The extraction module 2 is used for extracting features by utilizing a neural network model in each single modes, and obtaining fixed dimension vectors of voiceprint features and corresponding biological features of other modes;
the confirming module 3 is used for confirming whether the voiceprint feature vectors of the multi-modal biological features and the feature vectors of other dimensions are from the same person, if the voiceprint features and the features of other dimensions are from the same person, the output person of the vector pair is labeled as 1, otherwise, if the voiceprint features and the features of other dimensions are from two different persons, the label is labeled as 0;
and the output module 4 is used for carrying out supervision classification training on the obtained multiple parallel vector pairs and corresponding 0 or 1 labels, selecting a model and parameters with optimal loss function evaluation, and outputting a 0 or 1 confirmation recognition result.
The cross-modal biometric recognition method and system based on voice signals have the following beneficial effect: given an input voice signal, the system identifies, among the biometric signals of other modalities from a plurality of candidate persons, the biometric information belonging to the speaker of that voice signal.
Of course, persons skilled in the art should recognize that the above embodiments are illustrative only and not limiting, and that changes and modifications to the above embodiments fall within the scope of the appended claims as long as they are within the true spirit of the invention.

Claims (4)

1. A method for cross-modal biometric recognition of speech signals, characterized by the following steps:
S1, acquiring the voice signal to be recognized and multi-modal biometric information of a plurality of persons;
S2, extracting features for each single modality with a neural network model, and acquiring fixed-dimension vectors for the voiceprint features and the corresponding biometric features of other modalities;
S3, confirming whether the voiceprint feature vector of the multi-modal biometric features and a feature vector of another modality come from the same person: the voiceprint feature vector extracted in step S2 and a feature vector of another modality are concatenated into a vector pair; the output of the vector pair is labeled 1 if the voiceprint feature and the other-modality feature come from the same person, and 0 if they come from two different persons;
S4: performing supervised classification training on the obtained concatenated vector pairs and their corresponding 0 or 1 labels, selecting the model and parameters with the best loss-function evaluation, and outputting 0 or 1 as the recognition result.
2. The cross-modal biometric recognition method of a speech signal according to claim 1, wherein in step S2 the voice signal to be recognized is processed with a neural network model: the mel-spectrum features of the input voice signal are extracted with a Python toolkit, a Resnet neural network model is built, the input of the neural network model is the mel-spectrum vectors extracted by the Python toolkit, and the output is the g-vector feature with a fixed dimension of 128, the g-vector being the output of the neural network.
3. The cross-modal biometric recognition method of a speech signal according to claim 1,
wherein in step S4 an SVM (support vector machine) with a nonlinear kernel function is trained and evaluated, and based on the kernel trick the nonlinear SVM model can be expressed as:

$$\min_{\alpha}\;\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i \alpha_j y_i y_j K(x_i, x_j)\;-\;\sum_{i=1}^{N}\alpha_i \qquad (1)$$

subject to the conditions

$$\sum_{i=1}^{N}\alpha_i y_i = 0, \qquad 0 \le \alpha_i \le C,\quad i = 1,\dots,N \qquad (2)$$

Subject to condition (2), the parameter α minimizing formula (1) is obtained, where N denotes the number of samples, y the true label values, and x the input values; K(x_i, x_j) is a kernel function over the original low-dimensional feature space X.
4. A system for cross-modal biometric recognition of speech signals, comprising:
an acquisition module for acquiring the voice to be recognized and multi-modal biometric information of a plurality of persons;
an extraction module for extracting features for each single modality with a neural network model, and acquiring fixed-dimension vectors for the voiceprint features and the corresponding biometric features of other modalities;
a confirmation module for confirming whether a voiceprint feature vector of the multi-modal biometric features and a feature vector of another modality come from the same person: the output of the vector pair is labeled 1 if both features come from the same person, and 0 if they come from two different persons;
and an output module for performing supervised classification training on the obtained concatenated vector pairs and their corresponding 0 or 1 labels, selecting the model and parameters with the best loss-function evaluation, and outputting 0 or 1 as the recognition result.
CN201910981216.3A 2019-10-16 2019-10-16 Cross-modal biometric feature recognition method and system based on voice signals Pending CN110738985A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910981216.3A CN110738985A (en) 2019-10-16 2019-10-16 Cross-modal biometric feature recognition method and system based on voice signals

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910981216.3A CN110738985A (en) 2019-10-16 2019-10-16 Cross-modal biometric feature recognition method and system based on voice signals

Publications (1)

Publication Number Publication Date
CN110738985A true CN110738985A (en) 2020-01-31

Family

ID=69268977

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910981216.3A Pending CN110738985A (en) 2019-10-16 2019-10-16 Cross-modal biometric feature recognition method and system based on voice signals

Country Status (1)

Country Link
CN (1) CN110738985A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111401440A (en) * 2020-03-13 2020-07-10 重庆第二师范学院 Target classification recognition method and device, computer equipment and storage medium
CN114611400A (en) * 2022-03-18 2022-06-10 河北金锁安防工程股份有限公司 Early warning information screening method and system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190163891A1 (en) * 2013-05-08 2019-05-30 Jpmorgan Chase Bank, N.A. Systems and methods for high fidelity multi-modal out-of-band biometric authentication with human cross-checking
CN109840287A (en) * 2019-01-31 2019-06-04 中科人工智能创新技术研究院(青岛)有限公司 A kind of cross-module state information retrieval method neural network based and device
CN109902780A (en) * 2019-02-14 2019-06-18 广州番禺职业技术学院 Testimony of a witness unification verification terminal and system and method based on multi-modal recognition of face
CN109903774A (en) * 2019-04-12 2019-06-18 南京大学 A kind of method for recognizing sound-groove based on angle separation loss function
CN110109541A (en) * 2019-04-25 2019-08-09 广州智伴人工智能科技有限公司 A kind of method of multi-modal interaction
US20190278937A1 (en) * 2018-03-07 2019-09-12 Open Inference Holdings LLC Systems and methods for privacy-enabled biometric processing

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190163891A1 (en) * 2013-05-08 2019-05-30 Jpmorgan Chase Bank, N.A. Systems and methods for high fidelity multi-modal out-of-band biometric authentication with human cross-checking
US20190278937A1 (en) * 2018-03-07 2019-09-12 Open Inference Holdings LLC Systems and methods for privacy-enabled biometric processing
CN109840287A (en) * 2019-01-31 2019-06-04 中科人工智能创新技术研究院(青岛)有限公司 A kind of cross-module state information retrieval method neural network based and device
CN109902780A (en) * 2019-02-14 2019-06-18 广州番禺职业技术学院 Testimony of a witness unification verification terminal and system and method based on multi-modal recognition of face
CN109903774A (en) * 2019-04-12 2019-06-18 南京大学 A kind of method for recognizing sound-groove based on angle separation loss function
CN110109541A (en) * 2019-04-25 2019-08-09 广州智伴人工智能科技有限公司 A kind of method of multi-modal interaction

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ARSHA NAGRANI ET AL.: "Seeing Voices and Hearing Faces: Cross-Modal Biometric Matching", 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition *
ZHENG WANRONG: "A Survey of Cross-Modal Processing Methods for Sound and Image", Journal of Communication University of China (Natural Science Edition) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111401440A (en) * 2020-03-13 2020-07-10 重庆第二师范学院 Target classification recognition method and device, computer equipment and storage medium
CN114611400A (en) * 2022-03-18 2022-06-10 河北金锁安防工程股份有限公司 Early warning information screening method and system
CN114611400B (en) * 2022-03-18 2023-08-29 河北金锁安防工程股份有限公司 Early warning information screening method and system

Similar Documents

Publication Publication Date Title
Kim et al. Person authentication using face, teeth and voice modalities for mobile device security
Frischholz et al. BiolD: a multimodal biometric identification system
Alshamsi et al. Automated facial expression and speech emotion recognition app development on smart phones using cloud computing
CN112101096A (en) Suicide emotion perception method based on multi-mode fusion of voice and micro-expression
CN110991346A (en) Suspected drug addict identification method and device and storage medium
Shinde et al. Real time two way communication approach for hearing impaired and dumb person based on image processing
CN110738985A (en) Cross-modal biometric feature recognition method and system based on voice signals
Gawande et al. Biometric-based security system: Issues and challenges
Nahar et al. Twins and Similar Faces Recognition Using Geometric and Photometric Features with Transfer Learning
Shen et al. Secure mobile services by face and speech based personal authentication
CN110298331B (en) Witness comparison method
KR101208678B1 (en) Incremental personal autentication system and method using multi bio-data
Bigun et al. Combining biometric evidence for person authentication
Kadhim et al. A multimodal biometric database and case study for face recognition based deep learning
WO2006057475A1 (en) Face detection and authentication apparatus and method
Khalifa et al. Bimodal biometric verification with different fusion levels
Monica et al. Recognition of medicine using cnn for visually impaired
CN111460880B (en) Multimode biological feature fusion method and system
Shetty et al. Real-time translation of sign language for speech impaired
Raja et al. A Peculiar Reading System for Blind People using OCR Technology
Charishma et al. Smart Attendance System with and Without Mask using Face Recognition
Boujnah et al. Smartphone-captured ear and voice database in degraded conditions
CN106971725B (en) Voiceprint recognition method and system with priority
Shukla et al. A novel approach of speaker authentication by fusion of speech and image features using Artificial Neural Networks
Muruganantham et al. Biometric of speaker authentication using CNN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200131
