CN114398611A - Bimodal identity authentication method, device and storage medium - Google Patents

Bimodal identity authentication method, device and storage medium

Info

Publication number
CN114398611A
CN114398611A (application CN202111640915.5A)
Authority
CN
China
Prior art keywords
face
identity authentication
feature vector
bimodal
face image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111640915.5A
Other languages
Chinese (zh)
Inventor
蔡晓东 (Cai Xiaodong)
周青松 (Zhou Qingsong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guilin Topintelligent Communication Technology Co ltd
Original Assignee
Guilin Topintelligent Communication Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guilin Topintelligent Communication Technology Co ltd filed Critical Guilin Topintelligent Communication Technology Co ltd
Priority to CN202111640915.5A
Publication of CN114398611A
Pending legal status (Current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F 21/31 User authentication
    • G06F 21/32 User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 Speaker identification or verification
    • G10L 17/04 Training, enrolment or model building
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 Speaker identification or verification
    • G10L 17/18 Artificial neural networks; Connectionist approaches

Abstract

The invention provides a bimodal identity authentication method, device and storage medium in the technical field of image processing. The bimodal identity authentication method comprises the following steps: S1: importing face images and voice data; S2: performing picture feature analysis on each face image to obtain face feature vectors; S3: performing voice feature analysis on each piece of voice data to obtain voiceprint feature vectors; S4: constructing a training model and training it on all face feature vectors and all voiceprint feature vectors to obtain a bimodal identity authentication model; S5: importing the face image and voice data to be authenticated and performing identity authentication on them through the bimodal identity authentication model to obtain an identity authentication result. The method lets the feature information of the two modalities complement each other, effectively compensating for the susceptibility of single-modality biometric authentication to spoofing attacks, environmental noise and the like, and further improving recognition accuracy.

Description

Bimodal identity authentication method, device and storage medium
Technical Field
The invention relates to the technical field of image processing, and in particular to a bimodal identity authentication method, device and storage medium.
Background
Existing face recognition and speech recognition technologies are mature, but these single-modality authentication techniques still have many limitations: face recognition is easily affected by occlusion, viewing angle, illumination and posture changes, while voice recognition is easily affected by ambient noise and by changes in the user's physical condition, so single-modality authentication performs poorly in certain scenarios. More challenging still, many spoofing and jamming techniques now target face recognition or voiceprint recognition, and common single-modality identity authentication methods often cannot withstand such targeted attacks; once a system is attacked or counterfeited by criminals, serious losses to people's lives and property can easily result.
Disclosure of Invention
The invention provides a bimodal identity authentication method, a bimodal identity authentication device and a storage medium to address the defects of the prior art.
The technical scheme for solving the technical problems is as follows: a bimodal identity authentication method comprises the following steps:
s1: importing a plurality of training data, wherein each training data comprises a face image and voice data;
s2: respectively carrying out picture characteristic analysis on the face images in the training data to obtain face characteristic vectors;
s3: respectively carrying out voice feature analysis on voice data in the training data to obtain voiceprint feature vectors;
s4: constructing a training model, and training the training model according to all the face feature vectors and all the voiceprint feature vectors to obtain a bimodal identity authentication model;
s5: and importing data to be authenticated, wherein the data to be authenticated comprises a face image to be authenticated and voice data to be authenticated, and authenticating the identity of the face image to be authenticated and the voice data to be authenticated through the bimodal identity authentication model to obtain an identity authentication result.
Another technical solution of the present invention for solving the above technical problems is as follows: a bimodal identity authentication apparatus comprising:
the data import module is used for importing a plurality of training data, and each training data comprises a face image and voice data;
the image feature analysis module is used for respectively carrying out image feature analysis on the face images in the training data to obtain face feature vectors;
the voice feature analysis module is used for respectively carrying out voice feature analysis on the voice data in the training data to obtain voiceprint feature vectors;
the model training module is used for constructing a training model, and training the training model according to all the face feature vectors and all the voiceprint feature vectors to obtain a bimodal identity authentication model;
and the identity authentication result obtaining module is used for importing data to be authenticated, wherein the data to be authenticated comprises a face image to be authenticated and voice data to be authenticated, and performing identity authentication on the face image to be authenticated and the voice data to be authenticated through the bimodal identity authentication model to obtain an identity authentication result.
Another technical solution of the present invention for solving the above technical problems is as follows: a bimodal identity authentication device comprises a memory, a processor and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the bimodal identity authentication method described above is implemented.
Another technical solution of the present invention for solving the above technical problems is as follows: a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a bimodal identity authentication method as described above.
The invention has the following beneficial effects: face feature vectors are obtained through picture feature analysis of the face images in the training data, voiceprint feature vectors are obtained through voice feature analysis of the voice data in the training data, a bimodal identity authentication model is obtained by training the training model on all face feature vectors and all voiceprint feature vectors, and an identity authentication result is obtained by authenticating the face image and voice data to be authenticated through the bimodal identity authentication model. The feature information of the two modalities complements each other, which effectively compensates for the susceptibility of single-modality biometric authentication to spoofing attacks, environmental noise and the like, and further improves recognition accuracy.
Drawings
Fig. 1 is a schematic flowchart of a bimodal identity authentication method according to an embodiment of the present invention;
fig. 2 is a block diagram of a bimodal identity authentication apparatus according to an embodiment of the present invention.
Detailed Description
The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.
Fig. 1 is a schematic flowchart of a bimodal identity authentication method according to an embodiment of the present invention.
As shown in fig. 1, a bimodal identity authentication method includes the following steps:
s1: importing a plurality of training data, wherein each training data comprises a face image and voice data;
s2: respectively carrying out picture characteristic analysis on the face images in the training data to obtain face characteristic vectors;
s3: respectively carrying out voice feature analysis on voice data in the training data to obtain voiceprint feature vectors;
s4: constructing a training model, and training the training model according to all the face feature vectors and all the voiceprint feature vectors to obtain a bimodal identity authentication model;
s5: and importing data to be authenticated, wherein the data to be authenticated comprises a face image to be authenticated and voice data to be authenticated, and authenticating the identity of the face image to be authenticated and the voice data to be authenticated through the bimodal identity authentication model to obtain an identity authentication result.
It should be understood that the face image may be picture data containing a human face, and the voice data may be audio recordings of the user's speech.
It should be understood that the face feature vector and the voiceprint feature vector represent face and voiceprint feature information, respectively.
In the above embodiment, the face feature vectors are obtained by picture feature analysis of the face image in each training datum, the voiceprint feature vectors are obtained by voice feature analysis of the voice data in each training datum, the bimodal identity authentication model is obtained by training the training model on all face feature vectors and all voiceprint feature vectors, and the identity authentication result is obtained by authenticating the face image and voice data to be authenticated through the bimodal identity authentication model.
Optionally, as an embodiment of the present invention, the step S2 process includes:
respectively carrying out face detection on the face images in the training data based on an MTCNN model to obtain detected face images corresponding to the face images;
and respectively carrying out picture feature extraction on each detected face picture based on a FaceNet model to obtain the face feature vector corresponding to each face picture.
It should be understood that MTCNN stands for Multi-Task Cascaded Convolutional Neural Network and consists of three cascaded lightweight CNNs: P-Net, R-Net and O-Net. Image data passes through the three networks in sequence, and face detection results and facial key point detection results are finally output.
It should be understood that FaceNet is a face recognition model whose main idea is to map a face image into a multidimensional embedding space and to represent facial similarity by spatial distance: images of the same face lie close together, while images of different faces lie far apart. Face recognition can therefore be realized through this spatial mapping of face images.
It should be understood that the FaceNet model may be replaced with other face recognition models such as InsightFace.
Specifically, the picture containing a face (i.e., the face image) first undergoes face detection and alignment through the existing MTCNN model and is then fed into the existing FaceNet model for feature extraction, which yields a feature vector $e_f \in \mathbb{R}^{d_1}$ (i.e., the face feature vector) representing the face feature information, where $d_1$ denotes the vector dimension.
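By way of illustration, a minimal sketch of this detection-alignment-and-embedding pipeline is given below. It assumes the open-source facenet-pytorch package, which bundles an MTCNN implementation and a FaceNet-style InceptionResnetV1 backbone; the package choice, the pretrained weights and the file name face.jpg are illustrative assumptions, not details fixed by the invention.

```python
import torch
from facenet_pytorch import MTCNN, InceptionResnetV1
from PIL import Image

# Face detector/aligner (the P-Net / R-Net / O-Net cascade) and the embedding network.
mtcnn = MTCNN(image_size=160)
facenet = InceptionResnetV1(pretrained='vggface2').eval()

img = Image.open('face.jpg')  # hypothetical input file
aligned = mtcnn(img)          # detected and aligned face crop, or None if no face is found
if aligned is not None:
    with torch.no_grad():
        e_f = facenet(aligned.unsqueeze(0))  # face feature vector e_f (512-dimensional here)
```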
In the above embodiment, performing picture feature analysis on the face image in each training datum yields the face feature vectors, so accurate face features can be extracted, providing a basis for subsequent data processing and improving recognition accuracy, while also offering strong extensibility and a wide range of applications.
Optionally, as an embodiment of the present invention, the process of step S3 includes:
respectively preprocessing the voice data in the training data to obtain processed voice data corresponding to the face images;
and respectively carrying out voice feature extraction on each processed voice data based on an x-vector model to obtain a voiceprint feature vector corresponding to each face image.
It should be understood that the x-vector model is currently a mainstream baseline framework in the field of voiceprint recognition; thanks to the statistics pooling layer in its network, it can accept input of any length and convert it into a fixed-length feature representation. In addition, a data augmentation strategy with added noise and reverberation is applied during training, making the model more robust to interference such as noise and reverberation.
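By way of illustration, a minimal sketch of such a statistics pooling layer is given below, assuming frame-level features arranged as a (batch, channels, time) tensor; the layout and the PyTorch implementation are illustrative assumptions, not details fixed by the invention.

```python
import torch

def statistics_pooling(frames):
    # frames: (batch, channels, time) frame-level features of arbitrary duration.
    mean = frames.mean(dim=2)              # per-channel mean over time
    std = frames.std(dim=2)                # per-channel standard deviation over time
    return torch.cat([mean, std], dim=1)   # fixed-length utterance-level vector
```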
It should be understood that the x-vector model can be replaced by other voiceprint recognition models such as i-vector.
Specifically, the audio containing the voiceprint information (i.e., the voice data) first undergoes preprocessing such as framing, windowing and pre-emphasis, which removes adverse effects such as aliasing, higher-harmonic distortion and high-frequency attenuation introduced by the human vocal organs and by the equipment that captures the speech signal. It is then fed into the existing x-vector model for feature extraction, which yields a feature vector $e_v$ (i.e., the voiceprint feature vector) representing the voiceprint feature information and corresponding to the picture.
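By way of illustration, a minimal NumPy sketch of this preprocessing stage is given below; the frame length, hop size and pre-emphasis coefficient are assumed values (25 ms frames with a 10 ms hop at 16 kHz and a coefficient of 0.97 are common defaults) and are not fixed by the invention.

```python
import numpy as np

def preprocess_speech(signal, frame_len=400, hop=160, alpha=0.97):
    # Pre-emphasis: boost high frequencies attenuated by vocal organs and equipment.
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    # Framing: split the signal into overlapping short-time frames
    # (assumes the signal is at least frame_len samples long).
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx]
    # Windowing: taper each frame to reduce spectral leakage.
    return frames * np.hamming(frame_len)
```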
In the above embodiment, preprocessing each piece of voice data yields the processed voice data, and extracting voice features from each processed piece based on the x-vector model yields the voiceprint feature vectors, providing a basis for subsequent data processing and improving recognition accuracy, while also offering strong extensibility and a wide range of applications.
Optionally, as an embodiment of the present invention, the process of step S4 includes:
s41: constructing a training model, and respectively carrying out fusion analysis on each face feature vector and the voiceprint feature vector corresponding to each face image to obtain a global feature vector corresponding to each face image;
s42: respectively carrying out normalization processing on each global feature vector to obtain a predicted value corresponding to each face image;
s43: importing the picture real values corresponding to the face images, and calculating a loss value between each predicted value and the corresponding picture real value to obtain the loss value corresponding to each face image;
s44: and updating the parameters of the training model by using a back-propagation algorithm, a gradient descent algorithm and the loss values, and returning to the step S1 until a preset number of iterations is reached, finally obtaining the bimodal identity authentication model.
It should be understood that the model obtained by training after fusing the information of the two modalities (i.e., the bimodal identity authentication model) is more robust and fault-tolerant than a single-modality identity authentication model and has more accurate recognition capability.
Specifically, each predicted value $\hat{y}$ is compared against the corresponding imported sample ground truth $y$ (i.e., the picture real value) by the cross-entropy loss function to obtain the loss values; all learnable parameters involved in steps S1 to S4 are then iteratively updated through the back-propagation mechanism and gradient descent until the loss is minimized, which completes the training of the whole model.
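By way of illustration, a schematic PyTorch version of this optimization loop is given below; the stand-in model architecture, the feature dimension of 512, the 100 identity classes and the iteration count are placeholder assumptions, and the synthetic tensors exist only to make the sketch runnable.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-ins: fused feature vectors and their identity labels.
features = torch.randn(64, 512)
labels = torch.randint(0, 100, (64,))
loader = DataLoader(TensorDataset(features, labels), batch_size=16)

# Placeholder for the training model mapping features to per-identity scores.
model = nn.Sequential(nn.Linear(512, 256), nn.Tanh(), nn.Linear(256, 100))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # gradient descent
loss_fn = nn.CrossEntropyLoss()  # cross-entropy between prediction and ground truth

for epoch in range(10):  # preset number of iterations (assumed value)
    for x, y in loader:
        loss = loss_fn(model(x), y)
        optimizer.zero_grad()
        loss.backward()    # back-propagation of the loss
        optimizer.step()   # parameter update to minimize the loss
```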
In this embodiment, the bimodal identity authentication model is obtained by training the training model on all face feature vectors and all voiceprint feature vectors; it is more robust and fault-tolerant than a single-modality authentication model, has more accurate recognition capability, lets the feature information of the two modalities complement each other, and effectively compensates for the susceptibility of single-modality biometric authentication to spoofing attacks, environmental noise and the like.
Optionally, as an embodiment of the present invention, in S41, the process of performing fusion analysis on each face feature vector and the voiceprint feature vector corresponding to each face image respectively to obtain a global feature vector corresponding to each face image includes:
calculating the face hidden feature vector of each face feature vector through a first formula to obtain the face hidden feature vector corresponding to each face image, wherein the first formula is:
$h_f = \tanh(w_f e_f + b_f)$,
where $h_f$ is the face hidden feature vector, $\tanh$ is the tanh activation function, $w_f$ is the learnable weight matrix that transforms the face feature vector $e_f$, $b_f$ is the corresponding bias term, and $e_f$ is the face feature vector;
calculating the voiceprint hidden feature vector of each voiceprint feature vector through a second formula to obtain the voiceprint hidden feature vector corresponding to each face image, wherein the second formula is:
$h_v = \tanh(w_v e_v + b_v)$,
where $h_v$ is the voiceprint hidden feature vector, $\tanh$ is the tanh activation function, $w_v$ is the learnable weight matrix that transforms the voiceprint feature vector $e_v$, $e_v$ is the voiceprint feature vector, and $b_v$ is the corresponding bias term;
calculating the gating vector from each face feature vector and the voiceprint feature vector corresponding to each face image through a third formula to obtain the gating vector corresponding to each face image, wherein the third formula is:
$z = \sigma(w_1 e_f + w_2 e_v)$,
where $z$ is the gating vector, $\sigma$ is the sigmoid activation function, $w_1$ is the learnable weight matrix for the face feature vector $e_f$, $w_2$ is the learnable weight matrix for the voiceprint feature vector $e_v$, $e_f$ is the face feature vector, and $e_v$ is the voiceprint feature vector;
calculating the global feature vector from each face feature vector, the voiceprint feature vector corresponding to each face image and the gating vector corresponding to each face image through a fourth formula to obtain the global feature vector corresponding to each face image, wherein the fourth formula is:
$h_G = z h_f + (1 - z) h_v$,
where $h_G$ is the global feature vector, $z$ is the gating vector, $h_f$ is the face hidden feature vector, and $h_v$ is the voiceprint hidden feature vector.
It should be understood that, according to formula (1) (i.e., the first formula), the face feature vector $e_f$ undergoes a nonlinear transformation that maps it from its original vector space into a new vector space $S_P$. Formula (1) is $h_f = \tanh(w_f e_f + b_f)$, where $h_f$ is the hidden feature vector after the nonlinear transformation, $\tanh$ is the tanh activation function, $w_f$ is the learnable weight matrix that transforms $e_f$, and $b_f$ is the corresponding bias term.
It should be understood that, according to formula (2) (i.e., the second formula), the voiceprint feature vector $e_v$ undergoes a nonlinear transformation that maps it from its original vector space into the same vector space $S_P$ as the face hidden feature vector. Formula (2) is $h_v = \tanh(w_v e_v + b_v)$, where $h_v$ is the hidden feature vector after the nonlinear transformation, $w_v$ is the learnable weight matrix that transforms $e_v$, and $b_v$ is the corresponding bias term.
Specifically, according to formula (3) (i.e., the third formula), the face feature vector $e_f$ and the voiceprint feature vector $e_v$ undergo transformation, addition and activation operations. Formula (3) is $z = \sigma(w_1 e_f + w_2 e_v)$, where $z$ is the resulting gating vector (its values lie in the range 0 to 1), $\sigma$ is the sigmoid activation function, and $w_1$ and $w_2$ are the respective learnable weight matrices.
Specifically, according to formula (4) (i.e., the fourth formula), the gating vector $z$ fuses the hidden feature vectors $h_f$ (i.e., the face hidden feature vector) and $h_v$ (i.e., the voiceprint hidden feature vector) within the same vector space $S_P$. Formula (4) is $h_G = z h_f + (1 - z) h_v$, where $h_G$ is the global feature vector, representing the fused joint face-and-voiceprint features of the same person.
It should be understood that the gating vector $z$ controls the contribution of the different features to the overall output; that is, it adaptively adjusts the weights of the face and voiceprint features within the global feature, complementing the information of the two modalities to obtain a more discriminative global feature and thereby judge the user's identity more accurately. Even if one modality's information fails, the other modality can still work smoothly, with $z$ degenerating toward an all-0 or all-1 vector.
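By way of illustration, the gated fusion of formulas (1) to (4) can be written as a single module; the PyTorch sketch below follows the formulas directly, with the dimensions of the face, voiceprint and shared spaces left as constructor arguments since the invention does not fix their values.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, d1, d2, dp):
        super().__init__()
        self.wf = nn.Linear(d1, dp)              # w_f and b_f of formula (1)
        self.wv = nn.Linear(d2, dp)              # w_v and b_v of formula (2)
        self.w1 = nn.Linear(d1, dp, bias=False)  # w_1 of formula (3)
        self.w2 = nn.Linear(d2, dp, bias=False)  # w_2 of formula (3)

    def forward(self, e_f, e_v):
        h_f = torch.tanh(self.wf(e_f))                  # formula (1): face hidden vector
        h_v = torch.tanh(self.wv(e_v))                  # formula (2): voiceprint hidden vector
        z = torch.sigmoid(self.w1(e_f) + self.w2(e_v))  # formula (3): gating vector
        return z * h_f + (1 - z) * h_v                  # formula (4): global feature h_G

# Example: fuse a 512-d face vector with a 512-d voiceprint vector into a 256-d h_G.
fusion = GatedFusion(d1=512, d2=512, dp=256)
h_G = fusion(torch.randn(4, 512), torch.randn(4, 512))
```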
In this embodiment, the global feature vectors are obtained by fusing each face feature vector with the corresponding voiceprint feature vector, so that the feature information of the two modalities complements each other, the user's identity is judged accurately, and the susceptibility of single-modality biometric authentication to spoofing attacks, environmental noise and the like is effectively compensated for.
Optionally, as an embodiment of the present invention, the process of step S42 includes:
performing normalization calculation on each global feature vector through a fifth formula to obtain the predicted value corresponding to each face image, wherein the fifth formula is:
$\hat{y} = \mathrm{AMSoftmax}(h_G)$,
where $h_G$ is the global feature vector and $\hat{y}$ is the predicted value.
It should be understood that AMSoftmax stands for Additive Margin softmax, an improvement on the original softmax function that makes intra-class samples more compact and inter-class samples more separated during classification, achieving a better classification effect.
In the above embodiment, performing normalization calculation on each global feature vector through the fifth formula yields the predicted values, making intra-class samples more compact and inter-class samples more separated during classification and thereby achieving a better classification effect.
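By way of illustration, a common form of the additive-margin softmax classification head is sketched below; the scale of 30.0 and margin of 0.35 are typical values from the AM-softmax literature, not values stated in the invention.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AMSoftmaxHead(nn.Module):
    def __init__(self, dp, n_classes, s=30.0, m=0.35):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(dp, n_classes))
        self.s, self.m = s, m

    def forward(self, h_g, labels):
        # Cosine similarity between the normalized global feature and class weights.
        cos = F.normalize(h_g, dim=1) @ F.normalize(self.weight, dim=0)
        # Subtract the additive margin m from the target-class cosine only.
        margin = torch.zeros_like(cos).scatter_(1, labels.unsqueeze(1), self.m)
        return self.s * (cos - margin)  # scaled logits, fed to cross-entropy loss
```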
Optionally, as an embodiment of the present invention, in step S43, the process of calculating the loss value between each predicted value and the corresponding picture real value to obtain the loss value corresponding to each face image includes:
calculating the loss value between each predicted value and the corresponding picture real value through a sixth formula to obtain the loss value corresponding to each face image, wherein the sixth formula is:
$L = -\sum_i y_i \log \hat{y}_i$,
where $y$ is the picture real value, $\hat{y}$ is the predicted value, and $L$ is the loss value.
It should be understood that the sixth formula is a loss function used to evaluate the degree of inconsistency between the model's predicted value and the true value; training or optimizing the model is precisely the process of minimizing this loss function, and the smaller the loss, the closer the model's predictions are to the true values and the more robust the model.
In this embodiment, the loss values are obtained by evaluating each predicted value against the corresponding picture real value through the sixth formula, so that better parameters can be learned and the recognition accuracy of the model improved.
Optionally, as another embodiment, the invention comprises an information acquisition module, an information processing module, an information base module and a control module.
The information acquisition module is used for acquiring the face image and the speaking voice of the verified user and sending the face image and the speaking voice into the information processing module;
the information processing module is used for processing the face image and the speaking voice acquired by the information acquisition module, and particularly, the method of the invention is used for carrying out data preprocessing, feature extraction and feature fusion operation to obtain a global feature vector representing the face and voice print bimodal information of the user and then sending the global feature vector into the information base module;
the information base module is a database and is mainly used for storing the global feature vector and the corresponding user real identity information which are obtained by the information acquisition module and the information processing module when each user registers an account. Each user has one and only one global feature vector, which can be regarded as an electronic tag for identifying the identity ID of the user in the system. And the module supports feature vector similarity comparison, when the identity of a user is verified, the global feature vector obtained by the user through the information acquisition module and the information processing module and all feature vectors in the database are subjected to cosine similarity calculation to obtain a similarity score, and after the feature vector with the highest similarity to the global feature vector in the database is found, whether the user is a registered user is further judged according to a set threshold value. If the similarity score is larger than the threshold value, representing by a verification signal 1; below the threshold value is indicated by a verification signal 0. Finally, transmitting a verification signal 1 or 0 to the control module;
the control module is used for receiving the verification signal of the information base, making a decision and controlling whether the corresponding equipment allows the verification user to pass. A value of 1 indicates that the identity of the authenticated user is valid, the pass is allowed, and a value of 0 indicates that the pass and subsequent operations are not allowed.
Fig. 2 is a block diagram of a bimodal identity authentication apparatus according to an embodiment of the present invention.
Optionally, as another embodiment of the present invention, as shown in fig. 2, a bimodal identity authentication apparatus includes:
the data import module is used for importing a plurality of training data, and each training data comprises a face image and voice data;
the image feature analysis module is used for respectively carrying out image feature analysis on the face images in the training data to obtain face feature vectors;
the voice feature analysis module is used for respectively carrying out voice feature analysis on the voice data in the training data to obtain voiceprint feature vectors;
the model training module is used for constructing a training model, and training the training model according to all the face feature vectors and all the voiceprint feature vectors to obtain a bimodal identity authentication model;
and the identity authentication result obtaining module is used for importing data to be authenticated, wherein the data to be authenticated comprises a face image to be authenticated and voice data to be authenticated, and performing identity authentication on the face image to be authenticated and the voice data to be authenticated through the bimodal identity authentication model to obtain an identity authentication result.
Optionally, another embodiment of the present invention provides a bimodal identity authentication device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor; when the computer program is executed by the processor, the bimodal identity authentication method described above is implemented. The device may be a computer or the like.
Optionally, another embodiment of the invention provides a computer-readable storage medium, which stores a computer program that, when executed by a processor, implements the bimodal identity authentication method as described above.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, or all or part of the technical solution, can be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. A bimodal identity authentication method is characterized by comprising the following steps:
s1: importing a plurality of training data, wherein each training data comprises a face image and voice data;
s2: respectively carrying out picture characteristic analysis on the face images in the training data to obtain face characteristic vectors;
s3: respectively carrying out voice feature analysis on voice data in the training data to obtain voiceprint feature vectors;
s4: constructing a training model, and training the training model according to all the face feature vectors and all the voiceprint feature vectors to obtain a bimodal identity authentication model;
s5: and importing data to be authenticated, wherein the data to be authenticated comprises a face image to be authenticated and voice data to be authenticated, and authenticating the identity of the face image to be authenticated and the voice data to be authenticated through the bimodal identity authentication model to obtain an identity authentication result.
2. The bimodal identity authentication method according to claim 1, wherein the step S2 procedure comprises:
respectively carrying out face detection on the face images in the training data based on an MTCNN model to obtain detected face images corresponding to the face images;
and respectively carrying out picture feature extraction on each detected face picture based on a FaceNet model to obtain the face feature vector corresponding to each face picture.
3. The bimodal identity authentication method according to claim 2, wherein the process of the step S3 includes:
respectively preprocessing the voice data in the training data to obtain processed voice data corresponding to the face images;
and respectively carrying out voice feature extraction on each processed voice data based on an x-vector model to obtain a voiceprint feature vector corresponding to each face image.
4. The bimodal identity authentication method according to claim 3, wherein the process of the step S4 includes:
s41: constructing a training model, and respectively carrying out fusion analysis on each face feature vector and the voiceprint feature vector corresponding to each face image to obtain a global feature vector corresponding to each face image;
s42: respectively carrying out normalization processing on each global feature vector to obtain a predicted value corresponding to each face image;
s43: importing the picture real values corresponding to the face images, and calculating a loss value between each predicted value and the corresponding picture real value to obtain the loss value corresponding to each face image;
s44: and updating the parameters of the training model by using a back-propagation algorithm, a gradient descent algorithm and the loss values, and returning to the step S1 until a preset number of iterations is reached, finally obtaining the bimodal identity authentication model.
5. The bimodal identity authentication method according to claim 4, wherein in the step S41, the process of performing fusion analysis on each face feature vector and the voiceprint feature vector corresponding to each face image to obtain the global feature vector corresponding to each face image comprises:
calculating the face hidden feature vector of each face feature vector through a first formula to obtain the face hidden feature vector corresponding to each face image, wherein the first formula is:
$h_f = \tanh(w_f e_f + b_f)$,
where $h_f$ is the face hidden feature vector, $\tanh$ is the tanh activation function, $w_f$ is the learnable weight matrix that transforms the face feature vector $e_f$, $b_f$ is the corresponding bias term, and $e_f$ is the face feature vector;
calculating the voiceprint hidden feature vector of each voiceprint feature vector through a second formula to obtain the voiceprint hidden feature vector corresponding to each face image, wherein the second formula is:
$h_v = \tanh(w_v e_v + b_v)$,
where $h_v$ is the voiceprint hidden feature vector, $\tanh$ is the tanh activation function, $w_v$ is the learnable weight matrix that transforms the voiceprint feature vector $e_v$, $e_v$ is the voiceprint feature vector, and $b_v$ is the corresponding bias term;
calculating the gating vector from each face feature vector and the voiceprint feature vector corresponding to each face image through a third formula to obtain the gating vector corresponding to each face image, wherein the third formula is:
$z = \sigma(w_1 e_f + w_2 e_v)$,
where $z$ is the gating vector, $\sigma$ is the sigmoid activation function, $w_1$ is the learnable weight matrix for the face feature vector $e_f$, $w_2$ is the learnable weight matrix for the voiceprint feature vector $e_v$, $e_f$ is the face feature vector, and $e_v$ is the voiceprint feature vector;
calculating the global feature vector from each face feature vector, the voiceprint feature vector corresponding to each face image and the gating vector corresponding to each face image through a fourth formula to obtain the global feature vector corresponding to each face image, wherein the fourth formula is:
$h_G = z h_f + (1 - z) h_v$,
where $h_G$ is the global feature vector, $z$ is the gating vector, $h_f$ is the face hidden feature vector, and $h_v$ is the voiceprint hidden feature vector.
6. The bimodal identity authentication method according to claim 4, wherein the process of the step S42 includes:
performing normalization calculation on each global feature vector through a fifth formula to obtain the predicted value corresponding to each face image, wherein the fifth formula is:
$\hat{y} = \mathrm{AMSoftmax}(h_G)$,
where $h_G$ is the global feature vector and $\hat{y}$ is the predicted value.
7. The bimodal identity authentication method according to claim 4, wherein in step S43, the process of calculating the loss value between each predicted value and the corresponding picture real value to obtain the loss value corresponding to each face image comprises:
calculating the loss value between each predicted value and the corresponding picture real value through a sixth formula to obtain the loss value corresponding to each face image, wherein the sixth formula is:
$L = -\sum_i y_i \log \hat{y}_i$,
where $y$ is the picture real value, $\hat{y}$ is the predicted value, and $L$ is the loss value.
8. A bimodal identity authentication apparatus, comprising:
the data import module is used for importing a plurality of training data, and each training data comprises a face image and voice data;
the image feature analysis module is used for respectively carrying out image feature analysis on the face images in the training data to obtain face feature vectors;
the voice feature analysis module is used for respectively carrying out voice feature analysis on the voice data in the training data to obtain voiceprint feature vectors;
the model training module is used for constructing a training model, and training the training model according to all the face feature vectors and all the voiceprint feature vectors to obtain a bimodal identity authentication model;
and the identity authentication result obtaining module is used for importing data to be authenticated, wherein the data to be authenticated comprises a face image to be authenticated and voice data to be authenticated, and performing identity authentication on the face image to be authenticated and the voice data to be authenticated through the bimodal identity authentication model to obtain an identity authentication result.
9. A bimodal identity authentication system comprising a memory, a processor and a computer program stored in said memory and executable on said processor, characterized in that when said processor executes said computer program, the bimodal identity authentication method as claimed in any one of claims 1 to 7 is implemented.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the bimodal identity authentication method as claimed in any one of claims 1 to 7.
CN202111640915.5A 2021-12-29 2021-12-29 Bimodal identity authentication method, device and storage medium Pending CN114398611A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111640915.5A CN114398611A (en) 2021-12-29 2021-12-29 Bimodal identity authentication method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111640915.5A CN114398611A (en) 2021-12-29 2021-12-29 Bimodal identity authentication method, device and storage medium

Publications (1)

Publication Number Publication Date
CN114398611A 2022-04-26

Family

ID=81228583

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111640915.5A Pending CN114398611A (en) 2021-12-29 2021-12-29 Bimodal identity authentication method, device and storage medium

Country Status (1)

Country Link
CN (1) CN114398611A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112397089A (en) * 2019-08-19 2021-02-23 中国科学院自动化研究所 Method and device for identifying identity of voice speaker, computer equipment and storage medium
CN112397089B (en) * 2019-08-19 2023-07-04 中国科学院自动化研究所 Speech generator identity recognition method, device, computer equipment and storage medium
CN117576763A (en) * 2024-01-11 2024-02-20 杭州世平信息科技有限公司 Identity recognition method and system based on voiceprint information and face information in cloud environment

Similar Documents

Publication Title
KR102239129B1 (en) End-to-end speaker recognition using deep neural network
Gonzalez-Rodriguez et al. Bayesian analysis of fingerprint, face and signature evidences with automatic biometric systems
WO2017215558A1 (en) Voiceprint recognition method and device
Chen et al. Robust deep feature for spoofing detection—The SJTU system for ASVspoof 2015 challenge
Aronowitz et al. Multi-modal biometrics for mobile authentication
AU2019200711B2 (en) Biometric verification
Khoury et al. Bi-modal biometric authentication on mobile phones in challenging conditions
CN114398611A (en) Bimodal identity authentication method, device and storage medium
CN111932269B (en) Equipment information processing method and device
CN108269575B (en) Voice recognition method for updating voiceprint data, terminal device and storage medium
WO2019136912A1 (en) Electronic device, identity authentication method and system, and storage medium
WO2019200744A1 (en) Self-updated anti-fraud method and apparatus, computer device and storage medium
CN108875463B (en) Multi-view vector processing method and device
CN113488073B (en) Fake voice detection method and device based on multi-feature fusion
CN113223536B (en) Voiceprint recognition method and device and terminal equipment
Zhou et al. Deception detecting from speech signal using relevance vector machine and non-linear dynamics features
CN110111798B (en) Method, terminal and computer readable storage medium for identifying speaker
CN112560710B (en) Method for constructing finger vein recognition system and finger vein recognition system
Sholokhov et al. Voice biometrics security: Extrapolating false alarm rate via hierarchical Bayesian modeling of speaker verification scores
CN115293235A (en) Method for establishing risk identification model and corresponding device
CN111028847B (en) Voiceprint recognition optimization method based on back-end model and related device
Zhang et al. A highly stealthy adaptive decay attack against speaker recognition
CN114168788A (en) Audio audit processing method, device, equipment and storage medium
Bartuzi et al. Mobibits: Multimodal mobile biometric database
CN110991228A (en) Improved PCA face recognition algorithm resistant to illumination influence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination