CN112967724A - Long-sequence biological Hash authentication method based on feature fusion - Google Patents

Publication number
CN112967724A
Authority
CN
China
Prior art keywords
hash
biological
authentication
matrix
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110135480.2A
Other languages
Chinese (zh)
Other versions
CN112967724B (en)
Inventor
黄羿博
安丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwest Normal University
Original Assignee
Northwest Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwest Normal University filed Critical Northwest Normal University
Priority to CN202110135480.2A priority Critical patent/CN112967724B/en
Publication of CN112967724A publication Critical patent/CN112967724A/en
Application granted granted Critical
Publication of CN112967724B publication Critical patent/CN112967724B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 - Speaker identification or verification techniques
    • G10L17/02 - Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 - Speaker identification or verification techniques
    • G10L17/20 - Pattern transformations or operations aimed at increasing system robustness, e.g. against channel noise or different working conditions
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 - Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/001 - Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols using chaotic signals
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 - Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/32 - Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials
    • H04L9/3226 - Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials using a predetermined code, e.g. password, passphrase or PIN
    • H04L9/3231 - Biological data, e.g. fingerprint, voice or retina
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 - Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/32 - Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials
    • H04L9/3236 - Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials using cryptographic hash functions

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Collating Specific Patterns (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

The invention discloses a long-sequence biological Hash authentication method based on feature fusion. After a voice signal is pre-emphasized, framed, and windowed, Mel-frequency cepstral coefficient (MFCC) features and constant Q cepstral coefficient (CQCC) features are extracted, and the spatial cosine values of adjacent frames are computed to obtain a biometric feature vector. A pseudo-random matrix is orthogonalized by Schmidt orthogonalization; the biometric feature vector is point-multiplied with row vectors of the resulting orthogonal set matrix to obtain a square matrix; the square matrix is chaotically shifted to obtain an encrypted square matrix; and the encrypted square matrix is projected to a lower dimension using row vectors of the orthogonal set matrix to obtain a biometric security template. The template is binarized to generate the long bio-hash sequence $h(n)$. The user's authentication voice is processed by the same steps to obtain the long bio-hash sequence $H(n)$. The bit error rate between $H(n)$ and $h(n)$ is computed from the Hamming distance; if it is less than or equal to the threshold, authentication passes, and the result is fed back to the authenticating user. The method effectively reduces the probability that different voice segments are judged to be the same segment, improves the authentication rate, and is robust to common noise backgrounds at low signal-to-noise ratios.

Description

Long-sequence biological Hash authentication method based on feature fusion
Technical Field
The invention belongs to the technical field of voice authentication, and relates to a long-sequence biological Hash authentication method based on feature fusion.
Background
In recent years, biometric authentication systems have been used more and more widely, and storing unprotected biometric data poses a serious privacy threat. Because personal biometric traits are scarce, once they are lost, sensitive information about the user is exposed, creating a security risk. At present, voice authentication has security vulnerabilities at every stage, from voice acquisition through data storage to the voice hash database. The hash sequences constructed by existing authentication methods are short, so the same hash sequence may arise from different voice segments; this low distinguishability among users leads to a high false recognition rate and poor authentication performance. Security and distinguishability have therefore become important challenges in voice biometric content authentication.
Biometric authentication methods widely use face, palm print, fingerprint, signature, iris, and similar traits, but rarely voice features. In recent years, voice perceptual hash authentication methods have achieved good authentication performance and can resist noise interference during channel transmission, but they lack security. Bio-hashing, because it is efficient to compute and secure, is widely used to protect the privacy of biometric features. Combining voice perceptual hashing with bio-hashing can therefore both improve authentication performance and secure the voice features. The speech signal features currently extracted include short-time cross-correlation coefficients, short-time zero-crossing rates, Mel-frequency cepstral coefficients, linear prediction cepstral coefficients, spectral entropy, wavelet coefficients, cochleagrams, the modulated complex lapped transform (MCLT), spectrograms, and fusions of these features.
In the prior art, voice authentication methods use short hash sequences, so distinguishability is insufficient; robustness in common low signal-to-noise-ratio noise environments is low; the complexity of the methods limits real-time performance; and voice biometric features are easily leaked.
Disclosure of Invention
The invention aims to provide a long-sequence biological Hash authentication method based on feature fusion that balances distinguishability and robustness while also achieving efficient authentication.
To achieve this aim, the invention adopts the following technical scheme: a long-sequence biological Hash authentication method based on feature fusion, carried out according to the following steps:
Registration stage:
Step 1: Pre-emphasize the input voice signal $s(n)$, then frame and window it: the pre-emphasized voice signal $x(n)$ is divided into $N$ frames to obtain the processed signal $x(n, m)$, where $n$ ($n = 1, 2, \ldots, N$) is the frame index and $m$ ($m = 1, 2, \ldots, N$) is the index of the data within each frame;
Step 2: Robust feature extraction
MFCC feature extraction
The processed signal is converted into a frequency-domain signal by the discrete Fourier transform. The power spectrum is then computed and passed through the Mel filter bank to obtain the Mel spectrum $P_n = \{P(n, l_1) \mid n = 1, 2, \ldots, N;\ l_1 = 1, 2, \ldots, L_1\}$. Taking the logarithm of the Mel spectrum and applying the discrete cosine transform (DCT) yields the Mel cepstral coefficients $MFCC_n = \{MFCC(n, i) \mid n = 1, 2, \ldots, N;\ i = 1, 2, \ldots, L_1\}$;
CQCC feature extraction
The constant Q transform is applied to the processed signal to obtain the transformed spectrum. The power spectrum is then computed, its logarithm is taken, and it is uniformly resampled to obtain the transformed features $R_n = \{R(n, l_2) \mid n = 1, 2, \ldots, N;\ l_2 = 1, 2, \ldots, L_2\}$. The discrete cosine transform is then applied to obtain the constant Q cepstral coefficients $CQCC_n = \{CQCC(n, j) \mid n = 1, 2, \ldots, N;\ j = 1, 2, \ldots, J_2\}$;
3. Calculating the spatial cosine values of adjacent frames
The extracted MFCC and CQCC feature values are written uniformly as $MQ(n, i)$ ($n = 1, 2, \ldots, N$; $i = 1, 2, \ldots, L$), with $L = L_1 = L_2$. The row vectors of the MFCC feature values and of the CQCC feature values are averaged separately to obtain $MQ(i)$ and $MQ_1(i)$ ($i = 1, 2, \ldots, L$). $MQ(i)$ and $MQ_1(i)$ are then spliced into the matrices $\Lambda_1 = [MQ_1\ MQ]$ and $\Lambda_2 = [MQ\ MQ_1]$, and the cosine value is computed for each row of $\Lambda_1$ and $\Lambda_2$ to obtain the biometric feature vector $F(n)$ ($n = 1, 2, \ldots, N_p$);
[Formula image: cosine value giving $F(n)$]
Step 3: Generate a pseudo-random matrix with the 2D-SIMM and apply Schmidt orthogonalization to it to obtain an orthogonal set matrix;
Step 4: Construct the biometric security template
Row vectors are extracted from the orthogonal set matrix to obtain the orthogonal row vector $V_1(n)$ ($n = 1, 2, \ldots, N_p$). Point multiplication of the biometric feature vector $F(n)$ with the orthogonal row vector $V_1(n)$ gives the square matrix $\Psi(n, n)$;
[Formula image: definition of $\Psi(n, n)$]
The square matrix $\Psi(n, n)$ is chaotically shifted to obtain the encrypted square matrix $\Psi^*(n, n)$. The orthogonal set matrix row vector $V_2(n)$ ($n = 1, 2, \ldots, N_p$) is then used to project $\Psi^*(n, n)$ to a lower dimension, giving the biometric security template $W(n)$:
$W(n) = V_2(n) \cdot \Psi^*(n, n) = [W(1), W(2), \ldots, W(N_p)]$
Step 5: Binarize the biometric security template $W(n)$ to generate the one-dimensional binary long bio-hash sequence $h(n)$,
[Formula image: binarization rule for $h(n)$]
where $h(1) = 0$ and $h(n)$ is the bio-hash value of each frame;
The long bio-hash sequence $h(n)$ is then stored in the cloud, completing the registration stage;
Authentication stage:
Step 1: The authenticating user provides an authentication voice, which is processed by steps 1 to 5 of the registration stage to obtain the long bio-hash sequence $H(n)$;
Step 2: Using the Hamming distance, compute the bit error rate $BER(h, H)$ between the long bio-hash sequence $H(n)$ obtained from the authentication voice and the long bio-hash sequence $h(n)$ stored in the cloud:
$BER(h, H) = \frac{1}{N_p} \sum_{n=1}^{N_p} h(n) \oplus H(n)$
where $\oplus$ is the exclusive-or operation and $N_p$ is the length of the bio-hash sequence;
Hash matching is described using a BER hypothesis test:
T0: if the long bio-hash sequences $h(n)$ and $H(n)$ of two voice segments have the same content, then $BER(h, H) \le \tau$ and authentication passes;
T1: if the long bio-hash sequences $h(n)$ and $H(n)$ of two voice segments have different content, then $BER(h, H) > \tau$ and authentication fails;
where $\tau$ is the perceptual authentication threshold;
Step 3: Feed the authentication result back to the authenticating user.
The authentication method is a long-sequence biological Hash authentication method based on the two-dimensional sinusoidal modulation map (2D-SIMM) and the cosine values of constant Q cepstral coefficients (CQCC). Using a long hash sequence improves collision resistance; the extracted frequency-domain spatial distance features are strongly robust; and the pseudo-random matrix generated by the 2D-SIMM protects the biometric features well and ensures that the bio-hash sequence is irreversible. The method is robust to common noise backgrounds at low signal-to-noise ratios and provides a security template for biometric features.
Compared with the prior art, the authentication method has the following advantages:
1) The method has good overall performance and addresses the problems of existing bio-hash authentication methods.
2) The extracted biometric features cope well with content-preserving operations such as volume adjustment, resampling, and MP3 compression, and matching accuracy is better in complex noise environments such as babble noise at low signal-to-noise ratios.
3) Using a long hash sequence effectively reduces the probability that different voice segments are judged to be the same segment and improves the authentication rate.
4) A ratio method is used to demonstrate the trapdoor one-way property of the bio-hash algorithm. The biometric security template generated with the 2D-SIMM is more secure and reduces the risk of biometric feature leakage.
Drawings
Fig. 1 is a flow chart of an authentication method of the present invention.
Fig. 2 is a flowchart of MFCC feature extraction in the authentication method of the present invention.
Fig. 3 is a flowchart of extracting CQCC features in the authentication method of the present invention.
FIG. 4 is a BER histogram of one voice matched against the remaining 1199 voices;
FIGS. 5(a) and 5(b) show the MFCC cosine values at 1065 bits and the CQCC cosine values at 1065 bits, respectively;
FIGS. 6(a) and 6(b) are the FRR-FAR curves for the MFCC cosine values (1065 bits) and the CQCC cosine values (1065 bits), respectively;
FIG. 7 is the FAR-FRR curve of the authentication method of the present invention;
FIG. 8 is a block diagram for verifying the trapdoor one-way property of the biometric security template;
FIGS. 9(a) and 9(b) show the differences between the features obtained with the correct key and the original features;
FIGS. 10(a) and 10(b) show the differences between the features obtained with the wrong key and the original features;
FIGS. 11(a) and 11(b) are Hamming code distances from and to, respectively;
FIGS. 12(a) and 12(b) show the biometric security templates with correct and incorrect chaotic shift, respectively.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The invention provides a long-sequence biological Hash authentication method based on feature fusion. Its flow chart is shown in FIG. 1, and the method is carried out according to the following steps:
Registration stage: the registering user's original voice undergoes feature extraction, a biometric security template is constructed, and the binary long hash sequence is finally stored in the cloud.
Step 1: Pre-emphasize the input speech signal $s(n)$ to obtain the pre-emphasized speech signal $x(n)$, then frame and window $x(n)$, using a Hamming window as the window function. The pre-emphasized speech signal $x(n)$ is divided into $N$ frames, giving the processed signal $x(n, m)$, where $n$ ($n = 1, 2, \ldots, N$) is the frame index and $m$ ($m = 1, 2, \ldots, N$) is the index of the data within each frame. The processed signal $x(n, m)$ is a time-domain signal.
Step 2: robust feature extraction
MFCC feature extraction
The time-domain signal is converted to a frequency-domain signal by the discrete Fourier transform (DFT). The power spectrum is then computed and passed through the Mel filter bank to obtain the Mel spectrum $P_n = \{P(n, l_1) \mid n = 1, 2, \ldots, N;\ l_1 = 1, 2, \ldots, L_1\}$. Taking the logarithm of the Mel spectrum and applying the discrete cosine transform (DCT) yields the Mel cepstral coefficients $MFCC_n = \{MFCC(n, i) \mid n = 1, 2, \ldots, N;\ i = 1, 2, \ldots, L_1\}$, as shown in FIG. 2;
CQCC feature extraction
The constant Q transform (CQT) is applied to the time-domain signal to obtain the transformed spectrum. The power spectrum is then computed, its logarithm is taken, and it is uniformly resampled to obtain the transformed features $R_n = \{R(n, l_2) \mid n = 1, 2, \ldots, N;\ l_2 = 1, 2, \ldots, L_2\}$. The discrete cosine transform is then applied to obtain the constant Q cepstral coefficients $CQCC_n = \{CQCC(n, j) \mid n = 1, 2, \ldots, N;\ j = 1, 2, \ldots, J_2\}$, as shown in FIG. 3;
For example: the MFCC features are computed with 16 Mel filters, so $L_1 = 16$. When computing the CQCC, the value of $K$ after the CQT is 8; equal-interval interpolation sampling then gives $L_2 = 16$.
3. Calculating the spatial cosine values of adjacent frames
The extracted MFCC and CQCC feature values are written uniformly as $MQ(n, i)$ ($n = 1, 2, \ldots, N$; $i = 1, 2, \ldots, L$), with $L = L_1 = L_2$. The row vectors of the MFCC feature values and of the CQCC feature values are averaged separately to obtain $MQ(i)$ and $MQ_1(i)$ ($i = 1, 2, \ldots, L$). $MQ(i)$ and $MQ_1(i)$ are then spliced into the matrices $\Lambda_1 = [MQ_1\ MQ]$ and $\Lambda_2 = [MQ\ MQ_1]$, and the cosine value is computed for each row of $\Lambda_1$ and $\Lambda_2$ to obtain the biometric feature vector $F(n)$ ($n = 1, 2, \ldots, N_p$);
[Formula image: cosine value giving $F(n)$]
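The exact cosine formula is an image that is not available in this text. The sketch below only illustrates the general idea under an assumption: that the cosine is taken between corresponding rows of the two spliced matrices. The toy values of `mq` and `mq1` and the function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rowwise_cosine(lam1, lam2):
    """Assumed pairing: cosine between row n of lam1 and row n of lam2."""
    return [cosine(r1, r2) for r1, r2 in zip(lam1, lam2)]

# Toy feature matrices, one row per frame (values are illustrative only).
mq = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
mq1 = [[2.0, 1.0], [1.0, 1.0], [0.0, 4.0]]

lam1 = [a + b for a, b in zip(mq1, mq)]  # splice as [MQ_1 MQ]
lam2 = [a + b for a, b in zip(mq, mq1)]  # splice as [MQ MQ_1]
F = rowwise_cosine(lam1, lam2)
print(F)
```

Because the two spliced matrices contain the same blocks in swapped order, each cosine value measures how similar the two feature sets are for that row.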
Step 3: Generate a pseudo-random matrix with the 2D-SIMM and apply Schmidt orthogonalization to it to obtain an orthogonal set matrix;
Continuing the above example, the pseudo-random matrix is generated by the 2D-SIMM with the values $a = 1$ and $b = 5$ and a secret key $k$; the length of the pseudo-random matrix is the same as the length of the feature vector $F(n)$. With the randomly chosen initial values $x_0 = 0.2$ and $y_0 = 0.3$, the pseudo-random matrix $v(n, t)$ ($n = 1, 2, \ldots, N$; $t = 1, 2$) is produced, and Schmidt orthogonalization of it yields the orthogonal set matrix $V(n, t)$;
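The orthogonalization in step 3 can be sketched as follows. This is not the patent's generator: a seeded standard-library PRNG is swapped in for the 2D-SIMM purely so the example runs, and the key value and matrix size are hypothetical.

```python
import math
import random

def pseudo_random_matrix(rows, cols, key):
    """Stand-in generator: a seeded PRNG replaces the 2D-SIMM here."""
    rng = random.Random(key)
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def gram_schmidt(vectors):
    """Orthonormalize a list of row vectors (modified Gram-Schmidt)."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            proj = sum(x * y for x, y in zip(w, b))
            w = [x - proj * y for x, y in zip(w, b)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm > 1e-12:  # skip rows that are (nearly) linearly dependent
            basis.append([x / norm for x in w])
    return basis

# The same key always reproduces the same orthogonal set matrix.
V = gram_schmidt(pseudo_random_matrix(4, 4, key=2021))
print(len(V))
```

The key acts as the seed, so only a party holding it can regenerate the same orthogonal rows.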
Step 4: Construct the biometric security template
Row vectors are extracted from the orthogonal set matrix to obtain the orthogonal row vector $V_1(n)$ ($n = 1, 2, \ldots, N_p$). Point multiplication of the biometric feature $F(n)$ with the orthogonal row vector $V_1(n)$ gives the square matrix $\Psi(n, n)$;
[Formula image: definition of $\Psi(n, n)$]
To further increase the security of the biometric template, the square matrix $\Psi(n, n)$ is chaotically shifted, that is, its rows and columns are shifted cyclically, as in a ring. To reduce computational complexity and improve efficiency, both the rows and the columns of $\Psi(n, n)$ are shifted by $0.5N$ ($N$ denoting the length of the hash sequence), giving the encrypted square matrix $\Psi^*(n, n)$. The orthogonal set matrix row vector $V_2(n)$ ($n = 1, 2, \ldots, N_p$) is then used to project $\Psi^*(n, n)$ to a lower dimension, giving the biometric security template $W(n)$:
$W(n) = V_2(n) \cdot \Psi^*(n, n) = [W(1), W(2), \ldots, W(N_p)]$
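The cyclic shift and the projection can be sketched on a small matrix. The shift direction, the toy 4 x 4 matrix, and the example row vector are assumptions made for illustration; the patent only states that rows and columns are shifted cyclically by half the length.

```python
def cyclic_shift(matrix, shift):
    """Circularly shift both rows and columns of a square matrix by `shift`."""
    n = len(matrix)
    return [[matrix[(r - shift) % n][(c - shift) % n] for c in range(n)]
            for r in range(n)]

def project(v2, psi_star):
    """W = V_2 . Psi*: a row vector times a square matrix gives a vector."""
    n = len(psi_star)
    return [sum(v2[k] * psi_star[k][col] for k in range(n)) for col in range(n)]

# Toy square matrix standing in for Psi(n, n).
psi = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]

psi_star = cyclic_shift(psi, shift=len(psi) // 2)  # shift by half the length
W = project([1, 0, 0, 0], psi_star)                # hypothetical V_2 row
print(psi_star[0], W)
```

Shifting by half the length is its own inverse for an even-sized matrix, so applying `cyclic_shift` twice with the same shift restores the original matrix.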
Step 5: Construct the long bio-hash sequence
Binarize the biometric security template $W(n)$ to generate the one-dimensional binary long bio-hash sequence $h(n)$, then store $h(n)$ in the cloud, completing the registration stage;
[Formula image: binarization rule for $h(n)$]
where $h(1) = 0$ and $h(n)$ is the bio-hash value of each frame;
In the above example, the hash sequence length is $N_p$ bits.
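The binarization rule itself is an image that is not available in this text. The sketch below assumes one common rule that is consistent with the stated condition $h(1) = 0$: each value is compared with its predecessor. This rule is an assumption, not the patent's confirmed formula, and the function name is hypothetical.

```python
def binarize(W):
    """Assumed rule: h(1) = 0; h(n) = 1 if W(n) > W(n-1), otherwise 0."""
    h = [0]
    for prev, cur in zip(W, W[1:]):
        h.append(1 if cur > prev else 0)
    return h

# Toy template values (illustrative only).
h = binarize([0.3, 0.5, 0.2, 0.2, 0.9])
print(h)
```

Under this assumed rule the output has the same length as the template, so the hash length equals the template length.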
Authentication stage:
The authenticating user provides a voice, from which a long bio-hash sequence is constructed and matched against the bio-hash sequence stored in the cloud; the result is fed back to the authenticating user.
Step 1: The authenticating user provides an authentication voice, which is processed by steps 1 to 5 of the registration stage to obtain the long bio-hash sequence $H(n)$;
Step 2: Using the Hamming distance, compute the bit error rate $BER(h, H)$ between the long bio-hash sequence $H(n)$ obtained from the authentication voice and the long bio-hash sequence $h(n)$ stored in the cloud:
$BER(h, H) = \frac{1}{N_p} \sum_{n=1}^{N_p} h(n) \oplus H(n)$
where $\oplus$ is the exclusive-or operation and $N_p$ is the length of the bio-hash sequence;
Hash matching is described using a BER hypothesis test:
T0: if the long bio-hash sequences $h(n)$ and $H(n)$ of two voice segments have the same content, then $BER(h, H) \le \tau$
T1: if the long bio-hash sequences $h(n)$ and $H(n)$ of two voice segments have different content, then $BER(h, H) > \tau$
where $\tau$ is the perceptual authentication threshold. Bio-hash authentication is implemented by setting the matching threshold $\tau$: if the bit error rate does not exceed $\tau$, the biometric features behind $h(n)$ and $H(n)$ are taken to be the same and authentication passes; otherwise authentication fails;
Step 3: Feed the authentication result back to the authenticating user.
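The matching step can be sketched as a normalized Hamming distance followed by a threshold test. The function names, the two toy sequences, and the threshold value 0.3 are hypothetical and chosen only to make the example concrete.

```python
def ber(h, H):
    """Bit error rate: fraction of positions where the two hashes differ."""
    if len(h) != len(H):
        raise ValueError("hash sequences must have the same length")
    return sum(a ^ b for a, b in zip(h, H)) / len(h)

def authenticate(h, H, tau):
    """Pass when the bit error rate does not exceed the threshold tau."""
    return ber(h, H) <= tau

stored = [0, 1, 1, 0, 1, 0, 0, 1]   # h(n), from the registration stage
probe = [0, 1, 0, 0, 1, 0, 1, 1]    # H(n), from the authentication voice
print(ber(stored, probe), authenticate(stored, probe, tau=0.3))
```

The two toy sequences differ in two of eight positions, so the bit error rate is 0.25 and the check passes for this illustrative threshold.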
To evaluate the performance of the authentication method, the false accept rate (FAR) and the false reject rate (FRR) are calculated by the following two formulas:
[Formula image: definitions of FAR and FRR]
where $\tau$ is the perceptual authentication threshold, $\mu$ is the BER mean, and $\sigma$ is the BER standard deviation. FRR and FAR are generally used to evaluate the robustness and the distinguishability of an authentication method: the lower the FRR, the stronger the perceptual robustness; the lower the FAR, the better the distinguishability.
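The two formula images are not available in this text. As a hedged illustration, the sketch below assumes the BER values are normally distributed, so that FAR is the probability that a different-content pair falls at or below the threshold and FRR is the probability that a same-content pair falls above it. All numeric parameters are hypothetical, and the function names are not from the patent.

```python
import math

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def far(tau, mu_diff, sigma_diff):
    """P(BER <= tau) for pairs of voices with different content."""
    return normal_cdf(tau, mu_diff, sigma_diff)

def frr(tau, mu_same, sigma_same):
    """P(BER > tau) for pairs of voices with the same content."""
    return 1.0 - normal_cdf(tau, mu_same, sigma_same)

# Hypothetical distribution parameters, for illustration only.
tau = 0.3
print(far(tau, mu_diff=0.5, sigma_diff=0.02))
print(frr(tau, mu_same=0.1, sigma_same=0.05))
```

Raising the threshold lowers FRR and raises FAR, which is why the two curves are plotted together when a threshold is chosen.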
The advantages of the invention are illustrated by the following simulation experiments:
First, experimental conditions and setup
The experimental speech data come from the TIMIT (Texas Instruments and Massachusetts Institute of Technology) and TTS (text-to-speech) speech databases; the original speech database contains 1200 different speech segments. Each segment is a 4 s WAV file, 16-bit PCM, mono, sampled at 16 kHz.
Content-preserving operations were applied to every voice in the speech database according to the voice transmission environment. A speech database covering 10 content-preserving operations (volume adjustment, echo, noise, resampling, and MP3 compression) was built, for a total of 12000 speech segments.
To simulate mixed noise in real environments, noise from the NOISEX-92 database was added to the original speech database. A database of 86400 speech segments with different real background noises was built (1200 segments × 8 noise types × 9 SNRs), covering 8 noises: white Gaussian noise, pink noise, factory floor noise 1, factory floor noise 2, HF channel noise, machine gun noise, babble noise, and Volvo noise. Each noise was added at 9 SNRs: -10 dB, -5 dB, 0 dB, 5 dB, 10 dB, 15 dB, 20 dB, 25 dB, and 30 dB.
The experimental hardware platform is an Intel(R) Core(TM) i5-7500, 3.4 GHz, with 4 GB of memory. The software environment is MATLAB R2018b under the Windows 10 operating system.
Second, experimental content
1. Discrimination test and analysis
The BER of the perceptual hash values of voices with different content approximately follows a normal distribution. With 1200 different voices, the number of pairwise BER values, given by the binomial coefficient, is 1200 × 1199 / 2 = 719400. FIG. 4 shows the BER histogram of one voice matched against the other 1199 voices; the BER values follow a normal distribution with a mean close to 0.5, indicating good distinguishability. FIG. 5 shows that the BER of hash sequences of voices with different content approximately follows a normal distribution: FIG. 5(a) is for the MFCC cosine values at 1065 bits and FIG. 5(b) for the CQCC cosine values at 1065 bits. The closer the BER distribution is to the normal curve, the better the randomness and collision resistance of the bio-hash sequence. The experimental results show that the probability distribution of BER values for different voices agrees closely with the probability curve of the standard normal distribution. As the hash sequence length increases, the BER range moves closer to 0.5 and the distribution moves closer to the theoretical values.
In the above example, the sequence length of 1065 bits gives a narrower BER range than 640 bits or 799 bits and the best result. Compared with the MFCC cosine value algorithm, the CQCC cosine value algorithm shows smaller fluctuation in the actual values and performs better.
According to the De Moivre-Laplace central limit theorem, the Hamming distance approximately follows a normal distribution
[Formula image: parameters of the normal distribution]
where $p$ is the probability that the feature values generate the hash value "0" or "1", and $N_p$ is the length of the bio-hash sequence.
In the above example, the length of the bio-hash sequence is bits, and the average value and standard deviation of the theoretical normal distribution parameter are as follows. Table 1 describes the normal distribution parameters for different robust features and different hash sequence lengths.
TABLE 1 Normal distribution parameters for different robust features and different Hash sequence lengths
[Table image: normal distribution parameters for each robust feature and hash sequence length]
As can be seen from Table 1, as the hash sequence length increases, the actual values of the authentication method of the present invention move closer to the theoretical values and the actual curve moves closer to the theoretical curve, indicating that the hash sequences generated by the method have good randomness and collision resistance. The actual curves for the CQCC cosine values and the MFCC cosine values also differ little from each other and lie close to the theoretical curve, showing that both methods have good distinguishability.
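The theoretical parameters are not reproduced in this text. As an illustration only, if the bit error rate is taken as the Hamming distance normalized by $N_p$ and the hash bits are assumed equiprobable ($p = 0.5$), the De Moivre-Laplace theorem gives a mean of $p$ and a standard deviation of $\sqrt{p(1-p)/N_p}$. The sketch evaluates this for the three hash lengths mentioned in the experiments; the function name is hypothetical, and the values are not taken from Table 1.

```python
import math

def theoretical_ber_params(n_bits, p=0.5):
    """Mean and standard deviation of the normalized Hamming distance
    under the normal approximation to the binomial distribution."""
    mu = p
    sigma = math.sqrt(p * (1.0 - p) / n_bits)
    return mu, sigma

for n_bits in (640, 799, 1065):
    mu, sigma = theoretical_ber_params(n_bits)
    print(n_bits, mu, round(sigma, 4))
```

The standard deviation shrinks as the hash gets longer, which matches the observation that longer sequences concentrate the BER more tightly around 0.5.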
2. Robustness testing and analysis
The content-preserving operations listed in Table 2 were applied to the original speech, producing 12000 processed speech segments.
Table 2 Content-preserving operations
[Table image: Table 2]
Pairwise comparison of the perceptual hash values of the 1200 speech segments yields 719400 BER values. With the hash length set to 640 bits, 799 bits, and 1065 bits, the FAR-FRR curves shown in FIG. 6 are obtained. For the MFCC cosine values, the FRR and FAR curves intersect at the different hash sequence lengths, so distinguishability and robustness cannot be well balanced. For the authentication method of the present invention, the FRR and FAR curves do not overlap at any of these lengths, so content-preserving operations and voices with different content can be accurately distinguished, and the method balances distinguishability and robustness well.
FIG. 7 shows the FAR-FRR curve obtained for the above example with a hash sequence length of 1065 bits. The FRR and FAR curves do not overlap, and the interval between the final drop points of the FRR and FAR curves is [0.235, 0.425]. The experimental results show that the authentication method of the present invention has good distinguishability and robustness and can accurately identify content-preserving operations and voices with different content.
3. Matching rate test and analysis under real noise environments
To evaluate the robustness of the authentication method of the invention to noise, the matching rate $p_r$ is introduced:
[Formula image: definition of the matching rate $p_r$]
where $T_A$ is the number of speech segments with perceptually identical content that the system correctly accepts; $T_R$ is the number of voices that the system wrongly rejects; and $F_A$ is the number of speech segments with perceptually different content that the system wrongly accepts. The threshold $\tau$ is chosen as the minimum bit error rate of the FAR curve. The thresholds for the different methods are 0.4173 for the authentication method of the invention and 0.4264 for the MFCC cosine value method. For factory noise 1, white Gaussian noise, high-frequency channel noise, and machine gun noise, the matching accuracy of the authentication method is higher than that of the other methods. For all noises, the matching rate of the authentication method reaches 100% when the signal-to-noise ratio is greater than 10 dB; only under factory noise 2 and Volvo noise is its matching rate slightly lower than that of the MFCC cosine feature. For the other noises, the MFCC cosine value feature performs poorly. Overall, the authentication method is more robust and better able to perform biometric authentication in extreme noise environments. It is therefore more robust to different noises at low signal-to-noise ratios and can meet the requirements of voice matching in complex environments.
4. Unidirectional and safety testing and analysis
In order to verify the one-way property of the trapdoor biohash algorithm, a trapdoor one-wayness verification algorithm based on the log-ratio method is provided. In FIG. 8, part A is the direction in which the biosecurity template is generated, part B is the reverse direction, and part C verifies the uniqueness of the trapdoor of the biohash algorithm.
A speech segment $x$ is randomly extracted from the speech library. Its original speech feature $F$ is passed through the direction of part A in FIG. 8 to obtain the biosecurity template $W$; the speech feature $F'$ is then recovered through the direction of part B, and finally the difference between the two biometric sequences is calculated. The log-ratio difference between two sequences is defined as:
Figure DEST_PATH_IMAGE020
wherein $F'$ is the feature value recovered from the biosecurity template, $F$ is the original feature value, and $RC$ is the difference state of the biometric features.
Voice segments are randomly extracted from the original voice library to verify the trapdoor one-way property of the biohash algorithm. FIGS. 9 and 10 show the differences between the original feature $F$ and the features $F_1$ and $F_2$ derived with the correct key and an incorrect key, respectively. FIG. 9(a) shows that the ratio of the correct-key feature to the original feature is substantially consistent, and FIG. 9(b) shows the specific difference of that ratio; FIG. 10(a) shows that the ratio of the wrong-key feature to the original feature is inconsistent, and FIG. 10(b) shows the specific difference of that ratio. Comparing FIGS. 9 and 10, the distance between the feature $F_1$ extracted with the correct key and the original feature $F$ is distributed within $(-2.1\times10^{-15},\ 1.3\times10^{-15})$. The feature $F_2$ extracted with a wrong key differs markedly from the original feature $F$: the distance between the two is distributed around $-4.1$, and because its variation is only on the order of $10^{-8}$, it appears as a straight line in FIG. 10(b). Compared with the correct key, the feature sequence generated by an incorrect key is far from the original feature sequence, which demonstrates the one-way nature of the biohash trapdoor.
To further verify the trapdoor one-way property of the biohash algorithm, 150 voices are randomly extracted from the voice library, and the Hamming distances between $F_1$ and $F$ and between $F_2$ and $F$ are calculated; the results are shown in FIG. 11. FIG. 11(a) shows the Hamming distance between $F_1$ and $F$: for the feature $F_1$ derived with the correct key, the Hamming distance to the original feature $F$ lies within $(-3.7\times10^{-18},\ 6.9\times10^{-18})$. FIG. 11(b) shows the Hamming distance between $F_2$ and $F$: for the feature $F_2$ derived with an incorrect key, the Hamming distance to the original feature $F$ lies within $(0.10,\ 0.19)$. This further verifies that the biohash algorithm is one-way with a trapdoor and also demonstrates its security.
To enhance the security of the authentication method of the present invention, a chaotic shift is employed in the construction of the biometric template. FIG. 12 compares the biosecurity templates obtained with the correct chaotic shift and with an incorrect chaotic shift. In FIG. 12(a), the biosecurity template with the correct chaotic shift is distributed in the interval $(-5.5\times10^{-4},\ -0.2\times10^{-4})$, whereas in FIG. 12(b) the biosecurity template with the erroneous chaotic shift is distributed in the interval $(0.2\times10^{-3},\ 1.9\times10^{-3})$. The values of the two templates are completely different, so the correct biosecurity template cannot be obtained when the correct chaotic shift is unknown.
5. Real-time analysis
Real-time performance is a very important evaluation criterion in voice content authentication. To evaluate the real-time performance of the authentication method of the present invention, 200 voice segments are randomly selected from the voice database and the average running time is calculated. With the same operating environment and a voice segment length of 4 s, the results of the present authentication method and the MFCC cosine value algorithm are given in Table 3.
TABLE 3 Real-time performance of the authentication method of the invention and of the MFCC cosine value method
Figure DEST_PATH_IMAGE021
As shown in Table 3, the real-time performance of the authentication method of the present invention decreases as the length of the hash sequence increases, but the difference is small and the requirement of real-time authentication is still satisfied. Compared with shorter hash sequence lengths, the long sequence is somewhat slower, but its distinguishability is greatly improved. When the hash sequences are 1065 bits long, the running time of the MFCC cosine value method is 1.08 times that of the present authentication method. The authentication method therefore performs well in terms of real-time performance and can meet the requirement of real-time authentication.
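The average running time described in this test can be measured with a simple timing harness. The following is only a sketch: `average_runtime` is a hypothetical helper, the built-in `sum` stands in for the hashing routine, and the 16 kHz sampling rate behind the 4 s dummy segment is an assumption.

```python
import time

def average_runtime(func, inputs):
    """Return the mean wall-clock time of func(item) over all inputs, in seconds."""
    start = time.perf_counter()
    for item in inputs:
        func(item)
    return (time.perf_counter() - start) / len(inputs)

segment = [0.0] * 64000          # one dummy 4 s segment at an assumed 16 kHz
segments = [segment] * 200       # 200 segments, as in the test described above
avg_seconds = average_runtime(sum, segments)  # `sum` is a placeholder workload
```

In a real measurement the placeholder workload would be replaced by the complete feature extraction and hash construction for one segment.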
In conclusion, the long-sequence biohash authentication method based on the fusion of 2D-SIMM and CQCC cosine value features has good overall performance and solves the problems of existing biohash authentication algorithms. The following conclusions can be drawn from the experimental analysis: using a long hash sequence effectively reduces the probability that different voice segments are identified as the same segment and improves the authentication rate; the extracted biometric features cope well with the interference of various content-preserving operations such as volume adjustment, resampling and MP3 compression; the matching accuracy is good in complex noise environments such as Babble noise at low signal-to-noise ratios; the log-ratio method demonstrates the trapdoor one-way property of the biohash algorithm; and the biosecurity template generated with the 2D-SIMM offers higher security and reduces the risk of biometric feature leakage.
In prior-art voice authentication methods, the extracted voice features are hashed directly and stored in the cloud, so the voice features are easily leaked. Moreover, the voice features are used inefficiently when the hash is constructed and the resulting hash sequence is too short, so the hash sequences are not sufficiently distinguishable and authentication is biased. The invention provides a long-sequence biohash authentication method based on the fusion of two-dimensional sinusoidal modulation mapping (2D-SIMM) and constant Q cepstral coefficient (CQCC) cosine values. First, the CQCC of the voice signal is extracted; then the spatial cosine distance between the CQCC of adjacent voice frames is computed as the feature value; finally, the feature value is projected onto a pseudo-random matrix generated by the 2D-SIMM to construct a long biohash sequence. Two proposed robust feature schemes, the MFCC (Mel-frequency cepstral coefficient) spatial cosine distance and the CQCC spatial cosine distance, are evaluated experimentally using speech from the TIMIT (Texas Instruments and Massachusetts Institute of Technology) and TTS (text-to-speech) speech libraries. The experimental results show that the authentication method performs better with the CQCC spatial cosine distance feature, balancing distinguishability and robustness while achieving efficient authentication. Against common noise backgrounds at low signal-to-noise ratios, the authentication method is robust and can provide a secure template for biometric features.

Claims (3)

1. A long sequence biological Hash authentication method based on feature fusion is characterized by comprising the following steps:
a registration stage:
step 1: pre-emphasize the input voice signal $s(n)$, then perform framing and windowing, wherein the pre-emphasized voice signal $x(n)$ is divided into $N$ frames to obtain the processed signal $x(n, m)$; where $n$ ($n = 1, 2, \ldots, N$) is the index of the frame and $m$ ($m = 1, 2, \ldots, N$) is the index of the data within each frame;
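The preprocessing of step 1 can be sketched as follows. The pre-emphasis coefficient 0.97, the frame length of 400 samples, the hop of 200 samples and the Hamming window are illustrative assumptions; the claim does not fix these values, and `preprocess` is a hypothetical helper name.

```python
import numpy as np

def preprocess(s, frame_len=400, hop=200, alpha=0.97):
    """Pre-emphasize s(n), split it into frames and apply a window."""
    s = np.asarray(s, dtype=float)
    # Pre-emphasis: x(n) = s(n) - alpha * s(n - 1), with x(0) = s(0).
    x = np.append(s[0], s[1:] - alpha * s[:-1])
    n_frames = 1 + (len(x) - frame_len) // hop
    # Index matrix: row n holds the sample indices of frame n.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx]                           # x(n, m): frame n, sample m
    return frames * np.hamming(frame_len)     # assumed Hamming window

frames = preprocess(np.ones(16000))           # 1 s of dummy signal at an assumed 16 kHz
```

The sketch assumes the input is at least one frame long; trailing samples that do not fill a whole frame are dropped.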
step 2: robust feature extraction
MFCC feature extraction
Converting the processed signal into a frequency domain signal by discrete Fourier transform
Figure 314526DEST_PATH_IMAGE001
Then the power spectrum is computed and Mel filter transformation is applied to obtain the Mel spectrum $P_n = \{P(n, l_1) \mid n = 1, 2, \ldots, N;\ l_1 = 1, 2, \ldots, L_1\}$; the logarithm of the Mel spectrum is taken and a discrete cosine transform (DCT) is applied to obtain the Mel cepstral coefficients $MFCC_n = \{MFCC(n, i) \mid n = 1, 2, \ldots, N;\ i = 1, 2, \ldots, L_1\}$;
CQCC feature extraction
Constant Q conversion is carried out on the processed signal to obtain a converted frequency spectrum signal
Figure 638191DEST_PATH_IMAGE002
(ii) a Then, power spectrum is solved, logarithm and uniform sampling are taken, and transformed features are obtainedR n ={R(n,l 2)︱=1,2,…,Nl 2=1,2,…,L 2}; then, discrete cosine transform is carried out to obtain constant Q cepstrum coefficientCQCC n ={CQCC(n,j)︱n =1,2,…,Nj=1,2,…,J 2};
3. Calculating the spatial cosine values of adjacent frames
The extracted MFCC feature values and CQCC feature values are uniformly denoted $MQ(n, i)$ ($n = 1, 2, \ldots, N$; $i = 1, 2, \ldots, L$), with $L = L_1 = L_2$; the row vectors of the MFCC feature values and the CQCC feature values are averaged to obtain $MQ(i)$ ($i = 1, 2, \ldots, L$) and $MQ_1(i)$ ($i = 1, 2, \ldots, L$); $MQ(i)$ and $MQ_1(i)$ are then spliced into the matrices $\Lambda_1 = [MQ_1\ MQ]$ and $\Lambda_2 = [MQ\ MQ_1]$, and the cosine value is computed for each row of the matrix $\Lambda_1$ and the matrix $\Lambda_2$ to obtain the biometric feature vector $F(n)$ ($n = 1, 2, \ldots, N_p$);
Figure 214666DEST_PATH_IMAGE003
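The core operation of this sub-step is a spatial cosine value between feature vectors. The sketch below computes only the plain cosine value between the cepstral vectors of adjacent frames; it does not reproduce the splicing into $\Lambda_1$ and $\Lambda_2$, and the name `adjacent_cosine` and the toy feature matrix are assumptions for illustration.

```python
import numpy as np

def adjacent_cosine(features):
    """Cosine of the angle between each frame's feature vector and the next one."""
    features = np.asarray(features, dtype=float)
    a, b = features[:-1], features[1:]
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return num / den              # assumes no all-zero feature vector

# Toy matrix: 3 frames, 2 coefficients per frame.
feats = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])
cos = adjacent_cosine(feats)
```

For N frames this yields N - 1 cosine values, each in the range [-1, 1].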
step 3: generate a pseudo-random matrix with the 2D-SIMM, and apply Schmidt orthogonalization to the pseudo-random matrix to obtain an orthogonal set matrix;
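The orthogonalization in step 3 can be illustrated with a short Gram-Schmidt routine. In this sketch a seeded NumPy generator is used as a stand-in for the 2D-SIMM chaotic map, whose equations and key parameters are not restated here; the seed, the matrix size and the name `gram_schmidt_rows` are assumptions.

```python
import numpy as np

def gram_schmidt_rows(a):
    """Orthonormalize the rows of a full-rank matrix (modified Gram-Schmidt)."""
    a = np.asarray(a, dtype=float)
    q = np.zeros_like(a)
    for k in range(a.shape[0]):
        v = a[k].copy()
        for j in range(k):
            v -= np.dot(q[j], v) * q[j]   # remove the component along row j
        q[k] = v / np.linalg.norm(v)
    return q

rng = np.random.default_rng(seed=2021)    # stand-in for the keyed 2D-SIMM sequence
pseudo_random = rng.random((8, 8))
orthogonal_set = gram_schmidt_rows(pseudo_random)
```

Because the generator is seeded, the same key reproduces the same orthogonal set matrix, which is the property the later template construction relies on.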
step 4: construction of the biosecurity template
The row vectors of the orthogonal set matrix are extracted to obtain the orthogonal row vector $V_1(n)$ ($n = 1, 2, \ldots, N_p$); the biometric feature vector $F(n)$ is dot-multiplied with the orthogonal row vector $V_1(n)$ to obtain the square matrix $\Psi(n, n)$;
Figure 344296DEST_PATH_IMAGE004
A chaotic shift is applied to the square matrix $\Psi(n, n)$ to obtain the encrypted square matrix $\Psi^*(n, n)$; the orthogonal set matrix row vector $V_2(n)$ ($n = 1, 2, \ldots, N_p$) is used to project the encrypted square matrix $\Psi^*(n, n)$ and reduce its dimension, giving the biosecurity template $W(n)$;
$W(n) = V_2(n) \cdot \Psi^*(n, n) = [W(1), W(2), \ldots, W(N_p)]$
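The template construction of step 4 can be sketched under two stated assumptions: the product of $F(n)$ and $V_1(n)$ is read as an outer product (which yields a square matrix), and the chaotic shift is modelled as a circular shift of rows and columns by half the dimension, following claim 2. The helper `build_template` and the toy vectors are hypothetical.

```python
import numpy as np

def build_template(f, v1, v2, shift=None):
    """Sketch of W = V2 . Psi*, with Psi = outer(F, V1) (assumed reading)."""
    f, v1, v2 = (np.asarray(a, dtype=float) for a in (f, v1, v2))
    psi = np.outer(f, v1)                      # square matrix Psi(n, n)
    if shift is None:
        shift = len(f) // 2                    # half-length shift, as in claim 2
    # Assumption: the chaotic shift is modelled as a circular row and column shift.
    psi_enc = np.roll(psi, (shift, shift), axis=(0, 1))
    return v2 @ psi_enc                        # template W(n), length N_p

f = [1.0, 2.0, 3.0, 4.0]       # toy feature vector F(n)
v1 = [1.0, 0.0, 0.0, 0.0]      # toy orthogonal row vectors
v2 = [0.0, 1.0, 0.0, 0.0]
w_correct = build_template(f, v1, v2)          # uses shift = 2
w_wrong = build_template(f, v1, v2, shift=1)   # a wrong shift changes the template
```

Even in this toy case the two templates differ, which mirrors the observation that the correct template cannot be obtained without the correct shift.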
And 5: binary processing biological safety templateW(n) Generating a one-dimensional binary bio-hash long sequenceh(n),
Figure 830772DEST_PATH_IMAGE005
wherein $h(1) = 0$ and $h(n)$ is the biohash value of each frame;
the biohash long sequence $h(n)$ is then stored in the cloud, completing the registration stage;
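The binarization formula of step 5 is not reproduced in the text above. Purely as an assumption consistent with the condition $h(1) = 0$, the sketch below sets each bit to 1 when the template value exceeds the previous one; this comparison rule and the name `binarize` are illustrative and may differ from the claimed formula.

```python
import numpy as np

def binarize(w):
    """Assumed rule: h(1) = 0, and h(n) = 1 if W(n) > W(n - 1), else 0."""
    w = np.asarray(w, dtype=float)
    h = np.zeros(len(w), dtype=np.uint8)
    h[1:] = (w[1:] > w[:-1]).astype(np.uint8)
    return h

h_bits = binarize([0.3, 0.5, 0.1, 0.1, 0.9])   # toy template values
```

The result is one bit per template element, so a template of length $N_p$ gives a hash sequence of $N_p$ bits.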
an authentication stage:
step 1: the authentication voice provided by the user to be authenticated is processed through steps 1 to 5 of the registration stage to obtain the biohash long sequence $H(n)$;
step 2: the bit error rate $BER(h, H)$ between the biohash long sequence $H(n)$ obtained from the authentication voice and the biohash long sequence $h(n)$ stored in the cloud is calculated via the Hamming distance:
Figure 387655DEST_PATH_IMAGE006
where $\oplus$ is the exclusive-or logic operation and $N_p$ is the length of the biohash sequence;
the hash matching is described using BER hypothesis testing:
T0: if the biohash long sequences $h(n)$ and $H(n)$ of two speech segments have the same content, then $BER(h, H) \le \tau$ and the authentication passes;
T1: if the biohash long sequences $h(n)$ and $H(n)$ of two speech segments have different content, then $BER(h, H) > \tau$ and the authentication fails;
wherein $\tau$ represents the perceptual authentication threshold;
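The matching rule above can be sketched as a normalized Hamming distance followed by a threshold test. The function names and the 8-bit toy sequences are assumptions for illustration; the threshold 0.4173 is the value reported for the present method in the matching-rate test.

```python
import numpy as np

def bit_error_rate(h, H):
    """Normalized Hamming distance: mean of h(n) XOR H(n) over N_p bits."""
    h = np.asarray(h, dtype=np.uint8)
    H = np.asarray(H, dtype=np.uint8)
    if h.shape != H.shape:
        raise ValueError("hash sequences must have the same length")
    return float(np.mean(np.bitwise_xor(h, H)))

def authenticate(h, H, tau):
    """T0 (pass) if BER <= tau, otherwise T1 (fail)."""
    return bit_error_rate(h, H) <= tau

stored = np.array([0, 1, 1, 0, 1, 0, 0, 1], dtype=np.uint8)   # h(n), toy example
probe = np.array([0, 1, 0, 0, 1, 0, 1, 1], dtype=np.uint8)    # H(n), 2 bits differ
ber = bit_error_rate(stored, probe)
accepted = authenticate(stored, probe, tau=0.4173)
```

Here two of the eight bits differ, so the BER is 0.25 and the probe is accepted at the stated threshold.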
step 3: the authentication result is fed back to the authenticated user.
2. The feature fusion based long sequence biometric hash authentication method of claim 1, wherein both the rows and the columns of the square matrix $\Psi(n, n)$ are shifted by $0.5N$ ($N$ representing the length of the hash sequence) to obtain the encrypted square matrix $\Psi^*(n, n)$.
3. The feature fusion based long-sequence biometric hash authentication method according to claim 1, wherein, to evaluate the performance of the authentication method, the false acceptance rate (FAR) and the false rejection rate (FRR) are calculated by the following two formulas, respectively;
Figure 185847DEST_PATH_IMAGE007
in the formulas, $\tau$ is the perceptual authentication threshold, $\mu$ is the BER mean and $\sigma$ is the BER standard deviation; the robustness and distinguishability of the authentication method are generally evaluated using FRR and FAR; the lower the FRR value, the stronger the perceptual robustness; the lower the FAR value, the better the distinguishability.
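With the parameters $\tau$, $\mu$ and $\sigma$ defined above, FAR and FRR can be sketched under the common assumption that the BER values follow a normal distribution, so that FAR is the Gaussian cumulative probability up to $\tau$ and FRR is its complement. This is an assumed form, since the two formulas themselves are not reproduced in the text; the numeric values of $\mu$ and $\sigma$ below are invented for illustration.

```python
import math

def far(tau, mu, sigma):
    """Assumed form: FAR(tau) = P(BER <= tau) for BER ~ N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((tau - mu) / (sigma * math.sqrt(2.0))))

def frr(tau, mu, sigma):
    """Assumed form: FRR(tau) = 1 - FAR(tau)."""
    return 1.0 - far(tau, mu, sigma)

mu, sigma = 0.5, 0.03                      # illustrative BER mean and standard deviation
far_at_mean = far(mu, mu, sigma)           # threshold at the mean accepts half
far_low = far(0.2, mu, sigma)              # threshold 10 sigma below the mean
```

Under this model, lowering the threshold drives the FAR towards zero while raising the FRR, which is the trade-off the FAR-FRR curves visualize.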
CN202110135480.2A 2021-02-01 2021-02-01 Long-sequence biological Hash authentication method based on feature fusion Active CN112967724B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110135480.2A CN112967724B (en) 2021-02-01 2021-02-01 Long-sequence biological Hash authentication method based on feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110135480.2A CN112967724B (en) 2021-02-01 2021-02-01 Long-sequence biological Hash authentication method based on feature fusion

Publications (2)

Publication Number Publication Date
CN112967724A true CN112967724A (en) 2021-06-15
CN112967724B CN112967724B (en) 2022-06-14

Family

ID=76272754

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110135480.2A Active CN112967724B (en) 2021-02-01 2021-02-01 Long-sequence biological Hash authentication method based on feature fusion

Country Status (1)

Country Link
CN (1) CN112967724B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117153190A (en) * 2023-10-27 2023-12-01 广东技术师范大学 Playback voice detection method based on attention mechanism combination characteristics

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103412960A (en) * 2013-08-31 2013-11-27 西安电子科技大学 Image perceptual hashing method based on two-sided random projection
CN103873254A (en) * 2014-03-03 2014-06-18 杭州电子科技大学 Method for generating human vocal print biometric key
CN109462482A (en) * 2018-11-09 2019-03-12 深圳壹账通智能科技有限公司 Method for recognizing sound-groove, device, electronic equipment and computer readable storage medium
CN110211608A (en) * 2019-06-11 2019-09-06 兰州理工大学 A kind of speech retrieval method and system
US20200153624A1 (en) * 2018-11-13 2020-05-14 Ares Technologies, Inc. Biometric scanner apparatus and methods for its use

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ALAMGIR SARDAR等: "A Novel Cancelable FaceHashing Technique Based on Non-Invertible Transformation With Encryption and Decryption Template", 《IEEE ACCESS》 *
WENHAO LIU等: "A fast image encryption algorithm based on chaotic map", 《OPTICS AND LASERS IN ENGINEERING》 *
毋立芳等: "生物特征模板保护综述", 《仪器仪表学报》 *
黄羿博等: "融合MFCC和LPCC的语音感知哈希算法", 《华中科技大学学报(自然科学版)》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117153190A (en) * 2023-10-27 2023-12-01 广东技术师范大学 Playback voice detection method based on attention mechanism combination characteristics
CN117153190B (en) * 2023-10-27 2024-01-19 广东技术师范大学 Playback voice detection method based on attention mechanism combination characteristics


Publication Publication Date Title
Liu et al. An MFCC‐based text‐independent speaker identification system for access control
CN107517207A (en) Server, auth method and computer-readable recording medium
US20140007210A1 (en) High security biometric authentication system
WO2010066310A1 (en) Method for verifying the identity of a speaker, system therefore and computer readable medium
US11514918B2 (en) Method for protecting biometric templates, and a system and method for verifying a speaker's identity
CN104835497A (en) Voiceprint card swiping system and method based on dynamic password
Chee et al. Cancellable speech template via random binary orthogonal matrices projection hashing
CN111897909A (en) Ciphertext voice retrieval method and system based on deep perception Hash
CN112967724B (en) Long-sequence biological Hash authentication method based on feature fusion
Mtibaa et al. Cancelable speaker verification system based on binary Gaussian mixtures
Zhang et al. Spectrogram-based Efficient Perceptual Hashing Scheme for Speech Identification.
CN113241081B (en) Far-field speaker authentication method and system based on gradient inversion layer
Xu et al. Cancelable voiceprint templates based on knowledge signatures
Huang et al. Encrypted speech perceptual hashing authentication algorithm based on improved 2D-Henon encryption and harmonic product spectrum
US11929077B2 (en) Multi-stage speaker enrollment in voice authentication and identification
CN104134443A (en) Symmetrical ternary string represented voice perception Hash sequence constructing and authenticating method
Zhang et al. Speech Perceptual Hashing Authentication Algorithm Based on Spectral Subtraction and Energy to Entropy Ratio.
Mirmohamadsadeghi et al. A template privacy protection scheme for fingerprint minutiae descriptors
Thebaud et al. Spoofing speaker verification with voice style transfer and reconstruction loss
CN104091104A (en) Feature extraction and authentication method for multi-format audio perceptual Hashing authentication
Sasikaladevi et al. SCAN-speech biometric template protection based on genus-2 hyper elliptic curve
Inthavisas et al. Speech biometric mapping for key binding cryptosystem
KR20110088851A (en) Apparatus and method for extracting feature vector for user authentication
Huang et al. Long sequence speech perception hash authentication based on multi-feature fusion and arnold transformation
Huang et al. Long sequence biometric hashing authentication based on 2D-SIMM and CQCC cosine values

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant