CN112967724A

CN112967724A - Long-sequence biological Hash authentication method based on feature fusion

Info

Publication number: CN112967724A
Application number: CN202110135480.2A
Authority: CN
Inventors: 黄羿博; 安丽
Original assignee: Northwest Normal University
Current assignee: Northwest Normal University
Priority date: 2021-02-01
Filing date: 2021-02-01
Publication date: 2021-06-15
Anticipated expiration: 2041-02-01
Also published as: CN112967724B

Abstract

The invention discloses a long-sequence bio-hash authentication method based on feature fusion. After the voice signal is pre-emphasized, framed and windowed, MFCC (Mel-frequency cepstral coefficient) features and CQCC (constant Q cepstral coefficient) features are extracted, and the spatial cosine values of adjacent frames are computed to obtain a biometric feature vector. A pseudo-random matrix is Schmidt-orthogonalized; the biometric feature vector is dot-multiplied with row vectors of the orthogonal set matrix to obtain a square matrix; the square matrix is chaotically shifted to obtain an encrypted square matrix; and the encrypted square matrix is projected to a lower dimension with row vectors of the orthogonal set matrix to obtain a bio-security template. The bio-security template is binarized to generate the long bio-hash sequence $h(n)$. The authentication voice of the user is processed by the same steps to obtain the long bio-hash sequence $H(n)$. The bit error rate between $H(n)$ and $h(n)$ is computed from the Hamming distance; if the bit error rate is less than or equal to the threshold, authentication passes, and the result is fed back to the authenticating user. The authentication method effectively reduces the probability that different voice segments are judged to be the same segment, improves the authentication rate, and is robust to common noise backgrounds at low signal-to-noise ratio.

Description

Long-sequence biological Hash authentication method based on feature fusion

Technical Field

The invention belongs to the technical field of voice authentication, and relates to a long-sequence biological Hash authentication method based on feature fusion.

Background

In recent years, biometric authentication systems have been increasingly used, and the storage of unprotected biometric data poses a serious privacy threat. Because personal biometrics are scarce, their loss exposes sensitive information about the user and creates a security hazard. At present, voice authentication has security holes along the whole path from voice acquisition through data storage to the voice hash database. The hash sequences constructed by existing authentication methods are short, so the same hash sequence may arise from different voice segments; the resulting low distinguishability among users causes a high false recognition rate and poor authentication performance. Security and distinguishability in voice biometric content authentication have therefore become an important research challenge.

Biometric authentication methods widely adopt biometric features such as the face, palm print, fingerprint, signature and iris, and rarely involve voice features. In recent years, voice perceptual hash authentication methods have achieved a good authentication effect and can resist noise interference during channel transmission, but they lack security. Because bio-hash computation is efficient and secure, it is widely applied to protect the privacy of biometric features. Combining voice perceptual hashing with bio-hashing can therefore both improve the authentication effect and secure the voice features. Currently extracted speech signal features include short-time cross-correlation coefficients, short-time zero-crossing rates, Mel-frequency cepstral coefficients, linear prediction cepstral coefficients, spectral entropy, wavelet coefficients, cochleagrams, the modulated complex lapped transform (MCLT), spectrograms, and fusions of these features.

In prior-art voice authentication methods, the hash sequence is short, so distinguishability is insufficient; robustness in common low signal-to-noise ratio noise environments is low; the complexity of the authentication method limits real-time performance; and the voice biometric features are easily leaked.

Disclosure of Invention

The invention aims to provide a long-sequence biological Hash authentication method based on feature fusion, which not only considers distinguishability and robustness, but also meets high-efficiency authentication.

In order to achieve the purpose, the technical scheme adopted by the invention is as follows: a long sequence biological Hash authentication method based on feature fusion is specifically carried out according to the following steps:

Registration stage:

Step 1: Pre-emphasize the input voice signal $s(n)$ to obtain the pre-emphasized voice signal $x(n)$, then frame and window it; $x(n)$ is divided into $N$ frames to obtain the processed signal $x(n, m)$, where $n$ ($n = 1, 2, \ldots, N$) is the index of the frame and $m$ ($m = 1, 2, \ldots, N$) is the index of the data within each frame;

Step 2: Robust feature extraction

1. MFCC feature extraction

The processed signal is converted into a frequency-domain signal by the discrete Fourier transform. The power spectrum is then computed and passed through the Mel filter bank to obtain the Mel spectrum $P_n = \{P(n, l_1) \mid n = 1, 2, \ldots, N;\ l_1 = 1, 2, \ldots, L_1\}$. Taking the logarithm of the Mel spectrum and applying the discrete cosine transform (DCT) gives the Mel cepstral coefficients $MFCC_n = \{MFCC(n, i) \mid n = 1, 2, \ldots, N;\ i = 1, 2, \ldots, L_1\}$;

2. CQCC feature extraction

The constant Q transform is applied to the processed signal to obtain the transformed spectrum signal. The power spectrum is then computed, its logarithm is taken, and uniform sampling gives the transformed features $R_n = \{R(n, l_2) \mid n = 1, 2, \ldots, N;\ l_2 = 1, 2, \ldots, L_2\}$. The discrete cosine transform then gives the constant Q cepstral coefficients $CQCC_n = \{CQCC(n, j) \mid n = 1, 2, \ldots, N;\ j = 1, 2, \ldots, J_2\}$;

3. Computing the spatial cosine values of adjacent frames

Denote the extracted MFCC feature values and CQCC feature values uniformly as $MQ(n, i)$ ($n = 1, 2, \ldots, N$; $i = 1, 2, \ldots, L$), with $L = L_1 = L_2$. Average the row vectors of the MFCC feature values and the CQCC feature values to obtain $MQ(i)$ ($i = 1, 2, \ldots, L$) and $MQ_1(i)$ ($i = 1, 2, \ldots, L$). Concatenate $MQ(i)$ and $MQ_1(i)$ to form the matrices $\Lambda_1 = [MQ_1\ MQ]$ and $\Lambda_2 = [MQ\ MQ_1]$, and compute the cosine value for each row of $\Lambda_1$ and $\Lambda_2$ to obtain the biometric feature vector $F(n)$ ($n = 1, 2, \ldots, N_p$);

Step 3: Generate a pseudo-random matrix with the 2D-SIMM, and apply Schmidt orthogonalization to the pseudo-random matrix to obtain an orthogonal set matrix;

Step 4: Construction of the bio-security template

Extract row vectors from the orthogonal set matrix to obtain the orthogonal row vector $V_1(n)$ ($n = 1, 2, \ldots, N_p$). Dot-multiply the biometric feature vector $F(n)$ with the orthogonal row vector $V_1(n)$ to obtain the square matrix $\Psi(n, n)$;

Apply a chaotic shift to the square matrix $\Psi(n, n)$ to obtain the encrypted square matrix $\Psi^*(n, n)$. Use the row vector $V_2(n)$ ($n = 1, 2, \ldots, N_p$) of the orthogonal set matrix to project the encrypted square matrix $\Psi^*(n, n)$ to a lower dimension, obtaining the bio-security template $W(n)$:

$$W(n) = V_2(n) \cdot \Psi^*(n, n) = [W(1), W(2), \ldots, W(N_p)]$$

Step 5: Binarize the bio-security template $W(n)$ to generate the one-dimensional binary long bio-hash sequence $h(n)$,

where $h(1) = 0$ and $h(n)$ is the bio-hash value of each frame;

The long bio-hash sequence $h(n)$ is then stored in the cloud, completing the registration stage;

Authentication stage:

Step 1: The authenticating user provides the authentication voice, which is processed by Steps 1 to 5 of the registration stage to obtain the long bio-hash sequence $H(n)$;

Step 2: Using the Hamming distance, compute the bit error rate $BER(h, H)$ between the long bio-hash sequence $H(n)$ of the authentication voice and the long bio-hash sequence $h(n)$ stored in the cloud:

$$BER(h, H) = \frac{1}{N_p} \sum_{n=1}^{N_p} h(n) \oplus H(n)$$

where $\oplus$ is the exclusive-or logic operation and $N_p$ is the length of the bio-hash sequence;

Hash matching is described by a BER hypothesis test:

T₀: if the long bio-hash sequences $h(n)$ and $H(n)$ of two voice segments have the same content, then $BER(h, H) \le \tau$ and authentication passes;

T₁: if the long bio-hash sequences $h(n)$ and $H(n)$ of two voice segments have different content, then $BER(h, H) > \tau$ and authentication fails;

where $\tau$ is the perceptual authentication threshold;

Step 3: Feed the authentication result back to the authenticating user.

The authentication method is a long-sequence bio-hash authentication method based on two-dimensional sinusoidal modulation mapping (2D-SIMM) and constant Q cepstral coefficient (CQCC) cosine values. Using a long hash sequence improves collision resistance; the extracted frequency-domain spatial distance feature is strongly robust; and the pseudo-random matrix generated by the 2D-SIMM protects the biometric features and ensures that the bio-hash sequence is irreversible. The method is robust to common noise backgrounds at low signal-to-noise ratio and provides a secure template for the biometric features.

Compared with the prior art, the authentication method has the following advantages:

1) The method has good overall performance and solves the problems of existing bio-hash authentication methods.

2) The extracted biometric features cope well with the interference of various content-preserving operations such as volume adjustment, resampling and MP3 compression. Matching accuracy in complex noise environments such as Babble noise is also better at low signal-to-noise ratio.

3) Using a long hash sequence effectively reduces the probability that different voice segments are judged to be the same segment, and improves the authentication rate.

4) A ratio method is used to demonstrate the trapdoor one-way property of the bio-hash algorithm. The bio-security template generated with the 2D-SIMM has higher security and reduces the risk of leaking biometric features.

Drawings

FIG. 1 is a flow chart of the authentication method of the present invention;

FIG. 2 is a flow chart of MFCC feature extraction in the authentication method of the present invention;

FIG. 3 is a flow chart of CQCC feature extraction in the authentication method of the present invention;

FIG. 4 is a BER histogram of one voice matched against the other 1199 voices;

FIGS. 5(a) and 5(b) show the MFCC cosine values at 1065 bits and the CQCC cosine values at 1065 bits, respectively;

FIGS. 6(a) and 6(b) are the FRR-FAR curves of the MFCC cosine values (1065 bits) and of the CQCC cosine values (1065 bits), respectively;

FIG. 7 is the FAR-FRR curve of the authentication method of the present invention;

FIG. 8 is a block diagram for verifying the trapdoor one-way property of the bio-security template;

FIGS. 9(a) and 9(b) show the difference between the feature obtained with the correct key and the original feature;

FIGS. 10(a) and 10(b) show the difference between the feature obtained with a wrong key and the original feature;

FIGS. 11(a) and 11(b) show the Hamming distances between F′₁ and F and between F′₂ and F, respectively;

FIGS. 12(a) and 12(b) show the bio-security templates obtained with the correct chaotic shift and with an incorrect chaotic shift, respectively.

Detailed Description

The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.

The invention provides a long-sequence biological Hash authentication method based on feature fusion, a flow chart of which is shown in figure 1, and the authentication method is specifically carried out according to the following steps:

Registration stage: the registering user's original voice undergoes feature extraction, a bio-security template is constructed, and the binary long hash sequence is finally stored in the cloud.

Step 1: Pre-emphasize the input speech signal $s(n)$ to obtain the pre-emphasized speech signal $x(n)$, then frame and window $x(n)$, using a Hamming window as the window function. The pre-emphasized speech signal $x(n)$ is divided into $N$ frames, giving the processed signal $x(n, m)$, where $n$ ($n = 1, 2, \ldots, N$) is the index of the frame and $m$ ($m = 1, 2, \ldots, N$) is the index of the data within each frame. The processed signal $x(n, m)$ is a time-domain signal.
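Step 1 can be sketched in a few lines of NumPy. The text specifies only pre-emphasis, framing and a Hamming window; the pre-emphasis coefficient `alpha`, the frame length `frame_len` and the hop size `hop` below are illustrative assumptions, not values taken from the invention.

```python
import numpy as np

def preprocess(s, frame_len=400, hop=200, alpha=0.97):
    """Pre-emphasize s(n), split it into frames and apply a Hamming window.

    alpha, frame_len and hop are assumed values for illustration only.
    Returns x[n, m] with shape (num_frames, frame_len).
    """
    s = np.asarray(s, dtype=float)
    # Pre-emphasis: x(n) = s(n) - alpha * s(n - 1)
    x = np.append(s[0], s[1:] - alpha * s[:-1])
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    num_frames = 1 + (len(x) - frame_len) // hop
    # Row n of idx holds the sample indices belonging to frame n
    idx = np.arange(frame_len)[None, :] + hop * np.arange(num_frames)[:, None]
    return x[idx] * np.hamming(frame_len)

# Usage: a 4 s clip at 16 kHz, matching the format of the test speech
rng = np.random.default_rng(0)
signal = rng.standard_normal(16000 * 4)
frames = preprocess(signal)
print(frames.shape)
```

With these assumed settings each row of `frames` is one windowed frame, ready for the frequency-domain transforms of Step 2.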

Step 2: robust feature extraction

1. MFCC feature extraction

The time-domain signal is converted to a frequency-domain signal by the discrete Fourier transform (DFT). The power spectrum is then computed and passed through the Mel filter bank to obtain the Mel spectrum $P_n = \{P(n, l_1) \mid n = 1, 2, \ldots, N;\ l_1 = 1, 2, \ldots, L_1\}$. Taking the logarithm of the Mel spectrum and applying the discrete cosine transform (DCT) gives the Mel cepstral coefficients $MFCC_n = \{MFCC(n, i) \mid n = 1, 2, \ldots, N;\ i = 1, 2, \ldots, L_1\}$, as shown in FIG. 2;
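The MFCC chain described above (DFT, power spectrum, Mel filter bank, logarithm, DCT) can be sketched as follows. This is a minimal illustration, not the inventors' implementation: the FFT size `n_fft`, the triangular filter shape, the Hz-to-Mel formula and the unnormalized DCT-II are common textbook choices assumed here, and only the filter count of 16 comes from the example in the text.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(num_filters, n_fft, fs):
    """Triangular Mel filters, shape (num_filters, n_fft // 2 + 1)."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), num_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fb = np.zeros((num_filters, n_fft // 2 + 1))
    for k in range(1, num_filters + 1):
        left, center, right = bins[k - 1], bins[k], bins[k + 1]
        for b in range(left, center):        # rising edge
            fb[k - 1, b] = (b - left) / max(center - left, 1)
        for b in range(center, right):       # falling edge, peak at center
            fb[k - 1, b] = (right - b) / max(right - center, 1)
    return fb

def mfcc(frames, fs=16000, num_filters=16, n_fft=512):
    """Return MFCC(n, i) with shape (num_frames, num_filters)."""
    spectrum = np.fft.rfft(frames, n=n_fft, axis=1)           # DFT per frame
    power = (np.abs(spectrum) ** 2) / n_fft                    # power spectrum
    mel_spec = power @ mel_filterbank(num_filters, n_fft, fs).T  # P(n, l1)
    log_mel = np.log(mel_spec + 1e-12)                         # avoid log(0)
    l = np.arange(num_filters)
    # Unnormalized DCT-II basis: basis[i, l] = cos(pi * i * (l + 0.5) / L1)
    basis = np.cos(np.pi * np.outer(l, l + 0.5) / num_filters)
    return log_mel @ basis.T

# Usage with stand-in frames (random data in place of windowed speech)
rng = np.random.default_rng(0)
demo_frames = rng.standard_normal((5, 400))
coeffs = mfcc(demo_frames)
print(coeffs.shape)
```

Each row of `coeffs` plays the role of one frame's $MFCC(n, i)$ vector with $L_1 = 16$.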

2. CQCC feature extraction

The constant Q transform (CQT) is applied to the time-domain signal to obtain the transformed spectrum signal. The power spectrum is then computed, its logarithm is taken, and uniform sampling gives the transformed features $R_n = \{R(n, l_2) \mid n = 1, 2, \ldots, N;\ l_2 = 1, 2, \ldots, L_2\}$. The discrete cosine transform then gives the constant Q cepstral coefficients $CQCC_n = \{CQCC(n, j) \mid n = 1, 2, \ldots, N;\ j = 1, 2, \ldots, J_2\}$, as shown in FIG. 3;

For example, the MFCC features are computed with 16 Mel filters, so $L_1 = 16$. When the CQCC is computed, the value of $K$ after the CQT is 8; equal-interval interpolation sampling then gives $L_2 = 16$.

3. Computing the spatial cosine values of adjacent frames

Denote the extracted MFCC feature values and CQCC feature values uniformly as $MQ(n, i)$ ($n = 1, 2, \ldots, N$; $i = 1, 2, \ldots, L$), with $L = L_1 = L_2$. Average the row vectors of the MFCC feature values and the CQCC feature values to obtain $MQ(i)$ ($i = 1, 2, \ldots, L$) and $MQ_1(i)$ ($i = 1, 2, \ldots, L$). Concatenate $MQ(i)$ and $MQ_1(i)$ to form the matrices $\Lambda_1 = [MQ_1\ MQ]$ and $\Lambda_2 = [MQ\ MQ_1]$, and compute the cosine value for each row of $\Lambda_1$ and $\Lambda_2$ to obtain the biometric feature vector $F(n)$ ($n = 1, 2, \ldots, N_p$);
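The basic operation behind this sub-step, the cosine of the angle between the feature vectors of adjacent frames, can be illustrated as below. This sketch does not reproduce the construction of the matrices Λ₁ and Λ₂, whose exact layout is not fully recoverable from the text; it shows only one plausible reading, in which row n of a feature matrix is compared with row n + 1. The function name is hypothetical.

```python
import numpy as np

def adjacent_frame_cosine(features):
    """Cosine between each feature row and the following row.

    features: array of shape (N, L), e.g. one MFCC or CQCC vector per frame.
    Returns a vector of length N - 1 with values in [-1, 1].
    """
    features = np.asarray(features, dtype=float)
    a, b = features[:-1], features[1:]
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return num / np.maximum(den, 1e-12)   # guard against all-zero rows

# Usage: orthogonal rows give 0, parallel rows give 1
demo = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 2.0]])
cosines = adjacent_frame_cosine(demo)
print(cosines)
```

Because the cosine depends only on the direction of the vectors, it is unchanged when a frame's feature vector is scaled by a positive constant, which is one reason such a feature can tolerate volume changes.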

Step 3: Continuing the above example, a pseudo-random matrix is generated with the 2D-SIMM. The 2D-SIMM values $a = 1$ and $b = 5$ are set and a key $k$ is set, and the length of the pseudo-random matrix is kept consistent with the length of the feature vector $F(n)$. With the initial values chosen as $x_0 = 0.2$ and $y_0 = 0.3$, the pseudo-random matrix $v(n, t)$ ($n = 1, 2, \ldots, N$; $t = 1, 2$) is generated, and Schmidt orthogonalization of the pseudo-random matrix gives the orthogonal set matrix $V(n, t)$;
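A rough sketch of this step follows. The text does not state the 2D-SIMM update equations, so the map below uses a form commonly given for it in the chaos literature, x' = a·sin(ω·y)·sin(b/x) and y' = a·sin(ω·x')·sin(b/y) with ω = π; that form, the value of ω, and the way the key k would enter are all assumptions. The Gram-Schmidt part is the standard procedure applied to the two columns of the generated matrix.

```python
import numpy as np

def simm_sequence(length, a=1.0, b=5.0, omega=np.pi, x0=0.2, y0=0.3):
    """Iterate an ASSUMED form of the 2D-SIMM map; returns v(n, t), shape (length, 2)."""
    out = np.empty((length, 2))
    x, y = x0, y0
    for n in range(length):
        x_next = a * np.sin(omega * y) * np.sin(b / x)
        y_next = a * np.sin(omega * x_next) * np.sin(b / y)
        x, y = x_next, y_next
        out[n] = (x, y)
    return out

def gram_schmidt(matrix):
    """Classical Gram-Schmidt on the columns; returns orthonormal columns."""
    matrix = np.asarray(matrix, dtype=float)
    q = np.zeros_like(matrix)
    for t in range(matrix.shape[1]):
        vec = matrix[:, t].copy()
        for k in range(t):
            # Remove the component along each previously accepted direction
            vec -= (q[:, k] @ matrix[:, t]) * q[:, k]
        q[:, t] = vec / np.linalg.norm(vec)
    return q

# Usage: a matrix as long as a 1065-element feature vector
v = simm_sequence(1065)
V = gram_schmidt(v)
print(np.round(V.T @ V, 6))
```

On one reading of the text, the two orthonormal columns of `V` would then serve as the vectors V₁(n) and V₂(n) used in Step 4.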

Step 4: Construction of the bio-security template

Extract row vectors from the orthogonal set matrix to obtain the orthogonal row vector $V_1(n)$ ($n = 1, 2, \ldots, N_p$). Dot-multiply the biometric feature $F(n)$ with the orthogonal row vector $V_1(n)$ to obtain the square matrix $\Psi(n, n)$;

To further increase the security of the biometric template, a chaotic shift is applied to the matrix $\Psi(n, n)$, that is, its rows and columns are cyclically shifted in a ring. To reduce computational complexity and increase efficiency, both the rows and the columns of $\Psi(n, n)$ are shifted by $0.5N$ ($N$ denotes the length of the hash sequence), giving the encrypted square matrix $\Psi^*(n, n)$. The row vector $V_2(n)$ ($n = 1, 2, \ldots, N_p$) of the orthogonal set matrix is then used to project the encrypted matrix $\Psi^*(n, n)$ to a lower dimension, obtaining the bio-security template $W(n)$:

$$W(n) = V_2(n) \cdot \Psi^*(n, n) = [W(1), W(2), \ldots, W(N_p)]$$
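The template construction can be sketched with NumPy as below. Two points are assumptions: the "dot multiplication" of F(n) and V₁(n) is read as an outer product, since that is what turns two length-n vectors into an n × n matrix, and the shift of 0.5N is rounded down for odd lengths. The function name `secure_template` is hypothetical.

```python
import numpy as np

def secure_template(F, V1, V2):
    """Outer product, cyclic shift of rows and columns, then projection onto V2."""
    F, V1, V2 = (np.asarray(a, dtype=float) for a in (F, V1, V2))
    n = len(F)
    psi = np.outer(F, V1)                      # square matrix Psi(n, n)
    shift = n // 2                             # 0.5 N, rounded down
    # Ring (cyclic) shift of both rows and columns
    psi_enc = np.roll(psi, shift=(shift, shift), axis=(0, 1))
    return V2 @ psi_enc                        # W(n) = V2(n) . Psi*(n, n)

# Usage on a tiny hand-checkable case
F_demo = np.array([1.0, 2.0, 3.0, 4.0])
V1_demo = np.array([1.0, 0.0, 0.0, 0.0])
V2_demo = np.array([0.0, 0.0, 1.0, 0.0])
W_demo = secure_template(F_demo, V1_demo, V2_demo)
print(W_demo)
```

In this small case the only non-zero column of Ψ is moved from column 0 to column 2 and its entries are rotated by two positions, so projecting with `V2_demo` picks out a single value; without knowledge of the shift the same projection would read a different entry.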

Step 5: Constructing the long bio-hash sequence

Binarize the bio-security template $W(n)$ to generate the one-dimensional binary long bio-hash sequence $h(n)$, then store $h(n)$ in the cloud, completing the registration stage;

where $h(1) = 0$ and $h(n)$ is the bio-hash value of each frame;

In the above example, the hash sequence length is $N_p$ bits.

Authentication stage:

The authenticating user provides a voice sample, from which a long bio-hash sequence is constructed and matched against the bio-hash sequence stored in the cloud; the result is fed back to the authenticating user.

Hash matching is described by a BER hypothesis test:

T₀: if the long bio-hash sequences $h(n)$ and $H(n)$ of two voice segments have the same content, then $BER(h, H) \le \tau$;

T₁: if the long bio-hash sequences $h(n)$ and $H(n)$ of two voice segments have different content, then $BER(h, H) > \tau$;

where $\tau$ is the perceptual authentication threshold; bio-hash authentication is realized by setting the matching threshold $\tau$. If the bit error rate is less than or equal to the threshold $\tau$, the biometric features behind $h(n)$ and $H(n)$ are the same and authentication passes; otherwise authentication fails;
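The matching rule can be sketched as a normalized Hamming distance followed by a threshold test. The function names are hypothetical, and the threshold in the usage lines is an arbitrary illustration rather than a value recommended by the text.

```python
import numpy as np

def bit_error_rate(h, H):
    """Fraction of positions where two equal-length binary hashes differ."""
    h = np.asarray(h, dtype=np.uint8)
    H = np.asarray(H, dtype=np.uint8)
    if h.shape != H.shape:
        raise ValueError("hash sequences must have the same length")
    return np.count_nonzero(h ^ H) / h.size    # XOR marks differing bits

def authenticate(h, H, tau):
    """T0 (pass) if BER <= tau, otherwise T1 (fail)."""
    return bit_error_rate(h, H) <= tau

# Usage: one differing bit out of four
stored = [0, 1, 1, 0]
probe = [0, 1, 0, 0]
ber = bit_error_rate(stored, probe)
print(ber, authenticate(stored, probe, tau=0.3))
```

Because the BER is normalized by the hash length, the same threshold can be compared across hash sequences of different lengths.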

Step 3: Feed the authentication result back to the authenticating user.

To evaluate the performance of the authentication method, the false accept rate (FAR) and the false reject rate (FRR) are calculated, where $\tau$ is the perceptual authentication threshold, $\mu$ is the mean of the BER, and $\sigma$ is the standard deviation of the BER. FRR and FAR are the usual measures of the robustness and distinguishability of an authentication method: the lower the FRR, the stronger the perceptual robustness; the lower the FAR, the better the distinguishability.
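As an illustration of how τ, μ and σ interact, the sketch below evaluates a false accept rate under the assumption that the BER between voices of different content is normally distributed with mean μ and standard deviation σ, so that the FAR is the normal cumulative probability of BER ≤ τ. This is a modelling assumption made for the example; the original formulas are not reproduced in this text, and the FRR is omitted because it depends on the BER distribution of same-content voices, which is not given. The numeric values in the usage lines are illustrative only.

```python
import math

def far_normal(tau, mu, sigma):
    """P(BER <= tau) for BER ~ Normal(mu, sigma^2), via the error function."""
    return 0.5 * (1.0 + math.erf((tau - mu) / (sigma * math.sqrt(2.0))))

# Usage: illustrative parameters, not measured values
mu_demo, sigma_demo = 0.5, 0.0153
for tau_demo in (0.40, 0.45, 0.50):
    print(tau_demo, far_normal(tau_demo, mu_demo, sigma_demo))
```

Under this model, lowering τ drives the FAR down very quickly once τ lies several standard deviations below μ, which is why a narrow BER distribution (a small σ) helps distinguishability.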

The superiority of the performance of the invention is illustrated by the following simulation experiments:

I. Experimental conditions and description

The experimental speech data come from the TIMIT (Texas Instruments and Massachusetts Institute of Technology) and TTS (text-to-speech) speech databases; the original speech database contains 1200 different speech segments. Each segment is a 4 s WAV file, 16-bit PCM, mono, sampled at 16 kHz.

Content-preserving operations are applied to each voice in the speech database according to the voice transmission environment. A speech database covering 10 content-preserving operations (volume adjustment, echo, noise, resampling and MP3 compression) was built, for a total of 12000 speech segments.

To simulate mixed noise in a real environment, noise from the NOISEX-92 database is added to the original speech database. A database of 86400 speech segments with different real background noises is built (1200 segments × 8 noise types × 9 SNR levels). The 8 noise types are white Gaussian noise, Pink noise, Factory floor noise 1, Factory floor noise 2, HF channel noise, Machine gun noise, Babble noise and Volvo noise. Each noise is added at 9 SNRs: -10 dB, -5 dB, 0 dB, 5 dB, 10 dB, 15 dB, 20 dB, 25 dB and 30 dB.
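One common way to build such a noisy database is to scale each noise recording so that the speech-to-noise power ratio hits the target SNR before adding it to the speech. The sketch below shows that scaling; it is an assumed procedure, since the text does not say how the mixing was done, and random arrays stand in for real speech and NOISEX-92 recordings.

```python
import numpy as np

SNR_LEVELS_DB = [-10, -5, 0, 5, 10, 15, 20, 25, 30]   # the 9 SNRs in the text
NUM_NOISE_TYPES = 8
NUM_SPEECH_SEGMENTS = 1200

def mix_at_snr(speech, noise, snr_db):
    """Scale noise so that 10*log10(P_speech / P_noise) equals snr_db, then add it.

    Assumes the noise recording is at least as long as the speech.
    """
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise

# Usage with stand-in signals
rng = np.random.default_rng(1)
speech_demo = rng.standard_normal(8000)
noise_demo = rng.standard_normal(8000)
noisy = mix_at_snr(speech_demo, noise_demo, snr_db=-5)
total_segments = NUM_SPEECH_SEGMENTS * NUM_NOISE_TYPES * len(SNR_LEVELS_DB)
print(total_segments)
```

The product of segments, noise types and SNR levels reproduces the size of the noisy database stated in the text.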

The experimental hardware platform is an Intel(R) Core(TM) i5-7500 at 3.4 GHz with 4 GB of memory. The software environment is MATLAB R2018b under the Windows 10 operating system.

II. Experimental content

1. Distinguishability testing and analysis

The BER between perceptual hash values of voices with different content approximately follows a normal distribution. With 1200 different voices, the number of pairwise BER values given by the binomial coefficient is 1200 × 1199 / 2 = 719400. FIG. 4 shows the BER histogram of one voice matched against the other 1199 voices; the BER follows a normal-shaped distribution with a mean close to 0.5, which indicates good distinguishability. As shown in FIG. 5, the BER of hash sequences of voices with different content approximately follows a normal distribution; FIG. 5(a) shows the MFCC cosine values at 1065 bits and FIG. 5(b) the CQCC cosine values at 1065 bits. The better the BER fits the normal distribution curve, the better the randomness and collision resistance of the bio-hash sequence. The experimental results show that the probability distribution of the BER values of different voices agrees closely with the standard normal probability curve. As the hash sequence length increases, the BER range moves closer to 0.5 and the distribution moves closer to the theoretical values.
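The pair count and the "mean close to 0.5" behaviour can be checked with a small simulation. The sketch below uses independent random bits in place of real bio-hash sequences, which is an idealization: it shows what perfectly random, collision-resistant hashes would look like, not what the method actually measured. The sample of 50 hashes is an arbitrary choice to keep the run short.

```python
import itertools
import math

import numpy as np

# Number of distinct pairs among 1200 voices
num_pairs = math.comb(1200, 2)

# Pairwise BER for a small set of idealized random 1065-bit hashes
rng = np.random.default_rng(0)
hashes = rng.integers(0, 2, size=(50, 1065), dtype=np.uint8)
bers = [np.count_nonzero(a ^ b) / a.size
        for a, b in itertools.combinations(hashes, 2)]
mean_ber = float(np.mean(bers))
std_ber = float(np.std(bers))
print(num_pairs, len(bers), mean_ber, std_ber)
```

For real hashes, how closely the measured mean and spread approach these idealized figures is a direct indication of how random the hash bits are.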

In the above example, the selected sequence length of 1065 bits gives a narrower BER range than 640 bits and 799 bits, and performs best. Compared with the MFCC cosine value algorithm, the actual values of the CQCC cosine value algorithm fluctuate less, and the effect is better.

According to the De Moivre-Laplace central limit theorem, the normalized Hamming distance approximately obeys a normal distribution with mean $\mu = p$ and standard deviation $\sigma = \sqrt{p(1 - p)/N_p}$, where $p$ is the probability that a feature value produces the hash value "0" or "1", and $N_p$ is the length of the bio-hash sequence.

In the above example, the length of the bio-hash sequence is $N_p$ bits, and the theoretical normal distribution is characterized by its mean $\mu$ and standard deviation $\sigma$. Table 1 describes the normal distribution parameters for different robust features and different hash sequence lengths.
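For a binary hash whose bits differ independently with probability p, the De Moivre-Laplace theorem gives a normalized Hamming distance with mean p and standard deviation sqrt(p(1 - p)/N_p). The sketch below evaluates these theoretical parameters for the three hash lengths discussed in the text, assuming p = 0.5 (equally likely "0" and "1"); that value of p is an assumption, and the figures are theoretical values, not the measured entries of Table 1.

```python
import math

def theoretical_params(n_bits, p=0.5):
    """Mean and standard deviation of the normalized Hamming distance."""
    return p, math.sqrt(p * (1.0 - p) / n_bits)

# Usage: the three hash lengths compared in the experiments
params = {n: theoretical_params(n) for n in (640, 799, 1065)}
for n_bits, (mu, sigma) in params.items():
    print(n_bits, mu, round(sigma, 5))
```

The standard deviation shrinks with the square root of the hash length, which is consistent with the observation that longer hash sequences give a BER range concentrated more tightly around 0.5.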

Table 1 Normal distribution parameters for different robust features and different hash sequence lengths

As can be seen from Table 1, as the hash sequence length increases, the actual values of the authentication method of the present invention move closer to the theoretical values and the actual curve moves closer to the theoretical curve, which indicates that the hash sequences generated by the authentication method have good randomness and collision resistance. In addition, the actual-value curves of the CQCC cosine values and the MFCC cosine values differ little and are both close to the theoretical curve, which shows that both methods have good distinguishability.

2. Robustness testing and analysis

The original speech was subjected to the content-preserving operations shown in Table 2, producing 12000 processed speech segments.

Table 2 Content-preserving operations

Pairwise comparison of the perceptual hash values of the 1200 speech segments yields 719400 BER values. With the hash length set to 640 bits, 799 bits and 1065 bits, the FAR-FRR curves shown in FIG. 6 are obtained. For the MFCC cosine values, the FRR and FAR curves intersect at the different hash sequence lengths, so distinguishability and robustness cannot be well balanced. For the authentication method of the present invention, the FRR and FAR curves do not overlap at any of the hash sequence lengths, so content-preserving operations and voices of different content can be accurately distinguished, and the method balances distinguishability and robustness well.

FIG. 7 shows the FAR-FRR curve obtained in the above example with a hash sequence length of 1065 bits. The FRR and FAR curves do not overlap, and the interval between the final drop points of the FRR and FAR curves is [0.235, 0.425]. The experimental results show that the authentication method of the present invention has good distinguishability and robustness, and can accurately identify content-preserving operations and voices of different content.

3. Matching rate test and analysis under real noise environment

To evaluate the robustness of the authentication method of the present invention to noise, the matching rate $p_r$ is introduced,

where $T_A$ is the number of speech segments with perceptually identical content that are correctly accepted by the system; $T_R$ is the number of speech segments wrongly rejected by the system; and $F_A$ is the number of speech segments with perceptually different content that are wrongly accepted by the system. The threshold $\tau$ is chosen as the minimum bit error rate of the FAR curve. The thresholds $\tau$ of the different methods are 0.4173 for the authentication method of the invention and 0.4264 for the MFCC cosine value method. For factory noise 1, white Gaussian noise, HF channel noise and machine gun noise, the matching accuracy of the authentication method is higher than that of the other methods. For all noises, the matching rate of the authentication method reaches 100% when the signal-to-noise ratio is greater than 10 dB, and it is slightly lower than that of the MFCC cosine feature only under factory noise 2 and Volvo noise. For the other noises, the MFCC cosine value feature performs poorly. Overall, the authentication method is more robust and better supports biometric authentication in extreme noise environments. The authentication method is therefore more robust to different noises at low signal-to-noise ratio and can meet the requirements of voice matching in complex environments.

4. One-way property and security testing and analysis

To verify the trapdoor one-way property of the bio-hash algorithm, a trapdoor one-way verification algorithm based on a logarithmic ratio method is proposed. In FIG. 8, part A is the forward direction that generates the bio-security template, part B is the reverse direction of template generation, and part C verifies the uniqueness of the trapdoor of the bio-hash algorithm.

A speech segment $x$ is randomly drawn from the speech library. Its original speech feature $F$ is passed through the direction of part A in FIG. 8 to obtain the bio-security template $W$, and the speech feature $F'$ is then recovered through the direction of part B; finally the difference between the two biometric sequences is calculated. The log-ratio difference between the two sequences is defined in terms of the following quantities:

$F'$ is the feature value recovered from the bio-security template, $F$ is the original feature value, and $RC$ is the difference state of the biometric features.

Speech segments are randomly drawn from the original speech library to verify the trapdoor one-way property of the bio-hash algorithm. FIGS. 9 and 10 show the differences between the original feature $F$ and the features $F'_1$ and $F'_2$ obtained with the correct key and with a wrong key, respectively. FIG. 9(a) shows that the ratio of the feature obtained with the correct key to the original feature is essentially consistent, and FIG. 9(b) shows the specific difference of that ratio; FIG. 10(a) shows that the ratio of the feature obtained with the wrong key to the original feature is inconsistent, and FIG. 10(b) shows the specific difference of that ratio. Comparing FIGS. 9 and 10, the distance between the feature $F'_1$ extracted with the correct key and the original feature $F$ is distributed within $(-2.1 \times 10^{-15},\ 1.3 \times 10^{-15})$. The feature $F'_2$ extracted with the wrong key differs greatly from the original feature $F$: the distance between them is distributed around -4.1, and because its variation is only about $10^{-8}$, it appears as a straight line in FIG. 10(b). The feature sequence generated with the wrong key is far from the original feature sequence compared with that generated with the correct key, which demonstrates the trapdoor one-way property of the bio-hash.

To further verify the trapdoor one-way property of the bio-hash algorithm, 150 voices are randomly drawn from the speech library, and the Hamming distances between $F'_1$ and $F$ and between $F'_2$ and $F$ are calculated, as shown in FIG. 11. FIG. 11(a) shows the Hamming distance between $F'_1$ and $F$: the Hamming distance between the feature $F'_1$ obtained with the correct key and the original feature $F$ lies in $(-3.7 \times 10^{-18},\ 6.9 \times 10^{-18})$. FIG. 11(b) shows the Hamming distance between $F'_2$ and $F$: the Hamming distance between the feature $F'_2$ obtained with the wrong key and the original feature $F$ lies in $(0.10,\ 0.19)$. This further verifies that the bio-hash algorithm is one-way with a trapdoor, and also supports its security.

To enhance the security of the authentication method of the present invention, a chaotic shift is used when constructing the biometric template. FIG. 12 compares the bio-security templates obtained with the correct and with an incorrect chaotic shift. In FIG. 12(a), the bio-security template with the correct chaotic shift is distributed in the interval $(-5.5 \times 10^{-4},\ -0.2 \times 10^{-4})$, whereas in FIG. 12(b) the bio-security template with the incorrect chaotic shift is distributed in the interval $(0.2 \times 10^{-3},\ 1.9 \times 10^{-3})$. The values of the two templates are completely different, so the correct bio-security template cannot be obtained without knowing the correct chaotic shift.

5. Real-time analysis

Real-time performance is a very important evaluation criterion in voice content authentication. To evaluate the real-time performance of the authentication method of the present invention, 200 speech segments are randomly selected from the speech database and the average running time is calculated. Under the same operating environment and with 4 s speech segments, the results of the authentication method and of the MFCC cosine value algorithm are given in Table 3.

Table 3 Real-time performance of the authentication method of the invention and of the MFCC cosine values

As shown in Table 3, as the hash sequence length increases, the real-time performance of the authentication method decreases, but the difference is small and the requirement of real-time authentication is still met. Compared with the shorter hash sequence lengths, the real-time performance is lower, but the distinguishability is greatly improved. Compared with the MFCC cosine value, when the hash sequences are 1065 bits, the MFCC cosine value is 1.08 times of the authentication method. The authentication method performs well in terms of real-time performance and can meet the requirement of real-time authentication.

In conclusion, the long-sequence biological Hash authentication method based on the 2D-SIMM and CQCC cosine value feature fusion has good comprehensive performance, and solves the problems of the existing biological Hash authentication algorithm. The following conclusions can be drawn through experimental analysis: by adopting the Hash long sequence, the probability that different voice segments are confirmed to be the same segment can be effectively reduced, and the authentication rate is improved. The extracted biological characteristics can well cope with the interference of various content holding operations such as volume, resampling, MP3 compression and the like. And the matching accuracy of complex noise environments such as Babble and the like is better under the condition of low signal to noise ratio. A ratio method is adopted to prove the trapdoor unidirectionality of the biological hash algorithm. The biological safety template generated by adopting the 2D-SIMM has higher safety and reduces the risk of leakage of biological characteristics.

In prior-art voice authentication methods, the extracted voice features are hashed directly and stored in the cloud, so the voice features are easily leaked. When the hash is constructed, the voice features are used inefficiently and the resulting hash sequence is too short, so the hash sequences are not sufficiently distinguishable and authentication is biased. The invention provides a long-sequence biological Hash authentication method based on the fusion of two-dimensional sinusoidal modulation mapping (2D-SIMM) and Constant Q Cepstrum Coefficient (CQCC) cosine values. First, the CQCC of the voice signal is extracted; then the feature value given by the spatial cosine distance between the CQCCs of adjacent voice frames is computed; finally, the feature value is projected onto a pseudo-random matrix generated by the 2D-SIMM to construct the biological Hash long sequence. The two proposed robust feature schemes, the MFCC (Mel frequency cepstrum coefficient) spatial cosine distance and the CQCC spatial cosine distance, are evaluated experimentally using speech from the TIMIT (Texas Instruments and Massachusetts Institute of Technology) and TTS (text to speech) speech libraries. The experimental results show that the authentication method performs better with the CQCC spatial cosine distance feature: it balances distinguishability and robustness and also supports efficient authentication. The authentication method is robust against common noise backgrounds at low signal-to-noise ratio and can provide a secure template for the biological features.
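The pipeline summarized above (adjacent-frame cosine values, projection onto an orthogonalized pseudo-random matrix, binarization) can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the patented implementation: the feature matrix is random stand-in data rather than real CQCC output, the pseudo-random matrix comes from a seeded NumPy generator rather than the 2D-SIMM, QR decomposition stands in for Schmidt orthogonalization, the chaotic shift is omitted, and the binarization rule (a bit is 1 when a projected value exceeds the previous one, with the first bit fixed to 0) is an assumption. All function names are hypothetical.

```python
import numpy as np


def adjacent_frame_cosine(features):
    """Cosine of the angle between each pair of adjacent frame vectors (rows)."""
    a, b = features[:-1], features[1:]
    numerator = np.sum(a * b, axis=1)
    denominator = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return numerator / denominator


def orthonormal_rows(size, seed):
    """Seeded pseudo-random square matrix with orthonormal rows.

    Assumption: QR decomposition replaces Schmidt orthogonalization and a
    NumPy generator replaces the 2D-SIMM chaotic map.
    """
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((size, size)))
    return q.T


def bio_hash(features, seed):
    """Binary hash from frame features and a user-specific seed (sketch)."""
    f = adjacent_frame_cosine(features)        # biological feature vector
    basis = orthonormal_rows(f.size, seed)     # stands in for the user key
    projected = basis @ f                      # projection onto the row vectors
    bits = np.zeros(f.size, dtype=np.uint8)    # first bit fixed to 0
    bits[1:] = projected[1:] > projected[:-1]  # assumed binarization rule
    return bits


# Stand-in data: 65 frames of 20 coefficients each (not real CQCC features).
frames = np.random.default_rng(0).standard_normal((65, 20))
h = bio_hash(frames, seed=1234)
```

Because the projection basis depends on the seed, the same features hashed with a different seed go through a different projection, which is the property that lets a leaked template be revoked by reissuing the key.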

Claims

1. A long sequence biological Hash authentication method based on feature fusion is characterized by comprising the following steps:

a registration stage:

step 2: robust feature extraction

MFCC feature extraction

CQCC feature extraction

step 3: calculating the spatial cosine value of adjacent frames

The extracted MFCC feature values and CQCC feature values are uniformly denoted $MQ(n,i)$ ($n=1,2,\dots,N$; $i=1,2,\dots,L$), with $L=L_1=L_2$; the row vectors of the MFCC feature values and of the CQCC feature values are averaged respectively to obtain $MQ(i)$ ($i=1,2,\dots,L$) and $MQ_1(i)$ ($i=1,2,\dots,L$); $MQ(i)$ and $MQ_1(i)$ are then spliced to obtain the matrices $\Lambda_1=[MQ_1\ MQ]$ and $\Lambda_2=[MQ\ MQ_1]$, and a cosine value is computed for each row of matrix $\Lambda_1$ and matrix $\Lambda_2$ to obtain the biological feature vector $F(n)$ ($n=1,2,\dots,N_p$);

step 4: construction of the biological safety template

$W(n) = V_2(n)\cdot\Psi^*(n,n) = [W(1), W(2), \dots, W(N_p)]$

wherein $h(1)=0$ is set, and $h(n)$ is the biological hash value of each frame;

an authentication stage:

step 1: the authentication voice provided by the user to be authenticated is processed through steps 1 to 5 of the registration stage to obtain the biological Hash long sequence $H(n)$;

the hash matching is described using BER hypothesis testing:

wherein $\tau$ represents the perceptual authentication threshold;

2. The feature fusion based long-sequence biometric hash authentication method of claim 1, wherein both the rows and the columns of the square matrix $\Psi(n,n)$ are shifted by $0.5N$ ($N$ representing the length of the hash sequence) to obtain the encrypted square matrix $\Psi^*(n,n)$.

3. The feature fusion based long-sequence biometric hash authentication method according to claim 1, wherein, to evaluate the performance of the authentication method, the false acceptance rate (FAR) and the false rejection rate (FRR) are calculated by the following two formulas, respectively;

in the formulas: $\tau$ is the perceptual authentication threshold; $\mu$ is the mean of the BER; $\sigma$ is the standard deviation of the BER; FRR and FAR are generally used to evaluate the robustness and the distinguishability of the authentication method; the lower the FRR value, the stronger the perceptual robustness; the lower the FAR value, the better the distinguishability.
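To make the matching rule and the two evaluation rates concrete, the following minimal sketch computes the bit error rate (BER) between two hash sequences, accepts when the BER does not exceed the threshold, and evaluates FAR and FRR at a given threshold. The two formulas are not reproduced in this text, so the sketch assumes that the BER follows a normal distribution with the supplied mean and standard deviation, takes FAR as the probability mass at or below the threshold, and takes FRR as the mass above it. This modelling choice, the function names, and the example values are assumptions, not the formulas of the claim.

```python
import math


def bit_error_rate(h1, h2):
    """Normalized Hamming distance between two equal-length bit sequences."""
    if len(h1) != len(h2):
        raise ValueError("hash sequences must have the same length")
    return sum(a != b for a, b in zip(h1, h2)) / len(h1)


def authenticate(h_registered, h_query, tau):
    """Pass when the BER is less than or equal to the threshold tau."""
    return bit_error_rate(h_registered, h_query) <= tau


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def far(tau, mu, sigma):
    """Assumed model: probability that a BER drawn from N(mu, sigma) is <= tau."""
    return normal_cdf(tau, mu, sigma)


def frr(tau, mu, sigma):
    """Assumed model: probability that a BER drawn from N(mu, sigma) is > tau."""
    return 1.0 - normal_cdf(tau, mu, sigma)


# Example with made-up values: one differing bit out of four gives BER 0.25.
accepted = authenticate([0, 1, 1, 0], [0, 1, 0, 0], tau=0.25)
```

Under this model, lowering the threshold reduces FAR and raises FRR, which is the usual trade-off between distinguishability and robustness.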