CN112883206A - Long-sequence biological Hash ciphertext voice retrieval method based on feature fusion - Google Patents


Info

Publication number: CN112883206A
Application number: CN202110135465.8A
Authority: CN (China)
Prior art keywords: voice, hash, feature, retrieval, hash index
Legal status: Granted; Active
Other languages: Chinese (zh)
Other versions: CN112883206B (granted publication)
Inventors: 黄羿博, 蒲向荣
Current and original assignee: Northwest Normal University
Application filed by Northwest Normal University
Priority to CN202110135465.8A; publication of CN112883206A; application granted; publication of CN112883206B

Classifications

    • G06F16/433 — Query formulation using audio data
    • G06F16/41 — Indexing; data structures therefor; storage structures (multimedia data)
    • G06F16/483 — Retrieval using metadata automatically derived from the content
    • G06F16/953 — Querying, e.g. by the use of web search engines
    • G06F18/23213 — Non-hierarchical clustering with a fixed number of clusters, e.g. K-means
    • G06F18/253 — Fusion techniques of extracted features
    • G06F21/602 — Providing cryptographic facilities or services

Abstract

The invention discloses a long-sequence biological hash ciphertext voice retrieval method based on feature fusion. During registration, voice features are classified, a biometric template with a single mapping key is established from the classification results, a voice feature index is generated from the feature vector by a biological hash method, the voice file is encrypted, and the feature index and the encrypted voice are sent to the cloud. During retrieval, the feature vector of the voice to be retrieved is extracted in the same way and matched against the classification results; if matching succeeds, the generated hash sequence and the address of the voice to be retrieved in the hash index table are uploaded to the cloud. In the second-stage search at the cloud, the uploaded hash sequence is matched against the hash sequences in the system's hash index table using the Hamming distance, and the corresponding encrypted voice data from the successfully matched ciphertext voice library is returned to the mobile terminal. The retrieval method effectively prevents plaintext leakage, offers good efficiency and precision, and improves the security of the cloud encrypted voice retrieval system.

Description

Long-sequence biological Hash ciphertext voice retrieval method based on feature fusion
Technical Field
The invention belongs to the technical field of voice retrieval, and relates to a long-sequence biological hash ciphertext voice retrieval method based on feature fusion.
Background
With the growing demand for efficient cloud-based voice storage and retrieval services, quickly finding useful information in massive voice data has become a topic of wide attention and challenge. However, third-party cloud service providers are not fully trusted: the voice data uploaded by users to a cloud server may contain sensitive information involving national or enterprise secrets and personal privacy, which may lead to problems such as sensitive-information disclosure or misuse of personal privacy. To ensure the security of user voice data, the voice data therefore needs to be encrypted before being uploaded to the cloud server. Since encryption destroys most of the perceptual content of speech, how to extract speech features that can satisfy retrieval requirements from ciphertext speech becomes a challenge in its own right. Research on encryption mechanisms for voice data and on retrieval methods over massive encrypted (ciphertext) voice is therefore of significant importance.
Currently, voice retrieval technology has produced many research results. It mainly comprises text- or keyword-based retrieval and content-based retrieval. Content-based speech retrieval can be further divided into feature matching, deep learning, ranking-and-retrieval, and similar approaches. Feature extraction is a key step of voice retrieval and mainly includes perceptual hashing, audio fingerprinting, and the like. Content-based ciphertext voice retrieval can both guarantee the privacy of voice data and retrieve data efficiently and accurately, so it has very important theoretical and application value, and many research institutions and scholars at home and abroad have already obtained results in this area.
Existing content-based ciphertext voice retrieval has the following defects. In voice feature extraction, existing schemes cannot balance robustness, distinguishability and efficiency well, because the three constrain one another. Moreover, the perceptual hash sequence used for retrieval is extracted from original-domain (plaintext) voice and embedded into the encrypted voice with a digital watermarking technique; at retrieval time, the voice watermark of each encrypted voice file in the ciphertext voice library must be extracted and matched before a result can be returned. This affects retrieval efficiency to a certain extent, increases the work of the data owner at the client, and complicates the system.
Disclosure of Invention
The invention aims to provide a long-sequence biological hash ciphertext voice retrieval method based on feature fusion, which improves the retrieval efficiency, reduces the work of a data owner on a client and simplifies the system.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows: a long-sequence biological Hash ciphertext voice retrieval method based on feature fusion specifically comprises the following steps:
step 1: registration
1) The original voice signal x(n) is pre-emphasized to obtain a pre-emphasized signal, which is then framed and windowed to obtain the pre-processed signal x_i(n);
2) Linear-prediction minimum mean square error feature extraction is performed on x_i(n) to obtain the feature vector V1; at the same time, a fast Fourier transform is applied to x_i(n) to obtain the feature vector V2, which is then dimension-reduced to the vector V′2. Mean filtering of V1 and V′2 yields the fused feature vector V;
3) Binarized difference hashing is applied to the fused feature vector V to obtain the difference-hash vector of the original voice; the difference-hash vectors are then summed, and voices with the same sum are grouped into one class. A biometric template with a single mapping key is established for each class. The three-dimensional Lorenz chaotic measurement matrix generates, via a secret key, a 1×N random number sequence q(i) in the database, which is converted into mutually orthogonal random sequences Q(i) by Schmidt orthogonalization. Finally, the scalar product of Q(i) and the feature parameter vector V gives D = (D(i) | i = 1, 2, …, N); binarizing D(i) yields the hash index h = (h(i) | i = 1, 2, …, N);
4) A random sequence S1 of the same length as the original voice signal x(n) is generated by the Henon encryption algorithm. Sorting S1 in descending order gives the sequence S1′, and S1′ and x(n) form a one-to-one mapping. Through this mapping, x(n) is assigned to S1′, and the assigned S1′ is then restored to its unordered state, forming the scrambled encrypted sequence x′(i) of the original voice. Finally, a 32-byte private key S_k is set, and the original voice x(n) is iteratively encrypted to obtain the ciphertext F′′(l);
Encryption process:
a. The original voice x(n) and the hash index h are each scrambled and encrypted with the private key S_p to generate ciphertext data 1, denoted F′;
b. SHA256 is set with a 32-byte initial key S_K, where S_K(m) takes values in [0, 255];
c. Ciphertext data 1 and the key S_K(l, m) undergo 32-bit iterative encryption to generate ciphertext data 2, denoted F′′:

F′′ = S_K × F′

where S_K is the iterated overall key, and the next group of private keys is generated from the previous group by iteration: S_K(i+1) = sha256(F′′(i))
5) The hash index and the ciphertext are sent to the cloud, and a corresponding hash index table is established at the cloud;
step 2: retrieval
Initial search
1) First, the feature parameter vector V′ is generated according to steps 1) and 2) of step 1, and V′ is binarized to obtain the difference-hash vector; the difference-hash vector is summed and matched against the difference-hash table obtained in step 5) of step 1. If matching fails, the retrieval result is fed back to the user directly; if matching succeeds, steps 3) and 4) of step 1 are executed;
2) Then the feature parameter vector V′ is sent to the corresponding biometric template to obtain the hash index h1;
3) The address of the voice to be retrieved x′(n) in the hash index table and the hash index are sent to the cloud;
4) For the result returned by the cloud: if retrieval fails, the result is fed back to the user directly; if retrieval succeeds, step 3) of step 1 is executed;
5) The encrypted voice is decrypted by reversing step 4) of step 1; a hash index h′′ is then generated from the decrypted voice according to steps 2) and 3) of step 1 and compared with the hash index h′ returned by the cloud, to judge whether the voice has been maliciously attacked;
6) The retrieval result and the comparison result at the mobile terminal are fed back to the user;
search again
I. According to the address in the hash index table provided by the mobile terminal for the voice to be retrieved x′(n), the Hamming distance between the hash index h1 of x′(n) and the hash index h2 at the corresponding address of the cloud hash index table is calculated;
II. The search result is fed back to the user; only when the Hamming distance is less than the threshold are the hash index h2 and the encrypted voice transmitted to the mobile terminal.
To measure the algorithm, the false accept rate (FAR) is defined as:

FAR(τ) = ∫_{-∞}^{τ} (1/(σ√(2π))) exp(−(x − μ)² / (2σ²)) dx

where μ and σ are the mean and standard deviation of the normal distribution followed by the BER of hashes of different-content voices, and τ is the matching threshold.
the retrieval method adopts the principles of voice short-time cross-correlation and perceptual hashing technology, so that plaintext leakage can be effectively prevented, and the biological characteristic template has good diversity and revocable property. Meanwhile, the retrieval method has good efficiency and precision, the problem of voice retrieval after content keeping operation is solved, and the safety of the cloud encryption voice retrieval system is improved.
Compared with the prior art, the retrieval method has the following advantages:
1) The retrieval method extracts linear-prediction minimum-mean-square-error and fast-Fourier-transform features from the voice and applies difference hashing to them to generate a difference-hash table; during retrieval, only the voice to be retrieved and its address in the difference-hash table are uploaded, which greatly reduces the amount of computation at the cloud.
2) The ciphertext voice perceptual-hash construction method proposed here also shows good distinguishability, robustness and digest compactness when applied to original-domain (plaintext) voice. The hash construction can therefore be applied to voice feature extraction, content retrieval, content authentication and other applications in either the original domain or the encrypted domain.
3) The voice encryption method used here does not destroy most of the perceptual content of the voice, and its key space is large. The perceptual-hash digest can therefore be extracted directly from the ciphertext voice, so the voice need not be downloaded and decrypted for ciphertext retrieval, authentication and similar operations. This guarantees the security of the voice data, makes the method applicable to encrypted storage and management of large-scale voice data, and reduces the work required of the user at the client.
4) The hash sequence need not be embedded in the ciphertext voice as a digital watermark, making the ciphertext voice retrieval system safer, more efficient and simpler, with higher retrieval efficiency and precision.
Drawings
FIG. 1 is a flow chart of the search method of the present invention.
FIG. 2 is a flow chart of the speech feature extraction in the retrieval method of the present invention.
Fig. 3 is a scatter diagram before and after encryption.
Fig. 4 is a comparison graph of voice encryption and decryption.
FIG. 5 is a graph illustrating the difference between the pre-encryption and post-decryption speech data.
Fig. 6 is a BER normal distribution diagram.
Fig. 7 is a graph of the average BER for different algorithm content retention operations.
FIG. 8 is a graph of threshold values versus precision and recall.
FIG. 9 is a graph of recall and precision for different threshold values.
fig. 10 is a diagram of an original voice encryption process.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The invention provides a long-sequence biological hash ciphertext voice retrieval method based on feature fusion, and a flow chart is shown in figure 1. The method comprises the following steps:
the method comprises the following steps: registration
1) The original voice signal x(n) is pre-emphasized to enhance the high-frequency portion, yielding a pre-emphasized signal; the pre-emphasized signal is then framed and windowed to obtain the pre-processed signal x_i(n);
The energy of a voice signal is concentrated in the low-frequency band, while the energy in the high-frequency band is markedly smaller; moreover, the power spectral density of the noise output by the frequency discriminator grows with the square of the frequency, so the signal-to-noise ratio is large at low frequencies but clearly insufficient at high frequencies, making high-frequency transmission difficult. Pre-emphasis is commonly used to solve this problem. Therefore the input voice signal x(n) is first pre-emphasized to enhance the high-frequency portion, and the pre-emphasized signal is then framed and windowed to obtain the pre-processed signal x_i(n).
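The preprocessing chain described above can be sketched as follows. The pre-emphasis coefficient (0.97), frame length, hop size and choice of Hamming window are typical values assumed for illustration; the patent does not specify them.

```python
import numpy as np

def preprocess(x, frame_len=256, hop=128, alpha=0.97):
    """Pre-emphasize x(n), then split into overlapping windowed frames x_i(n)."""
    # Pre-emphasis boosts the high-frequency part: y(n) = x(n) - alpha*x(n-1)
    y = np.append(x[0], x[1:] - alpha * x[:-1])
    n_frames = 1 + (len(y) - frame_len) // hop
    win = np.hamming(frame_len)  # windowing reduces spectral leakage per frame
    frames = np.stack([y[i * hop: i * hop + frame_len] * win
                       for i in range(n_frames)])
    return frames  # frames[i] is the preprocessed signal x_i(n)

# Toy 0.25 s, 16 kHz tone standing in for a real voice signal
x = np.sin(2 * np.pi * 440 * np.arange(4000) / 16000)
frames = preprocess(x)
```

With a 4000-sample input, a 256-sample frame and a 128-sample hop, this yields 30 overlapping frames.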
2) Feature extraction by linear-prediction minimum mean square error and by inverse fast Fourier transform is performed on the pre-processed signal x_i(n), yielding the feature vectors V1 and V2 respectively; V2 is then dimension-reduced to obtain the vector V′2. Mean filtering of V1 and V′2 yields the fused feature vector V.
The flow of feature extraction for linear prediction of minimum mean square error is shown in fig. 2, and the specific process is as follows:
the chaos measurement matrix generation algorithm is as follows: k order signalxMeasurement matrix Φ: (
Figure 777355DEST_PATH_IMAGE003
) The satisfied RIP properties are as follows:
Figure 326148DEST_PATH_IMAGE004
in the formula:ζ k is a constant number of times that the number of the first,ζ k ∈(0,1)。
Figure 617453DEST_PATH_IMAGE005
is the energy of the signal.
The general expression of the chaotic measurement matrix based on Lorenz is built from the mapping function ξ_{x,y,z} of the Lorenz chaotic system:

x′(i) = λ(y(i) − x(i))
y′(i) = βx(i) − y(i) − x(i)z(i)
z′(i) = x(i)y(i) − αz(i)

where α, β and λ are constants, with α = 8/3, β = 28, λ = 10; the initial values of x(i), y(i), z(i) are 0, 1 and 0 respectively; and x′(i), y′(i) and z′(i) are the derivatives with respect to i.
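As an illustration of how a keyed chaotic sequence of this kind can be produced, the sketch below Euler-integrates the Lorenz system above with the stated constants and initial values. The step size `dt` and the use of the x-component as the output sequence q(i) are assumptions, not details given in the text.

```python
import numpy as np

def lorenz_sequence(N, dt=0.01, x0=0.0, y0=1.0, z0=0.0,
                    alpha=8 / 3, beta=28.0, lam=10.0):
    """Length-N sequence from an Euler discretization of the Lorenz system."""
    x, y, z = x0, y0, z0
    q = np.empty(N)
    for i in range(N):
        dx = lam * (y - x)           # x' = lambda*(y - x)
        dy = beta * x - y - x * z    # y' = beta*x - y - x*z
        dz = x * y - alpha * z       # z' = x*y - alpha*z
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        q[i] = x                     # emit the x-component as q(i)
    return q

q = lorenz_sequence(512)
```

The initial state (or `dt`) would play the role of the secret key: a slightly different key yields a completely different sequence, which is what makes the chaotic construction usable for keyed measurement matrices.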
3) Binarized difference hashing is applied to the fused feature vector V to obtain the difference-hash vector of the original voice. The fused feature vector and the difference-hash vector are then computed for every voice in the voice library; the difference-hash vectors are summed, and voices with the same sum are grouped into one class. A biometric template with a single mapping key is established for each class. The three-dimensional Lorenz chaotic measurement matrix generates, via a secret key, a 1×N random number sequence q(i) in the database, which is converted into mutually orthogonal random sequences Q(i) by Schmidt orthogonalization. Finally, the scalar product of Q(i) and the fused feature vector V gives D = (D(i) | i = 1, 2, …, N); binarizing D(i) yields the hash index h = (h(i) | i = 1, 2, …, N), i.e. the feature index.
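A minimal sketch of step 3). A seeded NumPy generator stands in for the keyed Lorenz sequence, and median thresholding is assumed for the final binarization, since the patent does not state the binarization rule:

```python
import numpy as np

def diff_hash(v):
    """Binarized difference hash: 1 where the feature increases, else 0."""
    return (np.diff(v) > 0).astype(int)

def gram_schmidt(rows):
    """Schmidt-orthogonalize the rows of a matrix into orthonormal sequences."""
    q = []
    for r in rows:
        for u in q:
            r = r - (r @ u) * u   # remove the component along u
        q.append(r / np.linalg.norm(r))
    return np.array(q)

def bio_hash(v, key, N=32):
    """Project v onto N orthogonalized keyed random sequences, then binarize."""
    rng = np.random.default_rng(key)               # stand-in for keyed q(i)
    Q = gram_schmidt(rng.standard_normal((N, len(v))))
    D = Q @ v                                      # scalar products D(i)
    return (D > np.median(D)).astype(int)          # hash index h(i)

v = np.random.default_rng(0).standard_normal(64)   # stand-in fused feature V
dh = diff_hash(v)
cls = int(dh.sum())                                # class label = hash sum
h = bio_hash(v, key=1234)
```

Because the projection matrix depends on the key, a revoked template can simply be re-issued with a new key, which is the diversity/revocability property claimed for the biometric template.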
4) A random sequence S1 of the same length as the original voice signal x(n) is generated by the Henon encryption algorithm; sorting S1 in descending order gives S1′, and S1′ and x(n) form a one-to-one mapping. Through this mapping, x(n) is assigned to S1′, and the assigned S1′ is restored to its unordered state, forming the scrambled encryption sequence of the original voice. Finally, a 32-byte private key S_k is set, and the original voice x(n) is iteratively encrypted to obtain the ciphertext F′′(l).
The encryption process, as shown in fig. 10:
a. The original voice x(n) and the hash index h are each scrambled and encrypted with the private key S_p to generate ciphertext data 1 (F′);
b. SHA256 is set with a 32-byte initial key S_K, where S_K(m) takes values in [0, 255];
c. Ciphertext data 1 (F′) and the key S_K(l, m) undergo 32-bit iterative encryption each time to generate ciphertext data 2 (F′′):

F′′ = S_K × F′

where n is the total number of sampling points of x(n), and S_K is the iterated overall key; the next group of private keys is generated from the previous group of iterations as follows:

S_K(i+1) = sha256(F′′(i))
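A hedged sketch of the scrambling and iterative-encryption stages. A generic keyed random sequence stands in for the Henon-derived S1, and XOR over 32-byte groups stands in for the product F′′ = S_K × F′ (the exact group operation is not spelled out in the text); the SHA-256 key chain S_K(i+1) = sha256(F′′(i)) follows the formula above:

```python
import hashlib
import numpy as np

def scramble(x, key):
    """Scramble x by the descending order of a keyed random sequence
    (stand-in for the Henon-derived sequence S1)."""
    s1 = np.random.default_rng(key).random(len(x))
    perm = np.argsort(-s1)          # descending-order one-to-one mapping
    return x[perm], perm

def unscramble(xs, perm):
    out = np.empty_like(xs)
    out[perm] = xs
    return out

def iterative_encrypt(data, sk):
    """XOR each 32-byte group with the current key; chain keys via SHA-256."""
    out = bytearray()
    for i in range(0, len(data), 32):
        block = data[i:i + 32]
        cblock = bytes(b ^ k for b, k in zip(block, sk))
        out += cblock
        sk = hashlib.sha256(cblock).digest()   # next key from ciphertext group
    return bytes(out)

def iterative_decrypt(data, sk):
    out = bytearray()
    for i in range(0, len(data), 32):
        block = data[i:i + 32]
        out += bytes(b ^ k for b, k in zip(block, sk))
        sk = hashlib.sha256(block).digest()    # same chain, from ciphertext
    return bytes(out)

x = np.arange(16)
xs, perm = scramble(x, key=7)
sk = hashlib.sha256(b"32-byte private key S_k").digest()  # 32-byte key S_K
pt = b"0123456789" * 8                                    # 80-byte toy plaintext
ct = iterative_encrypt(pt, sk)
```

Because each group key is derived from the previous ciphertext group, decryption can rebuild the same key chain directly from the ciphertext, as in the decryption procedure described later.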
5) The hash index and the ciphertext are sent to the cloud, and a corresponding hash index table is established at the cloud;
step two: retrieval
Initial search
1) The voice to be retrieved x′(n) is processed: first the feature parameter vector V′ is generated according to steps 1) and 2) of step one, and V′ is binarized to obtain the difference-hash vector; the difference-hash vector is summed and matched against the classification results obtained in step 3) of step one. If matching fails, the retrieval result is fed back to the user directly; if matching succeeds, steps 3) and 4) of step one are executed;
2) Then the feature parameter vector V′ is sent to the biometric template corresponding to V′, obtaining the hash index h1;
3) The address of the voice to be retrieved x′(n) in the hash index table and the hash index are sent to the cloud;
4) For the result returned by the cloud: if retrieval fails, the result is fed back to the user; if retrieval succeeds, step 3) of step one is executed;
5) The encrypted voice is decrypted by reversing step 4) of step one; a hash index h′′ is then generated from the decrypted voice according to steps 2) and 3) of step one and compared with the hash index h′ returned by the cloud, to judge whether the voice has been maliciously attacked;
The decryption procedure of the modified SHA256 used for decryption: first, the key S_K(l, m) is used to iteratively decrypt ciphertext data 2, generating ciphertext data 1; then ciphertext data 1 is inverse-scrambled to restore the original data;
6) The retrieval result and the comparison result at the mobile terminal are fed back to the user;
search again
I. According to the address in the hash index table provided by the mobile terminal for the voice to be retrieved x′(n), the Hamming distance between the hash index h1 of x′(n) and the hash index h2 at the corresponding address of the cloud hash index table is calculated;
II. The search result is fed back to the user; only when the Hamming distance is less than the threshold are the hash index h2 and the encrypted voice transmitted to the mobile terminal.
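The second-stage search reduces to one Hamming-distance comparison per query. A minimal sketch follows; the dictionary layout of the index table and the threshold value τ = 0.16 (taken from the experimental section) are illustrative assumptions:

```python
import numpy as np

def hamming_distance(h1, h2):
    """Number of differing bits between two binary hash indexes."""
    return int(np.sum(h1 != h2))

def cloud_match(h1, table, address, tau=0.16):
    """Compare the uploaded hash h1 against the table entry at `address`;
    accept only when the normalized Hamming distance (BER) is below tau."""
    h2 = table[address]
    ber = hamming_distance(h1, h2) / len(h1)
    return ber < tau, ber

table = {0: np.array([1, 0, 1, 1, 0, 0, 1, 0])}   # toy cloud hash index table
query = np.array([1, 0, 1, 1, 0, 1, 1, 0])        # one flipped bit vs. entry 0
ok, ber = cloud_match(query, table, address=0)
```

Only on a successful match would the cloud return h2 together with the corresponding encrypted voice, as described in step II.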
To measure the algorithm, the false accept rate (FAR) is defined as:

FAR(τ) = ∫_{-∞}^{τ} (1/(σ√(2π))) exp(−(x − μ)² / (2σ²)) dx

where μ and σ are the mean and standard deviation of the normal distribution followed by the BER of hashes of different-content voices, and τ is the matching threshold.
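Under the normal-distribution model of the BER, the FAR integral is just the normal CDF evaluated at the threshold. A sketch follows; the values p = 0.5 and N = 802 are taken from the experimental section later in the document:

```python
import math

def far(tau, p=0.5, N=802):
    """FAR(tau) for BER ~ N(p, p*(1-p)/N), via the standard normal CDF."""
    sigma = math.sqrt(p * (1 - p) / N)
    z = (tau - p) / sigma
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF

f = far(0.16)   # FAR at the experimental threshold tau = 0.16
```

At τ = 0.16 the threshold sits many standard deviations below the mean p = 0.5, so the computed FAR is vanishingly small, consistent with the collision-resistance claims made in the performance analysis.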
analysis of experiments
The voice used in the experiments of the retrieval method comes from speech signals in the TIMIT (Texas Instruments / Massachusetts Institute of Technology) and TTS (text-to-speech) voice libraries; the duration of each voice signal is 4 s. The experimental hardware platform is an Intel(R) Core(TM) i5-7500M CPU @ 3.40 GHz with 4 GB of memory; the software environment is Matlab R2018b under the Windows 10 operating system.
Security and integrity analysis of plaintext data
In order to avoid plaintext leakage of data in the open mobile channel and the semi-open cloud storage, and to ensure data integrity when the voice is recovered, the improved SHA256 scheme is adopted to encrypt the voice data.
To verify the security of the scheme, i.e. the correlation of the voice before and after encryption, a voice is first selected at random from the voice library, and 32000 consecutive sample points of that voice are selected at random as sampling points. With x(i) as the abscissa and x(i+1) as the ordinate, the scatter plots of the sample points before and after encryption are shown in fig. 3; fig. 3(a) is the scatter plot before encryption and fig. 3(b) the scatter plot after encryption. As the figure shows, the correlation of the encrypted voice is greatly reduced, indicating that the voice has good security.
To further verify the correlation of the speech before and after encryption, the Spearman rank correlation coefficient is used to calculate the correlation; it is defined as:

ρ = 1 − 6Σd_i² / (n(n² − 1))

where d_i is the rank difference of the i-th pair of sample values and n is the number of sample pairs.
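The Spearman check can be reproduced on toy data as follows; the smooth sine stands in for plaintext speech (adjacent samples strongly correlated) and a random permutation stands in for the encrypted signal:

```python
import numpy as np

def spearman(a, b):
    """Spearman rho = 1 - 6*sum(d_i^2)/(n*(n^2-1)), d_i = rank difference."""
    ra = np.argsort(np.argsort(a))   # ranks of a
    rb = np.argsort(np.argsort(b))   # ranks of b
    d = (ra - rb).astype(float)
    n = len(a)
    return 1 - 6 * np.sum(d ** 2) / (n * (n ** 2 - 1))

t = np.arange(1000)
plain = np.sin(0.01 * t)                               # smooth "speech"
cipher = np.random.default_rng(0).permutation(plain)   # scrambled stand-in

# Correlation between adjacent samples x(i) and x(i+1), as in the experiment
rho_plain = spearman(plain[:-1], plain[1:])
rho_cipher = spearman(cipher[:-1], cipher[1:])
```

As in the reported experiment, the coefficient is close to 1 before encryption and close to 0 after, showing that adjacent-sample correlation is destroyed.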
calculating by the formula: the coefficient of this speech before the modified SHA256 encryption is 0.9840 and the encrypted coefficient is-0.0030. As can be seen from fig. 3 and the spearman correlation coefficient: after the voice is encrypted, the relevance is greatly reduced, which shows that the retrieval method of the invention has good safety, namely: the leakage of the plaintext data is effectively prevented.
In order to further verify the security of the improved SHA256 encryption experiment, the whole voice is encrypted and decrypted to obtain an encryption and decryption comparison graph shown in fig. 4; fig. 4 (a) is a waveform diagram of an original voice, fig. 4 (b) is a waveform diagram of an encrypted voice, fig. 4 (c) is a waveform diagram of an error key decrypted voice, and fig. 4 (d) is a waveform diagram of a correct key decrypted voice.
In order to verify the data integrity of the experiment, i.e. the invariance of the pre-encryption and post-decryption voice, the pre-encryption voice data and the post-decryption voice data are subtracted, and the result is shown in fig. 5.
As can be seen from fig. 4 and 5: the retrieval method of the invention not only has good encryption performance to voice, but also can ensure the integrity of data.
In summary: in a cloud environment, the improved SHA256 method can not only effectively prevent leakage of plaintext data, but also maintain data integrity.
Performance analysis
1. Distinctiveness
The BER between the biological hash values of speech signals with different content basically obeys a normal distribution. The normal distribution of the BER data of the voice library in this experiment is shown in fig. 6:
the normalized Hamming distance of the hash sequence is approximately obeyed by the Limonid-Laplace central limit theorem
Figure 35872DEST_PATH_IMAGE012
The normal distribution of (a), wherein,prepresenting the probability of 0 or 1 occurring in the bio-hash sequence;Nrepresents the total number of frames, whereinN=208。
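This distribution is easy to verify by simulation: for independent random hash pairs of length N, the normalized Hamming distance concentrates at p = 0.5 with standard deviation √(p(1 − p)/N):

```python
import numpy as np

rng = np.random.default_rng(42)
N, trials = 208, 2000                   # hash length and number of pairs
a = rng.integers(0, 2, size=(trials, N))
b = rng.integers(0, 2, size=(trials, N))
ber = np.mean(a != b, axis=1)           # normalized Hamming distance per pair

mu, sigma = ber.mean(), ber.std()
expected_sigma = np.sqrt(0.5 * 0.5 / N)  # sqrt(p*(1-p)/N) with p = 0.5
```

The empirical mean lands at p = 0.5 and the empirical spread matches √(p(1 − p)/N), which is why longer hash sequences (larger N) concentrate the BER more tightly and improve distinguishability.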
TABLE 1 Normal distribution parameters
As can be seen from table 1: compared with other total frame numbers, the normal distribution parameters obtained in the experiment are closest to the theoretical values when the total frame number is 802.
TABLE 2 FAR comparison for different total frame numbers
As can be seen from table 2: when the total number of frames is 535, 401 and 321 respectively, at τ = 0.16 the FAR is about 2.3 × 10^24, 1.0 × 10^36 and 1.6 × 10^42 times that for the total frame number N = 802.
It can also be seen from table 2 that even within the same experiment the FAR differs for different total frame numbers; that is, the FAR of the same voice segment differs with the hash-sequence length obtained at different total frame numbers. FAR alone is therefore not sufficient to measure the experimental distinguishability.
To further measure the experimental distinctiveness, the entropy rate (ER) is used for the calculation. The entropy rate measures the uncertainty of random events; ER is defined as: [formula not reproduced in the source]
Substituting the data of table 1 into the ER formula, the ER values of the experiment at different total frame numbers are shown in table 3:
TABLE 3 ER comparison for different total frame numbers
As can be seen from tables 1, 2 and 3, the experiment shows good randomness and collision resistance at a total frame number of 802, namely: the long-sequence scheme is well distinguishable.
TABLE 4 FAR comparison for different thresholds
2. Robustness
To test robustness, the speech library was first subjected to 9 content-preserving operations as shown in table 5.
TABLE 5 Content-preserving operations
Then the average BER after each content-preserving operation is calculated; the average BER over the speech library is shown in fig. 7. As can be seen from fig. 7, the mean BER of the experiment after the 12 common content-preserving operations is low overall, which shows that the retrieval method has good robustness. This is because the fusion of linear-prediction minimum-mean-square-error and inverse-fast-Fourier-transform features performs a linear analysis of the voice signal:
the weighting operation is the operation of digitally encoding a signal, which causes fluctuation of speech data, increases an error of a minimum mean square error of linear prediction, and changes a linear coefficient of inverse fast fourier transform as a whole, and thus is the least robust.
For the echo addition operation, the added echo superposes a delayed copy on the normal speech, which increases the LP-MMSE error and changes the IFFT linear coefficients of both the silent frames and the voiced frames, so robustness is poor.
For the narrow-band noise operation, the noise interference increases the LP-MMSE error and changes the IFFT linear coefficients, so robustness is poor.
For the MP3 compression operation, although the signal is compressed, the speech as a whole changes little linearly: the LP-MMSE error changes little and the IFFT linear coefficients are unchanged, so robustness is good.
For the volume adjustment operation, increasing or decreasing the volume only affects the speech amplitude, so only the voiced-frame part of the LP-MMSE error increases while the IFFT linear coefficients are hardly changed, and robustness is good.
For the resampling operation, the frequency content of the speech signal lies within 0–4 kHz, so this operation affects the signal least: both the LP-MMSE and the IFFT coefficients change little, and robustness is the best.
In conclusion, the retrieval method has good robustness and distinguishability and can meet the system's retrieval requirements on the hash index table sequence.
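The BER used in these robustness comparisons is the normalized Hamming distance between the binary hash of the original voice and that of the processed voice; a minimal sketch:

```python
def bit_error_rate(h1, h2):
    # Fraction of positions at which two equal-length binary hash
    # sequences differ: 0.0 for identical hashes, about 0.5 for
    # unrelated ones.
    if len(h1) != len(h2):
        raise ValueError("hash sequences must be the same length")
    return sum(a != b for a, b in zip(h1, h2)) / len(h1)
```

A robust content-preserving operation should keep this value well below the matching threshold.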
Search performance
In information retrieval, precision (Precision) and recall (Recall) are important indices of retrieval performance; their formulas are as follows:
Precision = [S_T / (S_T + S_N)] × 100%
Recall = [S_T / (S_T + S_F)] × 100%
where S_T is the number of correct voices that are retrieved, S_N is the number of wrong voices that are retrieved, and S_F is the number of correct voices that are not retrieved.
The formulas show that precision and recall are in tension: as precision rises, recall falls, and as precision falls, recall rises, so choosing a proper threshold to balance the two is important. As can be seen from Fig. 8(a), as the threshold increases, the precision at every total frame number stays at 100%. As can be seen from Fig. 8(b), as the threshold increases, the larger the total frame number, the higher the recall; that is, under the same conditions the long sequence retrieves better.
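The two formulas can be computed directly from the retrieval counts; a minimal sketch following the definitions above:

```python
def precision_recall(s_t, s_n, s_f):
    # s_t: correct voices retrieved; s_n: wrong voices retrieved;
    # s_f: correct voices not retrieved (per the definitions above).
    precision = s_t / (s_t + s_n) * 100.0
    recall = s_t / (s_t + s_f) * 100.0
    return precision, recall
```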
During feature matching, if the normalized Hamming distance is smaller than the threshold, the match succeeds; otherwise it fails. As can be seen from Figs. 6 and 7, the minimum BER of the original voices is 0.43 and the maximum mean BER after the content-preserving operations is 0.24, so the threshold should fall within the interval (0.24, 0.43).
As can be seen from Fig. 9 (recall in Fig. 9(a), precision in Fig. 9(b)): within the interval (0.24, 0.43), only the sub-interval [0.36, 0.4] gives precision and recall of 1 simultaneously; therefore, to keep good precision and recall under the content-preserving operations, 0.36 is adopted as the threshold.
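The matching rule described above (match if and only if the normalized Hamming distance is below the threshold) can be sketched as follows, with 0.36 as the default threshold chosen in the experiments:

```python
def hashes_match(h1, h2, threshold=0.36):
    # Match succeeds when the normalized Hamming distance between two
    # equal-length binary hashes falls below the threshold.
    ber = sum(a != b for a, b in zip(h1, h2)) / len(h1)
    return ber < threshold
```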
As can be seen from Fig. 7, the original voices change to different degrees after the content-preserving operations, so the hash sequences extracted from them also change to some degree; the hashes of some voices can then no longer be matched against the system's hash index table, which lowers the recall. The recall and precision of the original voices after the content-preserving operations of Table 5 are shown in Tables 6 and 7, respectively:
TABLE 6 recall after CPO
[Table image not reproduced]
TABLE 7 precision after CPO
[Table image not reproduced]
As can be seen from Tables 6 and 7, the precision and recall of the retrieval method are both 100% after the content-preserving operations, which shows good retrieval performance. When the cloud voice library is searched, the relevant ciphertext voice can be retrieved effectively by providing only the address and the hash index.
In summary, the retrieval method has good precision and recall, and the precision-recall trade-off is good, so the experiment achieves good retrieval precision.

Claims (3)

1. A long-sequence biological Hash ciphertext voice retrieval method based on feature fusion is characterized by comprising the following steps:
step 1: registration
1) Pre-emphasize the original voice signal x(n) to obtain a pre-emphasized signal, then frame it and apply a window function to obtain the pre-processed signal x_i(n);
2) Perform linear prediction minimum mean square error feature extraction on the pre-processed signal x_i(n) to obtain the feature vector V1; at the same time, apply the fast Fourier transform to x_i(n) to obtain the feature vector V2, then reduce the dimension of V2 to obtain the vector V′2; mean-filter V1 and V′2 to obtain the fused feature vector V;
3) Apply binarized differential hashing to the fused feature vector V to obtain the differential hash vector of the original voice, then sum the differential hash vectors and group voices with the same sum into one class; build a biometric template with a single mapping key for each class. The three-dimensional Lorenz chaotic measurement matrix generates a 1×N random number sequence q(i) in the database through a secret key, which is converted into mutually orthogonal random sequences Q(i) by Gram-Schmidt orthogonalization; finally, the scalar product of the mutually orthogonal random sequences Q(i) and the feature parameter vector V gives D = {D(i) | i = 1, 2, …, N}; binarizing D(i) yields the hash index h = {h(i) | i = 1, 2, …, N};
4) Generate, with the Henon encryption algorithm, a random sequence S1 of the same length as the original voice signal x(n); sort S1 in descending order to obtain S1′, so that S1′ and x(n) are in one-to-one correspondence; assign x(n) to S1′ through this mapping, then restore the assigned S1′ to its unordered state, which forms the scrambled encrypted sequence x′(i) of the original voice; finally, set a private key S_k with a bit length of 32 bytes and iteratively encrypt the original voice x(n) to obtain the ciphertext F′′(l);
Encryption process:
a. Scramble and encrypt the original voice x(n) and the hash index h separately with the private key S_p to generate ciphertext data 1, denoted F′;
b. Set SHA256 to the 32-byte-long initial key S_K, where S_K(m) takes values in [0, 255];
[Equation image not reproduced]
F′′ = S_K × F′
where S_K is the iterated overall key, and the next group of private keys is generated from the previous group by iteration: S_K(i+1) = sha256(F′′(i));
5) Send the hash index and the ciphertext to the cloud, and build the corresponding hash index table at the cloud;
step 2: retrieval
Initial search
1) First, generate the feature parameter vector V according to steps 1) and 2) of step 1 and binarize it to obtain the differential hash vector; then sum the differential hash vector and hash-match it against the differential hash table obtained in step 5) of step 1. If the match fails, feed the retrieval result back to the user directly; if it succeeds, execute steps 3) and 4) of step 1;
2) Then send the feature parameter vector V to the corresponding biometric template to obtain the hash index h1;
3) Send the address of the voice to be retrieved x′(n) in the hash index table, together with the hash index, to the cloud;
4) For the result returned by the cloud: if the retrieval fails, feed the retrieval result back to the user directly; if it succeeds, execute step 3) of step 1;
5) Decrypt the encrypted voice by reversing step 4) of step 1, then generate the hash index h′′ from the decrypted voice according to steps 2) and 3) of step 1, compare h′′ with the hash index h′ returned by the cloud, and judge whether the voice has been maliciously attacked;
6) Feed the retrieval result and the comparison result back to the user at the mobile terminal;
search again
I. According to the voice to be retrieved x′(n) provided by the mobile terminal and its address in the hash index table, compute the Hamming distance between the hash index h1 of x′(n) and the hash index h2 at the corresponding address of the cloud hash index table;
II. Feed the retrieval result back to the user; only when the Hamming distance is smaller than the threshold are the hash index h2 and the encrypted voice transmitted to the mobile terminal.
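The hash-index generation of step 1, 3) can be sketched as follows. Hedged assumptions: a seeded pseudo-random generator stands in for the Lorenz-derived sequence q(i), and binarization against the mean of the projections stands in for the patent's unspecified binarization rule; n_bits must not exceed the feature dimension for the Gram-Schmidt step to succeed:

```python
import random

def biohash(v, key, n_bits):
    # Key-seeded random vectors stand in for the Lorenz-derived
    # sequences q(i); Gram-Schmidt makes them mutually orthonormal.
    rng = random.Random(key)
    basis = []
    for _ in range(n_bits):
        q = [rng.uniform(-1.0, 1.0) for _ in v]
        for b in basis:
            dot = sum(x * y for x, y in zip(q, b))
            q = [x - dot * y for x, y in zip(q, b)]   # remove projection
        norm = sum(x * x for x in q) ** 0.5
        basis.append([x / norm for x in q])
    # Scalar products D(i) = <Q(i), V>, then binarization (here:
    # against the mean of D, an assumed rule).
    d = [sum(x * y for x, y in zip(b, v)) for b in basis]
    mean = sum(d) / len(d)
    return [1 if di > mean else 0 for di in d]
```

The same key reproduces the same projection basis, so the same feature vector always maps to the same hash index, while without the key the hash reveals nothing usable about V.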
2. The feature fusion-based long-sequence bio-hash ciphertext speech retrieval method of claim 1, wherein the linear prediction minimum mean square error feature extraction process is as follows:
the chaotic measurement matrix generation algorithm is as follows: for a k-sparse signal x, the measurement matrix Φ (Φ ∈ R^(M×N)) satisfies the following RIP property:
(1 − ζ_k)‖x‖₂² ≤ ‖Φx‖₂² ≤ (1 + ζ_k)‖x‖₂²
where ζ_k is a constant, ζ_k ∈ (0, 1);
‖x‖₂² is the energy of the signal;
The general expression of the Lorenz-based chaotic measurement matrix is as follows:
[Equation image not reproduced]
where ξ_{x,y,z} denotes the mapping function of the Lorenz chaotic measurement matrix, whose expression is as follows:
x′(i) = α(y(i) − x(i)), y′(i) = λx(i) − y(i) − x(i)z(i), z′(i) = x(i)y(i) − βz(i)
where α, β and λ are constants; the initial values of x(i), y(i) and z(i) are 0, 1 and 0, respectively; x′(i), y′(i) and z′(i) are the derivatives with respect to i.
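A discrete chaotic sequence of the kind claim 2 describes can be produced by Euler-integrating the Lorenz system from the stated initial values (0, 1, 0); the step size and the classical parameter values α = 10, β = 8/3, λ = 28 are illustrative assumptions, not taken from the patent:

```python
def lorenz_sequence(n, alpha=10.0, beta=8.0 / 3.0, lam=28.0, dt=0.002):
    # Euler integration of x' = alpha*(y - x), y' = x*(lam - z) - y,
    # z' = x*y - beta*z from (0, 1, 0); the x-trajectory serves as a
    # deterministic, key-reproducible chaotic number source.
    x, y, z = 0.0, 1.0, 0.0
    seq = []
    for _ in range(n):
        dx = alpha * (y - x)
        dy = x * (lam - z) - y
        dz = x * y - beta * z
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        seq.append(x)
    return seq
```

Because the system is deterministic, the same parameters and initial values always regenerate the same sequence, which is what lets a secret key reproduce the measurement matrix.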
3. The feature fusion-based long-sequence bio-hash ciphertext voice retrieval method of claim 1, wherein step 5) of the initial retrieval in step 2 adopts the modified SHA256 decryption process: first, the ciphertext data 2 is iteratively decrypted with the key S_K(l, m) to generate ciphertext data 1; then the ciphertext data 1 is restored to the original data by inverse scrambling.
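The scrambling and the iterated SHA256 key schedule of claims 1 and 3 can be sketched as follows. Hedged assumptions: XOR stands in for the unspecified keyed transform F′′ = S_K × F′, and any key-derived pseudo-random sequence may play the role of the Henon sequence:

```python
import hashlib

def scramble(samples, chaos):
    # Sort the key-derived chaotic sequence in descending order; the
    # induced permutation reorders the speech samples.
    order = sorted(range(len(samples)), key=lambda i: chaos[i], reverse=True)
    return [samples[i] for i in order], order

def unscramble(scrambled, order):
    # Inverse scrambling: put each sample back at its original index.
    out = [None] * len(scrambled)
    for pos, i in enumerate(order):
        out[i] = scrambled[pos]
    return out

def xor_block(block, key):
    # Illustrative keyed transform standing in for F'' = S_K x F'.
    return bytes(b ^ k for b, k in zip(block, key))

def encrypt_chain(blocks, initial_key):
    # S_K(i+1) = sha256(F''(i)): each group's key is the SHA-256
    # digest of the previous encrypted group.
    key, out = initial_key, []
    for block in blocks:
        enc = xor_block(block, key)
        out.append(enc)
        key = hashlib.sha256(enc).digest()
    return out

def decrypt_chain(enc_blocks, initial_key):
    # Replays the key schedule from the initial key, decrypting each
    # group before deriving the next key from the still-encrypted one.
    key, out = initial_key, []
    for enc in enc_blocks:
        out.append(xor_block(enc, key))
        key = hashlib.sha256(enc).digest()
    return out
```

Chaining the keys through the ciphertext means a receiver only needs the 32-byte initial key to replay the whole schedule, while tampering with any encrypted group desynchronizes every later key.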
CN202110135465.8A 2021-02-01 2021-02-01 Long-sequence biological hash ciphertext voice retrieval method based on feature fusion Active CN112883206B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110135465.8A CN112883206B (en) 2021-02-01 2021-02-01 Long-sequence biological hash ciphertext voice retrieval method based on feature fusion

Publications (2)

Publication Number Publication Date
CN112883206A true CN112883206A (en) 2021-06-01
CN112883206B CN112883206B (en) 2022-07-01

Family

ID=76052264

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110135465.8A Active CN112883206B (en) 2021-02-01 2021-02-01 Long-sequence biological hash ciphertext voice retrieval method based on feature fusion

Country Status (1)

Country Link
CN (1) CN112883206B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6615253B1 (en) * 1999-08-31 2003-09-02 Accenture Llp Efficient server side data retrieval for execution of client side applications
CN1964244A (en) * 2005-11-08 2007-05-16 厦门致晟科技有限公司 A method to receive and transmit digital signal using vocoder
CN104835499A (en) * 2015-05-13 2015-08-12 西南交通大学 Cipher text speech perception hashing and retrieving scheme based on time-frequency domain trend change
CN105957119A (en) * 2016-05-20 2016-09-21 哈尔滨理工大学 Construction method for measurement matrix of compressed sensing magnetic resonance images based on chaotic system
CN108111710A (en) * 2017-12-15 2018-06-01 四川大学 A kind of more image encryption methods that can reduce ciphertext and key data amount
US20190132120A1 (en) * 2017-10-27 2019-05-02 EMC IP Holding Company LLC Data Encrypting System with Encryption Service Module and Supporting Infrastructure for Transparently Providing Encryption Services to Encryption Service Consumer Processes Across Encryption Service State Changes
CN111897909A (en) * 2020-08-03 2020-11-06 兰州理工大学 Ciphertext voice retrieval method and system based on deep perception Hash
CN112134681A (en) * 2020-08-19 2020-12-25 河南大学 Image compression encryption method and cloud-assisted decryption method based on compressed sensing and optical transformation

Non-Patent Citations (5)

Title
QIUYU ZHANG等: "A Classification Retrieval Method for Encrypted Speech Based on Deep Neural Network and Deep Hashing", 《IEEE ACCESS》, vol. 8, 5 November 2020 (2020-11-05), pages 202469 - 202482, XP011820705, DOI: 10.1109/ACCESS.2020.3036048 *
YONG WANG等: "Multi-format speech BioHashing based on energy to zero ratio and improved LP-MMSE parameter fusion", 《MULTIMEDIA TOOLS AND APPLICATIONS》, vol. 80, 16 November 2020 (2020-11-16), pages 10013 *
YANG ZHIHUI: "Application of speech recognition in a traditional Chinese medicine prescription system", China Master's Theses Full-text Database, Information Science and Technology, no. 3, 15 March 2016 (2016-03-15), pages 136 - 701 *
QIAN YONGQING: "Speech encryption based on chaos theory and compressed sensing theory", Journal of Wuhan Polytechnic University, vol. 38, no. 5, 31 October 2019 (2019-10-31), pages 90 - 94 *
HUANG YIBO et al.: "Biohashing ciphertext speech retrieval based on a chaotic measurement matrix", Journal of Huazhong University of Science and Technology (Natural Science Edition), vol. 48, no. 12, 31 December 2020 (2020-12-31), pages 1 - 2 *

Cited By (5)

Publication number Priority date Publication date Assignee Title
CN116055061A (en) * 2023-01-18 2023-05-02 南京龙垣信息科技有限公司 Voiceprint authentication privacy protection method based on hash encryption
CN116055061B (en) * 2023-01-18 2024-03-05 南京龙垣信息科技有限公司 Voiceprint authentication privacy protection method based on hash encryption
CN116132977A (en) * 2023-04-19 2023-05-16 深圳锐爱电子有限公司 Mouse safety encryption authentication method
CN116756778A (en) * 2023-08-15 2023-09-15 四川玉米星球科技有限公司 Private cipher text storage and access method and device
CN116756778B (en) * 2023-08-15 2023-11-14 四川玉米星球科技有限公司 Private cipher text storage and access method and device

Also Published As

Publication number Publication date
CN112883206B (en) 2022-07-01

Similar Documents

Publication Publication Date Title
CN112883206B (en) Long-sequence biological hash ciphertext voice retrieval method based on feature fusion
US20050066176A1 (en) Categorizer of content in digital signals
Zhao et al. Iris template protection based on local ranking
He et al. A retrieval algorithm of encrypted speech based on syllable-level perceptual hashing
CN112215165B (en) Face recognition method based on wavelet dimensionality reduction under homomorphic encryption
Billeb et al. Biometric template protection for speaker recognition based on universal background models
Zhang et al. A retrieval algorithm of encrypted speech based on short-term cross-correlation and perceptual hashing
CN111897909B (en) Ciphertext voice retrieval method and system based on deep perceptual hashing
CN106778292B (en) A kind of quick restoring method of Word encrypted document
CN111143865A (en) User behavior analysis system and method for automatically generating label on ciphertext data
Zhao et al. A retrieval algorithm for encrypted speech based on perceptual hashing
Yu et al. SVD‐based image compression, encryption, and identity authentication algorithm on cloud
CN115134082A (en) Social media false message detection method with privacy protection function
Zhang et al. An efficient retrieval algorithm of encrypted speech based on inverse fast Fourier transform and measurement matrix
Huang et al. Encrypted speech retrieval based on long sequence Biohashing
Zhang et al. An efficient retrieval approach for encrypted speech based on biological hashing and spectral subtraction
CN113779597B (en) Method, device, equipment and medium for storing and similar searching of encrypted document
Wang et al. Multi-format speech biohashing based on energy to zero ratio and improved lp-mmse parameter fusion
CN112883207B (en) High-safety biological Hash ciphertext voice retrieval method based on feature fusion
Huang et al. A high security BioHashing encrypted speech retrieval algorithm based on feature fusion
Zhang et al. An encrypted speech authentication method based on uniform subband spectrumvariance and perceptual hashing
Zhang et al. Verifiable speech retrieval algorithm based on diversity security template and biohashing
Khanduja et al. A scheme for robust biometric watermarking in web databases for ownership proof with identification
Jin et al. Efficient blind face recognition in the cloud
Wu et al. Robust and blind audio watermarking scheme based on genetic algorithm in dual transform domain

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant