CN112883206A - Long-sequence biological Hash ciphertext voice retrieval method based on feature fusion - Google Patents


Info

Publication number: CN112883206A
Application number: CN202110135465.8A
Authority: CN (China)
Prior art keywords: voice, hash, feature, retrieval, hash index
Legal status: Granted; Active
Other languages: Chinese (zh)
Other versions: CN112883206B (granted publication)
Inventors: 黄羿博, 蒲向荣
Current and original assignee: Northwest Normal University
Application filed by Northwest Normal University
Priority to CN202110135465.8A; publication of CN112883206A; application granted; publication of CN112883206B

Classifications

    • G06F16/433 — Query formulation using audio data
    • G06F16/41 — Indexing; data structures therefor; storage structures (multimedia data)
    • G06F16/483 — Retrieval using metadata automatically derived from the content
    • G06F16/953 — Querying, e.g. by the use of web search engines
    • G06F18/23213 — Non-hierarchical clustering with a fixed number of clusters, e.g. K-means
    • G06F18/253 — Fusion techniques of extracted features
    • G06F21/602 — Providing cryptographic facilities or services

Abstract

The invention discloses a long-sequence biological hash ciphertext voice retrieval method based on feature fusion. During registration, voice features are classified, a biometric template with a single mapping key is established from the classification results, a voice feature index is generated from the feature vector by a biological hash method, the voice file is encrypted, and the feature index and the encrypted voice are sent to the cloud. During retrieval, the feature vector of the voice to be retrieved is extracted in the same way and matched against the classification results; if matching succeeds, the generated hash sequence and the address of the voice to be retrieved in the hash index table are uploaded to the cloud. In the second-stage search at the cloud, the uploaded hash sequence is matched against the hash sequences in the system's hash index table using the Hamming distance, and the corresponding encrypted voice data from the successfully matched ciphertext voice library is returned to the mobile terminal. The retrieval method effectively prevents plaintext leakage, offers good efficiency and precision, and improves the security of the cloud encrypted voice retrieval system.

Description

Long-sequence biological Hash ciphertext voice retrieval method based on feature fusion
Technical Field
The invention belongs to the technical field of voice retrieval, and relates to a long-sequence biological hash ciphertext voice retrieval method based on feature fusion.
Background
With the growing demand for efficient cloud-based voice storage and retrieval services, quickly finding useful information in massive voice data has become a topic of wide attention and challenge. However, third-party cloud service providers are not fully trusted: the voice data uploaded by users to a cloud server may contain sensitive information involving national or enterprise secrets and personal privacy, which may lead to problems such as sensitive-information disclosure or misuse of personal privacy. To ensure the security of user voice data, the voice data therefore needs to be encrypted before being uploaded to the cloud server. Since encryption destroys most of the perceptual content of speech, how to extract speech features that can satisfy retrieval requirements from ciphertext speech becomes a challenge in its own right. Research on encryption mechanisms for voice data and on retrieval methods over massive encrypted (ciphertext) voice is therefore of significant importance.
Currently, voice retrieval technology has produced many research results. It mainly comprises text- or keyword-based retrieval and content-based retrieval. Content-based speech retrieval can be further divided into feature matching, deep learning, ranking-and-retrieval, and similar approaches. Feature extraction is a key step of voice retrieval and mainly includes perceptual hashing, audio fingerprinting, and the like. Content-based ciphertext voice retrieval can both guarantee the privacy of voice data and retrieve data efficiently and accurately, so it has very important theoretical and application value, and many research institutions and scholars at home and abroad have already obtained results in this area.
Existing content-based ciphertext voice retrieval has the following defects. In voice feature extraction, existing schemes cannot balance robustness, distinguishability and efficiency well, because the three constrain one another. Moreover, the perceptual hash sequence used for retrieval is extracted from original-domain (plaintext) voice and embedded into the encrypted voice with a digital watermarking technique; at retrieval time, the voice watermark of each encrypted voice file in the ciphertext voice library must be extracted and matched before a result can be returned. This affects retrieval efficiency to a certain extent, increases the work of the data owner at the client, and complicates the system.
Disclosure of Invention
The invention aims to provide a long-sequence biological hash ciphertext voice retrieval method based on feature fusion, which improves the retrieval efficiency, reduces the work of a data owner on a client and simplifies the system.
In order to achieve the purpose, the technical scheme adopted by the invention is as follows: a long-sequence biological Hash ciphertext voice retrieval method based on feature fusion specifically comprises the following steps:
step 1: registration
1) The original voice signal x(n) is pre-emphasized to obtain a pre-emphasized signal, which is then framed and windowed to obtain the pre-processed signal x_i(n);
2) Linear-prediction minimum mean square error feature extraction is performed on x_i(n) to obtain the feature vector V1; at the same time, a fast Fourier transform is applied to x_i(n) to obtain the feature vector V2, which is then dimension-reduced to the vector V′2. Mean filtering of V1 and V′2 yields the fused feature vector V;
3) Binarized difference hashing is applied to the fused feature vector V to obtain the difference-hash vector of the original voice; the difference-hash vectors are then summed, and voices with the same sum are grouped into one class. A biometric template with a single mapping key is established for each class. The three-dimensional Lorenz chaotic measurement matrix generates, via a secret key, a 1×N random number sequence q(i) in the database, which is converted into mutually orthogonal random sequences Q(i) by Schmidt orthogonalization. Finally, the scalar product of Q(i) and the feature parameter vector V gives D = (D(i) | i = 1, 2, …, N); binarizing D(i) yields the hash index h = (h(i) | i = 1, 2, …, N);
4) A random sequence S1 of the same length as the original voice signal x(n) is generated by the Henon encryption algorithm. Sorting S1 in descending order gives the sequence S1′, and S1′ and x(n) form a one-to-one mapping. Through this mapping, x(n) is assigned to S1′, and the assigned S1′ is then restored to its unordered state, forming the scrambled encrypted sequence x′(i) of the original voice. Finally, a 32-byte private key S_k is set, and the original voice x(n) is iteratively encrypted to obtain the ciphertext F′′(l);
Encryption process:
a. The original voice x(n) and the hash index h are each scrambled and encrypted with the private key S_p to generate ciphertext data 1, denoted F′;
b. SHA256 is set with a 32-byte initial key S_K, where S_K(m) takes values in [0, 255];
c. Ciphertext data 1 and the key S_K(l, m) undergo 32-bit iterative encryption to generate ciphertext data 2, denoted F′′:

F′′ = S_K × F′

where S_K is the iterated overall key, and the next group of private keys is generated from the previous group by iteration: S_K(i+1) = sha256(F′′(i))
5) The hash index and the ciphertext are sent to the cloud, and a corresponding hash index table is established at the cloud;
step 2: retrieval
Initial search
1) First, the feature parameter vector V′ is generated according to steps 1) and 2) of step 1, and V′ is binarized to obtain the difference-hash vector; the difference-hash vector is summed and matched against the difference-hash table obtained in step 5) of step 1. If matching fails, the retrieval result is fed back to the user directly; if matching succeeds, steps 3) and 4) of step 1 are executed;
2) Then the feature parameter vector V′ is sent to the corresponding biometric template to obtain the hash index h1;
3) The address of the voice to be retrieved x′(n) in the hash index table and the hash index are sent to the cloud;
4) For the result returned by the cloud: if retrieval fails, the result is fed back to the user directly; if retrieval succeeds, step 3) of step 1 is executed;
5) The encrypted voice is decrypted by reversing step 4) of step 1; a hash index h′′ is then generated from the decrypted voice according to steps 2) and 3) of step 1 and compared with the hash index h′ returned by the cloud, to judge whether the voice has been maliciously attacked;
6) The retrieval result and the comparison result at the mobile terminal are fed back to the user;
search again
I. According to the address in the hash index table provided by the mobile terminal for the voice to be retrieved x′(n), the Hamming distance between the hash index h1 of x′(n) and the hash index h2 at the corresponding address of the cloud hash index table is calculated;
II. The search result is fed back to the user; only when the Hamming distance is less than the threshold are the hash index h2 and the encrypted voice transmitted to the mobile terminal.
To measure the algorithm, the false accept rate (FAR) is defined as:

FAR(τ) = ∫_{-∞}^{τ} (1/(σ√(2π))) exp(−(x − μ)² / (2σ²)) dx

where μ and σ are the mean and standard deviation of the normal distribution followed by the BER of hashes of different-content voices, and τ is the matching threshold.
the retrieval method adopts the principles of voice short-time cross-correlation and perceptual hashing technology, so that plaintext leakage can be effectively prevented, and the biological characteristic template has good diversity and revocable property. Meanwhile, the retrieval method has good efficiency and precision, the problem of voice retrieval after content keeping operation is solved, and the safety of the cloud encryption voice retrieval system is improved.
Compared with the prior art, the retrieval method has the following advantages:
1) The retrieval method extracts linear-prediction minimum-mean-square-error and fast-Fourier-transform features from the voice and applies difference hashing to them to generate a difference-hash table; during retrieval, only the voice to be retrieved and its address in the difference-hash table are uploaded, which greatly reduces the amount of computation at the cloud.
2) The ciphertext voice perceptual-hash construction method proposed here also shows good distinguishability, robustness and digest compactness when applied to original-domain (plaintext) voice. The hash construction can therefore be applied to voice feature extraction, content retrieval, content authentication and other applications in either the original domain or the encrypted domain.
3) The voice encryption method used here does not destroy most of the perceptual content of the voice, and its key space is large. The perceptual-hash digest can therefore be extracted directly from the ciphertext voice, so the voice need not be downloaded and decrypted for ciphertext retrieval, authentication and similar operations. This guarantees the security of the voice data, makes the method applicable to encrypted storage and management of large-scale voice data, and reduces the work required of the user at the client.
4) The hash sequence need not be embedded in the ciphertext voice as a digital watermark, making the ciphertext voice retrieval system safer, more efficient and simpler, with higher retrieval efficiency and precision.
Drawings
FIG. 1 is a flow chart of the search method of the present invention.
FIG. 2 is a flow chart of the speech feature extraction in the retrieval method of the present invention.
Fig. 3 is a scatter diagram before and after encryption.
Fig. 4 is a comparison graph of voice encryption and decryption.
FIG. 5 is a graph illustrating the difference between the pre-encryption and post-decryption speech data.
Fig. 6 is a BER normal distribution diagram.
Fig. 7 is a graph of the average BER for different algorithm content retention operations.
FIG. 8 is a graph of threshold values versus precision and recall.
FIG. 9 is a graph of recall and precision for different threshold values.
fig. 10 is a diagram of an original voice encryption process.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The invention provides a long-sequence biological hash ciphertext voice retrieval method based on feature fusion, and a flow chart is shown in figure 1. The method comprises the following steps:
the method comprises the following steps: registration
1) The original voice signal x(n) is pre-emphasized to enhance the high-frequency portion, yielding a pre-emphasized signal; the pre-emphasized signal is then framed and windowed to obtain the pre-processed signal x_i(n);
The energy of a voice signal is concentrated in the low-frequency band, while the energy in the high-frequency band is markedly smaller; moreover, the power spectral density of the noise output by the frequency discriminator grows with the square of the frequency, so the signal-to-noise ratio is large at low frequencies but clearly insufficient at high frequencies, making high-frequency transmission difficult. Pre-emphasis is commonly used to solve this problem. Therefore the input voice signal x(n) is first pre-emphasized to enhance the high-frequency portion, and the pre-emphasized signal is then framed and windowed to obtain the pre-processed signal x_i(n).
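The preprocessing chain described above can be sketched as follows. The pre-emphasis coefficient (0.97), frame length, hop size and choice of Hamming window are typical values assumed for illustration; the patent does not specify them.

```python
import numpy as np

def preprocess(x, frame_len=256, hop=128, alpha=0.97):
    """Pre-emphasize x(n), then split into overlapping windowed frames x_i(n)."""
    # Pre-emphasis boosts the high-frequency part: y(n) = x(n) - alpha*x(n-1)
    y = np.append(x[0], x[1:] - alpha * x[:-1])
    n_frames = 1 + (len(y) - frame_len) // hop
    win = np.hamming(frame_len)  # windowing reduces spectral leakage per frame
    frames = np.stack([y[i * hop: i * hop + frame_len] * win
                       for i in range(n_frames)])
    return frames  # frames[i] is the preprocessed signal x_i(n)

# Toy 0.25 s, 16 kHz tone standing in for a real voice signal
x = np.sin(2 * np.pi * 440 * np.arange(4000) / 16000)
frames = preprocess(x)
```

With a 4000-sample input, a 256-sample frame and a 128-sample hop, this yields 30 overlapping frames.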
2) Feature extraction by linear-prediction minimum mean square error and by inverse fast Fourier transform is performed on the pre-processed signal x_i(n), yielding the feature vectors V1 and V2 respectively; V2 is then dimension-reduced to obtain the vector V′2. Mean filtering of V1 and V′2 yields the fused feature vector V.
The flow of feature extraction for linear prediction of minimum mean square error is shown in fig. 2, and the specific process is as follows:
the chaos measurement matrix generation algorithm is as follows: k order signalxMeasurement matrix Φ: (
Figure 777355DEST_PATH_IMAGE003
) The satisfied RIP properties are as follows:
Figure 326148DEST_PATH_IMAGE004
in the formula:ζ k is a constant number of times that the number of the first,ζ k ∈(0,1)。
Figure 617453DEST_PATH_IMAGE005
is the energy of the signal.
The general expression of the chaotic measurement matrix based on Lorenz is built from the mapping function ξ_{x,y,z} of the Lorenz chaotic system:

x′(i) = λ(y(i) − x(i))
y′(i) = βx(i) − y(i) − x(i)z(i)
z′(i) = x(i)y(i) − αz(i)

where α, β and λ are constants, with α = 8/3, β = 28, λ = 10; the initial values of x(i), y(i), z(i) are 0, 1 and 0 respectively; and x′(i), y′(i) and z′(i) are the derivatives with respect to i.
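As an illustration of how a keyed chaotic sequence of this kind can be produced, the sketch below Euler-integrates the Lorenz system above with the stated constants and initial values. The step size `dt` and the use of the x-component as the output sequence q(i) are assumptions, not details given in the text.

```python
import numpy as np

def lorenz_sequence(N, dt=0.01, x0=0.0, y0=1.0, z0=0.0,
                    alpha=8 / 3, beta=28.0, lam=10.0):
    """Length-N sequence from an Euler discretization of the Lorenz system."""
    x, y, z = x0, y0, z0
    q = np.empty(N)
    for i in range(N):
        dx = lam * (y - x)           # x' = lambda*(y - x)
        dy = beta * x - y - x * z    # y' = beta*x - y - x*z
        dz = x * y - alpha * z       # z' = x*y - alpha*z
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        q[i] = x                     # emit the x-component as q(i)
    return q

q = lorenz_sequence(512)
```

The initial state (or `dt`) would play the role of the secret key: a slightly different key yields a completely different sequence, which is what makes the chaotic construction usable for keyed measurement matrices.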
3) Binarized difference hashing is applied to the fused feature vector V to obtain the difference-hash vector of the original voice. The fused feature vector and the difference-hash vector are then computed for every voice in the voice library; the difference-hash vectors are summed, and voices with the same sum are grouped into one class. A biometric template with a single mapping key is established for each class. The three-dimensional Lorenz chaotic measurement matrix generates, via a secret key, a 1×N random number sequence q(i) in the database, which is converted into mutually orthogonal random sequences Q(i) by Schmidt orthogonalization. Finally, the scalar product of Q(i) and the fused feature vector V gives D = (D(i) | i = 1, 2, …, N); binarizing D(i) yields the hash index h = (h(i) | i = 1, 2, …, N), i.e. the feature index.
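A minimal sketch of step 3). A seeded NumPy generator stands in for the keyed Lorenz sequence, and median thresholding is assumed for the final binarization, since the patent does not state the binarization rule:

```python
import numpy as np

def diff_hash(v):
    """Binarized difference hash: 1 where the feature increases, else 0."""
    return (np.diff(v) > 0).astype(int)

def gram_schmidt(rows):
    """Schmidt-orthogonalize the rows of a matrix into orthonormal sequences."""
    q = []
    for r in rows:
        for u in q:
            r = r - (r @ u) * u   # remove the component along u
        q.append(r / np.linalg.norm(r))
    return np.array(q)

def bio_hash(v, key, N=32):
    """Project v onto N orthogonalized keyed random sequences, then binarize."""
    rng = np.random.default_rng(key)               # stand-in for keyed q(i)
    Q = gram_schmidt(rng.standard_normal((N, len(v))))
    D = Q @ v                                      # scalar products D(i)
    return (D > np.median(D)).astype(int)          # hash index h(i)

v = np.random.default_rng(0).standard_normal(64)   # stand-in fused feature V
dh = diff_hash(v)
cls = int(dh.sum())                                # class label = hash sum
h = bio_hash(v, key=1234)
```

Because the projection matrix depends on the key, a revoked template can simply be re-issued with a new key, which is the diversity/revocability property claimed for the biometric template.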
4) A random sequence S1 of the same length as the original voice signal x(n) is generated by the Henon encryption algorithm; sorting S1 in descending order gives S1′, and S1′ and x(n) form a one-to-one mapping. Through this mapping, x(n) is assigned to S1′, and the assigned S1′ is restored to its unordered state, forming the scrambled encryption sequence of the original voice. Finally, a 32-byte private key S_k is set, and the original voice x(n) is iteratively encrypted to obtain the ciphertext F′′(l).
The encryption process, as shown in fig. 10:
a. The original voice x(n) and the hash index h are each scrambled and encrypted with the private key S_p to generate ciphertext data 1 (F′);
b. SHA256 is set with a 32-byte initial key S_K, where S_K(m) takes values in [0, 255];
c. Ciphertext data 1 (F′) and the key S_K(l, m) undergo 32-bit iterative encryption each time to generate ciphertext data 2 (F′′):

F′′ = S_K × F′

where n is the total number of sampling points of x(n), and S_K is the iterated overall key; the next group of private keys is generated from the previous group of iterations as follows:

S_K(i+1) = sha256(F′′(i))
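A hedged sketch of the scrambling and iterative-encryption stages. A generic keyed random sequence stands in for the Henon-derived S1, and XOR over 32-byte groups stands in for the product F′′ = S_K × F′ (the exact group operation is not spelled out in the text); the SHA-256 key chain S_K(i+1) = sha256(F′′(i)) follows the formula above:

```python
import hashlib
import numpy as np

def scramble(x, key):
    """Scramble x by the descending order of a keyed random sequence
    (stand-in for the Henon-derived sequence S1)."""
    s1 = np.random.default_rng(key).random(len(x))
    perm = np.argsort(-s1)          # descending-order one-to-one mapping
    return x[perm], perm

def unscramble(xs, perm):
    out = np.empty_like(xs)
    out[perm] = xs
    return out

def iterative_encrypt(data, sk):
    """XOR each 32-byte group with the current key; chain keys via SHA-256."""
    out = bytearray()
    for i in range(0, len(data), 32):
        block = data[i:i + 32]
        cblock = bytes(b ^ k for b, k in zip(block, sk))
        out += cblock
        sk = hashlib.sha256(cblock).digest()   # next key from ciphertext group
    return bytes(out)

def iterative_decrypt(data, sk):
    out = bytearray()
    for i in range(0, len(data), 32):
        block = data[i:i + 32]
        out += bytes(b ^ k for b, k in zip(block, sk))
        sk = hashlib.sha256(block).digest()    # same chain, from ciphertext
    return bytes(out)

x = np.arange(16)
xs, perm = scramble(x, key=7)
sk = hashlib.sha256(b"32-byte private key S_k").digest()  # 32-byte key S_K
pt = b"0123456789" * 8                                    # 80-byte toy plaintext
ct = iterative_encrypt(pt, sk)
```

Because each group key is derived from the previous ciphertext group, decryption can rebuild the same key chain directly from the ciphertext, as in the decryption procedure described later.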
5) The hash index and the ciphertext are sent to the cloud, and a corresponding hash index table is established at the cloud;
step two: retrieval
Initial search
1) The voice to be retrieved x′(n) is processed: first the feature parameter vector V′ is generated according to steps 1) and 2) of step one, and V′ is binarized to obtain the difference-hash vector; the difference-hash vector is summed and matched against the classification results obtained in step 3) of step one. If matching fails, the retrieval result is fed back to the user directly; if matching succeeds, steps 3) and 4) of step one are executed;
2) Then the feature parameter vector V′ is sent to the biometric template corresponding to V′, obtaining the hash index h1;
3) The address of the voice to be retrieved x′(n) in the hash index table and the hash index are sent to the cloud;
4) For the result returned by the cloud: if retrieval fails, the result is fed back to the user; if retrieval succeeds, step 3) of step one is executed;
5) The encrypted voice is decrypted by reversing step 4) of step one; a hash index h′′ is then generated from the decrypted voice according to steps 2) and 3) of step one and compared with the hash index h′ returned by the cloud, to judge whether the voice has been maliciously attacked;
The decryption procedure of the modified SHA256 used for decryption: first, the key S_K(l, m) is used to iteratively decrypt ciphertext data 2, generating ciphertext data 1; then ciphertext data 1 is inverse-scrambled to restore the original data;
6) The retrieval result and the comparison result at the mobile terminal are fed back to the user;
search again
I. According to the address in the hash index table provided by the mobile terminal for the voice to be retrieved x′(n), the Hamming distance between the hash index h1 of x′(n) and the hash index h2 at the corresponding address of the cloud hash index table is calculated;
II. The search result is fed back to the user; only when the Hamming distance is less than the threshold are the hash index h2 and the encrypted voice transmitted to the mobile terminal.
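The second-stage search reduces to one Hamming-distance comparison per query. A minimal sketch follows; the dictionary layout of the index table and the threshold value τ = 0.16 (taken from the experimental section) are illustrative assumptions:

```python
import numpy as np

def hamming_distance(h1, h2):
    """Number of differing bits between two binary hash indexes."""
    return int(np.sum(h1 != h2))

def cloud_match(h1, table, address, tau=0.16):
    """Compare the uploaded hash h1 against the table entry at `address`;
    accept only when the normalized Hamming distance (BER) is below tau."""
    h2 = table[address]
    ber = hamming_distance(h1, h2) / len(h1)
    return ber < tau, ber

table = {0: np.array([1, 0, 1, 1, 0, 0, 1, 0])}   # toy cloud hash index table
query = np.array([1, 0, 1, 1, 0, 1, 1, 0])        # one flipped bit vs. entry 0
ok, ber = cloud_match(query, table, address=0)
```

Only on a successful match would the cloud return h2 together with the corresponding encrypted voice, as described in step II.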
To measure the algorithm, the false accept rate (FAR) is defined as:

FAR(τ) = ∫_{-∞}^{τ} (1/(σ√(2π))) exp(−(x − μ)² / (2σ²)) dx

where μ and σ are the mean and standard deviation of the normal distribution followed by the BER of hashes of different-content voices, and τ is the matching threshold.
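Under the normal-distribution model of the BER, the FAR integral is just the normal CDF evaluated at the threshold. A sketch follows; the values p = 0.5 and N = 802 are taken from the experimental section later in the document:

```python
import math

def far(tau, p=0.5, N=802):
    """FAR(tau) for BER ~ N(p, p*(1-p)/N), via the standard normal CDF."""
    sigma = math.sqrt(p * (1 - p) / N)
    z = (tau - p) / sigma
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF

f = far(0.16)   # FAR at the experimental threshold tau = 0.16
```

At τ = 0.16 the threshold sits many standard deviations below the mean p = 0.5, so the computed FAR is vanishingly small, consistent with the collision-resistance claims made in the performance analysis.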
analysis of experiments
The voice used in the experiments of the retrieval method comes from speech signals in the TIMIT (Texas Instruments / Massachusetts Institute of Technology) and TTS (text-to-speech) voice libraries; the duration of each voice signal is 4 s. The experimental hardware platform is an Intel(R) Core(TM) i5-7500M CPU @ 3.40 GHz with 4 GB of memory; the software environment is Matlab R2018b under the Windows 10 operating system.
Security and integrity analysis of plaintext data
In order to avoid plaintext leakage of data in the open mobile channel and the semi-open cloud storage, and to ensure data integrity when the voice is recovered, the improved SHA256 scheme is adopted to encrypt the voice data.
To verify the security of the scheme, i.e. the correlation of the voice before and after encryption, a voice is first selected at random from the voice library, and 32000 consecutive sample points of that voice are selected at random as sampling points. With x(i) as the abscissa and x(i+1) as the ordinate, the scatter plots of the sample points before and after encryption are shown in fig. 3; fig. 3(a) is the scatter plot before encryption and fig. 3(b) the scatter plot after encryption. As the figure shows, the correlation of the encrypted voice is greatly reduced, indicating that the voice has good security.
To further verify the correlation of the speech before and after encryption, the Spearman rank correlation coefficient is used to calculate the correlation; it is defined as:

ρ = 1 − 6Σd_i² / (n(n² − 1))

where d_i is the rank difference of the i-th pair of sample values and n is the number of sample pairs.
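The Spearman check can be reproduced on toy data as follows; the smooth sine stands in for plaintext speech (adjacent samples strongly correlated) and a random permutation stands in for the encrypted signal:

```python
import numpy as np

def spearman(a, b):
    """Spearman rho = 1 - 6*sum(d_i^2)/(n*(n^2-1)), d_i = rank difference."""
    ra = np.argsort(np.argsort(a))   # ranks of a
    rb = np.argsort(np.argsort(b))   # ranks of b
    d = (ra - rb).astype(float)
    n = len(a)
    return 1 - 6 * np.sum(d ** 2) / (n * (n ** 2 - 1))

t = np.arange(1000)
plain = np.sin(0.01 * t)                               # smooth "speech"
cipher = np.random.default_rng(0).permutation(plain)   # scrambled stand-in

# Correlation between adjacent samples x(i) and x(i+1), as in the experiment
rho_plain = spearman(plain[:-1], plain[1:])
rho_cipher = spearman(cipher[:-1], cipher[1:])
```

As in the reported experiment, the coefficient is close to 1 before encryption and close to 0 after, showing that adjacent-sample correlation is destroyed.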
calculating by the formula: the coefficient of this speech before the modified SHA256 encryption is 0.9840 and the encrypted coefficient is-0.0030. As can be seen from fig. 3 and the spearman correlation coefficient: after the voice is encrypted, the relevance is greatly reduced, which shows that the retrieval method of the invention has good safety, namely: the leakage of the plaintext data is effectively prevented.
In order to further verify the security of the improved SHA256 encryption experiment, the whole voice is encrypted and decrypted to obtain an encryption and decryption comparison graph shown in fig. 4; fig. 4 (a) is a waveform diagram of an original voice, fig. 4 (b) is a waveform diagram of an encrypted voice, fig. 4 (c) is a waveform diagram of an error key decrypted voice, and fig. 4 (d) is a waveform diagram of a correct key decrypted voice.
In order to verify the data integrity of the experiment, i.e. the invariance of the pre-encryption and post-decryption voice, the pre-encryption voice data and the post-decryption voice data are subtracted, and the result is shown in fig. 5.
As can be seen from fig. 4 and 5: the retrieval method of the invention not only has good encryption performance to voice, but also can ensure the integrity of data.
In summary: in a cloud environment, the improved SHA256 method can not only effectively prevent leakage of plaintext data, but also maintain data integrity.
Performance analysis
1. Distinctiveness
The BER between the biological hash values of speech signals with different content basically obeys a normal distribution. The normal distribution of the BER data of the voice library in this experiment is shown in fig. 6:
the normalized Hamming distance of the hash sequence is approximately obeyed by the Limonid-Laplace central limit theorem
Figure 35872DEST_PATH_IMAGE012
The normal distribution of (a), wherein,prepresenting the probability of 0 or 1 occurring in the bio-hash sequence;Nrepresents the total number of frames, whereinN=208。
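This distribution is easy to verify by simulation: for independent random hash pairs of length N, the normalized Hamming distance concentrates at p = 0.5 with standard deviation √(p(1 − p)/N):

```python
import numpy as np

rng = np.random.default_rng(42)
N, trials = 208, 2000                   # hash length and number of pairs
a = rng.integers(0, 2, size=(trials, N))
b = rng.integers(0, 2, size=(trials, N))
ber = np.mean(a != b, axis=1)           # normalized Hamming distance per pair

mu, sigma = ber.mean(), ber.std()
expected_sigma = np.sqrt(0.5 * 0.5 / N)  # sqrt(p*(1-p)/N) with p = 0.5
```

The empirical mean lands at p = 0.5 and the empirical spread matches √(p(1 − p)/N), which is why longer hash sequences (larger N) concentrate the BER more tightly and improve distinguishability.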
TABLE 1 Normal distribution parameters
As can be seen from table 1: compared with other total frame numbers, the normal distribution parameters obtained in the experiment are closest to the theoretical values when the total frame number is 802.
TABLE 2 FAR comparison for different total frame numbers
As can be seen from table 2: when the total number of frames is 535, 401 and 321 respectively, at τ = 0.16 the FAR is about 2.3 × 10^24, 1.0 × 10^36 and 1.6 × 10^42 times that for the total frame number N = 802.
It can also be seen from table 2 that even within the same experiment the FAR differs for different total frame numbers; that is, the FAR of the same voice segment differs with the hash-sequence length obtained at different total frame numbers. FAR alone is therefore not sufficient to measure the experimental distinguishability.
To further measure the experimental distinctiveness, the entropy rate (ER) is used for the calculation. The entropy rate measures the uncertainty of random events; ER is defined as: [formula not reproduced in the source]
Substituting the data of table 1 into the ER formula, the ER values of the experiment at different total frame numbers are shown in table 3:
TABLE 3 ER comparison for different total frame numbers
As can be seen from tables 1, 2 and 3, the experiment shows good randomness and collision resistance at a total frame number of 802, namely: the long-sequence scheme is well distinguishable.
TABLE 4 FAR comparison for different thresholds
2. Robustness
To test robustness, the speech library was first subjected to 9 content-preserving operations as shown in table 5.
TABLE 5 Content-preserving operations
Then the average BER after each content-preserving operation is calculated; the average BER over the speech library is shown in fig. 7. As can be seen from fig. 7, the mean BER of the experiment after the 12 common content-preserving operations is low overall, which shows that the retrieval method has good robustness. This is because the fusion of linear-prediction minimum-mean-square-error and inverse-fast-Fourier-transform features performs a linear analysis of the voice signal:
the weighting operation is the operation of digitally encoding a signal, which causes fluctuation of speech data, increases an error of a minimum mean square error of linear prediction, and changes a linear coefficient of inverse fast fourier transform as a whole, and thus is the least robust.
For the echo addition operation, the added echo superposes a delayed copy on the normal speech, which increases the LP-MMSE error and changes the IFFT linear coefficients of both the silent frames and the voiced frames, so robustness is poor.
For the narrow-band noise operation, the noise interference increases the LP-MMSE error and changes the IFFT linear coefficients, so robustness is poor.
For the MP3 compression operation, although the signal is compressed, the speech as a whole changes little linearly: the LP-MMSE error changes little and the IFFT linear coefficients are unchanged, so robustness is good.
For the volume adjustment operation, increasing or decreasing the volume only affects the speech amplitude, so only the voiced-frame part of the LP-MMSE error increases while the IFFT linear coefficients are hardly changed, and robustness is good.
For the resampling operation, the frequency content of the speech signal lies within 0–4 kHz, so this operation affects the signal least: both the LP-MMSE and the IFFT coefficients change little, and robustness is the best.
In conclusion, the retrieval method has good robustness and distinguishability and can meet the system's retrieval requirements on the hash index table sequence.
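The BER used in these robustness comparisons is the normalized Hamming distance between the binary hash of the original voice and that of the processed voice; a minimal sketch:

```python
def bit_error_rate(h1, h2):
    # Fraction of positions at which two equal-length binary hash
    # sequences differ: 0.0 for identical hashes, about 0.5 for
    # unrelated ones.
    if len(h1) != len(h2):
        raise ValueError("hash sequences must be the same length")
    return sum(a != b for a, b in zip(h1, h2)) / len(h1)
```

A robust content-preserving operation should keep this value well below the matching threshold.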
Search performance
In information retrieval, precision (Precision) and recall (Recall) are important indices of retrieval performance; their formulas are as follows:
Precision = [S_T / (S_T + S_N)] × 100%
Recall = [S_T / (S_T + S_F)] × 100%
where S_T is the number of correct voices that are retrieved, S_N is the number of wrong voices that are retrieved, and S_F is the number of correct voices that are not retrieved.
The formulas show that precision and recall are in tension: as precision rises, recall falls, and as precision falls, recall rises, so choosing a proper threshold to balance the two is important. As can be seen from Fig. 8(a), as the threshold increases, the precision at every total frame number stays at 100%. As can be seen from Fig. 8(b), as the threshold increases, the larger the total frame number, the higher the recall; that is, under the same conditions the long sequence retrieves better.
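The two formulas can be computed directly from the retrieval counts; a minimal sketch following the definitions above:

```python
def precision_recall(s_t, s_n, s_f):
    # s_t: correct voices retrieved; s_n: wrong voices retrieved;
    # s_f: correct voices not retrieved (per the definitions above).
    precision = s_t / (s_t + s_n) * 100.0
    recall = s_t / (s_t + s_f) * 100.0
    return precision, recall
```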
During feature matching, if the normalized Hamming distance is smaller than the threshold, the match succeeds; otherwise it fails. As can be seen from Figs. 6 and 7, the minimum BER of the original voices is 0.43 and the maximum mean BER after the content-preserving operations is 0.24, so the threshold should fall within the interval (0.24, 0.43).
As can be seen from Fig. 9 (recall in Fig. 9(a), precision in Fig. 9(b)): within the interval (0.24, 0.43), only the sub-interval [0.36, 0.4] gives precision and recall of 1 simultaneously; therefore, to keep good precision and recall under the content-preserving operations, 0.36 is adopted as the threshold.
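The matching rule described above (match if and only if the normalized Hamming distance is below the threshold) can be sketched as follows, with 0.36 as the default threshold chosen in the experiments:

```python
def hashes_match(h1, h2, threshold=0.36):
    # Match succeeds when the normalized Hamming distance between two
    # equal-length binary hashes falls below the threshold.
    ber = sum(a != b for a, b in zip(h1, h2)) / len(h1)
    return ber < threshold
```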
As can be seen from Fig. 7, the original voices change to different degrees after the content-preserving operations, so the hash sequences extracted from them also change to some degree; the hashes of some voices can then no longer be matched against the system's hash index table, which lowers the recall. The recall and precision of the original voices after the content-preserving operations of Table 5 are shown in Tables 6 and 7, respectively:
TABLE 6 recall after CPO
[Table image not reproduced]
TABLE 7 precision after CPO
[Table image not reproduced]
As can be seen from Tables 6 and 7, the precision and recall of the retrieval method are both 100% after the content-preserving operations, which shows good retrieval performance. When the cloud voice library is searched, the relevant ciphertext voice can be retrieved effectively by providing only the address and the hash index.
In summary, the retrieval method has good precision and recall, and the precision-recall trade-off is good, so the experiment achieves good retrieval precision.

Claims (3)

1. A long-sequence biological Hash ciphertext voice retrieval method based on feature fusion is characterized by comprising the following steps:
step 1: registration
1) Pre-emphasize the original voice signal x(n) to obtain a pre-emphasized signal, then frame it and apply a window function to obtain the pre-processed signal x_i(n);
2) Perform linear prediction minimum mean square error feature extraction on the pre-processed signal x_i(n) to obtain the feature vector V1; at the same time, apply the fast Fourier transform to x_i(n) to obtain the feature vector V2, then reduce the dimension of V2 to obtain the vector V′2; mean-filter V1 and V′2 to obtain the fused feature vector V;
3) Apply binarized differential hashing to the fused feature vector V to obtain the differential hash vector of the original voice, then sum the differential hash vectors and group voices with the same sum into one class; build a biometric template with a single mapping key for each class. The three-dimensional Lorenz chaotic measurement matrix generates a 1×N random number sequence q(i) in the database through a secret key, which is converted into mutually orthogonal random sequences Q(i) by Gram-Schmidt orthogonalization; finally, the scalar product of the mutually orthogonal random sequences Q(i) and the feature parameter vector V gives D = {D(i) | i = 1, 2, …, N}; binarizing D(i) yields the hash index h = {h(i) | i = 1, 2, …, N};
4) Generate, with the Henon encryption algorithm, a random sequence S1 of the same length as the original voice signal x(n); sort S1 in descending order to obtain S1′, so that S1′ and x(n) are in one-to-one correspondence; assign x(n) to S1′ through this mapping, then restore the assigned S1′ to its unordered state, which forms the scrambled encrypted sequence x′(i) of the original voice; finally, set a private key S_k with a bit length of 32 bytes and iteratively encrypt the original voice x(n) to obtain the ciphertext F′′(l);
Encryption process:
a. Scramble and encrypt the original voice x(n) and the hash index h separately with the private key S_p to generate ciphertext data 1, denoted F′;
b. Set SHA256 to the 32-byte-long initial key S_K, where S_K(m) takes values in [0, 255];
[Equation image not reproduced]
F′′ = S_K × F′
where S_K is the iterated overall key, and the next group of private keys is generated from the previous group by iteration: S_K(i+1) = sha256(F′′(i));
5) Send the hash index and the ciphertext to the cloud, and build the corresponding hash index table at the cloud;
step 2: retrieval
Initial search
1) First, generate the feature parameter vector V according to steps 1) and 2) of step 1 and binarize it to obtain the differential hash vector; then sum the differential hash vector and hash-match it against the differential hash table obtained in step 5) of step 1. If the match fails, feed the retrieval result back to the user directly; if it succeeds, execute steps 3) and 4) of step 1;
2) Then send the feature parameter vector V to the corresponding biometric template to obtain the hash index h1;
3) Send the address of the voice to be retrieved x′(n) in the hash index table, together with the hash index, to the cloud;
4) For the result returned by the cloud: if the retrieval fails, feed the retrieval result back to the user directly; if it succeeds, execute step 3) of step 1;
5) Decrypt the encrypted voice by reversing step 4) of step 1, then generate the hash index h′′ from the decrypted voice according to steps 2) and 3) of step 1, compare h′′ with the hash index h′ returned by the cloud, and judge whether the voice has been maliciously attacked;
6) Feed the retrieval result and the comparison result back to the user at the mobile terminal;
search again
I. According to the voice to be retrieved x′(n) provided by the mobile terminal and its address in the hash index table, compute the Hamming distance between the hash index h1 of x′(n) and the hash index h2 at the corresponding address of the cloud hash index table;
II. Feed the retrieval result back to the user; only when the Hamming distance is smaller than the threshold are the hash index h2 and the encrypted voice transmitted to the mobile terminal.
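The hash-index generation of step 1, 3) can be sketched as follows. Hedged assumptions: a seeded pseudo-random generator stands in for the Lorenz-derived sequence q(i), and binarization against the mean of the projections stands in for the patent's unspecified binarization rule; n_bits must not exceed the feature dimension for the Gram-Schmidt step to succeed:

```python
import random

def biohash(v, key, n_bits):
    # Key-seeded random vectors stand in for the Lorenz-derived
    # sequences q(i); Gram-Schmidt makes them mutually orthonormal.
    rng = random.Random(key)
    basis = []
    for _ in range(n_bits):
        q = [rng.uniform(-1.0, 1.0) for _ in v]
        for b in basis:
            dot = sum(x * y for x, y in zip(q, b))
            q = [x - dot * y for x, y in zip(q, b)]   # remove projection
        norm = sum(x * x for x in q) ** 0.5
        basis.append([x / norm for x in q])
    # Scalar products D(i) = <Q(i), V>, then binarization (here:
    # against the mean of D, an assumed rule).
    d = [sum(x * y for x, y in zip(b, v)) for b in basis]
    mean = sum(d) / len(d)
    return [1 if di > mean else 0 for di in d]
```

The same key reproduces the same projection basis, so the same feature vector always maps to the same hash index, while without the key the hash reveals nothing usable about V.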
2. The feature fusion-based long-sequence bio-hash ciphertext speech retrieval method of claim 1, wherein the linear prediction minimum mean square error feature extraction process is as follows:
the chaotic measurement matrix generation algorithm is as follows: for a k-sparse signal x, the measurement matrix Φ (Φ ∈ R^(M×N)) satisfies the following RIP property:
(1 − ζ_k)‖x‖₂² ≤ ‖Φx‖₂² ≤ (1 + ζ_k)‖x‖₂²
where ζ_k is a constant, ζ_k ∈ (0, 1);
‖x‖₂² is the energy of the signal;
The general expression of the Lorenz-based chaotic measurement matrix is as follows:
[Equation image not reproduced]
where ξ_{x,y,z} denotes the mapping function of the Lorenz chaotic measurement matrix, whose expression is as follows:
x′(i) = α(y(i) − x(i)), y′(i) = λx(i) − y(i) − x(i)z(i), z′(i) = x(i)y(i) − βz(i)
where α, β and λ are constants; the initial values of x(i), y(i) and z(i) are 0, 1 and 0, respectively; x′(i), y′(i) and z′(i) are the derivatives with respect to i.
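A discrete chaotic sequence of the kind claim 2 describes can be produced by Euler-integrating the Lorenz system from the stated initial values (0, 1, 0); the step size and the classical parameter values α = 10, β = 8/3, λ = 28 are illustrative assumptions, not taken from the patent:

```python
def lorenz_sequence(n, alpha=10.0, beta=8.0 / 3.0, lam=28.0, dt=0.002):
    # Euler integration of x' = alpha*(y - x), y' = x*(lam - z) - y,
    # z' = x*y - beta*z from (0, 1, 0); the x-trajectory serves as a
    # deterministic, key-reproducible chaotic number source.
    x, y, z = 0.0, 1.0, 0.0
    seq = []
    for _ in range(n):
        dx = alpha * (y - x)
        dy = x * (lam - z) - y
        dz = x * y - beta * z
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        seq.append(x)
    return seq
```

Because the system is deterministic, the same parameters and initial values always regenerate the same sequence, which is what lets a secret key reproduce the measurement matrix.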
3. The feature fusion-based long-sequence bio-hash ciphertext voice retrieval method of claim 1, wherein step 5) of the initial retrieval in step 2 adopts the modified SHA256 decryption process: first, the ciphertext data 2 is iteratively decrypted with the key S_K(l, m) to generate ciphertext data 1; then the ciphertext data 1 is restored to the original data by inverse scrambling.
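The scrambling and the iterated SHA256 key schedule of claims 1 and 3 can be sketched as follows. Hedged assumptions: XOR stands in for the unspecified keyed transform F′′ = S_K × F′, and any key-derived pseudo-random sequence may play the role of the Henon sequence:

```python
import hashlib

def scramble(samples, chaos):
    # Sort the key-derived chaotic sequence in descending order; the
    # induced permutation reorders the speech samples.
    order = sorted(range(len(samples)), key=lambda i: chaos[i], reverse=True)
    return [samples[i] for i in order], order

def unscramble(scrambled, order):
    # Inverse scrambling: put each sample back at its original index.
    out = [None] * len(scrambled)
    for pos, i in enumerate(order):
        out[i] = scrambled[pos]
    return out

def xor_block(block, key):
    # Illustrative keyed transform standing in for F'' = S_K x F'.
    return bytes(b ^ k for b, k in zip(block, key))

def encrypt_chain(blocks, initial_key):
    # S_K(i+1) = sha256(F''(i)): each group's key is the SHA-256
    # digest of the previous encrypted group.
    key, out = initial_key, []
    for block in blocks:
        enc = xor_block(block, key)
        out.append(enc)
        key = hashlib.sha256(enc).digest()
    return out

def decrypt_chain(enc_blocks, initial_key):
    # Replays the key schedule from the initial key, decrypting each
    # group before deriving the next key from the still-encrypted one.
    key, out = initial_key, []
    for enc in enc_blocks:
        out.append(xor_block(enc, key))
        key = hashlib.sha256(enc).digest()
    return out
```

Chaining the keys through the ciphertext means a receiver only needs the 32-byte initial key to replay the whole schedule, while tampering with any encrypted group desynchronizes every later key.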
CN202110135465.8A 2021-02-01 2021-02-01 Long-sequence biological hash ciphertext voice retrieval method based on feature fusion Active CN112883206B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110135465.8A CN112883206B (en) 2021-02-01 2021-02-01 Long-sequence biological hash ciphertext voice retrieval method based on feature fusion

Publications (2)

Publication Number Publication Date
CN112883206A true CN112883206A (en) 2021-06-01
CN112883206B CN112883206B (en) 2022-07-01

Family

ID=76052264

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110135465.8A Active CN112883206B (en) 2021-02-01 2021-02-01 Long-sequence biological hash ciphertext voice retrieval method based on feature fusion

Country Status (1)

Country Link
CN (1) CN112883206B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6615253B1 (en) * 1999-08-31 2003-09-02 Accenture Llp Efficient server side data retrieval for execution of client side applications
CN1964244A (en) * 2005-11-08 2007-05-16 厦门致晟科技有限公司 A method to receive and transmit digital signal using vocoder
CN104835499A (en) * 2015-05-13 2015-08-12 西南交通大学 Cipher text speech perception hashing and retrieving scheme based on time-frequency domain trend change
CN105957119A (en) * 2016-05-20 2016-09-21 哈尔滨理工大学 Construction method for measurement matrix of compressed sensing magnetic resonance images based on chaotic system
CN108111710A (en) * 2017-12-15 2018-06-01 四川大学 A kind of more image encryption methods that can reduce ciphertext and key data amount
US20190132120A1 (en) * 2017-10-27 2019-05-02 EMC IP Holding Company LLC Data Encrypting System with Encryption Service Module and Supporting Infrastructure for Transparently Providing Encryption Services to Encryption Service Consumer Processes Across Encryption Service State Changes
CN111897909A (en) * 2020-08-03 2020-11-06 兰州理工大学 Ciphertext voice retrieval method and system based on deep perception Hash
CN112134681A (en) * 2020-08-19 2020-12-25 河南大学 Image compression encryption method and cloud-assisted decryption method based on compressed sensing and optical transformation

Non-Patent Citations (5)

Title
QIUYU ZHANG等: "A Classification Retrieval Method for Encrypted Speech Based on Deep Neural Network and Deep Hashing", 《IEEE ACCESS》, vol. 8, 5 November 2020 (2020-11-05), pages 202469 - 202482, XP011820705, DOI: 10.1109/ACCESS.2020.3036048 *
YONG WANG等: "Multi-format speech BioHashing based on energy to zero ratio and improved LP-MMSE parameter fusion", 《MULTIMEDIA TOOLS AND APPLICATIONS》, vol. 80, 16 November 2020 (2020-11-16), pages 10013 *
YANG ZHIHUI: "Application of speech recognition in a traditional Chinese medicine prescription system", China Master's Theses Full-text Database, Information Science and Technology, no. 3, 15 March 2016 (2016-03-15), pages 136 - 701 *
QIAN YONGQING: "Speech encryption based on chaos theory and compressed sensing theory", Journal of Wuhan Polytechnic University, vol. 38, no. 5, 31 October 2019 (2019-10-31), pages 90 - 94 *
HUANG YIBO et al.: "Biohashing ciphertext speech retrieval based on a chaotic measurement matrix", Journal of Huazhong University of Science and Technology (Natural Science Edition), vol. 48, no. 12, 31 December 2020 (2020-12-31), pages 1 - 2 *

Cited By (5)

Publication number Priority date Publication date Assignee Title
CN116055061A (en) * 2023-01-18 2023-05-02 南京龙垣信息科技有限公司 Voiceprint authentication privacy protection method based on hash encryption
CN116055061B (en) * 2023-01-18 2024-03-05 南京龙垣信息科技有限公司 Voiceprint authentication privacy protection method based on hash encryption
CN116132977A (en) * 2023-04-19 2023-05-16 深圳锐爱电子有限公司 Mouse safety encryption authentication method
CN116756778A (en) * 2023-08-15 2023-09-15 四川玉米星球科技有限公司 Private cipher text storage and access method and device
CN116756778B (en) * 2023-08-15 2023-11-14 四川玉米星球科技有限公司 Private cipher text storage and access method and device

Also Published As

Publication number Publication date
CN112883206B (en) 2022-07-01

Similar Documents

Publication Publication Date Title
CN112883206B (en) Long-sequence biological hash ciphertext voice retrieval method based on feature fusion
US20050066176A1 (en) Categorizer of content in digital signals
Zhao et al. Iris template protection based on local ranking
He et al. A retrieval algorithm of encrypted speech based on syllable-level perceptual hashing
CN112215165B (en) Face recognition method based on wavelet dimensionality reduction under homomorphic encryption
Billeb et al. Biometric template protection for speaker recognition based on universal background models
Zhang et al. A retrieval algorithm of encrypted speech based on short-term cross-correlation and perceptual hashing
CN111897909B (en) Ciphertext voice retrieval method and system based on deep perceptual hashing
CN106778292B (en) A kind of quick restoring method of Word encrypted document
CN111143865A (en) User behavior analysis system and method for automatically generating label on ciphertext data
Zhao et al. A retrieval algorithm for encrypted speech based on perceptual hashing
Yu et al. SVD‐based image compression, encryption, and identity authentication algorithm on cloud
CN115134082A (en) Social media false message detection method with privacy protection function
Zhang et al. An efficient retrieval algorithm of encrypted speech based on inverse fast Fourier transform and measurement matrix
Huang et al. Encrypted speech retrieval based on long sequence Biohashing
Zhang et al. An efficient retrieval approach for encrypted speech based on biological hashing and spectral subtraction
CN113779597B (en) Method, device, equipment and medium for storing and similar searching of encrypted document
Wang et al. Multi-format speech biohashing based on energy to zero ratio and improved lp-mmse parameter fusion
CN112883207B (en) High-safety biological Hash ciphertext voice retrieval method based on feature fusion
Huang et al. A high security BioHashing encrypted speech retrieval algorithm based on feature fusion
Zhang et al. An encrypted speech authentication method based on uniform subband spectrumvariance and perceptual hashing
Zhang et al. Verifiable speech retrieval algorithm based on diversity security template and biohashing
Khanduja et al. A scheme for robust biometric watermarking in web databases for ownership proof with identification
Jin et al. Efficient blind face recognition in the cloud
Wu et al. Robust and blind audio watermarking scheme based on genetic algorithm in dual transform domain

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant