CN113129875A - Voice data privacy protection method based on adversarial examples - Google Patents

Voice data privacy protection method based on adversarial examples

Info

Publication number
CN113129875A
CN113129875A
Authority
CN
China
Prior art keywords
voice data
iteration
audio data
user
adversarial example
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110271786.0A
Other languages
Chinese (zh)
Inventor
陈双喜
肖文红
马方超
刘会
吴至禹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiaxing Vocational and Technical College
Original Assignee
Jiaxing Vocational and Technical College
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiaxing Vocational and Technical College
Priority to CN202110271786.0A
Publication of CN113129875A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/065 - Adaptation

Landscapes

  • Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a voice data privacy protection method based on adversarial examples, used to protect a user's private voice data. The method comprises the following steps. Step S1: load a speech recognition model D. Step S2: pre-detect the user's audio data x with the model D to obtain a recognition result in the form of a probability distribution matrix D(x). The method establishes a loss function for generating adversarial examples, sets a loss-function threshold, and optimizes the user's audio data through multiple iterations to produce an adversarial example targeted at the speech recognition model. After the perturbation is added to the private voice data, the user's private conversation can still be heard clearly by humans and the listening experience is unaffected, but a smart device can no longer recognize or analyze the content of the user's speech.

Description

Voice data privacy protection method based on adversarial examples
Technical Field
The invention belongs to the technical field of voice data privacy protection, and in particular relates to a voice data privacy protection method based on adversarial examples.
Background
Deep learning techniques have been applied to many aspects of daily life, particularly speech recognition, image recognition, and object detection. With the spread of deep learning, however, the privacy problems it raises have also drawn users' attention. When a user holds a private conversation, a voice assistant may secretly record the content and upload the voice data to a server without the user's permission, and that private voice data may then be sold illegally or used to train a company's artificial intelligence models. Deep learning techniques can also extract private information from leaked voice data, such as a home address, interpersonal relationships, or personal preferences, which a company can use to push targeted advertisements to the user or to conduct other commercial activities. Recurrent neural networks (RNNs) perform well at speech recognition: from an input audio signal they produce a sequence of probability distributions over characters, from which the sentence corresponding to the audio is inferred. DeepSpeech, the deep-learning-based recognition system developed by Baidu, is a mainstream speech recognition system at present.
In the big-data era, more and more voice data is released to improve voice-based services or to support academic research, but the release process carries a risk of privacy leakage. For example, if an attacker obtains the voice data of a particular user, the attacker can learn sensitive information about that user by analyzing it.
Although deep learning has been widely adopted, a growing body of research shows that the technique itself has many security weaknesses. Goodfellow et al. proposed the fast gradient sign method (FGSM), an adversarial example generation algorithm for deep learning models and currently one of the mainstream such algorithms: by applying a slight perturbation to a model's input, it causes the model to produce an erroneous prediction. The present invention provides an adversarial example generation method based on gradients and multiple iterations. It constructs a loss function for generating adversarial examples and optimizes it iteratively to produce a perturbation imperceptible to the human ear. After the perturbation is added to private voice data, the user's private conversation can still be heard by humans, but a smart device cannot recognize or analyze the content of the user's speech, so the audio data cannot be exploited illegally once uploaded; the user's private voice data is thereby protected.
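The sign-of-gradient update behind FGSM can be sketched in a few lines. The toy input and toy gradient below are hypothetical stand-ins, not the patent's speech model; the point is only the update rule: move each component of the input by a small step ε in the direction of the sign of the loss gradient.

```python
# A minimal FGSM step on a toy linear loss L(x) = sum(w_i * x_i),
# whose gradient with respect to x is simply w. All values here are
# illustrative placeholders, not the patent's model or data.
def fgsm_step(x, grad, eps):
    """Perturb each component of x by eps in the sign of its gradient."""
    sign = lambda g: (g > 0) - (g < 0)   # -1, 0, or +1
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]

x = [0.5, -0.2, 0.1]   # toy "audio" vector
w = [1.0, -3.0, 0.0]   # gradient of the toy loss with respect to x
x_adv = fgsm_step(x, w, eps=0.1)
# x_adv is approximately [0.6, -0.3, 0.1]
```

Note that only the sign of each gradient component is used, so every component moves by exactly ε (or not at all where the gradient is zero); this bounds the perturbation's maximum amplitude, which is what keeps it hard to hear.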
Disclosure of Invention
The main object of the invention is to provide a voice data privacy protection method based on adversarial examples. The method establishes a loss function for generating adversarial examples, sets a loss-function threshold, and optimizes the user's audio data through multiple iterations to produce an adversarial example targeted at a speech recognition model. After the perturbation is added to the private voice data, the user's private conversation can still be heard clearly by humans and the listening experience is unaffected, but a smart device cannot recognize or analyze the speech content, so the audio cannot be exploited illegally once uploaded; the user's private audio data is thereby protected.
A further object of the invention is to address the limitation that, for a speech recognition system with a high degree of nonlinearity, generating adversarial examples in a single iteration step has no obvious effect; the method therefore generates them over multiple iterations.
In order to achieve the above objects, the present invention provides a voice data privacy protection method based on adversarial examples, for protecting a user's private voice data, characterized by comprising the following steps:
step S1: load a speech recognition model D (e.g., DeepSpeech);
step S2: pre-detect the user's audio data x with the speech recognition model D to obtain a recognition result in the form of a probability distribution matrix D(x), where:
D(x) is a probability distribution matrix giving, for each frame of the audio data x, a distribution over the 26 English characters;
step S3: using the formula

S(x) = argmax_s Pr(s | D(x)),

extract the character sequence S(x) corresponding to the audio data x from the probability distribution matrix D(x); this character sequence S(x) is the user's private voice data that needs protection. Here:
Pr(s | D(x)) denotes the probability that the speech recognition model D recognizes the audio data x as the character sequence s, and its value lies in [0, 1];
step S4: let y = S(x), initialize x_0 = x and i = 0, set a threshold T, an iteration step size ε, and a maximum number of iteration rounds N, and construct the loss function for generating adversarial examples

L(x_i, y) = -log(Pr(S(x) | D(x_i))).

Using the update formula

x_{i+1} = x_i + ε × sign(∇_{x_i} L(x_i, y)),

iteratively generate the speech adversarial example, recalculating the loss function L(x_i, y) after each iteration; if L(x_i, y) > T the iteration continues until L(x_i, y) ≤ T, at which point iteration stops and x_i is output; if the current iteration count exceeds the set maximum N, iteration also stops and x_i is output.
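The loop of step S4 can be sketched as follows. Two things are assumptions here: the stub loss and gradient functions stand in for the DeepSpeech model, and, since the translated stopping rule is hard to reconcile with an update that increases the loss (a larger L means a smaller Pr(S(x) | D(x_i))), this sketch stops once the loss has risen above T, which matches the description that iteration drives the recognition probability down until the audio is misrecognized.

```python
# Multi-iteration adversarial generation in the shape of step S4, with
# stub loss_fn/grad_fn replacing the DeepSpeech model (assumption: stop
# once the loss exceeds the threshold T, or after N rounds).
def sign(g):
    return (g > 0) - (g < 0)

def generate_adversarial(x0, loss_fn, grad_fn, T=0.5, eps=0.1, N=40):
    x = list(x0)
    for _ in range(N):
        if loss_fn(x) > T:     # loss large enough: recognition defeated
            break
        # x_{i+1} = x_i + eps * sign(grad_x L(x_i, y))
        x = [xi + eps * sign(gi) for xi, gi in zip(x, grad_fn(x))]
    return x

# Toy demonstration: the loss simply grows with x[0], so each round
# raises it by eps until it passes T = 0.5.
adv = generate_adversarial([0.0],
                           loss_fn=lambda x: x[0],
                           grad_fn=lambda x: [1.0])
```

With the defaults above the toy run stops as soon as x[0] has climbed just past the threshold, after a handful of the 40 permitted rounds.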
As a further preferable technical solution of the above technical solution, the audio data x is a K-dimensional vector, each dimension of the vector is 16 bits, which represents 16KHz, and the audio data x is preprocessed by using an MFC method.
As a further preferred technical solution, in step S2 the 26 English characters a to z are represented by the 26 numbers 0 to 25, respectively.
As a further preferred technical solution, in step S3 the value of Pr(s | D(x)) is mapped into [0, 1] using the softmax function in PyTorch.
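The softmax mapping used in step S3 can be written out directly; the pure-Python version below is equivalent to PyTorch's softmax for a single row of logits (the toy logits are illustrative).

```python
# Softmax: map a row of real-valued logits into probabilities in [0, 1]
# that sum to 1, as required for Pr(s | D(x)).
import math

def softmax(logits):
    m = max(logits)                       # subtract max for stability
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

probs = softmax([2.0, 1.0, 0.0])
# every value lies in [0, 1], the row sums to 1, and order is preserved
```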
As a further preferable technical solution of the above technical solution, in step S4, the threshold T is set to 0.5, the iteration step ∈ is set to 0.1, and the maximum number of iteration rounds N is set to 40.
As a further preferable technical solution of the above technical solution, in step S4, in each iteration, a fine disturbance is added to the audio data x, and the disturbance makes the audio data x face the loss making function L (x)iY) is shifted in the direction of increasing value with the loss function L (x)iY), the probability that the voice recognition model D recognizes the audio data x as y is gradually reduced until the audio data x is misjudged.
To achieve the above objects, the present invention further provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor; when the processor executes the program, it implements the steps of the voice data privacy protection method based on adversarial examples.
To achieve the above objects, the present invention also provides a non-transitory computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the steps of the voice data privacy protection method based on adversarial examples.
Drawings
Fig. 1 is a schematic diagram of the voice data privacy protection method based on adversarial examples according to the present invention.
Detailed Description
The following description is presented to disclose the invention so as to enable any person skilled in the art to practice the invention. The preferred embodiments in the following description are given by way of example only, and other obvious variations will occur to those skilled in the art. The basic principles of the invention, as defined in the following description, may be applied to other embodiments, variations, modifications, equivalents, and other technical solutions without departing from the spirit and scope of the invention.
Referring to Fig. 1 of the drawings: Fig. 1 is a schematic diagram of the voice data privacy protection method based on adversarial examples according to the present invention.
In the preferred embodiment of the present invention, those skilled in the art should note that DeepSpeech, PyTorch, softmax, and the like referred to in the present invention can be regarded as prior art.
Preferred embodiments.
The invention discloses a voice data privacy protection method based on adversarial examples, used to protect a user's private voice data, comprising the following steps:
step S1: load a speech recognition model D (e.g., DeepSpeech);
step S2: pre-detect the user's audio data x with the speech recognition model D to obtain a recognition result in the form of a probability distribution matrix D(x), where:
D(x) is a probability distribution matrix giving, for each frame of the audio data x, a distribution over the 26 English characters;
step S3: using the formula

S(x) = argmax_s Pr(s | D(x)),

extract the character sequence S(x) corresponding to the audio data x from the probability distribution matrix D(x); this character sequence S(x) is the user's private voice data that needs protection. Here:
Pr(s | D(x)) denotes the probability that the speech recognition model D recognizes the audio data x as the character sequence s, and its value lies in [0, 1];
step S4: let y = S(x), initialize x_0 = x and i = 0, set a threshold T, an iteration step size ε, and a maximum number of iteration rounds N, and construct the loss function for generating adversarial examples

L(x_i, y) = -log(Pr(S(x) | D(x_i))).

Using the update formula

x_{i+1} = x_i + ε × sign(∇_{x_i} L(x_i, y)),

iteratively generate the speech adversarial example, recalculating the loss function L(x_i, y) after each iteration; if L(x_i, y) > T the iteration continues until L(x_i, y) ≤ T, at which point iteration stops and x_i is output; if the current iteration count exceeds the set maximum N, iteration also stops and x_i is output.
Specifically, the audio data x is a K-dimensional vector; each dimension is a 16-bit sample and the sampling rate is 16 kHz, and the audio data x is preprocessed using the MFC method.
More specifically, in step S2 the 26 numbers 0 to 25 are used to represent the 26 English characters a to z, respectively.
Further, in step S3 the value of Pr(s | D(x)) is mapped into [0, 1] using the softmax function in PyTorch.
Further, in step S4 the threshold T is set to 0.5, the iteration step size ε is set to 0.1, the maximum number of iteration rounds N is set to 40, and the gradient is computed with the backward function in PyTorch.
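The embodiment obtains the gradient via backward() in PyTorch. As a dependency-free stand-in, the sketch below estimates the same per-component gradient sign by central finite differences; this is an illustrative approximation, not the patent's autograd call, and the toy loss function in the test is hypothetical.

```python
# Estimate sign(dL/dx_i) for each component of x by central finite
# differences; in the embodiment this quantity comes from PyTorch's
# backward(), which computes exact gradients instead.
def grad_sign(loss_fn, x, h=1e-6):
    signs = []
    for i in range(len(x)):
        xp = list(x); xp[i] += h
        xm = list(x); xm[i] -= h
        g = (loss_fn(xp) - loss_fn(xm)) / (2 * h)
        signs.append((g > 0) - (g < 0))   # -1, 0, or +1
    return signs
```

The returned signs are exactly what the update formula x_{i+1} = x_i + ε × sign(∇ L) consumes, so either gradient source can drive the same iteration loop.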
Preferably, in step S4 a subtle perturbation is added to the audio data x in each iteration; the perturbation moves the audio data x in the direction that increases the loss function L(x_i, y), and as L(x_i, y) increases, the probability that the speech recognition model D recognizes the audio data x as y gradually decreases, until x is misrecognized.
Preferably, the method is gradient-based: it constructs a loss function for generating adversarial examples and optimizes it iteratively to produce a perturbation imperceptible to the human ear. After the perturbation is added to the private voice data, the user's conversational experience is unaffected, but a smart device cannot recognize the conversation content, so the user's private conversation cannot be analyzed or further exploited illegally, and the private voice data is protected. Compared with adversarial example generation methods that use a single large-step iteration, this method is more efficient. Meanwhile, by exploiting the transferability of adversarial examples, the invention can more effectively prevent the user's voice from being captured by various smart devices.
The invention also discloses an electronic device comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor; when the processor executes the program, it implements the steps of the voice data privacy protection method based on adversarial examples.
The invention also discloses a non-transitory computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the steps of the voice data privacy protection method based on adversarial examples.
It should be noted that technical features such as DeepSpeech, PyTorch, and softmax referred to in this application should be regarded as prior art; the specific structures, operating principles, control modes, and spatial arrangements possibly involved should follow conventional choices in the field and should not be regarded as inventive points of this patent, so they are not described in further detail here.
It will be apparent to those skilled in the art that modifications and equivalents may be made in the embodiments and/or portions thereof without departing from the spirit and scope of the present invention.

Claims (8)

1. A voice data privacy protection method based on adversarial examples, for protecting a user's private voice data, characterized by comprising the following steps:
step S1: load a speech recognition model D;
step S2: pre-detect the user's audio data x with the speech recognition model D to obtain a recognition result in the form of a probability distribution matrix D(x), where:
D(x) is a probability distribution matrix giving, for each frame of the audio data x, a distribution over the 26 English characters;
step S3: using the formula S(x) = argmax_s Pr(s | D(x)), extract the character sequence S(x) corresponding to the audio data x from the probability distribution matrix D(x); the character sequence S(x) is the user's private voice data that needs protection, where:
Pr(s | D(x)) denotes the probability that the speech recognition model D recognizes the audio data x as the character sequence s, and its value lies in [0, 1];
step S4: let y = S(x), initialize x_0 = x and i = 0, set a threshold T, an iteration step size ε, and a maximum number of iteration rounds N, and construct the loss function for generating adversarial examples L(x_i, y) = -log(Pr(S(x) | D(x_i))); using the formula x_{i+1} = x_i + ε × sign(∇_{x_i} L(x_i, y)), iteratively generate the speech adversarial example, recalculating the loss function L(x_i, y) after each iteration; if L(x_i, y) > T the iteration continues until L(x_i, y) ≤ T, at which point iteration stops and x_i is output; if the current iteration count exceeds the set maximum, iteration also stops and x_i is output.
2. The voice data privacy protection method based on adversarial examples of claim 1, wherein the audio data x is a K-dimensional vector, each dimension is a 16-bit sample with a 16 kHz sampling rate, and the audio data x is preprocessed with the MFC method.
3. The voice data privacy protection method based on adversarial examples of claim 2, wherein in step S2 the 26 numbers 0 to 25 are used to represent the 26 English characters a to z, respectively.
4. The voice data privacy protection method based on adversarial examples of claim 3, wherein in step S3 the value of Pr(s | D(x)) is mapped into [0, 1] using the softmax function in PyTorch.
5. The voice data privacy protection method based on adversarial examples of claim 4, wherein in step S4 the threshold T is set to 0.5, the iteration step size ε is set to 0.1, and the maximum number of iteration rounds N is set to 40.
6. The voice data privacy protection method based on adversarial examples of claim 5, wherein in step S4 a subtle perturbation is added to the audio data x in each iteration; the perturbation moves the audio data x in the direction that increases the loss function L(x_i, y), and as L(x_i, y) increases, the probability that the speech recognition model D recognizes the audio data x as y gradually decreases, until x is misrecognized.
7. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein when the processor executes the program it implements the steps of the voice data privacy protection method based on adversarial examples of any one of claims 1 to 6.
8. A non-transitory computer-readable storage medium on which a computer program is stored, wherein when the program is executed by a processor it implements the steps of the voice data privacy protection method based on adversarial examples of any one of claims 1 to 6.
CN202110271786.0A 2021-03-12 2021-03-12 Voice data privacy protection method based on adversarial examples Pending CN113129875A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110271786.0A CN113129875A (en) 2021-03-12 2021-03-12 Voice data privacy protection method based on countermeasure sample

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110271786.0A CN113129875A (en) 2021-03-12 2021-03-12 Voice data privacy protection method based on countermeasure sample

Publications (1)

Publication Number Publication Date
CN113129875A 2021-07-16

Family

ID=76773034

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110271786.0A Pending CN113129875A (en) 2021-03-12 2021-03-12 Voice data privacy protection method based on countermeasure sample

Country Status (1)

Country Link
CN (1) CN113129875A (en)



Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109036389A (en) * 2018-08-28 2018-12-18 出门问问信息科技有限公司 The generation method and device of a kind of pair of resisting sample
CN110379418A (en) * 2019-06-28 2019-10-25 西安交通大学 A kind of voice confrontation sample generating method
CN110992951A (en) * 2019-12-04 2020-04-10 四川虹微技术有限公司 Method for protecting personal privacy based on countermeasure sample

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Alexey Kurakin et al.: "Adversarial Examples in the Physical World", arXiv *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115208507A (en) * 2022-07-21 2022-10-18 浙江大学 Privacy protection method and device based on white-box voice countermeasure sample
CN117648717A (en) * 2024-01-29 2024-03-05 知学云(北京)科技股份有限公司 Privacy protection method for artificial intelligent voice training
CN117648717B (en) * 2024-01-29 2024-05-03 知学云(北京)科技股份有限公司 Privacy protection method for artificial intelligent voice training

Similar Documents

Publication Publication Date Title
WO2022142014A1 (en) Multi-modal information fusion-based text classification method, and related device thereof
WO2022142006A1 (en) Semantic recognition-based verbal skill recommendation method and apparatus, device, and storage medium
CN107547718B (en) Telecommunication fraud identification and defense system based on deep learning
WO2022105118A1 (en) Image-based health status identification method and apparatus, device and storage medium
CN110379418B (en) Voice confrontation sample generation method
CN112395466B (en) Fraud node identification method based on graph embedded representation and cyclic neural network
CN113628059B (en) Associated user identification method and device based on multi-layer diagram attention network
CN112214775B (en) Injection attack method, device, medium and electronic equipment for preventing third party from acquiring key diagram data information and diagram data
US10291628B2 (en) Cognitive detection of malicious documents
CN113129875A (en) Voice data privacy protection method based on countermeasure sample
CN110347802B (en) Text analysis method and device
CN115051817B (en) Phishing detection method and system based on multi-mode fusion characteristics
CN112597759A (en) Text-based emotion detection method and device, computer equipment and medium
Ra et al. DeepAnti-PhishNet: Applying deep neural networks for phishing email detection
WO2023071105A1 (en) Method and apparatus for analyzing feature variable, computer device, and storage medium
CN112632244A (en) Man-machine conversation optimization method and device, computer equipment and storage medium
CN113326940A (en) Knowledge distillation method, device, equipment and medium based on multiple knowledge migration
CN111680161A (en) Text processing method and device and computer readable storage medium
Chen et al. XSS adversarial example attacks based on deep reinforcement learning
Miranda-García et al. Deep learning applications on cybersecurity: A practical approach
CN114282258A (en) Screen capture data desensitization method and device, computer equipment and storage medium
CN115952854B (en) Training method of text desensitization model, text desensitization method and application
Fang et al. Privacy leakage on dnns: A survey of model inversion attacks and defenses
Kwon et al. Toward backdoor attacks for image captioning model in deep neural networks
Wu et al. Semantic key generation based on natural language

Legal Events

Code  Description
PB01  Publication
SE01  Entry into force of request for substantive examination
RJ01  Rejection of invention patent application after publication (application publication date: 2021-07-16)