CN113129875A - Voice data privacy protection method based on adversarial examples - Google Patents

Voice data privacy protection method based on adversarial examples

Info

Publication number
CN113129875A
CN113129875A
Authority
CN
China
Prior art keywords
voice data
iteration
audio data
user
adversarial example
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110271786.0A
Other languages
Chinese (zh)
Inventor
陈双喜
肖文红
马方超
刘会
吴至禹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiaxing Vocational and Technical College
Original Assignee
Jiaxing Vocational and Technical College
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiaxing Vocational and Technical College
Priority to CN202110271786.0A
Publication of CN113129875A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/065 - Adaptation

Landscapes

  • Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a voice data privacy protection method based on adversarial examples, used to protect a user's private voice data. The method comprises the following steps. Step S1: load a speech recognition model D. Step S2: pre-detect the user's audio data x with the model D to obtain a recognition result in the form of a probability distribution matrix D(x). The method establishes a loss function for generating adversarial examples, sets a loss-function threshold, and optimizes the user's audio data through multiple iterations to produce an adversarial example targeted at the speech recognition model. After the perturbation is added to the private voice data, the user's private conversation can still be heard clearly by humans and the listening experience is unaffected, but a smart device can no longer recognize or analyze the content of the user's speech.

Description

Voice data privacy protection method based on adversarial examples
Technical Field
The invention belongs to the technical field of voice data privacy protection, and in particular relates to a voice data privacy protection method based on adversarial examples.
Background
Deep learning techniques have been applied to many aspects of daily life, particularly speech recognition, image recognition, and object detection. With the spread of deep learning, however, the privacy problems it raises have also drawn users' attention. When a user holds a private conversation, a voice assistant may secretly record the content and upload the voice data to a server without the user's permission, and that private voice data may then be sold illegally or used to train a company's artificial intelligence models. Deep learning techniques can also extract private information from leaked voice data, such as a home address, interpersonal relationships, or personal preferences, which a company can use to push targeted advertisements to the user or to conduct other commercial activities. Recurrent neural networks (RNNs) perform well at speech recognition: from an input audio signal they produce a sequence of probability distributions over characters, from which the sentence corresponding to the audio is inferred. DeepSpeech, the deep-learning-based recognition system developed by Baidu, is a mainstream speech recognition system at present.
In the big-data era, more and more voice data is released to improve voice-based services or to support academic research, but the release process carries a risk of privacy leakage. For example, if an attacker obtains the voice data of a particular user, the attacker can learn sensitive information about that user by analyzing it.
Although deep learning has been widely adopted, a growing body of research shows that the technique itself has many security weaknesses. Goodfellow et al. proposed the fast gradient sign method (FGSM), an adversarial example generation algorithm for deep learning models and currently one of the mainstream such algorithms: by applying a slight perturbation to a model's input, it causes the model to produce an erroneous prediction. The present invention provides an adversarial example generation method based on gradients and multiple iterations. It constructs a loss function for generating adversarial examples and optimizes it iteratively to produce a perturbation imperceptible to the human ear. After the perturbation is added to private voice data, the user's private conversation can still be heard by humans, but a smart device cannot recognize or analyze the content of the user's speech, so the audio data cannot be exploited illegally once uploaded; the user's private voice data is thereby protected.
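The sign-of-gradient update behind FGSM can be sketched in a few lines. The toy input and toy gradient below are hypothetical stand-ins, not the patent's speech model; the point is only the update rule: move each component of the input by a small step ε in the direction of the sign of the loss gradient.

```python
# A minimal FGSM step on a toy linear loss L(x) = sum(w_i * x_i),
# whose gradient with respect to x is simply w. All values here are
# illustrative placeholders, not the patent's model or data.
def fgsm_step(x, grad, eps):
    """Perturb each component of x by eps in the sign of its gradient."""
    sign = lambda g: (g > 0) - (g < 0)   # -1, 0, or +1
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]

x = [0.5, -0.2, 0.1]   # toy "audio" vector
w = [1.0, -3.0, 0.0]   # gradient of the toy loss with respect to x
x_adv = fgsm_step(x, w, eps=0.1)
# x_adv is approximately [0.6, -0.3, 0.1]
```

Note that only the sign of each gradient component is used, so every component moves by exactly ε (or not at all where the gradient is zero); this bounds the perturbation's maximum amplitude, which is what keeps it hard to hear.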
Disclosure of Invention
The main object of the invention is to provide a voice data privacy protection method based on adversarial examples. The method establishes a loss function for generating adversarial examples, sets a loss-function threshold, and optimizes the user's audio data through multiple iterations to produce an adversarial example targeted at a speech recognition model. After the perturbation is added to the private voice data, the user's private conversation can still be heard clearly by humans and the listening experience is unaffected, but a smart device cannot recognize or analyze the speech content, so the audio cannot be exploited illegally once uploaded; the user's private audio data is thereby protected.
A further object of the invention is to address the limitation that, for a speech recognition system with a high degree of nonlinearity, generating adversarial examples in a single iteration step has no obvious effect; the method therefore generates them over multiple iterations.
In order to achieve the above objects, the present invention provides a voice data privacy protection method based on adversarial examples, for protecting a user's private voice data, characterized by comprising the following steps:
step S1: load a speech recognition model D (e.g., DeepSpeech);
step S2: pre-detect the user's audio data x with the speech recognition model D to obtain a recognition result in the form of a probability distribution matrix D(x), where:
D(x) is a probability distribution matrix giving, for each frame of the audio data x, a distribution over the 26 English characters;
step S3: using the formula

S(x) = argmax_s Pr(s | D(x)),

extract the character sequence S(x) corresponding to the audio data x from the probability distribution matrix D(x); this character sequence S(x) is the user's private voice data that needs protection. Here:
Pr(s | D(x)) denotes the probability that the speech recognition model D recognizes the audio data x as the character sequence s, and its value lies in [0, 1];
step S4: let y = S(x), initialize x_0 = x and i = 0, set a threshold T, an iteration step size ε, and a maximum number of iteration rounds N, and construct the loss function for generating adversarial examples

L(x_i, y) = -log(Pr(S(x) | D(x_i))).

Using the update formula

x_{i+1} = x_i + ε × sign(∇_{x_i} L(x_i, y)),

iteratively generate the speech adversarial example, recalculating the loss function L(x_i, y) after each iteration; if L(x_i, y) > T the iteration continues until L(x_i, y) ≤ T, at which point iteration stops and x_i is output; if the current iteration count exceeds the set maximum N, iteration also stops and x_i is output.
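The loop of step S4 can be sketched as follows. Two things are assumptions here: the stub loss and gradient functions stand in for the DeepSpeech model, and, since the translated stopping rule is hard to reconcile with an update that increases the loss (a larger L means a smaller Pr(S(x) | D(x_i))), this sketch stops once the loss has risen above T, which matches the description that iteration drives the recognition probability down until the audio is misrecognized.

```python
# Multi-iteration adversarial generation in the shape of step S4, with
# stub loss_fn/grad_fn replacing the DeepSpeech model (assumption: stop
# once the loss exceeds the threshold T, or after N rounds).
def sign(g):
    return (g > 0) - (g < 0)

def generate_adversarial(x0, loss_fn, grad_fn, T=0.5, eps=0.1, N=40):
    x = list(x0)
    for _ in range(N):
        if loss_fn(x) > T:     # loss large enough: recognition defeated
            break
        # x_{i+1} = x_i + eps * sign(grad_x L(x_i, y))
        x = [xi + eps * sign(gi) for xi, gi in zip(x, grad_fn(x))]
    return x

# Toy demonstration: the loss simply grows with x[0], so each round
# raises it by eps until it passes T = 0.5.
adv = generate_adversarial([0.0],
                           loss_fn=lambda x: x[0],
                           grad_fn=lambda x: [1.0])
```

With the defaults above the toy run stops as soon as x[0] has climbed just past the threshold, after a handful of the 40 permitted rounds.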
As a further preferable technical solution of the above technical solution, the audio data x is a K-dimensional vector, each dimension of the vector is 16 bits, which represents 16KHz, and the audio data x is preprocessed by using an MFC method.
As a further preferred technical solution, in step S2 the 26 English characters a to z are represented by the 26 numbers 0 to 25, respectively.
As a further preferred technical solution, in step S3 the value of Pr(s | D(x)) is mapped into [0, 1] using the softmax function in PyTorch.
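The softmax mapping used in step S3 can be written out directly; the pure-Python version below is equivalent to PyTorch's softmax for a single row of logits (the toy logits are illustrative).

```python
# Softmax: map a row of real-valued logits into probabilities in [0, 1]
# that sum to 1, as required for Pr(s | D(x)).
import math

def softmax(logits):
    m = max(logits)                       # subtract max for stability
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

probs = softmax([2.0, 1.0, 0.0])
# every value lies in [0, 1], the row sums to 1, and order is preserved
```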
As a further preferable technical solution of the above technical solution, in step S4, the threshold T is set to 0.5, the iteration step ∈ is set to 0.1, and the maximum number of iteration rounds N is set to 40.
As a further preferable technical solution of the above technical solution, in step S4, in each iteration, a fine disturbance is added to the audio data x, and the disturbance makes the audio data x face the loss making function L (x)iY) is shifted in the direction of increasing value with the loss function L (x)iY), the probability that the voice recognition model D recognizes the audio data x as y is gradually reduced until the audio data x is misjudged.
To achieve the above objects, the present invention further provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor; when the processor executes the program, it implements the steps of the voice data privacy protection method based on adversarial examples.
To achieve the above objects, the present invention also provides a non-transitory computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the steps of the voice data privacy protection method based on adversarial examples.
Drawings
Fig. 1 is a schematic diagram of the voice data privacy protection method based on adversarial examples according to the present invention.
Detailed Description
The following description is presented to disclose the invention so as to enable any person skilled in the art to practice the invention. The preferred embodiments in the following description are given by way of example only, and other obvious variations will occur to those skilled in the art. The basic principles of the invention, as defined in the following description, may be applied to other embodiments, variations, modifications, equivalents, and other technical solutions without departing from the spirit and scope of the invention.
Referring to Fig. 1 of the drawings: Fig. 1 is a schematic diagram of the voice data privacy protection method based on adversarial examples according to the present invention.
In the preferred embodiment of the present invention, those skilled in the art should note that DeepSpeech, PyTorch, softmax, and the like referred to in the present invention can be regarded as prior art.
Preferred embodiments.
The invention discloses a voice data privacy protection method based on adversarial examples, used to protect a user's private voice data, comprising the following steps:
step S1: load a speech recognition model D (e.g., DeepSpeech);
step S2: pre-detect the user's audio data x with the speech recognition model D to obtain a recognition result in the form of a probability distribution matrix D(x), where:
D(x) is a probability distribution matrix giving, for each frame of the audio data x, a distribution over the 26 English characters;
step S3: using the formula

S(x) = argmax_s Pr(s | D(x)),

extract the character sequence S(x) corresponding to the audio data x from the probability distribution matrix D(x); this character sequence S(x) is the user's private voice data that needs protection. Here:
Pr(s | D(x)) denotes the probability that the speech recognition model D recognizes the audio data x as the character sequence s, and its value lies in [0, 1];
step S4: let y = S(x), initialize x_0 = x and i = 0, set a threshold T, an iteration step size ε, and a maximum number of iteration rounds N, and construct the loss function for generating adversarial examples

L(x_i, y) = -log(Pr(S(x) | D(x_i))).

Using the update formula

x_{i+1} = x_i + ε × sign(∇_{x_i} L(x_i, y)),

iteratively generate the speech adversarial example, recalculating the loss function L(x_i, y) after each iteration; if L(x_i, y) > T the iteration continues until L(x_i, y) ≤ T, at which point iteration stops and x_i is output; if the current iteration count exceeds the set maximum N, iteration also stops and x_i is output.
Specifically, the audio data x is a K-dimensional vector; each dimension is a 16-bit sample and the sampling rate is 16 kHz, and the audio data x is preprocessed using the MFC method.
More specifically, in step S2 the 26 numbers 0 to 25 are used to represent the 26 English characters a to z, respectively.
Further, in step S3 the value of Pr(s | D(x)) is mapped into [0, 1] using the softmax function in PyTorch.
Further, in step S4 the threshold T is set to 0.5, the iteration step size ε is set to 0.1, the maximum number of iteration rounds N is set to 40, and the gradient is computed with the backward function in PyTorch.
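The embodiment obtains the gradient via backward() in PyTorch. As a dependency-free stand-in, the sketch below estimates the same per-component gradient sign by central finite differences; this is an illustrative approximation, not the patent's autograd call, and the toy loss function in the test is hypothetical.

```python
# Estimate sign(dL/dx_i) for each component of x by central finite
# differences; in the embodiment this quantity comes from PyTorch's
# backward(), which computes exact gradients instead.
def grad_sign(loss_fn, x, h=1e-6):
    signs = []
    for i in range(len(x)):
        xp = list(x); xp[i] += h
        xm = list(x); xm[i] -= h
        g = (loss_fn(xp) - loss_fn(xm)) / (2 * h)
        signs.append((g > 0) - (g < 0))   # -1, 0, or +1
    return signs
```

The returned signs are exactly what the update formula x_{i+1} = x_i + ε × sign(∇ L) consumes, so either gradient source can drive the same iteration loop.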
Preferably, in step S4 a subtle perturbation is added to the audio data x in each iteration; the perturbation moves the audio data x in the direction that increases the loss function L(x_i, y), and as L(x_i, y) increases, the probability that the speech recognition model D recognizes the audio data x as y gradually decreases, until x is misrecognized.
Preferably, the method is gradient-based: it constructs a loss function for generating adversarial examples and optimizes it iteratively to produce a perturbation imperceptible to the human ear. After the perturbation is added to the private voice data, the user's conversational experience is unaffected, but a smart device cannot recognize the conversation content, so the user's private conversation cannot be analyzed or further exploited illegally, and the private voice data is protected. Compared with adversarial example generation methods that use a single large-step iteration, this method is more efficient. Meanwhile, by exploiting the transferability of adversarial examples, the invention can more effectively prevent the user's voice from being captured by various smart devices.
The invention also discloses an electronic device comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor; when the processor executes the program, it implements the steps of the voice data privacy protection method based on adversarial examples.
The invention also discloses a non-transitory computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the steps of the voice data privacy protection method based on adversarial examples.
It should be noted that technical features such as DeepSpeech, PyTorch, and softmax referred to in this application should be regarded as prior art; the specific structures, operating principles, control modes, and spatial arrangements possibly involved should follow conventional choices in the field and should not be regarded as inventive points of this patent, so they are not described in further detail here.
It will be apparent to those skilled in the art that modifications and equivalents may be made in the embodiments and/or portions thereof without departing from the spirit and scope of the present invention.

Claims (8)

1. A voice data privacy protection method based on adversarial examples, for protecting a user's private voice data, characterized by comprising the following steps:
step S1: load a speech recognition model D;
step S2: pre-detect the user's audio data x with the speech recognition model D to obtain a recognition result in the form of a probability distribution matrix D(x), where:
D(x) is a probability distribution matrix giving, for each frame of the audio data x, a distribution over the 26 English characters;
step S3: using the formula S(x) = argmax_s Pr(s | D(x)), extract the character sequence S(x) corresponding to the audio data x from the probability distribution matrix D(x); the character sequence S(x) is the user's private voice data that needs protection, where:
Pr(s | D(x)) denotes the probability that the speech recognition model D recognizes the audio data x as the character sequence s, and its value lies in [0, 1];
step S4: let y = S(x), initialize x_0 = x and i = 0, set a threshold T, an iteration step size ε, and a maximum number of iteration rounds N, and construct the loss function for generating adversarial examples L(x_i, y) = -log(Pr(S(x) | D(x_i))); using the formula x_{i+1} = x_i + ε × sign(∇_{x_i} L(x_i, y)), iteratively generate the speech adversarial example, recalculating the loss function L(x_i, y) after each iteration; if L(x_i, y) > T the iteration continues until L(x_i, y) ≤ T, at which point iteration stops and x_i is output; if the current iteration count exceeds the set maximum, iteration also stops and x_i is output.
2. The voice data privacy protection method based on adversarial examples of claim 1, wherein the audio data x is a K-dimensional vector, each dimension is a 16-bit sample with a 16 kHz sampling rate, and the audio data x is preprocessed with the MFC method.
3. The voice data privacy protection method based on adversarial examples of claim 2, wherein in step S2 the 26 numbers 0 to 25 are used to represent the 26 English characters a to z, respectively.
4. The voice data privacy protection method based on adversarial examples of claim 3, wherein in step S3 the value of Pr(s | D(x)) is mapped into [0, 1] using the softmax function in PyTorch.
5. The voice data privacy protection method based on adversarial examples of claim 4, wherein in step S4 the threshold T is set to 0.5, the iteration step size ε is set to 0.1, and the maximum number of iteration rounds N is set to 40.
6. The voice data privacy protection method based on adversarial examples of claim 5, wherein in step S4 a subtle perturbation is added to the audio data x in each iteration; the perturbation moves the audio data x in the direction that increases the loss function L(x_i, y), and as L(x_i, y) increases, the probability that the speech recognition model D recognizes the audio data x as y gradually decreases, until x is misrecognized.
7. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein when the processor executes the program it implements the steps of the voice data privacy protection method based on adversarial examples of any one of claims 1 to 6.
8. A non-transitory computer-readable storage medium on which a computer program is stored, wherein when the program is executed by a processor it implements the steps of the voice data privacy protection method based on adversarial examples of any one of claims 1 to 6.
CN202110271786.0A 2021-03-12 2021-03-12 Voice data privacy protection method based on adversarial examples Pending CN113129875A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110271786.0A CN113129875A (en) 2021-03-12 2021-03-12 Voice data privacy protection method based on countermeasure sample

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110271786.0A CN113129875A (en) 2021-03-12 2021-03-12 Voice data privacy protection method based on countermeasure sample

Publications (1)

Publication Number Publication Date
CN113129875A 2021-07-16

Family

ID=76773034

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110271786.0A Pending CN113129875A (en) 2021-03-12 2021-03-12 Voice data privacy protection method based on countermeasure sample

Country Status (1)

Country Link
CN (1) CN113129875A (en)



Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109036389A (en) * 2018-08-28 2018-12-18 出门问问信息科技有限公司 The generation method and device of a kind of pair of resisting sample
CN110379418A (en) * 2019-06-28 2019-10-25 西安交通大学 A kind of voice confrontation sample generating method
CN110992951A (en) * 2019-12-04 2020-04-10 四川虹微技术有限公司 Method for protecting personal privacy based on countermeasure sample

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Alexey Kurakin et al.: "Adversarial Examples in the Physical World", arXiv *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115208507A (en) * 2022-07-21 2022-10-18 浙江大学 Privacy protection method and device based on white-box voice countermeasure sample
CN117648717A (en) * 2024-01-29 2024-03-05 知学云(北京)科技股份有限公司 Privacy protection method for artificial intelligent voice training
CN117648717B (en) * 2024-01-29 2024-05-03 知学云(北京)科技股份有限公司 Privacy protection method for artificial intelligent voice training

Similar Documents

Publication Publication Date Title
WO2022142014A1 (en) Multi-modal information fusion-based text classification method, and related device thereof
WO2022142006A1 (en) Semantic recognition-based verbal skill recommendation method and apparatus, device, and storage medium
CN107547718B (en) Telecommunication fraud identification and defense system based on deep learning
WO2022105118A1 (en) Image-based health status identification method and apparatus, device and storage medium
CN110379418B (en) Voice confrontation sample generation method
CN112395466B (en) Fraud node identification method based on graph embedded representation and cyclic neural network
CN113628059B (en) Associated user identification method and device based on multi-layer diagram attention network
CN112214775B (en) Injection attack method, device, medium and electronic equipment for preventing third party from acquiring key diagram data information and diagram data
US10291628B2 (en) Cognitive detection of malicious documents
CN113129875A (en) Voice data privacy protection method based on countermeasure sample
CN110347802B (en) Text analysis method and device
CN115051817B (en) Phishing detection method and system based on multi-mode fusion characteristics
CN112597759A (en) Text-based emotion detection method and device, computer equipment and medium
Ra et al. DeepAnti-PhishNet: Applying deep neural networks for phishing email detection
WO2023071105A1 (en) Method and apparatus for analyzing feature variable, computer device, and storage medium
CN112632244A (en) Man-machine conversation optimization method and device, computer equipment and storage medium
CN113326940A (en) Knowledge distillation method, device, equipment and medium based on multiple knowledge migration
CN111680161A (en) Text processing method and device and computer readable storage medium
Chen et al. XSS adversarial example attacks based on deep reinforcement learning
Miranda-García et al. Deep learning applications on cybersecurity: A practical approach
CN114282258A (en) Screen capture data desensitization method and device, computer equipment and storage medium
CN115952854B (en) Training method of text desensitization model, text desensitization method and application
Fang et al. Privacy leakage on dnns: A survey of model inversion attacks and defenses
Kwon et al. Toward backdoor attacks for image captioning model in deep neural networks
Wu et al. Semantic key generation based on natural language

Legal Events

Code  Description
PB01  Publication
SE01  Entry into force of request for substantive examination
RJ01  Rejection of invention patent application after publication (application publication date: 2021-07-16)