CN111310836B - Voiceprint recognition integrated model defending method and defending device based on spectrogram - Google Patents

Voiceprint recognition integrated model defending method and defending device based on spectrogram

Info

Publication number
CN111310836B
CN111310836B (application CN202010105807.7A)
Authority
CN
China
Prior art keywords
voiceprint recognition
spectrogram
sample
integrated model
voiceprint
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010105807.7A
Other languages
Chinese (zh)
Other versions
CN111310836A (en
Inventor
陈晋音
叶林辉
王雪柯
郑喆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN202010105807.7A priority Critical patent/CN111310836B/en
Publication of CN111310836A publication Critical patent/CN111310836A/en
Application granted granted Critical
Publication of CN111310836B publication Critical patent/CN111310836B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/004Artificial life, i.e. computing arrangements simulating life
    • G06N3/006Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification techniques
    • G10L17/04Training, enrolment or model building
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification techniques
    • G10L17/06Decision making techniques; Pattern matching strategies

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Computation (AREA)
  • Acoustics & Sound (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a spectrogram-based defense method for a voiceprint recognition integrated model, comprising the following steps: (1) collecting audio files and converting them into spectrograms, which serve as benign samples; (2) training a plurality of voiceprint recognition models with the benign samples to obtain a plurality of trained voiceprint recognition models; (3) integrating, by a voting mechanism, the better-performing models screened from the trained voiceprint recognition models to form a voiceprint recognition integrated model, and retraining the integrated model with benign samples; (4) adopting a cuckoo search algorithm to attack each voiceprint recognition model and generate adversarial samples; (5) retraining the voiceprint recognition integrated model obtained in step (3) with the adversarial samples and benign samples to obtain an attack-resistant voiceprint recognition integrated model; (6) performing defended recognition on the spectrogram corresponding to an audio file with the voiceprint recognition integrated model obtained in step (5).

Description

Voiceprint recognition integrated model defending method and defending device based on spectrogram
Technical Field
The invention belongs to the field of information security and, in particular, relates to a spectrogram-based defense method and defense device for a voiceprint recognition integrated model.
Background
Because the vocal organs — tongue, teeth, lungs and so on — differ greatly in size and shape from person to person, each person's speech, and hence its spectrogram, is different; in effect, every voice carries unique identity information, and voiceprint recognition uses this characteristic to identify a speaker. Voiceprint recognition is one of the biometric techniques and is divided into text-dependent and text-independent voiceprint recognition. Text-independent voiceprint recognition refers to a speaker recognition system that places no requirement on the spoken content, so the speaker may say anything. Text-dependent voiceprint recognition refers to a speaker recognition system that requires the user to pronounce pre-specified content. A text-dependent model cannot identify the user once the pronunciation deviates from the specified text, so its application range is narrow. A text-independent model places no requirement on the spoken content, is convenient to use and more widely applicable, but is harder to implement.
Deep neural networks can fully exploit the correlation among speech features by training on concatenated features of consecutive frames, greatly improving the recognition rate of voiceprint recognition systems. While improving accuracy and bringing convenience, deep-neural-network-based voiceprint recognition also introduces risks. Deep neural networks are vulnerable to adversarial attacks that add a fine perturbation to the input data: after obtaining the characteristics of a target speaker, an attacker can add a carefully computed perturbation to another speaker's audio so that the generated adversarial sample is misidentified as the target speaker by the voiceprint recognition model, posing serious safety hazards to voiceprint recognition systems and to personal and property security.
Existing voiceprint recognition attack methods are mainly divided into white-box and black-box attacks. In a white-box attack, the attacker knows the internal parameters of the model, computes the gradient of the model with respect to the noise through backpropagation, and iteratively optimizes the noise to be added, thereby generating an adversarial sample. In a black-box attack, the attacker does not know the model parameters and instead optimizes the required perturbation with optimization algorithms such as genetic algorithms or particle swarm optimization to generate an adversarial sample. Both kinds of attack can cause a voiceprint recognition system to misidentify the adversarial sample as the target speaker.
Disclosure of Invention
To address the low accuracy, poor robustness and susceptibility to adversarial-sample attacks of existing voiceprint recognition systems, the invention provides a spectrogram-based defense method and defense device for a voiceprint recognition integrated model.
The technical scheme of the invention is as follows:
a voiceprint recognition integrated model defending method based on a spectrogram comprises the following steps:
(1) Collecting an audio file, and converting the audio file into a spectrogram, wherein the spectrogram is used as a benign sample;
(2) Training a plurality of image recognition models with the benign samples so that they perform voiceprint recognition, thereby obtaining a plurality of trained image-based voiceprint recognition models;
(3) Integrating the plurality of image-based voiceprint recognition models trained in step (2) by a voting mechanism to form a voiceprint recognition integrated model, and retraining the integrated model with benign samples;
(4) Adopting a cuckoo search algorithm to attack each of the voiceprint recognition models, generating adversarial samples, and converting the adversarial samples into spectrograms that serve as malicious samples;
(5) Retraining the image-based voiceprint recognition integrated model obtained in step (3) with the malicious samples and benign samples to obtain a voiceprint recognition integrated model capable of resisting attacks;
(6) Performing defended recognition on the spectrogram corresponding to an audio file with the voiceprint recognition integrated model obtained in step (5).
Preferably, the specific steps of converting the audio file into a spectrogram are:
framing the audio, windowing each frame of voice signal, and performing short-time Fourier transform;
calculating the power spectrum of the short-time Fourier transform result and normalizing it to obtain the spectrogram; the spectrogram together with its corresponding speaker label forms a benign sample.
Preferably, the image recognition model adopts VGG16 or VGG19.
Preferably, the specific process of training the plurality of voiceprint recognition models by using the benign samples is as follows:
preprocessing each spectrogram and resizing it to 224 × 224 × 3 to obtain spectrogram samples;
for a spectrogram sample x_i whose output confidence through the voiceprint recognition model is y_ipre, using cross entropy as the loss function L(x_i) to optimize the parameters of the voiceprint recognition model:
L(x_i) = -[y_i log y_ipre + (1 - y_i) log(1 - y_ipre)]
and testing the accuracy of the trained voiceprint recognition model on the spectrograms of the test set; if the recognition accuracy does not meet the requirement, retraining the voiceprint recognition model until it does.
The specific process of the step (3) is as follows:
integrating a plurality of voiceprint recognition models based on images by utilizing a voting mechanism to obtain a voiceprint recognition integrated model;
before voting, converting the prediction confidence returned by each voiceprint recognition model into a predicted category, i.e., taking the category label with the highest confidence as that model's prediction result;
after every voiceprint recognition model has produced a prediction for a spectrogram sample, if a predicted category receives more than half of the models' votes, that category is the prediction result of the voiceprint recognition integrated model;
and training the voiceprint recognition integrated model with benign samples and testing it with the test set to refine it.
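The voting mechanism described above can be sketched as follows. This is an illustrative sketch, not the patent's code: the helper name `ensemble_predict` and the handling of the no-majority case are assumptions, since the method only specifies the more-than-half acceptance rule.

```python
import numpy as np

def ensemble_predict(confidences):
    """Majority-vote ensemble over per-model confidence vectors.

    confidences: list of 1-D arrays, one per voiceprint recognition model;
    each array holds that model's confidence for every speaker class.
    Returns the winning class index, or None when no class obtains more
    than half of the votes (the acceptance condition in the patent).
    """
    # Convert each model's confidences into a predicted class label.
    votes = [int(np.argmax(c)) for c in confidences]
    n_models = len(votes)
    counts = {}
    for v in votes:
        counts[v] = counts.get(v, 0) + 1
    best_class, best_count = max(counts.items(), key=lambda kv: kv[1])
    # Accept only if a strict majority of the models agree.
    return best_class if best_count > n_models / 2 else None
```

For example, with three models returning confidences for two speakers, two votes for the same speaker out of three constitute a majority and decide the integrated model's output.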
The device for defending the voiceprint recognition integrated model based on the spectrogram comprises a computer memory, a computer processor and a computer program which is stored in the computer memory and can be executed on the computer processor, wherein the computer processor realizes the defending method of the voiceprint recognition integrated model based on the spectrogram when executing the computer program.
In view of the possible weaknesses of voiceprint recognition systems and the limitations of existing attack methods, the invention studies converting speech into spectrograms and training image recognition models on the spectrograms to achieve voiceprint recognition. Integrating several trained image recognition models improves model accuracy while enabling the resulting voiceprint recognition model to resist adversarial samples, and adversarial training further strengthens the model's defense capability, thereby defending against both white-box and black-box attacks.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a method for defending a voiceprint recognition integrated model based on a spectrogram according to an embodiment;
FIG. 2 is a schematic diagram of a structure for obtaining an adversarial sample according to an embodiment;
FIG. 3 is a schematic diagram of retraining an integrated voiceprint recognition model provided by an embodiment.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the detailed description is presented by way of example only and is not intended to limit the scope of the invention.
Referring to fig. 1 to 3, the defending method of the voiceprint recognition integrated model based on the spectrogram provided by the embodiment includes the following steps:
1) A data set for voiceprint recognition model training was prepared, using the train-clean-100 subset of the LibriSpeech speech corpus. Each folder of train-clean-100 stores the audio of one speaker, so one folder corresponds to one speaker and the folder name effectively serves as the label;
2) The audio files in the folders are preprocessed and converted into spectrograms, which are stored in the corresponding folders; the file names are the class labels of the spectrograms, i.e., the identities of the speakers. The spectrogram data set is divided into a training set and a test set in a fixed ratio. The specific process is as follows:
step1: for each audio file x (n) in the train-claen-100 dataset, it is framed, each frame length being 25ms, during which time the speech signal is considered steady state. The windowing function of the audio signal after framing avoids high frequency part signal leakage. After framing and windowing, the voice signal is subjected to short-time Fourier transform:
where k ε {0,1, … N-1}, where N represents the number of samples contained in a frame of audio file and w (N-m) is a window function that slides along the time axis.
Step2: the power spectrum is obtained from X(n, k):
P(n, k) = |X(n, k)|² (2)
Step3: because the silent segments of speech contain substantial non-zero noise, the spectrogram is processed with max-min normalization. After normalization, the brightness and brightness distribution corresponding to the mean and variance of the spectrogram become more uniform. The normalization formula is:
G(a, b) = (P(a, b) - min P) / (max P - min P) (3)
In G(a, b), a denotes a time instant and b a frequency at time a; the magnitude of G(a, b) represents the energy of the audio component with frequency b at time a. A spectrogram can be drawn from G(a, b), with the energy of different frequency components at each time instant rendered in the same color but different shades.
Step4: the generated spectrograms are stored in folders according to their speakers, with the file name as the class label, i.e., the corresponding speaker; the generated spectrogram data set is divided into a training set and a test set in a fixed ratio.
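The Step1–Step4 conversion can be sketched as below. This is a minimal illustration, not the patent's implementation: the 10 ms hop length and the Hamming window are assumptions (the text fixes only the 25 ms frame length and that each frame is windowed), and the function name is hypothetical.

```python
import numpy as np

def spectrogram(x, sr=16000, frame_ms=25, hop_ms=10):
    """Convert a mono waveform into a max-min-normalized spectrogram.

    Framing (25 ms frames), windowing, short-time Fourier transform,
    power spectrum P = |X|^2, then max-min normalization so that
    G(a, b) lies in [0, 1]. Rows index time a, columns frequency b.
    """
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    assert len(x) >= frame_len, "signal shorter than one frame"
    window = np.hamming(frame_len)              # w(n - m) in equation (1)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop: i * hop + frame_len]
                       for i in range(n_frames)])
    X = np.fft.rfft(frames * window, axis=1)    # equation (1)
    P = np.abs(X) ** 2                          # equation (2)
    G = (P - P.min()) / (P.max() - P.min() + 1e-12)  # equation (3)
    return G
```

At a 16 kHz sampling rate a 25 ms frame holds 400 samples, so each time slice yields 201 frequency bins after the real FFT.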
3) Training a spectrogram-based voiceprint recognition model: a VGG16 model is trained on the generated spectrograms, with the file names as class labels, so that voiceprint recognition is achieved through image recognition. After training, the model is tested on the test set; if the recognition accuracy does not meet the requirement, training continues until it does. The specific steps are:
step1, preprocessing the image, and setting the size of the spectrogram to 224×224×3.
Step2, building a VGG16 model: an image recognition model based on a CNN structure, with 13 convolutional layers and 3 fully connected layers.
Step3, setting the relevant parameters and training. For a spectrogram sample x_i whose output confidence through the VGG16 model is y_ipre, cross entropy is used as the loss function:
L(x_i) = -[y_i log y_ipre + (1 - y_i) log(1 - y_ipre)] (4)
where y_i denotes the true label.
Step4, testing the accuracy of the recognition model on the test data set to ensure the preset recognition accuracy is reached; otherwise, the model structure and parameters are modified and the model is retrained.
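Equation (4) is the standard binary cross entropy. A minimal numpy version is sketched below; the helper name is hypothetical, and the clipping epsilon is an assumption added to guard against log(0).

```python
import numpy as np

def cross_entropy_loss(y_true, y_pre, eps=1e-12):
    """Binary cross entropy of equation (4):
    L(x_i) = -[y_i * log(y_ipre) + (1 - y_i) * log(1 - y_ipre)].

    y_true: real label y_i in {0, 1}; y_pre: model confidence in (0, 1).
    """
    y_pre = np.clip(y_pre, eps, 1.0 - eps)  # avoid log(0)
    return -(y_true * np.log(y_pre) + (1.0 - y_true) * np.log(1.0 - y_pre))
```

The loss is smallest when the confidence agrees with the label and grows as the prediction drifts away, which is what drives the parameter optimization in Step3.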
4) Replacing the model structure and repeating step 3), a plurality of spectrogram-based voiceprint recognition models with different structures are trained. After training, each image recognition model is tested on the test set; if the recognition accuracy does not meet the requirement, the model parameters are changed and training continues until every model meets the accuracy requirement. A plurality of spectrogram-based voiceprint recognition models are thereby obtained.
5) Integrating the obtained spectrogram-based voiceprint recognition models. The integrated model comprises several spectrogram-based voiceprint recognition models of different structures, and the outputs of the individual models are combined by voting. The integrated model is then trained again to further improve its recognition accuracy and robustness. The specific steps are:
step1: integrating the obtained voiceprint recognition models based on the spectrograms, and adopting a voting mechanism after integration.
Step2: before voting, converting the prediction confidence coefficient returned by each voiceprint recognition model into a prediction category, namely, using a category label corresponding to the highest confidence coefficient as a prediction result of the voiceprint recognition model.
Step3: after every model has produced its final prediction for the input sample x, if more than half of the models vote for a certain predicted class — i.e., if more than half of the outputs of the voiceprint recognition integrated model identify a spectrogram sample as speaker A — the audio corresponding to that sample is deemed to belong to speaker A;
Step4: the voiceprint recognition integrated model is trained with the train-clean-100 data set and tested with the test set, further improving its recognition accuracy and defense capability.
6) Attacking the spectrogram-based voiceprint recognition models: a cuckoo search algorithm is adopted to attack each spectrogram-based voiceprint recognition model obtained in step 4), iterating continuously to search for the optimal perturbation, which is superimposed on the original audio to generate an adversarial sample. The specific steps are:
step1: initializing a fitness function, and defining the fitness function as follows:
f=[y ti logy ipre +(1-y ti )log(1-y advipre )]+c·||x advi -x i,0 || 2 (5) (5) wherein x advi Representing challenge samples, x i,0 Representing the original audio, y ti A tag representing the targeted speaker,y advipre the output of the challenge sample is represented, where the difference between the challenge sample and the original sample is measured by an L2 function, the magnitude of this difference being controlled by a parameter c.
Step2: initialize the nests. With the number of nests set to G, random perturbations of the same size as the original audio are initialized and superimposed on the original audio to form the initial adversarial samples, i.e., the initial nests:
X = {x_1, x_2, …, x_G} (6)
Step3: new nests are obtained by Lévy flights, i.e., new adversarial samples are generated by the Lévy-flight update:
x_i = x_i + α · S · n (7)
where α is the step-size scale factor and n is an array of standard normally distributed random numbers with the same dimensions as x_i. S is the step length:
S = u / |v|^{1/β} (8)
where u and v are two Gaussian-distributed variables, u ~ N(0, σ²) and v ~ N(0, 1), β is a constant, and σ is calculated from:
σ = {Γ(1 + β) · sin(πβ/2) / [Γ((1 + β)/2) · β · 2^{(β-1)/2}]}^{1/β} (9)
where Γ is the gamma function.
Step4: calculate the fitness of each individual, denoted F = f_1, f_2, …, f_G, and find the best individual in the population, i.e., the one with the smallest fitness value, denoted F_global. If the number of iterations reaches the set maximum or the generated adversarial sample is classified into the target class, stop iterating and output the adversarial sample; otherwise repeat Step1–Step3 and continue the iterative optimization of the population. In this way adversarial samples can be generated under the different models.
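Steps Step1–Step4 can be sketched as a minimal cuckoo search. This is an illustrative sketch under assumptions: the fitness function is supplied by the caller (equation (5) requires querying the attacked model), the abandonment of the worst nests in standard cuckoo search is omitted, and all parameter values and names are placeholders.

```python
import numpy as np
from math import gamma, sin, pi

def levy_step(beta=1.5, rng=None):
    """Step length S of equations (7)-(9) via Mantegna's algorithm:
    S = u / |v|**(1/beta), with u ~ N(0, sigma^2) and v ~ N(0, 1)."""
    if rng is None:
        rng = np.random.default_rng()
    sigma = (gamma(1 + beta) * sin(pi * beta / 2)
             / (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma)
    v = rng.normal(0.0, 1.0)
    return u / abs(v) ** (1 / beta)

def cuckoo_search(x0, fitness, n_nests=10, alpha=0.01, max_iter=50, seed=0):
    """Minimal cuckoo-search sketch: nests are perturbations of the
    original input x0; a nest is replaced when a Levy-flight candidate
    lowers the caller-supplied fitness f (smaller is better)."""
    rng = np.random.default_rng(seed)
    nests = [x0 + rng.normal(0.0, 0.01, x0.shape) for _ in range(n_nests)]
    for _ in range(max_iter):
        for i in range(n_nests):
            # Equation (7): x_i <- x_i + alpha * S * n
            cand = nests[i] + alpha * levy_step(rng=rng) \
                   * rng.standard_normal(x0.shape)
            if fitness(cand) < fitness(nests[i]):
                nests[i] = cand
    return min(nests, key=fitness)   # F_global: smallest fitness value
```

In the patent's setting, `fitness` would implement equation (5) by querying the attacked voiceprint recognition model; the sketch below is exercised with a simple quadratic objective instead.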
7) Adversarially training the spectrogram-based voiceprint recognition integrated model: the adversarial samples generated in step 6) are converted into spectrograms and added to the training data set, and the spectrogram-based voiceprint recognition integrated model is retrained, improving its recognition accuracy and defense capability as well as the safety and stability of the voiceprint recognition model.
The embodiment also provides a defending device of the voiceprint recognition integrated model based on the spectrogram, which comprises a computer memory, a computer processor and a computer program stored in the computer memory and executable on the computer processor, wherein the computer processor realizes the defending method of the voiceprint recognition integrated model based on the spectrogram when executing the computer program.
The computer program stored in the computer memory of the defense device mainly implements the above spectrogram-based defense method for the voiceprint recognition integrated model; its effects correspond to those of the method and are not repeated here.
Against possible white-box or black-box attacks on voiceprint recognition systems, the invention converts the speech signal into a spectrogram and achieves voiceprint recognition with image recognition models; after several image recognition models are integrated, voiceprint recognition accuracy improves while the model gains the ability to defend against adversarial samples, realizing defense against both white-box and black-box attacks.
The foregoing detailed description covers only preferred embodiments of the invention and should not be taken as limiting its scope; any changes, additions, substitutions and equivalents made within the spirit of the invention are intended to fall within its protection scope.

Claims (4)

1. The voiceprint recognition integrated model defending method based on the spectrogram is characterized by comprising the following steps of:
(1) Collecting an audio file, and converting the audio file into a spectrogram, wherein the spectrogram is used as a benign sample;
(2) Training a plurality of image recognition models with benign samples so that they perform voiceprint recognition, thereby obtaining a plurality of image-based voiceprint recognition models;
(3) Integrating the plurality of image-based voiceprint recognition models trained in step (2) by a voting mechanism to form a voiceprint recognition integrated model, and retraining the integrated model with benign samples, specifically comprising: integrating the plurality of voiceprint recognition models by the voting mechanism to obtain the voiceprint recognition integrated model; before voting, converting the prediction confidence returned by each voiceprint recognition model into a predicted category, i.e., taking the category label with the highest confidence as that model's prediction result; after every voiceprint recognition model has produced a prediction for a spectrogram sample, if a predicted category receives more than half of the models' votes, that category is the prediction result of the voiceprint recognition integrated model; and training the integrated model with benign samples and testing it with a test set to refine it;
(4) Adopting a cuckoo search algorithm to attack each of the voiceprint recognition models, generating adversarial samples, and converting the adversarial samples into spectrograms that serve as malicious samples, specifically comprising:
(4-1) initializing a fitness function, defining the fitness function as follows:
f = [y_ti log y_ipre + (1 - y_ti) log(1 - y_advipre)] + c · ||x_advi - x_i,0||_2
where x_advi denotes the adversarial sample, x_i,0 the original audio, y_ti the label of the target speaker, and y_advipre the output for the adversarial sample; the difference between the adversarial sample and the original audio is measured by the L2 norm, whose magnitude is controlled by the parameter c, and y_ipre denotes the confidence output by the voiceprint recognition model;
(4-2) initializing the nests: with the number of nests set to G, random perturbations of the same size as the original audio are initialized and superimposed on the original audio to form the initial adversarial samples, i.e., the initial nests:
X = {x_1, x_2, …, x_G}
(4-3) obtaining new nests by Lévy flights, i.e., generating new adversarial samples by the Lévy-flight update:
x_i = x_i + α · S · n
where α is the step-size scale factor and n is an array of standard normally distributed random numbers with the same dimensions as x_i; S is the step length:
S = u / |v|^{1/β}
where u and v are two Gaussian-distributed variables, u ~ N(0, σ²) and v ~ N(0, 1), β is a constant, and σ is calculated from:
σ = {Γ(1 + β) · sin(πβ/2) / [Γ((1 + β)/2) · β · 2^{(β-1)/2}]}^{1/β}
where Γ is the gamma function;
(4-4) calculating the fitness of each individual, denoted F = f_1, f_2, …, f_G, and finding the best individual in the population, i.e., the one with the smallest fitness value, denoted F_global; if the number of iterations reaches the set maximum or the generated adversarial sample can be classified into the target class, stopping the iteration and outputting the adversarial sample; otherwise repeating steps (4-1)–(4-3) and continuing the iterative optimization of the population, thereby obtaining the adversarial samples generated under the different voiceprint recognition models;
(5) Retraining the image-based voiceprint recognition integrated model obtained in step (3) with the malicious samples and benign samples to obtain a voiceprint recognition integrated model capable of resisting attacks;
(6) Performing defended recognition on the spectrogram corresponding to an audio file with the voiceprint recognition integrated model obtained in step (5).
2. The method for defending a voiceprint recognition integrated model based on a spectrogram according to claim 1, wherein the specific steps of converting an audio file into the spectrogram are as follows:
framing the audio, windowing each frame of voice signal, and performing short-time Fourier transform;
calculating the power spectrum of the short-time Fourier transform result and normalizing it to obtain the spectrogram, the spectrogram together with its corresponding speaker forming a benign sample.
3. The method for defending a voiceprint recognition integrated model based on a spectrogram according to claim 1, wherein the image recognition model adopts VGG16 or VGG19.
4. A device for defending a voiceprint recognition integrated model based on a spectrogram, comprising a computer memory, a computer processor and a computer program stored in the computer memory and executable on the computer processor, wherein the computer processor implements the method for defending a voiceprint recognition integrated model based on a spectrogram according to any one of claims 1 to 3 when the computer program is executed.
CN202010105807.7A 2020-02-20 2020-02-20 Voiceprint recognition integrated model defending method and defending device based on spectrogram Active CN111310836B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010105807.7A CN111310836B (en) 2020-02-20 2020-02-20 Voiceprint recognition integrated model defending method and defending device based on spectrogram


Publications (2)

Publication Number Publication Date
CN111310836A CN111310836A (en) 2020-06-19
CN111310836B true CN111310836B (en) 2023-08-18

Family

ID=71162113

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010105807.7A Active CN111310836B (en) 2020-02-20 2020-02-20 Voiceprint recognition integrated model defending method and defending device based on spectrogram

Country Status (1)

Country Link
CN (1) CN111310836B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112420072B (en) * 2021-01-25 2021-04-27 北京远鉴信息技术有限公司 Method and device for generating spectrogram, electronic equipment and storage medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014074732A (en) * 2012-10-02 2014-04-24 Nippon Hoso Kyokai <Nhk> Voice recognition device, error correction model learning method and program
CN107154258A (en) * 2017-04-10 2017-09-12 哈尔滨工程大学 Method for recognizing sound-groove based on negatively correlated incremental learning
US9824692B1 (en) * 2016-09-12 2017-11-21 Pindrop Security, Inc. End-to-end speaker recognition using deep neural network
CN108446765A (en) * 2018-02-11 2018-08-24 浙江工业大学 Multi-model composite defense method against adversarial attacks for deep learning
CN108711436A (en) * 2018-05-17 2018-10-26 哈尔滨工业大学 Replay attack detection method for speaker verification systems based on high-frequency and bottleneck features
CN109801636A (en) * 2019-01-29 2019-05-24 北京猎户星空科技有限公司 Training method and device for voiceprint recognition model, electronic equipment, and storage medium
CN110610708A (en) * 2019-08-31 2019-12-24 浙江工业大学 Voiceprint recognition attack defense method based on cuckoo search algorithm
CN110728993A (en) * 2019-10-29 2020-01-24 维沃移动通信有限公司 Voice change identification method and electronic equipment
CN110767216A (en) * 2019-09-10 2020-02-07 浙江工业大学 Voice recognition attack defense method based on PSO algorithm
CN110808033A (en) * 2019-09-25 2020-02-18 武汉科技大学 Audio classification method based on dual data enhancement strategy

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10657259B2 (en) * 2017-11-01 2020-05-19 International Business Machines Corporation Protecting cognitive systems from gradient based attacks through the use of deceiving gradients
CN108900725B (en) * 2018-05-29 2020-05-29 平安科技(深圳)有限公司 Voiceprint recognition method and device, terminal equipment and storage medium


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Jinyin Chen, et al. "Can Adversarial Network Attack be Defended?" arXiv, 2019, full text. *


Similar Documents

Publication Publication Date Title
Yu et al. Spoofing detection in automatic speaker verification systems using DNN classifiers and dynamic acoustic features
WO2018107810A1 (en) Voiceprint recognition method and apparatus, and electronic device and medium
US9368110B1 (en) Method for distinguishing components of an acoustic signal
CN101136199B (en) Voice data processing method and equipment
CN110610708B (en) Voiceprint recognition attack defense method based on cuckoo search algorithm
Samizade et al. Adversarial example detection by classification for deep speech recognition
CN104900235A (en) Voiceprint recognition method based on pitch period mixed characteristic parameters
CN111724770B (en) Audio keyword identification method for generating confrontation network based on deep convolution
Chen et al. Towards understanding and mitigating audio adversarial examples for speaker recognition
Yücesoy et al. Gender identification of a speaker using MFCC and GMM
CN106531174A (en) Animal sound recognition method based on wavelet packet decomposition and spectrogram features
CN112735435A (en) Voiceprint open set identification method with unknown class internal division capability
Nidhyananthan et al. Language and text-independent speaker identification system using GMM
Biagetti et al. Speaker identification in noisy conditions using short sequences of speech frames
CN111310836B (en) Voiceprint recognition integrated model defending method and defending device based on spectrogram
Zhang et al. A highly stealthy adaptive decay attack against speaker recognition
Lin et al. A multiscale chaotic feature extraction method for speaker recognition
Mallikarjunan et al. Text-independent speaker recognition in clean and noisy backgrounds using modified VQ-LBG algorithm
Herrera-Camacho et al. Design and testing of a corpus for forensic speaker recognition using MFCC, GMM and MLE
Panda et al. Study of speaker recognition systems
Bakır Automatic speaker gender identification for the German language
Mansour et al. A comparative study in emotional speaker recognition in noisy environment
Ghonem et al. Classification of stuttering events using i-vector
Srinivas LFBNN: robust and hybrid training algorithm to neural network for hybrid features-enabled speaker recognition system
Al-Noori et al. Robust speaker recognition in noisy conditions by means of online training with noise profiles

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant