CN114373467A - Antagonistic audio sample generation method based on three-group parallel genetic algorithm - Google Patents

Info

Publication number
CN114373467A
Authority
CN
China
Prior art keywords
population
individuals
auxiliary
populations
main
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210026272.3A
Other languages
Chinese (zh)
Inventor
徐东亮
翟文升
马骁
刘志伟
徐舜
杨承林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Application filed by Shandong University
Priority to CN202210026272.3A
Publication of CN114373467A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G10L17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/12 Computing arrangements based on biological models using genetic models
    • G06N3/126 Evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • G10L17/04 Training, enrolment or model building
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G10L2015/0635 Training updating or merging of old and new templates; Mean values; Weighting
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G10L2015/0635 Training updating or merging of old and new templates; Mean values; Weighting
    • G10L2015/0636 Threshold criteria for the updating
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum

Abstract

The invention discloses a method for generating adversarial audio samples based on a three-population parallel genetic algorithm, comprising the following steps: A: obtaining an original audio sample; B: obtaining the corresponding input samples, a main population and two auxiliary populations; C: calculating the fitness score of each individual; D: sorting all individuals in the main population and in each auxiliary population in order of fitness score from low to high; E: using a speech recognition model to classify the sorted individuals of the main population in sequence; and F: performing crossover, mutation and individual update on the main population and the auxiliary populations using the three-population parallel genetic algorithm, and then returning to step D. Through multiple iterations the method can obtain an optimal solution that meets the requirements. It fully addresses the problem that the target network is unknown to the attacker and the errors introduced by Mel-frequency cepstral coefficient (MFCC) conversion, and it offers fast convergence, strong global search capability and high convergence efficiency.

Description

Adversarial (antagonistic) audio sample generation method based on a three-population parallel genetic algorithm
Technical Field
The invention relates to the field of speech recognition, and in particular to a method for generating adversarial audio samples based on a three-population parallel genetic algorithm.
Background
With the success of deep learning models in speech recognition, automatic speech recognition systems in commercial products such as Amazon Alexa, Google's voice assistant, Apple Siri, Microsoft Cortana and iFlytek are widely used for human-computer interaction and have succeeded in fields such as mobile devices and smart homes. They also play a key role in scenarios with higher security requirements, such as autonomous driving and voiceprint identity authentication.
Recent studies have shown that neural networks are vulnerable to adversarial attacks. The same problem exists in speech recognition: an attacker adds a slight perturbation to the audio that causes the neural network to output a different result, while the human ear cannot perceive the perturbation. Training a model on generated adversarial samples can effectively improve the ability of the neural network to resist malicious attacks. However, on the one hand, adversarial samples are harder to design for speech than for computer vision; on the other hand, research on adversarial samples in the speech domain is still limited. Current adversarial attack algorithms in the speech domain are designed on the basis of the optimization-based C&W attack; such methods usually require considerable computing resources and time, which severely restricts the practicality of current speech adversarial attack algorithms.
Disclosure of Invention
The invention aims to provide a method for generating adversarial audio samples based on a three-population parallel genetic algorithm. Through multiple iterations the method can obtain an optimal solution that meets the requirements; it fully addresses the problem that the target network is unknown to the attacker and the errors caused by Mel-frequency cepstral coefficient (MFCC) conversion, and it offers fast convergence, strong global search capability and high convergence efficiency.
The invention adopts the following technical scheme:
A method for generating adversarial audio samples based on a three-population parallel genetic algorithm comprises the following steps:
A: initializing each original voice file in the voice data set into an original audio sample in binary-string form; then proceeding to step B;
B: selecting an original audio sample and, N times over, adding Gaussian noise to the least significant bits of a random subset of the original audio sample to obtain N corresponding generated audio samples, namely the input samples; obtaining in this way a main population and two auxiliary populations for each original audio sample, each population consisting of N generated audio samples; then proceeding to step C;
C: taking each input sample as an individual and calculating the fitness score of each individual, the fitness score of an individual being the Euclidean distance between the original audio sample corresponding to the individual and the generated audio sample; then proceeding to step D;
D: using the fitness scores, sorting all individuals in the main population and in each auxiliary population in order of fitness score from low to high; then proceeding to step E;
E: using a speech recognition model to classify the sorted individuals of the main population in sequence, wherein during classification:
if a successfully attacking individual appears in the main population, classification stops and that individual is output directly as the final adversarial audio sample;
if no successfully attacking individual appears in the main population, it is determined whether the set number of iterations has been reached:
if the number of iterations has been reached, the process exits and outputs the result that generation of the adversarial audio failed;
if the number of iterations has not been reached, the method proceeds to step F;
F: dividing the individuals in the main population and the two auxiliary populations into elite individuals and non-elite individuals according to a set retention probability, and, after setting different genetic operation parameters for the main population and the two auxiliary populations, performing genetic operations comprising crossover and mutation on each of them, wherein the elite individuals do not undergo the genetic operations; obtaining the main population and two auxiliary populations that have completed the genetic operations, together with the offspring individuals of each population, the offspring individuals consisting of the elite individuals, which have not undergone crossover or mutation, and the non-elite individuals, which have; then calculating the fitness score of each offspring individual and sorting all offspring individuals in each population from low to high fitness score; then, according to a set optimal-individual selection threshold, selecting several optimal individuals from the two auxiliary populations, adding them to the main population, and using them to replace the same number of offspring individuals with the highest fitness scores in the main population, thereby obtaining a main population that has completed the genetic operations and the individual update; and then returning to step D.
In step A, the speech recognition model is the official TensorFlow speech_commands model; the speech recognition model is trained to recognize 10 groups of label-classified voice files, the label of each group being the corresponding English word, and each group comprising recordings of that English word spoken by different speakers.
In step B, the main population and the two auxiliary populations contain the same individuals and the same number of individuals, and the number of individuals in a single population is set to 20-40.
In step B, when Gaussian noise is added, each bit of the original audio sample in binary-string form is traversed, and the currently traversed bit is flipped, from 1 to 0 or from 0 to 1, with a set conversion probability.
The step F comprises the following specific steps:
F1: sorting the current-generation individuals in the main population and in each of the two auxiliary populations using the elitism method, taking the first P individuals as elite individuals according to a set retention probability, and taking the remaining N-P individuals as non-elite individuals;
F2: performing crossover on the main population and the two auxiliary populations of current-generation individuals using the uniform crossover method, wherein the elite individuals in the main population and the two auxiliary populations do not take part in crossover; then proceeding to step F3;
F3: performing mutation on the main population and the two auxiliary populations that have completed crossover, wherein the elite individuals in the main population and the two auxiliary populations do not take part in mutation; then proceeding to step F4;
F4: calculating the fitness score of each offspring individual in the main population and the two auxiliary populations after the genetic operations, using the method in step D; then sorting all offspring individuals in the main population and in each auxiliary population from low to high fitness score; then, according to a set optimal-individual selection threshold, taking the first L individuals in the ranking of each auxiliary population as optimal individuals and adding the 2L optimal individuals to the main population that has completed the genetic operations; sorting all offspring individuals in the main population, now including the 2L optimal individuals, from low to high fitness score, and removing the 2L individuals ranked last, which have the highest fitness scores; finally, taking the main population from which the 2L offspring individuals have been removed as the main population that has completed the genetic operations and the individual update; and then returning to step D.
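The individual update described in step F4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `migrate_best`, the parameter `top_l` (the L of the text) and the `score` callable are hypothetical, and individuals are assumed to be lists of bits.

```python
def migrate_best(main, aux1, aux2, top_l, score):
    """Copy the top_l lowest-scoring individuals of each auxiliary population
    into the main population, then drop the highest-scoring individuals so
    that the main population returns to its original size.

    `score` is a hypothetical callable returning the fitness score of an
    individual (lower means closer to the original audio).
    """
    best = sorted(aux1, key=score)[:top_l] + sorted(aux2, key=score)[:top_l]
    merged = sorted(main + [ind[:] for ind in best], key=score)
    return merged[:len(main)]


# Toy example: the score is simply the number of 1 bits (an assumption
# made only to keep the example small).
main = [[1, 1, 0], [1, 1, 1]]
aux1 = [[0, 0, 0], [1, 1, 1]]
aux2 = [[1, 0, 0], [1, 1, 1]]
updated = migrate_best(main, aux1, aux2, top_l=1, score=sum)
```

Because the merged population is re-sorted before truncation, an immigrant is kept only if it scores lower than an individual already in the main population.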
In step F, crossover is performed using the uniform crossover method, and the genetic operations use a three-population parallel genetic algorithm: while the genetic operations are carried out, the main population and the two auxiliary populations are mutually independent, the different populations perform different genetic operations, and no information is exchanged or transferred among the three populations; the genetic operations are performed after different genetic operation parameters have been set for the main population and the two auxiliary populations.
In step F, the first auxiliary population is set to a small mutation probability and a large crossover probability, that is, its mutation probability is the smallest of the three populations and its crossover probability is the largest of the three populations; the second auxiliary population is set to a small crossover probability and a large mutation probability, that is, its crossover probability is the smallest of the three populations and its mutation probability is the largest of the three populations.
In step F1, when crossover is performed on a population, two current-generation individuals A and B are randomly selected from the non-elite individuals in the population as parent individuals; each element of A and B is then traversed, and the elements at corresponding positions of A and B are exchanged according to the set population crossover probability threshold, which completes the crossover of the population and yields the current-generation non-elite individuals produced by crossover.
In step F3, when mutation is performed on a population, each element of the current-generation non-elite individuals produced by crossover in each population is traversed, and some of these elements are mutated according to the set population mutation threshold, which completes the mutation of the population and yields the populations that have completed the genetic operations and the offspring individuals of each population; the offspring individuals in the main population and in the two auxiliary populations that have completed the genetic operations each consist of elite individuals, which have not undergone crossover or mutation, and non-elite individuals, which have.
The main-population crossover probability threshold $M_c$ lies in the range $40\% \le M_c \le 60\%$; the first auxiliary-population crossover probability threshold $A_{1c}$ lies in the range $60\% < A_{1c} \le 90\%$; the second auxiliary-population crossover probability threshold $A_{2c}$ lies in the range $10\% \le A_{2c} < 40\%$; where M in $M_c$ stands for main, A in $A_{1c}$ and $A_{2c}$ stands for auxiliary, the subscripts 1 and 2 denote the first and second auxiliary populations, and the subscript c stands for crossover;
the main-population mutation probability threshold $M_m$ lies in the range $0.0004 \le M_m \le 0.0006$; the first auxiliary-population mutation probability threshold $A_{1m}$ lies in the range $0.0001 \le A_{1m} < 0.0004$; the second auxiliary-population mutation probability threshold $A_{2m}$ lies in the range $0.0006 < A_{2m} \le 0.0009$; where M in $M_m$ stands for main, A in $A_{1m}$ and $A_{2m}$ stands for auxiliary, the subscripts 1 and 2 denote the first and second auxiliary populations, and the subscript m stands for mutation.
The method is based on a three-population parallel genetic algorithm and can obtain an optimal solution that meets the requirements through multiple iterations. It fully addresses the problem that the target network is unknown to the attacker and the errors caused by Mel-frequency cepstral coefficient (MFCC) conversion, converges quickly and efficiently, reduces the computation and time needed to generate adversarial samples, and effectively improves global search capability. By running three populations in parallel, the invention effectively overcomes a weakness of the traditional genetic algorithm, which has only one population, so that population diversity is hard to maintain and the algorithm easily falls into a local optimum and fails to reach the global optimum.
Drawings
FIG. 1 is a schematic flow chart of the present invention.
Detailed Description
The invention is described in detail below with reference to the following figures and examples:
As shown in FIG. 1, the method of the invention for generating adversarial audio samples based on a three-population parallel genetic algorithm comprises the following steps:
A: initializing each original voice file in the voice data set into an original audio sample in binary-string form; then proceeding to step B;
In the invention, the speech recognition model is the official TensorFlow speech_commands model. The speech recognition model is trained to recognize 10 groups of label-classified voice files; the label of each group is the corresponding English word, such as go or stop, and each group comprises recordings of that word spoken by different speakers. In this embodiment, each group contains no fewer than 1700 voice files.
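One way to picture the binary-string form of step A is the sketch below. It assumes the audio is 16-bit signed PCM, which the text does not state, and the helper names `pcm_to_bits` and `bits_to_pcm` are hypothetical.

```python
import struct


def pcm_to_bits(samples):
    """Encode 16-bit signed PCM samples as a flat list of 0/1 bits.

    Assumption: little-endian byte order, most significant bit first
    within each byte.
    """
    raw = struct.pack("<%dh" % len(samples), *samples)
    return [(byte >> shift) & 1 for byte in raw for shift in range(7, -1, -1)]


def bits_to_pcm(bits):
    """Decode a bit list produced by pcm_to_bits back into PCM samples."""
    if len(bits) % 16 != 0:
        raise ValueError("bit string length must be a multiple of 16")
    raw = bytearray()
    for start in range(0, len(bits), 8):
        byte = 0
        for bit in bits[start:start + 8]:
            byte = (byte << 1) | bit
        raw.append(byte)
    return list(struct.unpack("<%dh" % (len(raw) // 2), bytes(raw)))


original_samples = [0, 1, -1, 32767, -32768]
original_bits = pcm_to_bits(original_samples)
```

The round trip `bits_to_pcm(pcm_to_bits(x))` returns `x` unchanged, so a perturbed bit list can be decoded back into audio before it is passed to a recognizer.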
B: selecting an original audio sample and, N times over, adding Gaussian noise to the least significant bits of a random subset of the original audio sample to obtain N corresponding generated audio samples, namely the input samples; obtaining in this way a main population and two auxiliary populations for each original audio sample, each population consisting of N generated audio samples; then proceeding to step C;
In this embodiment, one original audio sample obtained in step A is selected, and Gaussian noise is added to the least significant bits of a random subset of the original audio sample to obtain a generated audio sample corresponding to it, namely an input sample; Gaussian noise is then added to the original audio sample in the same way, N times in total, giving N generated audio samples corresponding to the original audio sample; populations each consisting of N generated audio samples are used as the main population and the two auxiliary populations, the three populations containing the same individuals and the same number of individuals, with the number of individuals in a single population set to 20-40; in this way a main population and two auxiliary populations are finally obtained for each original audio sample.
When Gaussian noise is added, each bit of the original audio sample in binary-string form is traversed, and the currently traversed bit is flipped, from 1 to 0 or from 0 to 1, with a set conversion probability; this yields the generated audio sample corresponding to the original audio sample. In this embodiment, the set conversion probability is 0.0001.
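The population initialization of step B might look like the following sketch. It follows the bit-traversal description literally (every bit is a flip candidate) and reads "the same individuals" as meaning the auxiliary populations start as copies of the main population; both readings are assumptions, as are the function names and the default of 30 individuals (chosen from the stated 20-40 range).

```python
import random


def flip_bits(bits, flip_prob, rng):
    """Return a copy of bits in which each bit is flipped with flip_prob."""
    return [bit ^ 1 if rng.random() < flip_prob else bit for bit in bits]


def init_three_populations(original_bits, n=30, flip_prob=0.0001, seed=0):
    """Build a main population of n perturbed copies of the original and
    two auxiliary populations that start as copies of the main one."""
    rng = random.Random(seed)
    main = [flip_bits(original_bits, flip_prob, rng) for _ in range(n)]
    aux1 = [individual[:] for individual in main]
    aux2 = [individual[:] for individual in main]
    return main, aux1, aux2


# Toy call with an exaggerated flip probability so that flips are visible.
main_pop, aux1_pop, aux2_pop = init_three_populations(
    [0] * 1000, n=20, flip_prob=0.5, seed=1
)
```

The auxiliary populations hold independent list copies, so later crossover and mutation in one population cannot alter another by accident.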
C: taking each input sample as an individual and calculating the fitness score of each individual; then proceeding to step D;
In the invention, the Euclidean distance between the original audio sample and the generated audio sample is used as the fitness score of the individual. The larger the Euclidean distance between the original audio sample and the generated audio sample, the higher the fitness score of the individual and the lower the acoustic similarity between the two.
D: using the fitness scores, sorting all individuals in the main population and in each auxiliary population in order of fitness score from low to high; then proceeding to step E;
The main population and each auxiliary population are each taken to contain N individuals; the lower the fitness score of an individual, the higher its acoustic similarity to the original and the higher it is ranked.
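Steps C and D can be sketched together as below. The sketch computes the Euclidean distance directly on equal-length numeric sequences; whether the distance is taken over bits or over decoded samples is not stated in the text, so that choice, like the function names, is an assumption.

```python
import math


def fitness(original, individual):
    """Euclidean distance between the original and a generated sample."""
    if len(original) != len(individual):
        raise ValueError("samples must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(original, individual)))


def rank_population(original, population):
    """Sort individuals from lowest to highest fitness score, so the
    individuals closest to the original come first."""
    return sorted(population, key=lambda ind: fitness(original, ind))


original = [0, 0, 0, 0]
population = [[1, 1, 1, 0], [0, 0, 0, 1], [1, 0, 1, 0]]
ranked = rank_population(original, population)
```

For bit strings the Euclidean distance equals the square root of the number of differing bits, so ranking by it is equivalent to ranking by Hamming distance.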
E: using a speech recognition model to classify the sorted individuals of the main population in sequence, wherein during classification:
if a successfully attacking individual appears in the main population, classification stops and that individual is output directly as the final adversarial audio sample;
if no successfully attacking individual appears in the main population, it is determined whether the set number of iterations has been reached:
if the number of iterations has been reached, the process exits and outputs the result that generation of the adversarial audio failed;
if the number of iterations has not been reached, the method proceeds to step F;
In the invention there are two attack modes, the untargeted attack and the targeted attack:
For an untargeted attack, the attack succeeds if the speech recognition model recognizes the generated audio sample corresponding to an individual as any label other than the original one.
For a targeted attack, the attack succeeds if the speech recognition model recognizes the generated audio sample corresponding to an individual as another, specified label.
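The success check of step E for both attack modes can be sketched as follows. The `classify` callable stands in for the speech recognition model and is purely hypothetical; the toy classifier in the example has nothing to do with the actual model.

```python
def attack_succeeded(predicted, original_label, target_label=None):
    """Untargeted: any label other than the original counts as success.
    Targeted: only the specified target label counts as success."""
    if target_label is None:
        return predicted != original_label
    return predicted == target_label


def find_adversarial(ranked_main, classify, original_label, target_label=None):
    """Classify the ranked individuals in order and return the first one
    that attacks successfully, or None if none does."""
    for individual in ranked_main:
        if attack_succeeded(classify(individual), original_label, target_label):
            return individual
    return None


def toy_classify(bits):
    # Stand-in model: predicts "stop" once at least two bits are set.
    return "stop" if sum(bits) >= 2 else "go"


ranked_main = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
untargeted = find_adversarial(ranked_main, toy_classify, "go")
targeted_miss = find_adversarial(ranked_main, toy_classify, "go", target_label="yes")
```

Because the individuals are visited in ascending fitness order, the first success is also the successful individual closest to the original audio within the current main population.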
F: dividing the individuals in the main population and the two auxiliary populations into elite individuals and non-elite individuals according to a set retention probability, and, after setting different genetic operation parameters for the main population and the two auxiliary populations, performing genetic operations comprising crossover and mutation on each of them, wherein the elite individuals do not undergo the genetic operations; obtaining the main population and two auxiliary populations that have completed the genetic operations, together with the offspring individuals of each population, the offspring individuals consisting of the elite individuals, which have not undergone crossover or mutation, and the non-elite individuals, which have; then calculating the fitness score of each offspring individual and sorting all offspring individuals in each population from low to high fitness score; then, according to a set optimal-individual selection threshold, selecting several optimal individuals from the two auxiliary populations, adding them to the main population, and using them to replace the same number of offspring individuals with the highest fitness scores in the main population, thereby obtaining a main population that has completed the genetic operations and the individual update; and then returning to step D.
In the invention, crossover takes two current-generation individuals A and B in binary-string form as parent individuals and exchanges elements at corresponding positions of A and B to generate offspring individuals. Depending on how the exchange is done, crossover is divided into single-point crossover, multi-point crossover, uniform crossover and so on. The invention uses uniform crossover, in which each pair of corresponding bits of the two parent individuals is exchanged with a certain probability. Mutation changes some of the elements of a current-generation individual in binary-string form, after crossover has been performed, from 1 to 0 or from 0 to 1.
Because the traditional genetic algorithm has only one population and all genetic operators act only on the individuals of that population, population diversity is hard to maintain and the algorithm easily becomes trapped in a local optimum and fails to reach the global optimum. To solve this problem, the invention adopts a three-population parallel genetic algorithm: while the genetic operations are carried out, the main population and the two auxiliary populations are mutually independent, the different populations perform different genetic operations, and no information is exchanged or transferred among the three populations. The genetic operations are performed after different genetic operation parameters have been set for the main population and the two auxiliary populations.
In the invention, the first auxiliary population emphasizes global search capability and is set to a small mutation probability and a large crossover probability, that is, its mutation probability is the smallest of the three populations and its crossover probability is the largest of the three populations; the second auxiliary population emphasizes local search capability and is set to a small crossover probability and a large mutation probability, that is, its crossover probability is the smallest of the three populations and its mutation probability is the largest of the three populations.
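The parameter ordering described above can be captured in a small configuration sketch. The crossover values 50%, 70% and 30% are the ones given for the embodiment; the mutation values are illustrative picks inside the ranges stated for $M_m$, $A_{1m}$ and $A_{2m}$ and are assumptions, as are the class and function names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneticParams:
    crossover_prob: float  # per-bit exchange probability
    mutation_prob: float   # per-bit flip probability


MAIN = GeneticParams(crossover_prob=0.50, mutation_prob=0.0005)
AUX1 = GeneticParams(crossover_prob=0.70, mutation_prob=0.0002)
AUX2 = GeneticParams(crossover_prob=0.30, mutation_prob=0.0008)


def follows_source_ordering(main, aux1, aux2):
    """Check that aux1 has the largest crossover and smallest mutation
    probability, and aux2 the smallest crossover and largest mutation."""
    crossovers = [p.crossover_prob for p in (main, aux1, aux2)]
    mutations = [p.mutation_prob for p in (main, aux1, aux2)]
    return (
        aux1.crossover_prob == max(crossovers)
        and aux1.mutation_prob == min(mutations)
        and aux2.crossover_prob == min(crossovers)
        and aux2.mutation_prob == max(mutations)
    )


ordering_ok = follows_source_ordering(MAIN, AUX1, AUX2)
```

Keeping the three parameter sets in immutable objects makes it easy to validate the ordering once, before the evolutionary loop starts.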
In the invention, the step F comprises the following specific steps:
F1: sorting the current-generation individuals in the main population and in each of the two auxiliary populations using the elitism method, and, according to a set retention probability, taking the first P individuals as elite individuals and the remaining N-P individuals as non-elite individuals, where P ranges from 2 to 4 and the set retention probability is 90%.
Elitism is an optimization of the basic genetic algorithm. Because the crossover and mutation operators act randomly, evolution may proceed in a good or a bad direction, so the population may lose its best individuals during evolution, which degrades its fitness. To prevent the best solution found during evolution from being destroyed by crossover and mutation, the best solution of each generation is copied unchanged into the next generation. The invention therefore uses the elitism method: with high probability it retains the several best elite individuals of the current generation of each population, applies no operation to them, and adds them directly to the offspring population.
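The elite split of step F1 can be sketched as below. The text does not spell out how the 90% retention probability is applied, so this sketch leaves it out and simply takes the first P ranked individuals as elite; the function name and the default P = 3 (from the stated 2-4 range) are assumptions.

```python
def split_elite(ranked_population, elite_count=3):
    """Split an already ranked population into the first elite_count
    individuals (copied, so later operators cannot modify them) and the
    remaining non-elite individuals."""
    if not 0 <= elite_count <= len(ranked_population):
        raise ValueError("elite_count must be between 0 and the population size")
    elite = [individual[:] for individual in ranked_population[:elite_count]]
    non_elite = ranked_population[elite_count:]
    return elite, non_elite


ranked_population = [[0, 0], [0, 1], [1, 0], [1, 1], [1, 1]]
elite, non_elite = split_elite(ranked_population, elite_count=2)
```

Copying the elite individuals guarantees that they enter the offspring population unchanged even if the non-elite lists are modified in place afterwards.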
F2: performing cross operation on a main population and two auxiliary populations consisting of the current generation individuals by adopting an average cross method, wherein the elite individuals in the main population and the two auxiliary populations do not participate in the cross operation; then proceed to step F3;
when the population is subjected to cross operation, two contemporary individuals A and B are randomly selected from non-elite individuals in the population as parent individuals, then each element in the contemporary individuals A and B is traversed, elements at the corresponding positions of the contemporary individuals A and B are subjected to cross exchange according to a set population cross probability threshold value, population cross operation is completed, and contemporary non-elite individuals generated after population cross are obtained;
wherein the main population crossover probability threshold M_C is in the range 40% ≤ M_C ≤ 60%; the first auxiliary population crossover probability threshold A1_C is in the range 60% < A1_C ≤ 90%; and the second auxiliary population crossover probability threshold A2_C is in the range 10% ≤ A2_C < 40%;
In this embodiment, the main population crossover probability threshold M_C, the first auxiliary population crossover probability threshold A1_C and the second auxiliary population crossover probability threshold A2_C are set to 50%, 70% and 30%, respectively. Here M in M_C stands for main, A in A1_C and A2_C stands for auxiliary, the 1 and 2 denote the first and second auxiliary populations, and the subscript C stands for crossover;
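A minimal sketch of the crossover in step F2, assuming the crossover probability threshold is read as the per-element probability that parents A and B swap the element at a given position. The names `CROSSOVER_PROB` and `uniform_crossover` are hypothetical; the three probabilities are the values of this embodiment.

```python
import numpy as np

# Per-population crossover thresholds from the embodiment (50%, 70%, 30%).
CROSSOVER_PROB = {"main": 0.5, "aux1": 0.7, "aux2": 0.3}

def uniform_crossover(parent_a, parent_b, swap_prob, rng):
    """Swap each position between the two parents with probability swap_prob."""
    swap = rng.random(parent_a.shape) < swap_prob
    child_a = np.where(swap, parent_b, parent_a)
    child_b = np.where(swap, parent_a, parent_b)
    return child_a, child_b

rng = np.random.default_rng(1)
a = np.zeros(12, dtype=np.uint8)  # parent A: all zeros
b = np.ones(12, dtype=np.uint8)   # parent B: all ones
child_a, child_b = uniform_crossover(a, b, CROSSOVER_PROB["aux1"], rng)
```

Because elements are only exchanged, never created, the two children together always hold exactly the bits of the two parents at every position.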
F3: performing the mutation operation on the main population and the two auxiliary populations that have undergone the crossover operation, wherein the elite individuals in the main population and the two auxiliary populations do not participate in the mutation operation; then proceeding to step F4;
when the mutation operation is performed on a population, each element of every current-generation non-elite individual produced by crossover is traversed, and some of these elements are mutated according to the set mutation threshold of the population, which completes the mutation operation and yields the populations that have completed the genetic operations together with the offspring individuals of each population; the offspring individuals in the main population and in the two auxiliary populations that have completed the genetic operations each consist of elite individuals, which have not undergone crossover or mutation, and non-elite individuals, which have undergone crossover and mutation;
wherein the main population mutation probability threshold M_m is in the range 0.0004 ≤ M_m ≤ 0.0006; during mutation, each element of a current-generation non-elite individual produced by crossover mutates with probability M_m. The first auxiliary population mutation probability threshold A1_m is in the range 0.0001 ≤ A1_m < 0.0004; the second auxiliary population mutation probability threshold A2_m is in the range 0.0006 < A2_m ≤ 0.0009. Here M in M_m stands for main, A in A1_m and A2_m stands for auxiliary, the 1 and 2 denote the first and second auxiliary populations, and the subscript m stands for mutation;
in this embodiment, the main population mutation probability threshold M_m, the first auxiliary population mutation probability threshold A1_m and the second auxiliary population mutation probability threshold A2_m are set to 0.0005, 0.0001 and 0.0009, respectively.
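The mutation of step F3 can be sketched as an independent bit flip per element. This is an illustration under assumptions: individuals are taken to be arrays of 0/1 values, "mutating" an element is taken to mean flipping it, and the names `MUTATION_PROB` and `mutate` are hypothetical. The rates are the values of this embodiment.

```python
import numpy as np

# Per-population mutation thresholds from the embodiment.
MUTATION_PROB = {"main": 0.0005, "aux1": 0.0001, "aux2": 0.0009}

def mutate(individual, rate, rng):
    """Flip each bit of a 0/1 array independently with probability `rate`."""
    flip = rng.random(individual.shape) < rate
    return individual ^ flip.astype(individual.dtype)

rng = np.random.default_rng(3)
individual = rng.integers(0, 2, size=100_000, dtype=np.uint8)  # toy length
mutated = mutate(individual, MUTATION_PROB["main"], rng)
flipped = int((mutated != individual).sum())  # number of bits that changed
```

On average about `rate * len(individual)` bits change, so for the hypothetical 100,000-bit individual above the main population flips roughly 50 bits per individual, the first auxiliary population about 10, and the second about 90.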
F4: calculating the fitness score of each offspring individual in the main population and the two auxiliary populations that have completed the genetic operations, using the method of step D; then sorting all offspring individuals in the main population and in each of the two auxiliary populations in order of fitness score from low to high; then, according to a set optimal individual selection threshold, taking the first L individuals in the ordering of each auxiliary population as optimal individuals, and adding the resulting 2L optimal individuals to the main population that has completed the genetic operations; sorting all offspring individuals in the main population, now including the 2L optimal individuals, in order of fitness score from low to high, and removing the 2L individuals with the highest fitness scores, which are last in the ordering; finally, taking the main population from which the 2L offspring individuals have been removed as the main population that has completed the genetic operations and the individual update; and then returning to step D, wherein the value of L is 1 or 2.
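The individual update of step F4 can be sketched as a merge followed by a truncation. The function name `migrate`, the argument names and the toy data are hypothetical; fitness scores are passed in directly so that the sketch does not depend on how they are computed.

```python
import numpy as np

def migrate(main, main_fit, aux_pops, aux_fits, top_l=1):
    """Add the top_l best individuals of each auxiliary population to the
    main population, then drop the worst ones so its size is unchanged."""
    pools, scores = [main], [main_fit]
    for pop, fit in zip(aux_pops, aux_fits):
        best = np.argsort(fit, kind="stable")[:top_l]  # lowest scores first
        pools.append(pop[best])
        scores.append(fit[best])
    merged = np.concatenate(pools)
    merged_fit = np.concatenate(scores)
    # Keeping the len(main) lowest scores removes the 2L highest ones.
    keep = np.argsort(merged_fit, kind="stable")[: len(main)]
    return merged[keep], merged_fit[keep]

# Toy one-element "individuals" so the origin of each row is easy to see.
main = np.array([[0], [1], [2], [3]])
aux1 = np.array([[10], [11], [12], [13]])
aux2 = np.array([[20], [21], [22], [23]])
main_fit = np.array([1.0, 5.0, 6.0, 7.0])
aux1_fit = np.array([9.0, 0.5, 8.0, 7.5])
aux2_fit = np.array([2.0, 8.0, 9.0, 10.0])

new_main, new_fit = migrate(main, main_fit, [aux1, aux2], [aux1_fit, aux2_fit], top_l=1)
```

With L = 1, the best individual of each auxiliary population (rows 11 and 20) enters the main population and the two worst main individuals (rows 2 and 3) are dropped.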
After returning to step D, all individuals in the main population and in the auxiliary populations that have completed the genetic operations and the individual update are sorted, for each population separately, in order of fitness score from low to high; the method then proceeds to step E;
in step E, all sorted individuals in the main population are classified and recognized in turn using the voice recognition model, wherein during classification and recognition:
if an individual that attacks successfully appears in the main population, classification and recognition stops, and that individual is output directly as the final antagonistic audio sample;
if no individual that attacks successfully appears in the main population, it is judged whether the set number of iterations has been reached:
if the number of iterations has been reached, the judgment process exits and a result indicating that generation of the antagonistic audio failed is output;
if the number of iterations has not been reached, the method proceeds to step F;
steps D to F are repeated in this way until either an individual that attacks successfully is found and output as the final antagonistic audio sample, or the number of iterations is reached without finding such an individual, in which case the judgment process exits and a result indicating that generation of the antagonistic audio failed is output.
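The loop over steps D to F can be sketched end to end as below. This is a rough sketch under several assumptions, not the patented implementation: `is_adversarial` is a hypothetical callable standing in for the voice recognition model, `noise_prob` is a made-up initialization rate, fitness is the Euclidean distance computed directly on the bit arrays, the retention probability is omitted, and parent pairs are drawn repeatedly until enough children exist.

```python
import numpy as np

# (crossover probability, mutation probability) per population, from the embodiment.
PARAMS = {"main": (0.5, 0.0005), "aux1": (0.7, 0.0001), "aux2": (0.3, 0.0009)}

def fitness(pop, original):
    """Euclidean distance of every individual to the original sample."""
    return np.linalg.norm(pop.astype(float) - original.astype(float), axis=1)

def evolve(pop, original, cx, mut, rng, p=2):
    """Steps F1 to F3: keep p elites, cross and mutate the rest."""
    pop = pop[np.argsort(fitness(pop, original), kind="stable")]
    elites, rest = pop[:p], pop[p:]
    children = []
    while len(children) < len(rest):
        i, j = rng.choice(len(rest), size=2, replace=False)
        swap = rng.random(rest.shape[1]) < cx
        children.append(np.where(swap, rest[j], rest[i]))
        children.append(np.where(swap, rest[i], rest[j]))
    children = np.array(children[: len(rest)])
    flip = rng.random(children.shape) < mut
    return np.concatenate([elites, children ^ flip.astype(children.dtype)])

def attack(original, is_adversarial, n=20, max_iters=100, top_l=1,
           noise_prob=0.001, seed=0):
    """Return a successful individual, or None if the iteration budget runs out."""
    rng = np.random.default_rng(seed)
    pops = {}
    for name in PARAMS:  # each population: n noisy copies of the original
        flips = rng.random((n, original.size)) < noise_prob
        pops[name] = original ^ flips.astype(original.dtype)
    for iteration in range(max_iters + 1):
        # Steps D and E: test main-population individuals, best score first.
        main = pops["main"]
        for individual in main[np.argsort(fitness(main, original), kind="stable")]:
            if is_adversarial(individual):
                return individual
        if iteration == max_iters:
            return None
        # Step F: independent genetic operations, then update of the main population.
        for name, (cx, mut) in PARAMS.items():
            pops[name] = evolve(pops[name], original, cx, mut, rng)
        pool = [pops["main"]]
        for name in ("aux1", "aux2"):
            aux = pops[name]
            pool.append(aux[np.argsort(fitness(aux, original), kind="stable")[:top_l]])
        merged = np.concatenate(pool)
        pops["main"] = merged[np.argsort(fitness(merged, original), kind="stable")[:n]]

# Toy run: the stand-in "model" is fooled once any of the first four bits is set.
original = np.zeros(64, dtype=np.uint8)
found = attack(original, lambda ind: bool(ind[:4].any()), n=6, max_iters=5, noise_prob=0.05)
```

In a real setting `is_adversarial` would decode the bit array back to audio and query the classifier; here it is deliberately a toy predicate so that the control flow stays visible.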

Claims (10)

1. A method for generating antagonistic audio samples based on a three-group parallel genetic algorithm, characterized by comprising the following steps:
A: initializing each original voice file in the voice data set into an original audio sample in the form of a binary string; then proceeding to step B;
B: selecting an original audio sample, and adding Gaussian noise to the least significant bits of a random subset of the original audio sample, repeated N times, to obtain N corresponding generated audio samples, namely input samples; obtaining a main population and two auxiliary populations for each original audio sample in this way, wherein each population consists of N generated audio samples; then proceeding to step C;
C: taking each input sample as an individual, and calculating the fitness score of each individual; then proceeding to step D; wherein the fitness score of an individual is the Euclidean distance between the original audio sample corresponding to the individual and the generated audio sample;
D: sorting all individuals in the main population and in the auxiliary populations, for each population separately, in order of fitness score from low to high; then proceeding to step E;
E: classifying and recognizing all sorted individuals in the main population in turn using a voice recognition model, wherein during classification and recognition:
if an individual that attacks successfully appears in the main population, classification and recognition stops, and that individual is output directly as the final antagonistic audio sample;
if no individual that attacks successfully appears in the main population, it is judged whether the set number of iterations has been reached:
if the number of iterations has been reached, the judgment process exits and a result indicating that generation of the antagonistic audio failed is output;
if the number of iterations has not been reached, the method proceeds to step F;
F: dividing the individuals in the main population and the two auxiliary populations into elite individuals and non-elite individuals according to a set retention probability, and, after setting different genetic operation parameters, performing genetic operations comprising a crossover operation and a mutation operation on the main population and the two auxiliary populations respectively, wherein the elite individuals do not undergo the genetic operations; obtaining a main population and two auxiliary populations that have completed the genetic operations, and the offspring individuals of each population, wherein the offspring individuals consist of elite individuals, which have not undergone crossover or mutation, and non-elite individuals, which have undergone crossover and mutation; then sorting all offspring individuals in each population that has completed the genetic operations in order of the calculated fitness scores from low to high; then, according to a set optimal individual selection threshold, selecting several optimal individuals from the two auxiliary populations that have completed the genetic operations, adding them to the main population that has completed the genetic operations, and replacing a corresponding number of offspring individuals with the highest fitness scores in that main population, to obtain a main population that has completed the genetic operations and the individual update; and then returning to step D.
2. The method for generating antagonistic audio samples based on the three-group parallel genetic algorithm according to claim 1, characterized in that: in step A, the voice recognition model is the official TensorFlow speech_commands model; the voice recognition model is trained to recognize 10 groups of label-classified voice files, the label of each group of voice files being the corresponding English word, and each group comprising voice files in which that English word is spoken by different speakers.
3. The method for generating antagonistic audio samples based on the three-group parallel genetic algorithm according to claim 1, characterized in that: in step B, the main population and the two auxiliary populations contain the same individuals and the same number of individuals, and the number of individuals in a single population is set to 20 to 40.
4. The method for generating antagonistic audio samples based on the three-group parallel genetic algorithm according to claim 1, characterized in that: in step B, when the Gaussian noise is added, each bit of the original audio sample in binary-string form is traversed, and the currently traversed bit is flipped, from 1 to 0 or from 0 to 1, with a set conversion probability.
5. The method for generating antagonistic audio samples based on the three-group parallel genetic algorithm according to claim 1, characterized in that step F comprises the following specific steps:
F1: sorting the current-generation individuals in the main population and in each of the two auxiliary populations using the elitism method, and, according to a set retention probability, taking the first P individuals as elite individuals and the remaining N-P individuals as non-elite individuals;
F2: performing the crossover operation on the main population and the two auxiliary populations composed of current-generation individuals using the average (uniform) crossover method, wherein the elite individuals in the main population and the two auxiliary populations do not participate in the crossover operation; then proceeding to step F3;
F3: performing the mutation operation on the main population and the two auxiliary populations that have undergone the crossover operation, wherein the elite individuals in the main population and the two auxiliary populations do not participate in the mutation operation; then proceeding to step F4;
F4: calculating the fitness score of each offspring individual in the main population and the two auxiliary populations that have completed the genetic operations, using the method of step D; then sorting all offspring individuals in the main population and in each of the two auxiliary populations in order of fitness score from low to high; then, according to a set optimal individual selection threshold, taking the first L individuals in the ordering of each auxiliary population as optimal individuals, and adding the resulting 2L optimal individuals to the main population that has completed the genetic operations; sorting all offspring individuals in the main population, now including the 2L optimal individuals, in order of fitness score from low to high, and removing the 2L individuals with the highest fitness scores, which are last in the ordering; finally, taking the main population from which the 2L offspring individuals have been removed as the main population that has completed the genetic operations and the individual update; and then returning to step D.
6. The method for generating antagonistic audio samples based on the three-group parallel genetic algorithm according to claim 1, characterized in that: in step F, the crossover operation is performed using the average (uniform) crossover method; the genetic operations use the three-group parallel genetic algorithm; the main population and the two auxiliary populations are mutually independent, different populations execute different genetic operations, and no information is exchanged or transmitted among the three populations; and the genetic operations are performed after different genetic operation parameters have been set for the main population and the two auxiliary populations.
7. The method for generating antagonistic audio samples based on the three-group parallel genetic algorithm according to claim 6, characterized in that: in step F, the first auxiliary population is set to have a small mutation probability and a large crossover probability, that is, the mutation probability of the first auxiliary population is the smallest of the three populations and its crossover probability is the largest of the three populations; the second auxiliary population is set to have a small crossover probability and a large mutation probability, that is, the crossover probability of the second auxiliary population is the smallest of the three populations and its mutation probability is the largest of the three populations.
8. The method for generating antagonistic audio samples based on the three-group parallel genetic algorithm according to claim 5, characterized in that: in step F1, when the crossover operation is performed on a population, two current-generation individuals A and B are randomly selected from the non-elite individuals of the population as parent individuals; each element of A and B is then traversed, and the elements at corresponding positions of A and B are exchanged according to the set crossover probability threshold of the population, which completes the crossover operation and yields the current-generation non-elite individuals produced by crossover.
9. The method for generating antagonistic audio samples based on the three-group parallel genetic algorithm according to claim 5, characterized in that: in step F3, when the mutation operation is performed on a population, each element of every current-generation non-elite individual produced by crossover is traversed, and some of these elements are mutated according to the set mutation threshold of the population, which completes the mutation operation and yields the populations that have completed the genetic operations together with the offspring individuals of each population; the offspring individuals in the main population and in the two auxiliary populations that have completed the genetic operations each consist of elite individuals, which have not undergone crossover or mutation, and non-elite individuals, which have undergone crossover and mutation.
10. The method for generating antagonistic audio samples based on the three-group parallel genetic algorithm according to claim 7, characterized in that: the main population crossover probability threshold M_C is in the range 40% ≤ M_C ≤ 60%; the first auxiliary population crossover probability threshold A1_C is in the range 60% < A1_C ≤ 90%; the second auxiliary population crossover probability threshold A2_C is in the range 10% ≤ A2_C < 40%; wherein M in M_C stands for main, A in A1_C and A2_C stands for auxiliary, the 1 and 2 denote the first and second auxiliary populations, and the subscript C stands for crossover;
the main population mutation probability threshold M_m is in the range 0.0004 ≤ M_m ≤ 0.0006; the first auxiliary population mutation probability threshold A1_m is in the range 0.0001 ≤ A1_m < 0.0004; the second auxiliary population mutation probability threshold A2_m is in the range 0.0006 < A2_m ≤ 0.0009; wherein M in M_m stands for main, A in A1_m and A2_m stands for auxiliary, the 1 and 2 denote the first and second auxiliary populations, and the subscript m stands for mutation.
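For illustration only, steps A to C of claim 1 together with the bit-flip noise of claim 4 can be sketched as follows. The names `init_population` and `fitness`, the 16-bit sample encoding and the value of `flip_prob` are assumptions made for this sketch; the claims do not fix a conversion probability, and the Euclidean distance is computed here directly on the bit arrays.

```python
import numpy as np

def init_population(original_bits, n=20, flip_prob=0.001, rng=None):
    """Build n noisy copies of a 0/1 array by flipping each bit
    independently with probability flip_prob (hypothetical value)."""
    if rng is None:
        rng = np.random.default_rng()
    flips = rng.random((n, original_bits.size)) < flip_prob
    return original_bits ^ flips.astype(original_bits.dtype)

def fitness(population, original_bits):
    """Euclidean distance between each individual and the original sample."""
    diff = population.astype(float) - original_bits.astype(float)
    return np.linalg.norm(diff, axis=1)

# Toy "audio": four 16-bit samples unpacked into a 64-element bit array.
samples = np.array([0, 1, -1, 1234], dtype=np.int16)
bits = np.unpackbits(samples.view(np.uint8))

rng = np.random.default_rng(0)
main = init_population(bits, n=20, flip_prob=0.01, rng=rng)
aux1 = init_population(bits, n=20, flip_prob=0.01, rng=rng)
aux2 = init_population(bits, n=20, flip_prob=0.01, rng=rng)
scores = fitness(main, bits)
```

On bit arrays the Euclidean distance equals the square root of the number of flipped bits, so a lower score means an individual that stays closer to the original audio.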
CN202210026272.3A 2022-01-11 2022-01-11 Antagonistic audio sample generation method based on three-group parallel genetic algorithm Pending CN114373467A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210026272.3A CN114373467A (en) 2022-01-11 2022-01-11 Antagonistic audio sample generation method based on three-group parallel genetic algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210026272.3A CN114373467A (en) 2022-01-11 2022-01-11 Antagonistic audio sample generation method based on three-group parallel genetic algorithm

Publications (1)

Publication Number Publication Date
CN114373467A true CN114373467A (en) 2022-04-19

Family

ID=81144077

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210026272.3A Pending CN114373467A (en) 2022-01-11 2022-01-11 Antagonistic audio sample generation method based on three-group parallel genetic algorithm

Country Status (1)

Country Link
CN (1) CN114373467A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116578611A (en) * 2023-05-16 2023-08-11 广州盛成妈妈网络科技股份有限公司 Knowledge management method and system for inoculated knowledge

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116578611A (en) * 2023-05-16 2023-08-11 广州盛成妈妈网络科技股份有限公司 Knowledge management method and system for inoculated knowledge
CN116578611B (en) * 2023-05-16 2023-11-03 广州盛成妈妈网络科技股份有限公司 Knowledge management method and system for inoculated knowledge

Similar Documents

Publication Publication Date Title
CN108520268B (en) Black box antagonistic attack defense method based on sample selection and model evolution
Sainath et al. Auto-encoder bottleneck features using deep belief networks
Ding et al. Autospeech: Neural architecture search for speaker recognition
CN111653275B (en) Method and device for constructing voice recognition model based on LSTM-CTC tail convolution and voice recognition method
CN112216273A (en) Sample attack resisting method for voice keyword classification network
CN112560596B (en) Radar interference category identification method and system
Sun et al. Early exiting with ensemble internal classifiers
CN114373467A (en) Antagonistic audio sample generation method based on three-group parallel genetic algorithm
CN111785274B (en) Black box countermeasure sample generation method for voice recognition system
Dalila et al. Multimodal score-level fusion using hybrid ga-pso for multibiometric system
Zhao et al. Genetic optimization of radial basis probabilistic neural networks
CN113111180B (en) Chinese medical synonym clustering method based on deep pre-training neural network
CN112487933B (en) Radar waveform identification method and system based on automatic deep learning
Cui et al. An adaptive authentication based on reinforcement learning
CN111767949A (en) Multi-task learning method and system based on feature and sample confrontation symbiosis
CN107170442A (en) Multi-parameters optimization method based on self-adapted genetic algorithm
CN114584337A (en) Voice attack counterfeiting method based on genetic algorithm
Shekhar et al. Exploring adversaries to defend audio captcha
CN115640845A (en) Method for generating few-category samples of neural network of graph based on generation of confrontation network
CN115965086A (en) Non-perfect information game strategy enhancement method based on small sample opponent modeling
CN113449865B (en) Optimization method for enhancing training artificial intelligence model
CN113948067B (en) Voice countercheck sample repairing method with hearing high fidelity characteristic
CN115270891A (en) Method, device, equipment and storage medium for generating signal countermeasure sample
CN112132059B (en) Pedestrian re-identification method and system based on depth conditional random field
Chandra et al. A memetic framework for cooperative coevolution of recurrent neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination