CN108268638B - Distributed implementation method for generating countermeasure network based on Spark framework - Google Patents

Info

Publication number
CN108268638B
Authority
CN
China
Prior art keywords
data
training
parameters
network model
generating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810047494.7A
Other languages
Chinese (zh)
Other versions
CN108268638A (en)
Inventor
王万良
张兆娟
高楠
吴菲
李卓蓉
赵燕伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN201810047494.7A priority Critical patent/CN108268638B/en
Publication of CN108268638A publication Critical patent/CN108268638A/en
Application granted granted Critical
Publication of CN108268638B publication Critical patent/CN108268638B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Debugging And Monitoring (AREA)

Abstract

A distributed implementation method for a generative adversarial network (GAN) based on the Spark framework comprises the following steps: the master node randomly initializes the network configuration and generates the parameter set; the data files are uploaded directly to the distributed file system; Spark Resilient Distributed Datasets (RDDs) are constructed; for each training-data subset RDD, the master node sends the network parameters, configuration and update state to all slave nodes; each slave node trains on part of the data and updates the parameters; the GAN model is trained in parallel in a data-parallel manner until Nash equilibrium is reached, at which point training ends. The method provides distributed training of a GAN model in the presence of massive data. At the same time it makes full use of Spark's in-memory computing framework, which is well suited to iterative computation, so that training of the GAN model can be accelerated, efficiency improved, and good scalability achieved.

Description

Distributed implementation method for generating countermeasure network based on Spark framework
Technical Field
The invention relates to artificial intelligence analysis methods for big data environments, and in particular to a distributed implementation method for generative adversarial networks (GANs) based on the big data framework Spark.
Background
Artificial intelligence is an important tool for analyzing big data and has made breakthrough progress in recent years; growing data resources and computing power further accelerate the development of artificial intelligence techniques and applications in the big data era. Deep learning is currently one of the mainstream machine learning approaches: it shows excellent performance in many settings and is widely applied in pattern recognition fields such as image and speech processing. Deep models and their learning algorithms are inspired by brain structure and information-processing mechanisms; each deep model has a deep architecture containing many non-linear hidden layers and a hierarchical feature-abstraction mechanism.
At present, deep models based on generative adversarial networks are a popular research direction in artificial intelligence and an important step toward higher-level artificial intelligence. A GAN follows the idea of game theory: it consists of a generator and a discriminator that are trained adversarially until Nash equilibrium is reached. As the scale and dimensionality of data keep growing, the inputs and outputs of the model can grow exponentially, which directly increases the complexity and computational cost of the deep learning model. Existing deep learning frameworks usually set up a dedicated cluster for deep learning, and the learning process creates multiple programs, so that large data sets must be transferred between separate clusters, increasing system complexity and end-to-end learning latency.
Training a generative adversarial network is a compute-intensive application: parameters such as the thresholds and weights of all hidden layers require a large amount of iterative computation, and as the training data grow the training of the whole model becomes very time-consuming. A GAN model for big data is difficult to run on a single computer and needs to be deployed in a distributed fashion. The big data frameworks Hadoop and Spark support distributed computing, and the Hadoop Distributed File System (HDFS) supports distributed file storage. Combining a deep learning algorithm with a big data platform and using the parallel operations of the distributed framework to process big data can effectively reduce the complexity of deep-learning-based big data analysis and avoid the low computational efficiency caused by transferring large amounts of data between clusters. The invention therefore provides a Spark-based distributed implementation method for generative adversarial networks, which accelerates the training of the deep model and improves efficiency.
Disclosure of Invention
To overcome the shortcomings of the prior art, a distributed implementation method for a generative adversarial network based on the Spark framework is provided. The method builds on the big data distributed programming framework Spark, makes full use of the fact that Spark is well suited to iterative computation, and parallelizes the GAN over a distributed cluster, so that training of the GAN model can be accelerated, efficiency improved, and good scalability achieved.
To solve the above technical problems, the invention adopts the following technical scheme:
A distributed implementation method for a generative adversarial network (GAN) based on the Spark framework comprises the following steps:
Step 1: the master node randomly initializes the network configuration and generates the parameter set, determining the parameters of the GAN model;
Step 1.1: distributed training objective function of the GAN model:
the real data used for GAN training are normalized so that every value lies in the range [0, 1]; the generator is trained to learn the real data distribution as closely as possible, while the discriminator is trained to distinguish whether a sample comes from the real data distribution or from the generator; GAN training alternates between the discriminator judging generated samples and real samples, and updating the generator weights while the discriminator weights are held fixed, with the errors back-propagated through the discriminator;
the objective function of GAN training is:
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]
where the differentiable functions D and G denote the discriminator and the generator respectively, x denotes the input real data, p_data(x) denotes the distribution of the real data set, D(x) denotes the probability that x comes from the real data rather than from generated data, z denotes a random noise variable, p_z(z) denotes its prior distribution, G(z) is a sample generated by G that follows the real data distribution as closely as possible, and E(·) denotes the expectation;
Step 1.2: distributed parallel training of the GAN model:
several GANs are trained simultaneously, and each discriminator periodically judges samples produced by the generators of the other GANs;
Step 2: the massive data files are uploaded directly to the Hadoop Distributed File System (HDFS) for storage;
Step 3: constructing Spark Resilient Distributed Datasets (RDDs):
the raw input data are divided into a number of batches, and a batch of samples rather than a single sample is fed in at a time; a similarity measure ensures that overly similar samples are not placed in the same batch; the data are cached in the RDD cache of the Spark platform and then broadcast to the Spark worker nodes participating in the distributed processing of the GAN algorithm;
Step 4: for each training-data subset RDD, the master node sends the network parameters, configuration and update state to all slave nodes;
Step 5: each slave node trains on part of the data and updates the parameters; after each iteration, all parameters and states are returned to the master node, which combines and averages the parameters of all slave nodes before the next training iteration begins; the averaged parameters are sent to all slave nodes, which then continue training; the GAN model is trained in parallel in a data-parallel manner until Nash equilibrium is reached, at which point training ends;
Step 5.1: master-node workflow of GAN training:
the master node first partitions the training data set as in step 3 and broadcasts the parameters to the slave nodes; the slave nodes train partial models and return their parameters to the master node; the master node combines the parts of the GAN model learned by the slave nodes and averages the learned parameters to rebuild the master deep model (the Reduce step), then broadcasts the averaged parameters to every slave node and starts the next iteration with the updated parameters, until Nash equilibrium is reached and training of the GAN model is complete;
Step 5.2: slave-node workflow of GAN training:
each slave node receives the parameters, network, configuration and other information broadcast by the master node, reads the partitioned data-set RDD created for it, and computes gradients; each worker node computes gradient updates on its RDD and updates the parameters until the gradients converge, after which the slave node sends the partial parameters obtained from GAN learning back to the master node.
The basic idea of this GAN training scheme is parameter averaging: the input data are divided into several subsets within the Spark cluster, several copies of the GAN model are created and trained on these subsets simultaneously, and intermediate results are cached in memory to speed up model training. After a slave node finishes training its copy, it sends the computed parameter adjustments to the master node; the master node broadcasts the new parameters to the slave nodes for the next round of training, until Nash equilibrium is reached. Concretely, GAN learning consists of two main steps: gradient computation and parameter updating. In the first step, the learning algorithm traverses all batches independently and computes gradient updates, including rates of change, model parameters and so on. In the second step, the model parameters are updated by averaging the gradients computed over all data blocks. The iterative computation flow of deep learning on Spark therefore has two parts: data-set partitioning and distributed parallel learning of the GAN model.
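The parameter-averaging loop described above can be sketched on Spark's Scala API as follows. This is a minimal sketch, not the patent's actual code: the HDFS path, the parameter-vector size and the localTrain routine (which stands in for the real generator/discriminator updates of the Map step) are illustrative assumptions, and a real implementation would stop when Nash equilibrium is reached rather than after a fixed number of rounds.

```scala
import scala.util.Random
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object ParamAveragingSketch {
  // Flattened model parameters (generator + discriminator weights) as one vector.
  type Params = Array[Double]

  // Hypothetical Map step: train a local copy of the GAN on one data partition
  // and return the locally adjusted parameters (the real update rule is a stub here).
  def localTrain(init: Params, partition: Iterator[Array[Double]]): Params = {
    val updated = init.clone()
    // ... gradient updates of D and G over this partition's mini-batches ...
    updated
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("gan-param-averaging"))

    // Steps 2-3: training data stored on HDFS, read and cached as an RDD of samples.
    val data: RDD[Array[Double]] = sc.textFile("hdfs:///gan/train.txt")
      .map(_.split(",").map(_.toDouble))
      .cache()

    var params: Params = Array.fill(1000)(Random.nextGaussian() * 0.01)

    for (_ <- 1 to 50) {                          // a real loop would test for Nash equilibrium
      val bc = sc.broadcast(params)               // master -> slave broadcast of current parameters
      val (sum, n) = data
        .mapPartitions(p => Iterator(localTrain(bc.value, p)))   // Map step on each partition
        .map(p => (p, 1L))
        .reduce { case ((a, na), (b, nb)) =>      // Reduce step: sum partial parameter vectors
          (a.zip(b).map { case (x, y) => x + y }, na + nb)
        }
      params = sum.map(_ / n)                     // parameter averaging on the master
      bc.destroy()
    }
    sc.stop()
  }
}
```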
The invention innovates in the way the GAN model is trained when the data scale is large, in contrast to traditional single-machine approaches. Unlike distributed systems based on the Hadoop MapReduce computing framework, it avoids MapReduce's slow disk reads and writes and its unsuitability for iterative computation, and instead makes full use of Spark's in-memory computing framework, which is well suited to iterative computation, so that GAN training can be accelerated, efficiency improved, and good scalability achieved.
Drawings
Fig. 1: component composition of the Spark-based GAN model.
Fig. 2: data and model parallelization of the GAN.
Fig. 3: training flow of the Spark-based GAN model.
Fig. 4: master-node training workflow of the GAN model.
Fig. 5: slave-node training workflow of the GAN model.
Detailed Description
Referring to the drawings: Spark, on which the method is based, is a framework built on in-memory computation and iterative multi-batch processing. The Spark cluster consists of two main parts: one master node and four slave nodes. The master node initializes a Spark driver instance to manage the partial GAN learning models running on the slave nodes. In each iteration of GAN training, the slave nodes feed the partial parameters obtained from model learning back to the master node.
GAN training is carried out on the Spark distributed computing framework, and the overall distributed process is as follows: when GAN training begins, the massive data are first uploaded to HDFS (the Hadoop Distributed File System); when an Action is applied to an RDD, a GAN training job (Job) is generated and submitted by calling the SparkContext interface of the Spark platform; after receiving the Job, the SparkContext defines how to store the results of successfully executed tasks (Tasks), submits the Job to the eventProcessActor, waits for the eventProcessActor to report that execution has finished, and then returns the result to the master node; after the eventProcessActor receives the Job-submission event, the task scheduler generates the Tasks required for GAN training according to the Stage computation and distributes the training tasks to the slave nodes, which start computing in parallel; after each Task finishes, it reports its state and result to the eventProcessActor, which checks whether all Tasks of the Job have completed; if so, it notifies the SparkContext that the submitted Job has finished, and the SparkContext returns the result to the master node. The main steps are as follows:
Step 1: the master node randomly initializes the network configuration and generates the parameter set that determines the parameters of the GAN model. In standard GAN training, the parameters of the generator and of the discriminator are each updated by gradient descent while the parameters of the other model are held fixed. GAN training is the search for the Nash equilibrium of a two-player zero-sum game; most training methods minimize the cost functions of both players simultaneously by gradient descent, but because the GAN cost functions are non-convex and the parameter space is extremely high-dimensional, gradient descent is difficult to guarantee to converge.
Step 1.1: distributed training objective function of the GAN model
The real data used for GAN training are normalized so that every value lies in the range [0, 1]; the generator is trained to learn the real data distribution as closely as possible, while the discriminator is trained to distinguish whether a sample comes from the real data distribution or from the generator. GAN training alternates between the discriminator judging generated samples and real samples, and updating the generator weights while the discriminator weights are held fixed, with errors back-propagated through the discriminator. The optimization problem of GAN training is a minimax problem, and the objective function can be written as:
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]
where the differentiable functions D and G denote the discriminator and the generator respectively, x denotes the input real data, p_data(x) denotes the distribution of the real data set, D(x) denotes the probability that x comes from the real data rather than from generated data, z denotes a random noise variable, p_z(z) denotes its prior distribution, G(z) is a sample generated by G that follows the real data distribution as closely as possible, and E(·) denotes the expectation.
The GAN estimates the ratio of two probability-distribution densities. In the learning process, the discriminative model D is trained to maximize the accuracy with which it decides whether data come from the real data or from the generated distribution G(z), while the generative model G is trained to minimize log(1 - D(G(z))). The global optimum is reached if and only if the real data distribution and the generator's data distribution are equal. When training the GAN, within one round of parameter updates the parameters of D are typically updated k times before the parameters of G are updated once.
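The k-to-1 update schedule can be illustrated with the following sketch; the dGradient and gGradient functions are placeholders supplied by the caller and are not defined by the patent.

```scala
// One adversarial round: k ascent steps on the discriminator, one descent step on the generator.
def adversarialRound(
    dParams: Array[Double], gParams: Array[Double],
    realBatch: Array[Array[Double]], noiseBatch: Array[Array[Double]],
    k: Int, lr: Double,
    dGradient: (Array[Double], Array[Double], Array[Array[Double]], Array[Array[Double]]) => Array[Double],
    gGradient: (Array[Double], Array[Double], Array[Array[Double]]) => Array[Double]
): (Array[Double], Array[Double]) = {
  var d = dParams
  // k ascent steps on D: increase log D(x) + log(1 - D(G(z))) with G held fixed
  for (_ <- 1 to k) {
    val grad = dGradient(d, gParams, realBatch, noiseBatch)
    d = d.zip(grad).map { case (w, g) => w + lr * g }
  }
  // one descent step on G: decrease log(1 - D(G(z))) with D held fixed
  val gGrad = gGradient(d, gParams, noiseBatch)
  val g = gParams.zip(gGrad).map { case (w, gr) => w - lr * gr }
  (d, g)
}
```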
Step 1.2: distributed parallel training of the GAN model
As shown in Fig. 2, the distributed parallelization of the GAN covers both data and model parallelism: instead of each discriminator being trained adversarially against a single fixed generator (the generator of the same GAN), several GANs are trained simultaneously and each discriminator periodically judges samples produced by the generators of the other GANs. Any extended GAN model is suitable for this scheme, so different GAN variants can be assigned to the computing resources of different nodes in the cluster and computed in parallel. Such parallel adversarial training increases the number of modes each discriminator sees and thereby effectively mitigates the mode-collapse problem.
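A rough sketch of this multi-GAN exchange on Spark is shown below. It is a structural illustration only: generateSamples and trainCopy are stubs for the real GAN routines, an existing SparkContext sc is assumed (as in the earlier sketch), and the number of copies, pool size and exchange period are illustrative.

```scala
// One GAN copy per partition, reduced here to parameter vectors with stubbed update routines.
case class GanCopy(id: Int, dParams: Array[Double], gParams: Array[Double])

def generateSamples(gan: GanCopy, n: Int): Array[Array[Double]] =           // stub generator output
  Array.fill(n)(Array.fill(8)(scala.util.Random.nextGaussian()))

def trainCopy(gan: GanCopy, foreignSamples: Seq[Array[Double]]): GanCopy =  // stub adversarial update
  gan

val numCopies = 4
var gans = sc.parallelize(
  (0 until numCopies).map(i => GanCopy(i, Array.fill(100)(0.0), Array.fill(100)(0.0))),
  numCopies)

for (round <- 1 to 20) {
  // Every few rounds, collect a small pool of samples from every generator and broadcast it,
  // so each discriminator also judges the other generators' output.
  val pool: Seq[Array[Double]] =
    if (round % 5 == 0) gans.flatMap(g => generateSamples(g, 16)).collect().toSeq
    else Seq.empty
  val bcPool = sc.broadcast(pool)
  val next = gans.map(g => trainCopy(g, bcPool.value))
  next.persist()
  next.count()          // materialize this round before replacing the previous RDD
  gans.unpersist()
  gans = next
}
```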
Step 1.3: setting the initial parameters of the GAN model
The parameters are tuned mainly by repeatedly validating and analyzing the model's behaviour; the commonly used settings are as follows:
A. Learning rate. For the GAN learning model there is no fixed rule for setting the learning rate; in general it should not be set too large, otherwise the model easily loses accuracy or diverges, and the weight update is usually about one thousandth of the weight magnitude.
B. Initial values of the weights and biases. In general, the connection weights can be initialized with random numbers drawn from a normal distribution N(0, 0.01), and the biases of the hidden units and the bias Q of the first visible unit are typically initialized to 0.
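These settings can be expressed with the following short sketch; reading N(0, 0.01) as a standard deviation of 0.01 is an assumption, and the layer sizes are illustrative.

```scala
import scala.util.Random

// Weights ~ N(0, 0.01) (taken here as standard deviation 0.01), biases initialized to 0.
def initLayer(nIn: Int, nOut: Int, rng: Random = new Random(42)): (Array[Array[Double]], Array[Double]) = {
  val weights = Array.fill(nIn, nOut)(rng.nextGaussian() * 0.01)
  val biases  = Array.fill(nOut)(0.0)
  (weights, biases)
}

val (w, b) = initLayer(nIn = 784, nOut = 128)

// Learning rate chosen so that a weight update is roughly one thousandth of the weight magnitude.
val meanAbsWeight = w.flatten.map(math.abs).sum / (784 * 128)
val learningRate  = meanAbsWeight / 1000.0
```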
Step 2: the real data used for GAN training are uploaded directly to the Hadoop Distributed File System (HDFS) for storage.
Step 3: construct Spark Resilient Distributed Datasets (RDDs). The raw input data are divided into batches, and a batch of samples rather than a single sample is fed in at a time, so the other samples in the same batch can serve as auxiliary information for each sample. In addition, a similarity measure keeps overly similar samples out of the same batch, which preserves sample diversity in every round of training and effectively mitigates mode collapse. This allows the generator of the GAN to process batches of samples at a time rather than individual samples. The data are cached in the RDD cache of the Spark platform and then broadcast to the Spark worker nodes participating in the distributed processing of the GAN algorithm. This RDD-based design is critical for accelerating the training of the deep model, because the in-memory approach has far lower read/write latency than disk. Each Spark worker node is typically a single computer; when a node needs the data for computation it reads them locally instead of requesting them again, saving the time otherwise spent on data transfer between nodes. In particular, on large data sets the distributed data block RDD is operated on many times; once the RDD is cached, it is computed the first time the data subset is used, and later uses read the values directly from memory without recomputation.
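A possible realization of this batching, caching and broadcasting step is sketched below; the Euclidean distance, similarity threshold, batch size and HDFS path are illustrative choices not fixed by the patent, and an existing SparkContext sc is assumed.

```scala
import scala.collection.mutable.ArrayBuffer
import org.apache.spark.rdd.RDD

def distance(a: Array[Double], b: Array[Double]): Double =
  math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)

// Greedily pack samples into batches, never adding a sample to a batch that already
// contains a sample closer than minDist (metric and threshold are illustrative).
def makeBatches(samples: Seq[Array[Double]], batchSize: Int, minDist: Double): Seq[Array[Array[Double]]] = {
  val batches = ArrayBuffer.empty[ArrayBuffer[Array[Double]]]
  for (s <- samples) {
    batches.find(b => b.size < batchSize && b.forall(x => distance(x, s) >= minDist)) match {
      case Some(b) => b += s
      case None    => batches += ArrayBuffer(s)
    }
  }
  batches.map(_.toArray).toSeq
}

// Raw samples from HDFS (illustrative path), batched within each partition and cached in memory.
val raw: RDD[Array[Double]] =
  sc.textFile("hdfs:///gan/train.txt").map(_.split(",").map(_.toDouble))

val batchedRdd: RDD[Array[Array[Double]]] =
  raw.mapPartitions(it => makeBatches(it.toSeq, batchSize = 64, minDist = 0.1).iterator).cache()

// Current model parameters broadcast to every worker node participating in training.
val bcParams = sc.broadcast(Array.fill(1000)(0.0))
```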
Step 3.1: : defining InputSpilt of a Spark distributed computing framework, realizing a subclass of Java class InputSpilt according to API provided by Spark, customizing a subclass of Java class InputFormat class, and realizing a Java class getSpilts method of the subclass;
step 3.2: the SparkContext creates an RDD according to the file/directory and the optional fragment, firstly, an application program creates an instance of the SparkContext, if the instance is sc, and creates the RDD by using the created sc instance;
step 3.3: creating Hadoop RDD according to InputFormat and the like in Hadoop configuration;
step 3.4: defining a mode for reading InputSpilt data, and acquiring a Recordreader from an InputFormat according to Hadoop configuration and fragmentation to read data;
step 3.5: customizing a subclass of java class Recordreader, wherein the subclass is used for customizing a mode of reading fragment data and constructing a Spark distributed data set by combining the customized InputFormat in the step 3;
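The custom input classes of steps 3.1-3.5 might look as follows. This sketch simply delegates to Hadoop's line-based reader, so it shows the wiring rather than a real sample decoder; class names and the HDFS path are illustrative, and an existing SparkContext sc is assumed.

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.{InputSplit, RecordReader, TaskAttemptContext}
import org.apache.hadoop.mapreduce.lib.input.{FileInputFormat, LineRecordReader}

// Custom RecordReader (steps 3.4/3.5): delegates to Hadoop's line reader; a real
// implementation would decode one training sample per record.
class GanRecordReader extends RecordReader[LongWritable, Text] {
  private val delegate = new LineRecordReader()
  override def initialize(split: InputSplit, ctx: TaskAttemptContext): Unit = delegate.initialize(split, ctx)
  override def nextKeyValue(): Boolean = delegate.nextKeyValue()
  override def getCurrentKey(): LongWritable = delegate.getCurrentKey
  override def getCurrentValue(): Text = delegate.getCurrentValue
  override def getProgress(): Float = delegate.getProgress
  override def close(): Unit = delegate.close()
}

// Custom InputFormat (steps 3.1/3.3): getSplits is inherited from FileInputFormat and
// can be overridden to control how the training file is split across the cluster.
class GanInputFormat extends FileInputFormat[LongWritable, Text] {
  override def createRecordReader(split: InputSplit, ctx: TaskAttemptContext): RecordReader[LongWritable, Text] =
    new GanRecordReader()
}

// Step 3.2: build the RDD from the custom InputFormat (path is illustrative).
val samples = sc
  .newAPIHadoopFile[LongWritable, Text, GanInputFormat]("hdfs:///gan/train.txt")
  .map { case (_, line) => line.toString.split(",").map(_.toDouble) }
```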
Step 4: as shown in Fig. 3, for each training-data subset RDD the master node sends the network parameters, configuration and update state to all slave nodes. First, the distributed transformation is completed in the Transformation phase of the distributed data set RDD: the map transformation of the RDD is applied, and the output of each map is sorted by key. The RDD is partitioned according to the key of each record, and records with the same key are stored on the same node so that the two data sets can be joined efficiently; the values in the data set are reduced to concrete numbers, and the final result is stored as a SequenceFile under a specified HDFS path for the subsequent big data analysis with the GAN deep model. Processing then continues in the Action phase: the dependencies between RDDs form a DAG, and the GAN training Job is divided into several Stages, each containing several MapReduce-style iterations in which the output of one iteration becomes the input of the next and each iteration trains part of the GAN model. After a Stage is submitted, the task scheduler computes the required Tasks from the Stage and submits them to the Workers in the corresponding Spark cluster.
Step 4.1: a series of Transformation operations converts the original RDD into RDDs of other types.
The RDD conversion chain is HadoopRDD → MappedRDD → FlatMappedRDD → MappedRDD → PairRDDFunctions → ShuffledRDD → MapPartitionsRDD: a HadoopRDD is generated first and a map operation produces a MappedRDD; flatMap turns it into a FlatMappedRDD; building key-value pairs turns it back into a MappedRDD; in reduceByKey it is implicitly converted to PairRDDFunctions, and reduceByKey calls combineByKey, producing a ShuffledRDD; after mapPartitionsWithContext is called, the ShuffledRDD becomes a MapPartitionsRDD.
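The chain can be reproduced with a simple pipeline such as the one below (a word-count-style example, not GAN-specific). The comments name the internal RDD types of the older Spark versions the text refers to (newer versions use slightly different internal classes), the paths are illustrative, and an existing SparkContext sc is assumed.

```scala
val lines   = sc.textFile("hdfs:///gan/params.txt")           // HadoopRDD -> MappedRDD
val tokens  = lines.flatMap(_.split("\\s+"))                   // FlatMappedRDD
val pairs   = tokens.map(t => (t, 1))                          // MappedRDD of key/value pairs
val reduced = pairs.reduceByKey(_ + _)                         // implicit PairRDDFunctions -> ShuffledRDD
val result  = reduced.mapPartitions(_.map { case (k, v) => s"$k\t$v" })  // MapPartitionsRDD
result.saveAsTextFile("hdfs:///gan/output")                    // Action: triggers the Job
// (the text stores the final result as a SequenceFile; saveAsSequenceFile on the pair RDD
//  would be used for that instead of saveAsTextFile)
```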
Step 4.2: Job generation and execution
When an Action is applied to the transformed RDD, the Action is submitted as a Job and the runJob method of the SparkContext is called; the call path is roughly runJob → DAGScheduler.runJob → submitJob. In the DAGScheduler, submitJob creates a JobSubmitted event and sends it to the embedded eventProcessActor; on receiving JobSubmitted, the eventProcessActor calls the processEvent handler; the Job is converted into Stages, the finalStage is generated and submitted for execution, and submitStage is called; submitStage computes the dependencies between Stages, which are divided into wide and narrow dependencies; if the current Stage is found to have no dependencies, or all of its dependencies are already prepared, its Tasks are submitted; Task submission is completed by calling the submitMissingTasks function; the task scheduler manages the distribution of Tasks, and submitMissingTasks mainly calls the task scheduler's submitTasks;
Step 4.3: Task flow of GAN training
All slave nodes execute the same Map task in parallel on different partitions of the data blocks, so the computationally expensive GAN learning task is divided into several parallel subtasks. The initialization process is as follows: a SparkConf is first built from the initialization parameters, and a SparkEnv is then created from the SparkConf; the SparkEnv contains four key components: BlockManager, MapOutputTracker, ShuffleFetcher and ConnectionManager. A task scheduler is created, a SchedulerBackend is selected, and the task scheduler is started; specifically, the TaskScheduler instance is passed as a parameter to create the DAGScheduler, and the TaskSchedulerImpl creates the backend corresponding to the current run mode, which receives the ReviveOffers events sent by the TaskSchedulerImpl. The final processing logic takes place in the TaskRunner, which runs in an Executor; it packages the result of the computation into a MapStatus, which is then fed back to the DAGScheduler through a series of internal messages.
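The driver-side entry point that triggers this initialization sequence is ordinary SparkContext construction; the configuration values below are illustrative.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("gan-distributed-training")
  .setMaster("spark://master:7077")        // one master and four slave nodes, as in the example cluster
  .set("spark.executor.memory", "4g")

// Creating the SparkContext builds SparkEnv, the DAGScheduler and the TaskScheduler/backend.
val sc = new SparkContext(conf)
```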
Step 5: the learning result of each partial deep model depends on the smaller instances and data subsets assigned to that part of the learning problem. In the Spark distributed cluster, each slave node trains on part of the data and updates the parameters. After one iteration, all parameters and states are returned to the master node, which averages the parameters and updates the state of the neural network being trained. The averaged parameters are then sent to all slave nodes for further training. The GAN model is trained in parallel in a data-parallel manner, and this is repeated until the GAN reaches Nash equilibrium.
Step 5.1: master-node workflow of GAN training
As shown in Fig. 4, the master node first partitions the training data set as in step 3 and broadcasts the parameters to the slave nodes; the slave nodes train partial models and return their parameters to the master node; the master node combines the parts of the GAN model learned by the slave nodes and averages the learned parameters to rebuild the master deep model (also called the Reduce step). The averaged parameters are broadcast to every slave node, and the next iteration starts with the updated parameters, until Nash equilibrium is reached and training of the GAN model is complete.
Step 5.2: slave-node workflow of GAN training
The partial-model training process on the slave nodes is as follows: each slave node receives the parameters, network, configuration and other information broadcast by the master node, reads the partitioned data-set RDD created for it, and computes gradients; each worker node computes gradient updates on its RDD (also called the Map step) and updates the parameters until the gradients converge, after which the slave node sends the partial parameters obtained from GAN learning back to the master node.
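The slave-node Map step can be sketched as a mapPartitions function like the following; the gradient function, learning rate, tolerance and sweep limit are illustrative placeholders rather than the patent's actual update rule.

```scala
// Per-partition worker routine: refine a local copy of the broadcast parameters
// until the local updates converge, then hand the result back for the Reduce/averaging step.
def localGanTraining(
    broadcastParams: Array[Double],
    partition: Iterator[Array[Array[Double]]],
    gradient: (Array[Double], Array[Array[Double]]) => Array[Double],
    lr: Double = 1e-3,
    tol: Double = 1e-6,
    maxSweeps: Int = 100
): Iterator[Array[Double]] = {
  val params  = broadcastParams.clone()          // local copy of the broadcast parameters
  val batches = partition.toSeq
  var delta   = Double.MaxValue
  var sweep   = 0
  while (delta > tol && sweep < maxSweeps) {     // iterate until the gradient steps become small
    delta = 0.0
    for (batch <- batches) {
      val g = gradient(params, batch)
      for (i <- params.indices) {
        val step = lr * g(i)
        params(i) -= step
        delta = math.max(delta, math.abs(step))
      }
    }
    sweep += 1
  }
  Iterator(params)                               // partial parameters returned to the master
}

// Driver side (hypothetical gradient function myGanGradient):
// data.mapPartitions(p => localGanTraining(bc.value, p, myGanGradient))
```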

Claims (1)

1. A distributed implementation method for a generative adversarial network (GAN) based on the Spark framework, comprising the following steps:
step 1: the master node randomly initializes the network configuration and generates the parameter set, determining the parameters of the GAN model; during GAN training, the parameters of the generator and of the discriminator are each updated by gradient descent while the parameters of the other model are held fixed;
step 1.1: distributed training objective function of the GAN model:
the real data used for GAN training are normalized so that every value lies in the range [0, 1]; the generator is trained to learn the real data distribution as closely as possible; the discriminator is trained to distinguish whether a sample comes from the real data distribution or from the generator; GAN training alternates between the discriminator judging generated samples and real samples, and updating the generator weights while the discriminator weights are held fixed, with the errors back-propagated through the discriminator;
the objective function of GAN training is:
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]
where the differentiable functions D and G denote the discriminator and the generator respectively, x denotes the input real data, p_data(x) denotes the distribution of the real data set, D(x) denotes the probability that x comes from the real data rather than from generated data, z denotes a random noise variable, p_z(z) denotes its prior distribution, G(z) is a sample generated by G that follows the real data distribution as closely as possible, and E(·) denotes the expectation;
step 1.2: distributed parallel training of the GAN model:
the distributed parallelization of the GAN covers both data and model parallelism; several GANs are trained simultaneously, and each discriminator periodically judges samples produced by the generators of the other GANs; this parallelization applies to any extended GAN model, so different GAN variants are assigned to the computing resources of different nodes in the cluster and computed in parallel;
step 1.3: setting the initial parameters of the GAN model
the parameters are tuned mainly by repeatedly validating and analyzing the model's behaviour; the parameter settings used are as follows:
A. learning rate: for the GAN learning model there is no fixed rule for setting the learning rate; the weight update is set to about one thousandth of the weight magnitude;
B. initial values of the weights and biases: the connection weights are initialized with random numbers drawn from a normal distribution N(0, 0.01), and the biases of the hidden units and the bias Q of the first visible unit are initialized to 0;
step 2: the massive data files are uploaded directly to the Hadoop Distributed File System (HDFS) for storage;
step 3: constructing Spark Resilient Distributed Datasets (RDDs):
the raw input data are divided into a number of batches, and a batch of samples rather than a single sample is fed in at a time; a similarity measure ensures that overly similar samples are not placed in the same batch; the data are cached in the RDD cache of the Spark platform and then broadcast to the Spark worker nodes participating in the distributed processing of the GAN algorithm;
step 3.1: define the InputSplit of the Spark distributed computing framework: implement a subclass of the Java class InputSplit according to the API provided by Spark, define a custom subclass of the Java class InputFormat, and implement its getSplits method;
step 3.2: the SparkContext creates an RDD from the file/directory and optional splits; the application first creates a SparkContext instance;
step 3.3: create a HadoopRDD from the InputFormat in the Hadoop configuration;
step 3.4: define how InputSplit data are read: obtain a RecordReader from the InputFormat according to the Hadoop configuration and the split, and use it to read the data;
step 3.5: define a custom subclass of the Java class RecordReader that specifies how split data are read, and combine it with the custom InputFormat of step 3.1 to build the Spark distributed data set;
step 4: for each training-data subset RDD, the master node sends the network parameters, configuration and update state to all slave nodes:
step 4.1: Transformation operations convert the original RDD into RDDs of other types:
first, the distributed transformation is completed in the Transformation phase of the distributed data set RDD: the map transformation of the RDD is applied, and the output of each map is sorted by key; the RDD is partitioned according to the key of each record, and records with the same key are stored on the same node so that the two data sets can be joined efficiently; the values in the data set are reduced to concrete numbers, and the final result is stored as a SequenceFile under a specified HDFS path for the subsequent big data analysis with the GAN deep model;
step 4.2: Job generation and execution:
processing continues in the Action phase, where the dependencies between RDDs form a DAG; the GAN training Job is divided into several Stages, each containing several MapReduce-style iterations in which the output of one iteration becomes the input of the next and each iteration trains part of the GAN model;
step 4.3: Task flow of GAN training:
after a Stage is submitted, the task scheduler computes the required Tasks from the Stage and submits them to the Workers in the corresponding Spark cluster; all slave nodes execute the same Map task in parallel on different partitions of the data blocks, so the computationally expensive GAN learning task is divided into parallel subtasks;
step 5: in the Spark distributed cluster, each slave node trains on part of the data and updates the parameters; after one iteration, all parameters and states are returned to the master node, which averages the parameters and updates the state of the neural network being trained; the averaged parameters are then sent to all slave nodes for further training; the GAN model is trained in parallel in a data-parallel manner, and this is repeated until the GAN reaches Nash equilibrium and training ends;
step 5.1: master-node workflow of GAN training:
the master node first partitions the training data set as in step 3 and broadcasts the parameters to the slave nodes; the slave nodes train partial models and return their parameters to the master node; the master node combines the parts of the GAN model learned by the slave nodes and averages the learned parameters to rebuild the master deep model (the Reduce step), then broadcasts the averaged parameters to every slave node and starts the next iteration with the updated parameters, until Nash equilibrium is reached and training of the GAN model is complete;
step 5.2: slave-node workflow of GAN training:
each slave node receives the parameters, network and configuration information broadcast by the master node, reads the partitioned data-set RDD created for it, and computes gradients; each worker node computes gradient updates on its RDD and updates the parameters until the gradients converge; during GAN training, within one round of parameter updates the parameters of the discriminator model D are updated k times before the parameters of the generator model G are updated once; the slave node sends the partial GAN-model parameters obtained from learning back to the master node.
CN201810047494.7A 2018-01-18 2018-01-18 Distributed implementation method for generating countermeasure network based on Spark framework Active CN108268638B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810047494.7A CN108268638B (en) 2018-01-18 2018-01-18 Distributed implementation method for generating countermeasure network based on Spark framework

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810047494.7A CN108268638B (en) 2018-01-18 2018-01-18 Distributed implementation method for generating countermeasure network based on Spark framework

Publications (2)

Publication Number Publication Date
CN108268638A CN108268638A (en) 2018-07-10
CN108268638B true CN108268638B (en) 2020-07-17

Family

ID=62775963

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810047494.7A Active CN108268638B (en) 2018-01-18 2018-01-18 Distributed implementation method for generating countermeasure network based on Spark framework

Country Status (1)

Country Link
CN (1) CN108268638B (en)

Families Citing this family (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3564883B1 (en) 2018-04-30 2023-09-06 Hewlett Packard Enterprise Development LP System and method of decentralized management of device assets outside a computer network
EP3564873B1 (en) 2018-04-30 2022-11-30 Hewlett Packard Enterprise Development LP System and method of decentralized machine learning using blockchain
EP3565218B1 (en) 2018-04-30 2023-09-27 Hewlett Packard Enterprise Development LP System and method of decentralized management of multi-owner nodes using blockchain
CN109104737B (en) * 2018-07-31 2020-10-09 北京航空航天大学 Cluster countervailing capacity evaluation method based on time-varying network
CN109410131B (en) * 2018-09-28 2020-08-04 杭州格像科技有限公司 Face beautifying method and system based on condition generation antagonistic neural network
CN109460708A (en) * 2018-10-09 2019-03-12 东南大学 A kind of Forest fire image sample generating method based on generation confrontation network
CN109523014B (en) * 2018-10-22 2021-02-02 广州大学 News comment automatic generation method and system based on generative confrontation network model
CN111105349B (en) * 2018-10-26 2022-02-11 珠海格力电器股份有限公司 Image processing method
CN109447263B (en) * 2018-11-07 2021-07-30 任元 Space abnormal event detection method based on generation of countermeasure network
CN109299781B (en) * 2018-11-21 2021-12-03 安徽工业大学 Distributed deep learning system based on momentum and pruning
CN109784494A (en) * 2018-11-28 2019-05-21 同盾控股有限公司 A kind of machine learning method and device based on pyspark
CN111274795B (en) * 2018-12-04 2023-06-20 北京嘀嘀无限科技发展有限公司 Vector acquisition method, vector acquisition device, electronic equipment and computer readable storage medium
CN111274348B (en) * 2018-12-04 2023-05-12 北京嘀嘀无限科技发展有限公司 Service feature data extraction method and device and electronic equipment
CN109800092A (en) 2018-12-17 2019-05-24 华为技术有限公司 A kind of processing method of shared data, device and server
CN109871995B (en) * 2019-02-02 2021-03-26 浙江工业大学 Quantum optimization parameter adjusting method for distributed deep learning under Spark framework
US11966818B2 (en) 2019-02-21 2024-04-23 Hewlett Packard Enterprise Development Lp System and method for self-healing in decentralized model building for machine learning using blockchain
CN110704221B (en) * 2019-09-02 2020-10-27 西安交通大学 Data center fault prediction method based on data enhancement
CN111091180B (en) * 2019-12-09 2023-03-10 腾讯科技(深圳)有限公司 Model training method and related device
US11748835B2 (en) 2020-01-27 2023-09-05 Hewlett Packard Enterprise Development Lp Systems and methods for monetizing data in decentralized model building for machine learning using a blockchain
US11218293B2 (en) 2020-01-27 2022-01-04 Hewlett Packard Enterprise Development Lp Secure parameter merging using homomorphic encryption for swarm learning
CN111522657B (en) * 2020-04-14 2022-07-22 北京航空航天大学 Distributed equipment collaborative deep learning reasoning method
CN111539519A (en) * 2020-04-30 2020-08-14 成都成信高科信息技术有限公司 Convolutional neural network training engine method and system for mass data
CN111625291B (en) * 2020-05-13 2023-06-27 北京字节跳动网络技术有限公司 Automatic iteration method and device for data processing model and electronic equipment
CN111985609A (en) * 2020-07-06 2020-11-24 电子科技大学 Data parallel optimization method based on TensorFlow framework
CN112685504B (en) * 2021-01-06 2021-10-08 广东工业大学 Production process-oriented distributed migration chart learning method
CN112801417A (en) * 2021-03-16 2021-05-14 贵州电网有限责任公司 Optimized model parallel defect material prediction method
CN113556247B (en) * 2021-06-25 2023-08-01 深圳技术大学 Multi-layer parameter distributed data transmission method, device and readable medium
CN114756385B (en) * 2022-06-16 2022-09-02 合肥中科类脑智能技术有限公司 Elastic distributed training method under deep learning scene

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107480772A (en) * 2017-08-08 2017-12-15 浙江大学 A kind of car plate super-resolution processing method and system based on deep learning
CN107563417A (en) * 2017-08-18 2018-01-09 北京天元创新科技有限公司 A kind of deep learning artificial intelligence model method for building up and system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104463324A (en) * 2014-11-21 2015-03-25 长沙马沙电子科技有限公司 Convolution neural network parallel processing method based on large-scale high-performance cluster
CN106570565A (en) * 2016-11-21 2017-04-19 中国科学院计算机网络信息中心 Depth learning method and system for big data
CN107038244A (en) * 2017-04-24 2017-08-11 北京北信源软件股份有限公司 A kind of data digging method and device, a kind of computer-readable recording medium and storage control

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107480772A (en) * 2017-08-08 2017-12-15 浙江大学 A kind of car plate super-resolution processing method and system based on deep learning
CN107563417A (en) * 2017-08-18 2018-01-09 北京天元创新科技有限公司 A kind of deep learning artificial intelligence model method for building up and system

Also Published As

Publication number Publication date
CN108268638A (en) 2018-07-10

Similar Documents

Publication Publication Date Title
CN108268638B (en) Distributed implementation method for generating countermeasure network based on Spark framework
Nguyen et al. Fast-convergent federated learning
Guo et al. Cloud resource scheduling with deep reinforcement learning and imitation learning
Wang et al. Distributed machine learning with a serverless architecture
CN107888669B (en) Deep learning neural network-based large-scale resource scheduling system and method
Yan et al. Performance modeling and scalability optimization of distributed deep learning systems
Kabiljo et al. Social hash partitioner: a scalable distributed hypergraph partitioner
CN110659678B (en) User behavior classification method, system and storage medium
US11907821B2 (en) Population-based training of machine learning models
CN109445386B (en) Cloud manufacturing task shortest production time scheduling method based on ONBA
CN112884236B (en) Short-term load prediction method and system based on VDM decomposition and LSTM improvement
CN109032630B (en) Method for updating global parameters in parameter server
CN113469372A (en) Reinforcement learning training method, device, electronic equipment and storage medium
CN111597230A (en) Parallel density clustering mining method based on MapReduce
Hirschberger et al. A variational EM acceleration for efficient clustering at very large scales
Ravikumar et al. Computationally efficient neural rendering for generator adversarial networks using a multi-GPU cluster in a cloud environment
CN112328332B (en) Database configuration optimization method for cloud computing environment
CN112035234B (en) Distributed batch job distribution method and device
CN113743614A (en) Method and system for operating a technical installation using an optimal model
CN112232401A (en) Data classification method based on differential privacy and random gradient descent
CN108334532A (en) A kind of Eclat parallel methods, system and device based on Spark
Malik et al. Detailed performance analysis of distributed Tensorflow on a gpu cluster using deep learning algorithms
Xu et al. Parallel and distributed asynchronous adaptive stochastic gradient methods
Heye Scaling deep learning without increasing batchsize
Li et al. Optimizing machine learning on apache spark in HPC environments

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant