CN108268638B - Distributed implementation method for generating countermeasure network based on Spark framework - Google Patents

Info

Publication number
CN108268638B
Authority
CN
China
Prior art keywords
data
training
parameters
network model
generating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810047494.7A
Other languages
Chinese (zh)
Other versions
CN108268638A (en)
Inventor
王万良
张兆娟
高楠
吴菲
李卓蓉
赵燕伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN201810047494.7A priority Critical patent/CN108268638B/en
Publication of CN108268638A publication Critical patent/CN108268638A/en
Application granted granted Critical
Publication of CN108268638B publication Critical patent/CN108268638B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Debugging And Monitoring (AREA)

Abstract

A distributed implementation method for a generative adversarial network (GAN) based on the Spark framework comprises the following steps: the master node randomly initializes the network configuration and generates the parameter set; the data files are uploaded directly to the distributed file system; Spark Resilient Distributed Datasets (RDDs) are constructed; for each training-data subset RDD, the master node sends the network parameters, configuration and update state to all slave nodes; each slave node trains on part of the data and updates the parameters; the GAN model is trained in parallel in a data-parallel manner until Nash equilibrium is reached, at which point training ends. The method provides distributed training of a GAN model in the presence of massive data. At the same time it makes full use of Spark's in-memory computing framework, which is well suited to iterative computation, so that training of the GAN model can be accelerated, efficiency improved, and good scalability achieved.

Description

Distributed implementation method for generating countermeasure network based on Spark framework
Technical Field
The invention relates to artificial intelligence analysis methods for big data environments, and in particular to a distributed implementation method for generative adversarial networks (GANs) based on the big data framework Spark.
Background
Artificial intelligence is an important tool for analyzing big data and has made breakthrough progress in recent years; growing data resources and computing power further accelerate the development of artificial intelligence techniques and applications in the big data era. Deep learning is currently one of the mainstream machine learning approaches: it shows excellent performance in many settings and is widely applied in pattern recognition fields such as image and speech processing. Deep models and their learning algorithms are inspired by brain structure and information-processing mechanisms; each deep model has a deep architecture containing many non-linear hidden layers and a hierarchical feature-abstraction mechanism.
At present, deep models based on generative adversarial networks are a popular research direction in artificial intelligence and an important step toward higher-level artificial intelligence. A GAN follows the idea of game theory: it consists of a generator and a discriminator that are trained adversarially until Nash equilibrium is reached. As the scale and dimensionality of data keep growing, the inputs and outputs of the model can grow exponentially, which directly increases the complexity and computational cost of the deep learning model. Existing deep learning frameworks usually set up a dedicated cluster for deep learning, and the learning process creates multiple programs, so that large data sets must be transferred between separate clusters, increasing system complexity and end-to-end learning latency.
Training a generative adversarial network is a compute-intensive application: parameters such as the thresholds and weights of all hidden layers require a large amount of iterative computation, and as the training data grow the training of the whole model becomes very time-consuming. A GAN model for big data is difficult to run on a single computer and needs to be deployed in a distributed fashion. The big data frameworks Hadoop and Spark support distributed computing, and the Hadoop Distributed File System (HDFS) supports distributed file storage. Combining a deep learning algorithm with a big data platform and using the parallel operations of the distributed framework to process big data can effectively reduce the complexity of deep-learning-based big data analysis and avoid the low computational efficiency caused by transferring large amounts of data between clusters. The invention therefore provides a Spark-based distributed implementation method for generative adversarial networks, which accelerates the training of the deep model and improves efficiency.
Disclosure of Invention
To overcome the shortcomings of the prior art, a distributed implementation method for a generative adversarial network based on the Spark framework is provided. The method builds on the big data distributed programming framework Spark, makes full use of the fact that Spark is well suited to iterative computation, and parallelizes the GAN over a distributed cluster, so that training of the GAN model can be accelerated, efficiency improved, and good scalability achieved.
To solve the above technical problems, the invention adopts the following technical scheme:
A distributed implementation method for a generative adversarial network (GAN) based on the Spark framework comprises the following steps:
Step 1: the master node randomly initializes the network configuration and generates the parameter set, determining the parameters of the GAN model;
Step 1.1: distributed training objective function of the GAN model:
the real data used for GAN training are normalized so that every value lies in the range [0, 1]; the generator is trained to learn the real data distribution as closely as possible, while the discriminator is trained to distinguish whether a sample comes from the real data distribution or from the generator; GAN training alternates between the discriminator judging generated samples and real samples, and updating the generator weights while the discriminator weights are held fixed, with the errors back-propagated through the discriminator;
the objective function of GAN training is:
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]
where the differentiable functions D and G denote the discriminator and the generator respectively, x denotes the input real data, p_data(x) denotes the distribution of the real data set, D(x) denotes the probability that x comes from the real data rather than from generated data, z denotes a random noise variable, p_z(z) denotes its prior distribution, G(z) is a sample generated by G that follows the real data distribution as closely as possible, and E(·) denotes the expectation;
Step 1.2: distributed parallel training of the GAN model:
several GANs are trained simultaneously, and each discriminator periodically judges samples produced by the generators of the other GANs;
Step 2: the massive data files are uploaded directly to the Hadoop Distributed File System (HDFS) for storage;
Step 3: constructing Spark Resilient Distributed Datasets (RDDs):
the raw input data are divided into a number of batches, and a batch of samples rather than a single sample is fed in at a time; a similarity measure ensures that overly similar samples are not placed in the same batch; the data are cached in the RDD cache of the Spark platform and then broadcast to the Spark worker nodes participating in the distributed processing of the GAN algorithm;
Step 4: for each training-data subset RDD, the master node sends the network parameters, configuration and update state to all slave nodes;
Step 5: each slave node trains on part of the data and updates the parameters; after each iteration, all parameters and states are returned to the master node, which combines and averages the parameters of all slave nodes before the next training iteration begins; the averaged parameters are sent to all slave nodes, which then continue training; the GAN model is trained in parallel in a data-parallel manner until Nash equilibrium is reached, at which point training ends;
Step 5.1: master-node workflow of GAN training:
the master node first partitions the training data set as in step 3 and broadcasts the parameters to the slave nodes; the slave nodes train partial models and return their parameters to the master node; the master node combines the parts of the GAN model learned by the slave nodes and averages the learned parameters to rebuild the master deep model (the Reduce step), then broadcasts the averaged parameters to every slave node and starts the next iteration with the updated parameters, until Nash equilibrium is reached and training of the GAN model is complete;
Step 5.2: slave-node workflow of GAN training:
each slave node receives the parameters, network, configuration and other information broadcast by the master node, reads the partitioned data-set RDD created for it, and computes gradients; each worker node computes gradient updates on its RDD and updates the parameters until the gradients converge, after which the slave node sends the partial parameters obtained from GAN learning back to the master node.
The basic idea of this GAN training scheme is parameter averaging: the input data are divided into several subsets within the Spark cluster, several copies of the GAN model are created and trained on these subsets simultaneously, and intermediate results are cached in memory to speed up model training. After a slave node finishes training its copy, it sends the computed parameter adjustments to the master node; the master node broadcasts the new parameters to the slave nodes for the next round of training, until Nash equilibrium is reached. Concretely, GAN learning consists of two main steps: gradient computation and parameter updating. In the first step, the learning algorithm traverses all batches independently and computes gradient updates, including rates of change, model parameters and so on. In the second step, the model parameters are updated by averaging the gradients computed over all data blocks. The iterative computation flow of deep learning on Spark therefore has two parts: data-set partitioning and distributed parallel learning of the GAN model.
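The parameter-averaging loop described above can be sketched on Spark's Scala API as follows. This is a minimal sketch, not the patent's actual code: the HDFS path, the parameter-vector size and the localTrain routine (which stands in for the real generator/discriminator updates of the Map step) are illustrative assumptions, and a real implementation would stop when Nash equilibrium is reached rather than after a fixed number of rounds.

```scala
import scala.util.Random
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object ParamAveragingSketch {
  // Flattened model parameters (generator + discriminator weights) as one vector.
  type Params = Array[Double]

  // Hypothetical Map step: train a local copy of the GAN on one data partition
  // and return the locally adjusted parameters (the real update rule is a stub here).
  def localTrain(init: Params, partition: Iterator[Array[Double]]): Params = {
    val updated = init.clone()
    // ... gradient updates of D and G over this partition's mini-batches ...
    updated
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("gan-param-averaging"))

    // Steps 2-3: training data stored on HDFS, read and cached as an RDD of samples.
    val data: RDD[Array[Double]] = sc.textFile("hdfs:///gan/train.txt")
      .map(_.split(",").map(_.toDouble))
      .cache()

    var params: Params = Array.fill(1000)(Random.nextGaussian() * 0.01)

    for (_ <- 1 to 50) {                          // a real loop would test for Nash equilibrium
      val bc = sc.broadcast(params)               // master -> slave broadcast of current parameters
      val (sum, n) = data
        .mapPartitions(p => Iterator(localTrain(bc.value, p)))   // Map step on each partition
        .map(p => (p, 1L))
        .reduce { case ((a, na), (b, nb)) =>      // Reduce step: sum partial parameter vectors
          (a.zip(b).map { case (x, y) => x + y }, na + nb)
        }
      params = sum.map(_ / n)                     // parameter averaging on the master
      bc.destroy()
    }
    sc.stop()
  }
}
```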
The invention innovates in the way the GAN model is trained when the data scale is large, in contrast to traditional single-machine approaches. Unlike distributed systems based on the Hadoop MapReduce computing framework, it avoids MapReduce's slow disk reads and writes and its unsuitability for iterative computation, and instead makes full use of Spark's in-memory computing framework, which is well suited to iterative computation, so that GAN training can be accelerated, efficiency improved, and good scalability achieved.
Drawings
Fig. 1: component composition of the Spark-based GAN model.
Fig. 2: data and model parallelization of the GAN.
Fig. 3: training flow of the Spark-based GAN model.
Fig. 4: master-node training workflow of the GAN model.
Fig. 5: slave-node training workflow of the GAN model.
Detailed Description
Referring to the drawings: Spark, on which the method is based, is a framework built on in-memory computation and iterative multi-batch processing. The Spark cluster consists of two main parts: one master node and four slave nodes. The master node initializes a Spark driver instance to manage the partial GAN learning models running on the slave nodes. In each iteration of GAN training, the slave nodes feed the partial parameters obtained from model learning back to the master node.
GAN training is carried out on the Spark distributed computing framework, and the overall distributed process is as follows: when GAN training begins, the massive data are first uploaded to HDFS (the Hadoop Distributed File System); when an Action is applied to an RDD, a GAN training job (Job) is generated and submitted by calling the SparkContext interface of the Spark platform; after receiving the Job, the SparkContext defines how to store the results of successfully executed tasks (Tasks), submits the Job to the eventProcessActor, waits for the eventProcessActor to report that execution has finished, and then returns the result to the master node; after the eventProcessActor receives the Job-submission event, the task scheduler generates the Tasks required for GAN training according to the Stage computation and distributes the training tasks to the slave nodes, which start computing in parallel; after each Task finishes, it reports its state and result to the eventProcessActor, which checks whether all Tasks of the Job have completed; if so, it notifies the SparkContext that the submitted Job has finished, and the SparkContext returns the result to the master node. The main steps are as follows:
Step 1: the master node randomly initializes the network configuration and generates the parameter set that determines the parameters of the GAN model. In standard GAN training, the parameters of the generator and of the discriminator are each updated by gradient descent while the parameters of the other model are held fixed. GAN training is the search for the Nash equilibrium of a two-player zero-sum game; most training methods minimize the cost functions of both players simultaneously by gradient descent, but because the GAN cost functions are non-convex and the parameter space is extremely high-dimensional, gradient descent is difficult to guarantee to converge.
Step 1.1: distributed training objective function of the GAN model
The real data used for GAN training are normalized so that every value lies in the range [0, 1]; the generator is trained to learn the real data distribution as closely as possible, while the discriminator is trained to distinguish whether a sample comes from the real data distribution or from the generator. GAN training alternates between the discriminator judging generated samples and real samples, and updating the generator weights while the discriminator weights are held fixed, with errors back-propagated through the discriminator. The optimization problem of GAN training is a minimax problem, and the objective function can be written as:
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]
where the differentiable functions D and G denote the discriminator and the generator respectively, x denotes the input real data, p_data(x) denotes the distribution of the real data set, D(x) denotes the probability that x comes from the real data rather than from generated data, z denotes a random noise variable, p_z(z) denotes its prior distribution, G(z) is a sample generated by G that follows the real data distribution as closely as possible, and E(·) denotes the expectation.
The GAN estimates the ratio of two probability-distribution densities. In the learning process, the discriminative model D is trained to maximize the accuracy with which it decides whether data come from the real data or from the generated distribution G(z), while the generative model G is trained to minimize log(1 - D(G(z))). The global optimum is reached if and only if the real data distribution and the generator's data distribution are equal. When training the GAN, within one round of parameter updates the parameters of D are typically updated k times before the parameters of G are updated once.
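The k-to-1 update schedule can be illustrated with the following sketch; the dGradient and gGradient functions are placeholders supplied by the caller and are not defined by the patent.

```scala
// One adversarial round: k ascent steps on the discriminator, one descent step on the generator.
def adversarialRound(
    dParams: Array[Double], gParams: Array[Double],
    realBatch: Array[Array[Double]], noiseBatch: Array[Array[Double]],
    k: Int, lr: Double,
    dGradient: (Array[Double], Array[Double], Array[Array[Double]], Array[Array[Double]]) => Array[Double],
    gGradient: (Array[Double], Array[Double], Array[Array[Double]]) => Array[Double]
): (Array[Double], Array[Double]) = {
  var d = dParams
  // k ascent steps on D: increase log D(x) + log(1 - D(G(z))) with G held fixed
  for (_ <- 1 to k) {
    val grad = dGradient(d, gParams, realBatch, noiseBatch)
    d = d.zip(grad).map { case (w, g) => w + lr * g }
  }
  // one descent step on G: decrease log(1 - D(G(z))) with D held fixed
  val gGrad = gGradient(d, gParams, noiseBatch)
  val g = gParams.zip(gGrad).map { case (w, gr) => w - lr * gr }
  (d, g)
}
```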
Step 1.2: distributed parallel training of the GAN model
As shown in Fig. 2, the distributed parallelization of the GAN covers both data and model parallelism: instead of each discriminator being trained adversarially against a single fixed generator (the generator of the same GAN), several GANs are trained simultaneously and each discriminator periodically judges samples produced by the generators of the other GANs. Any extended GAN model is suitable for this scheme, so different GAN variants can be assigned to the computing resources of different nodes in the cluster and computed in parallel. Such parallel adversarial training increases the number of modes each discriminator sees and thereby effectively mitigates the mode-collapse problem.
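A rough sketch of this multi-GAN exchange on Spark is shown below. It is a structural illustration only: generateSamples and trainCopy are stubs for the real GAN routines, an existing SparkContext sc is assumed (as in the earlier sketch), and the number of copies, pool size and exchange period are illustrative.

```scala
// One GAN copy per partition, reduced here to parameter vectors with stubbed update routines.
case class GanCopy(id: Int, dParams: Array[Double], gParams: Array[Double])

def generateSamples(gan: GanCopy, n: Int): Array[Array[Double]] =           // stub generator output
  Array.fill(n)(Array.fill(8)(scala.util.Random.nextGaussian()))

def trainCopy(gan: GanCopy, foreignSamples: Seq[Array[Double]]): GanCopy =  // stub adversarial update
  gan

val numCopies = 4
var gans = sc.parallelize(
  (0 until numCopies).map(i => GanCopy(i, Array.fill(100)(0.0), Array.fill(100)(0.0))),
  numCopies)

for (round <- 1 to 20) {
  // Every few rounds, collect a small pool of samples from every generator and broadcast it,
  // so each discriminator also judges the other generators' output.
  val pool: Seq[Array[Double]] =
    if (round % 5 == 0) gans.flatMap(g => generateSamples(g, 16)).collect().toSeq
    else Seq.empty
  val bcPool = sc.broadcast(pool)
  val next = gans.map(g => trainCopy(g, bcPool.value))
  next.persist()
  next.count()          // materialize this round before replacing the previous RDD
  gans.unpersist()
  gans = next
}
```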
Step 1.3: setting the initial parameters of the GAN model
The parameters are tuned mainly by repeatedly validating and analyzing the model's behaviour; the commonly used settings are as follows:
A. Learning rate. For the GAN learning model there is no fixed rule for setting the learning rate; in general it should not be set too large, otherwise the model easily loses accuracy or diverges, and the weight update is usually about one thousandth of the weight magnitude.
B. Initial values of the weights and biases. In general, the connection weights can be initialized with random numbers drawn from a normal distribution N(0, 0.01), and the biases of the hidden units and the bias Q of the first visible unit are typically initialized to 0.
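These settings can be expressed with the following short sketch; reading N(0, 0.01) as a standard deviation of 0.01 is an assumption, and the layer sizes are illustrative.

```scala
import scala.util.Random

// Weights ~ N(0, 0.01) (taken here as standard deviation 0.01), biases initialized to 0.
def initLayer(nIn: Int, nOut: Int, rng: Random = new Random(42)): (Array[Array[Double]], Array[Double]) = {
  val weights = Array.fill(nIn, nOut)(rng.nextGaussian() * 0.01)
  val biases  = Array.fill(nOut)(0.0)
  (weights, biases)
}

val (w, b) = initLayer(nIn = 784, nOut = 128)

// Learning rate chosen so that a weight update is roughly one thousandth of the weight magnitude.
val meanAbsWeight = w.flatten.map(math.abs).sum / (784 * 128)
val learningRate  = meanAbsWeight / 1000.0
```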
Step 2: the real data used for GAN training are uploaded directly to the Hadoop Distributed File System (HDFS) for storage.
Step 3: construct Spark Resilient Distributed Datasets (RDDs). The raw input data are divided into batches, and a batch of samples rather than a single sample is fed in at a time, so the other samples in the same batch can serve as auxiliary information for each sample. In addition, a similarity measure keeps overly similar samples out of the same batch, which preserves sample diversity in every round of training and effectively mitigates mode collapse. This allows the generator of the GAN to process batches of samples at a time rather than individual samples. The data are cached in the RDD cache of the Spark platform and then broadcast to the Spark worker nodes participating in the distributed processing of the GAN algorithm. This RDD-based design is critical for accelerating the training of the deep model, because the in-memory approach has far lower read/write latency than disk. Each Spark worker node is typically a single computer; when a node needs the data for computation it reads them locally instead of requesting them again, saving the time otherwise spent on data transfer between nodes. In particular, on large data sets the distributed data block RDD is operated on many times; once the RDD is cached, it is computed the first time the data subset is used, and later uses read the values directly from memory without recomputation.
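A possible realization of this batching, caching and broadcasting step is sketched below; the Euclidean distance, similarity threshold, batch size and HDFS path are illustrative choices not fixed by the patent, and an existing SparkContext sc is assumed.

```scala
import scala.collection.mutable.ArrayBuffer
import org.apache.spark.rdd.RDD

def distance(a: Array[Double], b: Array[Double]): Double =
  math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)

// Greedily pack samples into batches, never adding a sample to a batch that already
// contains a sample closer than minDist (metric and threshold are illustrative).
def makeBatches(samples: Seq[Array[Double]], batchSize: Int, minDist: Double): Seq[Array[Array[Double]]] = {
  val batches = ArrayBuffer.empty[ArrayBuffer[Array[Double]]]
  for (s <- samples) {
    batches.find(b => b.size < batchSize && b.forall(x => distance(x, s) >= minDist)) match {
      case Some(b) => b += s
      case None    => batches += ArrayBuffer(s)
    }
  }
  batches.map(_.toArray).toSeq
}

// Raw samples from HDFS (illustrative path), batched within each partition and cached in memory.
val raw: RDD[Array[Double]] =
  sc.textFile("hdfs:///gan/train.txt").map(_.split(",").map(_.toDouble))

val batchedRdd: RDD[Array[Array[Double]]] =
  raw.mapPartitions(it => makeBatches(it.toSeq, batchSize = 64, minDist = 0.1).iterator).cache()

// Current model parameters broadcast to every worker node participating in training.
val bcParams = sc.broadcast(Array.fill(1000)(0.0))
```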
Step 3.1: : defining InputSpilt of a Spark distributed computing framework, realizing a subclass of Java class InputSpilt according to API provided by Spark, customizing a subclass of Java class InputFormat class, and realizing a Java class getSpilts method of the subclass;
step 3.2: the SparkContext creates an RDD according to the file/directory and the optional fragment, firstly, an application program creates an instance of the SparkContext, if the instance is sc, and creates the RDD by using the created sc instance;
step 3.3: creating Hadoop RDD according to InputFormat and the like in Hadoop configuration;
step 3.4: defining a mode for reading InputSpilt data, and acquiring a Recordreader from an InputFormat according to Hadoop configuration and fragmentation to read data;
step 3.5: customizing a subclass of java class Recordreader, wherein the subclass is used for customizing a mode of reading fragment data and constructing a Spark distributed data set by combining the customized InputFormat in the step 3;
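The custom input classes of steps 3.1-3.5 might look as follows. This sketch simply delegates to Hadoop's line-based reader, so it shows the wiring rather than a real sample decoder; class names and the HDFS path are illustrative, and an existing SparkContext sc is assumed.

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.{InputSplit, RecordReader, TaskAttemptContext}
import org.apache.hadoop.mapreduce.lib.input.{FileInputFormat, LineRecordReader}

// Custom RecordReader (steps 3.4/3.5): delegates to Hadoop's line reader; a real
// implementation would decode one training sample per record.
class GanRecordReader extends RecordReader[LongWritable, Text] {
  private val delegate = new LineRecordReader()
  override def initialize(split: InputSplit, ctx: TaskAttemptContext): Unit = delegate.initialize(split, ctx)
  override def nextKeyValue(): Boolean = delegate.nextKeyValue()
  override def getCurrentKey(): LongWritable = delegate.getCurrentKey
  override def getCurrentValue(): Text = delegate.getCurrentValue
  override def getProgress(): Float = delegate.getProgress
  override def close(): Unit = delegate.close()
}

// Custom InputFormat (steps 3.1/3.3): getSplits is inherited from FileInputFormat and
// can be overridden to control how the training file is split across the cluster.
class GanInputFormat extends FileInputFormat[LongWritable, Text] {
  override def createRecordReader(split: InputSplit, ctx: TaskAttemptContext): RecordReader[LongWritable, Text] =
    new GanRecordReader()
}

// Step 3.2: build the RDD from the custom InputFormat (path is illustrative).
val samples = sc
  .newAPIHadoopFile[LongWritable, Text, GanInputFormat]("hdfs:///gan/train.txt")
  .map { case (_, line) => line.toString.split(",").map(_.toDouble) }
```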
Step 4: as shown in Fig. 3, for each training-data subset RDD the master node sends the network parameters, configuration and update state to all slave nodes. First, the distributed transformation is completed in the Transformation phase of the distributed data set RDD: the map transformation of the RDD is applied, and the output of each map is sorted by key. The RDD is partitioned according to the key of each record, and records with the same key are stored on the same node so that the two data sets can be joined efficiently; the values in the data set are reduced to concrete numbers, and the final result is stored as a SequenceFile under a specified HDFS path for the subsequent big data analysis with the GAN deep model. Processing then continues in the Action phase: the dependencies between RDDs form a DAG, and the GAN training Job is divided into several Stages, each containing several MapReduce-style iterations in which the output of one iteration becomes the input of the next and each iteration trains part of the GAN model. After a Stage is submitted, the task scheduler computes the required Tasks from the Stage and submits them to the Workers in the corresponding Spark cluster.
Step 4.1: a series of Transformation operations converts the original RDD into RDDs of other types.
The RDD conversion chain is HadoopRDD → MappedRDD → FlatMappedRDD → MappedRDD → PairRDDFunctions → ShuffledRDD → MapPartitionsRDD: a HadoopRDD is generated first and a map operation produces a MappedRDD; flatMap turns it into a FlatMappedRDD; building key-value pairs turns it back into a MappedRDD; in reduceByKey it is implicitly converted to PairRDDFunctions, and reduceByKey calls combineByKey, producing a ShuffledRDD; after mapPartitionsWithContext is called, the ShuffledRDD becomes a MapPartitionsRDD.
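The chain can be reproduced with a simple pipeline such as the one below (a word-count-style example, not GAN-specific). The comments name the internal RDD types of the older Spark versions the text refers to (newer versions use slightly different internal classes), the paths are illustrative, and an existing SparkContext sc is assumed.

```scala
val lines   = sc.textFile("hdfs:///gan/params.txt")           // HadoopRDD -> MappedRDD
val tokens  = lines.flatMap(_.split("\\s+"))                   // FlatMappedRDD
val pairs   = tokens.map(t => (t, 1))                          // MappedRDD of key/value pairs
val reduced = pairs.reduceByKey(_ + _)                         // implicit PairRDDFunctions -> ShuffledRDD
val result  = reduced.mapPartitions(_.map { case (k, v) => s"$k\t$v" })  // MapPartitionsRDD
result.saveAsTextFile("hdfs:///gan/output")                    // Action: triggers the Job
// (the text stores the final result as a SequenceFile; saveAsSequenceFile on the pair RDD
//  would be used for that instead of saveAsTextFile)
```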
Step 4.2: Job generation and execution
When an Action is applied to the transformed RDD, the Action is submitted as a Job and the runJob method of the SparkContext is called; the call path is roughly runJob → DAGScheduler.runJob → submitJob. In the DAGScheduler, submitJob creates a JobSubmitted event and sends it to the embedded eventProcessActor; on receiving JobSubmitted, the eventProcessActor calls the processEvent handler; the Job is converted into Stages, the finalStage is generated and submitted for execution, and submitStage is called; submitStage computes the dependencies between Stages, which are divided into wide and narrow dependencies; if the current Stage is found to have no dependencies, or all of its dependencies are already prepared, its Tasks are submitted; Task submission is completed by calling the submitMissingTasks function; the task scheduler manages the distribution of Tasks, and submitMissingTasks mainly calls the task scheduler's submitTasks;
Step 4.3: Task flow of GAN training
All slave nodes execute the same Map task in parallel on different partitions of the data blocks, so the computationally expensive GAN learning task is divided into several parallel subtasks. The initialization process is as follows: a SparkConf is first built from the initialization parameters, and a SparkEnv is then created from the SparkConf; the SparkEnv contains four key components: BlockManager, MapOutputTracker, ShuffleFetcher and ConnectionManager. A task scheduler is created, a SchedulerBackend is selected, and the task scheduler is started; specifically, the TaskScheduler instance is passed as a parameter to create the DAGScheduler, and the TaskSchedulerImpl creates the backend corresponding to the current run mode, which receives the ReviveOffers events sent by the TaskSchedulerImpl. The final processing logic takes place in the TaskRunner, which runs in an Executor; it packages the result of the computation into a MapStatus, which is then fed back to the DAGScheduler through a series of internal messages.
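The driver-side entry point that triggers this initialization sequence is ordinary SparkContext construction; the configuration values below are illustrative.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("gan-distributed-training")
  .setMaster("spark://master:7077")        // one master and four slave nodes, as in the example cluster
  .set("spark.executor.memory", "4g")

// Creating the SparkContext builds SparkEnv, the DAGScheduler and the TaskScheduler/backend.
val sc = new SparkContext(conf)
```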
Step 5: the learning result of each partial deep model depends on the smaller instances and data subsets assigned to that part of the learning problem. In the Spark distributed cluster, each slave node trains on part of the data and updates the parameters. After one iteration, all parameters and states are returned to the master node, which averages the parameters and updates the state of the neural network being trained. The averaged parameters are then sent to all slave nodes for further training. The GAN model is trained in parallel in a data-parallel manner, and this is repeated until the GAN reaches Nash equilibrium.
Step 5.1: master-node workflow of GAN training
As shown in Fig. 4, the master node first partitions the training data set as in step 3 and broadcasts the parameters to the slave nodes; the slave nodes train partial models and return their parameters to the master node; the master node combines the parts of the GAN model learned by the slave nodes and averages the learned parameters to rebuild the master deep model (also called the Reduce step). The averaged parameters are broadcast to every slave node, and the next iteration starts with the updated parameters, until Nash equilibrium is reached and training of the GAN model is complete.
Step 5.2: slave-node workflow of GAN training
The partial-model training process on the slave nodes is as follows: each slave node receives the parameters, network, configuration and other information broadcast by the master node, reads the partitioned data-set RDD created for it, and computes gradients; each worker node computes gradient updates on its RDD (also called the Map step) and updates the parameters until the gradients converge, after which the slave node sends the partial parameters obtained from GAN learning back to the master node.
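The slave-node Map step can be sketched as a mapPartitions function like the following; the gradient function, learning rate, tolerance and sweep limit are illustrative placeholders rather than the patent's actual update rule.

```scala
// Per-partition worker routine: refine a local copy of the broadcast parameters
// until the local updates converge, then hand the result back for the Reduce/averaging step.
def localGanTraining(
    broadcastParams: Array[Double],
    partition: Iterator[Array[Array[Double]]],
    gradient: (Array[Double], Array[Array[Double]]) => Array[Double],
    lr: Double = 1e-3,
    tol: Double = 1e-6,
    maxSweeps: Int = 100
): Iterator[Array[Double]] = {
  val params  = broadcastParams.clone()          // local copy of the broadcast parameters
  val batches = partition.toSeq
  var delta   = Double.MaxValue
  var sweep   = 0
  while (delta > tol && sweep < maxSweeps) {     // iterate until the gradient steps become small
    delta = 0.0
    for (batch <- batches) {
      val g = gradient(params, batch)
      for (i <- params.indices) {
        val step = lr * g(i)
        params(i) -= step
        delta = math.max(delta, math.abs(step))
      }
    }
    sweep += 1
  }
  Iterator(params)                               // partial parameters returned to the master
}

// Driver side (hypothetical gradient function myGanGradient):
// data.mapPartitions(p => localGanTraining(bc.value, p, myGanGradient))
```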

Claims (1)

1. A distributed implementation method for a generative adversarial network (GAN) based on the Spark framework, comprising the following steps:
step 1: the master node randomly initializes the network configuration and generates the parameter set, determining the parameters of the GAN model; during GAN training, the parameters of the generator and of the discriminator are each updated by gradient descent while the parameters of the other model are held fixed;
step 1.1: distributed training objective function of the GAN model:
the real data used for GAN training are normalized so that every value lies in the range [0, 1]; the generator is trained to learn the real data distribution as closely as possible; the discriminator is trained to distinguish whether a sample comes from the real data distribution or from the generator; GAN training alternates between the discriminator judging generated samples and real samples, and updating the generator weights while the discriminator weights are held fixed, with the errors back-propagated through the discriminator;
the objective function of GAN training is:
\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]
where the differentiable functions D and G denote the discriminator and the generator respectively, x denotes the input real data, p_data(x) denotes the distribution of the real data set, D(x) denotes the probability that x comes from the real data rather than from generated data, z denotes a random noise variable, p_z(z) denotes its prior distribution, G(z) is a sample generated by G that follows the real data distribution as closely as possible, and E(·) denotes the expectation;
step 1.2: distributed parallel training of the GAN model:
the distributed parallelization of the GAN covers both data and model parallelism; several GANs are trained simultaneously, and each discriminator periodically judges samples produced by the generators of the other GANs; this parallelization applies to any extended GAN model, so different GAN variants are assigned to the computing resources of different nodes in the cluster and computed in parallel;
step 1.3: setting the initial parameters of the GAN model
the parameters are tuned mainly by repeatedly validating and analyzing the model's behaviour; the parameter settings used are as follows:
A. learning rate: for the GAN learning model there is no fixed rule for setting the learning rate; the weight update is set to about one thousandth of the weight magnitude;
B. initial values of the weights and biases: the connection weights are initialized with random numbers drawn from a normal distribution N(0, 0.01), and the biases of the hidden units and the bias Q of the first visible unit are initialized to 0;
step 2: the massive data files are uploaded directly to the Hadoop Distributed File System (HDFS) for storage;
step 3: constructing Spark Resilient Distributed Datasets (RDDs):
the raw input data are divided into a number of batches, and a batch of samples rather than a single sample is fed in at a time; a similarity measure ensures that overly similar samples are not placed in the same batch; the data are cached in the RDD cache of the Spark platform and then broadcast to the Spark worker nodes participating in the distributed processing of the GAN algorithm;
step 3.1: define the InputSplit of the Spark distributed computing framework: implement a subclass of the Java class InputSplit according to the API provided by Spark, define a custom subclass of the Java class InputFormat, and implement its getSplits method;
step 3.2: the SparkContext creates an RDD from the file/directory and optional splits; the application first creates a SparkContext instance;
step 3.3: create a HadoopRDD from the InputFormat in the Hadoop configuration;
step 3.4: define how InputSplit data are read: obtain a RecordReader from the InputFormat according to the Hadoop configuration and the split, and use it to read the data;
step 3.5: define a custom subclass of the Java class RecordReader that specifies how split data are read, and combine it with the custom InputFormat of step 3.1 to build the Spark distributed data set;
step 4: for each training-data subset RDD, the master node sends the network parameters, configuration and update state to all slave nodes:
step 4.1: Transformation operations convert the original RDD into RDDs of other types:
first, the distributed transformation is completed in the Transformation phase of the distributed data set RDD: the map transformation of the RDD is applied, and the output of each map is sorted by key; the RDD is partitioned according to the key of each record, and records with the same key are stored on the same node so that the two data sets can be joined efficiently; the values in the data set are reduced to concrete numbers, and the final result is stored as a SequenceFile under a specified HDFS path for the subsequent big data analysis with the GAN deep model;
step 4.2: Job generation and execution:
processing continues in the Action phase, where the dependencies between RDDs form a DAG; the GAN training Job is divided into several Stages, each containing several MapReduce-style iterations in which the output of one iteration becomes the input of the next and each iteration trains part of the GAN model;
step 4.3: Task flow of GAN training:
after a Stage is submitted, the task scheduler computes the required Tasks from the Stage and submits them to the Workers in the corresponding Spark cluster; all slave nodes execute the same Map task in parallel on different partitions of the data blocks, so the computationally expensive GAN learning task is divided into parallel subtasks;
step 5: in the Spark distributed cluster, each slave node trains on part of the data and updates the parameters; after one iteration, all parameters and states are returned to the master node, which averages the parameters and updates the state of the neural network being trained; the averaged parameters are then sent to all slave nodes for further training; the GAN model is trained in parallel in a data-parallel manner, and this is repeated until the GAN reaches Nash equilibrium and training ends;
step 5.1: master-node workflow of GAN training:
the master node first partitions the training data set as in step 3 and broadcasts the parameters to the slave nodes; the slave nodes train partial models and return their parameters to the master node; the master node combines the parts of the GAN model learned by the slave nodes and averages the learned parameters to rebuild the master deep model (the Reduce step), then broadcasts the averaged parameters to every slave node and starts the next iteration with the updated parameters, until Nash equilibrium is reached and training of the GAN model is complete;
step 5.2: slave-node workflow of GAN training:
each slave node receives the parameters, network and configuration information broadcast by the master node, reads the partitioned data-set RDD created for it, and computes gradients; each worker node computes gradient updates on its RDD and updates the parameters until the gradients converge; during GAN training, within one round of parameter updates the parameters of the discriminator model D are updated k times before the parameters of the generator model G are updated once; the slave node sends the partial GAN-model parameters obtained from learning back to the master node.
CN201810047494.7A 2018-01-18 2018-01-18 Distributed implementation method for generating countermeasure network based on Spark framework Active CN108268638B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810047494.7A CN108268638B (en) 2018-01-18 2018-01-18 Distributed implementation method for generating countermeasure network based on Spark framework

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810047494.7A CN108268638B (en) 2018-01-18 2018-01-18 Distributed implementation method for generating countermeasure network based on Spark framework

Publications (2)

Publication Number Publication Date
CN108268638A CN108268638A (en) 2018-07-10
CN108268638B true CN108268638B (en) 2020-07-17

Family

ID=62775963

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810047494.7A Active CN108268638B (en) 2018-01-18 2018-01-18 Distributed implementation method for generating countermeasure network based on Spark framework

Country Status (1)

Country Link
CN (1) CN108268638B (en)

Families Citing this family (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3564883B1 (en) 2018-04-30 2023-09-06 Hewlett Packard Enterprise Development LP System and method of decentralized management of device assets outside a computer network
EP3564873B1 (en) 2018-04-30 2022-11-30 Hewlett Packard Enterprise Development LP System and method of decentralized machine learning using blockchain
EP3565218B1 (en) 2018-04-30 2023-09-27 Hewlett Packard Enterprise Development LP System and method of decentralized management of multi-owner nodes using blockchain
CN109104737B (en) * 2018-07-31 2020-10-09 北京航空航天大学 Cluster countervailing capacity evaluation method based on time-varying network
CN109410131B (en) * 2018-09-28 2020-08-04 杭州格像科技有限公司 Face beautifying method and system based on condition generation antagonistic neural network
CN109460708A (en) * 2018-10-09 2019-03-12 东南大学 A kind of Forest fire image sample generating method based on generation confrontation network
CN109523014B (en) * 2018-10-22 2021-02-02 广州大学 News comment automatic generation method and system based on generative confrontation network model
CN111105349B (en) * 2018-10-26 2022-02-11 珠海格力电器股份有限公司 Image processing method
CN109447263B (en) * 2018-11-07 2021-07-30 任元 Space abnormal event detection method based on generation of countermeasure network
CN109299781B (en) * 2018-11-21 2021-12-03 安徽工业大学 Distributed deep learning system based on momentum and pruning
CN109784494A (en) * 2018-11-28 2019-05-21 同盾控股有限公司 A kind of machine learning method and device based on pyspark
CN111274795B (en) * 2018-12-04 2023-06-20 北京嘀嘀无限科技发展有限公司 Vector acquisition method, vector acquisition device, electronic equipment and computer readable storage medium
CN111274348B (en) * 2018-12-04 2023-05-12 北京嘀嘀无限科技发展有限公司 Service feature data extraction method and device and electronic equipment
CN109800092A (en) 2018-12-17 2019-05-24 华为技术有限公司 A kind of processing method of shared data, device and server
CN109871995B (en) * 2019-02-02 2021-03-26 浙江工业大学 Quantum optimization parameter adjusting method for distributed deep learning under Spark framework
US11966818B2 (en) 2019-02-21 2024-04-23 Hewlett Packard Enterprise Development Lp System and method for self-healing in decentralized model building for machine learning using blockchain
CN110704221B (en) * 2019-09-02 2020-10-27 西安交通大学 Data center fault prediction method based on data enhancement
CN111091180B (en) * 2019-12-09 2023-03-10 腾讯科技(深圳)有限公司 Model training method and related device
US11748835B2 (en) 2020-01-27 2023-09-05 Hewlett Packard Enterprise Development Lp Systems and methods for monetizing data in decentralized model building for machine learning using a blockchain
US11218293B2 (en) 2020-01-27 2022-01-04 Hewlett Packard Enterprise Development Lp Secure parameter merging using homomorphic encryption for swarm learning
CN111522657B (en) * 2020-04-14 2022-07-22 北京航空航天大学 Distributed equipment collaborative deep learning reasoning method
CN111539519A (en) * 2020-04-30 2020-08-14 成都成信高科信息技术有限公司 Convolutional neural network training engine method and system for mass data
CN111625291B (en) * 2020-05-13 2023-06-27 北京字节跳动网络技术有限公司 Automatic iteration method and device for data processing model and electronic equipment
CN111985609A (en) * 2020-07-06 2020-11-24 电子科技大学 Data parallel optimization method based on TensorFlow framework
CN112685504B (en) * 2021-01-06 2021-10-08 广东工业大学 Production process-oriented distributed migration chart learning method
CN112801417A (en) * 2021-03-16 2021-05-14 贵州电网有限责任公司 Optimized model parallel defect material prediction method
CN113556247B (en) * 2021-06-25 2023-08-01 深圳技术大学 Multi-layer parameter distributed data transmission method, device and readable medium
CN114756385B (en) * 2022-06-16 2022-09-02 合肥中科类脑智能技术有限公司 Elastic distributed training method under deep learning scene

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107480772A (en) * 2017-08-08 2017-12-15 浙江大学 A kind of car plate super-resolution processing method and system based on deep learning
CN107563417A (en) * 2017-08-18 2018-01-09 北京天元创新科技有限公司 A kind of deep learning artificial intelligence model method for building up and system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104463324A (en) * 2014-11-21 2015-03-25 长沙马沙电子科技有限公司 Convolution neural network parallel processing method based on large-scale high-performance cluster
CN106570565A (en) * 2016-11-21 2017-04-19 中国科学院计算机网络信息中心 Depth learning method and system for big data
CN107038244A (en) * 2017-04-24 2017-08-11 北京北信源软件股份有限公司 A kind of data digging method and device, a kind of computer-readable recording medium and storage control

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107480772A (en) * 2017-08-08 2017-12-15 浙江大学 A kind of car plate super-resolution processing method and system based on deep learning
CN107563417A (en) * 2017-08-18 2018-01-09 北京天元创新科技有限公司 A kind of deep learning artificial intelligence model method for building up and system

Also Published As

Publication number Publication date
CN108268638A (en) 2018-07-10

Similar Documents

Publication Publication Date Title
CN108268638B (en) Distributed implementation method for generating countermeasure network based on Spark framework
Nguyen et al. Fast-convergent federated learning
Guo et al. Cloud resource scheduling with deep reinforcement learning and imitation learning
Wang et al. Distributed machine learning with a serverless architecture
CN107888669B (en) Deep learning neural network-based large-scale resource scheduling system and method
Yan et al. Performance modeling and scalability optimization of distributed deep learning systems
Kabiljo et al. Social hash partitioner: a scalable distributed hypergraph partitioner
CN110659678B (en) User behavior classification method, system and storage medium
US11907821B2 (en) Population-based training of machine learning models
CN109445386B (en) Cloud manufacturing task shortest production time scheduling method based on ONBA
CN112884236B (en) Short-term load prediction method and system based on VDM decomposition and LSTM improvement
CN109032630B (en) Method for updating global parameters in parameter server
CN113469372A (en) Reinforcement learning training method, device, electronic equipment and storage medium
CN111597230A (en) Parallel density clustering mining method based on MapReduce
Hirschberger et al. A variational EM acceleration for efficient clustering at very large scales
Ravikumar et al. Computationally efficient neural rendering for generator adversarial networks using a multi-GPU cluster in a cloud environment
CN112328332B (en) Database configuration optimization method for cloud computing environment
CN112035234B (en) Distributed batch job distribution method and device
CN113743614A (en) Method and system for operating a technical installation using an optimal model
CN112232401A (en) Data classification method based on differential privacy and random gradient descent
CN108334532A (en) A kind of Eclat parallel methods, system and device based on Spark
Malik et al. Detailed performance analysis of distributed Tensorflow on a gpu cluster using deep learning algorithms
Xu et al. Parallel and distributed asynchronous adaptive stochastic gradient methods
Heye Scaling deep learning without increasing batchsize
Li et al. Optimizing machine learning on apache spark in HPC environments

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant