CN116306543B - Form data generation method and system based on a generative adversarial network - Google Patents


Info

Publication number
CN116306543B
CN116306543B (application CN202310595962.5A)
Authority
CN
China
Prior art keywords
data
generator
adversarial network
regressor
generated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310595962.5A
Other languages
Chinese (zh)
Other versions
CN116306543A (en)
Inventor
李长林
陈燎
未伟
贾宁
崔润邦
孙洪贵
Current Assignee
Beijing Fantike Technology Co ltd
Tianjin University
Original Assignee
Beijing Fantike Technology Co ltd
Tianjin University
Priority date
Filing date
Publication date
Application filed by Beijing Fantike Technology Co ltd and Tianjin University
Priority to CN202310595962.5A
Publication of CN116306543A
Application granted
Publication of CN116306543B
Status: Active

Links

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/10: Text processing
    • G06F40/166: Editing, e.g. inserting or deleting
    • G06F40/177: Editing of tables; using ruled lines
    • G06F40/18: Editing of spreadsheets
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval of structured data, e.g. relational data
    • G06F16/21: Design, administration or maintenance of databases
    • G06F16/215: Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/088: Non-supervised learning, e.g. competitive learning
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application discloses a form data generation method and system based on a generative adversarial network. According to the technical scheme of the application, the method comprises the following steps: step S1) cleaning the data of the form to be generated; step S2) normalizing the cleaned data; step S3) inputting the normalized data into a pre-established, trained form data generation model to obtain the generated form data. The form data generation model is implemented on an improved generative adversarial network and comprises a generator and a regressor. The invention improves on the generative adversarial network by introducing a new component, a regressor, which converts the generator's output into the final generated data; in model training, a gradient penalty and random linear interpolation terms are introduced, which improve the model's learning speed and stability and avoid problems such as exploding gradients.

Description

Form data generation method and system based on a generative adversarial network
Technical Field
The present invention relates to the field of form data generation and, in particular, to a form data generation method and system based on a generative adversarial network.
Background
Form data is the most basic and common form of data; in engineering and in daily production a large amount of data exists as tables (for example, a bank's user information table or a table of the products each user holds). With the spread of informatization, more and more enterprises, researchers and managers choose to base management activities such as planning, organization, coordination, decision-making and control on data analysis. However, administrators who use machine learning on such data to make decisions run into problems of data quantity, quality, imbalance and privacy, which motivates the need to generate data.
Data quantity problem: in some fields data are not abundant, while the precondition for machine learning, and especially deep learning, to work well is a large amount of labeled data. If the original data can be expanded and enhanced, a better application effect can therefore be achieved with less original data.
Data quality problem: data quality problems are common today, such as outliers caused by erroneous entries in manually collected data. If the data distribution can be learned and the data then enhanced by sampling from that distribution, such quality problems can be better addressed.
Data imbalance problem: an imbalance of positive and negative samples, i.e. too little sample data of some class, causes many problems in downstream applications. Many practitioners respond by shrinking the data set; we consider data generation and enhancement the more fundamental way to solve the imbalance problem.
Data privacy problem: much data is sensitive information and, for privacy reasons, often difficult for researchers to access (or accessible only in small part). If "fake" data with the same statistical characteristics can be generated, this sensitivity problem is avoided.
Currently, form data generation mainly uses a statistical model or a deep learning model to learn the distribution of real data. Statistical methods fit a new tabular data set with a series of predefined probability distributions. For example, a Gaussian mixture model can model the joint distribution of several continuous columns, while Bayesian networks can model the joint distribution of discrete columns. However, this approach is severely limited by the data distribution and is not universally applicable when a data set mixes continuous and discrete columns; in that case the usual workaround is to discretize the continuous columns and then model them with a Bayesian network or a decision tree. Furthermore, statistical methods are computationally expensive, so such models are hard to apply to large data sets with thousands of columns and millions of rows. The other category is deep learning methods. The success of deep learning in computer vision and natural language processing has motivated many researchers to try it for form data generation. Deep models such as Variational Auto-Encoders (VAEs) and Generative Adversarial Networks (GANs) can learn complex high-dimensional distributions and generate high-quality samples, and have been widely applied to images and text. It is therefore quite plausible that a model built on these ideas can learn the implicit distribution of form data and then sample rows from it to obtain high-quality tables.
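As an illustration of the statistical-model approach described above, the sketch below fits a Gaussian mixture to two continuous columns with scikit-learn and samples synthetic rows from it. The column values and parameters are invented for the example.

```python
# Illustrative only: fit a Gaussian mixture model to two continuous
# columns and sample synthetic rows, as in the statistical-model
# approach to table generation described in the text.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# A fake "real" table: two continuous columns drawn from two modes.
real = np.vstack([
    rng.normal([0.0, 10.0], 1.0, size=(500, 2)),
    rng.normal([5.0, -3.0], 1.0, size=(500, 2)),
])

gmm = GaussianMixture(n_components=2, random_state=0).fit(real)
synthetic, _ = gmm.sample(200)  # 200 generated rows
```

As the text notes, this works only while the predefined distribution family matches the data; mixed continuous/discrete tables defeat it.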
However, while GANs can in principle model arbitrary distributions, current GAN-based models often perform worse than simple statistical models on columns with special properties, such as non-Gaussian continuous columns or non-uniformly distributed discrete columns.
How to implement form data generation simply and universally is therefore a technical problem to be solved in the art.
Disclosure of Invention
In view of this, the present application proposes a form data generation method based on a generative adversarial network, so as to achieve a better generation effect for different types of columns. By adding a regressor targeted at the normalization of the different data types, the model generates all column types well and realizes form data generation simply and universally.
According to one aspect of the present application, there is provided a form data generation method based on a generative adversarial network, the method comprising: step S1) cleaning the data of the form to be generated; step S2) normalizing the cleaned data; step S3) inputting the normalized data into a pre-established, trained form data generation model to obtain the generated form data; the form data generation model is implemented on an improved generative adversarial network and comprises a generator and a regressor.
Preferably, step S1) specifically comprises: checking that every record has all of its attributes, and deleting records containing null values.
Preferably, the cleaned data in step S2) comprise discrete columns and continuous columns, and the normalization specifically comprises:
encoding the discrete columns with one-hot encoding; normalizing the continuous columns with a variational Gaussian mixture; and then splicing the results together.
Preferably, the one-hot normalization of the discrete columns specifically comprises:
for the $i$-th element $T_{i,c_j}$ of the $c_j$-th discrete column, applying one-hot encoding to obtain a vector $d_{i,c_j}\in\{0,1\}^{d}$, where $d$ is the total number of categories in that column.
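The one-hot step above can be sketched minimally in numpy; the category list here is hypothetical.

```python
# Minimal sketch of the one-hot step: the i-th element of a discrete
# column with d categories becomes a length-d 0/1 vector.
import numpy as np

def one_hot(value, categories):
    """Encode one cell of a discrete column; `categories` fixes d and the order."""
    vec = np.zeros(len(categories), dtype=np.float32)
    vec[categories.index(value)] = 1.0
    return vec

cert_types = ["id_card", "passport", "other"]  # hypothetical category list
v = one_hot("passport", cert_types)            # d = 3 here
```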
Preferably, the variational-Gaussian-mixture normalization of the continuous columns specifically comprises:
setting, from the cleaned data, the number $K$ of Gaussian components and the prior variance of the Gaussian distributions;
obtaining with the CAVI algorithm the weight $\eta_k$ with which the $k$-th Gaussian is selected, its mean $\mu_k$ and variance $\sigma_k^2$, and the variational probability density $\rho_k$;
building from these, for the element $T_{i,c_j}$ of the $c_j$-th continuous column, the Gaussian mixture model
$$P(T_{i,c_j}) = \sum_{k=1}^{K} \eta_k\, \mathcal N\!\left(T_{i,c_j};\, \mu_k,\, \sigma_k^2\right),$$
where $\mathcal N(\cdot;\mu_k,\sigma_k^2)$ denotes the Gaussian distribution with mean $\mu_k$ and variance $\sigma_k^2$;
sampling one Gaussian from the $K$ variational probability densities and using it to normalize the element, the result being denoted $v_{i,c_j}$.
Preferably, the splicing is specifically:
the spliced representation of the $i$-th row is the concatenation, over all columns, of the one-hot vectors $d_{i,c_j}$ of the discrete columns and the normalized values $v_{i,c_j}$ of the continuous columns, where the symbol $\oplus$ denotes the vector concatenation operation.
Preferably, the generator G comprises, connected in sequence, a convolution layer, a LeakyReLU activation, a fully connected layer and a Tanh activation;
the regressor R comprises, connected in sequence, a fully connected layer, a Tanh activation, a batch normalization layer, a fully connected layer and a Sigmoid activation.
Preferably, during training the form data generation model further comprises, besides the generator G and the regressor R, a discriminator D, which comprises, connected in sequence: a convolution layer, a LeakyReLU activation, a pooling layer, a Flatten layer, a fully connected layer and a Sigmoid activation.
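To make the regressor's layer ordering concrete, here is a numpy sketch of R's forward pass (fully connected, Tanh, batch normalization in its batch-statistics form, fully connected, Sigmoid). Weights are random placeholders and all dimensions are hypothetical; a real implementation would use a deep learning framework.

```python
# Numpy sketch of the regressor R's forward pass as described in the text.
import numpy as np

rng = np.random.default_rng(1)
d_in, d_hidden, d_out = 16, 32, 16   # hypothetical layer sizes
W1, b1 = rng.normal(0, 0.1, (d_in, d_hidden)), np.zeros(d_hidden)
W2, b2 = rng.normal(0, 0.1, (d_hidden, d_out)), np.zeros(d_out)
gamma, beta = np.ones(d_hidden), np.zeros(d_hidden)  # batch-norm scale/shift

def regressor_forward(x, eps=1e-5):
    h = np.tanh(x @ W1 + b1)                           # fully connected + Tanh
    mu, var = h.mean(axis=0), h.var(axis=0)            # batch statistics
    h = gamma * (h - mu) / np.sqrt(var + eps) + beta   # batch normalization
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))        # fully connected + Sigmoid

out = regressor_forward(rng.normal(size=(8, d_in)))  # a batch of 8 generator outputs
```

The Sigmoid output keeps every generated component in (0, 1), matching the normalized encoding of the table rows.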
Preferably, the method further comprises a step of training the form data generation model, specifically:
step T1) establishing a training set;
step T2) setting the gradient-penalty coefficient $\lambda$, the number $n_d$ of discriminator iterations per generator iteration, the batch size $b$, the Adam hyperparameters $\alpha,\beta_1,\beta_2$ and the number of training epochs $E$;
step T3) traversing the $n_d$ iterations, updating the parameters of the discriminator D in each iteration;
step T4) updating the generator G once the $n_d$ iterations are done; if fewer than $E$ epochs have been trained, returning to step T3) to continue training; otherwise taking the trained generator and going to step T5);
step T5) traversing $E$ epochs while updating the parameters of the regressor R to obtain the trained regressor R; the trained generator G and regressor R then form the trained form data generation model.
Preferably, step T1) specifically comprises:
selecting data comprising basic user information, product holding information, asset information and/or transaction-flow information, and establishing the training set after cleaning and normalization.
Preferably, step T3) specifically comprises:
traversing the $n_d$ iterations and, in each iteration:
establishing the standard normal distribution $N(0,1)$, with mathematical expectation 0 and standard deviation 1, and drawing $b$ latent-vector samples $z_1,\dots,z_b$ from it;
taking $b$ data samples $x_1,\dots,x_b$ from the training set;
taking random numbers $\varepsilon$ in the range $[0,1]$;
traversing the $b$ latent vectors, taking one sample $z_i$ at a time, $i=1,\dots,b$, and obtaining the corresponding embedded vector $\tilde x_i = G(z_i)$ through the generator G;
obtaining the interpolation data by random linear interpolation, $\hat x_i = \varepsilon x_i + (1-\varepsilon)\tilde x_i$;
inputting the training samples $x_i$, embedded vectors $\tilde x_i$ and interpolation data $\hat x_i$ into the discriminator D;
updating the current discriminator parameters $w$ to new parameters $w'$ according to
$$w' = \mathrm{Adam}\!\left(\nabla_w \frac{1}{b}\sum_{i=1}^{b}\Big[D(\tilde x_i) - D(x_i) + \lambda\big(\lVert\nabla_{\hat x_i} D(\hat x_i)\rVert_2 - 1\big)^2\Big],\; w,\; \alpha,\beta_1,\beta_2\right),$$
where $\mathrm{Adam}$ denotes the Adam optimization algorithm, $D$ the discriminator, $\nabla_w$ the gradient with respect to the discriminator parameters $w$, $\lVert\cdot\rVert_2$ the two-norm and $\lambda$ the gradient-penalty coefficient.
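The gradient-penalty term in the discriminator update can be illustrated numerically. For a linear critic $D(x) = w\cdot x$ the gradient with respect to $x$ is exactly $w$, so the penalty $\lambda(\lVert\nabla_{\hat x}D(\hat x)\rVert_2-1)^2$ has a closed form; a real (nonlinear) discriminator would need automatic differentiation. All values below are toy data.

```python
# Numpy illustration of random linear interpolation plus the WGAN-style
# gradient penalty, for a linear critic D(x) = w.x whose input-gradient
# is w everywhere.
import numpy as np

rng = np.random.default_rng(2)
dim, b, lam = 8, 4, 10.0
w = rng.normal(size=dim)                   # linear critic parameters

x_real = rng.normal(size=(b, dim))         # training-set samples
x_fake = rng.normal(size=(b, dim))         # generator outputs ("embedded vectors")
eps = rng.uniform(0, 1, size=(b, 1))       # random interpolation weights in [0, 1]
x_hat = eps * x_real + (1 - eps) * x_fake  # random linear interpolation

grad_norm = np.linalg.norm(w)              # ||grad_x D(x_hat)||_2, same for every row here
penalty = lam * (grad_norm - 1.0) ** 2     # the gradient-penalty term
```

Driving this penalty toward zero pushes the critic's gradient norm toward 1 along the interpolation line, which is what stabilizes training and avoids exploding gradients.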
Preferably, step T4), updating the generator G once the $n_d$ iterations are done, specifically comprises:
once the $n_d$ iterations are done, drawing $b$ latent-vector samples $z_1,\dots,z_b$ from $N(0,1)$;
taking one sample $z_i$ at a time, $i=1,\dots,b$;
updating the current generator parameters $\theta$ to new parameters $\theta'$ according to
$$\theta' = \mathrm{Adam}\!\left(\nabla_\theta \frac{1}{b}\sum_{i=1}^{b} -D\big(G(z_i)\big),\; \theta,\; \alpha,\beta_1,\beta_2\right),$$
where $\nabla_\theta$ is the gradient with respect to the generator parameters $\theta$.
Preferably, step T5) specifically comprises:
step T5-1) traversing the training epochs $E$, repeating step T5-2) until epoch $E$ is reached, then going to step T5-3);
step T5-2) drawing $b$ latent-vector samples $z_1,\dots,z_b$ from the normal distribution $N(0,1)$ and obtaining $\tilde x_i = G(z_i)$;
traversing the $b$ latent vectors, taking one sample at a time, $i=1,\dots,b$, and obtaining the generated data $r_i = R(\tilde x_i)$ through the regressor R;
updating the regressor parameters;
step T5-3) obtaining the trained regressor R; the trained generator G and regressor R then form the trained form data generation model.
According to another aspect of the present application, there is provided a form data generation system based on a generative adversarial network, implemented according to the above form data generation method, the system comprising a cleaning module, a normalization module, a generation module and a form data generation model, wherein:
the cleaning module is used to clean the data of the form to be generated;
the normalization module is used to normalize the cleaned data;
the generation module is used to input the normalized data into the pre-established, trained form data generation model to obtain the generated form data;
the form data generation model is implemented on an improved generative adversarial network and comprises a generator and a regressor.
The application further provides a computer device comprising a memory, a processor and a computer program stored in the memory and runnable on the processor, wherein the processor, when executing the computer program, implements the above form data generation method.
The present application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the above form data generation method of the present application.
According to the technical scheme above, in the data-processing part continuous data are processed with a Bayesian Gaussian mixture and discrete data with one-hot encoding, which characterizes the data better and eases the subsequent model's work. The method uses and improves a leading unsupervised-learning technique of recent years, the generative adversarial network, introducing a new component, a regressor, alongside the generator and discriminator; the regressor converts the generator's output into the final generated data. In model training, a gradient penalty and random linear interpolation terms are introduced, which improve the model's learning speed and stability and avoid problems such as exploding gradients.
Additional features and advantages of the present application will be set forth in the detailed description which follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application, illustrate embodiments of the application and together with the description serve to explain the application. In the drawings:
FIG. 1 is a flow chart of a method for generating table data according to the present invention;
FIG. 2 is a diagram of a tabular data generation model;
FIG. 3 is a comparison of the certificate-type column distributions, where FIG. 3 (a) is the generated data and FIG. 3 (b) the real data;
FIG. 4 is a comparison of the held-product-count column distributions, where FIG. 4 (a) is the generated data and FIG. 4 (b) the real data;
FIG. 5 is a contour comparison of the joint distribution of gender and demand-deposit holding, where FIG. 5 (a) is the generated data and FIG. 5 (b) the real data;
FIG. 6 is a contour comparison of the joint distribution of industry and time-deposit holding, where FIG. 6 (a) is the generated data and FIG. 6 (b) the real data;
FIG. 7 is a comparison of the joint distribution of the highest-education, held-product-count and marital-status columns, where FIG. 7 (a) is the generated data and FIG. 7 (b) the real data.
Detailed Description
As shown in fig. 1, embodiment 1 of the present invention provides a form data generation method based on a generative adversarial network, comprising the following steps:
step one, obtaining a sample of form data to be generated.
Step two, data cleaning: remove records that do not meet the conditions (the form data under study must contain all the complete attributes).
Step three, on the data cleaned in step two, apply one-hot encoding to the discrete columns and normalize the continuous columns with a variational Gaussian mixture (Variational Gaussian Mixture).
Step four, take the data samples processed in step three, the latent variable $z$ (obeying the distribution $N(0,1)$) and the random number $\varepsilon$, and input them into the proposed generative adversarial network model (ReTGAN) for training.
Step five, after training, draw N vectors from the standard normal distribution and feed them to the network; the output of the regressor in ReTGAN is the required generated form data.
The technical solutions of the present application will be described in detail below with reference to the accompanying drawings in combination with embodiments.
Example 1
Embodiment 1 of the present invention provides a table data generation method based on a generation type countermeasure network.
The data set selected for study in this embodiment has a complex distribution with both discrete and continuous columns. It comprises 1980 attribute features such as basic user information, product holding information, asset information and transaction-flow information, where:
the basic information includes: customer number, birth date, gender, ethnicity, marital status, date of customer opening, highest school, work unit, industry, address information, certificate type, etc.
The product holding information includes: customer number (for association with other tables), data time, number of savings cards held, number of products held, demand deposits held, time deposits held, wealth-management products held, national debt held, funds held, insurance held, loans held, effective mobile banking held, effective online banking held, effective WeChat banking held, contracted payment accepted, foreign-currency savings held, SMS subscription held, effective savings card held, effective credit card held, effective social security card held, effective medical insurance account held, mobile banking held, online banking held, WeChat banking held, credit card held, social security card held, medical insurance account held, etc.
The asset information includes: customer numbering, data date, management asset balance, running period balance, periodic balance, financial balance, insurance balance, national debt balance, foundation balance, management asset average daily balance, running period average daily balance, periodic average daily balance, financial month average daily balance, insurance average daily balance, national debt average daily balance, foundation average daily balance, and the like.
The transaction flows include demand-deposit flows and time-deposit flows. The demand-deposit flow includes: serial number, customer number (for association with other tables), deposit account number, transaction time, transaction amount and balance.
The time-deposit flow includes: serial number, customer number (for association with other tables), deposit account number, purchase date, expiration date, transaction amount, product number, interest rate, term, etc.
It should be noted that the above data is merely an example, and is not limited thereto.
The data cover the period from October 1, 2019 to October 31, 2019, 9090 records in total. The specific implementation is as follows:
step one, acquiring a data set, which is expressed as:i,jrespectively row and column numbers. For example, the ith row of the table is denoted +.>
Step two, clean the data, deleting records containing null values so that every record contains all attributes.
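Step two amounts to dropping incomplete records; a minimal pandas sketch (column names invented for the example):

```python
# Minimal cleaning sketch: drop any record containing a null value so
# that every kept row has all of its attributes.
import pandas as pd

df = pd.DataFrame({
    "customer_no": [1, 2, 3, 4],
    "gender":      ["F", None, "M", "F"],
    "balance":     [120.5, 88.0, None, 30.2],
})
clean = df.dropna().reset_index(drop=True)  # keep only complete records
```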
Step three, for each column $c_j$ of the cleaned data and every element $T_{i,c_j}$: if the column is discrete, use one-hot encoding, so that an element of a discrete column is represented by the vector $d_{i,c_j}\in\{0,1\}^d$, where $d$ is the total number of categories of that column, i.e. the one-hot code is $d$-dimensional. If the column is continuous, normalize it with a variational Gaussian mixture (Variational Gaussian Mixture). The Gaussian mixture model is
$$P(T_{i,c_j}) = \sum_{k=1}^{K}\eta_k\,\mathcal N\!\left(T_{i,c_j};\,\mu_k,\,\sigma_k^2\right),$$
where $\eta_k$ is a weight giving the probability that the $k$-th Gaussian is selected, and $\mu_k$ and $\sigma_k^2$ are that Gaussian's mean and variance, with variational probability densities $\rho_k$. The model was trained with the Coordinate Ascent Variational Inference (CAVI) algorithm, the learned parameters being shown in Table 1.
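The variational-mixture step can be approximated with scikit-learn's `BayesianGaussianMixture`, which performs variational inference over a Gaussian mixture. This is not the exact CAVI routine of Table 1, but it is the same model family and yields the weights, means and variances needed for normalizing a continuous column; the data below are synthetic.

```python
# Hedged sketch of fitting a variational Gaussian mixture to one
# continuous column and reading off the learned mixture parameters.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(3)
# A synthetic bimodal continuous column, reshaped to (n_samples, 1).
col = np.concatenate([rng.normal(0, 1, 400), rng.normal(20, 2, 400)]).reshape(-1, 1)

vgm = BayesianGaussianMixture(n_components=5, random_state=0, max_iter=500).fit(col)
weights = vgm.weights_                 # eta_k: probability each Gaussian is selected
means = vgm.means_.ravel()             # mu_k
variances = vgm.covariances_.ravel()   # sigma_k^2
```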
TABLE 1 CAVI algorithm of variational Gaussian mixture model
Having obtained the variational probability densities $\rho_k$, weights $\eta_k$, means $\mu_k$ and variances $\sigma_k^2$, one Gaussian distribution is sampled from the $K$ variational probability densities and used for normalization, the result being denoted $v_{i,c_j}$. The discrete and continuous columns are then spliced: with the symbol $\oplus$ denoting the vector concatenation operation, a row is represented as the concatenation of its one-hot vectors $d_{i,c_j}$ and its normalized continuous values $v_{i,c_j}$.
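The splicing step reduces to vector concatenation; a numpy sketch with illustrative values:

```python
# The splicing (concatenation) step: a row's encoded representation is
# its one-hot discrete vectors joined with its normalized continuous
# scalars. Values here are illustrative only.
import numpy as np

onehot_cert = np.array([0.0, 1.0, 0.0])  # a one-hot encoded discrete cell
onehot_gender = np.array([1.0, 0.0])     # another discrete cell
norm_balance = np.array([0.37])          # a mixture-normalized continuous cell

row = np.concatenate([onehot_cert, onehot_gender, norm_balance])  # the "circled plus" operation
```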
step four, designing a generated countermeasure network model (regsan) for generating the table data, as shown in fig. 2. The generator G comprises a convolution layer, a LeakyReLu activation function, a full connection layer and a Tanh activation function which are connected in sequence; the regressor R comprises a full connection layer, a Tanh activation function, a batch standardization layer, a full connection layer and a sigmoid function which are connected in sequence. The discriminator D comprises a convolution layer, a LeakyReLu activation function, a pooling layer, a Flatten layer, a full connection layer and a Sigmoid activation function which are connected in sequence. In the training, the generator G, the discriminator D and the regressor R are trained, and when the training is actually used, the generator G and the regressor R are adopted to form a table data generation model. The specific training process is as follows:
TABLE 2 ReTGAN training procedure
In the table:
the standard normal distribution $N(0,1)$ has mathematical expectation 0 and standard deviation 1;
$\nabla_\theta$ is the gradient with respect to the generator parameters $\theta$, and $\nabla_w$ the gradient with respect to the discriminator parameters $w$.
Step five, after training, draw N vectors from the standard normal distribution, input them to the network, and obtain the generated data from the regressor R in ReTGAN.
the structural settings and super parameters of the generator G, the arbiter D, and the regressor R network are shown in table 3.
TABLE 3 network parameters of the RETGAN model
Given the trained form data generation model, the form data generation method comprises:
step one, cleaning the data of the form to be generated;
step two, normalizing the cleaned data;
step three, inputting the normalized data into the pre-established, trained form data generation model to obtain the generated form data.
Other common data normalization methods include, but are not limited to, Min-Max normalization, Z-score normalization and the like. Furthermore, different parameter settings within this framework include, but are not limited to, different numbers of training epochs, different numbers of neurons, different numbers of network layers, etc.
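The two alternative normalizations mentioned above, sketched in numpy: Min-Max rescales a column to [0, 1], and Z-score centers it to mean 0 and standard deviation 1.

```python
# Min-Max and Z-score normalization of one continuous column.
import numpy as np

col = np.array([10.0, 20.0, 30.0, 40.0])

min_max = (col - col.min()) / (col.max() - col.min())  # maps to [0, 1]
z_score = (col - col.mean()) / col.std()               # mean 0, std 1
```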
Example 2
Embodiment 2 of the present invention provides a form data generation system based on a generative adversarial network, implemented according to the method of embodiment 1, the system comprising a cleaning module, a normalization module, a generation module and a form data generation model, wherein:
the cleaning module is used to clean the data of the form to be generated;
the normalization module is used to normalize the cleaned data;
the generation module is used to input the normalized data into the pre-established, trained form data generation model to obtain the generated form data;
the form data generation model is implemented on an improved generative adversarial network and comprises a generator and a regressor.
Example 3
Embodiment 3 of the present invention may also provide a computer apparatus, including: at least one processor, memory, at least one network interface, and a user interface. The various components in the device are coupled together by a bus system. It will be appreciated that a bus system is used to enable connected communications between these components. The bus system includes a power bus, a control bus, and a status signal bus in addition to the data bus.
The user interface may include, among other things, a display, a keyboard, or a pointing device (e.g., a mouse, track ball, touch pad, or touch screen, etc.).
It is to be understood that the memory in the embodiments disclosed herein may be volatile memory or nonvolatile memory, or may include both. The nonvolatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM) or a flash memory. The volatile memory may be Random Access Memory (RAM), which acts as an external cache. By way of example and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM) and Direct Rambus RAM (DRRAM). The memory described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
In some implementations, the memory stores the following elements, executable modules or data structures, or a subset thereof, or an extended set thereof: an operating system and application programs.
The operating system includes various system programs, such as a framework layer, a core library layer, a driving layer, and the like, and is used for realizing various basic services and processing hardware-based tasks. Applications, including various applications such as Media Player (Media Player), browser (Browser), etc., are used to implement various application services. The program implementing the method of the embodiment of the present disclosure may be contained in an application program.
In the above embodiment, the processor may be further configured to call a program or an instruction stored in the memory, specifically, may be a program or an instruction stored in an application program:
the steps of the method of example 1 are performed.
The method of embodiment 1 may be applied to, or implemented by, a processor. The processor may be an integrated circuit chip having signal-processing capabilities. In implementation, the steps of the above method may be performed by integrated hardware logic circuits in a processor or by instructions in the form of software. The processor may be a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The various methods, steps and logic blocks disclosed in embodiment 1 may be implemented or performed thereby. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with embodiment 1 may be embodied directly in hardware, or in a combination of hardware and software modules in a processor. The software modules may be located in random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers or other storage media well known in the art. The storage medium is located in the memory, and the processor reads the information in the memory and, in combination with its hardware, performs the steps of the above method.
It is to be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or a combination thereof. For a hardware implementation, the processing units may be implemented within one or more application specific integrated circuits (Application Specific Integrated Circuit, ASIC), digital signal processors (Digital Signal Processor, DSP), digital signal processing devices (DSP Device, DSPD), programmable logic devices (Programmable Logic Device, PLD), field programmable gate arrays (Field-Programmable Gate Array, FPGA), general purpose processors, controllers, microcontrollers, microprocessors, other electronic units configured to perform the functions described herein, or a combination thereof.
For a software implementation, the techniques described herein may be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described herein. The software code may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.
Example 4
Embodiment 4 of the present invention further provides a nonvolatile storage medium for storing a computer program. When the computer program is executed by a processor, the steps of the above method embodiments are implemented.
Verification effect
The technical effect is evaluated from two angles. First, because the real data has a known probability distribution, the distribution of the generated data can be compared with that of the real data to assess how well the model learns the data distribution. Second, the generated data is used in a real machine learning task to evaluate its performance in a realistic scenario.
(1) Single-column distribution: 300 samples are generated, and the distributions of the generated and real data are plotted for the certificate type (CERTI_TYPE) column and the held-product count (PROD_CNT) column, as shown in fig. 3 and fig. 4. Fig. 3 compares the distribution of the certificate type column, where fig. 3 (a) is the generated data and fig. 3 (b) is the real data; fig. 4 compares the distribution of the held-product count (PROD_CNT) column, where fig. 4 (a) is the generated data and fig. 4 (b) is the real data. Taking fig. 4 as an example, the held-product count in the real data approximately follows a normal distribution, and the corresponding column in the generated data is very close to it, showing that the method learns the distribution of a single column well.
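The single-column comparison above can also be made quantitative with a two-sample Kolmogorov-Smirnov statistic. The sketch below is illustrative only: the "real" and "generated" columns are synthetic stand-ins drawn from nearby normal distributions, not the patent's data.

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of the two samples."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))

rng = np.random.default_rng(0)
real = rng.normal(loc=3.0, scale=1.0, size=300)        # stand-in for the real PROD_CNT column
generated = rng.normal(loc=3.1, scale=1.05, size=300)  # stand-in for generator output
print(f"KS statistic: {ks_statistic(real, generated):.3f}")  # small value -> similar distributions
```

A value near 0 indicates the generated column closely matches the real one; a value near 1 indicates the distributions are disjoint.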
(2) Multi-column joint distribution: 300 samples are generated, and joint-distribution contour maps are plotted for two column pairs: gender (SEX) versus whether a demand deposit is held, and industry (CORP_INDUX) versus whether a term deposit (TERM_FLAG) is held, as shown in fig. 5 and fig. 6. Fig. 5 compares the contours of the joint distribution of gender and demand-deposit holding, where fig. 5 (a) is the generated data and fig. 5 (b) is the real data; fig. 6 compares the contours of the joint distribution of the industry (CORP_INDUX) and term-deposit (TERM_FLAG) columns, where fig. 6 (a) is the generated data and fig. 6 (b) is the real data. A violin plot for three columns, education level (EDU_LEV), held-product count (PROD_CNT), and marital status (MARRIAGE), is shown in fig. 7, where fig. 7 (a) is the generated data and fig. 7 (b) is the real data.
(3) Real machine learning task: taking a financial risk-control scenario as an example, the label is a risk level and the features are 1980 columns of collected marketing-related data. With different random seeds, the data is split into training and test sets at a ratio of 7:3, and the AUC is computed as shown in table 4. The data generated by the method achieves good accuracy in this real machine learning scenario.
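The evaluation protocol above (7:3 split, score the test set, compute AUC) can be sketched as follows. The rank-sum AUC formula is standard; the toy feature table and the correlation-based scorer are purely illustrative assumptions, not the patent's 1980-column data or model.

```python
import numpy as np

def auc(y_true, scores):
    """AUC via the rank-sum (Mann-Whitney) formulation; assumes no tied scores."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = y_true == 1
    n_pos, n_neg = int(pos.sum()), int((~pos).sum())
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

rng = np.random.default_rng(7)
n = 1000
X = rng.normal(size=(n, 5))   # toy feature table (the patent uses 1980 columns)
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)

split = int(0.7 * n)          # the 7:3 train/test split described in the text
train_X, test_X = X[:split], X[split:]
train_y, test_y = y[:split], y[split:]

# toy scorer: weight each feature by its correlation with the training label
w = np.array([np.corrcoef(train_X[:, j], train_y)[0, 1] for j in range(X.shape[1])])
print(f"test AUC: {auc(test_y, test_X @ w):.3f}")
```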
Table 4 AUC results
The preferred embodiments of the present application have been described in detail above, but the present application is not limited to the specific details of the above embodiments, and various simple modifications may be made to the technical solutions of the present application within the scope of the technical concept of the present application, and all the simple modifications belong to the protection scope of the present application.
In addition, the specific features described in the foregoing embodiments may be combined in any suitable manner, and in order to avoid unnecessary repetition, various possible combinations are not described in detail.
Moreover, any combination of the various embodiments of the present application may be made without departing from the spirit of the present application, which should also be considered as the disclosure of the present invention.

Claims (15)

1. A method of generating tabular data based on a generative adversarial network, the method comprising:
step S1) cleaning the data of the table to be generated;
step S2) normalizing the cleaned data;
step S3) inputting the normalized data into a pre-established and trained tabular data generation model to obtain generated table data;
wherein the tabular data generation model is implemented based on an improved generative adversarial network and comprises a generator and a regressor;
the generator G comprises, connected in sequence, a convolution layer, a LeakyReLU activation function, a fully connected layer, and a Tanh activation function;
the regressor R comprises, connected in sequence, a fully connected layer, a Tanh activation function, a batch normalization layer, a fully connected layer, and a sigmoid function.
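Claim 1 names only the layer types and their order. A minimal numpy sketch of the regressor R's forward pass (fully connected → Tanh → batch normalization → fully connected → sigmoid) is given below; all layer widths are hypothetical, and the batch-norm layer is simplified to normalization over the batch axis without learned scale and shift.

```python
import numpy as np

rng = np.random.default_rng(0)

def batch_norm(h, eps=1e-5):
    # normalise each feature over the batch axis (no learned gamma/beta, for brevity)
    return (h - h.mean(axis=0)) / np.sqrt(h.var(axis=0) + eps)

def sigmoid(h):
    return 1.0 / (1.0 + np.exp(-h))

# hypothetical layer widths: 16 input features, 32 hidden units, 1 output
W1, b1 = rng.normal(scale=0.1, size=(16, 32)), np.zeros(32)
W2, b2 = rng.normal(scale=0.1, size=(32, 1)), np.zeros(1)

def regressor(x):
    """Regressor R from claim 1: FC -> Tanh -> BatchNorm -> FC -> sigmoid."""
    h = np.tanh(x @ W1 + b1)
    h = batch_norm(h)
    return sigmoid(h @ W2 + b2)

batch = rng.normal(size=(8, 16))   # a batch of 8 generated rows
out = regressor(batch)             # one sigmoid output per row, in (0, 1)
```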
2. The method for generating tabular data based on a generative adversarial network according to claim 1, wherein step S1) specifically comprises: checking the completeness of the data attributes and deleting records containing null values.
3. The method for generating tabular data based on a generative adversarial network according to claim 1, wherein the data cleaned in step S2) comprises discrete columns and continuous columns, and the normalization specifically comprises:
for discrete columns, applying one-hot encoding; for continuous columns, applying variational Gaussian mixture normalization; and then concatenating the results.
4. The method for generating tabular data based on a generative adversarial network according to claim 3, wherein the normalization of discrete columns by one-hot encoding specifically comprises:
for the i-th element c_{i,j} of the j-th discrete column C_j, applying one-hot encoding to obtain the vector d_{i,j}, where d represents the total number of categories of that discrete column.
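The one-hot step of claim 4 can be sketched directly; the category values below are hypothetical stand-ins for a certificate-type column.

```python
import numpy as np

def one_hot(column, categories):
    """Encode each element of a discrete column as a length-d indicator vector,
    where d is the total number of categories of that column."""
    index = {c: k for k, c in enumerate(categories)}
    out = np.zeros((len(column), len(categories)))
    for i, value in enumerate(column):
        out[i, index[value]] = 1.0
    return out

cert_type = ["ID", "PASSPORT", "ID", "OTHER"]              # toy stand-in for a CERTI_TYPE column
encoded = one_hot(cert_type, ["ID", "PASSPORT", "OTHER"])  # d = 3 categories
```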
5. The method for generating tabular data based on a generative adversarial network according to claim 4, wherein the normalization of continuous columns by variational Gaussian mixture specifically comprises:
setting, according to the cleaned data, the number of Gaussian distributions K and the prior mean and variance of the Gaussian distributions;
obtaining, by the CAVI algorithm, the weight mu_k, mean eta_k, and variance phi_k^2 of the k-th Gaussian distribution, together with the variational probability density rho_k;
establishing, for the i-th element c_{i,j} of the j-th continuous column C_j, the Gaussian mixture model
P(c_{i,j}) = sum_{k=1}^{K} mu_k N(c_{i,j}; eta_k, phi_k^2),
where N(c_{i,j}; eta_k, phi_k^2) is the Gaussian distribution with mean eta_k and variance phi_k^2;
sampling one of the K components from the variational probability density, and using the selected component k for normalization, expressed as
c'_{i,j} = (c_{i,j} - eta_k) / (4 phi_k).
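Assuming the CAVI step has already produced the component parameters, the per-value normalization of claim 5 can be sketched as below. The (c - eta_k) / (4 phi_k) scaling follows the mode-specific normalization common in CTGAN-style models and is our reading of the claim; the component values are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical output of the CAVI step for one continuous column (K = 2 components)
eta = np.array([1.0, 5.0])   # component means eta_k
phi = np.array([0.5, 1.0])   # component standard deviations phi_k

def normalise(c, rho):
    """Sample a component k from the variational density rho, then express the
    value relative to that component: c' = (c - eta_k) / (4 * phi_k)."""
    k = int(rng.choice(len(eta), p=rho))
    return (c - eta[k]) / (4.0 * phi[k]), k

value = 4.2
rho = np.array([0.1, 0.9])   # variational probability of each component for this value
c_norm, k = normalise(value, rho)
```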
6. The method for generating tabular data based on a generative adversarial network according to claim 5, wherein the concatenation specifically comprises:
the i-th row of data after concatenation, r_i, is obtained by concatenating the normalized continuous elements c'_{i,j} and the one-hot vectors d_{i,j} of row i:
r_i = c'_{i,1} (+) d_{i,1} (+) ... (+) c'_{i,j} (+) d_{i,j},
where the symbol (+) represents the vector concatenation operation.
7. The method for generating tabular data based on a generative adversarial network according to claim 1, wherein during training the tabular data generation model further comprises a discriminator D, the discriminator D comprising, connected in sequence, a convolution layer, a LeakyReLU activation function, a pooling layer, a Flatten layer, a fully connected layer, and a Sigmoid activation function.
8. The method of generating tabular data based on a generative adversarial network according to claim 7, wherein the method further comprises a training step for the tabular data generation model, specifically comprising:
step T1) establishing a training set;
step T2) setting the gradient penalty coefficient lambda, the number of discriminator iterations per generator iteration n_d, the batch size b, the Adam hyperparameters alpha, beta_1, beta_2, and the number of training epochs E;
step T3) traversing the n_d iterations, updating the parameters of discriminator D in each iteration;
step T4) when the number of iterations n_d is reached, updating generator G; when the number of training epochs E has not been reached, returning to step T3) to continue training; otherwise obtaining a trained generator and proceeding to step T5);
step T5) traversing the training epochs E, updating the parameters of regressor R to obtain a trained regressor R; the trained generator G and regressor R then form the trained tabular data generation model.
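The control flow of steps T2)-T5) can be sketched as a skeleton; the update functions are stubs that only count calls, and the epoch counts are hypothetical small values.

```python
# stub updates that only count how often each network is trained
calls = {"D": 0, "G": 0, "R": 0}

def update_discriminator():  # step T3): one discriminator update
    calls["D"] += 1

def update_generator():      # step T4): one generator update
    calls["G"] += 1

def update_regressor():      # step T5): one regressor update
    calls["R"] += 1

n_d, E_gan, E_reg = 5, 3, 2  # discriminator iterations per generator step, epochs (toy values)

for _ in range(E_gan):       # train the GAN for E epochs
    for _ in range(n_d):     # n_d discriminator updates per generator update
        update_discriminator()
    update_generator()

for _ in range(E_reg):       # then train the regressor against the frozen generator
    update_regressor()
```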
9. The method for generating tabular data based on a generative adversarial network according to claim 8, wherein step T1) specifically comprises:
selecting data comprising user basic information, product holding information, asset information and/or transaction flow information, and establishing the training set after cleaning and normalization.
10. The method for generating tabular data based on a generative adversarial network according to claim 8, wherein step T3) specifically comprises:
traversing the n_d iterations and performing the following in each iteration:
establishing a standard normal distribution N(0, 1) with mathematical expectation 0 and standard deviation 1, and drawing b latent vector samples z_1, ..., z_b from it;
drawing b data samples x_1, ..., x_b from the training set;
drawing a random number epsilon in the range [0, 1];
traversing the b latent vector samples, taking one sample z_i at a time, i = 1, ..., b, and obtaining the corresponding embedded vector x~_i = G(z_i) through generator G;
obtaining interpolated data x^_i by random linear interpolation of the embedded vectors: x^_i = epsilon * x_i + (1 - epsilon) * x~_i;
inputting the training-set samples x_i, the embedded vectors x~_i, and the interpolated data x^_i into discriminator D;
updating the current discriminator parameters w according to
L_i = D(x~_i) - D(x_i) + lambda * (||grad_{x^_i} D(x^_i)||_2 - 1)^2,
w <- Adam(grad_w (1/b) sum_{i=1}^{b} L_i, w, alpha, beta_1, beta_2),
obtaining the new discriminator parameters, where Adam(.) denotes the Adam optimization algorithm, D is the discriminator, grad_w is the gradient with respect to the discriminator parameters w, ||.||_2 denotes the two-norm, and lambda is the gradient penalty coefficient.
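The loss in claim 10 is the standard WGAN-GP critic objective. With a toy linear discriminator the input gradient is available in closed form, so the penalty term can be computed exactly; every numeric value here is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
lam = 10.0                    # gradient penalty coefficient (lambda)
w = rng.normal(size=4)        # toy linear discriminator D(x) = x @ w

def d(x):
    return x @ w

real = rng.normal(size=(8, 4))           # batch of real rows x_i
fake = rng.normal(size=(8, 4))           # batch of generator outputs x~_i
eps = rng.uniform(size=(8, 1))           # interpolation coefficients in [0, 1]
interp = eps * real + (1 - eps) * fake   # x^_i = eps * x_i + (1 - eps) * x~_i

# For a linear D, the input gradient at every interpolation point is w itself,
# so ||grad D(x^)||_2 is the same for all samples:
grad_norm = np.linalg.norm(w)
penalty = lam * (grad_norm - 1.0) ** 2

# per-batch critic loss: D(fake) - D(real) + gradient penalty
loss = float(np.mean(d(fake)) - np.mean(d(real)) + penalty)
```

The penalty pushes the discriminator's input gradient toward unit norm at the interpolation points, which is exactly the role of lambda in the claim.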
11. The method for generating tabular data based on a generative adversarial network according to claim 10, wherein updating generator G in step T4) when the number of iterations n_d is reached specifically comprises:
when the number of iterations n_d is reached, drawing b latent vector samples z_1, ..., z_b from the standard normal distribution N(0, 1);
taking one sample z_i at a time, i = 1, ..., b;
updating the current generator parameters theta according to
theta <- Adam(grad_theta (1/b) sum_{i=1}^{b} -D(G(z_i)), theta, alpha, beta_1, beta_2),
obtaining the new generator parameters, where grad_theta is the gradient with respect to the generator parameters theta.
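The generator update of claim 11 minimizes -mean(D(G(z))). With toy linear G and D the gradient has a closed form, so a single descent step can be verified to reduce the loss. The claim uses Adam; plain gradient descent is substituted here to keep the sketch short, and all shapes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
w_d = rng.normal(size=4)                    # frozen toy linear discriminator D(x) = x @ w_d
theta = rng.normal(scale=0.1, size=(4, 4))  # toy linear generator G(z) = z @ theta

def gen_loss(theta, z):
    """Generator objective: minimise -mean(D(G(z)))."""
    return float(-np.mean((z @ theta) @ w_d))

z = rng.normal(size=(16, 4))                # b = 16 latent samples from N(0, 1)

# closed-form gradient of the loss w.r.t. theta for the linear G above
grad = -np.outer(z.mean(axis=0), w_d)
theta_new = theta - 0.01 * grad             # one plain gradient-descent step
```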
12. The method for generating tabular data based on a generative adversarial network according to claim 11, wherein step T5) specifically comprises:
step T5-1) traversing the training epochs E, repeating step T5-2) until the number of training epochs E is reached, then proceeding to step T5-3);
step T5-2) drawing b latent vector samples z_1, ..., z_b from the standard normal distribution N(0, 1);
traversing the b latent vector samples, taking one sample z_i at a time, i = 1, ..., b, obtaining the generated data x~_i = G(z_i) and feeding it to regressor R;
updating the regressor parameters accordingly;
step T5-3) obtaining the trained regressor R; the trained generator G and regressor R then form the trained tabular data generation model.
13. A tabular data generation system based on a generative adversarial network, the system being implemented according to the method of any one of claims 1-12, the system comprising: a cleaning module, a normalization module, a generation module, and a tabular data generation model; wherein
the cleaning module is configured to clean the data of the table to be generated;
the normalization module is configured to normalize the cleaned data;
the generation module is configured to input the normalized data into a pre-established and trained tabular data generation model to obtain generated table data;
the tabular data generation model is implemented based on an improved generative adversarial network and comprises a generator and a regressor;
the generator G comprises, connected in sequence, a convolution layer, a LeakyReLU activation function, a fully connected layer, and a Tanh activation function;
the regressor R comprises, connected in sequence, a fully connected layer, a Tanh activation function, a batch normalization layer, a fully connected layer, and a sigmoid function.
14. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 12 when executing the computer program.
15. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program which, when executed by a processor, causes the processor to perform the method of any one of claims 1 to 12.
CN202310595962.5A 2023-05-25 2023-05-25 Form data generation method and system based on generation type countermeasure network Active CN116306543B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310595962.5A CN116306543B (en) 2023-05-25 2023-05-25 Form data generation method and system based on generation type countermeasure network


Publications (2)

Publication Number Publication Date
CN116306543A CN116306543A (en) 2023-06-23
CN116306543B true CN116306543B (en) 2023-07-28

Family

ID=86820759


Country Status (1)

Country Link
CN (1) CN116306543B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116775622B (en) * 2023-08-24 2023-11-07 中建五局第三建设有限公司 Method, device, equipment and storage medium for generating structural data

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2019056975A (en) * 2017-09-19 2019-04-11 株式会社Preferred Networks Improved generative adversarial network achievement program, improved generative adversarial network achievement device, and learned model generation method
CN110197514A (en) * 2019-06-13 2019-09-03 南京农业大学 A kind of mushroom phenotype image generating method based on production confrontation network
CN115357941B (en) * 2022-10-20 2023-01-13 北京宽客进化科技有限公司 Privacy removing method and system based on generating artificial intelligence



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant