CN115617882A - Time sequence diagram data generation method and system with structural constraint based on GAN - Google Patents

Publication number
CN115617882A
Authority
CN
China
Prior art keywords
sequence
data
graph
subgraph
time sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211638436.4A
Other languages
Chinese (zh)
Other versions
CN115617882B (en)
Inventor
李松
齐逸岩
刘力铭
幺宝刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Digital Economy Academy IDEA
Original Assignee
International Digital Economy Academy IDEA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Digital Economy Academy IDEA
Priority to CN202211638436.4A
Publication of CN115617882A
Application granted
Publication of CN115617882B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474 Sequence data queries, e.g. querying versioned data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2477 Temporal data queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T90/00 Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation

Abstract

The invention discloses a GAN-based method and system for generating time sequence diagram (timing graph) data with structural constraints. Based on a GAN network structure, a timing graph sequence generation network generates simulated timing graph sequence data, and real timing graph sequence data is obtained by sampling a real timing graph of a target field. A timing graph sequence discrimination network yields a first loss value between the simulated and the real timing graph sequence data, and comparing the subgraph distribution distance value of the simulated sequence data with that of the real sequence data yields a second loss value. The timing graph sequence generation network is optimized according to the first loss value and the second loss value, so that it learns the subgraph distribution of the real timing graph. Once trained, the network can generate high-quality timing graph data that approximates the real timing graph.

Description

Time sequence diagram data generation method and system with structural constraint based on GAN
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a method and a system for generating timing diagram data with structural constraint based on GAN.
Background
Many aspects of real life, such as interpersonal communication networks, chemical molecules, and biological information, can be represented as graphs. As graph computing technology matures, graph representation learning methods are increasingly applied in fields such as finance, recommendation, and medical treatment, particularly in finance.
In order to obtain a graph representation learning model with high prediction accuracy, a large amount of graph data is required to train the graph representation learning model; due to the sensitivity of financial data, it is difficult to acquire a large amount of real transaction data. Therefore, graph data generation methods are often employed to generate large amounts of simulated graph data to assist in the training of graph representation learning models.
Current static graph data generation methods are not suitable for timing graphs in fields such as finance, and existing methods do not consider how the distribution of a graph's temporal subgraphs changes during graph data generation. As a result, the generated graph data differs considerably from the real timing graph in subgraph distribution and is of low quality.
Thus, the prior art is in need of improvement and advancement.
Disclosure of Invention
The invention mainly aims to provide a GAN-based method, system, intelligent terminal, and storage medium for generating timing graph data with structural constraints, which address the problem that currently generated graph data differs considerably from the real timing graph in subgraph distribution and is of low quality.
In order to achieve the above object, a first aspect of the present invention provides a GAN-based timing graph data generation method with structural constraints, where the method includes:
acquiring noise data, inputting the noise data into a sequence generation network of a time sequence diagram, and generating simulated sequence data of the time sequence diagram for representing a target field;
sampling a real time sequence chart of a target field to obtain real time sequence chart sequence data;
inputting the simulation sequence diagram sequence data and the real sequence diagram sequence data into a sequence diagram sequence discrimination network to obtain a first loss value;
calculating and comparing a sub-graph distribution distance value of the simulated sequence data with a sub-graph distribution distance value of the real sequence data to obtain a second loss value for restricting sub-graph distribution;
obtaining a total loss value according to the first loss value and the second loss value;
optimizing model parameters of the sequence diagram sequence generation network until the total loss value meets set conditions, and obtaining a trained sequence diagram sequence generation network;
and inputting the noise data into the trained time sequence chart sequence generation network to obtain the time sequence chart data with the structural constraint.
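As a rough illustration of how the two loss values in the steps above might be combined, the sketch below uses a weighted sum and a simple threshold as the stopping condition. The source only states that the total loss is obtained "according to" the first and second loss values and that training stops when a "set condition" is met; the weighted-sum form, the threshold test, and the names `total_loss`, `should_stop`, and `structure_weight` are assumptions made for illustration.

```python
def total_loss(first_loss: float, second_loss: float,
               structure_weight: float = 1.0) -> float:
    """Combine the discriminator-based loss and the subgraph-distribution loss.

    Assumption: a weighted sum; the source does not specify the combination.
    """
    return first_loss + structure_weight * second_loss


def should_stop(loss: float, threshold: float) -> bool:
    """Hypothetical 'set condition': stop once the total loss is below a threshold."""
    return loss < threshold


if __name__ == "__main__":
    loss = total_loss(first_loss=0.5, second_loss=2.0, structure_weight=0.25)
    print(loss, should_stop(loss, threshold=1.5))
```

Other combinations (for example, scheduling `structure_weight` over training) would fit the same description equally well.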
Optionally, the simulation sequence data and the real sequence data are sequence data of a sequence diagram, and calculating a subgraph distribution distance value of the sequence data of the sequence diagram includes:
calculating a subgraph structure distance value corresponding to each class of subgraph structures in a preset subgraph structure class based on sequence data of the time sequence chart;
and accumulating all the subgraph structure distance values to obtain the subgraph distribution distance value.
Optionally, calculating a subgraph structure distance value corresponding to the subgraph structure based on the sequence data of the time chart, including:
counting the number of the subgraph structures in sequence data of the time sequence chart to obtain the number of predicted subgraphs;
and subtracting the number of the predicted subgraphs from the number of the real subgraphs, and then squaring to obtain the subgraph structure distance value.
Optionally, the accumulating all sub-graph structure distance values to obtain the sub-graph distribution distance value includes:
acquiring the weight corresponding to each type of sub-graph structures in the preset sub-graph structure type;
and based on the weight, performing weighted accumulation on subgraph structure distance values corresponding to all classes of subgraph structures to obtain the subgraph distribution distance value.
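The subgraph distance computation described above can be sketched as follows: for each subgraph class, the squared difference between the real count and the predicted count gives the subgraph structure distance, and a (optionally weighted) sum over classes gives the subgraph distribution distance. The class names `"triangle"` and `"star"`, the dictionary representation of counts, and the function names are hypothetical; counting the subgraphs themselves is outside this sketch.

```python
from typing import Mapping, Optional


def subgraph_structure_distance(predicted_count: int, real_count: int) -> float:
    """Squared difference between the real and predicted counts of one subgraph class."""
    return float((real_count - predicted_count) ** 2)


def subgraph_distribution_distance(
    predicted: Mapping[str, int],
    real: Mapping[str, int],
    weights: Optional[Mapping[str, float]] = None,
) -> float:
    """Accumulate per-class distances; use per-class weights when they are given."""
    total = 0.0
    for name, real_count in real.items():
        weight = 1.0 if weights is None else weights[name]
        # A class missing from the predicted counts is treated as count 0.
        total += weight * subgraph_structure_distance(predicted.get(name, 0), real_count)
    return total


if __name__ == "__main__":
    real_counts = {"triangle": 5, "star": 3}
    predicted_counts = {"triangle": 3, "star": 4}
    print(subgraph_distribution_distance(predicted_counts, real_counts))
    print(subgraph_distribution_distance(
        predicted_counts, real_counts, {"triangle": 0.5, "star": 2.0}))
```

In the described method the weights would come from the LSTM-based gating network followed by a fully connected layer; here they are simply passed in.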
Optionally, a gating network constructed based on LSTM is further provided, and the obtaining of the weight corresponding to each type of sub-graph structure in the preset sub-graph structure category includes:
inputting the real sequence diagram sequence data into the gating network to obtain a weight vector;
and inputting the weight vector into a full-connection layer to obtain the weight corresponding to each type of sub-graph structure.
Optionally, the sequence data of the timing graph includes several triple data describing the edges of the timing graph, and the triple data includes a start node, a stop node and a timestamp constituting the edges of the timing graph.
Optionally, the sequence diagram generation network is constructed based on an LSTM model, and a time constraint module is further disposed in the sequence diagram generation network, and the time constraint module is configured to constrain timestamps in the triple data according to a time sequence.
A second aspect of the present invention provides a GAN-based timing diagram data generation system with structural constraints, wherein the system comprises:
the timing graph sequence generation module is used for acquiring noise data and either inputting the noise data into the trained timing graph sequence generation network to obtain timing graph data with structural constraints, or inputting the noise data into the timing graph sequence generation network to generate simulated timing graph sequence data representing a timing graph of the target field;
the sequence sampling module of the time sequence is used for sampling a real time sequence chart of the target field to obtain sequence data of the real time sequence chart;
the sequence distinguishing module of the time sequence diagram is used for inputting the sequence data of the simulated time sequence diagram and the sequence data of the real time sequence diagram into the sequence distinguishing network of the time sequence diagram to obtain a first loss value;
the subgraph distribution constraint module is used for calculating and comparing a subgraph distribution distance value of the simulated sequence data of the time sequence diagram with a subgraph distribution distance value of the real sequence data of the time sequence diagram to obtain a second loss value for constraining subgraph distribution;
the optimization module is used for obtaining a total loss value according to the first loss value and the second loss value; and optimizing the model parameters of the sequence diagram sequence generation module until the total loss value meets the set condition to obtain the trained sequence diagram sequence generation module.
Optionally, the subgraph distribution constraint module further includes a gated neural network, and the gated neural network is configured to obtain a weight corresponding to each class of subgraph structure in the preset subgraph structure class according to the real sequence data of the time sequence diagram.
A third aspect of the present invention provides an intelligent terminal, where the intelligent terminal includes a memory, a processor, and a GAN-based timing graph data generating program with structural constraints, stored in the memory and executable on the processor, and when the GAN-based timing graph data generating program with structural constraints is executed by the processor, the intelligent terminal implements any one of the steps of the GAN-based timing graph data generating method with structural constraints.
A fourth aspect of the present invention provides a computer-readable storage medium, where a GAN-based timing graph data generation program with structural constraint is stored, and when executed by a processor, the GAN-based timing graph data generation program with structural constraint implements any one of the steps of the GAN-based timing graph data generation method with structural constraint.
As can be seen from the above, compared with the prior art, the method is based on a GAN network structure. The timing graph sequence generation network generates simulated timing graph sequence data, and a real timing graph of the target field is sampled to obtain real timing graph sequence data. The timing graph sequence discrimination network yields a first loss value between the simulated and the real sequence data, and comparing the subgraph distribution distance value of the simulated sequence data with that of the real sequence data yields a second loss value. The timing graph sequence generation network is optimized according to the first loss value and the second loss value, so that during training it learns the subgraph distribution of the real timing graph. The trained timing graph sequence generation network can therefore generate high-quality timing graph data that approximates the real timing graph.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise.
FIG. 1 is a block diagram of a data generation architecture of a timing graph with structural constraints based on GAN according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a method for generating timing diagram data with structural constraints based on GAN according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a sequence data generating network of the embodiment of FIG. 2;
FIG. 4 is a schematic diagram of the encoding and decoding of the embodiment of FIG. 2 for consecutive time stamp data;
FIG. 5 is a diagram of nine classes of sub-graph structures for the embodiment of FIG. 2;
FIG. 6 is a detailed flowchart of step S400 in the embodiment of FIG. 2;
FIG. 7 is a schematic structural diagram of a GAN-based timing diagram data generation system with structural constraints according to an embodiment of the present invention;
FIG. 8 is a schematic block diagram of an internal structure of an intelligent terminal according to an embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, models, structures, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when …" or "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted depending on the context to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings of the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways than those specifically described and will be readily apparent to those of ordinary skill in the art without departing from the spirit of the present invention, and therefore the present invention is not limited to the specific embodiments disclosed below.
With the development of artificial intelligence, graph representation learning methods are widely applied in many areas of life, such as finance, recommendation, and medical treatment. How realistic the graph data used to train a graph representation learning model is has a great influence on the prediction accuracy of that model.
Current static graph data generation methods are not suitable for timing graphs in fields such as finance, and existing methods do not consider how the distribution of a graph's temporal subgraphs changes during graph data generation. As a result, the generated graph data differs considerably from the real timing graph in subgraph distribution and is of low quality, and a graph representation learning model trained on such generated data predicts poorly.
To solve these problems in the prior art, the invention provides a timing graph data generation method with structural constraints based on a GAN (Generative Adversarial Network). During timing graph data generation, the method fully considers the subgraph structure distribution in the graph and makes it approach that of the real graph data of the target field.
Exemplary method
This embodiment of the invention provides a GAN-based method for generating timing graph data with structural constraints, used here to generate timing graph data in the financial field; FIG. 1 is an architecture block diagram of the embodiment. During training, the timing graph sequence generation module generates a temporal random walk sequence from input noise data. This sequence, together with a temporal random walk sequence sampled from real graph data, is input into the timing graph sequence discrimination module, which judges whether the generated sequence comes from the real graph data. The subgraph distribution constraint module constrains the subgraph distribution of the temporal random walk sequences generated by the generation module to approach the subgraph distribution in the real graph data, according to subgraph patterns predefined for the specific scenario (the predefined subgraph structures in FIG. 1). After training, inputting noise data into the trained timing graph sequence generation module yields timing graph data with structural constraints. Specifically, as shown in FIG. 2, this embodiment includes the following steps:
Step S100: acquiring noise data, inputting the noise data into a timing graph sequence generation network, and generating simulated timing graph sequence data representing a timing graph of the target field;
Specifically, in a GAN, noise data that follows a given distribution is generally used to generate the target data, which in the present invention is the simulated timing graph sequence data. The noise data is described as $z \sim p(z)$, where $z \in \mathbb{R}^h$, $h$ is the vector length of the hidden layer, and $p(z)$ is the distribution of the noise data, generally a uniform distribution or a Gaussian distribution. That is, the noise data is an $h$-dimensional real vector drawn from a uniform or Gaussian distribution.
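A minimal sketch of drawing such a noise vector, using only the Python standard library, might look like the following. The function name `sample_noise`, the standard-normal parameters, and the uniform range [-1, 1] are assumptions; the source only says the noise is an h-dimensional real vector from a uniform or Gaussian distribution.

```python
import random
from typing import List, Optional


def sample_noise(h: int, distribution: str = "gaussian",
                 rng: Optional[random.Random] = None) -> List[float]:
    """Return an h-dimensional real noise vector.

    Assumptions: standard normal for "gaussian", range [-1, 1] for "uniform".
    """
    rng = rng or random.Random()
    if distribution == "gaussian":
        return [rng.gauss(0.0, 1.0) for _ in range(h)]
    if distribution == "uniform":
        return [rng.uniform(-1.0, 1.0) for _ in range(h)]
    raise ValueError(f"unsupported distribution: {distribution!r}")


if __name__ == "__main__":
    print(sample_noise(4, "uniform", random.Random(0)))
```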
Because the present invention targets timing graphs, timing graph sequence data is used to characterize them. The timing graph sequence data can be represented as a temporal random walk sequence $(x, (u_1, v_1, t_1), \ldots, (u_L, v_L, t_L), y)$, in which $x$ indicates whether the sequence is a start sequence: $x$ is 1 when the sequence is the start sequence of a temporal random walk, and 0 for other sequences. The temporal random walk sequence comprises a number of triples, and each triple $(u_i, v_i, t_i)$ represents one edge in the timing graph: $u_i$ and $v_i$ are the start node and the end node of the edge, and $t_i$ is the timestamp of the edge (used to store the start or end time, duration, etc. of the edge). $L$ is the length of the timing graph sequence data, i.e. the total number of edges characterizing the timing graph. $y$ indicates whether the sequence terminates: $y$ is 1 when the sequence terminates, and 0 otherwise.
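The sequence layout described above (a start flag, a list of edge triples, and an end flag) can be sketched as a small data structure. The class and field names below (`TemporalEdge`, `TemporalWalk`, `is_start`, `is_end`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TemporalEdge:
    """One triple of the sequence: start node, end node, timestamp."""
    start: int
    end: int
    timestamp: float


@dataclass
class TemporalWalk:
    """A temporal random walk: start flag, edge triples, end flag."""
    is_start: int              # 1 if this is a start sequence, else 0
    edges: List[TemporalEdge]  # the triples describing timing graph edges
    is_end: int                # 1 if the sequence terminates, else 0

    @property
    def length(self) -> int:
        """Number of edges in the walk."""
        return len(self.edges)


if __name__ == "__main__":
    walk = TemporalWalk(1, [TemporalEdge(0, 1, 0.5), TemporalEdge(1, 2, 1.25)], 0)
    print(walk.length, walk.edges[0])
```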
In this embodiment, the timing graph sequence generation module is the timing graph sequence generation network, which corresponds to the generator in a generative adversarial network. After the noise data is input into the timing graph sequence generation network, the network outputs simulated timing graph sequence data capable of representing a timing graph of the financial field; that is, the timing graph of the financial field can be obtained by parsing the simulated timing graph sequence data.
When processing time series data, the Long Short-Term Memory model (LSTM) can selectively retain important information in the sequence according to the sequence characteristics and ignore irrelevant information, which improves its ability to process time series data and allows it to generate such data better. Preferably, in this embodiment the timing graph sequence generation network is built on the LSTM model. Of course, the network can also be built on other existing memory network models, such as an RNN.
The output of the timing graph sequence generation network is a temporal random walk sequence. The data in the sequence fall into two categories: continuous timestamp data $t_i$, and discrete data, namely the start flag $x$, the end flag $y$, the start node $u_i$ and the end node $v_i$. Here $x$ and $y$ are binary data (with value 0 or 1), and $u_i$ and $v_i$ are N-class categorical data (N is the total number of nodes in the timing graph); for example, $u_i$ = 00100... denotes that $u_i$ is the third node. The timing graph sequence generation network therefore includes both decoding and encoding of the discrete values and decoding and encoding of the continuous values.
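The N-class encoding of a node mentioned above (for example, 00100... for the third node) is a one-hot vector. A small sketch, with hypothetical function names and zero-based node indices (so the third node has index 2), is:

```python
from typing import List, Sequence


def one_hot(node_index: int, num_nodes: int) -> List[int]:
    """Encode a zero-based node index as an N-class one-hot vector."""
    if not 0 <= node_index < num_nodes:
        raise ValueError("node_index out of range")
    return [1 if i == node_index else 0 for i in range(num_nodes)]


def decode_one_hot(vector: Sequence[int]) -> int:
    """Recover the zero-based node index as the position of the largest entry."""
    return max(range(len(vector)), key=vector.__getitem__)


if __name__ == "__main__":
    code = one_hot(2, 5)   # the third node of a 5-node graph
    print(code, decode_one_hot(code))
```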
Referring to FIG. 3, when the discrete data (the start flag $x$, the end flag $y$, and the nodes $u_i$, $v_i$) are encoded and decoded, the decoder of the timing graph sequence generation network first maps the output of the LSTM cell and then uses the Gumbel-Max reparameterization technique to generate discrete class values. Gumbel-Max reparameterization provides a method for sampling from a categorical distribution; it is a conventional technique in the art and is not described further here. Optionally, Gumbel-Softmax may also be used for reparameterization.
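For readers unfamiliar with the Gumbel-Max technique named above, the following is a minimal standard-library sketch of it: independent standard Gumbel noise is added to each unnormalized log-probability (logit), and the index of the largest noisy value is returned, which yields a sample from the corresponding categorical distribution. This illustrates the general technique only, not the decoder of the patent; the function names and the clamping constant are assumptions.

```python
import math
import random
from typing import Optional, Sequence


def sample_gumbel(rng: random.Random) -> float:
    """Draw one standard Gumbel sample by inverse transform: -log(-log(U))."""
    u = rng.random()
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep U strictly inside (0, 1)
    return -math.log(-math.log(u))


def gumbel_max_sample(logits: Sequence[float],
                      rng: Optional[random.Random] = None) -> int:
    """Sample a class index from softmax(logits) using the Gumbel-Max trick."""
    rng = rng or random.Random()
    noisy = [logit + sample_gumbel(rng) for logit in logits]
    return max(range(len(noisy)), key=noisy.__getitem__)


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [gumbel_max_sample([0.0, 1.0, 2.0], rng) for _ in range(1000)]
    # Classes with larger logits should be drawn more often.
    print([draws.count(k) for k in range(3)])
```

Gumbel-Softmax replaces the hard argmax with a temperature-scaled softmax so that the sample becomes differentiable, which is why it is commonly used when gradients must flow through the sampling step.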
Taking the discrete end-node data $v_i$ as an example, the decoder is given by the following expression:
[Decoder equations for $v_i$: formula images not preserved in this text.]
The expression involves a mapping matrix and a bias term applied to the LSTM cell output, a vector formed by independent and identically distributed standard Gumbel samples, and a "temperature" hyperparameter; the discrete value $v_i$ is taken at the maximum entry of the resulting vector. The discrete start-node data $u_i$ is processed in the same way as $v_i$.
The discrete data $x$ and $y$ (the start flag and the end flag) are decoded in the same way as the node data, with the following expressions:
[Decoder equations for $x$ and $y$: formula images not preserved in this text.]
The result output by the decoder is then input to the encoder. In the encoder, the nodes $u_i$, $v_i$ and the start flag $x$ are each encoded into a vector, and these vectors are input into the next LSTM cell. Each vector is obtained by a Dense operation, that is, a Dense layer (fully connected neural network layer) maps each of the scalars $u_i$, $v_i$ and $x$ to its corresponding vector.
FIG. 4 illustrates the encoding and decoding process for continuous timestamp data. In the decoder (Decode), a deconvolution layer first expands the output vector of the LSTM cell into a matrix whose dimensions are those of the deconvolution layer output; one or more row vectors are then uniformly sampled from this matrix and averaged to obtain a mean vector; finally, a Dense layer (fully connected neural network layer) maps the mean vector to a continuous scalar $t_i$. In the encoder (Encode), another Dense layer maps the scalar $t_i$ to a vector. After these two Dense mappings, the continuous timestamp data $t_i$ finally becomes a vector that is input to the next LSTM cell.
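The last two decoder steps for timestamps (uniformly sampling rows of a matrix, averaging them, and mapping the mean vector to a scalar with a fully connected layer) can be sketched in plain Python as below. The deconvolution layer is not modelled; the matrix is taken as given. The function names, the choice of sampling without replacement, and the example weights and bias are assumptions for illustration.

```python
import random
from typing import List, Optional, Sequence


def mean_of_sampled_rows(matrix: Sequence[Sequence[float]], num_rows: int,
                         rng: Optional[random.Random] = None) -> List[float]:
    """Uniformly sample `num_rows` rows (without replacement) and average them."""
    rng = rng or random.Random()
    rows = rng.sample(list(matrix), num_rows)
    width = len(rows[0])
    return [sum(row[j] for row in rows) / num_rows for j in range(width)]


def dense_to_scalar(vector: Sequence[float], weights: Sequence[float],
                    bias: float) -> float:
    """A single-output fully connected layer without activation."""
    return sum(w * x for w, x in zip(weights, vector)) + bias


if __name__ == "__main__":
    matrix = [[1.0, 2.0], [3.0, 4.0]]
    mean_vector = mean_of_sampled_rows(matrix, 2, random.Random(0))
    print(mean_vector, dense_to_scalar(mean_vector, [0.5, 1.0], 0.25))
```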
Further, to ensure that the generated timing graph sequence data satisfies the time constraint, i.e. that the timestamp data in each triple is constrained according to time order, a time constraint mechanism is also designed in the decoder to ensure that the generated timestamps respect this order, for example by using clock constraints.
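The source does not spell out the time constraint mechanism in this text. Purely as an illustration of one way to keep generated timestamps in time order, the sketch below treats the raw generator outputs as time increments, clips negative increments to zero, and accumulates them from a start time. This is a swapped-in, simplified technique and not the patent's clock constraint; the function name `enforce_time_order` and its parameters are hypothetical.

```python
from typing import List, Sequence


def enforce_time_order(start_time: float,
                       raw_increments: Sequence[float]) -> List[float]:
    """Turn raw increments into non-decreasing timestamps.

    Assumption: negative increments are clipped to zero, so time never moves backwards.
    """
    timestamps: List[float] = []
    current = start_time
    for delta in raw_increments:
        current += max(delta, 0.0)
        timestamps.append(current)
    return timestamps


if __name__ == "__main__":
    print(enforce_time_order(1.0, [0.5, -2.0, 0.25]))
```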
After the above processing, the timing graph sequence generation network first generates the start flag $x$ of the simulated timing graph sequence data, then generates a vector recording the initial time, then successively generates $L$ triples $(u_i, v_i, t_i)$, and finally generates the end flag $y$ and outputs the simulated timing graph sequence data. Because the sequence data is generated randomly from the noise data, it is also called a temporal random walk sequence.
Step S200: sampling a real timing graph of a target field to obtain real timing graph sequence data;
Step S300: inputting the simulated timing graph sequence data and the real timing graph sequence data into a timing graph sequence discrimination network to obtain a first loss value;
Specifically, a GAN architecture is adopted to train the timing graph sequence generation network, so that during training it learns the characteristics of real timing graph data of the target field. In this embodiment, the timing graph sequence discrimination module is a timing graph sequence discrimination network, which corresponds to the discriminator in the GAN. The timing graph sequence discrimination network and the timing graph sequence generation network together form the GAN architecture. Through adversarial training of the two networks, the generation network learns to extract the features of the sequence data and the discrimination network learns to identify abnormal data. Owing to the adversarial training idea, both networks improve continuously as data features are learned: the ability of the generation network to produce realistic timing graph data and the ability of the discrimination network to distinguish generated graph data from real graph data both increase, which finally completes the construction of the timing graph sequence generation network.
In this embodiment, the target field is the financial field. After real timing graph samples in the financial field are collected as training samples, they are sampled to obtain real timing graph sequence data. The simulated timing graph sequence data and the real timing graph sequence data are then input into the timing graph sequence discrimination network, which compares them to judge whether the simulated timing graph sequence data comes from the real graph data.
When sampling the real timing graph data
Figure 173721DEST_PATH_IMAGE068
a random walk is used to obtain the real timing graph sequence data
Figure 381848DEST_PATH_IMAGE069
where V is the set of all nodes, E is the set of all edges, and T is the set of timestamps corresponding to the edges. Each sampling process yields one set of real timing graph sequence data
Figure 819783DEST_PATH_IMAGE070
. The sampling process comprises the following specific steps:
Step S110: initializing sampling parameters;
Specifically, the total number of edges to be sampled this time is initialized as
Figure 568296DEST_PATH_IMAGE071
(which may be larger than the total number of edges of the real timing graph), a sequence length counter
Figure 306445DEST_PATH_IMAGE072
is set with the initial value
Figure 357578DEST_PATH_IMAGE073
, and the discrete node data
Figure 876284DEST_PATH_IMAGE074
is used to indicate the start of the real timing graph sequence data.
Step S120: sampling with uniform probability to obtain the first edge of the real timing graph sequence data;
Specifically, with probability
Figure 366171DEST_PATH_IMAGE075
the first edge of the real timing graph sequence data
Figure 896509DEST_PATH_IMAGE076
is sampled from the real graph data, and let
Figure 446439DEST_PATH_IMAGE077
Step S130: sampling again to obtain the next edge of the real timing graph sequence data;
Specifically, the next edge is sampled from the real graph data in turn with probability
Figure 186862DEST_PATH_IMAGE078
, letting
Figure 214861DEST_PATH_IMAGE079
until the sequence length counter i equals
Figure 599706DEST_PATH_IMAGE080
, where
Figure 54958DEST_PATH_IMAGE081
is a normalization function that makes the sampling probabilities of all edges in the real graph data sum to 1;
Step S140: when the number of sampled edges equals the set total number of sampled edges, ending the sampling; otherwise, returning to step S120 for the next round of sampling.
Specifically, when all edges in the real timing graph have been sampled, one round of sampling is finished. If, during the current round, the number of sampled edges i equals the total number of sampled edges
Figure 548256DEST_PATH_IMAGE082
the sampling process ends, and
Figure 583209DEST_PATH_IMAGE083
is used to indicate the termination of the real timing graph sequence data. If the number of sampled edges is still insufficient after the current round, the process returns to step S120 for the next round, until the number of sampled edges equals the set total number of sampled edges.
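Steps S110 to S140 can be sketched as a temporal random walk over a list of edges. The first edge of each round is drawn uniformly, as described above. The transition probability for later edges is defined only in the formulas, so the candidate rule (edges leaving the current end node at a timestamp no earlier than the current one) and the exponential time-decay weighting used here are assumptions, as is the name `sample_walk`. The start and end markers of the sequence are omitted.

```python
import math
import random
from typing import List, Optional, Tuple

Edge = Tuple[int, int, float]  # (start node, end node, timestamp)

def sample_walk(edges: List[Edge], total: int, rng: random.Random) -> List[Edge]:
    """Sample `total` edges by repeated temporal random walks over `edges`."""
    if not edges:
        raise ValueError("the graph has no edges")
    walk: List[Edge] = []              # S110: the counter i is len(walk)
    current: Optional[Edge] = None
    while len(walk) < total:           # S140: stop when i reaches the set total
        if current is None:
            # S120: the first edge of a round is drawn uniformly from all edges.
            current = rng.choice(edges)
        else:
            # S130: candidate next edges (assumed rule, see lead-in).
            _, v, t = current
            cands = [e for e in edges if e[0] == v and e[2] >= t and e != current]
            if not cands:
                current = None         # round finished, go back to S120
                continue
            # Assumed weighting: edges closer in time are preferred.
            raw = [math.exp(-(e[2] - t)) for e in cands]
            z = sum(raw)               # normalization so probabilities sum to 1
            current = rng.choices(cands, weights=[w / z for w in raw])[0]
        walk.append(current)
    return walk

edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 0, 3.0), (1, 3, 2.5)]
walk = sample_walk(edges, total=6, rng=random.Random(0))
print(walk)
```

Because a new round starts whenever the walk cannot continue, `total` may exceed the number of edges in the graph, which matches the remark in step S110.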
After the real timing graph sequence data is obtained by sampling, the simulated timing graph sequence data and the real timing graph sequence data are input into the timing graph sequence discrimination network, and a first loss value is obtained from the loss function of that network to judge whether the simulated timing graph sequence data comes from the real graph data. For example, a cross-entropy loss function is used in the timing graph sequence discrimination network to compare the real and the simulated timing graph sequence data and obtain the first loss value. This embodiment uses an LSTM-based classifier
Figure 884877DEST_PATH_IMAGE084
. The inputs of the classifier
Figure 838926DEST_PATH_IMAGE084
are the output of the timing graph sequence generation network and the real timing graph sequence data obtained by sampling. The classifier
Figure 491625DEST_PATH_IMAGE084
classifies the real timing graph sequence data and the simulated timing graph sequence data respectively, and predicts from the classification result whether the simulated timing graph sequence data is derived from the real graph data. The specific loss function is:
Figure 595847DEST_PATH_IMAGE085
where
Figure 955284DEST_PATH_IMAGE086
is the real timing graph sequence data, and
Figure 17918DEST_PATH_IMAGE087
is the simulated timing graph sequence data generated by the timing graph sequence generation network.
By optimizing the model parameters of the timing graph sequence generation network and the timing graph sequence discrimination network, the first loss value obtained from the loss function
Figure 954650DEST_PATH_IMAGE088
is reduced, so that the simulated timing graph sequence data gradually approximates the real timing graph sequence data.
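The cross-entropy example mentioned above can be illustrated with the standard binary cross-entropy form of a GAN discriminator loss. This is a sketch under assumptions: the classifier is replaced by ready-made probabilities, and the function name `discriminator_loss` and the clipping constant are hypothetical. The embodiment's exact loss is the one given in the formula above.

```python
import numpy as np

def discriminator_loss(p_real, p_fake, eps=1e-12):
    """Binary cross-entropy over classifier outputs.

    p_real: probabilities assigned to real sequences being real.
    p_fake: probabilities assigned to simulated sequences being real.
    """
    p_real = np.clip(np.asarray(p_real, dtype=float), eps, 1 - eps)
    p_fake = np.clip(np.asarray(p_fake, dtype=float), eps, 1 - eps)
    # Real sequences should score near 1, simulated sequences near 0.
    return float(-(np.log(p_real).mean() + np.log(1 - p_fake).mean()))

confident = discriminator_loss([0.9, 0.8], [0.1, 0.2])
undecided = discriminator_loss([0.5, 0.5], [0.5, 0.5])
print(confident < undecided)
```

A discriminator that cannot tell the two sources apart outputs 0.5 everywhere, which gives a loss of 2 ln 2; a lower value means the real and simulated sequences are still distinguishable.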
Step S400: calculating and comparing a subgraph distribution distance value of the simulated timing graph sequence data with a subgraph distribution distance value of the real timing graph sequence data to obtain a second loss value for constraining the subgraph distribution distance value;
Specifically, the subgraph distribution describes which types of subgraph structure exist in the timing graph sequence data, together with information such as the number, connection relationships and temporal correlation of each type. The sub-graph structure is also referred to as a minimum spanning tree.
To make the simulated timing graph sequence data generated by the timing graph sequence generation network more similar to the real timing graph sequence data, the timing graph generated by the timing graph sequence generation module should also approach the real timing graph in its subgraph distribution. Referring to fig. 1, a subgraph distribution constraint module guides the timing graph sequence generation module to learn the subgraph distribution characteristics of real graph data in the target field, further improving the quality of the generated timing graph sequence data.
This embodiment uses a subgraph distribution distance value to measure the similarity between the simulated and the real timing graph sequence data. The subgraph distribution constraint module works as follows: it calculates the subgraph distribution distance value of the real timing graph sequence data and that of the simulated timing graph sequence data, and calculates a second loss value from the two distance values, so as to constrain the subgraph distribution in the generated simulated data to approach that of the real graph data.
Specifically, analysis of various subgraph structures yields the nine types of subgraph structure shown in fig. 5. The subgraph structures present in timing graph data differ from scenario to scenario and may include one or more of those in fig. 5. In the financial anti-money-laundering scenario of this embodiment, ring subgraphs usually exist between nodes, i.e. the subgraph structures of real financial graph data are ring subgraph structures (e.g. subgraph structures 5, 6 and 7 in fig. 5). Real graph data samples are selected according to the preselected subgraph structures; for example, this embodiment mainly selects real graph data samples with ring subgraph structures, so that the subgraph distribution in the timing graph data generated by the timing graph sequence generation module is constrained to approach that of real financial graph data.
On the basis of these subgraph structures, a subgraph structure distance value is calculated for each type of subgraph structure, and these values are accumulated to obtain the subgraph distribution distance value. As shown in fig. 6, this specifically includes the following steps:
Step S410: calculating a subgraph structure distance value corresponding to each subgraph structure in a preset subgraph structure category;
Specifically, taking the real timing graph sequence data as an example, the subgraph structure distance value corresponding to a given category of subgraph structure (e.g. the 4th subgraph structure in fig. 5) is calculated as follows. When the real graph data G = (V, E, T) is sampled by random walk, the number of subgraphs of that category in the real timing graph sequence data is counted to obtain the number of real subgraphs; alternatively, the real graph samples are labeled in advance to obtain the number of real subgraphs. The subgraph distribution prediction module then identifies the subgraph structures of each category in the real timing graph sequence data and counts them to obtain the number of predicted subgraphs. The number of predicted subgraphs is subtracted from the number of real subgraphs and the result is squared to obtain the subgraph structure distance value. In this embodiment, the subgraph distribution prediction module is built on an LSTM, whose units predict the number of subgraphs corresponding to each subgraph structure in the real timing graph sequence data. Optionally, other common technical means in the field may be used to count a given subgraph structure in the real timing graph data, for example a greedy path query algorithm on the timing graph, a path query method based on graph transformation, or a TTL algorithm.
The subgraph structure distance value of the simulated timing graph sequence data under each category of subgraph structure is calculated in a similar way. When the real graph data G = (V, E, T) is sampled by random walk, the number of subgraphs of that category in the real timing graph sequence data is counted to obtain the number of real subgraphs; alternatively, the real graph samples are labeled in advance to obtain the number of real subgraphs. The subgraph distribution prediction module then identifies the subgraph structures of each category in the simulated timing graph sequence data and counts them to obtain the number of predicted subgraphs. The number of predicted subgraphs is subtracted from the number of real subgraphs and the result is squared to obtain the subgraph structure distance value.
It should be noted that, in order to increase the processing speed and efficiency, the sub-graph structure distance value is calculated in this embodiment by simply counting the number of sub-graphs corresponding to the sub-graph structure, and in other scenarios, the sub-graph structure distance value may be calculated with reference to other items related to the sub-graph structure.
Step S420: accumulating all the subgraph structure distance values to obtain the subgraph distribution distance value.
Specifically, the subgraph structure distance values under all subgraph structure categories are accumulated to obtain the subgraph distribution distance value. The subgraph distribution distance value of the real timing graph sequence data is calculated as:
Figure 393722DEST_PATH_IMAGE089
where
Figure 607665DEST_PATH_IMAGE090
is the number of real subgraphs of the i-th type in the real timing graph data,
Figure 841200DEST_PATH_IMAGE091
is the number of subgraphs of the i-th type in the real timing graph data as predicted by the subgraph distribution prediction module, k is the total number of preset subgraph structure types (e.g. k = 9 for the 9 types in fig. 5), and
Figure 265229DEST_PATH_IMAGE092
is the real timing graph sequence data sampled from the real graph data by the sampling module.
The subgraph distribution distance value of the simulated timing graph sequence data is calculated with the same expression, except that
Figure 976833DEST_PATH_IMAGE093
is used in place of
Figure 310862DEST_PATH_IMAGE094
, where
Figure 715299DEST_PATH_IMAGE095
is the simulated timing graph sequence data generated by the timing graph sequence generation network.
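Steps S410 and S420 amount to a sum of squared differences between real and predicted subgraph counts over the k preset subgraph types. The sketch below assumes the counts are already available as lists; the function name and the example counts are hypothetical.

```python
from typing import Sequence

def subgraph_distribution_distance(real_counts: Sequence[float],
                                   predicted_counts: Sequence[float]) -> float:
    """Sum over subgraph types of (real count - predicted count) squared."""
    if len(real_counts) != len(predicted_counts):
        raise ValueError("both count vectors must cover the same k subgraph types")
    # S410: per-type subgraph structure distance value; S420: accumulate them.
    return sum((n - m) ** 2 for n, m in zip(real_counts, predicted_counts))

# Hypothetical counts for k = 9 subgraph types.
real_counts = [4, 0, 2, 7, 1, 3, 0, 5, 2]
predicted_counts = [3, 0, 2, 9, 1, 3, 1, 5, 2]
print(subgraph_distribution_distance(real_counts, predicted_counts))  # -> 6
```

The same function serves for the real and the simulated sequence data; only the source of the predicted counts changes.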
The loss function of the subgraph distribution prediction module is constructed by combining the subgraph distribution distance value of the real timing graph sequence data with that of the simulated timing graph sequence data. The specific expression is:
Figure 626623DEST_PATH_IMAGE096
where
Figure 345180DEST_PATH_IMAGE097
is the timing graph sequence generation network, and
Figure 596033DEST_PATH_IMAGE098
is the distance function used to compute the subgraph distribution distance values.
A second loss value is calculated from this loss function to measure the difference between the subgraph distribution of the real timing graph sequence data and that of the simulated timing graph sequence data, and the model parameters of the timing graph sequence generation network, the timing graph sequence discrimination network and the subgraph distribution prediction module are optimized according to the second loss value.
In one embodiment, the degree of similarity between the subgraph distribution in the real graph time sequence data and the subgraph distribution in the simulation time sequence data can be directly calculated by using a graph neural network, and a second loss value is obtained.
Because the subgraph distribution in the timing graph data is taken into account during generation, the generated data can approach the real graph data, which improves the quality of the generated graph data and in turn the training effect of the graph representation learning model.
To further improve the accuracy of the subgraph distribution distance value, this embodiment also designs a gating network in the subgraph distribution prediction module to learn the importance of the various subgraph structures in the timing graph data. The gating network is built on an LSTM model: the real timing graph sequence data is input into the LSTM units, the memory cells of the LSTM keep track of the number of each type of subgraph structure in the real timing graph sequence data, and the LSTM model finally outputs these numbers. They are mapped into a vector that is input to a fully connected layer, and the result is mapped into the interval (0, 1) by a sigmoid function to obtain the weight corresponding to each type of subgraph structure. The specific expression of the gating network is:
Figure 905791DEST_PATH_IMAGE099
where
Figure 304412DEST_PATH_IMAGE100
is the hidden-vector output of the final unit of the LSTM network, and
Figure 623398DEST_PATH_IMAGE101
maps this vector to a scalar representing the estimated number of subgraphs of the i-th type. The model expression of the fully connected layer is:
Figure 666440DEST_PATH_IMAGE102
where
Figure 412679DEST_PATH_IMAGE103
is the weight of the fully connected network layer, and
Figure 33016DEST_PATH_IMAGE104
is the weight vector. The gating network may be a standalone LSTM network, or may be an LSTM network based on the timing graph sequence generation network.
After the weights of the various subgraph structures are obtained, the subgraph structure distance values of all types of subgraph structure are weighted by these weights and accumulated to obtain the weighted subgraph distribution distance value. For example:
Figure 155693DEST_PATH_IMAGE105
where
Figure 318821DEST_PATH_IMAGE106
is the real number of subgraphs of the i-th type in the real timing graph data,
Figure 298278DEST_PATH_IMAGE107
is the number of subgraphs of the i-th type in the real timing graph as estimated by the subgraph distribution prediction module, and
Figure 609174DEST_PATH_IMAGE108
is the weight of the i-th type of subgraph.
With this weighting, the subgraph distribution distance values of the real and the simulated timing graph sequence data are calculated so that the second loss value is more accurate, which improves the optimization effect.
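The gating step and the weighted accumulation can be sketched together: a fully connected layer followed by a sigmoid turns a vector into one weight per subgraph type, and the weights scale the squared count differences. The LSTM itself is left out; the input vector `h`, the parameters `W` and `b`, and both function names are assumptions made for the example.

```python
import numpy as np

def gate_weights(h, W, b):
    """Fully connected layer plus sigmoid: one weight in (0, 1) per subgraph type."""
    return 1.0 / (1.0 + np.exp(-(W @ h + b)))

def weighted_subgraph_distance(real_counts, predicted_counts, weights):
    """Weighted sum over subgraph types of squared count differences."""
    diff = np.asarray(real_counts, dtype=float) - np.asarray(predicted_counts, dtype=float)
    return float(np.sum(weights * diff ** 2))

k, hidden = 9, 16                       # hypothetical sizes
rng = np.random.default_rng(0)
h = rng.normal(size=hidden)             # stand-in for the LSTM output vector
W, b = np.zeros((k, hidden)), np.zeros(k)

weights = gate_weights(h, W, b)         # zero parameters give 0.5 for every type
real_counts = [4, 0, 2, 7, 1, 3, 0, 5, 2]
predicted_counts = [3, 0, 2, 9, 1, 3, 1, 5, 2]
print(weighted_subgraph_distance(real_counts, predicted_counts, weights))  # -> 3.0
```

With learned parameters the weights differ per type, so a mismatch on an important structure (for instance a ring subgraph in the anti-money-laundering scenario) contributes more to the second loss value than a mismatch on a less important one.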
Step S500: obtaining a total loss value according to the first loss value and the second loss value;
Step S600: optimizing the model parameters of the timing graph sequence generation network until the total loss value meets the set condition, to obtain the trained timing graph sequence generation network.
Step S700: inputting the noise data into the trained timing graph sequence generation network to obtain timing graph data with structural constraint.
Specifically, when the timing graph sequence generation network is trained, a total loss function is constructed from the loss function of the subgraph distribution prediction module
Figure 191334DEST_PATH_IMAGE109
and the loss function of the timing graph sequence discrimination network
Figure 208969DEST_PATH_IMAGE110
. The total loss function is:
Figure 890486DEST_PATH_IMAGE111
. The timing graph sequence generation network, the timing graph sequence discrimination network and the subgraph distribution prediction module are jointly trained according to the total loss function, and the timing graph sequence generation network
Figure 360781DEST_PATH_IMAGE112
is optimized until the total loss value calculated from the total loss function reaches the set precision requirement. After training, the trained timing graph sequence generation network is obtained. To generate timing graph sequence data, noise data is input into the trained timing graph sequence generation network, which yields timing graph sequence data with structural constraint.
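How the two loss values might be combined and checked against a stopping condition is sketched below. The actual total loss function is the one given in the formula above; the weighted sum, the balancing coefficient `lam` and the tolerance test here are assumptions for illustration only.

```python
def total_loss(first_loss: float, second_loss: float, lam: float = 1.0) -> float:
    """Combine the discrimination loss and the subgraph distribution loss.

    The weighted sum and the coefficient `lam` are assumptions of this sketch.
    """
    return first_loss + lam * second_loss

def meets_set_condition(total: float, tolerance: float) -> bool:
    """Stopping test: training ends once the total loss is within the tolerance."""
    return total <= tolerance

value = total_loss(first_loss=0.7, second_loss=6.0, lam=0.1)
print(value, meets_set_condition(value, tolerance=0.5))
```

In a joint training loop, this value would be recomputed after every parameter update of the three components until the stopping test passes.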
In summary, in this embodiment, simulated timing graph sequence data is generated by the timing graph sequence generation network, and the real timing graph is sampled to obtain real timing graph sequence data. The timing graph sequence discrimination network yields a first loss value between the simulated and the real timing graph sequence data, and a second loss value is obtained from the subgraph distribution distance values of the simulated and the real timing graph sequence data. The timing graph sequence generation network is trained and optimized according to the first and second loss values, so that the trained network can generate accurate, high-quality graph data.
Exemplary System
As shown in fig. 7, corresponding to the GAN-based timing graph data generating method with structural constraint, an embodiment of the present invention further provides a GAN-based timing graph data generating system with structural constraint, where the system includes:
a timing graph sequence generation module 600, configured to acquire noise data and either input the noise data into the trained timing graph sequence generation network to obtain timing graph sequence data with structural constraint, or input the noise data into the timing graph sequence generation network to generate simulated timing graph sequence data characterizing a target field;
a timing graph sequence sampling module 610, configured to sample the real timing graph of the target field to obtain real timing graph sequence data;
a timing graph sequence discrimination module 620, configured to input the simulated timing graph sequence data and the real timing graph sequence data into the timing graph sequence discrimination network to obtain a first loss value;
a sub-graph distribution constraint module 630, configured to calculate and compare a sub-graph distribution value of the simulated sequence data with a sub-graph distribution value of the real sequence data, so as to obtain a second loss value for constraining sub-graph distribution;
an optimization module 640, configured to obtain a total loss value according to the first loss value and the second loss value; and optimizing the model parameters of the sequence diagram sequence generation network until the total loss value meets the set condition, and obtaining the trained sequence diagram sequence generation network.
During training, the timing graph sequence generation module 600 outputs simulated timing graph sequence data from the acquired noise data. This data, together with the real timing graph sequence data sampled from the real graph data by the timing graph sequence sampling module 610, is input to the timing graph sequence discrimination module 620 to judge whether the simulated timing graph sequence data comes from the real graph data. The subgraph distribution constraint module 630 constrains the subgraph distribution in the timing graph data generated by the timing graph sequence generation module to approach that of the real graph data, according to subgraph patterns predefined for the specific scenario. Because the difference between generated and real graph data in subgraph structure is also taken into account, the generated graph data is closer to the real data and of higher quality; in testing, the MMD (Maximum Mean Discrepancy) metric of node degree improved by 30% compared with the prior art. After training, noise data is input into the trained timing graph sequence generation network to obtain timing graph sequence data with structural constraint, which can be used for training a representation learning model.
Optionally, the sub-graph distribution constraint module 630 further includes a gating network constructed based on LSTM, and the gating network is configured to obtain a weight corresponding to each type of sub-graph structure according to the real sequence diagram sequence data. By learning the weight of each sub-graph structure through the gating network, the authenticity of the timing graph data generated by the timing graph sequence generation module 600 can be further improved.
Specifically, in this embodiment, the specific functions of each module of the GAN-based timing diagram data generation system with structural constraint may refer to the corresponding descriptions in the GAN-based timing diagram data generation method with structural constraint, and are not described herein again.
Based on the above embodiment, the present invention further provides an intelligent terminal, and a schematic block diagram thereof may be as shown in fig. 8. The intelligent terminal comprises a processor, a memory, a network interface and a display screen which are connected through a system bus. Wherein, the processor of the intelligent terminal is used for providing calculation and control capability. The memory of the intelligent terminal comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a GAN-based timing graph data generation program with structural constraints. The internal memory provides an environment for the operating system and GAN-based timing diagram data generator with structural constraints to run in the non-volatile storage medium. The network interface of the intelligent terminal is used for being connected and communicated with an external terminal through a network. When being executed by a processor, the GAN-based timing graph data generation program with structural constraint realizes the steps of any one of the GAN-based timing graph data generation methods with structural constraint. The display screen of the intelligent terminal can be a liquid crystal display screen or an electronic ink display screen.
It will be understood by those skilled in the art that the block diagram of fig. 8 shows only the part of the structure related to the solution of the present invention and does not limit the intelligent terminal to which the solution is applied; a specific intelligent terminal may include more or fewer components than those shown in the figure, combine some components, or have a different arrangement of components.
In one embodiment, an intelligent terminal is provided, where the intelligent terminal includes a memory, a processor, and a GAN-based timing graph data generation program with structural constraints stored in the memory and executable on the processor, and the GAN-based timing graph data generation program with structural constraints performs the following operations when executed by the processor:
acquiring noise data, inputting the noise data into a sequence generation network of a time sequence diagram, and generating simulated sequence data of the time sequence diagram for representing a target field;
sampling a real time sequence chart of a target field to obtain real time sequence chart sequence data;
inputting the simulation sequence diagram sequence data and the real sequence diagram sequence data into a sequence diagram sequence discrimination network to obtain a first loss value;
calculating and comparing a subgraph distribution distance value of the simulated sequence data with a subgraph distribution distance value of the real sequence data to obtain a second loss value for restricting the subgraph distribution distance value;
obtaining a total loss value according to the first loss value and the second loss value;
optimizing model parameters of the sequence diagram sequence generation network until the total loss value meets set conditions, and obtaining a trained sequence diagram sequence generation network;
and inputting the noise data into the trained time sequence chart sequence generation network to obtain the time sequence chart data with the structural constraint.
Optionally, the simulation sequence data and the real sequence data are sequence data of a sequence diagram, and calculating a subgraph distribution distance value of the sequence data of the sequence diagram includes:
calculating a subgraph structure distance value corresponding to each class of subgraph structures in a preset subgraph structure class based on sequence data of the time sequence chart;
and accumulating all the subgraph structure distance values to obtain the subgraph distribution distance value.
Optionally, calculating a subgraph structure distance value corresponding to the subgraph structure based on the sequence data of the time chart, including:
counting the number of the subgraph structures in sequence data of the time sequence chart to obtain the number of predicted subgraphs;
and subtracting the number of the predicted subgraphs from the number of the real subgraphs, and then squaring to obtain the subgraph structure distance value.
Optionally, the accumulating all sub-graph structure distance values to obtain the sub-graph distribution distance value includes:
acquiring the weight corresponding to each type of sub-graph structure in the preset sub-graph structure type;
and based on the weight, performing weighted accumulation on subgraph structure distance values corresponding to the subgraph structures of all classes to obtain the subgraph distribution distance value.
Optionally, a gating network constructed based on LSTM is further provided, and the obtaining of the weight corresponding to each type of sub-graph structure in the preset sub-graph structure category includes:
inputting the real sequence diagram sequence data into the gating network to obtain a weight vector;
and inputting the weight vector into a full-connection layer to obtain the weight corresponding to each type of sub-graph structure.
Optionally, the sequence data of the timing graph includes several triple data describing the edges of the timing graph, and the triple data includes a start node, a stop node and a timestamp constituting the edges of the timing graph.
Optionally, the sequence diagram generation network is constructed based on an LSTM model, and a time constraint module is further disposed in the sequence diagram generation network, and the time constraint module is configured to constrain timestamps in the triple data according to a time sequence.
The embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium stores a GAN-based time series graph data generating program with structural constraint, and when the GAN-based time series graph data generating program with structural constraint is executed by a processor, the steps of any GAN-based time series graph data generating method with structural constraint according to the embodiment of the present invention are implemented.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by functions and internal logic of the process, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned functions may be distributed as different functional units and modules according to needs, that is, the internal structure of the apparatus may be divided into different functional units or modules to implement all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention. For the specific working processes of the units and modules in the system, reference may be made to the corresponding processes in the foregoing method embodiments, which are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art would appreciate that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. The above-described apparatus/terminal device embodiments are merely illustrative. For example, the division into the above modules or units is only one kind of logical functional division, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed.
The integrated modules/units described above, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the methods in the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium and which implements the steps of the method embodiments when executed by a processor. The computer program includes computer program code, which may be in source code form, object code form, an executable file or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable storage medium may be increased or decreased as required by legislation and patent practice in the relevant jurisdiction.
The above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the embodiments of the present invention and should be construed as falling within that scope.

Claims (11)

1. A GAN-based timing graph data generation method with structural constraints, characterized by comprising the following steps:
acquiring noise data, inputting the noise data into a timing graph sequence generation network, and generating simulated timing graph sequence data representing a target field;
sampling a real timing graph of the target field to obtain real timing graph sequence data;
inputting the simulated timing graph sequence data and the real timing graph sequence data into a timing graph sequence discrimination network to obtain a first loss value;
calculating and comparing a subgraph distribution distance value of the simulated timing graph sequence data with a subgraph distribution distance value of the real timing graph sequence data to obtain a second loss value for constraining the subgraph distribution;
obtaining a total loss value according to the first loss value and the second loss value;
optimizing model parameters of the timing graph sequence generation network until the total loss value meets a set condition, to obtain a trained timing graph sequence generation network;
and inputting the noise data into the trained timing graph sequence generation network to obtain timing graph data with structural constraints.
2. The GAN-based timing graph data generation method with structural constraints according to claim 1, wherein the simulated timing graph sequence data and the real timing graph sequence data are both timing graph sequence data, and calculating the subgraph distribution distance value of the timing graph sequence data comprises:
calculating, based on the timing graph sequence data, a subgraph structure distance value corresponding to each class of subgraph structure in a preset subgraph structure category;
and accumulating all the subgraph structure distance values to obtain the subgraph distribution distance value.
3. The GAN-based timing graph data generation method with structural constraints according to claim 2, wherein calculating the subgraph structure distance value corresponding to a subgraph structure based on the timing graph sequence data comprises:
counting the number of occurrences of the subgraph structure in the timing graph sequence data to obtain the number of predicted subgraphs;
and subtracting the number of predicted subgraphs from the number of real subgraphs and squaring the result to obtain the subgraph structure distance value.
4. The GAN-based timing graph data generation method with structural constraints according to claim 2, wherein accumulating all the subgraph structure distance values to obtain the subgraph distribution distance value comprises:
acquiring the weight corresponding to each class of subgraph structure in the preset subgraph structure category;
and performing, based on the weights, a weighted accumulation of the subgraph structure distance values corresponding to all classes of subgraph structures to obtain the subgraph distribution distance value.
5. The GAN-based timing graph data generation method with structural constraints according to claim 4, wherein a gating network constructed based on LSTM is further provided, and acquiring the weight corresponding to each class of subgraph structure in the preset subgraph structure category comprises:
inputting the real timing graph sequence data into the gating network to obtain a weight vector;
and inputting the weight vector into a fully connected layer to obtain the weight corresponding to each class of subgraph structure.
6. The GAN-based timing graph data generation method with structural constraints according to claim 1, wherein the timing graph sequence data comprises a number of triples describing the edges of the timing graph, each triple comprising the start node, the end node and the timestamp of an edge.
7. The GAN-based timing graph data generation method with structural constraints according to claim 6, wherein the timing graph sequence generation network is constructed based on an LSTM model, and a time constraint module is further provided in the timing graph sequence generation network and is configured to constrain the timestamps in the triples to follow chronological order.
8. A GAN-based timing graph data generation system with structural constraints, the system comprising:
a timing graph sequence generation module, configured to acquire noise data and either input the noise data into a trained timing graph sequence generation network to obtain timing graph data with structural constraints, or input the noise data into the timing graph sequence generation network to generate simulated timing graph sequence data representing a target field;
a timing graph sequence sampling module, configured to sample a real timing graph of the target field to obtain real timing graph sequence data;
a timing graph sequence discrimination module, configured to input the simulated timing graph sequence data and the real timing graph sequence data into a timing graph sequence discrimination network to obtain a first loss value;
a subgraph distribution constraint module, configured to calculate and compare a subgraph distribution distance value of the simulated timing graph sequence data with a subgraph distribution distance value of the real timing graph sequence data to obtain a second loss value for constraining the subgraph distribution;
an optimization module, configured to obtain a total loss value according to the first loss value and the second loss value, and to optimize the model parameters of the timing graph sequence generation module until the total loss value meets a set condition, to obtain the trained timing graph sequence generation module.
9. The GAN-based timing graph data generation system with structural constraints according to claim 8, wherein the subgraph distribution constraint module further comprises a gated neural network for obtaining, according to the real timing graph sequence data, the weight corresponding to each class of subgraph structure in a preset subgraph structure category.
10. An intelligent terminal, characterized in that the intelligent terminal comprises a memory, a processor and a GAN-based timing graph data generation program with structural constraints stored on the memory and operable on the processor, wherein the GAN-based timing graph data generation program with structural constraints realizes the steps of the GAN-based timing graph data generation method with structural constraints according to any one of claims 1 to 7 when executed by the processor.
11. A computer-readable storage medium, wherein the computer-readable storage medium stores thereon a GAN-based timing graph data generation program with structural constraints, and the GAN-based timing graph data generation program with structural constraints, when executed by a processor, implements the steps of the GAN-based timing graph data generation method with structural constraints according to any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211638436.4A CN115617882B (en) 2022-12-20 2022-12-20 GAN-based time sequence diagram data generation method and system with structural constraint

Publications (2)

Publication Number Publication Date
CN115617882A (en) 2023-01-17
CN115617882B (en) 2023-05-23

Family

ID=84880110

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211638436.4A Active CN115617882B (en) 2022-12-20 2022-12-20 GAN-based time sequence diagram data generation method and system with structural constraint

Country Status (1)

Country Link
CN (1) CN115617882B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117709210A (en) * 2024-02-18 2024-03-15 粤港澳大湾区数字经济研究院(福田) Constraint inference model training, constraint inference method, constraint inference component, constraint inference terminal and constraint inference medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111476200A (en) * 2020-04-27 2020-07-31 华东师范大学 Face de-identification generation method based on generative adversarial network
CN112699912A (en) * 2020-11-19 2021-04-23 电子科技大学 Method for enhancing infrared thermal image by improving GAN
CN112835709A (en) * 2020-12-17 2021-05-25 华南理工大学 Method, system and medium for generating cloud load time sequence data based on generative adversarial network
US20210182458A1 (en) * 2019-12-13 2021-06-17 EMC IP Holding Company LLC Method, device and computer program product for data simulation
CN113191301A (en) * 2021-05-14 2021-07-30 上海交通大学 Video dense crowd counting method and system integrating time sequence and spatial information
CN114494242A (en) * 2022-02-21 2022-05-13 平安科技(深圳)有限公司 Time series data detection method, device, equipment and computer storage medium
WO2022238967A1 (en) * 2021-05-14 2022-11-17 Nokia Technologies Oy Method, apparatus and computer program product for providing finetuned neural network

Also Published As

Publication number Publication date
CN115617882B (en) 2023-05-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant