CN113361606A - Deep graph attention adversarial variational autoencoder training method and system - Google Patents

Deep graph attention adversarial variational autoencoder training method and system

Info

Publication number
CN113361606A
CN113361606A CN202110630525.3A
Authority
CN
China
Prior art keywords
attention
graph
encoder
adversarial
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110630525.3A
Other languages
Chinese (zh)
Inventor
张维玉
翁自强
夏忠秀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qilu University of Technology
Original Assignee
Qilu University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qilu University of Technology filed Critical Qilu University of Technology
Priority to CN202110630525.3A priority Critical patent/CN113361606A/en
Publication of CN113361606A publication Critical patent/CN113361606A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a deep graph attention adversarial variational autoencoder training method and system, comprising the following steps: obtaining a first feature vector corresponding to each node in graph data; after the first feature vector undergoes an aggregation operation with the attention mechanism as its core, outputting a second feature vector of each node, used to form a plurality of groups of independent attention mechanisms; applying attention distributions to a plurality of relevant features between the central node and its neighbor nodes by aggregating the plurality of groups of independent attention mechanisms, so as to form a graph attention adversarial autoencoder; encoding the graph data using the encoder of the graph attention adversarial autoencoder to obtain graph representation vectors; and performing inner-product processing on the graph representation vectors with a decoder to reconstruct the graph data and predict whether a connecting edge exists between any two points in the graph. The problems of over-fitting and over-smoothing are effectively alleviated, and the graph embedding capability is further improved.

Description

Deep graph attention adversarial variational autoencoder training method and system
Technical Field
The disclosure belongs to the technical field of encoders, and particularly relates to a deep graph attention adversarial variational autoencoder training method and system.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Non-Euclidean data such as graph data are more difficult to process than Euclidean data such as images, text, and speech. Therefore, graph embedding algorithms have become a research hotspot. Graph research focuses on tasks such as node classification, link prediction, graph classification and graph generation, and graph embedding algorithms can be divided into three types: graph factorization, random walk, and graph neural networks.
Recently, graph embedding algorithms have entered the neural network era. Kipf et al. simplified the definition of frequency-domain convolution and proposed the GCN, which performs the convolution operation in the spatial domain and greatly improves the embedding capability of graph convolution models. Since then, researchers have proposed many variants of GCN. GraphSAGE does not limit sampling to the topology information of the nodes, but instead exploits the intrinsic features of the nodes and abandons the diffusion mechanism involving a large number of parameters, thereby enabling distributed training and inductive learning on large-scale graph data. The graph attention network (GAT) performs aggregation over neighbor nodes using an attention mechanism to adaptively assign neighbor weights.
The above methods are supervised graph embedding methods. In recent years, the application of graph data has become more and more widespread, and the structure of graphs has become more complicated. In practical application scenarios, many data labels have a high acquisition threshold. Therefore, it is of great value to study effective unsupervised learning on graph data. The graph autoencoder based on reconstruction loss is a typical unsupervised learning method. GAE and VGAE use an encoder to obtain latent vectors, while a decoder uses the latent variables to reconstruct the structure. Due to the high-dimensional and complex distribution characteristics of graph data, the distribution of the latent vectors produced by the encoder deviates from the actual distribution. To address the distribution of the encoded data, DVNE embeds the nodes directly as Gaussian distributions and uses the Wasserstein distance as a similarity measure between distributions, effectively modeling the uncertainty of nodes in the network. ARGA and ARVGA further introduce an adversarial mechanism that forces the encoder, through adversarial training, to generate latent vectors closer to the true distribution of the data. Although these autoencoders have achieved some success, they do not take into account differences in node importance. Inspired by the success of deep CNNs in image classification, researchers have explored how to construct deep GCNs, including GCN, GraphSAGE, ResGCN, and JKNet. However, none of them presents a detailed architecture.
In addition, current encoders use a relatively small number of graph attention layers (typically 1 or 2) and cannot fully unleash the embedding performance of the graph attention encoder. The number of attention layers of the model cannot be deepened because of the over-smoothing and over-fitting problems.
Thus, the technical problems in the prior art include:
1. Existing graph autoencoders ignore the differences between graph neighbor nodes and the latent data distribution of the graph.
2. Over-fitting and over-smoothing are two major problems that hinder the deepening of graph models. Over-fitting arises when an over-parameterized model is used to fit the limited distribution of training data: the learned model fits the training data very well but generalizes poorly to test data, and the over-fitting problem is especially prominent when deep GCNs are applied to small graphs. Over-smoothing, at the other extreme, makes training deep GCNs very difficult.
The nature of graph embedding is aggregation: if the number of layers is unlimited, the representations of all nodes converge to a fixed point, which isolates the result from the input features and causes the gradient to vanish, a phenomenon known as over-smoothing.
Disclosure of Invention
In order to overcome the defects of the prior art, a deep graph attention adversarial variational autoencoder training method is provided. The deep graph attention adversarial variational autoencoder deepens the number of graph attention layers in the encoder, effectively alleviates the problems of over-fitting and over-smoothing, and further improves the graph embedding capability.
In order to achieve the above object, one or more embodiments of the present disclosure provide the following technical solutions:
in a first aspect, a deep graph attention adversarial variational autoencoder training method is disclosed, comprising:
obtaining a first feature vector corresponding to each node in graph data;
after the first feature vector undergoes an aggregation operation with the attention mechanism as its core, outputting a second feature vector of each node, used to form a plurality of groups of independent attention mechanisms;
applying attention distributions to a plurality of relevant features between the central node and its neighbor nodes by aggregating the plurality of groups of independent attention mechanisms, so as to form a graph attention adversarial autoencoder;
encoding the graph data using the encoder of the graph attention adversarial autoencoder to obtain graph representation vectors;
and performing inner-product processing on the graph representation vectors with a decoder to reconstruct the graph data and predict whether a connecting edge exists between any two points in the graph.
In a further technical solution, the step of obtaining the second feature vector of each node includes:
setting weight coefficients of adjacent nodes;
selecting a single fully connected layer as the correlation function;
normalizing the correlation calculations of all the neighbors to obtain a weight coefficient for each neighbor node;
and after the weight coefficients are obtained, obtaining the second feature vector of the node according to the weighted-summation strategy of the attention mechanism.
According to a further technical scheme, graph attention aggregation is carried out with the obtained weight coefficients, realizing the optimization of the graph attention adversarial autoencoder by the random edge deletion technique.
In a further technical scheme, in the graph attention aggregation process, the neighbor nodes are identified by means of the adjacency matrix, and after the unnormalized attention coefficients are obtained in the graph attention layer, a masking operation is carried out.
In a further technical scheme, the graph attention adversarial autoencoder is combined with a random edge deletion technique, and the combination is called the deep graph attention adversarial variational autoencoder.
In a further technical scheme, the random edge deletion technique randomly deletes a certain proportion of the edges of the input graph, specifically: V·p non-zero elements of the adjacency matrix A are randomly reset to zero, where V is the total number of edges and p is the deletion rate.
In a further technical scheme, the encoder in the automatic encoder is taken as a generator of a countermeasure network, and the generator deceives the discriminator by generating fake data, wherein the fake data refers to potential variables obtained by the encoder through image data encoding;
the task of the discriminator is to distinguish whether the sample is from the true data or the generator, the discriminator will be from the prior distributionzThe output data is judged to be positive, and the data from the latent variable output is judged to be negative.
In a second aspect, a deep map attention confrontation variational automatic encoder training system is disclosed, comprising:
a feature vector formation module configured to: obtaining a first feature vector corresponding to each node in graph data;
after the first feature vector undergoes an aggregation operation with the attention mechanism as its core, outputting a second feature vector of each node, used to form a plurality of groups of independent attention mechanisms;
an auto-encoder training module configured to: applying attention distributions to a plurality of relevant features between the central node and its neighbor nodes by aggregating the plurality of groups of independent attention mechanisms, so as to form a graph attention adversarial autoencoder;
encoding the graph data using the encoder of the graph attention adversarial autoencoder to obtain graph representation vectors;
and performing inner-product processing on the graph representation vectors with a decoder to reconstruct the graph data and predict whether a connecting edge exists between any two points in the graph.
The above one or more technical solutions have the following beneficial effects:
the present invention focuses on differential representation of neighboring nodes, proposing an attention-directed anti-variation autoencoder (AAVGA). The purpose is to distinguish graph structure information and apply a counterregularization mechanism to improve the graph embedding capability of the model. The encoder generates potential feature vectors through a graph attention layer, differentiates node representations in an embedding process by utilizing adaptive distribution of weights, and adds a plurality of groups of independent attention mechanisms, so that attention aggregation is more stable. To normalize the distribution of encoded data, a countering mechanism is introduced into the attention-based graph variation autoencoder. The component can determine whether the input is from a low-dimensional representation of the graph network or from a true distribution of the sample. The discriminator encourages the encoder to generate low-dimensional variables with a more realistic distribution of data and learns an efficient representation of the graph.
In addition, the present invention introduces a random edge deletion technique (RDEdge), which helps the model randomly discard some edges of the input graph in each training epoch. RDEdge is regarded herein as a data augmentation technique: different randomly deformed copies of the original graph are generated by RDEdge, which enhances the randomness and diversity of the input data and thus better prevents over-fitting. RDEdge can also be viewed as a message-passing reducer: in the graph attention layer, messages are passed between neighboring nodes along edges, and deleting certain edges makes the node connections sparser, so excessive smoothing can be avoided to some extent as the attention layers get deeper. RDEdge greatly helps the training of the graph model; as shown in FIG. 1, after combining RDEdge, AAVGA can deal well with the problems of over-fitting and over-smoothing. This allows the coding layers of the model to be further deepened, improving the graph embedding capability of the model.
The invention combines the graph attention adversarial autoencoder (AAVGA) with the random edge deletion technique (RDEdge); the combination is called the deep graph attention adversarial variational autoencoder (AAVGA-d). AAVGA-d deepens the number of graph attention layers in the encoder, effectively alleviates the problems of over-fitting and over-smoothing, and further improves the graph embedding capability.
Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure and are not to limit the disclosure.
FIG. 1 is the overall architecture of AAVGA-d according to an embodiment of the disclosure;
FIG. 2 shows the results of the link prediction experiment for the 3 models;
FIG. 3 is a schematic diagram comparing the losses of layer-wise RDEdge and RDEdge;
FIG. 4(a) is a schematic diagram of the distances between attention layer outputs before training;
FIG. 4(b) is a schematic diagram of the distances between attention layer outputs after training;
FIG. 5 shows the cluster visualizations of the graphs: the Cora, Citeseer and Pubmed visualizations, respectively.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.
The importance of each node in the graph is different, so the difference in node importance must be considered in the graph embedding learning process, where how to measure this importance is the first problem to be considered.
The graph representation vectors embedded by the graph attention encoder deviate from the real distribution of the graph data; how to ensure that the graph representation vectors obtained by encoding obey the real data distribution is another problem to be solved by the invention.
Reviewing the various graph embedding methods, shallow graph neural networks (typically 2 layers) are used for both supervised and unsupervised graph embedding. When facing large-scale graph embedding learning, the number of graph neural network layers needs to be deepened. However, over-fitting and over-smoothing are two major problems that hinder the deepening of graph models. Over-fitting arises when an over-parameterized model is used to fit the limited distribution of training data: the learned model fits the training data very well but generalizes poorly to test data, and the over-fitting problem is especially prominent when deep GCNs are applied to small graphs.
Over-smoothing, at the other extreme, makes training deep GCNs very difficult. The nature of graph convolution is aggregation: if the number of layers is unlimited, the representations of all nodes converge to a fixed point, which isolates the result from the input features and causes the gradient to vanish, a phenomenon known as over-smoothing.
Example one
Referring to FIG. 1, this embodiment discloses a deep graph attention adversarial variational autoencoder training method. The proposed AAVGA-d incorporates the strategy of GAT and replaces the two-layer graph convolution network of the ordinary encoder with a multi-layer graph attention network to generate latent representations of the graph data.
The method comprises the following specific steps:
order and node v in L layeriThe corresponding feature vector is hi
Figure BDA0003103535200000071
Wherein d is(1)Representing the characteristic length of the node. Outputting a new feature vector h 'of each node after the aggregation operation taking attention mechanism as a core'iWherein
Figure BDA0003103535200000072
d(1+1)Representing the length of the output feature vector, where this is aggregatedThe resultant operation is referred to as the graph attention layer.
In order to obtain a new feature vector of each node, the specific process is as follows:
Assume the central node is v_i. The weight coefficient of a neighboring node v_j with respect to v_i is set as:

e_ij = a(Wh_i, Wh_j)    (1)

where W ∈ ℝ^{d^{(l+1)} × d^{(l)}} is the weight parameter of the node feature transformation of this layer, and a(·) is a function computing the correlation between two nodes. In principle, the weight of any node in the graph with respect to node v_i could be computed here; however, to simplify the computation, it is restricted to first-order neighbors only. a can be defined as a parameter-free correlation computation using the inner product of vectors ⟨Wh_i, Wh_j⟩; alternatively, it may be defined as a parameterized neural network layer, as long as a: ℝ^{d^{(l+1)}} × ℝ^{d^{(l+1)}} → ℝ outputs a scalar value representing the correlation between the two vectors. Here a single fully connected layer is chosen as the correlation function:

e_ij = LeakyReLU(a^T [Wh_i ‖ Wh_j])    (2)

where the weight parameter is a ∈ ℝ^{2d^{(l+1)}} and the activation function is LeakyReLU. To better assign the weights, the correlation calculations over all neighbors are normalized with softmax:

α_ij = softmax_j(e_ij) = exp(e_ij) / Σ_{k∈N_i} exp(e_ik)    (3)
After the weight coefficients are computed, the new feature vector of node v_i is obtained according to the weighted-summation strategy of the attention mechanism:

h'_i = σ( Σ_{j∈N_i} α_ij W h_j )    (4)

To further improve the expressive capacity of the attention layer, a multi-head attention mechanism is also introduced in AAVGA-d, in which formula (5) is used to form K groups of independent attention mechanisms. Compared with GAT, to reduce the dimensionality of the output latent feature vectors, the concatenation operation is replaced by an averaging operation:

h'_i = σ( (1/K) Σ_{k=1}^{K} Σ_{j∈N_i} α_ij^k W^k h_j )    (5)

By aggregating multiple groups of independent attention mechanisms, the multi-head attention mechanism can apply attention distributions to multiple correlated features between the central node and its neighbor nodes, thereby enhancing the representation capability of the encoder.
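For reference, a minimal PyTorch sketch of one such graph attention layer with averaged multi-head attention is given below; the class and parameter names (GraphAttentionLayer, n_heads, negative_slope) are illustrative assumptions, not the reference implementation of the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    """One graph attention layer with K heads averaged, cf. formulas (2), (3) and (5)."""
    def __init__(self, in_dim, out_dim, n_heads=4, negative_slope=0.2):
        super().__init__()
        self.n_heads = n_heads
        self.W = nn.Parameter(torch.empty(n_heads, in_dim, out_dim))  # per-head transform W^k
        self.a = nn.Parameter(torch.empty(n_heads, 2 * out_dim))      # per-head attention vector a^k
        nn.init.xavier_uniform_(self.W)
        nn.init.xavier_uniform_(self.a)
        self.leaky_relu = nn.LeakyReLU(negative_slope)

    def forward(self, H, adj):
        # H: (N, in_dim) node features; adj: (N, N) 0/1 adjacency, assumed to contain self-loops
        out_dim = self.W.size(-1)
        head_outputs = []
        for k in range(self.n_heads):
            Wh = H @ self.W[k]                                          # (N, out_dim)
            # e_ij = LeakyReLU(a^T [Wh_i || Wh_j]) for all pairs, formula (2)
            e = self.leaky_relu(Wh @ self.a[k, :out_dim, None] +
                                (Wh @ self.a[k, out_dim:, None]).T)     # (N, N)
            e = e.masked_fill(adj == 0, float('-inf'))                  # keep first-order neighbours only
            alpha = F.softmax(e, dim=1)                                 # formula (3)
            head_outputs.append(alpha @ Wh)                             # weighted sum, formula (4)
        # Average the K heads instead of concatenating them, formula (5)
        return F.elu(torch.stack(head_outputs).mean(dim=0))
```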
The present invention uses an attention-based encoder to fit μ and σ:

μ = GNN_μ(X, A)    (7)

log σ = GNN_σ(X, A)    (8)

where μ is the matrix of mean vectors μ_i, log σ shares the weights W with μ in the attention layers, and σ² is the variance.
The output of the attention-based encoder is the distribution N(μ, σ²) of the low-dimensional representation vectors of the graph; the prior distribution of the graph data is assumed to be Gaussian. The model uses a graph attention variational encoder in the encoding part, and a graph representation vector is obtained from the learned representation-vector distribution through a sampling operation using the reparameterization trick: Z = μ + ε·σ, where ε is sampled from the standard Gaussian distribution N(0, 1). In short, the low-dimensional representation vectors are constrained to a (Gaussian) distribution and randomly sampled from it; the sampled low-dimensional representation vectors can reproduce approximately real samples through the decoder, and can even yield new samples. This training pattern of the graph variational autoencoder can often obtain better results than the plain graph autoencoder.
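A sketch of the variational encoding and the reparameterized sampling Z = μ + ε·σ is shown below, reusing the GraphAttentionLayer sketched above; for simplicity it fits μ and log σ with separate attention heads rather than the shared weights described above, so it is an illustrative approximation rather than the patented configuration.

```python
class AttentionVariationalEncoder(nn.Module):
    """Graph attention encoder fitting mu and log(sigma), formulas (7)-(8),
    followed by reparameterized sampling Z = mu + eps * sigma."""
    def __init__(self, in_dim, hid_dim, z_dim, n_heads=4):
        super().__init__()
        self.hidden = GraphAttentionLayer(in_dim, hid_dim, n_heads)
        self.to_mu = GraphAttentionLayer(hid_dim, z_dim, n_heads)        # fits mu
        self.to_log_sigma = GraphAttentionLayer(hid_dim, z_dim, n_heads) # fits log(sigma)

    def forward(self, X, adj):
        H = self.hidden(X, adj)
        mu = self.to_mu(H, adj)                    # formula (7)
        log_sigma = self.to_log_sigma(H, adj)      # formula (8)
        eps = torch.randn_like(mu)                 # eps ~ N(0, 1)
        Z = mu + eps * torch.exp(log_sigma)        # reparameterization trick
        return Z, mu, log_sigma
```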
It should be noted that the model of the embodiment of the present disclosure uses the graph attention variational autoencoder in the encoding part, integrating the variational autoencoder and the graph attention autoencoder.
Specifically, the graph is embedded using a multi-layer graph attention network, and the distribution of the representation vectors of the graph is learned.
The variational graph autoencoder is defined as follows:

q(Z | X, A) = Π_{i=1}^{N} q(z_i | X, A)    (9)

q(z_i | X, A) = N(z_i | μ_i, diag(σ_i²))    (10)
Regarding the decoder: the invention performs inner-product processing on the graph representation vectors in the decoder to reconstruct the graph A:

Z = GNN(X, A)    (11)

Â = σ(Z Z^T)    (12)

The adjacency relations are reconstructed using the inner product of the low-dimensional representation vectors of the graph, so as to predict whether a connecting edge exists between any two points in the graph:

p(A | Z) = Π_{i=1}^{N} Π_{j=1}^{N} p(A_ij | z_i, z_j)    (13)

p(A_ij = 1 | z_i, z_j) = σ(z_i^T z_j)    (14)
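A minimal sketch of the inner-product decoder of formulas (12)-(14) follows, continuing the sketches above; the 0.5 decision threshold is an illustrative choice, not fixed by the patent.

```python
import torch

def inner_product_decoder(Z):
    """Reconstruct the adjacency matrix from the graph representation vectors:
    p(A_ij = 1 | z_i, z_j) = sigmoid(z_i^T z_j), formula (14)."""
    return torch.sigmoid(Z @ Z.T)

def has_edge(Z, i, j, threshold=0.5):
    """Predict whether a connecting edge exists between nodes i and j."""
    return inner_product_decoder(Z)[i, j].item() > threshold
```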
Empirically, the standard normal distribution is chosen as the prior distribution of the latent variables z of the graph:

p(Z) = Π_{i=1}^{N} p(z_i) = Π_{i=1}^{N} N(z_i | 0, I)

The complete loss function is defined as follows:

L = E_{q(Z|X,A)}[log p(A|Z)] − KL[q(Z|X,A) ‖ p(Z)]
If only the reconstruction term E_{q(Z|X,A)}[log p(A|Z)] were used as the loss function to optimize the encoder, the variance obtained by the model would shrink to zero, since suppressing the sampling randomness reduces the difference between the generated samples and the real samples. However, the main goal is to optimize the variational autoencoder. To achieve this goal, the KL divergence between the distribution of the latent vectors and the standard normal distribution is added to the loss function, forcing the distribution of each latent vector to approximate the standard normal distribution.
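The variational loss can be sketched as below, continuing the sketches above; the binary cross-entropy form of the reconstruction term and the per-node averaging of the KL term are illustrative normalization choices.

```python
import torch
import torch.nn.functional as F

def variational_loss(A_pred, A_true, mu, log_sigma):
    """Reconstruction term plus KL divergence to the standard normal prior.
    A_pred, A_true: dense float adjacency matrices with entries in [0, 1]."""
    # Reconstruction: binary cross-entropy between predicted and real adjacency
    recon = F.binary_cross_entropy(A_pred, A_true)
    # KL[ N(mu, sigma^2) || N(0, I) ] in closed form for a diagonal Gaussian
    kl = -0.5 * torch.mean(
        torch.sum(1 + 2 * log_sigma - mu.pow(2) - torch.exp(2 * log_sigma), dim=1))
    return recon + kl
```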
It should be further noted that the graph data, after being embedded by the encoder, is described by a distribution of graph representation vectors. The prior distribution of the graph data is assumed to be the Gaussian distribution N(0, 1), but the embedded distribution does not necessarily follow this Gaussian distribution; therefore, an adversarial regularization constraint is added during training, forcing the distribution of the representation vectors generated by the encoder to follow the prior distribution of the graph data. The decoder reconstructs the adjacency matrix of the graph from the representation vectors, and the encoder is trained through the loss function to generate more accurate representation vectors, so that the reconstructed adjacency matrix is as similar as possible to the real adjacency matrix of the graph; the discriminator bears no direct relation to the unsupervised encoder and decoder.
Adversarial mechanism combined with the encoder:
The adversarial model of the present invention consists of two parts. The encoder in the graph attention autoencoder acts as the generator of the adversarial network. The generator attempts to fool the discriminator by generating fake data, where the fake data refers to the latent variables obtained by the encoder from encoding the graph data; the loss of the generator is given by formula (16). The task of the discriminator is to distinguish whether a sample comes from the real data or from the generator. The discriminator judges the data output from the prior distribution p_z as positive and the data output from the latent variables z as negative, with the loss:

L_D = − E_{z′∼p_z}[log D(z′)] − E_{Z}[log(1 − D(Z))]
The present invention uses the Gaussian distribution as the prior distribution of the graph data and assumes that the latent vectors z generated by the encoder do not satisfy the prior distribution of the data in Euclidean space; therefore, a standard multilayer perceptron is used as the discriminator. During the embedding learning process, an adversarial regularization constraint is applied to reduce the deviation of the data distribution during training. The main goal of the adversarial model is to jointly train the encoder and the discriminator through a minimax game so that they optimize each other:

min_G max_D  E_{z∼p_z}[log D(z)] + E_{X}[log(1 − D(G(X)))]

D is trained to maximize the probability of correctly discriminating the samples from the training data and from G; at the same time, G is trained to minimize log(1 − D(G(Z))).
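An illustrative sketch of the discriminator and of the two losses of the minimax game follows; the hidden size and the non-saturating generator loss (−log D(G(Z)) in place of minimizing log(1 − D(G(Z)))) are common substitutions, not specifics of the patent.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Standard multilayer perceptron judging whether a latent vector comes from
    the Gaussian prior (positive) or from the encoder (negative)."""
    def __init__(self, z_dim, hid_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, hid_dim), nn.ReLU(),
            nn.Linear(hid_dim, 1), nn.Sigmoid())

    def forward(self, z):
        return self.net(z)

def adversarial_losses(D, Z_fake, z_prior, eps=1e-8):
    """Discriminator loss and generator (encoder) loss of the minimax game."""
    d_loss = -(torch.log(D(z_prior) + eps).mean()
               + torch.log(1.0 - D(Z_fake.detach()) + eps).mean())
    g_loss = -torch.log(D(Z_fake) + eps).mean()   # non-saturating generator loss
    return d_loss, g_loss
```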
Random edge deletion technique:
In order to solve the over-fitting and over-smoothing problems caused by deepening the graph attention layers, the invention designs a random edge deletion technique (RDEdge) for the graph attention model. DropEdge has contributed to the resistance of graph models to over-smoothing, but it does not work well for deep graph attention models, because graph attention does not rely on the graph Laplacian matrix during aggregation.
The invention deepens the number of graph attention layers in the encoder (to 8 layers) to obtain a deep graph attention model and proposes RDEdge. In each training epoch, the RDEdge technique randomly deletes a certain proportion of the edges of the input graph. Specifically, it randomly resets V·p non-zero elements of the adjacency matrix A to zero, where V is the total number of edges and p is the deletion rate. If the adjacency matrix after randomly deleting edges is denoted A_rd, its relationship with the original adjacency matrix A is:

A_rd = A − A′    (17)

where A′ is a sparse matrix formed from a randomly selected subset of size V·p of the original adjacency matrix A. During graph attention aggregation, the adjacency matrix A does not directly participate in the computation, but the neighbor nodes need to be identified by means of A. In the graph attention layer, after the unnormalized attention coefficients e_ij are obtained by formula (2), a mask operation is required:

e_ij^rd = A_rd ⊙ e_ij    (18)

After the mask operation, e_ij^rd is normalized by formula (3) to obtain the weight coefficient α_ij^rd of each neighbor node. Finally, graph attention aggregation is carried out with the obtained weight coefficients, realizing the optimization of the deep graph attention model by RDEdge.
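A minimal sketch of RDEdge for a dense adjacency matrix is given below; passing the sparsified matrix A_rd to the graph attention layer then plays the role of the mask of formula (18), because the unnormalized coefficients e_ij of the deleted edges are suppressed before the softmax of formula (3). The drop rate of 0.2 is an illustrative default.

```python
import torch

def rdedge(adj, p=0.2):
    """Randomly delete a proportion p of the edges of the input graph before a
    training epoch, formula (17): A_rd = A - A'. Assumes a dense, symmetric
    0/1 adjacency matrix; self-loops, if present, are kept."""
    upper = torch.triu(adj, diagonal=1)                # count each undirected edge once
    edges = upper.nonzero(as_tuple=False)              # (V, 2) list of edge indices
    n_drop = int(p * edges.size(0))                    # V * p edges to delete
    drop = edges[torch.randperm(edges.size(0))[:n_drop]]
    adj_rd = adj.clone()
    adj_rd[drop[:, 0], drop[:, 1]] = 0                 # zero both directions
    adj_rd[drop[:, 1], drop[:, 0]] = 0
    return adj_rd
```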
Regarding the over-smoothing problem of GNNs, Oono and Suzuki consider that, as the number of layers increases, the node representations eventually converge to a subspace. The theoretical analysis of the anti-over-smoothing capability of RDEdge is carried out with this concept. The following definitions are first given.

Definition 1 (subspace). Let M := {E·C | C ∈ ℝ^{M×C}} be an M-dimensional subspace of ℝ^{N×C} (N is the number of nodes, C is the dimension of the node features), where E ∈ ℝ^{N×M} is an orthogonal matrix satisfying E^T E = I_M, with M ≤ N.
In the formula: projection matrix
Figure BDA0003103535200000114
dwCharacteristic dimension for a contextual user, each row P in the projection matrixiRepresenting a unique translation of the user context.
Definition 2 (over-smoothing). Given a subspace M that is independent of the input features, if the distance from the hidden matrix H^(l) of the l-th layer to the subspace does not exceed ε (ε > 0), the node features in the GNN are said to be over-smoothed:

d_M(H^(l)) ≤ ε
Definition 3. Let the original graph be G, and let the graph obtained after RDEdge randomly deletes edges be G_rd. Given an arbitrarily small ε, suppose that G and G_rd encounter the over-smoothing problem with respect to the subspaces M and M_rd, respectively. Then, after enough edges have been deleted, the following two inequalities are satisfied:

ε_rd ≥ ε

dim(M_rd) ≥ dim(M)
According to the conclusions of Oono and Suzuki [28], deep GNNs under certain conditions suffer from the over-smoothing problem for any small value of ε, but they do not propose a corresponding solution. It is shown herein that RDEdge helps to alleviate the over-smoothing problem from two perspectives:
(1) By reducing the connections between nodes, RDEdge can slow down the convergence toward over-smoothing and raise the upper bound on the number of layers before ε-smoothing occurs.
(2) The difference (N − M) between the original space and the convergence subspace measures the amount of information loss: the larger the difference, the more serious the information loss. RDEdge can increase the dimension of the subspace and thus has the capability of reducing the information loss.
The flow of AAVGA-d is shown in FIG. 1. Before each training epoch, the RDEdge technique is used to sparsify the adjacency matrix A of the graph, obtaining A_rd. The unnormalized attention coefficients e_ij are obtained by fitting the node features X of the graph through the multi-layer graph attention layers. A_rd and e_ij undergo the masking operation to obtain e_ij^rd, which is normalized to obtain the final attention weight coefficients α_ij^rd. The subsequent graph attention aggregation yields the low-dimensional representation matrix Z of the graph. The representation matrix Z and the prior distribution p_z of the data are sampled, and the samples are used to train the discriminator. The encoder attempts to fool the discriminator, which can also be understood as training the encoder with the discriminator so that the data distribution generated by the encoder becomes closer to the true distribution. Finally, the entire model is trained using the overall loss function of AAVGA-d.
Experimental evaluation examples:
AAVGA-d is evaluated on three benchmark citation network datasets, and the effectiveness of the framework is demonstrated through the link prediction task in unsupervised graph learning. The three most popular citation datasets in graph embedding learning (Cora, Citeseer and Pubmed) are used herein to evaluate the proposed model. The statistics of the datasets are given in Table 1:
Table 1. Statistics of the three citation datasets

            Cora    Citeseer    Pubmed
Nodes       2708    3327        19717
Edges       5429    4732        44338
Features    1433    3703        500
Labels      7       6           3
The link prediction algorithm can be trained to obtain a similarity value for each pair of nodes in the network (i.e., the similarity value of an edge). The area enclosed by the ROC curve and the x-axis is regarded as a comprehensive measure, called the AUC. The AUC can be understood as the probability that the similarity value of an edge in the test set is higher than the similarity value of an edge that does not actually exist. Specifically, each time one edge is randomly selected from the test set and compared with a randomly selected non-existent edge; if the similarity value of the edge from the test set is larger than that of the non-existent edge, 1 point is added; if the two similarity values are equal, 0.5 points are added. The comparison is carried out independently n times; if in n_1 of the comparisons the similarity value of the test-set edge is greater than that of the non-existent edge, and in n_2 of the comparisons the two similarity values are equal, then AUC is defined as:

AUC = (n_1 + 0.5·n_2) / n
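The sampled comparison described above can be sketched as follows; pos_scores and neg_scores are assumed to be lists of similarity values for held-out edges and for sampled non-existent edges, and the number of trials is an illustrative default.

```python
import random

def auc_by_sampling(pos_scores, neg_scores, n=10000):
    """Estimate AUC by comparing the similarity value of a random held-out edge
    with that of a random non-existent edge, n independent times."""
    n1 = n2 = 0
    for _ in range(n):
        s_pos = random.choice(pos_scores)   # similarity of an edge from the test set
        s_neg = random.choice(neg_scores)   # similarity of a sampled non-existent edge
        if s_pos > s_neg:
            n1 += 1
        elif s_pos == s_neg:
            n2 += 1
    return (n1 + 0.5 * n2) / n
```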
Another evaluation index is the AP, which represents the area enclosed by the Precision-Recall (PR) curve and the x-axis. Precision and Recall are specified below:

Precision = TP / (all detections)

Recall = TP / (all ground truths)

where TP denotes the number of node pairs predicted to be connected that are actually connected, all detections denotes the number of node pairs predicted to be connected, and all ground truths denotes the number of node pairs actually connected.
The two indexes are main evaluation indexes of the link prediction task. The data set is divided into a training set, a validation set and a test set. The validation set contains 5% of edges for hyper-parametric optimization and the test set contains 10% of edges for performance evaluation. To ensure accuracy, 10 experiments were performed on each data set and the average of the experimental results was calculated.
To verify that the AAVGA-d framework proposed herein has competitive graph embedding capability, it is compared with six popular graph embedding algorithms:
(1) Spectral Clustering: a clustering method based on graph theory, which divides a weighted undirected graph into two or more optimal subgraphs so that the interiors of the subgraphs are as similar as possible and the distance between the subgraphs is as large as possible, in order to achieve the common clustering goal.
(2) DeepWalk: learns social representations through truncated random walks; it yields better results when the vertices of the network are few, and the method is also scalable and can adapt to changes in the network.
(3) GAE: a representative unsupervised graph embedding method, which learns an effective representation of the input graph data through encoding and decoding based on the reconstruction loss.
(4) VGAE: the encoder no longer learns a low-dimensional vector representation of each sample but learns the distribution of the sample representations; assuming the representation follows a normal distribution, samples are then drawn from the learned distribution to obtain the low-dimensional vector representation.
(5) ARGA: an adversarial mechanism is added on the basis of the graph autoencoder to ensure the consistency of the data distribution during training.
(6) ARVGA: an adversarial mechanism is introduced on the basis of the graph variational autoencoder; the true data distribution is directly sampled, and the discriminator distinguishes the distribution difference from the latent vectors obtained by the encoder.
The results of the experiment are shown in table 2:
table 2 link prediction experimental results
The results of the link prediction experiments are shown in Table 2. The method proposed herein, AAVGA-d, gives excellent results on all three datasets. Compared with AAVGA, AAVGA-d using the RDEdge technique achieves better graph embedding performance, surpassing it in AP and AUC on all three datasets. This indicates that the strategy of deepening the graph attention layers and applying the RDEdge technique against over-fitting and over-smoothing is feasible. Except for the AUC on Cora, all other indicators on the datasets exceed 94%. Compared with the other baselines, the model performs best on the Citeseer dataset, with improvements of 3.7% and 3% in AUC and AP, respectively, over VGAE, and 2.1% and 2% over ARVGA. The performance of this method is also significantly improved compared with non-encoder graph embedding methods: on the Citeseer dataset, the AUC of AAVGA-d is 14% higher than those of Spectral Clustering and DeepWalk, and the AP increases by 10% and 11.4%, respectively. The experimental results show that combining the attention mechanism and the adversarial mechanism in the graph encoder can improve the graph embedding capability.
Analytical experiments were performed using the three datasets Cora, Citeseer and Pubmed. Taking the Cora dataset as an example, the dataset consists of 2708 machine-learning papers, the number of citations among the papers reaches 5429, and 1433 distinct words are involved. From the perspective of a graph, the dataset has 2708 nodes, 1433-dimensional features and 5429 edges. The deep graph attention adversarial variational autoencoder is used to perform embedding learning on the data. First, the data are one-hot encoded to obtain the adjacency matrix A and the features X, and the features are embedded by the deep attention encoder to obtain representation vectors; the attention weight assignment mechanism fully considers the importance differences between neighboring nodes, giving larger weights to similar nodes (sharing many common words) and smaller weights to dissimilar nodes (sharing few common words) during aggregation. Second, the discriminator performs adversarial supervision of the encoder during embedding learning, forcing the obtained representation vectors to obey the real distribution of the Cora dataset, so that a more accurate embedding result is obtained. Finally, the obtained graph representation vectors are used for reconstruction to obtain a prediction matrix, in which the entries indicate whether a citation relation exists between two articles. The graph embedding performance is finally improved through the joint training of the encoder, the decoder and the discriminator.
Next, the effect of RDEdge on the model is discussed further. It has been shown above that RDEdge has an anti-over-fitting and anti-over-smoothing effect on the deep graph attention model. Specifically, in order to illustrate the improvement in model accuracy brought by the RDEdge technique, further work is carried out herein: comparing the accuracies of the three models AAVGA, AAVGA-4/8 and AAVGA-d in the link prediction experiment.
(1) AAVGA: the single-layer graph attention adversarial variational autoencoder.
(2) AAVGA-4/8: the multi-attention-layer version of AAVGA.
(3) AAVGA-d: the graph attention layers are deepened and the random edge deletion technique is applied at the same time.
In this experiment, each hyper-parameter is set the same as in the experiments above, except that AAVGA uses only a single graph attention layer on all three datasets, while AAVGA-4/8 and AAVGA-d deepen the graph attention layers to 4 layers on the Cora and Citeseer datasets and to 8 layers on Pubmed. The experimental results are shown in FIG. 2. Note that simply deepening the graph attention layers of the encoder on the Cora and Citeseer datasets results in a decrease of the AUC and AP accuracies, since deep graph attention leads to over-fitting and over-smoothing problems. Secondly, no significant degradation of experimental accuracy is seen on the Pubmed dataset; this is considered herein to be because the Pubmed dataset itself is large enough in data volume, and the graph attention model here is limited to the first-order neighbors of the nodes during aggregation, so that no significant over-fitting or over-smoothing problems occur. It is worth noting that AAVGA-d combined with the RDEdge technique performs excellently on all three datasets. This indicates, on the one hand, that the RDEdge technique does have the capability of resisting over-fitting and over-smoothing for the graph attention model; on the other hand, it also shows that appropriately deepening the number of graph attention layers can improve the graph embedding capability of the model.
In the above experiments, when the random edge deletion technique is applied to the AAVGA model, all attention layers share the same A_rd. Notably, the model can also perform RDEdge separately for each attention layer, i.e., compute an independent A_rd^(l) for each layer. This gives each attention layer its own A_rd^(l), forming diversified expressions of attention and introducing more randomness. The layer-wise random edge deletion technique (RDEdge-L) is evaluated experimentally on the Cora dataset from the perspective of the loss function. As shown in FIG. 3, AAVGA-dl, which combines the layer-wise random edge deletion technique, has a smaller training loss than AAVGA-d, but the loss curves on the two validation sets are nearly coincident and there is little difference in performance, which indicates that the layer-wise random edge deletion technique is more favorable for training.
To further verify that AAVGA-d has the ability to mitigate over-smoothing, the degree of over-smoothing is quantified herein by calculating the difference between the output of the current attention layer and the output of the next layer; the Euclidean distance is used for the difference calculation, and a smaller distance means more severe over-smoothing. The experiments are performed on the Cora dataset, using 6 graph attention layers for both AAVGA and AAVGA-d. As shown in FIG. 4(a), before training, the over-smoothing phenomenon becomes severe as the number of layers increases; however, the distance between the AAVGA-d layers is relatively large and the rate of decrease is slow. As shown in FIG. 4(b), after 400 training epochs, for AAVGA, which does not use the RDEdge technique, the difference between the 5th and 6th layers is almost zero, indicating that the hidden features have converged to a fixed point. On the contrary, the distance between the AAVGA-d layers does not decrease but shows a slowly rising trend, showing that AAVGA-d can better cope with the over-smoothing problem caused by deepening the number of layers.
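The layer-wise distance used to quantify over-smoothing can be sketched as below, assuming the stacked attention layers are available as a list and share the same output dimension; these assumptions are for illustration only.

```python
import torch

def layerwise_distances(attention_layers, X, adj):
    """Euclidean distance between the outputs of consecutive attention layers;
    a smaller distance means more severe over-smoothing."""
    dists, H, prev = [], X, None
    for layer in attention_layers:
        H = layer(H, adj)
        if prev is not None:
            dists.append(torch.norm(H - prev).item())
        prev = H
    return dists
```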
To more intuitively present the graph embedding capabilities of the AAVGA-d model and demonstrate its versatility, we will visualize graph data in this section.
First, the three processed citation datasets (Cora, Citeseer and Pubmed) are embedded using AAVGA-d. After training is complete, the graph data are passed through the deep graph attention adversarial variational autoencoder to obtain a representation matrix of the graph composed of feature vectors. To achieve two-dimensional visualization of the graph data, the dimensionality of the feature vectors is compressed using PCA. Finally, the dimensionality-reduced data are clustered using k-means++. The visualization results are shown in FIG. 5, where each color represents a category. The citation categories in each dataset are well separated, which again demonstrates the excellent graph embedding capability of the AAVGA-d model.
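A sketch of this visualization procedure (PCA to two dimensions followed by k-means++ clustering) is given below; the scikit-learn and matplotlib calls, the output file name and the plot styling are illustrative choices.

```python
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

def visualize_embedding(Z, n_clusters, out_path="clusters.png"):
    """Compress the representation matrix Z (NumPy array, one row per node) to 2-D
    with PCA, cluster the reduced data with k-means++, and plot the clusters."""
    Z2 = PCA(n_components=2).fit_transform(Z)
    labels = KMeans(n_clusters=n_clusters, init="k-means++", n_init=10).fit_predict(Z2)
    plt.scatter(Z2[:, 0], Z2[:, 1], c=labels, s=5, cmap="tab10")
    plt.savefig(out_path, dpi=200)
    plt.close()
```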
The invention provides a random edge deletion technique (RDEdge) for the graph attention network, which randomly deletes edges of the original input graph during the embedding process to achieve anti-over-smoothing and anti-over-fitting effects and to allow the number of attention layers to be deepened, improving the graph embedding capability of the encoder. When facing large graphs, it is particularly necessary to deepen the number of graph attention layers.
In order to improve the embedding capability of the graph autoencoder, the invention proposes a graph attention adversarial variational autoencoder (AAVGA-d), which introduces graph attention into the encoder and uses an adversarial mechanism in the embedding training. The graph attention encoder realizes adaptive assignment of the neighbor node weights, and the adversarial regularization makes the distribution of the embedding vectors generated by the encoder close to the real distribution of the data. In order to deepen the number of attention layers, a random edge deletion technique (RDEdge) for the attention network is designed, which reduces the over-smoothing information loss caused by an overly deep number of layers.
Example two
It is an object of this embodiment to provide a computing device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the above method when executing the program.
EXAMPLE III
An object of the present embodiment is to provide a computer-readable storage medium.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method.
Example four
The present embodiment is directed to a deep graph attention adversarial variational autoencoder training system, including:
a feature vector formation module configured to: obtaining a first feature vector corresponding to each node in graph data;
after the first feature vector undergoes an aggregation operation with the attention mechanism as its core, outputting a second feature vector of each node, used to form a plurality of groups of independent attention mechanisms;
an auto-encoder training module configured to: applying attention distributions to a plurality of relevant features between the central node and its neighbor nodes by aggregating the plurality of groups of independent attention mechanisms, so as to form a graph attention adversarial autoencoder;
encoding the graph data using the encoder of the graph attention adversarial autoencoder to obtain graph representation vectors;
and performing inner-product processing on the graph representation vectors with a decoder to reconstruct the graph data and predict whether a connecting edge exists between any two points in the graph.
According to the invention, the graph attention and adversarial mechanisms are introduced into the graph autoencoder; by realizing adaptive assignment of the neighbor node weights and regularization constraints on the distribution of the representation vectors during embedding, the graph attention adversarial autoencoder can embed graph data well. Meanwhile, a random edge deletion technique (RDEdge) is designed for the graph attention model, which helps to deepen the number of attention layers of the graph attention adversarial variational autoencoder and well solves the over-fitting and over-smoothing problems caused by deep graph attention models. RDEdge randomly discards a certain proportion of the edges in the graph, so that the input data undergo random deformation, increasing the diversity of the data to counter over-fitting; message passing is reduced during attention aggregation to mitigate over-smoothing. After combining the RDEdge technique, a deep graph attention adversarial variational autoencoder (AAVGA-d) is developed, which is a deep graph attention encoder that considers the importance differences between nodes and incorporates an adversarial mechanism, ensuring the consistency between the distribution of the graph representation vectors obtained by the encoder and the prior distribution. Deepening the number of attention layers further improves the graph embedding capability of the encoder.
When calculating the attention weights of the neighbor nodes, the graph attention encoder restricts the computation to the first-order neighbors of each node, which greatly reduces the computational complexity and avoids more serious over-smoothing problems to a certain extent.
The steps involved in the apparatuses of the above second, third and fourth embodiments correspond to the first embodiment of the method, and the detailed description thereof can be found in the relevant description of the first embodiment. The term "computer-readable storage medium" should be taken to include a single medium or multiple media containing one or more sets of instructions; it should also be understood to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by a processor and that cause the processor to perform any of the methods of the present disclosure.
Those skilled in the art will appreciate that the modules or steps of the present disclosure described above can be implemented using general purpose computer means, or alternatively, they can be implemented using program code executable by computing means, whereby the modules or steps may be stored in memory means for execution by the computing means, or separately fabricated into individual integrated circuit modules, or multiple modules or steps thereof may be fabricated into a single integrated circuit module. The present disclosure is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present disclosure and is not intended to limit the present disclosure, and various modifications and changes may be made to the present disclosure by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.
Although the present disclosure has been described with reference to specific embodiments, it should be understood that the scope of the present disclosure is not limited thereto, and those skilled in the art will appreciate that various modifications and changes can be made without departing from the spirit and scope of the present disclosure.

Claims (10)

1. A deep graph attention adversarial variational autoencoder training method, characterized by comprising the following steps:
obtaining a first feature vector corresponding to each node in graph data;
after the first feature vector undergoes an aggregation operation with the attention mechanism as its core, outputting a second feature vector of each node, used to form a plurality of groups of independent attention mechanisms;
applying attention distributions to a plurality of relevant features between the central node and its neighbor nodes by aggregating the plurality of groups of independent attention mechanisms, so as to form a graph attention adversarial autoencoder;
encoding the graph data using the encoder of the graph attention adversarial autoencoder to obtain graph representation vectors;
and performing inner-product processing on the graph representation vectors with a decoder to reconstruct the graph data and predict whether a connecting edge exists between any two points in the graph.
2. The deep graph attention adversarial variational autoencoder training method of claim 1, wherein the step of obtaining the second feature vector of each node comprises:
setting weight coefficients of adjacent nodes;
selecting a single fully connected layer as the correlation function;
normalizing the correlation calculations of all the neighbors to obtain a weight coefficient for each neighbor node;
and after the weight coefficients are obtained, obtaining the second feature vector of the node according to the weighted-summation strategy of the attention mechanism.
3. The deep graph attention adversarial variational autoencoder training method of claim 1, wherein graph attention aggregation is carried out with the obtained weight coefficients, realizing the optimization of the graph attention adversarial autoencoder by the random edge deletion technique.
4. The deep graph attention adversarial variational autoencoder training method of claim 1, wherein in the graph attention aggregation process, the neighbor nodes are identified by means of the adjacency matrix, and after the graph attention layer obtains the unnormalized attention coefficients, a masking operation is performed.
5. The deep graph attention adversarial variational autoencoder training method of claim 1, wherein the graph attention adversarial autoencoder is combined with a random edge deletion technique, the combination being called the deep graph attention adversarial variational autoencoder, and the random edge deletion technique randomly deletes a certain proportion of the edges of the input graph in each training epoch.
6. The deep graph attention adversarial variational autoencoder training method of claim 1, wherein the random edge deletion technique randomly deletes a certain proportion of the edges of the input graph, specifically: V·p non-zero elements of the adjacency matrix A are randomly reset to zero, where V is the total number of edges and p is the deletion rate.
7. The deep graph attention adversarial variational autoencoder training method of claim 1, wherein the encoder in the graph attention adversarial autoencoder serves as the generator of the adversarial network, and the generator deceives the discriminator by generating fake data, wherein the fake data refers to the latent variables obtained by the encoder through encoding of the graph data;
the task of the discriminator is to distinguish whether a sample comes from the true data or from the generator; the discriminator judges the data output from the prior distribution p_z as positive and the data output from the latent variables as negative.
8. Deep map attention confrontation variational automatic encoder training system, characterized by, includes:
a feature vector formation module configured to: obtaining a first feature vector corresponding to each node in graph data;
performing, on the first feature vectors, an aggregation operation with the attention mechanism as its core, and outputting a second feature vector for each node, the second feature vectors being used to form a plurality of groups of independent attention mechanisms;
an auto-encoder training module configured to: summarizing the plurality of groups of independent attention mechanisms so that attention is distributed over a plurality of relevant features between a central node and its neighbor nodes, thereby forming a graph attention confrontation auto-encoder;
encoding the graph data with the encoder of the graph attention confrontation auto-encoder to obtain graph representation vectors;
and performing inner-product processing on the graph representation vectors with the decoder to reconstruct the graph data, thereby predicting whether a connecting edge exists between any two nodes in the graph.
9. A computing device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program performs the steps of the method of any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, is adapted to carry out the steps of the method according to any one of the preceding claims 1 to 7.
CN202110630525.3A 2021-06-07 2021-06-07 Deep map attention confrontation variational automatic encoder training method and system Pending CN113361606A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110630525.3A CN113361606A (en) 2021-06-07 2021-06-07 Deep map attention confrontation variational automatic encoder training method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110630525.3A CN113361606A (en) 2021-06-07 2021-06-07 Deep map attention confrontation variational automatic encoder training method and system

Publications (1)

Publication Number Publication Date
CN113361606A true CN113361606A (en) 2021-09-07

Family

ID=77532677

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110630525.3A Pending CN113361606A (en) 2021-06-07 2021-06-07 Deep map attention confrontation variational automatic encoder training method and system

Country Status (1)

Country Link
CN (1) CN113361606A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114004859A (en) * 2021-11-26 2022-02-01 山东大学 Method and system for segmenting echocardiography left atrium map based on multi-view fusion network
CN114580252A (en) * 2022-05-09 2022-06-03 山东捷瑞数字科技股份有限公司 Graph neural network simulation method and system for fluid simulation
CN115271033A (en) * 2022-07-05 2022-11-01 西南财经大学 Medical image processing model construction and processing method based on federal knowledge distillation
CN115271033B (en) * 2022-07-05 2023-11-21 西南财经大学 Medical image processing model construction and processing method based on federal knowledge distillation
CN116108375A (en) * 2022-12-19 2023-05-12 南京理工大学 Graph classification method based on structure sensitive graph dictionary embedding
CN116108375B (en) * 2022-12-19 2023-08-01 南京理工大学 Graph classification method based on structure sensitive graph dictionary embedding
CN116738201A (en) * 2023-02-17 2023-09-12 云南大学 Illegal account identification method based on graph comparison learning
CN116738201B (en) * 2023-02-17 2024-01-16 云南大学 Illegal account identification method based on graph comparison learning
CN116126947A (en) * 2023-04-18 2023-05-16 西昌学院 Big data analysis method and system applied to enterprise management system
CN116126947B (en) * 2023-04-18 2023-06-30 西昌学院 Big data analysis method and system applied to enterprise management system
CN116584951A (en) * 2023-04-23 2023-08-15 山东省人工智能研究院 Electrocardiosignal detection and positioning method based on weak supervision learning
CN116584951B (en) * 2023-04-23 2023-12-12 山东省人工智能研究院 Electrocardiosignal detection and positioning method based on weak supervision learning

Similar Documents

Publication Publication Date Title
CN113361606A (en) Deep map attention confrontation variational automatic encoder training method and system
Kuo et al. Green learning: Introduction, examples and outlook
CN112529168B (en) GCN-based attribute multilayer network representation learning method
Natesan Ramamurthy et al. Model agnostic multilevel explanations
Chen et al. An efficient network behavior anomaly detection using a hybrid DBN-LSTM network
Zhang et al. Interpreting and boosting dropout from a game-theoretic view
CN113919441A (en) Classification method based on hypergraph transformation network
CN113065649B (en) Complex network topology graph representation learning method, prediction method and server
Li et al. Embedded stacked group sparse autoencoder ensemble with L1 regularization and manifold reduction
CN115168443A (en) Anomaly detection method and system based on GCN-LSTM and attention mechanism
CN115496144A (en) Power distribution network operation scene determining method and device, computer equipment and storage medium
CN111782804A (en) TextCNN-based same-distribution text data selection method, system and storage medium
Sun et al. Optimization of classifier chains via conditional likelihood maximization
CN111291810A (en) Information processing model generation method based on target attribute decoupling and related equipment
CN112329918A (en) Anti-regularization network embedding method based on attention mechanism
Wickramasinghe et al. Deep self-organizing maps for visual data mining
CN117272195A (en) Block chain abnormal node detection method and system based on graph convolution attention network
CN114596464A (en) Multi-feature interactive unsupervised target detection method and system, electronic device and readable storage medium
Mitchell et al. Deep learning using partitioned data vectors
Xiang et al. Efficient learning-based community-preserving graph generation
Tosun et al. Assessing diffusion of spatial features in deep belief networks
Choi et al. Deep learning price momentum in US equities
Fonseca et al. A genetic algorithm assisted by a locally weighted regression surrogate model
Ling Score prediction of sports events based on parallel self-organizing nonlinear neural network
CN117272119B (en) User portrait classification model training method, user portrait classification method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination