CN112819154B - Method and device for generating pre-training model applied to graph learning field


Info

Publication number
CN112819154B
CN112819154B (application CN202110072779.8A)
Authority
CN
China
Prior art keywords
graph
node
sample
training
graph structure
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110072779.8A
Other languages
Chinese (zh)
Other versions
CN112819154A (en)
Inventor
杨洋
邵平
王春平
徐晟尧
胥奇
陈磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Shanghu Information Technology Co ltd
Zhejiang University ZJU
Original Assignee
Shanghai Shanghu Information Technology Co ltd
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Shanghu Information Technology Co ltd, Zhejiang University ZJU filed Critical Shanghai Shanghu Information Technology Co ltd
Priority to CN202110072779.8A priority Critical patent/CN112819154B/en
Publication of CN112819154A publication Critical patent/CN112819154A/en
Application granted granted Critical
Publication of CN112819154B publication Critical patent/CN112819154B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a device for generating a pre-training model applied to the field of graph learning. The method comprises: determining a first node from a first graph structure sample and determining a first sample subgraph of the first node, wherein the first graph structure sample is any one of the first training samples and the attribute of each node of the first graph structure sample is different from the attribute of each node of a second graph structure sample in the first training samples; determining a graph primitive label vector of the first node and an initial representation vector of the first sample subgraph; obtaining a first feature representation vector and a graph primitive prediction vector of the first node; and training an initial pre-training model according to the graph primitive prediction vector and the graph primitive label vector until the pre-training model is obtained. The pre-training model is thus obtained from a plurality of graph structure samples whose attributes differ, so it can serve as a pre-training model for graph structures of every attribute.

Description

Method and device for generating pre-training model applied to graph learning field
Technical Field
The invention relates to the field of graph learning, in particular to a method and a device for generating a pre-training model applied to the field of graph learning.
Background
The graph convolutional neural network learns representation vectors of the nodes in a graph by aggregating the features of neighboring nodes, and can handle downstream tasks in the graph such as node classification and graph classification. A graph is a structure for representing data and has unique advantages for studying the association information between objects. For example, in the field of biochemistry, a molecule can be regarded as a graph structure in which different atoms are nodes connected by chemical bonds; in an academic citation network, nodes represent scholars, and cooperation between scholars is the association information between nodes, i.e., the edges of the graph structure; in the field of e-commerce, users and goods may compose a graph structure used for personalized recommendation.
However, the semantics represented by nodes differ across graph structures; e.g., each node represents a scholar in the graph structure of an academic citation network, while nodes represent user interests in the graph structure of a social network. In the prior art, a graph convolutional neural network model is generally trained directly on a data set without a pre-training stage. Even when the graph convolutional neural network model is pre-trained, the pre-training model can only be obtained from graph structures of a single attribute, or of a series of attributes within one specific field, and the resulting pre-training model can only be used for graph structures of that single attribute or that field; it cannot be used for graph structures of other attributes. That is, a pre-training model cannot be obtained from graph structures of different attributes, and the obtained pre-training model cannot be used to train graph structures of different attributes into a final model.
Therefore, a model pre-training method is needed that can obtain a pre-training model from graph structures of every attribute and then derive the model for each attribute from the obtained pre-training model, thereby improving model training efficiency.
Disclosure of Invention
The embodiments of the invention provide a method and a device for generating a pre-training model applied to the field of graph learning, which obtain a pre-training model from graph structure samples of all attributes, derive a graph neural network model for each attribute from the obtained pre-training model, and thereby accelerate the training of the graph neural network model for each attribute.
In a first aspect, an embodiment of the present invention provides a method for generating a pre-training model applied to the field of graph learning, including:
determining a first node from a first graph structure sample and determining a first sample subgraph of the first node; the first graph structure sample is any one of the first training samples; the attribute of each node of the first graph structure sample is different from the attribute of each node of a second graph structure sample in the first training samples;
determining a graph primitive label vector of the first node and an initial representation vector of the first sample subgraph according to the first graph structure sample;
inputting the initial representation vector of the first sample subgraph into an initial pre-training model to obtain a first feature representation vector; inputting the first feature representation vector into a first initial neural network model to obtain a graph primitive prediction vector of the first node;
and training the initial pre-training model according to the graph primitive prediction vector and the graph primitive label vector until the pre-training model is obtained.
According to the above technical solution, nodes are determined from a plurality of graph structure samples and sample subgraphs of those nodes are obtained; graph primitive label vectors of the nodes and initial representation vectors of the sample subgraphs are then determined, where the feature representation vector of a sample subgraph represents the structural information of the node within its graph structure sample. The initial representation vector of each sample subgraph is input into an initial pre-training model to obtain a feature representation vector, the feature representation vector is input into a first initial neural network model to obtain a graph primitive prediction vector of the node, and finally the initial pre-training model is trained according to the graph primitive prediction vectors and graph primitive label vectors until the pre-training model is obtained. Because graph primitives represent the structural information of nodes in the graph structure samples, and the node attributes of the plurality of graph structure samples differ, the pre-training model obtained from the feature representation vectors and graph primitive vectors is effectively obtained from graph structure samples of all attributes; that is, the pre-training model can be used to train graph structure samples of any attribute to obtain a graph neural network model of the corresponding attribute.
Optionally, determining a first node from the first graph structure sample and determining a first sample sub-graph of the first node includes:
randomly selecting a node in the first graph structure sample as the first node;
obtaining a first sample subgraph of the first node through a random-walk-with-restart algorithm according to the connection relations among the nodes in the first graph structure sample.
Obtaining sample subgraphs of nodes through random walk with restart increases the generalization capability of the resulting pre-training model; multiple sample subgraphs of a node can be obtained by restarting the random walk several times, which increases the number of sample subgraphs per node and thus enlarges the set of pre-training samples.
Optionally, determining, according to the first graph structure sample, a graph primitive label vector of the first node includes:
counting the number of graph primitives of each preset type around the first node according to the connection relations among the nodes in the first graph structure sample, and obtaining the graph primitive label vector of the first node according to the number of graph primitives of each preset type.
Optionally, determining an initial representation vector of the first sample subgraph according to the first graph structure sample includes:
Obtaining an initial representation vector of the first sample sub-graph according to the following formula (1);
I - K^(-1/2)MK^(-1/2) = UΛU^T ………………………………………………(1);
wherein I is an identity matrix; K is the degree matrix of the first node in the first graph structure sample; M is the adjacency matrix of the first node in the first graph structure sample; Λ is the diagonal matrix of eigenvalues; and U is the initial representation vector (the eigenvector matrix) of the first sample subgraph.
Optionally, training the initial pre-training model according to the primitive prediction vector and the primitive label vector until the pre-training model is obtained, including:
determining a vector difference value according to the graph primitive prediction vector and the graph primitive label vector through the following formula (2);
L = (1/V) Σ_u∈G |c_u - f(G'_u)| ………………………………………………(2);
wherein V is a preset value; c_u is the graph primitive label vector of the first node; f(G'_u) is the graph primitive prediction vector of the first node; G is the first graph structure sample; and L is the vector difference value, where |c_u - f(G'_u)| denotes the sum of the absolute component-wise differences;
updating the initial pre-training model and the first initial neural network model according to the vector difference value until the vector difference value meets a set condition, thereby obtaining the pre-training model.
Optionally, after obtaining the pre-training model, the method further includes:
constructing an initial model, wherein the initial model comprises the pre-training model and a second initial neural network model;
for any third graph structure sample in the second training samples, inputting the third graph structure sample into the pre-training model to obtain a second feature representation vector, and inputting the second feature representation vector into the second initial neural network model to obtain a predicted value of the third graph structure sample; the nodes of each graph structure sample in the second training samples have the same attribute;
and training the initial model according to the predicted value and the label value of the third graph structure sample until a training end condition is met.
In the above technical solution, the pre-training model is obtained from graph structure samples of all attributes, i.e., it is equivalent to a comprehensive pre-training model. Training with the pre-training model on graph structure samples of a given attribute yields the graph neural network model of the corresponding attribute, which improves the training efficiency of the graph neural network model for each attribute and shortens model training time.
Optionally, training the initial model according to the predicted value and the label value of the third graph structure sample includes:
updating the second initial neural network model according to the predicted value and the label value of the third graph structure sample; or updating both the pre-training model and the second initial neural network model according to the predicted value and the label value of the third graph structure sample.
According to the above technical solution, training the initial model by updating only the second initial neural network model improves training efficiency and shortens training time; training the initial model by updating both the pre-training model and the second initial neural network model increases the accuracy of the trained initial model.
In a second aspect, an embodiment of the present invention provides a device for generating a pre-training model applied to the field of graph learning, including:
a selection module, configured to determine a first node from a first graph structure sample and determine a first sample subgraph of the first node; the first graph structure sample is any one of the first training samples; the attribute of each node of the first graph structure sample is different from the attribute of each node of a second graph structure sample in the first training samples;
a processing module, configured to determine a graph primitive label vector of the first node and an initial representation vector of the first sample subgraph according to the first graph structure sample;
input the initial representation vector of the first sample subgraph into an initial pre-training model to obtain a first feature representation vector; input the first feature representation vector into a first initial neural network model to obtain a graph primitive prediction vector of the first node;
and train the initial pre-training model according to the graph primitive prediction vector and the graph primitive label vector until the pre-training model is obtained.
Optionally, the selecting module is specifically configured to:
randomly select a node in the first graph structure sample as the first node;
and obtain a first sample subgraph of the first node through a random-walk-with-restart algorithm according to the connection relations among the nodes in the first graph structure sample.
Optionally, the processing module is specifically configured to:
count the number of graph primitives of each preset type around the first node according to the connection relations among the nodes in the first graph structure sample, and obtain the graph primitive label vector of the first node according to the number of graph primitives of each preset type.
Optionally, obtaining an initial representation vector of the first sample sub-graph according to the following formula (1);
I - K^(-1/2)MK^(-1/2) = UΛU^T ………………………………………………(1);
wherein I is an identity matrix; K is the degree matrix of the first node in the first graph structure sample; M is the adjacency matrix of the first node in the first graph structure sample; Λ is the diagonal matrix of eigenvalues; and U is the initial representation vector (the eigenvector matrix) of the first sample subgraph.
Optionally, the processing module is specifically configured to:
determine a vector difference value according to the graph primitive prediction vector and the graph primitive label vector through the following formula (2);
L = (1/V) Σ_u∈G |c_u - f(G'_u)| ………………………………………………(2);
wherein V is a preset value; c_u is the graph primitive label vector of the first node; f(G'_u) is the graph primitive prediction vector of the first node; G is the first graph structure sample; and L is the vector difference value;
update the initial pre-training model and the first initial neural network model according to the vector difference value until the vector difference value meets a set condition, thereby obtaining the pre-training model.
Optionally, the processing module is further configured to:
after the pre-training model is obtained, construct an initial model, wherein the initial model comprises the pre-training model and a second initial neural network model;
for any third graph structure sample in the second training samples, input the third graph structure sample into the pre-training model to obtain a second feature representation vector, and input the second feature representation vector into the second initial neural network model to obtain a predicted value of the third graph structure sample; the nodes of each graph structure sample in the second training samples have the same attribute;
and train the initial model according to the predicted value and the label value of the third graph structure sample until a training end condition is met.
Optionally, the processing module is specifically configured to:
update the second initial neural network model according to the predicted value and the label value of the third graph structure sample; or update both the pre-training model and the second initial neural network model according to the predicted value and the label value of the third graph structure sample.
In a third aspect, embodiments of the present invention also provide a computing device, comprising:
a memory for storing program instructions;
and a processor for calling the program instructions stored in the memory and executing, according to the obtained program, the above method for generating a pre-training model applied to the graph learning field.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, where computer-executable instructions are stored, where the computer-executable instructions are configured to cause a computer to perform the method for generating a pre-training model applied to the graph learning field.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a diagram structure according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a system architecture according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of a method for generating a pre-training model applied to the field of graph learning according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a primitive provided in an embodiment of the present invention;
FIG. 5 is a schematic diagram of a model calculation according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of an initial pre-training model calculation according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a device for generating a pre-training model applied to the field of graph learning according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In the prior art, graph structures are varied. Taking graph a in fig. 1 as an example, a represents a graph structure comprising nodes (such as A and B) and edges between the nodes; the nodes represent the objects to be analyzed, and the edges represent association relations with certain attributes between the objects. It can be seen that the nodes in different graph structures differ, the number of edges between nodes differs, the neighbors of the same node differ, nodes and edges have no fixed order, node attributes may be the same or different, edges may or may not be weighted, and a graph may be dynamic or static. The graph structure is a bridge between real-world data sets and artificial intelligence technology, and has great research significance and practical value. Many real-life scene problems can be translated into classical problems in the field of graph learning; for example, finding fraud in a telecommunication network can be understood as an abnormal node detection problem in a graph structure.
A graph neural network model is typically trained by using a large number of graph structures as training data. However, in real application scenarios it is often difficult to obtain sufficient training data, and pre-training techniques are often used to address this problem. Pre-training has achieved impressive results in computer vision and natural language processing, but pre-training in the field of graph learning remains a challenging problem. The reason is that objects in the natural language processing domain and the computer vision domain have a uniform meaning; e.g., in natural language processing, the same words and common expressions carry the same semantics across different texts. In the field of graph learning, however, the attributes of graph structures differ. For example, fig. 1 schematically illustrates two graph structures: graph a in fig. 1 is a graph structure of social attributes, in which nodes represent users and the attribute between users A and B is a shared interest; graph b in fig. 1 is a graph structure of academic attributes, in which nodes represent users and the attribute between users A and C is the same research direction. Pre-training across graph structures of different attributes is therefore not directly possible.
For graph structures of different attributes, a pre-training model cannot be obtained from multiple graph structures in the prior art. Therefore, there is a need for a method for generating a pre-training model to reduce training time of a graph neural network model and improve training efficiency of the graph neural network model.
Fig. 2 illustrates a system architecture to which embodiments of the present invention are applicable, the system architecture including a server 200, the server 200 may include a processor 210, a communication interface 220, and a memory 230.
Wherein the communication interface 220 is configured to obtain a graph structure sample of each attribute.
Processor 210 is a control center of server 200, connects various portions of the entire server 200 using various interfaces and routes, performs various functions of server 200 and processes data by running or executing software programs and/or modules stored in memory 230, and invoking data stored in memory 230. Optionally, the processor 210 may include one or more processing units.
The memory 230 may be used to store software programs and modules; the processor 210 performs various functional applications and data processing by executing the software programs and modules stored in the memory 230. The memory 230 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, application programs required for at least one function, and the like; the storage data area may store data created according to business processes, etc. In addition, memory 230 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
It should be noted that the structure shown in fig. 2 is merely an example, and the embodiment of the present invention is not limited thereto.
Based on the above description, fig. 3 exemplarily illustrates a flow of a method for generating a pre-training model applied to the field of graph learning, which is provided by an embodiment of the present invention, and the flow may be executed by a generating device for a pre-training model applied to the field of graph learning.
As shown in fig. 3, the process specifically includes:
Step 310, determining a first node from a first graph structure sample and determining a first sample subgraph of the first node.
In the embodiment of the invention, the first graph structure sample is any one of the first training samples, and the attribute of each node of the first graph structure sample is different from the attribute of each node of a second graph structure sample in the first training samples. In other words, the first training samples include a plurality of graph structures with different attributes, each constructed according to a single attribute. For example, the first training samples include graph structure 1, graph structure 2 and graph structure 3, where graph structure 1 is a graph structure in which the attribute of each node is a social attribute, i.e., nodes joined by edges in graph structure 1 share the same interests; graph structure 2 is a graph structure in which the attribute of each node is an academic attribute, i.e., nodes joined by edges in graph structure 2 share the same research direction; and graph structure 3 is a graph structure in which the attribute of each node is taste, i.e., nodes joined by edges in graph structure 3 share the same eating taste. The first graph structure sample is any one of graph structure 1, graph structure 2 and graph structure 3; the second graph structure sample is any one of graph structure 1, graph structure 2 and graph structure 3 other than the first graph structure sample.
In the embodiment of the present invention, the first node may be determined by: sorting the nodes according to the number of edges of each node in the first graph structure and selecting the first node in ascending or descending order; or randomly selecting the first node in the first graph structure; or selecting the first node according to the positions of the nodes in the graph structure; and so on.
The first sample subgraph refers to a subgraph formed by the nodes that have a direct or indirect association relation with the first node in the first graph structure sample. Fig. 5 schematically illustrates a model calculation, where fig. 5 (1) is a graph structure sample, the black node in fig. 5 (2) (node A in fig. 5 (3)) is the determined first node, and fig. 5 (3) is the first sample subgraph of the first node. Of course, the first sample subgraph of the first node may also be the subgraph formed by the four nodes A, B, C and D. In the embodiment of the present invention, the first sample subgraph of the first node may be determined according to the number of edges of the other nodes connected to the first node. For example, node x is connected to node y and node z, where node y has 5 edges and node z has 3 edges; because the number of edges of node y is greater than a number threshold (e.g., 4), the sample subgraph of node x includes node x and node y.
Further, a node is randomly selected in the first graph structure sample as the first node, and the first sample subgraph of the first node is obtained through a random-walk-with-restart algorithm according to the connection relations among the nodes in the first graph structure sample. Obtaining the first sample subgraph through random walk with restart increases the generalization capability of the trained pre-training model.
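For illustration, the following is a minimal sketch of sampling a node's subgraph by random walk with restart; the function name rwr_subgraph and the parameters restart_p and walk_len are illustrative assumptions and are not specified by the invention:

    import random

    def rwr_subgraph(adj, start, restart_p=0.5, walk_len=64):
        # adj: dict mapping each node to the list of its neighbors.
        # At every step the walk jumps back to the start node with
        # probability restart_p, otherwise moves to a random neighbor;
        # the set of visited nodes forms the sample subgraph.
        visited = {start}
        cur = start
        for _ in range(walk_len):
            if random.random() < restart_p or not adj[cur]:
                cur = start
            else:
                cur = random.choice(adj[cur])
            visited.add(cur)
        return visited

Running such a sketch several times from the same start node yields several different sample subgraphs, which is how restarting the walk multiple times enlarges the set of pre-training samples.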
Step 320, determining a graph primitive label vector of the first node and an initial representation vector of the first sample subgraph according to the first graph structure sample.
A graph primitive (motif) is a basic building block of a graph, describing a subgraph structure with a particular connection pattern between nodes. Graph primitives are widely used in bioinformatics, neuroscience, biology, social networks and other fields. As subgraph structures that occur frequently in a graph, graph primitives carry rich information. For graph structures of different attributes, a denser primitive structure implies a closer relationship between the nodes, such as Bob and his two close friends in a social network, or Bob and his two co-workers in an academic network. Conversely, a primitive with a relatively loose structure implies the opposite.
Graph primitives comprise a plurality of types; the embodiment of the invention selects undirected graph primitives of order 2 to 4. As shown in fig. 4, the preset types of graph primitives comprise 15 types, and the black node is the determined first node.
In the embodiment of the invention, the graph primitive label vector of the first node is obtained according to the connection relations among the nodes in the graph structure sample. The graph primitive label vector represents the proportion of each primitive around the first node in the first graph structure sample, and may be determined according to the weights of the edges connecting the first node to other nodes, or according to the number of primitives.
Further, according to the connection relations among the nodes in the first graph structure sample, the number of graph primitives of each preset type around the first node is counted, and the graph primitive label vector of the first node is obtained from these counts. The graph primitive label vector (motif count vector) c_u of a node u records, for each graph primitive type, the number of occurrences of that primitive around node u that include node u, where c_u^i indicates the number of times the i-th graph primitive appears around node u. In the embodiment of the invention, for each graph structure sample, the graph primitive label vector of each node can be computed using the orca algorithm.
As shown in fig. 4, the number of primitives of each preset type that the first node participates in within the graph structure sample can be determined from the preset primitive types. For example, the m_0 type in fig. 4 comprises only two nodes, and the number of m_0 type primitives of node A in graph a of fig. 1 is 4. Similarly, the number of primitives of every other preset type for node A may be determined.
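For illustration only, the sketch below counts the simplest primitive shapes around a node u with networkx; the full 15-dimensional count over all 2-4-order primitives would in practice come from a tool such as orca, and the function name motif_label_vector is a hypothetical label:

    import networkx as nx

    def motif_label_vector(G, u):
        # Counts for the three simplest primitive shapes around node u;
        # a real label vector would cover all 15 preset 2-4-order types.
        deg = G.degree(u)
        m_edge = deg                                  # m_0: a single edge containing u
        m_closed = nx.triangles(G, u)                 # closed triads (triangles) at u
        m_open = deg * (deg - 1) // 2 - m_closed      # open triads centered at u
        return [m_edge, m_open, m_closed]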
In the embodiment of the present invention, the initial representation vector of the first sample subgraph may be obtained from the degree matrix of the first node in the first graph structure sample, e.g., as the inverse or transpose of that degree matrix. It may also be obtained from the adjacency matrix of the first node in the first graph structure sample, e.g., as the inverse or transpose of that adjacency matrix, or as the product of the degree matrix and the adjacency matrix of the first node in the first graph structure sample.
Further, determining the initial representation vector of the first sample subgraph according to the first graph structure sample includes:
obtaining an initial representation vector of the first sample sub-graph according to the following formula (1);
I - K^(-1/2)MK^(-1/2) = UΛU^T ………………………………………………(1);
wherein I is an identity matrix; K is the degree matrix of the first node in the first graph structure sample; M is the adjacency matrix of the first node in the first graph structure sample; Λ is the diagonal matrix of eigenvalues; and U is the initial representation vector (the eigenvector matrix) of the first sample subgraph.
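A minimal numerical sketch of formula (1) is given below, assuming M is the symmetric adjacency matrix of the sample subgraph as a numpy array; initial_representation is a hypothetical name:

    import numpy as np

    def initial_representation(M):
        # Build the normalized Laplacian I - K^(-1/2) M K^(-1/2)
        # and eigendecompose it; the eigenvector matrix U serves
        # as the initial representation of the sample subgraph.
        deg = M.sum(axis=1)
        k_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
        lap = np.eye(len(M)) - k_inv_sqrt @ M @ k_inv_sqrt
        eigvals, U = np.linalg.eigh(lap)   # lap = U diag(eigvals) U^T
        return U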
Step 330, inputting the initial representation vector of the first sample subgraph into the initial pre-training model to obtain a first feature representation vector; and inputting the first feature representation vector into a first initial neural network model to obtain a graph primitive prediction vector of the first node.
In the embodiment of the invention, the feature representation vector of the first node is obtained by inputting the initial representation vector of the first node's sample subgraph into the initial pre-training model; the feature representation vector represents the structural information of the first node in the graph structure sample. The graph primitive prediction vector of the first node is then obtained through the first initial neural network model. As shown in fig. 5, node A is determined by random selection, a sample subgraph of node A is obtained through the random-walk-with-restart algorithm, and an initial representation vector of the sample subgraph of node A, e.g., [3,2,2,4,3,5,1], is determined according to formula (1) above. The initial representation vector of the sample subgraph of node A is input into the initial pre-training model to obtain the feature representation vector of node A, e.g., a 32-dimensional vector [2,4, … …,3,2]. The feature representation vector of node A is then input into the first initial neural network model to obtain the graph primitive prediction vector of node A; this reduces the dimensionality of the feature representation vector of node A so that the obtained graph primitive prediction vector of node A has the same dimensionality as the graph primitive label vector of node A. The graph primitive prediction vector of node A represents the predicted number of graph primitives of each preset type around node A.
Fig. 6 exemplarily illustrates the calculation of the initial pre-training model. As shown in fig. 6, the initial representation vector of the sample subgraph is used as the input of the initial pre-training model, and a first-layer training value is obtained for each node in the sample subgraph: the first-layer training value of node B is obtained from node A and node C, and similarly the first-layer training value of node A is obtained from the first-layer training values of nodes B, C, E and D. Training then continues with the first-layer training value of node A to obtain second-layer training values for nodes A-G, until the final feature representation vector of node A is obtained.
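The layer-wise computation in fig. 6 amounts to repeated neighbor aggregation. The sketch below shows one simple mean-aggregation scheme as an assumption of how such a layer could look; the actual pre-training model may use a different aggregation:

    import numpy as np

    def propagate(H, M):
        # One aggregation layer: each node's new training value is the
        # mean of its neighbors' current values (degree-normalized M @ H).
        deg = np.maximum(M.sum(axis=1, keepdims=True), 1.0)
        return (M @ H) / deg

    # Two stacked layers, as in fig. 6:
    # H1 = propagate(H0, M); H2 = propagate(H1, M)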
It should be noted that the dimensionality of the graph primitive prediction vector obtained by inputting the feature representation vector of the first node into the first initial neural network model is consistent with the dimensionality of the graph primitive label vector counted in step 320 above. For example, as shown in fig. 5, if the counted graph primitive label vector of the first node has 15 dimensions, the initial representation vector of the sample subgraph of node A is input into the initial pre-training model to obtain a 64-dimensional feature representation vector, and that 64-dimensional feature representation vector is then input into the first initial neural network model to obtain the 15-dimensional graph primitive prediction vector of the first node.
Step 340, training the initial pre-training model according to the graph primitive prediction vector and the graph primitive label vector until the pre-training model is obtained.
In the embodiment of the invention, a vector can be calculated from the graph primitive prediction vector and the graph primitive label vector, and a value determined from that vector is used to train the initial pre-training model. For example, the ratio of each dimension of the graph primitive prediction vector to the corresponding dimension of the graph primitive label vector is determined to obtain a 15-dimensional ratio vector; the variance of the 15 values of the ratio vector is then determined and used to train the initial pre-training model.
A vector difference value of the first node is determined according to the graph primitive prediction vector and the graph primitive label vector of the first node, and the initial pre-training model is trained through the vector difference value of the first node. Specifically, the vector difference value is determined according to the graph primitive prediction vector and the graph primitive label vector through the following formula (2);
L = (1/V) Σ_u∈G |c_u - f(G'_u)| ………………………………………………(2);
wherein V is a preset value; c_u is the graph primitive label vector of the first node; f(G'_u) is the graph primitive prediction vector of the first node; G is the first graph structure sample; and L is the vector difference value. It should be noted that V may also be the number of nodes in the first graph structure sample. For example, if the number of nodes in the first graph structure sample is 1000, then V=1000; c_u is the graph primitive label vector of the u-th node in the first graph structure sample, a 15-dimensional vector, and f(G'_u) is the graph primitive prediction vector of the u-th node, also a 15-dimensional vector. A 15-dimensional difference vector can be obtained by |c_u - f(G'_u)|, the 15 difference components are then summed, and the vector difference value of the u-th node is obtained according to V.
The initial pre-training model and the first initial neural network model are updated according to the vector difference value until the vector difference value meets the set condition, thereby obtaining the pre-training model.
In the embodiment of the invention, the vector difference value of each node is used as the error term in gradient descent, i.e., as the loss value of the initial pre-training model, and back-propagation is applied to the initial pre-training model and the first initial neural network model until the loss function of the initial pre-training model meets the set condition, yielding the pre-training model. Alternatively, the average of the vector difference values of a plurality of nodes may be used as the error term in gradient descent, i.e., as the loss value of the initial pre-training model. For example, following the example in step 330 and formula (2) above, the difference between the 15-dimensional graph primitive prediction vector of a first node and its 15-dimensional graph primitive label vector is determined, and the vector difference value of that node is obtained by summation, e.g., 0.9; similarly, the vector difference value of a second node is determined to be 1.0. Vector difference values are determined for 64 nodes in total, the average of the 64 vector difference values is obtained through an averaging algorithm, and this average is used as the error term in gradient descent, i.e., as the loss value of the initial pre-training model.
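A minimal sketch of one pre-training step under formula (2) follows, assuming the initial pre-training model and the first initial neural network model are PyTorch modules named encoder and head; all names here are illustrative:

    import torch

    def pretrain_step(encoder, head, u0, c, optimizer):
        # u0: initial representation vector of a node's sample subgraph
        # c:  the node's 15-dimensional graph primitive label vector
        z = encoder(u0)                            # feature representation vector
        pred = head(z)                             # 15-dim primitive prediction vector
        loss = torch.mean(torch.abs(c - pred))     # formula (2): averaged |c_u - f(G'_u)|
        optimizer.zero_grad()
        loss.backward()                            # back-propagate through both models
        optimizer.step()
        return loss.item()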
The invention uses graph primitives as labels for the training samples of the pre-training model. Training the initial pre-training model with graph primitives in the graph structure has the following two advantages.
(1) High-order structural information: different graph primitives represent the different roles each node assumes at the structural level. If the graph primitive label vectors of two nodes are relatively close, the two nodes are considered structurally similar.
(2) Structural level: the invention uses the initial pre-training model to capture structural information, so that the trained pre-training model can distinguish the types and numbers of primitives, and can distinguish the different semantics carried by each node in different primitives.
In the embodiment of the present invention, before the first sample subgraph of the first node is determined, the method further includes preprocessing the graph structures in the first training samples. Specifically, the weights of the edges between nodes in the graph structures of the first training samples are removed through a preset algorithm, such as the orca algorithm, so as to reduce the subsequent amount of computation and simplify the calculation.
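A minimal sketch of this preprocessing, assuming networkx graphs; copying nodes and edges without their attributes is one simple way to drop the edge weights:

    import networkx as nx

    def strip_weights(G):
        # Rebuild the graph with the same nodes and edges but no
        # edge weights, so subsequent motif counting stays cheap.
        H = nx.Graph()
        H.add_nodes_from(G.nodes())
        H.add_edges_from(G.edges())
        return H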
In the embodiment of the invention, after the pre-training model is obtained, a final model can be obtained through the third graph structure sample and the pre-training model.
Further, an initial model is constructed comprising the pre-training model and a second initial neural network model. For any third graph structure sample in the second training samples, the third graph structure sample is input into the pre-training model to obtain a second feature representation vector, and the second feature representation vector is input into the second initial neural network model to obtain a predicted value of the third graph structure sample, where the nodes of each graph structure sample in the second training samples have the same attribute. The initial model is trained according to the predicted value and the label value of the third graph structure sample until the training end condition is met.
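One possible composition of the initial model is sketched below, assuming PyTorch; FineTuneModel, feat_dim and num_classes are illustrative names, and the second initial neural network model is shown as a single linear head:

    import torch.nn as nn

    class FineTuneModel(nn.Module):
        # Pre-trained encoder followed by a fresh task head
        # (the "second initial neural network model").
        def __init__(self, pretrained_encoder, feat_dim, num_classes):
            super().__init__()
            self.encoder = pretrained_encoder
            self.head = nn.Linear(feat_dim, num_classes)

        def forward(self, x):
            return self.head(self.encoder(x))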
In the embodiment of the invention, model training is performed through the pre-training model and second training samples sharing the same attribute to obtain the graph neural network model, which improves model training efficiency. For example, the graph structure samples in the second training samples are all graph structure samples with communication attributes, e.g., nodes are users and the attribute between nodes is a historical communication record. A graph neural network model obtained through the pre-training model and the graph structure samples with communication attributes is used for finding abnormal users in a graph structure with communication attributes, where the abnormal users may be marketing users, fraudulent users and the like.
It should be noted that, in the training process, the parameters of the pre-training model and/or the second initial neural network model may be selectively updated according to the actual application to obtain the graph neural network model.
Further, the second initial neural network model is updated according to the predicted value and the label value of the third graph structure sample, or the pre-training model and the second initial neural network model are updated according to the predicted value and the label value of the third graph structure sample.
In the embodiment of the invention, training the initial model comprises two modes (see the sketch after this paragraph). The first mode: training the initial model by updating only the second initial neural network model, which improves training efficiency and reduces training time.
The second mode: training the initial model by updating both the pre-training model and the second initial neural network model, which increases the accuracy of the trained initial model.
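Assuming the FineTuneModel sketch above with an instance named model, the two modes can be expressed by toggling which parameters receive gradients:

    # Mode 1: update only the second initial neural network model (the head).
    for p in model.encoder.parameters():
        p.requires_grad = False

    # Mode 2: update both the pre-training model and the head.
    for p in model.encoder.parameters():
        p.requires_grad = True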
The trained initial model obtained in the embodiment of the invention can be used for various tasks, e.g., classifying the nodes in a graph structure (classifying the nodes in a communication graph structure into abnormal users and normal users) or classifying graph structures (e.g., classifying the graph structures of 120 movie posters into graph structures of action movies or graph structures of romance movies). When classifying graph structures, a pooling operation of the graph neural network, such as a readout operation, can be used to obtain the initial representation vector of the graph structure.
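A minimal readout sketch, assuming per-node feature vectors stacked in a PyTorch tensor; mean pooling is only one possible readout:

    import torch

    def readout(node_vectors):
        # Collapse the (num_nodes x dim) matrix of node vectors into a
        # single graph-level vector for graph classification.
        return torch.mean(node_vectors, dim=0)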
Experimental data are provided by way of example in the embodiments of the invention, as described below.
For node classification in graph structures, the embodiment of the invention exemplarily provides two data sets U1 and U2, with prior-art reference algorithms ProNE, NetMF, VERSE, GraphWave, GIN and GCC for node classification, and reference algorithms Graph2vec, DGCNN, GIN, GCC and InfoGraph for graph structure classification.
In the data set U1, nodes represent registered users and edges represent users' call records; users who do not repay on time are regarded as positive samples and users who repay on time as negative samples. The set includes 1104 nodes (123 positive samples and 981 negative samples) and 1719 edges.
In the data set U2, each node represents a customer, and an edge represents that there is a call record between two customers. The nodes of major customers are positive samples and the nodes of ordinary customers are negative samples; the data set U2 comprises 9953 nodes (1140 major customers, 8813 ordinary customers) and 373923 edges.
For the classification of graph structures, embodiments of the present invention exemplarily provide five data sets R1, R2, R3, R4, and R5.
Data set R1: a total of 1000 graphs with two labels (action movies and romance movies), each graph corresponding to one label.
Data set R2: a total of 1500 graphs, each corresponding to one of three labels (comedies, romance movies and science fiction movies).
Data set R3: a total of 2000 graphs with two labels (question-and-answer communities and discussion communities). The nodes in each graph represent users, and an edge between two nodes indicates that one user replied to the other in the comments.
Data set R4: a total of 5000 graphs with five labels (global news community, video community, animal community, etc.).
Data set R5: a total of 5000 graphs with three labels (high-energy physics, condensed-matter physics and astrophysics). Each graph represents the ego network of a particular researcher, and the label represents the research field the researcher is in.
ProNE algorithm: learns node representation vectors through a scalable and efficient model that uses sparse matrix factorization and propagation in spectral space. In the data demonstration in the present invention, the ProNE algorithm adopts its default parameters, i.e., θ=0.5, μ=0.2, and 10 recursion steps.
NetMF algorithm: a word2vec-based graph embedding learning model that unifies matrix factorization frameworks and covers common models such as DeepWalk, node2vec, LINE and PTE. In the data demonstration in the present invention, the NetMF algorithm adopts its default parameters, e.g., 256 eigenpairs, using approximate normalized graph Laplacian decomposition to obtain 128-dimensional representation vectors.
VERSE algorithm: considers three similarities when learning representation vectors (community structure, roles and structural equivalence) and then obtains node representation vectors through neural network learning. In the data demonstration in the present invention, the VERSE algorithm adopts its default parameters, i.e., α set to 0.85 and a learning rate of 0.0025.
GraphWave algorithm: obtains node representation vectors in an unsupervised manner through the wavelet diffusion mechanism from thermodynamics. In the data demonstration in the present invention, the GraphWave algorithm adopts the exact mechanism, i.e., 50 samples and a heat coefficient of 1000.
GIN algorithm: a graph neural network designed with sum pooling, having the same power to distinguish different graph structures as the Weisfeiler-Leman test.
GCC algorithm: pre-trains the model through a contrastive learning task in unsupervised learning to capture graph structure information.
Graph2vec algorithm: treats a graph as a document and its subgraphs as words, and learns representations of the graph structure.
DGCNN algorithm: a local graph convolution model combined with a novel SortPooling layer.
InfoGraph algorithm: learns representation vectors by maximizing mutual information.
Table 1 below is data for each algorithm for node classification, where the MPT algorithm is a technical solution of the present invention.
TABLE 1
Table 2 below is data of each algorithm for classifying the graph structure, wherein the MPT algorithm is a technical solution in the first mode of the present invention.
TABLE 2
Accuracy R1 R2 R3 R4 R5
Graph2vec 0.6103 0.3467 0.7850 0.3793 0.7180
DGCNN 0.7100 0.5133 / / 0.7040
InfoGraph 0.6945 0.4677 0.8095 0.4900 0.6878
GIN 0.7280 0.4705 0.7841 0.5014 0.7265
GCC 0.6726 0.4785 0.8483 0.4454 0.7562
MPT(1) 0.6717 0.4712 0.8524 0.5110 0.7456
Table 3 below is data of each algorithm for classifying the graph structure, wherein the MPT algorithm is a technical solution in the second mode of the present invention.
TABLE 3
Accuracy R1 R2 R3 R4 R5
GCC 0.7080 0.4850 0.8640 0.4740 0.7900
MPT(2) 0.7177 0.4951 0.8660 0.4962 0.8048
In tables 1, 2 and 3, Accuracy is the accuracy of the algorithm, Precision is the precision of the algorithm, Recall is the recall of the algorithm (recall measures how well the algorithm covers the data), and F1 is the overall evaluation index.
Wherein Precision = TP/(TP + FP), Recall = TP/(TP + FN), and F1 = 2 × Precision × Recall/(Precision + Recall).
TP represents True Positive, i.e., the true label is positive and the model also judges it positive; FP represents False Positive, i.e., the true label is negative but the model judges it positive; FN represents False Negative, i.e., the true label is positive but the model judges it negative; TN represents True Negative, i.e., the true label is negative and the model also judges it negative. F1 is computed by first computing the overall TP, FN and FP.
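For concreteness, the computation reads as follows (a plain sketch, assuming the overall TP, FP and FN counts are nonzero where they appear in a denominator):

    def f1_score(tp, fp, fn):
        # F1 from overall true positives, false positives and false negatives.
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        return 2 * precision * recall / (precision + recall)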
Table 1 shows the data results: the technical solution of the invention achieves better results on every index on both data sets. For the F1 index, the invention exceeds the average of the baseline algorithms by 10.23% and 14.31% on the U1 and U2 data sets, respectively. The downstream task on these two data sets is specifically to find abnormal, fraudulent users. Since a fraudulent user in the graph structure makes calls to as many known persons as possible, and the people contacted by the fraudulent user tend not to know each other, more graph primitives with non-closed structures appear around fraudulent users. Conversely, for a normal user, two people he calls are likely to be in the same circle and also have call records with each other; e.g., an employee calls two colleagues of the company, and those colleagues also call each other, so relatively more graph primitives with closed structures appear around normal users.
For the graph classification task, as shown in table 2, the technical solution in the first mode of the invention exceeds the average accuracy of the other baseline algorithms on the R3 and R4 data sets by 5.72% and 12.55%, respectively.
As shown in table 3, the accuracy of the technical solution in the second mode of the invention exceeds the GCC algorithm on every data set.
In addition, training the initial model using the pre-training model can reduce the time required for training.
Based on the same technical concept, fig. 7 is a schematic structural diagram schematically illustrating a device for generating a pre-training model applied to the graph learning field, which is provided by the embodiment of the present invention, and the device may execute a flow of a method for generating a pre-training model applied to the graph learning field.
As shown in fig. 7, the apparatus specifically includes:
a selection module 710, configured to determine a first node from a first graph structure sample and determine a first sample subgraph of the first node; the first graph structure sample is any one of the first training samples; the attribute of each node of the first graph structure sample is different from the attribute of each node of a second graph structure sample in the first training samples;
a processing module 720, configured to determine a graph primitive label vector of the first node and an initial representation vector of the first sample subgraph according to the first graph structure sample;
input the initial representation vector of the first sample subgraph into an initial pre-training model to obtain a first feature representation vector; input the first feature representation vector into a first initial neural network model to obtain a graph primitive prediction vector of the first node;
and train the initial pre-training model according to the graph primitive prediction vector and the graph primitive label vector until the pre-training model is obtained.
Optionally, the selecting module 710 is specifically configured to:
randomly select a node in the first graph structure sample as the first node;
and obtain a first sample subgraph of the first node through a random-walk-with-restart algorithm according to the connection relations among the nodes in the first graph structure sample.
Optionally, the processing module 720 is specifically configured to:
count the number of graph primitives of each preset type around the first node according to the connection relations among the nodes in the first graph structure sample, and obtain the graph primitive label vector of the first node according to the number of graph primitives of each preset type.
Optionally, obtaining an initial representation vector of the first sample sub-graph according to the following formula (1);
I − K^{−1/2}MK^{−1/2} = UΛU^T ……(1);
wherein I is an identity matrix; K is the degree matrix of the first node in the first graph structure sample; M is the adjacency matrix of the first node in the first graph structure sample; U is the eigenvector matrix, used as the initial representation vector of the first sample subgraph; and Λ is the corresponding diagonal eigenvalue matrix.
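A minimal numpy sketch of formula (1), assuming M is the sampled subgraph's adjacency matrix and K its diagonal degree matrix; the eigenvector matrix U is returned as the initial representation:

```python
import numpy as np

def initial_representation(M: np.ndarray) -> np.ndarray:
    """Solve I - K^{-1/2} M K^{-1/2} = U Λ U^T and return U (formula (1))."""
    deg = M.sum(axis=1)
    k_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))  # K^{-1/2}
    laplacian = np.eye(M.shape[0]) - k_inv_sqrt @ M @ k_inv_sqrt
    eigvals, U = np.linalg.eigh(laplacian)  # symmetric eigendecomposition
    return U
```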
Optionally, the processing module 720 is specifically configured to:
determining a vector difference value according to the graph primitive prediction vector and the graph primitive label vector through the following formula (2);
wherein V is a preset value; c_u is the graph primitive label vector of the first node; f(G′_u) is the graph primitive prediction vector of the first node; G is the first graph structure sample; and L is the vector difference value;
updating the initial pre-training model and the first initial neural network model according to the vector difference value; and obtaining the pre-training model until the vector difference value meets a set condition.
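A minimal PyTorch sketch of this update step; since formula (2) is not reproduced above, a mean-squared-error surrogate stands in for the vector difference value, and `encoder` and `head` are hypothetical stand-ins for the initial pre-training model and the first initial neural network model:

```python
import torch

def pretrain_step(encoder, head, optimizer, init_repr, label_vec):
    """One update: subgraph representation -> feature vector -> primitive prediction."""
    feature = encoder(init_repr)  # first characteristic representation vector
    pred = head(feature)          # graph primitive prediction vector
    loss = torch.nn.functional.mse_loss(pred, label_vec)  # stand-in for formula (2)
    optimizer.zero_grad()
    loss.backward()               # update both models jointly
    optimizer.step()
    return loss.item()
```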
Optionally, the processing module 720 is further configured to:
After the pre-training model is obtained, an initial model is built, wherein the initial model comprises the pre-training model and a second initial neural network model;
inputting, for any third graph structure sample in the second training samples, the third graph structure sample into the pre-training model to obtain a second characteristic representation vector; inputting the second characteristic representation vector to the second initial neural network model to obtain a predicted value of the third graph structure sample; nodes of each graph structure sample in the second training sample have the same attribute;
and training the initial model according to the predicted value and the label value of the third graph structure sample until the training ending condition is met.
Optionally, the processing module 720 is specifically configured to:
Updating the second initial neural network model according to the predicted value and the label value of the third graph structure sample; or updating the pre-training model and the second initial neural network model according to the predicted value and the label value of the third graph structure sample.
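A minimal PyTorch sketch of the two update strategies, with hypothetical module names; freezing the encoder's parameters corresponds to updating only the second initial neural network model:

```python
import torch

def build_finetune_optimizer(pretrained_encoder, task_head,
                             freeze_encoder: bool, lr: float = 1e-3):
    """Strategy 1: update only the task head. Strategy 2: update both modules."""
    if freeze_encoder:
        for p in pretrained_encoder.parameters():
            p.requires_grad = False  # pre-training model stays fixed
        params = list(task_head.parameters())
    else:
        params = list(pretrained_encoder.parameters()) + list(task_head.parameters())
    return torch.optim.Adam(params, lr=lr)
```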
Based on the same technical concept, the embodiment of the invention further provides a computing device, including:
A memory for storing program instructions;
And the processor is used for calling the program instructions stored in the memory and executing the generation method of the pre-training model applied to the graph learning field according to the obtained program.
Based on the same technical concept, the embodiment of the invention further provides a computer readable storage medium, wherein the computer readable storage medium stores computer executable instructions for causing a computer to execute the method for generating the pre-training model applied to the graph learning field.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (10)

1. An abnormal user detection method, comprising:
Determining a first node from a first graph structure sample and determining a first sample subgraph of the first node; the first graph structure sample is any one of first training samples; each node in the first graph structure sample represents a user, and the nodes in the first graph structure sample have the same user attribute; the user attribute of each node of the first graph structure sample is different from the user attribute of each node of the second graph structure sample in the first training sample;
determining a graph primitive tag vector of the first node and an initial representation vector of the first sample subgraph according to the first graph structure sample;
inputting the initial representation vector of the first sample subgraph into an initial pre-training model to obtain a first characteristic representation vector; inputting the first characteristic representation vector to a first initial neural network model to obtain a graph primitive prediction vector of the first node;
training the initial pre-training model according to the graph primitive prediction vector and the graph primitive label vector until the pre-training model is obtained;
Performing model training through the pre-training model and a second training sample with the same user attribute to obtain a graph neural network model, wherein the user attributes of graph structure samples in the second training sample are all communication attributes;
inputting a graph structure to be detected, which has the communication attribute, into the graph neural network model, and predicting the abnormal users in the graph structure to be detected.
2. The method of claim 1, wherein determining a first node from a first graph structure sample and determining a first sample subgraph of the first node comprises:
Randomly selecting a node in the first graph structure sample as the first node;
and obtaining the first sample subgraph of the first node through a random walk with restart strategy, according to the connection relation among the nodes in the first graph structure sample.
3. The method of claim 1, wherein determining a graph primitive tag vector for the first node from the first graph structure sample comprises:
counting the number of graph primitives of each preset type around the first node according to the connection relation among the nodes in the first graph structure sample, and obtaining the graph primitive label vector of the first node according to these counts.
4. The method of claim 1, wherein determining an initial representation vector for the first sample subgraph from the first graph structure sample comprises:
Obtaining an initial representation vector of the first sample sub-graph according to the following formula (1);
I − K^{−1/2}MK^{−1/2} = UΛU^T ……(1);
wherein I is an identity matrix; K is the degree matrix of the first node in the first graph structure sample; M is the adjacency matrix of the first node in the first graph structure sample; U is the eigenvector matrix, used as the initial representation vector of the first sample subgraph; and Λ is the corresponding diagonal eigenvalue matrix.
5. The method of claim 1, wherein training the initial pre-training model according to the graph primitive prediction vector and the graph primitive label vector until the pre-training model is obtained comprises:
determining a vector difference value according to the graph primitive prediction vector and the graph primitive label vector through the following formula (2);
wherein V is a preset value; c_u is the graph primitive label vector of the first node; f(G′_u) is the graph primitive prediction vector of the first node; G is the first graph structure sample; and L is the vector difference value;
updating the initial pre-training model and the first initial neural network model according to the vector difference value; and obtaining the pre-training model until the vector difference value meets a set condition.
6. The method of any one of claims 1 to 5, further comprising, after obtaining the pre-training model:
Constructing an initial model, wherein the initial model comprises the pre-training model and a second initial neural network model;
inputting, for any third graph structure sample in the second training samples, the third graph structure sample into the pre-training model to obtain a second characteristic representation vector; inputting the second characteristic representation vector to the second initial neural network model to obtain a predicted value of the third graph structure sample; nodes of each graph structure sample in the second training sample have the same attribute;
and training the initial model according to the predicted value and the label value of the third graph structure sample until the training ending condition is met.
7. The method of claim 6, wherein training the initial model based on the predicted value and the label value of the third graph structure sample comprises:
updating the second initial neural network model according to the predicted value and the label value of the third graph structure sample; or
updating the pre-training model and the second initial neural network model according to the predicted value and the label value of the third graph structure sample.
8. An abnormal user detection apparatus, comprising:
A selection module, configured to determine a first node from a first graph structure sample and determine a first sample subgraph of the first node; the first graph structure sample is any one of first training samples; each node in the first graph structure sample represents a user, and the nodes in the first graph structure sample have the same user attribute; the user attribute of each node of the first graph structure sample is different from the user attribute of each node of the second graph structure sample in the first training sample;
The processing module is used for determining a graph base element label vector of the first node and an initial representation vector of the first sample subgraph according to the first graph structure sample;
inputting the initial representation vector of the first sample subgraph into an initial pre-training model to obtain a first characteristic representation vector; inputting the first characteristic representation vector to a first initial neural network model to obtain a graph primitive prediction vector of the first node;
training the initial pre-training model according to the graph primitive prediction vector and the graph primitive label vector until the pre-training model is obtained;
Performing model training through the pre-training model and a second training sample with the same user attribute to obtain a graph neural network model, wherein the user attributes of graph structure samples in the second training sample are all communication attributes;
inputting a graph structure to be detected, which has the communication attribute, into the graph neural network model, and predicting the abnormal users in the graph structure to be detected.
9. A computing device, comprising:
A memory for storing program instructions;
A processor for invoking program instructions stored in said memory to perform the method of any of claims 1-7 in accordance with the obtained program.
10. A computer-readable storage medium storing computer-executable instructions for causing a computer to perform the method of any one of claims 1 to 7.
CN202110072779.8A 2021-01-20 2021-01-20 Method and device for generating pre-training model applied to graph learning field Active CN112819154B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110072779.8A CN112819154B (en) 2021-01-20 2021-01-20 Method and device for generating pre-training model applied to graph learning field

Publications (2)

Publication Number Publication Date
CN112819154A CN112819154A (en) 2021-05-18
CN112819154B true CN112819154B (en) 2024-05-28

Family

ID=75858498

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110072779.8A Active CN112819154B (en) 2021-01-20 2021-01-20 Method and device for generating pre-training model applied to graph learning field

Country Status (1)

Country Link
CN (1) CN112819154B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114510609B (en) * 2022-02-17 2024-07-23 腾讯科技(深圳)有限公司 Method, apparatus, device, medium and program product for generating structural data
CN115909016B (en) * 2023-03-10 2023-06-23 同心智医科技(北京)有限公司 GCN-based fMRI image analysis system, method, electronic equipment and medium

Citations (5)

Publication number Priority date Publication date Assignee Title
CN110232630A (en) * 2019-05-29 2019-09-13 腾讯科技(深圳)有限公司 The recognition methods of malice account, device and storage medium
CN110704062A (en) * 2019-09-27 2020-01-17 天津五八到家科技有限公司 Dependency management method, data acquisition method, device and equipment
CN111523315A (en) * 2019-01-16 2020-08-11 阿里巴巴集团控股有限公司 Data processing method, text recognition device and computer equipment
CN111582538A (en) * 2020-03-25 2020-08-25 清华大学 Community value prediction method and system based on graph neural network
CN112734034A (en) * 2020-12-31 2021-04-30 平安科技(深圳)有限公司 Model training method, calling method, device, computer equipment and storage medium

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
AU2019212839B2 (en) * 2018-01-29 2023-11-09 EmergeX, LLC System and method for facilitating affective-state-based artificial intelligence
US11381651B2 (en) * 2019-05-29 2022-07-05 Adobe Inc. Interpretable user modeling from unstructured user data

Non-Patent Citations (2)

Title
GCC: Graph Contrastive Coding for Graph Neural Network Pre-Training; Jiezhong Qiu, Qibin Chen, et al.; 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD); 2020-07-02; full text *
Design and Implementation of a Graph-Based Framework for Anomaly Detection in Online Transaction Data; Zhou Chao; China Masters' Theses Full-text Database (Information Science and Technology); 2020-08-15; full text *

Also Published As

Publication number Publication date
CN112819154A (en) 2021-05-18

Similar Documents

Publication Publication Date Title
Yi et al. Deep matrix factorization with implicit feedback embedding for recommendation system
Bamakan et al. Opinion leader detection: A methodological review
Yin et al. QoS prediction for service recommendation with features learning in mobile edge computing environment
Papa et al. Efficient supervised optimum-path forest classification for large datasets
May Petry et al. MARC: a robust method for multiple-aspect trajectory classification via space, time, and semantic embeddings
Liu et al. Fuzzy mutual information-based multilabel feature selection with label dependency and streaming labels
CN112819154B (en) Method and device for generating pre-training model applied to graph learning field
CN112015909B (en) Knowledge graph construction method and device, electronic equipment and storage medium
CN113656699B (en) User feature vector determining method, related equipment and medium
Dai et al. Hybrid deep model for human behavior understanding on industrial internet of video things
CN112560823B (en) Adaptive variance and weight face age estimation method based on distribution learning
CN113050931A (en) Symbolic network link prediction method based on graph attention machine mechanism
CN114201516B (en) User portrait construction method, information recommendation method and related devices
Gul et al. A systematic analysis of link prediction in complex network
CN117312680A (en) Resource recommendation method based on user-entity sub-graph comparison learning
Asim et al. An adaptive model for identification of influential bloggers based on case-based reasoning using random forest
Hain et al. The promises of Machine Learning and Big Data in entrepreneurship research
Muro et al. Link prediction and unlink prediction on dynamic networks
Ruvolo et al. Exploiting commonality and interaction effects in crowdsourcing tasks using latent factor models
Chen et al. Sparse general non-negative matrix factorization based on left semi-tensor product
Gashler et al. Missing value imputation with unsupervised backpropagation
Meng et al. POI recommendation for occasional groups Based on hybrid graph neural networks
CN112446741B (en) User portrayal method and system based on probability knowledge graph
Hanafi et al. Word Sequential Using Deep LSTM and Matrix Factorization to Handle Rating Sparse Data for E‐Commerce Recommender System
CN113010772A (en) Data processing method, related equipment and computer readable storage medium

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant