CN112966763A - Training method and device for classification model, electronic equipment and storage medium - Google Patents

Training method and device for classification model, electronic equipment and storage medium Download PDF

Info

Publication number
CN112966763A
CN112966763A
Authority
CN
China
Prior art keywords
graph
trained
heterogeneous
feature
learning network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110285723.0A
Other languages
Chinese (zh)
Other versions
CN112966763B (en)
Inventor
王啸
石川
赵健安
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications filed Critical Beijing University of Posts and Telecommunications
Priority to CN202110285723.0A priority Critical patent/CN112966763B/en
Publication of CN112966763A publication Critical patent/CN112966763A/en
Application granted granted Critical
Publication of CN112966763B publication Critical patent/CN112966763B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/22 - Matching criteria, e.g. proximity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/25 - Fusion techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Abstract

The training method and apparatus for a classification model, the electronic device and the storage medium are applied to the field of information technology. A feature graph is generated from the sample heterogeneous graph, a semantic graph is extracted, and a relational subgraph is generated; the classification result of the target to be classified is obtained according to the relational subgraph, the current loss is calculated, and the parameters of the heterogeneous graph structure learning network to be trained and the graph neural network to be trained are adjusted simultaneously according to the current loss, so that model training efficiency can be improved.

Description

Training method and device for classification model, electronic equipment and storage medium
Technical Field
The present application relates to the field of information technology, and in particular, to a method and an apparatus for training a classification model, an electronic device, and a storage medium.
Background
A heterogeneous graph is a graph containing multiple different types of nodes and relations, and in practice the relations among objects to be classified can be reflected through a heterogeneous graph. For example, when classifying movies and actors, the same actor may appear in multiple movies, and the same movie may include multiple actors.
However, when a classification model for the heterogeneous graph of the targets to be classified is trained, the heterogeneous graph is identified directly and the targets to be classified are classified on that basis. Because the relationships between the nodes in a heterogeneous graph are often complex, the observed heterogeneous graph structure often cannot reflect these complex relationships well; as a result, the features between the targets to be classified obtained through such identification are often limited, and the training efficiency of the heterogeneous graph classification model is low.
Disclosure of Invention
An object of the embodiments of the present application is to provide a method and an apparatus for training a classification model, an electronic device, and a storage medium, so as to solve the problem of low efficiency in training a heterogeneous graph classification model. The specific technical scheme is as follows:
in a first aspect of the embodiments of the present application, a method for training a classification model is provided, where the method is applied to a classification model to be trained, the classification model to be trained includes a heterogeneous graph structure learning network to be trained and a graph neural network to be trained, and the method includes:
acquiring the features of each node and each meta-path in a sample heterogeneous graph, wherein the sample heterogeneous graph is a graph used for representing the relationships between the targets to be classified;
generating a feature similarity graph and a feature propagation graph of the sample heterogeneous graph according to the features of each node and the features of each meta-path through a heterogeneous graph structure learning network to be trained, and aggregating them to obtain a feature graph;
generating a semantic graph of the sample heterogeneous graph according to the features of each meta-path through the heterogeneous graph structure learning network to be trained;
fusing the feature graph, the semantic graph and the sample heterogeneous graph to obtain a relation subgraph;
inputting the relation subgraph into a graph neural network to be trained to obtain a classification result of the target to be classified, and calculating the current loss through a preset loss function;
adjusting parameters of a heterogeneous graph structure learning network to be trained and a graph neural network to be trained according to the current loss;
and returning to the step of generating, through the heterogeneous graph structure learning network to be trained, a feature similarity graph and a feature propagation graph of the sample heterogeneous graph according to the features of each node and the features of each meta-path and aggregating them to obtain a feature graph, and continuing execution until a preset condition is met, so as to obtain the trained heterogeneous graph structure learning network and graph neural network.
Optionally, generating a feature similarity graph and a feature propagation graph of the sample heterogeneous graph according to the features of each node and the features of each meta-path through a heterogeneous graph structure learning network to be trained, and aggregating them to obtain a feature graph includes:
performing heterogeneous feature extraction and metric learning on the features of each node and the features of each meta-path through the heterogeneous graph structure learning network to be trained to generate a feature similarity graph;
generating a feature propagation graph by propagating the feature similarity matrix through the topological structure according to the features of each node and the features of each meta-path;
and aggregating the feature similarity graph and the feature propagation graph to obtain a feature graph.
Optionally, generating a semantic graph of the sample heterogeneous graph according to the feature of each meta path through a heterogeneous graph structure learning network to be trained, including:
generating a semantic subgraph adjacency matrix corresponding to each meta-path according to the features of each meta-path through the heterogeneous graph structure learning network to be trained;
and aggregating the adjacency matrices of the semantic subgraphs to obtain the semantic graph.
Optionally, inputting the relationship subgraph into a graph neural network to be trained to obtain a classification result of the target to be classified, and calculating the current loss through a preset loss function, including:
inputting the relation subgraph into a graph neural network to be trained to obtain a classification result of the target to be classified;
calculating the loss of the heterogeneous graph structure learning network to be trained according to the classification result and through a first preset loss function;
calculating the loss of the neural network of the graph to be trained according to the classification result and through a second preset loss function;
and summing the loss of the heterogeneous graph structure learning network to be trained and the loss of the graph neural network to be trained to obtain the current loss.
In a second aspect of the embodiments of the present application, a training apparatus for a classification model is further provided, which is applied to a classification model to be trained, where the classification model to be trained includes a heterogeneous graph structure learning network to be trained and a graph neural network to be trained, and the apparatus includes:
the feature obtaining module is used for obtaining the features of each node and each meta-path in the sample heterogeneous graph, wherein the sample heterogeneous graph is a graph used for representing the relationships between the targets to be classified;
the characteristic aggregation module is used for generating a characteristic similarity graph and a characteristic propagation graph of the sample heterogeneous graph according to the characteristics of each node and the characteristics of each meta path through a heterogeneous graph structure learning network to be trained, and aggregating to obtain a characteristic graph;
the semantic graph generating module is used for generating a semantic graph of the sample heterogeneous graph according to the features of each meta-path through the heterogeneous graph structure learning network to be trained;
the relational sub-graph generation module is used for fusing the feature graph, the semantic graph and the sample heterogeneous graph to obtain a relational sub-graph;
the loss calculation module is used for inputting the relation subgraph into a graph neural network to be trained to obtain a classification result of a target to be classified and calculating the current loss through a preset loss function;
the parameter adjusting module is used for adjusting parameters of the heterogeneous graph structure learning network to be trained and the graph neural network to be trained according to the current loss;
and the neural network acquisition module is used for returning to the step of generating, through the heterogeneous graph structure learning network to be trained, a feature similarity graph and a feature propagation graph of the sample heterogeneous graph according to the features of each node and the features of each meta-path and aggregating them to obtain a feature graph, and continuing execution until a preset condition is met, so as to obtain the trained heterogeneous graph structure learning network and graph neural network.
Optionally, the feature aggregation module includes:
the feature similarity graph generation submodule is used for performing heterogeneous feature extraction and metric learning on the features of each node and the features of each meta-path through the heterogeneous graph structure learning network to be trained to generate a feature similarity graph;
the feature propagation graph generation submodule is used for generating a feature propagation graph by propagating the feature similarity matrix through the topological structure according to the features of each node and the features of each meta-path;
and the characteristic diagram aggregation submodule is used for aggregating the characteristic similarity diagram and the characteristic propagation diagram to obtain the characteristic diagram.
Optionally, the semantic graph generating module includes:
the adjacency matrix generation submodule is used for generating a semantic subgraph adjacency matrix corresponding to each meta-path according to the features of each meta-path through the heterogeneous graph structure learning network to be trained;
and the adjacency matrix aggregation submodule is used for aggregating the adjacency matrices of the semantic subgraphs to obtain the semantic graph.
Optionally, the loss calculating module includes:
the classification result acquisition submodule is used for inputting the relation subgraph into a graph neural network to be trained to obtain a classification result of the target to be classified;
the first loss calculation submodule is used for calculating the loss of the heterogeneous graph structure learning network to be trained according to the classification result and through a first preset loss function;
the second loss calculation submodule is used for calculating the loss of the graph neural network to be trained according to the classification result and through a second preset loss function;
and the current loss calculation submodule is used for summing the loss of the heterogeneous graph structure learning network to be trained and the loss of the graph neural network to be trained to obtain the current loss.
The embodiment of the application also provides electronic equipment which comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory complete mutual communication through the communication bus;
a memory for storing a computer program;
and the processor is used for realizing the training method of any classification model when executing the program stored in the memory.
An embodiment of the present application further provides a computer-readable storage medium, in which a computer program is stored, and when the computer program is executed by a processor, the method for training any of the above classification models is implemented.
Embodiments of the present application further provide a computer program product containing instructions, which when run on a computer, cause the computer to perform any of the above-mentioned methods for training a classification model.
The embodiment of the application has the following beneficial effects:
the training method, the training device, the electronic device and the storage medium for the classification model provided by the embodiment of the application can acquire the characteristics of each node and each meta-path in a sample heterogeneous graph, wherein the sample heterogeneous graph is a graph for representing the relation between the targets to be classified; generating a feature similarity graph and a feature propagation graph of the sample heterogeneous graph according to the features of each node and the features of each element path through a heterogeneous graph structure learning network to be trained, and aggregating to obtain a feature graph; generating a semantic graph of the sample heterogeneous graph according to the characteristics of each element path through a heterogeneous graph structure learning network to be trained; fusing the feature graph, the semantic graph and the sample heterogeneous graph to obtain a relation subgraph; inputting the relation subgraph into a graph neural network to be trained to obtain a classification result of the target to be classified, and calculating the current loss through a preset loss function; adjusting parameters of a heterogeneous graph structure learning network to be trained and a graph neural network to be trained according to the current loss; and returning to pass through the heterogeneous graph structure learning network to be trained, generating a feature similarity graph and a feature propagation graph of the sample heterogeneous graph according to the features of the nodes and the features of the element paths, and continuously executing the step of obtaining the feature graph by aggregation until preset conditions are met to obtain the trained heterogeneous graph structure learning network and graph neural network. The method comprises the steps of generating a sample heterogeneous graph according to a sample heterogeneous graph, extracting a semantic graph, generating a relation subgraph, obtaining a classification result of a target to be classified according to the relation subgraph, calculating current loss, and simultaneously adjusting parameters of a heterogeneous graph structure learning network to be trained and a graph neural network to be trained according to the current loss, so that the model training efficiency can be improved.
Of course, not all advantages described above need to be achieved at the same time in the practice of any one product or method of the present application.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art that other embodiments can be obtained by using the drawings without creative efforts.
Fig. 1 is a schematic flowchart of a method for training a classification model according to an embodiment of the present disclosure;
FIG. 2a is a schematic diagram of a neural network according to an embodiment of the present application;
fig. 2b is a schematic structural diagram of a feature diagram generator according to an embodiment of the present application;
fig. 2c is a schematic structural diagram of a semantic graph generator according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of a feature map obtained by aggregation according to an embodiment of the present disclosure;
FIG. 4 is a schematic flow chart illustrating semantic graph generation according to an embodiment of the present disclosure;
FIG. 5 is a schematic flow chart of calculating the current loss according to the embodiment of the present application;
FIG. 6 is a schematic structural diagram of a training apparatus for classification models according to an embodiment of the present disclosure;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments that can be derived by one of ordinary skill in the art from the description herein are intended to be within the scope of the present disclosure.
In a first aspect of the embodiments of the present application, a method for training a classification model is provided, where the method is applied to a classification model to be trained, the classification model to be trained includes a heterogeneous graph structure learning network to be trained and a graph neural network to be trained, and the method includes:
acquiring the features of each node and each meta-path in a sample heterogeneous graph, wherein the sample heterogeneous graph is a graph used for representing the relationships between the targets to be classified;
generating a feature similarity graph and a feature propagation graph of the sample heterogeneous graph according to the features of each node and the features of each meta-path through a heterogeneous graph structure learning network to be trained, and aggregating them to obtain a feature graph;
generating a semantic graph of the sample heterogeneous graph according to the features of each meta-path through the heterogeneous graph structure learning network to be trained;
fusing the feature graph, the semantic graph and the sample heterogeneous graph to obtain a relation subgraph;
inputting the relation subgraph into a graph neural network to be trained to obtain a classification result of the target to be classified, and calculating the current loss through a preset loss function;
adjusting parameters of a heterogeneous graph structure learning network to be trained and a graph neural network to be trained according to the current loss;
and returning to the step of generating, through the heterogeneous graph structure learning network to be trained, a feature similarity graph and a feature propagation graph of the sample heterogeneous graph according to the features of each node and the features of each meta-path and aggregating them to obtain a feature graph, and continuing execution until a preset condition is met, so as to obtain the trained heterogeneous graph structure learning network and graph neural network.
Therefore, by the training method of the classification model, a feature graph can be generated from the sample heterogeneous graph, a semantic graph can be extracted, and a relational subgraph can be generated; the classification result of the target to be classified is obtained according to the relational subgraph, the current loss is calculated, and the parameters of the heterogeneous graph structure learning network to be trained and the graph neural network to be trained are adjusted simultaneously according to the current loss, so that the efficiency of model training can be improved.
In the present application, in order to solve the problem of low model training efficiency in the prior art, the graph structure and the GNN (graph neural network) parameters are jointly optimized by the HGSL (heterogeneous graph structure learning) framework. The HGSL first constructs a semantic embedding matrix Z based on the node embeddings of M meta-paths, and then jointly trains the heterogeneous graph structure and the GNN parameters. For the graph learning part, the HGSL takes the original relational subgraphs, the node features and the semantic embeddings as input and generates a relational subgraph for each relation. Taking the relation r1 as an example, the HGSL learns a feature graph S_{r1}^{Feat} and a semantic graph S_{r1}^{Sem}, and fuses them with the original graph A_{r1} to obtain the learned relational subgraph A'_{r1}. The learned subgraphs are then input into the GNN, together with a regularization term, for regularized node classification. By minimizing the classification loss, the HGSL jointly optimizes the graph structure and the GNN parameters.
Specifically, referring to fig. 1, fig. 1 is a schematic flow chart of a training method of a classification model provided in the embodiment of the present application, including:
and step S11, acquiring the characteristics of each node and each meta path in the sample heterogeneous graph.
The sample heterogeneous graph is used for representing the relation between the targets to be classified. For example, each node in the heterogeneous graph may represent different actors, movies, and directors, where the same actor may play multiple movies, the same director may correspond to multiple movies, and the same movie may correspond to multiple actors.
The training method of the classification model provided by the embodiment of the application is applied to the classification model to be trained, and the classification model to be trained comprises a heterogeneous graph structure learning network to be trained and a graph neural network to be trained. The training method of the classification model provided by the embodiment of the application is applied to the intelligent terminal, and specifically, the intelligent terminal can be a computer or a server and the like used for training the model.
And step S12, generating a feature similarity graph and a feature propagation graph of the sample heterogeneous graph according to the features of each node and the features of each meta path through the heterogeneous graph structure learning network to be trained, and aggregating to obtain the feature graph.
The feature graph is generated by a feature graph generator. First, a feature similarity graph is generated, which captures the potential relations implied by the node features through heterogeneous feature projection and metric learning. Then, a feature propagation graph is generated by propagating the feature similarity matrix through the topology. Finally, the generated feature similarity graph and feature propagation graph are aggregated into the final feature graph through a channel attention layer.
For example, a heterogeneous graph G = (V, E, F) consists of a node set V, an edge set E and a feature set F, together with a node type mapping function φ: V → T_v and an edge type mapping function ψ: E → T_e, where T_v and T_e denote the sets of node types and edge types, respectively. The feature set F consists of one feature matrix F_τ ∈ R^{|V_τ| × d_τ} for each node type τ ∈ T_v, where V_τ denotes the set of nodes of type τ and d_τ denotes the feature dimension of the τ-type nodes.
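By way of illustration only, the heterogeneous graph G = (V, E, F) defined above could be held in memory roughly as sketched below; the container class, field names and toy data are assumptions for illustration and are not part of the application.

```python
import numpy as np
from dataclasses import dataclass, field

@dataclass
class HeteroGraph:
    """Minimal container for a heterogeneous graph G = (V, E, F).

    node_features maps each node type tau to its feature matrix F_tau of shape
    (|V_tau|, d_tau); adjacency maps each relation r to the adjacency matrix A_r
    of its relational subgraph.
    """
    node_features: dict[str, np.ndarray] = field(default_factory=dict)  # tau -> F_tau
    adjacency: dict[str, np.ndarray] = field(default_factory=dict)      # r   -> A_r

    def node_types(self) -> list[str]:
        return list(self.node_features.keys())

    def relations(self) -> list[str]:
        return list(self.adjacency.keys())

# Toy movie/actor example matching the description above.
g = HeteroGraph()
g.node_features["movie"] = np.random.rand(3, 8)   # 3 movies, 8-dim features
g.node_features["actor"] = np.random.rand(5, 4)   # 5 actors, 4-dim features
g.adjacency["MA"] = np.array([[1, 1, 0, 0, 0],    # movie-actor relation A_r
                              [0, 1, 1, 0, 0],
                              [0, 0, 0, 1, 1]], dtype=np.float32)
```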
And step S13, generating a semantic graph of the sample heterogeneous graph according to the characteristics of each meta path through the heterogeneous graph structure learning network to be trained.
The semantic graph of the sample heterogeneous graph can be generated by the semantic graph generator. The semantic graph is generated according to the high-order topological structure in the HIN (heterogeneous information network) and describes the multi-hop structural interactions between two nodes.
And step S14, fusing the feature graph, the semantic graph and the sample heterogeneous graph to obtain a relational sub-graph.
A node relation triplet (v_i, r, v_j) describes two nodes v_i (the head node) and v_j (the tail node) connected by a relation r ∈ T_e. The type mapping functions φ_h and φ_t map a relation to its head node type and its tail node type, respectively. For example, in a user-item heterogeneous graph, if r = "UI" (a user purchased an item), then φ_h(r) = "User" and φ_t(r) = "Item".
A relational subgraph is defined as follows: given a heterogeneous graph G = (V, E, F), the relational subgraph G_r is a subgraph of G that contains all node relation triplets with relation r. The adjacency matrix of G_r is A_r ∈ R^{|V_{φ_h(r)}| × |V_{φ_t(r)}|}, where A_r[i, j] = 1 if (v_i, r, v_j) exists in G, and A_r[i, j] = 0 otherwise. A denotes the set of all relational subgraph adjacency matrices in G, i.e. A = {A_r : r ∈ T_e}.
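As a purely illustrative sketch (not part of the claimed subject matter), the adjacency matrix A_r of one relational subgraph can be assembled from node relation triplets as follows; the function name and the toy data are assumptions.

```python
import numpy as np

def build_relation_adjacency(triplets, num_head, num_tail):
    """Build A_r for one relation r from node relation triplets (v_i, r, v_j).

    A_r[i, j] = 1 if (v_i, r, v_j) is present in G, else 0, with shape
    (|V_{phi_h(r)}|, |V_{phi_t(r)}|).
    """
    A_r = np.zeros((num_head, num_tail), dtype=np.float32)
    for i, j in triplets:  # triplets already filtered to the relation r
        A_r[i, j] = 1.0
    return A_r

# e.g. "UI" (user purchased item): user 0 bought items 1 and 2, user 1 bought item 0
A_ui = build_relation_adjacency([(0, 1), (0, 2), (1, 0)], num_head=2, num_tail=3)
```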
And step S15, inputting the relation subgraph into a graph neural network to be trained to obtain a classification result of the target to be classified, and calculating the current loss through a preset loss function.
The current loss is calculated according to a preset loss function; specifically, it can be obtained by calculating the loss of the HGSL and the loss of the GNN separately and summing them. The calculation of the current loss is illustrated in fig. 2a, which is a schematic structural diagram of the network provided in the embodiment of the present application. The feature set F consists of one feature matrix F_τ ∈ R^{|V_τ| × d_τ} per node type, where V_τ denotes the set of nodes of type τ ∈ T_v and d_τ denotes the feature dimension of the τ-type nodes. For the graph learning part, the original relational subgraph A_r, the node features F and the semantic embedding Z are taken as input: a feature graph S_r^{Feat} is generated from the original relational subgraph A_r and the node features F, and a semantic graph S_r^{Sem} is generated from the semantic embedding Z. The feature graph S_r^{Feat} and the semantic graph S_r^{Sem} are fused with the original graph to obtain the learned relational subgraph A'_r. A regularization loss L_reg is obtained from the learned relational subgraphs, and a classification loss L_GNN is obtained by feeding the learned relational subgraphs into the graph neural network. For the structure of the feature graph generator, refer to fig. 2b, which is a schematic structural diagram of the feature graph generator provided in the embodiment of the present application; for the structure of the semantic graph generator, refer to fig. 2c, which is a schematic structural diagram of the semantic graph generator provided in the embodiment of the present application.
And step S16, adjusting parameters of the heterogeneous graph structure learning network to be trained and the graph neural network to be trained according to the current loss.
And step S17, returning to the heterogeneous graph structure learning network to be trained, generating a feature similarity graph and a feature propagation graph of the sample heterogeneous graph according to the features of each node and the features of each meta-path, and continuously executing the step of obtaining the feature graph through aggregation until preset conditions are met to obtain the trained heterogeneous graph structure learning network and graph neural network.
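The following is a minimal, illustrative sketch of the training procedure of steps S11 to S17, assuming PyTorch-style modules; the module names (hgsl_net, gnn) and the helper regularization() are hypothetical stand-ins for the heterogeneous graph structure learning network and the graph neural network, not part of the claimed method.

```python
import torch

def train_classification_model(hgsl_net, gnn, node_feats, metapath_emb, orig_subgraphs,
                               labels, labeled_idx, epochs=200, lr=0.01, weight_decay=5e-4):
    """Joint training of the structure-learning network and the GNN (steps S11-S17)."""
    params = list(hgsl_net.parameters()) + list(gnn.parameters())
    optimizer = torch.optim.Adam(params, lr=lr, weight_decay=weight_decay)

    for epoch in range(epochs):
        # S12/S13: generate feature graph and semantic graph; S14: fuse into relational subgraphs
        fused_subgraphs = hgsl_net(node_feats, metapath_emb, orig_subgraphs)
        # S15: classify on the learned structure and compute the two losses
        logits = gnn(node_feats, fused_subgraphs)
        loss_gnn = torch.nn.functional.cross_entropy(logits[labeled_idx], labels[labeled_idx])
        loss_reg = hgsl_net.regularization(fused_subgraphs)  # hypothetical helper
        loss = loss_gnn + loss_reg
        # S16: adjust parameters of both networks at the same time
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # S17: repeat until the preset condition (here: a fixed epoch budget) is met
    return hgsl_net, gnn
```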
Therefore, by the training method of the classification model, a feature graph can be generated from the sample heterogeneous graph, a semantic graph can be extracted, and a relational subgraph can be generated; the classification result of the target to be classified is obtained according to the relational subgraph, the current loss is calculated, and the parameters of the heterogeneous graph structure learning network to be trained and the graph neural network to be trained are adjusted simultaneously according to the current loss, so that the efficiency of model training can be improved.
Optionally, referring to fig. 3, in step S12, generating, through the heterogeneous graph structure learning network to be trained, a feature similarity graph and a feature propagation graph of the sample heterogeneous graph according to the features of each node and the features of each meta-path, and aggregating them to obtain a feature graph includes:
step S121, performing heterogeneous feature extraction and metric learning on the features of each node and the features of each meta-path through the heterogeneous graph structure learning network to be trained to generate a feature similarity graph;
step S122, generating a feature propagation graph by propagating the feature similarity matrix through the topological structure according to the features of each node and the features of each meta-path (one possible propagation rule is sketched after this list);
and step S123, aggregating the feature similarity graph and the feature propagation graph to obtain a feature graph.
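The exact propagation rule of step S122 is not spelled out in the text, so the following sketch shows only one plausible reading, in which the feature similarity matrix is propagated along the observed topology and then sparsified; the product A_r @ S_fs, the row renormalization, and the threshold eps_fp are assumptions.

```python
import numpy as np

def feature_propagation_graph(A_r, S_fs, eps_fp=0.2):
    """Sketch of step S122: propagate the feature similarity matrix S_fs along
    the observed topology A_r, then sparsify with threshold eps_fp.
    The simple product A_r @ S_fs is an assumed propagation rule."""
    S_fp = A_r @ S_fs
    # renormalize rows so propagated similarities stay in a comparable range
    row_sum = S_fp.sum(axis=1, keepdims=True) + 1e-12
    S_fp = S_fp / row_sum
    S_fp[S_fp < eps_fp] = 0.0
    return S_fp
```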
The feature similarity graph S_r^{FS} models the potential existence of the relation r between two nodes based on their node features. Specifically, for each node v_i of type φ(v_i) with feature f_i ∈ R^{d_{φ(v_i)}}, a type-specific mapping layer projects the feature f_i to a d_c-dimensional common feature f'_i ∈ R^{d_c}:

f'_i = σ(f_i · W_{φ(v_i)} + b_{φ(v_i)})    (1)

where σ(·) denotes a non-linear activation function, and W_{φ(v_i)} ∈ R^{d_{φ(v_i)} × d_c} and b_{φ(v_i)} ∈ R^{d_c} denote the mapping matrix and the bias vector of type φ(v_i), respectively. Then, for the relation r, metric learning is performed on the common features to obtain the feature similarity graph S_r^{FS}, where the edge between nodes v_i and v_j is obtained by:

S_r^{FS}[i, j] = Γ_r^{FS}(f'_i, f'_j) if Γ_r^{FS}(f'_i, f'_j) ≥ ε^{FS}; otherwise S_r^{FS}[i, j] = 0    (2)

where ε^{FS} ∈ [0, 1] is a threshold controlling the sparsity of the feature similarity graph (the larger ε^{FS} is, the sparser the feature similarity graph), and Γ_r^{FS} is a K-head weighted cosine similarity, defined as:

Γ_r^{FS}(f'_i, f'_j) = (1/K) Σ_{k=1}^{K} cos(w_k ⊙ f'_i, w_k ⊙ f'_j)    (3)

where ⊙ denotes the Hadamard product and W^{FS} = [w_1, ..., w_K] is a learnable parameter matrix that weights the importance of different dimensions of the feature vectors. By performing metric learning with formula (3) and applying the threshold to remove edges with small feature similarity, the HGSL learns the candidate feature similarity graph S_r^{FS}.
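The following sketch illustrates formulas (1) to (3) (type-specific projection, thresholded edge construction, and K-head weighted cosine similarity) in plain NumPy; it is an illustrative reading of the formulas rather than the claimed implementation, and names such as project_common and eps_fs are assumptions.

```python
import numpy as np

def project_common(f, W_type, b_type):
    """Formula (1): type-specific projection to the d_c-dimensional common space."""
    return np.tanh(f @ W_type + b_type)  # sigma: any non-linear activation

def k_head_cosine(fi, fj, W_fs):
    """Formula (3): K-head weighted cosine similarity; W_fs has shape (K, d_c)."""
    sims = []
    for w_k in W_fs:
        a, b = w_k * fi, w_k * fj  # Hadamard product per head
        sims.append(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    return float(np.mean(sims))

def feature_similarity_graph(F_common, W_fs, eps_fs=0.2):
    """Formula (2): keep an edge only when the similarity reaches the threshold eps_fs."""
    n = F_common.shape[0]
    S = np.zeros((n, n), dtype=np.float32)
    for i in range(n):
        for j in range(n):
            s = k_head_cosine(F_common[i], F_common[j], W_fs)
            S[i, j] = s if s >= eps_fs else 0.0
    return S
```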
Optionally, referring to fig. 4, in step S13, generating a semantic graph of the sample heterogeneous graph according to the features of each meta-path through the heterogeneous graph structure learning network to be trained includes:
step S131, generating a semantic subgraph adjacency matrix corresponding to each meta-path according to the features of each meta-path through the heterogeneous graph structure learning network to be trained;
and step S132, aggregating the adjacency matrices of the semantic subgraphs to obtain the semantic graph.
For example, given a relation r ∈ T_e, each meta-path P_m related to r induces a candidate semantic subgraph, and the semantic graph is generated by fusing their adjacency matrices.
The semantic graph may be produced by a semantic graph generator, which generates the underlying semantic graph structure by metric learning on the meta-path-based node embeddings. Specifically, for a meta-path set containing M meta-paths, the HGSL uses the trained MP2Vec embeddings Z = {Z^1, ..., Z^M} to generate the semantic graphs. Since the training process of the semantic embeddings is offline, the computational cost and the model complexity can be reduced. Moreover, owing to the mechanism of the heterogeneous Skip-gram, the information of the intermediate nodes is well preserved.
After obtaining the semantic embedding Z, for each meta-path P_m a candidate semantic subgraph adjacency matrix S_{r,m}^{MP} can be generated, where each edge is calculated as:

S_{r,m}^{MP}[i, j] = Γ_{r,m}^{MP}(z_i^m, z_j^m) if Γ_{r,m}^{MP}(z_i^m, z_j^m) ≥ ε^{MP}; otherwise S_{r,m}^{MP}[i, j] = 0    (4)

where z_i^m denotes the i-th row of Z^m, and Γ_{r,m}^{MP} is a K-head weighted cosine similarity with parameter matrix W_{r,m}^{MP}. The relation r thus yields M candidate semantic subgraphs, and the overall semantic graph of the relation r, denoted S_r^{Sem}, can be obtained by aggregating the M candidate semantic subgraphs:

S_r^{Sem} = Ψ_r^{MP}([S_{r,1}^{MP}, ..., S_{r,M}^{MP}])    (5)

where [S_{r,1}^{MP}, ..., S_{r,M}^{MP}] is the stacked matrix of the M candidate semantic graphs, Ψ_r^{MP} denotes a channel attention layer, and its weight matrix W_{Ψ,r}^{MP} ∈ R^{1×1×M} represents the importance of the candidate graphs based on the different meta-paths. After obtaining the aggregated semantic graph S_r^{Sem}, the learned feature graph, the semantic graph and the original graph structure can be aggregated to obtain the final generated graph structure A'_r of the relation r:

A'_r = Ψ_r([S_r^{Feat}, S_r^{Sem}, A_r])    (6)

where [S_r^{Feat}, S_r^{Sem}, A_r] is the stacked matrix of candidate graphs, Ψ_r is a channel attention layer, and its weight matrix W_{Ψ,r} represents the importance of the different candidate graphs in the fused overall relational subgraph A'_r. The new relational adjacency matrices A'_r for all relations r form the new heterogeneous graph structure, i.e. A' = {A'_r : r ∈ T_e}.
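A sketch of the channel attention aggregation used in formulas (5) and (6): the candidate adjacency matrices are stacked along a channel dimension and combined with per-channel weights. Normalizing the weights with a softmax is an assumption about one reasonable form of the channel attention layer, not a statement of the claimed implementation.

```python
import numpy as np

def channel_attention_fuse(candidate_graphs, channel_weights):
    """Fuse candidate graphs [S_1, ..., S_C] into one graph, as in formulas (5)/(6).

    candidate_graphs: list of C adjacency matrices of identical shape (n, m)
    channel_weights:  learnable vector of length C (the 1x1xC weight matrix W_psi)
    """
    stacked = np.stack(candidate_graphs, axis=0)             # shape (C, n, m)
    attn = np.exp(channel_weights - channel_weights.max())   # softmax over channels
    attn = attn / attn.sum()
    return np.tensordot(attn, stacked, axes=1)               # weighted sum over channels

# e.g. formula (6): fuse the feature graph, the semantic graph and the original A_r
# A_fused = channel_attention_fuse([S_feat, S_sem, A_r], psi_r_weights)
```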
Optionally, referring to fig. 5, step S15 is to input the relationship subgraph into a graph neural network to be trained to obtain a classification result of the target to be classified, and calculate the current loss by using a preset loss function, where the method includes:
step S151, inputting the relation subgraph into a graph neural network to be trained to obtain a classification result of the target to be classified;
step S152, calculating the loss of the heterogeneous graph structure learning network to be trained according to the classification result and through a first preset loss function;
step S153, calculating the loss of the graph neural network to be trained according to the classification result and through a second preset loss function;
and step S154, summing the loss of the heterogeneous graph structure learning network to be trained and the loss of the graph neural network to be trained to obtain the current loss.
In the present application, the GNN applied to the learned graph structure has parameters θ = (W_1, W_2) and takes the form of a two-layer graph convolutional network:

f_θ(X, A') = softmax(Â' σ(Â' X W_1) W_2)    (7)

where X is the original node feature matrix if all features have the same dimension; otherwise X is constructed from the common features obtained by the projection in formula (1). The adjacency matrix A' is constructed from the learned heterogeneous graph A' = {A'_r : r ∈ T_e} by treating all nodes as a single type, and Â' denotes the normalized adjacency matrix of A'. The GNN classification loss L_GNN is then:

L_GNN = Σ_{v_i ∈ V_L} ℓ(f_θ(X, A')_i, y_i)    (8)

where f_θ(X, A')_i is the predicted label of node v_i ∈ V_L (the set of labeled nodes), and ℓ(·, ·) measures the difference between the predicted value and the true label y_i.
Because graph structure learning gives the model greater capacity to adapt to the downstream task, the original GNN becomes more prone to overfitting. Therefore, a regularization term L_reg, weighted by the hyper-parameter α, is applied to the learned graph A'.
In the present application, the current loss is calculated as:

L = L_GNN + L_reg    (9)

By minimizing L, the HGSL jointly optimizes the heterogeneous graph structure and the GNN parameters to achieve better downstream task performance.
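A sketch corresponding to formulas (7) and (8): a two-layer GCN forward pass and a cross-entropy classification loss over the labeled nodes. The symmetric normalization with self-loops and the ReLU activation are assumptions about one common instantiation, and the function names are illustrative.

```python
import numpy as np

def normalize_adj(A):
    """Symmetrically normalize A with self-loops (one common choice for a GCN)."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_forward(X, A, W1, W2):
    """Formula (7): f_theta(X, A') = softmax(A_hat' * sigma(A_hat' X W1) W2)."""
    A_hat = normalize_adj(A)
    H = np.maximum(A_hat @ X @ W1, 0.0)           # sigma: ReLU
    logits = A_hat @ H @ W2
    logits -= logits.max(axis=1, keepdims=True)   # numerically stable softmax
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)

def classification_loss(P, labels, labeled_idx):
    """Formula (8): cross-entropy over the labeled nodes V_L."""
    return float(-np.mean(np.log(P[labeled_idx, labels[labeled_idx]] + 1e-12)))
```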
Further, in order to explain the advantageous effects of the present application, the following description is made in conjunction with specific comparative tests.
Our proposed model was evaluated experimentally using real-world datasets, the statistics of which are shown in table 1:
Table 1 Dataset statistics
DBLP (DataBase systems and Logic Programming): a subset of the DBLP dataset, containing 4323 papers (P), 2957 authors (A) and 20 conferences (C). The authors are divided into four areas: database, data mining, machine learning and information retrieval. The node features are terms related to the papers, authors and conferences, respectively.
ACM (Association for Computing Machinery): using the same dataset and experimental setup as GTN, it contains 3025 papers (P), 5912 authors (A) and 57 conference subjects (S). The papers are labeled according to their conference. The node features are composed of keywords.
Yelp: a Yelp dataset containing 2614 merchants (B), 1286 users (U), 4 services (S) and 9 rating levels (L). The merchant nodes are labeled by their category. The node features are composed of bag-of-words representations of related keywords.
Baselines: in this application, HGSL was compared with 11 state-of-the-art methods, including four homogeneous graph methods, i.e., DeepWalk, GCN (graph convolutional network), GAT (graph attention network) and GraphSAGE (graph SAmple and aggreGatE); four heterogeneous graph embedding methods, i.e., MP2Vec (metapath2vec), HAN (heterogeneous graph attention network), HeGAN (heterogeneous graph embedding based on generative adversarial networks) and GTN (graph transformer network); and three graph structure learning methods, i.e., LDS (learning discrete structures for GNNs), Pro-GNN (property GNN) and Geom-GCN (geometric graph convolutional network).
Experimental setup: in this application, for all GNN-related models, the number of layers is set to 2, and the common-space feature dimension d_c and the embedding dimension d are set to 16 and 64, respectively. The cosine similarity function defined in equation (3) is used with K = 2. The learning rate and the weight decay are set to 0.01 and 0.0005, respectively. The hyper-parameters, such as ε^{FP}, ε^{MP} and α, are tuned through grid search.
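The experimental settings above can be collected into a configuration sketch; the values are those reported in the text, while the dictionary key names are illustrative assumptions.

```python
# Hyper-parameters reported above; key names are illustrative, values are from the text.
hgsl_config = {
    "num_gnn_layers": 2,
    "common_feature_dim_dc": 16,
    "embedding_dim_d": 64,
    "num_similarity_heads_K": 2,
    "learning_rate": 0.01,
    "weight_decay": 0.0005,
    # epsilon_FP, epsilon_MP and alpha are tuned by grid search per dataset
}
```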
Table 2 Performance evaluation of node classification (mean percentage ± standard deviation)
The performance of the HGSL on the node classification task is evaluated in this application, with Macro-F1 and Micro-F1 selected as evaluation metrics. The mean and standard deviation (in percent) are shown in table 2, from which the following can be observed: the HGSL has the capability of adaptively learning the heterogeneous graph structure and consistently outperforms all baselines; graph structure learning methods are generally superior to the original GCN because they allow the GCN to aggregate features over the learned structure; compared with homogeneous GNNs, HGNN methods such as HAN, GTN and HGSL perform better because they take heterogeneity into account; and thanks to the use of node features, GNN-based methods mostly outperform graph embedding methods based on random walks. This phenomenon is more pronounced on the Yelp dataset because the node features (i.e., keywords) help to classify the business category.
Ablation experiments: to verify the effectiveness of the different parts of the HGSL, the present application designed three variants, denoted HGSL-w/o-FSG, HGSL-w/o-FPG and HGSL-w/o-SG, by removing each type of candidate graph from the HGSL in turn. It can be observed that the HGSL outperforms these variants, indicating that it is necessary to consider all of these candidate graphs. Furthermore, the degree of performance degradation of the three variants relative to the HGSL differs across datasets, indicating that the importance of these candidate graphs differs in different situations and should be weighed carefully.
Effectiveness of weight learning: to evaluate whether the HGSL can effectively learn the importance of different graphs, the present application replaces each channel attention layer in the HGSL with an average aggregation layer, i.e., the fused graph is obtained by averaging all candidate graphs; this variant is denoted HGSL-avg. Clearly, the HGSL performs significantly better than HGSL-avg, indicating the effectiveness of weight learning through the channel attention layers. Notably, the performance of HGSL-avg on Yelp is significantly reduced compared with the HGSL. This is because the node features of Yelp are very important, but HGSL-avg fuses the three candidate graphs equally, two of which (the semantic graph and the original graph) are generated from the topology. Therefore, the influence of the node features on Yelp is greatly reduced, which harms the performance.
Importance analysis of the candidate graphs: to investigate whether the HGSL can distinguish the importance of the candidate graphs, the present application analyzes, on the three datasets, the weight distribution of the channel attention layer that fuses each relational subgraph, i.e., the weights of Ψ_r in equation (6). The HGSL is trained 20 times in this application, and all thresholds of the HGSL are set to 0.2. The attention distribution is as shown in fig. 3. It can be seen that the original graph structure is the most important structure for GNN-based classification on ACM and DBLP. For Yelp, however, the channel attention values of the different relational subgraphs differ from each other. In particular, for the B-U (merchant-user) and B-L (merchant-rating) relational subgraphs, larger channel attention values are assigned to the feature graphs in graph structure learning. This phenomenon means that the information in the node features plays a more important role than the information in the semantic embeddings, which is consistent with the previous experiment and further demonstrates that the HGSL can adaptively learn the channel attention values to capture the more important information.
Through the HGSL framework of the present application, the heterogeneous graph structure and the GNN parameters can be learned jointly to achieve the classification goal. In particular, by exploiting the complex interactions inside the heterogeneous graph, a feature similarity graph, a feature propagation graph and a semantic graph are generated and fused, so that an optimal heterogeneous graph structure for classification is learned.
In a second aspect of the embodiment of the present application, there is further provided a training apparatus for a classification model, which is applied to a classification model to be trained, where the classification model to be trained includes a heterogeneous graph structure learning network to be trained and a graph neural network to be trained, see fig. 6, where fig. 6 is a schematic structural diagram of the training apparatus for the classification model provided in the embodiment of the present application, and the apparatus includes:
a feature obtaining module 601, configured to obtain features of each node and each meta-path in a sample heterogeneous graph, where the sample heterogeneous graph is a graph used for representing a relationship between targets to be classified;
the feature aggregation module 602 is configured to generate a feature similarity graph and a feature propagation graph of the sample heterogeneous graph according to features of each node and features of each meta path through a heterogeneous graph structure learning network to be trained, and perform aggregation to obtain a feature graph;
the semantic graph generating module 603 is configured to generate a semantic graph of the sample heterogeneous graph according to the features of each meta path through a heterogeneous graph structure learning network to be trained;
the relational sub-graph generation module 604 is configured to fuse the feature graph, the semantic graph and the sample heterogeneous graph to obtain a relational sub-graph;
a loss calculation module 605, configured to input the relationship subgraph into a graph neural network to be trained, to obtain a classification result of the target to be classified, and calculate a current loss through a preset loss function;
a parameter adjusting module 606, configured to adjust parameters of the heterogeneous graph structure learning network to be trained and the graph neural network to be trained according to the current loss;
and the neural network acquisition module 607 is configured to return to the step of generating, through the heterogeneous graph structure learning network to be trained, a feature similarity graph and a feature propagation graph of the sample heterogeneous graph according to the features of each node and the features of each meta-path and aggregating them to obtain a feature graph, and continue execution until a preset condition is met, so as to obtain the trained heterogeneous graph structure learning network and graph neural network.
Optionally, the feature aggregation module 602 includes:
the feature similarity graph generation submodule is used for performing heterogeneous feature extraction and metric learning on the features of each node and the features of each meta-path through the heterogeneous graph structure learning network to be trained to generate a feature similarity graph;
the feature propagation graph generation submodule is used for generating a feature propagation graph by propagating the feature similarity matrix through the topological structure according to the features of each node and the features of each meta-path;
and the characteristic diagram aggregation submodule is used for aggregating the characteristic similarity diagram and the characteristic propagation diagram to obtain the characteristic diagram.
Optionally, the semantic graph generating module 603 includes:
the adjacency matrix generation submodule is used for generating a semantic subgraph adjacency matrix corresponding to each meta-path according to the features of each meta-path through the heterogeneous graph structure learning network to be trained;
and the adjacency matrix aggregation submodule is used for aggregating the adjacency matrices of the semantic subgraphs to obtain the semantic graph.
Optionally, the loss calculating module 605 includes:
the classification result acquisition submodule is used for inputting the relation subgraph into a graph neural network to be trained to obtain a classification result of the target to be classified;
the first loss calculation submodule is used for calculating the loss of the heterogeneous graph structure learning network to be trained according to the classification result and through a first preset loss function;
the second loss calculation submodule is used for calculating the loss of the graph neural network to be trained according to the classification result and through a second preset loss function;
and the current loss calculation submodule is used for summing the loss of the heterogeneous graph structure learning network to be trained and the loss of the graph neural network to be trained to obtain the current loss.
Therefore, by the training apparatus of the classification model, a feature graph can be generated from the sample heterogeneous graph, a semantic graph can be extracted, and a relational subgraph can be generated; the classification result of the target to be classified is obtained according to the relational subgraph, the current loss is calculated, and the parameters of the heterogeneous graph structure learning network to be trained and the graph neural network to be trained are adjusted simultaneously according to the current loss, so that the efficiency of model training can be improved.
The embodiment of the present application further provides an electronic device, as shown in fig. 7, which includes a processor 701, a communication interface 702, a memory 703 and a communication bus 704, where the processor 701, the communication interface 702, and the memory 703 complete mutual communication through the communication bus 704,
a memory 703 for storing a computer program;
the processor 701 is configured to implement the following steps when executing the program stored in the memory 703:
acquiring the features of each node and each meta-path in a sample heterogeneous graph, wherein the sample heterogeneous graph is a graph used for representing the relationships between the targets to be classified;
generating a feature similarity graph and a feature propagation graph of the sample heterogeneous graph according to the features of each node and the features of each meta-path through a heterogeneous graph structure learning network to be trained, and aggregating them to obtain a feature graph;
generating a semantic graph of the sample heterogeneous graph according to the features of each meta-path through the heterogeneous graph structure learning network to be trained;
fusing the feature graph, the semantic graph and the sample heterogeneous graph to obtain a relation subgraph;
inputting the relation subgraph into a graph neural network to be trained to obtain a classification result of the target to be classified, and calculating the current loss through a preset loss function;
adjusting parameters of a heterogeneous graph structure learning network to be trained and a graph neural network to be trained according to the current loss;
and returning to the step of generating, through the heterogeneous graph structure learning network to be trained, a feature similarity graph and a feature propagation graph of the sample heterogeneous graph according to the features of each node and the features of each meta-path and aggregating them to obtain a feature graph, and continuing execution until a preset condition is met, so as to obtain the trained heterogeneous graph structure learning network and graph neural network.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The Processor may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; but also Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components.
In yet another embodiment provided by the present application, a computer-readable storage medium is further provided, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the training method of any one of the above classification models.
In yet another embodiment provided by the present application, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the method of training a classification model according to any of the above embodiments.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, cause the processes or functions described in accordance with the embodiments of the application to occur, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website site, computer, server, or data center to another website site, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It should be noted that, herein, relational terms such as first and second are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a/an ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The embodiments in this specification are described in an interrelated manner; identical or similar parts among the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, since the apparatus, electronic device, storage medium, and computer program product embodiments are substantially similar to the method embodiments, their description is relatively brief, and reference may be made to the relevant parts of the description of the method embodiments.
The above description is merely of preferred embodiments of the present application and is not intended to limit its protection scope. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present application shall fall within the protection scope of the present application.

Claims (10)

1. A training method of a classification model, applied to a classification model to be trained, wherein the classification model to be trained comprises a heterogeneous graph structure learning network to be trained and a graph neural network to be trained, the method comprising the following steps:
acquiring features of each node and features of each meta-path in a sample heterogeneous graph, wherein the sample heterogeneous graph is a graph representing relationships between targets to be classified;
generating a feature similarity graph and a feature propagation graph of the sample heterogeneous graph according to the features of the nodes and the features of the meta-paths through a heterogeneous graph structure learning network to be trained, and aggregating to obtain a feature graph;
generating a semantic graph of the sample heterogeneous graph according to the features of each meta-path through the heterogeneous graph structure learning network to be trained;
fusing the feature graph, the semantic graph and the sample heterogeneous graph to obtain a relational subgraph;
inputting the relational subgraph into the graph neural network to be trained to obtain a classification result of the target to be classified, and calculating a current loss through a preset loss function;
adjusting parameters of the heterogeneous graph structure learning network to be trained and the graph neural network to be trained according to the current loss;
and returning to the step of generating, through the heterogeneous graph structure learning network to be trained, a feature similarity graph and a feature propagation graph of the sample heterogeneous graph according to the features of the nodes and the features of the meta-paths and aggregating them to obtain a feature graph, and continuing execution until a preset condition is met, to obtain the trained heterogeneous graph structure learning network and the trained graph neural network.
2. The method according to claim 1, wherein the generating a feature similarity graph and a feature propagation graph of the sample heterogeneous graph according to the features of the nodes and the features of the meta-paths through the heterogeneous graph structure learning network to be trained, and aggregating the feature similarity graph and the feature propagation graph to obtain a feature graph comprises:
performing heterogeneous feature extraction and metric learning on the features of each node and the features of each meta-path through the heterogeneous graph structure learning network to be trained, to generate a feature similarity graph;
generating a feature propagation graph by propagating a feature similarity matrix through the topological structure according to the features of the nodes and the features of the meta-paths;
and aggregating the feature similarity graph and the feature propagation graph to obtain a feature graph.
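By way of illustration only, and not as part of the claim, the three sub-steps above may be implemented with dense tensors as sketched below; the cosine-similarity metric, the threshold eps, the row-normalized one-hop propagation, and the fixed mixing weight are assumptions chosen for clarity rather than limitations of this application.

    import torch
    import torch.nn.functional as F

    def feature_similarity_graph(node_feats, proj, eps=0.2):
        # Heterogeneous feature extraction (projection) followed by metric learning
        # (cosine similarity), thresholded to keep only sufficiently similar pairs.
        z = F.normalize(proj(node_feats), dim=1)
        sim = z @ z.t()
        return torch.where(sim > eps, sim, torch.zeros_like(sim))

    def feature_propagation_graph(adj, sim_graph):
        # Propagate the feature similarity matrix one hop through the topological structure.
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        return (adj / deg) @ sim_graph

    def aggregate_feature_graph(sim_graph, prop_graph, weight=0.5):
        # Aggregate the two candidate graphs into the feature graph.
        return weight * sim_graph + (1.0 - weight) * prop_graph

In this sketch node_feats is an [N, d] node feature matrix, proj a learnable projection such as torch.nn.Linear(d, k), and adj the [N, N] adjacency matrix of the sample heterogeneous graph under one meta-path; all three names are assumptions.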
3. The method according to claim 1, wherein the generating the semantic graph of the sample heterogeneous graph according to the features of each meta-path through the heterogeneous graph structure learning network to be trained comprises:
generating, through the heterogeneous graph structure learning network to be trained, a semantic subgraph adjacency matrix corresponding to each meta-path according to the features of each meta-path;
and aggregating the semantic subgraph adjacency matrices to obtain the semantic graph.
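For illustration only, and not as part of the claim, a minimal sketch of these two sub-steps follows: one semantic subgraph adjacency matrix is built per meta-path from that meta-path's features, and the matrices are combined with softmax-normalized weights. The per-meta-path projection heads, the threshold eps, and the weighted aggregation are assumptions made for this sketch.

    import torch
    import torch.nn.functional as F

    def semantic_graph(metapath_feats, heads, agg_logits, eps=0.2):
        # metapath_feats: list of [N, d] feature matrices, one per meta-path;
        # heads: matching list of torch.nn.Linear projections; agg_logits: learnable [P] tensor.
        subgraphs = []
        for feats, head in zip(metapath_feats, heads):
            z = F.normalize(head(feats), dim=1)
            adj = torch.relu(z @ z.t())                    # semantic subgraph adjacency matrix
            subgraphs.append(torch.where(adj > eps, adj, torch.zeros_like(adj)))
        weights = torch.softmax(agg_logits, dim=0)         # one aggregation weight per meta-path
        return sum(w * a for w, a in zip(weights, subgraphs))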
4. The method of claim 2, wherein the inputting the relational subgraph into a graph neural network to be trained to obtain a classification result of the target to be classified, and calculating a current loss through a preset loss function, comprises:
inputting the relational subgraph into the graph neural network to be trained to obtain the classification result of the target to be classified;
calculating a loss of the heterogeneous graph structure learning network to be trained according to the classification result through a first preset loss function;
calculating a loss of the graph neural network to be trained according to the classification result through a second preset loss function;
and summing the loss of the heterogeneous graph structure learning network to be trained and the loss of the graph neural network to be trained to obtain the current loss.
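As a hedged sketch, and not as part of the claim, the two losses may be combined as below; treating the first preset loss function as a sparsity regularizer on the learned structure and the second as cross-entropy on the classification result is an assumption made here for illustration.

    import torch.nn.functional as F

    def current_loss(rel_adj, logits, labels, mask, beta=1e-3):
        # First preset loss: loss of the heterogeneous graph structure learning network,
        # here an L1 penalty encouraging a sparse learned relational subgraph.
        structure_loss = beta * rel_adj.abs().sum()
        # Second preset loss: loss of the graph neural network on the classification result.
        classification_loss = F.cross_entropy(logits[mask], labels[mask])
        # The current loss is the sum of the two.
        return structure_loss + classification_loss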
5. A training device for a classification model, applied to a classification model to be trained, wherein the classification model to be trained comprises a heterogeneous graph structure learning network to be trained and a graph neural network to be trained, the device comprising:
a feature acquisition module, configured to acquire features of each node and features of each meta-path in a sample heterogeneous graph, wherein the sample heterogeneous graph is a graph representing relationships between targets to be classified;
a feature aggregation module, configured to generate a feature similarity graph and a feature propagation graph of the sample heterogeneous graph according to the features of each node and the features of each meta-path through the heterogeneous graph structure learning network to be trained, and aggregate them to obtain a feature graph;
a semantic graph generation module, configured to generate a semantic graph of the sample heterogeneous graph according to the features of each meta-path through the heterogeneous graph structure learning network to be trained;
a relational subgraph generation module, configured to fuse the feature graph, the semantic graph and the sample heterogeneous graph to obtain a relational subgraph;
a loss calculation module, configured to input the relational subgraph into the graph neural network to be trained to obtain a classification result of the target to be classified, and calculate a current loss through a preset loss function;
a parameter adjustment module, configured to adjust parameters of the heterogeneous graph structure learning network to be trained and the graph neural network to be trained according to the current loss;
and a neural network acquisition module, configured to return to the step of generating, through the heterogeneous graph structure learning network to be trained, a feature similarity graph and a feature propagation graph of the sample heterogeneous graph according to the features of the nodes and the features of the meta-paths and aggregating them to obtain a feature graph, and to continue execution until a preset condition is met, to obtain the trained heterogeneous graph structure learning network and the trained graph neural network.
6. The device of claim 5, wherein the feature aggregation module comprises:
a feature similarity graph generation submodule, configured to perform heterogeneous feature extraction and metric learning on the features of each node and the features of each meta-path through the heterogeneous graph structure learning network to be trained, to generate a feature similarity graph;
a feature propagation graph generation submodule, configured to generate a feature propagation graph by propagating a feature similarity matrix through the topological structure according to the features of each node and the features of each meta-path;
and a feature graph aggregation submodule, configured to aggregate the feature similarity graph and the feature propagation graph to obtain a feature graph.
7. The device of claim 5, wherein the semantic graph generation module comprises:
an adjacency matrix generation submodule, configured to generate, through the heterogeneous graph structure learning network to be trained, a semantic subgraph adjacency matrix corresponding to each meta-path according to the features of each meta-path;
and an adjacency matrix aggregation submodule, configured to aggregate the semantic subgraph adjacency matrices to obtain the semantic graph.
8. The device of claim 6, wherein the loss calculation module comprises:
a classification result acquisition submodule, configured to input the relational subgraph into the graph neural network to be trained to obtain the classification result of the target to be classified;
a first loss calculation submodule, configured to calculate a loss of the heterogeneous graph structure learning network to be trained according to the classification result through a first preset loss function;
a second loss calculation submodule, configured to calculate a loss of the graph neural network to be trained according to the classification result through a second preset loss function;
and a current loss calculation submodule, configured to sum the loss of the heterogeneous graph structure learning network to be trained and the loss of the graph neural network to be trained to obtain the current loss.
9. An electronic device, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
the processor is configured to implement the method steps of any one of claims 1 to 4 when executing the program stored in the memory.
10. A computer-readable storage medium, wherein a computer program is stored in the computer-readable storage medium, and the computer program, when executed by a processor, implements the method steps of any one of claims 1 to 4.
CN202110285723.0A 2021-03-17 2021-03-17 Classification model training method and device, electronic equipment and storage medium Active CN112966763B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110285723.0A CN112966763B (en) 2021-03-17 2021-03-17 Classification model training method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112966763A true CN112966763A (en) 2021-06-15
CN112966763B CN112966763B (en) 2023-12-26

Family

ID=76278971

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110285723.0A Active CN112966763B (en) 2021-03-17 2021-03-17 Classification model training method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112966763B (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107957998A (en) * 2016-10-14 2018-04-24 富士通株式会社 The method of relation between the method and estimation entity pair of production Methods computation model
US20200137083A1 (en) * 2018-10-24 2020-04-30 Nec Laboratories America, Inc. Unknown malicious program behavior detection using a graph neural network
CN110046698A (en) * 2019-04-28 2019-07-23 北京邮电大学 Heterogeneous figure neural network generation method, device, electronic equipment and storage medium
CN110704626A (en) * 2019-09-30 2020-01-17 北京邮电大学 Short text classification method and device
CN111259875A (en) * 2020-05-06 2020-06-09 中国人民解放军国防科技大学 Lip reading method based on self-adaptive magnetic space-time diagramm volumetric network
CN111814842A (en) * 2020-06-17 2020-10-23 北京邮电大学 Object classification method and device based on multi-pass graph convolution neural network
CN111737535A (en) * 2020-06-22 2020-10-02 复旦大学 Network characterization learning method based on element structure and graph neural network
CN111639344A (en) * 2020-07-31 2020-09-08 中国人民解放军国防科技大学 Vulnerability detection method and device based on neural network
CN112015955A (en) * 2020-09-01 2020-12-01 清华大学 Multi-mode data association method and device
CN112132197A (en) * 2020-09-15 2020-12-25 腾讯科技(深圳)有限公司 Model training method, image processing method, device, computer equipment and storage medium
US20210073632A1 (en) * 2020-11-18 2021-03-11 Intel Corporation Methods, systems, articles of manufacture, and apparatus to generate code semantics

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Jiang Zongli et al., "Heterogeneous Network Representation Learning Based on Fused Meta-path Graph Convolution", Computer Science (《计算机科学》) *
Jiang Zongli et al., "Heterogeneous Network Representation Learning Based on Fused Meta-path Graph Convolution", Computer Science (《计算机科学》), vol. 47, no. 07, 31 July 2020 (2020-07-31) *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113535847A (en) * 2021-06-22 2021-10-22 中国人民银行数字货币研究所 Method and device for classifying block chain addresses
CN113657577B (en) * 2021-07-21 2023-08-18 阿里巴巴达摩院(杭州)科技有限公司 Model training method and computing system
CN113657577A (en) * 2021-07-21 2021-11-16 阿里巴巴达摩院(杭州)科技有限公司 Model training method and computing system
CN113807370A (en) * 2021-09-29 2021-12-17 腾讯科技(深圳)有限公司 Data processing method, device, equipment, storage medium and computer program product
CN113807370B (en) * 2021-09-29 2024-01-02 腾讯科技(深圳)有限公司 Data processing method, apparatus, device, storage medium and computer program product
CN114168804A (en) * 2021-12-17 2022-03-11 中国科学院自动化研究所 Similar information retrieval method and system based on heterogeneous subgraph neural network
CN114168804B (en) * 2021-12-17 2022-06-10 中国科学院自动化研究所 Similar information retrieval method and system based on heterogeneous subgraph neural network
CN114710322A (en) * 2022-03-15 2022-07-05 清华大学 Hidden malicious traffic detection method and device based on traffic interaction graph
CN114710322B (en) * 2022-03-15 2023-06-20 清华大学 Flow interaction graph-based method and device for detecting hidden malicious flow
CN114662105A (en) * 2022-03-17 2022-06-24 电子科技大学 Method and system for identifying Android malicious software based on graph node relationship and graph compression
WO2023207790A1 (en) * 2022-04-28 2023-11-02 华为技术有限公司 Classification model training method and device
CN116757262A (en) * 2023-08-16 2023-09-15 苏州浪潮智能科技有限公司 Training method, classifying method, device, equipment and medium of graph neural network
CN116757262B (en) * 2023-08-16 2024-01-12 苏州浪潮智能科技有限公司 Training method, classifying method, device, equipment and medium of graph neural network

Also Published As

Publication number Publication date
CN112966763B (en) 2023-12-26

Similar Documents

Publication Publication Date Title
CN112966763B (en) Classification model training method and device, electronic equipment and storage medium
Cen et al. Representation learning for attributed multiplex heterogeneous network
WO2018092044A1 (en) Data object creation and recommendation using machine learning based offline and online evolution
Arya et al. Exploiting relational information in social networks using geometric deep learning on hypergraphs
WO2021159894A1 (en) Recommender system using bayesian graph convolution networks
Zhang et al. DeRec: A data-driven approach to accurate recommendation with deep learning and weighted loss function
CN111639696B (en) User classification method and device
Oosterlinck et al. From one-class to two-class classification by incorporating expert knowledge: Novelty detection in human behaviour
Rani et al. Detection of shilling attack in recommender system for YouTube video statistics using machine learning techniques
Cheung et al. Characterizing user connections in social media through user-shared images
CN112699667A (en) Entity similarity determination method, device, equipment and storage medium
Wang et al. Link prediction in heterogeneous collaboration networks
CN110717116B (en) Link prediction method and system of relational network, equipment and storage medium
Ma et al. Class-imbalanced learning on graphs: A survey
Liu et al. Top-aware recommender distillation with deep reinforcement learning
CN112257959A (en) User risk prediction method and device, electronic equipment and storage medium
CN112906873A (en) Graph neural network training method and device, electronic equipment and storage medium
Zhong et al. An exploration of cross-modal retrieval for unseen concepts
CN116467666A (en) Graph anomaly detection method and system based on integrated learning and active learning
Lu et al. Graph out-of-distribution generalization with controllable data augmentation
Banerjee et al. Recommendation systems based on collaborative filtering using autoencoders: issues and opportunities
Han et al. An effective heterogeneous information network representation learning framework
Yin et al. Social spammer detection: a multi-relational embedding approach
Zhang et al. An adaptive preference retention collaborative filtering algorithm based on graph convolutional method
Dhaou et al. An evidential method for correcting noisy information in social network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant