CN115631057A - Social user classification method and system based on graph neural network - Google Patents
- Publication number: CN115631057A (application CN202211307295.8A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses a social user classification method, system, electronic device and computer-readable storage medium based on a graph neural network (GNN), and belongs to the technical field of social user classification. The method comprises: obtaining node representations from input original graph structure data constructed from social user data, and performing an oversampling operation based on the node representations; generating synthetic nodes for the minority nodes in the data; acquiring adjacency information of the synthetic nodes; assigning pseudo labels to the synthetic nodes; and combining the synthetic nodes, the adjacency information and the real nodes to construct a node-balanced graph on which classification is performed. The method addresses class imbalance and improves the accuracy of social user classification, solving the prior-art problems of low accuracy and high computational cost when social users are classified under class imbalance.
Description
Technical Field
The present application relates to the field of social user classification technologies, and in particular, to a social user classification method and system based on a graph neural network.
Background
The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.
In recent years, with the development of graph neural networks (GNNs), graph representation learning has improved greatly and is widely applied to classification tasks, but existing work still mainly focuses on learning from class-balanced data.
Node classification is an important research topic in graph representation learning, and GNNs have achieved state-of-the-art node classification performance. However, existing GNNs assume that the numbers of samples in different classes are balanced, whereas in many real scenarios some classes have far fewer instances than others. In this case, directly training a GNN classifier under-represents the samples of those minority classes and results in sub-optimal performance.
In the real world, the numbers of samples of different classes are often unbalanced, that is, some classes have far more samples than others. The phenomenon is particularly prominent in fake-user detection: most social users of video websites and social networking sites are real users and only a small fraction are bot (fake) users, which creates a class imbalance problem.
Semi-supervised classification trains a classifier with a small amount of labeled data among a large amount of data to complete the final classification task. Because only limited labeled data is available for social user classification, the number of labeled minority samples becomes even smaller, which further aggravates the imbalance problem.
In machine learning, the class imbalance problem has been widely studied, and existing approaches fall into three categories: data-level methods, algorithm-level methods and hybrid methods. Data-level methods make the class distribution more balanced by oversampling or undersampling: oversampling balances a dataset by adding minority-class samples, while undersampling removes majority-class samples. Undersampling may make classification more efficient, but because it discards useful information from the majority classes, it ultimately distorts the decision boundary and leads to a poor classifier. In contrast, oversampling preserves more information by copying existing samples or synthesizing new ones. Copying (also known as random oversampling) randomly duplicates some minority samples, so it usually produces smaller minority-class regions, which may cause overfitting. Algorithm-level methods typically introduce different misclassification penalties or prior probabilities for different classes, and hybrid methods combine the two. However, applying these methods directly to graphs may give sub-optimal results. Relationships are key information to be mined from graph-structured data, and the insufficient representation of minority samples affects not only the embedding quality but also the knowledge exchange between adjacent nodes. Previous algorithms fail to address this because they assume that samples are independent.
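The two data-level strategies contrasted above can be sketched in a few lines of NumPy. This is an illustrative sketch on a toy feature matrix, not part of the disclosed method, and the function names `random_oversample` and `random_undersample` are hypothetical. It shows that replication only repeats existing minority rows, while undersampling throws majority rows away.

```python
import numpy as np


def random_oversample(X, y, minority_label, target_count, rng):
    """Replication-based oversampling: duplicate randomly chosen minority rows
    until the minority class holds `target_count` samples. No new points are
    created, which is why the minority region stays small."""
    idx = np.flatnonzero(y == minority_label)
    extra = rng.choice(idx, size=max(target_count - idx.size, 0), replace=True)
    keep = np.concatenate([np.arange(y.size), extra])
    return X[keep], y[keep]


def random_undersample(X, y, majority_label, target_count, rng):
    """Undersampling: keep only `target_count` randomly chosen majority rows and
    discard the rest, losing the information they carried."""
    majority = np.flatnonzero(y == majority_label)
    kept = rng.choice(majority, size=target_count, replace=False)
    keep = np.concatenate([np.flatnonzero(y != majority_label), kept])
    return X[keep], y[keep]


rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))
y = np.array([0] * 8 + [1] * 2)          # 8 majority vs 2 minority samples
X_over, y_over = random_oversample(X, y, 1, 8, rng)
X_under, y_under = random_undersample(X, y, 0, 2, rng)
print(np.bincount(y_over), np.bincount(y_under))   # [8 8] [2 2]
```

Both outputs are balanced, but the oversampled minority rows are exact copies of the two originals, which is the overfitting risk noted above.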
Although existing methods have been successful in learning from unbalanced data, three problems remain:
(1) Among oversampling strategies, synthesis provides a wider decision region than replication, but at a heavy computational cost;
(2) Hybrid strategies rebalance the dataset with a wide decision region, but ensemble learning strategies require a large computational cost in training, especially when combined with oversampling;
(3) In real scenarios such as social user classification, some classes may have far fewer instances than others, so directly training a GNN classifier under-represents the minority classes and results in sub-optimal performance.
Disclosure of Invention
To address the deficiencies of the prior art, the present application provides a social user classification method, system, electronic device and computer-readable storage medium based on a graph neural network. Synthetic minority nodes are generated by interpolation in an expressive embedding space obtained by a GNN-based feature extractor, and an edge generator predicts the links of the synthetic nodes, thereby balancing the minority nodes against the other nodes and facilitating node classification by the GNN.
In a first aspect, the application provides a social user classification method based on a graph neural network;
a social user classification method based on a graph neural network comprises the following steps:
acquiring node representations for input original graph structure data constructed from social user data, and performing an oversampling operation based on the node representations;
generating synthetic nodes for the minority nodes in the data;
acquiring adjacency information of the synthetic nodes;
assigning pseudo labels to the synthetic nodes;
and combining the synthetic nodes, the adjacency information and the real nodes to construct a node-balanced graph, and performing classification.
Further, the specific steps of performing the oversampling operation to generate synthetic nodes based on the node representations are:
for the minority nodes in the data, acquiring the corresponding node representations through feature extraction;
and generating synthetic nodes according to the attribute information and topology information of the minority nodes.
Further, the specific steps of assigning pseudo labels to the synthetic nodes are:
obtaining the degree of influence of the neighborhood label information on the predicted label according to the weight matrix and the neighborhood label information;
and obtaining the predicted label according to the original label and the degree of influence of the neighborhood label information on the predicted label.
Further, the specific steps of combining the synthetic nodes, the adjacency information and the real nodes to construct the node-balanced graph include:
concatenating the real node embeddings with the synthetic node embeddings to obtain an enhanced node representation set;
and adding the synthetic nodes to the labeled node set to obtain an enhanced labeled set.
Further, before classification, the network is trained with an objective function to obtain a graph neural network classification model.
Further, the objective function is:
where $\eta_{node}$ is the cross-entropy loss of the node classifier, $\eta_{edge}$ is the loss function for training the edge generator, $\eta_p$ is the objective function of adaptive label propagation, $\lambda$ is a hyperparameter, and $\theta$, $\phi$, … are the parameters of the feature extractor, edge generator and node classifier, respectively.
Further, a two-layer GraphSage is adopted as the backbone model structure.
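The combined objective itself does not survive in the text above. As a hedged reconstruction only, one plausible form is shown below. It assumes that the hyperparameter $\lambda$ weights the edge-generator loss, as in GraphSMOTE, and that the label-propagation term enters unweighted. $\varphi$ is a placeholder for the node-classifier parameters, whose symbol is not recoverable.

```latex
\min_{\theta,\ \phi,\ \varphi}\; \eta_{node} + \lambda\,\eta_{edge} + \eta_{p}
```

Minimizing this sum trains the feature extractor, edge generator and node classifier jointly. The original may weight $\eta_p$ differently.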
In a second aspect, the present application provides a social user classification system based on a graph neural network;
a graph neural network-based social user classification system, comprising:
comprising a feature extractor, a node generator, an edge generator, a label propagator and a GNN classifier;
the feature extractor is used for acquiring original graph structure data constructed from social user data and obtaining node representations from the original graph structure data;
the node generator is used for generating synthetic nodes for the minority nodes in the data based on the node representations;
the edge generator is used for acquiring the adjacency information of the synthetic nodes;
the label propagator is used for assigning pseudo labels to the synthetic nodes;
the GNN classifier is used for combining the synthetic nodes, the adjacency information and the real nodes to construct a node-balanced graph and performing classification.
In a third aspect, the present application provides an electronic device;
an electronic device comprising a memory and a processor and computer instructions stored on the memory and executed on the processor, wherein the computer instructions, when executed by the processor, perform the steps of the method for social user classification based on a graph neural network.
In a fourth aspect, the present application provides a computer-readable storage medium;
a computer readable storage medium for storing computer instructions, which when executed by a processor, perform the steps of the above-mentioned method for social user classification based on graph neural network.
Compared with the prior art, the beneficial effects of this application are:
1. The method extends existing imbalanced-learning techniques for independent and identically distributed data to the imbalanced node classification task. It adopts the widely used synthetic minority oversampling technique (SMOTE), supplies relationship information for the newly synthesized samples, and performs classification on class-balanced data, which improves classification accuracy;
2. For the minority nodes, the GNN feature extractor generates embeddings, the node generator synthesizes new minority nodes in the latent space, the edge generator then adds connections for the new nodes to obtain a class-balanced enhanced graph, and finally the nodes are classified by the GNN classifier.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application.
Fig. 1 is a schematic flowchart of a social user classification method based on a graph neural network according to an embodiment of the present application;
fig. 2 is a schematic diagram of a framework provided in an embodiment of the present application.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular is intended to include the plural unless the context clearly dictates otherwise, and furthermore, it should be understood that the terms "comprises" and "comprising," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The embodiments and features of the embodiments of the present invention may be combined with each other without conflict.
Example one
In the prior art of social user classification, fake users are far fewer than real users. When predicting classes with a graph neural network, existing methods for the class imbalance problem have a high computational cost, and models trained directly cannot adequately represent samples from the minority classes. This leads to sub-optimal performance and reduced classification accuracy. Therefore, the present application provides a social user classification method based on a graph neural network.
Next, a social user classification method based on a graph neural network disclosed in this embodiment is described in detail with reference to fig. 1-2.
The embodiment provides a social user classification method based on a graph neural network, which comprises the following steps:
S1, acquiring node representations for the input original graph structure data constructed from social user data, and performing an oversampling operation based on the node representations. The specific steps are as follows:
A feature extractor extracts node representations from the input original graph structure data constructed from social user data. In this embodiment, a first GraphSage layer computes the node representation:
$h_v^1 = \sigma\left(W^1 \cdot \mathrm{CONCAT}\left(F[v,:],\ F \cdot A[:,v]\right)\right)$
where $F$ is the input node attribute matrix, $F[v,:]$ denotes the attributes of node $v$, $A[:,v]$ is the $v$-th column of the adjacency matrix, $h_v^1$ is the embedding of node $v$, $W^1$ is a weight parameter, and $\sigma$ denotes an activation function such as ReLU.
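The first-layer computation in S1 can be sketched in NumPy as follows. This is a minimal sketch that assumes sum aggregation (a literal reading of $F \cdot A[:,v]$) and ReLU as $\sigma$. The function name `graphsage_layer` is hypothetical, and practical GraphSage implementations often use mean aggregation and normalization instead.

```python
import numpy as np


def graphsage_layer(F, A, W1):
    """Compute h_v^1 = ReLU(W1 . CONCAT(F[v,:], neighbour aggregate)) for every node v.

    F  : (n, d) node attribute matrix
    A  : (n, n) adjacency matrix; column A[:, v] marks the neighbours of v
    W1 : (d_out, 2 * d) weight matrix
    """
    neighbour_sum = A.T @ F                   # row v = sum_u A[u, v] * F[u, :]
    concat = np.hstack([F, neighbour_sum])    # (n, 2d): own features | aggregated neighbours
    return np.maximum(concat @ W1.T, 0.0)     # ReLU as the activation sigma


F = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)   # path graph 0-1-2
W1 = np.hstack([np.eye(2), np.eye(2)])        # adds own and neighbour features
print(graphsage_layer(F, A, W1))
```

With this choice of `W1`, each output row is the node's own attribute vector plus the sum of its neighbours' attribute vectors.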
S2, after the representation of each node is obtained in the embedding space constructed by the feature extractor, synthetic nodes are generated for the minority nodes in the data. In this embodiment, the minority nodes are labeled nodes in the data. Specifically, the SMOTE algorithm is adopted. SMOTE improves on ordinary oversampling by replacing repetition with interpolation: samples of a target minority class are interpolated with their nearest neighbors of the same class in the embedding space. The specific process is as follows:
Let $h_v^1$ be a labeled minority node with label $Y_v$. First, find the closest labeled node of the same class, i.e.,
$nn(v) = \arg\min_{u} \left\| h_u^1 - h_v^1 \right\|, \quad \text{s.t. } Y_u = Y_v$
where $nn(v)$ is the nearest neighbor of $v$ in the same class, measured with the Euclidean metric in the embedding space.
Using the nearest neighbor, a synthetic node is generated as
$h_{v'}^1 = (1-\delta)\cdot h_v^1 + \delta \cdot h_{nn(v)}^1$
where $\delta$ is a random variable uniformly distributed on $[0,1]$.
Because $h_v^1$ and $h_{nn(v)}^1$ belong to the same class and are very close to each other, the synthetic node $h_{v'}^1$ should also belong to that class, which yields a labeled synthetic node.
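The SMOTE step of S2 can be sketched as below. The sketch assumes dense NumPy embeddings, a brute-force nearest-neighbor search, and minority nodes picked uniformly at random as interpolation seeds (the text does not say how seeds are chosen). The function `smote_in_embedding_space` and its parameters are hypothetical.

```python
import numpy as np


def smote_in_embedding_space(H, labels, labeled_idx, minority_class, n_new, rng):
    """Create `n_new` synthetic embeddings for `minority_class`.

    For a randomly chosen labeled minority node v, nn(v) is the closest labeled node
    of the same class (Euclidean distance), and the synthetic node is
    h_v' = (1 - delta) * h_v + delta * h_nn(v) with delta drawn uniformly from [0, 1).
    """
    labeled_idx = np.asarray(labeled_idx)
    cand = labeled_idx[labels[labeled_idx] == minority_class]
    if cand.size < 2:
        raise ValueError("need at least two labeled nodes of the minority class")
    emb = H[cand]
    dist = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)            # a node cannot be its own neighbour
    nn = dist.argmin(axis=1)                  # position of nn(v) within `cand`
    picks = rng.integers(0, cand.size, size=n_new)
    delta = rng.uniform(0.0, 1.0, size=(n_new, 1))
    synthetic = (1.0 - delta) * emb[picks] + delta * emb[nn[picks]]
    return synthetic, np.full(n_new, minority_class)


H = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0]])
labels = np.array([1, 1, 0, 0])               # class 1 plays the minority role here
syn, syn_labels = smote_in_embedding_space(H, labels, [0, 1, 2, 3], 1, 5,
                                           np.random.default_rng(0))
```

Each synthetic point lies on the segment between a minority node and its same-class nearest neighbor. Unlike replication, this yields new points inside the minority region.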
S3, acquiring adjacency information of the synthetic nodes. The synthetic nodes generated to balance the class distribution are not linked to the original graph G and are therefore isolated from it. The edge generator is first trained on the real nodes and existing edges. It then predicts the adjacency information of the synthetic nodes, and the generated synthetic nodes and edges are added to the initial adjacency matrix. Because the node representations can reconstruct the adjacency matrix well, the edge generator provides good link predictions for the synthetic nodes.
To keep the model simple and easy to analyze, the edge generator is implemented as a weighted inner product, as follows:
where $E_{v,u}$ denotes the predicted relation between nodes $v$ and $u$, and $S$ is a parameter matrix capturing the interaction between nodes.
The edge generator is trained with the following loss function:
where $E$ denotes the predicted connections between the nodes in $V$.
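A minimal sketch of the edge generator and its training loss follows. The text specifies only a weighted inner product with parameter matrix $S$. The sigmoid squashing and the squared-Frobenius reconstruction loss $\|E - A\|_F^2$ (the form used in GraphSMOTE) are assumptions, and `predict_edges` and `edge_loss` are hypothetical names.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def predict_edges(H, S):
    """Weighted inner product E[v, u] = sigmoid(h_v S h_u^T) for all node pairs.
    The sigmoid that maps scores into (0, 1) is an assumption of this sketch."""
    return sigmoid(H @ S @ H.T)


def edge_loss(E, A):
    """Assumed reconstruction loss: squared Frobenius norm ||E - A||_F^2 between
    predicted and real adjacency over the real nodes."""
    return float(np.sum((E - A) ** 2))


H = np.zeros((2, 3))                      # all-zero embeddings give scores of 0.5
S = np.eye(3)
A = np.eye(2)
print(edge_loss(predict_edges(H, S), A))  # 4 entries, each off by 0.5 -> 1.0
```

When $S$ is symmetric, the predicted adjacency is symmetric too, which suits undirected social graphs.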
S4, assigning pseudo labels to the synthetic nodes. Specifically, the goal of label propagation is to find a prediction matrix $\hat{Y}$ that is consistent with the label matrix $Y$. The specific formula is as follows:
where $Y^{(0)} = Y$, $K$ is the number of power-iteration steps, $\hat{Y}$ is the predicted label matrix, and $T$ is the transition matrix, which can be set to the normalized adjacency matrix. After $K$ propagation steps, the predicted labels contain the neighborhood label information within $K$ hops. On this basis, an adaptive label propagation algorithm is designed, which can be expressed as:
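The plain (non-adaptive) label propagation of S4 can be sketched as a power iteration. The sketch assumes the update $Y^{(k+1)} = T\,Y^{(k)}$ with $T = D^{-1/2} A D^{-1/2}$, the symmetric normalized adjacency. The exact update rule is not reproduced in the text, so treat this as one common instance; the function names are hypothetical.

```python
import numpy as np


def normalized_adjacency(A):
    """Symmetric normalisation T = D^{-1/2} A D^{-1/2}; isolated nodes get zero rows."""
    deg = A.sum(axis=1).astype(float)
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    inv_sqrt[nonzero] = deg[nonzero] ** -0.5
    return inv_sqrt[:, None] * A * inv_sqrt[None, :]


def propagate_labels(A, Y, K):
    """Return [Y^(0), ..., Y^(K)] with Y^(0) = Y and Y^(k+1) = T Y^(k).

    Y is an (n, c) one-hot matrix whose rows are all zero for unlabeled nodes,
    so Y^(k) carries label information propagated over k hops."""
    T = normalized_adjacency(A)
    hops = [Y.astype(float)]
    for _ in range(K):
        hops.append(T @ hops[-1])
    return hops


A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)   # path graph 0-1-2
Y = np.array([[1, 0], [0, 0], [0, 1]])                        # node 1 is unlabeled
hops = propagate_labels(A, Y, K=2)
```

After one step, the unlabeled middle node receives equal evidence for both classes from its two labeled neighbors.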
where $\gamma_{ik}$ represents the degree of influence of the $k$-hop neighborhood information on the predicted label of node $i$, and $\gamma_{ik}$ can be expressed as:
where an attention vector and the weight matrix $W$ are used and ReLU is the activation function. The adaptive label propagation operator treats the attention vector and the weight matrix as learnable parameters and adjusts the propagation strategy for each node, so that the final smoothed labels capture rich structural information from the input graph.
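The adaptive combination of the per-hop predictions could look like the sketch below. The text states only that $\gamma_{ik}$ depends on an attention vector, a weight matrix $W$ and ReLU. The specific score $e_{ik} = s^\top \mathrm{ReLU}(W\,y_i^{(k)})$, followed by a softmax over hops, is an assumption. The name `adaptive_combine` and the symbol `s` are hypothetical.

```python
import numpy as np


def adaptive_combine(hops, W, s):
    """Combine per-hop predictions with node-specific weights gamma[i, k].

    hops : list of K + 1 arrays of shape (n, c), i.e. Y^(0) ... Y^(K)
    W    : (m, c) learnable weight matrix
    s    : (m,) learnable attention vector (hypothetical symbol)
    Assumed scoring: e[i, k] = s . ReLU(W y_i^(k)); gamma[i, :] = softmax over k.
    """
    stack = np.stack(hops, axis=1)                       # (n, K + 1, c)
    scores = np.maximum(stack @ W.T, 0.0) @ s            # (n, K + 1)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerically stable softmax
    gamma = np.exp(scores)
    gamma = gamma / gamma.sum(axis=1, keepdims=True)
    y_hat = np.einsum("nk,nkc->nc", gamma, stack)        # sum_k gamma[i, k] * Y^(k)[i]
    return y_hat, gamma


hops = [np.eye(2), np.ones((2, 2))]
y_hat, gamma = adaptive_combine(hops, W=np.eye(2), s=np.array([1.0, 0.0]))
```

Because `gamma` is computed per node, two nodes can weight near and far hops differently. This is the per-node adjustment of the propagation strategy described above.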
The objective function of adaptive label propagation is as follows:
where $\hat{y}_i$ is the prediction for node $v_i$, $y_i$ is the original label, and $l(\cdot)$ is the cross-entropy loss.
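The adaptive label-propagation objective $\eta_p$ reduces to a cross-entropy over the labeled nodes. A minimal sketch follows. It assumes mean rather than summed reduction and renormalizes the propagated scores row by row. `label_propagation_loss` is a hypothetical name.

```python
import numpy as np


def label_propagation_loss(y_hat, labels, labeled_idx, eps=1e-12):
    """eta_p: mean cross-entropy between propagated predictions and original labels
    over the labeled nodes. Propagated scores are not guaranteed to sum to 1, so
    each row is renormalised first (an assumption of this sketch)."""
    idx = np.asarray(labeled_idx)
    p = y_hat[idx]
    p = p / np.clip(p.sum(axis=1, keepdims=True), eps, None)
    p_true = p[np.arange(idx.size), labels[idx]]
    return float(-np.mean(np.log(np.clip(p_true, eps, None))))


y_hat = np.array([[1.0, 0.0], [0.5, 0.5], [0.3, 0.7]])
labels = np.array([0, 1, 0])
loss = label_propagation_loss(y_hat, labels, [0, 1])   # node 2 is unlabeled
```

Only labeled nodes contribute to the loss. The third row is ignored, even though its propagated scores would disagree with its label.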
S5, combining the synthetic nodes, the adjacency information and the real nodes to construct a node-balanced graph and performing classification. $\tilde{H}^1$ is the enhanced node representation set obtained by concatenating $H^1$ (the embeddings of the real nodes) with the embeddings of the synthetic nodes, and $\tilde{V}_L$ is the enhanced labeled set obtained by adding the synthetic nodes to $V_L$. This yields an enhanced graph $\tilde{G}$ with labeled node set $\tilde{V}_L$.
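Assembling the enhanced graph in S5 can be sketched as follows. Synthetic-to-real links are scored with a sigmoid of the weighted inner product. The 0.5 threshold, the omission of synthetic-to-synthetic edges, and the name `build_enhanced_graph` are assumptions of this sketch, not details given in the text.

```python
import numpy as np


def build_enhanced_graph(H1, A, labels, labeled_idx, H_syn, syn_labels, S, threshold=0.5):
    """Stack real and synthetic nodes into one class-balanced graph.

    Nodes    : real embeddings H1 followed by synthetic embeddings H_syn.
    Edges    : original A plus synthetic-to-real links whose score
               sigmoid(h_syn S h_real^T) exceeds `threshold` (hypothetical choice);
               synthetic-to-synthetic links are omitted in this sketch.
    Labelled : original labeled nodes plus every synthetic node.
    """
    n, m = H1.shape[0], H_syn.shape[0]
    H_aug = np.vstack([H1, H_syn])
    scores = 1.0 / (1.0 + np.exp(-(H_syn @ S @ H1.T)))    # (m, n)
    links = (scores > threshold).astype(A.dtype)
    A_aug = np.zeros((n + m, n + m), dtype=A.dtype)
    A_aug[:n, :n] = A
    A_aug[n:, :n] = links
    A_aug[:n, n:] = links.T                                # keep the graph undirected
    labels_aug = np.concatenate([labels, syn_labels])
    labeled_aug = np.concatenate([np.asarray(labeled_idx), np.arange(n, n + m)])
    return H_aug, A_aug, labels_aug, labeled_aug


H1 = np.array([[1.0, 0.0], [0.0, 1.0]])
A = np.array([[0.0, 1.0], [1.0, 0.0]])
H_aug, A_aug, labels_aug, labeled_aug = build_enhanced_graph(
    H1, A, np.array([0, 1]), [0, 1], np.array([[1.0, 0.0]]), np.array([0]), 10.0 * np.eye(2))
```

Here the single synthetic node attaches only to the real node whose embedding it resembles. It also joins the labeled set with its class-0 label.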
Specifically, a second GraphSage block followed by a linear layer is used for node classification on $\tilde{G}$, as follows:
where $H^2$ is the node representation matrix of the second GraphSage block, $W$ is a weight parameter, and $P_v$ is the probability distribution over class labels for node $v$. The classifier module is optimized with the cross-entropy loss:
During testing, the predicted class $Y_v'$ of node $v$ is set to the most probable class, $Y_v' = \arg\max_c P_{v,c}$.
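The classification head can be sketched as a second GraphSage-style layer followed by a linear layer and softmax, with the argmax rule applied at test time. Sum aggregation, ReLU, and the hypothetical names `classify` and `classifier_loss` are assumptions of this sketch. In practice the weights would be learned by gradient descent rather than set by hand as in the usage lines.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)      # numerically stable softmax
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def classify(H1, A, W2, Wc):
    """Second GraphSage-style block plus a linear layer.

    H2 = ReLU(CONCAT(H1, A^T H1) W2^T); P = softmax(H2 Wc^T); prediction = argmax_c P[v, c].
    """
    H2 = np.maximum(np.hstack([H1, A.T @ H1]) @ W2.T, 0.0)
    P = softmax(H2 @ Wc.T)
    return P, P.argmax(axis=1)


def classifier_loss(P, labels, labeled_idx, eps=1e-12):
    """Mean cross-entropy over the labeled nodes of the enhanced graph."""
    idx = np.asarray(labeled_idx)
    return float(-np.mean(np.log(P[idx, labels[idx]] + eps)))


H1 = np.eye(2)
A = np.zeros((2, 2))
W2 = np.hstack([np.eye(2), np.zeros((2, 2))])   # pass each node's own embedding through
Wc = 5.0 * np.eye(2)
P, pred = classify(H1, A, W2, Wc)
```

Each row of `P` is a probability distribution $P_v$, and `pred` applies the $\arg\max$ rule stated above.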
Further, before social user classification, the network is trained with an objective function to obtain a graph neural network classification model, and the model obtained when the objective function reaches its minimum is taken as the optimal graph neural network classification model. The specific training steps are the same as the steps of the method above, and the objective function is:
where $\eta_{node}$ is the cross-entropy loss of the node classifier, $\eta_{edge}$ is the loss function for training the edge generator, $\eta_p$ is the objective function of adaptive label propagation, $\lambda$ is a hyperparameter, and $\theta$, $\phi$, … are the parameters of the feature extractor, edge generator and node classifier, respectively.
Example two
The embodiment discloses a social user classification system based on a graph neural network, which comprises a feature extractor, a node generator, an edge generator, a label propagator and a GNN classifier;
the feature extractor is used for acquiring original graph structure data constructed from social user data and obtaining node representations from it. The feature extractor can be implemented with any type of GNN; specifically, GraphSage is chosen as the backbone model structure because it can effectively learn various local topologies and generalizes well to new structures. The message passing and fusion process is:
$h_v^1 = \sigma\left(W^1 \cdot \mathrm{CONCAT}\left(F[v,:],\ F \cdot A[:,v]\right)\right)$
where $F$ is the input node attribute matrix, $F[v,:]$ denotes the attributes of node $v$, $A[:,v]$ is the $v$-th column of the adjacency matrix, $h_v^1$ is the embedding of node $v$, $W^1$ is a weight parameter, and $\sigma$ denotes an activation function such as ReLU.
The node generator is used for generating synthetic nodes for the minority nodes in the data based on the node representations. Specifically, after the representation of each node is obtained in the embedding space constructed by the feature extractor, synthetic nodes are generated for the minority nodes in the data using the SMOTE algorithm. SMOTE improves on ordinary oversampling by replacing repetition with interpolation: samples of a target minority class are interpolated with their nearest neighbors of the same class in the embedding space. The specific process is as follows:
Let $h_v^1$ be a labeled minority node with label $Y_v$. First, find the closest labeled node of the same class, i.e.,
$nn(v) = \arg\min_{u} \left\| h_u^1 - h_v^1 \right\|, \quad \text{s.t. } Y_u = Y_v$
where $nn(v)$ is the nearest neighbor of $v$ in the same class, measured with the Euclidean metric in the embedding space.
Using the nearest neighbor, a synthetic node is generated as
$h_{v'}^1 = (1-\delta)\cdot h_v^1 + \delta \cdot h_{nn(v)}^1$
where $\delta$ is a random variable uniformly distributed on $[0,1]$.
Because $h_v^1$ and $h_{nn(v)}^1$ belong to the same class and are very close to each other, the synthetic node $h_{v'}^1$ should also belong to that class, which yields a labeled synthetic node.
The edge generator is used for acquiring the adjacency information of the synthetic nodes. Specifically, the generator is trained on the real nodes and existing edges and is used to predict the neighbor information of the synthetic nodes. These new nodes and edges are added to the initial adjacency matrix and serve as inputs to the GNN-based classifier.
To keep the model simple and easy to analyze, the edge generator is implemented as a weighted inner product, as follows:
where $E_{v,u}$ denotes the predicted relation between nodes $v$ and $u$, and $S$ is a parameter matrix capturing the interaction between nodes.
The edge generator is trained with the following loss function:
where $E$ denotes the predicted connections between the nodes in $V$.
The label propagator is used for assigning pseudo labels to the synthetic nodes. The goal of label propagation is to find a prediction matrix $\hat{Y}$ that is consistent with the label matrix $Y$. The specific formula is as follows:
where $Y^{(0)} = Y$, $K$ is the number of power-iteration steps, $\hat{Y}$ is the predicted label matrix, and $T$ is the transition matrix, which can be set to the normalized adjacency matrix. After $K$ propagation steps, the predicted labels contain the neighborhood label information within $K$ hops. On this basis, an adaptive label propagation algorithm is designed, which can be expressed as:
where $\gamma_{ik}$ represents the degree of influence of the $k$-hop neighborhood information on the predicted label of node $i$, and $\gamma_{ik}$ can be expressed as:
where an attention vector and the weight matrix $W$ are used and ReLU is the activation function. The adaptive label propagation operator treats the attention vector and the weight matrix as learnable parameters and adjusts the propagation strategy for each node, so that the final smoothed labels capture rich structural information from the input graph.
The GNN classifier is used for combining the synthetic nodes, the adjacency information and the real nodes to construct a node-balanced graph and classifying its nodes. A second GraphSage block followed by a linear layer is used for node classification on the enhanced graph $\tilde{G}$, as follows:
where $H^2$ is the node representation matrix of the second GraphSage block, $W$ is a weight parameter, and $P_v$ is the probability distribution over class labels for node $v$. The classifier module is optimized with the cross-entropy loss:
During testing, the predicted class $Y_v'$ of node $v$ is set to the most probable class, $Y_v' = \arg\max_c P_{v,c}$.
It should be noted that the above feature extractor, node generator, edge generator, label propagator and GNN classifier correspond to the steps of the first embodiment. These modules share the implementation examples and application scenarios of the corresponding steps, but are not limited to the content disclosed in the first embodiment. The modules described above as part of a system may be implemented in a computer system, for example as a set of computer-executable instructions.
Example three
The third embodiment of the invention provides an electronic device comprising a memory, a processor and computer instructions stored on the memory and run on the processor. When the computer instructions are run by the processor, the steps of the social user classification method based on a graph neural network are completed.
Example four
The fourth embodiment of the present invention provides a computer-readable storage medium, configured to store computer instructions, where the computer instructions, when executed by a processor, perform the steps of the social user classification method based on a graph neural network.
The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions in other embodiments.
The above description is only a preferred embodiment of the present application and is not intended to limit it; those skilled in the art may make various modifications and changes to the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.
Claims (10)
1. A social user classification method based on a graph neural network, characterized by comprising the following steps:
obtaining node representations for input original graph structure data constructed from social user data, and performing an oversampling operation based on the node representations;
generating synthetic nodes for the minority-class nodes in the data;
obtaining adjacency information for the synthetic nodes based on the synthetic nodes;
assigning pseudo labels to the synthetic nodes;
and combining the synthetic nodes, the adjacency information and the real nodes to construct a node-balanced graph for classification.
2. The social user classification method based on a graph neural network as claimed in claim 1, wherein the step of performing the oversampling operation based on the node representations to generate synthetic nodes comprises:
obtaining, through feature extraction, the node representations of the minority-class nodes in the data;
and generating synthetic nodes according to the attribute information and topology information of the minority-class nodes.
3. The social user classification method based on a graph neural network as claimed in claim 1, wherein the step of assigning pseudo labels to some of the synthetic nodes comprises:
obtaining, according to the weight matrix and the neighborhood label information, the degree to which the neighborhood label information influences the predicted label;
and obtaining the predicted label according to the original label and the degree of influence of the neighborhood label information on the predicted label.
4. The social user classification method based on a graph neural network as claimed in claim 1, wherein the step of combining the synthetic nodes, the adjacency information and the real nodes to construct the node-balanced graph specifically comprises:
concatenating the real node embeddings with the synthetic node embeddings to obtain an augmented node representation set;
and adding the synthetic nodes to the labeled node set to obtain an augmented label set.
5. The method of claim 1, wherein prior to the classification, the network is trained using an objective function to obtain a graph neural network classification model.
6. The method of claim 5, wherein the objective function is:
7. The social user classification method based on a graph neural network as claimed in claim 5, wherein a two-layer GraphSAGE is used as the backbone model structure.
8. A social user classification system based on a graph neural network, characterized by comprising a feature extractor, a node generator, an edge generator, a label propagator and a GNN classifier;
the feature extractor is used to obtain original graph structure data constructed from social user data and to obtain node representations from the original graph structure data;
the node generator is used to generate, based on the node representations, synthetic nodes for the minority-class nodes in the data;
the edge generator is used to obtain adjacency information for the synthetic nodes based on the synthetic nodes;
the label propagator is used to assign pseudo labels to the synthetic nodes;
the GNN classifier is used to combine the synthetic nodes, the adjacency information and the real nodes to construct a node-balanced graph and perform classification.
9. An electronic device comprising a memory, a processor, and computer instructions stored in the memory and executable on the processor, the computer instructions, when executed by the processor, performing the steps of any one of claims 1 to 7.
10. A computer-readable storage medium storing computer instructions which, when executed by a processor, perform the steps of any one of claims 1 to 7.
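Claim 1 requires an oversampling step that yields a node-balanced graph, but does not state how many synthetic nodes are generated per class. A minimal sketch, assuming the hypothetical rule that every minority class is topped up to the size of the largest class (the function `oversampling_budget` is illustrative, not taken from the patent):

```python
from collections import Counter


def oversampling_budget(labels):
    """Return {class: number of synthetic nodes needed} so that every class
    reaches the size of the largest class (hypothetical balancing rule)."""
    counts = Counter(labels)
    target = max(counts.values())
    # Minority classes are those strictly smaller than the largest class.
    return {c: target - n for c, n in sorted(counts.items()) if n < target}


print(oversampling_budget([0, 0, 0, 0, 0, 1, 1, 2]))  # {1: 3, 2: 4}
```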
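Claim 2 generates synthetic nodes from the attribute and topology information of minority nodes without fixing the generation rule. One widely used option, and only an assumption here, is SMOTE-style interpolation in the embedding space produced by the feature extractor (the approach used for graphs by GraphSMOTE): pick a minority node, find its nearest same-class neighbour, and sample a point on the segment between the two. `smote_nodes` and its parameters are hypothetical.

```python
import random


def smote_nodes(embeddings, minority_ids, n_new, seed=0):
    """SMOTE-style generation (assumed technique): each synthetic embedding
    lies on the segment between a minority node and its nearest other
    minority node. Requires at least two minority nodes."""
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic, parents = [], []
    for _ in range(n_new):
        u = rng.choice(minority_ids)
        # Nearest same-class neighbour of u in embedding space.
        v = min((w for w in minority_ids if w != u),
                key=lambda w: sq_dist(embeddings[u], embeddings[w]))
        delta = rng.random()  # interpolation factor in [0, 1)
        new = [a + delta * (b - a) for a, b in zip(embeddings[u], embeddings[v])]
        synthetic.append(new)
        parents.append((u, v))
    return synthetic, parents


emb = {0: [0.0, 0.0], 1: [1.0, 1.0], 2: [5.0, 5.0]}
synth, parents = smote_nodes(emb, [0, 1], n_new=3)
```

Recording the `(u, v)` parent pair is a convenience: a pseudo label or initial adjacency for the synthetic node can later be derived from its parents.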
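For the edge generator of claims 1 and 8, the patent text available here does not give the scoring function. The sketch below assumes a bilinear scorer p(u, v) = sigmoid(h_u^T W h_v) with a fixed threshold, a common choice in graph oversampling; `W`, `threshold` and `generate_edges` are hypothetical.

```python
import math


def generate_edges(synth_emb, real_emb, W, threshold=0.5):
    """Link synthetic node i to real node j when sigmoid(h_i^T W h_j) > threshold
    (assumed bilinear scorer). Returns a list of (synthetic index, real index)."""
    # sigmoid is monotonic, so sigmoid(x) > t  <=>  x > log(t / (1 - t)).
    # Comparing logits this way avoids overflow in math.exp. Needs 0 < t < 1.
    cut = math.log(threshold / (1.0 - threshold))
    edges = []
    for i, hu in enumerate(synth_emb):
        for j, hv in enumerate(real_emb):
            Wh = [sum(w * x for w, x in zip(row, hv)) for row in W]
            logit = sum(a * b for a, b in zip(hu, Wh))
            if logit > cut:
                edges.append((i, j))
    return edges


identity = [[1.0, 0.0], [0.0, 1.0]]
print(generate_edges([[1.0, 0.0]], [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0]], identity))
```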
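Claim 3 combines a node's original label with the influence of its neighbourhood labels, weighted by a weight matrix, but the exact formula is not reproduced here. A minimal sketch, assuming the influence term is the weight-normalized average of neighbour label distributions (one row of the weight matrix supplies `weights`) and that it is mixed with the node's own label distribution by a hypothetical coefficient `alpha`:

```python
def propagate_label(own_dist, neighbour_dists, weights, alpha=0.5):
    """Blend a node's own label distribution with the weighted average of its
    neighbours' distributions; return (blended distribution, predicted class)."""
    k = len(own_dist)
    total = sum(weights)
    if total > 0:
        # Influence of the neighbourhood labels on the predicted label.
        influence = [sum(w * d[c] for w, d in zip(weights, neighbour_dists)) / total
                     for c in range(k)]
    else:
        # No weighted neighbours: fall back to the node's own distribution.
        influence = list(own_dist)
    blended = [alpha * own_dist[c] + (1 - alpha) * influence[c] for c in range(k)]
    return blended, max(range(k), key=lambda c: blended[c])


blended, label = propagate_label([0.6, 0.4], [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0]],
                                 [1.0, 1.0, 1.0])
```

Here two of three equally weighted neighbours favour class 1, which is enough to overturn the node's own slight preference for class 0.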
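Claim 4 builds the balanced training data by concatenating real and synthetic embeddings and extending the labeled set. Assuming synthetic nodes receive ids directly after the real nodes (an illustrative convention, as is the name `build_augmented_sets`), the bookkeeping looks like this:

```python
def build_augmented_sets(real_emb, real_labels, labeled_ids, synth_emb, synth_labels):
    """Stack synthetic embeddings after the real ones and add the synthetic
    nodes, with their pseudo labels, to the labeled training set."""
    n_real = len(real_emb)
    emb = list(real_emb) + list(synth_emb)  # real rows first, then synthetic
    labels = dict(real_labels)              # node id -> label
    new_ids = list(range(n_real, n_real + len(synth_emb)))
    for node_id, y in zip(new_ids, synth_labels):
        labels[node_id] = y
    train_ids = list(labeled_ids) + new_ids
    return emb, labels, train_ids


emb, labels, train_ids = build_augmented_sets(
    [[0.1], [0.2], [0.3]], {0: 0, 2: 1}, [0, 2], [[0.25]], [1])
```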
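Claim 7 names a two-layer GraphSAGE backbone but not its aggregator or dimensions. The pure-Python sketch below assumes the mean aggregator, a ReLU between the layers and raw logits at the output; the weight names in `params` are hypothetical. It only shows the arithmetic of h_v' = W_self h_v + W_neigh · mean over u in N(v) of h_u.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sage_layer(h, adj, W_self, W_neigh, relu=True):
    """One GraphSAGE layer with a mean aggregator (assumed)."""
    dim = len(h[0])
    out = []
    for v in range(len(h)):
        nbrs = adj.get(v, [])
        if nbrs:
            mean = [sum(h[u][d] for u in nbrs) / len(nbrs) for d in range(dim)]
        else:
            mean = [0.0] * dim  # isolated node: no neighbour contribution
        z = [a + b for a, b in zip(matvec(W_self, h[v]), matvec(W_neigh, mean))]
        out.append([max(0.0, x) for x in z] if relu else z)
    return out


def two_layer_sage(x, adj, params):
    """Two stacked layers: ReLU after the first, logits from the second."""
    h = sage_layer(x, adj, params["W1_self"], params["W1_neigh"], relu=True)
    return sage_layer(h, adj, params["W2_self"], params["W2_neigh"], relu=False)


params = {"W1_self": [[1.0]], "W1_neigh": [[0.5]],
          "W2_self": [[1.0]], "W2_neigh": [[-1.0]]}
print(two_layer_sage([[1.0], [3.0]], {0: [1], 1: [0]}, params))  # [[-1.0], [1.0]]
```

In practice such a backbone would more likely be built by stacking two `torch_geometric.nn.SAGEConv` layers from PyTorch Geometric, which use a comparable mean-aggregation update.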
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211307295.8A CN115631057A (en) | 2022-10-24 | 2022-10-24 | Social user classification method and system based on graph neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211307295.8A CN115631057A (en) | 2022-10-24 | 2022-10-24 | Social user classification method and system based on graph neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115631057A true CN115631057A (en) | 2023-01-20 |
Family
ID=84907642
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211307295.8A Pending CN115631057A (en) | 2022-10-24 | 2022-10-24 | Social user classification method and system based on graph neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115631057A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116032665A (en) * | 2023-03-28 | 2023-04-28 | 北京芯盾时代科技有限公司 | Network group discovery method, device, equipment and storage medium |
- 2022-10-24 CN CN202211307295.8A (CN115631057A), active, Pending
Similar Documents
Publication | Title |
---|---|
Chen et al. | DNNOff: offloading DNN-based intelligent IoT applications in mobile edge computing |
CN110263280B | Multi-view-based dynamic link prediction depth model and application |
CN109816032B | Unbiased mapping zero sample classification method and device based on generative countermeasure network |
CN112784964A | Image classification method based on bridging knowledge distillation convolution neural network |
CN111862140A | Panoramic segmentation network and method based on collaborative module level search |
CN113326377A | Name disambiguation method and system based on enterprise incidence relation |
CN113988464B | Network link attribute relation prediction method and device based on graph neural network |
CN116541779B | Individualized public safety emergency detection model training method, detection method and device |
CN113554100B | Web service classification method for enhancing attention network of special composition picture |
CN114064627A | Knowledge graph link completion method and system for multiple relations |
CN111325340A | Information network relation prediction method and system |
CN115631057A | Social user classification method and system based on graph neural network |
CN106407379A | Hadoop platform based movie recommendation method |
CN115396366A | Distributed intelligent routing method based on graph attention network |
CN117315331A | Dynamic graph anomaly detection method and system based on GNN and LSTM |
CN108614932B | Edge graph-based linear flow overlapping community discovery method, system and storage medium |
CN109697511B | Data reasoning method and device and computer equipment |
CN106875043B | Node migration network block optimization method based on GN splitting algorithm |
CN104732278A | Deep neural network training method based on sea-cloud collaboration framework |
KR101878213B1 | Method, apparatus and computer program for summaring of a weighted graph |
CN113159976B | Identification method for important users of microblog network |
Bhardwaj et al. | User intent classification using memory networks: A comparative analysis for a limited data scenario |
CN110110764B | Random forest strategy optimization method based on hybrid network and storage medium |
Vatchova et al. | DEEP LEARNING OF COMPLEX INTERCONNECTED PROCESSES FOR BILEVEL OPTIMIZATION PROBLEM UNDER UNCERTAINTY |
CN116155755B | Link symbol prediction method based on linear optimization closed sub-graph coding |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
SE01 | Entry into force of request for substantive examination | |
CB02 | Change of applicant information | Country or region after: China; Address after: 250353 University Road, Changqing District, Ji'nan, Shandong Province, No. 3501; Applicant after: Qilu University of Technology (Shandong Academy of Sciences); Address before: 250353 University Road, Changqing District, Ji'nan, Shandong Province, No. 3501; Applicant before: Qilu University of Technology; Country or region before: China |