WO2024120166A1 - Data processing method, category identification method, and computer device - Google Patents

Data processing method, category identification method, and computer device

Info

Publication number
WO2024120166A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
nodes
feature
target
matrix
Prior art date
Application number
PCT/CN2023/132685
Other languages
English (en)
French (fr)
Inventor
赵宏宇
赵国庆
蒋宁
肖冰
Original Assignee
马上消费金融股份有限公司
Priority date
Filing date
Publication date
Application filed by 马上消费金融股份有限公司
Publication of WO2024120166A1


Definitions

  • the present application relates to the field of computer technology, and in particular to a data processing method, a category identification method and a computer device.
  • graph neural network models can be used as classification models to predict the categories of unlabeled nodes in graph data.
  • the graph data contains labeled nodes of known categories and unlabeled nodes of unknown categories.
  • the graph neural network model can be used to predict the categories of unlabeled nodes in the graph data; for example, the unlabeled nodes include article nodes, and the graph neural network model is used to identify the category of the article to be classified.
  • the unlabeled nodes include user nodes, and the graph neural network model is used to identify whether the user is a risky user, or to identify whether the user is a potential user of a product to be pushed.
  • in the related art, it is usually necessary to calculate the feature similarity between two adjacent nodes and then use the feature similarity as the weight coefficient when performing feature aggregation for a target node; that is, different neighboring nodes are assigned different weight values according to the feature similarity between the neighboring nodes and the target node, and the category of the node is then predicted based on the aggregated feature information.
  • since all nodes in the graph data need to be traversed, the feature similarity between every two nodes with a connecting edge needs to be calculated one by one.
  • the amount of calculation of the feature similarities between the target objects (such as target articles, target users, etc.) represented by the target nodes during the training of the classification model (such as a graph neural network model) is relatively large, resulting in a relatively long training time for the classification model used to identify the target object category (such as identifying the article category, identifying the user category, etc.).
  • the purpose of this application is to provide a data processing method and a category recognition method, so as to improve the prediction accuracy of the category of the object to be classified (such as article category, user category, etc.).
  • the present application provides a data processing method, which is applied to a classification model, wherein the classification model includes a node screening layer, a feature aggregation layer, and a category prediction layer, and the method includes: the node screening layer determines the dominant node information of the dominant node based on the edge data and feature information of the target node of the graph data; the target node includes a labeled node, and the labeled node includes the dominant node; the feature aggregation layer calculates the feature similarity between the target node and the neighboring nodes of the target node based on the dominant node information, and performs feature aggregation based on the feature similarity and the feature information of the neighboring nodes to obtain the aggregated feature information of the target node; the category prediction layer determines the node category prediction result based on the aggregated feature information, and the node category prediction result includes the predicted category label of the labeled node; based on the predicted category label and the true category label of the labeled node, the parameters of the classification model are iteratively updated.
  • the present application provides a category recognition method, which is applied to a classification model, wherein the classification model includes a node screening layer, a feature aggregation layer, and a category prediction layer, and the method includes: the node screening layer determines the dominant node information of the dominant node based on graph data, wherein the graph data includes nodes to be classified and sample nodes, and the nodes to be classified and the sample nodes include the dominant node; the feature aggregation layer calculates the feature similarity between the node to be classified and the neighboring nodes of the node to be classified based on the dominant node information, and performs feature aggregation based on the feature similarity and the feature information of the neighboring nodes to obtain the aggregated feature information of the node to be classified; the category prediction layer determines the predicted category information of the node to be classified based on the aggregated feature information.
  • the present application provides a data processing device, which is provided with a classification model, and the classification model includes a node screening layer, a feature aggregation layer, and a category prediction layer.
  • the device includes: the node screening layer determines the dominant node information of the dominant node based on the edge data and feature information of the target node of the graph data; the target node includes a labeled node, and the labeled node includes the dominant node; the feature aggregation layer calculates the feature similarity between the target node and the neighboring nodes of the target node based on the dominant node information, and performs feature aggregation based on the feature similarity and the feature information of the neighboring nodes to obtain the aggregated feature information of the target node; the category prediction layer determines the node category prediction result based on the aggregated feature information, and the node category prediction result includes the predicted category label of the labeled node; based on the predicted category label and the true category label of the labeled node, the parameters of the classification model are iteratively updated.
  • the present application provides a category identification device, which is provided with a classification model, and the classification model includes a node screening layer, a feature aggregation layer, and a category prediction layer.
  • the device includes: the node screening layer determines the dominant node information of the dominant node based on graph data, and the graph data includes nodes to be classified and sample nodes, and the nodes to be classified and the sample nodes include the dominant node; the feature aggregation layer calculates the feature similarity between the node to be classified and the neighboring nodes of the node to be classified based on the dominant node information, and performs feature aggregation based on the feature similarity and the feature information of the neighboring nodes to obtain the aggregated feature information of the node to be classified; the category prediction layer determines the predicted category information of the node to be classified based on the aggregated feature information.
  • the present application provides a computer device, the device comprising: a processor; and a memory arranged to store computer-executable instructions, wherein the executable instructions are configured to be executed by the processor, and the executable instructions include instructions for executing the steps in the above method.
  • the present application provides a storage medium, wherein the storage medium is used to store computer-executable instructions, and the executable instructions enable a computer to execute the steps in the above method.
  • an embodiment of the present application provides a computer program product, wherein the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute the above method.
  • FIG1 is a schematic diagram of a first flow chart of a data processing method provided in an embodiment of the present application.
  • FIG2a is a schematic diagram of a second flow chart of a data processing method provided in an embodiment of the present application.
  • FIG2b is a schematic diagram of the flow of each model training process in the data processing method provided in an embodiment of the present application.
  • FIG3 is a schematic diagram of a first implementation principle of a data processing method provided in an embodiment of the present application.
  • FIG4 is a schematic diagram of a second implementation principle of the data processing method provided in an embodiment of the present application.
  • FIG5 is a schematic diagram of a third implementation principle of the data processing method provided in an embodiment of the present application.
  • FIG6 is a schematic diagram of a fourth implementation principle of the data processing method provided in an embodiment of the present application.
  • FIG7a is a schematic diagram of a first flow chart of a category identification method provided in an embodiment of the present application.
  • FIG7b is a schematic diagram of a second flow chart of the category identification method provided in an embodiment of the present application.
  • FIG8 is a schematic diagram of the module composition of a data processing device provided in an embodiment of the present application.
  • FIG9 is a schematic diagram of the module composition of a category identification device provided in an embodiment of the present application.
  • FIG10 is a schematic diagram of the structure of a computer device provided in an embodiment of the present application.
  • One or more embodiments of the present application provide a data processing method and a category recognition method.
  • in the related art, in order to ensure the node feature aggregation effect, the number of nodes in the graph data of a classification model (such as a graph neural network model) is usually increased, which increases the amount of node feature similarity calculation. Therefore, in order to reduce this amount of calculation, if random sampling of the neighborhood nodes of a target node is adopted, only the feature similarity between the target node and the randomly sampled neighborhood nodes is calculated, and the features of the randomly sampled neighborhood nodes are then aggregated to the target node based on different weight coefficients; however, such random sampling is bound to suffer from low accuracy of the sampled neighborhood nodes.
  • the technical solution is precisely based on the long-tail distribution characteristics of the self-attention mechanism; that is, the self-attention mechanism is sparse, and a small number of dot products contribute most of the attention scores while the remaining dot products can be ignored; in other words, some neighborhood nodes with a relatively large contribution to a certain central node can be used as dominant neighborhood nodes, while the remaining non-dominant neighborhood nodes with a relatively small contribution can be ignored.
  • in the technical solution, the dominant node information is first determined based on the node edge data and the node feature information; that is, the K dominant nodes that are rich in structure and have high feature discrimination are located, and the feature similarity is then calculated based on the relevant information of the dominant nodes.
  • the non-dominant nodes will not participate in the calculation of feature similarity as the dominant neighborhood nodes of a certain central node (i.e., any target node), so as to achieve sparseness of the neighborhood nodes of the central node, thereby reducing the amount of calculation of node feature similarity (i.e., eliminating the amount of attention score calculation of the central node and some neighborhood nodes), thereby improving the model training efficiency;
  • the amount of calculation of node feature similarity can be accurately controlled to be relatively small in the model training stage; therefore, in the case where classification model training and node category prediction are completed together (that is, the target node can include the node to be classified), the category prediction efficiency of the node to be classified can also be ensured; and since the model parameters used for category prediction are obtained through iterative updates in which the node to be classified participates as an unlabeled node, the accuracy of the model parameters can be improved, so as to achieve the effect of taking into account both category prediction efficiency and prediction accuracy.
  • FIG1 is a first flow chart of a data processing method provided by one or more embodiments of the present application.
  • the method in FIG1 can be executed by an electronic device provided with a classification model training device, which can be a terminal device or a designated server.
  • the classification model can be applied to any application scenario where a node in graph data needs to be classified, for example, to predict the category of an article node to be tested in graph data, or to predict the type of a user node to be identified in graph data (such as identifying whether a user applying for a loan business is a risky user, or identifying whether a registered user of a preset application is a push user of a target product, etc.); specifically, the above data processing method is applied to a classification model, which includes a node screening layer, a feature aggregation layer, and a category prediction layer. As shown in FIG1 , the above data processing method includes at least the following steps:
  • the node screening layer determines the dominant node information of the dominant node based on the edge data and feature information of the target node of the graph data; the target node includes a labeled node, and the labeled node includes the dominant node.
  • the above-mentioned graph data may include first graph data corresponding to the target classification task, and the first graph data may include target nodes, feature information of the target nodes, and edge data between the target nodes.
  • the feature aggregation layer calculates the feature similarity between the target node and the neighboring nodes of the target node based on the dominant node information, and performs feature aggregation based on the feature similarity and the feature information of the neighboring nodes to obtain the aggregated feature information of the target node.
  • the above-mentioned neighborhood node can be a neighborhood node of the target node in the above-mentioned dominant node, that is, the neighborhood node of the target node refers to a dominant node that has a connecting edge with the target node among the multiple screened dominant nodes, and can also be called a dominant neighborhood node of the target node.
  • the category prediction layer determines a node category prediction result based on the above-aggregated feature information, and the node category prediction result includes a predicted category label of the labeled node.
  • the above data processing method includes at least the following steps:
  • the first graph data includes P target nodes, feature information of the target nodes, and edge data between the target nodes, the P target nodes include N labeled nodes, P and N are both integers greater than 1 and N is less than P.
  • the above-mentioned target classification task may include any one of an article classification task, a risk user classification task, and a push user classification task.
  • the above-mentioned target node is used to represent a target object, which includes any one of a target article and a target user.
  • the characteristic information of the target node includes the characteristic information of the target object represented by the target node; for example, the characteristic information of the target user may include any one of basic characteristic information and business-related characteristic information.
  • the basic characteristic information may include any one of user age, user gender, and user occupation.
  • the business-related characteristic information may include relevant characteristic information generated in response to the target user's business request for the target business.
  • the target business includes online shopping business or loan business.
  • if the target business is a loan business, the business-related characteristic information may include loan amount, loan method, loan limit, etc.; if the target business is an online shopping business, the business-related characteristic information may include user payment frequency, user payment amount, user delivery address, etc.; for another example, the characteristic information of the target article may include any one of keywords, cited articles, and article authors.
  • the edge data between target nodes includes the connection edges between target objects represented by the target nodes (that is, it indicates that there is a preset association relationship between the target objects).
  • taking the target object as the target user as an example: if target user A and target user B use the same device number when initiating a business application, it is considered that there is a preset association relationship between the target users; if target user A's reserved credit-reporting mobile phone number is the same as the contact mobile phone number of target user B in the business application, it is considered that there is a preset association relationship between the target users; if the application mobile phone number filled in by target user A when applying for the business is the same as the contact mobile phone number of target user B in the business application, it is considered that there is a preset association relationship between the target users; if the mobile phone number bound to a bank card of target user A is the same as the contact mobile phone number of target user B in the business application, it is considered that there is a preset association relationship between the target users.
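  • as an illustration of the above edge-construction rules, the following is a minimal Python sketch (not part of the application; the record layout and field names such as device_id and contact_phone are hypothetical) that connects two user nodes whenever any of the listed matching rules holds:

    # Hypothetical sketch: field names are placeholders, not from the application.
    users = [
        {"id": 0, "device_id": "D1", "credit_phone": "111",
         "apply_phone": "222", "card_phone": "333", "contact_phone": "999"},
        {"id": 1, "device_id": "D1", "credit_phone": "444",
         "apply_phone": "555", "card_phone": "666", "contact_phone": "111"},
    ]

    def has_preset_association(a, b):
        # Rule 1: same device number when initiating a business application.
        if a["device_id"] == b["device_id"]:
            return True
        # Rules 2-4: A's credit-reporting / application / card-bound phone
        # number equals B's contact phone number in the business application.
        return b["contact_phone"] in (
            a["credit_phone"], a["apply_phone"], a["card_phone"])

    edges = [(a["id"], b["id"])
             for i, a in enumerate(users) for b in users[i + 1:]
             if has_preset_association(a, b) or has_preset_association(b, a)]
    print(edges)  # [(0, 1)]: the shared-device rule (and the phone rule) fire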
  • the above-mentioned labeled nodes are used to represent a target object with a category label; with respect to the naming of target nodes in graph data, from the perspective of the model usage stage (model training or category prediction), the above-mentioned target nodes can be sample nodes or nodes to be classified; from the perspective of whether the node has a category label, the above-mentioned target node can be a labeled node or an unlabeled node.
  • labeled nodes serve as sample nodes and participate in the calculation of model loss values
  • unlabeled nodes can be sample nodes and participate in feature aggregation processing.
  • unlabeled nodes can also be nodes to be classified; in addition, with respect to the feature aggregation process, the target node can serve as a central node or a neighborhood node (i.e., the feature information of the central node is updated based on the feature information of the neighborhood nodes). From the perspective of the importance of the node itself, the target node can be divided into a dominant node or a non-dominant node. From the perspective of the importance of a neighborhood node to the central node, the neighborhood node can be divided into a dominant neighborhood node and a non-dominant neighborhood node; the node naming can be adjusted according to actual needs and does not limit the protection scope of the present application.
  • N is less than or equal to P. If N equals P, the P target nodes are all labeled nodes. However, in order to ensure the node feature aggregation effect without requiring a large number of labeled nodes, multiple unlabeled nodes are usually added to the graph data; that is, the P target nodes include not only labeled nodes but also unlabeled nodes, and N is less than P. In this way, the features of the labeled nodes can be aggregated based on the feature information of the unlabeled nodes, thereby enhancing the feature representation of the labeled nodes and improving the accuracy of category prediction based on the aggregated node features.
  • the above-mentioned classification model can be a graph neural network model (such as a graph attention network model), which iteratively updates the model parameters based on the first graph data until the current model training results meet the preset model training end conditions to obtain the trained classification model;
  • the above-mentioned preset model training end conditions may include: the current model training round number is equal to the total training round number, or the model loss function converges.
  • the specific implementation process of the model iterative training is described below. Since the processing process of each model training in the model iterative training process is the same, any model training is taken as an example for detailed description.
  • the above classification model may include a node screening layer, a feature aggregation layer, and a category prediction layer; as shown in FIG2b, the specific implementation method of each model training may include the following steps S1042 to S1048:
  • the node screening layer determines the first dominant node information based on the edge data and feature information corresponding to the first graph data; wherein the first dominant node information corresponds to K dominant nodes selected from P target nodes; K is an integer greater than 1 and less than P.
  • K dominant nodes are screened out from the P target nodes, and then the first dominant node information is determined based on the relevant information of the K dominant nodes (such as feature importance scores or node feature information); that is, the above-mentioned first dominant node information may include any one of the node feature information and the feature importance score; in some example embodiments, the feature importance can be evaluated based on the feature information first, and then the structural importance can be evaluated based on the edge data to screen out the K dominant nodes; in some example embodiments, the structural importance can be evaluated based on the edge data first, and then the feature importance can be evaluated based on the feature information to screen out the K dominant nodes; in some example embodiments, the feature importance can be evaluated based on the feature information to screen out K1 candidate nodes from the P target nodes, the structural importance can be evaluated based on the structural information to screen out K2 candidate nodes from the P target nodes, and then the K dominant nodes are determined by combining the K1 candidate nodes and the K2 candidate nodes.
  • the feature aggregation layer calculates the feature similarity between the target node and the neighboring nodes of the target node among the K dominant nodes based on the first dominant node information, and performs feature aggregation on the target node based on the feature similarity and the feature information of the corresponding neighborhood nodes to obtain the aggregated feature information of the target node; wherein a neighborhood node of the target node among the above-mentioned K dominant nodes refers to a dominant node among the K dominant nodes that has a connecting edge with the target node, that is, a dominant node that has a preset association relationship with the target node, i.e., a dominant neighborhood node of the target node.
  • after determining the above-mentioned first dominant node information, take any target node as the central node, calculate the feature similarity between the central node and its dominant neighborhood nodes (i.e., the neighborhood nodes of the central node among the K dominant nodes), and then use the feature similarity as the weight coefficient to update the feature information of the central node based on the feature information of the dominant neighborhood nodes; wherein, since the dominant neighborhood nodes of the central node are not all of its neighborhood nodes but only those neighborhood nodes belonging to the K dominant nodes, the amount of calculation is reduced to that between the target node and the dominant neighborhood nodes, while the calculation between the target node and the non-dominant neighborhood nodes (i.e., the neighborhood nodes of the target node among the (P−K) non-dominant nodes) is omitted.
  • in some example embodiments, in the feature aggregation of the central node, the (P−K) non-dominant nodes may or may not participate.
  • if the non-dominant neighborhood nodes participate, the weight coefficient between the central node and the non-dominant neighboring nodes can be set to a target value (such as a preset value, or the minimum value of the weight coefficients corresponding to the dominant neighboring nodes), and feature aggregation is then performed on the target node to obtain the aggregated feature information of the target node.
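  • as a minimal sketch of this aggregation step (assuming, for illustration only, dot-product similarity with softmax normalization, which the application does not fix, and the uniform fallback weight described above for non-dominant neighbors):

    import numpy as np

    def aggregate(H, neighbors, dominant, center):
        # H: (P, F) node feature matrix; neighbors: neighbor indices of the
        # center node; dominant: set of dominant node indices.
        dom = [j for j in neighbors if j in dominant]
        rest = [j for j in neighbors if j not in dominant]
        if dom:
            # Feature similarity (dot product assumed) for dominant
            # neighbors only, normalized into attention weights.
            sims = np.array([H[center] @ H[j] for j in dom])
            w = np.exp(sims - sims.max())
            w /= w.sum()
            fallback = w.min()   # target value for non-dominant neighbors
        else:
            w, fallback = np.array([]), 0.0
        agg = sum(wi * H[j] for wi, j in zip(w, dom))
        agg = agg + fallback * sum(H[j] for j in rest)
        return agg

    H = np.random.rand(6, 4)
    print(aggregate(H, neighbors=[1, 2, 3], dominant={1, 2}, center=0))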
  • the category prediction layer determines the node category prediction result based on the above-aggregated feature information; wherein the node category prediction result includes the predicted category labels of N labeled nodes.
  • the aggregated feature information is used as the final feature information of the target node, and then the category of the target node is predicted based on the final feature information to obtain the predicted category label corresponding to each target node.
  • the loss value is calculated based on the predicted category label and the true category label of each labeled node, the total loss value is then determined based on the loss values corresponding to the N labeled nodes, and the model parameters are updated based on the total loss value. It should be noted that, for iteratively training the model parameters based on the total loss value, reference may be made to the existing process of tuning model parameters by back propagation using the gradient descent method, which will not be repeated here.
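  • a brief sketch of this loss computation (the application does not name a specific loss function; per-node cross-entropy summed over the N labeled nodes is assumed here):

    import numpy as np

    def total_labeled_loss(logits, labels, labeled_mask):
        # Numerically stable log-softmax over the class dimension.
        z = logits - logits.max(axis=1, keepdims=True)
        log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        idx = np.where(labeled_mask)[0]
        # One loss value per labeled node; unlabeled nodes contribute to
        # feature aggregation but not to the loss.
        per_node = -log_probs[idx, labels[idx]]
        return per_node.sum()   # total loss value used to update parameters

    logits = np.random.randn(5, 3)                     # P = 5 nodes, 3 classes
    labels = np.array([0, 2, 0, 1, 1])
    mask = np.array([True, True, False, False, True])  # N = 3 labeled nodes
    print(total_labeled_loss(logits, labels, mask))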
  • the dominant node information is first screened out based on the node edge data and the node feature information, that is, K dominant nodes with rich structure and high feature distinction are located, and then the feature similarity is calculated based on the relevant information of the dominant nodes.
  • non-dominant nodes will not participate in the calculation of feature similarity as the dominant neighborhood nodes of a certain central node (that is, any target node), so as to achieve sparseness of the neighborhood nodes of the central node, thereby reducing the amount of calculation of node feature similarity (that is, the amount of calculation of attention scores of the central node and some neighborhood nodes is omitted), thereby improving the model training efficiency; on the other hand, since in the model training stage the K dominant nodes are screened out based on the node edge data and the node feature information, and are not randomly selected from the P target nodes, this can not only reduce the amount of calculation of feature similarity, but also accurately determine which nodes need to participate in the feature similarity calculation.
  • the category prediction efficiency of the node to be classified (that is, the prediction efficiency of the category of the object to be classified represented by the node to be classified) can also be ensured.
  • the model parameters used for category prediction are obtained by iteratively updating the parameters of the node to be classified used to represent the object to be classified as an unlabeled node, the feature information and edge data of the object to be classified affect the model parameter update of the classification model. Therefore, the accuracy of category prediction of the object to be classified using the classification model is higher, thereby achieving the effect of taking into account both the category prediction efficiency and the prediction accuracy of the object to be classified.
  • the above classification model is a graph neural network model, which includes a node screening layer, a feature aggregation layer, and a category prediction layer;
  • the specific implementation process of the training method of the graph neural network model is given, which specifically includes:
  • first, target graph data corresponding to the risk user identification task (i.e., the above-mentioned first graph data) is obtained; wherein, the target graph data includes P user nodes, feature information of the user nodes, and edge data between user nodes; the P user nodes include N labeled nodes, and P and N are both integers greater than 1 with N less than P; specifically, the edge data includes connecting edges between user nodes with preset association relationships, and the preset association relationships may include any one of the following: the device number used when initiating a business application is the same, the reserved credit-reporting mobile phone number of the first user is the same as the contact mobile phone number of the second user applying for the business, the application mobile phone number filled in by the first user when applying for the business is the same as the contact mobile phone number of the second user applying for the business, and the mobile phone number bound to a bank card of the first user is the same as the contact mobile phone number of the second user applying for the business.
  • the target graph data is input into the graph neural network model to be trained for iterative model training, and the trained graph neural network model is obtained as the classification model; wherein, the specific implementation of each round of model training can be: the node screening layer determines the dominant user node information (i.e., the first dominant node information) based on the edge data and feature information between the nodes of the target graph data, that is, the node screening layer performs structural importance evaluation and feature importance evaluation based on the edge data and feature information between the nodes of the target graph data to obtain the dominant user node information, where the dominant user node information corresponds to K dominant user nodes selected from the P user nodes, and K is an integer greater than 1 and less than P; the feature aggregation layer calculates the feature similarity between any user node and the neighboring nodes of the user node among the K dominant user nodes based on the dominant user node information, and performs feature aggregation on the user node based on the feature similarity and the feature information of the corresponding neighboring nodes to obtain the aggregated feature information of the user node; the category prediction layer determines the node category prediction result based on the aggregated feature information; based on the predicted category labels and the true category labels of the labeled nodes, the total loss value is determined, and the parameters of the graph neural network model to be trained are updated based on the total loss value.
  • in the case where the target classification task is an article classification task or a push user classification task, for the specific implementation process, please refer to the above specific implementation process in which the target classification task is the risk user classification task, which will not be repeated here.
  • feature importance evaluation may be performed based on feature information first, and then structure importance evaluation may be performed based on edge data.
  • structure importance evaluation may be performed based on edge data first, and then feature importance evaluation may be performed based on feature information. Since structural information can more intuitively affect the accuracy of feature aggregation, taking the example of performing structural importance evaluation first and then performing feature importance evaluation, for the process of determining the first dominant node information, the first dominant node information is determined based on the edge data and feature information corresponding to the first graph data in S1042, specifically including:
  • Step A1: perform structural importance evaluation on the P target nodes based on the edge data between the nodes of the first graph data to obtain a first evaluation result.
  • the structural importance of the target node is scored based on the number of connecting edges between the target node and the neighboring nodes to obtain a structural importance score, where the more connecting edges there are between the target node and the neighboring nodes, the greater the structural importance score; in the specific implementation, since the structural information of fixed graph data does not change, the above step A1 needs to be calculated only once during multiple rounds of model training.
  • the structural importance evaluation may be performed directly based on the structural information of each target node, or may be performed based on the adjacency matrix corresponding to the P target nodes.
  • Step A2: determine the candidate node feature information based on the first evaluation result and the feature information corresponding to the first graph data; the candidate node feature information includes the feature information of M candidate nodes selected from the P target nodes, where M is greater than K and less than P.
  • the first evaluation result includes the structural importance score of each target node
  • a candidate node set (including M candidate nodes) can be screened out from the P target nodes based on the structural importance scores; and considering that the node feature importance subsequently needs to be evaluated based on the feature information of the candidate nodes, the feature information of the M candidate nodes is located first, so as to provide basic data for the subsequent calculation of the feature importance scores of the M candidate nodes.
  • Step A3: perform feature importance evaluation on the M candidate nodes based on the above candidate node feature information to obtain a second evaluation result.
  • the feature importance of the candidate node is scored to obtain a feature importance score; wherein, the higher the discriminability of the feature information of the target node (that is, it contains more key information that helps to identify the node category), the greater the feature importance score.
  • the feature importance evaluation may be performed directly based on the feature information of each candidate node, or may be performed based on the feature matrix of the M candidate nodes.
  • Step A4: determine the first dominant node information based on the second evaluation result; the first dominant node information includes relevant information of the K dominant nodes selected from the M candidate nodes, and the relevant information includes any one of feature importance scores and node feature information.
  • a dominant node set (including K dominant nodes) can be screened out from the M candidate nodes based on the feature importance score; considering that the node feature similarity needs to be calculated based on the first dominant node information in the future, it is necessary to determine the first dominant node information based on the relevant information of the above K dominant nodes, so as to provide basic data for the subsequent calculation of the node feature similarity; wherein the first dominant node information can be regarded as the relevant information of K dominant nodes with rich structural information and high feature information differentiation, which are screened out after the node structure importance evaluation and the node feature importance evaluation.
  • a specific implementation process of the data processing method is given for the process of determining the dominant user node information, which specifically includes: performing a structural importance evaluation on P user nodes based on the edge data between the nodes of the target graph data to obtain a first evaluation result; determining candidate node feature information based on the first evaluation result and feature information corresponding to the target graph data; the candidate node feature information includes feature information of M candidate user nodes selected from the P user nodes; performing a feature importance evaluation on the M candidate user nodes based on the candidate node feature information to obtain a second evaluation result; determining the dominant user node information based on the second evaluation result;
  • the above-mentioned dominant user node information includes relevant information of K dominant user nodes selected from M candidate user nodes, and the relevant information includes any one of feature importance scores and node feature information.
  • the adjacency matrix corresponding to the first graph data can be fully used to score the structural richness of the P target nodes, and then the structural importance scores corresponding to the P target nodes are obtained, and then the first evaluation result is generated based on the structural importance scores of the P nodes.
  • the above step A1, which performs structural importance evaluation on the P target nodes based on the edge data between the nodes of the first graph data to obtain the first evaluation result, specifically includes:
  • Step A11: determine a first score matrix based on the adjacency matrix corresponding to the first graph data and the first reference matrix.
  • the adjacency matrix is obtained based on the edge data between the nodes of the first graph data, the first reference matrix is a column matrix containing P preset values, and the first score matrix is a column matrix containing the structural importance scores of P target nodes.
  • an adjacency matrix may be generated in advance based on edge data between nodes of the first graph data, or may be generated in real time based on edge data between nodes of the first graph data; wherein the value of each element in the adjacency matrix represents the connection relationship between two nodes (i.e., whether there is a connection).
  • the values of the P elements in the above-mentioned first reference matrix are all preset values, which can be 1 or an integer greater than 1. Since the adjacency matrix corresponding to the first graph data is a (P×P) matrix, the P elements in a row of the (P×P) matrix constitute a (1×P) row matrix, and the row matrix represents the connection relationship between a target node and itself and the other nodes.
  • if the first reference matrix is a (P×1) column matrix with all elements being 1, the product of the (1×P) row matrix corresponding to a target node and the (P×1) column matrix (i.e., the first reference matrix) is the number of connecting edges of that target node, which reflects its structural importance.
  • the adjacency matrix corresponding to the first graph data is multiplied by the first reference matrix to obtain the first score matrix, which is a (P×1) column matrix whose element values correspond one-to-one to the structural importance scores of the target nodes; i.e., the value of an element in the first score matrix is the structural importance score of the corresponding target node.
  • the first score matrix may be computed as S = A·E, where S represents the first score matrix (which can be used as the node structure importance score vector), A represents the adjacency matrix (a square matrix representing the node structure information), and E represents the first reference matrix (a column matrix whose elements are all preset values, such as 1); that is, the (P×P) adjacency matrix is multiplied by the (P×1) first reference matrix to obtain the (P×1) first score matrix.
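  • as a minimal numeric sketch of this step (toy adjacency matrix; with the preset value 1, the score is simply the node degree):

    import numpy as np

    A = np.array([[0, 1, 1, 0],    # adjacency matrix of P = 4 target nodes
                  [1, 0, 0, 0],
                  [1, 0, 0, 1],
                  [0, 0, 1, 0]])
    E = np.ones((4, 1))            # first reference matrix: (P x 1), preset value 1
    S = A @ E                      # first score matrix: (P x 1)
    print(S.ravel())               # [2. 1. 2. 1.]: structural importance scores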
  • Step A12: determine the first evaluation results of the P target nodes based on the first score matrix.
  • the first score matrix can be directly determined as the first evaluation result, or the first score matrix can be subjected to preset processing (such as normalization processing, score fine-tuning processing, etc.), and the first score matrix after preset processing can be used as the first evaluation result.
  • for the process of determining the candidate node feature information based on the first evaluation result (i.e., first screening out M candidate nodes, and then sparsifying the feature information set to provide basic data for the subsequent calculation of the feature importance scores of the M candidate nodes, thereby obtaining a preliminarily sparsified node feature matrix), the above step A2, which determines the candidate node feature information based on the above first evaluation result and the feature information corresponding to the first graph data, specifically includes:
  • Step A21: select M candidate nodes from the P target nodes based on the first evaluation result.
  • among the structural importance scores of the P target nodes, the higher the structural importance score of a target node, the higher the structural richness of that target node. Therefore, based on the size relationship of the structural importance scores of the P target nodes, the scores can be sorted in descending order, and the target nodes corresponding to the top M structural importance scores (i.e., the M candidate nodes) can be screened out.
  • the above M candidate nodes can be considered as a set of candidate nodes preliminarily screened out from the evaluation dimension of the richness of node structure information.
  • Step A22: determine a first target matrix based on the initial feature matrix corresponding to the first graph data and the M candidate nodes.
  • the above-mentioned initial feature matrix is obtained based on the feature information corresponding to the above-mentioned first graph data, and the above-mentioned first target matrix is a feature matrix containing the feature information of M candidate nodes; specifically, the initial feature matrix can be obtained by linearly transforming the node feature matrix corresponding to the first graph data, and the node feature matrix can be a feature matrix obtained by pre-conversion based on the feature information corresponding to the first graph data.
  • the structural importance scores of the M candidate nodes selected based on the first evaluation result are relatively high; that is, from the evaluation dimension of the richness of node structure information, M candidate nodes with high structural richness are preliminarily selected (equivalent to an initial screening of the dominant nodes), and the initial feature matrix corresponding to the first graph data is then sparsified based on the M candidate nodes to obtain the preliminarily sparsified node feature matrix (i.e., the first target matrix), which saves the calculation of the feature similarity between the non-dominant nodes with relatively low structural richness and the central node (i.e., any target node).
  • the above-mentioned first target matrix can be considered as a target matrix obtained by sparsely processing the feature information from the evaluation dimension of the richness of the node structure information. Then, the first target matrix is used as the basic data of the evaluation dimension of the node feature discrimination, so as to provide a basis for calculating the feature importance scores of the M candidate nodes.
  • the following formula may be used to perform sparse processing on the initial feature matrix based on the node identifiers of the M candidate nodes to obtain a first target matrix, which may be:
  • the formula may take the form H1 = H0[D1] (i.e., selecting from H0 the rows at the positions D1), where:
  • H1 represents the first target matrix (i.e., the node feature matrix after preliminary sparsification, obtained from the evaluation dimension of the richness of node structure information)
  • H0 represents the initial feature matrix (i.e., the node feature matrix after linear transformation and before sparsification)
  • D1 represents the position information of the M candidate nodes
  • M = c1·log(P), where c1 is a scalar; that is, the (P×F) initial feature matrix is converted into the (M×F) first target matrix.
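  • a sketch of this preliminary sparsification (rounding M = c1·log(P) up to an integer is an assumption; the application only gives the product form):

    import numpy as np

    P, F, c1 = 100, 8, 4
    H0 = np.random.rand(P, F)           # initial feature matrix (after linear transform)
    S = np.random.rand(P)               # structural importance scores from step A1
    M = int(np.ceil(c1 * np.log(P)))    # M = c1 * log(P), rounded up here
    D1 = np.argsort(-S)[:M]             # position information of the M candidate nodes
    H1 = H0[D1]                         # first target matrix: (M x F)
    print(H1.shape)                     # (19, 8)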
  • Step A23: determine the candidate node feature information corresponding to the first graph data based on the above-mentioned first target matrix.
  • the first target matrix can be directly determined as the candidate node feature information, or the first target matrix can be subjected to preset processing (such as normalization processing, feature fine-tuning processing, etc.), and the first target matrix after preset processing can be determined as the candidate node feature information.
  • the node feature importance can be scored through the projection transformation matrix corresponding to the node feature matrix; that is, the magnitude of an element in the projection transformation matrix can reflect the degree of feature importance of the corresponding node.
  • in the specific implementation, the projection transformation matrix corresponding to the node feature matrix can be fully utilized to obtain the feature importance scores corresponding to the M candidate nodes, and a second evaluation result is then generated based on the M node feature importance scores; in addition, since the node feature importance evaluation is implemented on the basis of the node structure importance evaluation, the feature importance evaluation needs to be performed on the above-mentioned M candidate nodes based on the above-mentioned candidate node feature information.
  • the above-mentioned candidate node feature information includes the first target matrix, that is, the node feature importance evaluation is performed based on the first target matrix; based on this, the above-mentioned step A3, based on the above-mentioned candidate node feature information, performs feature importance evaluation on the M candidate nodes to obtain the second evaluation result, which specifically includes:
  • Step A31: determine a second score matrix based on the first target matrix and the first parameter matrix.
  • the first parameter matrix is an (F×1) parameter matrix
  • the second score matrix is a column matrix containing the feature importance scores of M candidate nodes, where F is an integer greater than 1.
  • the first parameter matrix is related to the parameters to be trained of the model network layer itself, and the F elements in the first parameter matrix correspond to the network layer parameters of the output dimension of the single-layer graph convolution calculation, that is, one element in the first parameter matrix corresponds to a network parameter of one output dimension.
  • the second score matrix may be computed as F1 = H1·W1, where F1 represents the second score matrix (which can be used as the node feature importance score vector), H1 represents the first target matrix, and W1 represents the first parameter matrix (the model parameters to be trained); that is, the (M×F) first target matrix is multiplied by the (F×1) first parameter matrix to obtain the (M×1) second score matrix.
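  • a sketch of this scoring step (W1 would be trained together with the rest of the model; random values stand in for it here):

    import numpy as np

    M, F = 19, 8
    H1 = np.random.rand(M, F)    # first target matrix from step A22
    W1 = np.random.randn(F, 1)   # first parameter matrix: (F x 1), trainable
    F1 = H1 @ W1                 # second score matrix: (M x 1) feature
    print(F1.shape)              # importance scores -> (19, 1)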
  • Step A32: determine the second evaluation results of the M candidate nodes based on the second score matrix.
  • the second score matrix can be directly determined as the second evaluation result, or the second score matrix can be subjected to preset processing (such as normalization processing, score fine-tuning processing, partial element zeroing processing, etc.), and the second score matrix after preset processing can be used as the second evaluation result.
  • the first target matrix contained in the candidate node feature information can be obtained based on the above step A22, or it can be obtained by other processing methods.
  • the present application does not limit this. That is, the specific processing method for obtaining the feature matrix containing feature information of M candidate nodes is within the protection scope of the present application.
  • the K dominant nodes may contain nodes with relatively small feature importance scores, so that feature aggregation of the target node based on the feature information of the node with relatively low feature discrimination may reduce the accuracy of node feature aggregation (that is, the node feature importance score is relatively small, indicating that the node feature discrimination is low, which will reduce the discrimination of the aggregated feature information).
  • the elements whose feature importance scores in the second score matrix meet the preset constraints can be first set to zero to obtain a third score matrix, and then K dominant nodes are selected based on the third score matrix.
  • the above step A32 determines the second evaluation results of the above M candidate nodes, specifically including:
  • Step A321: determine the score deviation matrix based on the above-mentioned second score matrix; wherein the score deviation matrix is a column matrix containing the differences between the feature importance scores of the M candidate nodes and the score mean; the score mean can be the average value of the M feature importance scores in the second score matrix.
  • each element in the above-mentioned score deviation matrix is related to the above-mentioned preset constraints (i.e., the constraints used to limit which elements in the second score matrix are set to zero). For example, if the preset constraints are to set the elements in the second score matrix whose feature importance scores are less than the score mean to zero, then the values of the M elements in the score deviation matrix are the difference between the feature importance scores of the M candidate nodes and the score mean; for example, if the preset constraints are to set the elements in the second score matrix whose feature importance scores are less than the preset threshold to zero, then the values of the M elements in the score deviation matrix are the difference between the feature importance scores of the M candidate nodes and the preset threshold; the preset constraints can be set according to actual needs, and no matter how they are set, they are within the scope of protection of this application.
  • the score deviation matrix may be written as F = F1 − mean(F1), where F represents the score deviation matrix, F1 represents the second score matrix, and mean(F1) represents the mean score matrix (that is, the values of its M elements are all the score mean); that is, the (M×1) mean score matrix is subtracted from the (M×1) second score matrix to obtain the (M×1) score deviation matrix.
  • Step A322: determine a third score matrix based on the second score matrix and the second reference matrix; wherein the second reference matrix is determined based on the score deviation matrix.
  • the determination process of the above-mentioned second reference matrix may include: if the value of an element in the above-mentioned score deviation matrix is greater than a certain value (such as 0), the value of the corresponding element in the above-mentioned second reference matrix is 1; if the value of an element in the score deviation matrix is less than or equal to that value, the value of the corresponding element in the second reference matrix is 0; that is, the above-mentioned second reference matrix is a 0-1 matrix.
  • the following formula may be used to determine the second reference matrix based on the score deviation matrix: F′ = 1(F > 0) (an element-wise indicator), where F′ represents the second reference matrix (a 0-1 matrix used to constrain which elements in the second score matrix are set to zero) and F represents the score deviation matrix; that is, the (M×1) score deviation matrix is converted into the (M×1) 0-1 matrix (i.e., the second reference matrix).
  • any element in the third score matrix is the product of the corresponding element in the second score matrix and the second reference matrix, so that the elements in the second score matrix whose feature importance scores are less than or equal to the score mean are set to zero, while the elements in the second score matrix whose feature importance scores are greater than the score mean remain unchanged.
  • the third score matrix may be computed as F2′ = F′ * F1, where F2′ represents the third score matrix, F′ represents the second reference matrix, F1 represents the second score matrix, and * represents element-wise multiplication; that is, the corresponding elements of the (M×1) second reference matrix and the (M×1) second score matrix are multiplied to obtain the (M×1) third score matrix, so that the elements in the second score matrix whose feature importance scores meet the preset constraints are set to zero.
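  • combining steps A321 and A322, a minimal numeric sketch of zeroing out the below-mean feature importance scores:

    import numpy as np

    F1 = np.array([[0.9], [0.2], [0.5], [0.1]])   # second score matrix (M = 4)
    F_dev = F1 - F1.mean()                        # score deviation matrix
    F_ref = (F_dev > 0).astype(float)             # second reference matrix (0-1)
    F2p = F_ref * F1                              # third score matrix
    print(F2p.ravel())                            # [0.9 0.  0.5 0. ]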
  • Step A323: determine the second evaluation results of the M candidate nodes based on the third score matrix.
  • the third score matrix can be directly determined as the second evaluation result, so that the final feature importance scores of the candidate nodes whose feature importance scores in the second evaluation result are less than or equal to the score mean (ie, the feature discrimination is relatively low) become zero.
  • K dominant nodes are selected from the M candidate nodes, which can not only ensure the number of dominant nodes finally screened out, but also ensure that only nodes with feature importance scores greater than the mean score will participate in the calculation of feature similarity as dominant neighborhood nodes of the target node. Even if the K dominant nodes include a target node with a feature importance score less than or equal to the mean score, since the feature importance score corresponding to the dominant node in the third score matrix becomes zero, the feature information of the dominant node will not affect the feature aggregation of the target node.
  • the above step A4 determines the first dominant node information, specifically including:
  • Step A41: select K dominant nodes from the M candidate nodes based on the second evaluation result.
  • the second evaluation result may include the second score matrix or the third score matrix.
  • the K dominant nodes may be considered to be the dominant nodes selected from the candidate node set by successively evaluating the two dimensions of node structure information richness and node feature information discrimination.
  • Step A42: determine a second target matrix based on the K dominant nodes.
  • the above-mentioned second target matrix is mainly used as the basic data for calculating the node feature similarity. Since the feature similarity between two adjacent nodes can be calculated by comparing the node feature information, considering that the feature importance score of each node is also obtained based on the information content contained in the node feature information, the feature similarity between two adjacent nodes can also be calculated directly based on the feature importance score of each node. In this way, there is no need to compare and calculate the node feature information, which can further reduce the amount of information calculation. Based on this, the above-mentioned first dominant node information includes any one of the node feature information and feature importance scores of the K dominant nodes.
  • the above-mentioned second target matrix includes any one of the sparse feature matrix corresponding to the K dominant nodes and the fourth score matrix.
  • the sparse feature matrix is a feature matrix containing the feature information of the K dominant nodes
  • the fourth score matrix is a score matrix containing the feature importance scores of the K dominant nodes.
  • the second target matrix can be considered as a target matrix obtained by sparsely processing the feature information or feature importance scores from the evaluation dimension of the node feature information discrimination degree, and then the feature similarity between two adjacent nodes is calculated based on the second target matrix.
  • In some example embodiments, the following formula may be used to sparsify the second score matrix based on the node identifiers of the K dominant nodes to obtain the fourth score matrix: F2 = F1(D2), where F2 represents the fourth score matrix, F1 represents the second score matrix, and D2 represents the position information of the K dominant nodes; that is, the (M×1) second score matrix is sparsified to obtain the (K×1) fourth score matrix (i.e., the second target matrix).
  • In other example embodiments, the following formula may be used to sparsify the third score matrix based on the node identifiers of the K dominant nodes to obtain the fourth score matrix: F2 = F2′(D2), where F2 represents the fourth score matrix, F2′ represents the third score matrix, and D2 represents the position information of the K dominant nodes; that is, the (M×1) third score matrix is sparsified to obtain the (K×1) fourth score matrix (i.e., the second target matrix).
  • the target nodes other than the K dominant nodes can be regarded as long-tail nodes whose attention scores can be ignored. These long-tail nodes can be exempted from the node feature similarity calculation. Since the long-tail nodes account for a relatively large proportion of the P target nodes, the calculation amount of the node feature similarity can be greatly reduced.
  • the number of nodes with non-zero scores in the fourth score matrix F2 is less than or equal to K, and the nodes with zero scores in the fourth score matrix F2 can also be regarded as long-tail nodes.
  • Although the long-tail nodes do not participate in the node feature similarity calculation, they still participate in the graph convolution calculation (i.e., feature aggregation). Therefore, in a specific implementation, a unified weight (such as min(F2)) can be used as the attention score of the long-tail nodes, as shown in the sketch below.
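  • For illustration only, a minimal sketch of gathering the (K×1) fourth score matrix from the dominant-node positions and reading off a unified long-tail weight; the scores and positions are made-up assumptions:

```python
import numpy as np

def fourth_score_matrix(score: np.ndarray, dominant_idx: np.ndarray) -> np.ndarray:
    """Gather the rows of the (M x 1) second/third score matrix that belong to
    the K dominant nodes (positions D2), yielding the (K x 1) fourth score matrix."""
    return score[dominant_idx]

f1 = np.array([[0.9], [0.0], [0.5], [0.0], [0.7]])  # (M x 1) scores, M = 5
d2 = np.array([0, 2, 4])                 # positions of K = 3 dominant nodes
f2 = fourth_score_matrix(f1, d2)         # shape (3, 1)
long_tail_weight = f2.min()              # unified attention score for long-tail nodes
```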
  • In the case where the second target matrix includes the sparse feature matrix, the initial feature matrix can be used as the basic data for feature sparsification.
  • In some example embodiments, the following formula can be used to sparsify the initial feature matrix based on the node identifiers of the K dominant nodes to obtain the sparse feature matrix: H2 = H0(D2), where H2 represents the sparse feature matrix (i.e., the second target matrix, the node feature matrix obtained by feature sparsification along the two node importance evaluation dimensions of node structure information and node feature information), H0 represents the initial feature matrix, D2 represents the position information of the K dominant nodes, K = c2·log(P), c2 is a scalar, and c1 > c2; that is, the (P×F) initial feature matrix is converted into the (K×F) sparse feature matrix (i.e., the second target matrix).
  • the first target matrix can also be used as the basic data for feature sparsification.
  • In other example embodiments, the following formula is used to sparsify the first target matrix based on the node identifiers of the K dominant nodes to obtain the sparse feature matrix: H2 = H1(D2), where H2 represents the sparse feature matrix (i.e., the second target matrix, the node feature matrix obtained by feature sparsification along the two node importance evaluation dimensions of node structure information and node feature information), H1 represents the first target matrix, D2 represents the position information of the K dominant nodes, K = c2·log(P), c2 is a scalar, and c1 > c2; that is, the (M×F) first target matrix is converted into the (K×F) sparse feature matrix (i.e., the second target matrix).
  • In the case where the second target matrix includes the sparse feature matrix, if the initial feature matrix or the first target matrix is directly sparsified based on the node identifiers of the K dominant nodes, the obtained sparse feature matrix may contain feature information of nodes with relatively low feature discrimination.
  • To avoid this, the target feature matrix can be sparsified based on the K dominant nodes using the following formula to obtain the sparse feature matrix: H2 = H1′(D2), where H2 represents the sparse feature matrix, H1′ represents the target feature matrix, and D2 represents the position information of the K dominant nodes; that is, the (M×F) target feature matrix is sparsified to obtain the (K×F) sparse feature matrix (i.e., the second target matrix). A sketch of this row-gathering operation follows.
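  • For illustration only, a minimal sketch of the row-gathering that converts a feature matrix into the (K×F) sparse feature matrix, with K = c2·log(P) as in the text; c2, the random data, and the stand-in dominant-node selection are assumptions:

```python
import numpy as np

def sparse_feature_matrix(features: np.ndarray, dominant_idx: np.ndarray) -> np.ndarray:
    """Keep only the rows belonging to dominant nodes: an (M x F) or (P x F)
    feature matrix becomes the (K x F) sparse feature matrix."""
    return features[dominant_idx]

P, F, c2 = 8, 4, 1.0
rng = np.random.default_rng(0)
K = max(2, int(c2 * np.log(P)))            # K = c2 * log(P)
h = rng.random((P, F))                     # stand-in feature matrix
d2 = np.argsort(-rng.random(P))[:K]        # stand-in dominant-node positions
h2 = sparse_feature_matrix(h, d2)          # shape (K, F)
```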
  • the above-mentioned second target matrix includes the above-mentioned fourth score matrix.
  • Step A43: Determine the first dominant node information corresponding to the first graph data based on the second target matrix.
  • the second target matrix can be directly determined as the first dominant node information, or the second target matrix can be subjected to preset processing (such as normalization processing, feature fine-tuning processing, etc.), and the second target matrix after preset processing can be determined as the first dominant node information.
  • As shown in FIG. 5, a more specific implementation is given for the process of determining the dominant user node information, which specifically includes: determining a first score matrix based on the adjacency matrix and the first reference matrix corresponding to the target graph data, that is, performing structural importance assessment based on the adjacency matrix and the first reference matrix to obtain the first score matrix, where the adjacency matrix is obtained based on the edge data between the nodes of the target graph data, the first reference matrix is a column matrix containing P preset values, and the first score matrix is a column matrix containing the structural importance scores of the P user nodes; determining the first assessment result of the P user nodes based on the first score matrix; selecting M candidate user nodes from the P user nodes based on the first assessment result; and determining the first target matrix based on the initial feature matrix and the M candidate user nodes, where the initial feature matrix is obtained based on the feature information corresponding to the target graph data and the first target matrix is a feature matrix containing the feature information of the M candidate user nodes.
  • In the process of calculating the node feature similarity, only the feature similarity between the central node (i.e., any target node) and the corresponding dominant neighborhood nodes (i.e., the neighborhood nodes of the central node among the K dominant nodes) may be calculated. If, in the process of evaluating the importance of node features, the second target matrix is directly determined as the first dominant node information, the first dominant node information includes the second target matrix, that is, the node feature similarity is calculated based on the second target matrix. Specifically, calculating the feature similarity between the target node and the neighborhood nodes of the target node among the K dominant nodes based on the first dominant node information in step S1044 specifically includes: calculating the feature similarity between the target node and the neighborhood nodes of the target node among the K dominant nodes based on the second target matrix.
  • the above-mentioned second target matrix includes any one of the sparse feature matrix and the fourth score matrix; the neighborhood nodes of the target node among the above-mentioned K dominant nodes include the dominant neighborhood nodes corresponding to the target node (that is, the first-order dominant neighborhood nodes of the target node among the K dominant nodes); specifically, for a certain central node (that is, any target node), the dominant neighborhood nodes corresponding to the central node are determined based on the adjacency matrix corresponding to the first graph data, that is, based on the row matrix corresponding to the target node in the adjacency matrix, determine the target node whose element value in the row matrix is 1 and whose column number corresponds to a certain dominant node.
  • In some example embodiments, the similarity between the feature information of the target node and the feature information of a dominant neighborhood node of the target node may be calculated based on the sparse feature matrix as: e_ij = LeakyReLU(a^T [z_i ∥ z_j]), where e_ij represents the feature similarity between node i and node j, LeakyReLU represents the activation function, ∥ represents the vector concatenation operation, a^T represents a parameter to be trained that is used to project the concatenated vector onto one dimension, z_i represents the feature vector of node i with z_i ∈ H2, and z_j represents the feature vector of node j with z_j ∈ H2. A sketch of this computation follows.
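  • For illustration only, a minimal sketch of this attention-style similarity, assuming a LeakyReLU slope of 0.2 (the slope is not specified in the text); names and data are illustrative:

```python
import numpy as np

def leaky_relu(x, slope: float = 0.2):
    return np.where(x > 0, x, slope * x)

def feature_similarity(z_i: np.ndarray, z_j: np.ndarray, a: np.ndarray) -> float:
    """e_ij = LeakyReLU(a^T [z_i || z_j]): concatenate the two feature vectors
    and project the result to a scalar with the trainable vector a."""
    concat = np.concatenate([z_i, z_j])      # (2F,)
    return float(leaky_relu(a @ concat))     # scalar attention score

rng = np.random.default_rng(1)
z_i, z_j = rng.normal(size=4), rng.normal(size=4)   # F = 4
a = rng.normal(size=8)                               # trainable, length 2F
e_ij = feature_similarity(z_i, z_j, a)
```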
  • the above-mentioned second target matrix can also include a fourth score matrix. Based on the fourth score matrix, the feature similarity between the target node and the dominant neighborhood node is calculated. At this time, there is no need to compare and calculate the node feature information, which can further reduce the amount of information calculation. Therefore, the feature similarity between the target node and the neighborhood nodes of the target node among the K dominant nodes (i.e., the dominant neighborhood nodes) can also be calculated based on the above-mentioned fourth score matrix.
  • In some example embodiments, this similarity may be calculated as: e_ij = LeakyReLU(y_i · y_j), where e_ij represents the feature similarity between node i and node j, LeakyReLU represents the activation function, y_i represents the feature importance score of node i, and y_j represents the feature importance score of node j, with y_i, y_j ∈ F2^T.
  • In some example embodiments, the feature similarity may be used directly as the weight coefficient, or the feature similarity may first be normalized and the normalized feature similarity used as the weight coefficient, as in the sketch below.
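  • For illustration only, a sketch that normalizes the raw similarities over the dominant neighborhood with a softmax (the specific normalization is an assumption) and uses them as weight coefficients for feature aggregation:

```python
import numpy as np

def aggregate(neighbors: np.ndarray, e: np.ndarray) -> np.ndarray:
    """Softmax-normalize the raw similarities e over the dominant neighborhood
    and use them as weights for a weighted sum of neighbor features."""
    alpha = np.exp(e - e.max())
    alpha = alpha / alpha.sum()     # normalized weight coefficients
    return alpha @ neighbors        # aggregated feature information

rng = np.random.default_rng(2)
neighbors = rng.normal(size=(3, 4))   # 3 dominant neighborhood nodes, F = 4
e = np.array([1.2, 0.3, -0.5])        # similarities from the previous step
h_center = aggregate(neighbors, e)    # updated central-node features
```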
  • In some example embodiments, the above-mentioned classification model also includes a feature conversion layer; the specific implementation of each model training pass also includes: the feature conversion layer determines the initial feature matrix based on the node feature matrix and the second parameter matrix corresponding to the above-mentioned first graph data; wherein the node feature matrix is obtained based on the feature information corresponding to the first graph data, each row in the node feature matrix corresponds to the C feature dimensions of one target node, the second parameter matrix is a (C×F) parameter matrix, and C and F are both integers greater than 1.
  • In some example embodiments, the initial feature matrix may be computed as: H0 = XW0, where H0 represents the initial feature matrix, X represents the node feature matrix, and W0 represents the second parameter matrix (which can serve as a model parameter to be trained); that is, the (P×C) node feature matrix is multiplied by the (C×F) second parameter matrix to obtain the (P×F) initial feature matrix. A shape-level sketch follows.
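  • For illustration only, a shape-level sketch of this transform; the dimensions and random values are assumptions:

```python
import numpy as np

# (P x C) node feature matrix times the trainable (C x F) second parameter
# matrix W0 yields the (P x F) initial feature matrix H0.
P, C, F = 6, 10, 4
rng = np.random.default_rng(3)
X = rng.normal(size=(P, C))          # node feature matrix
W0 = rng.normal(size=(C, F)) * 0.1   # model parameter to be trained
H0 = X @ W0                          # initial feature matrix, shape (P, F)
```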
  • the node category prediction can be completed in the process of model training.
  • the above-mentioned P target nodes also include (P-N) unlabeled nodes
  • the (P-N) unlabeled nodes include X nodes to be classified
  • the above-mentioned node category prediction results also include the predicted category labels of the (P-N) unlabeled nodes.
  • In some example embodiments, the first graph data is input into the classification model to be trained for iterative model training; after the trained classification model is obtained, the method also includes: determining the predicted category labels of the X nodes to be classified based on the node category prediction results output by the last round of training of the classification model, where X is an integer greater than or equal to 1.
  • the above-mentioned P target nodes include not only labeled nodes for calculating parameter loss values, but also first unlabeled nodes for providing more feature information during feature aggregation, and second unlabeled nodes as nodes to be classified, that is, the above-mentioned P target nodes include N labeled nodes (i.e., sample nodes), X second unlabeled nodes (i.e., nodes to be classified), and (P-N-X) first unlabeled nodes (i.e., sample nodes).
  • In one implementation, the model training stage is performed separately from the node category prediction stage, that is, the predicted category labels of the nodes to be classified are determined after model training is completed; in this case, the (P-N) unlabeled nodes only include sample nodes and do not include the nodes to be classified. In another implementation, the model training stage is performed together with the node category prediction stage, that is, the predicted category labels of the nodes to be classified can be obtained in the last round of model training; in this case, the (P-N) unlabeled nodes include not only sample nodes but also nodes to be classified, and the nodes to be classified can participate in the iterative training of the model parameters as unlabeled nodes during the model training stage.
  • After the last round of model training, the predicted category labels of the nodes to be classified can be determined directly, since model training has been completed and the node category prediction results are accurate at this point.
  • the above-mentioned S102 obtaining the first graph data corresponding to the target classification task, specifically includes:
  • If the target classification task is an article classification task, first graph data corresponding to the article classification task is obtained, and the target nodes include article nodes.
  • the above-mentioned classification model can be a graph neural network model for article classification.
  • the first graph data containing P article nodes is input into the graph neural network model to be trained for model iterative training, and the trained graph neural network model is obtained as the classification model; specifically, each target node corresponds to an article. If there is a preset association relationship (such as a citation relationship) between two articles, there is a connecting edge between the target nodes corresponding to the two articles.
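  • For illustration only, a hypothetical construction of the edge data for the article classification task, where a citation between two articles yields a connecting edge; the article ids and citation pairs are made up:

```python
import numpy as np

citations = [(0, 2), (1, 2), (3, 0)]    # (citing article, cited article)
P = 4                                   # number of article nodes
A = np.zeros((P, P), dtype=int)         # adjacency matrix of the graph data
for i, j in citations:
    A[i, j] = A[j, i] = 1               # preset association => connecting edge
```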
  • If the target classification task is a risky user classification task, first graph data corresponding to the risky user classification task is obtained, and the target nodes include user nodes.
  • the first graph data can be constructed based on a social network graph, the user node corresponds to a user in the social network graph, and the classification model can be a graph neural network model for identifying risky users.
  • the first graph data containing P user nodes is input into the graph neural network model to be trained for model iterative training, and the trained graph neural network model is obtained as a classification model; specifically, each target node corresponds to a target user, and if there is a preset association relationship between two users (such as a transfer transaction), there is a connecting edge between the target nodes corresponding to the two users.
  • If the target classification task is a push user classification task, first graph data corresponding to the push user classification task is obtained, and the target nodes include user nodes.
  • In some example embodiments, the first graph data may be constructed based on a friend relationship graph, where a user node corresponds to a user in the friend relationship graph, and the classification model may be a graph neural network model for target push user identification. The first graph data containing P user nodes is input into the graph neural network model to be trained for iterative training, and the trained graph neural network model is obtained as the classification model; specifically, each target node corresponds to a target user, and if there is a preset association relationship (such as a friend relationship) between two users, there is a connecting edge between the target nodes corresponding to the two users.
  • In the data processing method of the embodiments of the present application, the dominant node information is first screened out based on the node edge data and the node feature information, that is, K dominant nodes with rich structure and high feature discrimination are located, and the feature similarity is then calculated based on the relevant information of the dominant nodes, so that non-dominant nodes do not participate in the feature similarity calculation as dominant neighborhood nodes of a central node (i.e., any target node). This sparsifies the neighborhood nodes of the central node, reduces the amount of node feature similarity computation (i.e., the attention score computation between the central node and some neighborhood nodes is omitted), and thereby improves model training efficiency. On the other hand, since the amount of node feature similarity computation can be precisely controlled to be relatively small in the model training stage, for the case where classification model training and node category prediction are completed together, i.e., the target nodes can include nodes to be classified, the category prediction efficiency of the nodes to be classified can also be ensured; and since the model parameters used for category prediction are obtained by iterative parameter updating with the nodes to be classified treated as unlabeled nodes, the accuracy of the model parameters is improved, achieving both category prediction efficiency and prediction accuracy.
  • an embodiment of the present application further provides a category identification method.
  • Figure 7a is a flow chart of the category identification method provided by the embodiment of the present application. The method in Figure 7a can be executed by an electronic device provided with the above-mentioned trained classification model, and the electronic device can be a terminal device or a designated server.
  • the classification model can be applied to any application scenario that requires classification of a node in the graph data, for example, to predict the category of the article node to be tested in the graph data, and for example, to predict the type of the user node to be identified in the graph data (such as identifying whether the user applying for a loan business is a risky user, or identifying whether the registered user of a preset application is the push user of the target product, etc.); specifically, the above-mentioned category identification method is applied to the classification model, and the classification model includes a node screening layer, a feature aggregation layer, and a category prediction layer. As shown in Figure 7a, the method includes at least the following steps:
  • the node screening layer determines the dominant node information of the dominant node based on the graph data.
  • The graph data includes nodes to be classified and sample nodes, and the nodes to be classified and the sample nodes include the dominant nodes.
  • the above-mentioned graph data may include second graph data corresponding to the target classification task, and the second graph data may include nodes to be classified and sample nodes, as well as node feature information and edge data between nodes.
  • the feature aggregation layer calculates the feature similarity between the node to be classified and the neighboring nodes of the node to be classified based on the dominant node information, and performs feature aggregation based on the feature similarity and feature information of the neighboring nodes to obtain aggregated feature information of the node to be classified;
  • the above-mentioned neighborhood nodes can be the neighborhood nodes of the node to be classified in the above-mentioned dominant node, that is, the neighborhood nodes of the node to be classified refer to the dominant nodes that have connecting edges with the node to be classified among the multiple dominant nodes screened out, and can also be called the dominant neighborhood nodes of the node to be classified.
  • the category prediction layer determines the predicted category information of the node to be classified based on the aggregated feature information.
  • In some example embodiments, the above category recognition method includes at least the following steps:
  • S702: Obtain second graph data corresponding to the target classification task; the second graph data includes X nodes to be classified and Q sample nodes, as well as node feature information and edge data between nodes, each node to be classified is used to represent an object to be classified, the object to be classified includes any one of an article to be classified and a user to be classified, and X and Q are both integers greater than or equal to 1.
  • the Q sample nodes mentioned above may include the sample nodes used in the classification model training process, and Q is less than or equal to P; the target classification task mentioned above may include any one of an article classification task, a risk user classification task, and a push user classification task.
  • S704: Input the second graph data into the trained classification model to perform node category prediction, and obtain the predicted category information of the X nodes to be classified.
  • The above classification model can be a graph neural network model (such as a graph attention network model); the trained classification model can be a classification model obtained through iterative model training based on the above data processing method.
  • the specific implementation process is detailed in the above steps S102 and S104, which will not be repeated here.
  • the above classification model may include a node screening layer, a feature aggregation layer, and a category prediction layer; the specific implementation methods of the above node category prediction are:
  • the node screening layer determines the second dominant node information based on the second graph data; wherein the second dominant node information corresponds to L dominant nodes selected from (X+Q) nodes; L is an integer greater than 1 and less than (X+Q);
  • the process of determining the second dominant node information may refer to the process of determining the first dominant node information, which will not be described in detail herein.
  • the feature aggregation layer calculates the degree of feature similarity between the node to be classified and the neighboring nodes of the node to be classified among the L dominant nodes based on the second dominant node information; and performs feature aggregation on the node to be classified based on the feature similarity and the feature information of the corresponding neighboring nodes to obtain the aggregated feature information of the node to be classified; wherein, the neighboring nodes of the node to be classified among the L dominant nodes refer to the dominant nodes among the L dominant nodes that have connecting edges with the node to be classified, that is, the dominant nodes among the L dominant nodes that have a preset association relationship with the node to be classified, that is, the dominant neighboring nodes of the node to be classified.
  • the process of determining the degree of feature similarity between the node to be classified and the corresponding dominant neighboring node can refer to the process of determining the degree of feature similarity between the above-mentioned target node and the neighboring nodes of the target node among the K dominant nodes, which will not be repeated here; in addition, the feature aggregation process can also refer to the above-mentioned embodiment, which will not be repeated here.
  • the category prediction layer determines the predicted category information of the X nodes to be classified based on the aggregated feature information.
  • the preset information determined in the model training stage can be directly obtained without repeated determination; for example, the feature similarity between two nodes in Q sample nodes will not change due to the addition of X nodes to be classified in the graph data, therefore, the feature similarity between the sample nodes can be directly used as calculated in the model training stage.
  • the above-mentioned S702 obtaining the second graph data corresponding to the target classification task, specifically includes:
  • If the target classification task is an article classification task, second graph data corresponding to the article classification task is obtained, and the nodes to be classified and the sample nodes both include article nodes.
  • If the target classification task is a risky user classification task, second graph data corresponding to the risky user classification task is obtained, and the nodes to be classified and the sample nodes both include user nodes.
  • If the target classification task is a push user classification task, second graph data corresponding to the push user classification task is obtained, and the nodes to be classified and the sample nodes both include user nodes.
  • the category recognition method in the embodiment of the present application can first screen out the dominant node information based on the node edge data and the node feature information, that is, locate multiple dominant nodes with rich structure and high feature distinction, and then calculate the feature similarity based on the relevant information of the dominant nodes.
  • non-dominant nodes will not participate in the calculation of feature similarity as the dominant neighborhood nodes of a certain central node (that is, any target node), thereby achieving sparseness of the neighborhood nodes of the central node, thereby reducing the amount of calculation of node feature similarity (that is, eliminating the amount of calculation of the attention scores of the central node and some neighborhood nodes), thereby improving the efficiency of category prediction; in addition, since more unlabeled nodes can be added to participate in the iterative update of model parameters in the model training stage, the accuracy of the trained model parameters is higher, thereby making the accuracy of category prediction higher.
  • Figure 8 is a schematic diagram of the module composition of the data processing device provided in the embodiment of the present application. The device is used to execute the data processing method described in Figures 1 to 6.
  • As shown in FIG. 8, the device includes: a graph data acquisition module 802, used to acquire first graph data corresponding to a target classification task, where the first graph data includes P target nodes, feature information of the target nodes, and edge data between the target nodes, the P target nodes include N labeled nodes, P and N are both integers greater than 1, and N is less than P; and a model training module 804, used to input the first graph data into the classification model to be trained for iterative model training to obtain a trained classification model, where the classification model includes a node screening layer, a feature aggregation layer, and a category prediction layer. The specific implementation of each model training pass is: the node screening layer determines first dominant node information based on the edge data and the feature information, where the first dominant node information corresponds to K dominant nodes selected from the P target nodes and K is an integer greater than 1 and less than P; the feature aggregation layer calculates the feature similarity between a target node and the neighborhood nodes of the target node among the K dominant nodes based on the first dominant node information, and performs feature aggregation on the target node based on the feature similarity and the feature information of the corresponding neighborhood nodes to obtain the aggregated feature information of the target node; the category prediction layer determines a node category prediction result based on the aggregated feature information, where the node category prediction result includes the predicted category labels of the N labeled nodes; and the parameters of the classification model to be trained are updated based on the predicted category labels and the true category labels of the N labeled nodes.
  • the data processing device in the embodiment of the present application first screens out the dominant node information based on the node edge data and the node feature information, that is, locates K dominant nodes with rich structure and high feature discrimination, and then calculates the feature similarity based on the relevant information of the dominant node.
  • the non-dominant node will not participate in the calculation of the feature similarity as the dominant neighborhood node of a certain central node (that is, any target node), thereby achieving the sparseness of the neighborhood nodes of the central node, thereby reducing the amount of calculation of the node feature similarity (that is, the amount of calculation of the attention score of the central node and some neighborhood nodes is omitted), thereby improving the model training efficiency;
  • Since the amount of node feature similarity computation can be precisely controlled to be relatively small in the model training stage, for the case where classification model training and node category prediction are completed together, i.e., the target nodes can include nodes to be classified, the category prediction efficiency of the nodes to be classified can also be ensured; and since the model parameters used for category prediction are obtained by iterative parameter updating with the nodes to be classified treated as unlabeled nodes, the accuracy of the model parameters is improved, achieving both category prediction efficiency and prediction accuracy.
  • FIG. 9 is a schematic diagram of the module composition of the category identification device provided in an embodiment of the present application. The device is used to execute the category identification method described in FIG. 7a.
  • As shown in FIG. 9, the device includes: a graph data acquisition module 902, used to acquire second graph data corresponding to the target classification task, where the second graph data includes X nodes to be classified and Q sample nodes, each node to be classified is used to represent an object to be classified, the object to be classified includes any one of an article to be classified and a user to be classified, and X and Q are both integers greater than or equal to 1; and a category prediction module 904, used to input the second graph data into the trained classification model for node category prediction to obtain the predicted category information of the nodes to be classified, where the classification model includes a node screening layer, a feature aggregation layer, and a category prediction layer. The specific implementation of the node category prediction is: the node screening layer determines second dominant node information based on the second graph data, where the second dominant node information corresponds to L dominant nodes selected from the (X+Q) nodes and L is an integer greater than 1 and less than (X+Q); the feature aggregation layer calculates the feature similarity between a node to be classified and the neighborhood nodes of the node to be classified among the L dominant nodes based on the second dominant node information, and performs feature aggregation based on the feature similarity and the feature information of the corresponding neighborhood nodes to obtain the aggregated feature information of the node to be classified; and the category prediction layer determines the predicted category information of the X nodes to be classified based on the aggregated feature information.
  • The category recognition device in the embodiments of the present application can first screen out the dominant node information based on the node edge data and the node feature information, that is, locate multiple dominant nodes with rich structure and high feature discrimination, and then calculate the feature similarity based on the relevant information of the dominant nodes, so that non-dominant nodes do not participate in the feature similarity calculation as dominant neighborhood nodes of a central node (i.e., any target node). This sparsifies the neighborhood nodes of the central node and reduces the amount of node feature similarity computation (i.e., the attention score computation between the central node and some neighborhood nodes is omitted), thereby improving the efficiency of category prediction. In addition, since more unlabeled nodes can be added to participate in the iterative update of the model parameters in the model training stage, the trained model parameters are more accurate, which makes the category prediction more accurate.
  • an embodiment of the present application further provides a computer device for executing the above data processing method, as shown in FIG. 10 .
  • Computer devices may differ considerably in configuration or performance, and may include one or more processors 1001 and a memory 1002, where the memory 1002 may store one or more applications or data.
  • the memory 1002 may be a short-term storage or a persistent storage.
  • the application stored in the memory 1002 may include one or more modules (not shown in the figure), and each module may include a series of computer executable instructions in the computer device.
  • the processor 1001 may be configured to communicate with the memory 1002 and execute a series of computer executable instructions in the memory 1002 on the computer device.
  • the computer device may also include one or more power supplies 1003, one or more wired or wireless network interfaces 1004, one or more input and output interfaces 1005, one or more keyboards 1006, etc.
  • The computer device includes a memory and one or more programs, where the one or more programs are stored in the memory, the one or more programs may include one or more modules, each module may include a series of computer-executable instructions in the computer device, and the one or more programs are configured to be executed by one or more processors and include computer-executable instructions for: obtaining first graph data corresponding to a target classification task, where the first graph data includes P target nodes, feature information of the target nodes, and edge data between the target nodes, the P target nodes include N labeled nodes, P and N are both integers greater than 1, and N is less than P; and inputting the first graph data into the classification model to be trained for iterative model training to obtain a trained classification model, where the classification model includes a node screening layer, a feature aggregation layer, and a category prediction layer. The specific implementation of each model training pass is: the node screening layer determines first dominant node information based on the edge data and the feature information, where the first dominant node information corresponds to K dominant nodes selected from the P target nodes and K is an integer greater than 1 and less than P; the feature aggregation layer calculates the feature similarity between a target node and the neighborhood nodes of the target node among the K dominant nodes based on the first dominant node information, and performs feature aggregation on the target node based on the feature similarity and the feature information of the corresponding neighborhood nodes to obtain the aggregated feature information of the target node; the category prediction layer determines a node category prediction result based on the aggregated feature information, where the node category prediction result includes the predicted category labels of the N labeled nodes; and the parameters of the classification model to be trained are updated based on the predicted category labels and the true category labels of the N labeled nodes.
  • Alternatively, the computer device includes a memory and one or more programs, where the one or more programs are stored in the memory, the one or more programs may include one or more modules, each module may include a series of computer-executable instructions in the computer device, and the one or more programs are configured to be executed by one or more processors and include computer-executable instructions for: obtaining second graph data corresponding to the target classification task, where the second graph data includes X nodes to be classified and Q sample nodes, each node to be classified is used to represent an object to be classified, the object to be classified includes any one of an article to be classified and a user to be classified, and X and Q are both integers greater than or equal to 1; and inputting the second graph data into the trained classification model to predict node categories and obtain the predicted category information of the nodes to be classified, where the classification model includes a node screening layer, a feature aggregation layer, and a category prediction layer. The specific implementation of the node category prediction is: the node screening layer determines second dominant node information based on the second graph data, where the second dominant node information corresponds to L dominant nodes selected from the (X+Q) nodes and L is an integer greater than 1 and less than (X+Q); the feature aggregation layer calculates the feature similarity between a node to be classified and the neighborhood nodes of the node to be classified among the L dominant nodes based on the second dominant node information, and performs feature aggregation based on the feature similarity and the feature information of the corresponding neighborhood nodes to obtain the aggregated feature information of the node to be classified; and the category prediction layer determines the predicted category information of the X nodes to be classified based on the aggregated feature information.
  • the computer device in the embodiment of the present application first screens out the dominant node information based on the node edge data and the node feature information, that is, locates K dominant nodes with rich structure and high feature distinction, and then calculates the feature similarity based on the relevant information of the dominant node.
  • the non-dominant node will not participate in the calculation of the feature similarity as the dominant neighborhood node of a certain central node (that is, any target node), thereby achieving sparseness of the neighborhood nodes of the central node, thereby reducing the amount of calculation of the node feature similarity (that is, eliminating the amount of calculation of the attention scores of the central node and some neighborhood nodes), thereby improving the model training efficiency;
  • Since the amount of node feature similarity computation can be precisely controlled to be relatively small in the model training stage, for the case where classification model training and node category prediction are completed together, i.e., the target nodes can include nodes to be classified, the category prediction efficiency of the nodes to be classified can also be ensured; and since the model parameters used for category prediction are obtained by iterative parameter updating with the nodes to be classified treated as unlabeled nodes, the accuracy of the model parameters is improved, thereby achieving both category prediction efficiency and prediction accuracy.
  • the embodiment of the present application also provides a storage medium for storing computer executable instructions.
  • the storage medium can be a USB flash drive, a CD, a hard disk, etc.
  • The computer-executable instructions stored in the storage medium, when executed by the processor, can implement the following process: obtain first graph data corresponding to a target classification task, where the first graph data includes P target nodes, feature information of the target nodes, and edge data between the target nodes, the P target nodes include N labeled nodes, P and N are both integers greater than 1, and N is less than P.
  • the first graph data is input into the classification model to be trained for iterative model training to obtain the trained classification model; wherein the classification model includes a node screening layer, a feature aggregation layer, and a category prediction layer;
  • The specific implementation of each model training pass is: the node screening layer determines first dominant node information based on the edge data and the feature information, where the first dominant node information corresponds to K dominant nodes selected from the P target nodes and K is an integer greater than 1 and less than P; the feature aggregation layer calculates the feature similarity between a target node and the neighborhood nodes of the target node among the K dominant nodes based on the first dominant node information, and performs feature aggregation on the target node based on the feature similarity and the feature information of the corresponding neighborhood nodes to obtain the aggregated feature information of the target node; the category prediction layer determines a node category prediction result based on the aggregated feature information, where the node category prediction result includes the predicted category labels of the N labeled nodes; and the parameters of the classification model to be trained are updated based on the predicted category labels and the true category labels of the N labeled nodes.
  • the storage medium may be a USB flash drive, a CD, a hard disk, etc.
  • The computer-executable instructions stored in the storage medium, when executed by the processor, can implement the following process: obtain second graph data corresponding to the target classification task, where the second graph data includes X nodes to be classified and Q sample nodes, each node to be classified is used to represent an object to be classified, the object to be classified includes any one of an article to be classified and a user to be classified, and X and Q are both integers greater than or equal to 1; input the second graph data into the trained classification model to predict node categories and obtain the predicted category information of the nodes to be classified, where the classification model includes a node screening layer, a feature aggregation layer, and a category prediction layer. The specific implementation of the node category prediction is: the node screening layer determines second dominant node information based on the second graph data, where the second dominant node information corresponds to L dominant nodes selected from the (X+Q) nodes and L is an integer greater than 1 and less than (X+Q); the feature aggregation layer calculates the feature similarity between a node to be classified and the neighborhood nodes of the node to be classified among the L dominant nodes based on the second dominant node information, and performs feature aggregation based on the feature similarity and the feature information of the corresponding neighborhood nodes to obtain the aggregated feature information of the node to be classified; and the category prediction layer determines the predicted category information of the X nodes to be classified based on the aggregated feature information.
  • the dominant node information is first screened out based on the node edge data and the node feature information, that is, K dominant nodes with rich structure and high feature distinction are located, and then the feature similarity is calculated based on the relevant information of the dominant nodes.
  • non-dominant nodes will not participate in the calculation of feature similarity as the dominant neighborhood nodes of a certain central node (that is, any target node), so as to achieve sparseness of the neighborhood nodes of the central node, thereby reducing the amount of calculation of node feature similarity (that is, the amount of calculation of attention scores of the central node and some neighborhood nodes is omitted), thereby improving the model training efficiency;
  • the amount of calculation of node feature similarity can be accurately controlled to be relatively small in the model training stage, therefore, for the case where the classification model training and node category prediction are completed together, that is, the target node can include the node to be classified, the category prediction efficiency of the node to be classified can also be ensured, and since the model parameters used for category prediction are obtained by iteratively updating the parameters of the node to be classified as an unlabeled node, the accuracy of the model parameters is improved, thereby achieving the effect of taking into account both category prediction efficiency and prediction accuracy.
  • These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce a manufactured product including an instruction device that implements the functions specified in one or more processes in the flowchart and/or one or more boxes in the block diagram.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device so that a series of operational steps are executed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes in the flowchart and/or one or more boxes in the block diagram.
  • a computing device includes one or more processors (CPU), input/output interfaces, network interfaces, and memory.
  • Memory may include non-permanent memory in a computer-readable medium, random access memory (RAM) and/or non-volatile memory in the form of read-only memory (ROM) or flash memory (flash RAM).
  • Computer-readable media include permanent and non-permanent, removable and non-removable media that can store information by any method or technology. The information can be computer-readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device.
  • As defined herein, computer-readable media do not include transitory media such as modulated data signals and carrier waves.
  • Embodiments of the present application may be described in the general context of computer executable instructions executed by a computer, such as program modules.
  • program modules include routines, programs, objects, components, data structures, etc. that perform specific tasks or implement specific abstract data types.
  • One or more embodiments of the present application may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network.
  • program modules may be located in local and remote computer storage media, including storage devices.


Abstract

In the data processing method, category identification method, and computer device of the present application, dominant node information is first screened out based on node edge data and feature information, that is, K dominant nodes with rich structure and high feature discrimination are located, and feature similarity is then calculated based on the relevant information of the dominant nodes. On the other hand, since the amount of computation for node feature similarity is relatively small, model training and node category prediction can be completed together, that is, the target nodes can include the nodes to be classified, and the model parameters used for category prediction are obtained by iterative parameter updating with the nodes to be classified treated as unlabeled nodes.

Description

Data processing method, category identification method, and computer device

Cross-Reference

This application claims priority to the Chinese patent application filed with the China Patent Office on December 8, 2022, with application number 202211570272.6 and entitled "Data processing method, category identification method and computer device", the entire contents of which are incorporated herein by reference.

Technical Field

The present application relates to the field of computer technology, and in particular to a data processing method, a category identification method, and a computer device.

Background

With the continuous development of artificial intelligence technology, graph data, which carries more structured information, is more helpful for model parameter learning when used as model input, and graph neural network models have emerged accordingly. In some cases, a graph neural network model can serve as a classification model for predicting the categories of unlabeled nodes in graph data: for example, the graph data contains labeled nodes of known categories and unlabeled nodes of unknown categories, and the graph neural network model is used to predict the categories of the unlabeled nodes. The unlabeled nodes may be article nodes, with the model identifying the category of an article to be classified, or user nodes, with the model identifying whether a user is a risky user or a potential user of a product to be pushed.

However, when an attention mechanism is introduced during the training of a graph neural network model, it is usually necessary to compute the feature similarity between two adjacent nodes and use it as a weight coefficient: based on the feature information of one target node and the corresponding weight coefficient, feature aggregation is performed on another target node, that is, different neighborhood nodes are assigned different weight values according to their feature similarity to the target node, and the node category is then predicted based on the aggregated feature information. Since all nodes in the graph data must be traversed and the feature similarity between every pair of connected nodes must be computed one by one, the amount of computation of feature similarities between the target objects represented by the target nodes (such as target articles or target users) during the training of the classification model (such as a graph neural network model) is relatively large, so that training the classification model used to identify target object categories (such as article categories or user categories) takes a relatively long time.

Summary

The purpose of the present application is to provide a data processing method and a category identification method, so as to improve the prediction accuracy of the category of an object to be classified (such as an article category or a user category).

To achieve the above technical solutions, the embodiments of the present application are implemented as follows:
In one aspect, the present application provides a data processing method applied to a classification model, where the classification model includes a node screening layer, a feature aggregation layer, and a category prediction layer. The method includes: the node screening layer determines dominant node information of dominant nodes based on edge data and feature information of target nodes of graph data, where the target nodes include labeled nodes and the labeled nodes include the dominant nodes; the feature aggregation layer calculates the feature similarity between a target node and the neighborhood nodes of the target node based on the dominant node information, and performs feature aggregation based on the feature similarity and the feature information of the neighborhood nodes to obtain the aggregated feature information of the target node; the category prediction layer determines a node category prediction result based on the aggregated feature information, where the node category prediction result includes the predicted category labels of the labeled nodes; and the parameters of the classification model are iteratively updated based on the predicted category labels and the true category labels of the labeled nodes to obtain a trained classification model.

In one aspect, the present application provides a category identification method applied to a classification model, where the classification model includes a node screening layer, a feature aggregation layer, and a category prediction layer. The method includes: the node screening layer determines dominant node information of dominant nodes based on graph data, where the graph data includes nodes to be classified and sample nodes, and the nodes to be classified and the sample nodes include the dominant nodes; the feature aggregation layer calculates the feature similarity between a node to be classified and the neighborhood nodes of the node to be classified based on the dominant node information, and performs feature aggregation based on the feature similarity and the feature information of the neighborhood nodes to obtain the aggregated feature information of the node to be classified; and the category prediction layer determines the predicted category information of the node to be classified based on the aggregated feature information.

In one aspect, the present application provides a data processing apparatus provided with a classification model, where the classification model includes a node screening layer, a feature aggregation layer, and a category prediction layer. In the apparatus: the node screening layer determines dominant node information of dominant nodes based on edge data and feature information of target nodes of graph data, where the target nodes include labeled nodes and the labeled nodes include the dominant nodes; the feature aggregation layer calculates the feature similarity between a target node and the neighborhood nodes of the target node based on the dominant node information, and performs feature aggregation based on the feature similarity and the feature information of the neighborhood nodes to obtain the aggregated feature information of the target node; the category prediction layer determines a node category prediction result based on the aggregated feature information, where the node category prediction result includes the predicted category labels of the labeled nodes; and the parameters of the classification model are iteratively updated based on the predicted category labels and the true category labels of the labeled nodes to obtain a trained classification model.

In one aspect, the present application provides a category identification apparatus provided with a classification model, where the classification model includes a node screening layer, a feature aggregation layer, and a category prediction layer. In the apparatus: the node screening layer determines dominant node information of dominant nodes based on graph data, where the graph data includes nodes to be classified and sample nodes, and the nodes to be classified and the sample nodes include the dominant nodes; the feature aggregation layer calculates the feature similarity between a node to be classified and the neighborhood nodes of the node to be classified based on the dominant node information, and performs feature aggregation based on the feature similarity and the feature information of the neighborhood nodes to obtain the aggregated feature information of the node to be classified; and the category prediction layer determines the predicted category information of the node to be classified based on the aggregated feature information.

In one aspect, the present application provides a computer device, including a processor and a memory arranged to store computer-executable instructions, where the executable instructions are configured to be executed by the processor and include instructions for performing the steps of the above methods.

In one aspect, the present application provides a storage medium for storing computer-executable instructions that cause a computer to perform the steps of the above methods.

In one aspect, an embodiment of the present application provides a computer program product, where the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to perform the above methods.
Brief Description of the Drawings

To describe the technical solutions of the embodiments of the present application more clearly, the following briefly introduces the drawings required in the embodiments. Obviously, the drawings described below are only some embodiments recorded in one or more of the present application, and those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.

FIG. 1 is a first schematic flowchart of the data processing method provided by an embodiment of the present application;

FIG. 2a is a second schematic flowchart of the data processing method provided by an embodiment of the present application;

FIG. 2b is a schematic flowchart of each model training pass in the data processing method provided by an embodiment of the present application;

FIG. 3 is a schematic diagram of a first implementation principle of the data processing method provided by an embodiment of the present application;

FIG. 4 is a schematic diagram of a second implementation principle of the data processing method provided by an embodiment of the present application;

FIG. 5 is a schematic diagram of a third implementation principle of the data processing method provided by an embodiment of the present application;

FIG. 6 is a schematic diagram of a fourth implementation principle of the data processing method provided by an embodiment of the present application;

FIG. 7a is a first schematic flowchart of the category identification method provided by an embodiment of the present application;

FIG. 7b is a second schematic flowchart of the category identification method provided by an embodiment of the present application;

FIG. 8 is a schematic diagram of the module composition of the data processing apparatus provided by an embodiment of the present application;

FIG. 9 is a schematic diagram of the module composition of the category identification apparatus provided by an embodiment of the present application;

FIG. 10 is a schematic structural diagram of the computer device provided by an embodiment of the present application.
Detailed Description

To enable those skilled in the art to better understand the technical solutions in one or more of the present application, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some rather than all of the embodiments of one or more of the present application. Based on the embodiments in one or more of the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present application.

It should be noted that, without conflict, one or more embodiments of the present application and the features in the embodiments may be combined with each other. The embodiments of the present application will be described in detail below with reference to the drawings and in combination with the embodiments.

One or more embodiments of the present application provide a data processing method and a category identification method. To improve the category prediction accuracy of a classification model (such as a graph neural network model), the number of nodes in the graph data is usually increased, which increases the amount of computation of node feature similarity. To reduce this computation, one could randomly sample the neighborhood nodes of a target node, compute the feature similarity only between the target node and the randomly sampled neighborhood nodes, and aggregate the sampled neighbors' features onto the target node with different weight coefficients; however, random sampling of neighborhood nodes is inaccurate. For example, if the dominant neighborhood nodes of the target node (such as neighborhood nodes with rich structural information and highly discriminative feature information) are filtered out, the feature aggregation accuracy of the target node will be low, the predicted category label of the target node will be inaccurate, and the iterative update of the model parameters will be inaccurate in turn. The present technical solution instead exploits the long-tail distribution of the self-attention mechanism: self-attention is sparse, a small number of dot products contribute the vast majority of the attention scores, and the remaining dot products can be ignored; that is, the few neighborhood nodes that contribute substantially to a central node can serve as dominant neighborhood nodes, while the remaining non-dominant neighborhood nodes that contribute little can be ignored. Therefore, dominant node information is first screened out based on node edge data and node feature information, that is, K dominant nodes with rich structure and high feature discrimination are located, and feature similarity is then calculated based on the relevant information of the dominant nodes. In this way, non-dominant nodes do not participate in the feature similarity calculation as dominant neighborhood nodes of a central node (i.e., any target node), the neighborhood nodes of the central node are sparsified, the amount of node feature similarity computation is reduced (the attention score computation between the central node and some neighborhood nodes is omitted), and model training efficiency is improved. On the other hand, since the amount of node feature similarity computation can be precisely controlled to be relatively small in the model training stage, for the case where classification model training and node category prediction are completed together (i.e., the target nodes can include nodes to be classified), the category prediction efficiency of the nodes to be classified can also be ensured; moreover, since the model parameters used for category prediction are obtained by iterative parameter updating with the nodes to be classified treated as unlabeled nodes, the accuracy of the model parameters is improved, achieving both category prediction efficiency and prediction accuracy.
FIG. 1 is a first schematic flowchart of the data processing method provided by one or more embodiments of the present application. The method in FIG. 1 can be executed by an electronic device provided with a classification model training apparatus, and the electronic device can be a terminal device or a designated server. The classification model can be applied to any application scenario that requires classifying a node in graph data, for example, predicting the category of an article node to be tested in the graph data, or predicting the type of a user node to be identified in the graph data (such as identifying whether a loan applicant is a risky user, or whether a registered user of a preset application is a push user of a target product). Specifically, the data processing method is applied to a classification model that includes a node screening layer, a feature aggregation layer, and a category prediction layer. As shown in FIG. 1, the data processing method includes at least the following steps:

S100: The node screening layer determines dominant node information of dominant nodes based on edge data and feature information of target nodes of the graph data; the target nodes include labeled nodes, and the labeled nodes include the dominant nodes.

The graph data may include first graph data corresponding to a target classification task, and the first graph data may include target nodes, feature information of the target nodes, and edge data between the target nodes.

S200: The feature aggregation layer calculates the feature similarity between a target node and the neighborhood nodes of the target node based on the dominant node information, and performs feature aggregation based on the feature similarity and the feature information of the neighborhood nodes to obtain the aggregated feature information of the target node.

The neighborhood nodes may be the neighborhood nodes of the target node among the dominant nodes; that is, the neighborhood nodes of a target node refer to the dominant nodes, among the multiple screened-out dominant nodes, that have connecting edges with the target node, and may also be called the dominant neighborhood nodes of the target node.

S300: The category prediction layer determines a node category prediction result based on the aggregated feature information, where the node category prediction result includes the predicted category labels of the labeled nodes.

S400: The parameters of the classification model to be trained are iteratively updated based on the predicted category labels and the true category labels of the labeled nodes to obtain a trained classification model.
In some example embodiments, taking the case where the graph data includes first graph data corresponding to a target classification task as an example, as shown in FIG. 2a, the data processing method includes at least the following steps:

S102: Obtain first graph data corresponding to a target classification task; the first graph data includes P target nodes, feature information of the target nodes, and edge data between the target nodes, the P target nodes include N labeled nodes, P and N are both integers greater than 1, and N is less than P.

The target classification task may include any one of an article classification task, a risky user classification task, and a push user classification task. Correspondingly, a target node represents a target object, the target object includes any one of a target article and a target user, and the feature information of the target node includes the feature information of the target object represented by the target node. For example, the feature information of a target user may include any one of basic feature information and business-related feature information: the basic feature information may include any one of the user's age, gender, and occupation, and the business-related feature information may include feature information generated in response to the target user's business request for a target business, where the target business includes an online shopping business or a loan business. If the target business is a loan business, the business-related feature information may include the loan amount, loan method, loan quota, and so on; if the target business is an online shopping business, the business-related feature information may include the user's payment frequency, payment amount, delivery address, and so on. As another example, the feature information of a target article may include any one of keywords, cited articles, and article authors.

The edge data between target nodes includes connecting edges between the target objects represented by the target nodes (indicating a preset association relationship between the target objects). Taking target users as an example: if target user A and target user B used the same device number when initiating business applications, the target users are considered to have a preset association relationship; if the mobile number reserved in target user A's credit record is the same as the contact mobile number of applicant target user B, they are considered to have a preset association relationship; if the application mobile number filled in by target user A when applying for the business is the same as the contact mobile number of applicant target user B, they are considered to have a preset association relationship; and if the mobile number bound to one of target user A's bank cards is the same as the contact mobile number of applicant target user B, they are considered to have a preset association relationship.

A labeled node represents a target object with a category label. Regarding the naming of target nodes in the graph data: from the perspective of the model usage stage (model training or category prediction), a target node can be a sample node or a node to be classified; from the perspective of whether a node has a category label, a target node can be a labeled node or an unlabeled node. Usually, labeled nodes serve as sample nodes and participate in the computation of the model loss value, while unlabeled nodes can be sample nodes participating in feature aggregation, and unlabeled nodes can also be nodes to be classified. In addition, in the feature aggregation process, a target node can serve as a central node or a neighborhood node (i.e., the feature information of the central node is updated based on the feature information of its neighborhood nodes); from the perspective of a node's own importance, target nodes can be divided into dominant nodes and non-dominant nodes; and from the perspective of a neighborhood node's importance to the central node, neighborhood nodes can be divided into dominant neighborhood nodes and non-dominant neighborhood nodes. This naming convention can be adjusted according to actual needs and does not limit the protection scope of the present application.

N is less than or equal to P; if N equals P, all P target nodes are labeled nodes. However, to ensure the node feature aggregation effect without requiring a large number of labeled nodes, multiple unlabeled nodes are usually added to the graph data, so that the P target nodes include both labeled and unlabeled nodes and N is less than P. In this way, the labeled nodes can be feature-aggregated based on the feature information of the unlabeled nodes, which enhances the feature representation of the labeled nodes and improves the accuracy of category prediction based on the aggregated node features.
S104: Input the first graph data into the classification model to be trained for iterative model training to obtain a trained classification model.

The classification model may be a graph neural network model (such as a graph attention network model). The model parameters are iteratively updated based on the first graph data until the current model training result satisfies a preset training end condition, yielding the trained classification model; the preset training end condition may include the current number of training rounds reaching the total number of training rounds, or convergence of the model loss function.

Regarding the iterative model training process in step S104, its specific implementation is described below. Since the processing of each training pass in the iterative training is the same, any single training pass is taken as an example for detailed description. The classification model may include a node screening layer, a feature aggregation layer, and a category prediction layer; as shown in FIG. 2b, each training pass may include the following steps S1042 to S1048:

S1042: The node screening layer determines first dominant node information based on the edge data and feature information corresponding to the first graph data, where the first dominant node information corresponds to K dominant nodes selected from the P target nodes, and K is an integer greater than 1 and less than P.

K dominant nodes are screened out of the P target nodes along two dimensions: the richness of a target node's structural information and the discrimination of its feature information; the first dominant node information is then determined based on the relevant information of the K dominant nodes (such as feature importance scores or node feature information), that is, the first dominant node information may include any one of node feature information and feature importance scores. In some example embodiments, feature importance can be evaluated first based on the feature information and structural importance evaluated afterwards based on the edge data to screen out the K dominant nodes; in other example embodiments, structural importance can be evaluated first based on the edge data and feature importance afterwards based on the feature information; in still other example embodiments, feature importance evaluation can screen K1 candidate nodes from the P target nodes and structural importance evaluation can screen K2 candidate nodes, and the K1 candidate nodes and K2 candidate nodes are then deduplicated to obtain the K dominant nodes.
S1044: The feature aggregation layer calculates the feature similarity between a target node and the neighborhood nodes of the target node among the K dominant nodes based on the first dominant node information, and performs feature aggregation on the target node based on the feature similarity and the feature information of the corresponding neighborhood nodes to obtain the aggregated feature information of the target node. The neighborhood nodes of a target node among the K dominant nodes refer to the dominant nodes, among the K dominant nodes, that have connecting edges with the target node, that is, the dominant nodes that have a preset association relationship with the target node, i.e., the dominant neighborhood nodes of the target node.

After the first dominant node information is determined, each target node is taken as a central node in turn, the feature similarity between the central node and its dominant neighborhood nodes (i.e., the neighborhood nodes of the central node among the K dominant nodes) is calculated, and the feature similarity is then used as a weight coefficient to update the feature information of the central node based on the feature information of the dominant neighborhood nodes. Since the dominant neighborhood nodes of a central node are not all of its neighborhood nodes but only those neighborhood nodes belonging to the K dominant nodes, the computation is reduced to that between the target node and its dominant neighborhood nodes, and the computation between the target node and its non-dominant neighborhood nodes (i.e., its neighborhood nodes among the (P-K) non-dominant nodes) is omitted.

In the feature aggregation process, the (P-K) non-dominant nodes may or may not participate. If they do participate, the weight coefficient between the central node and a non-dominant neighborhood node can be set to a target value (such as a preset value or the minimum of the weight coefficients of the dominant neighborhood nodes); correspondingly, feature aggregation is performed on the target node based on the feature similarities and feature information of the dominant neighborhood nodes as well as the weight coefficients and feature information of the non-dominant neighborhood nodes, to obtain the aggregated feature information of the target node.

S1046: The category prediction layer determines a node category prediction result based on the aggregated feature information, where the node category prediction result includes the predicted category labels of the N labeled nodes.

After the feature aggregation layer updates the features of each target node, the aggregated feature information is used as the final feature information of the target node, and category prediction is performed on the target node based on the final feature information to obtain the predicted category label of each target node.

S1048: The parameters of the classification model to be trained are updated based on the predicted category labels and true category labels of the N labeled nodes.

For each labeled node, a loss value is computed based on its predicted category label and true category label; a total loss value is then determined based on the loss values of the N labeled nodes, and the model parameters are updated based on the total loss value. It should be noted that the iterative training of the model parameters based on the total loss value of the model to be trained can follow the existing process of tuning model parameters by back-propagation with gradient descent, which is not repeated here.

In the embodiments of the present application, dominant node information is first screened out based on node edge data and node feature information, that is, K dominant nodes with rich structure and high feature discrimination are located, and feature similarity is then calculated based on the relevant information of the dominant nodes, so that non-dominant nodes do not participate in the feature similarity calculation as dominant neighborhood nodes of a central node (i.e., any target node). This sparsifies the neighborhood nodes of the central node, reduces the computation of node feature similarity (the attention score computation between the central node and some neighborhood nodes is omitted), and improves model training efficiency. On the other hand, since the K dominant nodes are screened out in the model training stage based on node edge data and node feature information rather than selected randomly from the P target nodes, the approach not only reduces the computation of feature similarity but also precisely determines which inter-node feature similarities need to be calculated (i.e., between the target object represented by a target node and the target objects represented by its dominant neighborhood nodes) and which can be ignored (i.e., those with the target objects represented by its non-dominant neighborhood nodes). Therefore, for the case where classification model training and object category prediction are completed together, i.e., the target nodes can include nodes to be classified, the category prediction efficiency of the nodes to be classified (i.e., the prediction efficiency for the objects to be classified that they represent) can also be ensured; moreover, since the model parameters used for category prediction are obtained by iterative parameter updating with the nodes to be classified, which represent the objects to be classified, treated as unlabeled nodes, the feature information and edge data of the objects to be classified influence the update of the classification model's parameters, so the classification model predicts the categories of the objects to be classified more accurately, achieving both category prediction efficiency and prediction accuracy for the objects to be classified.
Taking the case where the target classification task is a risky user identification task as an example, where the classification model is a graph neural network model including a node screening layer, a feature aggregation layer, and a category prediction layer, FIG. 3 shows the specific implementation of the training method of the graph neural network model, which includes:

Obtain target graph data (i.e., the first graph data) corresponding to the risky user identification task, where the target graph data includes P user nodes, feature information of the user nodes, and edge data between the user nodes, the P user nodes include N labeled nodes, P and N are both integers greater than 1, and N is less than P. Specifically, the edge data includes connecting edges between user nodes that have a preset association relationship, and the preset association relationship may include any one of the following: the target users used the same device number when initiating business applications; the mobile number reserved in a first user's credit record is the same as the contact mobile number of a second user applying for the business; the application mobile number filled in by a first user when applying for the business is the same as the contact mobile number of a second user applying for the business; and the mobile number bound to one of a first user's bank cards is the same as the contact mobile number of a second user applying for the business.

Input the target graph data into the graph neural network model to be trained for iterative model training to obtain a trained graph neural network model as the classification model. Each training pass may be implemented as follows: the node screening layer determines dominant user node information (i.e., the first dominant node information) based on the edge data between the nodes of the target graph data and the feature information, that is, the node screening layer performs structural importance evaluation and feature importance evaluation based on the edge data and feature information to obtain the dominant user node information, where the dominant user node information corresponds to K dominant user nodes selected from the P user nodes and K is an integer greater than 1 and less than P; the feature aggregation layer calculates the feature similarity between any user node and the neighborhood nodes of that user node among the K dominant user nodes based on the dominant user node information, and performs feature aggregation on the user node based on the feature similarity and the feature information of the corresponding neighborhood nodes to obtain the aggregated feature information of the user node; and the category prediction layer determines the predicted category labels of the N labeled nodes based on the aggregated feature information, where the N labeled nodes include labeled node 1, ..., labeled node N.

Determine a total loss value based on the predicted category labels and true category labels of the N labeled nodes, and update the parameters of the graph neural network model to be trained based on the total loss value.

It should be noted that, for the specific implementation when the target classification task is an article classification task or a push user classification task, refer to the above implementation for the risky user classification task, which is not repeated here.
Regarding the specific implementation of screening out K dominant nodes along the two dimensions of structural information richness and feature information discrimination: in some example embodiments, feature importance can be evaluated first based on the feature information and structural importance afterwards based on the edge data; in other example embodiments, structural importance can be evaluated first and feature importance afterwards. Since structural information affects feature aggregation accuracy more directly, the case of performing structural importance evaluation first and feature importance evaluation afterwards is taken as an example. For the determination of the first dominant node information, determining the first dominant node information based on the edge data and feature information corresponding to the first graph data in step S1042 specifically includes:

Step A1: Perform structural importance evaluation on the P target nodes based on the edge data between the nodes of the first graph data to obtain a first evaluation result.

For each target node, the structural importance of the target node is scored based on the number of connecting edges between the target node and its neighborhood nodes to obtain a structural importance score, where more connecting edges yield a larger structural importance score. In a specific implementation, since the structural information of a fixed graph does not change, step A1 may be computed only once across multiple rounds of model training.

In some example embodiments, the structural importance evaluation can be performed directly based on the structural information of each target node, or based on the adjacency matrix corresponding to the P target nodes.

Step A2: Determine candidate node feature information based on the first evaluation result and the feature information corresponding to the first graph data, where the candidate node feature information includes the feature information of M candidate nodes selected from the P target nodes, and M is greater than K and less than P.

Since the first evaluation result includes the structural importance score of each target node, a candidate node set (including M candidate nodes) can be screened from the P target nodes based on the structural importance scores. Considering that node feature importance subsequently needs to be evaluated based on the feature information of the candidate nodes, the feature information of the M candidate nodes must first be located to provide the basic data for subsequently computing the feature importance scores of the M candidate nodes.

Step A3: Perform feature importance evaluation on the M candidate nodes based on the candidate node feature information to obtain a second evaluation result.

After the candidate node feature information is determined, the feature importance of each candidate node is scored based on its feature information to obtain a feature importance score, where higher discrimination of a node's feature information (i.e., containing more key information that helps identify the node category) yields a larger feature importance score.

In some example embodiments, the feature importance evaluation can be performed directly based on the feature information of each candidate node, or based on the feature matrix of the M candidate nodes.

Step A4: Determine the first dominant node information based on the second evaluation result, where the first dominant node information includes the relevant information of K dominant nodes selected from the M candidate nodes, and the relevant information includes any one of feature importance scores and node feature information.

Since the second evaluation result includes the feature importance score of each candidate node, a dominant node set (including K dominant nodes) can be screened from the M candidate nodes based on the feature importance scores. Considering that node feature similarity subsequently needs to be computed based on the first dominant node information, the first dominant node information must be determined based on the relevant information of the K dominant nodes to provide the basic data for the subsequent feature similarity computation. The first dominant node information can be regarded as the relevant information of the K dominant nodes with rich structural information and highly discriminative feature information screened out through node structural importance evaluation and node feature importance evaluation.

On the basis of FIG. 3, for the process of determining the dominant user node information, FIG. 4 shows a specific implementation of the data processing method, which includes: performing structural importance evaluation on the P user nodes based on the edge data between the nodes of the target graph data to obtain a first evaluation result; determining candidate node feature information based on the first evaluation result and the feature information corresponding to the target graph data, where the candidate node feature information includes the feature information of M candidate user nodes selected from the P user nodes; performing feature importance evaluation on the M candidate user nodes based on the candidate node feature information to obtain a second evaluation result; and determining the dominant user node information based on the second evaluation result, where the dominant user node information includes the relevant information of K dominant user nodes selected from the M candidate user nodes, and the relevant information includes any one of feature importance scores and node feature information.
针对从节点结构丰富程度出发,来减少节点特征相似度的计算量的具体实现方式,为了简化节点结构重要性评估的处理过程,提高节点结构重要性评估维度对应的第一评估结果的准确度,考虑到邻接矩阵中每个元素的取值均表征两个节点之间是否具有连接关系,因此,可以充分借助第一图数据对应的邻接矩阵,对P个目标节点的结构丰富程度进行打分,进而得到P个目标节点分别对应的结构重要性分数,然后基于P个节点结构重要性分数生成第一评估结果,基于此,上述步骤A1,基于第一图数据的节点之间的边数据对P个目标节点进行结构重要性评估,得到第一评估结果,具体包括:
步骤A11,基于第一图数据对应的邻接矩阵和第一参考矩阵,确定第一分数矩阵。
上述邻接矩阵是基于第一图数据的节点之间的边数据得到的,上述第一参考矩阵为包含P个预设值的列矩阵,上述第一分数矩阵为包含P个目标节点的结构重要性分数的列矩阵。
在一些示例实施例中,可以预先基于第一图数据的节点之间的边数据生成邻接矩阵,也可以实时基于第一图数据的节点之间的边数据生成邻接矩阵;其中,邻接矩阵中每个元素的取值表示两个节点之间的连接关系(即是否具有连线),例如,邻接矩阵为(P×P)矩阵A,若邻接矩阵中某一元素aij=1,则说明第i个目标节点与第j个目标节点之间具有连线,若邻接矩阵中某一元素aij=0,则说明第i个目标节点与第j个目标节点之间不具有连线。
上述第一参考矩阵中P个元素的取值均为预设值,该预设值可以是1,也可以是大于1的整数;由于第一图数据对应的邻接矩阵为(P×P)矩阵,该(P×P)矩阵中某一行的P个元素构成的(1×P)行矩阵,该行矩阵表示某一目标节点与自身、以及其他节点之间的连接关系;以第一参考矩阵中P个元素的取值均等于1为例,即第一参考矩阵为元素全为1的(P×1)列矩阵, 这样某一目标节点对应的(1×P)行矩阵与(P×1)列矩阵(即第一参考矩阵)的乘积,得到的数值表示与该目标节点具有连接关系的邻域节点的数量,该数值越大,说明该目标节点的邻域节点越多,即该目标节点的结构信息越丰富,因此,可以将该数值视为该目标节点的结构重要性分数;对应的,第一图数据对应的邻接矩阵与第一参考矩阵相乘,得到第一分数矩阵,该第一分数矩阵也为(P×1)列矩阵,且第一分数矩阵中元素的取值与目标节点的结构重要性分数一一对应,即第一分数矩阵中一个元素的取值为对应目标节点的结构重要性分数。
In some example embodiments, the first score matrix may be obtained by multiplying the adjacency matrix by the first reference matrix using the following formula:

S = AE

where S denotes the first score matrix (which can serve as the vector of node structural importance scores), A denotes the adjacency matrix (a square matrix representing node structural information), and E denotes the first reference matrix (a column matrix whose elements all equal the preset value, e.g., 1). That is, the (P×P) adjacency matrix multiplied by the (P×1) first reference matrix yields the (P×1) first score matrix.
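As an illustration of step A11, the following minimal numpy sketch computes S = AE on a made-up 5-node graph (the adjacency values and graph size are assumptions chosen purely for illustration):

```python
import numpy as np

# Minimal sketch of step A11, assuming a small graph with P = 5 nodes.
# A is the (P x P) adjacency matrix; E is the (P x 1) first reference
# matrix whose entries all equal the preset value 1.
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0],
              [1, 1, 0, 0, 0],
              [0, 1, 0, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
E = np.ones((A.shape[0], 1))

# S = A @ E: each entry of the (P x 1) first score matrix is the number
# of neighbors of the corresponding node, i.e. its structural importance score.
S = A @ E
print(S.ravel())  # [2. 3. 2. 2. 1.]
```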
Step A12: determine the first evaluation result for the P target nodes based on the first score matrix.

In some example embodiments, the first score matrix may be taken directly as the first evaluation result, or it may first undergo preset processing (such as normalization or score fine-tuning), with the processed first score matrix taken as the first evaluation result.
Regarding the specific implementation that evaluates node importance along both dimensions, structural richness and feature distinctiveness, to reduce the computation of node feature similarities: after the first evaluation result characterizing the node structural importance scores has been determined, the candidate node feature information is determined based on it (that is, the M candidate nodes are screened out first, and the feature information set is then sparsified to provide the base data for computing the M candidate nodes' feature importance scores). Likewise, to simplify this determination and improve its accuracy, a preliminarily sparsified node feature matrix is obtained from the first evaluation result with the help of the corresponding initial feature matrix. On this basis, step A2 above, determining the candidate node feature information based on the first evaluation result and the feature information corresponding to the first graph data, specifically includes:
Step A21: select M candidate nodes from the P target nodes based on the first evaluation result.

Since the first evaluation result includes the structural importance scores of the P target nodes, and a higher score indicates richer structure, the scores can be sorted in descending order and the target nodes corresponding to the top M scores selected as the M candidate nodes.

The M candidate nodes can be regarded as a candidate node set preliminarily screened out along the evaluation dimension of structural information richness.
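Continuing the sketch above, step A21 amounts to a top-M selection over the structural importance scores (the scores and the value of M below are made-up values for illustration):

```python
import numpy as np

# Pick the M nodes with the largest structural scores (step A21).
# D1 holds their positions (node indices).
S = np.array([2., 3., 2., 2., 1.])  # structural importance scores from A11
M = 3                               # e.g. M = c1 * log(P), rounded
D1 = np.argsort(-S)[:M]             # indices of the top-M scores
print(D1)                           # e.g. [1 0 2]
```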
Step A22: determine a first target matrix based on the initial feature matrix corresponding to the first graph data and the M candidate nodes.

The initial feature matrix is obtained from the feature information corresponding to the first graph data, and the first target matrix is a feature matrix containing the feature information of the M candidate nodes. Specifically, the initial feature matrix may be obtained by linearly transforming the node feature matrix corresponding to the first graph data, and the node feature matrix may be a feature matrix converted in advance from the feature information corresponding to the first graph data.

Because the M candidate nodes selected from the first evaluation result have relatively high structural importance scores, i.e., nodes with relatively rich structure are preliminarily selected along the structural-richness dimension (equivalent to a first-pass screening of dominant nodes), the initial feature matrix corresponding to the first graph data is then sparsified based on the M candidate nodes to obtain a preliminarily sparsified node feature matrix (the first target matrix). This removes the computation of feature similarities between the center node (any target node) and non-dominant nodes of low structural richness; while preserving the accuracy of node feature aggregation, it greatly reduces the number of nodes participating in similarity computation, thereby preliminarily reducing the computation of node feature similarities (equivalent to a preliminary sparsification of the attention scores).

The first target matrix can be regarded as a target matrix obtained by sparsifying the feature information along the evaluation dimension of structural information richness; it then serves as the base data for the feature-distinctiveness evaluation dimension, providing the basis for computing the M candidate nodes' feature importance scores.
In some example embodiments, the first target matrix may be obtained by sparsifying the initial feature matrix based on the node identifiers of the M candidate nodes using the following formula (written here as a row-selection operation, reconstructed from the surrounding definitions):

H1 = H0[D1]

where H1 denotes the first target matrix (the preliminarily sparsified node feature matrix obtained along the structural-richness dimension), H0 denotes the initial feature matrix (the node feature matrix after linear transformation and before sparsification), [D1] denotes selecting the rows indexed by the position information D1 of the M candidate nodes, and M = c1·log(P) with c1 a scalar. That is, the (P×F) initial feature matrix is converted into the (M×F) first target matrix.
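A minimal sketch of this sparsification follows; the feature dimension F and the random features are assumptions for illustration only:

```python
import numpy as np

# Sketch of the sparsification H1 = H0[D1]: keep only the rows of the
# (P x F) initial feature matrix that belong to the M candidate nodes.
P, F = 5, 4
rng = np.random.default_rng(0)
H0 = rng.normal(size=(P, F))  # initial feature matrix (after the linear map)
D1 = np.array([1, 0, 2])      # positions of the M candidate nodes
H1 = H0[D1]                   # (M x F) first target matrix
print(H1.shape)               # (3, 4)
```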
Step A23: determine the candidate node feature information corresponding to the first graph data based on the first target matrix.

In some example embodiments, the first target matrix may be taken directly as the candidate node feature information, or it may first undergo preset processing (such as normalization or feature fine-tuning), with the processed first target matrix taken as the candidate node feature information.
Regarding the specific implementation that reduces the computation of node feature similarities starting from feature distinctiveness: to simplify the feature importance evaluation and improve the accuracy of the corresponding second evaluation result, note that scoring node feature importance can be achieved by applying a projection transformation to the node feature matrix (i.e., focusing on each node's own salient features); the magnitude of an element of the resulting projection reflects the degree of the node's feature importance. The projection of the node feature matrix can therefore be used to obtain a feature importance score for each of the M candidate nodes, and the second evaluation result is then generated from these M scores. In addition, since the feature importance evaluation builds on the structural importance evaluation, it must be performed on the M candidate nodes on the basis of the candidate node feature information; accordingly, if the first target matrix was taken directly as the candidate node feature information during the structural importance evaluation, the candidate node feature information includes the first target matrix, and the feature importance evaluation is performed based on the first target matrix. On this basis, step A3 above, performing feature importance evaluation on the M candidate nodes based on the candidate node feature information to obtain the second evaluation result, specifically includes:
Step A31: determine a second score matrix based on the first target matrix and a first parameter matrix.

The first parameter matrix is an (F×1) parameter matrix, and the second score matrix is a column matrix containing the feature importance scores of the M candidate nodes, where F is an integer greater than 1. Specifically, the first parameter matrix is related to the trainable parameters of the model's network layer itself: its F elements correspond to the network-layer parameters of the output dimensions of a single-layer graph convolution computation, i.e., each element of the first parameter matrix corresponds to the network parameter of one output dimension.
In some example embodiments, the second score matrix may be obtained by projecting the first target matrix using the following formula:

F1 = H1W1

where F1 denotes the second score matrix (which can serve as the vector of node feature importance scores), H1 denotes the first target matrix, and W1 denotes the first parameter matrix (a trainable model parameter). That is, the (M×F) first target matrix multiplied by the (F×1) first parameter matrix yields the (M×1) second score matrix.
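The projection in step A31 is a single matrix product; a minimal sketch follows, with W1 drawn at random here purely as a stand-in for the trainable parameter:

```python
import numpy as np

# Sketch of step A31: project the (M x F) first target matrix with the
# trainable (F x 1) first parameter matrix W1 to get per-node scores.
rng = np.random.default_rng(1)
M, F = 3, 4
H1 = rng.normal(size=(M, F))  # first target matrix
W1 = rng.normal(size=(F, 1))  # trainable in the real model; random here
F1 = H1 @ W1                  # (M x 1) second score matrix
print(F1.ravel())
```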
Step A32: determine the second evaluation result for the M candidate nodes based on the second score matrix.

In some example embodiments, the second score matrix may be taken directly as the second evaluation result, or it may first undergo preset processing (such as normalization, score fine-tuning, or zeroing some elements), with the processed second score matrix taken as the second evaluation result.

For the process of determining the second evaluation result, the first target matrix contained in the candidate node feature information may be obtained through step A22 above or through other processing; this application does not limit it, i.e., any processing that yields a feature matrix containing the feature information of the M candidate nodes falls within the protection scope of this application.
Considering that the top-ranked feature importance scores in the second score matrix may still include relatively small values, selecting the K dominant nodes directly by the magnitudes of the M scores might include nodes with small feature importance scores. Aggregating a target node's features using the feature information of such low-distinctiveness nodes could reduce the accuracy of feature aggregation (a small feature importance score indicates low feature distinctiveness, which lowers the distinctiveness of the aggregated feature information). Therefore, the elements of the second score matrix whose feature importance scores satisfy a preset constraint (e.g., score below the mean, or below a preset threshold) may first be zeroed to obtain a third score matrix, and the K dominant nodes are then selected based on the third score matrix. On this basis, step A32 above, determining the second evaluation result for the M candidate nodes based on the second score matrix, specifically includes:
Step A321: determine a score deviation matrix based on the second score matrix, where the score deviation matrix is a column matrix containing the differences between the M candidate nodes' feature importance scores and a score mean; the score mean may be the average of the M feature importance scores in the second score matrix.

The values of the elements of the score deviation matrix depend on the preset constraint (the condition that decides which elements of the second score matrix are zeroed). For example, if the constraint zeroes elements whose feature importance scores are below the score mean, the M elements of the score deviation matrix are the differences between the scores and the score mean; if the constraint zeroes elements whose scores are below a preset threshold, the M elements are the differences between the scores and that threshold. The preset constraint can be set according to actual needs, and any setting falls within the protection scope of this application.
Taking the preset constraint of zeroing elements of the second score matrix whose feature importance scores are below the score mean as an example, the score deviation matrix may be determined using the following formula:

F = F1 − mean(F1)

where F denotes the score deviation matrix, F1 denotes the second score matrix, and mean(F1) denotes the mean score matrix (whose M elements all equal the score mean). That is, subtracting the (M×1) mean score matrix from the (M×1) second score matrix yields the (M×1) score deviation matrix.
Step A322: determine a third score matrix based on the second score matrix and a second reference matrix, where the second reference matrix is determined based on the score deviation matrix.

The second reference matrix may be determined as follows: if an element of the score deviation matrix is greater than a certain value (e.g., 0), the corresponding element of the second reference matrix is 1; if an element is less than or equal to that value, the corresponding element is 0. The second reference matrix is thus a 0-1 matrix.

In some example embodiments, the second reference matrix may be determined from the score deviation matrix using the following formula (reconstructed from the definition above):

F′ = 1(F > 0), i.e., F′_i = 1 if F_i > 0, and F′_i = 0 otherwise

where F′ denotes the second reference matrix (the 0-1 matrix constraining which elements of the second score matrix are zeroed), F denotes the score deviation matrix, and 1(·) denotes the element-wise indicator function. That is, the (M×1) score deviation matrix is converted into an (M×1) 0-1 matrix (the second reference matrix).
Each element of the third score matrix is the product of the corresponding elements of the second score matrix and the second reference matrix, so that elements of the second score matrix whose feature importance scores are less than or equal to the score mean are zeroed, while elements whose scores exceed the score mean remain unchanged.

In some example embodiments, the third score matrix may be obtained by element-wise multiplication of the second score matrix and the second reference matrix using the following formula:

F2′ = F′ ∗ F1

where F2′ denotes the third score matrix, F′ denotes the second reference matrix, F1 denotes the second score matrix, and ∗ denotes element-wise multiplication. That is, multiplying the (M×1) second reference matrix element-wise with the (M×1) second score matrix yields the (M×1) third score matrix, in which the elements of the second score matrix satisfying the preset constraint have been set to zero.
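A minimal sketch of steps A321 to A323 combined follows, using made-up scores and the below-mean zeroing constraint described above:

```python
import numpy as np

# Sketch of steps A321-A323: zero out feature-importance scores that do
# not exceed the score mean (the preset constraint used in the text).
F1 = np.array([[0.9], [0.1], [0.5]])   # second score matrix (M x 1)
Fdev = F1 - F1.mean()                  # score deviation matrix
Fmask = (Fdev > 0).astype(F1.dtype)    # second reference (0-1) matrix
F2p = Fmask * F1                       # third score matrix, element-wise
print(F2p.ravel())  # [0.9 0.  0. ] -- the mean is 0.5, only 0.9 exceeds it
```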
Step A323: determine the second evaluation result for the M candidate nodes based on the third score matrix.

The third score matrix may be taken directly as the second evaluation result, in which case the final feature importance score of any candidate node whose score is less than or equal to the score mean (i.e., whose feature distinctiveness is relatively low) becomes zero in the second evaluation result.

Because the final feature importance score of a low-distinctiveness candidate node becomes zero in the second evaluation result, selecting K dominant nodes from the M candidate nodes by the magnitudes of the scores both guarantees the number of dominant nodes ultimately screened out and ensures that only nodes whose feature importance scores exceed the score mean participate, as dominant neighboring nodes of a target node, in the feature similarity computation. Even if the K dominant nodes include a target node whose feature importance score is less than or equal to the score mean, its score in the third score matrix is zero, so its feature information does not affect the target node's feature aggregation.
Regarding the specific implementation that evaluates node importance along both structural richness and feature distinctiveness to reduce the computation of node feature similarities: after the second evaluation result characterizing the node feature importance scores has been determined, the first dominant node information is determined based on it (that is, the K dominant nodes are screened out first, and the feature information set or the feature importance scores are then sparsified to provide the base data for the later computation of node feature similarities). Specifically, step A4 above, determining the first dominant node information based on the second evaluation result, includes:
Step A41: select K dominant nodes from the M candidate nodes based on the second evaluation result.

The second evaluation result may include the second score matrix or the third score matrix; the K dominant nodes can be regarded as the dominant node subset screened out successively along the two evaluation dimensions of structural information richness and feature information distinctiveness.
Step A42: determine a second target matrix based on the K dominant nodes.

The second target matrix mainly serves as the base data for node feature similarity computation. The feature similarity between two adjacent nodes can be computed by comparing their node feature information; but since each node's feature importance score is itself derived from the content of its feature information, the similarity can also be computed directly from the nodes' feature importance scores, which avoids comparing feature information and further reduces the computation. Accordingly, the first dominant node information includes either the node feature information or the feature importance scores of the K dominant nodes, and the second target matrix correspondingly includes either a sparse feature matrix or a fourth score matrix for the K dominant nodes: the sparse feature matrix is a feature matrix containing the K dominant nodes' feature information, and the fourth score matrix is a score matrix containing their feature importance scores.

The second target matrix can be regarded as a target matrix obtained by sparsifying the feature information or the feature importance scores along the evaluation dimension of feature information distinctiveness; the feature similarity between two adjacent nodes is then computed based on the second target matrix.
For the case where the second target matrix includes the fourth score matrix: if the second evaluation result includes the second score matrix, the fourth score matrix may be obtained by sparsifying the second score matrix based on the node identifiers of the K dominant nodes using the following formula (written as a selection operation, reconstructed from the surrounding definitions):

F2 = F1[D2]

where F2 denotes the fourth score matrix, F1 denotes the second score matrix, and [D2] denotes selecting the elements indexed by the position information D2 of the K dominant nodes. That is, sparsifying the (M×1) second score matrix yields the (K×1) fourth score matrix (the second target matrix).
Alternatively, for the case where the second target matrix includes the fourth score matrix: if the second evaluation result includes the third score matrix, the fourth score matrix may be obtained by sparsifying the third score matrix based on the node identifiers of the K dominant nodes using the following formula:

F2 = F2′[D2]

where F2 denotes the fourth score matrix, F2′ denotes the third score matrix, and D2 denotes the position information of the K dominant nodes. That is, sparsifying the (M×1) third score matrix yields the (K×1) fourth score matrix (the second target matrix).
Target nodes outside the K dominant nodes can be regarded as long-tail nodes whose attention scores are negligible; none of them participates in node feature similarity computation. Since long-tail nodes account for a large proportion of the P target nodes, this greatly reduces the computation of node feature similarities. In addition, the number of nodes with nonzero scores in the fourth score matrix F2 is less than or equal to K, and nodes whose scores in F2 are zero can also be regarded as long-tail nodes. Although long-tail nodes do not participate in the feature similarity computation, they still participate in the graph convolution computation (i.e., feature aggregation); in practice, a uniform value, such as min(F2), can be used as the attention score of the long-tail nodes.
In another implementation, for the case where the second target matrix includes the sparse feature matrix, the initial feature matrix may be used as the base data for feature sparsification, and the sparse feature matrix may be obtained by sparsifying the initial feature matrix based on the node identifiers of the K dominant nodes using the following formula:

H2 = H0[D2]

where H2 denotes the sparse feature matrix (the second target matrix, i.e., the node feature matrix obtained by feature sparsification along both node-importance evaluation dimensions, structural information and feature information), H0 denotes the initial feature matrix, D2 denotes the position information of the K dominant nodes, and K = c2·log(P) with c2 a scalar and c1 > c2. That is, the (P×F) initial feature matrix is converted into the (K×F) sparse feature matrix (the second target matrix).
Alternatively, for the case where the second target matrix includes the sparse feature matrix, the first target matrix may be used as the base data for feature sparsification, and the sparse feature matrix may be obtained by sparsifying the first target matrix based on the node identifiers of the K dominant nodes using the following formula:

H2 = H1[D2]

where H2 denotes the sparse feature matrix (the second target matrix), H1 denotes the first target matrix, D2 denotes the position information of the K dominant nodes, and K = c2·log(P) with c2 a scalar and c1 > c2. That is, the (M×F) first target matrix is converted into the (K×F) sparse feature matrix (the second target matrix).
For the case where the second target matrix includes the sparse feature matrix: if the sparse feature matrix is obtained by directly sparsifying the initial feature matrix or the first target matrix based on the node identifiers of the K dominant nodes, the resulting sparse feature matrix may likewise contain the feature information of nodes with low feature distinctiveness. Therefore, to further improve feature aggregation accuracy, the feature information of the candidate nodes whose feature importance scores satisfy the preset constraint (i.e., the nodes whose elements are zero in the third score matrix) should also be zeroed in the initial feature matrix or the first target matrix to obtain a target feature matrix, and the sparse feature matrix is then obtained by sparsifying the target feature matrix based on the node identifiers of the K dominant nodes.
After the target feature matrix is obtained by zeroing, in the first target matrix, the feature information of candidate nodes whose feature importance scores satisfy the preset constraint, the sparse feature matrix may be obtained by sparsifying the target feature matrix based on the K dominant nodes using the following formula:

H2 = H1′[D2]

where H2 denotes the sparse feature matrix, H1′ denotes the target feature matrix, and D2 denotes the position information of the K dominant nodes. That is, sparsifying the (M×F) target feature matrix yields the (K×F) sparse feature matrix (the second target matrix).
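The following sketch illustrates steps A41 and A42 for the fourth-score-matrix case, together with the uniform long-tail attention score min(F2) mentioned earlier; the scores and K are made-up values:

```python
import numpy as np

# Sketch of steps A41/A42: gather the K dominant nodes' scores and give
# the remaining long-tail nodes a uniform attention score.
F2p = np.array([[0.9], [0.0], [0.7]])  # third score matrix (M x 1)
K = 2
D2 = np.argsort(-F2p.ravel())[:K]      # positions of the K dominant nodes
F2 = F2p[D2]                           # (K x 1) fourth score matrix
tail_score = F2.min()                  # uniform score for long-tail nodes
print(D2, F2.ravel(), tail_score)      # [0 2] [0.9 0.7] 0.7
```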
Since computing the feature similarity between two adjacent nodes directly from their feature importance scores avoids comparing node feature information and further reduces the computation, the second target matrix preferably includes the fourth score matrix.
Step A43: determine the first dominant node information corresponding to the first graph data based on the second target matrix.

The second target matrix may be taken directly as the first dominant node information, or it may first undergo preset processing (such as normalization or feature fine-tuning), with the processed second target matrix taken as the first dominant node information.
On the basis of Figure 4 above, Figure 5 shows a more specific implementation of determining the dominant user node information, which includes: determining a first score matrix based on the adjacency matrix corresponding to the target graph data and a first reference matrix, that is, performing structural importance evaluation based on the adjacency matrix and the first reference matrix to obtain the first score matrix, where the adjacency matrix is obtained from the edge data between the nodes of the target graph data, the first reference matrix is a column matrix containing P preset values, and the first score matrix is a column matrix containing the structural importance scores of the P user nodes; determining the first evaluation result for the P user nodes based on the first score matrix; selecting M candidate user nodes from the P user nodes based on the first evaluation result; determining a first target matrix based on the initial feature matrix corresponding to the target graph data and the M candidate user nodes, where the initial feature matrix is obtained from the feature information corresponding to the target graph data and the first target matrix is a feature matrix containing the feature information of the M candidate user nodes; determining a second score matrix based on the first target matrix (corresponding to the candidate node feature information) and a first parameter matrix, that is, performing feature importance evaluation based on the first target matrix and the first parameter matrix to obtain the second score matrix; determining the second evaluation result for the M candidate user nodes based on the second score matrix; selecting K dominant user nodes from the M candidate user nodes based on the second evaluation result; determining a second target matrix based on the K dominant user nodes; and determining the dominant user node information based on the second target matrix.
On the basis of Figure 5 above, for the process of determining the second evaluation result of the M candidate nodes based on the second score matrix, taking the case where the second target matrix includes the fourth score matrix as an example, Figure 6 shows another, more specific implementation, which includes: determining a score deviation matrix based on the second score matrix, where the score deviation matrix is a column matrix containing the differences between the M candidate user nodes' feature importance scores and the score mean; determining a third score matrix based on the second score matrix and a second reference matrix, where the second reference matrix is determined based on the score deviation matrix; and determining the second evaluation result for the M candidate user nodes based on the third score matrix.
After the K dominant nodes have been screened out along the two dimensions of structural information richness and feature information distinctiveness and the second target matrix has been determined, the node feature similarity computation may compute only the feature similarities between a center node (any target node) and its dominant neighboring nodes (that node's neighboring nodes among the K dominant nodes). If, during the feature importance evaluation, the second target matrix was taken directly as the first dominant node information, the first dominant node information includes the second target matrix, and node feature similarities are computed based on the second target matrix. Specifically, computing, based on the first dominant node information, the feature similarity between a target node and that node's neighboring nodes among the K dominant nodes in S1044 above includes: computing, based on the second target matrix, the feature similarity between the target node and its neighboring nodes among the K dominant nodes.

The second target matrix includes either the sparse feature matrix or the fourth score matrix. A target node's neighboring nodes among the K dominant nodes are the node's dominant neighboring nodes (i.e., its first-order dominant neighboring nodes among the K dominant nodes). Specifically, for a given center node (any target node), its dominant neighboring nodes are determined from the adjacency matrix corresponding to the first graph data: in the row matrix of the adjacency matrix corresponding to the target node, they are the target nodes whose elements equal 1 and whose column indices correspond to one of the dominant nodes.

For the case where the second target matrix includes the sparse feature matrix, the similarity between a target node's feature information and the feature information of its dominant neighboring nodes may be computed based on the sparse feature matrix.
The feature similarity between a target node and its dominant neighboring nodes may be computed from the sparse feature matrix using the following formula:

e_ij = LeakyReLU(a^T [z_i || z_j])

where e_ij denotes the feature similarity between node i and node j, LeakyReLU denotes the activation function, || denotes the vector concatenation operation, a^T denotes a trainable parameter that projects the concatenated vector to a one-dimensional value, z_i denotes the feature vector of node i with z_i ∈ H2, and z_j denotes the feature vector of node j with z_j ∈ H2.
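A minimal sketch of this concatenation-based similarity follows; the feature dimension, the random vectors, and the LeakyReLU slope of 0.2 are assumptions for illustration:

```python
import numpy as np

# Sketch of e_ij = LeakyReLU(a^T [z_i || z_j]) for one node pair,
# with feature vectors taken from the sparse feature matrix H2.
def leaky_relu(x, alpha=0.2):
    return np.where(x > 0, x, alpha * x)

rng = np.random.default_rng(2)
Fdim = 4
a = rng.normal(size=2 * Fdim)  # trainable projection vector a^T
z_i = rng.normal(size=Fdim)    # feature vector of the center node i
z_j = rng.normal(size=Fdim)    # feature vector of a dominant neighbor j
e_ij = leaky_relu(a @ np.concatenate([z_i, z_j]))
print(e_ij)
```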
Considering that the relative magnitudes of two nodes' feature importance scores can, to some extent, reflect how close the target node's features are to those of a dominant neighboring node (when the two scores are close and both relatively large, the two nodes' features are closer), the second target matrix may instead include the fourth score matrix, and the feature similarity between the target node and a dominant neighboring node may be computed based on the fourth score matrix. In this case there is no need to compare node feature information, which further reduces the computation; hence the feature similarity between a target node and its neighboring nodes among the K dominant nodes (its dominant neighboring nodes) may also be computed based on the fourth score matrix.
The feature similarity between a target node and its dominant neighboring nodes may be computed from the fourth score matrix using the following formula:

e_ij = LeakyReLU(y_i + y_j)

where e_ij denotes the feature similarity between node i and node j, LeakyReLU denotes the activation function, y_i denotes the feature importance score of node i with y_i ∈ F2, and y_j denotes the feature importance score of node j with y_j ∈ F2.
In addition, the feature similarity may be used directly as the weight coefficient, or it may first be normalized, with the normalized feature similarity used as the weight coefficient.
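The following sketch combines the score-based similarity with one possible normalization; using a softmax over the center node's dominant neighbors is an assumption borrowed from graph attention networks, since the text does not fix the normalization method, and all values are made up:

```python
import numpy as np

# Sketch of e_ij = LeakyReLU(y_i + y_j) for a center node against its
# dominant neighbors, followed by a softmax to get aggregation weights.
def leaky_relu(x, alpha=0.2):
    return np.where(x > 0, x, alpha * x)

y_i = 0.9                               # score of the center node
y_neighbors = np.array([0.7, 0.4, 0.8]) # scores of its dominant neighbors
e = leaky_relu(y_i + y_neighbors)       # one similarity per neighbor
weights = np.exp(e) / np.exp(e).sum()   # normalized weight coefficients
print(weights)
```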
Regarding the determination of the initial feature matrix corresponding to the first graph data, the classification model further includes a feature conversion layer, and each round of model training further includes: the feature conversion layer determines the initial feature matrix based on the node feature matrix corresponding to the first graph data and a second parameter matrix, where the node feature matrix is obtained from the feature information corresponding to the first graph data, each row of the node feature matrix corresponds to the C feature dimensions of one target node, the second parameter matrix is a (C×F) parameter matrix, and C and F are both integers greater than 1.
In some example embodiments, the initial feature matrix may be obtained by multiplying the node feature matrix by the second parameter matrix using the following formula:

H0 = XW0

where H0 denotes the initial feature matrix, X denotes the node feature matrix, and W0 denotes the second parameter matrix (a trainable model parameter). That is, the (P×C) node feature matrix multiplied by the (C×F) second parameter matrix yields the (P×F) initial feature matrix.
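A minimal sketch of the feature conversion layer follows; the dimensions and random matrices are illustrative assumptions, with W0 standing in for the trainable parameter:

```python
import numpy as np

# Sketch of the feature conversion layer: H0 = X @ W0 maps the (P x C)
# node feature matrix to the (P x F) initial feature matrix.
rng = np.random.default_rng(3)
P, C, Fdim = 5, 6, 4
X = rng.normal(size=(P, C))      # node feature matrix from the raw features
W0 = rng.normal(size=(C, Fdim))  # trainable second parameter matrix
H0 = X @ W0
print(H0.shape)                  # (5, 4)
```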
Since, when computing node feature similarities during model training, a given center node (any target node) only needs its similarities to that node's neighboring nodes among the K structurally rich and highly feature-distinctive dominant nodes, the overall computation of the training stage is greatly reduced, and node category prediction can therefore be completed within model training. On this basis, the P target nodes further include (P−N) unlabeled nodes, the (P−N) unlabeled nodes include X nodes to be classified, and the node category prediction result further includes the predicted category labels of the (P−N) unlabeled nodes.
After S104 above, in which the first graph data is input into the classification model to be trained for iterative training to obtain the trained classification model, the method further includes: determining the predicted category labels of the X nodes to be classified based on the node category prediction result output by the last round of training of the classification model, where X is an integer greater than or equal to 1.
The P target nodes include not only the labeled nodes used to compute the parameter loss value, but also first unlabeled nodes that provide additional feature information during feature aggregation, and second unlabeled nodes serving as nodes to be classified. That is, the P target nodes include N labeled nodes (sample nodes), X second unlabeled nodes (nodes to be classified), and (P−N−X) first unlabeled nodes (sample nodes).
For the node category prediction process, one implementation executes the model training stage and the node category prediction stage separately: the predicted category labels of the nodes to be classified are produced after model training is complete, and the (P−N) unlabeled nodes include only sample nodes, not nodes to be classified. Another implementation executes the two stages together: the predicted category labels of the nodes to be classified can be obtained in the last round of model training, and the (P−N) unlabeled nodes include both sample nodes and nodes to be classified. During model training, a node to be classified can participate in the iterative training of the model parameters as an unlabeled node; then, based on the prediction result of the last round, its predicted category label can be determined directly (model training has completed by then, so the node category prediction result is accurate).
For the specific application scenarios of classifying with the graph neural network model (the classification model described above), S102 above, obtaining the first graph data corresponding to the target classification task, specifically includes:
(1) If the target classification task is an article classification task, obtain the first graph data corresponding to the article classification task, the target nodes including article nodes.

The classification model may be a graph neural network model for article classification: the first graph data containing P article nodes is input into the graph neural network model to be trained for iterative training, and the trained model is taken as the classification model. Specifically, each target node corresponds to one article, and if a preset association (such as a citation relationship) exists between two articles, a connection edge exists between the corresponding target nodes.
(2) If the target classification task is a risky-user classification task, obtain the first graph data corresponding to the risky-user classification task, the target nodes including user nodes.

The first graph data may be constructed from a social network graph, with each user node corresponding to a user in that graph. The classification model may be a graph neural network model for risky-user identification: the first graph data containing P user nodes is input into the graph neural network model to be trained for iterative training, and the trained model is taken as the classification model. Specifically, each target node corresponds to one target user, and if a preset association (such as a transfer transaction) exists between two users, a connection edge exists between the corresponding target nodes.
(3) If the target classification task is a push-user classification task, obtain the first graph data corresponding to the push-user classification task, the target nodes including user nodes.

The first graph data may be constructed from a friend relationship graph, with each user node corresponding to a user in that graph. The classification model may be a graph neural network model for identifying target push users: the first graph data containing P user nodes is input into the graph neural network model to be trained for iterative training, and the trained model is taken as the classification model. Specifically, each target node corresponds to one target user, and if a preset association (such as a friend relationship) exists between two users, a connection edge exists between the corresponding target nodes.
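To make the graph construction in scenarios (1) to (3) concrete, the following sketch builds an adjacency matrix from a list of node pairs that have a preset association; the edge list itself is made up purely for illustration:

```python
import numpy as np

# Illustrative sketch: nodes are connected whenever a preset association
# holds between the two objects (citation, transfer transaction,
# friendship, ...). The edge list below is a made-up example.
P = 5
edges = [(0, 1), (1, 2), (1, 3), (3, 4)]  # pairs with a preset association
A = np.zeros((P, P))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0               # undirected connection edge
print(A)
```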
With the data processing method in the embodiments of this application, dominant node information is first screened out based on node edge data and node feature information, i.e., K dominant nodes with rich structure and highly distinctive features are located, and feature similarities are then computed based on the dominant nodes' relevant information. In this way, non-dominant nodes do not participate in the feature similarity computation as dominant neighboring nodes of a center node (any target node), which sparsifies the center node's neighborhood and thereby reduces the computation of node feature similarities (the attention-score computation between the center node and some neighboring nodes is omitted), improving model training efficiency. On the other hand, since the computation of node feature similarities can be kept precisely small already in the training stage, category prediction efficiency for the nodes to be classified is also guaranteed when classification model training and node category prediction are completed together (i.e., when the target nodes include nodes to be classified). Moreover, because the model parameters used for category prediction are obtained by iteratively updating with the nodes to be classified treated as unlabeled nodes, the accuracy of the model parameters is improved, achieving both category prediction efficiency and prediction accuracy.
Corresponding to the data processing method described in Figures 1 to 6 and based on the same technical concept, for the case where model training and node category prediction are executed separately, an embodiment of this application further provides a category identification method. Figure 7a is a schematic flowchart of the category identification method provided by an embodiment of this application; the method in Figure 7a can be executed by an electronic device provided with the trained classification model, and the electronic device may be a terminal device or a designated server. The classification model can be applied to any scenario in which a node in graph data needs to be classified, for example, predicting the category of an article node to be tested in graph data, or predicting the type of a user node to be identified (such as identifying whether a loan applicant is a risky user, or whether a registered user of a preset application is a push user for a target product). Specifically, the category identification method is applied to a classification model that includes a node screening layer, a feature aggregation layer, and a category prediction layer; as shown in Figure 7a, the method includes at least the following steps:
S700: the node screening layer determines the dominant node information of the dominant nodes based on graph data, the graph data including nodes to be classified and sample nodes, and the nodes to be classified and the sample nodes including the dominant nodes.

The graph data may include second graph data corresponding to the target classification task; the second graph data may include the nodes to be classified and the sample nodes, as well as node feature information and edge data between nodes.
S800: the feature aggregation layer computes, based on the dominant node information, the feature similarities between a node to be classified and its neighboring nodes, and performs feature aggregation based on the feature similarities and the neighboring nodes' feature information to obtain the aggregated feature information of the node to be classified.

The neighboring nodes may be the neighboring nodes of the node to be classified among the dominant nodes; that is, a neighboring node of a node to be classified is a dominant node, among the screened-out dominant nodes, that shares a connection edge with the node to be classified, which may also be called the node's dominant neighboring node.
S900: the category prediction layer determines the predicted category information of the node to be classified based on the aggregated feature information.

In some example embodiments, taking the case where the graph data includes the second graph data corresponding to the target classification task as an example, as shown in Figure 7b, the category identification method includes at least the following steps:
S702: obtain the second graph data corresponding to the target classification task; the second graph data includes X nodes to be classified and Q sample nodes, as well as node feature information and edge data between nodes; each node to be classified represents one object to be classified, the object to be classified being either an article to be classified or a user to be classified, and X and Q are both integers greater than or equal to 1.

The Q sample nodes may include the sample nodes used during training of the classification model, with Q less than or equal to P; the target classification task may be any one of an article classification task, a risky-user classification task, and a push-user classification task.
S704: input the second graph data into the trained classification model for node category prediction to obtain the predicted category information of the X nodes to be classified.

The classification model may be a graph neural network model (such as a graph attention network model); the trained classification model may be obtained by iteratively training the classification model to be trained with the data processing method described above, the specific implementation of which is detailed in steps S102 and S104 above and not repeated here.
The classification model may include a node screening layer, a feature aggregation layer, and a category prediction layer, and the node category prediction may proceed as follows:

The node screening layer determines second dominant node information based on the second graph data; the second dominant node information corresponds to L dominant nodes selected from the (X+Q) nodes, where L is an integer greater than 1 and less than (X+Q).

The process of determining the second dominant node information parallels the process of determining the first dominant node information described above and is not repeated here.

The feature aggregation layer computes, based on the second dominant node information, the feature similarities between a node to be classified and that node's neighboring nodes among the L dominant nodes, and aggregates features for the node to be classified based on the feature similarities and the corresponding neighboring nodes' feature information to obtain the node's aggregated feature information. A node to be classified's neighboring nodes among the L dominant nodes are the dominant nodes, among the L dominant nodes, that share a connection edge with the node to be classified, i.e., the dominant nodes having a preset association with it, namely its dominant neighboring nodes.

The process of determining the feature similarities between a node to be classified and its corresponding dominant neighboring nodes parallels the process, described above, of determining the feature similarities between a target node and that node's neighboring nodes among the K dominant nodes, and the feature aggregation process likewise parallels the embodiments above; neither is repeated here.

The category prediction layer determines the predicted category information of the X nodes to be classified based on the aggregated feature information.
In addition, when the X nodes to be classified are added, if preset information needed in the category prediction process remains unchanged, the preset information determined in the model training stage can be obtained directly without being re-determined. For example, the feature similarity between two of the Q sample nodes does not change when X nodes to be classified are added to the graph data, so the similarities between sample nodes computed in the model training stage can be used directly.
For the specific application scenarios of classifying with the graph neural network model (the classification model described above), S702 above, obtaining the second graph data corresponding to the target classification task, specifically includes:

(1) If the target classification task is an article classification task, obtain the second graph data corresponding to the article classification task, the nodes to be classified and the sample nodes each including article nodes.

(2) If the target classification task is a risky-user classification task, obtain the second graph data corresponding to the risky-user classification task, the nodes to be classified and the sample nodes each including user nodes.

(3) If the target classification task is a push-user classification task, obtain the second graph data corresponding to the push-user classification task, the nodes to be classified and the sample nodes each including user nodes.
With the category identification method in the embodiments of this application, in both the model training stage and the node category prediction stage, dominant node information can first be screened out based on node edge data and node feature information, i.e., multiple dominant nodes with rich structure and highly distinctive features are located, and feature similarities are then computed based on the dominant nodes' relevant information. In this way, non-dominant nodes do not participate in the feature similarity computation as dominant neighboring nodes of a center node (any target node), which sparsifies the center node's neighborhood and thereby reduces the computation of node feature similarities (the attention-score computation between the center node and some neighboring nodes is omitted), improving category prediction efficiency. Moreover, since more unlabeled nodes can join the iterative updating of the model parameters in the training stage, the trained model parameters are more accurate, and the category prediction accuracy is accordingly higher.
It should be noted that this embodiment and the previous embodiment of this application are based on the same inventive concept, so the specific implementation of this embodiment can refer to the implementation of the foregoing data processing method, and repetitions are not described again.
Corresponding to the data processing method described in Figures 1 to 6 and based on the same technical concept, an embodiment of this application further provides a data processing apparatus. Figure 8 is a schematic diagram of the module composition of the data processing apparatus provided by an embodiment of this application, and the apparatus is configured to execute the data processing method described in Figures 1 to 6. As shown in Figure 8, the apparatus includes: a graph data acquisition module 802, configured to obtain first graph data corresponding to a target classification task, the first graph data including P target nodes, the feature information of the target nodes, and the edge data between the target nodes, the P target nodes including N labeled nodes, where P and N are integers greater than 1 and N is less than P; and a model training module 804, configured to input the first graph data into a classification model to be trained for iterative training to obtain a trained classification model, where the classification model includes a node screening layer, a feature aggregation layer, and a category prediction layer, and each round of model training proceeds as follows: the node screening layer determines first dominant node information based on the edge data and the feature information, the first dominant node information corresponding to K dominant nodes selected from the P target nodes, where K is an integer greater than 1 and less than P; the feature aggregation layer computes, based on the first dominant node information, the feature similarities between a target node and that node's neighboring nodes among the K dominant nodes, and aggregates features for the target node based on the feature similarities and the neighboring nodes' feature information to obtain the target node's aggregated feature information; the category prediction layer determines a node category prediction result based on the aggregated feature information, the node category prediction result including the predicted category labels of the N labeled nodes; and the parameters of the classification model are updated based on the predicted category labels and the true category labels of the N labeled nodes.
The data processing apparatus in the embodiments of this application achieves the same effects: by first screening out dominant node information based on node edge data and node feature information (locating K dominant nodes with rich structure and highly distinctive features) and then computing feature similarities based on the dominant nodes' relevant information, non-dominant nodes do not participate in the feature similarity computation as dominant neighboring nodes of a center node (any target node), the center node's neighborhood is sparsified, the computation of node feature similarities is reduced (the attention-score computation between the center node and some neighboring nodes is omitted), and model training efficiency is improved. Furthermore, since the computation of node feature similarities can be kept precisely small already in the training stage, category prediction efficiency for the nodes to be classified is also guaranteed when classification model training and node category prediction are completed together (i.e., when the target nodes include nodes to be classified); and because the model parameters used for category prediction are obtained by iteratively updating with the nodes to be classified treated as unlabeled nodes, the accuracy of the model parameters is improved, achieving both category prediction efficiency and prediction accuracy.

It should be noted that the embodiment of the data processing apparatus and the embodiment of the data processing method in this application are based on the same inventive concept, so the specific implementation of this embodiment can refer to the implementation of the corresponding data processing method, and repetitions are not described again.
Corresponding to the category identification method described in Figure 7a and based on the same technical concept, an embodiment of this application further provides a category identification apparatus. Figure 9 is a schematic diagram of the module composition of the category identification apparatus provided by an embodiment of this application, and the apparatus is configured to execute the category identification method described in Figure 7a. As shown in Figure 9, the apparatus includes: a graph data acquisition module 902, configured to obtain second graph data corresponding to a target classification task, the second graph data including X nodes to be classified and Q sample nodes, each node to be classified representing one object to be classified, the object to be classified being either an article to be classified or a user to be classified, where X and Q are both integers greater than or equal to 1; and a category prediction module 904, configured to input the second graph data into a trained classification model for node category prediction to obtain the predicted category information of the nodes to be classified, where the classification model includes a node screening layer, a feature aggregation layer, and a category prediction layer, and the node category prediction proceeds as follows: the node screening layer determines second dominant node information based on the second graph data, the second dominant node information corresponding to L dominant nodes selected from the (X+Q) nodes, where L is an integer greater than 1 and less than (X+Q); the feature aggregation layer computes, based on the second dominant node information, the feature similarities between a node to be classified and that node's neighboring nodes among the L dominant nodes, and aggregates features for the node to be classified based on the feature similarities and the neighboring nodes' feature information to obtain the node's aggregated feature information; and the category prediction layer determines the predicted category information of the X nodes to be classified based on the aggregated feature information.
With the category identification apparatus in the embodiments of this application, in both the model training stage and the node category prediction stage, dominant node information can first be screened out based on node edge data and node feature information (locating multiple dominant nodes with rich structure and highly distinctive features), and feature similarities are then computed based on the dominant nodes' relevant information. Non-dominant nodes thus do not participate in the feature similarity computation as dominant neighboring nodes of a center node (any target node), the center node's neighborhood is sparsified, the computation of node feature similarities is reduced (the attention-score computation between the center node and some neighboring nodes is omitted), and category prediction efficiency is improved. Moreover, since more unlabeled nodes can join the iterative updating of the model parameters in the training stage, the trained model parameters are more accurate, and the category prediction accuracy is accordingly higher.

It should be noted that the embodiment of the category identification apparatus and the embodiment of the category identification method in this application are based on the same inventive concept, so the specific implementation of this embodiment can refer to the implementation of the corresponding category identification method, and repetitions are not described again.
Corresponding to the methods shown in Figures 1 to 7b and based on the same technical concept, an embodiment of this application further provides a computer device, configured to execute the data processing method described above, as shown in Figure 10.

Computer devices may differ considerably in configuration or performance. A computer device may include one or more processors 1001 and a memory 1002, and the memory 1002 may store one or more applications or data. The memory 1002 may be transient or persistent storage. An application stored in the memory 1002 may include one or more modules (not shown), each of which may include a series of computer-executable instructions for the computer device. Further, the processor 1001 may be configured to communicate with the memory 1002 and execute, on the computer device, the series of computer-executable instructions in the memory 1002. The computer device may also include one or more power supplies 1003, one or more wired or wireless network interfaces 1004, one or more input/output interfaces 1005, one or more keyboards 1006, and the like.
The computer device includes a memory and one or more programs, where the one or more programs are stored in the memory and may include one or more modules, each module may include a series of computer-executable instructions for the computer device, and one or more processors are configured to execute the one or more programs including computer-executable instructions for: obtaining first graph data corresponding to a target classification task, the first graph data including P target nodes, the feature information of the target nodes, and the edge data between the target nodes, the P target nodes including N labeled nodes, where P and N are integers greater than 1 and N is less than P; and inputting the first graph data into a classification model to be trained for iterative training to obtain a trained classification model, where the classification model includes a node screening layer, a feature aggregation layer, and a category prediction layer, and each round of model training proceeds as follows: the node screening layer determines first dominant node information based on the edge data and the feature information, the first dominant node information corresponding to K dominant nodes selected from the P target nodes, where K is an integer greater than 1 and less than P; the feature aggregation layer computes, based on the first dominant node information, the feature similarities between a target node and that node's neighboring nodes among the K dominant nodes, and aggregates features for the target node based on the feature similarities and the neighboring nodes' feature information to obtain the target node's aggregated feature information; the category prediction layer determines a node category prediction result based on the aggregated feature information, the node category prediction result including the predicted category labels of the N labeled nodes; and the parameters of the classification model are updated based on the predicted category labels and the true category labels of the N labeled nodes.
Alternatively, the computer device includes a memory and one or more programs, where the one or more programs are stored in the memory and may include one or more modules, each module may include a series of computer-executable instructions for the computer device, and one or more processors are configured to execute the one or more programs including computer-executable instructions for: obtaining second graph data corresponding to a target classification task, the second graph data including X nodes to be classified and Q sample nodes, each node to be classified representing one object to be classified, the object to be classified being either an article to be classified or a user to be classified, where X and Q are both integers greater than or equal to 1; and inputting the second graph data into a trained classification model for node category prediction to obtain the predicted category information of the nodes to be classified, where the classification model includes a node screening layer, a feature aggregation layer, and a category prediction layer, and the node category prediction proceeds as follows: the node screening layer determines second dominant node information based on the second graph data, the second dominant node information corresponding to L dominant nodes selected from the (X+Q) nodes, where L is an integer greater than 1 and less than (X+Q); the feature aggregation layer computes, based on the second dominant node information, the feature similarities between a node to be classified and that node's neighboring nodes among the L dominant nodes, and aggregates features for the node to be classified based on the feature similarities and the neighboring nodes' feature information to obtain the node's aggregated feature information; and the category prediction layer determines the predicted category information of the X nodes to be classified based on the aggregated feature information.
With the computer device in the embodiments of this application, dominant node information is first screened out based on node edge data and node feature information (locating K dominant nodes with rich structure and highly distinctive features), and feature similarities are then computed based on the dominant nodes' relevant information, so that non-dominant nodes do not participate in the feature similarity computation as dominant neighboring nodes of a center node (any target node). This sparsifies the center node's neighborhood, reduces the computation of node feature similarities (the attention-score computation between the center node and some neighboring nodes is omitted), and improves model training efficiency. Furthermore, since the computation of node feature similarities can be kept precisely small already in the training stage, category prediction efficiency for the nodes to be classified is also guaranteed when classification model training and node category prediction are completed together (i.e., when the target nodes include nodes to be classified); and because the model parameters used for category prediction are obtained by iteratively updating with the nodes to be classified treated as unlabeled nodes, the accuracy of the model parameters is improved, achieving both category prediction efficiency and prediction accuracy.

It should be noted that the embodiment of the computer device and the embodiment of the data processing method in this application are based on the same inventive concept, so the specific implementation of this embodiment can refer to the implementation of the corresponding data processing method, and repetitions are not described again.
Corresponding to the methods shown in Figures 1 to 7b and based on the same technical concept, an embodiment of this application further provides a storage medium for storing computer-executable instructions; the storage medium may be a USB flash drive, an optical disc, a hard disk, or the like. When executed by a processor, the computer-executable instructions stored on the storage medium can implement the following flow: obtaining first graph data corresponding to a target classification task, the first graph data including P target nodes, the feature information of the target nodes, and the edge data between the target nodes, the P target nodes including N labeled nodes, where P and N are integers greater than 1 and N is less than P; and inputting the first graph data into a classification model to be trained for iterative training to obtain a trained classification model, where the classification model includes a node screening layer, a feature aggregation layer, and a category prediction layer, and each round of model training proceeds as follows: the node screening layer determines first dominant node information based on the edge data and the feature information, the first dominant node information corresponding to K dominant nodes selected from the P target nodes, where K is an integer greater than 1 and less than P; the feature aggregation layer computes, based on the first dominant node information, the feature similarities between a target node and that node's neighboring nodes among the K dominant nodes, and aggregates features for the target node based on the feature similarities and the neighboring nodes' feature information to obtain the target node's aggregated feature information; the category prediction layer determines a node category prediction result based on the aggregated feature information, the node category prediction result including the predicted category labels of the N labeled nodes; and the parameters of the classification model are updated based on the predicted category labels and the true category labels of the N labeled nodes.

Alternatively, when executed by a processor, the computer-executable instructions stored on the storage medium can implement the following flow: obtaining second graph data corresponding to a target classification task, the second graph data including X nodes to be classified and Q sample nodes, each node to be classified representing one object to be classified, the object to be classified being either an article to be classified or a user to be classified, where X and Q are both integers greater than or equal to 1; and inputting the second graph data into a trained classification model for node category prediction to obtain the predicted category information of the nodes to be classified, where the classification model includes a node screening layer, a feature aggregation layer, and a category prediction layer, and the node category prediction proceeds as follows: the node screening layer determines second dominant node information based on the second graph data, the second dominant node information corresponding to L dominant nodes selected from the (X+Q) nodes, where L is an integer greater than 1 and less than (X+Q); the feature aggregation layer computes, based on the second dominant node information, the feature similarities between a node to be classified and that node's neighboring nodes among the L dominant nodes, and aggregates features for the node to be classified based on the feature similarities and the neighboring nodes' feature information to obtain the node's aggregated feature information; and the category prediction layer determines the predicted category information of the X nodes to be classified based on the aggregated feature information.
When the computer-executable instructions stored on the storage medium in the embodiments of this application are executed by a processor, dominant node information is first screened out based on node edge data and node feature information (locating K dominant nodes with rich structure and highly distinctive features), and feature similarities are then computed based on the dominant nodes' relevant information, so that non-dominant nodes do not participate in the feature similarity computation as dominant neighboring nodes of a center node (any target node). This sparsifies the center node's neighborhood, reduces the computation of node feature similarities (the attention-score computation between the center node and some neighboring nodes is omitted), and improves model training efficiency. Furthermore, since the computation of node feature similarities can be kept precisely small already in the training stage, category prediction efficiency for the nodes to be classified is also guaranteed when classification model training and node category prediction are completed together (i.e., when the target nodes include nodes to be classified); and because the model parameters used for category prediction are obtained by iteratively updating with the nodes to be classified treated as unlabeled nodes, the accuracy of the model parameters is improved, achieving both category prediction efficiency and prediction accuracy.

It should be noted that the embodiment of the storage medium and the embodiment of the data processing method in this application are based on the same inventive concept, so the specific implementation of this embodiment can refer to the implementation of the corresponding data processing method, and repetitions are not described again.
Specific embodiments of this application have been described above. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results; in certain implementations, multitasking and parallel processing are also possible or may be advantageous. Those skilled in the art should understand that the embodiments of this application may be provided as a method, a system, or a computer program product. Accordingly, the embodiments of this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, this application may take the form of a computer program product implemented on one or more computer-readable storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.

This application is described with reference to flowcharts and/or block diagrams of the methods, devices (systems), and computer program products according to the embodiments of this application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or another programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory. The memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in computer-readable media, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium. Computer-readable media include persistent and non-persistent, removable and non-removable media, and information storage can be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.

It should also be noted that the terms "comprise", "include", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.

The embodiments of this application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. One or more embodiments of this application may also be practiced in distributed computing environments where tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.

The embodiments of this application are described in a progressive manner; identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the system embodiment is described relatively simply because it is substantially similar to the method embodiment, and the relevant parts may refer to the description of the method embodiment. The above are merely embodiments of this document and are not intended to limit it. Various modifications and changes may be made to this document by those skilled in the art; any modification, equivalent replacement, improvement, and the like made within the spirit and principles of this document shall fall within the scope of the claims of this document.

Claims (15)

  1. A data processing method, applied to a classification model, the classification model comprising a node screening layer, a feature aggregation layer, and a category prediction layer, the method comprising:
    the node screening layer determining dominant node information of dominant nodes based on edge data and feature information of target nodes of graph data, the target nodes comprising labeled nodes, and the labeled nodes comprising the dominant nodes;
    the feature aggregation layer computing, based on the dominant node information, the feature similarity between the target node and the target node's neighboring nodes, and performing feature aggregation based on the feature similarity and the neighboring nodes' feature information to obtain the target node's aggregated feature information;
    the category prediction layer determining a node category prediction result based on the aggregated feature information, the node category prediction result comprising predicted category labels of the labeled nodes; and
    iteratively updating parameters of the classification model based on the predicted category labels and true category labels of the labeled nodes to obtain a trained classification model.
  2. The method according to claim 1, wherein determining the dominant node information of the dominant nodes based on the edge data and feature information of the target nodes of the graph data comprises:
    performing structural importance evaluation on the target nodes based on the edge data to obtain a first evaluation result;
    determining candidate node feature information based on the first evaluation result and the feature information, the candidate node feature information comprising feature information of candidate nodes selected from the target nodes;
    performing feature importance evaluation on the candidate nodes based on the candidate node feature information to obtain a second evaluation result; and
    determining the dominant node information based on the second evaluation result, the dominant node information comprising relevant information of dominant nodes selected from the candidate nodes, the relevant information comprising either feature importance scores or node feature information.
  3. The method according to claim 2, wherein performing structural importance evaluation on the target nodes based on the edge data to obtain the first evaluation result comprises:
    determining a first score matrix based on an adjacency matrix of the graph data and a first reference matrix, the adjacency matrix being obtained from the edge data, the first reference matrix being a column matrix containing a plurality of preset values, and the first score matrix being a column matrix containing structural importance scores of the target nodes; and
    determining the first evaluation result of the target nodes based on the first score matrix.
  4. The method according to claim 2, wherein determining the candidate node feature information based on the first evaluation result and the feature information comprises:
    selecting candidate nodes from the plurality of target nodes based on the first evaluation result;
    determining a first target matrix based on an initial feature matrix of the graph data and the candidate nodes, the initial feature matrix being obtained from the feature information, and the first target matrix being a feature matrix containing the candidate nodes' feature information; and
    determining the candidate node feature information based on the first target matrix.
  5. The method according to claim 4, wherein performing feature importance evaluation on the candidate nodes based on the candidate node feature information to obtain the second evaluation result comprises:
    determining a second score matrix based on the first target matrix, the second score matrix being a column matrix containing the candidate nodes' feature importance scores; and
    determining the second evaluation result of the candidate nodes based on the second score matrix.
  6. The method according to claim 5, wherein determining the second evaluation result of the candidate nodes based on the second score matrix comprises:
    determining a score deviation matrix based on the second score matrix, the score deviation matrix being a column matrix containing differences between the candidate nodes' feature importance scores and a score mean;
    determining a third score matrix based on the second score matrix and a second reference matrix, wherein the second reference matrix is determined based on the score deviation matrix; and
    determining the second evaluation result of the candidate nodes based on the third score matrix.
  7. The method according to claim 2, wherein determining the dominant node information based on the second evaluation result comprises:
    selecting dominant nodes from the plurality of candidate nodes based on the second evaluation result;
    determining a second target matrix based on the dominant nodes, the second target matrix comprising either a sparse feature matrix or a fourth score matrix, the sparse feature matrix being a feature matrix containing the dominant nodes' feature information, and the fourth score matrix being a score matrix containing the dominant nodes' feature importance scores; and
    determining the dominant node information based on the second target matrix.
  8. The method according to any one of claims 1 to 7, wherein the target nodes further comprise unlabeled nodes, the unlabeled nodes comprise nodes to be classified, and the node category prediction result further comprises predicted category labels of the unlabeled nodes; the method further comprising:
    determining the predicted category labels of the nodes to be classified based on the node category prediction result output by the last round of training of the classification model.
  9. The method according to any one of claims 1 to 7, further comprising:
    obtaining the graph data corresponding to a target classification task, wherein
    if the target classification task is an article classification task, the graph data of the article classification task is obtained, the target nodes comprising article nodes;
    if the target classification task is a risky-user classification task, the graph data of the risky-user classification task is obtained, the target nodes comprising user nodes; and
    if the target classification task is a push-user classification task, the graph data of the push-user classification task is obtained, the target nodes comprising user nodes.
  10. The method according to claim 9, wherein the feature information of the target nodes comprises feature information of target articles or target users, the edge data between the target nodes comprises a connection edge between two target nodes, and the connection edge represents a preset association between two target articles or between two target users.
  11. A category identification method, applied to a classification model, the classification model comprising a node screening layer, a feature aggregation layer, and a category prediction layer, the method comprising:
    the node screening layer determining dominant node information of dominant nodes based on graph data, the graph data comprising nodes to be classified and sample nodes, and the nodes to be classified and the sample nodes comprising the dominant nodes;
    the feature aggregation layer computing, based on the dominant node information, the feature similarity between a node to be classified and the node's neighboring nodes, and performing feature aggregation based on the feature similarity and the neighboring nodes' feature information to obtain the aggregated feature information of the node to be classified; and
    the category prediction layer determining predicted category information of the node to be classified based on the aggregated feature information.
  12. The method according to claim 11, further comprising:
    obtaining the graph data corresponding to a target classification task, wherein
    if the target classification task is an article classification task, the graph data of the article classification task is obtained, the nodes to be classified and the sample nodes each comprising article nodes;
    if the target classification task is a risky-user classification task, the graph data of the risky-user classification task is obtained, the nodes to be classified and the sample nodes each comprising user nodes; and
    if the target classification task is a push-user classification task, the graph data of the push-user classification task is obtained, the nodes to be classified and the sample nodes each comprising user nodes.
  13. A computer device, comprising:
    a processor; and
    a memory arranged to store computer-executable instructions, the executable instructions being configured to be executed by the processor and comprising steps for performing the method according to any one of claims 1 to 10 or any one of claims 11 to 12.
  14. A storage medium for storing computer-executable instructions, the executable instructions causing a computer to perform the method according to any one of claims 1 to 10 or any one of claims 11 to 12.
  15. A computer program product, comprising a non-transitory computer-readable storage medium storing a computer program, the computer program being operable to cause a computer to perform the method according to any one of claims 1 to 10 or any one of claims 11 to 12.
PCT/CN2023/132685 2022-12-08 2023-11-20 Data processing method, category identification method, and computer device WO2024120166A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211570272.6A 2022-12-08 2022-12-08 Training method for classification model, category identification method, and computer device
CN202211570272.6 2022-12-08
