CN111967271A - Analysis result generation method, device, equipment and readable storage medium - Google Patents

Analysis result generation method, device, equipment and readable storage medium

Info

Publication number
CN111967271A
Authority
CN
China
Prior art keywords: node, semantic, features, heterogeneous graph, updated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010839225.1A
Other languages
Chinese (zh)
Inventor
吕肖庆
张晨睿
林衍凯
李鹏
周杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Tencent Technology Shenzhen Co Ltd
Original Assignee
Peking University
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University and Tencent Technology Shenzhen Co Ltd
Priority to CN202010839225.1A
Publication of CN111967271A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/30 Semantic analysis
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a method, an apparatus, a device and a readable storage medium for generating an analysis result, and relates to the field of machine learning. The method comprises the following steps: acquiring a heterogeneous graph structure of a target heterogeneous graph, the heterogeneous graph structure comprising node data and edge data; determining initial node features corresponding to the node data and semantic aspects corresponding to the edge data; and embedding the initial node features with the heterogeneous graph structure as a random variable and the semantic aspects as hidden variables to obtain heterogeneous graph features of the target heterogeneous graph, the heterogeneous graph features comprising updated semantic features and updated node features. By treating the semantic aspects as hidden variables, the embedding process extracts implicit semantics directly from the heterogeneous graph and updates both the node feature vectors and the semantic feature vectors, avoiding the need to update node features through manually set meta-paths; this improves the efficiency and accuracy of the node feature update and yields higher accuracy when generating downstream analysis results.

Description

Analysis result generation method, device, equipment and readable storage medium
Technical Field
The embodiment of the application relates to the field of machine learning, in particular to a method, a device, equipment and a readable storage medium for generating an analysis result.
Background
Graph embedding is an algorithm that converts an attribute graph into a vector or a set of vectors. A heterogeneous graph is a graph data structure containing multiple types of nodes and multiple types of edges; heterogeneous graph embedding therefore refers to an algorithm that represents the structural and semantic information of a heterogeneous graph as node vectors, such that the finally obtained updated node features satisfy the condition that semantically related nodes are close to each other in the embedding space while semantically unrelated nodes are far from each other.
In the related art, a heterogeneous graph is embedded by setting meta-paths, where a meta-path represents a node type sequence with a specific semantic meaning and is predefined by developers according to domain knowledge. The meta-path guides random walks on the heterogeneous graph to obtain positive samples conforming to the meta-path and negative samples not conforming to it, and the positive and negative samples are input into a feature embedding model so as to update the node features.
However, because the number of manually defined meta-paths is limited, some potential and complex semantic dependencies between nodes cannot be covered, so the node feature expression is deficient and the update accuracy of the node features is poor.
Disclosure of Invention
The embodiment of the application provides a method, a device and equipment for generating an analysis result and a readable storage medium, which can improve the update accuracy of node characteristics. The technical scheme is as follows:
in one aspect, a method for generating an analysis result is provided, the method including:
acquiring a heterogeneous graph structure of a target heterogeneous graph, wherein the heterogeneous graph structure comprises node data and edge data;
determining initial node features corresponding to the node data and semantic aspects corresponding to the edge data;
embedding the initial node features by taking the heterogeneous graph structure as a random variable and the semantic aspect as a hidden variable to obtain heterogeneous graph features of the target heterogeneous graph, wherein the heterogeneous graph features comprise updated semantic features and updated node features;
and analyzing the updated semantic features and the updated node features to generate an analysis result corresponding to the node data.
In another aspect, an apparatus for generating an analysis result is provided, the apparatus including:
the acquisition module is used for acquiring a heterogeneous graph structure of a target heterogeneous graph, wherein the heterogeneous graph structure comprises node data and edge data, the node data corresponds to an initial node characteristic, and the edge data corresponds to a semantic aspect;
a determining module, configured to determine an initial node feature corresponding to the node data and a semantic aspect corresponding to the edge data;
the embedding module is used for embedding the initial node features by taking the heterogeneous graph structure as a random variable and taking the semantic aspect as a hidden variable to obtain heterogeneous graph features of the target heterogeneous graph, wherein the heterogeneous graph features comprise updated semantic features and updated node features;
and the analysis module is used for analyzing the updated semantic features and the updated node features to generate an analysis result corresponding to the node data.
In another aspect, a computer device is provided, which includes a processor and a memory, where at least one instruction, at least one program, a set of codes, or a set of instructions is stored in the memory, and the at least one instruction, the at least one program, the set of codes, or the set of instructions is loaded and executed by the processor to implement the method for generating an analysis result as described in any one of the embodiments of the present application.
In another aspect, a computer-readable storage medium is provided, in which at least one instruction, at least one program, a set of codes, or a set of instructions is stored, and the at least one instruction, the at least one program, the set of codes, or the set of instructions is loaded and executed by a processor to implement the method for generating an analysis result as described in any one of the embodiments of the present application.
In another aspect, a computer program product or computer program is provided, the computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions to make the computer device execute the method for generating the analysis result in any of the above embodiments.
The beneficial effects brought by the technical scheme provided by the embodiment of the application at least comprise:
the heterogeneous graph structure is used as a random variable, the semantic aspect is used as a hidden variable, so that the heterogeneous graph is embedded, the hidden semantics are obtained from the heterogeneous graph, the node feature vector is updated, the semantic feature vector is updated, more accurate node feature vectors and semantic feature vectors are obtained, the node features are prevented from being updated in a meta-path setting mode, the updating efficiency of the node features is improved, and the task execution accuracy in a downstream prediction task is higher.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the present application, and that those skilled in the art can obtain other drawings based on these drawings without creative effort.
Fig. 1 is a schematic diagram of extracting node features of a heterogeneous graph by setting a meta path in the related art according to an exemplary embodiment of the present application;
FIG. 2 is a diagram illustrating a heterogeneous graph attention network extracting node features of a heterogeneous graph in the related art according to an exemplary embodiment of the present application;
FIG. 3 is a block diagram of an overall scheme provided by an exemplary embodiment of the present application;
FIG. 4 is a flow chart of a method for generating analysis results provided by an exemplary embodiment of the present application;
FIG. 5 is a flow chart of a method of generating analysis results provided by another exemplary embodiment of the present application;
FIG. 6 is a flow chart of a method of generating analysis results provided by another exemplary embodiment of the present application;
fig. 7 is a block diagram of an apparatus for generating an analysis result according to an exemplary embodiment of the present application;
fig. 8 is a block diagram of an apparatus for generating an analysis result according to another exemplary embodiment of the present application;
fig. 9 is a block diagram of a server according to an exemplary embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
First, a brief description is given of terms referred to in the embodiments of the present application:
artificial Intelligence (AI): the method is a theory, method, technology and application system for simulating, extending and expanding human intelligence by using a digital computer or a machine controlled by the digital computer, sensing the environment, acquiring knowledge and obtaining the best result by using the knowledge. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
The artificial intelligence technology is a comprehensive subject and relates to the field of extensive technology, namely the technology of a hardware level and the technology of a software level. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
Machine Learning (ML): the method is a multi-field cross discipline and relates to a plurality of disciplines such as probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and the like. The special research on how a computer simulates or realizes the learning behavior of human beings so as to acquire new knowledge or skills and reorganize the existing knowledge structure to continuously improve the performance of the computer. Machine learning is the core of artificial intelligence, is the fundamental approach for computers to have intelligence, and is applied to all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and teaching learning.
Heterogeneous Graph: also known as a Heterogeneous Information Network, is a graph data structure that contains multiple types of nodes as well as multiple types of edges. Illustratively, taking the application scenario of a recommendation system as an example, the entities in the recommendation system include user accounts, posts published by users, and recommended content uploaded by advertisers; the association relations between the entities include friend relations between user accounts and interactions between user accounts and posts (such as likes, shares, and mentions of an account in a post). For such a recommendation system, the entities are expressed as nodes in the heterogeneous graph, and the association relations between the entities are expressed as edges between the nodes in the heterogeneous graph.
Graph embedding: an algorithm that converts an attribute graph into a vector or a set of vectors. Heterogeneous Graph Embedding (HGE) refers to an algorithm that represents the structural and semantic information of a heterogeneous graph as node vectors, such that the finally obtained updated node features satisfy the condition that semantically related nodes are close to each other in the embedding space while semantically unrelated nodes are far from each other.
In the related art, the heterogeneous graph is embedded by setting a Meta-path, which refers to a node type sequence with a specific semantic meaning. For example, in a paper-citation heterogeneous graph network, a meta-path "A-P-C" represents that a Paper (P) written by an Author (A) is published at an academic Conference (C). In the related art, meta-path-based heterogeneous graph embedding methods include the following two types:
firstly, random walks are performed on the heterogeneous graph using a meta-path: the predefined meta-path serves as a conditional filter, samples conforming to the meta-path are taken as positive samples and samples not conforming to it as negative samples, the positive and negative samples are input into a feature embedding model, and updated node features are output;
referring to fig. 1, schematically, taking a paper-citation heterogeneous graph 100 as an example, which contains an institution node 110, an author node 120, a paper node 130 and an academic conference node 140, random walks are performed on the heterogeneous graph 100 according to a predefined meta-path 150, a plurality of positive sample sequences are sampled from the input heterogeneous graph 100, and the sampled positive sample sequences are encoded and input into a Skip-Gram model 160. The Skip-Gram model 160 predicts the probability that the context nodes of a node occur, i.e., the probability that the selected node and its neighbors co-occur, so that the co-occurrence probability of the nodes in a positive sample sequence is as large as possible. Similarly, the model randomly samples multiple negative samples from the heterogeneous graph 100 and inputs them into the Skip-Gram model 160 so as to minimize the co-occurrence probability of the nodes in the negative samples. Through this constraint, the updated node features satisfy the condition that semantically related nodes are close to each other in the embedding space and semantically unrelated nodes are far from each other.
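To make the sampling procedure concrete, the following is a minimal Python sketch of a meta-path-guided random walk; the toy graph, node identifiers and the helper name metapath_walk are illustrative assumptions, not part of the patent.

    import random

    # Toy paper-citation heterogeneous graph: adjacency lists plus a node-type
    # table. Node ids and types (A=author, P=paper, C=conference) are made up.
    node_type = {"a1": "A", "a2": "A", "p1": "P", "p2": "P", "c1": "C"}
    adj = {
        "a1": ["p1"], "a2": ["p1", "p2"],
        "p1": ["a1", "a2", "c1"], "p2": ["a2", "c1"],
        "c1": ["p1", "p2"],
    }

    def metapath_walk(start, metapath):
        # Walk constrained to a meta-path such as A-P-C: at each step only
        # neighbors whose type matches the next entry of the meta-path may
        # be visited.
        if node_type[start] != metapath[0]:
            return None
        walk = [start]
        for want in metapath[1:]:
            candidates = [v for v in adj[walk[-1]] if node_type[v] == want]
            if not candidates:
                return None  # walk fails if no neighbor has the wanted type
            walk.append(random.choice(candidates))
        return walk

    # A walk conforming to the meta-path is a positive sample; uniformly
    # sampled node sequences serve as negatives for the Skip-Gram model.
    positive = metapath_walk("a1", ["A", "P", "C"])
    negative = random.sample(list(node_type), 3)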
Secondly, the meta-path is used as prior information to guide a graph neural network to learn the weights of the neighbor nodes corresponding to a central node, and the node features are weighted by these weights during aggregation, so as to obtain updated node features.
Referring to fig. 2, for example, taking a Heterogeneous graph Attention Network (HAN) as an example, learning is performed at two levels, a node level 210 and a semantic level 220: the attention at the node level 210 mainly learns the weights between a node and its neighboring nodes, and the attention at the semantic level 220 mainly learns the weights of different meta-paths.
However, because the number of manually defined meta-paths is limited, some potential and complex semantic dependencies between nodes cannot be covered, so the node feature expression is deficient and the update accuracy of the node features is poor.
In conjunction with the above noun introduction, the application scenario involved in the embodiment of the present application is illustrated:
taking an application scene of the recommendation system as an example, entities in the recommendation system comprise user accounts, dynamic states issued by users and recommendation contents uploaded by advertisers; the incidence relation between the entities comprises the friend relation between the user accounts and the interaction condition between the user accounts and the dynamic state (such as praise, forwarding, account number dynamic mention and the like). Then, for the above recommendation system, the entities are expressed by nodes in the heterogeneous graph, and the association relationship between the entities is expressed by edges between the nodes in the heterogeneous graph. By the analysis result generation method provided by the embodiment of the application, the updated node feature and the updated semantic feature corresponding to the heterogeneous graph of the recommendation system are obtained, and the updated node feature and the updated semantic feature imply semantic information and structural information contained in the heterogeneous graph of the recommendation system.
After the heterogeneous graph features are determined, they are applied to content recommendation. Schematically, association prediction is performed on a first node corresponding to a user account and a second node corresponding to candidate recommended content according to the heterogeneous graph features, so as to obtain an association prediction result, where the association prediction result is used to indicate the predicted degree of interest of the user account in the candidate recommended content. Optionally, when performing the association prediction, the heterogeneous graph features are used as input parameters of a content recommendation model, which is a neural network model obtained by pre-training; association prediction is performed on the first node and the second node through the content recommendation model to obtain the association prediction result, and the target recommended content to be recommended to the target user account is output.
Optionally, the heterogeneous graph features include the updated node features and the semantic features between nodes, and the content recommendation model is configured to determine the degree of association between the first node and the second node according to the updated node features and semantic features, and to determine, according to these degrees of association, the candidate recommended content most associated with the target user account as the target recommended content.
The above example describes the method as applied to a recommendation system; the method for generating an analysis result provided by the present application may also be applied to other scenarios in which a heterogeneous graph is embedded to obtain updated node features and semantic features and the updated features are applied to a downstream task, which is not limited in the embodiments of the present application.
Referring to fig. 3, which schematically shows a framework diagram of the overall scheme provided by an exemplary embodiment of the present application, taking two semantic aspect types as an example: for a heterogeneous graph 300, the structure of the heterogeneous graph 300 is taken as a random variable, and the initial node features of the heterogeneous graph 300 are embedded with semantic aspect 310 as a hidden variable and with semantic aspect 320 as a hidden variable, so as to obtain updated node features and semantic features.
With reference to the above noun introduction and application scenario, a method for generating an analysis result provided in the present application is described, and taking application of the method to a server as an example, as shown in fig. 4, the method includes:
step 401, obtaining a heterogeneous graph structure of a target heterogeneous graph, where the heterogeneous graph structure includes node data and edge data.
In one example, the heterogeneous graph structure is expressed as G = (V, E), where V represents the node set of the heterogeneous graph, namely the node data, and E represents the edge set of the heterogeneous graph, namely the edge data. The type of each node is determined by the mapping φ: V → T, where T denotes the set of node types, and the type of each edge is determined by the mapping ψ: E → R, where R denotes the set of edge types. For a heterogeneous graph, |T| + |R| > 2.
Step 402, determining initial node features corresponding to the node data and semantic aspects corresponding to the edge data.
The initial node features correspond to the attribute data of each node, and the attribute data include the node type, node name and the like corresponding to the node. Illustratively, the target heterogeneous graph is a recommendation system network including a node A, where the node type corresponding to node A is a user account, and the node name corresponding to node A is "00123", used to represent the account identifier of the user account.
The semantic aspects correspond to the association relations between nodes in the heterogeneous graph. Schematically, taking a movie recommendation system as an example, the node types include: 1. user account; 2. movie name; 3. movie category; 4. director; 5. lead actor. The semantic aspects illustratively include: movies watched by the user account, movie categories liked by the user account, directors liked by the user account, lead actors liked by the user account, and the like.
The initial feature vector is expressed as X ∈ R^{|V|×d_in}, where |V| represents the number of nodes and d_in represents the dimension of the input initial feature vector.
The semantic aspect is expressed as A (aspect), and the semantic aspects further correspond to a semantic number, which is used to indicate the number of semantic aspect types indicated by the edge data; that is, the semantic number is specified as a prior condition.
Step 403, embedding the initial node features with the heterogeneous graph structure as a random variable and the semantic aspect as a hidden variable, to obtain the heterogeneous graph features of the target heterogeneous graph.
In one example, a heterogeneous graph structure is input into a probability graph generation model, the heterogeneous graph structure is used as a random variable, the semantic aspect is used as a hidden variable, and initial node features are embedded through the probability graph generation model.
And when the heterogeneous graph structure is input into the probability graph generation model, inputting the initial node characteristics and the semantic number into the probability graph generation model, so that the probability graph generation model updates the initial node characteristics by combining with the input content.
Probabilistic Graphical Models (PGMs) refer to a mathematical tool that represents probability distributions by graph structure, where nodes of a probability graph represent random variables and edges of the graph represent dependencies between the random variables.
The heterogeneous graph features include the updated semantic features and the updated node features.
The probability graph generation model also corresponds to generative model parameters. In response to the generative model parameters being trained parameters, the initial node features are directly embedded through the probability graph generation model to obtain the updated semantic features and updated node features; in response to the generative model parameters being untrained parameters, the model parameters need to be iteratively adjusted according to the heterogeneous graph structure.
The process by which the probability graph generation model generates the updated node features and semantic features is expressed as the following formula one:

Formula one: p_φ(G, A, X) = p_φ(G | A, X) p_φ(A | X) p(X)

where φ represents the generative model parameters of the probability graph generation model, and the distribution of the initial node features X is non-parameterized. In the formalization of the generation process, the heterogeneous graph is assumed to be generated from the input initial node features and the semantic hidden variables; that is, the joint probability distribution of G, A and X is expanded by the chain rule, and the topological structure of the heterogeneous graph is generated by the joint action of the initial node features X and the unobserved semantic hidden variables A.
The generative model parameter φ can be solved by maximum likelihood estimation, i.e., by maximizing log p_φ(G | X). Because hidden variables exist in the model, the maximization problem cannot be solved directly, so a variational Expectation Maximization (vEM) algorithm is adopted.
Optionally, the maximum likelihood estimation function log p_φ(G | X) of the model parameters corresponding to the heterogeneous graph structure and the initial node features is determined, the model parameters are adjusted through the maximum likelihood estimation function, and the initial node features are embedded through the probability graph generation model with the adjusted model parameters.
The maximum likelihood estimation function is converted into an expression over a variational distribution q_θ(A | X) and a posterior distribution p_φ(A | G, X), where the variational distribution corresponds to a variational model parameter θ and the posterior distribution corresponds to the generative model parameter φ; the variational model parameter is adjusted according to the divergence requirement between the variational distribution and the posterior distribution, and the generative model parameter is adjusted according to the adjusted variational model parameter. Illustratively, the maximum likelihood estimation function is decomposed as the following formula two:

Formula two: log p_φ(G | X) = KL(q_θ(A | X) ‖ p_φ(A | G, X)) + E_{q_θ(A|X)}[log p_φ(G, A | X) − log q_θ(A | X)]

where KL denotes the KL Divergence (Kullback-Leibler Divergence), which measures the difference between two probability distributions, and the second term is a lower bound of the likelihood function, called the Evidence Lower Bound (ELBO); the maximization of the maximum likelihood estimation function is realized by maximizing the evidence lower bound.
The variational model parameter and the generative model parameter are adjusted through the vEM algorithm. The vEM algorithm is divided into an E step and an M step, where the E step represents an inference procedure and the M step represents a learning procedure. In the E step, the variational model parameter θ is first adjusted according to the divergence requirement between the variational distribution and the posterior distribution; in the M step, the generative model parameter φ is adjusted in combination with the adjusted variational model parameter θ. The variational model parameter θ and the generative model parameter φ are thus updated iteratively by the vEM algorithm. The E step and the M step are described separately below.
The E step represents the inference procedure, in which the posterior distribution p_φ(A | G, X) of the probability graph generation model needs to be estimated. Because the hidden variable A and the node features X have complex correlations, the estimation result cannot be obtained directly; therefore, according to the vEM algorithm, the posterior distribution p_φ(A | G, X) is fixed in the E step, and the variational distribution q_θ(A | X) is updated to approximate the true posterior distribution.
Optionally, according to amortized inference, the variational distribution q_θ(A | X) is instantiated with a graph neural network, denoted GNN_θ, and the variational distribution is rewritten according to mean-field theory as the following formula three:

Formula three: q_θ(A | X) = ∏_{k=1}^{K} q_θ(a_k | X)

where K represents the number of aspect hidden variables in the heterogeneous graph and a_k represents the k-th aspect variable.
The optimization objective of GNN_θ is defined as the following formula four:

Formula four: θ* = argmin_θ KL(q_θ(A | X) ‖ p_φ(A | G, X))
according to the above optimization target formula, the variation distribution qθThe optimization of (a | X) needs to satisfy the following formula five:
the formula five is as follows:
Figure BDA0002640813400000093
where C is const, which is a constant. Illustratively, with one of the variables a0For example, the derivation process of the above formula five is described, please refer to the following formula six:
Figure BDA0002640813400000094
logF (a) in the above derived formula0) The following formula seven is developed:
the formula seven:
Figure BDA0002640813400000101
based on the above derivation, optimal
Figure BDA0002640813400000102
The KL divergence is taken to be zero, at which time the two distributions within the KL divergence are equal. I.e. with reference to the following equation eight:
the formula eight:
Figure BDA0002640813400000103
The M step represents the learning procedure, in which the variational distribution q_θ(A | X) is fixed and the posterior distribution p_φ(A | G, X) is updated; the optimization of the parameter φ is as the following formula nine:

Formula nine: φ* = argmax_φ E_{q_θ(A | X)}[log p_φ(G | A, X) + log p_φ(A | X)]

where log p_φ(G | A, X) and log p_φ(A | X) are instantiated as the graph neural network GNN_φ.
That is, the probability graph generation model includes the above GNN_θ and GNN_φ; the initial node features are embedded through GNN_θ and GNN_φ, and the updated semantic features and updated node features are output.
Step 404, analyzing the updated semantic features and the updated node features to generate an analysis result corresponding to the node data.
Optionally, performing association prediction on node data in the heterogeneous graph structure according to the heterogeneous graph characteristics to obtain an association prediction result as an analysis result.
And the association prediction result is used for indicating the predicted association relation between the node data.
The association prediction result is then put to practical use on the heterogeneous graph structure; schematically, the application of the association prediction result is illustrated as follows:
Firstly, the node data comprises a first node and a second node, where the first node corresponds to account data and the second node corresponds to candidate recommended content; association prediction is performed on the first node and the second node according to the heterogeneous graph features to obtain an association prediction result, which is used to indicate the predicted degree of interest of the account data in the candidate recommended content.
Optionally, candidate recommended content with a high predicted degree of interest is sent to the target account according to the association prediction result; or the number of target accounts with a high predicted degree of interest in the candidate recommended content is predicted according to the association prediction result, so as to obtain a predicted promotion effect for the candidate recommended content.
Optionally, in the process of performing the association prediction, the heterogeneous graph features are input into a recommendation model, which is a machine learning model obtained by pre-training, and association prediction is performed on the first node and the second node through the recommendation model to obtain the association prediction result.
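The following sketch shows how the association prediction of this first application could be wired up from the updated node features; the dot-product scorer and all names are illustrative assumptions, since the patent only specifies a pre-trained machine learning model.

    import torch

    def recommend(h, account_idx, candidate_idx, top_n=3):
        # h: (num_nodes, d_out) updated node features from the embedding step.
        # A dot product between the account node and each candidate-content
        # node is assumed here as the degree-of-association score.
        scores = h[candidate_idx] @ h[account_idx]
        order = torch.argsort(scores, descending=True)
        return [candidate_idx[i] for i in order[:top_n].tolist()]

    h = torch.randn(10, 16)  # toy embeddings for a 10-node heterogeneous graph
    print(recommend(h, account_idx=0, candidate_idx=[3, 5, 7, 9]))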
Secondly, the node data comprises a third node and a fourth node, where the third node corresponds to abnormality description content and the fourth node corresponds to an abnormal state type; the abnormality description content describes an abnormal condition of a program, and the abnormal state type is the program running state diagnosis result corresponding to the abnormality description content. Association prediction is performed on the third node and the fourth node according to the heterogeneous graph features to obtain an association prediction result, which is used to indicate the inferred abnormal state type corresponding to each abnormality description content.
Optionally, the abnormal state type with the highest degree of association with the target abnormality description content is determined according to the association prediction result, so that the program abnormality is handled with the solution corresponding to that abnormal state type.
Optionally, in the process of performing the association prediction, the heterogeneous graph features are input into an anomaly prediction model, which is a machine learning model obtained by pre-training, and association prediction is performed on the third node and the fourth node through the anomaly prediction model to obtain the association prediction result.
In summary, in the analysis result generation method provided by this embodiment, the heterogeneous graph structure is used as a random variable and the semantic aspect as a hidden variable, so that the heterogeneous graph is embedded: implicit semantics are obtained from the heterogeneous graph, and the node feature vectors and semantic feature vectors are updated to more accurate values. This avoids updating node features by setting meta-paths, improves the update efficiency of the node features, and yields higher task execution accuracy in downstream tasks.
In this embodiment, within the probability graph generation model, the maximum likelihood estimation function is solved with the aid of the variational distribution, so that the graph neural networks for updating the node features and semantic features are obtained by derivation, which improves the accuracy of the node update.
Schematically, the algorithm of the overall scheme framework provided by the embodiment of the present application is as follows:

    Input: heterogeneous graph G = (V, E), initial node feature X ∈ R^{|V|×d_in}, aspect number K
    Output: node embedding H ∈ R^{|V|×d_out}
    1: while not converged do
    2:   E-Step: Inference Procedure
    3:   Update node embeddings and edge weights by q_θ
    4:   Infer aspect embeddings A by q_θ
    5:   Update q_θ by Eq. (4)
    6:   M-Step: Learning Procedure
    7:   Update node embeddings and edge weights by p_φ
    8:   Update p_φ
    9: end while
The Input line of the algorithm indicates that the input parameters comprise the heterogeneous graph structure G, the initial node features X and the semantic number K, where the semantic number is used to indicate the number of semantic aspect types indicated by the edge data, namely the number of aspect hidden variables. The Output line represents the output content, comprising the updated node features and updated semantic features, where |V| represents the number of nodes and d_out represents the dimension of the output updated feature vector.
In the algorithm, step 1 represents starting the iterative update; step 2 represents the inference procedure, corresponding to the E step above; step 3 represents updating the node features and edge weights according to q_θ; step 4 represents inferring the aspect hidden vectors A according to q_θ; step 5 represents updating q_θ according to formula four above; step 6 represents the beginning of the learning procedure, corresponding to the M step above; step 7 represents updating the node features and edge weights according to p_φ; step 8 represents updating p_φ; step 9 represents ending the iteration.
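Structurally, this alternation can be summarized as the Python sketch below; the e_step/m_step interfaces and the convergence test are hypothetical stand-ins for GNN_θ and GNN_φ, which the patent instantiates as graph neural networks.

    def variational_em(graph, X, K, gnn_theta, gnn_phi, max_iters=100, tol=1e-4):
        # Alternate the E step (inference: update q_theta) and the M step
        # (learning: update p_phi) until the ELBO estimate stops improving.
        prev_elbo = float("-inf")
        H = A = None
        for _ in range(max_iters):
            # E step: fix p_phi, update node embeddings and edge weights,
            # infer the K aspect embeddings A, then update theta (formula four).
            H, A = gnn_theta.e_step(graph, X, K, gnn_phi)
            # M step: fix q_theta, update p_phi (formula nine).
            elbo = gnn_phi.m_step(graph, X, H, A)
            if abs(elbo - prev_elbo) < tol:
                break
            prev_elbo = elbo
        return H, A  # updated node features H and aspect (semantic) features A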
In an alternative implementation, when the above GNN_θ and GNN_φ perform the node feature update, the update is implemented by decoupling the edge weights and the initial node features into K aspect channels. Fig. 5 is a flowchart of a method for generating an analysis result according to another exemplary embodiment of the present application; taking the application of the method in a server as an example, as shown in fig. 5, the method includes:
step 501, obtaining a heterogeneous graph structure of a target heterogeneous graph, wherein the heterogeneous graph structure comprises node data and edge data.
The node data corresponds to an initial node feature, and the edge data corresponds to a semantic aspect.
In one example, the heterogeneous graph structure is expressed as G = (V, E), the initial feature vector is expressed as X ∈ R^{|V|×d_in}, and the semantic aspect is expressed as A (aspect).
And 502, inputting the heterogeneous graph structure of the target heterogeneous graph into a probability graph generation model.
And when the heterogeneous graph structure is input into the probability graph generation model, inputting the initial node characteristics and the semantic number into the probability graph generation model, so that the probability graph generation model updates the initial node characteristics by combining with the input content.
Step 503, decoupling the edge weights and the initial node features to obtain K aspect channels, where K is a positive integer.
The value of K corresponds to the semantic number, the semantic number is used for indicating the number of semantic aspect types indicated by the edge data, and the value of K is predefined.
Optionally, the neural network structure adopted by the probability graph generation model is an Aspect-aware GNN (A²GNN), namely the above GNN_θ and GNN_φ.
The content to be output by the probability graph generation model comprises the updated semantic features and the updated node features, where the updated node features are expressed as h_i, with h_i denoting the feature output by the neural network layer for the i-th node, and the updated semantic features are expressed as a_k, with a_k denoting the k-th aspect hidden variable and K denoting the total number of semantic aspects.
Step 504, updating the edge weight according to the node feature vector and the aspect feature vector of the kth aspect channel.
Because the different aspect hidden variables a_k need to be kept as independent as possible in order to describe semantic aspects with different meanings in the heterogeneous graph, while each node may participate in the description of different semantic aspects, the co-occurrence probability between a node and its surrounding neighbor nodes needs to satisfy an aspect-aware condition; that is, the probability graph generation model needs the capability of modeling the co-occurrence probability between a node and its surrounding neighbor nodes.
Therefore, in this embodiment, the edge data further corresponds to edge weights. The edge weights and the initial node features are decoupled to obtain K aspect channels, where K is a positive integer whose value corresponds to the semantic number; the edge weights are updated according to the node feature vectors and the aspect feature vector of the k-th aspect channel, and the initial node features are updated according to the updated edge weights. The node feature vector is obtained by mapping the initial node feature through a nonlinear mapping function; that is, the initial node feature is mapped through the nonlinear mapping function to obtain the node feature vector of the k-th aspect channel. The aspect feature vector of the k-th aspect channel is obtained by performing global graph pooling within the k-th aspect channel; in this embodiment the global graph pooling is implemented as global average pooling, but it may also be implemented with other graph pooling techniques, which is not limited in this embodiment. Illustratively, the global pooling process refers to the following formula ten:
Formula ten: a_k = GLOBALPOOL(x_{1,k}, x_{2,k}, …, x_{|V|,k})

where GLOBALPOOL represents the global graph pooling process and x_{i,k} = f_k(x_i) denotes the node feature vector of the i-th node in the k-th aspect channel, f_k being the nonlinear mapping function of that channel.
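The following is a sketch of the channel decoupling and the global average pooling of formula ten, assuming PyTorch; taking f_k to be a per-channel linear map with a ReLU nonlinearity is an illustrative choice of the nonlinear mapping function, not one fixed by the patent.

    import torch
    import torch.nn as nn

    K, d_in, d_k = 4, 32, 16                # aspect count and dimensions (toy)
    f = nn.ModuleList(
        nn.Sequential(nn.Linear(d_in, d_k), nn.ReLU()) for _ in range(K)
    )                                       # one nonlinear map f_k per channel

    X = torch.randn(100, d_in)              # initial node features, |V| = 100
    x = [f[k](X) for k in range(K)]         # x[k][i] = x_{i,k} = f_k(x_i)
    # Formula ten: aspect feature a_k via global average pooling of channel k.
    a = [x_k.mean(dim=0) for x_k in x]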
When the edge weights are updated, the first node feature vector and first aspect feature vector of the i-th node in the k-th aspect channel are obtained and spliced to obtain a first spliced vector; the second node feature vector and second aspect feature vector of the j-th node in the k-th aspect channel are obtained and spliced to obtain a second spliced vector; and the third node feature vector and third aspect feature vector of the m-th node in the k-th aspect channel are obtained and spliced to obtain a third spliced vector, where i, j and m are positive integers and the m-th node is a neighbor node of the i-th node. The first semantic similarity between the first spliced vector and the second spliced vector is determined, the second semantic similarity between the first spliced vector and the third spliced vector is determined, and the edge weight is updated according to the first semantic similarity and the second semantic similarity.
For example, refer to the following formula eleven:

Formula eleven: e_{ij,k}^{(l)} = K(z_{i,k}^{(l−1)}, z_{j,k}^{(l−1)}) · e_{ij,k}^{(l−1)} / Σ_{v_m ∈ N(v_i)} K(z_{i,k}^{(l−1)}, z_{m,k}^{(l−1)}) · e_{im,k}^{(l−1)}

where, for the k-th aspect channel, z_{i,k}^{(l−1)} is the first spliced vector, z_{j,k}^{(l−1)} is the second spliced vector and z_{m,k}^{(l−1)} is the third spliced vector entering the l-th iteration, e_{ij,k}^{(l−1)} is the edge weight between the i-th node and the j-th node in the (l−1)-th iteration, e_{im,k}^{(l−1)} is the edge weight between the i-th node and the m-th node in the (l−1)-th iteration, and N(v_i) denotes the neighbor nodes of the i-th node. K represents a kernel function used to measure the semantic similarity between nodes.
Step 505, updating the initial node features according to the updated edge weights, and outputting the heterogeneous graph features.
The heterogeneous graph features include the updated semantic features and the updated node features.
Illustratively, based on the updated edge weights, the initial node features are decoupled according to the semantic aspects; refer to the following formula twelve:

Formula twelve: z_{i,k}^{(l)} = Σ_{v_j ∈ N(v_i)} e_{ij,k}^{(l)} · z_{j,k}^{(l−1)} / Σ_{v_m ∈ N(v_i)} e_{im,k}^{(l)}

where, for the k-th aspect channel, z_{j,k}^{(l−1)} is the spliced vector of the j-th node in the (l−1)-th iteration. The denominators in formula eleven and formula twelve normalize the corresponding features within the neighborhood. Through the alternate updating of the edge weights and the node features, the model can capture the semantics of different aspects in the heterogeneous graph. The final node features are obtained by splicing the node features of the different aspect channels; refer to the following formula thirteen:

Formula thirteen: h_i = W_out · CONCAT(z_{i,1}, z_{i,2}, …, z_{i,K})

where h_i represents the updated node feature, W_out represents the output weights of the neural network, and z_{i,k} represents the spliced vector obtained by splicing the node feature vector and the aspect feature vector in the k-th aspect channel after the iterations.
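Putting formulas eleven to thirteen together, one propagation step of a single aspect channel could look like the sketch below; the Gaussian kernel and the dense-adjacency representation are assumptions made for illustration (the patent only requires some kernel function K measuring semantic similarity).

    import torch

    def a2gnn_step(z, e, adj):
        # z: (N, d) spliced vectors [node feature ; aspect feature] from the
        # previous iteration; e: (N, N) previous edge weights; adj: (N, N)
        # 0/1 adjacency mask restricting sums to the neighborhood N(v_i).
        kern = torch.exp(-torch.cdist(z, z) ** 2) * adj   # assumed kernel K
        # Formula eleven: reweight edges by similarity, normalized over N(v_i).
        e_new = kern * e
        e_new = e_new / (e_new.sum(dim=1, keepdim=True) + 1e-9)
        # Formula twelve: aggregate neighbor vectors with the new weights.
        z_new = e_new @ z
        return z_new, e_new

    def readout(z_channels, W_out):
        # Formula thirteen: concatenate the K per-channel vectors, then apply
        # the output weights W_out, of assumed shape (K * d, d_out).
        return torch.cat(z_channels, dim=1) @ W_out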
Step 506, analyzing the updated semantic features and the updated node features to generate an analysis result corresponding to the node data.
Optionally, performing association prediction on node data in the heterogeneous graph structure according to the heterogeneous graph characteristics to obtain an association prediction result.
And the association prediction result is used for indicating the predicted association relation between the node data.
In summary, in the analysis result generation method provided by this embodiment, the heterogeneous graph structure is used as a random variable and the semantic aspect as a hidden variable, so that the heterogeneous graph is embedded: implicit semantics are obtained from the heterogeneous graph, and the node feature vectors and semantic feature vectors are updated to more accurate values. This avoids updating node features by setting meta-paths, improves the update efficiency of the node features, and yields higher task execution accuracy in downstream tasks.
In this embodiment, for the A²GNN neural network, K aspect channels are obtained through decoupling, and the different aspect hidden variables a_k are kept as independent as possible so as to describe semantic aspects with different meanings in the heterogeneous graph, which ensures that the probability graph generation model has the capability of modeling the co-occurrence probability between a node and its surrounding neighbor nodes.
Schematically, the algorithm of the A²GNN neural network provided by the embodiment of the present application is as follows:

    Input: heterogeneous graph G = (V, E), initial node feature X ∈ R^{|V|×d_in}, aspect number K, layer number L
    Output: node embedding H ∈ R^{|V|×d_out}
    1: Randomly initialize a_k, 1 ≤ k ≤ K
    2: while not converged do
    3:   Calculate x_{i,k} = f_k(x_i) for all v_i ∈ V
    4:   for layer l = 1, 2, …, L do
    5:     for k = 1, 2, …, K do
    6:       for (i, j) ∈ E do
    7:         Update the edge weight e_{ij,k}^{(l)} by formula eleven
    8:       end for
    9:       for i = 1, 2, …, |V| do
    10:        Update the spliced vector z_{i,k}^{(l)} by formula twelve
    11:      end for
    12:      if l = L then
    13:        h_i = W_out · CONCAT(z_{i,1}, …, z_{i,K})
    14:        Update a_k
    15:      end if
    16:    end for
    17:  end for
    18: end while
In the algorithm, the Input line indicates that the input parameters comprise the heterogeneous graph structure G, the initial node features X, the semantic number K (used to indicate the number of semantic aspect types indicated by the edge data, namely the number of aspect hidden variables) and the number of neural network layers L. The Output line represents the output content, comprising the updated node features and updated semantic features, where |V| represents the number of nodes and d_out represents the dimension of the output updated feature vector.
In the algorithm, step 1 represents randomly initializing the aspect feature vectors a_k; step 2 represents starting the iterative update; step 3 represents calculating the per-channel node features x_{i,k}; step 4 represents iterating over the layers l = 1, …, L; step 5 represents iterating over the aspect channels k = 1, …, K; step 6 represents iterating over the known edges (i, j) in the heterogeneous graph structure; step 7 represents updating the edge weights; step 8 represents ending the for loop of the weight update after every edge weight has been updated; step 9 represents iterating over the nodes i = 1, …, |V|; step 10 represents updating the spliced vector of the node feature vector and the semantic feature vector in the k-th channel; step 11 represents ending the for loop of the node feature update after the nodes are updated; step 12 represents the case when the layer number l reaches the set number L; step 13 represents obtaining the total node features by splicing the node features of the different aspect channels; step 14 represents updating the semantic features; steps 15 to 18 represent ending the iterative process when all conditions are met.
In an optional embodiment, the generative model parameters involved in the embodiment of the present application further need to be trained and adjusted through a loss function. Fig. 6 is a flowchart of a method for generating an analysis result according to another exemplary embodiment of the present application; taking the application of the method in a server as an example, as shown in fig. 6, the method includes:
step 601, obtaining a heterogeneous graph structure of the target heterogeneous graph, wherein the heterogeneous graph structure comprises node data and edge data.
The node data corresponds to an initial node feature, and the edge data corresponds to a semantic aspect.
In one example, the heterogeneous graph structure is expressed as G = (V, E), the initial feature vector is expressed as X ∈ R^{|V|×d_in}, and the semantic aspect is expressed as A (aspect).
Step 602, inputting the heterogeneous graph structure of the target heterogeneous graph into a probability graph generation model.
And when the heterogeneous graph structure is input into the probability graph generation model, inputting the initial node characteristics and the semantic number into the probability graph generation model, so that the probability graph generation model updates the initial node characteristics by combining with the input content.
Step 603, embedding the initial node features through the probability graph generation model with the heterogeneous graph structure as a random variable and the semantic aspect as a hidden variable, and outputting the heterogeneous graph features of the target heterogeneous graph.
The heterogeneous graph features include the updated semantic features and the updated node features.
The probability graph generation model also corresponds to generative model parameters; in response to the generative model parameters being untrained parameters, the model parameters need to be iteratively adjusted according to the heterogeneous graph structure.
Illustratively, as shown in the above equation one, φ represents the generative model parameters of the probabilistic graphical generative model.
The generative model parameter φ can be solved by maximum likelihood estimation, i.e., by maximizing log p_φ(G | X). Because hidden variables exist in the model, the maximization problem cannot be solved directly, so a variational Expectation Maximization (vEM) algorithm is adopted.
Optionally, the maximum likelihood estimation function log p_φ(G | X) of the model parameters corresponding to the heterogeneous graph structure and the initial node features is determined, the model parameters are adjusted through the maximum likelihood estimation function, and the initial node features are embedded through the probability graph generation model with the adjusted model parameters.
The maximum likelihood estimation function is converted into an expression over a variational distribution q_θ(A | X) and a posterior distribution p_φ(A | G, X), where the variational distribution corresponds to the variational model parameter θ and the posterior distribution corresponds to the generative model parameter φ; the variational model parameter is adjusted according to the divergence requirement between the variational distribution and the posterior distribution, and the generative model parameter is adjusted according to the adjusted variational model parameter.
Optionally, the generative model parameters may also be adjusted through a loss value according to the updated semantic features and the updated node features.
Step 604, inputting the updated semantic features and the updated node features into a preset loss function, and outputting a loss value corresponding to the update result.
A traditional undirected probabilistic graphical model (also known as a Markov network) follows the pairwise Markov property: each node has a probabilistic dependence only on its first-order neighbors and is conditionally independent of nodes that are not directly connected. A heterogeneous graph, however, has complex long-distance dependencies, so the association relations between nodes that are not directly adjacent in the heterogeneous graph are modeled in a regularization manner through a self-supervised training strategy. Illustratively, for a given central node v_c, the nodes in the surrounding n-hop neighborhood are studied. First, a mask subgraph G_mask with a specific rule structure is designed, where the edges contained in G_mask satisfy: (1) the edges of the original heterogeneous graph corresponding to the one-hop neighbors directly connected with the central node v_c are kept as anchor points; (2) for edges other than the anchor points, the corresponding edges in the original heterogeneous graph are masked, and edges not directly connected with the central node v_c are supplemented. After G_mask is constructed, the self-supervised task takes predicting the one-hop neighbors of the central node v_c from the updated node features as its target. The prediction task is formalized as a classification task, and a cross-entropy loss function is adopted, whose form is shown as the following formula fourteen:

Formula fourteen: L_mask = −Σ_{v_i ∈ G_mask} [ I(v_i ∈ N_1(v_c)) · log p(v_i) + (1 − I(v_i ∈ N_1(v_c))) · log(1 − p(v_i)) ]

where I represents the indicator function, N_1(v_c) denotes the one-hop neighbors of v_c, and p(v_i) is the predicted probability that v_i is a one-hop neighbor of v_c.
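A sketch of this neighbor-prediction loss follows, assuming PyTorch and binary cross-entropy over the candidate nodes of G_mask; the dot-product scoring head and every name here are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def mask_prediction_loss(h, center, candidates, is_one_hop):
        # h: (N, d) updated node features; center: index of v_c; candidates:
        # indices of nodes in G_mask; is_one_hop: float tensor with 1.0 where
        # the indicator I(v_i in N_1(v_c)) holds, 0.0 elsewhere.
        logits = h[candidates] @ h[center]   # assumed dot-product scorer
        return F.binary_cross_entropy_with_logits(logits, is_one_hop)

    # Per formula fifteen, this term can be combined with the downstream-task
    # loss L_d for end-to-end training, e.g. loss = mask_prediction_loss(...) + L_d.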
Step 605, adjusting the probability graph generation model according to the loss value.
Optionally, the generative model parameters are adjusted according to the loss value calculated by formula fourteen; after the adjustment, the updated node features are used as the initial node features, and the step of embedding the initial node features through the probability graph generation model is executed iteratively.
Optionally, when calculating the loss value, in addition to the cross-entropy loss function provided by formula fourteen above, the total loss value may be determined in combination with the objective function of a downstream task; that is, combined with various downstream tasks related to the heterogeneous graph, end-to-end training is performed using a standard back-propagation algorithm.
Illustratively, the total loss function may be formalized as the following formula fifteen (rendered as an image in the original publication), which adds the loss function of the downstream task to the cross-entropy loss of formula fourteen:
wherein L_d represents the loss function corresponding to the downstream task.
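Assuming the two terms are simply combined by an additive weighting (the exact weighting of formula fifteen is not reproduced here), the total loss can be sketched as:

    def total_loss(ssl_cross_entropy, downstream_loss, lam=1.0):
        # lam is an assumed balancing coefficient between the
        # self-supervised term and the downstream-task term L_d.
        return ssl_cross_entropy + lam * downstream_loss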
In summary, in the analysis result generation method provided by this embodiment, the heterogeneous graph structure is taken as a random variable and the semantic aspect as a hidden variable, so that the heterogeneous graph is embedded and implicit semantics are extracted from it. The node feature vectors and semantic feature vectors are thereby updated to more accurate values, updating node features through manually set meta-paths is avoided, the update efficiency of the node features is improved, and the task execution accuracy in downstream tasks is higher.
For example, the accuracy of node classification can be improved with the analysis result generation method provided in the embodiments of the present application; please refer to Table 1 below, in which the values are expressed as percentages.
Table 1 (rendered as images in the original publication) reports the node classification accuracy (%) of the related-art methods and of the method of the present application on the DBLP, ACM and IMDB datasets.
As shown in the above table, the node classification task is used to compare the classification results of the related art and of the present application; the databases used include a computer English literature database (DBLP), an Association for Computing Machinery (ACM) database and the Internet Movie Database (IMDB).
The related art mainly includes methods for processing homogeneous graphs: DeepWalk and Graph Convolutional Networks (GCN), and methods for processing heterogeneous graphs: metapath2vec, the Heterogeneous Graph Attention Network (HAN) and the Graph Transformer Network (GTN). As can be seen from the results in Table 1, the method provided by the present application is superior in node classification accuracy.
Fig. 7 is a structural block diagram of an apparatus for generating an analysis result according to an exemplary embodiment of the present application; as shown in fig. 7, the apparatus includes:
an obtaining module 710, configured to obtain a heterogeneous graph structure of a target heterogeneous graph, where the heterogeneous graph structure includes node data and edge data, the node data corresponds to an initial node feature, and the edge data corresponds to a semantic aspect;
a determining module 720, configured to determine an initial node feature corresponding to the node data and a semantic aspect corresponding to the edge data;
an embedding module 730, configured to embed the initial node feature with the heterogeneous graph structure as a random variable and the semantic aspect as a hidden variable to obtain a heterogeneous graph feature of the target heterogeneous graph, where the heterogeneous graph feature includes an updated semantic feature and an updated node feature;
the analysis module 740 is configured to analyze the updated semantic features and the updated node features to generate an analysis result corresponding to the node data.
In an optional embodiment, referring to fig. 8, the embedding module 730 is further configured to input the heterogeneous graph structure into a probability graph generation model, and to embed the initial node features through the probability graph generation model with the heterogeneous graph structure as a random variable and the semantic aspect as a hidden variable.
In an alternative embodiment, the probability map generation model includes generation model parameters;
the embedding module 730 includes:
a determining unit 731, configured to determine a maximum likelihood estimation function of the model parameters corresponding to the heterogeneous graph structure and the initial node features;
an adjusting unit 732, configured to adjust the model parameter through the maximum likelihood estimation function;
an embedding unit 733, configured to embed the initial node feature through the probability map generation model with the adjusted model parameter.
In an optional embodiment, the adjusting unit 732 is specifically configured to convert the maximum likelihood estimation function into an expression in terms of a variational distribution and a posterior distribution, where the variational distribution corresponds to a variational model parameter and the posterior distribution corresponds to the generative model parameter; adjust the variational model parameter according to the divergence requirement between the variational distribution and the posterior distribution; and adjust the generative model parameter according to the adjusted variational model parameter.
In an optional embodiment, the embedding module 730 is further configured to input the updated semantic features and the updated node features into a preset loss function and to output a loss value corresponding to the update result;
the embedding module 730 is further configured to adjust the probability graph generation model according to the loss value.
In an optional embodiment, the embedding module 730 is further configured to iteratively perform the step of embedding the initial node features through the probability graph generation model by taking the updated node features as the initial node features.
In an optional embodiment, the edge data further corresponds to an edge weight, the semantic aspect corresponds to a semantic number, and the semantic number is used for indicating the number of semantic aspect types indicated by the edge data;
the embedding module 730 includes:
an embedding unit 733, configured to decouple the edge weight and the initial node feature to obtain K aspect channels, where K is a positive integer, and a value of K corresponds to the semantic number, where each aspect channel corresponds to a group of node feature vectors and aspect feature vectors;
an adjusting unit 732, configured to update the edge weight according to the node feature vector and the aspect feature vector of the kth aspect channel; and updating the initial node characteristics according to the updated edge weights.
In an optional embodiment, the embedding unit is further configured to map the initial node feature by using a nonlinear mapping function to obtain the node feature vector of the kth-aspect channel; and performing global graph pooling in the kth aspect channel to obtain the aspect feature vector of the kth aspect channel.
In an optional embodiment, the adjusting unit 732 is further configured to obtain a first node feature vector and a first aspect feature vector of the ith node in the kth aspect channel, and splice the first node feature vector and the first aspect feature vector to obtain a first spliced vector; acquiring a second node characteristic vector and a second aspect characteristic vector of the jth node in the kth aspect channel, and splicing to obtain a second spliced vector; acquiring a third node characteristic vector and a third aspect characteristic vector of an mth node in the kth aspect channel, and splicing to obtain a third spliced vector, wherein i, j and m are positive integers, and the mth node is a neighbor node of the ith node;
the adjusting unit 732 is further configured to determine a first semantic similarity between the first spliced vector and the second spliced vector; determine a second semantic similarity between the first spliced vector and the third spliced vector; and update the edge weight according to the first semantic similarity and the second semantic similarity.
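For illustration, the edge-weight update in the k-th aspect channel can be sketched in Python/NumPy as follows. Cosine similarity as the semantic similarity and softmax normalization over neighbors are assumptions made for exposition; the apparatus description does not fix a concrete similarity measure.

    import numpy as np

    def update_edge_weights(h_k, a_k, i, neighbors):
        # h_k: node feature vectors in the k-th aspect channel (node -> vector);
        # a_k: the channel's aspect feature vector from global graph pooling.
        def splice(v):
            return np.concatenate([h_k[v], a_k])  # node vector || aspect vector

        z_i = splice(i)  # spliced vector of the central node i
        sims = {}
        for m in neighbors:  # m ranges over the neighbor nodes of i
            z_m = splice(m)
            # assumed semantic similarity: cosine between spliced vectors
            sims[m] = float(z_i @ z_m /
                            (np.linalg.norm(z_i) * np.linalg.norm(z_m) + 1e-8))

        # normalize the similarities into updated edge weights (softmax assumed)
        e = np.exp(np.array(list(sims.values())))
        e = e / e.sum()
        return dict(zip(sims.keys(), e))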
In an optional embodiment, the node data includes a first node and a second node, the first node corresponds to the account data, and the second node corresponds to the candidate recommended content;
the analysis module 740 is further configured to perform association prediction on the first node and the second node according to the heterogeneous graph features, and obtain an association prediction result as the analysis result, where the association prediction result is used to indicate the predicted interest degree of the account data in the candidate recommended content.
In an optional embodiment, the analysis module 740 is further configured to input the feature of the heterogeneous map into a recommendation model, where the recommendation model is a machine learning model obtained through pre-training; and performing association prediction on the first node and the second node through the recommendation model to obtain an association prediction result.
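For illustration, the association prediction can be sketched as a sigmoid over the dot product of the two node embeddings; the actual pre-trained recommendation model may take any form, so this is only an assumed scoring function.

    import numpy as np

    def predict_interest(account_vec, content_vec):
        # Predicted interest degree of the account in the candidate content,
        # illustrated as a sigmoid over the embedding dot product.
        return 1.0 / (1.0 + np.exp(-(account_vec @ content_vec)))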
In summary, the analysis result generation apparatus provided by this embodiment takes the heterogeneous graph structure as a random variable and the semantic aspect as a hidden variable, so that the heterogeneous graph is embedded and implicit semantics are extracted from it. The node feature vectors and semantic feature vectors are thereby updated to more accurate values, updating node features through manually set meta-paths is avoided, the update efficiency of the node features is improved, and the accuracy of executing downstream tasks is improved.
It should be noted that: the analysis result generating device provided in the foregoing embodiment is only illustrated by dividing the functional modules, and in practical applications, the functions may be distributed by different functional modules according to needs, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the analysis result generation device and the analysis result generation method provided by the above embodiments belong to the same concept, and specific implementation processes thereof are detailed in the method embodiments and are not described herein again.
Fig. 9 shows a schematic structural diagram of a server according to an exemplary embodiment of the present application. Specifically, the method comprises the following steps:
the server 900 includes a Central Processing Unit (CPU) 901, a system Memory 904 including a Random Access Memory (RAM) 902 and a Read Only Memory (ROM) 903, and a system bus 905 connecting the system Memory 904 and the CPU 901. The server 900 also includes a mass storage device 906 for storing an operating system 913, application programs 914, and other program modules 915.
The mass storage device 906 is connected to the central processing unit 901 through a mass storage controller (not shown) connected to the system bus 905. The mass storage device 906 and its associated computer-readable media provide non-volatile storage for the server 900. That is, mass storage device 906 may include a computer-readable medium (not shown) such as a hard disk or Compact disk Read Only Memory (CD-ROM) drive.
Without loss of generality, computer readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes RAM, ROM, Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash Memory or other solid state Memory technology, CD-ROM, Digital Versatile Disks (DVD), or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage, or other magnetic storage devices. Of course, those skilled in the art will appreciate that computer storage media is not limited to the foregoing. The system memory 904 and mass storage device 906 described above may be collectively referred to as memory.
According to various embodiments of the present application, the server 900 may also be operated through a remote computer connected to a network such as the Internet. That is, the server 900 may be connected to the network 912 through the network interface unit 911 connected to the system bus 905, or the network interface unit 911 may be used to connect to other types of networks or remote computer systems (not shown).
The memory further includes one or more programs, and the one or more programs are stored in the memory and configured to be executed by the CPU.
Embodiments of the present application further provide a computer device, which includes a processor and a memory, where the memory stores at least one instruction, at least one program, a code set, or a set of instructions, and the at least one instruction, the at least one program, the code set, or the set of instructions is loaded and executed by the processor to implement the analysis result generation method provided by the foregoing method embodiments.
Embodiments of the present application further provide a computer-readable storage medium, where at least one instruction, at least one program, a code set, or a set of instructions is stored on the computer-readable storage medium, and the at least one instruction, the at least one program, the code set, or the set of instructions is loaded and executed by a processor to implement the analysis result generation method provided by the foregoing method embodiments.
Embodiments of the present application also provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions to make the computer device execute the method for generating the analysis result in any of the above embodiments.
Optionally, the computer-readable storage medium may include: a Read Only Memory (ROM), a Random Access Memory (RAM), a Solid State Drive (SSD), or an optical disc. The Random Access Memory may include a resistive Random Access Memory (ReRAM) and a Dynamic Random Access Memory (DRAM). The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only exemplary of the present application and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (14)

1. A method for generating an analysis result, the method comprising:
acquiring a heterogeneous graph structure of a target heterogeneous graph, wherein the heterogeneous graph structure comprises node data and edge data;
determining initial node features corresponding to the node data and semantic aspects corresponding to the edge data;
embedding the initial node features by taking the heterogeneous graph structure as a random variable and the semantic aspect as a hidden variable to obtain heterogeneous graph features of the target heterogeneous graph, wherein the heterogeneous graph features comprise updated semantic features and updated node features;
and analyzing the updated semantic features and the updated node features to generate an analysis result corresponding to the node data.
2. The method according to claim 1, wherein the embedding the initial node features with the heterogeneous graph structure as a random variable and the semantic aspect as a hidden variable comprises:
inputting the heterogeneous graph structure into a probability graph generation model;
and embedding the initial node features by using the probability graph generation model by using the heterogeneous graph structure as a random variable and the semantic aspect as a hidden variable.
3. The method of claim 2, wherein the probability map generation model comprises generation model parameters;
the embedding the initial node features through the probability map generation model comprises:
determining a maximum likelihood estimation function of the model parameters corresponding to the heterogeneous graph structure and the initial node characteristics;
adjusting the model parameters through the maximum likelihood estimation function;
embedding the initial node characteristics through the adjusted model parameters through the probability graph generation model.
4. The method of claim 3, wherein adjusting the model parameters through the maximum likelihood estimation function comprises:
converting the maximum likelihood estimation function into a representation through variation distribution and posterior distribution, wherein the variation distribution corresponds to variation model parameters, and the posterior distribution corresponds to the generation model parameters;
adjusting the parameters of the variation model according to the divergence requirement between the variation distribution and the posterior distribution;
and adjusting the generated model parameters according to the adjusted variation model parameters.
5. The method according to any one of claims 2 to 4, wherein after outputting the updated semantic features and the updated node features, the method further comprises:
inputting the updated semantic features and the updated node features into a preset loss function, and outputting to obtain a loss value corresponding to an updated result;
and adjusting the probability graph generation model according to the loss value.
6. The method of claim 5, wherein after the heterogeneous graph features of the target heterogeneous graph are output, the method further comprises:
and iteratively executing the step of embedding the initial node features through the probability graph generation model by taking the updated node features as the initial node features.
7. The method according to any one of claims 1 to 4, wherein the edge data further corresponds to edge weights, the semantic aspect corresponds to a semantic quantity, and the semantic quantity is used for indicating the quantity of semantic aspect types indicated by the edge data;
the embedding the initial node features comprises:
decoupling the edge weight and the initial node feature to obtain K aspect channels, wherein K is a positive integer, and the value of K corresponds to the semantic number, and each aspect channel corresponds to a group of node feature vectors and aspect feature vectors;
updating the edge weight according to the node feature vector and the aspect feature vector of the kth aspect channel;
and updating the initial node characteristics according to the updated edge weights.
8. The method of claim 7, wherein before updating the edge weights according to the node feature vector and the aspect feature vector of the kth aspect channel, the method further comprises:
mapping the initial node features through a nonlinear mapping function to obtain the node feature vector of the kth-aspect channel;
and performing global graph pooling in the kth aspect channel to obtain the aspect feature vector of the kth aspect channel.
9. The method of claim 7, wherein the updating the edge weights according to the node feature vector and the aspect feature vector of the kth aspect channel comprises:
acquiring a first node feature vector and a first aspect feature vector of the ith node in the kth aspect channel, and splicing to obtain a first spliced vector;
acquiring a second node characteristic vector and a second aspect characteristic vector of the jth node in the kth aspect channel, and splicing to obtain a second spliced vector;
acquiring a third node characteristic vector and a third aspect characteristic vector of an mth node in the kth aspect channel, and splicing to obtain a third spliced vector, wherein i, j and m are positive integers, and the mth node is a neighbor node of the ith node;
determining a first semantic similarity between the first spliced vector and the second spliced vector;
determining a second semantic similarity between the first spliced vector and the third spliced vector;
and updating the edge weight according to the first semantic similarity and the second semantic similarity.
10. The method according to any one of claims 1 to 4, wherein the node data comprises a first node and a second node, the first node corresponds to the account data, and the second node corresponds to the candidate recommended content;
the analyzing the updated semantic features and the updated node features to obtain an analysis result corresponding to the node data includes:
and performing association prediction on the first node and the second node according to the heterogeneous graph characteristics to obtain an association prediction result as the analysis result, wherein the association prediction result is used for indicating the prediction interest degree of the account data on the candidate recommended content.
11. The method according to claim 10, wherein the performing association prediction on the first node and the second node according to the heterogeneous graph features to obtain an association prediction result as the analysis result comprises:
inputting the heterogeneous graph features into a recommendation model, wherein the recommendation model is a machine learning model obtained by pre-training;
and performing association prediction on the first node and the second node through the recommendation model to obtain an association prediction result.
12. An apparatus for generating an analysis result, the apparatus comprising:
the acquisition module is used for acquiring a heterogeneous graph structure of a target heterogeneous graph, wherein the heterogeneous graph structure comprises node data and edge data, the node data corresponds to an initial node characteristic, and the edge data corresponds to a semantic aspect;
a determining module, configured to determine an initial node feature corresponding to the node data and a semantic aspect corresponding to the edge data;
the embedding module is used for embedding the initial node features by taking the heterogeneous graph structure as a random variable and taking the semantic aspect as a hidden variable to obtain heterogeneous graph features of the target heterogeneous graph, wherein the heterogeneous graph features comprise updated semantic features and updated node features;
and the analysis module is used for analyzing the updated semantic features and the updated node features to generate an analysis result corresponding to the node data.
13. A computer device comprising a processor and a memory, the memory having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, the at least one instruction, the at least one program, the set of codes, or the set of instructions being loaded and executed by the processor to implement the method of generating an analysis result according to any one of claims 1 to 11.
14. A computer-readable storage medium, having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement the method of generating an analysis result according to any one of claims 1 to 11.
CN202010839225.1A 2020-08-19 2020-08-19 Analysis result generation method, device, equipment and readable storage medium Pending CN111967271A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010839225.1A CN111967271A (en) 2020-08-19 2020-08-19 Analysis result generation method, device, equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010839225.1A CN111967271A (en) 2020-08-19 2020-08-19 Analysis result generation method, device, equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN111967271A true CN111967271A (en) 2020-11-20

Family

ID=73389387

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010839225.1A Pending CN111967271A (en) 2020-08-19 2020-08-19 Analysis result generation method, device, equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN111967271A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021139738A1 (en) * 2020-01-07 2021-07-15 北京嘀嘀无限科技发展有限公司 Target task execution vehicle determination method, and system
CN113033194A (en) * 2021-03-09 2021-06-25 北京百度网讯科技有限公司 Training method, device, equipment and storage medium of semantic representation graph model
CN113033194B (en) * 2021-03-09 2023-10-24 北京百度网讯科技有限公司 Training method, device, equipment and storage medium for semantic representation graph model
CN113094707A (en) * 2021-03-31 2021-07-09 中国科学院信息工程研究所 Transverse mobile attack detection method and system based on heterogeneous graph network
CN113094707B (en) * 2021-03-31 2024-05-14 中国科学院信息工程研究所 Lateral movement attack detection method and system based on heterogeneous graph network
CN115828931A (en) * 2023-02-09 2023-03-21 中南大学 Chinese and English semantic similarity calculation method for paragraph-level text
WO2024169263A1 (en) * 2023-02-13 2024-08-22 腾讯科技(深圳)有限公司 Search data processing method and apparatus, computer device and storage medium
CN117496161A (en) * 2023-12-29 2024-02-02 武汉理工大学 Point cloud segmentation method and device
CN117496161B (en) * 2023-12-29 2024-04-05 武汉理工大学 Point cloud segmentation method and device

Similar Documents

Publication Publication Date Title
CN111967271A (en) Analysis result generation method, device, equipment and readable storage medium
Khaireddin et al. Facial emotion recognition: State of the art performance on FER2013
CN109816032B (en) Unbiased mapping zero sample classification method and device based on generative countermeasure network
CN110659723B (en) Data processing method and device based on artificial intelligence, medium and electronic equipment
CN111611488B (en) Information recommendation method and device based on artificial intelligence and electronic equipment
CN113761261A (en) Image retrieval method, image retrieval device, computer-readable medium and electronic equipment
CN113822315A (en) Attribute graph processing method and device, electronic equipment and readable storage medium
CN111797327B (en) Social network modeling method and device
CN111382190A (en) Object recommendation method and device based on intelligence and storage medium
CN113011167A (en) Cheating identification method, device and equipment based on artificial intelligence and storage medium
CN114723037A (en) Heterogeneous graph neural network computing method for aggregating high-order neighbor nodes
CN116992151A (en) Online course recommendation method based on double-tower graph convolution neural network
CN115062779A (en) Event prediction method and device based on dynamic knowledge graph
Zhang et al. Reinforcement learning with actor-critic for knowledge graph reasoning
CN117194771B (en) Dynamic knowledge graph service recommendation method for graph model characterization learning
CN111091198A (en) Data processing method and device
CN116976402A (en) Training method, device, equipment and storage medium of hypergraph convolutional neural network
CN115019342A (en) Endangered animal target detection method based on class relation reasoning
CN114266512A (en) User energy consumption behavior analysis method, system, device and medium
CN111027709B (en) Information recommendation method and device, server and storage medium
CN113590720A (en) Data classification method and device, computer equipment and storage medium
CN113822293A (en) Model processing method, device and equipment for graph data and storage medium
Kapoor et al. A genetic programming approach to the automated design of CNN models for image classification and video shorts creation
CN117540828B (en) Training method and device for training subject recommendation model, electronic equipment and storage medium
CN118197402B (en) Method, device and equipment for predicting drug target relation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination