CN111967271A - Analysis result generation method, device, equipment and readable storage medium - Google Patents


Info

Publication number
CN111967271A
CN111967271A
Authority
CN
China
Prior art keywords
node
feature
semantic
heterogeneous graph
features
Prior art date
Legal status
Pending
Application number
CN202010839225.1A
Other languages
Chinese (zh)
Inventor
吕肖庆
张晨睿
林衍凯
李鹏
周杰
Current Assignee
Peking University
Tencent Technology Shenzhen Co Ltd
Original Assignee
Peking University
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Peking University and Tencent Technology (Shenzhen) Co Ltd
Priority to CN202010839225.1A
Publication of CN111967271A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application discloses a method, apparatus, device and readable storage medium for generating analysis results, and relates to the field of machine learning. The method includes: acquiring a heterogeneous graph structure of a target heterogeneous graph, the heterogeneous graph structure including node data and edge data; determining initial node features corresponding to the node data and semantic aspects corresponding to the edge data; and, taking the heterogeneous graph structure as a random variable and the semantic aspects as latent variables, embedding the initial node features to obtain heterogeneous graph features of the target heterogeneous graph, the heterogeneous graph features including updated semantic features and updated node features. By treating the semantic aspects as latent variables, the heterogeneous graph is embedded and its implicit semantics are extracted, so that the node feature vectors and semantic feature vectors are updated without defining meta-paths. This improves the efficiency of updating node features and yields higher accuracy when generating downstream analysis results.

Description

Method, apparatus, device and readable storage medium for generating analysis results

Technical Field

The embodiments of the present application relate to the field of machine learning, and in particular, to a method, apparatus, device, and readable storage medium for generating an analysis result.

Background

Graph embedding is an algorithm that converts an attribute graph into a vector or a set of vectors. A heterogeneous graph is a graph data structure that contains multiple types of nodes and multiple types of edges. Heterogeneous graph embedding therefore refers to an algorithm that represents the structure and semantic information of a heterogeneous graph as node vectors; in the resulting embedding space, the updated node features of semantically related nodes are close to each other, while those of semantically unrelated nodes are far apart.

In the related art, a heterogeneous graph is embedded by defining meta-paths, where a meta-path is a node-type sequence that expresses a specific semantic meaning and is predefined by developers according to domain knowledge. Meta-paths guide random walks on the heterogeneous graph, producing positive samples that conform to a meta-path and negative samples that do not; the positive and negative samples are then fed into a feature embedding model to update the node features.

However, because the number of manually defined meta-paths is limited, they cannot cover some latent, complex semantic correlations between nodes. As a result, the node feature representation is incomplete and the accuracy of the updated node features is poor.

Summary of the Invention

Embodiments of the present application provide a method, apparatus, device, and readable storage medium for generating an analysis result, which can improve the accuracy of updated node features. The technical solution is as follows:

In one aspect, a method for generating an analysis result is provided, the method including:

obtaining a heterogeneous graph structure of a target heterogeneous graph, the heterogeneous graph structure including node data and edge data;

determining initial node features corresponding to the node data and semantic aspects corresponding to the edge data;

embedding the initial node features, with the heterogeneous graph structure as a random variable and the semantic aspects as latent variables, to obtain heterogeneous graph features of the target heterogeneous graph, the heterogeneous graph features including updated semantic features and updated node features; and

analyzing the updated semantic features and the updated node features to generate an analysis result corresponding to the node data.

In another aspect, an apparatus for generating an analysis result is provided, the apparatus including:

an acquisition module, configured to acquire a heterogeneous graph structure of a target heterogeneous graph, the heterogeneous graph structure including node data and edge data, the node data corresponding to initial node features and the edge data corresponding to semantic aspects;

a determination module, configured to determine the initial node features corresponding to the node data and the semantic aspects corresponding to the edge data;

an embedding module, configured to embed the initial node features, with the heterogeneous graph structure as a random variable and the semantic aspects as latent variables, to obtain heterogeneous graph features of the target heterogeneous graph, the heterogeneous graph features including updated semantic features and updated node features; and

an analysis module, configured to analyze the updated semantic features and the updated node features to generate an analysis result corresponding to the node data.

In another aspect, a computer device is provided. The computer device includes a processor and a memory, and the memory stores at least one instruction, at least one program, a code set, or an instruction set that is loaded and executed by the processor to implement the method for generating an analysis result described in any of the foregoing embodiments of the present application.

In another aspect, a computer-readable storage medium is provided. The storage medium stores at least one instruction, at least one program, a code set, or an instruction set that is loaded and executed by a processor to implement the method for generating an analysis result described in any of the foregoing embodiments of the present application.

In another aspect, a computer program product or computer program is provided. The computer program product or computer program includes computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the method for generating an analysis result described in any of the foregoing embodiments.

The beneficial effects brought by the technical solutions provided in the embodiments of the present application include at least the following:

By taking the heterogeneous graph structure as a random variable and the semantic aspects as latent variables, the heterogeneous graph is embedded and its implicit semantics are extracted, so that both the node feature vectors and the semantic feature vectors are updated. This yields more accurate node and semantic feature vectors, avoids updating node features by defining meta-paths, improves the efficiency of updating node features, and leads to higher task execution accuracy in downstream prediction tasks.

Brief Description of the Drawings

To describe the technical solutions in the embodiments of the present application more clearly, the following briefly introduces the accompanying drawings required for describing the embodiments. Obviously, the accompanying drawings in the following description show only some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from these drawings without creative effort.

FIG. 1 is a schematic diagram of extracting node features of a heterogeneous graph by setting meta-paths in the related art, provided by an exemplary embodiment of the present application;

FIG. 2 is a schematic diagram of a heterogeneous graph attention network extracting node features of a heterogeneous graph in the related art, provided by an exemplary embodiment of the present application;

FIG. 3 is a schematic framework diagram of the overall solution provided by an exemplary embodiment of the present application;

FIG. 4 is a flowchart of a method for generating an analysis result provided by an exemplary embodiment of the present application;

FIG. 5 is a flowchart of a method for generating an analysis result provided by another exemplary embodiment of the present application;

FIG. 6 is a flowchart of a method for generating an analysis result provided by another exemplary embodiment of the present application;

FIG. 7 is a structural block diagram of an apparatus for generating an analysis result provided by an exemplary embodiment of the present application;

FIG. 8 is a structural block diagram of an apparatus for generating an analysis result provided by another exemplary embodiment of the present application;

FIG. 9 is a structural block diagram of a server provided by an exemplary embodiment of the present application.

Detailed Description

To make the objectives, technical solutions, and advantages of the present application clearer, the following further describes the embodiments of the present application in detail with reference to the accompanying drawings.

First, the terms involved in the embodiments of the present application are briefly introduced:

Artificial Intelligence (AI) is a theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.

Artificial intelligence technology is a comprehensive discipline involving a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.

Machine Learning (ML) is a multi-disciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It studies how computers simulate or implement human learning behaviors to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve their performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence. Machine learning and deep learning usually include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.

A heterogeneous graph, also known as a heterogeneous information network, is a graph data structure that contains multiple types of nodes and multiple types of edges. Illustratively, taking a recommendation system as an application scenario, the entities in the recommendation system include user accounts, posts published by users, and recommended content uploaded by advertisers; the relationships between entities include friend relationships between user accounts and interactions between user accounts and posts (for example, likes, reposts, and accounts mentioned in posts). For such a recommendation system, entities are represented by nodes in the heterogeneous graph, and the relationships between entities are represented by edges between the nodes.

Graph embedding is an algorithm that converts an attribute graph into a vector or a set of vectors. Heterogeneous Graph Embedding (HGE) refers to an algorithm that represents the structure and semantic information of a heterogeneous graph as node vectors; in the resulting embedding space, the updated node features of semantically related nodes are close to each other, while those of semantically unrelated nodes are far apart.

In the related art, a heterogeneous graph is embedded by defining meta-paths, where a meta-path is a node-type sequence with a specific semantic meaning. For example, a paper-citation heterogeneous graph may define the meta-path "A-P-C", which indicates that a paper (Paper) written by an author (Author) is published at an academic conference (Conference). In the related art, meta-path-based heterogeneous graph embedding methods fall into the following two categories:

First, meta-paths guide random walks on the heterogeneous graph, with the predefined meta-paths acting as conditional filters: walks that conform to a meta-path serve as positive samples and walks that do not serve as negative samples. The positive and negative samples are input into a feature embedding model, which outputs the updated nodes.

Illustratively, referring to FIG. 1 and taking the paper-citation heterogeneous graph 100 as an example, institution nodes 110, author nodes 120, paper nodes 130, and academic conference nodes 140 are provided. Random walks are performed on the heterogeneous graph 100 according to the predefined meta-paths 150, and multiple positive sample sequences are sampled from the input heterogeneous graph 100, encoded, and input into a Skip-Gram model 160. The Skip-Gram model 160 predicts the probability that a node's context nodes appear, that is, the probability that the selected node and its neighbor nodes "co-occur", so that the co-occurrence probability of nodes in a positive sample sequence is as large as possible. Likewise, the model randomly samples multiple negative samples from the heterogeneous graph 100 and also inputs them into the Skip-Gram model 160, so that the co-occurrence probability of nodes in the negative samples is as small as possible. Through this constraint, the updated node features satisfy, in the embedding space, the condition that semantically related nodes are close to each other and semantically unrelated nodes are far from each other.
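A minimal Python sketch of this related-art meta-path-guided random walk is given below. The toy graph, the node names, and the way negative samples are drawn are illustrative assumptions, not details from the patent; only the idea (walks filtered by a predefined node-type sequence such as A-P-C, with non-conforming sequences as negatives) follows the description above.

import random

# Toy citation graph: node -> list of neighbours, plus a node-type map.
# A = author, P = paper, C = conference (illustrative data only).
neighbors = {
    "a1": ["p1", "p2"], "a2": ["p2"],
    "p1": ["a1", "c1"], "p2": ["a1", "a2", "c1"],
    "c1": ["p1", "p2"],
}
node_type = {"a1": "A", "a2": "A", "p1": "P", "p2": "P", "c1": "C"}

def meta_path_walk(start, meta_path, neighbors, node_type):
    """Random walk constrained to follow the node-type sequence in meta_path."""
    walk = [start]
    for next_type in meta_path[1:]:
        candidates = [v for v in neighbors[walk[-1]] if node_type[v] == next_type]
        if not candidates:
            return None  # the walk cannot be extended along the meta-path
        walk.append(random.choice(candidates))
    return walk

# Positive samples follow the predefined meta-path A-P-C; negative samples are
# random node sequences of the same length that ignore the meta-path.
positives = [w for w in (meta_path_walk("a1", ["A", "P", "C"], neighbors, node_type)
                         for _ in range(5)) if w]
negatives = [random.sample(list(node_type), 3) for _ in range(5)]
print(positives, negatives)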

Second, meta-paths serve as prior information to guide a graph neural network in learning the weights of the neighbor nodes of a central node, and the node features are weighted according to these weights during aggregation to obtain the updated nodes.

Illustratively, referring to FIG. 2 and taking the Heterogeneous Graph Attention Network (HAN) as an example, learning is performed at two levels, a node level 210 and a semantic level 220. The attention mechanism at the node level 210 mainly learns the weights between a node and its neighbor nodes, and the attention mechanism at the semantic level 220 mainly learns the weights of different meta-paths.

However, because the number of manually defined meta-paths is limited, they cannot cover some latent, complex semantic correlations between nodes. As a result, the node feature representation is incomplete and the accuracy of the updated node features is poor.

With reference to the above term introductions, the application scenarios involved in the embodiments of the present application are described by way of example:

Taking a recommendation system as an application scenario, the entities in the recommendation system include user accounts, posts published by users, and recommended content uploaded by advertisers; the relationships between entities include friend relationships between user accounts and interactions between user accounts and posts (for example, likes, reposts, and accounts mentioned in posts). For such a recommendation system, entities are represented by nodes in the heterogeneous graph, and the relationships between entities are represented by edges between the nodes. Through the method for generating an analysis result provided by the embodiments of the present application, the updated node features and updated semantic features corresponding to the heterogeneous graph of the recommendation system are obtained, and these features implicitly encode the semantic and structural information contained in that heterogeneous graph.

After the heterogeneous graph features are determined, they are applied to content recommendation. Illustratively, association prediction is performed, according to the heterogeneous graph features, on a first node corresponding to a user account and a second node corresponding to candidate recommended content, to obtain an association prediction result, which indicates the predicted degree of interest of the user account in the candidate recommended content. Optionally, during association prediction, the heterogeneous graph features are input, as input parameters, into a content recommendation model, which is a pre-trained machine learning model. The content recommendation model performs association prediction on the first node and the second node, obtains the association prediction result, and outputs the target recommended content to be recommended to the target user account. The content recommendation model is a pre-trained neural network model.

Optionally, the heterogeneous graph features include the updated node features and the semantic features between nodes. The content recommendation model is configured to determine the degree of association between the first node and the second node according to the updated node features and semantic features, and to determine, according to the degree of association, candidate recommended content that is highly associated with the target user account as the target recommended content.
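A minimal sketch of how the association prediction between a user node and candidate content nodes might look once the updated node features are available. The cosine-similarity scorer, the node names, and the random placeholder embeddings are assumptions for illustration; the patent itself leaves the scoring to a pre-trained content recommendation model.

import numpy as np

rng = np.random.default_rng(0)

# Assume the embedding step has produced updated node features H (one vector per node).
# Random placeholders stand in for the real heterogeneous graph features here.
node_ids = ["user_00123", "item_a", "item_b", "item_c"]
H = {nid: rng.normal(size=16) for nid in node_ids}

def association_score(user_id, item_id, H):
    """Toy relevance score between a user node and a candidate content node."""
    u, v = H[user_id], H[item_id]
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))  # cosine similarity

scores = {item: association_score("user_00123", item, H) for item in ["item_a", "item_b", "item_c"]}
top_item = max(scores, key=scores.get)  # candidate with the highest predicted interest
print(scores, top_item)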

The above example is described by taking application to a recommendation system as an example. The method for generating an analysis result provided by the present application can also be applied to other scenarios in which a heterogeneous graph is embedded to obtain updated node features and semantic features and the updated features are applied to downstream tasks, which is not limited in the embodiments of the present application.

Illustratively, referring to FIG. 3, which shows a schematic framework diagram of the overall solution provided by an exemplary embodiment of the present application, two semantic aspect types are provided as an example. For the heterogeneous graph 300, the structure of the heterogeneous graph 300 is taken as a random variable, the semantic aspect 310 and the semantic aspect 320 are taken as latent variables, and the initial node features of the heterogeneous graph 300 are embedded accordingly to obtain the updated node features and semantic features.

With reference to the above term introductions and application scenarios, the method for generating an analysis result provided by the present application is described, taking application of the method to a server as an example. As shown in FIG. 4, the method includes:

Step 401: obtain a heterogeneous graph structure of a target heterogeneous graph, the heterogeneous graph structure including node data and edge data.

In one example, the heterogeneous graph structure is expressed as G = (ν, ε), where ν denotes the node set of the heterogeneous graph, that is, the node data, and ε denotes the edge set of the heterogeneous graph, that is, the edge data. The type of each node is determined by a node-type mapping ν → T, where T denotes the node types, and the type of each edge is determined by a mapping ψ: ε → R, where R denotes the edge types. For a heterogeneous graph, |T| + |R| > 2.

Step 402: determine initial node features corresponding to the node data and semantic aspects corresponding to the edge data.

The initial node features correspond to the attribute data of each node, and the attribute data includes the node type, the node name, and the like. Illustratively, the target heterogeneous graph is a recommendation system network including a node A; the node type of node A is a user account, and the node name of node A is "00123", which represents the account identifier of the user account.

The semantic aspects correspond to the relationships between nodes in the heterogeneous graph. Illustratively, taking a movie recommendation system as an example, the node types include: 1. user account; 2. movie title; 3. movie category; 4. director; 5. leading actor. The semantic aspects then include, for example, the movies watched by a user account, the movie categories the user account likes, the directors the user account likes, and the actors the user account likes.

The initial feature vectors are expressed as X ∈ R^(|ν|×din), where |ν| denotes the number of nodes and din denotes the dimension of the input initial feature vectors.

The semantic aspects are denoted A (aspect). The semantic aspects also correspond to an aspect number, which indicates the number of semantic aspect types indicated by the edge data; that is, the aspect number is specified according to prior conditions.

Step 403: taking the heterogeneous graph structure as a random variable and the semantic aspects as latent variables, embed the initial node features to obtain the heterogeneous graph features of the target heterogeneous graph.

In one example, the heterogeneous graph structure is input into a probabilistic graph generative model; with the heterogeneous graph structure as a random variable and the semantic aspects as latent variables, the initial node features are embedded through the probabilistic graph generative model.

Inputting the heterogeneous graph structure into the probabilistic graph generative model includes inputting the initial node features and the above aspect number into the model, so that the model updates the initial node features based on the input.

Probabilistic Graphical Models (PGMs) are a mathematical tool that represents a probability distribution through a graph structure, in which the nodes of the probability graph represent random variables and the edges of the graph represent dependencies between the random variables.

The heterogeneous graph features include updated semantic features and updated node features.

The probabilistic graph generative model also has generative model parameters. If the generative model parameters have already been trained, the initial node features are embedded directly through the model to obtain the updated semantic features and updated node features; if the generative model parameters have not been trained, the model parameters first need to be iteratively adjusted according to the heterogeneous graph structure.

The process by which the probabilistic graph generative model generates the updated node features and semantic features is expressed as the following Formula 1:

Formula 1: pφ(G, A, X) = pφ(G|A, X) pφ(A|X) p(X)

Here, φ denotes the generative model parameters of the probabilistic graph generative model, and the distribution of the initial node features X is non-parametric. In the formalization of the above generative process, it is assumed that the heterogeneous graph is generated from the input initial node features and the latent semantic-aspect variables; that is, the joint probability distribution of G, A, and X is expanded by the chain rule, and the initial node features X and the unobserved latent aspect variables A act together to generate the topology of the heterogeneous graph.
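Read in log-space, Formula 1 is simply a sum of three log-terms. The sketch below mirrors that decomposition with placeholder log-densities; the Bernoulli-style edge likelihood and the Gaussian-style aspect prior are illustrative assumptions, since the patent does not fix these forms here.

import numpy as np

# Placeholder log-densities mirroring the chain-rule factorization in Formula 1.
def log_p_G_given_A_X(G, A, X):
    # Score each potential edge by an aspect-weighted node similarity and use a
    # Bernoulli log-likelihood over the adjacency matrix G (illustrative only).
    S = X @ A.T @ A @ X.T
    return float(np.sum(G * S) - np.sum(np.logaddexp(0.0, S)))

def log_p_A_given_X(A, X):
    return float(-0.5 * np.sum(A ** 2))        # standard-normal-style prior on aspects

def log_p_X(X):
    return 0.0                                 # non-parametric: treated as a constant

def log_joint(G, A, X):
    # log p(G, A, X) = log p(G | A, X) + log p(A | X) + log p(X)
    return log_p_G_given_A_X(G, A, X) + log_p_A_given_X(A, X) + log_p_X(X)

rng = np.random.default_rng(0)
n, d, K = 4, 3, 2
X = rng.normal(size=(n, d))                    # initial node features
A = rng.normal(size=(K, d))                    # latent aspect embeddings
G = (rng.random((n, n)) < 0.3).astype(float)   # toy adjacency matrix
print(log_joint(G, A, X))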

The generative model parameters φ can be solved by maximum likelihood estimation, that is, by maximizing log pφ(G|X). Because latent variables exist inside the model, this maximization problem cannot be solved directly, so the variational Expectation Maximization (vEM) algorithm is used.

Optionally, the maximum likelihood function log pφ(G|X) of the model parameters with respect to the heterogeneous graph structure and the initial node features is determined; the model parameters are adjusted through the maximum likelihood function, and the initial node features are embedded through the probabilistic graph generative model with the adjusted model parameters.

The maximum likelihood function is expressed in terms of a variational distribution qθ(A|X) and a posterior distribution pφ(A|G, X), where the variational distribution corresponds to the variational model parameters θ and the posterior distribution corresponds to the generative model parameters φ. The variational model parameters are adjusted according to a divergence requirement between the variational distribution and the posterior distribution, and the generative model parameters are adjusted according to the adjusted variational model parameters. Illustratively, the maximum likelihood function is shown in the following Formula 2:

Formula 2: log pφ(G|X) = L_ELBO(θ, φ) + KL(qθ(A|X) ‖ pφ(A|G, X)), where L_ELBO(θ, φ) = E_qθ(A|X)[log pφ(G, A|X) − log qθ(A|X)]

Here, KL denotes the Kullback-Leibler divergence, which measures the difference between two probability distributions, and L_ELBO(θ, φ) denotes a lower bound of the likelihood function, called the Evidence Lower Bound (ELBO). By maximizing this evidence lower bound, the maximization of the maximum likelihood function is achieved.

The variational model parameters and the generative model parameters are adjusted by the vEM algorithm. The vEM algorithm is divided into an E-step and an M-step, where the E-step corresponds to the inference procedure and the M-step corresponds to the learning procedure. In the E-step, according to the divergence requirement between the variational distribution and the posterior distribution, the variational model parameters θ are adjusted first; in the M-step, the generative model parameters φ are adjusted in combination with the adjusted variational model parameters θ. The variational model parameters θ and the generative model parameters φ are updated iteratively by the vEM algorithm. The E-step and the M-step are described separately below.

The E-step corresponds to the inference procedure. The probabilistic graph generative model needs to estimate the posterior distribution pφ(A|G, X), but because of the unknown and complex correlations between the latent variables A and the node features X, the estimate cannot be obtained directly. Therefore, following the vEM algorithm, the posterior distribution pφ(A|G, X) is fixed in the E-step, and the true posterior distribution is approximated by updating the variational distribution qθ(A|X).

Optionally, following amortized inference, the variational distribution qθ(A|X) is instantiated with a graph neural network, denoted GNNθ, and according to mean-field theory the variational distribution is rewritten as the following Formula 3:

Formula 3: qθ(A|X) = ∏_{k=1..K} qθ(ak|X)

where K denotes the number of latent aspect variables in the heterogeneous graph and ak denotes the k-th aspect variable.

The optimization objective of GNNθ is defined as the following Formula 4:

Formula 4: θ* = argmax_θ L_ELBO(θ, φ) = argmax_θ E_qθ(A|X)[log pφ(G, A|X) − log qθ(A|X)]

According to the above optimization objective, the optimization of the variational distribution qθ(A|X) needs to satisfy the following Formula 5:

Formula 5: log qθ(ak|X) = E_qθ(A∖ak|X)[log pφ(G, A|X)] + C, where A∖ak denotes the aspect variables other than ak

Here, C is const, a constant. Illustratively, taking the aspect variable a0 as an example, the derivation of the above Formula 5 is explained; refer to the following Formula 6:

Formula 6: L_ELBO = E_qθ(a0|X)[log F(a0) − log qθ(a0|X)] + C = −KL(qθ(a0|X) ‖ F(a0)) + C

The expansion of log F(a0) in the above derivation is shown in the following Formula 7:

Formula 7: log F(a0) = E_{∏k≠0 qθ(ak|X)}[log pφ(G, A|X)]

According to the above derivation, the optimal qθ(a0|X) is obtained when the KL divergence is zero, at which point the two distributions inside the KL divergence are equal; that is, refer to the following Formula 8:

Formula 8: qθ(a0|X) = F(a0)

The M-step corresponds to the learning procedure. In this procedure, the variational distribution qθ(A|X) is fixed, the posterior distribution pφ(A|G, X) is updated, and the optimization of the parameters φ is as shown in the following Formula 9:

Formula 9: φ* = argmax_φ E_qθ(A|X)[log pφ(G|A, X) + log pφ(A|X)]

Here, log pφ(G|A, X) and log pφ(A|X) are instantiated as a graph neural network GNNφ.

That is, the probabilistic graph generative model includes the above GNNθ and GNNφ; the initial node features are embedded through GNNθ and GNNφ, and the updated semantic features and updated node features are output.

Step 404: analyze the updated semantic features and the updated node features to generate an analysis result corresponding to the node data.

Optionally, association prediction is performed on the node data in the heterogeneous graph structure according to the heterogeneous graph features, and the association prediction result is obtained as the analysis result.

The association prediction result indicates the predicted association relationships between the node data.

According to the association prediction result, subsequent practical applications of the heterogeneous graph structure are carried out. Illustratively, the ways in which the association prediction result can be applied are exemplified as follows:

First, the node data includes a first node and a second node, where the first node corresponds to account data and the second node corresponds to candidate recommended content. Association prediction is performed on the first node and the second node according to the heterogeneous graph features to obtain an association prediction result, which indicates the predicted degree of interest of the account data in the candidate recommended content.

Optionally, candidate recommended content with a high predicted degree of interest is sent to the target account according to the association prediction result; or, the number of target accounts with a high degree of interest in the candidate recommended content is predicted according to the association prediction result, so as to obtain a predicted promotion effect of the candidate recommended content.

Optionally, in the association prediction process, the heterogeneous graph features are input into a recommendation model, which is a pre-trained machine learning model; the recommendation model performs association prediction on the first node and the second node to obtain the association prediction result.

Second, the node data includes a third node and a fourth node, where the third node corresponds to abnormality description content and the fourth node corresponds to an abnormal state type. The abnormality description content describes a program abnormality, and the abnormal state type is a program running-state diagnosis result corresponding to the abnormality description content. Association prediction is performed on the third node and the fourth node according to the heterogeneous graph features to obtain an association prediction result, which indicates the inferred abnormal state type corresponding to each piece of abnormality description content.

Optionally, an abnormal state type that is highly associated with the target abnormality description content is determined according to the association prediction result, so that the program abnormality is handled with the solution corresponding to that abnormal state type.

Optionally, in the association prediction process, the heterogeneous graph features are input into an abnormality prediction model, which is a pre-trained machine learning model; the abnormality prediction model performs association prediction on the third node and the fourth node to obtain the association prediction result.

In summary, in the method for generating an analysis result provided by this embodiment, the heterogeneous graph structure is taken as a random variable and the semantic aspects as latent variables, so that the heterogeneous graph is embedded and its implicit semantics are extracted. The node feature vectors and the semantic feature vectors are thus updated, yielding more accurate node and semantic feature vectors; node features are updated without defining meta-paths, which improves the efficiency of updating node features and leads to higher task execution accuracy in downstream tasks.

In this embodiment, within the probabilistic graph generative model, the maximization of the maximum likelihood function is solved with the aid of the variational distribution, from which the graph neural networks used to update the node features and semantic features are derived, improving the accuracy of the node updates.

Illustratively, the algorithm of the overall framework provided by the embodiments of the present application is as follows:

Input: heterogeneous graph G = (ν, ε), initial node feature X ∈ R^(|ν|×din), aspect number K

Output: node embedding H ∈ R^(|ν|×dout)

1: while not converged do
2:   E-Step: Inference Procedure
3:   Update node embeddings and edge weights by qθ
4:   Inference aspect embeddings A by qθ
5:   Update qθ by Eq. (4)
6:   M-Step: Learning Procedure
7:   Update node embeddings and edge weights by pφ
8:   Update pφ
9: end while

In the first line of the algorithm, Input indicates that the input parameters include the heterogeneous graph structure G, the initial node features X, and the aspect number K, where the aspect number indicates the number of semantic aspect types indicated by the edge data, that is, the number of latent aspect variables. The second line, Output, indicates the output, which includes the updated node features and updated semantic features, where R indicates the edge types, |ν| denotes the number of nodes, and dout denotes the dimension of the output updated feature vectors.

In the algorithm, step 1 indicates the start of the iterative update; step 2 indicates the inference procedure, corresponding to the E-step described above; step 3 indicates updating node features and edge weights according to qθ; step 4 indicates inferring the latent aspect vectors A according to qθ; step 5 indicates updating qθ according to Formula 4; step 6 indicates the start of the learning procedure, corresponding to the M-step described above; step 7 indicates updating node features and edge weights according to pφ; step 8 indicates updating pφ; and step 9 indicates the end of the iteration.
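A skeletal Python rendering of this loop, with GNNθ and GNNφ replaced by a dummy stand-in so the control flow can actually execute; the propagation and parameter updates inside the stand-in are placeholders, not the networks defined by Formulas 4 and 9.

import numpy as np

class DummyAspectGNN:
    """Stand-in for GNN_theta / GNN_phi: random projections, no real learning.
    Only the call pattern mirrors the algorithm; the maths is a placeholder."""
    def __init__(self, d_in, d_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(d_in, d_out)) * 0.1

    def propagate(self, G, X, edge_w=None, A=None):
        H = np.tanh(G @ X @ self.W)        # one message-passing step
        return H, G                        # reuse the adjacency as "edge weights"

    def infer_aspects(self, H, K):
        return H[:K]                       # placeholder aspect embeddings

    def update_parameters(self, G, X, A):
        pass                               # Formula 4 / Formula 9 would go here

def variational_em(G, X, K, gnn_theta, gnn_phi, n_iters=5):
    H, edge_w, A = None, None, None
    for _ in range(n_iters):
        # E-step (inference): update embeddings and edge weights by q_theta,
        # infer aspect embeddings A, then update q_theta (Formula 4).
        H, edge_w = gnn_theta.propagate(G, X, edge_w)
        A = gnn_theta.infer_aspects(H, K)
        gnn_theta.update_parameters(G, X, A)
        # M-step (learning): update embeddings and edge weights by p_phi,
        # then update p_phi (Formula 9).
        H, edge_w = gnn_phi.propagate(G, X, edge_w, A)
        gnn_phi.update_parameters(G, X, A)
    return H

n, d_in, d_out, K = 6, 8, 4, 2
rng = np.random.default_rng(1)
X = rng.normal(size=(n, d_in))
G = (rng.random((n, n)) < 0.4).astype(float)
H = variational_em(G, X, K, DummyAspectGNN(d_in, d_out), DummyAspectGNN(d_in, d_out, seed=2))
print(H.shape)   # (6, 4)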

In an optional implementation, when the node features are updated through the above GNNθ and GNNφ, this is achieved by decoupling the edge weights and the initial node features into K aspect channels. FIG. 5 is a flowchart of a method for generating an analysis result provided by another exemplary embodiment of the present application. Taking application of the method to a server as an example, as shown in FIG. 5, the method includes:

Step 501: obtain a heterogeneous graph structure of the target heterogeneous graph, the heterogeneous graph structure including node data and edge data.

The node data corresponds to initial node features, and the edge data corresponds to semantic aspects.

In one example, the heterogeneous graph structure is expressed as G = (ν, ε), the initial feature vectors are expressed as X ∈ R^(|ν|×din), and the semantic aspects are expressed as A (aspect).

Step 502: input the heterogeneous graph structure of the target heterogeneous graph into the probabilistic graph generative model.

Inputting the heterogeneous graph structure into the probabilistic graph generative model includes inputting the initial node features and the above aspect number into the model, so that the model updates the initial node features based on the input.

Step 503: decouple the edge weights and the initial node features into K aspect channels, where K is a positive integer.

The value of K corresponds to the aspect number, which indicates the number of semantic aspect types indicated by the edge data; the value of K is predefined.

Optionally, the neural network structure adopted by the probabilistic graph generative model is an aspect-aware graph neural network (Aspect-aware GNN, A²GNN), that is, the above GNNθ and GNNφ.

The output of the probabilistic graph generative model includes the updated semantic features and the updated node features. The updated node features are expressed as {hi}, where hi denotes the feature output by the neural network layer for the i-th node; the updated semantic features are expressed as {ak | k = 1, …, K}, where ak denotes the k-th latent aspect variable and K denotes the total number of semantic aspects.

Step 504: update the edge weights according to the node feature vectors and the aspect feature vectors of the k-th aspect channel.

Because the latent variables ak of different aspects need to remain as conditionally independent as possible in order to describe semantic aspects with different meanings in the heterogeneous graph, while each node can participate in the description of different semantic aspects, the co-occurrence probability between a node and its neighbor nodes needs to satisfy an aspect-aware condition; that is, the probabilistic graph generative model needs to be able to model the co-occurrence probability between a node and its neighbor nodes.

Hence, in this embodiment, the edge data also corresponds to edge weights. The edge weights and the initial node features are decoupled into K aspect channels, where K is a positive integer whose value corresponds to the aspect number. The edge weights are updated according to the node feature vectors and the aspect feature vectors of the k-th aspect channel, and the initial node features are updated according to the updated edge weights. The node feature vectors are obtained by mapping the initial node features through a nonlinear mapping function; that is, the initial node features are mapped through the nonlinear mapping function to obtain the node feature vectors of the k-th aspect channel. The aspect feature vector of the k-th aspect channel is obtained by global graph pooling within that channel. In this embodiment the global graph pooling is implemented as global average pooling, but it may also be implemented with other graph pooling techniques, which is not limited in this embodiment. Illustratively, the global pooling process is shown in the following Formula 10:

Formula 10: ak = GLOBALPOOL({zi,k | vi ∈ ν})

where GLOBALPOOL denotes the global graph pooling operation, zi,k denotes the node feature vector of node vi in the k-th aspect channel, and W denotes the semantic type.
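A small numpy sketch of the channel decoupling and the global average pooling of Formula 10. The per-channel projection used to map the initial features into each aspect channel is an assumed linear-plus-tanh mapping; the patent only requires some nonlinear mapping function.

import numpy as np

rng = np.random.default_rng(0)
n, d_in, d, K = 5, 8, 4, 3           # nodes, input dim, channel dim, aspect channels

X = rng.normal(size=(n, d_in))        # initial node features
W = rng.normal(size=(K, d_in, d))     # assumed per-channel projection (not from the patent)

# Decouple the node features into K aspect channels via a nonlinear mapping.
Z = np.tanh(np.einsum("nd,kde->kne", X, W))   # shape (K, n, d): z_{i,k}

# Formula 10 (as reconstructed above): the aspect feature is the global average
# pooling over all node features in the k-th channel.
A = Z.mean(axis=1)                    # shape (K, d): a_k
print(A.shape)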

When updating the edge weights, the first node feature vector and the first aspect feature vector of the i-th node in the k-th aspect channel are obtained and concatenated into a first concatenated vector; the second node feature vector and the second aspect feature vector of the j-th node in the k-th aspect channel are obtained and concatenated into a second concatenated vector; and the third node feature vector and the third aspect feature vector of the m-th node in the k-th aspect channel are obtained and concatenated into a third concatenated vector, where i, j, and m are all positive integers and the m-th node is a neighbor node of the i-th node. A first semantic similarity between the first concatenated vector and the second concatenated vector is determined, a second semantic similarity between the first concatenated vector and the third concatenated vector is determined, and the edge weights are updated according to the first semantic similarity and the second semantic similarity.

Illustratively, the edge-weight update process is shown in the following Formula 11:

Formula 11: wij,k(l) = [wij,k(l−1) · K(zi,k(l), zj,k(l))] / [Σ_{vm∈N(vi)} wim,k(l−1) · K(zi,k(l), zm,k(l))]

where, for the k-th aspect channel, zi,k(l) is the first concatenated vector in the l-th iteration, zj,k(l) is the second concatenated vector in the l-th iteration, wij,k(l−1) is the edge weight between the i-th node and the j-th node in the (l−1)-th iteration, N(vi) denotes the neighbor nodes of the i-th node, zm,k(l) is the third concatenated vector in the l-th iteration, wim,k(l−1) is the edge weight between the i-th node and the m-th node in the (l−1)-th iteration, and K(·,·) denotes a kernel function that measures the semantic similarity between nodes.
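A sketch of the edge-weight update of Formula 11 for a single aspect channel. The exponential dot-product kernel stands in for the similarity function K(·,·), and the dense matrix shapes are assumptions for readability.

import numpy as np

def update_edge_weights(z, a, w_prev, adj):
    """One aspect channel: z is (n, d) node features, a is (d,) the aspect feature,
    w_prev the previous (n, n) edge-weight matrix, adj the 0/1 adjacency."""
    n = z.shape[0]
    s = np.concatenate([z, np.tile(a, (n, 1))], axis=1)   # concatenated vectors z_{i,k}
    kernel = np.exp(s @ s.T)                              # assumed similarity kernel K(.,.)
    unnorm = adj * w_prev * kernel                        # numerator of Formula 11
    denom = unnorm.sum(axis=1, keepdims=True)             # normalise over N(v_i)
    return np.divide(unnorm, denom, out=np.zeros_like(unnorm), where=denom > 0)

rng = np.random.default_rng(0)
n, d = 4, 3
adj = np.array([[0, 1, 1, 0], [1, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 0]], dtype=float)
z = rng.normal(size=(n, d))
a = rng.normal(size=d)
w = update_edge_weights(z, a, np.ones((n, n)), adj)
print(w.sum(axis=1))   # each row with neighbours sums to 1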

Step 505: update the initial node features according to the updated edge weights, and output the heterogeneous graph features.

The heterogeneous graph features include updated semantic features and updated node features.

Illustratively, based on the update of the edge weights, the initial node features are decoupled according to the semantic aspects; refer to the following Formula 12:

Formula 12:

[the formula itself is given only as an image in the original; per the surrounding description, the spliced vector of each node is updated by aggregating its neighbors' spliced vectors with the updated edge weights]

where, for the k-th aspect channel, z_{i,k}^{(l-1)} is the first spliced vector in the (l-1)-th iteration.

The denominators in Formula 10 and Formula 11 above both normalize the corresponding features within the neighborhood. Through the alternating updates of edge weights and node features, the model can capture the semantics of the different aspects in the heterogeneous graph. The final node feature is obtained by concatenating the node features of the different aspect channels, as given by Formula 13 below:

Formula 13 (given only as an image in the original; per the description, it applies the output transformation to the concatenation of the per-channel spliced vectors, i.e. h_i = W_out · [z_{i,1} ; … ; z_{i,K}]):

where h_i denotes the updated node feature, W_out denotes the output transformation of the neural network, and z_{i,k} denotes the spliced vector obtained, after the iterations, by concatenating the node feature vector and the aspect feature vector in the k-th aspect channel.
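The per-channel aggregation and the final concatenation can be sketched as follows; the weighted-mean form of the aggregation, the shapes of W_out, and the assumption that every node has at least one neighbor are choices made for the illustration only.

```python
import numpy as np

def aggregate_channel(z, w, neighbors):
    """Node update within one aspect channel (cf. Formula 12): aggregate the
    neighbors' spliced vectors with the updated edge weights, normalized over
    the neighborhood. Assumes every node has at least one neighbor."""
    z_new = {}
    for i, nbrs in neighbors.items():
        weighted = sum(w[(i, j)] * z[j] for j in nbrs)
        z_new[i] = weighted / sum(w[(i, j)] for j in nbrs)
    return z_new

def final_node_features(z_per_channel, W_out):
    """Final node feature h_i (cf. Formula 13): concatenate the per-channel
    spliced vectors of node i and apply the output transformation W_out."""
    return {i: W_out @ np.concatenate([z_k[i] for z_k in z_per_channel])
            for i in z_per_channel[0]}
```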

Step 506: analyze the updated semantic features and the updated node features, and generate an analysis result corresponding to the node data.

Optionally, association prediction is performed on the node data in the heterogeneous graph structure according to the heterogeneous graph features, so as to obtain an association prediction result.

The association prediction result is used to indicate the predicted association relationship between the node data.
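A minimal sketch of how the association prediction could consume the heterogeneous graph features is given below; the sigmoid dot-product scorer is an assumption standing in for the pre-trained recommendation model described later, not the embodiment's actual predictor.

```python
import numpy as np

def association_score(h_account, h_content):
    """Score the predicted association (degree of interest) between an account
    node and a candidate-content node from their updated node features.
    A sigmoid over a dot product is an assumed, minimal scorer."""
    return 1.0 / (1.0 + np.exp(-float(h_account @ h_content)))
```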

In summary, the method for generating an analysis result provided by this embodiment takes the heterogeneous graph structure as a random variable and the semantic aspects as latent variables, so that the heterogeneous graph is embedded and its implicit semantics are extracted. The node feature vectors and the semantic feature vectors are thereby updated, yielding more accurate node and semantic feature vectors while avoiding updating the node features through manually defined meta-paths, which improves the efficiency of node feature updating and leads to higher task execution accuracy in downstream tasks.

In this embodiment, for the A2GNN neural network, K aspect channels are obtained by decoupling, and the latent variables a_k of the different aspects are kept as conditionally independent of one another as possible, so that they describe semantic aspects with different meanings in the heterogeneous graph and the probabilistic graph generation model is able to model the co-occurrence probability between a node and its neighboring nodes.

Illustratively, the algorithm of the A2GNN neural network provided by this embodiment of the present application is as follows:

Input: heterogeneous graph structure G (its full notation is given as an image in the original), initial node features X ∈ R^{|V|×d_in}, aspect number K, layer number L
Output: node embedding H ∈ R^{|V|×d_out}
1:  Randomly initialize a_k, 1 ≤ k ≤ K
2:  while not converged do
3:      Calculate x_{i,k} = f_k(x_i) and the initial quantities shown as an image in the original
4:      for layer l = 1, 2, …, L do
5:          for k = 1, 2, …, K do
6:              for (i, j) ∈ ε do
7:                  update the edge weight (the update formula is given as an image in the original)
8:              end for
9:              for i = 1, 2, …, |V| do
10:                 update the spliced vector z_{i,k} (the update formula is given as an image in the original)
11:             end for
12:             if l = L then
13:                 compute the final node feature h_i by concatenating the aspect channels (given as an image in the original)
14:                 Update a_k
15:             end if
16:         end for
17:     end for
18: end while

In the algorithm, the first line, Input, indicates that the input parameters include the heterogeneous graph structure G, the initial node features X, the semantic quantity K (which indicates the number of semantic aspect types indicated by the edge data, that is, the number of aspect latent variables) and the number of neural network iterations L. The second line, Output, indicates the output, including the updated node features and the updated semantic features, where R is used to indicate the edge types, |V| denotes the number of nodes, and d_out denotes the dimension of the output updated feature vectors.

In the algorithm, step 1 randomly initializes the aspect feature vectors a_k; step 2 starts the iterative update; step 3 computes x_{i,k} and the related quantity shown as an image in the original; step 4 checks whether the layer index l has reached L; step 5 checks whether the current aspect channel k has reached K; step 6 checks whether the current edge (i, j) belongs to the known edges of the heterogeneous graph structure; step 7 updates the edge weight; step 8 ends the for loop of the weight update after the weight of every edge has been updated; step 9 checks whether the currently updated node i has reached the total number of nodes |V|; step 10 updates the spliced vector of the node feature vector and the semantic feature vector in the k-th channel; step 11 ends the for loop of the node feature update after all nodes have been updated; step 12 tests whether the layer index l has reached the set number L; step 13 obtains the overall node feature by concatenating the node features of the different aspect channels; step 14 updates the semantic features; and steps 15 to 18 end the iterative process when all conditions are met.
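Tying the steps above together, a skeleton of the overall iteration might look like the following; the dictionary-based graph representation, the fixed outer iteration count replacing the convergence test, the rule used to refresh a_k, and the helper functions (the sketches shown earlier alongside Formulas 10-13) are all assumptions for illustration.

```python
import numpy as np

def a2gnn_embed(neighbors, edge_weights, X, K, L, W_list, b_list, W_out, n_outer=10):
    """Skeleton of the iteration in steps 1-18 above, reusing the earlier
    sketched helpers aspect_channel_features, global_average_pool,
    update_edge_weights, aggregate_channel and final_node_features.
    Node ids are assumed to be integers 0..|V|-1."""
    channels = aspect_channel_features(X, W_list, b_list)        # x_{i,k} (step 3)
    a = [global_average_pool(x_k) for x_k in channels]           # initialize a_k (step 1)
    nodes = list(neighbors)
    h = None
    for _ in range(n_outer):                                     # step 2: "while not converged"
        z_all = [{i: np.concatenate([channels[k][i], a[k]]) for i in nodes}
                 for k in range(K)]
        w_all = [dict(edge_weights) for _ in range(K)]
        for l in range(1, L + 1):                                # step 4: layers
            for k in range(K):                                   # step 5: aspect channels
                w_all[k] = update_edge_weights(z_all[k], w_all[k], neighbors)  # step 7
                z_all[k] = aggregate_channel(z_all[k], w_all[k], neighbors)    # step 10
            if l == L:                                           # step 12
                h = final_node_features(z_all, W_out)            # step 13
                # step 14: refresh a_k; averaging the node-feature part of z is
                # a stand-in rule, since the exact update is not spelled out here
                a = [np.mean([z_all[k][i][:channels[k].shape[1]] for i in nodes], axis=0)
                     for k in range(K)]
    return h                                                     # node embeddings
```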

In an optional embodiment, the generative model parameters involved in the embodiments of the present application also need to be trained and adjusted through a loss function. FIG. 6 is a flowchart of a method for generating an analysis result provided by another exemplary embodiment of the present application. Taking the method being applied to a server as an example, as shown in FIG. 6, the method includes the following steps.

Step 601: obtain the heterogeneous graph structure of the target heterogeneous graph, where the heterogeneous graph structure includes node data and edge data.

The node data correspond to initial node features, and the edge data correspond to semantic aspects.

In one example, the heterogeneous graph structure is expressed as G (its full notation is given as an image in the original), the initial feature vectors are expressed as X ∈ R^{|V|×d_in}, and the semantic aspects are expressed as A (aspect).
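For concreteness, a toy instance of such a heterogeneous graph structure could be assembled as follows; the node types, ids, edge list, feature dimension and the value of K are all invented for the example.

```python
import numpy as np

# Toy heterogeneous graph structure: node data with two node types and
# edge data given as (i, j) pairs; d_in = 8 initial features per node.
node_types = {0: "account", 1: "account", 2: "content", 3: "content"}
edges = [(0, 2), (0, 3), (1, 2), (1, 3)]
neighbors = {0: [2, 3], 1: [2, 3], 2: [0, 1], 3: [0, 1]}
edge_weights = {(i, j): 1.0 for i in neighbors for j in neighbors[i]}
X = np.random.default_rng(1).normal(size=(len(node_types), 8))  # initial node features
K = 2                                                           # assumed number of semantic aspects
```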

Step 602: input the heterogeneous graph structure of the target heterogeneous graph into the probabilistic graph generation model.

Inputting the heterogeneous graph structure into the probabilistic graph generation model includes inputting the initial node features and the above semantic quantity into the model, so that the model updates the initial node features based on this input.

Step 603: taking the heterogeneous graph structure as a random variable and the semantic aspects as latent variables, embed the initial node features through the probabilistic graph generation model, and output the heterogeneous graph features of the target heterogeneous graph.

The heterogeneous graph features include updated semantic features and updated node features.

The probabilistic graph generation model also has generative model parameters; if the generative model parameters have not yet been trained, the model parameters first need to be iteratively adjusted according to the heterogeneous graph structure.

Illustratively, as shown in Formula 1 above, φ denotes the generative model parameters of the probabilistic graph generation model.

The generative model parameters φ can be solved by maximum likelihood estimation, that is, by maximizing log p_φ(G|X). Because the model contains latent variables, this maximization problem cannot be solved directly, so the variational Expectation Maximization (vEM) algorithm is used to solve it.

Optionally, the maximum likelihood estimation function log p_φ(G|X) of the model parameters with respect to the heterogeneous graph structure and the initial node features is determined, the model parameters are adjusted through the maximum likelihood estimation function, and the initial node features are embedded through the probabilistic graph generation model with the adjusted model parameters.

Specifically, the maximum likelihood estimation function is expressed through a variational distribution q_θ(A|X) and a posterior distribution p_φ(A|G,X), where the variational distribution corresponds to the variational model parameters θ and the posterior distribution corresponds to the generative model parameters φ. The variational model parameters are adjusted according to a divergence requirement between the variational distribution and the posterior distribution, and the generative model parameters are then adjusted according to the adjusted variational model parameters.
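The alternating adjustment of θ and φ can be pictured with the following skeleton; the `elbo` callable (assumed to return the evidence lower bound on log p_φ(G|X) together with its gradients, for instance from an autodiff framework) and the plain gradient-ascent updates are assumptions, since the embodiment only prescribes the vEM alternation itself.

```python
def variational_em(elbo, theta, phi, lr=1e-2, n_rounds=100, e_steps=5):
    """Skeleton of the variational EM alternation: the E-step tightens the
    bound in the variational parameters theta (driving q_theta(A|X) toward the
    posterior p_phi(A|G,X)), the M-step then updates the generative parameters
    phi. `elbo(theta, phi)` is assumed to return (value, grad_theta, grad_phi)."""
    for _ in range(n_rounds):
        for _ in range(e_steps):              # E-step: adjust theta
            _, g_theta, _ = elbo(theta, phi)
            theta = theta + lr * g_theta
        _, _, g_phi = elbo(theta, phi)        # M-step: adjust phi
        phi = phi + lr * g_phi
    return theta, phi
```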

Optionally, the generative model parameters can also be adjusted through a loss value computed from the updated semantic features and the updated node features.

Step 604: input the updated semantic features and the updated node features into a preset loss function, and output the loss value corresponding to the update result.

A traditional undirected probabilistic graphical model (also called a Markov network) obeys the pairwise Markov property: each node has probabilistic dependencies only on its first-order neighbors and is conditionally independent of nodes to which it is not directly connected. A heterogeneous graph, however, contains complex long-range dependencies, so the present application uses a self-supervised training strategy to model, in the form of regularization, the associations between nodes that are not directly adjacent in the heterogeneous graph. Illustratively, the nodes within the n-hop neighborhood around a given central node v_c are considered. First, a specific rule is designed to construct a masked subgraph G_mask, whose edges satisfy: (1) the edges corresponding to the one-hop neighbors directly connected to the central node v_c in the original heterogeneous graph are kept and used as anchors; (2) apart from these anchor edges, the remaining edges of the original heterogeneous graph are masked, and edges not directly connected to the central node v_c are added. After G_mask is constructed, the self-supervised task aims to predict the type of the central node v_c from the updated node features. This prediction task is formalized as a classification task, and the cross-entropy loss is used as the loss function, as shown in Formula 14 below:

Formula 14 (the cross-entropy loss; the formula itself is given only as an image in the original),

where I denotes the indicator function.
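A minimal numerical sketch of the cross-entropy used for the node-type prediction is given below; how the logits are produced from the masked subgraph G_mask is left abstract, since that parameterization is not spelled out here.

```python
import numpy as np

def node_type_cross_entropy(logits, true_type):
    """Cross-entropy loss for predicting the type of the central node v_c
    (cf. Formula 14): the indicator function selects the true type among
    the candidate node types."""
    probs = np.exp(logits - np.max(logits))
    probs = probs / probs.sum()               # softmax over node types
    return -np.log(probs[true_type])          # -sum_t I(t = y_c) * log p_t
```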

Step 605: adjust the probabilistic graph generation model according to the loss value.

Optionally, the generative model parameters are adjusted according to the loss value computed by Formula 14, and after the adjustment, the step of embedding the initial node features through the probabilistic graph generation model is executed iteratively, with the updated node features taken as the initial node features.

Optionally, when computing the loss value, in addition to the cross-entropy loss function given by Formula 14 above, the total loss value can also be determined according to the objective function of the downstream task; that is, the model is combined with various downstream tasks on heterogeneous graphs and trained end to end using the standard back-propagation algorithm.

Illustratively, the total loss function can be formalized as Formula 15 below:

Formula 15 (given only as an image in the original),

where L_d denotes the loss function corresponding to the downstream task.
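As a hedged sketch, the combination could be as simple as a weighted sum; the weight `lam` and the additive form are assumptions, since Formula 15 itself is only given as an image.

```python
def total_loss(downstream_loss, mask_loss, lam=1.0):
    """Total training loss in the spirit of Formula 15: the downstream-task
    loss L_d plus the self-supervised mask loss, with an assumed weight lam."""
    return downstream_loss + lam * mask_loss
```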

In summary, the method for generating an analysis result provided by this embodiment takes the heterogeneous graph structure as a random variable and the semantic aspects as latent variables, so that the heterogeneous graph is embedded and its implicit semantics are extracted. The node feature vectors and the semantic feature vectors are thereby updated, yielding more accurate node and semantic feature vectors while avoiding updating the node features through manually defined meta-paths, which improves the efficiency of node feature updating and leads to higher task execution accuracy in downstream tasks.

Illustratively, the method for generating an analysis result provided by the above embodiments of the present application can improve the accuracy of node classification; see Table 1 below, where the values are percentages.

Table 1 (node classification accuracy, in percent; the table itself is given only as images in the original)

As shown in Table 1 above, the node classification task is used to compare the classification results of the related art with those of the present application. The datasets used include the DBLP (DataBase systems and Logic Programming) computer-science bibliography, the Association for Computing Machinery (ACM) dataset, and the Internet Movie Database (IMDB).

The related art mainly includes methods for homogeneous graphs, namely DeepWalk and Graph Convolutional Networks (GCN), and methods for heterogeneous graphs, namely metapath2vec, the Heterogeneous Graph Attention Network (HAN), and Graph Transformer Networks (GTN). As can be seen from the results in Table 1, the method provided by the present application leads in node classification accuracy.

FIG. 7 is a structural block diagram of an apparatus for generating an analysis result provided by an exemplary embodiment of the present application. As shown in FIG. 7, the apparatus includes:

an obtaining module 710, configured to obtain the heterogeneous graph structure of a target heterogeneous graph, where the heterogeneous graph structure includes node data and edge data, the node data correspond to initial node features, and the edge data correspond to semantic aspects;

a determining module 720, configured to determine the initial node features corresponding to the node data and the semantic aspects corresponding to the edge data;

an embedding module 730, configured to take the heterogeneous graph structure as a random variable and the semantic aspects as latent variables, and embed the initial node features to obtain the heterogeneous graph features of the target heterogeneous graph, where the heterogeneous graph features include updated semantic features and updated node features;

an analysis module 740, configured to analyze the updated semantic features and the updated node features and generate an analysis result corresponding to the node data.

In an optional embodiment, referring to FIG. 8, the embedding module 730 is further configured to input the heterogeneous graph structure into a probabilistic graph generation model, and to embed the initial node features through the probabilistic graph generation model, taking the heterogeneous graph structure as a random variable and the semantic aspects as latent variables.

In an optional embodiment, the probabilistic graph generation model includes generative model parameters;

the embedding module 730 includes:

a determining unit 731, configured to determine the maximum likelihood estimation function of the model parameters with respect to the heterogeneous graph structure and the initial node features;

an adjusting unit 732, configured to adjust the model parameters through the maximum likelihood estimation function;

an embedding unit 733, configured to embed the initial node features through the probabilistic graph generation model with the adjusted model parameters.

In an optional embodiment, the adjusting unit 732 is specifically configured to express the maximum likelihood estimation function through a variational distribution and a posterior distribution, where the variational distribution corresponds to the variational model parameters and the posterior distribution corresponds to the generative model parameters; to adjust the variational model parameters according to a divergence requirement between the variational distribution and the posterior distribution; and to adjust the generative model parameters according to the adjusted variational model parameters.

In an optional embodiment, the embedding module 730 is further configured to input the updated semantic features and the updated node features into a preset loss function and output the loss value corresponding to the update result;

the embedding module 730 is further configured to adjust the probabilistic graph generation model according to the loss value.

In an optional embodiment, the embedding module 730 is further configured to take the updated node features as the initial node features and iteratively execute the step of embedding the initial node features through the probabilistic graph generation model.

In an optional embodiment, the edge data further correspond to edge weights, the semantic aspects correspond to a semantic quantity, and the semantic quantity indicates the number of semantic aspect types indicated by the edge data;

the embedding module 730 includes:

an embedding unit 733, configured to decouple the edge weights and the initial node features into K aspect channels, where K is a positive integer, the value of K corresponds to the semantic quantity, and each aspect channel corresponds to a set of node feature vectors and an aspect feature vector;

an adjusting unit 732, configured to update the edge weights according to the node feature vectors and the aspect feature vector of the k-th aspect channel, and to update the initial node features according to the updated edge weights.

In an optional embodiment, the embedding unit is further configured to map the initial node features through a nonlinear mapping function to obtain the node feature vectors of the k-th aspect channel, and to perform global graph pooling in the k-th aspect channel to obtain the aspect feature vector of the k-th aspect channel.

In an optional embodiment, the adjusting unit 732 is further configured to obtain the first node feature vector and the first aspect feature vector of the i-th node in the k-th aspect channel and concatenate them into a first spliced vector; obtain the second node feature vector and the second aspect feature vector of the j-th node in the k-th aspect channel and concatenate them into a second spliced vector; and obtain the third node feature vector and the third aspect feature vector of the m-th node in the k-th aspect channel and concatenate them into a third spliced vector, where i, j and m are all positive integers and the m-th node is a neighbor node of the i-th node;

the adjusting unit 732 is further configured to determine a first semantic similarity between the first spliced vector and the second spliced vector, determine a second semantic similarity between the first spliced vector and the third spliced vector, and update the edge weights according to the first semantic similarity and the second semantic similarity.

In an optional embodiment, the node data include a first node and a second node, the first node corresponding to account data and the second node corresponding to candidate recommended content;

the analysis module 740 is further configured to perform association prediction on the first node and the second node according to the heterogeneous graph features and obtain an association prediction result as the analysis result, where the association prediction result indicates the predicted degree of interest of the account data in the candidate recommended content.

In an optional embodiment, the analysis module 740 is further configured to input the heterogeneous graph features into a recommendation model, the recommendation model being a pre-trained machine learning model, and to perform association prediction on the first node and the second node through the recommendation model to obtain the association prediction result.

In summary, the apparatus for generating an analysis result provided by this embodiment takes the heterogeneous graph structure as a random variable and the semantic aspects as latent variables, so that the heterogeneous graph is embedded and its implicit semantics are extracted. The node feature vectors and the semantic feature vectors are thereby updated, yielding more accurate node and semantic feature vectors while avoiding updating the node features through manually defined meta-paths, which improves the efficiency of node feature updating and leads to higher task execution accuracy in downstream tasks.

It should be noted that the apparatus for generating an analysis result provided by the above embodiment is illustrated only by the division of the above functional modules. In practical applications, the above functions may be assigned to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus for generating an analysis result provided by the above embodiment and the embodiments of the method for generating an analysis result belong to the same concept; for the specific implementation process, refer to the method embodiments, which are not repeated here.

FIG. 9 shows a schematic structural diagram of a server provided by an exemplary embodiment of the present application. Specifically:

the server 900 includes a central processing unit (CPU) 901, a system memory 904 including a random access memory (RAM) 902 and a read-only memory (ROM) 903, and a system bus 905 connecting the system memory 904 and the central processing unit 901. The server 900 further includes a mass storage device 906 for storing an operating system 913, application programs 914 and other program modules 915.

The mass storage device 906 is connected to the central processing unit 901 through a mass storage controller (not shown) connected to the system bus 905. The mass storage device 906 and its associated computer-readable media provide non-volatile storage for the server 900. That is, the mass storage device 906 may include a computer-readable medium (not shown) such as a hard disk or a Compact Disc Read-Only Memory (CD-ROM) drive.

Without loss of generality, the computer-readable media may include computer storage media and communication media. Computer storage media include volatile and non-volatile, removable and non-removable media implemented in any method or technology for the storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media include RAM, ROM, Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other solid-state storage technologies, CD-ROM, Digital Versatile Disc (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices. Of course, those skilled in the art will know that the computer storage media are not limited to the above types. The system memory 904 and the mass storage device 906 described above may be collectively referred to as memory.

According to various embodiments of the present application, the server 900 may also be operated by connecting to a remote computer on a network through a network such as the Internet. That is, the server 900 may be connected to a network 912 through a network interface unit 911 connected to the system bus 905, or the network interface unit 911 may be used to connect to other types of networks or remote computer systems (not shown).

The memory further includes one or more programs, which are stored in the memory and configured to be executed by the CPU.

An embodiment of the present application further provides a computer device, which includes a processor and a memory. The memory stores at least one instruction, at least one program, a code set or an instruction set, which is loaded and executed by the processor to implement the method for generating an analysis result provided by the above method embodiments.

An embodiment of the present application further provides a computer-readable storage medium, which stores at least one instruction, at least one program, a code set or an instruction set, which is loaded and executed by a processor to implement the method for generating an analysis result provided by the above method embodiments.

An embodiment of the present application further provides a computer program product or computer program, which includes computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the method for generating an analysis result described in any one of the above embodiments.

Optionally, the computer-readable storage medium may include a read-only memory (ROM), a random access memory (RAM), a solid-state drive (SSD), an optical disc, or the like. The random access memory may include a resistive random access memory (ReRAM) and a dynamic random access memory (DRAM). The serial numbers of the above embodiments of the present application are only for description and do not represent the superiority or inferiority of the embodiments.

Those of ordinary skill in the art can understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing relevant hardware; the program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.

The above are only optional embodiments of the present application and are not intended to limit the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application shall fall within the protection scope of the present application.

Claims (14)

1. A method for generating an analysis result, the method comprising:
obtaining a heterogeneous graph structure of a target heterogeneous graph, the heterogeneous graph structure comprising node data and edge data;
determining initial node features corresponding to the node data and semantic aspects corresponding to the edge data;
taking the heterogeneous graph structure as a random variable and the semantic aspects as latent variables, embedding the initial node features to obtain heterogeneous graph features of the target heterogeneous graph, the heterogeneous graph features comprising updated semantic features and updated node features; and
analyzing the updated semantic features and the updated node features to generate an analysis result corresponding to the node data.
2. The method according to claim 1, wherein taking the heterogeneous graph structure as a random variable and the semantic aspects as latent variables and embedding the initial node features comprises:
inputting the heterogeneous graph structure into a probabilistic graph generation model; and
taking the heterogeneous graph structure as a random variable and the semantic aspects as latent variables, embedding the initial node features through the probabilistic graph generation model.
3. The method according to claim 2, wherein the probabilistic graph generation model comprises generative model parameters, and embedding the initial node features through the probabilistic graph generation model comprises:
determining a maximum likelihood estimation function of the model parameters with respect to the heterogeneous graph structure and the initial node features;
adjusting the model parameters through the maximum likelihood estimation function; and
embedding the initial node features through the probabilistic graph generation model with the adjusted model parameters.
4. The method according to claim 3, wherein adjusting the model parameters through the maximum likelihood estimation function comprises:
expressing the maximum likelihood estimation function through a variational distribution and a posterior distribution, the variational distribution corresponding to variational model parameters and the posterior distribution corresponding to the generative model parameters;
adjusting the variational model parameters according to a divergence requirement between the variational distribution and the posterior distribution; and
adjusting the generative model parameters according to the adjusted variational model parameters.
5. The method according to any one of claims 2 to 4, wherein after the updated semantic features and the updated node features are output, the method further comprises:
inputting the updated semantic features and the updated node features into a preset loss function, and outputting a loss value corresponding to the update result; and
adjusting the probabilistic graph generation model according to the loss value.
6. The method according to claim 5, wherein after the heterogeneous graph features of the target heterogeneous graph are output, the method further comprises:
taking the updated node features as the initial node features, and iteratively executing the step of embedding the initial node features through the probabilistic graph generation model.
7. The method according to any one of claims 1 to 4, wherein the edge data further correspond to edge weights, the semantic aspects correspond to a semantic quantity, and the semantic quantity indicates the number of semantic aspect types indicated by the edge data; and embedding the initial node features comprises:
decoupling the edge weights and the initial node features into K aspect channels, K being a positive integer whose value corresponds to the semantic quantity, each aspect channel corresponding to a set of node feature vectors and an aspect feature vector;
updating the edge weights according to the node feature vectors and the aspect feature vector of a k-th aspect channel; and
updating the initial node features according to the updated edge weights.
8. The method according to claim 7, wherein before updating the edge weights according to the node feature vectors and the aspect feature vector of the k-th aspect channel, the method comprises:
mapping the initial node features through a nonlinear mapping function to obtain the node feature vectors of the k-th aspect channel; and
performing global graph pooling in the k-th aspect channel to obtain the aspect feature vector of the k-th aspect channel.
9. The method according to claim 7, wherein updating the edge weights according to the node feature vectors and the aspect feature vector of the k-th aspect channel comprises:
obtaining a first node feature vector and a first aspect feature vector of an i-th node in the k-th aspect channel, and concatenating them into a first spliced vector;
obtaining a second node feature vector and a second aspect feature vector of a j-th node in the k-th aspect channel, and concatenating them into a second spliced vector;
obtaining a third node feature vector and a third aspect feature vector of an m-th node in the k-th aspect channel, and concatenating them into a third spliced vector, i, j and m all being positive integers and the m-th node being a neighbor node of the i-th node;
determining a first semantic similarity between the first spliced vector and the second spliced vector;
determining a second semantic similarity between the first spliced vector and the third spliced vector; and
updating the edge weights according to the first semantic similarity and the second semantic similarity.
10. The method according to any one of claims 1 to 4, wherein the node data comprise a first node and a second node, the first node corresponding to account data and the second node corresponding to candidate recommended content; and analyzing the updated semantic features and the updated node features to obtain an analysis result corresponding to the node data comprises:
performing association prediction on the first node and the second node according to the heterogeneous graph features to obtain an association prediction result as the analysis result, the association prediction result indicating a predicted degree of interest of the account data in the candidate recommended content.
11. The method according to claim 10, wherein performing association prediction on the first node and the second node according to the heterogeneous graph features to obtain the association prediction result as the analysis result comprises:
inputting the heterogeneous graph features into a recommendation model, the recommendation model being a pre-trained machine learning model; and
performing association prediction on the first node and the second node through the recommendation model to obtain the association prediction result.
12. An apparatus for generating an analysis result, the apparatus comprising:
an obtaining module, configured to obtain a heterogeneous graph structure of a target heterogeneous graph, the heterogeneous graph structure comprising node data and edge data, the node data corresponding to initial node features and the edge data corresponding to semantic aspects;
a determining module, configured to determine the initial node features corresponding to the node data and the semantic aspects corresponding to the edge data;
an embedding module, configured to take the heterogeneous graph structure as a random variable and the semantic aspects as latent variables, and embed the initial node features to obtain heterogeneous graph features of the target heterogeneous graph, the heterogeneous graph features comprising updated semantic features and updated node features; and
an analysis module, configured to analyze the updated semantic features and the updated node features and generate an analysis result corresponding to the node data.
13. A computer device, comprising a processor and a memory, wherein the memory stores at least one instruction, at least one program, a code set or an instruction set, and the at least one instruction, the at least one program, the code set or the instruction set is loaded and executed by the processor to implement the method for generating an analysis result according to any one of claims 1 to 11.
14. A computer-readable storage medium, wherein the storage medium stores at least one instruction, at least one program, a code set or an instruction set, and the at least one instruction, the at least one program, the code set or the instruction set is loaded and executed by a processor to implement the method for generating an analysis result according to any one of claims 1 to 11.
CN202010839225.1A 2020-08-19 2020-08-19 Analysis result generation method, device, equipment and readable storage medium Pending CN111967271A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010839225.1A CN111967271A (en) 2020-08-19 2020-08-19 Analysis result generation method, device, equipment and readable storage medium


Publications (1)

Publication Number Publication Date
CN111967271A true CN111967271A (en) 2020-11-20

Family

ID=73389387

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010839225.1A Pending CN111967271A (en) 2020-08-19 2020-08-19 Analysis result generation method, device, equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN111967271A (en)


Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021139738A1 (en) * 2020-01-07 2021-07-15 北京嘀嘀无限科技发展有限公司 Target task execution vehicle determination method, and system
CN113033194A (en) * 2021-03-09 2021-06-25 北京百度网讯科技有限公司 Training method, device, equipment and storage medium of semantic representation graph model
CN113033194B (en) * 2021-03-09 2023-10-24 北京百度网讯科技有限公司 Training method, device, equipment and storage medium for semantic representation graph model
CN113094707A (en) * 2021-03-31 2021-07-09 中国科学院信息工程研究所 Transverse mobile attack detection method and system based on heterogeneous graph network
CN113094707B (en) * 2021-03-31 2024-05-14 中国科学院信息工程研究所 Lateral movement attack detection method and system based on heterogeneous graph network
CN115828931A (en) * 2023-02-09 2023-03-21 中南大学 Chinese and English semantic similarity calculation method for paragraph-level text
WO2024169263A1 (en) * 2023-02-13 2024-08-22 腾讯科技(深圳)有限公司 Search data processing method and apparatus, computer device and storage medium
CN117496161A (en) * 2023-12-29 2024-02-02 武汉理工大学 Point cloud segmentation method and device
CN117496161B (en) * 2023-12-29 2024-04-05 武汉理工大学 Point cloud segmentation method and device


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination