WO2020147594A1 - Method, system, and device for obtaining expression of relationship between entities, and advertisement retrieval system - Google Patents


Info

Publication number
WO2020147594A1
WO2020147594A1 · PCT/CN2020/070249 · CN2020070249W
Authority
WO
WIPO (PCT)
Prior art keywords
heterogeneous
node
nodes
graph
sample data
Prior art date
Application number
PCT/CN2020/070249
Other languages
French (fr)
Chinese (zh)
Inventor
Chen Yiran (陈怡然)
Wen Shiyang (温世阳)
Wu Wenjin (吴文金)
Lin Wei (林伟)
Zhu Xiaoyu (朱晓宇)
Original Assignee
Alibaba Group Holding Limited (阿里巴巴集团控股有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Limited
Publication of WO2020147594A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G06F16/20 Information retrieval of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0623 Item investigation
    • G06Q30/0625 Directed, with specific intent or strategy

Definitions

  • The present invention relates to the technical field of data mining, and in particular to a method, system, and device for obtaining expressions of relationships between entities, and to an advertisement recall system.
  • the inventor of the present invention found:
  • a graph is composed of nodes and edges.
  • a node is used to represent an entity, and the edge between nodes is used to represent the relationship between nodes.
  • A graph generally includes more than two nodes and more than one edge, so it can also be understood as a collection of nodes together with a collection of edges, usually written G(V, E), where G is the graph, V is the set of nodes in G, and E is the set of edges in G.
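The G(V, E) formulation above can be sketched directly in code. This is a minimal illustration, not the patent's data structure; all names are invented for the example.

```python
# A graph G(V, E): a node set V and an edge set E of weighted pairs.
V = {"q1", "item1", "ad1"}                      # nodes: entities
E = {("q1", "item1", 3.0), ("q1", "ad1", 1.0)}  # edges: (u, v, weight)

def neighbors(node, edges):
    """Return the neighbors of `node` in an undirected weighted edge set."""
    out = set()
    for u, v, _w in edges:
        if u == node:
            out.add(v)
        elif v == node:
            out.add(u)
    return out

print(neighbors("q1", E))  # {'item1', 'ad1'}
```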
  • Graphs can be divided into homogeneous graphs and heterogeneous graphs.
  • A heterogeneous graph is a graph with different types of nodes (the types of edges can be the same or different) or different types of edges (the types of nodes can be the same or different). Therefore, when many kinds of entities need to be expressed with multiple types of nodes, or the relationships between entities need to be expressed with multiple types of edges, it is preferable to express these entities and their relationships through a heterogeneous graph.
  • When the number of nodes and edges in a heterogeneous graph is very large, the graph becomes extremely complex and its data volume very large. Reducing the complexity and data volume of heterogeneous graph processing is therefore a technical problem faced by those skilled in the art.
  • The present invention is proposed to provide a method, system, and device for obtaining expressions of relationships between entities, and an advertisement recall system, that overcome or at least partially solve the above problems.
  • the embodiment of the present invention provides an advertisement recall system, including a system for obtaining relationship expressions between entities and an advertisement recall matching system;
  • The system for obtaining expressions of relationships between entities is used to construct a heterogeneous graph for advertisement search scenarios; the node types in the heterogeneous graph include at least one of advertisements, commodities, and query words, and the edge types include at least one of click edges, co-click edges, collaborative filtering edges, content-semantically-similar edges, and attribute-similar edges;
  • the preset graph convolution model learns a batch of sample data according to heterogeneous subgraphs to obtain the vector expression of nodes in heterogeneous subgraphs.
  • a graph convolution model corresponds to a heterogeneous subgraph;
  • the preset aggregation model is based on sample data and aggregates the vector expressions of the same node in different heterogeneous subgraphs to obtain the same vector expression of the same node in different heterogeneous subgraphs;
  • the preset loss function optimizes the parameters of the model based on the sample data and the same vector expression of the same node;
  • a node in the heterogeneous graph corresponds to an entity in the sample data;
  • The advertisement recall matching system is configured to use the low-dimensional vector expressions of query-word nodes, commodity nodes, and search-advertisement nodes obtained by the system for obtaining inter-entity relationship expressions to determine the matching degree among query-word nodes, commodity nodes, and search-advertisement nodes, and to select, according to the matching degree, search advertisements that match the commodity and the query words to a set requirement.
  • a meta-path corresponds to a heterogeneous subgraph
  • That the meta-path is used to express the structure of the heterogeneous subgraph and the node types and edge types included in the heterogeneous subgraph means, specifically, that one meta-path expresses the structure of one heterogeneous subgraph and the node types and edge types included in that subgraph;
  • the splitting the heterogeneous graph into at least two heterogeneous subgraphs according to the preset meta-path specifically includes:
  • The system for obtaining expressions of relationships between entities uses a preset graph convolution model to learn the sample data according to the heterogeneous subgraphs to obtain vector expressions of the nodes in the heterogeneous subgraphs, specifically including:
  • the preset graph convolution model obtains the vector expressions of the nodes in the heterogeneous subgraph according to the attribute information of each node in the heterogeneous subgraph and the structure information and attribute information of at least the first-order neighbor nodes of each node.
  • That the system for obtaining expressions of relationships between entities aggregates, through a preset aggregation model and based on the sample data, the vector expressions of the same node in different heterogeneous subgraphs to obtain the same vector expression of the same node in different heterogeneous subgraphs includes:
  • the preset aggregation model, based on the sample data, uses attention-mechanism aggregation learning, fully connected aggregation learning, or weighted-average aggregation learning to aggregate the vector expressions of the same node in different heterogeneous subgraphs to obtain the same vector expression of the same node in the different heterogeneous subgraphs.
  • the advertisement recall matching system determining the degree of matching among query term nodes, commodity nodes and search advertisement nodes includes:
  • the virtual request node is a virtual node constructed from a query-word node and the commodity nodes previously clicked by users under that same query word;
  • the matching degree between the query term node, the product node and the search advertisement node is determined.
  • That the advertisement recall matching system selects, according to the matching degree, search advertisements that match the commodity and the query word includes: selecting search advertisements whose distance meets the set requirement.
  • the embodiment of the present invention also provides a method for obtaining expressions of relationships between entities, including:
  • the preset graph convolution model learns a batch of sample data according to heterogeneous subgraphs to obtain the vector expression of nodes in heterogeneous subgraphs.
  • a graph convolution model corresponds to a heterogeneous subgraph;
  • the preset aggregation model is based on sample data and aggregates the vector expressions of the same node in different heterogeneous subgraphs to obtain the same vector expression of the same node in different heterogeneous subgraphs;
  • the preset loss function optimizes the parameters of the model based on the same vector expression of the sample data and the same node;
  • a node in the heterogeneous graph corresponds to an entity in the sample data.
  • a meta-path corresponds to a heterogeneous subgraph
  • That the meta-path is used to express the structure of the heterogeneous subgraph and the node types and edge types included in the heterogeneous subgraph means, specifically, that one meta-path expresses the structure of one heterogeneous subgraph and the node types and edge types included in that subgraph;
  • the splitting the heterogeneous graph into at least two heterogeneous subgraphs according to the preset meta-path specifically includes:
  • That one meta-path is used to express the structure of a heterogeneous subgraph and the node types and edge types included in the heterogeneous subgraph is specifically: a meta-path includes node types and edge types alternately arranged in order, with a node type first and last; the order of the node types and edge types expresses the structure of the heterogeneous subgraph;
  • the splitting a heterogeneous graph into at least two heterogeneous subgraphs according to at least two preset meta-paths specifically includes:
  • the preset graph convolution model learns the sample data according to the heterogeneous subgraph to obtain the vector expression of the nodes in the heterogeneous subgraph, which specifically includes:
  • the preset graph convolution model learns the sample data according to the attribute information of each node in the heterogeneous subgraph and the structure information and attribute information of at least the first-order neighbor nodes of each node, to obtain the vector expressions of the nodes in the heterogeneous subgraph.
  • That the preset graph convolution model learns the sample data according to the attribute information of each node in the heterogeneous subgraph and the structure information and attribute information of at least the first-order neighbor nodes of each node, to obtain the vector expression of each node in the heterogeneous subgraph, specifically includes:
  • the preset graph convolution model performs an N-layer convolution operation according to the attribute information of the node and the attribute information and structure information of the first to Nth-order neighbor nodes to obtain the vector expression of the node.
  • That the preset graph convolution model learns the sample data according to the attribute information of each node in the heterogeneous subgraph and the structure information and attribute information of at least the first-order neighbor nodes of each node, to obtain the vector expression of each node in the heterogeneous subgraph, specifically includes:
  • the preset graph convolution model performs an N-layer convolution operation according to the attribute information of the node and the attribute information and structure information of the first to Nth-order neighbor nodes after sampling to obtain the vector expression of the node.
  • That the preset aggregation model aggregates, based on the sample data, the vector expressions of the same node in different heterogeneous subgraphs to obtain the same vector expression of the same node in different heterogeneous subgraphs is specifically:
  • the preset aggregation model, based on the sample data, uses an attention mechanism, a fully connected aggregation mechanism, or a weighted-average aggregation mechanism to aggregate the vector expressions of the same node in different heterogeneous subgraphs to obtain the same vector expression of the same node.
  • the embodiment of the present invention also provides a system for obtaining expressions of relationships between entities, including: a registration device, a storage device, a calculation device, and a parameter exchange device;
  • the storage device is used to store the data of the heterogeneous subgraphs;
  • the computing device is used to obtain the data of the heterogeneous subgraphs from the storage device through the registration device, and to learn the sample data based on the heterogeneous graph using the above method of obtaining expressions of relationships between entities, to obtain the low-dimensional vector expression of each node in the heterogeneous graph;
  • the parameter exchange device is used for parameter interaction with the computing device.
  • The graph convolution models are used to learn the sample data; the vector expressions of the same node obtained from the learning of each heterogeneous subgraph are fused, and the fusion result optimizes the parameters of the machine learning model, which are then used to learn the next batch of samples. This realizes iterative learning of the samples and finally yields low-dimensional vector expressions for the nodes in the heterogeneous graph, thereby reducing the complexity and data volume of the heterogeneous graph learning process and improving the speed and efficiency of heterogeneous graph learning.
  • When this heterogeneous graph learning method is used in the advertisement search scenario, it mines the entity relationships in that scenario so that a large amount of information can be used to accurately recall advertisements and improve the quality of advertisement recall. Using all advertisements as candidates ensures that enough advertisements can be recalled under any traffic, and the vector-based method achieves this in one step.
  • FIG. 1 is a flowchart of a method for obtaining expressions of relationships between entities in Embodiment 1 of the present invention
  • FIG. 2 is a schematic diagram of the principle of a method for obtaining expressions of relationships between entities in Embodiment 2 of the present invention
  • FIG. 3 is a flowchart of a method for obtaining expressions of relationships between entities in Embodiment 2 of the present invention;
  • Figure 4a is an exemplary diagram of a heterogeneous graph constructed in Embodiment 2 of the present invention.
  • Figure 4b is another example diagram of a heterogeneous graph constructed in Embodiment 2 of the present invention.
  • FIG. 5 is an exemplary diagram of splitting a heterogeneous graph into heterogeneous subgraphs in Embodiment 2 of the present invention.
  • FIG. 6 is an example diagram of a convolutional network model of heterogeneous subgraphs in the second embodiment of the present invention.
  • Fig. 7 is an example diagram of neighbor node sampling in the second embodiment of the present invention.
  • FIG. 8 is a schematic structural diagram of a system for obtaining an expression of a relationship between entities in an embodiment of the present invention.
  • Fig. 9 is a schematic structural diagram of an advertisement recall system in an embodiment of the present invention.
  • Graph learning has a wide range of applications in mining various data relationships in the real world. For example, it is used in search advertising platforms to mine the correlation between search requests and advertisements and click-through-rate (CTR). That is to say, the method of the present invention can be used in the field of advertisement search for the recall of search advertisements.
  • Search advertising refers to advertisements for which advertisers determine relevant keywords based on the content and characteristics of their products or services, write the advertising content, and independently set bids, the advertisements then being displayed in the search results corresponding to those keywords.
  • Search ads recall refers to the selection of the most relevant ads from a large collection of ads through a certain algorithm or model.
  • Existing search-ad recall technologies may screen "high-quality" advertisements based on the degree of matching between query words and advertiser bid words, the advertiser's purchase price, and users' statistical preferences for advertisements; or they may add each user's historical behavior data to perform personalized matching recall of ads.
  • The inventor found in researching the prior art that existing recall technologies either emphasize only the matching degree between the advertisement and the query word, or emphasize only improving recall advertisement revenue, and lack an integrated model that balances the two. Since the quality of advertisement recall is very important to search advertising revenue and user experience, the inventor provides a graph learning technology that can be used to obtain expressions of relationships between entities in the advertisement recall process and can obtain an ad recall set that is of higher quality and of greater interest to users.
  • the first embodiment of the present invention provides a method for obtaining expressions of relationships between entities.
  • the process is shown in FIG. 1, and includes the following steps:
  • Step S101 Divide the pre-built heterogeneous graph into at least two heterogeneous subgraphs according to the pre-defined meta-path.
  • The meta-path is used to express the structure of the heterogeneous subgraph and the types of nodes and edges included in the heterogeneous subgraph.
  • a meta path corresponds to a heterogeneous subgraph.
  • That the meta-path is used to express the structure of the heterogeneous subgraph and the node types and edge types included in the heterogeneous subgraph means, specifically: one meta-path is used to express the structure of one heterogeneous subgraph and the node types and edge types included in that subgraph.
  • A meta-path includes node types and edge types alternately arranged in order, with a node type first and last; the order of the node types and edge types expresses the structure of the heterogeneous subgraph.
  • Splitting the heterogeneous graph into at least two heterogeneous subgraphs according to a preset meta-path specifically includes splitting the heterogeneous graph into at least two heterogeneous subgraphs according to at least two preset meta-paths.
  • Nodes of the corresponding types are obtained from the heterogeneous graph according to the node types included in the meta-path; edges that meet the requirements are obtained from the heterogeneous graph according to the types of the edges connecting adjacent nodes; the obtained nodes and edges form the heterogeneous subgraph corresponding to the meta-path.
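The splitting step above can be sketched as follows. This is an illustrative sketch only: the node-type and edge-type names are invented for the example, and the patent does not specify this data layout.

```python
from collections import namedtuple

Edge = namedtuple("Edge", "u v etype weight")

# Toy heterogeneous graph: a node -> type map plus a typed edge list.
node_type = {"q1": "Query", "i1": "Item", "a1": "Ad", "a2": "Ad"}
edges = [
    Edge("q1", "i1", "click", 5.0),
    Edge("i1", "a1", "co_click", 2.0),
    Edge("a1", "a2", "attr_sim", 1.0),
]

def split_by_metapath(node_type, edges, allowed_node_types, allowed_edge_types):
    """Keep only nodes whose type appears in the meta-path and edges of the
    meta-path's edge types whose endpoints both survive; the result is one
    heterogeneous subgraph corresponding to that meta-path."""
    keep_nodes = {n for n, t in node_type.items() if t in allowed_node_types}
    keep_edges = [e for e in edges
                  if e.etype in allowed_edge_types
                  and e.u in keep_nodes and e.v in keep_nodes]
    return keep_nodes, keep_edges

# E.g. a meta-path Item/Ad - co_click - Item/Ad - attr_sim - Item/Ad:
nodes_a, edges_a = split_by_metapath(
    node_type, edges,
    allowed_node_types={"Item", "Ad"},
    allowed_edge_types={"co_click", "attr_sim"},
)
```

One subgraph is produced per meta-path; splitting with several meta-paths yields the at least two heterogeneous subgraphs the method requires.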
  • Step S102 Acquire sample data of a batch.
  • The sample data can be divided into multiple batches and learned batch by batch based on the heterogeneous subgraphs.
  • Step S103 The preset graph convolution model learns a batch of sample data (a sample data set) according to the heterogeneous subgraphs to obtain the vector expressions of the nodes in the heterogeneous subgraphs; one graph convolution model corresponds to one heterogeneous subgraph.
  • The preset graph convolution model learns the sample data according to the attribute information of each node in the heterogeneous subgraph and the structure information and attribute information of at least the first-order neighbor nodes of each node, to obtain the vector expressions of the nodes in the heterogeneous subgraph.
  • One way is to learn the sample data based on all nodes in the heterogeneous subgraph, including:
  • the preset graph convolution model performs an N-layer convolution operation according to the attribute information of the node and the attribute information and structure information of the first to Nth-order neighbor nodes to obtain the vector expression of the node.
  • for the first- to Nth-order neighbor nodes of a node, neighbor nodes of the same order are sampled down to a preset number according to the weights of the edges between nodes, to obtain the sampled first- to Nth-order neighbor nodes;
  • the preset graph convolution model performs an N-layer convolution operation according to the attribute information of the node and the attribute information and structure information of the first to Nth-order neighbor nodes after sampling to obtain the vector expression of the node.
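The weighted neighbor sampling and N-layer convolution described above can be sketched as below. This is a simplified mean-aggregation layer with invented dimensions and a hand-written adjacency list, not the patent's exact model.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_neighbors(nbrs, weights, k, rng):
    """Sample up to k neighbors with probability proportional to edge weight
    (a sketch of the weighted same-order neighbor sampling)."""
    if len(nbrs) <= k:
        return list(nbrs)
    p = np.asarray(weights, dtype=float)
    p = p / p.sum()
    return list(rng.choice(nbrs, size=k, replace=False, p=p))

def gcn_layer(h, adj, W):
    """One convolution layer: each node averages its own and its (sampled)
    neighbors' vectors, applies a learned projection W, then a ReLU."""
    out = np.zeros((h.shape[0], W.shape[1]))
    for v in range(h.shape[0]):
        group = [v] + adj[v]
        out[v] = np.maximum(np.mean(h[group], axis=0) @ W, 0.0)
    return out

# Toy run: 4 nodes, feature dim 3 -> 2; stacking N such layers mixes
# information from up to Nth-order neighbors.
h0 = rng.normal(size=(4, 3))
adj = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
W1 = rng.normal(size=(3, 2))
h1 = gcn_layer(h0, adj, W1)
```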
  • Step S104 The preset aggregation model aggregates the vector expressions of the same node in different heterogeneous subgraphs based on the sample data to obtain the same vector expression of the same node in different heterogeneous subgraphs.
  • the preset aggregation model is based on sample data, using attention mechanism aggregation learning, fully connected aggregation learning, or weighted average aggregation learning to aggregate the vector expressions of the same node in different heterogeneous subgraphs to obtain the same node in different heterogeneous subgraphs The same vector expression of.
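Of the three aggregation options named above, the attention-mechanism variant can be sketched as follows. This is an illustrative softmax-attention fusion with an invented query vector q, not the patent's exact formulation.

```python
import numpy as np

def attention_aggregate(views, q):
    """Fuse one node's vectors from different heterogeneous subgraphs
    (rows of `views`) into a single vector via softmax attention weights
    scored against a learned query vector q."""
    scores = views @ q                       # one score per subgraph view
    scores = scores - scores.max()           # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()
    return alpha @ views                     # weighted sum -> unified vector

views = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # 3 subgraph views
q = np.array([0.5, 0.5])
unified = attention_aggregate(views, q)
```

A weighted-average aggregation would simply fix `alpha`, and a fully connected aggregation would replace the weighted sum with a learned dense layer over the concatenated views.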
  • Step S105 The preset loss function optimizes the parameters of the model based on the sample data and the same vector expression of the same node.
  • The vector expressions of at least two types of nodes are fused to obtain the low-dimensional vector expression of a virtual request node; the virtual request node is a virtual node constructed from at least two types of nodes through a certain association relationship. According to the low-dimensional vector expression of the virtual request node and the low-dimensional vector expression of another type of node, the association parameters among the at least three types of nodes are determined, and the model parameters are optimized according to the association parameters.
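The virtual-request-node fusion and matching can be sketched as below. The fusion rule (simple averaging) and the cosine matching score are illustrative assumptions; the patent does not fix these exact formulas.

```python
import numpy as np

def virtual_request_node(query_vec, clicked_item_vecs):
    """Fuse a query node's vector with the vectors of items clicked under
    that query into one virtual request node (here: simple averaging,
    an illustrative choice)."""
    stacked = np.vstack([query_vec] + list(clicked_item_vecs))
    return stacked.mean(axis=0)

def match_score(request_vec, ad_vec):
    """Cosine similarity as one possible matching degree between the
    virtual request node and a search-advertisement node."""
    return float(request_vec @ ad_vec /
                 (np.linalg.norm(request_vec) * np.linalg.norm(ad_vec)))

req = virtual_request_node(np.array([1.0, 0.0]), [np.array([0.0, 1.0])])
score = match_score(req, np.array([1.0, 1.0]))
```

Ads whose score (or, equivalently, vector distance) meets the set requirement would then be selected as the recall set.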
  • Step S106 Determine whether the sample data of all batches have been acquired; if not, go to step S107; if so, go to step S108.
  • Step S107 Obtain the sample data of the next batch, and return to step S103.
  • Step S108 Obtain a low-dimensional vector expression of each node in the heterogeneous graph.
  • a node in the heterogeneous graph corresponds to an entity in the sample data.
  • A low-dimensional vector expression of each node in the heterogeneous graph can be obtained.
  • The low-dimensional vector expression of each node in the heterogeneous graph is the same vector expression of the same node in different heterogeneous subgraphs obtained by the aggregation model after the last batch of samples is learned.
  • The matching degree is the association parameter between nodes most recently obtained by the loss function.
  • The machine learning model is used to learn the sample data; the vector expressions of the same node obtained from the learning of each heterogeneous subgraph are fused, and according to the fusion result the parameters of the machine learning model are optimized and used to learn the next batch of samples, realizing iterative learning of the samples and finally obtaining low-dimensional vector expressions for the nodes in the heterogeneous graph.
  • the second embodiment of the present invention provides a specific implementation process of a method for obtaining expressions of relationships between entities.
  • the process of implementing advertisement recall in a search advertisement scenario is taken as an example for description.
  • The implementation principle of the method is shown in FIG. 2 and the flow is shown in FIG. 3, including the following steps:
  • Step S301 Construct a heterogeneous graph.
  • A large-scale heterogeneous graph is constructed for the search recall scenario based on user logs and related commodity and advertisement data. It serves as a rich search interaction graph for the advertisement search scenario, and the constructed heterogeneous graph is used as the graph data input for the subsequent steps, such as the graph data of the heterogeneous graph at the bottom of FIG. 2.
  • the heterogeneous graph includes multiple types of nodes such as Query, Item, and Ad to represent different entities in the search scenario.
  • The heterogeneous graph includes multiple types of edges to represent multiple relationships between entities. The node types and their meanings can be as shown in Table 1 below, and the edge types and their meanings as shown in Table 2 below.
  • the Query node and the Item node are used as user intention nodes to describe the user's personalized search intention
  • the Ad node is the advertisement placed by the advertiser.
  • A user behavior edge represents the user's historical behavior preference. For example, a "click edge" can be created between a Query node and an Item node, or between a Query node and an Ad node, using the number of clicks as the edge weight to indicate clicks between the Query and the Item/Ad. A co-click edge (session edge) can be created to connect a Query with Items or Ads clicked in the same session (time period). A collaborative filtering edge (cf edge) can also be created to represent the collaborative filtering relationship between different nodes.
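Building click edges from logs, with click counts as edge weights, can be sketched as below. The log rows and field layout are hypothetical; the patent only states that click counts become edge weights.

```python
from collections import Counter

# Hypothetical user-log rows: (query, clicked node).
log = [
    ("red shoes", "item:123"),
    ("red shoes", "item:123"),
    ("red shoes", "ad:9"),
    ("phone case", "ad:9"),
]

# Click edge: (Query node, Item/Ad node) with click count as edge weight.
click_edges = Counter(log)
```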
  • User behavior describes a dynamic relationship. Popular nodes (such as high-frequency Query nodes) have more impressions and clicks, and thus denser behavior edges with larger weights; unpopular nodes and new nodes have relatively sparse relationships and smaller edge weights. User behavior edges therefore describe popular nodes better.
  • The content-similarity edge (semantic edge) is used for similarity between the content of nodes. For example, an edge is established between Item nodes with the text similarity of their titles as the weight.
  • the content-similar edges reflect a static relationship between nodes, which is more stable, and can also well describe the relationship between unpopular nodes and new nodes.
  • The attribute-similarity edge represents the overlap of fields between nodes, such as brand and category.
  • Figure 4b is a representation of the constructed heterogeneous graph, where nodes with the same shape represent nodes of the same type, and edges with the same linear shape represent edges of the same type.
  • Step S302 Divide the constructed heterogeneous graph into at least two heterogeneous subgraphs according to the preset meta-path.
  • the meta-path is used to express the structure of the heterogeneous subgraph and the node types and edge types included in the heterogeneous subgraph.
  • The graph data to be learned in this application is essentially a heterogeneous graph; there may be multiple types of nodes and multiple types of edges.
  • Current graph convolutional neural networks (GCN) are only suitable for homogeneous graphs.
  • Learning a heterogeneous graph with a graph convolutional neural network as if it were homogeneous cannot obtain effective low-dimensional vector expressions. Therefore, to realize learning on heterogeneous graphs, some meaningful meta-paths are defined to divide the original large heterogeneous graph into multiple meaningful heterogeneous subgraphs for learning.
  • the defined meta path can be shown in Table 3 below.
  • the constructed heterogeneous graph is split.
  • the heterogeneous graph shown in Figure 4b is split.
  • (FIG. 5 shows meta-paths a through f and the corresponding heterogeneous subgraphs a through f obtained from them.)
  • Meta-path a includes: node Item/Ad – co-click edge – node Item/Ad – attribute-similar edge – node Item/Ad.
  • Subgraph a is constructed according to meta-path a: the nodes of the corresponding node types in meta-path a (Item and Ad) are obtained from the constructed heterogeneous graph, and the edges that meet the requirements are kept, yielding subgraph a.
  • the construction of heterogeneous subgraphs corresponding to other meta-paths is similar to meta-path a, and will not be repeated here.
  • At the bottom of FIG. 2 is the constructed heterogeneous graph; based on it, the initial vector expression of each node is formed from the node's features.
  • For each specified node, a meta-path containing the specified node is defined, and a heterogeneous subgraph is constructed based on the defined meta-path.
  • Two meta-paths are defined for the search-advertisement node (Ad), which is accordingly split into two heterogeneous subgraphs; four meta-paths are defined for the query-word node (Query), yielding four heterogeneous subgraphs; for the k commodity (Item) nodes 1, 2, ..., k, each commodity node defines two meta-paths, yielding two heterogeneous subgraphs each.
  • Step S303: Obtain a batch of sample data.
  • Sample data related to advertisement search is extracted from user log data, which can come from user historical behavior logs, the commodity basic attribute information table, the advertisement basic attribute information table, the query term basic attribute information table, and so on.
  • The sample data of each batch is input into the machine learning model in turn for training and learning.
  • The learning result of the previous batch optimizes the parameters of the model, and the optimized parameters are used when learning the sample data of the next batch, achieving the effect of iterative learning to obtain the final learning result.
  • Step S304: Preset graph convolution models learn the batch of sample data according to the heterogeneous subgraphs to obtain the vector expressions of the nodes in the heterogeneous subgraphs; one graph convolution model corresponds to one heterogeneous subgraph.
  • That is, each heterogeneous subgraph corresponds to one graph convolutional network model.
  • For example, the two graph convolutional network models in the leftmost group in Figure 2 correspond to the two heterogeneous subgraphs split from the two meta-paths defined for the search advertisement (Ad) node; the four graph convolutional network models in the second group from the left correspond to the four heterogeneous subgraphs split from the four meta-paths defined for the query term (Query) node; and in groups 1, ..., k of graph convolutional network models on the right, the two models in each group correspond to the two heterogeneous subgraphs of the two meta-paths defined for one commodity (Item) node.
  • The sample data is used as input and mapped to the corresponding nodes in the heterogeneous subgraphs for learning.
  • The graph convolutional network models shown in Figure 2 can share weights.
  • Taking one heterogeneous subgraph as an example: traverse the sample data; for the currently traversed piece of sample data, read the recorded entity and find the entity's corresponding node in the heterogeneous graph; from the heterogeneous subgraph that includes the node, read the first- to Nth-order neighbor nodes of the node, where N is a preset positive integer; the preset graph convolution model performs an N-layer convolution operation based on the attribute information of the node and the attribute and structure information of the first- to Nth-order neighbor nodes to obtain the vector expression of the node.
  • The N-layer convolution operation is specifically: for a node in the heterogeneous subgraph, obtain its neighbor nodes up to order N, then perform the convolution operation layer by layer. For each (N-1)th-order neighbor node, convolve the vector expressions of the Nth-order neighbor nodes connected to it to obtain its neighbor low-dimensional vector expression, then combine this neighbor expression with the original low-dimensional vector expression of the (N-1)th-order neighbor node to obtain its new low-dimensional vector expression. And so on down the layers: the vector expressions of the second-order neighbor nodes connected to a first-order neighbor node are convolved to obtain the neighbor low-dimensional vector expression of that first-order neighbor node, which is combined with its original low-dimensional vector expression to obtain its new low-dimensional vector expression; finally, the low-dimensional vector expressions of the node's first-order neighbor nodes are convolved to obtain the node's neighbor low-dimensional vector expression, which is combined with the node's original low-dimensional vector expression to obtain the node's new low-dimensional vector expression.
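The layer-by-layer convolve-then-combine procedure above can be sketched as a recursive neighborhood aggregation; the mean aggregation and the tanh combination are illustrative assumptions, not the patent's stated operators:

```python
import numpy as np

def convolve_node(node, features, neighbors, depth):
    """Compute a node's low-dimensional expression by aggregating its
    neighbors' expressions down `depth` convolution layers.
    features: node -> initial vector; neighbors: node -> list of nodes."""
    if depth == 0 or not neighbors.get(node):
        # Leaf of the recursion, or an isolated node: keep the original vector.
        return features[node]
    # New expressions of the immediate neighbors, computed one layer deeper.
    nbr_vecs = [convolve_node(n, features, neighbors, depth - 1)
                for n in neighbors[node]]
    nbr_agg = np.mean(nbr_vecs, axis=0)       # "convolve" neighbor expressions
    return np.tanh(features[node] + nbr_agg)  # combine with the node's own vector

features = {1: np.ones(4), 2: np.full(4, 0.5), 3: np.full(4, -0.5)}
neighbors = {1: [2, 3], 2: [1], 3: []}
vec = convolve_node(1, features, neighbors, depth=2)  # 2-layer convolution
```

With depth=2, node 1's expression depends on its first-order neighbors 2 and 3, whose own expressions were first refreshed from their neighbors, mirroring the N-layer scheme in the text.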
  • The principle of learning sample data based on a heterogeneous subgraph is shown in Fig. 6; a graph convolutional network can be constructed as shown there.
  • For example, the first-order neighbor nodes of node 1 in subgraph a are 2, 3, 4, and 6, and its second-order neighbor nodes are 1, 2, 3, 4, and 10.
  • The second-order neighbor nodes 1, 2, 3, 4, and 10 pass through the graph convolution layer to obtain the neighbor low-dimensional vector expressions of the first-order neighbor nodes 2, 3, 4, and 6; these are spliced and non-linearly transformed to obtain the final low-dimensional vector expressions of nodes 2, 3, 4, and 6, which are then used as input to the next graph convolution layer, spliced with the original low-dimensional vector expression of node 1, and transformed to obtain the final low-dimensional vector expression of node 1's second-order graph convolutional network.
  • The final low-dimensional vector expressions of other nodes are obtained in a manner similar to that of node 1 and will not be repeated here.
  • An isolated node such as node 8, which has no neighbor nodes, retains its original vector expression. In a similar way, the final low-dimensional vector expression of each node in each heterogeneous subgraph can be obtained.
  • Although the meta-path-based graph convolution advertisement recall scheme can effectively handle the advertisement recall scenario using the graph convolution method, there remains the problem of computation cost: the number of neighbor nodes of a node grows exponentially with the number of graph convolution layers.
  • For example, node 1 has 3 first-order neighbors and 9 second-order neighbors.
  • To reduce the amount of calculation, the hierarchical neighbors can be sampled in a beam-search manner, reducing the neighbor space complexity from O(n^k) to O(kn).
  • When learning the sample data based on a heterogeneous subgraph that contains many nodes, the neighbor nodes can be sampled, and the convolution calculation is performed on the sampled neighbor nodes.
  • Taking one heterogeneous subgraph as an example: traverse the sample data; for the currently traversed piece of sample data, read the recorded entity and find the entity's corresponding node in the heterogeneous graph; from the heterogeneous subgraph that includes the node, read the first- to Nth-order neighbor nodes of the node, where N is a preset positive integer; sample the same-order neighbor nodes according to the weights of the edges between nodes and a preset number, obtaining the sampled first- to Nth-order neighbor nodes; the preset graph convolution model then performs the N-layer convolution operation based on the attribute information of the node and the attribute and structure information of the sampled first- to Nth-order neighbor nodes.
  • Specifically, the sum of a neighbor node's edge weights is used as its weight, and weighted sampling is performed over the neighbors.
  • The principle of sampling based on edge weights is shown in FIG. 7.
  • The original convolution structure of node 1 is shown on the left of FIG. 7, and the weight of each edge is shown as the number labeling that edge in the figure.
  • At each layer, weighted sampling can be performed based on the node weights w to obtain k sampled nodes, where the weight can be expressed as:

        w_i^l = Σ_{j ∈ I_v} w_{ij}

    where w_i^l represents the weight of node v_i at the current layer l, j ∈ I_v indexes the upper-layer nodes that share an edge with node v_i, l denotes the layer, and i and j are node sequence numbers.
  • Layer-by-layer node sampling can reduce the growth of the number of neighbor nodes from exponential to linear while still taking into account all the connection relationships of upper-layer neighbor nodes.
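The weighted layer-wise sampling can be sketched as follows, taking a node's weight as the sum of its edge weights as stated above; sampling with replacement and the fixed seed are implementation simplifications, not part of the patent's scheme:

```python
import random

def weighted_sample(neighbors, edge_weights, k, seed=0):
    """Sample up to k neighbors, weighting each neighbor by the sum of
    its edge weights (an illustrative reading of the sampling scheme).
    Sampling is with replacement for simplicity."""
    rng = random.Random(seed)
    if len(neighbors) <= k:
        return list(neighbors)  # few enough neighbors: keep them all
    weights = [sum(edge_weights[n]) for n in neighbors]  # node weight = Σ edge weights
    return rng.choices(neighbors, weights=weights, k=k)

# Each neighbor's list of incident edge weights (illustrative data).
edge_weights = {"a": [1.0, 2.0], "b": [0.1], "c": [5.0]}
sampled = weighted_sample(["a", "b", "c"], edge_weights, k=2)
```

Capping each layer at k sampled neighbors is what turns the O(n^k) neighbor explosion into the O(kn) growth mentioned above.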
  • Step S305: The preset aggregation model aggregates the vector expressions of the same node in different heterogeneous subgraphs based on the sample data, to obtain a single vector expression for that node across the different heterogeneous subgraphs.
  • The same node may exist in different heterogeneous subgraphs; for example, node 1 exists in subgraphs a, b, c, e, and f, and the convolutional neural networks of the different heterogeneous subgraphs will produce different vector expressions for it.
  • An attention mechanism, a fully connected aggregation mechanism, or a weighted average aggregation mechanism is used to aggregate the vector expressions of the same node from the different heterogeneous subgraphs, and the aggregated weighted result is taken as the node's final low-dimensional vector expression (embedding).
  • For example, the process of aggregating the vector expressions of the same node from different heterogeneous subgraphs includes:
  • The adjusted convolution model is as follows:

        h_{N(v)}^{s_k} = WEIGHTEDMEAN({w_u · h_u : u ∈ N_{s_k}(v)})
        h_v' = σ(W · CONCAT(h_v, h_{N(v)}^{s_k}))

    where WEIGHTEDMEAN represents the weighted average, N_{s_k}(v) represents the neighbors of node v that satisfy meta-path s_k, w represents the weights in the weighted average, CONCAT represents the direct concatenation of the two vectors, W represents the weights to be learned, and σ represents the nonlinear transformation.
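The weighted average aggregation mechanism named above can be sketched as follows; the uniform default weights are an assumption, and the attention variant would learn the weights rather than take them as given:

```python
import numpy as np

def aggregate_subgraph_embeddings(embeddings, weights=None):
    """Weighted-average aggregation of one node's vector expressions from
    several heterogeneous subgraphs (one of the three mechanisms the text
    names; attention-based aggregation would learn the weights)."""
    embs = np.stack(embeddings)
    if weights is None:
        weights = np.ones(len(embeddings))
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                      # normalize to a proper weighted mean
    return (w[:, None] * embs).sum(axis=0)

# Node 1's vector expressions from subgraphs a, b, and c (illustrative data):
per_subgraph = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
final = aggregate_subgraph_embeddings(per_subgraph, weights=[2, 1, 1])
```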
  • Step S306: The preset loss function optimizes the parameters of the model based on the sample data and the single vector expression of each node.
  • In the advertisement search scenario, low-dimensional vector expressions of advertisements, commodities, and query terms can be obtained.
  • The user's current query term and the advertisements or commodities the user previously clicked are together taken as the user's current search request.
  • An attention mechanism is used to aggregate the low-dimensional vector expression of the query term (H_Q) and the multiple low-dimensional vector expressions of the pre-clicked items (H_I1, ..., H_Ik) into the final user search request vector.
  • The advertisements clicked under the current request are regarded as positive examples, and the advertisements not clicked are regarded as negative examples.
  • The sample structure is (request, ad, click-label), consisting of the request, the search advertisement, and the click label, where the request is request = (query, {realtime clicked items}), comprising the query term and multiple real-time clicked commodities.
  • In the loss function, y_i represents the label data, p_i represents the prior probability, v_request and v_ad represent the vector expressions of the virtual request node and the advertisement node, and R(v_request, v_ad) represents the correlation between the vector expressions of the virtual request node and the advertisement node.
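A minimal sketch of such a click loss, assuming binary cross-entropy with p_i = σ(R(v_request, v_ad)) and taking R as the inner product (both are illustrative assumptions, not the patent's stated formula):

```python
import numpy as np

def click_loss(v_request, v_ads, labels):
    """Binary cross-entropy over (request, ad) pairs.
    R(v_request, v_ad) is taken here as the inner product; the exact
    correlation function R used in the patent is an assumption."""
    scores = v_ads @ v_request                 # R(v_request, v_ad) per ad
    p = 1.0 / (1.0 + np.exp(-scores))          # probability p_i via sigmoid
    y = np.asarray(labels, dtype=float)        # click labels y_i (clicked = 1)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

v_request = np.array([0.5, 1.0])               # aggregated request vector
v_ads = np.array([[1.0, 1.0],                  # clicked ad (positive example)
                  [-1.0, -0.5]])               # unclicked ad (negative example)
loss = click_loss(v_request, v_ads, [1, 0])
```

Minimizing this loss pushes the request vector toward clicked advertisements and away from unclicked ones, which is the optimization described in step S306.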
  • Step S307: Determine whether the sample data of all batches has been acquired; if not, go to step S308; if yes, go to step S309.
  • Step S308: Obtain the sample data of the next batch, and return to step S304.
  • Step S309: Obtain the low-dimensional vector expression of each node in the heterogeneous graph.
  • A node in the heterogeneous graph corresponds to an entity in the sample data.
  • In Figure 2, the bottom layer is a schematic representation of the constructed heterogeneous graph, and the four rows of small white squares in the upper layer are the node vectors in the heterogeneous graph.
  • The initial vector expression of each node is obtained and then input into the learning model corresponding to each heterogeneous subgraph.
  • After a batch of sample data is learned, the vector expression of each node in each heterogeneous subgraph is updated according to the learning result, and the vector expressions of the same node across the heterogeneous subgraphs are aggregated to obtain a single aggregated vector expression for that node; the corresponding symbols in Figure 2 denote the aggregated vector expressions of the search advertisement node, the query term node, and each commodity node, respectively.
  • The embodiments of the present invention also provide a system for obtaining expressions of relationships between entities.
  • The system can be deployed in network equipment, in cloud equipment, or in an architecture of server equipment, client equipment, and other devices.
  • The structure of the system is shown in FIG. 8 and includes: a registration device 803, a storage device 801, a computing device 802, and a parameter exchange device 804.
  • The storage device 801 is used to store the data of the heterogeneous subgraphs.
  • The computing device 802 is configured to obtain the data of the heterogeneous subgraphs from the storage device 801 through the registration device 803, and to learn the sample data based on the heterogeneous graph using the above-described method of obtaining relationship expressions between entities, obtaining the low-dimensional vector expression of each node in the heterogeneous graph.
  • The parameter exchange device 804 is used for parameter interaction with the computing device.
  • The computing device 802 obtaining the data of each node and edge from the storage device through the registration device 803 includes:
  • The computing device 802 sends a data query request to the registration device 803, the data query request including the information of the heterogeneous subgraph to be queried; it receives the query result returned by the registration device 803, the query result including the information of the storage device storing the heterogeneous subgraph data; and it obtains the heterogeneous subgraph data from the corresponding storage device 801 according to that storage device information.
  • The storage device 801 may also store the data of each node and edge in the heterogeneous graph as well as the sample data.
  • In that case, the computing device 802 sends a data query request to the registration device 803, the data query request including the information of the nodes and edges to be queried; it receives the query result returned by the registration device 803, the query result including the information of the storage device storing the node and edge data; and it obtains the data of each node and edge from the corresponding storage device 801 according to that storage device information.
  • An embodiment of the present invention also provides an advertisement recall system. As shown in FIG. 9, it includes a system 901 for obtaining relationship expressions between entities and an advertisement recall matching system 902.
  • The system 901 for obtaining expressions of relationships between entities is used to construct a heterogeneous graph for the advertisement search scenario, where the node types in the heterogeneous graph include at least one of advertisements, commodities, and query terms, and the edge types include at least one of click edges, co-click edges, collaborative filtering edges, content-semantically-similar edges, and attribute-similar edges;
  • preset graph convolution models learn a batch of sample data according to the heterogeneous subgraphs to obtain the vector expressions of the nodes in the heterogeneous subgraphs, one graph convolution model corresponding to one heterogeneous subgraph;
  • the preset aggregation model, based on the sample data, aggregates the vector expressions of the same node in different heterogeneous subgraphs to obtain a single vector expression for that node across the different heterogeneous subgraphs;
  • the preset loss function optimizes the parameters of the models based on the sample data and the single vector expression of each node;
  • the next batch of sample data is then acquired and learned, until the sample data of all batches has been learned and the low-dimensional vector expressions of the advertisement nodes, commodity nodes, and query term nodes included in the heterogeneous graph are obtained, a node in the heterogeneous graph corresponding to an entity in the sample data.
  • The advertisement recall matching system 902 is used to use the low-dimensional vector expressions of the query term nodes, commodity nodes, and search advertisement nodes obtained by the system for obtaining relationship expressions between entities to determine the degree of matching between the query term nodes, commodity nodes, and search advertisement nodes, and to select, according to the matching degree, the search advertisements whose match with the commodities and query terms meets the set requirement.
  • The system for obtaining relationship expressions between entities splits the heterogeneous graph according to predefined meta-paths, one meta-path corresponding to one heterogeneous subgraph; a meta-path is used to express the structure of a heterogeneous subgraph and the node types and edge types it includes. Specifically, a meta-path consists of node types and edge types arranged alternately in order, with a node type first and last, and the arrangement order of the node types and edge types expresses the structure of the heterogeneous subgraph.
  • The system's division of the heterogeneous graph into at least two heterogeneous subgraphs according to the preset meta-paths specifically includes: for each of the at least two preset meta-paths, obtaining the nodes of the corresponding types from the heterogeneous graph according to the node types included in the meta-path; obtaining the required edges from the heterogeneous graph according to the types of edges connecting adjacent nodes; the obtained nodes of the corresponding types and the required edges constituting the heterogeneous subgraph corresponding to that meta-path.
  • The system for obtaining relationship expressions between entities learns the sample data according to the heterogeneous subgraphs through the preset graph convolution models to obtain the vector expressions of the nodes in the heterogeneous subgraphs, which specifically includes: the preset graph convolution model obtains the vector expression of each node according to the attribute information of each node in the heterogeneous subgraph and the structure information and attribute information of at least the first-order neighbor nodes of each node in the heterogeneous subgraph.
  • Optionally, the preset graph convolution model performs an N-layer convolution operation according to the attribute information of the node and the attribute and structure information of its first- to Nth-order neighbor nodes to obtain the vector expression of the node.
  • Optionally, when neighbor sampling is used, the preset graph convolution model performs the N-layer convolution operation according to the attribute information of the node and the attribute and structure information of the sampled first- to Nth-order neighbor nodes to obtain the vector expression of the node.
  • The system for obtaining relationship expressions between entities aggregates the vector expressions of the same node in different heterogeneous subgraphs based on the sample data through the preset aggregation model to obtain a single vector expression for that node, which specifically includes: the preset aggregation model, based on the sample data, uses attention-mechanism aggregation learning, fully connected aggregation learning, or weighted average aggregation learning to aggregate the vector expressions of the same node in different heterogeneous subgraphs, obtaining the single vector expression of that node across the different heterogeneous subgraphs.
  • The advertisement recall matching system determining the degree of matching between query term nodes, commodity nodes, and search advertisement nodes includes: constructing a virtual request node from the query term node and the commodity nodes pre-clicked by the user under the same query term, and determining the degree of matching between the query term node, the commodity nodes, and the search advertisement nodes through the vector expression of the virtual request node.
  • The advertisement recall matching system selecting the search advertisements that match the commodities and query terms according to the matching degree includes: selecting the search advertisements whose distance meets the set requirement.
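The distance-based selection can be sketched as follows, assuming cosine distance between the virtual request vector and each advertisement vector; the metric and the threshold are illustrative assumptions, since the text does not fix them:

```python
import numpy as np

def recall_ads(v_request, ad_vectors, threshold):
    """Return ids of ads whose cosine distance to the virtual request
    vector meets the set requirement (here: distance <= threshold)."""
    recalled = []
    for ad_id, v_ad in ad_vectors.items():
        cos = np.dot(v_request, v_ad) / (
            np.linalg.norm(v_request) * np.linalg.norm(v_ad))
        if 1.0 - cos <= threshold:           # cosine distance
            recalled.append(ad_id)
    return recalled

v_request = np.array([1.0, 0.0])             # aggregated virtual request vector
ad_vectors = {"ad1": np.array([0.9, 0.1]),   # close to the request
              "ad2": np.array([-1.0, 0.2])}  # far from the request
hits = recall_ads(v_request, ad_vectors, threshold=0.2)
```

In a production recall system the linear scan would typically be replaced by an approximate nearest-neighbor index, but the selection criterion is the same.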
  • An embodiment of the present invention also provides a computer-readable storage medium on which computer instructions are stored, and when the instructions are executed by a processor, the foregoing method for obtaining expressions of relationships between entities is implemented.
  • An embodiment of the present invention also provides a heterogeneous graph learning device, including: a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the processor implements the above-described method of obtaining relationship expressions between entities when executing the program.
  • Terms such as processing, calculation, operation, determination, and display may refer to actions and/or processes of one or more processing or computing systems or similar devices, which manipulate and transform data represented as physical (e.g., electronic) quantities in the registers or memories of the processing system into other data similarly represented as physical quantities in the memories, registers, or other such information storage, transmission, or display devices of the processing system.
  • Information and signals can be represented using any of a variety of different technologies and methods.
  • the data, instructions, commands, information, signals, bits, symbols, and chips mentioned throughout the above description can be represented by voltage, current, electromagnetic waves, magnetic fields or particles, light fields or particles, or any combination thereof.
  • the steps of the method or algorithm described in combination with the embodiments of this document can be directly embodied as hardware, a software module executed by a processor, or a combination thereof.
  • the software module can be located in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, removable disk, CD-ROM or any other form of storage medium known in the art.
  • An exemplary storage medium is connected to the processor, so that the processor can read information from the storage medium and can write information to the storage medium.
  • the storage medium may also be a component of the processor.
  • the processor and the storage medium may be located in the ASIC.
  • the ASIC can be located in the user terminal.
  • the processor and the storage medium may also exist as discrete components in the user terminal.
  • the technology described in this application can be implemented with modules (for example, procedures, functions, etc.) that perform the functions described in this application.
  • These software codes can be stored in a memory unit and executed by a processor.
  • the memory unit may be implemented in the processor or outside the processor. In the latter case, it is communicatively coupled to the processor through various means, which are well known in the art.


Abstract

A method, system, and device for obtaining an expression of a relationship between entities, and an advertisement retrieval system. The method comprises: dividing, according to a meta path, a heterogeneous graph into at least two heterogeneous subgraphs, and obtaining a batch of sample data; learning the sample data according to the heterogeneous subgraphs so as to obtain vector expressions of nodes in the heterogeneous subgraphs; aggregating, on the basis of the sample data, vector expressions of identical nodes in different heterogeneous subgraphs so as to obtain a same vector expression for the identical nodes in the different heterogeneous subgraphs; performing optimization on a parameter of a model on the basis of the sample data and the same vector expression for the identical nodes; and obtaining the next batch of sample data and learning the same until all batches of sample data have been learned, so as to obtain a low-dimensional vector expression of each node in the heterogeneous graph. The method enables learning of complex heterogeneous graphs, ensures high processing speed and high efficiency, and can be used in search advertising to improve the degree of matching for retrieved advertisements.

Description

Method, System and Device for Obtaining Expressions of Relationships Between Entities, and Advertisement Recall System

This application claims priority to the Chinese patent application No. 201910041466.9, filed on January 16, 2019 and entitled "Method, system and device for obtaining expressions of inter-entity relationships, and advertisement recall system", the entire contents of which are incorporated herein by reference.
Technical Field

The present invention relates to the technical field of data mining, and in particular to a method, system and device for obtaining expressions of relationships between entities, and an advertisement recall system.
Background

With the popularization of mobile terminals and application software, service providers in fields such as social networking, e-commerce, logistics, travel, food delivery, and marketing have accumulated massive amounts of business data. Mining the relationships between different business entities based on this data has become an important technical research direction in the field of data mining, and with the improvement of machine processing capabilities, more and more technicians have begun to study how to perform such mining through machine learning technology.
The inventor of the present invention found the following:
At present, learning massive business data through machine learning technology to obtain a graph expressing entities and the relationships between them, that is, performing graph learning on massive business data, has become a preferred technical direction. Simply put, a graph consists of nodes and edges: a node represents an entity, and an edge between nodes represents a relationship between them. A graph generally includes two or more nodes and one or more edges, so a graph can also be understood as consisting of a set of nodes and a set of edges, usually expressed as G(V, E), where G denotes the graph, V the set of nodes in G, and E the set of edges in G. Graphs can be divided into homogeneous graphs and heterogeneous graphs, where a heterogeneous graph is one whose nodes are of different types (the edge types may be the same or different) or whose edges are of different types (the node types may be the same or different). Therefore, when the entities are of many types and need to be expressed by multiple types of nodes, or the relationships between entities are not unique and need to be expressed by multiple types of edges, it is preferable to express the entities and their relationships through a heterogeneous graph. However, when the numbers of nodes and edges included in the heterogeneous graph are very large, the graph becomes extremely complex and the amount of data becomes very large. Therefore, reducing the complexity and data volume of heterogeneous graphs has become a technical problem faced by those skilled in the art.
Summary of the Invention

In view of the above problems, the present invention is proposed to provide a method, system and device for obtaining expressions of relationships between entities, and an advertisement recall system, which overcome or at least partially solve the above problems.
An embodiment of the present invention provides an advertisement recall system, including a system for obtaining expressions of relationships between entities and an advertisement recall matching system.
The system for obtaining expressions of relationships between entities is configured to construct a heterogeneous graph for an advertisement search scenario, where the node types in the heterogeneous graph include at least one of: advertisement, commodity, and query word, and the edge types include at least one of: click edges, co-click edges, collaborative-filtering edges, content-semantic-similarity edges, and attribute-similarity edges;
divide the pre-built heterogeneous graph into at least two heterogeneous subgraphs according to predefined meta-paths, where a meta-path expresses the structure of a heterogeneous subgraph and the node types and edge types the subgraph includes;
obtain one batch of sample data;
have preset graph convolution models learn the batch of sample data according to the heterogeneous subgraphs to obtain vector expressions of the nodes in the heterogeneous subgraphs, one graph convolution model corresponding to one heterogeneous subgraph;
have a preset aggregation model aggregate, based on the sample data, the vector expressions of the same node in different heterogeneous subgraphs to obtain a single vector expression of that node across the different heterogeneous subgraphs;
have a preset loss function optimize the parameters of the models based on the sample data and the single vector expression of the same node; and
continue obtaining the next batch of sample data for learning until all batches of sample data have been learned, thereby obtaining low-dimensional vector expressions of the advertisement nodes, commodity nodes, and query-word nodes included in the heterogeneous graph, where one node in the heterogeneous graph corresponds to one entity in the sample data.
The advertisement recall matching system is configured to use the low-dimensional vector expressions of the query-word nodes, commodity nodes, and search-advertisement nodes obtained by the system for obtaining expressions of relationships between entities to determine the degree of matching among query-word nodes, commodity nodes, and search-advertisement nodes, and to select, according to the degree of matching, search advertisements whose degree of matching with the commodity and the query word meets a set requirement.
In an optional embodiment, one meta-path corresponds to one heterogeneous subgraph, and "a meta-path expresses the structure of a heterogeneous subgraph and the node types and edge types the subgraph includes" specifically means: one meta-path expresses the structure of one heterogeneous subgraph and the node types and edge types that subgraph includes;
and dividing the heterogeneous graph into at least two heterogeneous subgraphs according to the predefined meta-paths specifically includes:
splitting the heterogeneous graph into at least two heterogeneous subgraphs according to at least two predefined meta-paths.
In an optional embodiment, the system for obtaining expressions of relationships between entities having the preset graph convolution model learn the sample data according to a heterogeneous subgraph to obtain the vector expressions of the nodes in the heterogeneous subgraph specifically includes:
the preset graph convolution model obtaining the vector expressions of the nodes according to the attribute information of each node of the heterogeneous subgraph and the structure information and attribute information of at least the first-order neighbor nodes of each node in the heterogeneous subgraph.
In an optional embodiment, the system for obtaining expressions of relationships between entities having the preset aggregation model aggregate, based on the sample data, the vector expressions of the same node in different heterogeneous subgraphs to obtain a single vector expression of that node specifically includes:
the preset aggregation model, based on the sample data, aggregating the vector expressions of the same node in different heterogeneous subgraphs using attention-mechanism aggregation learning, fully-connected aggregation learning, or weighted-average aggregation learning, to obtain a single vector expression of that node across the different heterogeneous subgraphs.
In an optional embodiment, the advertisement recall matching system determining the degree of matching among query-word nodes, commodity nodes, and search-advertisement nodes includes:
using an attention mechanism, a fully-connected aggregation mechanism, or a weighted-average aggregation mechanism to pool the low-dimensional vector expression of a query-word node with the low-dimensional vector expressions of the commodity nodes previously clicked by the user under the same query word, to obtain a low-dimensional vector expression of a virtual request node, where the virtual request node is a virtual node constructed from the query-word node and the commodity nodes previously clicked by the user under that query word; and
determining the degree of matching among the query-word node, the commodity nodes, and the search-advertisement nodes according to the low-dimensional vector expression of the virtual request node and the low-dimensional vector expressions of the search-advertisement nodes.
In an optional embodiment, the advertisement recall matching system selecting, according to the degree of matching, search advertisements whose degree of matching with the commodity and the query word meets a set requirement includes:
selecting search advertisements for which the cosine distance between the low-dimensional vector expression of the virtual request node and the low-dimensional vector expression of the search-advertisement node meets the set requirement.
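The virtual-request-node construction and cosine-based selection can be sketched as follows. Mean pooling stands in for the pooling mechanism (attention, fully-connected, and weighted-average are the named alternatives), and all names are illustrative assumptions:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors (higher = closer)."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def recall_ads(query_vec, clicked_item_vecs, ad_vecs, top_k=2):
    """Pool the query-word vector with previously clicked item vectors into a
    virtual request node, then rank ads by cosine similarity to it.
    ad_vecs: list of (ad_name, vector) pairs."""
    request = np.mean([query_vec, *clicked_item_vecs], axis=0)  # virtual request node
    ranked = sorted(ad_vecs, key=lambda kv: cosine(request, kv[1]), reverse=True)
    return [name for name, _ in ranked[:top_k]]
```

In practice the "set requirement" could be a top-k cutoff as above, or a similarity threshold.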
An embodiment of the present invention further provides a method for obtaining expressions of relationships between entities, including:
dividing a pre-built heterogeneous graph into at least two heterogeneous subgraphs according to predefined meta-paths, where a meta-path expresses the structure of a heterogeneous subgraph and the node types and edge types the subgraph includes;
obtaining one batch of sample data;
having preset graph convolution models learn the batch of sample data according to the heterogeneous subgraphs to obtain vector expressions of the nodes in the heterogeneous subgraphs, one graph convolution model corresponding to one heterogeneous subgraph;
having a preset aggregation model aggregate, based on the sample data, the vector expressions of the same node in different heterogeneous subgraphs to obtain a single vector expression of that node;
having a preset loss function optimize the parameters of the models based on the sample data and the single vector expression of the same node; and
continuing to obtain the next batch of sample data for learning until all batches of sample data have been learned, thereby obtaining one low-dimensional vector expression for each node in the heterogeneous graph, where one node in the heterogeneous graph corresponds to one entity in the sample data.
In an optional embodiment, one meta-path corresponds to one heterogeneous subgraph, and "a meta-path expresses the structure of a heterogeneous subgraph and the node types and edge types the subgraph includes" specifically means: one meta-path expresses the structure of one heterogeneous subgraph and the node types and edge types that subgraph includes;
and dividing the heterogeneous graph into at least two heterogeneous subgraphs according to the predefined meta-paths specifically includes:
splitting the heterogeneous graph into at least two heterogeneous subgraphs according to at least two predefined meta-paths.
In an optional embodiment, "one meta-path expresses the structure of one heterogeneous subgraph and the node types and edge types that subgraph includes" specifically means:
a meta-path includes node types and edge types arranged alternately in order, with a node type in the first and last positions, and the arrangement order of the node types and edge types expresses the structure of the heterogeneous subgraph; and
splitting the heterogeneous graph into at least two heterogeneous subgraphs according to at least two predefined meta-paths specifically includes:
for each of the at least two predefined meta-paths, obtaining nodes of the corresponding types from the heterogeneous graph according to the node types the meta-path includes; obtaining qualifying edges from the heterogeneous graph according to the types of the edges connecting adjacent nodes; and composing, from the obtained nodes of the corresponding types and the qualifying edges, the heterogeneous subgraph corresponding to that meta-path.
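The meta-path-based splitting just described can be sketched as follows. This is one simplified reading of the claim; the data layout and all names are assumptions:

```python
# A meta-path alternates node types and edge types, starting and ending with a
# node type, e.g. ("query", "click", "item", "co_click", "ad").

def split_by_metapath(nodes, edges, metapath):
    """Keep only the nodes whose type appears at an even position of the
    meta-path and the edges whose type appears at an odd position."""
    node_types = set(metapath[0::2])  # even positions: node types
    edge_types = set(metapath[1::2])  # odd positions: edge types
    sub_nodes = {n: t for n, t in nodes.items() if t in node_types}
    sub_edges = [
        (u, v, t) for (u, v, t) in edges
        if t in edge_types and u in sub_nodes and v in sub_nodes
    ]
    return sub_nodes, sub_edges

nodes = {"q1": "query", "i1": "item", "ad1": "ad"}
edges = [("q1", "i1", "click"), ("i1", "ad1", "co_click")]
sub_n, sub_e = split_by_metapath(nodes, edges, ("query", "click", "item"))
# sub_n keeps only query/item nodes; sub_e keeps only "click" edges
```

Each predefined meta-path would be run through this step once, yielding one heterogeneous subgraph per meta-path.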
In an optional embodiment, the preset graph convolution model learning the sample data according to a heterogeneous subgraph to obtain the vector expressions of the nodes in the heterogeneous subgraph specifically includes:
the preset graph convolution model learning the sample data according to the attribute information of each node of the heterogeneous subgraph and the structure information and attribute information of at least the first-order neighbor nodes of each node in the heterogeneous subgraph, to obtain the vector expressions of the nodes in the heterogeneous subgraph.
In an optional embodiment, the preset graph convolution model learning the sample data according to the attribute information of each node of the heterogeneous subgraph and the structure information and attribute information of at least the first-order neighbor nodes of each node, to obtain the vector expression of each node in the heterogeneous subgraph, specifically includes:
traversing the sample data; for the currently traversed piece of sample data, reading the entity it records and finding the node corresponding to that entity in the heterogeneous graph;
reading, from the heterogeneous subgraph that includes the node, the first-order to Nth-order neighbor nodes of the node, where N is a preset positive integer; and
the preset graph convolution model performing an N-layer convolution operation according to the attribute information of the node and the attribute information and structure information of the first- to Nth-order neighbor nodes, to obtain the vector expression of the node.
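One possible reading of the N-layer convolution over a node and its first- to Nth-order neighborhood is sketched below. The mean aggregator, the tanh activation, and the weight shapes are assumptions — the claim does not fix a particular aggregator:

```python
import numpy as np

def gcn_forward(feats, neighbors, weights):
    """feats: {node: attribute vector of length d};
    neighbors: {node: [neighbor, ...]} (the subgraph's structure information);
    weights: one (d_in x d_out) matrix per layer, so len(weights) == N.
    Each layer mixes a node's own vector with the mean of its neighbors'."""
    h = dict(feats)
    for W in weights:
        new_h = {}
        for v, hv in h.items():
            nbrs = neighbors.get(v, [])
            agg = np.mean([h[u] for u in nbrs], axis=0) if nbrs else np.zeros_like(hv)
            new_h[v] = np.tanh((hv + agg) @ W)  # combine self + neighborhood
        h = new_h
    return h
```

After N layers, each node's output vector has absorbed information from its neighbors up to N hops away, which matches the first- to Nth-order neighborhood in the text.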
In an optional embodiment, the preset graph convolution model learning the sample data according to the attribute information of each node of the heterogeneous subgraph and the structure information and attribute information of at least the first-order neighbor nodes of each node, to obtain the vector expression of each node in the heterogeneous graph, specifically includes:
traversing the sample data; for the currently traversed piece of sample data, reading the entity it records and finding the node corresponding to that entity in the heterogeneous graph;
reading, from the heterogeneous subgraph that includes the node, the first-order to Nth-order neighbor nodes of the node, where N is a preset positive integer;
sampling, among the first- to Nth-order neighbor nodes of the node, a preset number of neighbor nodes of each order according to the weights of the edges between the nodes, to obtain sampled first- to Nth-order neighbor nodes; and
the preset graph convolution model performing an N-layer convolution operation according to the attribute information of the node and the attribute information and structure information of the sampled first- to Nth-order neighbor nodes, to obtain the vector expression of the node.
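The weighted neighbor-sampling step might be sketched as follows. Sampling with probability proportional to edge weight is one plausible reading of "according to the weights of the edges"; the exact rule and the with-replacement choice are assumptions:

```python
import random

def sample_neighbors(nbr_weights, k, rng=random.Random(0)):
    """nbr_weights: {neighbor: edge_weight}; returns up to k neighbors,
    drawn with probability proportional to edge weight (with replacement,
    as is common in neighborhood-sampling schemes)."""
    if len(nbr_weights) <= k:
        return list(nbr_weights)          # fewer neighbors than the preset count
    population = list(nbr_weights)
    weights = [nbr_weights[n] for n in population]
    return rng.choices(population, weights=weights, k=k)
```

Capping each order of the neighborhood at a preset count k is what keeps the N-layer computation from growing exponentially with N.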
In an optional embodiment, the preset aggregation model aggregating, based on the sample data, the vector expressions of the same node in different heterogeneous subgraphs to obtain a single vector expression of that node specifically includes:
the preset aggregation model, based on the sample data, aggregating the vector expressions of the same node in different heterogeneous subgraphs using an attention mechanism, a fully-connected aggregation mechanism, or a weighted-average aggregation mechanism, to obtain a single vector expression of that node across the different heterogeneous subgraphs.
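Two of the three aggregation choices named above can be illustrated with minimal variants; the parameterisations shown (a fixed weight list, a single learned query vector) are assumptions:

```python
import numpy as np

def weighted_average(vectors, weights):
    """Weighted-average aggregation of one node's per-subgraph vectors."""
    w = np.array(weights, dtype=float)
    w = w / w.sum()
    return np.sum([wi * v for wi, v in zip(w, vectors)], axis=0)

def attention_fuse(vectors, query):
    """Attention-style aggregation: score each per-subgraph vector against a
    learned query vector, softmax the scores, and mix."""
    scores = np.array([v @ query for v in vectors])
    alpha = np.exp(scores - scores.max())   # numerically stable softmax
    alpha = alpha / alpha.sum()
    return np.sum([a * v for a, v in zip(alpha, vectors)], axis=0)
```

A fully-connected variant would instead concatenate the per-subgraph vectors and pass them through a learned dense layer; in all three cases the output is one vector per node.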
An embodiment of the present invention further provides a system for obtaining expressions of relationships between entities, including a registration device, a storage device, a computing device, and a parameter exchange device;
the storage device is configured to store the data of the heterogeneous subgraphs;
the computing device is configured to obtain the data of the heterogeneous subgraphs from the storage device through the registration device, and to learn the sample data based on the heterogeneous graph using the above method for obtaining expressions of relationships between entities, to obtain a low-dimensional vector expression of each node in the heterogeneous graph; and
the parameter exchange device is configured to exchange parameters with the computing device.
The beneficial effects of the above technical solutions provided by the embodiments of the present invention include at least the following:
Based on the heterogeneous subgraphs obtained by splitting the heterogeneous graph, graph convolution models learn the sample data; the vector expressions of the same node obtained from the different heterogeneous subgraphs are fused; and the parameters of the machine-learning models are optimized according to the fusion result and used to learn the next batch of samples, realizing iterative learning over the samples and finally obtaining a low-dimensional vector expression of each node in the heterogeneous graph. This reduces the amount of data processed during heterogeneous graph learning, avoids the explosive growth of training parameters and the exponential growth of the number of neighbor nodes with the number of layers during heterogeneous graph processing, and improves the speed and efficiency of heterogeneous graph learning. When this heterogeneous graph learning method is applied to advertisement search scenarios, mining the entity relationships in the scenario makes it possible to use a large amount of information to recall advertisements accurately and improve the quality of advertisement recall; with all advertisements as candidates, enough advertisements can be recalled under any traffic; and through the vector-based approach, advertisement rewriting and advertisement screening can be completed in a single step.
Other features and advantages of the present invention will be set forth in the following description and will in part become apparent from the description or be understood by practicing the present invention. The objectives and other advantages of the present invention can be realized and obtained through the structures particularly pointed out in the written description, the claims, and the accompanying drawings.
The technical solutions of the present invention are further described in detail below through the accompanying drawings and embodiments.
Brief description of the drawings
The accompanying drawings are provided for a further understanding of the present invention and constitute a part of the specification; together with the embodiments of the present invention, they serve to explain the present invention and do not limit it. In the drawings:
Fig. 1 is a flowchart of the method for obtaining expressions of relationships between entities in Embodiment 1 of the present invention;
Fig. 2 is a schematic diagram of the principle of the method for obtaining expressions of relationships between entities in Embodiment 2 of the present invention;
Fig. 3 is a flowchart of the method for obtaining expressions of relationships between entities in Embodiment 2 of the present invention;
Fig. 4a is an example diagram of a heterogeneous graph constructed in Embodiment 2 of the present invention;
Fig. 4b is another example diagram of a heterogeneous graph constructed in Embodiment 2 of the present invention;
Fig. 5 is an example diagram of splitting a heterogeneous graph into heterogeneous subgraphs in Embodiment 2 of the present invention;
Fig. 6 is an example diagram of the convolutional network model of a heterogeneous subgraph in Embodiment 2 of the present invention;
Fig. 7 is an example diagram of neighbor-node sampling in Embodiment 2 of the present invention;
Fig. 8 is a schematic structural diagram of the system for obtaining expressions of relationships between entities in an embodiment of the present invention;
Fig. 9 is a schematic structural diagram of the advertisement recall system in an embodiment of the present invention.
Detailed description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the present disclosure may be implemented in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and its scope fully conveyed to those skilled in the art.
In the prior art, when a heterogeneous graph is learned, the training parameters grow exponentially, and neighbor sampling also grows exponentially with the number of layers, so that devices cannot support computation of such a large order of magnitude. To solve this problem, the embodiments of the present invention provide a method for obtaining expressions of relationships between entities that solves the above problems well, effectively reduces the amount of data processing during heterogeneous graph learning, and achieves fast processing and high efficiency.
Graph learning is widely applied to mining various data relationships in the real world; for example, in search advertising platforms it is used to mine the correlation between search requests and advertisements as well as the click-through rate (CTR). That is, the method of the present invention can be used in the field of advertisement search for the recall of search advertisements. A search advertisement is an advertisement for which an advertiser determines relevant keywords according to the content and characteristics of its products or services, writes the advertisement content, sets its own price, and places the advertisement in the search results corresponding to those keywords. Search advertisement recall refers to selecting the most relevant advertisements from a massive advertisement collection through a certain algorithm or model.
Existing search-advertisement recall technologies either screen "high-quality" advertisements based on the degree of matching between the query word and the advertiser's bid word (bidword), the advertiser's keyword purchase price, and users' statistical preferences for advertisements, or incorporate each user's historical behavior data to perform personalized matching and recall of advertisements.
In studying the prior art, the inventors found that existing recall technologies either emphasize only the degree of matching between advertisements and query words, or emphasize only improving the revenue of recalled advertisements, and lack an integrated model that takes both into account. Since the quality of advertisement recall is crucial to search-advertisement revenue and user experience, the inventors provide a graph-learning technique used in the advertisement recall process to obtain expressions of relationships between entities, which can yield a recall set containing more high-quality advertisements that users care more about.
The method and system for obtaining expressions of relationships between entities, and the specific implementation for the advertisement recall system, are described in detail below through specific embodiments.
Embodiment 1
Embodiment 1 of the present invention provides a method for obtaining expressions of relationships between entities. Its flow, shown in Fig. 1, includes the following steps:
Step S101: divide a pre-built heterogeneous graph into at least two heterogeneous subgraphs according to predefined meta-paths, where a meta-path expresses the structure of a heterogeneous subgraph and the node types and edge types the subgraph includes.
One meta-path corresponds to one heterogeneous subgraph; that is, one meta-path expresses the structure of one heterogeneous subgraph and the node types and edge types that subgraph includes. Specifically, a meta-path includes node types and edge types arranged alternately in order, with a node type in the first and last positions, and the arrangement order of the node types and edge types expresses the structure of the heterogeneous subgraph.
Splitting the heterogeneous graph into at least two heterogeneous subgraphs according to the predefined meta-paths specifically includes splitting the heterogeneous graph according to at least two predefined meta-paths. Specifically, for each of the at least two predefined meta-paths, nodes of the corresponding types are obtained from the heterogeneous graph according to the node types the meta-path includes; qualifying edges are obtained from the heterogeneous graph according to the types of the edges connecting adjacent nodes; and the obtained nodes of the corresponding types and the qualifying edges compose the heterogeneous subgraph corresponding to that meta-path.
Step S102: obtain one batch of sample data.
The sample data can be divided into multiple batches and learned batch by batch based on the heterogeneous subgraphs.
Step S103: the preset graph convolution models learn the batch of sample data (i.e., a sample data set) according to the heterogeneous subgraphs to obtain vector expressions of the nodes in the heterogeneous subgraphs, one graph convolution model corresponding to one heterogeneous subgraph.
In this step, the preset graph convolution model learns the sample data according to the attribute information of each node of the heterogeneous subgraph and the structure information and attribute information of at least the first-order neighbor nodes of each node, to obtain the vector expressions of the nodes in the heterogeneous graph. There are two optional cases:
One is to learn the sample data based on all nodes in the heterogeneous subgraph, including:
traversing the sample data; for the currently traversed piece of sample data, reading the entity it records and finding the node corresponding to the entity in the heterogeneous graph;
reading, from the heterogeneous subgraph that includes the node, the first-order to Nth-order neighbor nodes of the node, where N is a preset positive integer; and
the preset graph convolution model performing an N-layer convolution operation according to the attribute information of the node and the attribute information and structure information of the first- to Nth-order neighbor nodes, to obtain the vector expression of the node.
The other is to learn the sample data based on a sampled subset of the nodes in the heterogeneous subgraph, including:
traversing the sample data; for the currently traversed piece of sample data, reading the entity it records and finding the node corresponding to the entity in the heterogeneous graph;
reading, from the heterogeneous subgraph that includes the node, the first-order to Nth-order neighbor nodes of the node, where N is a preset positive integer;
sampling, among the first- to Nth-order neighbor nodes of the node, a preset number of neighbor nodes of each order according to the weights of the edges between the nodes, to obtain sampled first- to Nth-order neighbor nodes; and
the preset graph convolution model performing an N-layer convolution operation according to the attribute information of the node and the attribute information and structure information of the sampled first- to Nth-order neighbor nodes, to obtain the vector expression of the node.
步骤S104:预设的聚合模型基于样本数据,对不同异构子图中相同节点的向量表达进行聚合,得到不同异构子图中相同节点的同一个向量表达。Step S104: The preset aggregation model aggregates the vector expressions of the same node in different heterogeneous subgraphs based on the sample data to obtain the same vector expression of the same node in different heterogeneous subgraphs.
预设的聚合模型基于样本数据,使用进行注意力机制聚合学习或者全连接聚合学习 或者加权平均聚合学习对不同异构子图中相同节点的向量表达进行聚合,得到不同异构子图中相同节点的同一个向量表达。The preset aggregation model is based on sample data, using attention mechanism aggregation learning, fully connected aggregation learning, or weighted average aggregation learning to aggregate the vector expressions of the same node in different heterogeneous subgraphs to obtain the same node in different heterogeneous subgraphs The same vector expression of.
步骤S105:预设的损失函数基于样本数据和相同节点的同一个向量表达对模型的参数进行优化。Step S105: The preset loss function optimizes the parameters of the model based on the sample data and the same vector expression of the same node.
得到不同异构子图中相同节点的同一个向量表达后,使用至少两种类型的相同节点的向量表达进行汇聚,得到虚拟请求节点的低维向量表达;虚拟请求节点为通过具有一定关联关系的至少两种类型的节点构建出的虚拟节点;根据虚拟请求节点的低维向量表达与另一种类型的节点的低维向量表达,确定至少三种类型的节点之间的关联参数,根据关联参数对模型参数进行优化。After obtaining the same vector expression of the same node in different heterogeneous subgraphs, the vector expressions of at least two types of the same node are used to converge to obtain the low-dimensional vector expression of the virtual request node; the virtual request node is through a certain association relationship A virtual node constructed by at least two types of nodes; according to the low-dimensional vector expression of the virtual request node and the low-dimensional vector expression of another type of node, determine the associated parameters between at least three types of nodes, and according to the associated parameters Optimize model parameters.
步骤S106:是否所有批次的样本数据已经获取完毕,若否,执行步骤107;若是执行步骤S108。Step S106: Whether the sample data of all batches have been acquired, if not, go to step 107; if so, go to step S108.
步骤107:获取下一个批次的样本数据,并返回执行步骤S103。Step 107: Obtain sample data of the next batch, and return to step S103.
从而实现继续获取下一个批次的样本数据进行学习,直至所有批次的样本数据学习完毕。In this way, the sample data of the next batch will continue to be obtained for learning until the sample data of all batches have been learned.
步骤S108:得到异构图中每个节点的一个低维向量表达。异构图中的一个节点对应样本数据中的一个实体。Step S108: Obtain a low-dimensional vector expression of each node in the heterogeneous graph. A node in the heterogeneous graph corresponds to an entity in the sample data.
After all batches of samples have been learned, a low-dimensional vector expression of each node in the heterogeneous graph is obtained. This expression is the unified vector expression of the same node across the different heterogeneous subgraphs, as produced by the aggregation model after the last batch of samples is learned.
After all batches have been learned, the degree of matching between nodes of different types is also obtained; this matching degree is the inter-node association parameter most recently produced by the loss function.
In the above method of this embodiment, a machine learning model learns the sample data on the heterogeneous subgraphs obtained by splitting the heterogeneous graph. The vector expressions of the same node learned from the different subgraphs are fused, and the fusion result is used to optimize the parameters of the machine learning model for the next batch of samples, realizing iterative learning over the samples. A low-dimensional vector expression is finally obtained for each node in the heterogeneous graph. This reduces the amount of data processed during heterogeneous-graph learning, avoids the explosive growth of training parameters and the exponential growth of the number of neighbor nodes with the number of layers, and improves the speed and efficiency of heterogeneous-graph learning.
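As a rough sketch of the batchwise training loop described in steps S103 to S108, the following hypothetical Python outline shows how per-subgraph models, an aggregation step, and a loss-driven parameter update could fit together; the names `subgraph_models`, `aggregate`, `loss_fn`, and `optimize` are illustrative stand-ins, not the patent's actual implementation:

```python
def train(batches, subgraph_models, aggregate, loss_fn, optimize):
    """Iterate over batches: learn per subgraph, fuse same-node vectors,
    then optimize model parameters for the next batch (steps S103-S108)."""
    node_embeddings = {}
    for batch in batches:
        # One graph-convolution model per heterogeneous subgraph (step S104-like).
        per_subgraph = {}
        for sg_id, model in subgraph_models.items():
            per_subgraph[sg_id] = model.learn(batch)
        # Fuse the same node's vectors from the different subgraphs.
        node_embeddings = aggregate(per_subgraph)
        # Loss over sample data + fused expressions drives the update (step S105).
        grad = loss_fn(batch, node_embeddings)
        optimize(subgraph_models, grad)
    # After the last batch: one low-dimensional vector per node (step S108).
    return node_embeddings
```

The aggregation and loss functions are deliberately left abstract here, since the patent allows several fusion mechanisms (attention, fully connected, weighted average).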
Embodiment 2
Embodiment 2 of the present invention provides a specific implementation of the method for obtaining expressions of relationships between entities, described using the process of implementing advertisement recall in a search-advertising scenario as an example. The implementation principle of the method is shown in Figure 2, and its flow, shown in Figure 3, includes the following steps:
步骤S301:构造异构图。Step S301: Construct a heterogeneous graph.
Taking the advertisement search scenario as an example, a large-scale heterogeneous graph is constructed for the search-recall scenario from user logs and related product and advertisement data. It serves as a rich search-interaction graph for the scenario and is used as the subsequent graph-data input, such as the graph data of the heterogeneous graph at the bottom of Figure 2.
An example of the constructed heterogeneous graph is shown in Figure 4a. It contains multiple types of nodes, such as Query, Item, and Ad, representing the different entities in the search scenario, and multiple types of edges, representing the various relationships between entities. The node types and their meanings can be as shown in Table 1 below, and the edge types and their meanings as shown in Table 2 below.
Table 1

Node type | Specific meaning
Item | All products in the advertisement search scenario
Ad | Search advertisements in the advertisement search scenario
Query | User query terms in the advertisement search scenario
Among these, Query nodes and Item nodes are used as user-intention nodes to characterize the user's personalized search intent, and Ad nodes are the advertisements placed by advertisers.
Table 2

Edge type | Specific meaning
Click edge (click) | Clicks between a Query node and an Item/Ad node, with the click count as the edge weight
Co-click edge (session) | Items/Ads clicked together under the same Query within the same session
Collaborative-filtering edge (cf) | Collaborative-filtering relationship between different nodes
Content-similarity edge (semantic) | Content similarity between nodes, e.g. the text similarity of Item titles
Attribute-similarity edge (domain) | Degree of overlap between nodes in domains such as brand and category
Among these edge types:
User-behavior edges represent the user's historical behavior preferences. For example, a click edge can be built between a Query node and an Item node, or between a Query node and an Ad node, using the number of clicks as the edge weight, to represent clicks between the Query and the Item/Ad. As another example, a co-click edge (session edge) can be built to represent Items or Ads clicked together under the same Query in the same session; a collaborative-filtering edge (cf edge) can also be built to represent collaborative-filtering relationships between different nodes. In the advertisement search scenario, user-behavior edges describe a dynamically changing relationship. Popular nodes (such as high-frequency Query nodes) receive more impressions and clicks and therefore have denser edge relationships and larger edge weights, while unpopular and new nodes have relatively sparse edge relationships and smaller edge weights, so user-behavior edges better characterize popular nodes.
Content-similarity edges (semantic edges) describe the similarity between nodes; for example, edges can be built between Item nodes using the text similarity of their titles as the edge weight. Content-similarity edges reflect a static, more stable relationship between nodes, and can also well characterize the relationship between unpopular nodes and new nodes.
Attribute-similarity edges (domain edges) represent the degree of overlap between nodes in domains such as brand and category.
Figure 4b is one representation of the constructed heterogeneous graph, in which nodes of the same shape represent the same node type and edges drawn with the same line style represent the same edge type.
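The typed nodes and weighted, typed edges of Tables 1 and 2 could be held in a structure like the following minimal Python sketch; the class and the example node and edge identifiers are assumptions for illustration only, not the patent's data model:

```python
class HeteroGraph:
    """Toy container for typed nodes and typed, weighted, undirected edges."""

    def __init__(self):
        self.node_type = {}   # node id -> node type (Query / Item / Ad)
        self.edges = {}       # node id -> list of (neighbor, edge type, weight)

    def add_node(self, nid, ntype):
        self.node_type[nid] = ntype
        self.edges.setdefault(nid, [])

    def add_edge(self, u, v, etype, weight):
        self.edges[u].append((v, etype, weight))
        self.edges[v].append((u, etype, weight))  # store both directions

g = HeteroGraph()
g.add_node("q:red dress", "Query")
g.add_node("item:1", "Item")
g.add_node("ad:7", "Ad")
g.add_edge("q:red dress", "item:1", "click", 3)  # click count as edge weight
g.add_edge("item:1", "ad:7", "session", 2)       # co-clicked in one session
```

Edge types and weights are kept on each adjacency entry so that later meta-path filtering and weighted neighbor sampling can read them directly.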
步骤S302:根据预先设定的元路径,将构建的异构图分为至少两个异构子图。其中,元路径用于表达异构子图的结构及异构子图包括的节点类型和边类型。Step S302: Divide the constructed heterogeneous graph into at least two heterogeneous subgraphs according to the preset meta-path. Among them, the meta-path is used to express the structure of the heterogeneous subgraph and the node types and edge types included in the heterogeneous subgraph.
The graph data to be learned in this application is essentially a heterogeneous graph, which may contain multiple types of nodes and multiple types of edges. Current graph convolutional networks (GCNs) apply only to homogeneous graphs, and treating a heterogeneous graph as homogeneous and learning it with a GCN does not yield effective low-dimensional vector expressions. Therefore, to enable learning on heterogeneous graphs, some practically meaningful meta-paths are defined so that the original large heterogeneous graph can be split into multiple meaningful heterogeneous subgraphs for learning.
Taking the advertisement search scenario as an example, the defined meta-paths can be as shown in Table 3 below.
Table 3

No. | Meta-path
a | Item/Ad node - co-click edge - Item/Ad node - attribute-similarity edge - Item/Ad node
b | Item/Ad node - click edge - Query node - click edge - Item/Ad node
c | Query node - click edge - Item/Ad node - co-click edge - Item/Ad node
d | Query node - collaborative-filtering edge - Query node - semantic-similarity edge - Query node
e | Query node - collaborative-filtering edge - Query node - collaborative-filtering edge - Item/Ad node
f | Query node - click edge - Item/Ad node - collaborative-filtering edge - Query node
Based on the defined meta-paths, the constructed heterogeneous graph is split. As shown in Figure 5, the heterogeneous graph of Figure 4b is split; taking the six defined meta-paths as an example, meta-paths a, b, c, d, e, and f yield the six heterogeneous subgraphs a, b, c, d, e, and f.
Take meta-path a as an example: it consists of Item/Ad node - co-click edge - Item/Ad node - attribute-similarity edge - Item/Ad node. When subgraph a is constructed according to meta-path a, the nodes of the corresponding types in the meta-path (Item and Ad) are taken from the constructed heterogeneous graph, and the edges that meet the requirements are retained, giving subgraph a. Heterogeneous subgraphs for the other meta-paths are constructed similarly and are not described one by one here.
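A meta-path-based split like the one described above could, under the same dict-based graph representation assumed in the earlier sketch, look roughly like this; the node and edge data here is a toy example, not taken from the patent:

```python
def split_by_metapath(node_type, edges, allowed_ntypes, allowed_etypes):
    """Keep only the nodes whose type, and edges whose type, the meta-path mentions."""
    keep = {n for n, t in node_type.items() if t in allowed_ntypes}
    sub = {}
    for u in keep:
        sub[u] = [(v, et, w) for (v, et, w) in edges.get(u, [])
                  if v in keep and et in allowed_etypes]
    return sub

# Meta-path a keeps Item/Ad nodes joined by co-click (session) or domain edges.
sub_a = split_by_metapath(
    node_type={"item:1": "Item", "ad:7": "Ad", "q:1": "Query"},
    edges={"item:1": [("ad:7", "session", 2), ("q:1", "click", 3)],
           "ad:7": [("item:1", "session", 2)],
           "q:1": [("item:1", "click", 3)]},
    allowed_ntypes={"Item", "Ad"},
    allowed_etypes={"session", "domain"},
)
```

Each meta-path in Table 3 would map to one `(allowed_ntypes, allowed_etypes)` pair, producing one subgraph per meta-path.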
Referring to Figure 2, at the bottom is the constructed heterogeneous graph; based on it, the initial vector expression of each node is formed from the node's features. For each specified node, meta-paths containing that node are defined, and heterogeneous subgraphs are built from the defined meta-paths. As shown in Figure 2, two meta-paths are defined for the search-advertisement node (Ad), splitting out two heterogeneous subgraphs; four meta-paths are defined for the query-term node (Query), splitting out four heterogeneous subgraphs; and for the k item (Item) nodes 1, 2, ..., k, two meta-paths are defined for each item node, splitting out two heterogeneous subgraphs each.
步骤S303:获取一个批次的样本数据。Step S303: Obtain sample data of a batch.
从用户日志数据中抽取广告搜索相关的样本数据。样本数据可以来源于用户历史行为日志,商品基础属性信息表,广告基础属性信息表,查询词基础属性信息表等。Extract sample data related to advertisement search from user log data. The sample data can come from user historical behavior logs, commodity basic attribute information table, advertisement basic attribute information table, query word basic attribute information table, etc.
Extraction can be performed in multiple batches. Each batch of sample data is fed into the machine learning model in turn for training. The learning result of the previous batch is used to optimize the model parameters, and the optimized parameters are used for learning the next batch of sample data, achieving iterative learning and the final learning result.
步骤S304:预设的图卷积模型按照异构子图,对一个批次的样本数据进行学习,得到异构子图中节点的向量表达,一个图卷积模型对应一个异构子图。Step S304: The preset graph convolution model learns a batch of sample data according to heterogeneous subgraphs to obtain vector expressions of nodes in heterogeneous subgraphs, and one graph convolution model corresponds to one heterogeneous subgraph.
When the sample data is learned on a heterogeneous subgraph, a preset graph convolution model learns from the attribute information of each node in the subgraph and the structure and attribute information of each node's neighbors up to at least the first order, yielding vector expressions of the nodes in the heterogeneous subgraph.
Referring to Figure 2, each heterogeneous subgraph corresponds to one graph convolutional network model. For example, the two models in the leftmost group in Figure 2 correspond to the two heterogeneous subgraphs split from the two meta-paths defined for the search-advertisement node (Ad); the four models in the second group from the left correspond to the four subgraphs split from the four meta-paths defined for the query-term node (Query); and in groups 1, ..., k on the right, the two models of each group correspond to the two subgraphs split from the two meta-paths defined for one item (Item) node. When the sample data is learned on each heterogeneous subgraph, the sample data is mapped as input to the corresponding nodes of that subgraph. The convolutional network models shown in Figure 2 can share weights with one another.
Taking one heterogeneous subgraph as an example: traverse the sample data; for the sample currently traversed, read the entity it records and find the entity's corresponding node in the heterogeneous graph; from the heterogeneous subgraph containing that node, read its first- to Nth-order neighbor nodes, where N is a preset positive integer; the preset graph convolution model then performs an N-layer convolution operation on the node's attribute information and the attribute and structure information of the first- to Nth-order neighbors, obtaining the node's vector expression.
The N-layer convolution operation is as follows. For a node in the heterogeneous subgraph, obtain its neighbors up to order N, then convolve layer by layer. For each (N-1)th-order neighbor, convolve the vector expressions of the Nth-order neighbors connected to it to obtain its neighbor low-dimensional vector expression, then combine this with its original low-dimensional vector expression to obtain its new low-dimensional vector expression. Proceeding likewise layer by layer: convolve the vector expressions of the second-order neighbors connected to each first-order neighbor to obtain that first-order neighbor's neighbor low-dimensional vector expression, and combine it with the first-order neighbor's original low-dimensional vector expression to obtain its new low-dimensional vector expression. Finally, convolve the low-dimensional vector expressions of the node's first-order neighbors to obtain the node's neighbor low-dimensional vector expression, and combine it with the node's original low-dimensional vector expression to obtain the node's new low-dimensional vector expression.
The principle of learning the sample data on one heterogeneous subgraph is shown in Figure 6. Taking subgraph a (corresponding to meta-path a) as an example, for node 1 a graph convolutional network can be constructed as shown in Figure 6. For ease of illustration a two-layer convolution structure is used; in practice it can be extended to more layers. As shown in Figure 6, node 1's first-order neighbors in subgraph a are nodes 2, 3, 4, and 6, and its second-order neighbors are nodes 1, 2, 3, 4, and 10. The second-order neighbors pass through a graph convolution layer to produce the neighbor low-dimensional vector expressions of the first-order neighbors 2, 3, 4, and 6; these are spliced with the low-dimensional vector expressions of nodes 2, 3, 4, and 6 and nonlinearly transformed to obtain those nodes' final low-dimensional vector expressions. These in turn are fed through another graph convolution layer, spliced with node 1's original low-dimensional vector expression, and transformed to obtain the final low-dimensional vector expression of node 1's two-layer graph convolutional network.
In heterogeneous subgraph a, the final low-dimensional vector expressions of the other nodes are obtained in the same way as for node 1 and are not repeated here. Isolated node 8, which has no neighbors, retains its initial vector expression. In a similar manner, the final low-dimensional vector expression of each node in each heterogeneous subgraph can be obtained.
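The layer-by-layer aggregate, splice, and transform scheme of Figure 6 can be sketched in plain Python as follows, using mean aggregation and ReLU as stand-ins for the convolution and the nonlinear transform (both are assumptions, since the patent does not fix these operators); isolated nodes keep their initial expression, as described for node 8:

```python
def mean_vec(vecs):
    """Componentwise mean of a non-empty list of equal-length vectors."""
    return [sum(c) / len(vecs) for c in zip(*vecs)]

def conv_layer(h, adj):
    """One graph-convolution layer. h: node -> vector; adj: node -> neighbor list."""
    out = {}
    for v, hv in h.items():
        nbrs = adj.get(v, [])
        if not nbrs:                       # isolated node keeps its expression
            out[v] = hv
            continue
        nb = mean_vec([h[u] for u in nbrs])      # aggregate neighbor vectors
        concat = hv + nb                         # splice own + neighbor vectors
        out[v] = [max(0.0, x) for x in concat]   # ReLU as nonlinear transform
    return out

h0 = {1: [1.0, -1.0], 2: [2.0, 0.0], 3: [0.0, 4.0], 8: [5.0, 5.0]}
adj = {1: [2, 3], 2: [1], 3: [1]}
h1 = conv_layer(h0, adj)
h2 = conv_layer(h1, adj)   # stack two layers, as in the Figure 6 example
```

Because splicing concatenates vectors, the dimension grows per layer here; a real model would project back down with learned weights.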
Although the meta-path-based graph-convolution advertisement recall scheme can effectively solve the advertisement recall scenario with graph convolution, a computation problem remains. Taking the heterogeneous subgraph of Figure 6 as an example, the number of a node's neighbors grows exponentially with the number of graph convolution layers: node 1 has 3 first-order neighbors and 9 second-order neighbors. In real scenarios a node may have thousands of first-order neighbors, and as the number of layers increases, directly computing convolutions over such massive numbers of nodes is practically infeasible. Therefore, hierarchical neighbors can be sampled in a beam-search manner, reducing the neighbor space complexity from O(n^k) to O(kn).
In an optional embodiment, when the sample data is learned on a heterogeneous subgraph with many nodes, the neighbor nodes can be sampled and the convolution computed over the sampled neighbors. Taking one heterogeneous subgraph as an example: traverse the sample data; for the sample currently traversed, read the entity it records and find the corresponding node in the heterogeneous graph; from the heterogeneous subgraph containing that node, read its first- to Nth-order neighbors, where N is a preset positive integer; sample a preset number of neighbors at each order according to the weights of the edges between nodes, obtaining the sampled first- to Nth-order neighbors; the preset graph convolution model then performs an N-layer convolution operation on the node's attribute information and the sampled neighbors' attribute and structure information, obtaining the node's vector expression.
Referring to the heterogeneous subgraph shown in Figure 6, and taking the two-layer convolution structure as an example, the sum of a neighbor node's edge weights is used as its weight, and weighted neighbor sampling is performed for each node. The principle of sampling by edge weight is shown in Figure 7: the original convolution structure of node 1 is shown on the left of Figure 7, with each edge's weight marked beside it. If k = 2, i.e. only two neighbors are selected per layer for the convolution operation, then for node 1 the first-order neighbors most likely to be selected are nodes 2 and 4, whose weights are 3 and 4. If nodes 2 and 4 are selected as first-order neighbors, the second-order neighbors most likely to be selected are nodes 1 and 10, because only the weights of edges connected to the sampled first-order neighbors 2 and 4 are counted; node 1's weight is then 3 + 4 = 7 and node 10's weight is 7, giving them the highest sampling probability.
When sampling by edge weight, the k nodes with the highest weights are selected. To prevent the neighbor-sampling result from being overly biased toward a few popular nodes, weighted random sampling can instead be performed according to the node weight w to obtain the k sampled nodes; the weight w can be expressed as:
w_{v_i}^{(l)} = Σ_{j=1}^{J} w_{e_{ij}}

where w_{e_{ij}} denotes the edge weight of the edge e between node v_i and the j-th upper-layer node, w_{v_i}^{(l)} denotes the current weight of node v_i at layer l, J denotes the number of upper-layer nodes that have an edge to node v_i, l denotes the l-th layer, and i and j are the indices of the specified nodes.
Layerwise node sampling, while still taking into account all connection relationships with the upper-layer neighbor nodes, reduces the growth of the number of neighbor nodes from exponential to linear.
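The layerwise weighted sampling illustrated by the 3 + 4 = 7 example above can be sketched as a top-k selection over edge weights summed into the already-sampled upper layer; this toy version picks the k heaviest candidates and omits the weighted random variant mentioned for avoiding bias toward popular nodes:

```python
import heapq

def sample_layer(upper, edges, k):
    """upper: sampled nodes of the layer above; edges: u -> [(v, weight)]."""
    weight = {}
    for u in upper:
        for v, w in edges.get(u, []):
            # A candidate's weight is the sum of its edge weights into the
            # sampled upper layer (e.g. node 1: 3 + 4 = 7 in the text).
            weight[v] = weight.get(v, 0) + w
    return heapq.nlargest(k, weight, key=weight.get)

# Toy edge weights loosely modeled on the Figure 7 discussion.
edges = {1: [(2, 3), (3, 1), (4, 4)],
         2: [(1, 3), (10, 3)],
         4: [(1, 4), (10, 4)]}
layer1 = sample_layer([1], edges, k=2)       # first-order sample
layer2 = sample_layer(layer1, edges, k=2)    # second-order sample
```

Sampling k nodes per layer keeps the neighborhood size at roughly k per level instead of growing exponentially with depth.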
Step S305: The preset aggregation model performs aggregate learning on the sample data according to the vector expressions of the nodes in the heterogeneous subgraphs, obtaining a unified vector expression of the same node across different heterogeneous subgraphs.
The same node may exist in multiple heterogeneous subgraphs; for example, node 1 exists in subgraphs a, b, c, e, and f, and the convolutional networks of different subgraphs produce different vector expressions for it, whereas only one unique low-dimensional vector expression per node is needed for the subsequent recall work. Therefore, an attention mechanism, a fully connected aggregation mechanism, or a weighted-average aggregation mechanism is used to aggregate the vector expressions of the same node from the different heterogeneous subgraphs into a single vector expression; the aggregated weighted result is taken as the node's final low-dimensional vector expression (embedding).
The process of aggregating the vector expressions of the same node from different heterogeneous subgraphs includes the following.
According to the node's vector expression in each heterogeneous subgraph and the corresponding learned weight factor, the weight of the node's vector expression in each heterogeneous subgraph is computed. Taking the attention mechanism as an example, the weight α_v^s is computed as:

α_v^s = exp(w_s · h_v^s) / Σ_{s'} exp(w_{s'} · h_v^{s'})

where h_v^s denotes the different vector expressions of the same node v obtained from the multiple heterogeneous subgraphs, and w_s denotes the learned weight factor.

The computed weights are then used to take a weighted sum of the node's vector expressions over the subgraphs, giving the node's aggregated low-dimensional vector expression h_v:

h_v = Σ_{s ∈ S_p^L} α_v^s · h_v^s

where, assuming the type of node v is p, S_p^L denotes the set of meta-paths for node type p at layer L.
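A minimal stand-in for the attention-based fusion could look like this: a softmax over dot-product scores between each subgraph's node vector and a learned weight factor, followed by a weighted sum. The weight factors here are placeholders rather than trained values, and the dot-product scoring is an assumption about the attention form:

```python
import math

def attention_aggregate(vectors, weight_factors):
    """vectors: metapath id -> node vector; weight_factors: metapath id -> learned vector."""
    # Score each subgraph's expression against its learned weight factor.
    scores = {s: sum(a * b for a, b in zip(vectors[s], weight_factors[s]))
              for s in vectors}
    # Softmax over the scores gives the per-subgraph attention weights.
    z = sum(math.exp(x) for x in scores.values())
    alpha = {s: math.exp(x) / z for s, x in scores.items()}
    # Weighted sum of the per-subgraph vectors -> one embedding per node.
    dim = len(next(iter(vectors.values())))
    return [sum(alpha[s] * vectors[s][i] for s in vectors) for i in range(dim)]

h = attention_aggregate(
    {"a": [1.0, 0.0], "b": [0.0, 1.0]},
    {"a": [0.0, 0.0], "b": [0.0, 0.0]})   # equal scores reduce to a plain average
```

With all-zero weight factors the softmax weights are equal, so the fusion degenerates to the weighted-average aggregation mechanism also mentioned in the text.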
When neighbor sampling is added, the convolution model can be adjusted; the adjusted convolution model is as follows:

h_v^{(0)} = x_v

n_v^{(l)} = WEIGHTEDMEAN({w_{uv} · h_u^{(l-1)} : u ∈ N_{s_k}(v)})

h_v^{(l)} = σ(W^{(l)} · CONCAT(h_v^{(l-1)}, n_v^{(l)}))

where h_v^{(0)} denotes the layer-0 low-dimensional vector expression of node v (its initial feature vector x_v), n_v^{(l)} denotes the low-dimensional vector expression aggregated from the neighbors of node v at layer l, WEIGHTEDMEAN denotes the weighted average, N_{s_k}(v) denotes the neighbors of node v that satisfy meta-path s_k, w denotes the weights used in the weighted average, CONCAT denotes the direct concatenation of two vectors, h_v^{(l)} denotes the layer-l low-dimensional vector expression of node v aggregating its own information and its neighbors' information, W denotes the weights to be learned, and σ denotes a nonlinear transformation.
Step S306: The preset loss function optimizes the model parameters based on the sample data and the unified vector expression of each shared node.
Taking the advertisement search scenario as an example, the above steps yield low-dimensional vector expressions of advertisements, items, and query terms. From the extracted sample data, to achieve personalized search recall, the user's current query term together with the advertisements or items the user previously clicked is taken as the user's current search request, and an attention mechanism is used to aggregate the low-dimensional vector expression of the query term (H_Q) and those of the multiple preceding clicks (H_I1, ..., H_Ik) into the final user search-request vector. The cosine distance between the user search-request vector (H_r) and the current advertisement's low-dimensional vector expression (H_ad) is computed, the click state is used as the label data (O_label), and the sigmoid cross entropy is computed as the model's final loss function to train the whole model.
Taking advertisement search as an example, when constructing samples, the advertisements clicked under the current request are treated as positive examples and the advertisements not clicked as negative examples, giving the sample structure (request, ad, click-label), comprising the request, the search advertisement, and the click label, where request = (query, {realtime clicked items/ads}_k), i.e. the query term together with multiple recently clicked items or advertisements.
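Constructing (request, ad, click-label) samples from log rows might look roughly like the following; the field names (`query`, `recent_clicks`, `impressions`) are assumed for illustration, not the patent's actual log schema:

```python
def build_samples(log_rows, k=3):
    """Turn log rows into (request, ad, click-label) triples.
    request = (query, last k clicked items/ads); label 1 = clicked, 0 = not."""
    samples = []
    for row in log_rows:
        request = (row["query"], tuple(row["recent_clicks"][-k:]))
        for ad, clicked in row["impressions"]:
            samples.append((request, ad, 1 if clicked else 0))
    return samples

rows = [{"query": "red dress",
         "recent_clicks": ["item:1", "ad:7"],
         "impressions": [("ad:7", True), ("ad:9", False)]}]
samples = build_samples(rows)
```

Each impression under one request yields one sample, so a single request typically produces both positive and negative examples.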
Using the sigmoid cross entropy as the loss function, the optimization objective of the whole model is expressed as:

p_i = sigmoid(R(v_request, v_ad))

O_label = −Σ_i [ y_i · log(p_i) + (1 − y_i) · log(1 − p_i) ]

where y_i denotes the label data, p_i denotes the prior probability, v_request and v_ad denote the vector expressions of the virtual request node and the advertisement node respectively, and R(v_request, v_ad) denotes the distance metric function between the vector expressions of the virtual request node and the advertisement node.
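The cosine-plus-sigmoid-cross-entropy objective can be illustrated with a small plain-Python sketch; this is a single-sample version with cosine similarity as the assumed distance metric R, not the full batched training objective:

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def sigmoid_xent(v_request, v_ad, y):
    """Sigmoid cross entropy of the request/ad match against the click label y."""
    p = 1.0 / (1.0 + math.exp(-cosine(v_request, v_ad)))   # p_i
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))  # cross-entropy term

loss = sigmoid_xent([1.0, 0.0], [1.0, 0.0], 1)   # aligned vectors, clicked
```

A clicked ad whose vector aligns with the request vector yields a small loss, while a clicked ad pointing the opposite way yields a large one, which is what drives the embedding training.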
Step S307: Check whether all batches of sample data have been processed; if not, go to step S308; if so, go to step S309.
Step S308: Obtain the next batch of sample data and return to step S304.
从而实现继续获取下一个批次的样本数据进行学习,直至所有批次的样本数据学习完毕。In this way, the sample data of the next batch will continue to be obtained for learning until the sample data of all batches have been learned.
步骤S309:得到异构图中每个节点的一个低维向量表达。异构图中的一个节点对应样本数据中的一个实体。Step S309: Obtain a low-dimensional vector expression of each node in the heterogeneous graph. A node in the heterogeneous graph corresponds to an entity in the sample data.
通过重复对所有批次的样本数据进行预设次数的训练,得到异构图中每个节点的一个低维向量表达,异构图中的一个节点对应样本数据中的一个实体。Through repeated training on all batches of sample data, a low-dimensional vector expression of each node in the heterogeneous graph is obtained. A node in the heterogeneous graph corresponds to an entity in the sample data.
如图2所示系统原理，最下边是一个异构图的示意。上边一层的四排白色小方块为异构图中的节点向量初始化，得到各节点的最初的向量表达，然后输入各个异构子图对应的学习模型中，经学习模型对一个批次样本数据进行学习，根据学习结果更新异构子图中各节点的向量表达后，对各异构子图中相同节点的向量表达进行汇聚，得到相同节点的一个汇聚后的向量表达。例如图2中，[式PCTCN2020070249-appb-000020]为搜索广告节点的汇聚后的向量表达，[式PCTCN2020070249-appb-000021]为查询词节点的汇聚后的向量表达，[式PCTCN2020070249-appb-000022]为各商品节点的汇聚后的向量表达。对[式PCTCN2020070249-appb-000023]进行处理得到[式PCTCN2020070249-appb-000024]，由[式PCTCN2020070249-appb-000025]与[式PCTCN2020070249-appb-000026]得到损失函数O_label，使用O_label对各模型的系统参数进行优化，使用参数优化后的系统模型对下一批次的样本数据进行学习；根据学习结果更新异构子图中各节点的向量表达后，对各异构子图中相同节点的向量表达进行汇聚，得到相同节点的一个汇聚后的向量表达，进一步根据汇聚结果得到新的损失函数O_label，并对模型参数进行优化更新后继续学习下一个批次的样本数据，直至所有批次的样本数据都学习完毕，得到异构图中各节点的一个最终的向量表达。As shown in the system schematic of Figure 2, the bottom layer is an illustration of a heterogeneous graph. The four rows of small white squares in the upper layer are the initialized node vectors of the heterogeneous graph, giving each node its initial vector expression; these are then fed into the learning model corresponding to each heterogeneous subgraph. After the learning models learn from one batch of sample data and the vector expression of each node in each heterogeneous subgraph is updated according to the learning result, the vector expressions of the same node in the different heterogeneous subgraphs are aggregated to obtain a single aggregated vector expression for that node. For example, in Figure 2, [formula PCTCN2020070249-appb-000020] is the aggregated vector expression of the search advertisement node, [formula PCTCN2020070249-appb-000021] is the aggregated vector expression of the query term node, and [formula PCTCN2020070249-appb-000022] is the aggregated vector expression of each commodity node. [Formula PCTCN2020070249-appb-000023] is processed to obtain [formula PCTCN2020070249-appb-000024], and the loss function O_label is obtained from [formula PCTCN2020070249-appb-000025] and [formula PCTCN2020070249-appb-000026]. O_label is used to optimize the system parameters of each model, and the parameter-optimized system model then learns from the next batch of sample data; after the vector expression of each node in the heterogeneous subgraphs is updated according to the learning result, the vector expressions of the same node in the different heterogeneous subgraphs are again aggregated to obtain a single aggregated vector expression for that node, a new loss function O_label is obtained from the aggregation result, and the model parameters are optimized and updated before continuing with the next batch of sample data, until all batches of sample data have been learned and a final vector expression of each node in the heterogeneous graph is obtained.
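The batch-wise loop described above (per-subgraph learning, aggregation of shared nodes, loss computation, parameter update, next batch) can be sketched as follows. This is a minimal illustrative sketch with stand-in stub models and a toy mean aggregator; the function names and toy components are assumptions, not the actual graph convolution, aggregation, or loss models of the embodiment.

```python
# Sketch of the training loop: learn per subgraph, aggregate shared nodes,
# compute the loss, update parameters, repeat for each batch and epoch.
def train(batches, subgraph_models, aggregate, loss_fn, optimize, epochs=1):
    embeddings = {}
    for _ in range(epochs):
        for batch in batches:
            per_subgraph = [m(batch) for m in subgraph_models]  # one embedding dict per subgraph
            embeddings = aggregate(per_subgraph)                # merge same-node vectors
            loss = loss_fn(batch, embeddings)
            optimize(loss)                                      # stand-in for a parameter update
    return embeddings

# Toy stand-ins: two "subgraph models" that emit constant vectors,
# a mean aggregator over the per-subgraph embeddings, and a trivial loss.
models = [lambda b: {n: [1.0] for n in b}, lambda b: {n: [3.0] for n in b}]

def mean_agg(dicts):
    keys = set().union(*dicts)
    return {k: [sum(d[k][0] for d in dicts if k in d) /
                sum(1 for d in dicts if k in d)] for k in keys}

losses = []
emb = train([["q1", "a1"]], models, mean_agg,
            lambda b, e: sum(v[0] for v in e.values()), losses.append)
```

With these stubs, each node's two subgraph vectors (1.0 and 3.0) are averaged to 2.0, and one loss value is recorded per batch.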
基于同一发明构思，本发明实施例还提供一种获取实体间关系表达的系统，该系统可以设置在网络中的网络设备、云端的云端设备或者架构的服务器设备、用户端设备等设备中。该系统的结构如图8所示，包括：注册装置803、存储装置801、计算装置802和参数交换装置804。Based on the same inventive concept, an embodiment of the present invention further provides a system for obtaining expressions of relationships between entities. The system may be deployed in devices such as network devices in a network, cloud devices in the cloud, server devices of an architecture, or client devices. The structure of the system is shown in FIG. 8 and includes: a registration device 803, a storage device 801, a computing device 802, and a parameter exchange device 804.
存储装置801,用于存储异构子图的数据;The storage device 801 is used to store data of heterogeneous subgraphs;
计算装置802，用于通过注册装置803从存储装置801获取异构子图的数据，采用上述的获取实体间关系表达的方法基于异构图对样本数据进行学习，得到异构图中每个节点的低维向量表达。The computing device 802 is configured to obtain the data of the heterogeneous subgraphs from the storage device 801 through the registration device 803, and to learn from the sample data based on the heterogeneous graph by using the above-mentioned method for obtaining expressions of relationships between entities, so as to obtain the low-dimensional vector expression of each node in the heterogeneous graph.
参数交换装置804,用于与计算装置进行参数交互。The parameter exchange device 804 is used for parameter interaction with the computing device.
计算装置802通过注册装置803从存储装置获取各节点和边的数据,包括:The computing device 802 obtains the data of each node and edge from the storage device through the registration device 803, including:
计算装置802向注册装置803发送数据查询请求，数据查询请求中包括要查询的异构子图的信息；接收注册装置803返回的查询结果，查询结果中包括存储异构子图数据的存储装置信息；根据存储装置信息向相应的存储装置801获取异构子图的数据。The computing device 802 sends a data query request to the registration device 803, where the data query request includes information about the heterogeneous subgraph to be queried; receives the query result returned by the registration device 803, where the query result includes information about the storage device storing the heterogeneous subgraph data; and obtains the data of the heterogeneous subgraph from the corresponding storage device 801 according to the storage device information.
可选的，上述存储装置801中还可以存储异构图中各节点以及边的数据和样本数据。Optionally, the storage device 801 may also store the data of each node and edge in the heterogeneous graph, as well as the sample data.
计算装置802向注册装置803发送数据查询请求，数据查询请求中包括要查询的节点和边的信息；接收注册装置803返回的查询结果，查询结果中包括存储节点和边的数据的存储装置信息；根据存储装置信息向相应的存储装置801获取各节点和边的数据。The computing device 802 sends a data query request to the registration device 803, where the data query request includes information about the nodes and edges to be queried; receives the query result returned by the registration device 803, where the query result includes information about the storage device storing the data of the nodes and edges; and obtains the data of each node and edge from the corresponding storage device 801 according to the storage device information.
基于同一发明构思,本发明实施例还提供一种广告召回系统,参照图9所示,包括获取实体间关系表达的系统901和广告召回匹配系统902;Based on the same inventive concept, an embodiment of the present invention also provides an advertisement recall system. As shown in FIG. 9, it includes a system 901 for obtaining relationship expressions between entities and an advertisement recall matching system 902;
获取实体间关系表达的系统901，用于构建用于广告搜索场景的异构图，异构图中的所述节点类型包括：广告、商品、查询词中的至少一种，所述边的类型包括点击边、共同点击边、协同过滤边、内容语义相似边和属性相似边中的至少一种；The system 901 for obtaining expressions of relationships between entities is configured to construct a heterogeneous graph for an advertisement search scenario, where the node types in the heterogeneous graph include at least one of advertisements, commodities, and query terms, and the edge types include at least one of click edges, co-click edges, collaborative filtering edges, content-semantic-similarity edges, and attribute-similarity edges;
根据预先定义的元路径,将预先构建的异构图分为至少两个异构子图,所述元路径用于表达异构子图的结构及异构子图包括的节点类型和边类型;Divide the pre-built heterogeneous graph into at least two heterogeneous subgraphs according to a predefined meta-path, where the meta-path is used to express the structure of the heterogeneous subgraph and the types of nodes and edges included in the heterogeneous subgraph;
获取一个批次的样本数据;Obtain a batch of sample data;
预设的图卷积模型按照异构子图,对一个批次的样本数据进行学习,得到异构子图中节点的向量表达,一个图卷积模型对应一个异构子图;The preset graph convolution model learns a batch of sample data according to heterogeneous subgraphs to obtain the vector expression of nodes in heterogeneous subgraphs. A graph convolution model corresponds to a heterogeneous subgraph;
预设的聚合模型基于样本数据,对不同异构子图中相同节点的向量表达进行聚合,得到不同异构子图中相同节点的同一个向量表达;The preset aggregation model is based on sample data and aggregates the vector expressions of the same node in different heterogeneous subgraphs to obtain the same vector expression of the same node in different heterogeneous subgraphs;
预设的损失函数基于所述样本数据和所述相同节点的同一个向量表达对所述模型的参数进行优化;The preset loss function optimizes the parameters of the model based on the same vector expression of the sample data and the same node;
继续获取下一个批次的样本数据进行学习，直至所有批次的样本数据学习完毕，得到所述异构图中包括的广告节点、商品节点、查询词节点的低维向量表达，异构图中的一个节点对应样本数据中的一个实体；Continue to obtain the next batch of sample data for learning until all batches of sample data have been learned, obtaining the low-dimensional vector expressions of the advertisement nodes, commodity nodes, and query term nodes included in the heterogeneous graph, where one node in the heterogeneous graph corresponds to one entity in the sample data;
广告召回匹配系统902，用于使用获取实体间关系表达的系统得到的查询词节点、商品节点和搜索广告节点的低维向量表达，确定查询词节点、商品节点和搜索广告节点之间的匹配程度，根据所述匹配程度选择与商品、查询词匹配程度符合设定要求的搜索广告。The advertisement recall matching system 902 is configured to use the low-dimensional vector expressions of the query term nodes, commodity nodes, and search advertisement nodes obtained by the system for obtaining expressions of relationships between entities to determine the degree of matching among the query term nodes, commodity nodes, and search advertisement nodes, and to select, according to the matching degree, search advertisements whose degree of matching with the commodities and query terms meets the set requirements.
上述系统中，获取实体间关系表达的系统定义的元路径，一条元路径对应一个异构子图，所述元路径用于表达异构子图的结构及异构子图包括的节点类型和边类型具体为：一条元路径用于表达一个异构子图的结构及该异构子图包括的节点类型和边类型；具体为：一条元路径中包括按顺序交替排列的节点类型和边类型，其中，排序在第一位和最后一位的是节点类型，节点类型和边类型的排列顺序表达了异构子图的结构；In the above system, for the meta-paths defined by the system for obtaining expressions of relationships between entities, one meta-path corresponds to one heterogeneous subgraph, and the meta-path being used to express the structure of the heterogeneous subgraph and the node types and edge types included in the heterogeneous subgraph specifically means: one meta-path is used to express the structure of one heterogeneous subgraph and the node types and edge types included in that heterogeneous subgraph; specifically, a meta-path includes node types and edge types alternately arranged in order, where the first and last positions are node types, and the arrangement order of the node types and edge types expresses the structure of the heterogeneous subgraph;
可选的，获取实体间关系表达的系统根据预先设定的元路径，将异构图拆分为至少两个异构子图具体包括：根据预先设定的至少两条元路径，将异构图拆分为至少两个异构子图，具体为针对预先设定的至少两条元路径中的每一条元路径，根据所述元路径中包括节点类型，获取所述异构图中相应类型的节点；按照连接各相邻节点的边的类型，从所述异构图中获取符合要求的边；由获取到的相应类型的节点和符合要求的边，组成该元路径对应的异构子图。Optionally, the system for obtaining expressions of relationships between entities splitting the heterogeneous graph into at least two heterogeneous subgraphs according to preset meta-paths specifically includes: splitting the heterogeneous graph into at least two heterogeneous subgraphs according to at least two preset meta-paths; specifically, for each of the at least two preset meta-paths, obtaining nodes of the corresponding types in the heterogeneous graph according to the node types included in the meta-path; obtaining qualifying edges from the heterogeneous graph according to the types of the edges connecting adjacent nodes; and composing the heterogeneous subgraph corresponding to the meta-path from the obtained nodes of the corresponding types and the qualifying edges.
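As a concrete illustration of splitting a heterogeneous graph by meta-path, the following sketch selects the nodes and edges whose types match an alternating node-type/edge-type sequence. The type names (query, ad, item, click, co_click), the data structures, and the undirected treatment of edges are assumptions for illustration only, not the embodiment's exact representation.

```python
# Hypothetical sketch: extract the subgraph matching one meta-path.
def split_by_metapath(nodes, edges, metapath):
    """nodes: {node_id: node_type}; edges: list of (src, edge_type, dst);
    metapath: alternating [node_type, edge_type, node_type, ...]."""
    node_types = set(metapath[0::2])  # positions 0, 2, 4, ... are node types
    triples = {(metapath[i], metapath[i + 1], metapath[i + 2])
               for i in range(0, len(metapath) - 2, 2)}
    sub_edges = [(s, et, d) for (s, et, d) in edges
                 if (nodes[s], et, nodes[d]) in triples
                 or (nodes[d], et, nodes[s]) in triples]  # edges treated as undirected
    sub_nodes = {n: t for n, t in nodes.items()
                 if t in node_types and any(n in (s, d) for s, _, d in sub_edges)}
    return sub_nodes, sub_edges

nodes = {"q1": "query", "a1": "ad", "a2": "ad", "i1": "item"}
edges = [("q1", "click", "a1"), ("a1", "co_click", "a2"), ("q1", "click", "i1")]
sub_n, sub_e = split_by_metapath(nodes, edges,
                                 ["query", "click", "ad", "co_click", "ad"])
```

Here the query→click→ad→co-click→ad meta-path keeps the click edge to the ad and the co-click edge between ads, while the click edge to the item node falls outside the meta-path and is dropped.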
可选的，获取实体间关系表达的系统通过预设的图卷积模型按照异构子图，对所述样本数据进行学习，得到异构子图中节点的向量表达，具体包括：预设的图卷积模型根据异构子图的每个节点的属性信息及异构子图中每个节点的至少一阶邻居节点的结构信息和属性信息，得到所述异构图中节点的向量表达。Optionally, the system for obtaining expressions of relationships between entities learning from the sample data according to the heterogeneous subgraphs through the preset graph convolution models to obtain the vector expressions of the nodes in the heterogeneous subgraphs specifically includes: the preset graph convolution model obtains the vector expression of each node in the heterogeneous graph according to the attribute information of each node in the heterogeneous subgraph and the structure information and attribute information of the at least first-order neighbor nodes of each node in the heterogeneous subgraph.
可选的，获取实体间关系表达的系统通过预设的图卷积模型根据异构子图的每个节点的属性信息及异构子图中每个节点的至少一阶邻居节点的结构信息和属性信息，得到所述异构图中每个节点的向量表达，具体包括：Optionally, the system for obtaining expressions of relationships between entities obtaining, through the preset graph convolution model, the vector expression of each node in the heterogeneous graph according to the attribute information of each node in the heterogeneous subgraph and the structure information and attribute information of the at least first-order neighbor nodes of each node in the heterogeneous subgraph specifically includes:
遍历样本数据,针对当前遍历到的一条样本数据,读取其记录的实体,并找到所述实体在异构图中对应的节点;Traverse the sample data, read the recorded entity for a piece of sample data currently traversed, and find the corresponding node of the entity in the heterogeneous graph;
从包括该节点的异构子图中,读取所述节点的第一阶至第N阶邻居节点,所述N为预设的正整数;From the heterogeneous subgraph including the node, read the neighboring nodes of the first order to the Nth order of the node, where N is a preset positive integer;
预设的图卷积模型根据所述节点的属性信息和第一至第N阶邻居节点的属性信息和结构信息进行N层卷积运算，得到所述节点的向量表达。The preset graph convolution model performs an N-layer convolution operation according to the attribute information of the node and the attribute information and structure information of the first- to Nth-order neighbor nodes, to obtain the vector expression of the node.
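The N-layer convolution over a node and its first- to Nth-order neighbors can be illustrated with a stripped-down mean-aggregation propagation: after N layers, each node's vector mixes in information from its N-hop neighborhood. This is a minimal sketch under simplifying assumptions; real graph convolution layers would additionally apply learned weight matrices and non-linearities at each layer.

```python
# Minimal N-layer neighborhood aggregation (mean over self + neighbors per layer).
def propagate(features, neighbors, n_layers):
    """features: {node: [float, ...]}; neighbors: {node: [node, ...]}."""
    h = {n: list(v) for n, v in features.items()}
    for _ in range(n_layers):
        new_h = {}
        for node, vec in h.items():
            nbrs = neighbors.get(node, [])
            agg = vec[:]  # include the node's own current representation
            for nb in nbrs:
                agg = [a + b for a, b in zip(agg, h[nb])]
            new_h[node] = [x / (len(nbrs) + 1) for x in agg]
        h = new_h
    return h

feats = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
nbrs = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
h2 = propagate(feats, nbrs, n_layers=2)  # two layers -> 2-hop information reaches each node
```

After two layers, node "a" has absorbed information from "c" even though they are not directly connected, which is the role of stacking N convolution layers over first- to Nth-order neighbors.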
可选的，获取实体间关系表达的系统通过预设的图卷积模型根据异构子图的每个节点的属性信息及异构子图中每个节点的至少一阶邻居节点的结构信息和属性信息，得到所述异构图中每个节点的向量表达，具体包括：Optionally, the system for obtaining expressions of relationships between entities obtaining, through the preset graph convolution model, the vector expression of each node in the heterogeneous graph according to the attribute information of each node in the heterogeneous subgraph and the structure information and attribute information of the at least first-order neighbor nodes of each node in the heterogeneous subgraph specifically includes:
遍历样本数据,针对当前遍历到的一条样本数据,读取其记录的实体,并找到所述实体在异构图中对应的节点;Traverse the sample data, read the recorded entity for a piece of sample data currently traversed, and find the corresponding node of the entity in the heterogeneous graph;
从包括该节点的异构子图中,读取所述节点的第一阶至第N阶邻居节点,所述N为预设的正整数;From the heterogeneous subgraph including the node, read the neighboring nodes of the first order to the Nth order of the node, where N is a preset positive integer;
对所述节点的第一阶至第N阶邻居节点按照节点之间边的权重对同一阶的邻居节点按照预设的个数进行采样，得到采样后的第一至第N阶邻居节点；For the first- to Nth-order neighbor nodes of the node, sample a preset number of neighbor nodes at each order according to the weights of the edges between nodes, to obtain the sampled first- to Nth-order neighbor nodes;
预设的图卷积模型根据所述节点的属性信息和采样后第一至第N阶邻居节点的属性信息和结构信息进行N层卷积运算,得到所述节点的向量表达。The preset graph convolution model performs an N-layer convolution operation according to the attribute information of the node and the attribute information and structure information of the first to Nth-order neighbor nodes after sampling to obtain the vector expression of the node.
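The weighted neighbor sampling described above can be sketched as follows. The names and the use of sampling with replacement via `random.choices` are simplifying assumptions for illustration, not the embodiment's exact sampling procedure.

```python
# Illustrative sketch: sample up to k neighbors, with probability
# proportional to the weight of the connecting edge.
import random

def sample_neighbors(neighbors, weights, k, seed=0):
    """If a node has no more than k neighbors, keep them all; otherwise
    draw k of them weighted by edge weight (with replacement, for simplicity)."""
    if len(neighbors) <= k:
        return list(neighbors)
    rng = random.Random(seed)
    return rng.choices(neighbors, weights=weights, k=k)

# neighbor "d" has a much heavier edge, so it is sampled more often
sampled = sample_neighbors(["a", "b", "c", "d"], [1, 1, 1, 10], k=2)
```

Fixing the per-order sample size bounds the cost of the N-layer convolution, since the full neighborhood of a high-degree node would otherwise grow exponentially with N.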
可选的,获取实体间关系表达的系统通过预设的聚合模型基于样本数据,对不同异构子图中相同节点的向量表达进行聚合,得到不同异构子图中相同节点的同一个向量表达,具体包括:Optionally, the system for obtaining expressions of relationships between entities aggregates vector expressions of the same node in different heterogeneous subgraphs based on sample data through a preset aggregation model to obtain the same vector expression of the same node in different heterogeneous subgraphs , Specifically including:
所述预设的聚合模型基于所述样本数据，使用注意力机制聚合学习或者全连接聚合学习或者加权平均聚合学习对不同异构子图中相同节点的向量表达进行聚合，得到不同异构子图中相同节点的同一个向量表达。The preset aggregation model, based on the sample data, uses attention-mechanism aggregation learning, fully connected aggregation learning, or weighted-average aggregation learning to aggregate the vector expressions of the same node in different heterogeneous subgraphs, to obtain one identical vector expression of the same node across the different heterogeneous subgraphs.
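Of the three aggregation options, the attention-mechanism variant can be sketched as a dot-product score followed by a softmax-weighted sum over the same node's per-subgraph vectors. The query vector and all names here are illustrative assumptions; a trained model would learn the attention parameters rather than fix them.

```python
# Minimal attention-style aggregation of one node's vectors from different subgraphs.
import math

def attention_aggregate(vectors, query):
    """vectors: the same node's embedding from each subgraph; query: attention query."""
    scores = [sum(q * x for q, x in zip(query, vec)) for vec in vectors]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]          # numerically stable softmax
    weights = [e / sum(exps) for e in exps]
    dim = len(vectors[0])
    return [sum(w * vec[d] for w, vec in zip(weights, vectors)) for d in range(dim)]

# two subgraph views of the same node, fused into one vector
agg = attention_aggregate([[1.0, 0.0], [0.0, 1.0]], query=[1.0, 1.0])
```

With a symmetric query both views score equally, so the result is their average; an asymmetric learned query would instead emphasize the subgraph most informative for the task.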
可选的,广告召回匹配系统确定查询词节点、商品节点和搜索广告节点之间的匹配程度,包括:Optionally, the advertisement recall matching system determines the degree of matching between query term nodes, product nodes and search advertisement nodes, including:
使用注意力机制或者全连接聚合机制或者加权平均聚合机制对查询词节点的低维向量表达和同查询词下的用户前置点击商品节点的低维向量表达进行汇聚，得到虚拟请求节点的低维向量表达；所述虚拟请求节点为通过查询词节点和同查询词下的用户前置点击的商品节点构建出的虚拟节点；An attention mechanism, a fully connected aggregation mechanism, or a weighted-average aggregation mechanism is used to aggregate the low-dimensional vector expression of the query term node and the low-dimensional vector expressions of the commodity nodes pre-clicked by the user under the same query term, to obtain a low-dimensional vector expression of a virtual request node; the virtual request node is a virtual node constructed from the query term node and the commodity nodes pre-clicked by the user under the same query term;
根据虚拟请求节点的低维向量表达与搜索广告节点的低维向量表达,确定查询词节点、商品节点和搜索广告节点之间的匹配程度。According to the low-dimensional vector expression of the virtual request node and the low-dimensional vector expression of the search advertisement node, the matching degree between the query term node, the product node and the search advertisement node is determined.
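A possible sketch of constructing the virtual request node is a simple weighted-average pooling of the query term vector with the vectors of the user's pre-clicked commodity nodes. The 0.5 mixing weight, function names, and the fallback when there are no clicked items are illustrative assumptions (the embodiment may instead use the attention or fully connected mechanisms mentioned above).

```python
# Weighted-average pooling of a query vector and pre-clicked item vectors
# into one "virtual request" vector.
def build_virtual_request(query_vec, clicked_item_vecs, item_weight=0.5):
    dim = len(query_vec)
    if not clicked_item_vecs:          # no click history: fall back to the query alone
        return list(query_vec)
    item_mean = [sum(v[d] for v in clicked_item_vecs) / len(clicked_item_vecs)
                 for d in range(dim)]
    return [(1 - item_weight) * query_vec[d] + item_weight * item_mean[d]
            for d in range(dim)]

req = build_virtual_request([1.0, 0.0], [[0.0, 1.0], [0.0, 3.0]])
```

The resulting request vector can then be compared against the search advertisement vectors to score candidate advertisements.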
可选的，广告召回匹配系统根据所述匹配程度选择与商品、查询词匹配程度符合设定要求的搜索广告，包括：Optionally, the advertisement recall matching system selecting, according to the matching degree, search advertisements whose degree of matching with the commodities and query terms meets the set requirements includes:
根据所述虚拟请求节点的低维融合信息向量与搜索广告节点的低维融合信息向量的余弦距离,选择距离符合设定要求的搜索广告。According to the cosine distance between the low-dimensional fusion information vector of the virtual request node and the low-dimensional fusion information vector of the search advertisement node, a search advertisement whose distance meets the set requirement is selected.
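The cosine-based selection can be sketched as follows, with a similarity threshold standing in for the "set requirement"; the threshold value and all names are illustrative assumptions (a real system might instead rank and keep the top-k advertisements).

```python
# Select advertisements whose cosine similarity to the virtual request
# vector meets a threshold.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def recall_ads(request_vec, ad_vecs, threshold=0.8):
    """ad_vecs: {ad_id: vector}; returns ad ids meeting the similarity threshold."""
    return [ad_id for ad_id, vec in ad_vecs.items()
            if cosine(request_vec, vec) >= threshold]

ads = recall_ads([1.0, 0.0], {"ad1": [0.9, 0.1], "ad2": [0.0, 1.0]})
```

Note that a smaller cosine distance corresponds to a larger cosine similarity, so thresholding on similarity is equivalent to selecting advertisements whose distance meets the set requirement.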
本发明实施例还提供一种计算机可读存储介质,其上存储有计算机指令,该指令被处理器执行时实现上述的获取实体间关系表达的方法。An embodiment of the present invention also provides a computer-readable storage medium on which computer instructions are stored, and when the instructions are executed by a processor, the foregoing method for obtaining expressions of relationships between entities is implemented.
本发明实施例还提供一种异构图学习设备,包括:存储器,处理器及存储在存储器上并可在处理器上运行的计算机程序,所述处理器执行所述程序时实现上述的获取实体间关系表达的方法。An embodiment of the present invention also provides a heterogeneous graph learning device, including: a memory, a processor, and a computer program stored in the memory and running on the processor, and the processor implements the above-mentioned acquisition entity when the program is executed. The method of expressing the relationship.
关于上述实施例中的系统,其中各个装置或模块执行操作的具体方式已经在有关该方法的实施例中进行了详细描述,此处将不做详细阐述说明。Regarding the system in the foregoing embodiment, the specific manner in which each device or module performs operations has been described in detail in the embodiment related to the method, and will not be elaborated here.
除非另外具体陈述,术语比如处理、计算、运算、确定、显示等等可以指一个或更多个处理或者计算系统、或类似设备的动作和/或过程,所述动作和/或过程将表示为处理系统的寄存器或存储器内的物理(如电子)量的数据操作和转换成为类似地表示为处理系统的存储器、寄存器或者其他此类信息存储、发射或者显示设备内的物理量的其他数据。信息和信号可以使用多种不同的技术和方法中的任何一种来表示。例如,在贯穿上面的描述中提及的数据、指令、命令、信息、信号、比特、符号和码片可以用电压、电流、电磁波、磁场或粒子、光场或粒子或者其任意组合来表示。Unless specifically stated otherwise, terms such as processing, calculation, operation, determination, display, etc. may refer to one or more actions and/or processes of processing or computing systems, or similar devices, and the actions and/or processes will be expressed as The data manipulation and conversion of physical (such as electronic) quantities in the registers or memory of the processing system becomes other data similarly represented as physical quantities in the memory, registers or other such information storage, transmission or display devices of the processing system. Information and signals can be represented using any of a variety of different technologies and methods. For example, the data, instructions, commands, information, signals, bits, symbols, and chips mentioned throughout the above description can be represented by voltage, current, electromagnetic waves, magnetic fields or particles, light fields or particles, or any combination thereof.
应该明白,公开的过程中的步骤的特定顺序或层次是示例性方法的实例。基于设计偏好,应该理解,过程中的步骤的特定顺序或层次可以在不脱离本公开的保护范围的情况下得到重新安排。所附的方法权利要求以示例性的顺序给出了各种步骤的要素,并且不是要限于所述的特定顺序或层次。It should be understood that the specific order or hierarchy of steps in the disclosed process is an example of an exemplary method. Based on design preferences, it should be understood that the specific order or level of steps in the process can be rearranged without departing from the scope of protection of the present disclosure. The accompanying method claims present elements of the various steps in an exemplary order and are not intended to be limited to the specific order or hierarchy described.
在上述的详细描述中，各种特征一起组合在单个的实施方案中，以简化本公开。不应该将这种公开方法解释为反映了这样的意图，即，所要求保护的主题的实施方案需要比每个权利要求中清楚陈述的特征更多的特征。相反，如所附的权利要求书所反映的那样，发明主题在于少于所公开的单个实施方案的全部特征。因此，所附的权利要求书特此清楚地被并入详细描述中，其中每项权利要求独自作为本发明单独的优选实施方案。In the above detailed description, various features are grouped together in a single embodiment to streamline the present disclosure. This method of disclosure should not be interpreted as reflecting an intent that the claimed subject matter requires more features than are expressly recited in each claim. Rather, as the appended claims reflect, inventive subject matter lies in less than all features of a single disclosed embodiment. Therefore, the appended claims are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate preferred embodiment of the present invention.
本领域技术人员还应当理解,结合本文的实施例描述的各种说明性的逻辑框、模块、电路和算法步骤均可以实现成电子硬件、计算机软件或其组合。为了清楚地说明硬件和软件之间的可交换性,上面对各种说明性的部件、框、模块、电路和步骤均围绕其功能进行了一般地描述。至于这种功能是实现成硬件还是实现成软件,取决于特定的应用和对整个系统所施加的设计约束条件。熟练的技术人员可以针对每个特定应用,以变通的方式实现所描述的功能,但是,这种实现决策不应解释为背离本公开的保护范围。Those skilled in the art should also understand that the various illustrative logical blocks, modules, circuits, and algorithm steps described in conjunction with the embodiments herein can all be implemented as electronic hardware, computer software, or a combination thereof. In order to clearly illustrate the interchangeability between hardware and software, various illustrative components, blocks, modules, circuits, and steps are described above generally around their functions. As for whether this function is implemented as hardware or as software, it depends on the specific application and the design constraints imposed on the entire system. Skilled technicians can implement the described functions in a flexible manner for each specific application, but this implementation decision should not be interpreted as a departure from the protection scope of the present disclosure.
结合本文的实施例所描述的方法或者算法的步骤可直接体现为硬件、由处理器执行的软件模块或其组合。软件模块可以位于RAM存储器、闪存、ROM存储器、EPROM存储器、EEPROM存储器、寄存器、硬盘、移动磁盘、CD-ROM或者本领域熟知的任何其它形式的存储介质中。一种示例性的存储介质连接至处理器,从而使处理器能够从该存储介质读取信息,且可向该存储介质写入信息。当然,存储介质也可以是处理器的组成部分。处理器和存储介质可以位于ASIC中。该ASIC可以位于用户终端中。当然,处理器和存储介质也可以作为分立组件存在于用户终端中。The steps of the method or algorithm described in combination with the embodiments of this document can be directly embodied as hardware, a software module executed by a processor, or a combination thereof. The software module can be located in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, removable disk, CD-ROM or any other form of storage medium known in the art. An exemplary storage medium is connected to the processor, so that the processor can read information from the storage medium and can write information to the storage medium. Of course, the storage medium may also be a component of the processor. The processor and the storage medium may be located in the ASIC. The ASIC can be located in the user terminal. Of course, the processor and the storage medium may also exist as discrete components in the user terminal.
对于软件实现,本申请中描述的技术可用执行本申请所述功能的模块(例如,过程、函数等)来实现。这些软件代码可以存储在存储器单元并由处理器执行。存储器单元可以实现在处理器内,也可以实现在处理器外,在后一种情况下,它经由各种手段以通信方式耦合到处理器,这些都是本领域中所公知的。For software implementation, the technology described in this application can be implemented with modules (for example, procedures, functions, etc.) that perform the functions described in this application. These software codes can be stored in a memory unit and executed by a processor. The memory unit may be implemented in the processor or outside the processor. In the latter case, it is communicatively coupled to the processor through various means, which are well known in the art.
上文的描述包括一个或多个实施例的举例。当然，为了描述上述实施例而描述部件或方法的所有可能的结合是不可能的，但是本领域普通技术人员应该认识到，各个实施例可以做进一步的组合和排列。因此，本文中描述的实施例旨在涵盖落入所附权利要求书的保护范围内的所有这样的改变、修改和变型。此外，就说明书或权利要求书中使用的术语“包含”，该词的涵盖方式类似于术语“包括”，就如同“包括”在权利要求中用作衔接词所解释的那样。此外，使用在权利要求书的说明书中的任何一个术语“或者”是要表示“非排它性的或者”。The foregoing description includes examples of one or more embodiments. Of course, it is impossible to describe every conceivable combination of components or methods in describing the above embodiments, but one of ordinary skill in the art will recognize that the various embodiments can be further combined and permuted. Therefore, the embodiments described herein are intended to cover all such changes, modifications, and variations that fall within the protection scope of the appended claims. Furthermore, with regard to the term "comprising" used in the specification or claims, the word is inclusive in a manner similar to the term "including", as "including" is interpreted when employed as a transitional word in a claim. In addition, any use of the term "or" in the specification of the claims is intended to mean a "non-exclusive or".

Claims (14)

  1. 一种广告召回系统,包括获取实体间关系表达的系统和广告召回匹配系统;An advertisement recall system, including a system for obtaining relationship expressions between entities and an advertisement recall matching system;
    所述获取实体间关系表达的系统，用于构建用于广告搜索场景的异构图，所述异构图中的节点类型包括：广告、商品、查询词中的至少一种，边的类型包括点击边、共同点击边、协同过滤边、内容语义相似边和属性相似边中的至少一种；The system for obtaining expressions of relationships between entities is configured to construct a heterogeneous graph for an advertisement search scenario, where the node types in the heterogeneous graph include at least one of advertisements, commodities, and query terms, and the edge types include at least one of click edges, co-click edges, collaborative filtering edges, content-semantic-similarity edges, and attribute-similarity edges;
    根据预先定义的元路径,将预先构建的异构图分为至少两个异构子图,所述元路径用于表达异构子图的结构及异构子图包括的节点类型和边类型;Divide the pre-built heterogeneous graph into at least two heterogeneous subgraphs according to a predefined meta-path, where the meta-path is used to express the structure of the heterogeneous subgraph and the types of nodes and edges included in the heterogeneous subgraph;
    获取一个批次的样本数据;Obtain a batch of sample data;
    预设的图卷积模型按照异构子图,对一个批次的样本数据进行学习,得到异构子图中节点的向量表达,一个图卷积模型对应一个异构子图;The preset graph convolution model learns a batch of sample data according to heterogeneous subgraphs to obtain the vector expression of nodes in heterogeneous subgraphs. A graph convolution model corresponds to a heterogeneous subgraph;
    预设的聚合模型基于样本数据,对不同异构子图中相同节点的向量表达进行聚合,得到不同异构子图中相同节点的同一个向量表达;The preset aggregation model is based on sample data and aggregates the vector expressions of the same node in different heterogeneous subgraphs to obtain the same vector expression of the same node in different heterogeneous subgraphs;
    预设的损失函数基于所述样本数据和所述相同节点的同一个向量表达对所述模型的参数进行优化;The preset loss function optimizes the parameters of the model based on the same vector expression of the sample data and the same node;
    继续获取下一个批次的样本数据进行学习，直至所有批次的样本数据学习完毕，得到所述异构图中包括的广告节点、商品节点、查询词节点的低维向量表达，异构图中的一个节点对应样本数据中的一个实体；Continue to obtain the next batch of sample data for learning until all batches of sample data have been learned, obtaining the low-dimensional vector expressions of the advertisement nodes, commodity nodes, and query term nodes included in the heterogeneous graph, where one node in the heterogeneous graph corresponds to one entity in the sample data;
    所述广告召回匹配系统，用于使用所述获取实体间关系表达的系统得到的查询词节点、商品节点和搜索广告节点的低维向量表达，确定查询词节点、商品节点和搜索广告节点之间的匹配程度，根据所述匹配程度选择与商品、查询词匹配程度符合设定要求的搜索广告。The advertisement recall matching system is configured to use the low-dimensional vector expressions of the query term nodes, commodity nodes, and search advertisement nodes obtained by the system for obtaining expressions of relationships between entities to determine the degree of matching among the query term nodes, commodity nodes, and search advertisement nodes, and to select, according to the matching degree, search advertisements whose degree of matching with the commodities and query terms meets the set requirements.
  2. 如权利要求1所述的系统，其特征在于，一条元路径对应一个异构子图，所述元路径用于表达异构子图的结构及异构子图包括的节点类型和边类型具体为：一条元路径用于表达一个异构子图的结构及该异构子图包括的节点类型和边类型；The system according to claim 1, wherein one meta-path corresponds to one heterogeneous subgraph, and the meta-path being used to express the structure of the heterogeneous subgraph and the node types and edge types included in the heterogeneous subgraph specifically means: one meta-path is used to express the structure of one heterogeneous subgraph and the node types and edge types included in that heterogeneous subgraph;
    所述根据预先设定的元路径,将异构图拆分为至少两个异构子图具体包括:The splitting the heterogeneous graph into at least two heterogeneous subgraphs according to the preset meta-path specifically includes:
    根据预先设定的至少两条元路径,将异构图拆分为至少两个异构子图。Split the heterogeneous graph into at least two heterogeneous subgraphs according to at least two preset meta-paths.
  3. 如权利要求1所述的系统，其特征在于，所述获取实体间关系表达的系统通过预设的图卷积模型按照异构子图，对所述样本数据进行学习，得到异构子图中节点的向量表达，具体包括：The system according to claim 1, wherein the system for obtaining expressions of relationships between entities learning from the sample data according to the heterogeneous subgraphs through the preset graph convolution models to obtain the vector expressions of the nodes in the heterogeneous subgraphs specifically includes:
    预设的图卷积模型根据异构子图的每个节点的属性信息及异构子图中每个节点的至少一阶邻居节点的结构信息和属性信息，得到所述异构图中节点的向量表达。The preset graph convolution model obtains the vector expression of each node in the heterogeneous graph according to the attribute information of each node in the heterogeneous subgraph and the structure information and attribute information of the at least first-order neighbor nodes of each node in the heterogeneous subgraph.
  4. 如权利要求1所述的系统，其特征在于，所述获取实体间关系表达的系统通过预设的聚合模型基于样本数据，对不同异构子图中相同节点的向量表达进行聚合，得到不同异构子图中相同节点的同一个向量表达，具体包括：The system according to claim 1, wherein the system for obtaining expressions of relationships between entities aggregating, through the preset aggregation model and based on the sample data, the vector expressions of the same node in different heterogeneous subgraphs to obtain one identical vector expression of the same node across the different heterogeneous subgraphs specifically includes:
    所述预设的聚合模型基于所述样本数据，使用注意力机制聚合学习或者全连接聚合学习或者加权平均聚合学习对不同异构子图中相同节点的向量表达进行聚合，得到不同异构子图中相同节点的同一个向量表达。The preset aggregation model, based on the sample data, uses attention-mechanism aggregation learning, fully connected aggregation learning, or weighted-average aggregation learning to aggregate the vector expressions of the same node in different heterogeneous subgraphs, to obtain one identical vector expression of the same node across the different heterogeneous subgraphs.
  5. 如权利要求1所述的系统,其特征在于,所述广告召回匹配系统确定查询词节点、商品节点和搜索广告节点之间的匹配程度,包括:The system of claim 1, wherein the advertisement recall matching system determines the degree of matching among query term nodes, commodity nodes, and search advertisement nodes, comprising:
    使用注意力机制或者全连接聚合机制或者加权平均聚合机制对查询词节点的低维向量表达和同查询词下的用户前置点击商品节点的低维向量表达进行汇聚，得到虚拟请求节点的低维向量表达；所述虚拟请求节点为通过查询词节点和同查询词下的用户前置点击的商品节点构建出的虚拟节点；An attention mechanism, a fully connected aggregation mechanism, or a weighted-average aggregation mechanism is used to aggregate the low-dimensional vector expression of the query term node and the low-dimensional vector expressions of the commodity nodes pre-clicked by the user under the same query term, to obtain a low-dimensional vector expression of a virtual request node; the virtual request node is a virtual node constructed from the query term node and the commodity nodes pre-clicked by the user under the same query term;
    根据虚拟请求节点的低维向量表达与搜索广告节点的低维向量表达,确定查询词节点、商品节点和搜索广告节点之间的匹配程度。According to the low-dimensional vector expression of the virtual request node and the low-dimensional vector expression of the search advertisement node, the matching degree between the query term node, the product node and the search advertisement node is determined.
  6. 如权利要求5所述的系统，其特征在于，所述广告召回匹配系统根据所述匹配程度选择与商品、查询词匹配程度符合设定要求的搜索广告，包括：The system according to claim 5, wherein the advertisement recall matching system selecting, according to the matching degree, search advertisements whose degree of matching with the commodities and query terms meets the set requirements includes:
    根据所述虚拟请求节点的低维向量表达与搜索广告节点的低维向量表达的余弦距离,选择距离符合设定要求的搜索广告。According to the cosine distance between the low-dimensional vector expression of the virtual request node and the low-dimensional vector expression of the search advertisement node, a search advertisement whose distance meets the set requirement is selected.
  7. 一种获取实体间关系表达的方法,其特征在于,包括:A method for obtaining expressions of relationships between entities, characterized in that it includes:
    根据预先定义的元路径,将预先构建的异构图分为至少两个异构子图,所述元路径用于表达异构子图的结构及异构子图包括的节点类型和边类型;Divide the pre-built heterogeneous graph into at least two heterogeneous subgraphs according to a predefined meta-path, where the meta-path is used to express the structure of the heterogeneous subgraph and the types of nodes and edges included in the heterogeneous subgraph;
    获取一个批次的样本数据;Obtain a batch of sample data;
    预设的图卷积模型按照异构子图,对一个批次的样本数据进行学习,得到异构子图中节点的向量表达,一个图卷积模型对应一个异构子图;The preset graph convolution model learns a batch of sample data according to heterogeneous subgraphs to obtain the vector expression of nodes in heterogeneous subgraphs. A graph convolution model corresponds to a heterogeneous subgraph;
    预设的聚合模型基于样本数据,对不同异构子图中相同节点的向量表达进行聚合,得到不同异构子图中相同节点的同一个向量表达;The preset aggregation model is based on sample data and aggregates the vector expressions of the same node in different heterogeneous subgraphs to obtain the same vector expression of the same node in different heterogeneous subgraphs;
    预设的损失函数基于所述样本数据和所述相同节点的同一个向量表达对所述模型的参数进行优化;The preset loss function optimizes the parameters of the model based on the same vector expression of the sample data and the same node;
    继续获取下一个批次的样本数据进行学习，直至所有批次的样本数据学习完毕，得到所述异构图中每个节点的一个低维向量表达，异构图中的一个节点对应样本数据中的一个实体。Continue to obtain the next batch of sample data for learning until all batches of sample data have been learned, obtaining a low-dimensional vector expression of each node in the heterogeneous graph, where one node in the heterogeneous graph corresponds to one entity in the sample data.
  8. The method according to claim 7, wherein one meta-path corresponds to one heterogeneous subgraph, and the meta-path being used to express the structure of a heterogeneous subgraph and the node types and edge types the heterogeneous subgraph comprises is specifically: one meta-path is used to express the structure of one heterogeneous subgraph and the node types and edge types that heterogeneous subgraph comprises;
    the splitting the heterogeneous graph into at least two heterogeneous subgraphs according to the predefined meta-paths specifically comprises:
    splitting the heterogeneous graph into at least two heterogeneous subgraphs according to at least two predefined meta-paths.
  9. The method according to claim 8, wherein the one meta-path being used to express the structure of one heterogeneous subgraph and the node types and edge types that heterogeneous subgraph comprises is specifically:
    one meta-path comprises node types and edge types arranged alternately in order, wherein the types in the first and last positions are node types, and the arrangement order of the node types and edge types expresses the structure of the heterogeneous subgraph;
    the splitting the heterogeneous graph into at least two heterogeneous subgraphs according to the at least two predefined meta-paths specifically comprises:
    for each of the at least two predefined meta-paths: obtaining nodes of the corresponding types from the heterogeneous graph according to the node types included in the meta-path; obtaining qualifying edges from the heterogeneous graph according to the types of the edges connecting adjacent nodes; and composing, from the obtained nodes of the corresponding types and the qualifying edges, the heterogeneous subgraph corresponding to the meta-path.
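The splitting step of claim 9 — keep only the nodes whose types appear at the meta-path's node positions and the edges whose types appear at its edge positions — can be sketched as below. The node and edge type names ("user", "issues", "clicks", etc.) and the tiny graph are invented for illustration; an advertising graph in the patent's setting would be far larger.

```python
# Heterogeneous graph: node -> type, and typed edges (src, edge_type, dst).
node_types = {"u1": "user", "u2": "user", "q1": "query",
              "a1": "ad", "s1": "shop"}
edges = [("u1", "issues", "q1"), ("u2", "issues", "q1"),
         ("q1", "clicks", "a1"), ("a1", "belongs_to", "s1")]

# Meta-path: node types and edge types alternating, with node types
# in the first and last positions, as required by claim 9.
meta_path = ["user", "issues", "query", "clicks", "ad"]

def split_by_meta_path(node_types, edges, meta_path):
    wanted_node_types = set(meta_path[0::2])   # types at even positions
    wanted_edge_types = set(meta_path[1::2])   # types at odd positions
    nodes = {n for n, t in node_types.items() if t in wanted_node_types}
    kept = [(s, et, d) for (s, et, d) in edges
            if et in wanted_edge_types and s in nodes and d in nodes]
    return nodes, kept

sub_nodes, sub_edges = split_by_meta_path(node_types, edges, meta_path)
```

The "shop" node and its "belongs_to" edge fall outside the meta-path, so they are excluded from this subgraph; a second meta-path would carve out a different subgraph from the same heterogeneous graph.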
  10. The method according to claim 7, wherein the learning, by the preset graph convolution model, of the sample data according to the heterogeneous subgraph to obtain the vector expressions of the nodes in the heterogeneous subgraph specifically comprises:
    learning, by the preset graph convolution model, the sample data according to the attribute information of each node in the heterogeneous subgraph and the structure information and attribute information of at least the first-order neighbor nodes of each node in the heterogeneous subgraph, to obtain the vector expressions of the nodes in the heterogeneous subgraph.
  11. The method according to claim 10, wherein the learning, by the preset graph convolution model, of the sample data according to the attribute information of each node in the heterogeneous subgraph and the structure information and attribute information of at least the first-order neighbor nodes of each node, to obtain the vector expression of each node in the heterogeneous subgraph, specifically comprises:
    traversing the sample data, and for a currently traversed piece of sample data, reading the entity it records and finding the node corresponding to that entity in the heterogeneous graph;
    reading, from the heterogeneous subgraph that includes the node, the first-order to Nth-order neighbor nodes of the node, where N is a preset positive integer;
    performing, by the preset graph convolution model, an N-layer convolution operation according to the attribute information of the node and the attribute information and structure information of the first- to Nth-order neighbor nodes, to obtain the vector expression of the node.
  12. The method according to claim 10, wherein the learning, by the preset graph convolution model, of the sample data according to the attribute information of each node in the heterogeneous subgraph and the structure information and attribute information of at least the first-order neighbor nodes of each node, to obtain the vector expression of each node in the heterogeneous graph, specifically comprises:
    traversing the sample data, and for a currently traversed piece of sample data, reading the entity it records and finding the node corresponding to that entity in the heterogeneous graph;
    reading, from the heterogeneous subgraph that includes the node, the first-order to Nth-order neighbor nodes of the node, where N is a preset positive integer;
    sampling, from the first- to Nth-order neighbor nodes of the node, a preset number of neighbor nodes of each order according to the weights of the edges between nodes, to obtain sampled first- to Nth-order neighbor nodes;
    performing, by the preset graph convolution model, an N-layer convolution operation according to the attribute information of the node and the attribute information and structure information of the sampled first- to Nth-order neighbor nodes, to obtain the vector expression of the node.
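The weighted neighbor sampling of claim 12 — drawing a preset number of same-order neighbors with probability proportional to edge weight, so every node contributes a fixed-size neighborhood to the convolution — might look like this. The neighbor IDs, weights, sample size, and fixed seed are made up for illustration.

```python
import random

# Hypothetical first-order neighbors of one node, with edge weights.
neighbors = ["q1", "q2", "q3", "q4"]
weights   = [0.6, 0.2, 0.1, 0.1]

def sample_neighbors(neighbors, weights, k, seed=0):
    """Sample k neighbors (with replacement) with probability proportional
    to edge weight, yielding a fixed-size neighborhood for convolution."""
    rng = random.Random(seed)
    return rng.choices(neighbors, weights=weights, k=k)

sampled = sample_neighbors(neighbors, weights, k=3)
```

Repeating this per order (first through Nth) gives the sampled neighborhoods that the N-layer convolution then consumes.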
  13. The method according to claim 7, wherein the aggregating, by the preset aggregation model based on the sample data, of the vector expressions of the same node in different heterogeneous subgraphs to obtain a single vector expression of that node specifically comprises:
    aggregating, by the preset aggregation model based on the sample data, the vector expressions of the same node in different heterogeneous subgraphs using an attention mechanism, a fully connected aggregation mechanism, or a weighted-average aggregation mechanism, to obtain a single vector expression of that node across the different heterogeneous subgraphs.
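Claim 13 lists an attention mechanism as one aggregation option. A minimal softmax-attention sketch over a node's per-subgraph vectors is shown below; the query vector stands in for a learned attention parameter, and both it and the two-subgraph example vectors are assumptions for illustration only.

```python
import numpy as np

def attention_aggregate(expressions, query):
    """Weight each subgraph's expression of a node by a softmax attention
    score against a query vector, then take the convex combination."""
    scores = expressions @ query               # one score per subgraph
    weights = np.exp(scores - scores.max())    # stable softmax
    weights /= weights.sum()
    return weights @ expressions               # fused vector expression

# Two hypothetical expressions of the same node from two subgraphs.
expressions = np.array([[1.0, 0.0],
                        [0.0, 1.0]])
query = np.array([2.0, 0.0])                   # assumed learned parameter
fused = attention_aggregate(expressions, query)
```

The subgraph whose expression aligns better with the query dominates the fused vector, which is the point of preferring attention over a plain average when some meta-paths are more informative than others.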
  14. A system for obtaining expressions of relationships between entities, characterized in that it comprises: a registration device, a storage device, a computing device, and a parameter exchange device;
    the storage device is configured to store data of the heterogeneous subgraphs;
    the computing device is configured to obtain the data of the heterogeneous subgraphs from the storage device through the registration device, and to learn sample data based on the heterogeneous graph using the method for obtaining expressions of relationships between entities according to any one of claims 7-13, to obtain a low-dimensional vector expression of each node in the heterogeneous graph;
    the parameter exchange device is configured to exchange parameters with the computing device.
PCT/CN2020/070249 2019-01-16 2020-01-03 Method, system, and device for obtaining expression of relationship between entities, and advertisement retrieval system WO2020147594A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910041466.9A CN111444394B (en) 2019-01-16 2019-01-16 Method, system and equipment for obtaining relation expression between entities and advertisement recall system
CN201910041466.9 2019-01-16

Publications (1)

Publication Number Publication Date
WO2020147594A1 true WO2020147594A1 (en) 2020-07-23

Family

ID=71613283

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/070249 WO2020147594A1 (en) 2019-01-16 2020-01-03 Method, system, and device for obtaining expression of relationship between entities, and advertisement retrieval system

Country Status (2)

Country Link
CN (1) CN111444394B (en)
WO (1) WO2020147594A1 (en)


Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112507185B (en) * 2020-10-22 2022-08-19 复旦大学 User portrait determination method and device
CN112214499B (en) 2020-12-03 2021-03-19 腾讯科技(深圳)有限公司 Graph data processing method and device, computer equipment and storage medium
CN112767054A (en) * 2021-01-29 2021-05-07 北京达佳互联信息技术有限公司 Data recommendation method, device, server and computer-readable storage medium
CN112948591B (en) * 2021-02-25 2024-02-09 成都数联铭品科技有限公司 Subgraph matching method and system suitable for directed graph and electronic equipment
CN113268574B (en) * 2021-05-25 2022-12-20 山东交通学院 Graph volume network knowledge base question-answering method and system based on dependency structure
CN113434556B (en) * 2021-07-22 2022-05-31 支付宝(杭州)信息技术有限公司 Data processing method and system
CN113553446B (en) * 2021-07-28 2022-05-24 厦门国际银行股份有限公司 Financial anti-fraud method and device based on heterograph deconstruction

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105630901A (en) * 2015-12-21 2016-06-01 清华大学 Knowledge graph representation learning method
CN106528609A (en) * 2016-09-28 2017-03-22 厦门理工学院 Vector constraint embedded transformation knowledge graph inference method
CN106909622A (en) * 2017-01-20 2017-06-30 中国科学院计算技术研究所 Knowledge mapping vector representation method, knowledge mapping relation inference method and system
US20180341863A1 (en) * 2017-05-27 2018-11-29 Ricoh Company, Ltd. Knowledge graph processing method and device
CN109002488A (en) * 2018-06-26 2018-12-14 北京邮电大学 A kind of recommended models training method and device based on first path context
CN109213801A (en) * 2018-08-09 2019-01-15 阿里巴巴集团控股有限公司 Data digging method and device based on incidence relation

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160189218A1 (en) * 2014-12-30 2016-06-30 Yahoo, Inc. Systems and methods for sponsored search ad matching
CN106155635B (en) * 2015-04-03 2020-09-18 北京奇虎科技有限公司 Data processing method and device
CN107944898A (en) * 2016-10-13 2018-04-20 驰众信息技术(上海)有限公司 The automatic discovery of advertisement putting building information and sort method
CN108763376B (en) * 2018-05-18 2020-09-29 浙江大学 Knowledge representation learning method for integrating relationship path, type and entity description information


Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112380435A (en) * 2020-11-16 2021-02-19 北京大学 Literature recommendation method and recommendation system based on heterogeneous graph neural network
CN112380435B (en) * 2020-11-16 2024-05-07 北京大学 Document recommendation method and system based on heterogeneous graph neural network
CN113094558A (en) * 2021-04-08 2021-07-09 电子科技大学 Network node influence sequencing method based on local structure
CN113094558B (en) * 2021-04-08 2023-10-20 电子科技大学 Network node influence ordering method based on local structure
CN113254580A (en) * 2021-05-24 2021-08-13 厦门大学 Special group searching method and system
CN113254580B (en) * 2021-05-24 2023-10-03 厦门大学 Special group searching method and system
CN113420551A (en) * 2021-07-13 2021-09-21 华中师范大学 Biomedical entity relation extraction method for modeling entity similarity
CN115186086A (en) * 2022-06-27 2022-10-14 长安大学 Literature recommendation method for embedding expected value in heterogeneous environment
CN115186086B (en) * 2022-06-27 2023-08-08 长安大学 Literature recommendation method for embedding expected value in heterogeneous environment
CN117350461A (en) * 2023-12-05 2024-01-05 湖南财信数字科技有限公司 Enterprise abnormal behavior early warning method, system, computer equipment and storage medium
CN117350461B (en) * 2023-12-05 2024-03-19 湖南财信数字科技有限公司 Enterprise abnormal behavior early warning method, system, computer equipment and storage medium

Also Published As

Publication number Publication date
CN111444394B (en) 2023-05-23
CN111444394A (en) 2020-07-24

Similar Documents

Publication Publication Date Title
WO2020147594A1 (en) Method, system, and device for obtaining expression of relationship between entities, and advertisement retrieval system
WO2020147595A1 (en) Method, system and device for obtaining relationship expression between entities, and advertisement recalling system
CN112581191B (en) Training method and device of behavior prediction model
Mao et al. Multiobjective e-commerce recommendations based on hypergraph ranking
Li et al. On both cold-start and long-tail recommendation with social data
US8380723B2 (en) Query intent in information retrieval
JP3389948B2 (en) Display ad selection system
US20140279065A1 (en) Generating Ad Copy
US8660901B2 (en) Matching of advertising sources and keyword sets in online commerce platforms
CN105787767A (en) Method and system for obtaining advertisement click-through rate pre-estimation model
US20100100407A1 (en) Scaling optimization of allocation of online advertisement inventory
TW201537365A (en) Data search processing
US11636394B2 (en) Differentiable user-item co-clustering
US20100318427A1 (en) Enhancing database management by search, personal search, advertising, and databases analysis efficiently using core-set implementations
CN111783963A (en) Recommendation method based on star atlas neural network
CN108960293B (en) CTR (China train reactor) estimation method and system based on FM (frequency modulation) algorithm
Xin et al. ATNN: adversarial two-tower neural network for new item’s popularity prediction in E-commerce
Liang et al. Collaborative filtering based on information-theoretic co-clustering
JP2017201535A (en) Determination device, learning device, determination method, and determination program
CN112446739B (en) Click rate prediction method and system based on decomposition machine and graph neural network
Zeng et al. Collaborative filtering via heterogeneous neural networks
Yang et al. Exploring different interaction among features for CTR prediction
CN114841765A (en) Sequence recommendation method based on meta-path neighborhood target generalization
KR101985603B1 (en) Recommendation method based on tripartite graph
CN114329167A (en) Hyper-parameter learning, intelligent recommendation, keyword and multimedia recommendation method and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20742058

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20742058

Country of ref document: EP

Kind code of ref document: A1