CN108491511A - Data mining method and device based on graph data, and model training method and device - Google Patents


Info

Publication number
CN108491511A
Authority
CN
China
Prior art keywords
node
user
sample
nodes
graph data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810246990.5A
Other languages
Chinese (zh)
Other versions
CN108491511B (en)
Inventor
陈尧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201810246990.5A priority Critical patent/CN108491511B/en
Publication of CN108491511A publication Critical patent/CN108491511A/en
Application granted granted Critical
Publication of CN108491511B publication Critical patent/CN108491511B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

This application relates to a data mining method and device based on graph data, and to a model training method and device for data mining. The data mining method based on graph data includes: obtaining graph data, where the graph data includes node attributes and edges between nodes, and the nodes include user nodes and public identity nodes; inputting the graph data into a trained machine learning model; determining, through the machine learning model and based on the node attributes and the edges between nodes included in the graph data, the behavior prediction results corresponding to the user nodes among the nodes; and screening, from the user nodes in the graph data, the user nodes whose behavior prediction results satisfy a data mining condition. The scheme provided by this application can improve the accuracy of data mining results.

Description

Data mining method and device based on graph data and model training method and device
Technical Field
The application relates to the technical field of computers, in particular to a data mining method and device based on graph data and a model training method and device.
Background
The rapid development of computer technology and network technology has brought great convenience to people's daily life and work. For example, more and more users communicate over the network, browse pages, or conduct online and offline transactions. How to perform data mining on users' behavior data, social data and the like has become a focus of growing attention.
In a conventional data mining method, data analysis is usually performed on existing relational data related to a user: the user's historical relational data is manually encoded and modeled, and a trained regression model is then used to predict user actions, such as whether the user clicks a page or performs a transaction. Because the conventional method analyzes only the user's relational data, its mining results are often inaccurate.
Disclosure of Invention
Based on this, it is necessary to provide a data mining method and apparatus based on graph data, and a model training method and apparatus for data mining, aiming at the technical problem that the mining result of data mining is inaccurate.
A method of graph data-based data mining, comprising:
acquiring graph data; the graph data comprises node attributes and edges among nodes, and the nodes comprise user nodes and public identification nodes;
inputting the graph data into a trained machine learning model;
determining a behavior prediction result corresponding to a user node in the node based on the node attribute and the edge between the nodes included in the graph data through the machine learning model;
and screening the user nodes of which the corresponding behavior prediction results meet the data mining conditions from the user nodes in the graph data.
An apparatus for graph data-based data mining, the apparatus comprising:
the acquisition module is used for acquiring graph data; the graph data comprises node attributes and edges among nodes, and the nodes comprise user nodes and public identification nodes;
an input module for inputting the graph data into a trained machine learning model;
a determining module, configured to determine, through the machine learning model, a behavior prediction result corresponding to a user node in the node based on node attributes and edges between nodes included in the graph data;
and the screening module is used for screening the user nodes of which the corresponding behavior prediction results meet the data mining conditions from the user nodes in the graph data.
A computer-readable storage medium, storing a computer program which, when executed by a processor, causes the processor to perform the steps of the graph data-based data mining method.
A computer device comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of the graph data based data mining method.
With the above data mining method and device based on graph data, computer-readable storage medium and computer device, graph data including node attributes and edges between nodes is input into a trained machine learning model, and the behavior prediction results corresponding to the user nodes among the nodes are determined through that model. The nodes include user nodes and public identification nodes. Because the graph data includes both node attributes and edges between nodes, the trained machine learning model can make full use of the relationship information between nodes in the graph data and the attribute information of the corresponding nodes. The relationship information between nodes, such as that between user nodes and that between user nodes and public identification nodes, reflects the behavior habits and preferences of users. When the trained machine learning model analyzes the graph data, it can therefore extract comprehensive and accurate data features and obtain accurate behavior prediction results for the user nodes. The user nodes that satisfy the data mining condition are then screened according to the behavior prediction results. The screened user nodes are the potentially valuable user nodes that have been mined, and the accuracy of the data mining result is greatly improved.
A model training method for data mining, comprising:
acquiring a graph data sample and a corresponding label; the graph data sample comprises sample node attributes and edges among the sample nodes, wherein the sample nodes comprise user sample nodes and public identification sample nodes;
inputting the graph data samples into a machine learning model;
determining an intermediate behavior prediction result corresponding to a user sample node in the sample node based on sample node attributes and edges among the sample nodes included in the graph data sample through the machine learning model;
and adjusting model parameters of the machine learning model according to the difference between the intermediate behavior prediction result and the label, and continuing training until the training stopping condition is met.
A model training apparatus for data mining, the apparatus comprising:
the acquisition module is used for acquiring the graph data sample and the corresponding label; the graph data sample comprises sample node attributes and edges among the sample nodes, wherein the sample nodes comprise user sample nodes and public identification sample nodes;
an input module for inputting the graph data samples into a machine learning model;
the determining module is used for determining an intermediate behavior prediction result corresponding to a user sample node in the sample node based on the sample node attribute and the edges among the sample nodes included in the graph data sample through the machine learning model;
and the adjusting module is used for adjusting the model parameters of the machine learning model according to the difference between the intermediate behavior prediction result and the label and continuing training until the training stopping condition is met.
A computer-readable storage medium, storing a computer program which, when executed by a processor, causes the processor to perform the steps of the model training method for data mining.
A computer device comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of the model training method for data mining.
The model training method, the device, the computer readable storage medium and the computer equipment for data mining input the graph data samples including the sample node attributes and the edges among the sample nodes into the machine learning model, and determine the intermediate behavior prediction results corresponding to the user sample nodes through the machine learning model. The sample nodes comprise user sample nodes and public identification sample nodes. Because the graph data sample comprises the sample node attributes and the edges among the sample nodes, the machine learning model can make full use of the relationship information among the sample nodes in the graph data sample, the attribute information of the corresponding sample nodes and the like. The relationship information between the sample nodes, such as the relationship information between the user sample nodes and the user sample nodes, the relationship information between the user sample nodes and the public identification sample nodes, can fully show the behavior habits or the preferences of the sample users. Therefore, when the machine learning model analyzes the graph data sample, comprehensive and accurate data characteristics can be extracted, and then model parameters of the machine learning model are continuously adjusted and training is continued according to the difference between the intermediate behavior prediction result and the corresponding label of the graph data sample until the training stopping condition is met, so that the training is finished. The machine learning model trained in the way can predict the accurate behavior result of the user node, so that the accuracy and effectiveness of model training are greatly improved, and the accuracy of the subsequent data mining result is further improved.
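The training procedure summarized above (predict, compare with the label, adjust parameters, repeat until a stop condition is met) can be illustrated with a minimal sketch. It is not the model of this application: plain logistic regression stands in for the machine learning model, each user sample node is reduced to a hypothetical two-value feature vector (one value derived from its own attributes, one from its neighbors), and the stop condition is assumed to be a small improvement in the mean loss.

```python
import math

def predict(w, b, x):
    """Intermediate behavior prediction: a probability from a linear score."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def train(features, labels, lr=0.5, max_epochs=5000, tol=1e-6):
    """Gradient descent on the log loss. Training stops when the mean loss
    improves by less than `tol` (assumed stop condition) or after max_epochs."""
    w = [0.0] * len(features[0])
    b = 0.0
    prev_loss = float("inf")
    n = len(features)
    for _ in range(max_epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        loss = 0.0
        for x, y in zip(features, labels):
            p = predict(w, b, x)
            loss -= y * math.log(p + 1e-12) + (1 - y) * math.log(1 - p + 1e-12)
            diff = p - y  # difference between prediction and label
            for i, xi in enumerate(x):
                grad_w[i] += diff * xi
            grad_b += diff
        # Adjust the model parameters according to the difference.
        w = [wi - lr * gi / n for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b / n
        loss /= n
        if prev_loss - loss < tol:
            break
        prev_loss = loss
    return w, b

# Hypothetical features per user sample node: [own attribute score, neighbor score]
features = [[1.0, 0.9], [0.8, 0.7], [0.1, 0.2], [0.0, 0.1]]
labels = [1, 1, 0, 0]
w, b = train(features, labels)
```

After training, `predict(w, b, x)` returns the predicted behavior probability for a node with feature vector `x`.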
Drawings
FIG. 1 is a diagram of an application environment of a graph data-based data mining method in one embodiment;
FIG. 2 is a schematic flow chart diagram illustrating a method for graph data-based data mining, according to one embodiment;
FIG. 3 is a flowchart illustrating the step of obtaining graph data in one embodiment;
FIG. 4 is a flowchart illustrating the steps of constructing graph data according to the read user identifier and corresponding user attribute, public identifier and corresponding public identifier attribute, user relationship, and behavior relationship in one embodiment;
FIG. 5 is a flowchart illustrating the step of determining a behavior prediction result corresponding to a user node in a node based on node attributes and edges between nodes included in graph data through a machine learning model in one embodiment;
FIG. 6 is a schematic flow chart diagram illustrating a method for graph data-based data mining in accordance with another embodiment;
FIG. 7 is a schematic flow diagram of a method for model training for data mining in one embodiment;
FIG. 8 is a schematic flow chart diagram illustrating a model training method for data mining in accordance with another embodiment;
FIG. 9 is a diagram of a data mining system architecture based on graph data, in one embodiment;
FIG. 10 is a block diagram of an embodiment of a graph data-based data mining device;
FIG. 11 is a block diagram showing the construction of a data mining apparatus based on graph data according to another embodiment;
FIG. 12 is a block diagram showing the construction of a data mining apparatus based on graph data according to still another embodiment;
FIG. 13 is a block diagram showing the construction of a data mining apparatus based on graph data according to still another embodiment;
FIG. 14 is a block diagram of a model training apparatus for data mining in one embodiment;
FIG. 15 is a block diagram showing a structure of a model training apparatus for data mining according to another embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
FIG. 1 is a diagram of an application environment of a graph data-based data mining method and/or a model training method for data mining in one embodiment. As shown in FIG. 1, the graph data-based data mining method and/or the model training method for data mining are applied to a computer device. The computer device may be a terminal or a server. The terminal may be a desktop device or a mobile terminal. The server may be an individual physical server, a cluster of physical servers, or a virtual server. The computer device comprises a processor, a memory, and a network interface connected by a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program that, when executed by the processor, causes the processor to implement a graph data-based data mining method and/or a model training method for data mining. The internal memory may also store a computer program that, when executed by the processor, causes the processor to perform a graph data-based data mining method and/or a model training method for data mining.
Those skilled in the art will appreciate that the architecture shown in FIG. 1 is merely a block diagram of part of the structure related to the disclosed solution and does not limit the computer devices to which the solution applies. A particular computer device may include more or fewer components than shown, may combine certain components, or may arrange the components differently.
As shown in FIG. 2, in one embodiment, a graph data-based data mining method is provided. The embodiment is mainly illustrated by applying the method to the computer device in fig. 1. Referring to fig. 2, the data mining method based on graph data specifically includes the following steps:
S202, acquiring graph data; the graph data comprises node attributes and edges among nodes, and the nodes comprise user nodes and public identification nodes.
Graph data is data expressed in graph form. It is non-relational data and applies graph theory to store relationship information between entities. Relational data is data organized as two-dimensional row-column tables and is usually stored in a relational database. Non-relational data is data whose relational structure is complex and not fixed, and it is typically stored in a non-relational database. Typically, graph data consists of node attributes and edges between nodes. For example, when graph data is used to store information about individuals in a social network, different individuals can be represented by different nodes in the graph data, and relationships between individuals can be represented by edges in the graph data.
The node attributes are characteristic attributes of the nodes and comprise user node attributes and public identification node attributes. The user node attribute is an attribute of a user to which the user node corresponds, such as a sex, an age, a native place, a residence, and the like of the user. The public identity node attribute is an attribute of a public identity corresponding to the public identity node, such as category information to which the public identity belongs, promotion information corresponding to the public identity, and the like. The category information of the public mark may be a category of a field to which the public mark belongs, such as a financial field, an insurance field, an electronic technology field, or a movie field.
An edge between nodes represents a relationship between those nodes. The edges between nodes include edges between user nodes, edges between user nodes and public identity nodes, and edges between public identity nodes. An edge between user nodes may represent a relationship between the corresponding users, such as a mutual friend relationship, a one-way blocking relationship, a one-way blacklisting relationship, or a mutual following relationship. An edge between a user node and a public identity node may represent a relationship between the corresponding user and the corresponding public identity, for example, that the user follows the public identity or that the user blocks the public identity. An edge between public identity nodes may represent a relationship between the corresponding public identities, for example, that public identity A and public identity B are related public identities.
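The node and edge types described above can be pictured with a small in-memory structure. The following is only an illustrative sketch: the `Graph` class, the node ids and the relation names (`friend`, `follows`) are assumptions made for the example and are not defined by this application.

```python
from dataclasses import dataclass, field

@dataclass
class Graph:
    # node id -> {"type": "user" or "public", plus attribute key/values}
    nodes: dict = field(default_factory=dict)
    # list of (source id, relation, target id)
    edges: list = field(default_factory=list)

    def add_node(self, node_id, node_type, **attrs):
        self.nodes[node_id] = {"type": node_type, **attrs}

    def add_edge(self, src, relation, dst):
        # Both endpoints must already exist as nodes.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append((src, relation, dst))

g = Graph()
g.add_node("u1", "user", gender="male", age=26)
g.add_node("u2", "user", gender="female", age=31)
g.add_node("p1", "public", category="finance")
g.add_edge("u1", "friend", "u2")    # edge between two user nodes
g.add_edge("u1", "follows", "p1")   # edge between a user node and a public identity node

user_ids = [n for n, a in g.nodes.items() if a["type"] == "user"]
```

Keeping the node type alongside the attributes makes it easy to select only user nodes later, when behavior prediction results are screened.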
In one embodiment, a computer device may convert relational data obtained from a service platform relating to user and public identities into non-relational data for storage in a graph database. The computer device can obtain the graph data in the local graph database, or obtain the graph database stored by other devices, such as a graph database system, through network communication and the like. The acquired graph data comprises user node attributes corresponding to the user nodes, public identification node attributes corresponding to the public identification nodes and edges among the nodes.
In one embodiment, the computer device may acquire graph data whose generation time falls within a preset time period, based on the time at which the graph data was generated. For example, the computer device may obtain the graph data stored in the graph database over the last month, so as to work with the most recent data on users and public identities.
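A time-window selection of this kind might look like the sketch below. The edge record layout, the `created` field and the 30-day window are assumptions for illustration. The text only states that data generated within a preset period is acquired.

```python
from datetime import datetime, timedelta

def recent_edges(edges, now, window=timedelta(days=30)):
    """Keep edges whose generation time lies within `window` before `now`."""
    cutoff = now - window
    return [e for e in edges if cutoff <= e["created"] <= now]

now = datetime(2018, 3, 23)
edges = [
    {"src": "u1", "rel": "follows", "dst": "p1", "created": datetime(2018, 3, 10)},
    {"src": "u2", "rel": "follows", "dst": "p1", "created": datetime(2017, 12, 1)},
]
kept = recent_edges(edges, now)
```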
S204, inputting the graph data into the trained machine learning model.
A machine learning (ML) model is a model that acquires classification capability by learning from samples. The trained machine learning model is one that has been trained with sample data and sample labels and therefore has classification capability. In this embodiment, the trained machine learning model is obtained by training on graph data samples and their corresponding labels, with the model parameters adjusted continuously until the training stop condition is met.
Specifically, the computer device inputs the acquired graph data into the trained machine learning model, which processes the graph data to obtain the behavior prediction results corresponding to the user nodes. The machine learning model may be a Markov model, a VGG (Visual Geometry Group) network model, a GoogLeNet network model, a ResNet (residual network) model, or the like.
In one embodiment, a computer device may obtain graph data comprised of a plurality of user node attributes and a plurality of public identity node attributes, along with edges between nodes. Inputting the graph data into a trained machine learning model, extracting the required data by the machine learning model, and storing the data into an HDFS (Hadoop Distributed File System) Distributed storage environment. And then, the stored data are processed in a distributed mode, and data characteristics are extracted and analyzed to obtain a behavior prediction result corresponding to the user node.
S206, determining a behavior prediction result corresponding to the user node in the node based on the node attribute and the edges among the nodes included in the graph data through a machine learning model.
The behavior prediction result is a prediction of user behavior. It may be a prediction of the user's own behavior or a prediction of the user's behavior toward a public identity. For example, it may be a predicted classification result for a user node, or a predicted classification result for an edge between nodes.
Specifically, the behavior prediction result may be a user behavior probability. For example, the user behavior prediction result may be a user behavior prediction probability, such as a probability of the user clicking promotion information, a probability of the user performing online transaction, or a probability of user loan default. The prediction result of the user for the public identity behavior may be a prediction probability of the user for the public identity behavior, for example, a probability of the user i reading the promotion information of the public identity j, or a probability of the user i purchasing a transaction product provided by the public identity j.
Specifically, after the computer device inputs the graph data into the trained machine learning model, the model can determine the relationships between nodes according to the edges between nodes included in the graph data. For example, once the machine learning model has selected a user node, it can determine the user nodes and/or public identity nodes adjacent to that user node according to the edges between nodes. The machine learning model can then use the attributes of the node itself, together with the attributes of the user nodes and/or public identity nodes related to it, to determine the behavior prediction result corresponding to the user node. A node related to the node may be an adjacent node, a second-degree node, or another multi-degree node. A second-degree node is a node adjacent to one of the node's adjacent nodes, and a multi-degree node is, as the name suggests, a node connected to the node through multiple edges.
A user node adjacent to a given user node may be one whose user is a friend of the given user. A public identity node adjacent to a user node may be one whose promotion information the corresponding user has read. Promotion information is a message that a public identity pushes to the users who follow it, such as an advertisement, an article, news or a vote.
In one embodiment, the machine learning model may determine the behavior prediction result corresponding to a user node by using the node's own attributes together with the content of the user nodes and/or public identity nodes adjacent to the node.
In one embodiment, after the machine learning model determines a node, the user nodes and/or public identity nodes adjacent to the node can be determined according to edges between the nodes. And determining user nodes and/or public identification nodes and the like adjacent to the adjacent nodes according to the edges between the nodes. And the machine learning model jointly determines the behavior prediction result corresponding to the user node according to the node attribute, the adjacent node of the node, the second-degree node or other multi-degree nodes of the node and the like.
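The notion of adjacent, second-degree and multi-degree nodes can be made concrete with a breadth-first traversal. This is a sketch only: the adjacency dictionary and the function name `nodes_by_degree` are hypothetical, and the application does not prescribe how the model gathers neighborhoods.

```python
from collections import deque

def nodes_by_degree(adjacency, start, max_degree):
    """Return {degree: set of node ids} for nodes 1..max_degree edges from start."""
    seen = {start}
    result = {}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_degree:
            continue  # do not expand beyond the requested degree
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                result.setdefault(depth + 1, set()).add(nxt)
                queue.append((nxt, depth + 1))
    return result

# Undirected toy graph: u* are user nodes, p* are public identity nodes.
adjacency = {
    "u1": ["u2", "p1"],
    "u2": ["u1", "u3"],
    "p1": ["u1", "p2"],
    "u3": ["u2"],
    "p2": ["p1"],
}
hops = nodes_by_degree(adjacency, "u1", max_degree=2)
```

Here `hops[1]` holds the nodes adjacent to `u1`, and `hops[2]` holds its second-degree nodes, each reached through one adjacent node.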
S208, screening the user nodes of which the corresponding behavior prediction results meet the data mining conditions from the user nodes in the graph data.
The data mining conditions are preset conditions which are met during data mining. When the behavior prediction result is the user behavior prediction probability, the data mining condition may specifically be that the user behavior prediction probability is greater than or equal to a first preset threshold, or that the user behavior prediction probability is less than or equal to a second preset threshold. When the behavior prediction result is the prediction probability of the user for the public identity behavior, the data mining condition may specifically be that the prediction probability of the user for the public identity behavior is greater than or equal to a third threshold, or the maximum probability in the prediction probability of the user for the public identity behavior, or the like.
In one embodiment, when the behavior prediction result is a user behavior prediction probability, the computer device may filter user nodes in the graph data, where the corresponding user behavior prediction probability is greater than or equal to a first preset threshold, or the user behavior prediction probability is less than or equal to a second preset threshold.
In one embodiment, when the behavior prediction result is the predicted probability of the user's behavior toward a public identity, the computer device may screen, from the user nodes included in the graph data, the user nodes and corresponding public identities whose predicted probability satisfies a preset condition. For example, for a given user node, the public identity node for which the corresponding user has the highest predicted behavior probability is selected, and that user node and public identity node are taken as the nodes whose behavior prediction results satisfy the data mining condition.
The data mining method based on graph data inputs graph data including node attributes and edges between nodes into the trained machine learning model, and determines the behavior prediction results corresponding to the user nodes among the nodes through that model. The nodes include user nodes and public identification nodes. Because the graph data includes both node attributes and edges between nodes, the trained machine learning model can make full use of the relationship information between nodes in the graph data and the attribute information of the corresponding nodes. The relationship information between nodes, such as that between user nodes and that between user nodes and public identification nodes, reflects the behavior habits and preferences of users. When the trained machine learning model analyzes the graph data, it can therefore extract comprehensive and accurate data features and obtain accurate behavior prediction results for the user nodes. The user nodes that satisfy the data mining condition are then screened according to the behavior prediction results. The screened user nodes are the potentially valuable user nodes that have been mined, and the accuracy of the data mining result is greatly improved.
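The two screening variants described for step S208 can be sketched as follows. The function names, the threshold values and the sample probabilities are illustrative assumptions. The text only requires that probabilities be compared with preset thresholds or that the maximum be taken.

```python
def screen_users(user_probs, low=None, high=None):
    """Select users whose predicted probability is >= high or <= low."""
    selected = []
    for user, p in user_probs.items():
        if (high is not None and p >= high) or (low is not None and p <= low):
            selected.append(user)
    return selected

def best_public_identity(pair_probs, threshold=0.0):
    """For each user, pick the public identity with the highest predicted
    probability, provided that probability reaches `threshold`."""
    best = {}
    for (user, public_id), p in pair_probs.items():
        if p >= threshold and p > best.get(user, (None, -1.0))[1]:
            best[user] = (public_id, p)
    return best

user_probs = {"u1": 0.91, "u2": 0.40, "u3": 0.05}
pair_probs = {("u1", "p1"): 0.2, ("u1", "p2"): 0.7, ("u2", "p1"): 0.1}

high_value = screen_users(user_probs, high=0.9)        # first-threshold variant
low_risk = screen_users(user_probs, low=0.1)           # second-threshold variant
picks = best_public_identity(pair_probs, threshold=0.5)
```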
In one embodiment, step S202 specifically includes the following steps:
S302, reading the user identification and the corresponding user attribute, the public identification and the corresponding public identification attribute, the user relationship between the user identifications and the behavior relationship between the user identifications and the public identifications from the relational database.
The user identifier uniquely identifies a user and may consist of digits, letters, words, or other characters. The public identifier uniquely identifies a public identity and may likewise consist of digits, letters, words, or other characters. The user relationship between user identifiers may be a mutual friend relationship, a one-way blocking relationship, a one-way blacklisting relationship, a mutual following relationship, or the like. The behavior relationship between a user identifier and a public identifier may be that the user follows the public identity, that the user reads promotion information pushed by the public identity, or that the user blocks the public identity.
Specifically, the computer device may obtain from the relational database a plurality of two-dimensional row-column tables that can be joined with one another, and read from them the user identifiers and corresponding user attributes, the public identifiers and corresponding public identifier attributes, the user relationships between user identifiers, and the behavior relationships between user identifiers and public identifiers.
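Reading these four kinds of records from joinable tables might look like the sketch below, which uses Python's built-in `sqlite3` module with an in-memory database. The table names, column names and most sample rows are invented for illustration. The application does not specify a schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one table per kind of record named in step S302.
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, age INTEGER, gender TEXT);
CREATE TABLE public_ids (public_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE user_relations (src INTEGER, dst INTEGER, relation TEXT);
CREATE TABLE behaviors (user_id INTEGER, public_id INTEGER, behavior TEXT);
INSERT INTO users VALUES (100058, 26, 'male'), (253301, 31, 'female');
INSERT INTO public_ids VALUES (1, 'finance');
INSERT INTO user_relations VALUES (100058, 253301, 'friend');
INSERT INTO behaviors VALUES (100058, 1, 'follows');
""")

# User identifiers with their attributes.
users = conn.execute(
    "SELECT user_id, age, gender FROM users ORDER BY user_id").fetchall()
# Public identifiers with their attributes.
public_ids = conn.execute(
    "SELECT public_id, category FROM public_ids").fetchall()
# User relationships between user identifiers.
relations = conn.execute(
    "SELECT src, dst, relation FROM user_relations").fetchall()
# Behavior relationships, joined with the public identity's category.
behaviors = conn.execute(
    "SELECT b.user_id, b.public_id, b.behavior, p.category "
    "FROM behaviors b JOIN public_ids p ON p.public_id = b.public_id"
).fetchall()
conn.close()
```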
S304, according to the read user identification and the corresponding user attribute, the public identification and the corresponding public identification attribute, the user relationship and the behavior relationship, graph data is constructed.
Specifically, the computer device may construct graph data according to the read user identifier and the corresponding user attribute, the public identifier and the corresponding public identifier attribute, the user relationship, and the behavior relationship, and store the constructed graph data in the graph database.
In one embodiment, the computer device may construct graph data units according to the read user identifiers and corresponding user attributes, the public identifiers and corresponding public identifier attributes, the user relationships between user identifiers, and the behavior relationships between user identifiers and public identifiers. A graph data unit may be represented by a triple with the structure (subject, predicate, object). The computer device may organize the read data into triples to form graph data units, and a plurality of graph data units together constitute the graph data.
For example, the graph data units may be (user 1, user id, 100058), (user 1, age, 26), (user 1, gender, male), (user 1, friend, user 1_1), …, (user 1, friend, user 1_n1), (user 1, reading, article 1), (article 1, article id, 87322544), (article 1, author, user 2), (user 2, user id, 253301), (article 1, published in, public identity 1), (public identity 1, first class, finance), (public identity 1, second class, insurance), and so forth. When the predicate of a triple is age or gender, the object is attribute information. When the predicate is friend, the object is social information. When the predicate is reading, the object is behavior information. When the predicate is a classification, the object is domain knowledge information. As the example shows, in triples whose predicate is age, gender or classification, the predicate names an attribute of the subject's node and the object is the attribute value. Triples whose predicate is friend, reading and the like represent a relationship between the subject and the object. Heterogeneous data can thus be organized as a knowledge graph in triple form, which avoids splitting the data across a large number of different two-dimensional row-column tables as a traditional relational database would.
In the above embodiment, the graph data is constructed from the user identifiers and corresponding user attributes, the public identifiers and corresponding public identifier attributes, the user relationships, and the behavior relationships in the relational database, so that data stored in a large number of two-dimensional row-and-column tables can be recombined and heterogeneous graph data can be constructed conveniently and quickly.
In one embodiment, step S304 specifically includes the following steps:
S402, according to the read user identification and the corresponding user attribute, constructing a user node and a corresponding node attribute in the graph data.
Specifically, the computer device may construct a corresponding user node in the graph data according to the read user identifier. The user nodes and the user identifications are in one-to-one correspondence. That is, when a computer device reads multiple user identities, a corresponding number of user nodes are constructed. And the computer device constructs node attributes of the user nodes in the graph data according to the user attributes corresponding to the user identifications.
S404, according to the read public identification and the corresponding public identification attribute, public identification nodes and corresponding node attributes in the graph data are constructed.
Specifically, the computer device may construct a corresponding public identity node in the graph data according to the read public identity. The public identification nodes and the public identifications are in one-to-one correspondence. That is, when the computer device reads a plurality of public identities, a corresponding number of public identity nodes are constructed. And the computer equipment constructs the public identification node attribute of the public identification node in the graph data according to the public identification attribute corresponding to the public identification.
S406, edges among the user nodes in the graph data are constructed according to the read user relationship.
Specifically, the computer device may construct edges between user nodes in the graph data according to the read user relationships. For example, when the user relationships are a friend relationship, a one-way shielding relationship, a one-way blacking relationship, or a mutual attention relationship, an edge representing the corresponding relationship is constructed between the user nodes. When no relation exists between the users, no edge exists between corresponding user nodes in the graph data.
S408, constructing edges between the user nodes and the public identification nodes in the graph data according to the read behavior relation.
Specifically, the computer device may construct an edge between the user node and the public identity node in the graph data according to the read behavior relationship between the user identity and the public identity. For example, when the behavioral relationship between the user identifier and the public identifier is that the user pays attention to the public identifier, the user reads promotion information pushed by the public identifier, or the user masks the public identifier, an edge representing the corresponding relationship is constructed between the user node and the public identifier node. When no relation exists between the user and the public identification, no edge exists between the corresponding user node and the public identification node in the graph data.
In one embodiment, the computer device may further read the relationship between the public identities, and construct an edge between the public identities and the public identities according to the relationship between the public identities.
In the above embodiment, the user nodes and corresponding node attributes in the graph data are constructed according to the user identifiers and corresponding user attributes, the public identification nodes and corresponding node attributes are constructed according to the public identifiers and corresponding public identifier attributes, and the edges between the nodes are constructed according to the user relationships or the behavior relationships. Graph data constructed in this way fully represents the node attributes of the user nodes and public identification nodes and the relationships between the nodes. Important data spread over multiple relational tables can thus be organized and converted into graph data conveniently and quickly, so that subsequent data mining can proceed smoothly.
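Steps S402 to S408 can be sketched as follows. This is a minimal illustration under assumed input shapes: the function `build_graph`, the dictionary layout, and the sample identifiers are hypothetical and are not prescribed by the embodiment.

```python
def build_graph(users, public_ids, user_relations, behaviors):
    """Build graph data following steps S402 to S408.

    users / public_ids: dicts mapping an identifier to its attribute dict.
    user_relations: iterable of (user_id, relation, user_id).
    behaviors: iterable of (user_id, behavior, public_id).
    """
    graph = {"nodes": {}, "edges": []}
    for uid, attrs in users.items():        # S402: user nodes and attributes
        graph["nodes"][("user", uid)] = dict(attrs)
    for pid, attrs in public_ids.items():   # S404: public identification nodes
        graph["nodes"][("public", pid)] = dict(attrs)
    for a, relation, b in user_relations:   # S406: edges between user nodes
        graph["edges"].append((("user", a), relation, ("user", b)))
    for uid, behavior, pid in behaviors:    # S408: user-to-public edges
        graph["edges"].append((("user", uid), behavior, ("public", pid)))
    return graph


graph = build_graph(
    users={100058: {"age": 26}, 253301: {"age": 31}},
    public_ids={"p1": {"first class": "finance"}},
    user_relations=[(100058, "friend", 253301)],
    behaviors=[(100058, "follows", "p1")],
)
```

Users with no relationship simply contribute no entry to `graph["edges"]`, which matches the rule that unrelated nodes have no edge between them.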
In one embodiment, step S206 specifically includes:
S502, through a machine learning model, based on node attributes and edges among nodes included in the graph data, iteratively calculating the implicit feature vectors corresponding to the nodes in the graph data.
A feature vector is a vector representing features of data. An implicit feature vector is a feature vector that cannot be observed directly but can be inferred from observable variables. The implicit feature vector corresponding to a node reflects the node attribute information related to the node, the edge information in the graph data, and the like, and can fully represent the node. For example, for an arbitrary node $n$, assume there is an implicit feature vector $s_n$ that can fully represent the node. Then $s_n$ may be related to node $n$ itself and to the other nodes $l$ adjacent to node $n$.
Specifically, the computer device may determine the nodes related to a given node based on the edges between the nodes. For example, the computer device may determine the nodes adjacent to the node through the edges. Alternatively, the computer device may also determine the second-degree nodes, third-degree nodes, or other multi-degree nodes of the node according to the edges in the graph data. Through the machine learning model, the computer device can then iteratively calculate the implicit feature vector corresponding to each node in the graph data from the node attributes of that node together with the node attributes of its related nodes.
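Determining adjacent, second-degree, and other multi-degree nodes from the edges can be sketched with a breadth-first search. The adjacency-list layout and the function name `nodes_within_degree` are assumptions made for illustration only.

```python
from collections import deque


def nodes_within_degree(adjacency, start, max_degree):
    """Return {node: degree} for nodes reachable from start in 1..max_degree edges."""
    degree = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if degree[node] == max_degree:
            continue  # do not expand beyond the requested degree
        for neighbor in adjacency.get(node, ()):
            if neighbor not in degree:
                degree[neighbor] = degree[node] + 1
                queue.append(neighbor)
    del degree[start]  # the start node is not its own related node
    return degree


# Hypothetical undirected graph: u1 - u2 - u3, and u1 - p1.
adjacency = {
    "u1": ["u2", "p1"],
    "u2": ["u1", "u3"],
    "u3": ["u2"],
    "p1": ["u1"],
}
related = nodes_within_degree(adjacency, "u1", 2)
```

In this sketch `u2` and `p1` are adjacent (first-degree) nodes of `u1`, while `u3` is a second-degree node because it is reached through `u2`.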
In one embodiment, for any node $n$, assume that the implicit feature vector corresponding to node $n$ is $s_n$; then $s_n$ can fully characterize the data associated with node $n$. The nodes in the graph data can be divided into two types: user nodes and public identification nodes. For a user node $v_i$, assume that the corresponding implicit feature vector is $h_i$; then $h_i$ can fully represent the characteristics of the data associated with user node $v_i$. For a public identification node $u_j$, assume that the corresponding implicit feature vector is $q_j$; then $q_j$ can fully represent the characteristics of the data associated with public identification node $u_j$. Moreover, according to statistical theory, $h_i$ and $q_j$ converge to stable values as the algorithm iterates.
In one embodiment, the computer device may calculate, for each node in the graph data, an implicit feature vector for a current iteration of each node according to the corresponding node attribute, the implicit feature vector for a previous iteration of the node, and the implicit feature vector for a previous iteration of a node adjacent through the edge, through the first neural network of the machine learning model, until the implicit feature vector for the current iteration satisfies an iteration stop condition.
In one embodiment, the computer device may calculate, for each node in the graph data, an implicit feature vector for a current iteration of each node according to the corresponding node attribute, an implicit feature vector for a previous iteration of the node, an implicit feature vector for a previous iteration of a node adjacent to the edge, and an implicit feature vector for a previous iteration of a second degree node of the node, until the implicit feature vector for the current iteration satisfies an iteration stop condition, through the first neural network of the machine learning model.
In one embodiment, when the first neural network of the machine learning model initially calculates the implicit feature vector of each node in the current iteration, the implicit feature vector of the previous iteration of the initial iteration may be assumed to be a random value. That is, at the beginning of the algorithm iteration, an initial random value is set for the implicit feature vector of the previous iteration of the node, the implicit feature vector of the previous iteration of the node adjacent to the edge or the implicit feature vector of the previous iteration of the second-degree node of the node. And performing iterative computation by taking the current implicit characteristic vector of the computed node as the previous implicit characteristic vector of the next iterative computation.
In one embodiment, the iteration stop condition may be that a preset number of iterations is reached, that the implicit feature vectors corresponding to the nodes in the graph data obtained by iterative computation converge to stable values, or that the duration of the iterative computation reaches a preset time, and the like.
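The iterative computation with random initialization and an iteration stop condition can be sketched as follows. This is a minimal illustration under several assumptions that the text does not fix: `tanh` stands in for the mapping of the first neural network, the model parameters are scalars, and the weighted sum pairs `w1` with the node's previous vector, `w2` with the node attribute, and `w3` with the neighbor sum.

```python
import math
import random


def iterate_hidden(attrs, adjacency, w1, w2, w3, dim=2,
                   max_iters=100, tol=1e-6, seed=0):
    """Iterate hidden vectors until they stabilize or max_iters is reached."""
    rng = random.Random(seed)
    # Random initial value for the "previous iteration" of the first pass.
    prev = {n: [rng.uniform(-1, 1) for _ in range(dim)] for n in attrs}
    for iteration in range(1, max_iters + 1):
        curr = {}
        for n in attrs:
            neighbor_sum = [sum(prev[l][k] for l in adjacency.get(n, ()))
                            for k in range(dim)]
            curr[n] = [math.tanh(w1 * prev[n][k] + w2 * attrs[n][k]
                                 + w3 * neighbor_sum[k]) for k in range(dim)]
        change = max(abs(curr[n][k] - prev[n][k])
                     for n in attrs for k in range(dim))
        prev = curr  # current vectors become "previous" for the next pass
        if change < tol:  # stop condition: converged to a stable value
            break
    return prev, iteration


# Hypothetical node attributes and edges.
attrs = {"u1": [1.0, 0.0], "u2": [0.0, 1.0], "p1": [0.5, 0.5]}
adjacency = {"u1": ["u2", "p1"], "u2": ["u1"], "p1": ["u1"]}
hidden, iterations = iterate_hidden(attrs, adjacency, 0.2, 0.5, 0.1)
```

With the small weights chosen here the update is a contraction, so the vectors settle to the same stable values regardless of the random starting point; the `max_iters` bound covers the preset-iteration-count stop condition.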
S504, calculating according to the implicit feature vectors obtained by iterative calculation through the machine learning model, and outputting a behavior prediction result corresponding to the user node in the node.
Specifically, the computer device may input the implicit feature vectors corresponding to the nodes in the graph data, as iteratively calculated by the first neural network of the machine learning model, into a second neural network of the machine learning model. The second neural network performs calculation on the implicit feature vectors and outputs a behavior prediction result corresponding to the user node in the node. The behavior prediction result corresponding to the user node includes a prediction result of the user's own behavior or a prediction result of the user's behavior toward a public identification.
In one embodiment, the computer device may map the implicit feature vectors obtained by iterative computation to the user's own behavior prediction result through the machine learning model. For example, the user's own behavior prediction result $p_i$ can be calculated by the following formula: $p_i = f_1(W_4 h_i)$, where $h_i$ is the implicit feature vector corresponding to user node $v_i$; $W_4$ is a model parameter; and $f_1$ represents a mapping relationship.
In one embodiment, the computer device may map the implicit feature vectors obtained by iterative computation to the prediction result of the user's behavior toward a public identification through the machine learning model. For example, the prediction result $p_{i,j}$ of the user's behavior toward the public identification can be calculated by applying the mapping relationship $f_2$, with model parameters $W_4$ and $W_5$, to the following quantities: $h_i$, the implicit feature vector corresponding to user node $v_i$; $q_j$, the implicit feature vector corresponding to public identification node $u_j$; and $\sum_j q_j$, the sum of the implicit feature vectors of all public identification nodes $u_j$ adjacent to user node $v_i$.
In the above embodiment, the implicit feature vectors corresponding to the nodes in the graph data are iteratively calculated through the machine learning model based on the node attributes included in the graph data and the edges between the nodes, and then the behavior prediction results corresponding to the user nodes are output according to the implicit feature vectors. Therefore, the data characteristics of the graph data are learned through the machine learning model, the structured graph data are converted into the implicit characteristic vectors, and the behavior prediction results corresponding to the user nodes are calculated according to the implicit characteristic vectors, so that the behavior prediction results corresponding to the user nodes are more accurate.
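The mapping from an implicit feature vector to the user's own behavior prediction result, $p_i = f_1(W_4 h_i)$, can be sketched as follows. The choice of the logistic sigmoid for $f_1$ and the representation of $W_4$ as a single weight vector are assumptions made for illustration; the embodiment only states that $f_1$ is a mapping relationship.

```python
import math


def sigmoid(z):
    """Logistic function, used here as an assumed choice for f1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_user_behavior(h_i, w4):
    """p_i = f1(W4 h_i), with W4 taken as one weight per vector component."""
    return sigmoid(sum(w * h for w, h in zip(w4, h_i)))


# Hypothetical implicit feature vector and parameter values.
p = predict_user_behavior([0.5, -0.25], w4=[2.0, 4.0])
```

With a sigmoid output the prediction result can be read as a probability, which fits the later screening step that compares prediction results against thresholds.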
In one embodiment, step S502 specifically includes: for each node in the graph data, calculating, through the machine learning model, the implicit feature vector of the node in the current iteration according to the corresponding node attribute, the implicit feature vector of the node in the previous iteration, and the implicit feature vectors in the previous iteration of the nodes adjacent through edges, until the implicit feature vector of the current iteration meets the iteration stop condition.
Specifically, the computer device may obtain, for each node in the graph data, an implicit feature vector of a previous iteration of the node and an implicit feature vector of a previous iteration of a node adjacent to the edge, respectively, through the trained machine learning model. And calculating to obtain the implicit characteristic vector of each node in the current iteration according to the node attribute corresponding to the node, the implicit characteristic vector of the previous iteration of the node and the implicit characteristic vector of the previous iteration of the node adjacent to the edge through a first neural network of a machine learning model.
The machine learning model then takes the implicit feature vector of the node in the current iteration as that node's previous-iteration implicit feature vector for the next iteration, and likewise takes the current-iteration implicit feature vectors of the nodes adjacent through edges as their previous-iteration implicit feature vectors for the next iteration. Iterative computation continues in this way until the implicit feature vector of the current iteration meets the iteration stop condition.
In the above embodiment, for each node in the graph data, the implicit feature vector of the current iteration of each node is calculated through the machine learning model according to the corresponding node attribute, the implicit feature vector of the previous iteration, and the implicit feature vector of the previous iteration of the node adjacent to the edge until the implicit feature vector of the current iteration meets the iteration stop condition. Through continuous iteration, the calculated implicit characteristic vectors corresponding to the nodes can completely reflect node attributes related to the nodes, edge information in graph data and the like, and the nodes can be fully represented.
In one embodiment, calculating the implicit feature vector of each node in the current iteration according to the corresponding node attribute, the implicit feature vector of the previous iteration, and the implicit feature vectors of the previous iteration of the nodes adjacent through edges includes calculating the implicit feature vector of each node in the current iteration by a formula of the following form:

$$s_n^{t} = f\Big(w_1 s_n^{t-1} + w_2 d_n + w_3 \sum_{l:\, k_{\{n,l\}}=1} s_l^{t-1}\Big)$$

wherein $s_n^{t}$ is the implicit feature vector of node $n$ in the current iteration; $w_1$, $w_2$, and $w_3$ are model parameters; $s_n^{t-1}$ is the implicit feature vector of node $n$ in the previous iteration; $d_n$ is the node attribute corresponding to node $n$; $k_{\{n,l\}} = 1$ indicates that node $n$ and node $l$ are adjacent; $\sum_{l:\, k_{\{n,l\}}=1} s_l^{t-1}$ is the sum of the previous-iteration implicit feature vectors of all nodes $l$ adjacent to node $n$; and $f$ represents a mapping relationship.
Specifically, the machine learning model may apply this formula to the node attribute $d_n$ corresponding to node $n$, the implicit feature vector $s_n^{t-1}$ of the previous iteration, and the previous-iteration implicit feature vectors $s_l^{t-1}$ of the nodes adjacent through edges when calculating the implicit feature vector of each node in the next iteration.
In one embodiment, the nodes $n$ in the graph data comprise user nodes $v_i$ and public identification nodes $u_j$. For a user node $v_i$, the corresponding implicit feature vector is $h_i$. For a public identification node $u_j$, the corresponding implicit feature vector is $q_j$.
For a user node $v_i$, the implicit feature vector of user node $v_i$ in the current iteration can be jointly calculated from the node attribute $x_i$ corresponding to user node $v_i$, the implicit feature vector $h_i^{t-1}$ of user node $v_i$ in the previous iteration, the previous-iteration implicit feature vectors $h_p^{t-1}$ of the user nodes $v_p$ adjacent to user node $v_i$, and the previous-iteration implicit feature vectors $q_j^{t-1}$ of the public identification nodes $u_j$ adjacent to user node $v_i$. Specifically, a formula of the following form can be used:

$$h_i^{t} = f_3\Big(w_1 h_i^{t-1} + w_2 x_i + w_{31} \sum_{p:\, e_{\{i,p\}}=1} h_p^{t-1} + w_{32} \sum_{j:\, m_{\{i,j\}}=1} q_j^{t-1}\Big)$$

wherein $h_i^{t}$ is the implicit feature vector of user node $v_i$ in the current iteration; $w_1$, $w_2$, $w_{31}$, and $w_{32}$ are model parameters; $h_i^{t-1}$ is the implicit feature vector of user node $v_i$ in the previous iteration; $x_i$ is the node attribute corresponding to user node $v_i$; $e_{\{i,p\}} = 1$ indicates that user node $v_i$ and user node $v_p$ are adjacent, for example, user $p$ and user $i$ are friends with each other; $\sum_{p:\, e_{\{i,p\}}=1} h_p^{t-1}$ is the sum of the previous-iteration implicit feature vectors of all user nodes $v_p$ adjacent to user node $v_i$; $m_{\{i,j\}} = 1$ indicates that user node $v_i$ and public identification node $u_j$ are adjacent, for example, user $i$ reads the promotion information pushed by public identification $j$; $\sum_{j:\, m_{\{i,j\}}=1} q_j^{t-1}$ is the sum of the previous-iteration implicit feature vectors of all public identification nodes $u_j$ adjacent to user node $v_i$; and $f_3$ represents a mapping relationship.
For a public identification node $u_j$, the implicit feature vector of public identification node $u_j$ in the current iteration can be jointly calculated from the node attribute $g_j$ corresponding to public identification node $u_j$, the implicit feature vector $q_j^{t-1}$ of public identification node $u_j$ in the previous iteration, and the previous-iteration implicit feature vectors $h_i^{t-1}$ of the user nodes $v_i$ adjacent to public identification node $u_j$. Specifically, a formula of the following form can be used:

$$q_j^{t} = f_4\Big(w_1 q_j^{t-1} + w_2 g_j + w_3 \sum_{i:\, m_{\{i,j\}}=1} h_i^{t-1}\Big)$$

wherein $q_j^{t}$ is the implicit feature vector of public identification node $u_j$ in the current iteration; $w_1$, $w_2$, and $w_3$ are model parameters; $q_j^{t-1}$ is the implicit feature vector of public identification node $u_j$ in the previous iteration; $g_j$ is the node attribute corresponding to public identification node $u_j$; $m_{\{i,j\}} = 1$ indicates that user node $v_i$ and public identification node $u_j$ are adjacent, for example, user $i$ reads the promotion information pushed by public identification $j$; $\sum_{i:\, m_{\{i,j\}}=1} h_i^{t-1}$ is the sum of the previous-iteration implicit feature vectors of all user nodes $v_i$ adjacent to public identification node $u_j$; and $f_4$ represents a mapping relationship.
In the above embodiment, the implicit characteristic vector of each node in the current iteration is iteratively calculated according to the corresponding formula, and through continuous iteration, the calculated implicit characteristic vector corresponding to the node can completely reflect the node attribute related to the node, the edge information in the graph data, and the like, so that the node can be fully represented.
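One update step for a user node and for a public identification node can be sketched as follows. This is a sketch under stated assumptions rather than the exact formulas of the embodiment: `tanh` stands in for the mappings $f_3$ and $f_4$, the model parameters are scalars, and `w1` is paired with the node's own previous vector and `w2` with its node attribute.

```python
import math


def vec_sum(vectors, dim):
    """Element-wise sum of a list of vectors (zero vector when empty)."""
    return [sum(v[k] for v in vectors) for k in range(dim)]


def update_user(h_prev, x, friend_h_prev, public_q_prev, w1, w2, w31, w32):
    """One update of a user node's implicit feature vector h_i."""
    dim = len(h_prev)
    friends = vec_sum(friend_h_prev, dim)  # sum over adjacent user nodes
    publics = vec_sum(public_q_prev, dim)  # sum over adjacent public nodes
    return [math.tanh(w1 * h_prev[k] + w2 * x[k]
                      + w31 * friends[k] + w32 * publics[k])
            for k in range(dim)]


def update_public(q_prev, g, user_h_prev, w1, w2, w3):
    """One update of a public identification node's implicit feature vector q_j."""
    dim = len(q_prev)
    users = vec_sum(user_h_prev, dim)      # sum over adjacent user nodes
    return [math.tanh(w1 * q_prev[k] + w2 * g[k] + w3 * users[k])
            for k in range(dim)]


# Hypothetical one-dimensional vectors for a user with one friend and one public node.
h_next = update_user([0.0], [1.0], [[0.5]], [[0.5]], 0.0, 1.0, 1.0, -1.0)
q_next = update_public([0.0], [0.0], [[1.0], [1.0]], 0.0, 0.0, 0.5)
```

Keeping separate weights `w31` and `w32` lets the contribution of adjacent user nodes differ from that of adjacent public identification nodes, which is the point of distinguishing the two node types.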
In one embodiment, when the behavior prediction result is the predicted probability of a user's behavior toward a public identification, step S208 includes: screening user nodes and corresponding public identification nodes from the user nodes and public identification nodes included in the graph data, where the behavior prediction result jointly corresponding to each screened user node and its corresponding public identification node meets the data mining condition. The graph data-based data mining method further comprises: executing, for the screened user node, a service operation related to the screened public identification node.
The service operation is to perform service processing, for example, to push popularization information of the public identity corresponding to the public identity node to a user corresponding to the user node. The user behavior prediction probability for the public identity may be a prediction probability for a user behavior for the public identity, for example, a probability that the user i reads promotion information of the public identity j, or a probability that the user i purchases a transaction product provided by the public identity j.
Specifically, when the behavior prediction result is the predicted probability of a user's behavior toward a public identification, the computer device screens user nodes and corresponding public identification nodes from the user nodes and public identification nodes included in the graph data, such that the behavior prediction result jointly corresponding to each screened user node and its corresponding public identification node meets the data mining condition. For example, the computer device screens out the user node and public identification node whose predicted probability is greater than or equal to a third threshold, or whose predicted probability is the maximum among all of that user's predicted probabilities for public identification behaviors.
After the computer device screens out the user nodes and the corresponding public identification nodes, the behavior prediction results of which accord with the data mining conditions, the business operation related to the screened public identification nodes is executed aiming at the screened user nodes. For example, the promotion information of the public identity corresponding to the screened public identity node is pushed to the user corresponding to the screened user node.
In the above embodiment, by screening the user node whose behavior prediction result meets the data mining condition and the corresponding public identity node, the service operation related to the screened public identity node can be executed for the screened user node, so as to implement the service operation related to both the user node and the public identity node.
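The two example data mining conditions, a probability threshold and a per-user maximum, can be sketched as follows. The function name `screen_pairs`, the dictionary layout, and the probability values are hypothetical.

```python
def screen_pairs(predictions, threshold=None):
    """Select (user, public) pairs from {(user, public): probability}.

    With a threshold, keep pairs whose probability is >= threshold.
    Without one, keep each user's highest-probability pair.
    """
    if threshold is not None:
        return sorted(pair for pair, p in predictions.items() if p >= threshold)
    best = {}
    for (user, public), p in predictions.items():
        if user not in best or p > best[user][1]:
            best[user] = (public, p)
    return sorted((user, public) for user, (public, _) in best.items())


# Hypothetical predicted probabilities of user behavior toward public identifications.
predictions = {
    ("u1", "p1"): 0.91, ("u1", "p2"): 0.40,
    ("u2", "p1"): 0.15, ("u2", "p2"): 0.55,
}
above = screen_pairs(predictions, threshold=0.6)
top = screen_pairs(predictions)
```

Each returned pair identifies a user node together with the public identification node for which a service operation, such as pushing promotion information, would then be executed.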
As shown in FIG. 6, in a specific embodiment, the graph data-based data mining method comprises the following steps:
S602, reading, from the relational database, the user identifications and corresponding user attributes, the public identifications and corresponding public identification attributes, the user relationships between the user identifications, and the behavior relationships between the user identifications and the public identifications.
S604, according to the read user identification and the corresponding user attribute, constructing a user node and a corresponding node attribute in the graph data.
S606, constructing public identification nodes and corresponding node attributes in the graph data according to the read public identifications and the corresponding public identification attributes.
S608, according to the read user relation, constructing edges among the user nodes in the graph data.
S610, according to the read behavior relation, constructing edges between the user nodes and the public identification nodes in the graph data.
S612, inputting the graph data into the trained machine learning model.
S614, for each node in the graph data, calculating, through the machine learning model, the implicit feature vector of the node in the current iteration according to the corresponding node attribute, the implicit feature vector of the previous iteration, and the implicit feature vectors of the previous iteration of the nodes adjacent through edges, until the implicit feature vector of the current iteration meets the iteration stop condition.
S616, calculating according to the implicit feature vectors obtained by iterative calculation through the machine learning model, and outputting a behavior prediction result corresponding to the user node in the node.
S618, screening, from the user nodes in the graph data, the user nodes whose corresponding behavior prediction results meet the data mining condition.
According to the data mining method, the data mining device, the computer readable storage medium and the computer equipment based on the graph data, the graph data including the node attributes and the edges among the nodes are input into the trained machine learning model, and the behavior prediction result corresponding to the user node in the node is determined through the trained machine learning model. The nodes comprise user nodes and public identification nodes. Because the graph data comprises the node attributes and the edges among the nodes, the trained machine learning model can make full use of the relationship information among the nodes in the graph data, the attribute information of the corresponding nodes and the like. The relationship information between nodes, such as the relationship information between user nodes and user nodes, the relationship information between user nodes and public identification nodes, can fully show the behavior habits or preferences of users. Therefore, when the trained machine learning model analyzes the graph data, comprehensive and accurate data characteristics can be extracted, and an accurate behavior prediction result corresponding to the user node is obtained. And then according to the behavior prediction result, screening the user nodes which accord with the data mining condition, wherein the screened user nodes are the potential valuable user nodes excavated, and the accuracy of the data mining result is greatly improved.
FIG. 6 is a flowchart of a graph data-based data mining method in one embodiment. It should be understood that although the steps in the flowchart of FIG. 6 are shown in sequence as indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in FIG. 6 may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
As shown in FIG. 7, in one embodiment, a model training method for data mining is provided. The embodiment is mainly illustrated by applying the method to the computer device in fig. 1. Referring to fig. 7, the model training method for data mining specifically includes the following steps:
S702, acquiring a graph data sample and a corresponding label; the graph data sample comprises sample node attributes and edges among the sample nodes, wherein the sample nodes comprise user sample nodes and public identification sample nodes.
A graph data sample is graph data used as a training sample, and the corresponding label is the label of that graph data sample. The graph data sample includes user sample nodes and public identification sample nodes, sample node attributes corresponding to the sample nodes, and edges between the sample nodes. The label corresponding to a graph data sample may be a label corresponding to a user sample node or a label corresponding to an edge between sample nodes. A label corresponding to a user sample node indicates, for example, that the user clicked promotion information or that the user defaulted on a loan. A label corresponding to an edge between sample nodes indicates, for example, that user i read promotion information of public identification j, or that user i purchased a transaction product provided by public identification j.
Specifically, the computer device may obtain the graph data samples and corresponding labels from a local graph database, or obtain graph data samples and corresponding labels stored by other devices, such as a graph database system, through network communication or the like. In one embodiment, one graph data sample may correspond to multiple labels.
S704, inputting the graph data sample into a machine learning model.
In particular, the computer device may input the acquired graph data samples into a machine learning model.
In one embodiment, the computer device may obtain a graph data sample composed of a plurality of user sample nodes, a plurality of public identification sample nodes, and the edges between the sample nodes. The graph data sample is input into the machine learning model, and the machine learning model extracts the required data and stores it in an HDFS (Hadoop Distributed File System) distributed storage environment. The computer device may store the model parameters on the parameter service nodes of a parameter server, so that the model parameters can be updated iteratively and quickly during model training.
S706, determining an intermediate behavior prediction result corresponding to the user sample node in the sample node based on the sample node attribute and the edges among the sample nodes included in the graph data sample through a machine learning model.
The intermediate behavior prediction result is the behavior prediction result corresponding to a user sample node that the machine learning model outputs, during model training, after the graph data sample is input into it. The intermediate behavior prediction result corresponding to the user sample node may specifically be an intermediate prediction result of the sample user's own behavior or an intermediate prediction result of the sample user's behavior toward a sample public identification. For example, it may be an intermediate predicted classification result for a user sample node, or an intermediate predicted classification result for an edge between sample nodes.
Specifically, after the computer device inputs the graph data sample into the machine learning model, the machine learning model can determine the relationships between sample nodes according to the edges between the sample nodes included in the graph data sample. For example, once the machine learning model has selected a user sample node, it may determine the user sample nodes and/or public identification sample nodes adjacent to that user sample node according to the edges between the sample nodes. The machine learning model can make full use of the sample node attributes and of the user sample nodes and/or public identification sample nodes related to a sample node to determine the intermediate behavior prediction result corresponding to the user sample node. A sample node related to a given sample node may be a sample node adjacent to it, or may be a second-degree or other multi-degree sample node of it. A second-degree sample node is a sample node adjacent to an adjacent sample node of the given sample node, and a multi-degree sample node is a sample node connected to the given sample node through a plurality of edges.
S708, according to the difference between the intermediate behavior prediction result and the label, adjusting the model parameters of the machine learning model and continuing training until the training stop condition is met, at which point the training ends.
The training stop condition is the condition for ending the model training. The training stop condition may be that a preset number of iterations is reached, or that the classification performance index of the machine learning model after the model parameters are adjusted reaches a preset index. Adjusting the model parameters means updating the parameter values of the machine learning model.
Specifically, the computer device may compare the intermediate behavior prediction result with the label and adjust the model parameters of the machine learning model, at a preset learning rate, in the direction that reduces the difference between them. If the training stop condition is not met after the model parameters are adjusted, the process returns to step S706 and training continues until the training stop condition is met, at which point the training ends.
In one embodiment, the difference between the intermediate behavior prediction result and the label may be measured by a loss function. The loss function is a function of model parameters, which can measure the difference between the intermediate behavior prediction result of the machine learning model and the label. The computer device may end the training when the value of the loss function is less than a preset value, resulting in a machine learning model for classifying the graph data. Functions such as cross entropy or mean square error may be selected as the loss function.
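As an illustration of the two loss functions named above and of a stop test based on the loss value, a minimal sketch follows. The function names, the binary-label assumption, and the default thresholds are hypothetical choices made for this example only.

```python
import math

def cross_entropy(predictions, labels, eps=1e-12):
    """Summed binary cross-entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(predictions, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def mean_squared_error(predictions, labels):
    """Average squared difference between predictions and labels."""
    return sum((p - y) ** 2 for p, y in zip(predictions, labels)) / len(labels)

def should_stop(loss_value, iteration, loss_threshold=0.05, max_iterations=100):
    """Stop when the loss falls below a preset value or the iteration budget is spent."""
    return loss_value < loss_threshold or iteration >= max_iterations
```

Either loss can be passed to the same stop test; which one suits a given task depends on whether the labels are class indicators or continuous values.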
In one embodiment, when the intermediate behavior prediction result is an intermediate prediction result of the sample user's own behavior, let $p_i$ denote the intermediate prediction result of the sample user's own behavior and $y_i$ the corresponding label of the graph data sample. The loss function is then $L^t = \sum_i l(p_i^t, y_i)$, where the intermediate prediction result $p_i^t$ of the current iteration for the sample user's own behavior depends on the model parameters $w^{t-1}$ from the previous iteration.
In one embodiment, when the intermediate behavior prediction result is an intermediate prediction result of a sample user's behavior with respect to a sample public identity, let $p_{i,j}$ denote that intermediate prediction result and $y_{i,j}$ the corresponding label of the graph data sample. The loss function is then $L^t = \sum_{i,j} l(p_{i,j}^t, y_{i,j})$, where the intermediate prediction result $p_{i,j}^t$ of the current iteration depends on the model parameters $w^{t-1}$ from the previous iteration.
In one embodiment, the computer device may update the parameter $W^t$ by applying gradient descent to the loss function, where Δ is the learning rate of the gradient descent and may be determined by experience, cross-validation, or the like. During training of the machine learning model, training ends when the training stop condition is met. The $W^t$ obtained at that point is saved as the trained model parameters of the machine learning model.
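The gradient descent update with learning rate Δ can be sketched on a deliberately small stand-in model. The example below assumes a one-parameter logistic model and a summed cross-entropy loss, neither of which is specified by the patent; the names `train`, `learning_rate`, and `loss_threshold` are hypothetical, and the parameter starts at zero rather than a random value so that the run is reproducible.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def summed_cross_entropy(preds, labels):
    return -sum(y * math.log(p) + (1 - y) * math.log(1.0 - p)
                for p, y in zip(preds, labels))

def train(xs, ys, learning_rate=0.5, max_iterations=200, loss_threshold=0.1):
    """Gradient descent on a one-parameter logistic model p = sigmoid(w * x)."""
    w = 0.0  # assumption: fixed start for reproducibility
    history = []
    for _ in range(max_iterations):
        preds = [sigmoid(w * x) for x in xs]
        loss = summed_cross_entropy(preds, ys)
        history.append(loss)
        if loss < loss_threshold:
            break  # training stop condition: loss below a preset value
        # Gradient of the summed cross-entropy with respect to w.
        gradient = sum((p - y) * x for p, y, x in zip(preds, ys, xs))
        w -= learning_rate * gradient  # step against the gradient
    return w, history

xs = [2.0, -2.0, 1.5, -1.0]
ys = [1, 0, 1, 0]
w, history = train(xs, ys)
```

The loop ends either when the loss drops below the preset value or when the preset number of iterations is reached, mirroring the two stop conditions described earlier.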
The model training method for data mining inputs a graph data sample comprising sample node attributes and edges among sample nodes into a machine learning model, and determines an intermediate behavior prediction result corresponding to the user sample nodes through the machine learning model. The sample nodes comprise user sample nodes and public identification sample nodes. Because the graph data sample comprises the sample node attributes and the edges among the sample nodes, the machine learning model can make full use of the relationship information among the sample nodes in the graph data sample, the attribute information of the corresponding sample nodes and the like. The relationship information between the sample nodes, such as the relationship information between the user sample nodes and the user sample nodes, the relationship information between the user sample nodes and the public identification sample nodes, can fully show the behavior habits or the preferences of the sample users. Therefore, when the machine learning model analyzes the graph data sample, comprehensive and accurate data characteristics can be extracted, and then model parameters of the machine learning model are continuously adjusted and training is continued according to the difference between the intermediate behavior prediction result and the corresponding label of the graph data sample until the training stopping condition is met, so that the training is finished. The machine learning model trained in the way can predict the accurate behavior result of the user, so that the accuracy and effectiveness of model training are greatly improved, and the accuracy of the subsequent data mining result is further improved.
In one embodiment, step S706 specifically includes the steps of: iteratively calculating corresponding implicit characteristic vectors of the sample nodes in the graph data samples through a machine learning model based on the sample node attributes included in the graph data samples and edges among the sample nodes; and calculating according to the implicit characteristic vector obtained by iterative calculation through a machine learning model, and outputting an intermediate behavior prediction result corresponding to the user sample node in the sample node.
In particular, the computer device may determine sample nodes associated with the sample nodes based on edges between the sample nodes. For example, the computer device may determine sample nodes that are adjacent to the sample node by an edge. Alternatively, the computer device may also determine a two-degree sample node, a three-degree sample node, or other multi-degree sample nodes of the sample node, etc. according to edges in the graph data sample. The computer equipment can jointly iterate and calculate the corresponding implicit characteristic vector of each sample node in the graph data sample according to the sample node attribute of the sample node included in the graph data sample and the sample node attribute of the sample node related to the sample node through a machine learning model. And the computer equipment calculates according to the implicit characteristic vector obtained by iterative calculation and outputs an intermediate behavior prediction result corresponding to the user sample node in the sample node.
In one embodiment, the computer device calculates, for each sample node in the graph data sample, an implicit feature vector of a current iteration of each sample node according to a corresponding sample node attribute, an implicit feature vector of a previous iteration, and an implicit feature vector of a previous iteration of a sample node adjacent to an edge, in combination with a model parameter obtained by previous adjustment, through a first neural network of the machine learning model. And calculating according to the implicit characteristic vector obtained by iterative calculation, and outputting an intermediate behavior prediction result corresponding to the user sample node in the sample node of the current iteration. And adjusting model parameters of the machine learning model according to the difference between the intermediate behavior prediction result of the current iteration and the label, and continuing training until the training stopping condition is met.
In one embodiment, the computer device may jointly calculate, for each sample node in the graph data sample through the first neural network of the machine learning model, an implicit feature vector of a current iteration of each sample node according to a corresponding sample node attribute, an implicit feature vector of a previous iteration of a sample node adjacent to the edge, and an implicit feature vector of a previous iteration of a second degree node of the sample node, in combination with the model parameter obtained through previous adjustment. And calculating according to the implicit characteristic vector obtained by iterative calculation, and outputting an intermediate behavior prediction result corresponding to the user sample node in the sample node of the current iteration. And adjusting model parameters of the machine learning model according to the difference between the intermediate behavior prediction result of the current iteration and the label, and continuing training until the training stopping condition is met.
In one embodiment, when the first neural network of the machine learning model calculates the implicit feature vector of each sample node for the first time, the implicit feature vectors of the iteration preceding the initial iteration may be set to random values. That is, at the start of the iteration, an initial random value is assigned to the previous-iteration implicit feature vector of the sample node itself and to the previous-iteration implicit feature vectors of its edge-adjacent sample nodes or its two-degree sample nodes. In each subsequent iteration, the implicit feature vector just calculated for a sample node serves as the previous implicit feature vector for the next iteration.
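The random initialization and the hand-over of vectors from one iteration to the next can be sketched as follows. This is an illustrative assumption rather than the patented network: the scalar weights, the `tanh` mapping, the additive combination, and the function name `iterate_hidden` are all hypothetical.

```python
import math
import random

def iterate_hidden(attributes, edges, weights, num_iterations, dim=2, seed=0):
    """Iterate implicit feature vectors, starting from random 'previous' vectors."""
    rng = random.Random(seed)
    neighbors = {n: [] for n in attributes}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)

    # Before the first iteration there is no previous vector, so use random values.
    hidden = {n: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for n in attributes}
    w_attr, w_self, w_neigh = weights
    for _ in range(num_iterations):
        current = {}
        for n in attributes:
            # Sum of the previous-iteration vectors of the edge-adjacent nodes.
            neigh_sum = [sum(hidden[l][k] for l in neighbors[n]) for k in range(dim)]
            current[n] = [
                math.tanh(w_attr * attributes[n][k]
                          + w_self * hidden[n][k]
                          + w_neigh * neigh_sum[k])
                for k in range(dim)
            ]
        hidden = current  # current vectors become the previous vectors next time
    return hidden

attributes = {"u1": [1.0, 0.0], "u2": [0.0, 1.0], "p1": [0.5, 0.5]}
edges = [("u1", "u2"), ("u1", "p1")]
hidden = iterate_hidden(attributes, edges, (1.0, 0.5, 0.25), num_iterations=3)
```

All vectors of one iteration are computed from the previous iteration's vectors before any of them is replaced, which matches the description of using previous-iteration values for both the node and its neighbors.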
In the above embodiment, through a machine learning model, based on the sample node attributes included in the graph data sample and the edges between the sample nodes, the implicit feature vector corresponding to the sample node in the graph data is iteratively calculated, and then the intermediate behavior prediction result corresponding to the user sample node is output according to the implicit feature vector. Therefore, the data characteristics of the graph data samples are learned through the machine learning model, the structured graph data samples are converted into the implicit characteristic vectors, and the intermediate behavior prediction results corresponding to the user sample nodes are calculated according to the implicit characteristic vectors, so that the intermediate behavior prediction results corresponding to the user sample nodes are more accurate.
In one embodiment, the step of iteratively calculating, by a machine learning model, implicit feature vectors corresponding to sample nodes in the graph data sample based on the sample node attributes included in the graph data sample and edges between the sample nodes specifically includes: and calculating the implicit characteristic vector of each sample node in the current iteration of each sample node according to the corresponding sample node attribute, the implicit characteristic vector of the previous iteration and the implicit characteristic vector of the previous iteration of the sample node adjacent to the edge through a machine learning model and in combination with the model parameter obtained by the previous adjustment. Calculating according to the implicit characteristic vector obtained by iterative calculation through a machine learning model, and outputting an intermediate behavior prediction result corresponding to a user sample node in the sample node specifically comprises the following steps: and calculating according to the implicit characteristic vector obtained by the current iterative calculation through a machine learning model, and outputting an intermediate behavior prediction result corresponding to the user sample node in the sample node.
Specifically, the computer device may calculate, for each sample node in the graph data sample, the implicit feature vector of the current iteration of each sample node according to the corresponding sample node attribute, the implicit feature vector of the previous iteration, and the implicit feature vector of the previous iteration of the sample node adjacent to the edge, in combination with the model parameter obtained by the previous adjustment, through the machine learning model. And calculating according to the implicit characteristic vector obtained by the current iterative calculation through a machine learning model, and outputting an intermediate behavior prediction result corresponding to the user sample node in the sample node. And adjusting model parameters of the machine learning model according to the difference between the intermediate behavior prediction result of the current iteration and the label, and continuing training until the training stopping condition is met.
In one embodiment, the computer device may calculate the implicit feature vector of the current iteration of each sample node n by applying a mapping f to the following quantities: the model parameters obtained by the previous adjustment; the implicit feature vector of the previous iteration of sample node n; the node attribute $d_n$ corresponding to sample node n; and the sum of the implicit feature vectors of the previous iteration of all sample nodes l adjacent to sample node n, where $k_{\{n,l\}} = 1$ denotes that sample node n is adjacent to sample node l.
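One form that is consistent with the quantities listed above is given below as a worked illustration. It is an assumption, not the patent's own formula: the symbol $h$ for the implicit feature vector, the parameter names $w_1, w_2, w_3$ with iteration superscripts, and the additive combination inside $f$ are all chosen for this sketch.

```latex
h_n^{t} = f\left( w_1^{t-1}\, d_n \;+\; w_2^{t-1}\, h_n^{t-1} \;+\; w_3^{t-1} \sum_{l \,:\, k_{\{n,l\}} = 1} h_l^{t-1} \right)
```

Under this reading, $h_n^{t}$ depends on the node's own attribute, its own previous vector, and the summed previous vectors of its edge-adjacent nodes, each weighted by a parameter from the previous adjustment.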
In one embodiment, for a user sample node, the computer device may calculate the implicit feature vector of the current iteration of the user sample node by applying a mapping $f_3$ to the following quantities: the model parameters obtained by the previous adjustment; the implicit feature vector of the previous iteration of the user sample node; the node attribute $x_i$ corresponding to the user sample node; the sum of the implicit feature vectors of the previous iteration of all user sample nodes adjacent to the user sample node; and the sum of the implicit feature vectors of the previous iteration of all public identity sample nodes adjacent to the user sample node.
In one embodiment, for a public identity sample node, the computer device may calculate the implicit feature vector of the current iteration of the public identity sample node by applying a mapping $f_4$ to the following quantities: the model parameters obtained by the previous adjustment; the implicit feature vector of the previous iteration of the public identity sample node; the node attribute $g_j$ corresponding to the public identity sample node; and the sum of the implicit feature vectors of the previous iteration of all user sample nodes adjacent to the public identity sample node.
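The separate updates for user sample nodes and public identity sample nodes can be sketched with scalar implicit features. Everything concrete here is an assumption made for illustration: the `tanh` mappings standing in for $f_3$ and $f_4$, the additive combination, the parameter tuples, and the function names are hypothetical.

```python
import math

def update_user(x_i, h_prev, user_neighbor_sum, public_neighbor_sum, params):
    """Stand-in for f_3: own attribute, own previous value, and two neighbor sums."""
    a, b, c, d = params
    return math.tanh(a * x_i + b * h_prev
                     + c * user_neighbor_sum + d * public_neighbor_sum)

def update_public(g_j, h_prev, user_neighbor_sum, params):
    """Stand-in for f_4: own attribute, own previous value, adjacent-user sum."""
    a, b, c = params
    return math.tanh(a * g_j + b * h_prev + c * user_neighbor_sum)

def step(user_attr, public_attr, user_edges, behavior_edges,
         h_user, h_public, p_user, p_public):
    """One iteration over a graph with user-user and user-public edges."""
    new_user = {}
    for i in user_attr:
        friends = (sum(h_user[b] for a, b in user_edges if a == i)
                   + sum(h_user[a] for a, b in user_edges if b == i))
        followed = sum(h_public[j] for u, j in behavior_edges if u == i)
        new_user[i] = update_user(user_attr[i], h_user[i], friends, followed, p_user)
    new_public = {}
    for j in public_attr:
        followers = sum(h_user[u] for u, pj in behavior_edges if pj == j)
        new_public[j] = update_public(public_attr[j], h_public[j], followers, p_public)
    return new_user, new_public

user_attr = {"u1": 1.0, "u2": 0.0}
public_attr = {"p1": 0.5}
new_user, new_public = step(
    user_attr, public_attr,
    user_edges=[("u1", "u2")], behavior_edges=[("u1", "p1")],
    h_user={"u1": 0.0, "u2": 0.0}, h_public={"p1": 0.0},
    p_user=(1.0, 1.0, 1.0, 1.0), p_public=(1.0, 1.0, 1.0),
)
```

Both node types read only previous-iteration values, so the order in which user nodes and public identity nodes are processed within one step does not affect the result.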
In one embodiment, the computer device calculates the intermediate behavior prediction result from the implicit feature vector of the current iteration of the user sample node and the implicit feature vector of the current iteration of the public identity sample node. The result may be an intermediate prediction result of the sample user's own behavior or an intermediate prediction result of the sample user's behavior with respect to a sample public identity. The intermediate prediction result of the sample user's own behavior at the current iteration may be calculated by applying a mapping $f_1$ to the implicit feature vector of the current iteration of the user sample node, together with the model parameter obtained by the previous adjustment. The intermediate prediction result of the sample user's behavior with respect to a sample public identity at the current iteration may be calculated by applying a mapping $f_2$ to the implicit feature vector of the current iteration of the user sample node, the implicit feature vector of the current iteration of the public identity sample node, and the sum of the implicit feature vectors of the current iteration of all public identity sample nodes adjacent to the user sample node, together with the model parameters obtained by the previous adjustment.
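Turning implicit feature vectors into prediction probabilities can be sketched as below. The logistic sigmoid standing in for $f_1$ and $f_2$, the linear scoring, and the function names are assumptions for illustration; in particular, the pair score here omits the neighbor-sum term mentioned in the text, because its role is not recoverable from the description.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_own_behavior(h_user, w):
    """Stand-in for f_1: probability from the user's implicit feature vector."""
    return sigmoid(sum(wk * hk for wk, hk in zip(w, h_user)))

def predict_pair_behavior(h_user, h_public, w_user, w_public):
    """Simplified stand-in for f_2: probability for a (user, public identity) pair."""
    score = (sum(a * u for a, u in zip(w_user, h_user))
             + sum(b * p for b, p in zip(w_public, h_public)))
    return sigmoid(score)

p_own = predict_own_behavior([0.0, 0.0], [1.0, 1.0])
p_pair = predict_pair_behavior([0.4, 0.1], [0.3, 0.2], [1.0, 1.0], [1.0, 1.0])
```

A sigmoid output lies strictly between 0 and 1, which makes it directly comparable with a 0/1 label in a cross-entropy loss.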
Further, when the intermediate behavior prediction result is the prediction result of the sample user's own behavior, the loss function of the current iteration may be expressed as $L^t = \sum_i l(p_i^t, y_i)$. When the intermediate behavior prediction result is the prediction result of the sample user's behavior with respect to a sample public identity, the loss function of the current iteration may be expressed as $L^t = \sum_{i,j} l(p_{i,j}^t, y_{i,j})$. The parameter $W^t$ is then updated by applying gradient descent to the loss function. Training ends when the training stop condition is met, and the model parameters obtained at that point are saved.
In one embodiment, the training stop condition is that the number of iterative computations reaches a preset number, such as T times. And in each iterative calculation period, the computer equipment calculates the data of the current iteration according to the data of the previous iterative calculation, and adjusts the model parameters according to the difference between the intermediate behavior prediction result and the label until the iterative calculation reaches the preset times.
In the above embodiment, for each sample node in the graph data sample, the implicit feature vector of the current iteration of each sample node is calculated through the machine learning model according to the corresponding sample node attribute, the implicit feature vector of the previous iteration, and the implicit feature vector of the previous iteration of the sample node adjacent to the edge, in combination with the model parameter obtained through the previous adjustment. And calculating and outputting an intermediate behavior prediction result corresponding to the user sample node in the sample node according to the implicit characteristic vector obtained by the current iterative calculation. Through continuous iteration, the calculated implicit characteristic vector corresponding to the sample node can completely reflect the sample node attribute related to the sample node, the edge information in the graph data sample and the like, and the sample node can be fully represented. In addition, in the model training process, the model parameters required to be updated are far less than those required to be updated in the traditional model training, and the model training efficiency is greatly improved.
As shown in FIG. 8, in a specific embodiment, a model training method for data mining includes the steps of:
S802, acquiring a graph data sample and a corresponding label; the graph data sample comprises sample node attributes and edges among the sample nodes, and the sample nodes comprise user sample nodes and public identity sample nodes.
S804, inputting the graph data sample into a machine learning model.
S806, for each sample node in the graph data sample, calculating, through the machine learning model, the implicit feature vector of the current iteration of the sample node according to the corresponding sample node attribute, the implicit feature vector of the previous iteration, and the implicit feature vectors of the previous iteration of the edge-adjacent sample nodes, in combination with the model parameters obtained by the previous adjustment.
S808, calculating, through the machine learning model, from the implicit feature vectors obtained in the current iteration, and outputting an intermediate behavior prediction result corresponding to the user sample nodes among the sample nodes.
S810, adjusting the model parameters of the machine learning model according to the difference between the intermediate behavior prediction result and the label, and continuing training until the training stop condition is met.
The model training method for data mining inputs a graph data sample comprising sample node attributes and edges among sample nodes into a machine learning model, and determines an intermediate behavior prediction result corresponding to the user sample nodes through the machine learning model. The sample nodes comprise user sample nodes and public identification sample nodes. Because the graph data sample comprises the sample node attributes and the edges among the sample nodes, the machine learning model can make full use of the relationship information among the sample nodes in the graph data sample, the attribute information of the corresponding sample nodes and the like. The relationship information between the sample nodes, such as the relationship information between the user sample nodes and the user sample nodes, the relationship information between the user sample nodes and the public identification sample nodes, can fully show the behavior habits or the preferences of the sample users. Therefore, when the machine learning model analyzes the graph data sample, comprehensive and accurate data characteristics can be extracted, and then model parameters of the machine learning model are continuously adjusted and training is continued according to the difference between the intermediate behavior prediction result and the corresponding label of the graph data sample until the training stopping condition is met, so that the training is finished. The machine learning model trained in the way can predict the accurate behavior result of the user node, so that the accuracy and effectiveness of model training are greatly improved, and the accuracy of the subsequent data mining result is further improved.
FIG. 8 is a flow diagram of a model training method for data mining according to one embodiment. It should be understood that, although the steps in the flowchart of FIG. 8 are shown in sequence as indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated otherwise herein, the order of the steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in FIG. 8 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; nor are these sub-steps or stages necessarily performed sequentially, as they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In a specific application scenario, a user can log in to a social application through a user account, and the platform where a public identity is located can push related promotional information, such as advertisements, news, or transaction links, to the relevant users through the social application. The server of the social application records the user behavior and stores it in a relational database.
For example, a user may log in to the WeChat application through a WeChat account and choose to follow a certain public identity, such as a public number. The platform where the public number is located can push messages such as advertisements, articles, or news to all users who follow the public number. The user can read articles or click advertisements pushed by the public number. After logging in, the user can also manage accounts on other platforms through mini programs in WeChat, purchase online and offline products through WeChat Pay, and borrow money or manage finances through mini programs. The background server of the WeChat application may record and store the corresponding user behavior.
As shown in FIG. 9, FIG. 9 illustrates a graph data-based data mining system architecture in one embodiment. The computer device may obtain data about user behavior stored in the relational database from a backend server of a corresponding social application, reassemble the data according to the structure of the knowledge graph, and store the data in a graph data format into a graph database.
The computer device selects samples and corresponding labels from the graph database and trains the machine learning model through the parameter learning system. The parameter w is initialized randomly, and the model parameter w is adjusted and updated continuously during training until the training stop condition is met, at which point training ends and the trained model parameters are obtained. The recommendation system or decision system then acquires the latest graph data from the graph database and determines the user-related behavior prediction result through the trained machine learning model.
The user-related behavior prediction result may include the probability that the user clicks on an advertisement, purchases a product, or defaults on a loan. Based on this result, the recommendation system or decision system may show advertisements only to users whose probability of clicking on an advertisement, of purchasing a product, or both is greater than a threshold, and may deny loans to users whose probability of loan default is greater than a threshold.
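Threshold-based screening of this kind can be sketched in a few lines. The user identifiers, probabilities, and threshold values below are made-up illustration data, and `screen_users` is a hypothetical helper name.

```python
def screen_users(predictions, threshold):
    """Return the users whose predicted probability exceeds the threshold."""
    return sorted(user for user, p in predictions.items() if p > threshold)

# Hypothetical behavior prediction results per user.
click_prob = {"u1": 0.82, "u2": 0.10, "u3": 0.55}
default_prob = {"u1": 0.05, "u2": 0.70, "u3": 0.20}

ad_audience = screen_users(click_prob, threshold=0.5)    # users shown the advertisement
loan_denied = screen_users(default_prob, threshold=0.6)  # users refused a loan
```

The same helper serves both decisions; only the prediction source and the threshold differ.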
The user-related behavior prediction result may also include the probability that a certain user purchases a product offered by a certain public number. Based on this result, the recommendation system or decision system can, for any user, select and recommend the public-number product that the user has the highest probability of purchasing.
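Selecting the highest-probability public number for a given user amounts to an argmax over that user's pair predictions. The sketch below assumes the predictions are stored in a dictionary keyed by `(user, public_number)`; that layout and the name `recommend` are hypothetical.

```python
def recommend(pair_probabilities, user):
    """Return the public number whose product the user is most likely to buy."""
    candidates = {pub: p for (u, pub), p in pair_probabilities.items() if u == user}
    if not candidates:
        return None  # no prediction available for this user
    return max(candidates, key=candidates.get)

# Hypothetical purchase probabilities for (user, public number) pairs.
pair_probabilities = {
    ("u1", "p1"): 0.30,
    ("u1", "p2"): 0.65,
    ("u2", "p1"): 0.40,
}
best_for_u1 = recommend(pair_probabilities, "u1")
```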
As shown in FIG. 10, in one embodiment, there is provided a graph data-based data mining apparatus 1000, comprising: an obtaining module 1001, an input module 1002, a determining module 1003, and a screening module 1004.
An obtaining module 1001 configured to obtain graph data; the graph data comprises node attributes and edges among nodes, and the nodes comprise user nodes and public identification nodes.
An input module 1002 is configured to input graph data into the trained machine learning model.
The determining module 1003 is configured to determine, through a machine learning model, a behavior prediction result corresponding to a user node in a node based on node attributes included in the graph data and edges between the nodes.
The screening module 1004 is configured to screen, from the user nodes in the graph data, user nodes whose corresponding behavior prediction results meet the data mining conditions.
The data mining device based on the graph data inputs the graph data including the node attributes and the edges among the nodes into the trained machine learning model, and determines the behavior prediction result corresponding to the user nodes in the nodes through the trained machine learning model. The nodes comprise user nodes and public identification nodes. Because the graph data comprises the node attributes and the edges among the nodes, the trained machine learning model can make full use of the relationship information among the nodes in the graph data, the attribute information of the corresponding nodes and the like. The relationship information between nodes, such as the relationship information between user nodes and user nodes, the relationship information between user nodes and public identification nodes, can fully show the behavior habits or preferences of users. Therefore, when the trained machine learning model analyzes the graph data, comprehensive and accurate data characteristics can be extracted, and an accurate behavior prediction result corresponding to the user node is obtained. And then according to the behavior prediction result, screening the user nodes which accord with the data mining condition, wherein the screened user nodes are the potential valuable user nodes excavated, and the accuracy of the data mining result is greatly improved.
As shown in fig. 11, in one embodiment, the obtaining module 1001 includes a reading module 1101 and a constructing module 1102:
A reading module 1101, configured to read, from a relational database, user identifiers and corresponding user attributes, public identifiers and corresponding public identifier attributes, user relationships between user identifiers, and behavior relationships between user identifiers and public identifiers.
A constructing module 1102, configured to construct graph data according to the read user identifiers and corresponding user attributes, public identifiers and corresponding public identifier attributes, user relationships, and behavior relationships.
In the embodiment, the graph data is constructed according to the user identification and the corresponding user attribute, the public identification and the corresponding public identification attribute, the user relationship and the behavior relationship in the relational database, the data stored in a large number of two-dimensional row-column tables can be recombined, and the graph data with the heterogeneous structure can be conveniently and quickly constructed.
In one embodiment, the constructing module 1102 is further configured to construct the user nodes and corresponding node attributes in the graph data according to the read user identifiers and corresponding user attributes; to construct the public identity nodes and corresponding node attributes in the graph data according to the read public identifiers and corresponding public identifier attributes; to construct the edges among the user nodes in the graph data according to the read user relationships; and to construct the edges between the user nodes and the public identity nodes in the graph data according to the read behavior relationships.
In the above embodiment, the user node and the corresponding node attribute in the graph data are constructed according to the user identifier and the corresponding user attribute. And constructing public identification nodes and corresponding node attributes in the graph data according to the public identifications and the corresponding public identification attributes. And respectively constructing edges among the nodes in the graph data according to the user relationship or the behavior relationship. The graph data constructed in the way can fully represent respective node attributes of the user node and the public identification node and the relationship between the nodes, and can conveniently and quickly organize important data in a plurality of relational data to convert the important data into corresponding graph data, so that subsequent data mining can be smoothly carried out.
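The construction of graph data from relational rows can be sketched as follows. The row layouts, the tagged node keys such as `("user", "u1")`, and the function name `build_graph` are assumptions for this example; the patent does not prescribe a storage format.

```python
def build_graph(users, publics, user_relations, behavior_relations):
    """Assemble nodes with attributes and edges from rows of relational tables."""
    nodes = {}
    for user_id, attrs in users:
        nodes[("user", user_id)] = attrs        # user node and its attributes
    for public_id, attrs in publics:
        nodes[("public", public_id)] = attrs    # public identity node and attributes
    edges = []
    for a, b in user_relations:
        edges.append((("user", a), ("user", b)))                   # user-user edge
    for user_id, public_id in behavior_relations:
        edges.append((("user", user_id), ("public", public_id)))   # user-public edge
    return {"nodes": nodes, "edges": edges}

# Hypothetical rows read from a relational database.
graph = build_graph(
    users=[("u1", {"age": 30}), ("u2", {"age": 25})],
    publics=[("p1", {"category": "news"})],
    user_relations=[("u1", "u2")],
    behavior_relations=[("u1", "p1")],
)
```

Tagging each node key with its type keeps user identifiers and public identifiers from colliding and makes the heterogeneous structure explicit.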
As shown in fig. 12, in one embodiment, the determining module 1003 includes a calculating module 1201 and an outputting module 1202:
The calculating module 1201 is configured to iteratively calculate, through a machine learning model, the implicit feature vectors corresponding to the nodes in the graph data based on the node attributes and the edges between the nodes included in the graph data.
The output module 1202 is configured to perform calculation, through the machine learning model, on the implicit feature vectors obtained by iterative calculation, and to output a behavior prediction result corresponding to the user nodes among the nodes.
In the above embodiment, the implicit feature vectors corresponding to the nodes in the graph data are iteratively calculated through the machine learning model based on the node attributes included in the graph data and the edges between the nodes, and then the behavior prediction results corresponding to the user nodes are output according to the implicit feature vectors. Therefore, the data characteristics of the graph data are learned through the machine learning model, the structured graph data are converted into the implicit characteristic vectors, and the behavior prediction results corresponding to the user nodes are calculated according to the implicit characteristic vectors, so that the behavior prediction results corresponding to the user nodes are more accurate.
In one embodiment, the calculating module 1201 is further configured to calculate, for each node in the graph data and through the machine learning model, the implicit feature vector of the current iteration of the node according to the corresponding node attribute, the implicit feature vector of the previous iteration, and the implicit feature vectors of the previous iteration of the edge-adjacent nodes, until the implicit feature vector of the current iteration meets the iteration stop condition.
In the above embodiment, for each node in the graph data, the implicit feature vector of the current iteration of each node is calculated through the machine learning model according to the corresponding node attribute, the implicit feature vector of the previous iteration, and the implicit feature vector of the previous iteration of the node adjacent to the edge until the implicit feature vector of the current iteration meets the iteration stop condition. Through continuous iteration, the calculated implicit characteristic vectors corresponding to the nodes can completely reflect node attributes related to the nodes, edge information in graph data and the like, and the nodes can be fully represented.
In one embodiment, the calculation module is further configured to calculate the implicit feature vector for each node in the current iteration by the following formula:
wherein $h_n^{(t)}$ is the implicit feature vector of node $n$ in the current iteration; $w_1$, $w_2$ and $w_3$ are model parameters; $h_n^{(t-1)}$ is the implicit feature vector of node $n$ in the previous iteration; $d_n$ is the node attribute corresponding to node $n$; $k_{\{n,l\}} = 1$ means that node $n$ and node $l$ are adjacent; $\sum_{k_{\{n,l\}}=1} h_l^{(t-1)}$ is the sum of the previous-iteration implicit feature vectors of all nodes $l$ adjacent to node $n$; and $f$ represents a mapping relationship.
In the above embodiment, the implicit characteristic vector of each node in the current iteration is iteratively calculated according to the corresponding formula, and through continuous iteration, the calculated implicit characteristic vector corresponding to the node can completely reflect the node attribute related to the node, the edge information in the graph data, and the like, so that the node can be fully represented.
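As an illustration of this kind of iterative update, the sketch below combines the node attribute, the node's own previous vector, and the sum of its neighbors' previous vectors, and passes the result through a mapping. Several points are assumptions made for the example and are not stated in the text: the three terms are combined as a weighted sum with scalar weights `w1`, `w2`, `w3`; the mapping `f` is taken to be `tanh`; the attribute vector and the implicit feature vector have the same dimension; and the stop condition is a small change between iterations or a maximum iteration count. The function name `update_hidden` is hypothetical.

```python
import math


def update_hidden(attrs, neighbors, w1, w2, w3, dim=2, max_iters=50, tol=1e-6):
    """Iterate implicit feature vectors until they stop changing (assumed stop rule).

    attrs:     node -> attribute vector of length `dim`
    neighbors: node -> list of adjacent nodes
    """
    hidden = {n: [0.0] * dim for n in attrs}  # assumed zero initialisation
    for _ in range(max_iters):
        new_hidden = {}
        for n, d_n in attrs.items():
            new_hidden[n] = [
                math.tanh(  # assumed choice of the mapping f
                    w1 * d_n[i]
                    + w2 * hidden[n][i]
                    + w3 * sum(hidden[l][i] for l in neighbors[n])
                )
                for i in range(dim)
            ]
        # Largest component-wise change between the previous and current iteration.
        delta = max(
            abs(new_hidden[n][i] - hidden[n][i]) for n in attrs for i in range(dim)
        )
        hidden = new_hidden
        if delta < tol:
            break
    return hidden


attrs = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
neighbors = {"a": ["b"], "b": ["a"]}
one_step = update_hidden(attrs, neighbors, 0.5, 0.1, 0.1, max_iters=1)
converged = update_hidden(attrs, neighbors, 0.5, 0.1, 0.1)
```

Every node is updated from the previous iteration's vectors only, so the order in which nodes are visited within one iteration does not affect the result.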
As shown in fig. 13, in one embodiment, when the behavior prediction result is the predicted probability of a user's behavior toward a public identification, the screening module 1004 is further configured to screen user nodes and corresponding public identification nodes from among the user nodes and public identification nodes included in the graph data, such that the behavior prediction result corresponding jointly to a screened user node and its corresponding public identification node meets the data mining condition. The graph data-based data mining apparatus 1000 further includes an executing module 1005, configured to execute, for the screened user node, a service operation related to the screened public identification node.
In the above embodiment, by screening out the user nodes and corresponding public identification nodes whose behavior prediction results meet the data mining condition, a service operation related to the screened public identification node can be executed for the screened user node, thereby implementing a service operation that involves both the user node and the public identification node.
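A minimal sketch of this screening step is given below. It assumes, purely for illustration, that the data mining condition is a probability threshold and that the service operation is a recommendation message; the function names `screen_pairs` and `run_service_operation`, the threshold value, and the sample probabilities are all hypothetical.

```python
def screen_pairs(predictions, threshold=0.8):
    """Keep (user, public identification) pairs whose predicted probability
    meets the assumed data mining condition `probability >= threshold`."""
    return sorted(pair for pair, p in predictions.items() if p >= threshold)


def run_service_operation(pairs, operation):
    """Apply a caller-supplied service operation to each screened pair."""
    return [operation(user, public) for user, public in pairs]


# Hypothetical prediction results: (user node, public identification node) -> probability.
predictions = {
    ("u1", "p1"): 0.91,
    ("u1", "p2"): 0.35,
    ("u2", "p1"): 0.82,
}
selected = screen_pairs(predictions)
messages = run_service_operation(
    selected, lambda user, public: f"recommend {public} to {user}"
)
```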
As shown in fig. 14, in one embodiment, a model training apparatus 1400 for data mining is provided, comprising: an obtaining module 1401, an input module 1402, a determining module 1403, and an adjusting module 1404.
An obtaining module 1401, configured to obtain a graph data sample and a corresponding label; the graph data sample comprises sample node attributes and edges among the sample nodes, wherein the sample nodes comprise user sample nodes and public identification sample nodes.
An input module 1402 for inputting the graph data samples into a machine learning model.
A determining module 1403, configured to determine, through a machine learning model, an intermediate behavior prediction result corresponding to a user sample node in a sample node based on the sample node attribute included in the graph data sample and the edge between the sample nodes.
And an adjusting module 1404, configured to adjust model parameters of the machine learning model according to a difference between the intermediate behavior prediction result and the label, and continue training until the training stop condition is met.
The model training method for data mining inputs a graph data sample comprising sample node attributes and edges among sample nodes into a machine learning model, and determines an intermediate behavior prediction result corresponding to the user sample nodes through the machine learning model. The sample nodes comprise user sample nodes and public identification sample nodes. Because the graph data sample comprises the sample node attributes and the edges among the sample nodes, the machine learning model can make full use of the relationship information among the sample nodes in the graph data sample, the attribute information of the corresponding sample nodes and the like. The relationship information between the sample nodes, such as the relationship information between the user sample nodes and the user sample nodes, the relationship information between the user sample nodes and the public identification sample nodes, can fully show the behavior habits or the preferences of the sample users. Therefore, when the machine learning model analyzes the graph data sample, comprehensive and accurate data characteristics can be extracted, and then model parameters of the machine learning model are continuously adjusted and training is continued according to the difference between the intermediate behavior prediction result and the corresponding label of the graph data sample until the training stopping condition is met, so that the training is finished. The machine learning model trained in the way can predict the accurate behavior result of the user, so that the accuracy and effectiveness of model training are greatly improved, and the accuracy of the subsequent data mining result is further improved.
As shown in fig. 15, in one embodiment, the determining module 1403 includes a calculating module 1501 and an output module 1502.
The calculating module 1501 is configured to iteratively calculate, through the machine learning model, the implicit feature vectors corresponding to the sample nodes in the graph data sample based on the sample node attributes included in the graph data sample and the edges between the sample nodes.
The output module 1502 is configured to perform calculation according to the implicit feature vector obtained through iterative calculation through a machine learning model, and output an intermediate behavior prediction result corresponding to the user sample node in the sample node.
In the above embodiment, through a machine learning model, based on the sample node attributes included in the graph data sample and the edges between the sample nodes, the implicit feature vector corresponding to the sample node in the graph data is iteratively calculated, and then the intermediate behavior prediction result corresponding to the user sample node is output according to the implicit feature vector. Therefore, the data characteristics of the graph data samples are learned through the machine learning model, the structured graph data samples are converted into the implicit characteristic vectors, and the intermediate behavior prediction results corresponding to the user sample nodes are calculated according to the implicit characteristic vectors, so that the intermediate behavior prediction results corresponding to the user sample nodes are more accurate.
In an embodiment, the calculating module 1501 is further configured to calculate, through a machine learning model, for each sample node in the graph data sample, an implicit feature vector of the current iteration of each sample node according to the corresponding sample node attribute, the implicit feature vector of the previous iteration, and the implicit feature vector of the previous iteration of the sample node adjacent to the edge, in combination with the model parameter obtained through the previous adjustment. The output module 1502 is further configured to perform calculation according to the implicit feature vector obtained by the current iterative calculation through a machine learning model, and output an intermediate behavior prediction result corresponding to the user sample node in the sample node.
In the above embodiment, for each sample node in the graph data sample, the implicit feature vector of the current iteration of each sample node is calculated through the machine learning model according to the corresponding sample node attribute, the implicit feature vector of the previous iteration, and the implicit feature vector of the previous iteration of the sample node adjacent to the edge, in combination with the model parameter obtained through the previous adjustment. And calculating and outputting an intermediate behavior prediction result corresponding to the user sample node in the sample node according to the implicit characteristic vector obtained by the current iterative calculation. Through continuous iteration, the calculated implicit characteristic vector corresponding to the sample node can completely reflect the sample node attribute related to the sample node, the edge information in the graph data sample and the like, and the sample node can be fully represented. In addition, in the model training process, the model parameters required to be updated are far less than those required to be updated in the traditional model training, and the model training efficiency is greatly improved.
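The interleaving of one iteration of the implicit feature vectors with one parameter adjustment can be sketched as follows. This is an illustrative toy, not the training procedure of the application: it assumes scalar attributes and scalar implicit features, a `tanh` mapping, a sigmoid output with an extra hypothetical parameter `v`, a mean squared difference between prediction and label, and a finite-difference gradient step, none of which are specified in the text. All function names and sample values are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def step_hidden(params, attrs, neighbors, hidden):
    # One iteration of the implicit features using the current parameters.
    w1, w2, w3, _ = params
    return {
        n: math.tanh(
            w1 * attrs[n] + w2 * hidden[n] + w3 * sum(hidden[l] for l in neighbors[n])
        )
        for n in attrs
    }


def loss_for(params, attrs, neighbors, hidden, labels):
    # Intermediate predictions for labelled user sample nodes and their squared error.
    new_hidden = step_hidden(params, attrs, neighbors, hidden)
    v = params[3]
    errors = [(sigmoid(v * new_hidden[n]) - y) ** 2 for n, y in labels.items()]
    return sum(errors) / len(errors), new_hidden


def train(attrs, neighbors, labels, lr=1.0, max_steps=200, target_loss=0.01, eps=1e-5):
    params = [0.1, 0.1, 0.1, 0.1]           # assumed initial w1, w2, w3, v
    hidden = {n: 0.0 for n in attrs}
    losses = []
    for _ in range(max_steps):
        loss, new_hidden = loss_for(params, attrs, neighbors, hidden, labels)
        losses.append(loss)
        if loss < target_loss:               # assumed training stop condition
            break
        grads = []
        for i in range(len(params)):         # central finite differences
            up = params[:i] + [params[i] + eps] + params[i + 1:]
            down = params[:i] + [params[i] - eps] + params[i + 1:]
            l_up = loss_for(up, attrs, neighbors, hidden, labels)[0]
            l_down = loss_for(down, attrs, neighbors, hidden, labels)[0]
            grads.append((l_up - l_down) / (2 * eps))
        params = [p - lr * g for p, g in zip(params, grads)]
        hidden = new_hidden                  # carried into the next iteration
    return params, losses


attrs = {"u1": 1.0, "u2": -1.0, "p1": 0.5}
neighbors = {"u1": ["u2", "p1"], "u2": ["u1"], "p1": ["u1"]}
labels = {"u1": 1.0, "u2": 0.0}              # labels exist only for user sample nodes
params, losses = train(attrs, neighbors, labels)
```

In this sketch only four scalar parameters are adjusted, which loosely mirrors the remark above that few model parameters need updating; the actual parameter shapes are not given in the text.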
In one embodiment, the graph data-based data mining apparatus and/or the model training apparatus for data mining provided herein may be implemented in the form of a computer program that is executable on a computer device such as that shown in fig. 1. The memory of the computer device may store the program modules that make up the graph data-based data mining apparatus and/or the model training apparatus for data mining. For example, the acquisition module, the input module, the determining module, and the screening module shown in fig. 10 constitute a computer program that causes the processor to execute the steps of the graph data-based data mining method of the embodiments of the present application described in this specification. As another example, the obtaining module, the input module, the determining module, and the adjusting module shown in fig. 14 constitute a computer program that causes the processor to perform the steps of the model training method for data mining of the embodiments of the present application described in this specification.
For example, the computer device shown in fig. 1 may execute step S202 through the acquisition module in the graph data-based data mining apparatus shown in fig. 10. The computer device may perform step S204 through the input module, step S206 through the determining module, and step S208 through the screening module.
For example, the computer device shown in fig. 1 may execute step S702 through the obtaining module in the model training apparatus for data mining shown in fig. 14. The computer device may perform step S704 through the input module, step S706 through the determining module, and step S708 through the adjusting module.
In one embodiment, there is provided a computer device comprising a memory and a processor, the memory having stored therein a computer program that, when executed by the processor, causes the processor to perform the steps of: acquiring graph data; the graph data comprises node attributes and edges among the nodes, and the nodes comprise user nodes and public identification nodes; inputting graph data into a trained machine learning model; determining a behavior prediction result corresponding to a user node in the node based on the node attribute and the edges among the nodes included in the graph data through a machine learning model; and screening the user nodes of which the corresponding behavior prediction results meet the data mining conditions from the user nodes in the graph data.
In one embodiment, the computer program causes the processor, when executing the step of obtaining the graph data, to perform in particular the steps of: reading user identification and corresponding user attribute, public identification and corresponding public identification attribute, user relationship between the user identifications and behavior relationship between the user identifications and the public identifications from a relational database; and constructing graph data according to the read user identification and the corresponding user attribute, the public identification and the corresponding public identification attribute, the user relationship and the behavior relationship.
In one embodiment, the computer program causes the processor to specifically perform the following steps when executing the step of constructing graph data according to the read user identifier and corresponding user attribute, public identifier and corresponding public identifier attribute, user relationship and behavior relationship: according to the read user identification and the corresponding user attribute, constructing a user node and a corresponding node attribute in the graph data; according to the read public identification and the corresponding public identification attribute, public identification nodes and corresponding node attributes in the graph data are constructed; according to the read user relationship, edges among the user nodes in the graph data are constructed; and according to the read behavior relation, constructing edges between the user nodes and the public identification nodes in the graph data.
In one embodiment, the computer program causes the processor to specifically perform the following steps when executing the step of determining the behavior prediction result corresponding to the user node in the node based on the node attribute included in the graph data and the edge between the nodes by using the machine learning model: iteratively calculating corresponding implicit characteristic vectors of the nodes in the graph data based on node attributes and edges among the nodes included in the graph data through a machine learning model; and calculating according to the implicit characteristic vector obtained by iterative calculation through a machine learning model, and outputting a behavior prediction result corresponding to the user node in the node.
In one embodiment, the computer program causes the processor to specifically perform the following steps when performing the step of iteratively calculating, by the machine learning model, the implicit feature vector corresponding to a node in the graph data based on the node attribute included in the graph data and the edge between the nodes: and calculating the implicit characteristic vector of each node in the current iteration according to the corresponding node attribute, the implicit characteristic vector of the previous iteration and the implicit characteristic vector of the previous iteration of the node adjacent to the edge for each node in the graph data through a machine learning model until the implicit characteristic vector of the current iteration meets the iteration stop condition.
In one embodiment, the implicit feature vector for each node's current iteration is calculated by the following formula:
wherein $h_n^{(t)}$ is the implicit feature vector of node $n$ in the current iteration; $w_1$, $w_2$ and $w_3$ are model parameters; $h_n^{(t-1)}$ is the implicit feature vector of node $n$ in the previous iteration; $d_n$ is the node attribute corresponding to node $n$; $k_{\{n,l\}} = 1$ means that node $n$ and node $l$ are adjacent; $\sum_{k_{\{n,l\}}=1} h_l^{(t-1)}$ is the sum of the previous-iteration implicit feature vectors of all nodes $l$ adjacent to node $n$; and $f$ represents a mapping relationship.
In one embodiment, when the behavior prediction result is the predicted probability of a user's behavior toward a public identification, the computer program causes the processor, when executing the step of screening, from among the user nodes included in the graph data, the user nodes whose corresponding behavior prediction results meet the data mining condition, to specifically perform the following step: screening user nodes and corresponding public identification nodes from among the user nodes and public identification nodes included in the graph data, where the behavior prediction result corresponding jointly to a screened user node and its corresponding public identification node meets the data mining condition. The computer program further causes the processor to perform the following step: executing, for the screened user node, a service operation related to the screened public identification node.
The computer device inputs graph data including node attributes and edges between nodes into the trained machine learning model, and determines the behavior prediction results corresponding to the user nodes among the nodes through the trained machine learning model. The nodes comprise user nodes and public identification nodes. Because the graph data comprises the node attributes and the edges between the nodes, the trained machine learning model can make full use of the relationship information between the nodes in the graph data, the attribute information of the corresponding nodes, and so on. The relationship information between nodes, such as that between user nodes and that between user nodes and public identification nodes, can fully reflect the behavior habits or preferences of users. Therefore, when the trained machine learning model analyzes the graph data, comprehensive and accurate data features can be extracted, and accurate behavior prediction results corresponding to the user nodes are obtained. The user nodes that meet the data mining condition are then screened out according to the behavior prediction results; the screened user nodes are the mined, potentially valuable user nodes, which greatly improves the accuracy of the data mining result.
In one embodiment, there is provided a computer device comprising a memory and a processor, the memory having stored therein a computer program that, when executed by the processor, causes the processor to perform the steps of: acquiring a graph data sample and a corresponding label; the graph data sample comprises sample node attributes and edges among the sample nodes, wherein the sample nodes comprise user sample nodes and public identification sample nodes; inputting graph data samples into a machine learning model; determining an intermediate behavior prediction result corresponding to a user sample node in the sample node based on the sample node attribute and the edges among the sample nodes included in the graph data sample through a machine learning model; and adjusting model parameters of the machine learning model according to the difference between the intermediate behavior prediction result and the label, and continuing training until the training stopping condition is met.
In one embodiment, the computer program causes the processor to specifically perform the following steps when executing the step of determining an intermediate behavior prediction result corresponding to a user sample node in a sample node based on sample node attributes included in a graph data sample and edges between the sample nodes through a machine learning model: iteratively calculating corresponding implicit characteristic vectors of the sample nodes in the graph data samples through a machine learning model based on the sample node attributes included in the graph data samples and edges among the sample nodes; and calculating according to the implicit characteristic vector obtained by iterative calculation through a machine learning model, and outputting an intermediate behavior prediction result corresponding to the user sample node in the sample node.
In one embodiment, the computer program causes the processor, when executing the step of iteratively calculating, through the machine learning model, the implicit feature vectors corresponding to the sample nodes in the graph data sample based on the sample node attributes included in the graph data sample and the edges between the sample nodes, to specifically perform the following step: calculating, through the machine learning model and for each sample node in the graph data sample, the implicit feature vector of the sample node in the current iteration according to the corresponding sample node attribute, the sample node's implicit feature vector from the previous iteration, and the previous-iteration implicit feature vectors of the sample nodes adjacent to it via an edge, in combination with the model parameters obtained by the previous adjustment. The computer program causes the processor, when executing the step of performing calculation, through the machine learning model, on the implicit feature vectors obtained by the iterative calculation and outputting the intermediate behavior prediction results corresponding to the user sample nodes among the sample nodes, to specifically perform the following step: performing calculation, through the machine learning model, on the implicit feature vectors obtained by the current iterative calculation, and outputting the intermediate behavior prediction results corresponding to the user sample nodes among the sample nodes.
The computer equipment inputs the graph data samples comprising the sample node attributes and the edges among the sample nodes into the machine learning model, and determines the intermediate behavior prediction result corresponding to the user sample node through the machine learning model. The sample nodes comprise user sample nodes and public identification sample nodes. Because the graph data sample comprises the sample node attributes and the edges among the sample nodes, the machine learning model can make full use of the relationship information among the sample nodes in the graph data sample, the attribute information of the corresponding sample nodes and the like. The relationship information between the sample nodes, such as the relationship information between the user sample nodes and the user sample nodes, the relationship information between the user sample nodes and the public identification sample nodes, can fully show the behavior habits or the preferences of the sample users. Therefore, when the machine learning model analyzes the graph data sample, comprehensive and accurate data characteristics can be extracted, and then model parameters of the machine learning model are continuously adjusted and training is continued according to the difference between the intermediate behavior prediction result and the corresponding label of the graph data sample until the training stopping condition is met, so that the training is finished. The machine learning model trained in the way can predict the accurate behavior result of the user node, so that the accuracy and effectiveness of model training are greatly improved, and the accuracy of the subsequent data mining result is further improved.
In one embodiment, a computer-readable storage medium is provided, storing a computer program that, when executed by a processor, causes the processor to perform the following steps: acquiring graph data, where the graph data comprises node attributes and edges between the nodes, and the nodes comprise user nodes and public identification nodes; inputting the graph data into a trained machine learning model; determining, through the machine learning model, the behavior prediction results corresponding to the user nodes among the nodes based on the node attributes and the edges between the nodes included in the graph data; and screening, from among the user nodes in the graph data, the user nodes whose corresponding behavior prediction results meet the data mining condition.
In one embodiment, the computer program causes the processor, when executing the step of obtaining the graph data, to perform in particular the steps of: reading user identification and corresponding user attribute, public identification and corresponding public identification attribute, user relationship between the user identifications and behavior relationship between the user identifications and the public identifications from a relational database; and constructing graph data according to the read user identification and the corresponding user attribute, the public identification and the corresponding public identification attribute, the user relationship and the behavior relationship.
In one embodiment, the computer program causes the processor to specifically perform the following steps when executing the step of constructing graph data according to the read user identifier and corresponding user attribute, public identifier and corresponding public identifier attribute, user relationship and behavior relationship: according to the read user identification and the corresponding user attribute, constructing a user node and a corresponding node attribute in the graph data; according to the read public identification and the corresponding public identification attribute, public identification nodes and corresponding node attributes in the graph data are constructed; according to the read user relationship, edges among the user nodes in the graph data are constructed; and according to the read behavior relation, constructing edges between the user nodes and the public identification nodes in the graph data.
In one embodiment, the computer program causes the processor to specifically perform the following steps when executing the step of determining the behavior prediction result corresponding to the user node in the node based on the node attribute included in the graph data and the edge between the nodes by using the machine learning model: iteratively calculating corresponding implicit characteristic vectors of the nodes in the graph data based on node attributes and edges among the nodes included in the graph data through a machine learning model; and calculating according to the implicit characteristic vector obtained by iterative calculation through a machine learning model, and outputting a behavior prediction result corresponding to the user node in the node.
In one embodiment, the computer program causes the processor to specifically perform the following steps when performing the step of iteratively calculating, by the machine learning model, the implicit feature vector corresponding to a node in the graph data based on the node attribute included in the graph data and the edge between the nodes: and calculating the implicit characteristic vector of each node in the current iteration according to the corresponding node attribute, the implicit characteristic vector of the previous iteration and the implicit characteristic vector of the previous iteration of the node adjacent to the edge for each node in the graph data through a machine learning model until the implicit characteristic vector of the current iteration meets the iteration stop condition.
In one embodiment, the implicit feature vector for each node's current iteration is calculated by the following formula:
wherein $h_n^{(t)}$ is the implicit feature vector of node $n$ in the current iteration; $w_1$, $w_2$ and $w_3$ are model parameters; $h_n^{(t-1)}$ is the implicit feature vector of node $n$ in the previous iteration; $d_n$ is the node attribute corresponding to node $n$; $k_{\{n,l\}} = 1$ means that node $n$ and node $l$ are adjacent; $\sum_{k_{\{n,l\}}=1} h_l^{(t-1)}$ is the sum of the previous-iteration implicit feature vectors of all nodes $l$ adjacent to node $n$; and $f$ represents a mapping relationship.
In one embodiment, when the behavior prediction result is the predicted probability of a user's behavior toward a public identification, the computer program causes the processor, when executing the step of screening, from among the user nodes included in the graph data, the user nodes whose corresponding behavior prediction results meet the data mining condition, to specifically perform the following step: screening user nodes and corresponding public identification nodes from among the user nodes and public identification nodes included in the graph data, where the behavior prediction result corresponding jointly to a screened user node and its corresponding public identification node meets the data mining condition. The computer program further causes the processor to perform the following step: executing, for the screened user node, a service operation related to the screened public identification node.
With the above computer-readable storage medium, graph data including node attributes and edges between nodes is input into the trained machine learning model, and the behavior prediction results corresponding to the user nodes among the nodes are determined through the trained machine learning model. The nodes comprise user nodes and public identification nodes. Because the graph data comprises the node attributes and the edges between the nodes, the trained machine learning model can make full use of the relationship information between the nodes in the graph data, the attribute information of the corresponding nodes, and so on. The relationship information between nodes, such as that between user nodes and that between user nodes and public identification nodes, can fully reflect the behavior habits or preferences of users. Therefore, when the trained machine learning model analyzes the graph data, comprehensive and accurate data features can be extracted, and accurate behavior prediction results corresponding to the user nodes are obtained. The user nodes that meet the data mining condition are then screened out according to the behavior prediction results; the screened user nodes are the mined, potentially valuable user nodes, which greatly improves the accuracy of the data mining result.
In one embodiment, a computer-readable storage medium is provided, storing a computer program that, when executed by a processor, causes the processor to perform the following steps: acquiring a graph data sample and a corresponding label, where the graph data sample comprises sample node attributes and edges between the sample nodes, and the sample nodes comprise user sample nodes and public identification sample nodes; inputting the graph data sample into a machine learning model; determining, through the machine learning model, the intermediate behavior prediction results corresponding to the user sample nodes among the sample nodes based on the sample node attributes and the edges between the sample nodes included in the graph data sample; and adjusting the model parameters of the machine learning model according to the difference between the intermediate behavior prediction results and the label, and continuing training until the training stop condition is met.
In one embodiment, the computer program causes the processor to specifically perform the following steps when executing the step of determining an intermediate behavior prediction result corresponding to a user sample node in a sample node based on sample node attributes included in a graph data sample and edges between the sample nodes through a machine learning model: iteratively calculating corresponding implicit characteristic vectors of the sample nodes in the graph data samples through a machine learning model based on the sample node attributes included in the graph data samples and edges among the sample nodes; and calculating according to the implicit characteristic vector obtained by iterative calculation through a machine learning model, and outputting an intermediate behavior prediction result corresponding to the user sample node in the sample node.
In one embodiment, the computer program causes the processor, when executing the step of iteratively calculating, through the machine learning model, the implicit feature vectors corresponding to the sample nodes in the graph data sample based on the sample node attributes included in the graph data sample and the edges between the sample nodes, to specifically perform the following step: calculating, through the machine learning model and for each sample node in the graph data sample, the implicit feature vector of the sample node in the current iteration according to the corresponding sample node attribute, the sample node's implicit feature vector from the previous iteration, and the previous-iteration implicit feature vectors of the sample nodes adjacent to it via an edge, in combination with the model parameters obtained by the previous adjustment. The computer program causes the processor, when executing the step of performing calculation, through the machine learning model, on the implicit feature vectors obtained by the iterative calculation and outputting the intermediate behavior prediction results corresponding to the user sample nodes among the sample nodes, to specifically perform the following step: performing calculation, through the machine learning model, on the implicit feature vectors obtained by the current iterative calculation, and outputting the intermediate behavior prediction results corresponding to the user sample nodes among the sample nodes.
With the above computer-readable storage medium, a graph data sample that includes sample node attributes and edges between sample nodes is input into the machine learning model, and the intermediate behavior prediction result corresponding to the user sample node is determined through the machine learning model. The sample nodes include user sample nodes and public identification sample nodes. Because the graph data sample includes both the sample node attributes and the edges between the sample nodes, the machine learning model can make full use of the relationship information between the sample nodes and the attribute information of each sample node. The relationship information between sample nodes, such as the relationships between user sample nodes and the relationships between user sample nodes and public identification sample nodes, fully reflects the behavior habits and preferences of the sample users. The machine learning model can therefore extract comprehensive and accurate data features when analyzing the graph data sample. The model parameters of the machine learning model are then adjusted according to the difference between the intermediate behavior prediction result and the label corresponding to the graph data sample, and training continues until the training stop condition is met. A machine learning model trained in this way can accurately predict the behavior of user nodes, which improves the accuracy and effectiveness of model training and, in turn, the accuracy of subsequent data mining results.
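The training procedure summarized above (predict, compare with the label, adjust the parameters, repeat until a stop condition is met) can be sketched as follows. This is an illustration under stated assumptions rather than the training method of the embodiment: features are scalars, tanh and a sigmoid readout stand in for the unspecified mappings, the loss is a mean squared difference, and finite-difference gradients replace backpropagation to keep the example dependency-free. The names `forward`, `loss`, and `train` are hypothetical.

```python
import math


def forward(params, attrs, neighbors, steps=5):
    """Run a fixed number of update steps, then map each feature to a probability."""
    w1, w2, w3, v = params
    h = {n: 0.0 for n in attrs}
    for _ in range(steps):
        h = {
            n: math.tanh(
                w1 * h[n] + w2 * attrs[n] + w3 * sum(h[l] for l in neighbors[n])
            )
            for n in attrs
        }
    # Sigmoid readout: the intermediate behavior prediction for every node.
    return {n: 1.0 / (1.0 + math.exp(-v * h[n])) for n in attrs}


def loss(params, attrs, neighbors, labels):
    """Mean squared difference between predictions and labels on labeled user nodes."""
    pred = forward(params, attrs, neighbors)
    return sum((pred[n] - y) ** 2 for n, y in labels.items()) / len(labels)


def train(attrs, neighbors, labels, init=(0.1, 0.1, 0.1, 1.0),
          lr=0.5, max_epochs=500, target=1e-3, eps=1e-5):
    """Adjust parameters until the loss is below target or max_epochs is reached."""
    params = list(init)
    for _ in range(max_epochs):
        current = loss(params, attrs, neighbors, labels)
        if current < target:  # assumed training stop condition
            break
        # Finite-difference estimate of the gradient, one parameter at a time.
        grads = []
        for i in range(len(params)):
            bumped = list(params)
            bumped[i] += eps
            grads.append((loss(bumped, attrs, neighbors, labels) - current) / eps)
        candidate = [p - lr * g for p, g in zip(params, grads)]
        if loss(candidate, attrs, neighbors, labels) < current:
            params = candidate  # accept only steps that reduce the loss
        else:
            lr *= 0.5  # otherwise shrink the step size and retry
    return params
```

A toy call would pass attributes such as `{"u1": 1.0, "u2": -1.0, "p1": 0.5}`, an adjacency map linking both users to `p1`, and labels for the two user sample nodes only; public identification sample nodes take part in the propagation but carry no label in this sketch.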
Those skilled in the art will understand that all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the above method embodiments. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the patent. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the scope of protection of this patent shall be subject to the appended claims.

Claims (15)

1. A data mining method based on graph data, comprising:
acquiring graph data, wherein the graph data comprises node attributes and edges between nodes, and the nodes comprise user nodes and public identification nodes;
inputting the graph data into a trained machine learning model;
determining, through the machine learning model, a behavior prediction result corresponding to a user node among the nodes based on the node attributes and the edges between the nodes included in the graph data; and
screening, from the user nodes in the graph data, the user nodes whose corresponding behavior prediction results meet a data mining condition.
2. The method of claim 1, wherein the acquiring graph data comprises:
reading, from a relational database, user identifications and corresponding user attributes, public identifications and corresponding public identification attributes, user relationships between the user identifications, and behavior relationships between the user identifications and the public identifications; and
constructing the graph data according to the read user identifications and corresponding user attributes, the public identifications and corresponding public identification attributes, the user relationships, and the behavior relationships.
3. The method according to claim 2, wherein the constructing the graph data according to the read user identifications and corresponding user attributes, the public identifications and corresponding public identification attributes, the user relationships, and the behavior relationships comprises:
constructing user nodes and corresponding node attributes in the graph data according to the read user identifications and corresponding user attributes;
constructing public identification nodes and corresponding node attributes in the graph data according to the read public identifications and corresponding public identification attributes;
constructing edges between the user nodes in the graph data according to the read user relationships; and
constructing edges between the user nodes and the public identification nodes in the graph data according to the read behavior relationships.
4. The method according to claim 1, wherein the determining, through the machine learning model, a behavior prediction result corresponding to a user node among the nodes based on the node attributes and the edges between the nodes included in the graph data comprises:
iteratively calculating, through the machine learning model, implicit feature vectors corresponding to the nodes in the graph data based on the node attributes and the edges between the nodes included in the graph data; and
performing a calculation, through the machine learning model, on the implicit feature vectors obtained by the iterative calculation, and outputting the behavior prediction result corresponding to the user node among the nodes.
5. The method according to claim 4, wherein the iteratively calculating, through the machine learning model, implicit feature vectors corresponding to the nodes in the graph data based on the node attributes and the edges between the nodes included in the graph data comprises:
calculating, through the machine learning model, for each node in the graph data, the implicit feature vector of the node in the current iteration according to the corresponding node attribute, the implicit feature vector of the node in the previous iteration, and the implicit feature vectors, in the previous iteration, of the nodes adjacent to the node through an edge, until the implicit feature vector of the current iteration meets an iteration stop condition.
6. The method of claim 5, wherein the calculating the implicit feature vector of each node in the current iteration according to the corresponding node attribute, the implicit feature vector of the node in the previous iteration, and the implicit feature vectors, in the previous iteration, of the nodes adjacent to the node through an edge comprises:
calculating the implicit characteristic vector of each node in the current iteration by the following formula:
wherein $h_n^{t}$ is the implicit feature vector of node n in the current iteration; $w_1$, $w_2$ and $w_3$ are model parameters; $h_n^{t-1}$ is the implicit feature vector of node n in the previous iteration; $d_n$ is the node attribute corresponding to node n; $k_{\{n,l\}} = 1$ indicates that node n and node l are adjacent; $\sum_{k_{\{n,l\}}=1} h_l^{t-1}$ is the sum of the implicit feature vectors, in the previous iteration, of all nodes l adjacent to node n; and f represents a mapping relationship.
7. The method according to any one of claims 1 to 6, wherein, when the behavior prediction result is a predicted probability of a user behavior toward a public identification, the screening, from the user nodes in the graph data, the user nodes whose corresponding behavior prediction results meet a data mining condition comprises:
screening user nodes and corresponding public identification nodes from the user nodes and the public identification nodes included in the graph data, wherein the behavior prediction results jointly corresponding to the screened user nodes and the corresponding public identification nodes meet the data mining condition;
and the method further comprises:
performing, for the screened user nodes, a service operation related to the screened public identification nodes.
8. A model training method for data mining, comprising:
acquiring a graph data sample and a corresponding label, wherein the graph data sample comprises sample node attributes and edges between sample nodes, and the sample nodes comprise user sample nodes and public identification sample nodes;
inputting the graph data sample into a machine learning model;
determining, through the machine learning model, an intermediate behavior prediction result corresponding to a user sample node among the sample nodes based on the sample node attributes and the edges between the sample nodes included in the graph data sample; and
adjusting model parameters of the machine learning model according to the difference between the intermediate behavior prediction result and the label, and continuing training until a training stop condition is met.
9. The method of claim 8, wherein the determining, through the machine learning model, an intermediate behavior prediction result corresponding to a user sample node among the sample nodes based on the sample node attributes and the edges between the sample nodes included in the graph data sample comprises:
iteratively calculating, through the machine learning model, implicit feature vectors corresponding to the sample nodes in the graph data sample based on the sample node attributes and the edges between the sample nodes included in the graph data sample; and
performing a calculation, through the machine learning model, on the implicit feature vectors obtained by the iterative calculation, and outputting the intermediate behavior prediction result corresponding to the user sample node among the sample nodes.
10. The method of claim 9, wherein the iteratively calculating, through the machine learning model, implicit feature vectors corresponding to the sample nodes in the graph data sample based on the sample node attributes and the edges between the sample nodes included in the graph data sample comprises:
calculating, through the machine learning model and in combination with the model parameters obtained by the previous adjustment, for each sample node in the graph data sample, the implicit feature vector of the sample node in the current iteration according to the corresponding sample node attribute, the implicit feature vector of the sample node in the previous iteration, and the implicit feature vectors, in the previous iteration, of the sample nodes adjacent to the sample node through an edge;
and the performing a calculation, through the machine learning model, on the implicit feature vectors obtained by the iterative calculation, and outputting the intermediate behavior prediction result corresponding to the user sample node among the sample nodes comprises:
performing a calculation, through the machine learning model, on the implicit feature vectors obtained by the current iterative calculation, and outputting the intermediate behavior prediction result corresponding to the user sample node among the sample nodes.
11. A data mining apparatus based on graph data, the apparatus comprising:
an acquisition module, configured to acquire graph data, wherein the graph data comprises node attributes and edges between nodes, and the nodes comprise user nodes and public identification nodes;
an input module, configured to input the graph data into a trained machine learning model;
a determining module, configured to determine, through the machine learning model, a behavior prediction result corresponding to a user node among the nodes based on the node attributes and the edges between the nodes included in the graph data; and
a screening module, configured to screen, from the user nodes in the graph data, the user nodes whose corresponding behavior prediction results meet a data mining condition.
12. The apparatus of claim 11, wherein the acquisition module comprises a reading module and a construction module:
the reading module is configured to read, from a relational database, user identifications and corresponding user attributes, public identifications and corresponding public identification attributes, user relationships between the user identifications, and behavior relationships between the user identifications and the public identifications; and
the construction module is configured to construct the graph data according to the read user identifications and corresponding user attributes, the public identifications and corresponding public identification attributes, the user relationships, and the behavior relationships.
13. A model training apparatus for data mining, the apparatus comprising:
an acquisition module, configured to acquire a graph data sample and a corresponding label, wherein the graph data sample comprises sample node attributes and edges between sample nodes, and the sample nodes comprise user sample nodes and public identification sample nodes;
an input module, configured to input the graph data sample into a machine learning model;
a determining module, configured to determine, through the machine learning model, an intermediate behavior prediction result corresponding to a user sample node among the sample nodes based on the sample node attributes and the edges between the sample nodes included in the graph data sample; and
an adjusting module, configured to adjust model parameters of the machine learning model according to the difference between the intermediate behavior prediction result and the label, and to continue training until a training stop condition is met.
14. A computer-readable storage medium, storing a computer program which, when executed by a processor, causes the processor to carry out the steps of the method according to any one of claims 1 to 10.
15. A computer device comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of the method according to any one of claims 1 to 10.
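The graph construction of claims 2 and 3 and the screening step of claim 1 can be illustrated with a minimal sketch. It rests on assumptions that the claims do not state: rows read from the relational database are represented as plain tuples, user and public identifications do not collide, edges are undirected, and the data mining condition is a simple probability threshold. The names `GraphData`, `build_graph`, and `screen_users` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class GraphData:
    node_attrs: dict = field(default_factory=dict)  # node id -> attribute dict
    node_kind: dict = field(default_factory=dict)   # node id -> "user" or "public"
    edges: set = field(default_factory=set)         # frozenset({a, b}) per undirected edge


def build_graph(users, publics, user_relations, behavior_relations):
    """Build graph data from rows assumed to come from a relational database.

    users / publics: iterables of (identification, attribute dict)
    user_relations: iterables of (user id, user id), e.g. a friend relation
    behavior_relations: iterables of (user id, public id), e.g. a follow action
    """
    g = GraphData()
    for uid, attrs in users:          # user nodes and their node attributes
        g.node_attrs[uid] = attrs
        g.node_kind[uid] = "user"
    for pid, attrs in publics:        # public identification nodes and attributes
        g.node_attrs[pid] = attrs
        g.node_kind[pid] = "public"
    for a, b in user_relations:       # edges between user nodes
        g.edges.add(frozenset((a, b)))
    for uid, pid in behavior_relations:  # edges between user and public nodes
        g.edges.add(frozenset((uid, pid)))
    return g


def screen_users(graph, predictions, threshold=0.8):
    """Return the user nodes whose predicted probability reaches the threshold."""
    return sorted(
        n for n, p in predictions.items()
        if graph.node_kind.get(n) == "user" and p >= threshold
    )
```

In this sketch `predictions` stands for the output of the trained model, keyed by node id; entries for public identification nodes are ignored by the screening function.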
CN201810246990.5A 2018-03-23 2018-03-23 Data mining method and device based on graph data and model training method and device Active CN108491511B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810246990.5A CN108491511B (en) 2018-03-23 2018-03-23 Data mining method and device based on graph data and model training method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810246990.5A CN108491511B (en) 2018-03-23 2018-03-23 Data mining method and device based on graph data and model training method and device

Publications (2)

Publication Number Publication Date
CN108491511A true CN108491511A (en) 2018-09-04
CN108491511B CN108491511B (en) 2022-03-18

Family

ID=63319545

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810246990.5A Active CN108491511B (en) 2018-03-23 2018-03-23 Data mining method and device based on graph data and model training method and device

Country Status (1)

Country Link
CN (1) CN108491511B (en)

Also Published As

Publication number Publication date
CN108491511B (en) 2022-03-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant