CN115795109A

CN115795109A - Data processing method, device and equipment

Info

Publication number: CN115795109A
Application number: CN202211591577.5A
Authority: CN
Inventors: 田胜; 朱亮; 但家旺
Original assignee: Alipay Hangzhou Information Technology Co Ltd
Current assignee: Alipay Hangzhou Information Technology Co Ltd
Priority date: 2022-12-12
Filing date: 2022-12-12
Publication date: 2023-03-14

Abstract

An embodiment of the specification provides a data processing method, a data processing device and data processing equipment. The method comprises: acquiring first graph structure data to be clipped; determining a node characterization vector of each node in the first graph structure data based on a graph coding network in a pre-trained graph sampling model; determining an edge characterization vector of the edge between every two nodes having a connection relationship in the first graph structure data based on the node characterization vector of each node in the first graph structure data, the construction time of the first graph structure data and the time information between every two nodes having a connection relationship in the first graph structure data; determining a sampling probability of the edge between every two nodes having a connection relationship in the first graph structure data based on a sampling network in the pre-trained graph sampling model; and clipping the first graph structure data based on the sampling probability of the edge between every two nodes having a connection relationship to obtain the clipped first graph structure data.

Description

Data processing method, device and equipment

Technical Field

The embodiment of the specification relates to the technical field of data processing, in particular to a data processing method, a data processing device and data processing equipment.

Background

With the rapid development of computer technology, various industries face the problem of big data processing. How to extract valuable information from big data and support increasingly complex business requirements is a problem to be solved urgently. Because graph structure data can describe knowledge resources and their carriers by means of visualization technology, it can be used to address big data problems such as text semantic similarity, similar commodity recommendation or intelligent question-answering systems.

Because complete graph data may include a large amount of redundant data, the graph data needs to be clipped so that the big data problems above can be processed with the clipped graph data. For example, the complete graph data may be clipped by extracting k-order neighbor subgraphs. However, clipping to k-order neighbor subgraphs may fail to select the nodes that are useful for the big data problem; that is, the clipped graph structure data may not be accurately usable for processing the problem, which leads to a poor clipping effect and poor data processing accuracy.

Disclosure of Invention

An object of the embodiments of the present specification is to provide a data processing method, apparatus, and device, so as to provide a technical solution for improving clipping accuracy for graph structure data.

In order to implement the above technical solution, the embodiments of the present specification are implemented as follows:

in a first aspect, an embodiment of the present specification provides a data processing method, including: acquiring first graph structure data to be cut, wherein the first graph structure data is constructed on the basis of human-computer interaction data with a preset corresponding relation with a target user; coding the first graph structure data based on a graph coding network in a pre-trained graph sampling model, and determining a node representation vector of each node in the first graph structure data; determining an edge representation vector of an edge between every two nodes with connection relations in the first graph structure data based on a node representation vector of each node in the first graph structure data, the construction time of the first graph structure data and time information between every two nodes with connection relations in the first graph structure data; determining sampling probability of the edge between every two nodes with connection relation in the first graph structure data based on the sampling network in the pre-trained graph sampling model and the edge characterization vector of the edge between every two nodes with connection relation in the first graph structure data; and based on the sampling probability of the edge between every two nodes with the connection relation in the first graph structure data, cutting the first graph structure data to obtain the cut first graph structure data.

In a second aspect, an embodiment of the present specification provides a data processing apparatus, including: the data acquisition module is used for acquiring first graph structure data to be cut, and the first graph structure data is constructed on the basis of human-computer interaction data which has a preset corresponding relationship with a target user; the first determination module is used for performing coding processing on the first graph structure data based on a graph coding network in a pre-trained graph sampling model, and determining a node characterization vector of each node in the first graph structure data; a second determining module, configured to determine an edge representation vector of an edge between every two nodes having a connection relationship in the first graph structure data based on a node representation vector of each node in the first graph structure data, a construction time of the first graph structure data, and time information between every two nodes having a connection relationship in the first graph structure data; a probability determination module, configured to determine a sampling probability of an edge between every two nodes having a connection relation in the first graph structure data based on a sampling network in the pre-trained graph sampling model and an edge characterization vector of an edge between every two nodes having a connection relation in the first graph structure data; and the first cutting module is used for cutting the first graph structure data based on the sampling probability of the edge between every two nodes with the connection relation in the first graph structure data to obtain the cut first graph structure data.

In a third aspect, an embodiment of the present specification provides a data processing apparatus, including: a processor; and a memory arranged to store computer executable instructions that, when executed, cause the processor to: acquiring first graph structure data to be cut, wherein the first graph structure data is constructed on the basis of human-computer interaction data which has a preset corresponding relation with a target user; coding the first graph structure data based on a graph coding network in a pre-trained graph sampling model, and determining a node representation vector of each node in the first graph structure data; determining an edge representation vector of an edge between every two nodes with connection relation in the first graph structure data based on a node representation vector of each node in the first graph structure data, the construction time of the first graph structure data and time information between every two nodes with connection relation in the first graph structure data; determining sampling probability of edges between every two nodes with connection relation in the first graph structure data based on a sampling network in the pre-trained graph sampling model and edge characterization vectors of edges between every two nodes with connection relation in the first graph structure data; and based on the sampling probability of the edge between every two nodes with the connection relation in the first graph structure data, cutting the first graph structure data to obtain the cut first graph structure data.

In a fourth aspect, embodiments of the present specification provide a storage medium for storing computer-executable instructions, which when executed implement the following processes: acquiring first graph structure data to be cut, wherein the first graph structure data is constructed on the basis of human-computer interaction data which has a preset corresponding relation with a target user; coding the first graph structure data based on a graph coding network in a pre-trained graph sampling model, and determining a node representation vector of each node in the first graph structure data; determining an edge representation vector of an edge between every two nodes with connection relation in the first graph structure data based on a node representation vector of each node in the first graph structure data, the construction time of the first graph structure data and time information between every two nodes with connection relation in the first graph structure data; determining sampling probability of edges between every two nodes with connection relation in the first graph structure data based on a sampling network in the pre-trained graph sampling model and edge characterization vectors of edges between every two nodes with connection relation in the first graph structure data; and based on the sampling probability of the edge between every two nodes with the connection relation in the first graph structure data, cutting the first graph structure data to obtain the cut first graph structure data.

Drawings

In order to more clearly illustrate the embodiments of the present specification or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only some embodiments described in the present specification, and for those skilled in the art, other drawings can be obtained according to the drawings without any creative effort.

FIG. 1A is a flow chart of one embodiment of a data processing method of the present disclosure;

FIG. 1B is a schematic diagram of a data processing method according to an embodiment of the present disclosure;

FIG. 2 is a diagram illustrating a clipping process of a first graph structure data according to the present disclosure;

FIG. 3 is a schematic diagram of another embodiment of a data processing method;

FIG. 4 is a schematic diagram illustrating a training process of a graph sampling model according to the present disclosure;

FIG. 5 is a schematic diagram of another embodiment of a data processing method;

FIG. 6 is a schematic diagram of a graph sampling process according to the present disclosure;

FIG. 7 is a block diagram of an embodiment of a data processing apparatus according to the present disclosure;

FIG. 8 is a schematic structural diagram of a data processing apparatus according to the present specification.

Detailed Description

The embodiment of the specification provides a data processing method, a data processing device and data processing equipment.

In order to make those skilled in the art better understand the technical solutions in the present specification, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only a part of the embodiments of the present specification, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present specification without any creative effort shall fall within the protection scope of the present specification.

Example one

As shown in fig. 1A and fig. 1B, an execution subject of the method may be a server, and the server may be an independent server, or a server cluster composed of a plurality of servers. The method specifically comprises the following steps:

In S102, first graph structure data to be clipped is acquired.

The first graph structure data may be constructed based on human-computer interaction data having a preset corresponding relationship with a target user. The server may receive the first graph structure data transmitted by any transmitting device (e.g., a mobile terminal device such as a mobile phone or a tablet computer, a terminal device such as a personal computer, or a server), and the first graph structure data may be any graph structure data that the transmitting device constructs from such human-computer interaction data. For example, the first graph structure data may be any graph data capable of representing nodes and the association relationships between nodes, such as a knowledge graph. A node in the first graph structure data may be used to represent an entity or a concept, and an edge between nodes may be used to represent a semantic relationship between entities and/or concepts. For example, when the target user makes a click while browsing a page, an interaction may be considered to have occurred between the target user and a content block of the page; that is, the target user and the content block may each be used as a node in the first graph structure data, and the edge between the two nodes may be used to represent the interaction between the target user and the content block.

In implementation, with the rapid development of computer technology, various industries face the problem of big data processing. How to extract valuable information from big data and support increasingly complex business requirements is a problem to be solved urgently. Because graph structure data can describe knowledge resources and their carriers by means of visualization technology, it can be used to address big data problems such as text semantic similarity, similar commodity recommendation or intelligent question-answering systems. Because complete graph data may include a large amount of redundant data, the graph data needs to be clipped so that the big data problem can be processed with the clipped graph data. For example, the complete graph data may be clipped by extracting k-order neighbor subgraphs. However, clipping to k-order neighbor subgraphs may fail to select the nodes that are useful for the big data problem; that is, the clipped graph structure data may not be accurately usable for processing the problem, which leads to a poor clipping effect and poor data processing accuracy. Therefore, the embodiments of the present specification provide a technical solution that can solve the above problems, as described in detail below.

Taking a resource transfer scenario as an example, when detecting that a target user triggers execution of a resource transfer service, a terminal device may send a request for executing the resource transfer service to a server. In response to the service execution request, the server may obtain first graph structure data constructed based on human-computer interaction data having a preset correspondence with the target user, such as data input by the target user during human-computer interaction.

In addition, the server may further obtain second graph structure data, where the second graph structure data may be human-computer interaction data that has a preset correspondence with the target user and is obtained within a first time period. Since the obtained second graph structure data may have a large data volume, the server may perform preliminary clipping processing on the second graph structure data to obtain the first graph structure data, so as to improve subsequent data processing efficiency. For example, the server may obtain the sub-graph corresponding to a second time period in the second graph structure data as the first graph structure data, where the second time period is shorter than the first time period; for example, the first time period may be approximately half a month, and the second time period may be approximately one week.
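The preliminary clipping by time period might be sketched as follows. This is only a minimal illustration, assuming that edges are stored as (source, target, timestamp) tuples; the function name `subgraph_in_window`, the tuple layout and the sample data are hypothetical and are not taken from the specification.

```python
from datetime import datetime, timedelta

def subgraph_in_window(edges, end_time, window):
    """Keep only edges whose timestamp falls within [end_time - window, end_time]."""
    start_time = end_time - window
    return [(u, v, t) for (u, v, t) in edges if start_time <= t <= end_time]

# Hypothetical second graph structure data covering about half a month.
now = datetime(2022, 12, 12, 12, 0)
second_graph_edges = [
    ("user", "page_block_a", now - timedelta(days=13)),
    ("user", "page_block_b", now - timedelta(days=5)),
    ("user", "page_block_c", now - timedelta(days=1)),
]

# First graph structure data: the sub-graph for roughly the last week.
first_graph_edges = subgraph_in_window(second_graph_edges, now, timedelta(days=7))
```

Here only the two interactions from the last seven days remain, which reduces the data volume before the graph sampling model is applied.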

The method for acquiring the first graph structure data is an optional and realizable method, and in an actual application scenario, there may be a plurality of different acquisition methods, which may be different according to different actual application scenarios, and this is not specifically limited in this embodiment of the present specification.

In S104, the first graph structure data is encoded based on a graph encoding network in a pre-trained graph sampling model, and a node characterization vector of each node in the first graph structure data is determined.

The graph coding network may be any graph neural network capable of coding the first graph structure data.

In implementation, the first graph structure data may be input into a graph coding network in a graph sampling model trained in advance, and the graph coding network may perform message aggregation processing on each node in the first graph structure data to obtain a node characterization vector of each node.
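As a rough illustration of message aggregation, the sketch below performs one round of mean aggregation over neighbors. Mean aggregation is a stand-in chosen for brevity and is not the graph coding network of the specification, which may be any graph neural network; the names `aggregate_once`, `features` and `neighbors` are hypothetical.

```python
def aggregate_once(features, neighbors):
    """One round of mean aggregation: each node averages its own vector with its neighbors'."""
    updated = {}
    for node, vec in features.items():
        group = [vec] + [features[n] for n in neighbors.get(node, [])]
        dim = len(vec)
        updated[node] = [sum(g[i] for g in group) / len(group) for i in range(dim)]
    return updated

# Hypothetical initial node features and adjacency lists.
features = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [1.0, 1.0]}
neighbors = {1: [2, 3], 2: [1], 3: [1]}

# Node characterization vectors after one aggregation round.
node_vectors = aggregate_once(features, neighbors)
```

A trained network would replace the plain average with learned weights, and stacking several rounds lets each node see information from more distant neighbors.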

In S106, an edge representation vector of an edge between every two nodes having a connection relation in the first graph structure data is determined based on the node representation vector of each node in the first graph structure data, the construction time of the first graph structure data, and time information between every two nodes having a connection relation in the first graph structure data.

For example, when a target user generates a click behavior while browsing a certain page, it may be considered that an interaction has occurred between the target user and the page content block, that is, the target user and the page content block may be regarded as nodes in the first graph structure data, an edge between the two nodes may be used to represent the interaction between the target user and the page content block, and the time information between the two nodes having a connection relationship may be time information of the click behavior of the target user.

In implementation, the node characterization vector of each of two nodes having a connection relationship may be obtained, together with the construction time of the first graph structure data and the time information between the two nodes, and the edge characterization vector of the edge between the two nodes may be constructed based on the obtained information.

In S108, based on the sampling network in the pre-trained graph sampling model and the edge characterization vector of the edge between every two nodes having a connection relationship in the first graph structure data, a sampling probability of the edge between every two nodes having a connection relationship in the first graph structure data is determined.

The sampling network in the graph sampling model may be a network constructed based on a neural network algorithm; for example, the sampling network may be constructed based on a graph convolutional neural network algorithm. The sampling probability may be used to represent the importance of the edge between every two nodes having a connection relationship. Different network structures may be selected for the graph coding network and the sampling network in the graph sampling model according to the actual application scenario. For example, in a scenario with a higher timeliness requirement and a lower accuracy requirement, a network structure with fewer layers (e.g., 2 to 3 layers) may be adopted to meet the computing requirements of an efficient real-time scenario; in a scenario with a lower timeliness requirement and a higher accuracy requirement, a network structure with more layers (e.g., 5 to 6 layers) may be adopted to meet the computing requirements of a high-accuracy scenario.

In implementation, the server may perform joint training on the graph coding network and the sampling network in the graph sampling model based on the historical graph structure data to obtain a trained graph sampling model, and then input an edge representation vector of an edge between every two nodes having a connection relationship in the first graph structure data into the pre-trained sampling network in the graph sampling model to obtain a sampling probability of an edge between every two nodes having a connection relationship.

In S110, based on the sampling probability of the edge between every two nodes having a connection relationship in the first graph structure data, the first graph structure data is clipped to obtain the clipped first graph structure data.

In implementation, the sampling probabilities of the edges between every two nodes having a connection relationship may be sorted from largest to smallest, and the edges corresponding to the last n sampling probabilities (that is, the n lowest) may be clipped to obtain the clipped first graph structure data, where n is a positive integer and may differ according to the clipping requirements of the actual application scenario, which is not specifically limited in this embodiment of the present specification.

For example, assuming that the first graph structure data includes node 1, node 2, node 3, node 4, and node 5, and the connection relationships between the nodes are as shown in fig. 2, the sampling probabilities of the edges between every two nodes having a connection relationship may be sorted from largest to smallest, and the edges corresponding to the last two sampling probabilities may be clipped, so as to obtain the clipped first graph structure data.
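The sort-and-clip step above can be sketched as follows. This is a minimal sketch under stated assumptions: the edge list and probability values are invented for illustration and do not reproduce the connections of fig. 2, and the function name `clip_lowest_edges` is hypothetical.

```python
def clip_lowest_edges(edge_probs, n):
    """Sort edges by sampling probability (descending) and drop the last n."""
    ranked = sorted(edge_probs.items(), key=lambda item: item[1], reverse=True)
    kept = ranked[: max(len(ranked) - n, 0)]
    return [edge for edge, _ in kept]

# Hypothetical sampling probabilities for edges among nodes 1 to 5.
edge_probs = {(1, 2): 0.9, (1, 3): 0.2, (2, 4): 0.7, (3, 5): 0.1, (4, 5): 0.6}

# Clip the two edges with the lowest sampling probabilities.
kept_edges = clip_lowest_edges(edge_probs, n=2)
```

With these values the edges (1, 3) and (3, 5) are removed, and the three most probable edges are retained.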

The method for performing clipping processing on the first graph structure data is an optional and realizable method, and there may be a plurality of different clipping processing methods in an actual application scenario, and different clipping processing methods may be selected according to different actual application scenarios, which is not specifically limited in this embodiment of the present specification.

By cutting the first graph structure data, the requirement on the computational complexity in a big data scene can be met, the graph structure data can be cut at low computational cost, high cutting rate is achieved, the storage cost of the data is reduced, and the consumption of computational resources of subsequent downstream tasks is reduced.

The embodiment of the specification provides a data processing method, which includes: obtaining first graph structure data to be clipped, the first graph structure data being constructed based on human-computer interaction data having a preset corresponding relationship with a target user; coding the first graph structure data based on a graph coding network in a pre-trained graph sampling model, and determining a node characterization vector of each node in the first graph structure data; determining an edge characterization vector of the edge between every two nodes having a connection relationship in the first graph structure data based on the node characterization vector of each node, the construction time of the first graph structure data and the time information between every two nodes having a connection relationship; determining a sampling probability of the edge between every two nodes having a connection relationship based on a sampling network in the pre-trained graph sampling model and the edge characterization vectors; and clipping the first graph structure data based on the sampling probabilities to obtain the clipped first graph structure data. In this way, the node characterization vector of each node can be accurately determined through the graph coding network, the edge characterization vector of the edge between every two nodes having a connection relationship can be accurately determined by combining the node characterization vectors with the time information (namely the construction time of the first graph structure data and the time information between every two nodes having a connection relationship), and the first graph structure data is then clipped according to the determined sampling probabilities to obtain the clipped first graph structure data.

Example two

As shown in fig. 3, an execution subject of the method may be a terminal device or a server, where the terminal device may be a device such as a personal computer, or may also be a mobile terminal device such as a mobile phone or a tablet computer, and the server may be an independent server, or may be a server cluster composed of multiple servers. The method may specifically comprise the steps of:

In S302, historical graph structure data is acquired.

The historical graph structure data may be training data used for training the graph sampling model, the historical graph structure data may be constructed based on historical human-computer interaction data having a preset corresponding relationship with a first user, the first user may be a plurality of users including a target user, or the first user may also be a user different from the target user, and the like.

In S304, the historical graph structure data is input to a graph coding network in the graph sampling model, and the historical graph structure data is subjected to coding processing to determine a node characterization vector of each node in the historical graph structure data.

The graph coding network may be a temporal coding network; for example, the graph coding network may be a temporal graph attention network (TGAT), a temporal graph network (Temporal Graph Network), a Transformer network, or the like.

In implementation, as shown in fig. 4, the historical graph structure data may be input into a graph coding network of a graph sampling model, and a message aggregation calculation may be performed by using the graph coding network to obtain a node characterization vector of each node in the historical graph structure data. Specifically, the graph coding network may adopt a graph network structure of TGAT, and perform calculation of neighbor node message aggregation on each node, so as to obtain a node characterization vector of each node.

In S306, an edge representation vector of an edge between every two nodes having a connection relationship in the historical graph structure data is determined based on the node representation vector of each node in the historical graph structure data, the construction time of the historical graph structure data, and the time information between every two nodes having a connection relationship in the historical graph structure data.

In implementation, in an actual application scenario, time information is of great significance for processing big data problems. For example, edges closer to the current time are more important, and regular behaviors such as a user's habits and preferences may be discovered through the time information corresponding to an edge (i.e., the time at which the interaction event occurred); for example, a user may purchase a certain commodity at 8 a.m. every day (or at the beginning of every month). Therefore, in order to improve the accuracy of subsequent data processing, the server may determine the edge characterization vector of each edge by combining the node characterization vectors of the two end nodes of the edge, the construction time of the historical graph structure data, and the time information corresponding to the edge.

In practical applications, the processing manner of S306 may be various, and an alternative implementation manner is provided below, which may specifically refer to the following steps one to three:

Step one, constructing a discrete characterization vector based on preset dimensions and the time information between every two nodes having a connection relationship in the historical graph structure data.

In implementation, assuming that the preset dimensions include the four dimensions of month, date, hour and minute, the corresponding discrete characterization vector may be constructed along these four dimensions from the time information between every two nodes having a connection relationship in the historical graph structure data.

Step two, determining a time difference based on the construction time of the historical graph structure data and the time information between every two nodes having a connection relationship in the historical graph structure data, and converting the time difference into a continuous characterization vector in the time domain.

In implementation, the time difference may be substituted into the formula

φ(Δt) = cos(WΔt + b),

to obtain the continuous characterization vector in the time domain, where Δt is the time difference, φ(Δt) is the continuous characterization vector of the time difference in the time domain, W and b are learnable parameters of dimension d, and d is a preset dimension.
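The time encoding formula can be evaluated element-wise, as in the sketch below. It is only an illustration: W and b would be learned during training, so the random and zero values here are placeholders, and the name `time_encoding` and the choice of seconds as the unit of the time difference are assumptions.

```python
import math
import random

def time_encoding(delta_t, W, b):
    """Compute phi(delta_t) = cos(W * delta_t + b) element-wise over d dimensions."""
    return [math.cos(w * delta_t + bias) for w, bias in zip(W, b)]

d = 4  # preset dimension
rng = random.Random(0)
W = [rng.uniform(-1.0, 1.0) for _ in range(d)]  # placeholder for learned weights
b = [0.0] * d                                   # placeholder for learned biases

delta_t = 3600.0  # hypothetical time difference, in seconds
phi = time_encoding(delta_t, W, b)
```

Each component of the result lies in [-1, 1], and different entries of W respond to different time scales, which is what lets the encoding capture periodic behavior.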

Step three, determining an edge characterization vector of the edge between every two nodes having a connection relationship in the historical graph structure data based on the node characterization vectors of the two nodes, and on the discrete characterization vector and the continuous characterization vector of the edge between them.
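Steps one and three might be combined as in the following sketch. The specification does not state how the parts are merged, so plain concatenation is an assumption made here for illustration; the function names, the sample vectors and the placeholder for the output of step two are likewise hypothetical.

```python
from datetime import datetime

def discrete_time_vector(event_time):
    """Step one: split the edge time into month, date, hour and minute."""
    return [event_time.month, event_time.day, event_time.hour, event_time.minute]

def edge_vector(h_u, h_v, discrete_vec, continuous_vec):
    """Step three, assumed here to be a concatenation of the four parts."""
    return list(h_u) + list(h_v) + list(discrete_vec) + list(continuous_vec)

h_user = [0.1, 0.2]          # hypothetical node characterization vectors
h_block = [0.3, 0.4]
event_time = datetime(2022, 12, 5, 8, 30)
continuous_vec = [1.0, 0.5]  # placeholder for the output of step two

edge_vec = edge_vector(h_user, h_block, discrete_time_vector(event_time), continuous_vec)
```

In a trained model the discrete components would more likely be mapped to learned embeddings before being merged; raw integers are used here only to keep the example short.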

In S308, based on the sampling network in the graph sampling model and the edge characterization vector of the edge between every two nodes with connection relations in the historical graph structure data, the sampling probability of the edge between every two nodes with connection relations in the historical graph structure data is determined.

In practical applications, the processing manner of S308 may be various, and an alternative implementation manner is provided below, which may specifically refer to the following steps one to two:

Step one, determining a first sampling probability of the edge between every two nodes having a connection relationship in the historical graph structure data based on the sampling network in the graph sampling model and the edge characterization vector of the edge between every two nodes having a connection relationship in the historical graph structure data.

In implementation, for example, when a sampling network is constructed based on a Multi-Layer Perceptron (MLP), assuming that a network structure of the sampling network is a 3-Layer MLP structure, an edge characterization vector of an edge between every two nodes having a connection relationship in the historical graph structure data may be input into the sampling network, and a first sampling probability may be obtained through an activation function (e.g., may be a sigmoid function).
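A 3-layer MLP followed by a sigmoid might look like the sketch below. The layer widths, the ReLU activation between hidden layers and the random initialization are all assumptions made for illustration; a trained sampling network would use learned weights, and every function name here is hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def linear(x, weights, biases):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]

def relu(x):
    return [max(0.0, v) for v in x]

def init_layer(rng, n_in, n_out):
    """Create placeholder weights and zero biases for one layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def first_sampling_probability(edge_vec, layers):
    """Pass an edge vector through a 3-layer MLP and squash the output with a sigmoid."""
    hidden = edge_vec
    for weights, biases in layers[:-1]:
        hidden = relu(linear(hidden, weights, biases))
    weights, biases = layers[-1]
    return sigmoid(linear(hidden, weights, biases)[0])

rng = random.Random(0)
layers = [init_layer(rng, 6, 8), init_layer(rng, 8, 4), init_layer(rng, 4, 1)]

# Hypothetical 6-dimensional edge characterization vector.
m = first_sampling_probability([0.1, 0.2, 0.3, 0.4, 1.0, 0.5], layers)
```

The final layer has a single output so that each edge receives one scalar, which the sigmoid maps into the open interval (0, 1).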

And step two, converting the first sampling probability of the edge between every two nodes with the connection relation in the historical graph structure data into a preset distribution for sampling to obtain the sampling probability of the edge between every two nodes with the connection relation in the historical graph structure data.

In implementation, since the sampling result derived from the first sampling probability takes the value 0 or 1, the sampling step is not directly trainable. To make the sampling process trainable, sampling can follow a Bernoulli distribution, and re-parameterization can be used to approximate the original non-trainable sampling process.

The first sampling probability may be substituted into the formula

$$e = \sigma\left(\frac{\log\varepsilon - \log(1-\varepsilon) + m}{\tau}\right)$$

to obtain the sampling probability of the edge between every two nodes having a connection relationship in the historical graph structure data, where $e$ is the sampling probability of the edge, $m$ is the first sampling probability of the edge, $\sigma$ is an activation function, and $\varepsilon$ and $\tau$ are preset hyper-parameters.
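As a hedged numerical sketch, the conversion can be written as below, reading the formula as e = σ((log ε − log(1 − ε) + m)/τ) with σ taken to be the sigmoid function. The function name `relaxed_edge_probability` and the example values of ε and τ are assumptions. This embodiment treats ε as a preset hyper-parameter; in the commonly used relaxed Bernoulli formulation the corresponding term is a uniform random draw on (0, 1), which is mentioned only as background.

```python
import numpy as np

def relaxed_edge_probability(m, eps, tau):
    """Apply e = sigmoid((log(eps) - log(1 - eps) + m) / tau) element-wise."""
    m = np.asarray(m, dtype=float)
    eps = np.asarray(eps, dtype=float)
    logits = (np.log(eps) - np.log(1.0 - eps) + m) / tau
    return 1.0 / (1.0 + np.exp(-logits))

m = np.array([0.1, 0.5, 0.9])   # first sampling probabilities of three edges
e = relaxed_edge_probability(m, eps=0.5, tau=0.5)
```

With ε = 0.5 the logarithmic terms cancel, so the result reduces to sigmoid(m/τ); a smaller τ pushes the values closer to 0 or 1.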

In S310, based on the sampling probability of the edge between every two nodes having a connection relationship in the historical graph structure data, the historical graph structure data is clipped, so as to obtain the clipped historical graph structure data.

In implementation, the sampling probability of the edge between every two nodes having a connection relationship in the historical graph structure data may be screened based on a preset sampling probability threshold, and the clipped historical graph structure data may be constructed based on the nodes retained after the screening.
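The screening step can be illustrated with a short sketch. The edge list, the probabilities, the threshold of 0.5, and the choice to keep edges whose probability is greater than or equal to the threshold are all assumptions made for the example.

```python
def clip_graph(edges, probs, threshold):
    """Keep edges whose sampling probability reaches the threshold,
    and keep only the nodes that still appear on a retained edge."""
    kept_edges = [edge for edge, p in zip(edges, probs) if p >= threshold]
    kept_nodes = sorted({node for edge in kept_edges for node in edge})
    return kept_nodes, kept_edges

edges = [("u", "a"), ("u", "b"), ("a", "c"), ("b", "d")]
probs = [0.9, 0.2, 0.7, 0.4]
nodes, kept = clip_graph(edges, probs, threshold=0.5)
```

Here nodes "b" and "d" drop out because none of their edges passes the threshold.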

In S312, based on the attribute information of the edge between every two nodes having a connection relationship in the clipped historical graph structure data, an edge feature vector between every two nodes having a connection relationship in the clipped historical graph structure data is generated.

For example, assuming that an edge between two nodes having a connection relationship is used to represent a resource transfer event, the attribute information of the edge may be determined based on the interaction data of the resource transfer event, for example, the attribute information of the edge may include resource transfer time, resource transfer number, and the like.

In implementation, feature extraction processing may be performed on attribute information of an edge between every two nodes having a connection relationship in the clipped historical graph structure data based on a preset feature extraction algorithm, so as to obtain an edge feature vector between every two nodes having a connection relationship in the clipped historical graph structure data.

In S314, based on the edge feature vector and the edge characterization vector of the edge between every two nodes having a connection relationship in the clipped historical graph structure data and the attention coefficient between every two nodes having a connection relationship in the clipped historical graph structure data, the node characterization vector of each node in the clipped historical graph structure data is updated, so as to obtain an updated node characterization vector of each node in the clipped historical graph structure data.

In implementation, as shown in fig. 4, an attention coefficient between every two nodes having a connection relationship in the clipped historical graph structure data may be obtained through the graph coding network. The node characterization vector of each node in the clipped historical graph structure data is then updated based on the edge feature vector and the edge characterization vector of the edge between every two nodes having a connection relationship and on the attention coefficient between those nodes, so as to obtain the updated node characterization vector of each node. For example, the product of the edge feature vector, the edge characterization vector, and the attention coefficient between every two nodes having a connection relationship in the clipped historical graph structure data can be used as the updated node characterization vector of each node in the clipped historical graph structure data.

The above-mentioned update method for the node characterization vector is an optional and realizable update method, and in an actual application scenario, there may be a plurality of different update methods, and different update methods may be selected according to different actual application scenarios, which is not specifically limited in the embodiments of the present specification.
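One possible reading of the product-based update is sketched below. It assumes that the edge feature vector and the edge characterization vector have the same dimension as the node characterization vector, that the product is element-wise, that contributions from several incoming edges are summed, and that a node with no incoming edge keeps its previous vector. None of these choices is fixed by this embodiment, and the name `update_node_vectors` is hypothetical.

```python
import numpy as np

def update_node_vectors(node_vecs, edges, edge_feats, edge_reprs, attn):
    """Update each destination node with attention * (feature * representation)."""
    updated = {n: np.zeros_like(v) for n, v in node_vecs.items()}
    touched = set()
    for (src, dst), f, r, a in zip(edges, edge_feats, edge_reprs, attn):
        updated[dst] += a * (f * r)   # element-wise product scaled by attention
        touched.add(dst)
    for n, v in node_vecs.items():
        if n not in touched:
            updated[n] = v.copy()     # no incoming edge: keep the old vector
    return updated

node_vecs = {"u": np.ones(2), "a": np.ones(2)}
edges = [("a", "u")]
edge_feats = [np.array([1.0, 2.0])]
edge_reprs = [np.array([3.0, 4.0])]
attn = [0.5]
updated = update_node_vectors(node_vecs, edges, edge_feats, edge_reprs, attn)
```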

In S316, a loss value is determined based on the node characterization vector of the node of the historical graph structure data and the updated node characterization vector of the node of the clipped historical graph structure data.

In practical applications, S316 may be processed in various ways. An alternative implementation is provided below, which may specifically refer to the following steps one to two:

Step one, determining a loss value based on the node characterization vector of the central node of the historical graph structure data and the updated node characterization vector of the central node of the clipped historical graph structure data.

The central node of the historical graph structure data may be determined based on the number of connection relationships between each node of the historical graph structure data and other nodes, and besides, there may be multiple determination methods.

And step two, acquiring mutual information values between the node characterization vectors of the central nodes of the historical graph structure data and the updated node characterization vectors of the central nodes of the cut historical graph structure data, and determining the mutual information values as loss values.
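This embodiment does not name a particular mutual information estimator. As one illustrative possibility, the sketch below uses the InfoNCE lower bound over a batch of central nodes, where row i of each matrix holds the central node vector of the i-th graph before and after clipping. The estimator choice, the cosine similarity score, and the name `infonce_mi_estimate` are assumptions; how the resulting value is turned into the loss follows the description above.

```python
import numpy as np

def infonce_mi_estimate(z_original, z_clipped):
    """InfoNCE lower-bound estimate of mutual information between paired rows."""
    a = z_original / np.linalg.norm(z_original, axis=1, keepdims=True)
    b = z_clipped / np.linalg.norm(z_clipped, axis=1, keepdims=True)
    scores = a @ b.T                                    # cosine similarities
    scores = scores - scores.max(axis=1, keepdims=True) # numerical stability
    log_softmax = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    n = scores.shape[0]
    return float(np.log(n) + np.mean(np.diag(log_softmax)))

z = np.eye(4)                                   # four toy central-node vectors
aligned = infonce_mi_estimate(z, z)             # clipped vectors match originals
shifted = infonce_mi_estimate(z, np.roll(z, 1, axis=0))  # pairs are mismatched
```

The estimate is larger when the clipped central nodes still resemble their originals, which is the property the training objective relies on.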

In S318, whether the graph sampling model converges is determined based on the loss value, and if the graph sampling model does not converge, the graph coding network and the sampling network of the graph sampling model are trained based on the historical graph structure data until the graph sampling model converges, so as to obtain the trained graph sampling model.

In implementation, in order for the clipped graph structure data to retain as much useful information from the original graph structure data as possible, the graph sampling model may be trained using the characterization similarity of the central nodes of the graph structure data before and after clipping. That is, the loss value is determined from the mutual information value between the node characterization vector of the central node of the historical graph structure data and the updated node characterization vector of the central node of the clipped historical graph structure data, and the graph sampling model is trained according to whether the loss value is greater than a preset loss value threshold. The graph sampling model obtained through this training can therefore retain as much useful information from the original graph structure data as possible in the clipped graph structure data.

In S102, first graph structure data to be clipped is acquired.

The first graph structure data can be constructed based on human-computer interaction data with a preset corresponding relation with a target user.

In S320, an information recommendation request for a target user is received.

In implementation, taking the graph structure data determined based on the commodity purchase information of the target user in a preset first period as an example, the target user purchases a commodity 1 in the commodity transaction application program in the preset first period, and the first graph structure data to be clipped may be constructed based on the user information of the target user, the commodity information of the commodity 1, and the commodity information of the commodity 2.

For example, the commodity transaction application program may include commodity 1, commodity 2, commodity 3, and commodity 4, where commodity 1 and commodity 2 have the same sales count within the preset first time period, commodity 1 and commodity 3 are of the same type, and commodity 1 and commodity 4 have the same transaction price. The nodes in the first graph structure data may then be determined according to the target user, commodity 1, commodity 2, commodity 3, and commodity 4, and the association relationships between nodes in the first graph structure data may be determined according to the purchase relationship between the target user and each commodity, the relationships between the commodities, and the like. The server may obtain the first graph structure data constructed based on this information, and clip the first graph structure data according to the above S104 to S110 to obtain the clipped first graph structure data.
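The commodity example can be written down as a small graph. The node labels, the relation names, and the adjacency-set representation in this sketch are hypothetical and chosen only to mirror the example above.

```python
from collections import defaultdict

def build_graph(edges):
    """Build an undirected adjacency map from (node, node, relation) triples."""
    adjacency = defaultdict(set)
    for src, dst, relation in edges:
        adjacency[src].add((dst, relation))
        adjacency[dst].add((src, relation))
    return adjacency

edges = [
    ("target_user", "commodity_1", "purchased"),
    ("commodity_1", "commodity_2", "same_sales_count"),
    ("commodity_1", "commodity_3", "same_type"),
    ("commodity_1", "commodity_4", "same_price"),
]
graph = build_graph(edges)
```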

The terminal device may send an information recommendation request for the target user to the server when detecting that the target user triggers and starts the commodity transaction application program.

In S322, the clipped first graph structure data is subjected to node classification processing to obtain a node classification result, and an information recommendation result is determined based on the node classification result.

In implementation, the server may perform node classification processing on the clipped first graph structure data to obtain a node classification result, and determine an information recommendation result according to the node classification result.

For example, the nodes other than the node corresponding to the target user in the clipped first graph structure data may be classified, and the obtained node classification result may be classification 1 and classification 2, where classification 1 corresponds to commodity 1, commodity 2, and commodity 3, and classification 2 corresponds to commodity 4. Since commodity 1 is the commodity purchased by the target user within the preset first time period, the information recommendation result can be determined based on classification 1, that is, the information recommendation result can include commodity 2 and commodity 3.

The determination method of the information recommendation result is an optional and realizable determination method, and in an actual application scenario, there may be a plurality of different determination methods, and different determination methods may be selected according to different actual application scenarios, which is not specifically limited in the embodiments of the present specification.
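The classification-based example can be sketched as follows. The function name `recommend` and the dictionary layout are hypothetical, and the rule of recommending the not-yet-purchased items that share a classification with a purchased item is simply the rule used in the example above.

```python
def recommend(classification, purchased):
    """Recommend items that share a class with a purchased item."""
    purchased_classes = {classification[item] for item in purchased}
    return sorted(
        item for item, cls in classification.items()
        if cls in purchased_classes and item not in purchased
    )

classification = {
    "commodity_1": 1,
    "commodity_2": 1,
    "commodity_3": 1,
    "commodity_4": 2,
}
result = recommend(classification, purchased={"commodity_1"})
```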

In S324, the information recommendation result is fed back with respect to the information recommendation request.

In implementation, the server can feed back the information recommendation result to the terminal device, so that the terminal device displays the information recommendation result, and the information recommendation efficiency and accuracy can be improved.

EXAMPLE III

As shown in fig. 5, an embodiment of this specification provides a data processing method. An execution subject of the method may be a terminal device or a server, where the terminal device may be a device such as a personal computer, or a mobile terminal device such as a mobile phone or a tablet computer, and the server may be an independent server or a server cluster composed of multiple servers. The method may specifically comprise the following steps:

In S502, a risk detection request for triggering execution of a target service for a target user is received.

The target service may be any service that may have a risk of privacy disclosure and the like, for example, the target service may be a resource transfer service, an information update service and the like.

In implementation, the terminal device may send a risk detection request for triggering execution of the target service for the target user to the server when detecting that the user triggers the target service.

In S504, in response to the risk detection request, human-computer interaction data having a preset correspondence with the target user and target data required for executing the target service are obtained.

In implementation, taking a target service as a resource transfer service as an example, the target data may include resource transfer time, resource transfer quantity, resource transfer objects, resource transfer routes, and the like, and the human-computer interaction data having a preset correspondence with the target user may include input data of the target user for a dialog corresponding to the resource transfer service, and the like.

In S506, based on the human-computer interaction data and the target data, first graph structure data to be cut is constructed.

In implementation, the server may use the target user, the resource transfer object, and the like as nodes, and construct a connection relationship between the nodes according to an interaction relationship between the target user and the resource transfer object. Because the data volume of the human-computer interaction data having the preset corresponding relationship with the target user may be large, and the human-computer interaction data may include redundant data having a small correlation with the target service, the redundant data may not only cause waste of storage space, but also affect the recognition efficiency and recognition effect of risk detection, and therefore the first graph structure data needs to be cut.

The graph coding network may be a temporal graph coding network.

The training process of the graph sampling model may refer to the training processes of S302 to S318 in the second embodiment, which are not described herein again.

In S508, a discrete token vector is constructed based on the preset dimension and time information between every two nodes having a connection relationship in the first graph structure data.

In S510, a time difference value is determined based on the construction time of the first graph structure data and time information between every two nodes having a connection relationship in the first graph structure data, and the time difference value is converted into a continuous characterization vector of a time domain.

In S512, an edge representation vector of an edge between each two nodes having a connection relationship in the first graph structure data is determined based on the node representation vectors of each two nodes having a connection relationship in the first graph structure data, the discrete representation vector and the continuous representation vector of the edge between each two nodes having a connection relationship.

In S514, a first sampling probability of an edge between every two nodes having a connection relation in the first graph structure data is determined based on the sampling network in the pre-trained graph sampling model and the edge characterization vector of the edge between every two nodes having a connection relation in the first graph structure data.

In S516, the first sampling probability of the edge between every two nodes having a connection relationship in the first graph structure data is converted into a preset distribution for sampling, so as to obtain the sampling probability of the edge between every two nodes having a connection relationship in the first graph structure data.

For the specific processing procedures of S508 to S516, reference may be made to the relevant contents of S306 to S308 in the second embodiment, which are not described herein again.
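To make S508 to S512 concrete, the sketch below builds the two time-related vectors and combines them with the node vectors of an edge. The one-hot time bucket used for the discrete vector, the cosine features used for the continuous vector, the frequency schedule, and the concatenation used to form the edge characterization vector are all assumptions chosen for illustration; this embodiment does not prescribe these encodings, and the function names are hypothetical.

```python
import numpy as np

def discrete_time_vector(timestamp, dim, bucket_seconds=3600):
    """One-hot vector of preset dimension indexed by a time bucket (assumed)."""
    bucket = int(timestamp // bucket_seconds) % dim
    vec = np.zeros(dim)
    vec[bucket] = 1.0
    return vec

def continuous_time_vector(construction_time, edge_time, dim):
    """Map the time difference to a vector of cosine features (assumed)."""
    delta = float(construction_time - edge_time)
    freqs = 1.0 / (10.0 ** np.linspace(0.0, 4.0, dim))
    return np.cos(delta * freqs)

def edge_characterization_vector(h_src, h_dst, edge_time, construction_time, dim):
    """Concatenate both node vectors with the discrete and continuous vectors."""
    return np.concatenate([
        h_src,
        h_dst,
        discrete_time_vector(edge_time, dim),
        continuous_time_vector(construction_time, edge_time, dim),
    ])

vec = edge_characterization_vector(
    np.ones(4), np.zeros(4), edge_time=7200, construction_time=10800, dim=6
)
```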

In S518, based on a preset sampling probability threshold, a filtering process is performed on the sampling probability of an edge between every two nodes having a connection relationship in the first graph structure data, and the clipped first graph structure data is constructed based on the nodes after the filtering process.

In practical applications, S518 may be processed in various ways. An alternative implementation is provided below, which may specifically refer to the following steps one to three:

Step one, carrying out aggregation processing on the node characterization vectors of the nodes in the first graph structure data to obtain a node characterization vector of a target node.

In implementation, as shown in fig. 6, the node characterizing vectors of the nodes except the central node in the first graph structure data may be aggregated to obtain the node characterizing vector of the target node. The node characterization vectors of the target nodes can be determined through mean value aggregation and the like.

And step two, constructing a connection relation between the target node and the central node of the first graph structure data, and setting the sampling probability of the edge between the target node and the central node of the first graph structure data as a preset sampling probability value.

Wherein the preset sampling probability value may be greater than the sampling probability threshold value.

In implementation, part of topology information (that is, global information) is lost in the graph structure data after being clipped, so that the node characterization vectors of the nodes in the first graph structure data can be aggregated to obtain the target node, and the node characterization vector of the target node can be used for characterizing the global information of the first graph structure data. A connection relationship may be established between the target node and the center node of the first graph structure data, and the sampling probability of the edge between the target node and the center node of the first graph structure data is set to a preset sampling probability value that is greater than a sampling probability threshold, for example, since the numerical range of the sampling probability may be between 0 and 1, that is, the sampling probability is not less than 0 and not greater than 1, the preset sampling probability value may be set to 1.

And thirdly, screening the sampling probability of the edge between every two nodes with the connection relation in the first graph structure data containing the target node based on a preset sampling probability threshold value, and constructing the first graph structure data after cutting based on the nodes after screening.

In implementation, because the sampling probability of the edge between the target node and the center node of the first graph structure data is a preset sampling probability value greater than a sampling probability threshold, the node after the screening processing includes the target node, so that the global information of the first graph structure data before the clipping can be retained by the target node, the information loss caused by the clipping processing is reduced, and the subsequent data processing performance is facilitated to be improved.
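Steps one to three can be sketched together as follows. The sketch uses mean aggregation over the non-central nodes and a preset sampling probability value of 1, as in the examples above; the node label "target", the threshold of 0.5, the greater-than-or-equal comparison, and the function name are assumptions.

```python
import numpy as np

def add_target_node_and_clip(node_vecs, edge_probs, center, threshold, preset_prob=1.0):
    """Add an aggregated target node linked to the central node, then clip."""
    others = [v for n, v in node_vecs.items() if n != center]
    vecs = dict(node_vecs)
    vecs["target"] = np.mean(others, axis=0)     # step one: mean aggregation
    probs = dict(edge_probs)
    probs[("target", center)] = preset_prob      # step two: preset probability
    kept_edges = [e for e, p in probs.items() if p >= threshold]  # step three
    kept_nodes = {n for e in kept_edges for n in e}
    return {n: vecs[n] for n in kept_nodes}, kept_edges

node_vecs = {
    "c": np.array([0.0, 0.0]),   # central node
    "a": np.array([2.0, 0.0]),
    "b": np.array([0.0, 4.0]),
}
edge_probs = {("c", "a"): 0.8, ("c", "b"): 0.1}
clipped_vecs, clipped_edges = add_target_node_and_clip(
    node_vecs, edge_probs, center="c", threshold=0.5
)
```

Node "b" is removed by the screening, yet its vector still contributes to the target node, which is how the global information survives the clipping.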

In S520, it is determined whether there is a risk in executing the target service based on the clipped first graph structure data.

In implementation, risk identification processing may be performed on the clipped first graph structure data through a risk identification model trained in advance to obtain a corresponding risk identification result, and whether a risk exists in executing the target service is determined based on the risk identification result. The risk identification model can be a model constructed based on a preset machine learning algorithm.

In addition, if the server determines that there is a risk in executing the target service based on the risk identification result, the server may send the risk identification result to the terminal device and stop executing the target service. And if the server determines that the target service is executed without risk based on the risk identification result, the server can execute the target service and return the service execution result to the terminal equipment.

The method for determining whether there is a risk in executing the target service is an optional and realizable method, and in an actual application scenario, there may be a plurality of different determination methods, and different determination methods may be selected according to different actual application scenarios, which is not specifically limited in the embodiment of the present specification.
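The server-side decision flow of S520 can be outlined as below. The risk model and the service are passed in as plain callables that stand in for the pre-trained risk identification model and the target service; the score threshold of 0.5, the returned dictionary layout, and the function name `handle_risk_request` are hypothetical.

```python
def handle_risk_request(clipped_graph, risk_model, execute_service, risk_threshold=0.5):
    """Stop the service when a risk is identified, otherwise execute it."""
    score = risk_model(clipped_graph)
    if score >= risk_threshold:
        # Risk identified: return the risk result and do not execute the service.
        return {"executed": False, "risk_score": score}
    # No risk identified: execute the service and return its result.
    return {"executed": True, "risk_score": score, "result": execute_service()}

toy_graph = [("target_user", "transfer_object")]
risky = handle_risk_request(toy_graph, lambda g: 0.9, lambda: "done")
safe = handle_risk_request(toy_graph, lambda g: 0.1, lambda: "done")
```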

The embodiment of this specification provides a data processing method. First graph structure data to be clipped is obtained, where the first graph structure data is constructed based on human-computer interaction data having a preset correspondence with a target user. The first graph structure data is encoded based on a graph coding network in a pre-trained graph sampling model to determine a node characterization vector of each node in the first graph structure data. An edge characterization vector of an edge between every two nodes having a connection relationship in the first graph structure data is determined based on the node characterization vector of each node in the first graph structure data, the construction time of the first graph structure data, and the time information between every two nodes having a connection relationship in the first graph structure data. The sampling probability of the edge between every two nodes having a connection relationship in the first graph structure data is determined based on the sampling network in the pre-trained graph sampling model and the edge characterization vector of the edge, and the first graph structure data is clipped based on the sampling probability to obtain the clipped first graph structure data.
Thus, through the graph coding network, the node representation vector of each node can be accurately determined, the edge representation vector of an edge between every two nodes with connection relation in the first graph structure data can be accurately determined by combining the node representation vector and time information (namely the construction time of the first graph structure data and the time information between every two nodes with connection relation in the first graph structure data), and then the first graph structure data is cut according to the determined sampling probability of the edge between every two nodes with connection relation to obtain the cut first graph structure data.

EXAMPLE IV

Based on the same idea as the data processing method provided above, an embodiment of this specification further provides a data processing apparatus, as shown in fig. 7.

The data processing apparatus includes: a data acquisition module 701, a first determination module 702, a second determination module 703, a probability determination module 704, and a first clipping module 705, wherein:

the data acquisition module 701 is used for acquiring first graph structure data to be cut, wherein the first graph structure data is constructed on the basis of human-computer interaction data which has a preset corresponding relationship with a target user;

a first determining module 702, configured to perform encoding processing on the first graph structure data based on a graph encoding network in a pre-trained graph sampling model, and determine a node characterization vector of each node in the first graph structure data;

a second determining module 703, configured to determine an edge representation vector of an edge between every two nodes having a connection relationship in the first graph structure data based on a node representation vector of each node in the first graph structure data, a construction time of the first graph structure data, and time information between every two nodes having a connection relationship in the first graph structure data;

a probability determination module 704, configured to determine sampling probabilities of edges between every two nodes having a connection relation in the first graph structure data based on the sampling networks in the pre-trained graph sampling model and the edge characterization vectors of the edges between every two nodes having a connection relation in the first graph structure data;

the first clipping module 705 is configured to clip the first graph structure data based on a sampling probability of an edge between every two nodes having a connection relationship in the first graph structure data, so as to obtain the clipped first graph structure data.
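The way the five modules hand data to one another can be sketched as a small pipeline. The class name, the field names, and the toy stand-in callables are hypothetical; each field only marks where the corresponding module 701 to 705 would plug in.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class DataProcessingApparatus:
    acquire: Callable[[], Any]                # data acquisition module 701
    encode_nodes: Callable[[Any], Any]        # first determination module 702
    encode_edges: Callable[[Any, Any], Any]   # second determination module 703
    edge_probabilities: Callable[[Any], Any]  # probability determination module 704
    clip: Callable[[Any, Any], Any]           # first clipping module 705

    def run(self):
        graph = self.acquire()
        node_vecs = self.encode_nodes(graph)
        edge_vecs = self.encode_edges(graph, node_vecs)
        probs = self.edge_probabilities(edge_vecs)
        return self.clip(graph, probs)

toy_graph = [("u", "a"), ("u", "b")]
apparatus = DataProcessingApparatus(
    acquire=lambda: toy_graph,
    encode_nodes=lambda g: {n: 1.0 for e in g for n in e},
    encode_edges=lambda g, h: [h[s] + h[d] for s, d in g],
    edge_probabilities=lambda vecs: [0.9, 0.1],   # fixed toy probabilities
    clip=lambda g, p: [e for e, q in zip(g, p) if q >= 0.5],
)
clipped = apparatus.run()
```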

In an embodiment of this specification, the apparatus further includes:

the request receiving module is used for receiving an information recommendation request aiming at the target user;

the classification module is used for carrying out node classification processing on the cut first graph structure data to obtain a node classification result and determining an information recommendation result based on the node classification result;

and the feedback module is used for feeding back the information recommendation result according to the information recommendation request.

In this embodiment of the present specification, the data obtaining module 701 is configured to:

receiving a risk detection request for triggering and executing target service aiming at the target user;

responding to the risk detection request, acquiring the human-computer interaction data which has a preset corresponding relation with the target user, and target data required by executing the target service;

constructing the first graph structure data to be cut based on the human-computer interaction data and the target data;

the device further comprises:

and the risk determining module is used for determining whether there is a risk in executing the target service based on the clipped first graph structure data.

In the embodiment of the present specification, the graph coding network is a temporal graph coding network.

In this embodiment of the present specification, the second determining module 703 is configured to:

constructing a discrete characterization vector based on a preset dimension and time information between every two nodes with a connection relation in the first graph structure data;

determining a time difference value based on the construction time of the first graph structure data and time information between every two nodes with connection relations in the first graph structure data, and converting the time difference value into a continuous characterization vector of a time domain;

determining an edge representation vector of an edge between every two nodes with connection relations in the first graph structure data based on the node representation vectors of every two nodes with connection relations in the first graph structure data, the discrete representation vectors of the edge between every two nodes with connection relations and the continuous representation vector.

In this embodiment of the present specification, the probability determining module 704 is configured to:

determining a first sampling probability of an edge between every two nodes with connection relations in the first graph structure data based on a sampling network in the pre-trained graph sampling model and an edge characterization vector of the edge between every two nodes with connection relations in the first graph structure data;

and converting the first sampling probability of the edge between every two nodes with connection relation in the first graph structure data into preset distribution for sampling to obtain the sampling probability of the edge between every two nodes with connection relation in the first graph structure data.

In this embodiment of the present specification, the first clipping module 705 is configured to:

and based on a preset sampling probability threshold value, screening the sampling probability of edges between every two nodes with connection relations in the first graph structure data, and constructing the cut first graph structure data based on the screened nodes.

In an embodiment of this specification, the first clipping module 705 is configured to:

carrying out aggregation processing on the node characterization vectors of the nodes in the first graph structure data to obtain a node characterization vector of a target node;

constructing a connection relation between the target node and a central node of the first graph structure data, and setting a sampling probability of an edge between the target node and the central node of the first graph structure data as a preset sampling probability value, wherein the preset sampling probability value is greater than the sampling probability threshold value;

and screening the sampling probability of the edge between every two nodes with the connection relation in the first graph structure data containing the target node based on a preset sampling probability threshold value, and constructing the first graph structure data after cutting based on the nodes after screening.

In an embodiment of this specification, the apparatus further includes:

the historical data acquisition module is used for acquiring historical graph structure data;

a third determining module, configured to input the historical graph structure data into a graph coding network in the graph sampling model, perform coding processing on the historical graph structure data, and determine a node characterization vector of each node in the historical graph structure data;

a fourth determining module, configured to determine an edge representation vector of an edge between every two nodes having a connection relationship in the historical graph structure data based on a node representation vector of each node in the historical graph structure data, a construction time of the historical graph structure data, and time information between every two nodes having a connection relationship in the historical graph structure data;

a fifth determining module, configured to determine a sampling probability of an edge between every two nodes having a connection relationship in the historical graph structure data based on the sampling network in the graph sampling model and the edge characterization vector of the edge between every two nodes having a connection relationship in the historical graph structure data;

the second cutting module is used for cutting the historical graph structure data based on the sampling probability of edges between every two nodes with connection relations in the historical graph structure data to obtain the cut historical graph structure data;

the feature generation module is used for generating edge feature vectors between every two nodes with connection relations in the cut historical graph structure data based on the attribute information of edges between every two nodes with connection relations in the cut historical graph structure data;

an updating module, configured to update a node characterization vector of each node in the clipped historical graph structure data based on an edge feature vector and an edge characterization vector of an edge between every two nodes having a connection relationship in the clipped historical graph structure data and an attention coefficient between every two nodes having a connection relationship in the clipped historical graph structure data, so as to obtain an updated node characterization vector of each node in the clipped historical graph structure data;

a loss determination module, configured to determine a loss value based on a node characterization vector of a node of the historical graph structure data and an updated node characterization vector of the node of the clipped historical graph structure data;

and the training module is used for determining whether the graph sampling model is converged or not based on the loss value, and if the graph sampling model is not converged, continuing to train a graph coding network and a sampling network of the graph sampling model based on the historical graph structure data until the graph sampling model is converged to obtain the trained graph sampling model.

In an embodiment of this specification, the loss determining module is configured to:

and determining the loss value based on the node characterization vector of the central node of the historical graph structure data and the updated node characterization vector of the central node of the clipped historical graph structure data.

and acquiring mutual information values between the node characterization vectors of the central nodes of the historical graph structure data and the updated node characterization vectors of the central nodes of the cut historical graph structure data, and determining the mutual information values as the loss values.
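The specification does not say here how the mutual information value is computed. As a minimal sketch only, the following assumes an InfoNCE-style lower bound over a batch of central nodes, with a dot product as the similarity score; the function name `infonce_mi_estimate` and the batch layout are hypothetical, and how the value enters the optimization (for example, its sign) is left open.

```python
import math
from typing import List, Sequence


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def infonce_mi_estimate(
    original: List[Sequence[float]],
    clipped: List[Sequence[float]],
) -> float:
    """InfoNCE-style lower-bound estimate of mutual information.

    original[i] is the central-node vector from the unclipped graph and
    clipped[i] is the updated central-node vector from the clipped graph
    for the same sample i. Other samples in the batch act as negatives.
    """
    n = len(original)
    if n == 0 or n != len(clipped):
        raise ValueError("both batches must be non-empty and equally long")
    total = 0.0
    for i in range(n):
        scores = [_dot(original[i], clipped[j]) for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        m = max(scores)
        log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
        total += scores[i] - log_sum
    # The estimate is bounded above by log(n).
    return math.log(n) + total / n


aligned = infonce_mi_estimate([[5.0, 0.0], [0.0, 5.0]], [[5.0, 0.0], [0.0, 5.0]])
uninformative = infonce_mi_estimate([[5.0, 0.0], [0.0, 5.0]], [[1.0, 1.0], [1.0, 1.0]])
```

In this toy batch the estimate approaches log 2 when the clipped vectors still identify their originals, and it is zero when the clipped vectors carry no information about which original they came from.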

The embodiment of the present specification provides a data processing apparatus. The apparatus acquires first graph structure data to be clipped, where the first graph structure data is constructed based on human-computer interaction data having a preset corresponding relationship with a target user. It encodes the first graph structure data based on a graph coding network in a pre-trained graph sampling model and determines a node characterization vector of each node in the first graph structure data. It then determines an edge characterization vector of the edge between every two nodes having a connection relationship in the first graph structure data, based on the node characterization vector of each node, the construction time of the first graph structure data, and the time information between every two nodes having a connection relationship. Next, it determines a sampling probability of the edge between every two nodes having a connection relationship based on a sampling network in the pre-trained graph sampling model and the edge characterization vectors. Finally, it clips the first graph structure data based on those sampling probabilities to obtain the clipped first graph structure data. In this way, the graph coding network allows the node characterization vector of each node to be determined accurately, and combining the node characterization vectors with the time information (namely the construction time of the first graph structure data and the time information between every two nodes having a connection relationship) allows the edge characterization vector of each edge to be determined accurately. The first graph structure data is then clipped according to the sampling probability determined for each edge to obtain the clipped first graph structure data.

EXAMPLE five

Based on the same idea, embodiments of the present specification further provide a data processing device, as shown in fig. 8.

The data processing device may vary considerably depending on its configuration or performance. It may include one or more processors 801 and a memory 802, and the memory 802 may store one or more applications or data. The memory 802 may be transient storage or persistent storage. An application stored in the memory 802 may include one or more modules (not shown), each of which may include a series of computer-executable instructions for the data processing device. Further, the processor 801 may be configured to communicate with the memory 802 and to execute, on the data processing device, the series of computer-executable instructions in the memory 802. The data processing device may also include one or more power supplies 803, one or more wired or wireless network interfaces 804, one or more input/output interfaces 805, and one or more keyboards 806.

In particular, in this embodiment, the data processing device includes a memory and one or more programs stored in the memory. The one or more programs may include one or more modules, each of which may include a series of computer-executable instructions for the data processing device, and the one or more programs, configured to be executed by the one or more processors, include computer-executable instructions for:

acquiring first graph structure data to be cut, wherein the first graph structure data is constructed on the basis of human-computer interaction data which has a preset corresponding relation with a target user;

coding the first graph structure data based on a graph coding network in a pre-trained graph sampling model, and determining a node representation vector of each node in the first graph structure data;

determining an edge representation vector of an edge between every two nodes with connection relation in the first graph structure data based on a node representation vector of each node in the first graph structure data, the construction time of the first graph structure data and time information between every two nodes with connection relation in the first graph structure data;

determining sampling probability of edges between every two nodes with connection relation in the first graph structure data based on a sampling network in the pre-trained graph sampling model and edge characterization vectors of edges between every two nodes with connection relation in the first graph structure data;

and based on the sampling probability of the edge between every two nodes with the connection relation in the first graph structure data, cutting the first graph structure data to obtain the cut first graph structure data.
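The five steps above form a simple data flow. The sketch below shows only that flow; the trained graph coding network and sampling network are replaced by caller-supplied functions, and every name in it (`clip_graph`, the `toy_*` stand-ins, the threshold value) is a hypothetical placeholder rather than anything defined in the specification.

```python
from typing import Callable, Dict, Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable, float]  # (source node, target node, edge time)
Vector = List[float]


def clip_graph(
    nodes: List[Hashable],
    edges: List[Edge],
    construction_time: float,
    encode_nodes: Callable[[List[Hashable], List[Edge]], Dict[Hashable, Vector]],
    edge_vector: Callable[[Vector, Vector, float, float], Vector],
    sampling_probability: Callable[[Vector], float],
    threshold: float,
) -> List[Edge]:
    """Return the edges that survive clipping."""
    # Node characterization vectors from the (stand-in) graph coding network.
    node_vecs = encode_nodes(nodes, edges)
    kept = []
    for src, dst, edge_time in edges:
        # Edge vector built from both endpoint vectors and the time information.
        e_vec = edge_vector(node_vecs[src], node_vecs[dst], construction_time, edge_time)
        # Sampling probability from the (stand-in) sampling network.
        prob = sampling_probability(e_vec)
        # Keep the edge only if its probability reaches the threshold.
        if prob >= threshold:
            kept.append((src, dst, edge_time))
    return kept


# Toy stand-ins so the sketch runs end to end; they are not trained networks.
def toy_encoder(nodes, edges):
    return {n: [float(i)] for i, n in enumerate(nodes)}


def toy_edge_vector(h_src, h_dst, t_build, t_edge):
    return h_src + h_dst + [t_build - t_edge]


def toy_sampler(e_vec):
    # Edges with a smaller time difference get a higher probability.
    return 1.0 / (1.0 + e_vec[-1])


clipped = clip_graph(
    nodes=["user", "shop", "device"],
    edges=[("user", "shop", 9.0), ("user", "device", 2.0)],
    construction_time=10.0,
    encode_nodes=toy_encoder,
    edge_vector=toy_edge_vector,
    sampling_probability=toy_sampler,
    threshold=0.3,
)
```

With these stand-ins the recent edge to "shop" scores 0.5 and is kept, while the older edge to "device" scores about 0.11 and is removed.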

Optionally, the method further comprises:

receiving an information recommendation request aiming at the target user;

performing node classification processing on the cut first graph structure data to obtain a node classification result, and determining an information recommendation result based on the node classification result;

and feeding back the information recommendation result according to the information recommendation request.

Optionally, the obtaining of the first graph structure data to be clipped includes:

receiving a risk detection request for triggering and executing a target service aiming at the target user;

the method further comprises the following steps:

and determining, based on the clipped first graph structure data, whether there is a risk in executing the target service.

Optionally, the graph coding network is a timing graph coding network.

Optionally, the determining, based on the node characterization vector of each node in the first graph structure data, the construction time of the first graph structure data, and the time information between every two nodes with connection relationships in the first graph structure data, an edge characterization vector of an edge between every two nodes with connection relationships in the first graph structure data includes:

determining a time difference value based on the construction time of the first graph structure data and the time information between every two nodes with connection relations in the first graph structure data, and converting the time difference value into a continuous characterization vector of a time domain;
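The mapping from the time difference to a continuous vector is not spelled out in this passage. One common choice, assumed here purely for illustration, is a sinusoidal encoding with fixed frequencies; the frequency schedule, the dimension, and the concatenation in `edge_characterization` are all assumptions, not details from the specification.

```python
import math
from typing import List


def time_encoding(construction_time: float, edge_time: float, dim: int = 4) -> List[float]:
    """Map a time difference to a vector of cosine and sine values."""
    if dim <= 0 or dim % 2 != 0:
        raise ValueError("dim must be a positive even number")
    delta = construction_time - edge_time
    vec = []
    for k in range(dim // 2):
        # Assumed geometric frequency schedule: 1, 1/10, 1/100, ...
        omega = 1.0 / (10.0 ** k)
        vec.append(math.cos(omega * delta))
        vec.append(math.sin(omega * delta))
    return vec


def edge_characterization(h_src: List[float], h_dst: List[float], t_vec: List[float]) -> List[float]:
    # Assumed combination rule: plain concatenation of the three parts.
    return h_src + h_dst + t_vec


zero_gap = time_encoding(10.0, 10.0, dim=4)
edge_vec = edge_characterization([0.2, 0.4], [0.1, 0.3], time_encoding(10.0, 4.0, dim=4))
```

Because every component is bounded in [-1, 1], edges of very different ages produce vectors on a comparable scale, which is one reason a periodic encoding is often preferred over feeding the raw time difference.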

Optionally, the determining, based on the sampling network in the pre-trained graph sampling model and the edge characterization vector of the edge between each two nodes having a connection relationship in the first graph structure data, the sampling probability of the edge between each two nodes having a connection relationship in the first graph structure data includes:

and converting the first sampling probability of the edge between every two nodes with the connection relation in the first graph structure data into a preset distribution for sampling to obtain the sampling probability of the edge between every two nodes with the connection relation in the first graph structure data.
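This passage does not name the preset distribution. As an illustration only, the sketch below assumes a binary Concrete (relaxed Bernoulli, or "Gumbel-sigmoid") distribution, which is a frequent choice when a sampling step has to stay differentiable during training; the function name and the temperature parameter are hypothetical.

```python
import math
import random


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def relaxed_bernoulli_sample(p: float, temperature: float, rng: random.Random) -> float:
    """Draw a soft sample in [0, 1] from a binary Concrete distribution.

    p is the first sampling probability of an edge; a lower temperature
    pushes samples towards 0 or 1.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    eps = 1e-9
    p = min(max(p, eps), 1.0 - eps)  # keep the logit finite
    u = min(max(rng.random(), eps), 1.0 - eps)
    logit_p = math.log(p) - math.log(1.0 - p)
    logistic_noise = math.log(u) - math.log(1.0 - u)
    return _sigmoid((logit_p + logistic_noise) / temperature)


rng = random.Random(0)
likely = [relaxed_bernoulli_sample(0.9, 0.5, rng) for _ in range(1000)]
unlikely = [relaxed_bernoulli_sample(0.1, 0.5, rng) for _ in range(1000)]
```

Under this assumption, an edge with a higher first sampling probability yields samples that are closer to 1 on average, and the resulting value can be used as the edge's sampling probability in the clipping step.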

Optionally, the clipping the first graph structure data based on the sampling probability of an edge between every two nodes having a connection relationship in the first graph structure data to obtain the clipped first graph structure data includes:

Optionally, the screening, based on a preset sampling probability threshold, the sampling probability of an edge between every two nodes having a connection relationship in the first graph structure data, and constructing the clipped first graph structure data based on the screened nodes, includes:

and based on a preset sampling probability threshold value, screening the sampling probability of edges between every two nodes with connection relation in the first graph structure data containing the target node, and constructing the cut first graph structure data based on the screened nodes.
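The threshold screening can be sketched as follows, together with the step recited in the claims of linking the target node to the center node with a preset sampling probability above the threshold. The function name, the dictionary layout, and the strict "greater than" comparison are assumptions made for illustration.

```python
from typing import Dict, Hashable, List, Tuple

EdgeKey = Tuple[Hashable, Hashable]


def clip_by_threshold(
    edge_probs: Dict[EdgeKey, float],
    center: Hashable,
    target: Hashable,
    threshold: float,
    preset_prob: float,
) -> Tuple[List[Hashable], List[EdgeKey]]:
    """Screen edges by sampling probability and return the kept nodes and edges."""
    if preset_prob <= threshold:
        raise ValueError("preset_prob must exceed threshold so the target edge survives")
    probs = dict(edge_probs)  # do not mutate the caller's mapping
    # Connect the target node to the center node with the preset probability.
    probs[(center, target)] = preset_prob
    kept_edges = [edge for edge, p in probs.items() if p > threshold]
    kept_nodes = sorted({node for edge in kept_edges for node in edge}, key=str)
    return kept_nodes, kept_edges


nodes, edges = clip_by_threshold(
    edge_probs={("c", "a"): 0.8, ("c", "b"): 0.2},
    center="c",
    target="t",
    threshold=0.5,
    preset_prob=0.9,
)
```

Because the preset value is above the threshold, the target node is guaranteed to remain in the clipped graph regardless of what the sampling network outputs for its other edges.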

Optionally, before the encoding processing is performed on the first graph structure data based on the graph coding network in the pre-trained graph sampling model to obtain a node characterization vector of each node in the first graph structure data, the method further includes:

acquiring historical graph structure data;

inputting the historical graph structure data into a graph coding network in the graph sampling model, coding the historical graph structure data, and determining a node characterization vector of each node in the historical graph structure data;

determining an edge representation vector of an edge between every two nodes with a connection relation in the historical graph structure data based on a node representation vector of each node in the historical graph structure data, the construction time of the historical graph structure data and time information between every two nodes with a connection relation in the historical graph structure data;

determining sampling probability of the edges between every two nodes with connection relations in the historical graph structure data based on the sampling network in the graph sampling model and the edge characterization vectors of the edges between every two nodes with connection relations in the historical graph structure data;

based on the sampling probability of edges between every two nodes with connection relations in the historical graph structure data, cutting the historical graph structure data to obtain the cut historical graph structure data;

generating edge feature vectors between every two nodes with connection relations in the cut historical graph structure data based on the attribute information of edges between every two nodes with connection relations in the cut historical graph structure data;

updating the node characteristic vector of each node in the cut historical graph structure data based on the edge characteristic vector and the edge characteristic vector of the edge between every two nodes with connection relation in the cut historical graph structure data and the attention coefficient between every two nodes with connection relation in the cut historical graph structure data to obtain the updated node characteristic vector of each node in the cut historical graph structure data;

determining a loss value based on a node characterization vector of a node of the historical graph structural data and an updated node characterization vector of the node of the clipped historical graph structural data;

and determining whether the graph sampling model is converged or not based on the loss value, and if the graph sampling model is not converged, continuing to train a graph coding network and a sampling network of the graph sampling model based on the historical graph structure data until the graph sampling model is converged to obtain the trained graph sampling model.
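The update formula for the node feature vectors is not given in this passage. The sketch below shows one plausible form of the attention-weighted update on the clipped graph, under loudly stated assumptions: each neighbour's message is its node vector plus the edge feature vector (so both must have the same length), the attention coefficients are supplied by the caller, and they are renormalized over each node's incoming edges. All names are hypothetical.

```python
from typing import Dict, Hashable, List, Tuple

Vector = List[float]
EdgeKey = Tuple[Hashable, Hashable]  # (source node, destination node)


def update_node_vectors(
    node_vecs: Dict[Hashable, Vector],
    edge_vecs: Dict[EdgeKey, Vector],
    attention: Dict[EdgeKey, float],
) -> Dict[Hashable, Vector]:
    """One round of attention-weighted aggregation over incoming edges."""
    updated = {}
    for node, h in node_vecs.items():
        incoming = [edge for edge in edge_vecs if edge[1] == node]
        total = sum(attention[edge] for edge in incoming)
        if not incoming or total == 0:
            # Nodes without usable neighbours keep their current vector.
            updated[node] = list(h)
            continue
        agg = [0.0] * len(h)
        for edge in incoming:
            weight = attention[edge] / total
            # Assumed message: neighbour vector plus edge feature vector.
            message = [a + b for a, b in zip(node_vecs[edge[0]], edge_vecs[edge])]
            for k in range(len(h)):
                agg[k] += weight * message[k]
        updated[node] = agg
    return updated


new_vecs = update_node_vectors(
    node_vecs={"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.0, 0.0]},
    edge_vecs={("a", "c"): [1.0, 1.0], ("b", "c"): [0.0, 0.0]},
    attention={("a", "c"): 0.75, ("b", "c"): 0.25},
)
```

The updated vector of the central node produced this way is what would be compared with the node characterization vector from the unclipped graph when the loss value is computed.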

Optionally, the determining a loss value based on the node characterization vector of the node of the historical graph structure data and the updated node characterization vector of the node of the clipped historical graph structure data includes:

Optionally, the determining the loss value based on the node characterization vector of the central node of the historical graph structure data and the updated node characterization vector of the central node of the clipped historical graph structure data includes:

The embodiment of the present specification provides a data processing device. The device acquires first graph structure data to be clipped, where the first graph structure data is constructed based on human-computer interaction data having a preset corresponding relationship with a target user. It encodes the first graph structure data based on a graph coding network in a pre-trained graph sampling model and determines a node characterization vector of each node in the first graph structure data. It then determines an edge characterization vector of the edge between every two nodes having a connection relationship in the first graph structure data, based on the node characterization vector of each node, the construction time of the first graph structure data, and the time information between every two nodes having a connection relationship. Next, it determines a sampling probability of the edge between every two nodes having a connection relationship based on a sampling network in the pre-trained graph sampling model and the edge characterization vectors. Finally, it clips the first graph structure data based on those sampling probabilities to obtain the clipped first graph structure data. In this way, the graph coding network allows the node characterization vector of each node to be determined accurately, and combining the node characterization vectors with the time information (namely the construction time of the first graph structure data and the time information between every two nodes having a connection relationship) allows the edge characterization vector of each edge to be determined accurately. The first graph structure data is then clipped according to the sampling probability determined for each edge to obtain the clipped first graph structure data.

EXAMPLE six

The embodiments of the present disclosure further provide a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements the processes of the data processing method embodiments, and can achieve the same technical effects, and in order to avoid repetition, the details are not repeated here. The computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.

The embodiment of the present specification provides a computer-readable storage medium. When the stored computer program is executed, first graph structure data to be clipped is acquired, where the first graph structure data is constructed based on human-computer interaction data having a preset corresponding relationship with a target user. The first graph structure data is encoded based on a graph coding network in a pre-trained graph sampling model, and a node characterization vector of each node in the first graph structure data is determined. An edge characterization vector of the edge between every two nodes having a connection relationship in the first graph structure data is then determined based on the node characterization vector of each node, the construction time of the first graph structure data, and the time information between every two nodes having a connection relationship. Next, a sampling probability of the edge between every two nodes having a connection relationship is determined based on a sampling network in the pre-trained graph sampling model and the edge characterization vectors. Finally, the first graph structure data is clipped based on those sampling probabilities to obtain the clipped first graph structure data. In this way, the graph coding network allows the node characterization vector of each node to be determined accurately, and combining the node characterization vectors with the time information (namely the construction time of the first graph structure data and the time information between every two nodes having a connection relationship) allows the edge characterization vector of each edge to be determined accurately. The first graph structure data is then clipped according to the sampling probability determined for each edge to obtain the clipped first graph structure data.

The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

In the 1990s, an improvement to a technology could be clearly classified as either a hardware improvement (e.g., an improvement to a circuit structure such as a diode, transistor, or switch) or a software improvement (an improvement to a method flow). However, as technology has advanced, many of today's method-flow improvements can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be implemented by a hardware entity module. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic functions are determined by the user programming the device. A designer programs a digital system onto a single PLD without asking a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually fabricating integrated circuit chips, this programming is nowadays mostly done with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must be written in a specific programming language called a Hardware Description Language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are the most commonly used at present. It will also be apparent to those skilled in the art that a hardware circuit implementing a logical method flow can be readily obtained simply by lightly programming the method flow into an integrated circuit using one of the hardware description languages described above.

The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art will also appreciate that, in addition to implementing the controller purely as computer-readable program code, the same functionality can be achieved by logically programming the method steps so that the controller takes the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included in it for performing various functions may be regarded as structures within the hardware component. Alternatively, the means for performing the functions may be regarded as both software modules for performing the method and structures within the hardware component.

The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

For convenience of description, the above devices are described as being divided into various units by function, and the units are described separately. Of course, when implementing one or more embodiments of the present specification, the functions of the units may be implemented in one or more pieces of software and/or hardware.

As will be appreciated by one skilled in the art, embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, one or more embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of the present description may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.

Embodiments of the present description are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the description. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, Random Access Memory (RAM), and/or non-volatile memory such as Read-Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, Phase-change Random Access Memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other memory technology, Compact Disc Read-Only Memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus comprising the element.

One or more embodiments of the specification may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. One or more embodiments of the specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

The embodiments in the present specification are described in a progressive manner. For the parts that are the same or similar among the embodiments, reference may be made from one embodiment to another, and each embodiment focuses on its differences from the other embodiments. In particular, the system embodiment is described briefly because it is substantially similar to the method embodiment; for the relevant points, refer to the corresponding parts of the description of the method embodiment.

The above description is only an example of the present specification, and is not intended to limit the present specification. Various modifications and alterations to this description will become apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present specification should be included in the scope of the claims of the present specification.

Claims

1. A method of data processing, comprising:

2. The method of claim 1, further comprising:

receiving an information recommendation request aiming at the target user;

3. The method of claim 1, the obtaining first graph structure data to be clipped comprising:

responding to the risk detection request, acquiring the human-computer interaction data which has a preset corresponding relationship with the target user, and target data required by executing the target service;

the method further comprises the following steps:

4. The method of claim 1, the graph coding network being a timing graph coding network.

5. The method of claim 4, wherein the determining an edge representation vector of an edge between every two nodes with a connection relation in the first graph structure data based on a node representation vector of each node in the first graph structure data, a construction time of the first graph structure data, and time information between every two nodes with a connection relation in the first graph structure data comprises:

6. The method of claim 5, wherein the determining the sampling probability of the edge between each two nodes with connection relation in the first graph structure data based on the sampling network in the pre-trained graph sampling model and the edge characterization vector of the edge between each two nodes with connection relation in the first graph structure data comprises:

7. The method according to claim 6, wherein the clipping the first graph structure data based on the sampling probability of the edge between every two nodes having a connection relationship in the first graph structure data to obtain the clipped first graph structure data includes:

8. The method according to claim 7, wherein the filtering, based on a preset sampling probability threshold, sampling probabilities of edges between every two nodes having a connection relationship in the first graph structure data, and constructing the clipped first graph structure data based on the filtered nodes, includes:

constructing a connection relation between the target node and a center node of the first graph structure data, and setting a sampling probability of an edge between the target node and the center node of the first graph structure data as a preset sampling probability value, wherein the preset sampling probability value is greater than the sampling probability threshold value;

9. The method according to claim 1, before the encoding the first graph structure data based on the graph coding network in the pre-trained graph sampling model to obtain the node characterization vector of each node in the first graph structure data, further comprising:

obtaining historical graph structure data;

determining sampling probability of edges between every two nodes with connection relations in the historical graph structure data based on the sampling network in the graph sampling model and the edge characterization vectors of the edges between every two nodes with connection relations in the historical graph structure data;

updating the node characteristic vector of each node in the clipped historical graph structure data based on the edge characteristic vector and the edge characteristic vector of the edge between every two nodes with connection relation in the clipped historical graph structure data and the attention coefficient between every two nodes with connection relation in the clipped historical graph structure data to obtain the updated node characteristic vector of each node in the clipped historical graph structure data;

determining a loss value based on a node characterization vector of a node of the historical graph structure data and an updated node characterization vector of the node of the clipped historical graph structure data;

10. The method according to claim 9, wherein the determining a loss value based on a node characterization vector of a node of the historical graph structure data and an updated node characterization vector of the node of the clipped historical graph structure data comprises:

and determining the loss value based on the node characterization vector of the central node of the historical graph structure data and the updated node characterization vector of the central node of the clipped historical graph structure data.
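
As a rough illustration of comparing the center node before and after clipping, the sketch below computes a mean squared difference between the two vectors. The concrete form of the loss is not given in the text shown here, so the squared-error form, the function name `center_node_loss`, and the plain-list vector representation are all assumptions made only for illustration.

```python
def center_node_loss(original_vec, updated_vec):
    """Mean squared difference between two center-node vectors.

    original_vec: characterization vector of the center node computed on
        the historical graph structure data before clipping.
    updated_vec: updated characterization vector of the same center node
        computed on the clipped historical graph structure data.
    Assumption: squared error is a stand-in; the source does not state
    the actual loss function.
    """
    if len(original_vec) != len(updated_vec):
        raise ValueError("vectors must have the same length")
    if not original_vec:
        raise ValueError("vectors must not be empty")
    squared = [(a - b) ** 2 for a, b in zip(original_vec, updated_vec)]
    return sum(squared) / len(squared)
```

A small value would indicate that clipping changed the center node's representation very little, which matches the intent of comparing the two vectors.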

11. The method according to claim 10, wherein the determining the loss value based on the node characterization vector of the center node of the historical graph structure data and the updated node characterization vector of the center node of the clipped historical graph structure data comprises:

12. A data processing apparatus comprising:

a data acquisition module, configured to acquire first graph structure data to be clipped, wherein the first graph structure data is constructed based on human-computer interaction data having a preset correspondence with a target user;

a first determining module, configured to encode the first graph structure data based on a graph coding network in a pre-trained graph sampling model, to determine a node characterization vector of each node in the first graph structure data;

a second determining module, configured to determine an edge characterization vector of an edge between every two nodes having a connection relationship in the first graph structure data based on the node characterization vector of each node in the first graph structure data, a construction time of the first graph structure data, and time information between every two nodes having a connection relationship in the first graph structure data;

a probability determination module, configured to determine a sampling probability of an edge between every two nodes having a connection relation in the first graph structure data based on a sampling network in the pre-trained graph sampling model and an edge characterization vector of an edge between every two nodes having a connection relation in the first graph structure data;

and a first clipping module, configured to clip the first graph structure data based on the sampling probability of the edge between every two nodes having a connection relationship in the first graph structure data, to obtain the clipped first graph structure data.
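
The flow through the second determining module, the probability determination module, and the first clipping module can be sketched as follows. Everything concrete here is an assumption for illustration: node vectors are supplied directly as a stand-in for the graph coding network's output, the edge vector concatenates the two endpoint vectors with the time elapsed between the edge and the graph's construction, and the sampling network is replaced by a single linear layer with a sigmoid. The names `edge_vector`, `sampling_probability`, and `clip_graph` are hypothetical.

```python
import math


def edge_vector(h_u, h_v, build_time, edge_time):
    # Assumption: concatenate both endpoint vectors with the time elapsed
    # between the edge's timestamp and the graph's construction time.
    return list(h_u) + list(h_v) + [build_time - edge_time]


def sampling_probability(edge_vec, weights, bias=0.0):
    # Stand-in for the sampling network: one linear layer plus a sigmoid.
    z = sum(w * x for w, x in zip(weights, edge_vec)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def clip_graph(node_vecs, edge_times, build_time, weights, threshold=0.5):
    """Return the edges whose sampling probability reaches the threshold.

    node_vecs: node id -> characterization vector (assumed to come from
        the graph coding network, which is not modelled here).
    edge_times: (u, v) -> timestamp associated with that edge.
    """
    kept = {}
    for (u, v), edge_time in edge_times.items():
        vec = edge_vector(node_vecs[u], node_vecs[v], build_time, edge_time)
        prob = sampling_probability(vec, weights)
        if prob >= threshold:
            kept[(u, v)] = prob
    return kept
```

With a negative weight on the elapsed-time feature, older edges receive lower probabilities and are the first to be clipped, which is one plausible reason for including time information in the edge vector.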

13. Data processing equipment, comprising:

a processor; and

a memory arranged to store computer executable instructions that, when executed, cause the processor to:

acquire first graph structure data to be clipped, wherein the first graph structure data is constructed based on human-computer interaction data having a preset correspondence with a target user;

determine the sampling probability of the edge between every two nodes having a connection relationship in the first graph structure data based on the sampling network in the pre-trained graph sampling model and the edge characterization vector of the edge between every two nodes having a connection relationship in the first graph structure data;

14. A storage medium for storing computer-executable instructions which, when executed by a processor, implement the following: