CN113283589A - Updating method and device of event prediction system - Google Patents

Updating method and device of event prediction system

Info

Publication number
CN113283589A
Authority
CN
China
Prior art keywords: event, vector, node, network, intensity
Legal status: Granted
Application number: CN202110631255.8A
Other languages: Chinese (zh)
Other versions: CN113283589B (en)
Inventors: Siqiao Xue (薛思乔), Xiaoming Shi (师晓明), Lintao Ma (马琳涛), Chen Pan (潘晨), Shijun Wang (王世军), James Zhang (詹姆士·张), Hongyan Hao (郝鸿延)
Current Assignee: Alipay Hangzhou Information Technology Co Ltd
Original Assignee: Alipay Hangzhou Information Technology Co Ltd
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202110631255.8A (granted as CN113283589B)
Publication of CN113283589A
Application granted; publication of CN113283589B
Legal status: Active

Classifications

  • G06N 3/045 Combinations of networks
  • G06F 18/2321 Non-hierarchical clustering techniques using statistics or function optimisation, e.g. modelling of probability density functions
  • G06F 18/2415 Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
  • G06N 3/044 Recurrent networks, e.g. Hopfield networks
  • G06N 3/08 Learning methods
  • G06Q 10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"


Abstract

An embodiment of the present specification provides an updating method for an event prediction system. A sample acquired from an event sample sequence is input into the event prediction system for event processing, which comprises: determining, through a sequence coding network, a sequence coding vector of the subsequence up to the sample's occurrence time, where each sample in the subsequence corresponds to a first user; updating, through a graph propagation network, the node characterization vectors related to the first user's node in a user relationship network graph according to the sequence coding vector; fitting, through an intensity fitting network, an event occurrence intensity function for the first user according to the updated node characterization vector; and mapping, through an intensity mapping network, the event occurrence intensity function into the event type space to obtain several intensity functions of the first user under several event types. The network parameters of the event prediction system are then updated based on the intensity functions obtained through event processing and the label sample corresponding to the first user.

Description

Updating method and device of event prediction system
Technical Field
One or more embodiments of the present disclosure relate to the field of machine learning technologies, and in particular, to an update method and apparatus for an event prediction system.
Background
With economic development and technological progress, users frequently use the various services provided by service platforms to meet needs in work and life. While using these services, users generate a large amount of online and offline data. Such behavior data (also called event data or operation data) reflects a user's personal interests and behavior preferences and, if deeply mined and reasonably utilized, can effectively guide service optimization. One data processing approach models a user's behavior sequence to predict which behavior will occur next and at what time; this helps provide personalized services that meet the user's needs and thereby improves user experience. The point process is a sequence modeling technique: user behaviors are abstracted as points in a space, the user behavior sequence is abstracted as a point process sequence, and the intensity function of subsequent behavior events is fitted, enabling prediction of the next event.
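Concretely, this abstraction treats the behavior sequence as a marked point process: each event becomes a (timestamp, event type) point, and the sequence becomes a time-ordered list of such points. The sketch below uses the meal/sleep/play example from the end of this document; the variable names are illustrative, not from the patent:

```python
# A user behavior sequence abstracted as a marked point process:
# each event is a point (occurrence time in hours, event type).
sequence = [
    (12.0, "meal"),   # having a meal at 12 noon
    (13.0, "sleep"),  # sleeping at 1 p.m.
    (15.0, "play"),   # playing at 3 p.m.
]

times = [t for t, _ in sequence]      # the point process's timestamps
types = sorted({k for _, k in sequence})  # the finite set of event types
```

A point process model then fits, from `times` and the marks, an intensity function per event type, from which the next (timestamp, type) point is predicted.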
However, existing point process algorithms struggle to meet the high demands of event prediction in practical applications. A solution is therefore needed to effectively improve the performance of point process algorithms and thereby optimize the accuracy and usability of event prediction results.
Disclosure of Invention
The updating method and apparatus for an event prediction system described in one or more embodiments of this specification optimize the point process algorithm, so that the intensity function obtained through the optimized algorithm is more accurate, thereby effectively improving the accuracy and usability of event prediction results.
According to a first aspect, an updating method of an event prediction system is provided, comprising: sequentially acquiring event samples, as first event samples, from an event sample sequence arranged in time order, where the sample attributes of a first event sample include a first occurrence time and a first user identifier; and inputting the first event sample into an event prediction system for event processing, the system comprising a sequence coding network, a graph propagation network, an intensity fitting network and an intensity mapping network. The event processing comprises: determining, through the sequence coding network, a sequence coding vector of the subsequence up to the first occurrence time, where each event sample in the subsequence corresponds to the first user identifier; updating, through the graph propagation network, the node characterization vectors of the first user node and its neighbor nodes in the user relationship network graph according to the sequence coding vector; determining, through the intensity fitting network, parameter values of the event occurrence intensity function corresponding to the first user identifier according to the updated first node characterization vector of the first user node; and mapping, through the intensity mapping network, the event occurrence intensity function into the event type space to obtain several intensity functions of the first user identifier under several event types. The network parameters of the event prediction system are then updated based on these intensity functions and a second event sample corresponding to the first user identifier, where the second occurrence time of the second event sample is later than the first occurrence time.
In one embodiment, the sequence coding network includes a linear embedding sub-network and a temporal sub-network. Determining the sequence coding vector of the subsequence up to the first occurrence time comprises: determining, through the linear embedding sub-network, type embedding vectors of the event types corresponding to the event samples in the subsequence; and outputting the sequence coding vector through the temporal sub-network based on the sequentially input type embedding vectors.
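As a rough sketch of this embodiment, the linear embedding sub-network can be a type-embedding lookup table and the temporal sub-network a recurrent cell consuming the embeddings in time order. The plain tanh RNN cell and all dimensions below are illustrative assumptions, not the patent's concrete architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_TYPES, EMB_DIM, HID_DIM = 5, 8, 8

# Linear embedding sub-network: one embedding row per event type.
E = rng.normal(0, 0.1, size=(NUM_TYPES, EMB_DIM))
# Temporal sub-network: a plain tanh RNN cell stands in for whatever
# recurrent unit the patent's temporal sub-network actually uses.
W_x = rng.normal(0, 0.1, size=(HID_DIM, EMB_DIM))
W_h = rng.normal(0, 0.1, size=(HID_DIM, HID_DIM))

def encode_subsequence(event_types):
    """Return the sequence coding vector for one user's subsequence."""
    h = np.zeros(HID_DIM)
    for k in event_types:                  # event samples in time order
        x = E[k]                           # type embedding vector
        h = np.tanh(W_x @ x + W_h @ h)     # recurrent update
    return h                               # final state = sequence coding vector

enc = encode_subsequence([0, 2, 1])
print(enc.shape)  # (8,)
```

The final hidden state summarizes the whole subsequence up to the first occurrence time, which is what the downstream graph propagation network consumes.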
In one embodiment, updating the node characterization vectors of the first user node and its neighbor nodes in the user relationship network graph comprises: taking the first user node as the target node and performing an updating operation for the target node, where the updating operation comprises determining the updated characterization vector of the target node according to the target sequence coding vector corresponding to the target node, the current characterization vectors of the target node's neighbor nodes, and the current characterization vector of the target node itself.
In a specific embodiment, after performing the updating operation with the first user node as the target node, the method further includes performing the updating operation with each neighbor node of the first user node as the target node.
In a more specific embodiment, the graph propagation network includes a local propagation layer, a self-propagation layer, an exogenous propagation layer and a fusion layer. The updating operation specifically includes: processing the current characterization vectors of the target neighbor nodes through the local propagation layer to obtain a local propagation vector; linearly transforming the current characterization vector of the target node with a first parameter matrix through the self-propagation layer to obtain a self-propagation vector; linearly transforming the target sequence coding vector with a second parameter matrix through the exogenous propagation layer to obtain an exogenous propagation vector; and fusing the local propagation vector, the self-propagation vector and the exogenous propagation vector through the fusion layer to obtain the updated characterization vector of the target node.
In one example, the target neighbor nodes are several first-order neighbor nodes of the target node, and the local propagation vector is obtained by weighted summation of their current characterization vectors using the weight parameter vector in the local propagation layer.
In another example, the target neighbor nodes include several first-order and second-order neighbor nodes of the target node. For each first-order neighbor node, attention weights over the current characterization vectors of its second-order neighbor nodes are determined from the first-order neighbor node's current characterization vector, and those characterization vectors are weighted and summed with the attention weights to obtain a neighbor aggregation vector; the neighbor aggregation vectors of the first-order neighbor nodes are then weighted and summed, using the weight parameter vector in the local propagation layer, to obtain the local propagation vector.
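A minimal sketch of the updating operation with first-order neighbors only: local propagation as a weighted neighbor sum, self and exogenous propagation as linear maps, and a simple additive fusion. The tanh fusion and all shapes are assumptions made for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
D = 4  # characterization vector dimension (illustrative)

W_self = rng.normal(0, 0.1, size=(D, D))   # first parameter matrix (self-propagation)
W_exo  = rng.normal(0, 0.1, size=(D, D))   # second parameter matrix (exogenous propagation)
w_loc  = rng.random(3)                     # weight parameter vector, one weight per neighbor

def update_node(h_target, h_neighbors, seq_enc):
    """One updating operation for a target node."""
    # Local propagation: weighted sum of first-order neighbor vectors.
    local = (w_loc[:, None] * h_neighbors).sum(axis=0)
    # Self propagation: linear transform of the node's current vector.
    self_p = W_self @ h_target
    # Exogenous propagation: linear transform of the sequence coding vector.
    exo = W_exo @ seq_enc
    # Fusion layer: a plain sum + tanh stands in for the patent's fusion.
    return np.tanh(local + self_p + exo)

h_new = update_node(rng.random(D), rng.random((3, D)), rng.random(D))
```

Running the same operation again with each neighbor as the target node propagates the sequence information one hop further, as the specific embodiment above describes.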
In one embodiment, the event occurrence intensity function includes a reference intensity, and the intensity fitting network includes a reference intensity determination layer; determining the parameter values of the event occurrence intensity function corresponding to the first user comprises performing linear transformation and activation on the first node characterization vector through the reference intensity determination layer to obtain the reference intensity.
In a specific embodiment, the event occurrence intensity function further includes historical stimulation coefficients and time decay coefficients, and the intensity fitting network further includes a stimulation coefficient determination layer and a decay coefficient determination layer. Determining the parameter values further comprises: determining, through the stimulation coefficient determination layer, attention weights of several historical characterization vectors of the first user node with respect to the first node characterization vector, the historical characterization vectors being derived from the other event samples in the subsequence; for each other event sample, taking the product of its attention weight and historical characterization vector as the corresponding historical stimulation coefficient; and, through the decay coefficient determination layer, fusing the first node characterization vector with each historical characterization vector to obtain fusion vectors, then applying linear transformation and activation to each fusion vector to obtain the corresponding time decay coefficient.
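With these parameters, the fitted event occurrence intensity function can take a Hawkes-like form: a reference intensity plus one exponentially decaying term per historical event. The sketch below keeps the stimulation coefficients scalar (the patent makes them products of attention weights and historical characterization vectors) and uses softplus as the activation; both choices are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
D = 4

h_node = rng.random(D)                 # updated first node characterization vector
hist_h = rng.random((2, D))            # historical characterization vectors
hist_t = np.array([0.5, 1.2])          # occurrence times of the earlier events

def softplus(x):
    return np.log1p(np.exp(x))

# Reference intensity: linear transform + activation of the node vector.
w_mu = rng.normal(0, 0.5, size=D)
mu = softplus(w_mu @ h_node)

# Historical stimulation coefficients: attention of the node vector over
# the historical vectors (reduced to scalars for this sketch).
scores = hist_h @ h_node
alpha = np.exp(scores) / np.exp(scores).sum()

# Time decay coefficients: fuse node and historical vectors (additively,
# as an assumption), then linear transform + activation.
w_beta = rng.normal(0, 0.5, size=D)
beta = softplus((hist_h + h_node) @ w_beta)

def intensity(t):
    """Event occurrence intensity lambda(t) for this user."""
    decay = np.exp(-beta * np.clip(t - hist_t, 0.0, None))
    return mu + (alpha * decay).sum()

lam = intensity(2.0)
```

The history terms decay toward zero, so the intensity relaxes back to the reference intensity `mu` long after the last event, which is the usual Hawkes behavior.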
In one embodiment, updating the network parameters based on the several intensity functions and the second event sample corresponding to the first user identifier comprises: determining, from the several intensity functions, the intensity function whose event type matches that of the second event sample; and updating the network parameters based on that intensity function and the occurrence time of the second event sample.
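A plausible concrete form of this update, assumed here rather than quoted from the patent, is the standard point-process negative log-likelihood on the second event sample: reward the matching type's intensity at the observed occurrence time, and penalize the integrated total intensity over the elapsed interval:

```python
import numpy as np

def nll_term(lam_funcs, k_next, t_prev, t_next, grid=201):
    """Point-process negative log-likelihood for one labeled next event.

    lam_funcs: one intensity function per event type, lambda_k(t).
    k_next, t_next: event type and occurrence time of the second event sample.
    """
    ts = np.linspace(t_prev, t_next, grid)
    total = sum(f(ts) for f in lam_funcs)          # lambda(t) = sum_k lambda_k(t)
    dt = ts[1] - ts[0]
    # Trapezoid rule for the compensator integral of lambda over (t_prev, t_next].
    compensator = ((total[:-1] + total[1:]) / 2 * dt).sum()
    event_ll = np.log(lam_funcs[k_next](t_next))   # log-intensity of the observed type
    return -(event_ll - compensator)

# Toy check with two constant per-type intensities.
lams = [lambda t: 0.3 + 0.0 * np.asarray(t),
        lambda t: 0.7 + 0.0 * np.asarray(t)]
loss = nll_term(lams, k_next=1, t_prev=0.0, t_next=2.0)
```

For the constant toy intensities the loss reduces to the closed form `2.0 - log(0.7)`, which makes the numerical integration easy to verify.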
In one embodiment, the event prediction system further comprises an adjacency matrix prediction layer. Before updating the network parameters, the method further comprises determining, through the adjacency matrix prediction layer, a predicted adjacency matrix of a virtual event relationship network graph constructed from the several event types, according to the characterization vectors of the nodes in the user relationship network graph. Updating the network parameters then comprises: determining a first loss term based on the several intensity functions and the second event sample; acquiring a real event relationship network graph, which includes type nodes corresponding to the several event types and directed edges formed by causal relationships between them; determining a second loss term based on the true adjacency matrix of the real event relationship network graph and the predicted adjacency matrix; and updating the network parameters based on the first loss term and the second loss term.
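One way to realize the adjacency matrix prediction layer, assumed here for illustration, is a bilinear score over the type characterization matrix, with the second loss term as the squared error against the true adjacency matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
K, D = 3, 4                                  # number of event types, embedding dim

H = rng.random((K, D))                       # type characterization matrix
W = rng.normal(0, 0.1, size=(D, D))          # learnable parameter matrix of the layer
A_true = np.array([[0, 1, 0],                # true adjacency matrix: edge weights
                   [0, 0, 1],                # derived from counted causal
                   [0, 0, 0]], float)        # relationships between event types

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

A_pred = sigmoid(H @ W @ H.T)                # predicted adjacency matrix
graph_loss = ((A_pred - A_true) ** 2).sum()  # second (graph regularization) loss term
```

The total training objective then combines this graph loss with the first loss term on the intensity functions, which is the graph regularization the disclosure refers to.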
In a specific embodiment, acquiring the real event relationship network graph comprises: acquiring several user event sequences, each containing events performed by the corresponding user arranged in time order, with a causal relationship between the event types of any two adjacent events; and constructing the real event relationship network graph from these user event sequences.
In a more specific embodiment, the true adjacency matrix includes the weights of the directed edges, each determined from the statistical count of the corresponding causal relationship.
Further, in one example, determining the predicted adjacency matrix of the virtual event relationship network graph constructed from the several event types comprises: determining a type characterization vector for each of the several event types, based on the node characterization vectors, to form a type characterization matrix; and determining the predicted adjacency matrix from the type characterization matrix and a learnable parameter matrix in the adjacency matrix prediction layer.
In a more specific example, prior to updating the network parameters in the event prediction system, the method further comprises: and updating the type characterization vector corresponding to the event type of the first event sample into the first node characterization vector.
According to a second aspect, an updating apparatus of an event prediction system is provided, comprising: a sequence acquiring unit configured to sequentially acquire event samples, as first event samples, from an event sample sequence arranged in time order, the sample attributes of a first event sample including a first occurrence time and a first user identifier; and an event processing unit configured to input the first event sample into an event prediction system for event processing, the system comprising a sequence coding network, a graph propagation network, an intensity fitting network and an intensity mapping network. The event processing unit comprises: a coding module configured to determine, through the sequence coding network, the sequence coding vector of the subsequence up to the first occurrence time, each event sample in the subsequence corresponding to the first user identifier; a graph propagation module configured to update, through the graph propagation network, the node characterization vectors of the first user node and its neighbor nodes in the user relationship network graph according to the sequence coding vector; an intensity fitting module configured to determine, through the intensity fitting network, parameter values of the event occurrence intensity function corresponding to the first user identifier according to the updated first node characterization vector of the first user node; and an intensity mapping module configured to map, through the intensity mapping network, the event occurrence intensity function into the event type space to obtain several intensity functions of the first user identifier under several event types.
The apparatus further comprises a parameter updating unit configured to update the network parameters of the event prediction system based on the several intensity functions and a second event sample corresponding to the first user identifier, where the second occurrence time of the second event sample is later than the first occurrence time.
In a specific embodiment, the graph propagation module is specifically configured to take the first user node as the target node and perform an updating operation for the target node, where the updating operation comprises determining the updated characterization vector of the target node according to the target sequence coding vector corresponding to the target node, the current characterization vectors of the target node's neighbor nodes, and the current characterization vector of the target node itself.
In a particular embodiment, the graph propagation network includes a local propagation layer, a self-propagation layer, an exogenous propagation layer and a fusion layer. The updating operation performed by the graph propagation module specifically includes: processing the current characterization vectors of the target neighbor nodes through the local propagation layer to obtain a local propagation vector; linearly transforming the current characterization vector of the target node with a first parameter matrix through the self-propagation layer to obtain a self-propagation vector; linearly transforming the target sequence coding vector with a second parameter matrix through the exogenous propagation layer to obtain an exogenous propagation vector; and fusing the local propagation vector, the self-propagation vector and the exogenous propagation vector through the fusion layer to obtain the updated characterization vector of the target node.
In a more particular embodiment, the target neighbor nodes include several first-order and second-order neighbor nodes of the target node. To obtain the local propagation vector, the graph propagation module, for each first-order neighbor node, determines attention weights over the current characterization vectors of its second-order neighbor nodes from the first-order neighbor node's current characterization vector, weights and sums those characterization vectors to obtain a neighbor aggregation vector, and then weights and sums the neighbor aggregation vectors of the first-order neighbor nodes, using the weight parameter vector in the local propagation layer, to obtain the local propagation vector.
In one embodiment, the event occurrence intensity function includes a reference intensity and the intensity fitting network includes a reference intensity determination layer; the intensity fitting module is specifically configured to perform linear transformation and activation on the first node characterization vector through the reference intensity determination layer to obtain the reference intensity.
In a more specific embodiment, the event occurrence intensity function further includes historical stimulation coefficients and time decay coefficients, and the intensity fitting network further includes a stimulation coefficient determination layer and a decay coefficient determination layer. The intensity fitting module is further configured to: determine, through the stimulation coefficient determination layer, attention weights of several historical characterization vectors of the first user node with respect to the first node characterization vector, the historical characterization vectors being derived from the other event samples in the subsequence; for each other event sample, take the product of its attention weight and historical characterization vector as the corresponding historical stimulation coefficient; and, through the decay coefficient determination layer, fuse the first node characterization vector with each historical characterization vector to obtain fusion vectors, then apply linear transformation and activation to each fusion vector to obtain the corresponding time decay coefficient.
In one embodiment, the event prediction system further comprises an adjacency matrix prediction layer, and the apparatus further comprises an adjacency matrix prediction unit configured to determine, through the adjacency matrix prediction layer, a predicted adjacency matrix of a virtual event relationship network graph constructed from the several event types, according to the characterization vectors of the nodes in the user relationship network graph. The parameter updating unit is specifically configured to: determine a first loss term based on the several intensity functions and the second event sample; acquire a real event relationship network graph, which includes type nodes corresponding to the several event types and directed edges formed by causal relationships between them; determine a second loss term based on the true adjacency matrix of the real event relationship network graph and the predicted adjacency matrix; and update the network parameters based on the first loss term and the second loss term.
According to a third aspect, an event prediction system is provided, comprising: an input layer for sequentially acquiring event samples, as first event samples, from an event sample sequence arranged in time order, the sample attributes of a first event sample including a first occurrence time and a first user identifier; a sequence coding network for determining the sequence coding vector of the subsequence up to the first occurrence time, each event sample in the subsequence corresponding to the first user identifier; a graph propagation network for updating the node characterization vectors of the first user node and its neighbor nodes in the user relationship network graph according to the sequence coding vector; an intensity fitting network for determining parameter values of the event occurrence intensity function corresponding to the first user identifier according to the updated first node characterization vector of the first user node; an intensity mapping network for mapping the event occurrence intensity function into the event type space to obtain several intensity functions of the first user identifier under several event types; and an output layer for outputting an event prediction result for the first user identifier based on these intensity functions, the result including a predicted event type and a predicted occurrence time.
According to a fourth aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first aspect.
According to a fifth aspect, there is provided a computing device comprising a memory having stored therein executable code and a processor that, when executing the executable code, implements the method of the first aspect.
In the method and apparatus provided by the embodiments of this specification, an event prediction system framework built on neural networks is proposed. During training, a deep neural network extracts features from event samples, the parameters of the intensity function are fitted in a latent (hidden) space, and the intensity function is mapped back into the event type space, yielding intensity functions for the various event types; the event prediction system is then updated using label samples. Furthermore, graph regularization can be introduced during updating to obtain a better training effect. Through repeated iterative training, a trained event prediction system is obtained that can model the intensity function of a target user's event sequence and, using this more accurate and flexible intensity function, accurately predict the next event following that sequence.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings required for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 illustrates an implementation architecture diagram of an update event prediction system according to one embodiment;
FIG. 2 illustrates a flow diagram of an update method of an event prediction system, according to one embodiment;
FIG. 3 illustrates a sequence of event samples according to one example;
FIG. 4 illustrates a schematic structural diagram of an event prediction system according to one embodiment;
FIG. 5 illustrates an update apparatus architecture diagram of an event prediction system, according to one embodiment.
Detailed Description
The solution provided in this specification is described below with reference to the drawings.
As previously described, a point process may be used to model a sequence of events, thereby enabling prediction of events. The key to a point process algorithm is to determine a conditional intensity function, which can be defined as the following equation:

$$\lambda_t := \lim_{\Delta \to 0} \frac{1}{\Delta}\, \mathbb{E}\big[\, N\big((t, t+\Delta]\big) \,\big|\, \mathcal{H}_t \,\big] \tag{1}$$

where \(\lambda_t\) represents the intensity at time t; the symbol := means "is defined as"; \(\mathcal{H}_t\) represents the historical events occurring by time t; and \(\mathbb{E}[N((t, t+\Delta]) \mid \mathcal{H}_t]\) represents the expected number of events occurring in the time interval \((t, t+\Delta]\), given \(\mathcal{H}_t\). It is to be understood that formula (1) shows the definition of the intensity function; the actually used intensity function is a function with the time t as its independent variable and the intensity as its dependent variable.
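As a quick illustration of definition (1): for a process with constant intensity (a homogeneous Poisson process), the expected event count in \((t, t+\Delta]\) divided by \(\Delta\) recovers the intensity for any \(\Delta\). A minimal simulation sketch, with all numbers chosen purely for illustration:

```python
import random

def count_events(rate, t0, t1, rng):
    """Count events of a constant-rate (homogeneous Poisson) process on (t0, t1]."""
    n, t = 0, t0
    while True:
        t += rng.expovariate(rate)  # i.i.d. exponential inter-event gaps
        if t > t1:
            return n
        n += 1

rng = random.Random(0)
rate, delta, trials = 2.0, 0.5, 20000
avg_count = sum(count_events(rate, 0.0, delta, rng) for _ in range(trials)) / trials
# Definition (1): intensity = expected count in (t, t+delta] divided by delta.
estimated_intensity = avg_count / delta
```

For this constant-rate case the ratio is exact for any interval length, so the simulated estimate should hover around 2.0.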
After modeling the intensity function from historical events, it can be used to predict the time of occurrence and event type of the next event. It is to be understood that the intensity function can be derived for each of a limited number of event types by modeling.
In one use of the intensity function, the probability density of the (i+1)-th event occurring at time t may first be calculated from the intensity function:

$$p\big(t \,\big|\, \mathcal{H}_{t_i}\big) = \lambda(t)\, \exp\Big(-\int_{t_i}^{t} \lambda(s)\, ds\Big) \tag{2}$$

where \(\mathcal{H}_{t_i}\) represents the given historical events before the (i+1)-th event that needs to be predicted; the function value of \(p(t \mid \mathcal{H}_{t_i})\) represents the probability density of the (i+1)-th event occurring at time t; \(\lambda(t) = \sum_k \lambda_k(t)\), where \(\lambda_k(t)\) represents the intensity function of the k-th event type; and \(t_i\) indicates the occurrence time of the i-th event. It is to be understood that \(p(t \mid \mathcal{H}_{t_i})\) can also be abbreviated as \(p_{i+1}(t)\).
After obtaining the probability density function \(p_{i+1}(t)\), the occurrence time and the event type of the next event may be calculated according to the following equations (3) and (4), respectively:

$$\hat{t}_{i+1} = \int_{t_i}^{\infty} t \cdot p_{i+1}(t)\, dt \tag{3}$$

$$\hat{c}_{i+1} = \arg\max_k \frac{\lambda_k(\hat{t}_{i+1})}{\lambda(\hat{t}_{i+1})} \tag{4}$$

where \(\hat{t}_{i+1}\) indicates the predicted occurrence time of the (i+1)-th event, and \(\hat{c}_{i+1}\) indicates the predicted event type of the (i+1)-th event.
For example, suppose the historical event sequence of a given user is: eating at 12 noon → sleeping at 1 pm → playing at 3 pm. After modeling the intensity function, the next event can be predicted: eating at 6 o'clock in the evening. In this way, an intensity function can be modeled based on the historical event sequence, thereby predicting the occurrence time and the event type of the next event.
From the above, the modeling effect of the intensity function determines the accuracy of the event prediction result and is therefore crucial. Accordingly, in the event prediction system, a deep neural network is used to perform feature extraction on the events in a historical event sequence, the parameters of an intensity function are fitted in a hidden space (or latent space), and the intensity function is mapped back to the event type space, so that a highly usable intensity function corresponding to each event type is obtained.
FIG. 1 illustrates an implementation architecture diagram of updating an event prediction system, according to one embodiment. As shown in FIG. 1, a historical event sample (or simply an event sample) is obtained from a behavior record of an event of type c made by a user u at a time t; accordingly, a plurality of event samples arranged in time order may be obtained to form an event sample sequence, where the i-th event sample x_i includes the occurrence time t_i of the i-th event, the event type c_i and the user identifier u_i; in FIG. 1, t_{i-1} < t_i < t_{i+1}. Further, based on the obtained event sample sequence, each event sample is sequentially input into the event prediction system for event processing, where the event processing includes processing sequentially with the sequence coding network 101, the graph propagation network 102, the intensity fitting network 103 and the intensity mapping network 104, correspondingly obtaining a sequence coding vector, updated characterization vectors for nodes in the user relationship network graph, an intensity function in the hidden space, and intensity functions in the event type space; the network parameters in the event prediction system are then updated using the intensity functions in the event type space and a label event sample. In this way, the update of the event prediction system can be realized, and the updated event prediction system can be used to model the intensity function of a target user's target event sequence, so that a more accurate and more flexible intensity function can be obtained, realizing accurate prediction of the next event following the target event sequence.
The implementation steps of the above inventive concept are described below with reference to FIG. 1, FIG. 2 and specific embodiments. FIG. 2 illustrates a flow diagram of an update method of an event prediction system, according to one embodiment. It is understood that the executing subject of the update method may be any platform, apparatus or device cluster with computing and processing capabilities. As shown in FIG. 2, the method includes the following steps:
Step S210: sequentially acquire event samples, as a first event sample, from an event sample sequence formed by arranging event samples in time order, where the sample attributes of the first event sample include a first occurrence time and a first user identifier. Step S220: input the first event sample into an event prediction system for event processing, the event prediction system including a sequence coding network 101, a graph propagation network 102, an intensity fitting network 103 and an intensity mapping network 104; the event processing includes: Step S221: determine, through the sequence coding network 101, a sequence coding vector of the subsequence up to the first occurrence time, where each event sample in the subsequence corresponds to the first user identifier; Step S222: update, through the graph propagation network 102 and according to the sequence coding vector, the node characterization vectors of the first user node and its neighbor nodes in the user relationship network graph; Step S223: determine, through the intensity fitting network 103 and according to the updated first node characterization vector of the first user node, the parameter values in the event occurrence intensity function corresponding to the first user identifier; Step S224: map, through the intensity mapping network 104, the event occurrence intensity function to the event type space, so as to obtain a plurality of intensity functions of the first user identifier under a plurality of event types. Step S230: update the network parameters in the event prediction system based on the plurality of intensity functions and a second event sample corresponding to the first user identifier, where the second occurrence time corresponding to the second event sample is later than the first occurrence time.
In the above steps, it should be noted that the terms "first" and "second" in "first event sample", "first occurrence time", "second event sample" and similar terms elsewhere are used only to distinguish similar things, and have no other limiting function such as ordering.
The above steps are developed as follows:
First, in step S210, event samples are sequentially acquired, as the first event sample, from an event sample sequence formed in chronological order. The first event sample may also be referred to as the current sample to be processed, or simply the current sample.
The sample attributes of the first event sample include the first occurrence time of the corresponding first event; the precision of the first occurrence time can be set according to requirements, for example to minutes (min), seconds (s), days or months. The sample attributes of the first event sample further include a subject identifier of the implementing subject of the first event, that is, the first user identifier, where the subject identifier may be a numeric number, a serial number composed of numbers and letters, or the like. The first event has a first event type, which, accordingly, may also be included in the sample attributes of the first event sample.
For the acquisition of the event sample sequence: in one embodiment, a discrete event type space may first be defined, including a limited number of event types; then, user behavior events are collected through buried-point tracking according to these event types, a large number of user events are acquired, and the acquired user events are sorted by occurrence time. Further, in a specific embodiment, the total event sequence obtained by sorting may be directly used as the above event sample sequence. In another specific embodiment, considering that the number of collected user events is huge while the computing power in practical engineering applications is often limited, sliding-window sampling may be performed on the sorted total event sequence to obtain a plurality of window sequences; in any one of a plurality of rounds of iterative training, one window sequence is extracted from the plurality of window sequences as the event sample sequence used by that training round.
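The sliding-window sampling described above can be sketched as follows; field names and window parameters are illustrative assumptions, not prescribed by the specification:

```python
import random

def make_window_sequences(events, window_size, stride):
    """Slide a fixed-size window over the time-ordered total event sequence."""
    events = sorted(events, key=lambda e: e["t"])  # ensure chronological order
    return [events[s:s + window_size]
            for s in range(0, len(events) - window_size + 1, stride)]

# Toy buried-point records: occurrence time t, event type c, user identifier u.
total_sequence = [{"t": t, "c": t % 3, "u": t % 2} for t in range(10)]
windows = make_window_sequences(total_sequence, window_size=4, stride=2)

# In each training round, one window sequence is drawn as that round's
# event sample sequence.
rng = random.Random(0)
round_sequence = rng.choice(windows)
```

Windowing caps the per-round sequence length, which is what keeps the per-round compute bounded when the total event log is huge.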
Thus, after the event sample sequence is acquired, the event samples therein are sequentially acquired as the first event sample. In step S220, the first event sample is input into the event prediction system for event processing. For clarity of description, the first event sample is denoted as event sample x_i, including the occurrence time t_i of the i-th event, the event type c_i and the user identifier u_i.

The event processing includes the following steps:

In step S221, the sequence coding vector h_i of the subsequence up to the occurrence time t_i is determined through the sequence coding network 101, where each event sample in the subsequence corresponds to the user identifier u_i.
It is to be understood that the subsequence is a subsequence of the above event sample sequence, in which the event samples are also arranged in time order; the subsequence includes the event sample x_i occurring at time t_i, and the event samples occurring before time t_i that correspond to the same user identifier u_i. In one possible case, the subsequence includes only event sample x_i. In one example, FIG. 3 shows an event sample sequence x_0 → x_1 → x_2 → x_3 → x_4 → x_5 → .... Assume event sample x_i is x_4; then the subsequence of the corresponding user up to February 10 can be obtained as x_0 → x_2 → x_4.
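The subsequence extraction just described — all samples of the same user up to the current occurrence time — can be sketched as below, with toy field names assumed:

```python
def subsequence_up_to(samples, user_id, t_now):
    """Return the event samples of the given user occurring no later than t_now,
    in time order (the subsequence fed to the sequence coding network)."""
    return [s for s in samples if s["u"] == user_id and s["t"] <= t_now]

# Toy sequence mirroring FIG. 3: x0..x5, alternating between two users.
seq = [{"i": i, "t": i, "u": i % 2} for i in range(6)]
sub = subsequence_up_to(seq, user_id=0, t_now=4)   # x0 -> x2 -> x4
```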
In one embodiment, as shown in FIG. 4, the sequence coding network 101 includes a linear embedding subnetwork 1011 and a timing subnetwork 1012. Based on this, in a specific embodiment, this step may include: determining, through the linear embedding subnetwork 1011, a type embedding vector of the event type corresponding to each sample in the subsequence; and then outputting, through the timing subnetwork 1012, the sequence coding vector h_i based on the sequentially input type embedding vectors corresponding to the samples. In one example, the timing subnetwork 1012 may be implemented as a recurrent neural network (RNN), a long short-term memory network (LSTM), or the like.

In another specific embodiment, the sample sequence formed by the event samples in the subsequence other than event sample x_i corresponds to a previous sequence coding vector h_{i-1}. Accordingly, this step may include: determining, through the linear embedding subnetwork 1011, the type embedding vector e_{c_i} of the event type c_i of event sample x_i; and then outputting, through the timing subnetwork 1012, the sequence coding vector h_i based on the previous sequence coding vector h_{i-1} and the type embedding vector e_{c_i}. In one example, assuming the timing subnetwork 1012 is implemented as an LSTM network, the sequence coding vector h_i may accordingly be represented as:

$$h_i = \mathrm{LSTM}\big(h_{i-1},\, e_{c_i}\big)$$

From the above, the sequence coding vector h_i can be obtained.
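The recurrent encoding step can be sketched with a simplified RNN cell standing in for the LSTM; the embedding table and weights below are toy values, and a real implementation would use a proper LSTM cell:

```python
import math

def recurrent_step(h_prev, e_c, W_h, W_e):
    """h_i = tanh(W_h h_{i-1} + W_e e_{c_i}): a simplified stand-in for the
    LSTM update of the timing subnetwork."""
    d = len(h_prev)
    pre = [sum(W_h[r][c] * h_prev[c] for c in range(d)) +
           sum(W_e[r][c] * e_c[c] for c in range(len(e_c)))
           for r in range(d)]
    return [math.tanh(x) for x in pre]

# Hypothetical type embedding table (the linear embedding subnetwork).
embed = {0: [1.0, 0.0], 1: [0.0, 1.0]}
W_h = [[0.5, 0.0], [0.0, 0.5]]
W_e = [[0.1, 0.2], [0.3, 0.4]]

h = [0.0, 0.0]                      # initial sequence coding vector
for c in [0, 1, 0]:                 # event types of the subsequence samples
    h = recurrent_step(h, embed[c], W_h, W_e)
```

Each step consumes only the new type embedding and the previous code, which is why the previous sequence coding vector is all that must be carried between event samples.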
In step S222, the node characterization vectors of the first user node (denoted Node_u) and its neighbor nodes in the user relationship network graph are updated through the graph propagation network 102 according to the sequence coding vector. It is understood that the first user node Node_u corresponds to the first user identifier u_i; in other words, the first user node Node_u and the first user identifier u_i both uniquely correspond to the same user.
It should be noted that the user relationship network graph is used to represent the relationships among multiple users, and includes multiple user nodes corresponding to the multiple users, and connection edges formed by the associations between user nodes. In one example, the association between user nodes may include a social relationship, e.g., two users are friends on a social platform, or their frequency of communication (e.g., the number of messages exchanged, or the cumulative number of communication days) exceeds a preset threshold. In another example, the association between user nodes may include a kinship relationship, e.g., mother, grandmother, etc.
In one embodiment, this step may include: taking the first user node Node_u as the target node (denoted Node_o), and executing a characterization vector update operation for the target node. This update operation includes: determining the updated characterization vector z'_o of the target node according to the target sequence coding vector h_o corresponding to the target node Node_o, the current characterization vectors z_r of the target neighbor nodes Node_r (with Node_r ∈ N(Node_o), where N(Node_o) denotes the set of neighbor nodes of the target node), and the current characterization vector z_o of the target node itself.
in one particular embodiment, as shown in FIG. 4, graph propagation network 102 includes a local propagation layer 1021, a self-propagation layer 1022, an exogenous propagation layer 1023, and a fusion layer 1024. Based on this, the update operation specifically includes: processing the current characterization vector of the target neighbor node through a local propagation layer 1021
Figure BDA00031038264700000921
Obtaining local propagation vectors
Figure BDA00031038264700000925
Through the self-propagating layer 1022, a first parameter matrix W is utilized1The current characterization vector of the target node
Figure BDA00031038264700000922
Performing linear transformation to obtain self-propagation vector
Figure BDA0003103826470000101
Using a second parameter matrix W through an exogenous propagation layer 10232Encoding the target sequence with the vector
Figure BDA0003103826470000102
Linear transformation is carried out to obtain exogenous propagation vectors
Figure BDA0003103826470000103
By fusing the layers 1024, local propagation vectors are aligned
Figure BDA0003103826470000104
Self-propagating vector
Figure BDA0003103826470000105
And exogenous propagation vectors
Figure BDA0003103826470000106
Performing fusion processing to obtain the updated characterization vector of the target node
Figure BDA0003103826470000107
For the above determination of the local propagation vector m_loc through the local propagation layer 1021: in a more specific embodiment, the target neighbor nodes are a plurality of first-order neighbor nodes; accordingly, the current characterization vectors of the plurality of first-order neighbor nodes may be weighted and summed using the weight parameter vector w in the local propagation layer 1021, which may be expressed as the calculation formula:

$$m_{loc} = Z_r\, w \tag{5}$$

where Z_r represents the characterization matrix obtained by stacking the current characterization vectors z_r of the plurality of first-order neighbor nodes as its columns.
In another more specific embodiment, the target neighbor nodes include a plurality of first-order neighbor nodes and a plurality of second-order neighbor nodes, the first-order neighbor nodes denoted Node_{r1} and the second-order neighbor nodes denoted Node_{r2}. Correspondingly, the determination of the local propagation vector m_loc may include: for each first-order neighbor node, first determining, using the current characterization vector z_{r1} of that first-order neighbor node, several attention weights (denoted a_{r1,r2}) corresponding to the current characterization vectors z_{r2} of several second-order neighbor nodes; it is to be understood that these several second-order neighbor nodes are first-order neighbors of that first-order neighbor node, and "several" here refers to one or more. Then, the several current characterization vectors are weighted and summed using the several attention weights, to obtain a neighbor aggregation vector g_{r1} of that first-order neighbor node. Then, the neighbor aggregation vectors of the plurality of first-order neighbor nodes are weighted and summed using the weight parameter vector w in the local propagation layer 1021, to obtain the local propagation vector m_loc.
For the above attention weights a_{r1,r2}: in one example, an attention mechanism may be introduced, for example, calculating the attention weight using the following equation (6):

$$a_{r1,r2} = \mathrm{softmax}\big(\mathrm{score}(z_{r1},\, z_{r2})\big) \tag{6}$$
In equation (6), score represents the attention score, which can be computed in various ways. In a specific example, the local propagation layer 1021 includes an attention scoring sublayer; accordingly, the two corresponding vectors may be concatenated and input into the attention scoring sublayer to obtain the corresponding attention score. In another specific example, the similarity between the two vectors may be calculated as the corresponding attention score.
In another example, the several attention weights a_{r1,r2} are set so that they are all equal and sum to 1. In yet another example, the attention weights are parameters to be learned in the local propagation layer 1021.

In this way, the several attention weights a_{r1,r2} can be obtained and used for the aggregation of the several second-order neighbor nodes, so as to obtain the aggregation vector of each first-order neighbor node, and thereby the local propagation vector m_loc for the target node.
For the above-described fusion process by the fusion layer 1024, in one embodiment, this fusion process may include an addition process, an averaging process, or a weighted summation process.
According to a specific example, the above update operation can be performed using the following formula (7); for the mathematical symbols therein, reference may be made to the descriptions above:

$$z'_o = Z_r\, w + W_1\, z_o + W_2\, h_o \tag{7}$$
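Taking additive fusion as a concrete choice, one node update can be sketched as follows; dimensions and parameter values are toy choices, and a nonlinearity could optionally follow the fusion:

```python
def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def update_node(neighbors, w, W1, z_o, W2, h_o):
    """Fuse local, self and exogenous propagation by addition:
    z_o' = sum_r w_r * z_r + W1 z_o + W2 h_o."""
    d = len(z_o)
    local = [sum(wr * z[k] for wr, z in zip(w, neighbors)) for k in range(d)]
    self_p = matvec(W1, z_o)   # self-propagation of the target node
    exo = matvec(W2, h_o)      # exogenous propagation of the sequence code
    return [a + b + c for a, b, c in zip(local, self_p, exo)]

neighbors = [[1.0, 0.0], [0.0, 1.0]]   # current first-order neighbor vectors
w = [0.5, 0.5]                          # weight parameter vector of layer 1021
W1 = [[0.2, 0.0], [0.0, 0.2]]           # first parameter matrix
W2 = [[0.1, 0.0], [0.0, 0.1]]           # second parameter matrix
z_new = update_node(neighbors, w, W1, [1.0, 1.0], W2, [2.0, 0.0])
```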
In another embodiment, the update operation may specifically include: concatenating the target sequence coding vector h_o, the current characterization vector z_o of the target node, and the current characterization vectors z_r of the target neighbor nodes, and inputting the concatenated vector into the graph propagation network 102 to obtain the updated characterization vector z'_o of the target node.
In the above, the update operation for the target node characterization vector has been described. After the first user node is taken as the target node and the update operation for the target node is executed, the updated node characterization vector z_u of the first user node can be obtained. Further, in one embodiment, after the node characterization vector z_u is obtained, the method may further include: updating the characterization vectors of the neighbor nodes of the first user node based on the node characterization vector z_u. In this way, the event information in the above event sample x_i can be propagated in the user relationship network graph. It should be understood that the order of the neighbor nodes covered by this update may be set according to actual requirements, for example set to order 1, or to within order 2. In addition, the characterization vector update method adopted for the first user node may be the same as, or different from, that adopted for the neighbor nodes. In one example, each neighbor node of the first user node may in turn be taken as the target node and the above update operation executed, to obtain the updated node characterization vector of that neighbor node. In another example, each neighbor node of the first user node is taken as a central node, and a multi-order neighbor aggregation operation is performed to obtain the updated node characterization vector of each neighbor node; the multi-order neighbor aggregation operation can refer to the neighbor aggregation operations commonly used in graph neural networks.

Thus, the update of the node characterization vectors of the first user node Node_u and its neighbor nodes in the user relationship network graph can be realized.
Thereafter, in step S223, the parameter values in the event occurrence intensity function corresponding to the first user identifier are determined through the intensity fitting network 103 according to the updated first node characterization vector z_u of the first user node. It is to be understood that the mathematical form of the event occurrence intensity function may be predetermined, including the independent variable time t and parameter terms whose values may be determined based on the intensity fitting network 103 and its inputs.

In one embodiment, the event occurrence intensity function may be represented by the following equation:

$$\lambda^h_{u_i}(t) = \mu_i + \sum_{j=1}^{i-1} \alpha_{j,i} \odot \exp\big( -\delta_{j,i}\,(t - t_j) \big), \qquad t \in [t_i, t_{i+1}) \tag{8}$$

In equation (8), \(\lambda^h_{u_i}(t)\) represents the event intensity function of user identifier u_i in the hidden space (labeled h); t_j represents the event occurrence time contained in the j-th event sample of the subsequence corresponding to user identifier u_i; \(\mu_i\) represents the reference intensity; \(\alpha_{j,i}\) and \(\delta_{j,i}\) respectively represent the historical stimulation coefficient and the time attenuation coefficient of the j-th event sample on the i-th event sample in the subsequence; and \(\odot\) indicates a bit-wise multiplication operation between vectors.
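The Hawkes-style form just described — a reference intensity plus exponentially decaying contributions from earlier events — can be evaluated as follows; the per-event coefficients are toy values, whereas in the system they are produced by the intensity fitting network:

```python
import math

def hidden_intensity(t, mu, history):
    """lambda^h(t) = mu + sum_j alpha_j * exp(-delta_j * (t - t_j)),
    evaluated element-wise over the hidden dimension."""
    lam = list(mu)
    for t_j, alpha_j, delta_j in history:   # one triple per earlier sample x_j
        for k in range(len(mu)):
            lam[k] += alpha_j[k] * math.exp(-delta_j[k] * (t - t_j))
    return lam

mu = [0.2, 0.1]                              # reference intensity
history = [(0.0, [0.5, 0.3], [1.0, 2.0])]    # (t_j, alpha_{j,i}, delta_{j,i})
lam_h = hidden_intensity(1.0, mu, history)   # intensity one time unit later
```

A negative entry in alpha_{j,i} would lower the intensity, which is how inhibition between events is captured.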
Further, in a specific embodiment, as shown in FIG. 4, the intensity fitting network 103 includes a reference intensity determination layer 1031, a stimulation coefficient determination layer 1032 and an attenuation coefficient determination layer 1033, which are respectively used for determining the reference intensity \(\mu_i\), the historical stimulation coefficient \(\alpha_{j,i}\) and the time attenuation coefficient \(\delta_{j,i}\) in equation (8).

In a more specific embodiment, the reference intensity determination layer 1031 performs linear transformation and/or activation processing on the first node characterization vector z_u to obtain the reference intensity \(\mu_i\). In one specific example, the first node characterization vector z_u is first linearly transformed using a weight matrix, and the result of the linear transformation is then activated; this calculation process can be expressed as the following formula:

$$\mu_i = \sigma\big( W_\mu\, z_u + b_\mu \big) \tag{9}$$

where \(\sigma(\cdot)\) represents an activation function in machine learning; \(W_\mu\) and \(b_\mu\) respectively represent the weight matrix and the bias vector in the reference intensity determination layer 1031, which are training parameters to be learned.

In this way, the reference intensity \(\mu_i\) may be determined using the reference intensity determination layer 1031.
In a more specific embodiment, the stimulation coefficient determination layer 1032 determines, according to the first node characterization vector z_u, several attention weights (denoted \(\{\beta_{j,i},\, j \in [1, i-1]\}\)) corresponding to several historical characterization vectors (denoted z_j) of the first user node Node_u, where the historical characterization vectors are obtained based on several other event samples in the subsequence (denoted \(\{x_j,\, j \in [1, i-1]\}\)); and, for each other event sample x_j, the result \(\beta_{j,i}\, z_j\) of multiplying its corresponding attention weight \(\beta_{j,i}\) and its historical characterization vector z_j is determined as the corresponding historical stimulation coefficient \(\alpha_{j,i}\). It is to be understood that, for the determination of the historical characterization vectors z_j, reference may be made to the determination of the first node characterization vector z_u, which is not repeated here.
For the above attention weight \(\beta_{j,i}\): in one example, for any historical characterization vector z_j, the attention score \(\omega_{j,i}\) of that historical characterization vector z_j may first be determined according to the first node characterization vector z_u, and then the several attention scores corresponding to the several historical characterization vectors are normalized to obtain the several attention weights. For the determination of the attention score \(\omega_{j,i}\): in one specific example, the vector similarity between z_u and z_j may be calculated as the attention score \(\omega_{j,i}\). In another specific example, the attention score \(\omega_{j,i}\) may be calculated based on the following equation (10):

$$\omega_{j,i} = v^{\top} \tanh\big( W_\omega\, [\,z_u\, ;\, z_j\,] \big) \tag{10}$$

In equation (10), \([\cdot\,;\,\cdot]\) represents a concatenation between vectors; v and \(W_\omega\) are parameters to be learned in the stimulation coefficient determination layer 1032.
For the above normalization processing: in a specific example, it may be implemented using a softmax function; in another specific example, the normalization of the attention scores may be implemented by taking ratios.
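As a sketch of this score-then-normalize flow: the score below uses an additive form with assumed parameters v and W_ω (the exact scoring form can vary, as noted above), followed by softmax normalization into attention weights β_{j,i}; all values are illustrative:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_score(z_u, z_j, W, v):
    """omega_{j,i} = v^T tanh(W [z_u ; z_j]) -- one additive-attention choice."""
    concat = z_u + z_j                       # vector concatenation [. ; .]
    hidden = [math.tanh(sum(W[r][c] * concat[c] for c in range(len(concat))))
              for r in range(len(W))]
    return sum(vr * h for vr, h in zip(v, hidden))

z_u = [1.0, 0.0]                             # first node characterization vector
history_vecs = [[1.0, 0.0], [0.0, 1.0]]      # historical characterization vectors
W_omega = [[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]]
v = [1.0, 1.0]
scores = [attention_score(z_u, z_j, W_omega, v) for z_j in history_vecs]
betas = softmax(scores)                      # attention weights beta_{j,i}
```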
In this way, the historical stimulation coefficient \(\alpha_{j,i}\) may be determined using the stimulation coefficient determination layer 1032. Note that the historical stimulation coefficient \(\alpha_{j,i}\) is used for capturing the long-term dependencies of stimulation between events based on the subsequence. In the learning process, the value of the historical stimulation coefficient \(\alpha_{j,i}\) can be negative, thereby also capturing inhibition between events.
In a more specific embodiment, the attenuation coefficient determination layer 1033 performs fusion processing on the first node characterization vector z_u with each of the several historical characterization vectors respectively, to obtain several fusion vectors, and sequentially performs linear transformation and activation processing on each fusion vector to obtain the corresponding time attenuation coefficient \(\delta_{j,i}\). In one example, the fusion processing may include concatenation, addition, or bit-wise multiplication, among others. In a specific example, the time attenuation coefficient \(\delta_{j,i}\) may be calculated in the attenuation coefficient determination layer 1033 by the following equation (11):

$$\delta_{j,i} = \sigma\big( W_\delta\, [\,z_u\, ;\, z_j\,] + b_\delta \big) \tag{11}$$

where \([\cdot\,;\,\cdot]\) represents a concatenation between vectors; \(W_\delta\) and \(b_\delta\) respectively represent the parameter matrix and the bias vector to be learned in the attenuation coefficient determination layer 1033.

Thus, the time attenuation coefficient \(\delta_{j,i}\) can be determined using the attenuation coefficient determination layer 1033.
In the above, the reference intensity \(\mu_i\), the historical stimulation coefficient \(\alpha_{j,i}\) and the time attenuation coefficient \(\delta_{j,i}\) may be determined by the reference intensity determination layer 1031, the stimulation coefficient determination layer 1032 and the attenuation coefficient determination layer 1033 in the intensity fitting network 103, so as to fit the event occurrence intensity function \(\lambda^h_{u_i}(t)\) in the hidden space, whose functional form is shown in equation (8). It will be appreciated that, where only event sample x_i is included in the subsequence, the historical stimulation coefficients \(\alpha_{j,i}\) and the time attenuation coefficients \(\delta_{j,i}\) are all 0, so only the reference intensity \(\mu_i\) needs to be determined.
In another embodiment, the form of the event occurrence intensity function may also be represented as follows:

$$\lambda^h_{u_i}(t) = b + \eta_t \odot \exp\big( -w \odot (t - t_i) \big), \qquad t \in [t_i, t_{i+1}) \tag{12}$$

In equation (12), \(\eta_t\), w and b are parameter vectors to be learned; for the remaining symbols, reference may be made to the descriptions of the symbols in equation (8).

Further, when each sample in the subsequence is used as an input sample of the event prediction system, the resulting updated node characterization vectors of the first user node are concatenated, and the concatenated vector is input into the intensity fitting network 103 to obtain the parameters \(\eta_t\), w and b in equation (12).

In the above, \(\eta_t\), w and b in equation (12) can be determined through the intensity fitting network 103, so as to obtain the event occurrence intensity function \(\lambda^h_{u_i}(t)\) fitted in the hidden space.
From the above, the event occurrence intensity function \(\lambda^h_{u_i}(t)\) in the hidden space can be fitted through the intensity fitting network 103. Then, in step S224, the event occurrence intensity function \(\lambda^h_{u_i}(t)\) is mapped, through the intensity mapping network 104, to the event type space with the event types as its space dimensions, so as to obtain the mapped function \(\lambda_{u_i}(t) \in \mathbb{R}^K\), i.e., a plurality of intensity functions under a plurality of event types, where K in \(\mathbb{R}^K\) is equal to the total number of the plurality of event types.
In a particular embodiment, the intensity mapping network 104 may be implemented as a fully connected network with the softplus function as the activation function. In another particular embodiment, the intensity mapping network 104 may be implemented as a multi-layer fully connected network.
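A sketch of this mapping step: a single fully connected layer with softplus activation takes the hidden-space intensity vector to K per-type intensities, and softplus keeps each mapped intensity positive, as an intensity must be. All parameter values below are illustrative:

```python
import math

def softplus(x):
    return math.log(1.0 + math.exp(x))

def map_to_event_types(lam_h, W, b):
    """Fully connected layer + softplus: hidden-space intensity -> K per-type
    intensities (K = number of rows of W)."""
    return [softplus(sum(w * x for w, x in zip(row, lam_h)) + bk)
            for row, bk in zip(W, b)]

lam_h = [0.4, 0.1]                            # hidden-space intensity at some t
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]    # K = 3 event types
b = [0.0, 0.0, 0.0]
lam_types = map_to_event_types(lam_h, W, b)
```

Stacking several such layers gives the multi-layer fully connected variant mentioned above.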
Thus, the first user identification u can be obtainediA plurality of intensity functions under a plurality of event types. Next, in step S230, based on the plurality of intensity functions and the corresponding first user ID uiSecond event sample xi+1And updating the network parameters in the event prediction system. Wherein the second event sample xi+1Is taken as a label sample corresponding to the second occurrence time ti+1Sample x later than the first eventiCorresponding first occurrence time ti. It is to be understood that historically, the first subscriber identity uiThe identified first user, after making a first event in the first event sample, then makes a second event sample xi+1A second event in (1); in addition, a second event sample xi+1May or may not be included in the sequence of event samples described above.
In one embodiment, this step may include: determining, from the plurality of intensity functions corresponding to the plurality of event types, the intensity function corresponding to the event type of the second event sample x_{i+1}, denoted λ_{c_{i+1}}(t); and updating the network parameters in the event prediction system based on this intensity function λ_{c_{i+1}}(t) and the occurrence time t_{i+1} corresponding to the second event sample x_{i+1}. In a particular embodiment, the training loss may be determined based on the intensity function λ_{c_{i+1}}(t) and the occurrence time t_{i+1}, and the network parameters updated using the training loss. In another particular embodiment, the training loss may be determined based on the intensity function λ_{c_{i+1}}(t), the occurrence time t_{i+1}, and the intensity functions corresponding to the other K-1 event types, and the network parameters updated using the training loss. Further, in one example, the training loss is calculated based on a negative log-likelihood function, as shown in equation (13) below.
L_nll = −log λ_{c_{i+1}}(t_{i+1}) + ∫_{t_i}^{t_{i+1}} Σ_{k=1}^{K} λ_k(τ) dτ (13)

In the formula (13), λ_k denotes the intensity function under the k-th event type, and k represents the k-th event type of the K event types. Thus, the network parameters in the event prediction system can be updated with the goal of reducing L_nll.
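The negative log-likelihood loss for one labeled event can be sketched as follows, assuming the standard temporal point-process form (log-intensity of the observed event minus the survival integral of the total intensity over the inter-event interval); the function names and the Monte Carlo approximation of the integral are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def nll_loss(intensities, label_type, t_prev, t_next, n_mc=100):
    """Negative log-likelihood of the next event under K intensity functions.

    intensities: list of K callables, intensities[k](t) -> positive float
    label_type: index of the observed (label) event type
    t_prev, t_next: previous and observed occurrence times
    The integral term (no event of any type occurring in (t_prev, t_next])
    is approximated by averaging the intensities over sampled time points.
    """
    taus = np.linspace(t_prev, t_next, n_mc)
    total = sum(np.mean([lam(t) for t in taus]) for lam in intensities)
    integral = total * (t_next - t_prev)
    return -np.log(intensities[label_type](t_next)) + integral

# constant intensities: closed-form NLL = -log(lam_k) + sum(lams) * dt
lams = [lambda t, v=v: v for v in (0.5, 1.5)]
loss = nll_loss(lams, 0, 0.0, 1.0)
```

For constant intensities the Monte Carlo estimate matches the closed-form value exactly, which makes the sketch easy to check by hand.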
In one embodiment, propagation between event types is modeled through a graph regularization (Graph Regularization) process, thereby further improving the training effect of the event prediction system described above. Specifically, as shown in FIG. 4, the event prediction system also includes an adjacency matrix prediction layer 105. Prior to this step, the method further comprises: determining, through the adjacency matrix prediction layer 105, a predicted adjacency matrix of a virtual event relationship network graph constructed based on the plurality of event types, according to the characterization vectors of the nodes in the user relationship network graph. Based on this, this step can comprise the following: on one hand, determining a first loss term based on the plurality of intensity functions and the second event sample x_{i+1}; on the other hand, acquiring a real event relationship network graph, wherein the real event relationship network graph comprises a plurality of type nodes corresponding to the plurality of event types and directed connection edges formed by causal relationships among the type nodes; determining a second loss term based on the true adjacency matrix of the real event relationship network graph and the predicted adjacency matrix; and updating the network parameters of the event prediction system based on the first loss term and the second loss term.
In a specific embodiment, the construction of the virtual event relationship network graph includes: designing K nodes corresponding to the K event types, and then establishing directed connection edges between every two nodes, correspondingly obtaining an edge set ε = {e_pq}_{K×K}, wherein e_pq indicates that node p is a parent of node q, i.e., an event of type p may cause an event of type q to occur, the causal relationship between them being that event type p is the cause and event type q is the effect. It should be noted that the adjacency matrix is used to record the connection relationships between nodes in a relational network graph; for example, if there is a directed edge pointing from node p to node q in the relational network graph, the element b_pq in the adjacency matrix B is 1, and otherwise b_pq is 0. Predicting the adjacency matrix means that the values of the matrix elements are obtained through prediction; correspondingly, a predicted value close to 0 indicates that the connection strength of the corresponding directed edge is low, which is equivalent to absence, while a larger predicted value indicates that the connection strength of the corresponding directed edge is higher. Alternatively, the predicted values of the matrix elements may be regarded as the connection weights of the corresponding connection edges.
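The binary adjacency matrix described above can be sketched in a few lines; the function name and the edge-list encoding are illustrative.

```python
import numpy as np

def build_adjacency(K, edges):
    """Binary adjacency matrix B for a directed graph over K type nodes.

    edges: iterable of (p, q) pairs meaning "node p is a parent of node q"
    (an event of type p may cause an event of type q). b_pq is 1 when the
    directed edge p -> q exists, otherwise 0.
    """
    B = np.zeros((K, K), dtype=int)
    for p, q in edges:
        B[p, q] = 1
    return B

B = build_adjacency(3, [(0, 1), (1, 2)])
```

Note the asymmetry: B[0, 1] = 1 does not imply B[1, 0] = 1, since causal edges are directed.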
In a specific embodiment, the determination of the predicted adjacency matrix may include: determining a type characterization vector of each event type in the plurality of event types based on the characterization vectors of the nodes in the user relationship network graph, so as to form a type characterization matrix H; and then determining the predicted adjacency matrix A based on the type characterization matrix H and the learning parameter matrix in the adjacency matrix prediction layer 105.
In a more specific embodiment, for the determination of the type characterization vector of each event type: in one example, after the characterization vector of the first user node is updated through the graph propagation network 102 in step S222 above to an updated vector (denoted h_i here), the type characterization vector corresponding to the event type c_i in the event sample x_i may be updated to h_i. In another example, after the characterization vector of the first user node is updated to h_i through the graph propagation network 102 in step S222 above, the current type characterization vector corresponding to the event type c_i in the event sample x_i may be updated to the average of itself and h_i. In this way, by sequentially using the event samples in the event sequence as the first event sample, a plurality of type characterization vectors corresponding to the plurality of event types can be obtained.
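The two type-vector update schemes just described (overwrite, or average with the current value) can be sketched as follows; the function and argument names are illustrative.

```python
import numpy as np

def update_type_vector(type_vecs, event_type, node_vec, average=True):
    """Update the characterization vector of one event type in place.

    type_vecs: (K, d) matrix of current type characterization vectors
    event_type: index c_i of the event type in the current sample
    node_vec: updated characterization vector of the first user node
    With average=True the type vector becomes the mean of its current
    value and the node vector; otherwise it is simply overwritten.
    """
    if average:
        type_vecs[event_type] = (type_vecs[event_type] + node_vec) / 2.0
    else:
        type_vecs[event_type] = node_vec
    return type_vecs

H = np.zeros((2, 3))
H = update_type_vector(H, 0, np.ones(3), average=True)
```

Iterating this over the event sequence accumulates one row per event type, which later forms the type characterization matrix H.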
In a more specific embodiment, the predicted adjacency matrix A may be calculated by the following equation (14):

A = HΩH^T (14)

In the formula (14), H denotes the above-described type characterization matrix, Ω denotes the learning parameter matrix in the adjacency matrix prediction layer 105, and T denotes the transposition operation of a matrix.
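Equation (14) is a single bilinear form; a minimal sketch, with a toy identity-based H and Ω standing in for learned values:

```python
import numpy as np

def predicted_adjacency(H, Omega):
    """Predicted adjacency matrix A = H @ Omega @ H.T, as in equation (14).

    H: (K, d) type characterization matrix (one row per event type)
    Omega: (d, d) learnable parameter matrix of the prediction layer
    Entry A[p, q] scores the connection strength of the edge p -> q.
    """
    return H @ Omega @ H.T

K, d = 4, 3
H = np.eye(K, d)                  # toy type characterizations
A = predicted_adjacency(H, np.eye(d))
```

With a non-symmetric learned Ω the score for p → q can differ from q → p, which is what lets the bilinear form express directed causal edges.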
In a specific embodiment, obtaining the real event relationship network graph may include: obtaining a plurality of user event sequences, wherein each user event sequence comprises a plurality of events made by a corresponding user and arranged in time order, and a causal relationship exists between the event types corresponding to any two adjacent events; and constructing the real event relationship network graph based on the plurality of user event sequences. It should be understood that the true adjacency matrix records the connection relationships between nodes in the real event relationship network graph. In a more specific embodiment, the true adjacency matrix also records the weights of the directed connection edges, and each weight is determined based on the statistical count of the corresponding causal relationship. In one example, the number of propagations between any two event types can be counted according to the user event sequences, and then the connection edge weight between the nodes corresponding to any two event types is calculated through the following formula:

e_pq = N_pq / N_max (15)

In the formula (15), N_pq represents the statistical number of times events of type p propagate to events of type q, N_max represents the maximum among all the statistical counts, and e_pq represents the weight of the connecting edge pointing from node p to node q.
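The counting scheme of formula (15) can be sketched directly from type sequences; treating each adjacent pair in a user's sequence as one propagation is the interpretation assumed here.

```python
import numpy as np
from collections import Counter

def edge_weights(sequences, K):
    """Edge weights e_pq = N_pq / N_max from user event-type sequences.

    sequences: list of per-user event-type sequences (lists of type indices);
    each adjacent pair (p, q) counts as one propagation from type p to q.
    N_max is the maximum count over all ordered type pairs, so the
    resulting weights lie in [0, 1].
    """
    counts = Counter()
    for seq in sequences:
        for p, q in zip(seq, seq[1:]):
            counts[(p, q)] += 1
    E = np.zeros((K, K))
    n_max = max(counts.values())
    for (p, q), n in counts.items():
        E[p, q] = n / n_max
    return E

# two users: 0->1 observed three times, 1->0 once, so N_max = 3
E = edge_weights([[0, 1, 0, 1], [0, 1]], K=2)
```

Normalizing by N_max rather than by row sums keeps the most frequent causal transition at weight 1 while preserving relative frequencies elsewhere.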
In a specific embodiment, determining the second loss term based on the true adjacency matrix and the predicted adjacency matrix may be implemented as: determining the second loss term L_graph by using the KL divergence (KL-divergence) or another measure capable of quantifying the distance between matrices.
It should be noted that, for the determination of the first loss term, reference may be made to the relevant description in the foregoing embodiments. After the first loss term and the second loss term are determined, the network parameters of the event prediction system are updated with the goal of reducing the combined loss of the first loss term and the second loss term. In one example, the combined loss is calculated as:

min L_nll + γL_graph (16)

In the formula (16), the first loss term is implemented as L_nll, whose meaning can be found in equation (13); L_graph represents the second loss term; γ denotes the weighting coefficient of L_graph, which is a hyperparameter and may be set, for example, to 0.02.
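The combined loss of equation (16) can be sketched as follows. The patent leaves the KL-based distance unspecified, so flattening both matrices and normalizing them into distributions is one illustrative choice; the function names and the eps smoothing are assumptions.

```python
import numpy as np

def kl_graph_loss(B_true, A_pred, eps=1e-8):
    """Graph-regularization term: KL divergence between the normalized
    true adjacency matrix and the normalized predicted adjacency matrix.
    Both matrices are flattened and smoothed into probability
    distributions; this is one way to quantify their distance.
    """
    p = B_true.flatten() + eps
    q = np.abs(A_pred).flatten() + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def combined_loss(l_nll, B_true, A_pred, gamma=0.02):
    # equation (16): combined loss = L_nll + gamma * L_graph
    return l_nll + gamma * kl_graph_loss(B_true, A_pred)

B = np.array([[0.0, 1.0], [0.0, 0.0]])
```

When the predicted adjacency matrix matches the true one, the second term vanishes and the combined loss reduces to L_nll alone.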
In the above, by introducing graph regularization processing, more effective training of the event prediction system can be realized.
In summary, the embodiments of the present specification innovatively provide an event prediction system framework built on neural networks. In the training process of the event prediction system, a deep neural network is used to perform feature extraction on event samples, the parameters of the intensity function are fitted in a hidden space (also called latent space), and the intensity function is mapped back to the event type space, so that intensity functions corresponding to the various event types are obtained; the update of the event prediction system is then realized in combination with label samples. Furthermore, graph regularization processing can be introduced into the updating process, so that a better training effect is obtained. Thus, through repeated iterative training, a trained event prediction system can be obtained, so that the intensity function of a target event sequence of a target user can be modeled, and accurate prediction of the next event after the target event sequence can be realized by adopting the resulting more accurate and more flexible intensity function.
Corresponding to the above updating method, the embodiment of the present specification further discloses an updating apparatus. FIG. 5 illustrates an update apparatus architecture diagram of an event prediction system, according to one embodiment. As shown in fig. 5, the illustrated apparatus 500 includes:
The sequence acquiring unit 510 is configured to sequentially acquire event samples, as first event samples, from an event sample sequence formed by arranging in time order, where the sample attributes include a first occurrence time and a first user identifier.

The event processing unit 520 is configured to input the first event sample into an event prediction system for event processing, where the event prediction system includes a sequence coding network, a graph propagation network, an intensity fitting network, and an intensity mapping network. The event processing unit 520 includes the following modules:

The encoding module 521 is configured to determine, through the sequence coding network, a sequence coding vector of a subsequence up to the first occurrence time, where each event sample in the subsequence corresponds to the first user identifier.

The graph propagation module 522 is configured to update node characterization vectors of the first user node and its neighbor nodes in the user relationship network graph according to the sequence coding vector, through the graph propagation network.

The intensity fitting module 523 is configured to determine, through the intensity fitting network, parameter values in an event occurrence intensity function corresponding to the first user identifier, according to the updated first node characterization vector of the first user node.

The intensity mapping module 524 is configured to map the event occurrence intensity function to an event type space through the intensity mapping network, to obtain a plurality of intensity functions of the first user identifier under a plurality of event types.

The parameter updating unit 530 is configured to update network parameters in the event prediction system based on the plurality of intensity functions and a second event sample corresponding to the first user identifier, where the second occurrence time corresponding to the second event sample is later than the first occurrence time.
In one embodiment, the sequence encoding network includes a linear embedding sub-network and a timing sub-network; the encoding module 521 is specifically configured to: determining type embedding vectors of event types corresponding to the event samples in the subsequence through the linear embedding sub-network; and outputting the sequence coding vector based on the sequentially input type embedding vector corresponding to each event sample through the time sequence sub-network.
In one embodiment, the graph propagation module 522 is specifically configured to: taking the first user node as a target node, and executing an updating operation aiming at the target node; wherein the update operation comprises: and determining the updated representation vector of the target node according to the target sequence coding vector corresponding to the target node, the current representation vector of the target neighbor node of the target node and the current representation vector of the target node.
In a specific embodiment, the graph propagation module 522 is further configured to: and taking the neighbor node of the first user node as a target node, and executing the updating operation.
In a particular embodiment, the graph propagation network includes a local propagation layer, a self-propagation layer, an exogenous propagation layer, and a fusion layer; wherein the update operation specifically includes: processing the current characterization vector of the target neighbor node through the local propagation layer to obtain a local propagation vector; performing linear transformation on the current characterization vector of the target node by using a first parameter matrix through the self-propagation layer to obtain a self-propagation vector; performing linear transformation on the target sequence coding vector by using a second parameter matrix through the exogenous propagation layer to obtain an exogenous propagation vector; and performing fusion processing on the local propagation vector, the self-propagation vector and the exogenous propagation vector through the fusion layer to obtain the updated representation vector of the target node.
In a more specific embodiment, the target neighbor node is a plurality of first order neighbor nodes of the target node; the graph propagation module 522 obtains a local propagation vector by performing the update operation, and specifically includes: and carrying out weighted summation on the current characterization vectors of the first-order neighbor nodes by using the weight parameter vector in the local propagation layer to obtain the local propagation vector.
In another more particular embodiment, the target neighbor node includes a plurality of first order neighbor nodes and a plurality of second order neighbor nodes of the target node; the graph propagation module 522 obtains a local propagation vector by performing the update operation, and specifically includes: aiming at each first-order neighbor node, determining a plurality of attention weights corresponding to a plurality of current characterization vectors of a plurality of second-order neighbor nodes by using the current characterization vector of the first-order neighbor node; weighting and summing the current characterization vectors by using the attention weights to obtain a neighbor aggregation vector of the first-order neighbor node; and carrying out weighted summation on a plurality of neighbor aggregation vectors of the first-order neighbor nodes by using the weight parameter vector in the local propagation layer to obtain the local propagation vector.
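The two-hop aggregation described by this module can be sketched as follows. The patent does not fix the attention computation, so the dot-product scoring and softmax normalization below are assumptions, as are all names.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def local_propagation(first_order, second_order, weights):
    """Two-hop local propagation sketch with dot-product attention.

    first_order: (n1, d) current vectors of the first-order neighbors
    second_order: list of n1 arrays, each (n2_i, d), the second-order
        neighbors attached to the corresponding first-order neighbor
    weights: (n1,) weight parameter vector of the local propagation layer
    For each first-order neighbor, attention weights over its second-order
    neighbors are computed from dot products with its own vector, giving a
    neighbor aggregation vector; these are then weight-summed.
    """
    aggs = []
    for h1, H2 in zip(first_order, second_order):
        att = softmax(H2 @ h1)        # attention over second-order neighbors
        aggs.append(att @ H2)         # neighbor aggregation vector
    return weights @ np.stack(aggs)   # weighted sum -> local propagation vector

rng = np.random.default_rng(1)
d, n1 = 4, 3
fo = rng.normal(size=(n1, d))
so = [rng.normal(size=(2, d)) for _ in range(n1)]
v = local_propagation(fo, so, np.ones(n1) / n1)
```

The outer weighted sum uses the learned weight parameter vector of the local propagation layer, while the inner attention is recomputed per first-order neighbor, matching the two-level structure described above.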
In one embodiment, the event occurrence intensity function includes a reference intensity, and the intensity fitting network includes a reference intensity determination layer; wherein the intensity fitting module 523 is specifically configured to: and performing linear transformation processing and activation processing on the first node characterization vector through the reference strength determination layer to obtain the reference strength.
In a specific embodiment, the event occurrence intensity function further includes historical stimulation coefficients and time attenuation coefficients, and the intensity fitting network further includes a stimulation coefficient determination layer and an attenuation coefficient determination layer; the intensity fitting module 523 is further configured to: determine, through the stimulation coefficient determination layer, a plurality of attention weights for a plurality of historical characterization vectors of the first user node according to the first node characterization vector, the historical characterization vectors being obtained based on other event samples in the subsequence; for each other event sample, determine the product of the corresponding attention weight and historical characterization vector as the corresponding historical stimulation coefficient; and, through the attenuation coefficient determination layer, respectively fuse the first node characterization vector with the plurality of historical characterization vectors to obtain a plurality of fusion vectors, and sequentially perform linear transformation and activation processing on each fusion vector to obtain the corresponding time attenuation coefficients.
In an embodiment, the parameter updating unit 530 is specifically configured to: determining an intensity function corresponding to the same event type as the second event sample from the plurality of intensity functions; and updating the network parameters based on the intensity function and the occurrence time corresponding to the second event sample.
In one embodiment, the event prediction system further comprises an adjacency matrix prediction layer; the apparatus 500 further comprises: an adjacency matrix prediction unit 540, configured to determine, through the adjacency matrix prediction layer, a predicted adjacency matrix of a virtual event relationship network graph constructed based on the plurality of event types, according to the characterization vectors of the nodes in the user relationship network graph. The parameter updating unit 530 is specifically configured to: determine a first loss term based on the plurality of intensity functions and the second event sample; acquire a real event relationship network graph, wherein the real event relationship network graph comprises a plurality of type nodes corresponding to the plurality of event types and directed connection edges formed by causal relationships among the type nodes; determine a second loss term based on the true adjacency matrix of the real event relationship network graph and the predicted adjacency matrix; and update the network parameters based on the first loss term and the second loss term.
In a specific embodiment, the updating unit 530 is configured to obtain a real event relationship network graph, including: acquiring a plurality of user event sequences, wherein each user event sequence comprises a plurality of events which are made by a corresponding user and are arranged according to a time sequence, and the causal relationship exists between event types corresponding to any two adjacent events; and constructing the real event relation network graph based on the plurality of user event sequences.
In a more specific embodiment, the truth adjacency matrix includes a weight of the directed connecting edge, which is determined based on a statistical number of the causal relationship.
On the other hand, in a specific embodiment, the adjacency matrix prediction unit 540 is specifically configured to: determine a type characterization vector of each event type in the plurality of event types based on the characterization vectors of the nodes, so as to form a type characterization matrix; and determine the predicted adjacency matrix based on the type characterization matrix and the learning parameter matrix in the adjacency matrix prediction layer.
In a more specific embodiment, the apparatus further comprises: the vector updating unit 550 is configured to update the type characterization vector corresponding to the event type of the first event sample to the first node characterization vector.
In summary, the embodiments of the present specification innovatively provide an event prediction system framework built on neural networks. In the training process of the event prediction system, a deep neural network is used to perform feature extraction on event samples, the parameters of the intensity function are fitted in a hidden space (also called latent space), and the intensity function is mapped back to the event type space, so that intensity functions corresponding to the various event types are obtained; the update of the event prediction system is then realized in combination with label samples. Furthermore, graph regularization processing can be introduced into the updating process, so that a better training effect is obtained. Thus, through repeated iterative training, a trained event prediction system can be obtained, so that the intensity function of a target event sequence of a target user can be modeled, and accurate prediction of the next event after the target event sequence can be realized by adopting the resulting more accurate and more flexible intensity function.
According to an embodiment of a further aspect, the present specification further discloses an event prediction system. The event prediction system comprises: an input layer, configured to sequentially acquire event samples, as first event samples, from an event sample sequence formed by arranging in time order, where the sample attributes of the first event sample include a first occurrence time and a first user identifier; a sequence coding network, configured to determine a sequence coding vector of a subsequence up to the first occurrence time, where each event sample in the subsequence corresponds to the first user identifier; a graph propagation network, configured to update the node characterization vectors of the first user node and its neighbor nodes in the user relationship network graph according to the sequence coding vector; an intensity fitting network, configured to determine parameter values in an event occurrence intensity function corresponding to the first user identifier according to the updated first node characterization vector of the first user node; an intensity mapping network, configured to map the event occurrence intensity function to an event type space, to obtain a plurality of intensity functions of the first user identifier under a plurality of event types; and an output layer, configured to output an event prediction result corresponding to the first user identifier based on the plurality of intensity functions, where the event prediction result includes a predicted event type and a predicted occurrence time. It should be noted that, for the description of the event prediction system, reference can be made to the related description in the foregoing embodiments.
In addition, in the case where the event prediction system includes the adjacency matrix prediction layer 105 during training, the trained adjacency matrix prediction layer 105 can be removed from the event prediction system when the trained system is used, and the modeling of the intensity function and the prediction of future events can be realized using the remaining network part.
According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 2.
According to an embodiment of yet another aspect, there is also provided a computing device comprising a memory having stored therein executable code, and a processor that, when executing the executable code, implements the method described in connection with fig. 2.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The above-mentioned embodiments, objects, technical solutions and advantages of the present invention are further described in detail, it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made on the basis of the technical solutions of the present invention should be included in the scope of the present invention.

Claims (25)

1. An updating method of an event prediction system comprises the following steps:
the method comprises the steps that event samples are sequentially obtained from an event sample sequence formed by arranging according to a time sequence and serve as first event samples, and the sample attributes of the first event samples comprise first occurrence time and first user identification;
inputting the first event sample into an event prediction system for event processing, wherein the event prediction system comprises a sequence coding network, a graph propagation network, an intensity fitting network and an intensity mapping network; the event processing comprises the following steps:
determining a sequence coding vector of a subsequence up to the first occurrence moment through the sequence coding network, wherein each event sample in the subsequence corresponds to the first user identifier;
updating node characterization vectors of a first user node and neighbor nodes thereof in the user relationship network graph through the graph propagation network according to the sequence coding vectors;
determining parameter values in an event occurrence intensity function corresponding to the first user identification according to the updated first node characterization vector of the first user node through the intensity fitting network;
mapping the event occurrence intensity function to an event type space through the intensity mapping network to obtain a plurality of intensity functions of the first user identifier under a plurality of event types;
updating network parameters in the event prediction system based on the plurality of intensity functions and a second event sample corresponding to the first subscriber identity; and the second occurrence time corresponding to the second event sample is later than the first occurrence time.
2. The method of claim 1, wherein the sequence encoding network comprises a linear embedding sub-network and a timing sub-network; wherein determining the sequence-encoded vector of the sub-sequence up to the first occurrence time comprises:
determining type embedding vectors of event types corresponding to the event samples in the subsequence through the linear embedding sub-network;
and outputting the sequence coding vector based on the sequentially input type embedding vector corresponding to each event sample through the time sequence sub-network.
3. The method of claim 1, wherein updating the node characterization vectors for the first user node and its neighbor nodes in the user relationship network graph comprises:
taking the first user node as a target node, and executing an updating operation aiming at the target node;
wherein the update operation comprises: and determining the updated representation vector of the target node according to the target sequence coding vector corresponding to the target node, the current representation vector of the target neighbor node of the target node and the current representation vector of the target node.
4. The method of claim 3, wherein after performing the update operation with the first user node as the target node, the method further comprises:
and taking the neighbor node of the first user node as a target node, and executing the updating operation.
5. The method of claim 3 or 4, wherein the graph propagation network comprises a local propagation layer, a self propagation layer, an exogenous propagation layer, and a fusion layer; wherein the update operation specifically includes:
processing the current characterization vector of the target neighbor node through the local propagation layer to obtain a local propagation vector;
performing linear transformation on the current characterization vector of the target node by using a first parameter matrix through the self-propagation layer to obtain a self-propagation vector;
performing linear transformation on the target sequence coding vector by using a second parameter matrix through the exogenous propagation layer to obtain an exogenous propagation vector;
and performing fusion processing on the local propagation vector, the self-propagation vector and the exogenous propagation vector through the fusion layer to obtain the updated representation vector of the target node.
6. The method of claim 5, wherein the target neighbor node is a plurality of first order neighbor nodes of the target node; wherein, processing the current characterization vector of the target neighbor node to obtain a local propagation vector, includes:
and carrying out weighted summation on the current characterization vectors of the first-order neighbor nodes by using the weight parameter vector in the local propagation layer to obtain the local propagation vector.
7. The method of claim 5, wherein the target neighbor node comprises a plurality of first order neighbor nodes and a plurality of second order neighbor nodes of the target node; wherein, processing the current characterization vector of the target neighbor node to obtain a local propagation vector, includes:
aiming at each first-order neighbor node, determining a plurality of attention weights corresponding to a plurality of current characterization vectors of a plurality of second-order neighbor nodes by using the current characterization vector of the first-order neighbor node;
weighting and summing the current characterization vectors by using the attention weights to obtain a neighbor aggregation vector of the first-order neighbor node;
and carrying out weighted summation on a plurality of neighbor aggregation vectors of the first-order neighbor nodes by using the weight parameter vector in the local propagation layer to obtain the local propagation vector.
8. The method of claim 1, wherein the event occurrence intensity function includes a reference intensity, and the intensity fitting network includes a reference intensity determination layer; wherein determining parameter values in an event occurrence strength function corresponding to the first user comprises:
and performing linear transformation processing and activation processing on the first node characterization vector through the reference strength determination layer to obtain the reference strength.
9. The method according to claim 8, wherein the event occurrence intensity function further comprises historical stimulation coefficients and time attenuation coefficients, and the intensity fitting network further comprises a stimulation coefficient determination layer and an attenuation coefficient determination layer; wherein determining parameter values in the event occurrence intensity function corresponding to the first user identifier further comprises:
determining, through the stimulation coefficient determination layer and based on the first node characterization vector, a plurality of attention weights for a plurality of historical characterization vectors of the first user node, wherein the historical characterization vectors are derived based on a plurality of other event samples in the subsequence; and, for each other event sample, determining the product of the corresponding attention weight and the corresponding historical characterization vector as the corresponding historical stimulation coefficient;
and fusing, through the attenuation coefficient determination layer, the first node characterization vector with each of the plurality of historical characterization vectors to obtain a plurality of fusion vectors, and sequentially performing linear transformation and activation processing on each fusion vector to obtain the corresponding time attenuation coefficient.
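A reference intensity, per-event stimulation coefficients, and time attenuation coefficients together describe a Hawkes-style conditional intensity. A minimal sketch, with the exponential-decay kernel as an assumed concrete form and the stimulation coefficients treated as scalars for illustration:

```python
import numpy as np

def intensity(t, mu, alphas, deltas, times):
    """Conditional intensity at time t: reference intensity mu plus the
    excitation of each past event, decaying exponentially with its own
    time attenuation coefficient."""
    past = [(a, d, ti) for a, d, ti in zip(alphas, deltas, times) if ti < t]
    return mu + sum(a * np.exp(-d * (t - ti)) for a, d, ti in past)
```

Events after `t` contribute nothing, so the function is piecewise smooth and jumps upward at each event time.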
10. The method of claim 1, wherein updating network parameters in the event prediction system based on the plurality of intensity functions and a second event sample corresponding to the first user identifier comprises:
determining, from the plurality of intensity functions, the intensity function corresponding to the same event type as the second event sample;
and updating the network parameters based on that intensity function and the occurrence time corresponding to the second event sample.
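A standard training signal for this update step (an assumption here; the claim only requires using the intensity function and the occurrence time) is the temporal point process negative log-likelihood of the second event sample:

```python
import numpy as np

def nll_loss(intensities, event_type, t_prev, t_next, n_grid=200):
    """Point-process negative log-likelihood of the next observed event:
    -log lambda_k(t_next) for the matching event type k, plus the total
    intensity integrated over (t_prev, t_next], approximated here by the
    trapezoidal rule on a fixed grid."""
    lam_k = intensities[event_type](t_next)
    ts = np.linspace(t_prev, t_next, n_grid)
    total = np.array([sum(lam(t) for lam in intensities) for t in ts])
    integral = np.sum((total[:-1] + total[1:]) / 2.0 * np.diff(ts))
    return -np.log(lam_k) + integral
```

Minimizing this loss in `t_next` and `event_type` pushes the matching intensity up at the observed time while penalizing total intensity everywhere else in the interval.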
11. The method of claim 1, wherein the event prediction system further comprises an adjacency matrix prediction layer; wherein, prior to updating the network parameters in the event prediction system, the method further comprises:
determining, through the adjacency matrix prediction layer and according to the characterization vectors of the nodes in the user relationship network graph, a predicted adjacency matrix of a virtual event relationship network graph constructed based on the plurality of event types;
wherein updating the network parameters in the event prediction system comprises:
determining a first loss term based on the plurality of intensity functions and the second event sample;
acquiring a real event relationship network graph, wherein the real event relationship network graph comprises a plurality of type nodes corresponding to the plurality of event types and directed connection edges formed by causal relationships between the type nodes;
determining a second loss term based on a true adjacency matrix of the real event relationship network graph and the predicted adjacency matrix;
and updating the network parameters based on the first loss term and the second loss term.
12. The method of claim 11, wherein acquiring the real event relationship network graph comprises:
acquiring a plurality of user event sequences, wherein each user event sequence comprises a plurality of events performed by a corresponding user and arranged in chronological order, and a causal relationship exists between the event types corresponding to any two adjacent events;
and constructing the real event relationship network graph based on the plurality of user event sequences.
13. The method according to claim 11 or 12, wherein the true adjacency matrix includes weights of the directed connection edges, each weight being determined based on a statistical count of the corresponding causal relationship.
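Claims 12 and 13 together amount to counting adjacent event-type transitions across user sequences. A minimal sketch, where using the raw transition counts as edge weights is an assumption consistent with claim 13:

```python
from collections import Counter
import numpy as np

def build_true_adjacency(user_sequences, num_types):
    """Treat each pair of adjacent events in a user's chronologically
    ordered sequence as a directed edge between their event types;
    the edge weight is the number of observed transitions."""
    counts = Counter()
    for seq in user_sequences:            # seq: event-type ids in time order
        for src, dst in zip(seq, seq[1:]):
            counts[(src, dst)] += 1
    adj = np.zeros((num_types, num_types))
    for (src, dst), c in counts.items():
        adj[src, dst] = c
    return adj
```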
14. The method of claim 11, wherein determining the predicted adjacency matrix of the virtual event relationship network graph constructed based on the plurality of event types according to the characterization vectors of the nodes in the user relationship network graph comprises:
determining a type characterization vector of each of the plurality of event types based on the characterization vectors of the nodes, so as to form a type characterization matrix;
and determining the predicted adjacency matrix based on the type characterization matrix and a learnable parameter matrix in the adjacency matrix prediction layer.
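One concrete way to combine a type characterization matrix with a learnable parameter matrix is a bilinear score with a sigmoid; this particular form is an assumption, as the claim only requires that the two matrices jointly determine the predicted adjacency matrix:

```python
import numpy as np

def predict_adjacency(type_matrix, w):
    """Predicted adjacency A = sigmoid(E @ W @ E.T), where E stacks the
    type characterization vectors and W is the learnable parameter
    matrix of the adjacency matrix prediction layer."""
    scores = type_matrix @ w @ type_matrix.T
    return 1.0 / (1.0 + np.exp(-scores))
```

Each entry then lies in (0, 1) and can be compared against the (suitably normalized) true adjacency matrix to form the second loss term.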
15. The method of claim 14, wherein prior to updating the network parameters in the event prediction system, the method further comprises:
updating the type characterization vector corresponding to the event type of the first event sample to the first node characterization vector.
16. An updating apparatus for an event prediction system, comprising:
an event sample acquisition unit configured to sequentially acquire event samples, as first event samples, from an event sample sequence arranged in chronological order, wherein the sample attributes of each event sample comprise a first occurrence time and a first user identifier;
an event processing unit configured to input the first event sample into an event prediction system for event processing, wherein the event prediction system comprises a sequence coding network, a graph propagation network, an intensity fitting network and an intensity mapping network; the event processing unit comprises the following modules:
an encoding module configured to determine, through the sequence coding network, a sequence coding vector of a subsequence up to the first occurrence time, wherein each event sample in the subsequence corresponds to the first user identifier;
a graph propagation module configured to update, through the graph propagation network and according to the sequence coding vector, the node characterization vectors of the first user node and its neighbor nodes in the user relationship network graph;
an intensity fitting module configured to determine, through the intensity fitting network and according to the updated first node characterization vector of the first user node, parameter values in an event occurrence intensity function corresponding to the first user identifier;
an intensity mapping module configured to map, through the intensity mapping network, the event occurrence intensity function to an event type space to obtain a plurality of intensity functions of the first user identifier under a plurality of event types;
and a parameter updating unit configured to update network parameters in the event prediction system based on the plurality of intensity functions and a second event sample corresponding to the first user identifier, wherein a second occurrence time corresponding to the second event sample is later than the first occurrence time.
17. The apparatus of claim 16, wherein the graph propagation module is specifically configured to:
take the first user node as a target node, and perform an update operation for the target node;
wherein the update operation comprises: determining an updated characterization vector of the target node according to a target sequence coding vector corresponding to the target node, current characterization vectors of target neighbor nodes of the target node, and the current characterization vector of the target node.
18. The apparatus of claim 17, wherein the graph propagation network comprises a local propagation layer, a self-propagation layer, an exogenous propagation layer, and a fusion layer; wherein the update operation performed by the graph propagation module specifically comprises:
processing the current characterization vectors of the target neighbor nodes through the local propagation layer to obtain a local propagation vector;
performing linear transformation on the current characterization vector of the target node using a first parameter matrix through the self-propagation layer to obtain a self-propagation vector;
performing linear transformation on the target sequence coding vector using a second parameter matrix through the exogenous propagation layer to obtain an exogenous propagation vector;
and fusing the local propagation vector, the self-propagation vector and the exogenous propagation vector through the fusion layer to obtain the updated characterization vector of the target node.
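The three propagation paths and the fusion layer can be sketched as follows. Element-wise summation followed by a tanh activation is an assumed fusion rule; the claim only specifies that the three vectors are fused:

```python
import numpy as np

def update_node(h_node, h_local, s_seq, w_self, w_exo):
    """Fuse the local, self and exogenous propagation vectors into the
    updated characterization vector of the target node."""
    self_vec = w_self @ h_node   # self-propagation: first parameter matrix
    exo_vec = w_exo @ s_seq      # exogenous propagation: second parameter matrix
    return np.tanh(h_local + self_vec + exo_vec)
```

`h_local` is the output of the local propagation layer, and `s_seq` is the target sequence coding vector; the two parameter matrices project both inputs into the node-characterization dimension before fusion.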
19. The apparatus of claim 18, wherein the target neighbor nodes comprise a plurality of first-order neighbor nodes and a plurality of second-order neighbor nodes of the target node; wherein the graph propagation module obtains the local propagation vector in the update operation by:
for each first-order neighbor node, determining, using the current characterization vector of the first-order neighbor node, a plurality of attention weights corresponding to a plurality of current characterization vectors of a plurality of second-order neighbor nodes;
performing weighted summation on the plurality of current characterization vectors using the plurality of attention weights to obtain a neighbor aggregation vector of the first-order neighbor node;
and performing weighted summation on the plurality of neighbor aggregation vectors of the plurality of first-order neighbor nodes using a weight parameter vector in the local propagation layer to obtain the local propagation vector.
20. The apparatus of claim 16, wherein the event occurrence intensity function includes a reference intensity, and the intensity fitting network includes a reference intensity determination layer; wherein the intensity fitting module is specifically configured to:
perform linear transformation processing and activation processing on the first node characterization vector through the reference intensity determination layer to obtain the reference intensity.
21. The apparatus according to claim 20, wherein the event occurrence intensity function further comprises historical stimulation coefficients and time attenuation coefficients, and the intensity fitting network further comprises a stimulation coefficient determination layer and an attenuation coefficient determination layer; wherein the intensity fitting module is further configured to:
determine, through the stimulation coefficient determination layer and based on the first node characterization vector, a plurality of attention weights for a plurality of historical characterization vectors of the first user node, wherein the historical characterization vectors are derived based on a plurality of other event samples in the subsequence; and, for each other event sample, determine the product of the corresponding attention weight and the corresponding historical characterization vector as the corresponding historical stimulation coefficient;
and fuse, through the attenuation coefficient determination layer, the first node characterization vector with each of the plurality of historical characterization vectors to obtain a plurality of fusion vectors, and sequentially perform linear transformation and activation processing on each fusion vector to obtain the corresponding time attenuation coefficient.
22. The apparatus of claim 16, wherein the event prediction system further comprises an adjacency matrix prediction layer; the apparatus further comprises an adjacency matrix prediction unit configured to:
determine, through the adjacency matrix prediction layer and according to the characterization vectors of the nodes in the user relationship network graph, a predicted adjacency matrix of a virtual event relationship network graph constructed based on the plurality of event types;
wherein the parameter updating unit is specifically configured to:
determine a first loss term based on the plurality of intensity functions and the second event sample;
acquire a real event relationship network graph, wherein the real event relationship network graph comprises a plurality of type nodes corresponding to the plurality of event types and directed connection edges formed by causal relationships between the type nodes;
determine a second loss term based on a true adjacency matrix of the real event relationship network graph and the predicted adjacency matrix;
and update the network parameters based on the first loss term and the second loss term.
23. An event prediction system comprising:
an input layer for sequentially acquiring event samples, as first event samples, from an event sample sequence arranged in chronological order, wherein the sample attributes of each first event sample comprise a first occurrence time and a first user identifier;
a sequence coding network for determining a sequence coding vector of a subsequence up to the first occurrence time, wherein each event sample in the subsequence corresponds to the first user identifier;
a graph propagation network for updating, according to the sequence coding vector, the node characterization vectors of the first user node and its neighbor nodes in the user relationship network graph;
an intensity fitting network for determining, according to the updated first node characterization vector of the first user node, parameter values in an event occurrence intensity function corresponding to the first user identifier;
an intensity mapping network for mapping the event occurrence intensity function to an event type space to obtain a plurality of intensity functions of the first user identifier under a plurality of event types;
and an output layer for outputting an event prediction result corresponding to the first user identifier based on the plurality of intensity functions, wherein the event prediction result comprises a predicted event type and a predicted occurrence time.
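One way an output layer can read a predicted event type and occurrence time out of a set of per-type intensity functions is discretized simulation of the point process; this readout is an assumption, since the claim does not fix how the output layer uses the intensities:

```python
import numpy as np

def predict_next_event(intensities, t_now, horizon, step=0.01, seed=0):
    """Step forward in time; in each small interval, an event occurs with
    probability 1 - exp(-total_intensity * step), and its type is drawn
    in proportion to the per-type intensities at that time."""
    rng = np.random.default_rng(seed)
    t = t_now
    while t < horizon:
        t += step
        lams = np.array([lam(t) for lam in intensities])
        if rng.random() < 1.0 - np.exp(-lams.sum() * step):
            k = rng.choice(len(lams), p=lams / lams.sum())
            return int(k), t
    return None, horizon
```

Averaging the returned times over many seeds approximates the expected next occurrence time; alternatively, the most likely next type at a candidate time is simply the argmax over the per-type intensities.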
24. A computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of any one of claims 1-15.
25. A computing device comprising a memory and a processor, wherein the memory stores executable code, and the processor, when executing the executable code, implements the method of any one of claims 1-15.
CN202110631255.8A 2021-06-07 2021-06-07 Updating method and device of event prediction system Active CN113283589B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110631255.8A CN113283589B (en) 2021-06-07 2021-06-07 Updating method and device of event prediction system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110631255.8A CN113283589B (en) 2021-06-07 2021-06-07 Updating method and device of event prediction system

Publications (2)

Publication Number Publication Date
CN113283589A true CN113283589A (en) 2021-08-20
CN113283589B CN113283589B (en) 2022-07-19

Family

ID=77283515

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110631255.8A Active CN113283589B (en) 2021-06-07 2021-06-07 Updating method and device of event prediction system

Country Status (1)

Country Link
CN (1) CN113283589B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116821374A (en) * 2023-07-27 2023-09-29 中国人民解放军陆军工程大学 Event prediction method based on information

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107944610A (en) * 2017-11-17 2018-04-20 平安科技(深圳)有限公司 Predicted events measure of popularity, server and computer-readable recording medium
CN109659033A (en) * 2018-12-18 2019-04-19 浙江大学 A kind of chronic disease change of illness state event prediction device based on Recognition with Recurrent Neural Network
CN109961192A (en) * 2019-04-03 2019-07-02 南京中科九章信息技术有限公司 Object event prediction technique and device
EP3564889A1 (en) * 2018-05-04 2019-11-06 The Boston Consulting Group, Inc. Systems and methods for learning and predicting events
CN112183881A (en) * 2020-10-19 2021-01-05 中国人民解放军国防科技大学 Public opinion event prediction method and device based on social network and storage medium
CN112580789A (en) * 2021-02-22 2021-03-30 支付宝(杭州)信息技术有限公司 Training graph coding network, and method and device for predicting interaction event
CN112905801A (en) * 2021-02-08 2021-06-04 携程旅游信息技术(上海)有限公司 Event map-based travel prediction method, system, device and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
RAKSHIT TRIVEDI 等: "DYREP: LEARNING REPRESENTATIONS OVER DYNAMIC GRAPHS", 《ICLR2019》 *
WEICHANG WU 等: "Modeling Event Propagation via Graph Biased Temporal Point Process", 《ARXIV:1908.01623V2 [CS.SI]》 *

Also Published As

Publication number Publication date
CN113283589B (en) 2022-07-19

Similar Documents

Publication Publication Date Title
CN111814977B (en) Method and device for training event prediction model
CN107391542B (en) Open source software community expert recommendation method based on file knowledge graph
Yuan et al. Importance sampling algorithms for Bayesian networks: Principles and performance
CN111651671B (en) User object recommendation method, device, computer equipment and storage medium
CN111079931A (en) State space probabilistic multi-time-series prediction method based on graph neural network
CN111708876B (en) Method and device for generating information
CN112910710B (en) Network flow space-time prediction method and device, computer equipment and storage medium
CN112214499B (en) Graph data processing method and device, computer equipment and storage medium
CN112085615A (en) Method and device for training graph neural network
CN112529071B (en) Text classification method, system, computer equipment and storage medium
CN113407784A (en) Social network-based community dividing method, system and storage medium
US10878334B2 (en) Performing regression analysis on personal data records
CN111428866A (en) Incremental learning method and device, storage medium and electronic equipment
CN110335160B (en) Medical care migration behavior prediction method and system based on grouping and attention improvement Bi-GRU
CN113283589B (en) Updating method and device of event prediction system
CN113610610B (en) Session recommendation method and system based on graph neural network and comment similarity
Li et al. Dynamic multi-view group preference learning for group behavior prediction in social networks
CN113592593A (en) Training and application method, device, equipment and storage medium of sequence recommendation model
Cao et al. Implicit user relationships across sessions enhanced graph for session-based recommendation
CN111957053A (en) Game player matching method and device, storage medium and electronic equipment
CN115829110A (en) Method and device for predicting user behavior based on Markov logic network
CN113256024B (en) User behavior prediction method fusing group behaviors
CN111935259B (en) Method and device for determining target account set, storage medium and electronic equipment
CN108876031B (en) Software developer contribution value prediction method
CN110727705A (en) Information recommendation method and device, electronic equipment and computer-readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant