CN116596150A - Event prediction method of Transformer Hawkes process model based on multi-branch self-attention - Google Patents

Event prediction method of Transformer Hawkes process model based on multi-branch self-attention

Info

Publication number
CN116596150A
CN116596150A (Application No. CN202310616233.3A)
Authority
CN
China
Prior art keywords
event
attention
sequence
branch
self
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310616233.3A
Other languages
Chinese (zh)
Inventor
Gao Tengda
Wu Chunlei
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Petroleum East China
Original Assignee
China University of Petroleum East China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Petroleum East China filed Critical China University of Petroleum East China
Priority to CN202310616233.3A priority Critical patent/CN116596150A/en
Publication of CN116596150A publication Critical patent/CN116596150A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • Game Theory and Decision Science (AREA)
  • Development Economics (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Marketing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an event prediction method based on a multi-branch self-attention Transformer Hawkes process model. Previous methods have mainly applied recurrent neural networks or self-attention models directly. However, we note that conventional self-attention models ignore the differences in completeness of the dependency relationships learned from different perspectives, which biases the learning of the overall sequence information. Furthermore, Transformer-based models suffer from poor local perception, so local information is easily overlooked. The invention proposes, for the first time, a multi-branch self-attention Transformer Hawkes process model for learning the information of an event sequence. A multi-branch self-attention network is designed to mine more accurate sequence information, and a local perception enhancement module is proposed to address insensitivity to local context information. The effectiveness of the proposed model is demonstrated by extensive experiments on four real-world data sets, including Retweets, and two synthetic data sets.

Description

Event prediction method of Transformer Hawkes process model based on multi-branch self-attention
Technical Field
The invention relates to a method for predicting events and belongs to the technical field of deep learning and temporal point processes.
Background
In the information age, countless events worth recording occur every day, such as hospital visits, stock trades, earthquakes, and trending topics on social platforms. These data can be stored as event sequences, where each sequence records all events that occur at irregular intervals over a continuous period, and each event record stores the event type and the time of occurrence. Event sequences contain a wealth of regular patterns; by learning the laws governing event occurrence and analyzing the relationships between events, the time and type of future events can be predicted. How to extract useful regularities from event sequences is therefore the key research question. Point process models can model discrete event sequences in continuous time well, so they have attracted wide attention driven by the needs of an intelligent society, and the Hawkes process is one of the most widely used point process models.
The self-exciting process, also called the Hawkes process, assumes that every historical event exerts an excitatory, cumulative influence on future events, so it can simulate the dependency between events well and has been widely applied. However, the traditional Hawkes process model has unavoidable shortcomings: the relationships between different events in reality are uncertain, as events may excite each other, inhibit each other, or be unrelated, so the parametric assumptions of the model must be modified for each scenario, and the traditional method struggles with such cases. At the same time, the traditional Hawkes process model lacks nonlinear fitting capability, which limits its expressive power and makes it difficult to cope with very complex situations.
In recent years, with the development of machine learning, especially deep learning, many powerful network architectures have emerged in computer vision and natural language processing. The RNN is one of the most successful: designed for processing sequence data, it can naturally learn potential relationships between data points and has been widely used and improved. Nan Du et al. converted event sequence information into vectors fed into an RNN and learned a nonlinear intensity function from historical data, thereby modeling the sequence. Combining point process models with deep learning improves the expressive power on sequence data and allows the time and category of future events to be predicted more accurately, without requiring hand-crafted parametric assumptions in the model design.
However, RNNs suffer from inherent drawbacks: even when equipped with gating mechanisms for preserving long-term and short-term memory, RNN-based models still have difficulty capturing long-range dependencies in event sequences. In addition, recurrent neural networks are relatively hard to train, since the problems of vanishing and exploding gradients remain difficult to overcome, which limits their modeling performance.
In recent years, the Transformer architecture has achieved great success in natural language processing and has subsequently produced exciting results in speech signal processing, image classification, object detection, semantic segmentation, and other fields. The Transformer has excellent long-range dependency capture ability and can effectively solve the long-term dependency problem that RNNs struggle with. Qiang Zhang et al. first proposed modeling the Hawkes process with a self-attention mechanism; similarly, Simiao Zuo et al. proposed the Transformer Hawkes process model based on the Transformer encoder architecture, directly using a Transformer encoder to encode the historical representation of the sequence. However, these methods do not consider the completeness of the dependencies learned from the data sequence from different perspectives. We note that in existing Transformer Hawkes process models, each attention head focuses on information from a different representation subspace and learns the sequence information independently; the extracted hidden representations are all global information obtained from different perspectives, and they are simply combined by concatenation, ignoring the differences in completeness between the independent pieces of sequence information, which biases the learning of the overall sequence information. In addition, the traditional Transformer-based Hawkes process model has poor local perception, which also distorts the understanding of the sequence information.
Disclosure of Invention
The invention aims to solve the problem that event prediction methods based on the Transformer Hawkes process model rarely consider the relationships between pieces of information from different feature subspaces and simply concatenate all of them. Moreover, the existing Transformer-based Hawkes process models have poor local perception, so information learning deviations occur for event sequences with short time intervals.
The technical solution adopted to solve the above problems is as follows:
S1, constructing a multi-branch self-attention module, which extracts a more accurate historical representation of the sequence through differentiated fusion of multi-perspective information.
S2, constructing a local perception enhancement module, which further processes local context information through causal convolution, enhancing local perception to improve modeling performance.
S3, constructing a multi-branch self-attention Transformer sequence history representation encoding network architecture by combining the networks of S1 and S2.
S4, constructing an event prediction module, which predicts the occurrence of future events from the sequence history representation extracted in S3.
S5, constructing the multi-branch self-attention Transformer Hawkes process model framework by combining the networks of S3 and S4.
S6, training the multi-branch self-attention Transformer Hawkes process model.
The multi-branch self-attention module differentiates the different pieces of representation information obtained from multiple perspectives according to the input, so as to extract a more accurate hidden representation of the sequence. By considering the contribution of each piece of representation information to the final hidden representation of the sequence, corresponding weights are assigned to the outputs of the different representation subspaces, which reduces the bias in the overall understanding of the sequence. The detailed operation is described below:
Given a data set whose events fall into M categories, for any event sequence in the data set, each event type k_i is one-hot encoded (an M-dimensional vector whose k_i-th element is 1 and whose remaining elements are 0) to obtain k'_i, which is then multiplied by an embedding matrix to generate the event type code c_i:
C = W_e K'    #(1)
where W_e ∈ R^(D×M) is an embedding matrix, so each event category is represented by a D-dimensional code vector, with D = 128.
The time information is encoded with sine and cosine functions (Equation 2): for the time t_i we use p(t_i) as its encoding, where [p(t_i)]_j is the j-th element of the temporal code.
In this way we obtain the event type code C and the event time code P of the sequence, and the embedded code of the whole event sequence can be expressed as X = (C + P)^T. The embedded codes of all events are then input into the multi-branch self-attention Transformer sequence history representation encoding network.
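As an illustration, the sketch below builds the event embedding X = C + P from a type embedding and a sine/cosine temporal encoding, using PyTorch for concreteness. The exact sinusoidal form (Equation 2 is not reproduced in the converted text) follows the usual Transformer Hawkes process convention and is therefore an assumption, as are the class and parameter names.

```python
import torch
import torch.nn as nn

class EventEmbedding(nn.Module):
    """Minimal sketch of the event embedding X = C + P (Equations 1-2).
    The sinusoidal temporal encoding below is an assumed standard form;
    the patent only states that sine and cosine functions are used."""
    def __init__(self, num_types: int, d_model: int = 128):
        super().__init__()
        # Plays the role of W_e applied to the one-hot type vector k' (Eq. 1).
        self.type_emb = nn.Embedding(num_types, d_model)
        self.d_model = d_model

    def temporal_encoding(self, t: torch.Tensor) -> torch.Tensor:
        # t: (batch, seq_len) event timestamps -> (batch, seq_len, D) temporal code P.
        idx = torch.arange(self.d_model, device=t.device)
        denom = torch.pow(10000.0, (2 * (idx // 2)).float() / self.d_model)
        angles = t.unsqueeze(-1) / denom
        p = torch.zeros_like(angles)
        p[..., 0::2] = torch.sin(angles[..., 0::2])   # even dimensions: sine
        p[..., 1::2] = torch.cos(angles[..., 1::2])   # odd dimensions: cosine
        return p

    def forward(self, types: torch.Tensor, times: torch.Tensor) -> torch.Tensor:
        # types: (batch, seq_len) integer categories; times: (batch, seq_len) timestamps.
        return self.type_emb(types) + self.temporal_encoding(times)   # X = C + P
```

In this sketch X has shape (batch, sequence length, D), corresponding to the transposed form X = (C + P)^T used above; it is then fed into the encoder stack described next.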
Given the embedded code X of an event sequence, X is input into the multi-branch Transformer encoder layers with enhanced local attention. Each encoder layer contains H branches, and each branch has a self-attention module and a local attention enhancement network module. In each branch, X is projected by three transformation matrices to generate Q, K and V:
Q = X W_Q, K = X W_K, V = X W_V    #(3)
where W_Q, W_K ∈ R^(D×D_K) and W_V ∈ R^(D×D_V) are linear projection matrices, with D_K = D_V = 64.
To prevent the event at time t_i from attending to events at later times t_j (j > i), a mask is applied and the (i, j) entries with j > i are set to 0.
The output S_i of the attention head is computed from Q, K and V:
S_i = softmax(Q K^T / sqrt(D_K)) V    #(4)
where K^T denotes the matrix transpose of K and softmax() is the softmax function.
The outputs of the different attention heads are given different weights and aggregated through a transformation matrix:
S'_i = α_i S_i W_A    #(5)
where W_A ∈ R^(D_V×D) is an aggregation matrix and α_i is a learned parameter with Σ_{i=1}^{H} α_i = 1, so that the sequence information learned from different perspectives is given different importance. Residual connection and layer normalization then give the final output S'_i:
S'_i = S'_i + X    #(6)
S'_i = LayerNorm(S'_i)    #(7)
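A minimal sketch of one weighted self-attention branch (Equations 3-7) follows. Two details are assumptions made for the illustration: the causal mask is applied to the attention scores rather than to Q itself, and the branch weights α are normalized with a softmax so that they sum to one; the layer and parameter names are likewise illustrative.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedSelfAttention(nn.Module):
    """Sketch of one self-attention branch with a learned branch weight alpha."""
    def __init__(self, d_model: int = 128, d_k: int = 64, d_v: int = 64, n_branches: int = 4):
        super().__init__()
        self.w_q = nn.Linear(d_model, d_k, bias=False)     # W_Q
        self.w_k = nn.Linear(d_model, d_k, bias=False)     # W_K
        self.w_v = nn.Linear(d_model, d_v, bias=False)     # W_V
        self.w_a = nn.Linear(d_v, d_model, bias=False)      # aggregation matrix W_A
        self.alpha = nn.Parameter(torch.ones(n_branches))   # learned branch weights
        self.norm = nn.LayerNorm(d_model)
        self.d_k = d_k

    def forward(self, x: torch.Tensor, branch_idx: int) -> torch.Tensor:
        q, k, v = self.w_q(x), self.w_k(x), self.w_v(x)                  # Eq. 3
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_k)            # Q K^T / sqrt(D_K)
        seq_len = x.size(1)
        causal = torch.triu(torch.ones(seq_len, seq_len, device=x.device, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(causal, float('-inf'))                # no attention to future events
        s = F.softmax(scores, dim=-1) @ v                                  # Eq. 4
        alpha = F.softmax(self.alpha, dim=0)[branch_idx]                   # branch weights sum to 1
        s = alpha * self.w_a(s)                                            # Eq. 5
        return self.norm(s + x)                                            # Eqs. 6-7
```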
The local perception enhancement module further learns the local context information of the representation under a single perspective through causal convolution, improving the model's local perception of the representation and thus the accuracy of its overall understanding of the sequence. After obtaining the output S'_i of a single attention head, it is input into the local perception enhancement network:
F_A = S'_i W_1 + b_1    #(8)
F_B = ReLU(F_A)    #(9)
F_C = CCNN(F_B)    #(10)
F_D = F_C W_2 + b_2    #(11)
where W_1, W_2, b_1 and b_2 are neural network parameters, ReLU() is the ReLU function, and CCNN is a causal convolution layer with kernel size 3, stride 1 and padding 2. The closer an event is, the more critical its influence on future events; by applying causal convolution over adjacent events, the local perception enhancement mechanism of the invention lets the model learn a more reasonable hidden representation of each event.
The result F_D is then weighted by β_i to obtain the final output of the branch, where β_i is a learned parameter with Σ_{i=1}^{H} β_i = 1. After residual connection and normalization, the fused branch output B is input into the next encoder layer; after learning through multiple encoder layers, the hidden representation B of the event sequence is finally output.
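The local perception enhancement step (Equations 8-11 plus the β weighting) can be sketched as below. The causal convolution is realized by padding and then dropping the trailing outputs so that position i only sees positions up to i; the hidden width d_ff and the softmax normalization of β are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalPerceptionEnhancement(nn.Module):
    """Sketch of the local perception enhancement module of one branch."""
    def __init__(self, d_model: int = 128, d_ff: int = 256, n_branches: int = 4):
        super().__init__()
        self.fc1 = nn.Linear(d_model, d_ff)                   # W_1, b_1
        self.ccnn = nn.Conv1d(d_ff, d_ff, kernel_size=3, stride=1, padding=2)
        self.fc2 = nn.Linear(d_ff, d_model)                   # W_2, b_2
        self.beta = nn.Parameter(torch.ones(n_branches))      # learned branch weights
        self.norm = nn.LayerNorm(d_model)

    def forward(self, s: torch.Tensor, branch_idx: int) -> torch.Tensor:
        f = F.relu(self.fc1(s))                                # Eqs. 8-9
        # Eq. 10: convolve over the time axis and drop the last two outputs,
        # so each position only aggregates itself and the two preceding events.
        f = self.ccnn(f.transpose(1, 2))[..., :-2]
        f = self.fc2(f.transpose(1, 2))                        # Eq. 11
        beta = F.softmax(self.beta, dim=0)[branch_idx]         # branch weights sum to 1
        return self.norm(beta * f + s)                         # weighted output + residual + norm
```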
In the multi-branch self-attention Transformer sequence history representation encoding network architecture, each branch consists of a weighted self-attention module and a local perception enhancement module; fusing the differently weighted branches allows the model to learn more accurate sequence information.
The event prediction module predicts future events from the hidden representation of the input event sequence. The hidden representation is fed into separate prediction networks to obtain the predicted event type and event time. The detailed operation is described below:
Given the historical hidden representation h_i of an event (k_i, t_i), where h_i = B(i, :), the probability distribution of the next event type is obtained through a fully connected layer:
k̂_{i+1} = argmax(softmax(h_i W_type + b_type))
where W_type and b_type are the parameters of the fully connected layer for event type prediction and argmax() selects the most probable type of the next event k̂_{i+1}.
Similarly, the occurrence time of the next event is predicted by:
t̂_{i+1} = h_i W_time + b_time
where W_time and b_time are the parameters of the fully connected layer for event time prediction.
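A sketch of the two prediction heads described above: one fully connected layer producing a distribution over event types, and one producing a scalar time. The head and variable names are illustrative.

```python
import torch
import torch.nn as nn

class EventPredictor(nn.Module):
    """Sketch of the event prediction module: type head and time head over h_i."""
    def __init__(self, d_model: int = 128, num_types: int = 10):
        super().__init__()
        self.type_head = nn.Linear(d_model, num_types)   # W_type, b_type
        self.time_head = nn.Linear(d_model, 1)            # W_time, b_time

    def forward(self, h: torch.Tensor):
        type_logits = self.type_head(h)                                   # scores per event type
        next_type = type_logits.softmax(dim=-1).argmax(dim=-1)            # most probable next type
        next_time = self.time_head(h).squeeze(-1)                         # predicted occurrence time
        return next_type, next_time
```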
The event prediction method based on the multi-branch self-attention Transformer Hawkes process model thus comprises the multi-branch self-attention Transformer sequence history representation encoding network and the event prediction network module.
Finally, the multi-branch self-attention Transformer Hawkes process model is trained as follows:
The training loss of the model consists of the likelihood loss, the time loss and the type loss; the model is trained with the Adam optimizer, and the loss is formulated as follows:
where R is the number of sequences in the data set, and γ_type and γ_time are hyperparameters that help keep training stable. First, the time loss L_time and the type loss L_type are computed from the predicted and ground-truth event times and types, as sketched below.
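The formulas for these two losses are not reproduced in the converted text; the sketch below shows an assumed standard form (squared-error time loss, cross-entropy type loss, and a weighted combination with the negative log-likelihood). The specific loss forms, the weighting scheme and the function names are illustrative assumptions, not the patent's exact definitions.

```python
import torch
import torch.nn.functional as F

def training_loss(log_likelihood, pred_times, true_times, type_logits, true_types,
                  gamma_time: float = 0.01, gamma_type: float = 1.0):
    """Assumed combination of the three loss terms for one sequence:
    negative log-likelihood + gamma_type * type loss + gamma_time * time loss."""
    time_loss = F.mse_loss(pred_times, true_times, reduction='sum')        # assumed L_time
    type_loss = F.cross_entropy(type_logits.transpose(1, 2), true_types,   # assumed L_type
                                reduction='sum')
    return -log_likelihood + gamma_type * type_loss + gamma_time * time_loss
```

During training this loss would be summed over the R sequences in the data set and minimized with the Adam optimizer, as stated above.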
to maximize likelihood function, the likelihood loss is calculatedTaking a negative value, the calculation of the likelihood function comprises two parts, namely event log likelihood and non-event log likelihood, and the calculation mode is as follows:
event log likelihoodThe calculation mode of (2) is as follows:
In an event sequence, the relationship between the occurrence probability and the occurrence time of each type of event is determined by the corresponding conditional intensity function. We use the obtained hidden representation of the event sequence to compute the conditional intensity function λ(t|H_t), where H_t = {(t_j, k_j) : t_j < t} is the history of events before time t. We define a different conditional intensity function for each class of event: in a sequence with M event categories, for each event category k ∈ {1, 2, ..., M} there is a conditional intensity function λ_k(t|H_t):
where softplus(·) is the softplus function and β_k is its softness parameter.
The conditional intensity function of the whole sequence is the sum of the per-category intensities:
λ(t|H_t) = Σ_{k=1}^{M} λ_k(t|H_t)
log likelihood for non-eventsBecause of the existence of the softplus function, the integral is not calculated in a closed form, proper approximation is needed, and the methods for approximating the non-event log likelihood are Monte Carlo integral, numerical integral method and the like.
The Monte Carlo integration method approximates the integral by averaging the intensity at points u_i sampled uniformly from each interval [t_{j-1}, t_j] and scaling by the interval length. The resulting estimate is unbiased, but the sampling makes it computationally expensive.
The numerical integration method approximates the integral without sampling; it is therefore faster, but introduces a small bias.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention provides a novel multi-branch self-attention Transformer Hawkes process model for event prediction. Sequence information obtained from multiple perspectives is fused in a differentiated way, enabling more accurate learning of the sequence information and improving the accuracy of event prediction.
2. The invention provides a local perception enhancement mechanism that further processes the local context information of the sequence and enhances the local perception of the model, further improving modeling performance and the accuracy of event prediction.
Drawings
Fig. 1 is a schematic structural diagram of the event prediction method based on the multi-branch self-attention Transformer Hawkes process model.
FIG. 2 is a schematic diagram of the multi-branch self-attention Transformer sequence history representation encoding network.
FIG. 3 is a schematic diagram of the event prediction module.
FIG. 4 compares the log-likelihood of the multi-branch self-attention Transformer Hawkes process model with event prediction models of other network structures on four real-world data sets (Retweets, MIMIC-II, StackOverflow, Financial) and two synthetic data sets.
FIG. 5 compares event type prediction accuracy.
FIG. 6 compares the prediction error of event occurrence time.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the present patent.
The invention is further illustrated in the following figures and examples.
Fig. 1 is a schematic structural diagram of the event prediction method based on the multi-branch self-attention Transformer Hawkes process model. As shown in Fig. 1, the overall event prediction framework consists of two main parts, a sequence history representation encoding module and an event prediction module; the sequence history representation encoding module comprises the multi-branch self-attention and the local perception enhancement.
FIG. 2 is a schematic diagram of the multi-branch self-attention Transformer sequence history representation encoding network. As shown in FIG. 2, the embedded code X of a given event sequence is input into the multi-branch Transformer encoder layers with enhanced local attention. Each encoder layer contains H branches, and each branch has a self-attention module and a local attention enhancement network module. In each branch, X is projected by three transformation matrices to generate Q, K and V:
Q = X W_Q, K = X W_K, V = X W_V    #(3)
where W_Q, W_K ∈ R^(D×D_K) and W_V ∈ R^(D×D_V) are linear projection matrices, with D_K = D_V = 64.
To prevent the event at time t_i from attending to events at later times t_j (j > i), a mask is applied and the (i, j) entries with j > i are set to 0.
The output S_i of the attention head is computed from Q, K and V:
S_i = softmax(Q K^T / sqrt(D_K)) V    #(4)
where K^T denotes the matrix transpose of K and softmax() is the softmax function.
The outputs of the self-attention modules in the different branches are given different weights and aggregated through a transformation matrix:
S'_i = α_i S_i W_A    #(5)
where W_A ∈ R^(D_V×D) is an aggregation matrix and α_i is a learned parameter with Σ_{i=1}^{H} α_i = 1, so that the sequence information learned from different perspectives is given different importance. Residual connection and layer normalization then give the final output S'_i:
S'_i = S'_i + X    #(6)
S'_i = LayerNorm(S'_i)    #(7)
As shown in FIG. 2, the local perception enhancement network (Local Perception Enhancement Networks) of the invention further processes the local context information of the sequence, enhancing the local perception of the model and thus further improving modeling performance.
The local perception enhancement module further learns the local context information of the representation under a single perspective through causal convolution, improving the model's local perception of the representation and the accuracy of its overall understanding of the sequence. After obtaining the output S'_i of a single attention head, it is input into the local perception enhancement network:
F_A = S'_i W_1 + b_1    #(8)
F_B = ReLU(F_A)    #(9)
F_C = CCNN(F_B)    #(10)
F_D = F_C W_2 + b_2    #(11)
where W_1, W_2, b_1 and b_2 are neural network parameters, ReLU() is the ReLU function, and CCNN is a causal convolution layer with kernel size 3, stride 1 and padding 2. The closer an event is, the more critical its influence on future events; by applying causal convolution over adjacent events, the local perception enhancement mechanism of the invention lets the model learn a more reasonable hidden representation of each event.
The result F_D is then weighted by β_i to obtain the final output of the branch, where β_i is a learned parameter with Σ_{i=1}^{H} β_i = 1. After residual connection and normalization, the fused branch output B is input into the next encoder layer; after learning through multiple encoder layers, the hidden representation B of the event sequence is finally output.
FIG. 3 is a schematic diagram of the event prediction module. The event prediction module predicts future events from the hidden representation of the input event sequence. The hidden representation is fed into separate prediction networks to obtain the predicted event type and event time. The detailed operation is described below:
Given the historical hidden representation h_i of an event (k_i, t_i), where h_i = B(i, :), the probability distribution of the next event type is obtained through a fully connected layer:
k̂_{i+1} = argmax(softmax(h_i W_type + b_type))
where W_type and b_type are the parameters of the fully connected layer for event type prediction and argmax() selects the most probable type of the next event k̂_{i+1}.
Similarly, the occurrence time of the next event is predicted by:
t̂_{i+1} = h_i W_time + b_time
where W_time and b_time are the parameters of the fully connected layer for event time prediction.
FIG. 4 compares the log-likelihood of the multi-branch self-attention Transformer Hawkes process model with event prediction models of other network structures on four real-world data sets (Retweets, MIMIC-II, StackOverflow, Financial) and two synthetic data sets. As shown in FIG. 4, the multi-branch self-attention Transformer Hawkes process model models the data more accurately than the other models.
FIG. 5 compares event type prediction accuracy. As shown in FIG. 5, the multi-branch self-attention Transformer Hawkes process model achieves higher accuracy in predicting the type of the next event.
FIG. 6 compares the prediction error of event occurrence time. As shown in FIG. 6, the multi-branch self-attention Transformer Hawkes process model achieves a smaller root mean square error in predicting event occurrence times.
The invention provides a multi-branch self-attention Transformer Hawkes process model for event prediction. Sequence information obtained from multiple perspectives is fused in a differentiated way, enabling more accurate learning of the sequence information. In addition, a local perception enhancement module is added to further process the local context information of the sequence and enhance the local perception of the model, which further improves modeling performance and the accuracy of event prediction. Extensive experiments on four real-world data sets (Retweets, MIMIC-II, StackOverflow, Financial) and two synthetic data sets show that the model achieves good results in sequence modeling and event prediction. In future work, we will continue to explore how to better learn the dependency information of sequences and integrate it effectively for future event prediction.
Finally, the above examples are provided only to illustrate the invention; any modification, improvement or substitution of the above embodiments shall fall within the scope of the claims of the invention.

Claims (2)

1. An event prediction method based on a multi-branch self-attention Transformer Hawkes process model, characterized in that
the method comprises the following steps:
S1, constructing a multi-branch self-attention module, which extracts a more accurate historical representation of the sequence through differentiated fusion of multi-perspective information.
S2, constructing a local perception enhancement module, which further processes local context information through causal convolution, enhancing local perception to improve modeling performance.
S3, constructing a multi-branch self-attention Transformer sequence history representation encoding network architecture by combining the networks of S1 and S2.
S4, constructing an event prediction module, which predicts the occurrence of future events from the sequence history representation extracted in S3.
S5, constructing the multi-branch self-attention Transformer Hawkes process model framework by combining the networks of S3 and S4.
S6, training the multi-branch self-attention Transformer Hawkes process model.
The event prediction method based on a multi-branch self-attention Transformer Hawkes process model according to claim 1, wherein the specific process of S1 is:
The multi-branch self-attention module differentiates the different pieces of representation information obtained from multiple perspectives according to the input, so as to extract a more accurate hidden representation of the sequence. By considering the contribution of each piece of representation information to the final hidden representation of the sequence, corresponding weights are assigned to the outputs of the different representation subspaces, which reduces the bias in the overall understanding of the sequence. The detailed operation is described below:
Given a data set whose events fall into M categories, for any event sequence in the data set, each event type k_i is one-hot encoded (an M-dimensional vector whose k_i-th element is 1 and whose remaining elements are 0) to obtain k'_i, which is then multiplied by an embedding matrix to generate the event type code c_i:
C = W_e K'    #(1)
where W_e ∈ R^(D×M) is an embedding matrix, so each event category is represented by a D-dimensional code vector, with D = 128.
The time information is encoded with sine and cosine functions (Equation 2): for the time t_i we use p(t_i) as its encoding, where [p(t_i)]_j is the j-th element of the temporal code.
In this way we obtain the event type code C and the event time code P of the sequence, and the embedded code of the whole event sequence can be expressed as X = (C + P)^T. The embedded codes of all events are then input into the multi-branch self-attention Transformer sequence history representation encoding network.
Given the embedded code X of an event sequence, X is input into the multi-branch Transformer encoder layers with enhanced local attention. Each encoder layer contains H branches, and each branch has a self-attention module and a local attention enhancement network module. In each branch, X is projected by three transformation matrices to generate Q, K and V:
Q = X W_Q, K = X W_K, V = X W_V    #(3)
where W_Q, W_K ∈ R^(D×D_K) and W_V ∈ R^(D×D_V) are linear projection matrices, with D_K = D_V = 64.
To prevent the event at time t_i from attending to events at later times t_j (j > i), a mask is applied and the (i, j) entries with j > i are set to 0.
The output S_i of the attention head is computed from Q, K and V:
S_i = softmax(Q K^T / sqrt(D_K)) V    #(4)
where K^T denotes the matrix transpose of K and softmax() is the softmax function.
The outputs of the different attention heads are given different weights and aggregated through a transformation matrix:
S'_i = α_i S_i W_A    #(5)
where W_A ∈ R^(D_V×D) is an aggregation matrix and α_i is a learned parameter with Σ_{i=1}^{H} α_i = 1, so that the sequence information learned from different perspectives is given different importance. Residual connection and layer normalization then give the final output S'_i:
S'_i = S'_i + X    #(6)
S'_i = LayerNorm(S'_i)    #(7)
The event prediction method based on a multi-branch self-attention Transformer Hawkes process model according to claim 1, wherein the specific process of S2 is:
The local perception enhancement module further learns the local context information of the representation under a single perspective through causal convolution, improving the model's local perception of the representation and the accuracy of its overall understanding of the sequence. After obtaining the output S'_i of a single attention head, it is input into the local perception enhancement network:
F_A = S'_i W_1 + b_1    #(8)
F_B = ReLU(F_A)    #(9)
F_C = CCNN(F_B)    #(10)
F_D = F_C W_2 + b_2    #(11)
where W_1, W_2, b_1 and b_2 are neural network parameters, ReLU() is the ReLU function, and CCNN is a causal convolution layer with kernel size 3, stride 1 and padding 2. The closer an event is, the more critical its influence on future events; by applying causal convolution over adjacent events, the local perception enhancement mechanism of the invention lets the model learn a more reasonable hidden representation of each event.
The result F_D is then weighted by β_i to obtain the final output of the branch, where β_i is a learned parameter with Σ_{i=1}^{H} β_i = 1. After residual connection and normalization, the fused branch output B is input into the next encoder layer; after learning through multiple encoder layers, the hidden representation B of the event sequence is finally output.
The event prediction method based on a multi-branch self-attention Transformer Hawkes process model according to claim 1, wherein the specific process of S3 is:
In the multi-branch self-attention Transformer sequence history representation encoding network architecture, each branch consists of a weighted self-attention module and a local perception enhancement module; fusing the differently weighted branches allows the model to learn more accurate sequence information.
The event prediction method based on a multi-branch self-attention Transformer Hawkes process model according to claim 1, wherein the specific process of S4 is:
The event prediction module predicts future events from the hidden representation of the input event sequence. The hidden representation is fed into separate prediction networks to obtain the predicted event type and event time. The detailed operation is described below:
Given the historical hidden representation h_i of an event (k_i, t_i), where h_i = B(i, :), the probability distribution of the next event type is obtained through a fully connected layer:
k̂_{i+1} = argmax(softmax(h_i W_type + b_type))
where W_type and b_type are the parameters of the fully connected layer for event type prediction and argmax() selects the most probable type of the next event k̂_{i+1}.
Similarly, the occurrence time of the next event is predicted by:
t̂_{i+1} = h_i W_time + b_time
where W_time and b_time are the parameters of the fully connected layer for event time prediction.
The event prediction method based on a multi-branch self-attention Transformer Hawkes process model according to claim 1, wherein the specific process of S5 is:
The event prediction method based on the multi-branch self-attention Transformer Hawkes process model comprises the multi-branch self-attention Transformer sequence history representation encoding network and the event prediction network module.
2. The event prediction method based on a multi-branch self-attention Transformer Hawkes process model according to claim 1, wherein the specific process of S6 is:
The multi-branch self-attention Transformer Hawkes process model is trained as follows:
The training loss of the model consists of the likelihood loss, the time loss and the type loss; the model is trained with the Adam optimizer, and the loss is formulated as follows:
where R is the number of sequences in the data set, and γ_type and γ_time are hyperparameters that help keep training stable. First, the time loss L_time and the type loss L_type are computed from the predicted and ground-truth event times and types.
To maximize the likelihood function, the likelihood loss L_likelihood is taken as the negative log-likelihood. The log-likelihood consists of two parts, the event log-likelihood and the non-event log-likelihood, computed as follows:
The event log-likelihood is the sum, over all observed events, of the logarithm of the conditional intensity at the event time.
In an event sequence, the relationship between the occurrence probability and the occurrence time of each type of event is determined by the corresponding conditional intensity function. We use the obtained hidden representation of the event sequence to compute the conditional intensity function λ(t|H_t), where H_t = {(t_j, k_j) : t_j < t} is the history of events before time t. We define a different conditional intensity function for each class of event: in a sequence with M event categories, for each event category k ∈ {1, 2, ..., M} there is a conditional intensity function λ_k(t|H_t):
where softplus(·) is the softplus function and β_k is its softness parameter.
The conditional intensity function of the whole sequence is the sum of the per-category intensities:
λ(t|H_t) = Σ_{k=1}^{M} λ_k(t|H_t)
For the non-event log-likelihood, because of the softplus function the integral has no closed form and must be approximated; common approximations include Monte Carlo integration and numerical integration.
The Monte Carlo integration method approximates the integral by averaging the intensity at points u_i sampled uniformly from each interval [t_{j-1}, t_j] and scaling by the interval length. The resulting estimate is unbiased, but the sampling makes it computationally expensive.
The numerical integration method approximates the integral without sampling; it is therefore faster, but introduces a small bias.
CN202310616233.3A 2023-05-29 2023-05-29 Event prediction method of Transformer Hawkes process model based on multi-branch self-attention Pending CN116596150A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310616233.3A 2023-05-29 2023-05-29 Event prediction method of Transformer Hawkes process model based on multi-branch self-attention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310616233.3A 2023-05-29 2023-05-29 Event prediction method of Transformer Hawkes process model based on multi-branch self-attention

Publications (1)

Publication Number Publication Date
CN116596150A 2023-08-15

Family

ID=87611388

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310616233.3A Pending CN116596150A (en) 2023-05-29 Event prediction method of Transformer Hawkes process model based on multi-branch self-attention

Country Status (1)

Country Link
CN (1) CN116596150A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117332377A (en) * 2023-12-01 2024-01-02 西南石油大学 Discrete time sequence event mining method and system based on deep learning
CN117332377B (en) * 2023-12-01 2024-02-02 西南石油大学 Discrete time sequence event mining method and system based on deep learning
CN117493068A (en) * 2024-01-03 2024-02-02 安徽思高智能科技有限公司 Root cause positioning method, equipment and storage medium for micro-service system
CN117493068B (en) * 2024-01-03 2024-03-26 安徽思高智能科技有限公司 Root cause positioning method, equipment and storage medium for micro-service system

Similar Documents

Publication Publication Date Title
CN116596150A (en) Event prediction method of Transformer Hawkes process model based on multi-branch self-attention
CN111985205A (en) Aspect level emotion classification model
CN111626764A (en) Commodity sales volume prediction method and device based on Transformer + LSTM neural network model
CN111242351A (en) Tropical cyclone track prediction method based on self-encoder and GRU neural network
CN114896434B (en) Hash code generation method and device based on center similarity learning
CN115222998B (en) Image classification method
CN110956309A (en) Flow activity prediction method based on CRF and LSTM
CN115630651B (en) Text generation method and training method and device of text generation model
CN115757919A (en) Symmetric deep network and dynamic multi-interaction human resource post recommendation method
CN113609326B (en) Image description generation method based on relationship between external knowledge and target
CN112559741B (en) Nuclear power equipment defect record text classification method, system, medium and electronic equipment
CN117076931B (en) Time sequence data prediction method and system based on conditional diffusion model
CN114003900A (en) Network intrusion detection method, device and system for secondary system of transformer substation
CN116258504A (en) Bank customer relationship management system and method thereof
CN115422945A (en) Rumor detection method and system integrating emotion mining
CN113011495A (en) GTN-based multivariate time series classification model and construction method thereof
Nayak et al. Learning a sparse dictionary of video structure for activity modeling
CN111158640B (en) One-to-many demand analysis and identification method based on deep learning
CN110879833B (en) Text prediction method based on light weight circulation unit LRU
CN113361505B (en) Non-specific human sign language translation method and system based on contrast decoupling element learning
CN118036657A (en) AIS data-based method for bidirectional RNN track completion method with encoding and decoding functions
CN115796365A (en) Financial time sequence prediction method and device based on predictable factor decomposition
Zhu et al. SEREVIT: A Novel Vision Transformer Based Double-Attention Network for Detecting Pneumonia
CN115358240A (en) Named entity identification method for intelligent financial product recommendation system
Wang et al. Well Logging Stratigraphic Correlation Algorithm Based on Semantic Segmentation

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication