WO2023149236A1 - Learning device, prediction device, learning method, prediction method, learning program, and prediction program - Google Patents

Learning device, prediction device, learning method, prediction method, learning program, and prediction program

Info

Publication number
WO2023149236A1
WO2023149236A1 PCT/JP2023/001699 JP2023001699W WO2023149236A1 WO 2023149236 A1 WO2023149236 A1 WO 2023149236A1 JP 2023001699 W JP2023001699 W JP 2023001699W WO 2023149236 A1 WO2023149236 A1 WO 2023149236A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
sequence
prediction
learning
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2023/001699
Other languages
English (en)
French (fr)
Japanese (ja)
Inventor
勇和 佐々木
寛人 山口
真 鬼塚
政司 前川
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Osaka NUC
Original Assignee
Osaka University NUC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Osaka University NUC
Priority to JP2023578472A
Publication of WO2023149236A1
Anticipated expiration
Ceased

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Definitions

  • the present invention relates to a learning device, a prediction device, a learning method, a prediction method, a learning program, and a prediction program.
  • Examples of time-series graphs include retweet networks, in which users on Twitter (registered trademark) are nodes and retweets are links, and e-mail networks, in which e-mail users in a community are nodes and e-mail histories are links.
  • There is a demand for technology that predicts the future behavior of users from such time-series graphs.
  • As a means of predicting the information diffusion of tweets, research is being conducted on predicting future retweets using a retweet graph (Non-Patent Document 1).
  • Research has also been conducted on time-series link prediction methods that predict future links using past link generation history (Non-Patent Documents 2 to 9).
  • a continuous-time graph is a graph that can express link occurrence times as continuous values.
  • This realizes link prediction that considers the detailed time series of link occurrence.
  • Link prediction methods that utilize continuous-time graphs (Non-Patent Documents 7 and 8) achieve improved accuracy by learning link generation from two aspects: structural tendency and temporal tendency.
  • The methods described in Non-Patent Documents 7 and 8 create node embeddings that reflect the latest node features from the input continuous-time graph.
  • In this way, these methods learn the structural tendency of link generation and, by introducing a point process when calculating the link generation probability from the latest node embeddings, also learn the temporal tendency of link generation.
  • The present invention has been made in view of the above, and an object thereof is to provide a learning device, a prediction device, a learning method, a prediction method, a learning program, and a prediction program that can acquire the number of teacher data necessary to sufficiently train a deep learning model and improve the accuracy of the deep learning model.
  • The learning device is characterized by having: an acquisition unit that acquires a feature amount of a sequence using a first deep learning model; a prediction unit that obtains, from the feature amount of the sequence using a second deep learning model, a prediction result of the occurrence of an event or data corresponding to the learning data; and an update unit that updates the parameters of the first deep learning model and the second deep learning model so that the error between the prediction result and the correct data of the learning data is reduced.
  • The prediction device has: a generation unit that receives, as input data, prediction target data for which the occurrence of an event or data is to be predicted and time-series data related to the prediction target data, and generates a fixed-length sequence from the input data; an acquisition unit that acquires a feature amount of the sequence using a first deep learning model trained with learning data based on a plurality of time-series data respectively acquired from a plurality of types of applications; and a prediction unit that obtains, from the feature amount of the sequence, a prediction result of the occurrence of an event or data corresponding to the prediction target data, using a second deep learning model trained with the learning data.
  • According to the present invention, it is possible to acquire the number of teacher data necessary to sufficiently train the deep learning model and improve the accuracy of the deep learning model.
  • FIG. 1 is a diagram schematically showing an example of the configuration of a learning device according to an embodiment.
  • FIG. 2 is a diagram schematically showing an example of a configuration of a prediction unit shown in FIG. 1;
  • FIG. 3 is a diagram for explaining the processing flow of the prediction unit shown in FIG. 2;
  • FIG. 4 is a diagram showing an algorithm for acquiring the action history of the source node and the action history of the target node.
  • FIG. 5 is a diagram showing an algorithm for obtaining an information propagation route to a source node.
  • FIG. 6 is a flowchart showing a processing procedure of learning processing according to the embodiment.
  • FIG. 7 is a diagram schematically illustrating an example of a configuration of a prediction device according to an embodiment;
  • FIG. 8 is a flowchart illustrating a processing procedure of prediction processing according to the embodiment.
  • FIG. 9 is a diagram showing the accuracy when the sequence length is set to an integer value of 1-5.
  • FIG. 10 is a diagram illustrating an example of a computer that realizes a learning device and a prediction device by executing programs.
  • Inspired by this trend in the field of natural language processing, the embodiment assumes that, even in graph data, there is a common trend in the behavior of nodes across multiple data sets. Based on this assumption, the embodiment aims to improve the accuracy of link generation prediction, which predicts future link generation, by learning a common link generation tendency from a plurality of data sets.
  • Table 1 shows a comparison of link occurrence prediction accuracy in the conventional method DyRep (Non-Patent Document 8) and in this embodiment learned with multiple data sets.
  • Table 1 shows the prediction accuracy of unreleased links in the dataset used in evaluation experiments (described later) in the embodiment.
  • Hits@10 is used as an evaluation index, and the higher the value, the higher the accuracy.
  • the maximum accuracy is shown in bold.
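  • The text names Hits@10 as the evaluation index but does not describe the ranking protocol. The following is a minimal sketch of the commonly used definition, assuming each test link is ranked against a set of candidate links by predicted score; the function names `rank_of_true` and `hits_at_k` are illustrative and do not come from the source.

```python
from typing import Sequence


def rank_of_true(true_score: float, candidate_scores: Sequence[float]) -> int:
    """Return the 1-based rank of the true link among the candidates.

    A higher score is assumed to mean a more likely link.
    """
    return 1 + sum(1 for s in candidate_scores if s > true_score)


def hits_at_k(ranks: Sequence[int], k: int = 10) -> float:
    """Fraction of test links whose true link is ranked within the top k."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1 for r in ranks if r <= k) / len(ranks)


if __name__ == "__main__":
    ranks = [1, 11, 10, 50]
    print(hits_at_k(ranks, k=10))
```

  • A value closer to 1 means that more true links were ranked near the top, which matches the statement that a higher value indicates higher accuracy.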
  • the issue (1) will be explained.
  • the input of the prediction model is abstracted into a link occurrence history sequence common to multiple datasets.
  • common model learning is enabled for a plurality of data sets.
  • A time-series deep learning model (e.g., a Transformer encoder) is used to learn link generation from a fixed-length link generation history sequence.
  • Next, issue (2) will be explained.
  • The embodiment inputs a node action history sequence, which expresses both node actions and action times, into a Transformer encoder, which is a time-series deep learning model.
  • This node action history sequence expresses the node's n most recent actions by the n most recent link embeddings of the node to be predicted.
  • These link embeddings are temporally encoded based on each link occurrence time to add the action time information of the node.
  • the embodiment achieves prediction model learning using multiple data sets and improves the accuracy of link prediction. Embodiments will be specifically described below.
  • a continuous time graph obtained from an application is used as a data set.
  • prediction model learning is performed using not one data set but multiple data sets obtained from multiple types of applications.
  • a continuous-time graph is a time-series graph that expresses changes in the graph structure with a single multigraph in which links are given time stamps of occurrence times.
  • a continuous-time graph is a graph that can express link occurrence times as continuous values.
  • the continuous-time graph is composed of a pair of a source node and a target node, a link between the pair, and time information when the link occurred.
  • V is the node set common to all times.
  • E_t is the time-stamped link set consisting of the links that occurred before time t.
  • e_i = (v_i^src, v_i^tgt, t_i) ∈ E_t represents the i-th time-stamped link to occur in the graph.
  • v_i^src is the source node.
  • v_i^tgt is the target node.
  • t_i is the link generation time.
  • A prediction target time t, a prediction target pair of a source node and a target node, and the continuous-time graph G(t) up to time t are input to the prediction model, which predicts the link occurrence probability between the prediction target source node and target node at time t.
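  • The continuous-time graph described above can be held as a plain list of time-stamped links. The following is a minimal sketch under that assumption; the class names `Link` and `ContinuousTimeGraph` and the method `before` are illustrative and do not appear in the source.

```python
from typing import Iterable, List, NamedTuple, Set


class Link(NamedTuple):
    """One time-stamped link (source node, target node, occurrence time)."""
    src: str
    tgt: str
    t: float


class ContinuousTimeGraph:
    """A multigraph whose links carry continuous-valued time stamps."""

    def __init__(self, links: Iterable[Link]) -> None:
        # Keep links sorted by occurrence time so history lookups are simple.
        self.links: List[Link] = sorted(links, key=lambda e: e.t)
        # Node set shared by all times.
        self.nodes: Set[str] = {n for e in self.links for n in (e.src, e.tgt)}

    def before(self, t: float) -> List[Link]:
        """Return the links that occurred strictly before time t."""
        return [e for e in self.links if e.t < t]


if __name__ == "__main__":
    g = ContinuousTimeGraph([Link("a", "b", 3.0), Link("c", "a", 1.0)])
    print(g.before(2.0))
```

  • A prediction query then consists of a time t, a source node, a target node, and the links returned by `before(t)`.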
  • The Transformer (Reference 2), adopted in this embodiment as an example of a time-series deep learning model, is a method widely used in the field of natural language processing and forms the basis of BERT (Reference 1), GPT (Reference 3), XLNet (Reference 4), and others.
  • Reference 2 Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin, “Attention Is All You Need”, In Proceedings of NeurIPS, 2017.
  • Reference 3 Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever, “Language Models are Unsupervised Multitask Learners”, 2019.
  • Reference 4 Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, and Quoc V. Le, “XLNet: Generalized Autoregressive Pretraining for Language Understanding”, In Proceedings of NeurIPS, 2019.
  • A Transformer is an encoder-decoder model in which the encoder outputs a latent representation of an input vector sequence, and the decoder outputs the target sequence from the encoder's latent representation.
  • In the encoder and decoder, applying attention to the sequence makes it possible to acquire a latent representation and a target sequence that capture the strength of the dependency between elements.
  • Because attention alone cannot take the order of the input sequence into account, information that indicates the order of the elements must be added to the sequence. Therefore, in general, the order relationship of the elements is expressed by adding to the input sequence a positional encoding based on the position of each element in the sequence.
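  • As an illustration of positional encoding, the following sketch implements the sinusoidal scheme from Reference 2 in plain Python. The function names and the list-of-lists representation are choices made for this example only.

```python
import math
from typing import List


def positional_encoding(seq_len: int, d_model: int) -> List[List[float]]:
    """Sinusoidal positional encoding from "Attention Is All You Need".

    Even dimensions use sine and odd dimensions use cosine, with wavelengths
    that grow geometrically across the embedding dimensions.
    """
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe


def add_positional_encoding(x: List[List[float]]) -> List[List[float]]:
    """Add the encoding element-wise to a sequence of embedding vectors."""
    pe = positional_encoding(len(x), len(x[0]))
    return [[v + p for v, p in zip(row, pe_row)] for row, pe_row in zip(x, pe)]


if __name__ == "__main__":
    for row in positional_encoding(3, 4):
        print([round(v, 4) for v in row])
```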
  • In the field of natural language processing, BERT, which achieves general-purpose contextual understanding by training a Transformer encoder using multiple datasets, has been proposed and is used for a wide range of tasks. BERT succeeds in capturing general inter-word dependencies by inputting sentences from multiple datasets into the Transformer encoder as word embedding sequences.
  • FIG. 1 is a diagram schematically showing an example of the configuration of a learning device according to an embodiment.
  • The learning device 10 has a communication unit 11, an input/output unit 12, a storage unit 13, and a control unit 14.
  • the communication unit 11 is a communication interface that transmits and receives various types of information to and from other devices connected via a network or the like.
  • the communication unit 11 is realized by a NIC (Network Interface Card) or the like, and performs communication between another device and the control unit 14 (described later) via an electric communication line such as a LAN (Local Area Network) or the Internet.
  • the input/output unit 12 receives input of information and outputs information.
  • the input/output unit 12 is a device such as a mouse or a keyboard that receives input of various instruction information to the learning device 10 in response to input operations by the user.
  • The input/output unit 12 is also realized by, for example, a liquid crystal display or the like, and displays and outputs screens whose display is controlled by the learning device 10.
  • The storage unit 13 is realized by semiconductor memory elements such as RAM (Random Access Memory) and flash memory, and stores the processing programs that operate the learning device 10 and the data used during execution of the processing programs.
  • The storage unit 13 has a data storage unit 131, a learning data storage unit 132, a correct data storage unit 133, a sequence storage unit 134, and a parameter storage unit 135.
  • the data storage unit 131 stores multiple data sets including continuous time graphs obtained from multiple types of applications.
  • the learning data storage unit 132 stores learning data created by the data generation unit 141 (described later) from a plurality of data sets.
  • the correct data storage unit 133 stores correct data of each learning data.
  • the sequence storage unit 134 stores sequences generated by the sequence generation unit 145 (described later).
  • a sequence is a fixed-length sequence generated from training data.
  • a sequence represents the occurrence history of the link of a pair of nodes.
  • The sequence includes a source node action history sequence (first sequence) representing the action history of the source node, a target node received-action history sequence (second sequence) representing the history of actions received by the target node, and/or an information propagation path sequence (third sequence) representing the information propagation path to the source node.
  • the parameter storage unit 135 stores parameters of various models possessed by the prediction unit 142 (described later).
  • the control unit 14 controls the learning device 10 as a whole.
  • the control unit 14 is, for example, an electronic circuit such as a CPU (Central Processing Unit) or an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).
  • the control unit 14 also has an internal memory for storing programs defining various processing procedures and control data, and executes each processing using the internal memory. Further, the control unit 14 functions as various processing units by running various programs.
  • the control unit 14 has a data generation unit 141 , a prediction unit 142 , an update unit 143 and a control processing unit 144 .
  • the data generation unit 141 generates learning data and correct data for each learning data from a plurality of data sets.
  • The data generation unit 141 extracts, as learning data from a plurality of data sets, a learning target node pair and a predetermined number of node pairs for which links were generated before the link generation time of the learning target node pair.
  • the data generation unit 141 extracts, from a plurality of data sets, a learning target node pair and n pairs of nodes that occurred at a time immediately preceding the link generation time of the learning target node pair as learning data.
  • the data generation unit 141 uses data indicating whether or not a link has actually occurred in a pair of learning target nodes as correct data. For example, the correct answer data is indicated by two values (0 (does not occur), 1 (occurs)).
  • a pair of nodes is a pair of a source node and a target node.
  • When the application is Twitter, the user who tweeted is the source node, the user who retweeted is the target node, the retweet is the link, and the retweet time is the link generation time.
  • When the application is e-mail, the sender of the e-mail is the source node, the recipient of the e-mail is the target node, the e-mail history is the link, and the transmission time of the e-mail is the link generation time.
  • When the application is a purchase application, the purchaser is the source node, the purchased (or reserved) product or service is the target node, the purchase history is the link, and the purchase time is the link generation time.
  • the prediction unit 142 predicts the occurrence of links between pairs of learning target nodes.
  • the prediction unit 142 includes a sequence generation unit 145 (generation unit), a feature amount acquisition unit 146 (acquisition unit), and a calculation unit 147 (prediction unit).
  • the prediction unit 142 corresponds to the prediction model in the embodiment.
  • the sequence generator 145 generates a fixed-length sequence from each piece of learning data generated by the data generator 141 .
  • the feature amount acquisition unit 146 acquires the feature amount of the sequence using the first deep learning model.
  • the calculation unit 147 predicts the occurrence of a link between pairs of learning target nodes from the feature amount of the sequence using the second deep learning model.
  • the calculation unit 147 calculates, for example, a link occurrence probability between a pair of learning target nodes.
  • the update unit 143 updates the parameters of the first deep learning model and the second deep learning model so that the error between the prediction result predicted by the calculation unit 147 and the correct data of the learning data is reduced.
  • the updating unit 143 updates the parameters of the first deep learning model and the parameters of the second deep learning model using a predetermined loss function.
  • the control processing unit 144 causes the feature amount acquiring unit 146, the calculating unit 147, and the updating unit 143 to repeatedly execute the processing until a predetermined end condition is satisfied.
  • The predetermined end condition is, for example, reaching a predetermined number of iterations, the loss value falling within a predetermined range, or the parameter update amount of the first deep learning model and the second deep learning model falling below a predetermined threshold; that is, a condition for judging that the first deep learning model and the second deep learning model have been sufficiently trained.
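  • The end condition can be sketched as a single predicate evaluated after each update. This example assumes that satisfying any one of the three listed criteria ends training; the function name `should_stop`, its parameters, and the default thresholds are hypothetical and are not specified in the source.

```python
from typing import Tuple


def should_stop(
    iteration: int,
    loss: float,
    update_norm: float,
    *,
    max_iterations: int = 1000,                 # assumed default
    loss_range: Tuple[float, float] = (0.0, 0.05),  # assumed default
    min_update: float = 1e-6,                   # assumed default
) -> bool:
    """Return True when any of the three example end conditions holds."""
    low, high = loss_range
    reached_iterations = iteration >= max_iterations
    loss_in_range = low <= loss <= high
    update_too_small = update_norm < min_update
    return reached_iterations or loss_in_range or update_too_small


if __name__ == "__main__":
    print(should_stop(iteration=5, loss=0.7, update_norm=0.1))
    print(should_stop(iteration=5, loss=0.01, update_norm=0.1))
```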
  • FIG. 2 is a diagram schematically showing an example of the configuration of the prediction section 142 shown in FIG. 1.
  • FIG. 3 is a diagram for explaining the processing flow of the prediction unit 142 shown in FIG. 2.
  • The prediction unit 142 receives learning data generated from continuous-time graphs included in multiple data sets. For example, one data set includes a graph obtained from application A (e.g., graph Ga(1) in FIG. 3), and another data set includes a graph obtained from application B (e.g., graph Gb(1) in FIG. 3). The prediction unit 142 then performs two steps: a sequence generation step ((A) in FIG. 3) and a link generation prediction step ((B) in FIG. 3).
  • In the sequence generation step, the sequence generation unit 145 creates a link occurrence history sequence from a source node to a target node, which serves as an input common to a plurality of data sets, in order to enable learning of a link generation tendency common to the data sets.
  • The link occurrence history sequence expresses the actions of the nodes and the action times, which makes these patterns learnable.
  • The sequence generation unit 145 extracts, from the continuous-time graph, the learning target node pair and a predetermined number of node pairs for which links were generated before the link generation time of the learning target node pair, and generates a plurality of types of sequences (link occurrence history sequences) representing the link occurrence history of each pair.
  • the sequence generation unit 145 generates the latest n link occurrence history sequences for each learning target node pair ((1) in FIG. 3).
  • For example, in addition to the link generation history sequence corresponding to time t_10, the sequence generation unit 145 extracts the node pairs of the links generated at the six times t_2, t_3, t_6, t_7, t_8, and t_9, and generates a link generation history sequence for each pair. This allows the subsequent Transformer encoder to capture link generation due to the most recent actions of the nodes. Note that the sequence generation unit 145 also extracts, from a graph (for example, Gb(1)) acquired from another application B, pairs of nodes in which a link occurred before the prediction target link occurrence time, and generates link generation history sequences in the same way.
  • The sequence generation unit 145 creates three types of link generation history sequences: the action history sequence of the source node, the received-action history sequence of the target node, and the information propagation path sequence.
  • the prediction unit 142 can grasp behavior patterns of various nodes.
  • the sequence generation unit 145 replaces the created link generation history sequence with a link embedding sequence that can be input to the transformer encoders 1461-1 to 1461-3 ((2) in FIG. 3).
  • the sequence generation unit 145 performs temporal encoding based on each link occurrence time for the three types of sequences in order to express the action time ((3) in FIG. 3).
  • The sequence generation unit 145 performs encoding based on the difference between each link generation time in the input sequence and the link generation time to be predicted, thereby enabling the Transformer encoder to capture the time-interval patterns of link generation.
  • The feature amount acquisition unit 146 has a time-series deep learning model (first deep learning model) for each type of sequence generated by the sequence generation unit 145, combines the outputs of the plurality of time-series deep learning models, and outputs the result to the calculation unit 147.
  • the feature quantity acquisition unit 146 has, for example, transformer encoders 1461-1 to 1461-3 as time series deep learning models.
  • the feature quantity acquisition unit 146 has a combining unit 1462 .
  • The Transformer encoder 1461-1 receives the action history sequence of the source node as input and outputs a latent expression (first latent expression) that captures the action pattern of the source node ((4) in FIG. 3).
  • The Transformer encoder 1461-2 receives the received-action history sequence of the target node as input and outputs a latent expression (second latent expression) that captures the pattern of actions received by the target node ((4) in FIG. 3).
  • Transformer encoder 1461-3 receives the information propagation path sequence as input and outputs a latent expression (third latent expression) that captures the information propagation pattern ((4) in FIG. 3).
  • The combining unit 1462 weights the latent expression that captures the action pattern of the source node, the latent expression that captures the pattern of actions received by the target node, and/or the latent expression that captures the information propagation pattern with their respective weights ((5) in FIG. 3).
  • The combining unit 1462 generates a feature amount by combining the weighted latent expressions ((6) in FIG. 3).
  • The three weights are, for example, hyperparameters and are set appropriately according to the problem to be solved. Alternatively, the weights may be optimized by the updating unit 143 together with the parameters of the prediction model.
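  • The weighting and combining step can be sketched as follows. The source does not state how the weighted latent expressions are combined, so this example assumes concatenation; the function name `combine` and the flat-list representation of each latent expression are illustrative only.

```python
from typing import List, Sequence


def combine(latents: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Scale each latent expression by its weight, then concatenate them.

    Concatenation is an assumption made for this sketch; a weighted sum
    would be another possible reading of "combining".
    """
    if len(latents) != len(weights):
        raise ValueError("one weight is required per latent expression")
    combined: List[float] = []
    for h, w in zip(latents, weights):
        combined.extend(w * x for x in h)
    return combined


if __name__ == "__main__":
    h_src, h_tgt, h_prop = [1.0, 2.0], [3.0], [4.0, 5.0]
    print(combine([h_src, h_tgt, h_prop], [0.5, 1.0, 2.0]))
```

  • Setting a weight to zero removes the contribution of the corresponding sequence type, which is one way to read the "and/or" in the description.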
  • the calculation unit 147 uses the second deep learning model to calculate the link occurrence probability between the pair of learning target nodes ((7) in FIG. 3), and outputs the calculated link occurrence probability.
  • the calculation unit 147 uses, for example, an MLP (Multilayer Perceptron) 1471 as the second deep learning model.
  • the calculation unit 147 may use a convolutional neural network or the like as the second deep learning model.
  • the update unit 143 updates the parameters of the transformer encoders 1461-1 to 1461-3 and the parameters of the MLP 1471 so that the error between the prediction result predicted by the calculation unit 147 and the correct data of the learning data is reduced.
  • The updating unit 143 may also update and optimize the three weights applied to the latent expressions.
  • the sequence generation unit 145 generates three types of sequences by extracting link generation history, creating link embedding, and performing temporal encoding.
  • the sequence generator 145 first acquires the action history of the source node, the action history of the target node, and the information propagation route to the source node.
  • FIG. 4 is a diagram showing an algorithm for acquiring the action history of the source node and the action history of the target node.
  • FIG. 5 is a diagram showing an algorithm for obtaining an information propagation route to a source node.
  • The sequence generation unit 145 extracts the latest l_src link occurrence histories (including the prediction target link) immediately before the link occurrence prediction target time t according to Algorithm 1 in FIG. 4, and creates the action history sequence of the source node. When creating the action history sequence of the source node, "node" in Algorithm 1 is read as "src".
  • The sequence generation unit 145 first adds the link to be predicted (for example, (a, b, t_10) shown in FIG. 3) to the action history S_src of the source node (line 1 of Algorithm 1). This makes it possible for the prediction unit 142 to determine between which node pair the link occurrence probability is to be predicted.
  • The sequence generation unit 145 then searches the link generation history, sorted by link generation time, from the time nearest to t back into the past, and adds the links generated by the source node to the action history S_src of the source node until its length becomes l_src (lines 2-9 of Algorithm 1).
  • Finally, the sequence generation unit 145 rearranges the elements of the action history S_src of the source node in ascending order of link occurrence time, thereby converting them into the actual chronological order of link occurrence (line 10 of Algorithm 1), and creates the action history sequence of the source node (for example, sequence Sc1 in FIG. 3).
  • The sequence generation unit 145 extracts the latest l_tgt backlink histories of the target node from immediately before time t by performing the processing of Algorithm 1 in FIG. 4 in the same manner as for the action history of the source node.
  • In this case, "node" in Algorithm 1 is read as "tgt".
  • The sequence generation unit 145 thereby generates the received-action history sequence of the target node (for example, sequence Sc2 in FIG. 3).
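  • The procedure described for Algorithm 1 can be sketched as follows. FIG. 4 is not reproduced in this text, so the code follows only the prose description: links are `(src, tgt, t)` tuples, the `role` argument selects whether "node" is read as "src" or "tgt", and the function name `action_history` is hypothetical. How sequences shorter than the requested length are padded is not described in the source and is left out.

```python
from typing import List, Tuple

LinkT = Tuple[str, str, float]  # (source node, target node, occurrence time)


def action_history(links: List[LinkT], target_link: LinkT, length: int,
                   role: str = "src") -> List[LinkT]:
    """Collect up to `length` links for one node, ending with the target link.

    role="src": links emitted by the prediction target source node.
    role="tgt": links received by the prediction target target node.
    """
    if role not in ("src", "tgt"):
        raise ValueError("role must be 'src' or 'tgt'")
    idx = 0 if role == "src" else 1
    node, t = target_link[idx], target_link[2]

    # Start with the link to be predicted.
    history = [target_link]
    # Walk the history from the time nearest to t back into the past.
    for link in sorted(links, key=lambda e: e[2], reverse=True):
        if len(history) >= length:
            break
        if link[2] < t and link[idx] == node:
            history.append(link)
    # Restore the actual chronological order of link occurrence.
    history.sort(key=lambda e: e[2])
    return history


if __name__ == "__main__":
    links = [("a", "c", 1.0), ("b", "a", 2.0), ("a", "d", 3.0),
             ("c", "b", 4.0), ("a", "e", 5.0)]
    print(action_history(links, ("a", "b", 10.0), 3, role="src"))
    print(action_history(links, ("a", "b", 10.0), 3, role="tgt"))
```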
  • The sequence generation unit 145 extracts the most recent information propagation path of length l_propagation to the prediction target source node according to Algorithm 2 of FIG. 5.
  • The sequence generation unit 145 first adds the prediction target node pair (for example, (a, b, t_10) shown in FIG. 3) to the information propagation path S_propagation (line 1 of Algorithm 2). Subsequently, the sequence generation unit 145 searches the link occurrence history in the same way as when acquiring the action history sequence of the source node, and adds information propagation paths until the length of S_propagation becomes equal to l_propagation (lines 2-11 of Algorithm 2). Then, the sequence generation unit 145 rearranges the elements of the information propagation path S_propagation in ascending order of link generation time, thereby converting them into the actual chronological order of link generation (line 12 of Algorithm 2), and creates the information propagation path sequence (for example, sequence Sc3 in FIG. 3).
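  • FIG. 5 is not reproduced in this text, so the exact search rule of Algorithm 2 is unknown. The following sketch shows one plausible reading, stated as an assumption: starting from the prediction target source node, it repeatedly takes the most recent earlier link that arrives at the current node and then moves to that link's source node. The function name `propagation_path` and the tuple layout are hypothetical.

```python
from typing import List, Tuple

LinkT = Tuple[str, str, float]  # (source node, target node, occurrence time)


def propagation_path(links: List[LinkT], target_link: LinkT, length: int) -> List[LinkT]:
    """Trace information flow backwards to the prediction target source node.

    Assumed rule (not confirmed by the source): follow, hop by hop, the latest
    incoming link that occurred before the link found in the previous hop.
    """
    src, _tgt, t = target_link
    path = [target_link]           # begin with the prediction target pair
    current, bound = src, t
    ordered = sorted(links, key=lambda e: e[2], reverse=True)
    while len(path) < length:
        incoming = next(
            (e for e in ordered if e[2] < bound and e[1] == current), None
        )
        if incoming is None:       # no earlier link reaches the current node
            break
        path.append(incoming)
        current, bound = incoming[0], incoming[2]
    path.sort(key=lambda e: e[2])  # chronological order of link generation
    return path


if __name__ == "__main__":
    links = [("x", "y", 1.0), ("y", "z", 2.0), ("z", "a", 3.0), ("q", "a", 2.5)]
    print(propagation_path(links, ("a", "b", 10.0), 4))
```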
  • the sequence generator 145 replaces the created link occurrence history sequence with a link embedding sequence (for example, Sc1-1 in FIG. 3) that can be input to the transformer encoders 1461-1 to 1461-3.
  • Link embedding is a combination of embeddings of node pairs at both ends of a link obtained by existing pre-trained node embedding methods.
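  • A link embedding built from the embeddings of its two end nodes can be sketched as follows. The source says only that the node-pair embeddings are combined, so concatenation is assumed here; the static dictionary `node_emb` stands in for a pre-trained node embedding model and is a simplification, since a time-dependent model would return a different embedding for each query time.

```python
from typing import Dict, List, Sequence, Tuple

LinkT = Tuple[str, str, float]  # (source node, target node, occurrence time)


def link_embedding(node_emb: Dict[str, Sequence[float]], link: LinkT) -> List[float]:
    """Concatenate the source and target node embeddings (assumed combination)."""
    src, tgt, _t = link
    return list(node_emb[src]) + list(node_emb[tgt])


def embed_sequence(node_emb: Dict[str, Sequence[float]],
                   links: Sequence[LinkT]) -> List[List[float]]:
    """Replace a link occurrence history sequence with a link embedding sequence."""
    return [link_embedding(node_emb, e) for e in links]


if __name__ == "__main__":
    node_emb = {"a": [1.0, 2.0], "b": [3.0, 4.0]}
    print(embed_sequence(node_emb, [("a", "b", 5.0), ("b", "a", 6.0)]))
```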
  • the sequence generation unit 145 uses TDGNN (Temporal Dependent Graph Neural Network) (Non-Patent Document 6) among continuous-time graph node embedding techniques.
  • TDGNN is a model that can acquire the embedding of any node at any time.
  • TDGNN uses the continuous-time graph before the input time and heavily propagates the embedding of the adjacent node with the most recent link, thereby outputting node embeddings that more strongly reflect the characteristics of the most recently linked node.
  • a TDGNN is provided for each of the action history of the source node, the action history of the target node, and the information propagation route to the source node.
  • Each TDGNN is trained in advance using a continuous-time graph for each application so as to output node embeddings that more strongly reflect the characteristics of the node where the link occurred most recently.
  • the sequence generation unit 145 replaces each of the three sequences with a link embedding sequence using the TDGNN corresponding to each sequence. As a result, the sequence generation unit 145 can acquire link embeddings that express chronological changes in node features.
  • the sequence generation unit 145 uses node embedding obtained by node2vec (reference document 5) as the initial value of TDGNN, thereby obtaining node embedding that more accurately reflects the structural information of the continuous-time graph.
  • Reference 5 Aditya Grover and Jure Leskovec, “node2vec: Scalable Feature Learning for Networks”, In Proceedings of KDD, p.855-864. Association for Computing Machinery, 2016.
  • the sequence generation unit 145 adds temporal encoding based on the link generation time to the link embedding created above, similar to Positional Encoding used in natural language processing.
  • The sequence generation unit 145 encodes based on the difference between the link prediction time and each link occurrence time in the sequence, so that the temporal tendencies of link occurrence in a plurality of data sets can be treated in the same way without depending on the absolute link occurrence times of each data set.
  • the sequence generator 145 uses, for example, Equation (1) as an encoding method that satisfies the above conditions.
  • In Equation (1), t_k is the occurrence time of each link.
  • W is a learning parameter.
  • W is replaced by W src , W tgt or W propagation depending on the type of each sequence.
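  • Equation (1) itself is not reproduced in this text, so the following is not the patent's formula. It is a sketch of one encoding that satisfies the stated condition, namely that it depends only on the difference between the prediction time and each link occurrence time. The sinusoidal form, the treatment of W as a list of frequencies `w`, and both function names are assumptions made for illustration.

```python
import math
from typing import List, Sequence


def temporal_encoding(t_pred: float, t_k: float, w: Sequence[float]) -> List[float]:
    """Encode the time difference t_pred - t_k (assumed sinusoidal form).

    `w` plays the role of a learnable parameter; here it is a fixed list.
    Even dimensions use sine and odd dimensions use cosine.
    """
    dt = t_pred - t_k
    return [
        math.sin(w_j * dt) if j % 2 == 0 else math.cos(w_j * dt)
        for j, w_j in enumerate(w)
    ]


def add_temporal_encoding(link_emb: Sequence[float], t_pred: float, t_k: float,
                          w: Sequence[float]) -> List[float]:
    """Add the temporal encoding element-wise to one link embedding."""
    if len(link_emb) != len(w):
        raise ValueError("w must have one entry per embedding dimension")
    te = temporal_encoding(t_pred, t_k, w)
    return [x + e for x, e in zip(link_emb, te)]


if __name__ == "__main__":
    w = [1.0, 1.0, 0.1, 0.1]
    # The same time difference gives the same encoding at any absolute time.
    print(temporal_encoding(10.0, 7.0, w) == temporal_encoding(110.0, 107.0, w))
```

  • Because only the difference enters the computation, two data sets with very different absolute time stamps produce comparable encodings, which is the property the description asks for.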
  • the feature amount acquisition unit 146 acquires feature amounts indicating the tendency of link occurrence using the transformer encoders 1461-1 to 1461-3 from the three types of sequences created in the sequence generation step.
  • The feature amount acquisition unit 146 acquires the latent expressions H_src_node_seq, H_tgt_node_seq, and H_propagation_seq by inputting the three types of sequences to the different Transformer encoders 1461-1 to 1461-3.
  • The combining unit 1462 weights the latent expressions H_src_node_seq, H_tgt_node_seq, and H_propagation_seq with their respective weights and generates a feature amount that combines the weighted latent expressions.
  • the calculation unit 147 inputs this feature amount to the MLP 1471 and calculates the link occurrence probability.
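  • The weighting, combining, and probability calculation described above can be sketched as follows. This is a minimal sketch under stated assumptions: combining is taken to be concatenation, the MLP has one hidden layer with ReLU, and all names and shapes (`combine`, `mlp_link_probability`, the weight values) are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def combine(h_src, h_tgt, h_prop, weights):
    """Scale each latent representation by its weight, then concatenate.

    ASSUMPTION: concatenation is one possible way of combining.
    """
    parts = (h_src, h_tgt, h_prop)
    return np.concatenate([w * np.asarray(h, dtype=float) for w, h in zip(weights, parts)])

def mlp_link_probability(feature, w1, b1, w2, b2):
    """One-hidden-layer MLP that maps the combined feature to a probability."""
    hidden = np.maximum(0.0, feature @ w1 + b1)  # ReLU activation
    return float(sigmoid(hidden @ w2 + b2))

rng = np.random.default_rng(0)
h_src, h_tgt, h_prop = (rng.normal(size=4) for _ in range(3))
feature = combine(h_src, h_tgt, h_prop, weights=(0.5, 0.3, 0.2))
prob = mlp_link_probability(
    feature,
    w1=rng.normal(size=(12, 8)), b1=np.zeros(8),
    w2=rng.normal(size=8), b2=0.0,
)
print(feature.shape, prob)
```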
  • the updating unit 143 uses Binary Cross Entropy loss used in general binary classification as the loss function.
  • the learning device 10 learns the prediction model using the same number of negative samples as the learning data.
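  • A minimal sketch of binary cross entropy with as many negative samples as positive samples is shown below. The sampling rule (replacing the target node of each observed link with a node that has no link from the same source at the same time) is an assumption for illustration; the text only states that the numbers of positive and negative samples are equal.

```python
import numpy as np

def binary_cross_entropy(p, y, eps=1e-12):
    """Mean binary cross entropy between probabilities p and labels y."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    y = np.asarray(y, dtype=float)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def sample_negatives(positives, num_nodes, rng):
    """Draw one negative (src, tgt, t) per positive by replacing the target.

    ASSUMPTION: target replacement is a hypothetical sampling rule.
    """
    observed = set(positives)
    negatives = []
    for src, _tgt, t in positives:
        while True:
            cand = int(rng.integers(num_nodes))
            if cand != src and (src, cand, t) not in observed:
                negatives.append((src, cand, t))
                break
    return negatives

rng = np.random.default_rng(0)
positives = [(0, 1, 10.0), (2, 3, 11.0), (1, 4, 12.0)]
negatives = sample_negatives(positives, num_nodes=6, rng=rng)
labels = [1] * len(positives) + [0] * len(negatives)
loss = binary_cross_entropy([0.5] * len(labels), labels)
print(len(negatives), loss)
```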
  • FIG. 6 is a flowchart showing a processing procedure of learning processing according to the embodiment.
  • the learning device 10 generates learning data and correct data for each learning data from a plurality of data sets (step S1).
  • the sequence generation unit 145 generates a fixed-length source node action history sequence, target node received action history sequence, and information propagation path sequence to the source node from each learning data generated by the data generation unit 141 (step S2).
  • the learning device 10 uses the transformer encoders 1461-1 to 1461-3 to obtain feature amounts of the three sequences (step S3).
  • the learning device 10 uses the MLP 1471 to predict the occurrence of links between pairs of learning target nodes based on the feature amounts of the three sequences (step S4).
  • the learning device 10 updates the parameters of the transformer encoders 1461-1 to 1461-3 and the parameters of the MLP 1471 so that the error between the prediction result predicted in step S4 and the correct data of the learning data is reduced (step S5).
  • the learning device 10 determines whether or not a predetermined termination condition is satisfied (step S6). If the predetermined termination condition is not satisfied (step S6: No), the learning device 10 returns to step S3 and again executes the feature amount acquisition processing (step S3), the link occurrence prediction processing (step S4), and the parameter update processing (step S5). If the predetermined termination condition is satisfied (step S6: Yes), the learning device 10 terminates the learning process for this learning data.
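  • The loop of steps S3 to S6 can be sketched as follows. To keep the sketch self-contained, a logistic regression over fixed features stands in for the transformer encoders and the MLP, and the termination condition (loss improvement below `tol`, or `max_epochs` reached) is an assumption; the embodiment only says "a predetermined termination condition".

```python
import numpy as np

def train(features, labels, lr=0.5, max_epochs=500, tol=1e-4):
    """Repeat predict (S4), update (S5), and check termination (S6).

    ASSUMPTION: the linear model is a stand-in; in the embodiment, step S3
    recomputes the features with the transformer encoders on every pass.
    """
    x = np.asarray(features, dtype=float)
    y = np.asarray(labels, dtype=float)
    w, b = np.zeros(x.shape[1]), 0.0
    prev_loss = np.inf
    loss = np.inf
    for _ in range(max_epochs):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))          # S4: predict
        pc = np.clip(p, 1e-12, 1.0 - 1e-12)
        loss = float(-np.mean(y * np.log(pc) + (1 - y) * np.log(1 - pc)))
        if prev_loss - loss < tol:                       # S6: terminate?
            break
        grad = p - y                                     # S5: update
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * float(grad.mean())
        prev_loss = loss
    return w, b, loss

x = [[0.0, 1.0], [1.0, 0.0], [0.0, 2.0], [2.0, 0.0]]
y = [1, 0, 1, 0]
w, b, final_loss = train(x, y)
print(final_loss)
```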
  • FIG. 7 is a diagram schematically illustrating an example of a configuration of a prediction device according to an embodiment.
  • the prediction device 20 has a communication unit 21 , an input/output unit 22 , a storage unit 23 and a control unit 24 .
  • the communication unit 21 is a communication interface that transmits and receives various types of information to and from other devices connected via a network or the like.
  • the input/output unit 22 is implemented by devices such as a mouse and keyboard, a liquid crystal display, and the like, and has the same functions as the input/output unit 12 shown in FIG.
  • the storage unit 23 is implemented by a semiconductor memory device such as a RAM or flash memory, and has the same function as the storage unit 13 shown in FIG.
  • the control unit 24 has, for example, an electronic circuit such as a CPU, an integrated circuit such as an ASIC, an internal memory for storing programs and control data, and has the same functions as the control unit 14 shown in FIG.
  • the control unit 24 has a prediction unit 241 .
  • the prediction unit 241 predicts the occurrence of a link between a pair of prediction target nodes.
  • the prediction unit 241 has a sequence generation unit 242 , a feature amount acquisition unit 243 and a calculation unit 244 .
  • the prediction unit 241 corresponds to the prediction model in the embodiment.
  • the sequence generator 242 has the same function as the sequence generator 145 shown in FIGS.
  • the sequence generation unit 242 receives input of the source node, the target node, and the time that are the target of link occurrence prediction.
  • the sequence generation unit 242 extracts, as data, a predetermined number of node pairs whose links occurred before the link occurrence time to be predicted, from the continuous-time graph that includes the pair of the source node and the target node to be predicted.
  • the sequence generator 242 extracts node pairs and link occurrence times for the n links most recent to the link occurrence time to be predicted.
  • the sequence generation unit 242 generates three types of fixed-length sequences (the action history sequence of the source node, the received action history sequence of the target node, and the information propagation path sequence) from the extracted data and the received prediction target data.
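  • The extraction of the n most recent links and the conversion to a fixed length can be sketched as below. The tuple layout, the left padding with `None`, and the filter used for the source node history are assumptions for illustration; how each of the three sequences is derived is defined elsewhere in the description.

```python
from typing import List, Optional, Tuple

Link = Tuple[int, int, float]  # (source node, target node, occurrence time)

def recent_links(links: List[Link], t_pred: float, n: int) -> List[Link]:
    """Return the n links that occurred most recently before t_pred, oldest first."""
    earlier = sorted((l for l in links if l[2] < t_pred), key=lambda l: l[2])
    return earlier[-n:] if n > 0 else []

def to_fixed_length(seq: List[Link], length: int) -> List[Optional[Link]]:
    """Truncate to the newest `length` items and left-pad with None."""
    seq = seq[-length:] if length > 0 else []
    return [None] * (length - len(seq)) + list(seq)

links = [(1, 2, 1.0), (3, 1, 2.0), (1, 4, 3.0), (2, 4, 4.0), (1, 2, 9.0)]
recent = recent_links(links, t_pred=5.0, n=3)

# Hypothetical filter: links sent by source node 1 form its action history.
src_history = to_fixed_length([l for l in recent if l[0] == 1], length=2)
print(recent, src_history)
```

  • Padding to a fixed length is what allows sequences from data sets of different sizes to be fed to the same encoders.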
  • the feature quantity acquisition unit 243 has the same function as the feature quantity acquisition unit 146 shown in FIGS.
  • the feature quantity acquisition unit 243 has transformer encoders 2431-1 to 2431-3, which are first deep learning models, and a combining unit 2432.
  • Transformer encoders 2431-1 to 2431-3 have already been trained by the learning device 10 .
  • the feature amount acquisition unit 243 inputs the action history sequence of the source node, the received action history sequence of the target node, and the information propagation path sequence to the transformer encoders 2431-1 to 2431-3, respectively, and obtains the latent representation of each sequence. After weighting each latent representation with the weights ⁇ 1 , ⁇ 2 , and ⁇ 3 according to its degree of importance, the combining unit 2432 generates a feature amount by combining the weighted latent representations and outputs it to the calculation unit 244 .
  • the calculation unit 244 uses, for example, the MLP 2441 as a second deep learning model to predict the occurrence of links between pairs of prediction target nodes.
  • the MLP 2441 has already been trained by the learning device 10 .
  • the calculation unit 244 calculates, for example, a link occurrence probability between a pair of prediction target nodes.
  • FIG. 8 is a flowchart illustrating a processing procedure of prediction processing according to the embodiment.
  • the prediction device 20 receives input of the source node, the target node, and the time, which are targets of link generation prediction (step S11).
  • the prediction device 20 extracts, as data, a predetermined number of node pairs whose links occurred before the link occurrence time to be predicted, from the continuous-time graph that includes the pair of the source node and the target node to be predicted.
  • the prediction device 20 generates three types of fixed-length sequences: a source node action history sequence, a target action history sequence, and an information propagation path sequence, from the extracted data and the received prediction target data. (step S12).
  • the prediction device 20 uses transformer encoders 2431-1 to 2431-3 to acquire the latent representations of the action history sequence of the source node, the received action history sequence of the target node, and the information propagation path sequence as the feature quantity of each sequence (step S13).
  • the prediction device 20 uses the MLP 2441 to predict the occurrence of a link between a pair of prediction target nodes (step S14).
  • Reference 6 Xiaofu Chang, Xuqin Liu, Jianfeng Wen, Shuang Li, Yanming Fang, Le Song, and Yuan Qi, “Continuous-Time Dynamic Graph Learning via Neural Interaction Processes”, In Proceedings of CIKM, 2020.
  • Reference 7 Seyed Mehran Kazemi, Rishab Goel, Kshitij Jain, Ivan Kobyzev, Akshay Sethi, Peter Forsyth, and Pascal Poupart, “Relational Representation Learning for Dynamic (Knowledge) Graphs: A Survey”, arXiv, 2019.
  • Mean Rank (MR) is obtained by computing, for each test datum, the rank of the correct target node v tgt i among the link occurrence probabilities predicted for all nodes, and averaging these ranks over all test data. Therefore, a smaller value of MR indicates better accuracy.
  • Hits@10 indicates the percentage of correct answer data obtained by the same method as above that are ranked within the top 10. Therefore, the larger the value of Hits@10, the better the accuracy. In the following experimental results, the average value and standard deviation of five accuracy evaluations are shown.
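  • The two metrics can be computed as in the following sketch. Ranks are 1-based, and ties are resolved in favor of the correct node; the tie rule and the function names are assumptions, since the text does not specify them.

```python
import numpy as np

def rank_of_target(probs, target):
    """1-based rank of `target` when nodes are sorted by descending probability.

    ASSUMPTION: nodes tied with the target do not push its rank down.
    """
    probs = np.asarray(probs, dtype=float)
    return int(np.sum(probs > probs[target])) + 1

def mean_rank(all_probs, targets):
    """Average rank of the correct target node over all test data."""
    return float(np.mean([rank_of_target(p, t) for p, t in zip(all_probs, targets)]))

def hits_at_k(all_probs, targets, k=10):
    """Fraction of test data whose correct target node ranks within the top k."""
    return float(np.mean([rank_of_target(p, t) <= k for p, t in zip(all_probs, targets)]))

# Two test data over three candidate nodes; correct targets are nodes 1 and 2.
all_probs = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]
targets = [1, 2]
mr = mean_rank(all_probs, targets)
hits1 = hits_at_k(all_probs, targets, k=1)
hits3 = hits_at_k(all_probs, targets, k=3)
print(mr, hits1, hits3)
```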
  • ia-enron-employees (Reference 8) and ia-radoslaw-email (Reference 9) are e-mail networks representing the transmission and reception of e-mail.
  • ia-contact (Reference 8) and ia-contacts-hypertext09 (Reference 8) are contact networks representing human contacts.
  • rt-pol (Reference 10) is a network representing retweet relationships on Twitter. Statistical information for each data set is shown in Table 2.
  • GraphSAGE (Reference 11) is an inductive node embedding method for static graphs that can embed arbitrary nodes.
  • Reference 11 William L. Hamilton, Rex Ying, and Jure Leskovec, “Inductive Representation Learning on Large Graphs”, In Proceedings of NeurIPS, 2017.
  • DySAT (Non-Patent Document 4) is the latest link prediction method that utilizes discrete-time graphs.
  • TDGNN Non-Patent Document 6
  • JODIE Non-Patent Document 9
  • Dyrep (Non-Patent Document 8) is a link prediction method for continuous-time graphs using point processes.
  • Prop_AllData is a prediction method according to the embodiment, and represents a model trained with learning data of all datasets used.
  • Prop_SingleData is the prediction method according to the embodiment, and represents a model trained only with the learning data of the data set on which it is tested.
  • Table 3 shows the result of accuracy comparison between the conventional method and the prediction method according to the embodiment.
  • Prop_AllData trained using all data sets, achieves accuracy higher than that of conventional methods for almost all data sets. From this, it is possible to confirm the effectiveness of the prediction method according to the present embodiment for learning a common link occurrence tendency from a plurality of data sets.
  • Prop_AllData uses multiple datasets to train the model, so highly accurate link prediction is possible even for datasets with a small amount of teacher data that can be used for learning. Therefore, the prediction method according to this embodiment can improve the prediction accuracy as compared with the conventional method.
  • Prop w/o src_seq is the prediction accuracy when the action history sequence of the source node is deleted among the three types of sequences.
  • Prop w/o tgt_seq is the prediction accuracy when the action history sequence of the target node is deleted among the three types of sequences.
  • Prop w/o propagation_seq is the prediction accuracy when the information propagation path sequence to the source node is deleted among the three types of sequences.
  • Table 4 shows that the prediction accuracy is highest when all three sequences are used. Prediction accuracy can therefore be further improved by using all three types of sequences. In addition, Prop w/o tgt_seq had the lowest accuracy among the models from which a sequence was deleted. This indicates that link occurrence depends strongly on the past received actions of the target node.
  • FIG. 9 is a diagram showing the accuracy when the sequence length is set to an integer value of 1-5.
  • the prediction device 20 has higher accuracy when the sequence length is 2 or 3 compared to other sequence lengths. Therefore, it is desirable to set the sequence length to 2 or 3 in the prediction device 20 .
  • the Transformer encoders 1461-1 to 1461-3 and the MLP 1471 are trained using multiple data sets. Therefore, in the embodiment, even if there is a data set with a small amount of teacher data that can be used for learning, the necessary number of teacher data is obtained from other data sets. Therefore, in the present embodiment, various link generation tendencies can be learned from abundant learning data, and the accuracy of link generation prediction can be improved.
  • In the embodiment, generating fixed-length sequences from a plurality of data sets makes it possible to input sequences of a common format to the transformer encoders 1461-1 to 1461-3. As a result, even when data sets of different sizes are used, a prediction model that has learned the link occurrence tendencies common to these data sets can be realized.
  • In the embodiment, in addition to the action history sequence of the source node, the received action history sequence of the target node and/or the information propagation path sequence to the source node are generated from the learning data.
  • Transformer encoders 1461-1 to 1461-3 are provided for each type of sequence, the feature amount of each sequence is acquired, and link occurrence is predicted based on the feature amount obtained by combining the outputs of the plurality of transformer encoders.
  • In the embodiment, to learn the temporal trend of link occurrence, temporal encoding based on the link occurrence time is applied to the action history sequence of each node, which allows the transformer encoder to learn general node behavior cycles. Further, in the embodiment, it is possible to build a learned model without using the prediction target data and then fine-tune it with only a small amount of prediction target data to improve prediction accuracy.
  • Take an e-mail network as an example. Suppose the task is to predict to which user the source-node user will send an e-mail at 17:00. In this case, the prediction device 20 changes the target node to each user who can be the destination, and predicts the probability (link occurrence probability) that the source-node user will send an e-mail to each user at 17:00. Then, the prediction device 20 outputs the user with the highest probability as the prediction result.
  • Another example is the case of predicting how many times the source node user will send e-mails between 17:00 and 18:00.
  • the prediction device 20 integrates the prediction results for each unit time during this period, and outputs the integration result as the prediction result.
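  • Both usage patterns can be sketched around a generic scoring function. Here `predict(source, target, time)` is a hypothetical stand-in for the trained prediction device 20, and summing probabilities over unit times is one possible reading of "integrates"; both are assumptions for illustration.

```python
def most_likely_target(predict, source, candidates, time):
    """Score every candidate target node and return the most probable one."""
    return max(candidates, key=lambda tgt: predict(source, tgt, time))

def integrated_link_count(predict, source, candidates, times):
    """Sum link occurrence probabilities over unit times and candidates.

    ASSUMPTION: the sum is read as an expected number of links in the period.
    """
    return sum(predict(source, tgt, t) for t in times for tgt in candidates)

def toy_predict(source, target, time):
    """Hypothetical probabilities; the user 'bob' becomes more likely at 17:00."""
    base = {"bob": 0.2, "carol": 0.4, "dave": 0.1}[target]
    return base + (0.5 if target == "bob" and time == 17 else 0.0)

users = ["bob", "carol", "dave"]
best = most_likely_target(toy_predict, "alice", users, time=17)
count = integrated_link_count(toy_predict, "alice", users, times=[17, 18])
print(best, count)
```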
  • which of the action history of the source node, the action history of the target node, and the information propagation route to the source node is emphasized differs depending on the application to be predicted and the content of prediction.
  • In the embodiment, it is not always necessary to create all of the action history sequence of the source node, the received action history sequence of the target node, and the information propagation path sequence to the source node; it suffices to create the sequences required for the application and the content of the prediction.
  • the number of transformer encoders corresponding to the types of sequences should be provided. Therefore, the number of transformer encoders used in the embodiment is not limited to plural, and may be singular.
  • The prediction method according to the embodiment can be widely applied to, for example, a trend prediction system that predicts how far a message sent via an SNS (Social Networking Service) will spread on the network, a schedule management system, a product inventory management system based on predictions of who will buy what and when, and a product recommendation system.
  • Each component of the learning device 10 and the prediction device 20 is functionally conceptual and does not necessarily need to be physically configured as illustrated.
  • The specific forms of distribution and integration of the functions of the learning device 10 and the prediction device 20 are not limited to those illustrated; all or part of them can be functionally or physically distributed or integrated in arbitrary units according to various loads and usage conditions.
  • Each process performed in the learning device 10 and the prediction device 20 may be realized by a CPU, a GPU (Graphics Processing Unit), and a program that is analyzed and executed by the CPU and GPU. Each process performed in the learning device 10 and the prediction device 20 may also be implemented as hardware based on wired logic.
  • FIG. 10 is a diagram showing an example of a computer that realizes the learning device 10 and the prediction device 20 by executing programs.
  • the computer 1000 has a memory 1010 and a CPU 1020, for example.
  • Computer 1000 also has hard disk drive interface 1030 , disk drive interface 1040 , serial port interface 1050 , video adapter 1060 and network interface 1070 . These units are connected by a bus 1080 .
  • the memory 1010 includes a ROM 1011 and a RAM 1012.
  • the ROM 1011 stores a boot program such as BIOS (Basic Input Output System).
  • Hard disk drive interface 1030 is connected to hard disk drive 1090 .
  • a disk drive interface 1040 is connected to the disk drive 1100 .
  • a removable storage medium such as a magnetic disk or optical disk is inserted into the disk drive 1100 .
  • Serial port interface 1050 is connected to mouse 1110 and keyboard 1120, for example.
  • Video adapter 1060 is connected to display 1130, for example.
  • the hard disk drive 1090 stores an OS (Operating System) 1091, application programs 1092, program modules 1093, and program data 1094, for example. That is, a program that defines each process of the learning device 10 and the prediction device 20 is implemented as a program module 1093 in which code executable by the computer 1000 is described. Program modules 1093 are stored, for example, on hard disk drive 1090 .
  • the hard disk drive 1090 stores a program module 1093 for executing processing similar to the functional configurations of the learning device 10 and the prediction device 20 .
  • the hard disk drive 1090 may be replaced by an SSD (Solid State Drive).
  • the setting data used in the processing of the above-described embodiment is stored as program data 1094 in the memory 1010 or the hard disk drive 1090, for example. Then, the CPU 1020 reads out the program module 1093 and the program data 1094 stored in the memory 1010 and the hard disk drive 1090 to the RAM 1012 as necessary and executes them.
  • the program modules 1093 and program data 1094 are not limited to being stored in the hard disk drive 1090, but may be stored in a removable storage medium, for example, and read by the CPU 1020 via the disk drive 1100 or the like. Alternatively, the program modules 1093 and program data 1094 may be stored in another computer connected via a network (LAN (Local Area Network), WAN (Wide Area Network), etc.). Program modules 1093 and program data 1094 may then be read by CPU 1020 through network interface 1070 from other computers.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
PCT/JP2023/001699 2022-02-04 2023-01-20 学習装置、予測装置、学習方法、予測方法、学習プログラム及び予測プログラム Ceased WO2023149236A1 (ja)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2023578472A JPWO2023149236A1 2022-02-04 2023-01-20

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2022016691 2022-02-04
JP2022-016691 2022-02-04

Publications (1)

Publication Number Publication Date
WO2023149236A1 (ja) 2023-08-10

Family

ID=87552100

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2023/001699 Ceased WO2023149236A1 (ja) 2022-02-04 2023-01-20 学習装置、予測装置、学習方法、予測方法、学習プログラム及び予測プログラム

Country Status (2)

Country Link
JP (1) JPWO2023149236A1
WO (1) WO2023149236A1

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2026074650A1 (ja) * 2024-10-02 2026-04-09 三菱電機株式会社 推定装置、電力変換装置、モータ駆動装置および冷凍サイクル適用機器

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2019502212A (ja) * 2016-01-14 2019-01-24 株式会社Preferred Networks 時系列データ適合およびセンサ融合のシステム、方法、および装置
WO2021149528A1 (ja) * 2020-01-22 2021-07-29 国立大学法人大阪大学 イベント予測システム、イベント予測方法およびプログラム

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YAMAGUCHI SHINYA: "P1-114 Data Augmentation by Simultaneous Learning of Multiple Datasets in Generative Adversarial Networks", 11TH FORUM ON DATA ENGINEERING AND INFORMATION MANAGEMENT (17TH DATABASE SOCIETY OF JAPAN ANNUAL CONFERENCE), 6 March 2019 (2019-03-06), XP093083352, [retrieved on 20230919] *
ZHANG LU-NING; LIU JIAN-WEI; SONG ZHI-YAN; ZUO XIN: "Temporal attention augmented transformer Hawkes process", NEURAL COMPUTING AND APPLICATIONS, SPRINGER LONDON, LONDON, vol. 34, no. 5, 8 November 2021 (2021-11-08), London, pages 3795 - 3809, XP037701333, ISSN: 0941-0643, DOI: 10.1007/s00521-021-06641-z *

Also Published As

Publication number Publication date
JPWO2023149236A1 2023-08-10

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23749552

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2023578472

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 23749552

Country of ref document: EP

Kind code of ref document: A1